...

Date    Time    Description

 

21:55:53 

First error in indico.log reporting Redis as unavailable:

ConnectionError: Error -2 connecting to master.production-events-redis.service.ha.geant.net:6379. Name or service not known.
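The "Error -2" in this message is the resolver error code passed up from the system's name lookup; on Linux with glibc, -2 corresponds to EAI_NONAME ("Name or service not known"). The following is a minimal sketch of how such an error could be classified in Python. The function name describe_resolution_error is hypothetical and is not part of Indico or redis-py.

```python
import socket


def describe_resolution_error(exc):
    """Map a socket.gaierror to a short human-readable diagnosis.

    Hypothetical helper for illustration only; the numeric values of the
    EAI_* constants are platform-specific, so compare against the constants.
    """
    if exc.errno == socket.EAI_NONAME:
        # The code behind "Name or service not known" on glibc systems.
        return "name does not resolve (no such host or NXDOMAIN)"
    if exc.errno == socket.EAI_AGAIN:
        return "temporary failure in name resolution"
    return f"other resolver error: {exc}"
```

A caller would typically wrap socket.getaddrinfo(host, 6379) in try/except socket.gaierror and pass the caught exception to this helper.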

 

10:42

First user query about EMS login problem (Slack #general)

 

11:14

Ian Galpin identified the DNS resolution problem

[root@prod-events01 log]# ping master.production-events-redis.service.ha.geant.net
ping: master.production-events-redis.service.ha.geant.net: Name or service not known

 

12:06

Service degradation incident email sent out to product owner (Steffie Bosman)

 

12:12

Massimiliano Adamo identified a problem with PowerDNS

[root@prod-events02 ~]# host slave.production-events-redis.service.ha.geant.net
slave.production-events-redis.service.ha.geant.net has address 83.97.94.19
[root@prod-events02 ~]# host slave.production-events-redis.service.ha.geant.net
slave.production-events-redis.service.ha.geant.net has address 83.97.94.19
Host slave.production-events-redis.service.ha.geant.net not found: 3(NXDOMAIN)
[root@prod-events02 ~]# host slave.production-events-redis.service.ha.geant.net
Host slave.production-events-redis.service.ha.geant.net not found: 3(NXDOMAIN)
[root@prod-events02 ~]# host slave.production-events-redis.service.ha.geant.net
Host slave.production-events-redis.service.ha.geant.net not found: 3(NXDOMAIN)
[root@prod-events02 ~]# host slave.production-events-redis.service.ha.geant.net
Host slave.production-events-redis.service.ha.geant.net not found: 3(NXDOMAIN)
[root@prod-events02 ~]# host slave.production-events-redis.service.ha.geant.net
slave.production-events-redis.service.ha.geant.net has address 83.97.94.19
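The repeated host calls above show the same name alternating between a valid answer and NXDOMAIN. A minimal sketch of automating that sampling is given below. It assumes the system resolver is representative of what the application sees; a local cache such as nscd or systemd-resolved could mask intermittent failures. The function name sample_resolution and the "FAILED" label are hypothetical.

```python
import socket
from collections import Counter


def sample_resolution(hostname, attempts=10, resolver=socket.gethostbyname):
    """Resolve `hostname` repeatedly and tally the outcomes.

    Returns a Counter keyed by the resolved address, with failed lookups
    counted under the key "FAILED". `resolver` is injectable so the logic
    can be exercised without touching the network.
    """
    results = Counter()
    for _ in range(attempts):
        try:
            results[resolver(hostname)] += 1
        except OSError:
            # socket.gaierror and socket.herror are both OSError subclasses.
            results["FAILED"] += 1
    return results
```

Calling sample_resolution("slave.production-events-redis.service.ha.geant.net") on an affected host would be expected to return a mix of address and "FAILED" counts while the problem persists, and only address counts once it is fixed.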


Consul DNS resolution appeared to work when the Consul servers were queried directly:

dig slave.production-events-redis.service.ha.geant.net @prod-consul01.geant.org -p 8600
dig slave.production-events-redis.service.ha.geant.net @prod-consul02.geant.org -p 8600
dig slave.production-events-redis.service.ha.geant.net @prod-consul03.geant.org -p 8600


 

 

12:30

Massimiliano Adamo resolved the PowerDNS issue by disabling the recursor's packet cache (setting the disable-packetcache option to yes):

The problem was this parameter (we are almost sure):
https://docs.powerdns.com/recursor/settings.html#disable-packetcache
It defaults to no, but I have now set it to yes.

The following GitHub issue might explain the behaviour: https://github.com/PowerDNS/pdns/issues/8160

 

 

 

 

13:01

Service restored email sent out to product owner (Steffie Bosman)

Proposed Solution

  • Additional monitoring (Sensu checks) will be added
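One possible shape for such a check is sketched below. It is not the check that was deployed. It assumes Sensu's Nagios-compatible exit codes (0 OK, 1 WARNING, 2 CRITICAL); the function name run_check, the number of attempts, and the thresholds are illustrative assumptions.

```python
import socket

# Nagios-style exit codes, which Sensu checks also follow.
OK, WARNING, CRITICAL = 0, 1, 2


def run_check(hostname, attempts=10, resolver=socket.gethostbyname):
    """Resolve `hostname` several times and return (exit_code, message).

    Assumed thresholds: any failed lookup is a WARNING (intermittent
    resolution, as seen in this incident); all lookups failing is CRITICAL.
    """
    failures = 0
    for _ in range(attempts):
        try:
            resolver(hostname)
        except OSError:
            failures += 1
    if failures == 0:
        return OK, f"OK: {hostname} resolved {attempts}/{attempts} times"
    if failures < attempts:
        return WARNING, f"WARNING: {hostname} failed {failures}/{attempts} lookups"
    return CRITICAL, f"CRITICAL: {hostname} failed all {attempts} lookups"


# As a check script, one would finish with something like:
#     status, message = run_check("master.production-events-redis.service.ha.geant.net")
#     print(message)
#     raise SystemExit(status)
```

Sampling several lookups per run matters here because a single lookup could succeed by chance while the resolver is returning NXDOMAIN intermittently.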

...