
Incident description

During testing of the CloudBolt framework, the Puppet server was connected to the CloudBolt service. CloudBolt then changed the configuration on some nodes and pointed them to the wrong environment (test instead of production). As a result, the Puppet runs on those nodes changed everything to match the test environment and broke service functions.
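The failure mode described above comes down to which Puppet environment a node's agent requests. As a minimal sketch, assuming the environment is set in each node's local `puppet.conf`, the Python snippet below reports the environment an agent would use. It follows Puppet's lookup order (`[agent]` before `[main]`) and Puppet's default of `production`. It does not cover environments assigned on the server side, for example by CloudBolt or an external node classifier. The function names and the sample server name are hypothetical and are not part of the original setup.

```python
import configparser

# Assumption: production nodes are expected to use the "production" environment,
# as stated in the incident description.
EXPECTED_ENVIRONMENT = "production"


def agent_environment(puppet_conf_text: str) -> str:
    """Return the environment a Puppet agent would request (hypothetical helper).

    The [agent] section takes precedence over [main]; if neither sets
    `environment`, Puppet falls back to its default, "production".
    """
    # interpolation=None so values containing '%' are read literally;
    # strict=False tolerates duplicate sections/options in hand-edited files.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(puppet_conf_text)
    for section in ("agent", "main"):
        if parser.has_option(section, "environment"):
            return parser.get(section, "environment").strip()
    return "production"


def is_misconfigured(puppet_conf_text: str) -> bool:
    """True if the node would run against a non-production environment."""
    return agent_environment(puppet_conf_text) != EXPECTED_ENVIRONMENT


if __name__ == "__main__":
    # Hypothetical puppet.conf resembling a node after the faulty change.
    sample = """
[main]
server = puppet.example.com

[agent]
environment = test
"""
    print(agent_environment(sample))  # test
    print(is_misconfigured(sample))   # True
```

Running a check like this across the fleet before and after connecting an external tool to the Puppet server is one way to catch an environment switch before the next agent run applies it.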

 

Incident severity: CRITICAL

Data loss: NO

OTRS tickets: 1 (in the SWD queue), 0 in OTRS

Timeline


Time (CET)
03 Aug, 12:36

Issue reported by Cristian Bandea on the Slack channel #techies

03 Aug, 13:40

Picked up by Konstantin Lepikhov on his return from lunch

03 Aug, 13:40

Massimiliano Adamo started looking into it because the problem was related to the memcache configuration he had recently introduced

03 Aug, 13:46

user-7da5d identified CloudBolt as the source and found the issue (wrong Puppet environment)

03 Aug, 14:00

user-7da5d switched off the prod-idp01 VM, leaving only prod-idp02 running; this at least restored IDP service operation

03 Aug, 14:43

Sympa cannot connect to CAMS; other servers are also affected (reported by Linda)

03 Aug, 16:00

SaltStack was used to fix all the servers at once


Total downtime: approximately 1.5 hours (to be confirmed)

Proposed Solution

