...

During the investigation of an incident with the IdP, someone restarted the Sympa service on the production instance (prod-lists01.geant.net):

(all times below are UTC)

Aug  3 12:57:35 prod-lists01 bounced[29387]: notice main::sigterm() Signal TERM received, still processing current task

...

sympa_msg died because the ca-bundle file was missing from the /etc/sympa directory. It is not known why the file was missing; as an immediate fix, I created a symlink pointing to /etc/pki/tls/certs/ca-bundle.crt. Sympa remained in this broken state until Aug 6th at about 13:00 CEST. The sympa_msg process is responsible for delivering messages from the mailing list spool to recipients. NOTE: only delivery of messages to recipients was broken; everything else (mailing list posting, archiving, etc.) worked normally.


Incident severity: CRITICAL

Data loss: NO

Total disruption: 3 days.

Affected mailing lists

The following mailing lists were affected:

...

  • The Puppet configuration lacks many things that Sympa still depends on. Strictly speaking, in its current state this Puppet configuration is not fully suitable for managing the service, because many critical files are handled manually.

Timeline

...


Resolution

We need to rewrite the existing Puppet module as soon as possible, because it does not handle such things properly. The work has started in the test branch, and its current state is much better: Sympa installation and configuration handling are fully automated, and it already runs a recent Sympa version.

...