Date        Time   Notes
06/03/2020  11:21  First critical alert received. Decision taken to keep it under review and see how fast the partition was consuming space.
06/03/2020  17:10  Alert was reviewed again; the partition was found to be consuming more space than expected.
06/03/2020  17:30  Logged on and added a new disk via the VMware UI. Logged onto the server and attempted to extend the existing LVM volume in the usual manner. The server produced errors when the physical volume was created, with a message about a missing UUID, which pvs confirmed. Attempts to recover the situation were unsuccessful, and a reboot was requested to confirm whether a device rescan would fix the issue or provide more information.
06/03/2020  19:54  Emergency ticket raised to perform a reboot at 21:00; approved by NOC.
06/03/2020  21:00  Unfortunately the VM did not boot. Multiple fsck options were tried without success, so we were forced to restore from backup.
06/03/2020  21:30  Restore from backup was requested.
06/03/2020  22:25  After issues with the restore, a good version of the VM was restored and booted.
06/03/2020  22:30  Investigated the large volume of relay logs in /var/lib/mysql.
06/03/2020  22:33  Logged in to MySQL on vie and reset the id default value in the data_template_data_rra, poller_item and data_input_data tables.
06/03/2020  22:53  Logged into prod-cacti01-fra-de.geant.net to fix the replication break caused by the restore, which showed an older binlog entry than expected.
06/03/2020  23:14  Notified NOC that replication was fixed but that the fix will need review on Monday. At this point nothing had been reported as broken.
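
The 11:21 and 17:10 entries both concern how quickly the partition was filling. One rough way to make that call is to extrapolate linearly from two usage readings. The Python below is a minimal sketch that assumes steady, linear growth. The function name hours_until_full and the sample figures are hypothetical and do not come from this incident.

```python
from datetime import datetime
from typing import Optional


def hours_until_full(t1: datetime, used1: float,
                     t2: datetime, used2: float,
                     capacity: float) -> Optional[float]:
    """Estimate hours until a filesystem is full from two usage samples.

    Assumes growth is linear between and after the samples. used1, used2 and
    capacity must use the same unit (e.g. GiB). Returns None when usage is
    flat or shrinking, since no fill time can be extrapolated.
    """
    elapsed_hours = (t2 - t1).total_seconds() / 3600
    if elapsed_hours <= 0:
        raise ValueError("t2 must be later than t1")
    growth_per_hour = (used2 - used1) / elapsed_hours
    if growth_per_hour <= 0:
        return None
    # Remaining headroom divided by the observed growth rate.
    return (capacity - used2) / growth_per_hour


if __name__ == "__main__":
    # Hypothetical readings: 80 GiB used at 10:00, 86 GiB at 16:00, 100 GiB volume.
    eta = hours_until_full(datetime(2020, 1, 1, 10, 0), 80,
                           datetime(2020, 1, 1, 16, 0), 86, 100)
    print(f"Estimated hours until full: {eta}")
```

With those hypothetical readings, growth is 6 GiB over 6 hours (1 GiB/hour) and 14 GiB is left, so the function returns 14.0. Real usage is often bursty, for example when relay logs accumulate, so a second reading later in the day, as was done at 17:10, is still worthwhile.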
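
At 17:30, pvs confirmed a missing UUID. A scripted check can pick out the physical volumes that LVM reports as missing. The sketch below assumes the output was captured with "pvs --noheadings --separator '|' -o pv_name,vg_name,pv_attr,pv_uuid". It flags a row when the PV name is "[unknown]" or when the third pv_attr character is "m" (missing, as described in the pvs(8) man page). The function name, volume group name and UUIDs in the sample are illustrative only.

```python
from typing import List


def find_missing_pvs(pvs_output: str) -> List[str]:
    """Return the UUIDs of physical volumes that LVM reports as missing.

    Expects one PV per line in the form name|vg|attr|uuid, as produced by
    pvs --noheadings --separator '|' -o pv_name,vg_name,pv_attr,pv_uuid
    """
    missing = []
    for line in pvs_output.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 4:
            continue  # skip blank lines, warnings or unexpected formats
        name, _vg, attr, uuid = fields
        # pv_attr bit 3 is 'm' when the PV is missing; its name is then "[unknown]".
        if name == "[unknown]" or (len(attr) >= 3 and attr[2] == "m"):
            missing.append(uuid)
    return missing


if __name__ == "__main__":
    sample = (
        "  /dev/sda2|vg_data|a--|aaaa-1111\n"
        "  [unknown]|vg_data|a-m|bbbb-2222\n"
    )
    print(find_missing_pvs(sample))  # ['bbbb-2222']
```

Running a check like this before and after adding a disk would show whether the volume group was already degraded before the extension was attempted.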
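
The 22:53 entry describes replication breaking because, after the restore, the binlog position did not match what was expected. MySQL binary log files share a base name and carry a numeric extension that increases with each rotation. A (file, position) coordinate can therefore be ordered by comparing the extension first and the position second. The sketch below performs only that ordering and does not decide how to repair replication. The function name and the "mysql-bin" base name used in the example are illustrative assumptions.

```python
import re
from typing import Tuple

# Binary log names look like <basename>.<number>, e.g. mysql-bin.000012.
_BINLOG_NAME = re.compile(r"^(?P<base>.+)\.(?P<seq>\d+)$")


def _parse(file_name: str) -> Tuple[str, int]:
    """Split a binlog file name into its base name and rotation number."""
    match = _BINLOG_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a binlog file name: {file_name!r}")
    return match.group("base"), int(match.group("seq"))


def compare_binlog_coords(file_a: str, pos_a: int,
                          file_b: str, pos_b: int) -> int:
    """Return -1, 0 or 1 as coordinate A is older than, equal to or newer than B."""
    base_a, seq_a = _parse(file_a)
    base_b, seq_b = _parse(file_b)
    if base_a != base_b:
        raise ValueError("coordinates belong to different binlog series")
    a, b = (seq_a, pos_a), (seq_b, pos_b)
    # Tuple comparison orders by rotation number first, then by position.
    return (a > b) - (a < b)
```

For example, compare_binlog_coords("mysql-bin.000012", 4, "mysql-bin.000011", 9000) returns 1, because a later file always comes after any position in an earlier one. This makes it easy to see which side of a replication pair holds the older entry.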


Cacti runs "unison" to perform a two-way synchronization. Unison first stopped working because of the filesystem corruption. It then failed on the restored system because the two VMs were no longer in sync. We removed the database created by Unison and restarted Unison from scratch on both systems, after which the sync started working again.
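
The reset described above, which discards Unison's state on both hosts so the next run rebuilds it, can be scripted. The sketch below moves the files aside rather than deleting them, so they can be restored if needed. It assumes that Unison keeps its state in its configuration directory (by default ~/.unison) in files whose names start with "ar" (archives) or "fp" (fingerprint caches). Check those prefixes against the installed Unison version before relying on them. The function name is hypothetical. Unison must not be running while the files are moved, and the step has to be repeated on both systems.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def move_unison_state(unison_dir: Path, backup_dir: Path,
                      prefixes: Iterable[str] = ("ar", "fp")) -> List[Path]:
    """Move Unison state files out of unison_dir so the next sync starts fresh.

    The default prefixes are an assumption; profiles (*.prf) are never moved.
    Returns the new paths of the moved files.
    """
    prefix_tuple = tuple(prefixes)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materialises the directory listing before anything is moved.
    for entry in sorted(unison_dir.iterdir()):
        if (entry.is_file() and entry.suffix != ".prf"
                and entry.name.startswith(prefix_tuple)):
            target = backup_dir / entry.name
            shutil.move(str(entry), str(target))
            moved.append(target)
    return moved
```

A call such as move_unison_state(Path.home() / ".unison", Path.home() / "unison-state-backup") on each host, followed by a normal Unison run, mirrors the "start from scratch" step. Moving instead of deleting is a deliberate choice: if the prefix assumption turns out to be wrong for a given version, nothing is lost.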