MySQL is setup in Master-Master replication technique on prod-newdboard01.geant.net, and prod-newdboard02.geant.net. Sometimes, though very rarely, the replication might break. This note provides solutions to apply when such a situation occurs.
A Nagios check setup on gnmonitoring.geant.net checks for the replication status periodically. If it fails, then it sends an email to Richard Sanigar and Trupti Kulkarni. Log in to Nagios on gnmonitoring and find out the server on which replication has broken down; it will be one of prod-newdboard01.geant.net, or prod-newdboard02.geant.net
sudo mysql-as-root
SHOW SLAVE STATUS \G
Either or both of Slave_IO_Running and Slave_SQL_Running will be 'No', and Seconds_Behind_Master will be NULL. Time to act.
STOP SLAVE;
FLUSH LOGS;
RESET SLAVE;
START SLAVE;
SHOW SLAVE STATUS \G
Slave_IO_Running or Slave_SQL_Running should display 'Yes', and Seconds_Behind_Master should come down to 0 (after a short while, if not immediately).
If yes, then congratulations, you have restored the replication!
If no, then proceed to Method 2.
mysqldump -u newdashboardBiz -p --opt alarms > /tmp/alarms.sql
When prompted, enter password, which can be found in Systems KeePass on \\chfile01\Systems
Transfer this file (using scp) to the offending Dashboard VM (on which replication has broken) on /tmp directory.
sudo mysql-as-root
And run the following command:
SHOW MASTER STATUS;
Make a note of the File and Position values. Quit the MySQL shell.
sudo mysql-as-root
STOP SLAVE;
Quit the MySQL shell.
mysql -u newdashboardBiz -p alarms < /tmp/alarms.sql
sudo mysql-as-root
And restore replication using the following commands:
CHANGE MASTER TO MASTER_HOST='host of partner (non-offending) server', MASTER_USER='replicator', MASTER_PASSWORD='<please check Systems KeePass>', MASTER_LOG_FILE='<partner's log file name>', MASTER_LOG_POS=<partner's log file position>;
where 'partner's log file name' and 'partner's log file position' are the values recorded in step 4 in this section above.
Now, start the slave:
START SLAVE;
Check if replication has been setup successfully:
SHOW SLAVE STATUS;
Both Slave_IO_Running or Slave_SQL_Running should display 'Yes', and Seconds_Behind_Master should come down to 0 (after a short while, if not immediately).
That’s all! You have successfully restored replication on production Dashboard VMs! If you still see an error, please check mysqld log file and investigate.