...
- If one of the 3 nodes is failing or missing from the list, log in to the failing server via SSH and restart the RabbitMQ service:
sudo systemctl restart rabbitmq-server
- After a minute or two, the management consoles should show that the cluster has been restored.
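The restart can also be checked from the command line instead of watching the management console. The following is only a sketch, not part of the established procedure: wait_for is a hypothetical helper name, and the retry count and delay in the example are assumed values. The helper retries a command until it succeeds or the attempts run out.

```shell
# wait_for <attempts> <delay_seconds> <command...>
# Hypothetical helper: retries <command> until it exits 0 or <attempts> is used up.
wait_for() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example use on the failing node (assumed values: 24 attempts, 5 seconds apart):
#   sudo systemctl restart rabbitmq-server
#   wait_for 24 5 systemctl is-active --quiet rabbitmq-server && sudo rabbitmqctl cluster_status
```

rabbitmqctl cluster_status lists the cluster members and the nodes that are currently running; once the cluster is restored, all 3 nodes should be reported as running.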
Solution #2
- If all 3 nodes appear in the list but report different states when you log in to their respective administration GUIs:
- Log in to all 3 nodes
- On all 3 nodes, run:
sudo systemctl stop rabbitmq-server
- Wait for the service to stop on all 3 nodes
- Then, on all 3 nodes, run:
sudo systemctl start rabbitmq-server
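The stop-everywhere, then start-everywhere sequence above could be scripted from a single admin host. The sketch below makes several assumptions: the hostnames in NODES are placeholders, run_on, wait_stopped and full_restart are hypothetical helper names, and SSH access with passwordless sudo is assumed on every node. By default it only prints the commands it would run (DRY_RUN=1).

```shell
#!/bin/sh
# Placeholder hostnames (assumption); replace with the 3 real RabbitMQ nodes.
NODES="${NODES:-rabbit-node1 rabbit-node2 rabbit-node3}"
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 runs them over ssh.
DRY_RUN="${DRY_RUN:-1}"

# run_on <host> <command...>: run (or print) a command on one node.
run_on() {
    local host="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh $host $*"
    else
        ssh "$host" "$@"
    fi
}

# wait_stopped <host>: block until rabbitmq-server is no longer active there.
# systemctl stop normally returns only once the unit has stopped, so this
# is an extra check rather than a required step.
wait_stopped() {
    local host="$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "wait until rabbitmq-server is inactive on $host"
        return 0
    fi
    # is-active --quiet exits 0 while the unit is active.
    while ssh "$host" systemctl is-active --quiet rabbitmq-server; do
        sleep 2
    done
}

full_restart() {
    local node
    # Phase 1: stop the service on every node.
    for node in $NODES; do
        run_on "$node" sudo systemctl stop rabbitmq-server
    done
    # Phase 2: confirm it is stopped everywhere before starting any node.
    for node in $NODES; do
        wait_stopped "$node"
    done
    # Phase 3: start the service on every node.
    for node in $NODES; do
        run_on "$node" sudo systemctl start rabbitmq-server
    done
}

full_restart
```

Review the printed commands first, then rerun with DRY_RUN=0 and NODES set to the real hostnames to execute them.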
Collectors have stopped working
...
Possible Cause: Correlators have stopped working
Analysis
- Open https://netprod-alarmsdashboardv3-monitoring.geant.org/d/hESYQotZz/correlation-services?orgId=1
- Scroll down to the "Collectors" panel
- Check that the graph shows the leader collector processing a non-zero rate of traps. The current leader can be identified by the FORWARDER with state 2 in the "Raft States" panel.
...