...

Traps not being processed by RabbitMQ

Analysis

  1. Open one or more of the following RabbitMQ management consoles (credentials are in the "GÉANT Dashboard v3" LastPass folder).

...

...

  1. Scroll down to the "Nodes" section
  2. There should be 3 rows in the table and all status icons should be green.  The expected node hostnames are:
    • prod-noc-alarms01.geant.org
    • prod-noc-alarms02.geant.org
    • prod-dashboard-storage01.geant.org
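The same node check can be scripted against the RabbitMQ management HTTP API, whose `/api/nodes` endpoint returns one JSON object per cluster node with `name` and `running` fields. The following is a minimal sketch, not a supported tool. It assumes node names follow the RabbitMQ default form `rabbit@<hostname>` (short or fully qualified), which has not been confirmed for this cluster, and `unhealthy_nodes` is a hypothetical helper name.

```python
import json

# Expected cluster members, as listed in the analysis steps above.
EXPECTED_HOSTS = [
    "prod-noc-alarms01.geant.org",
    "prod-noc-alarms02.geant.org",
    "prod-dashboard-storage01.geant.org",
]


def unhealthy_nodes(api_nodes_json, expected_hosts=EXPECTED_HOSTS):
    """Return the expected hosts that are missing or not running.

    api_nodes_json is the response body of GET /api/nodes.
    """
    running = set()
    for node in json.loads(api_nodes_json):
        # Assumption: names look like "rabbit@host" or "rabbit@host.domain".
        host = node.get("name", "").partition("@")[2]
        if node.get("running"):
            # Record both the full and the short hostname for matching.
            running.add(host)
            running.add(host.split(".")[0])
    return [
        h for h in expected_hosts
        if h not in running and h.split(".")[0] not in running
    ]
```

An empty list means all three expected nodes are present and running. Any hostname returned is a candidate for the restart described under Solution.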

Solution

  1. If one of the 3 nodes is failing or missing from the list, log into the failing server and restart the RabbitMQ service:
    • sudo systemctl restart rabbitmq-server
  2. After a minute or two the management consoles should show the cluster is restored.
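Waiting "a minute or two" for the cluster to recover can be expressed as a simple polling loop. This is a sketch under assumptions: `wait_until` is a hypothetical helper, the 180 second timeout and 10 second interval are illustrative values, and `check` stands for whatever health check is used (for example, a function that inspects the node list).

```python
import time


def wait_until(check, timeout_s=180.0, interval_s=10.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() repeatedly until it returns True or timeout_s elapses.

    Returns True on success and False if the timeout is reached first.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

If the helper returns False, the cluster has not recovered within the allotted time and the failing node needs further investigation.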

Possible Cause

Collectors have stopped working

Analysis

  1. Open https://net-alarms-monitoring.geant.org/d/hESYQotZz/correlation-services?orgId=1
  2. Scroll down to the "Collectors" panel
  3. Check that the Collectors graph shows a nonzero rate of traps being processed

Solution

  1. On each of the following servers:
    • net-alarms01.geant.org
    • net-alarms02.geant.org
  2. Log in and execute the following command:
    • sudo systemctl restart trap_collector
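The per-server restart above can be prepared as a list of commands for review before running them. This is a sketch only: it assumes SSH access to both servers and sudo rights that do not need an interactive password prompt, neither of which is stated in this article, and `restart_commands` is a hypothetical helper name.

```python
import shlex

# Collector servers, as listed in the solution steps above.
COLLECTOR_HOSTS = ["net-alarms01.geant.org", "net-alarms02.geant.org"]


def restart_commands(hosts, service):
    """Build one ssh argument list per host; nothing is executed here."""
    return [
        ["ssh", host, "sudo", "systemctl", "restart", service]
        for host in hosts
    ]


for argv in restart_commands(COLLECTOR_HOSTS, "trap_collector"):
    # Printed for review; subprocess.run(argv, check=True) would execute it.
    print(shlex.join(argv))
```

Building the commands separately from executing them keeps the sketch safe to run as written, since it only prints what would be done.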

...

Possible Cause

Correlators have stopped working

Analysis

  1. Open https://net-alarms-monitoring.geant.org/d/hESYQotZz/correlation-services?orgId=1
  2. Scroll down to the "Collectors" panel
  3. Check that the "Correlators - received" and "Correlators - handled" graphs show one of the correlator processes processing a non-zero rate of traps.
    • Note that it is normal for only one of the correlators to be processing traps; the other line should remain at zero.
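The rule in the note above (one active process is enough, the other may sit at zero) can be written down as a small check. This is an illustrative sketch: `processing_ok` and the process labels in the usage example are hypothetical, and the rates would have to be read from the dashboard graphs by whatever means is available.

```python
def processing_ok(rates_by_process):
    """Return True if at least one process shows a non-zero trap rate.

    rates_by_process maps a process label to its current rate.
    """
    return any(rate > 0 for rate in rates_by_process.values())
```

For example, `processing_ok({"process-a": 12.5, "process-b": 0.0})` is true, which matches the normal state, while a mapping with all rates at zero indicates the problem described in this section.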

Solution

  1. On each of the following servers:
  2. Log in and execute the following command:
    • sudo systemctl restart trap_correlator

...