Network Map Repopulation
High-level Algorithm

Web application


There are two possible triggers for a network map refresh:

  1. A scheduled refresh, the frequency of which is controlled by the "Network Map Repopulation" system setting.
  2. A manual refresh, which occurs when a user clicks the "Refresh Map" button in the Production dashboard (the button is disabled on other environments).


In both cases, a notification JMS message is sent to the OpsDBCopier application on the "prodrefresh.network.queue".
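As an illustration only, the two triggers can be sketched so that both funnel into the same notification. This is a minimal sketch, not the web application's code: the `InMemoryBroker` class, the `request_refresh` function and the message body format are all assumptions made for the example, and an in-memory queue stands in for the real JMS broker. Only the queue name comes from the text above.

```python
from collections import defaultdict, deque

REFRESH_QUEUE = "prodrefresh.network.queue"  # queue name from the text above


class InMemoryBroker:
    """Hypothetical stand-in for the JMS broker: one FIFO per queue name."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def send(self, queue_name, body):
        self.queues[queue_name].append(body)


def request_refresh(broker, trigger, environment="prod"):
    """Send a refresh notification for a scheduled or manual trigger.

    The manual trigger is only accepted for production, mirroring the
    disabled "Refresh Map" button on other environments.
    """
    if trigger not in ("scheduled", "manual"):
        raise ValueError(f"unknown trigger: {trigger}")
    if trigger == "manual" and environment != "prod":
        raise PermissionError("manual refresh is only available in production")
    # The message body shape is an assumption made for this sketch.
    broker.send(REFRESH_QUEUE, {"trigger": trigger})


broker = InMemoryBroker()
request_refresh(broker, "scheduled")
request_refresh(broker, "manual")
```

Whichever trigger fires, the consumer sees the same kind of message on the same queue, so the OpsDBCopier does not need to know how the refresh was requested.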

OpsDBCopier


  • Receive message to refresh the network map on "prodrefresh.network.queue" (note that, unlike other queues, there are no uat and test versions of this queue, so there is only one consumer for network map refresh requests).
  • Take a dump of current production OpsDB
  • Archive and compress the dump file
  • Transfer the compressed file to the machines on which the NetworkMapDBPopulator application resides for all environments (test-dboard01-corr, uat-dboard01-corr and prod-dboard01-corr)
  • Notify each NetworkMapDBPopulator by sending JMS messages to the following queues:
    • prodrepopulate.map.queue
    • uatrepopulate.map.queue
    • testrepopulate.map.queue
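The archive, transfer and notify steps above can be sketched roughly as follows. This is a hedged illustration rather than the OpsDBCopier implementation: `handle_refresh`, `archive_dump` and the `transfer` and `notify` callables are hypothetical names, the dump is assumed to have been taken already, and the file name `opsdb_dump.tar.gz` is invented for the example. The queue and host name patterns are taken from the lists above.

```python
import tarfile
import tempfile
from pathlib import Path

ENVIRONMENTS = ("prod", "uat", "test")  # environments named in the text


def repopulate_queue(environment):
    """Queue name pattern taken from the list above."""
    return f"{environment}repopulate.map.queue"


def archive_dump(dump_path, archive_path):
    """Archive and gzip-compress a dump file into a .tar.gz."""
    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(dump_path), arcname=Path(dump_path).name)
    return archive_path


def handle_refresh(dump_path, work_dir, transfer, notify):
    """Archive the dump, hand it to `transfer` per environment, then notify.

    `transfer` and `notify` are hypothetical callables standing in for the
    file copy and the JMS send.
    """
    archive = archive_dump(dump_path, Path(work_dir) / "opsdb_dump.tar.gz")
    for env in ENVIRONMENTS:
        transfer(archive, f"{env}-dboard01-corr")
    # Notify only after every transfer has been attempted without error.
    for env in ENVIRONMENTS:
        notify(repopulate_queue(env))
    return archive


with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "opsdb.sql"
    dump.write_text("-- placeholder dump contents\n")
    transfers, notifications = [], []
    handle_refresh(
        dump,
        tmp,
        transfer=lambda archive, host: transfers.append(host),
        notify=notifications.append,
    )
```

Sending the notifications in a second loop, after all transfers, is a design choice of this sketch: it avoids telling a populator to import a file that has not arrived yet.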


NetworkMapDBPopulator


  • Receive JMS message on the relevant "repopulate.map.queue" for the environment in question (e.g. testrepopulate.map.queue, uatrepopulate.map.queue).
  • Import newly provided OpsDB dump from the location where it has been deposited by the OpsDBCopier (/tmp/dumps at the time of writing).
  • Execute repopulation from OpsDB data:
    • Create all router entities in network map asynchronously, including all router interfaces, and set up intra-router interface relations
    • Create all infinera xtc port interfaces in network map asynchronously
    • Create all circuits connected to router interfaces
    • Create all segments for higher-level circuits, connect to respective interfaces at either end, and create hierarchical relationships between segment interfaces and higher/lower level interfaces
    • Create infinera path circuits between xtc ports
    • Create geant plus circuits (and ensure there is a connection to interfaces of their underlying circuits)
    • Create geant lambda circuits (and ensure there is a connection to interfaces of their underlying circuits)
    • Create all juniper to infinera connections
  • Execute router script queries (.sh and .php files) to populate bgp data in the "routers_population" schema. These scripts take some time to run and are not fully protected by transactions, so populating the network map schema tables directly would either delay event processing for an unacceptably long time (> 15 mins) or corrupt the data on which event processing depends. They are therefore run against a fresh schema, and the results are then imported by the correlator into the network map proper in a thread-safe way. The scripts perform the following steps:
    • Populate routers table
    • Populate router_details table (containing interface and peering information)
    • Populate IX BGPv4 peers
    • Populate IX BGPv6 peers
    • Populate GEANT Open peers
    • Populate BGP VPN RR peers
  • Save all notifications generated during population to the "alarms2.notifications" table.
  • Send JMS message to respective "reload.map.queue" for the environment in question to notify the correlator that the network map db has been repopulated.
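The ordering of the repopulation steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the step names are hypothetical labels for the bullets above, each step is a placeholder callable, and the sketch assumes that both asynchronous creation steps must finish before any circuits are built (the text implies this, since circuits connect to interfaces, but does not state it).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical labels for the steps listed above.
ASYNC_STEPS = ("routers", "xtc_ports")
ORDERED_STEPS = (
    "router_circuits",
    "segments",
    "infinera_paths",
    "geant_plus",
    "geant_lambda",
    "juniper_infinera",
)


def repopulate(steps):
    """Run the asynchronous creation steps, then the dependent ones in order.

    `steps` maps step names to zero-argument callables.
    """
    with ThreadPoolExecutor(max_workers=len(ASYNC_STEPS)) as pool:
        futures = [pool.submit(steps[name]) for name in ASYNC_STEPS]
        for future in futures:
            # result() blocks until the step is done and re-raises any
            # failure, so circuits are never built on a partial model.
            future.result()
    for name in ORDERED_STEPS:
        steps[name]()


completed = []
steps = {
    name: (lambda n=name: completed.append(n))
    for name in ASYNC_STEPS + ORDERED_STEPS
}
repopulate(steps)
```

In this sketch the router and xtc port steps may finish in either order, but every later step sees both of them complete.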



Correlator


  • Receive JMS message on the relevant "reload.map.queue" for the environment in question (e.g. testreload.map.queue, uatreload.map.queue).
  • Attempt to reload the network map:
    • Take a dump of the "routers_population" schema populated by the network map router query scripts
    • Import the dump into the "networkmap" schema proper (overwriting the existing query-script-specific tables) while holding the network map write lock to ensure the PeerInterfaceFinder cannot read from the tables until dump import is complete.
    • Load the basis of the replacement network map model from the network map database:
      • Retrieve all infinera xtc port database entities and convert to network map model xtc port objects
      • Retrieve all router interface entities and convert to network map model router interface objects
      • Create circuits and relationships between infinera port model objects
      • Create circuits between infinera port model objects and router interface model objects
      • Create circuits between router interface model objects
      • Create bgp vpn rr model objects and their connections
      • Create bgp ix model objects and their connections
    • Attempt to swap the existing network map model for the replacement in a thread-safe way. If events are currently registered on the network map model, or the network map write lock cannot be acquired before the pre-configured timeout, sleep for a pre-defined number of milliseconds and then retry, provided the maximum number of retries has not been exceeded:
      • Acquire the network map write lock to protect against access by other threads
      • If no events are currently registered on the map, replace the trap sources in the existing network with those in the new one:
        • Retrieve dynamically created trap sources from the current network map model (i.e. those which were created on-the-fly rather than pre-existing in the network map database, such as bgp peerings not previously known about at the point of network map load or router interfaces not present in OpsDB).
        • In the replacement network map model, create a copy of each of these "old" trap sources and transfer the state from old to new
        • Wipe the current network map model (the one you're replacing)
        • Add all trap sources from the replacement network map to the live one
      • Release the network map write lock
      • Wipe the replacement network map ready for the next reload
      • Push the notification messages which came from network map reload to the Dashboard Web Application (via JMS)
  • If network map reload fails, send a notification message reporting the failure to the Dashboard Web Application (via the relevant "reload.notifications.queue" for this environment).
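The swap-with-retry step above can be sketched as follows. This is an illustrative sketch, not the correlator's code: the `NetworkMap` class, its fields, and the parameters `max_retries`, `sleep_ms` and `lock_timeout` are assumptions, trap source state is reduced to a plain value, and a plain `threading.Lock` stands in for the network map write lock.

```python
import threading
import time


class NetworkMap:
    """Hypothetical minimal model: trap sources keyed by name."""

    def __init__(self):
        self.trap_sources = {}      # name -> state
        self.dynamic = set()        # names of sources created on the fly
        self.registered_events = 0  # events currently registered on the map


def try_swap(live, replacement, write_lock, lock_timeout):
    """One swap attempt; returns False if the caller should retry."""
    if not write_lock.acquire(timeout=lock_timeout):
        return False
    try:
        if live.registered_events:
            return False
        # Carry dynamically created trap sources and their state across.
        for name in live.dynamic:
            replacement.trap_sources[name] = live.trap_sources[name]
            replacement.dynamic.add(name)
        # Wipe the live model, then add everything from the replacement.
        live.trap_sources.clear()
        live.dynamic.clear()
        live.trap_sources.update(replacement.trap_sources)
        live.dynamic.update(replacement.dynamic)
    finally:
        write_lock.release()
    # Wipe the replacement ready for the next reload.
    replacement.trap_sources.clear()
    replacement.dynamic.clear()
    return True


def swap_with_retry(live, replacement, write_lock,
                    max_retries=3, sleep_ms=10, lock_timeout=0.1):
    """Retry the swap up to max_retries times, sleeping between attempts."""
    for attempt in range(max_retries + 1):
        if try_swap(live, replacement, write_lock, lock_timeout):
            return True
        if attempt < max_retries:
            time.sleep(sleep_ms / 1000)
    return False  # the caller would then send the "reload failed" message


live = NetworkMap()
live.trap_sources = {"rt1:ae0": "up", "peer-x": "down"}
live.dynamic = {"peer-x"}
replacement = NetworkMap()
replacement.trap_sources = {"rt1:ae0": "unknown", "rt2:ae1": "unknown"}
lock = threading.Lock()
swapped = swap_with_retry(live, replacement, lock)
```

In this usage example the dynamically created `peer-x` source keeps its state across the swap, while sources loaded from the database take the state of the replacement model.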

