...

Overview: Existing Internal Dashboard (Java Application) (Geant) (Closed-Source)

Strengths: 

  • Correlates data from various sources such as Infinera, NCC, Juniper, IMS, etc.
  • Intuitive and familiar to the consumers of the service.
  • Proven reliability (tried and tested).
  • Has been through years of development, feature requests and refinements.
  • Complex filtering capabilities.
  • Blacklisting.
  • Different viewing modes (large-screen mode, PC mode).
  • Supports alerts with different states and prioritisation levels.
  • Provides an API for integration.
  • Retains alarm history for reporting purposes.
  • Complete ownership and control of the source code

...

  • Ontological alignment. Our “Alarms” and Argus “Incidents” are slightly different, so we need to explore the consequences of this (e.g. we want to display “Alarms” that have no incident ticket in the ticketing system).
  • Django requires in-house technical expertise, possibly even dedicated ‘Django devs’.
  • Lack of Alarm states (our “phases”)
  • Absence of alarm history or searchability.
  • Only one line of acknowledgment (We have first and second line requirements)
  • Inability to drill down into issues.
  • No blacklists.
  • Filtering not as comprehensive.
  • Won’t naturally coalesce or correlate (integrations required).
  • Flapping not addressed (for future consideration).
  • Prioritisation not handled.

...

Overview:
We also recognise and appreciate the mission of the Sikt team. A common tool could be used among NRENs for OC alarm visualisation adhering to ITIL best practices and standards.

Argus has positioned itself as a promising candidate for an OC alarms dashboard by adopting an open-source approach and actively promoting its usage and availability at networking conferences.

We don’t want 'a fork' of Argus, but would strongly prefer a unified system that can accommodate extended use cases. Our rough impression is that the UI “skin” would be relatively straightforward to develop on its own, but the fundamental Argus backend use case differs. The main discussion point will be whether it is feasible to have a common backend and/or pluggable architecture that can accommodate both applications.

Argus meets some of our requirements very well, but it currently misses others that are needed for it to be considered fit for purpose. To be a complete replacement for the existing tools, it must also achieve at least the following:

...

  • Desired Future State: Alarms can have states, for example flashing if new, or different colours to indicate whether they are pending or urgent.
  • Gap: Argus currently cannot support multiple alarm states (e.g. flashing for new, different colours for pending/urgent), which is a functional gap compared with the desired future state. Without such features, the support teams cannot quickly distinguish between alarm conditions and respond to them, which may delay the handling of critical issues. This could have an impact on operations and UX.
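
As an illustration of the desired future state above, the following is a minimal sketch of how alarm states could drive presentation. It is not taken from Argus or from the existing dashboard: the names `AlarmState`, `DisplayStyle`, `STYLE_BY_STATE` and `style_for`, the set of states, and the colours are all assumptions chosen to mirror the examples in the text.

```python
from dataclasses import dataclass
from enum import Enum


class AlarmState(Enum):
    # Hypothetical states, taken from the examples in the text (new, pending, urgent).
    NEW = "new"
    PENDING = "pending"
    URGENT = "urgent"


@dataclass(frozen=True)
class DisplayStyle:
    colour: str
    flashing: bool


# Assumed mapping: the real colours and behaviours are not specified in this document.
STYLE_BY_STATE = {
    AlarmState.NEW: DisplayStyle(colour="white", flashing=True),
    AlarmState.PENDING: DisplayStyle(colour="amber", flashing=False),
    AlarmState.URGENT: DisplayStyle(colour="red", flashing=False),
}


def style_for(state: AlarmState) -> DisplayStyle:
    """Return the presentation hints a UI could apply to an alarm in this state."""
    return STYLE_BY_STATE[state]
```

Keeping the state separate from its presentation would let a shared backend store the state while each UI “skin” decides how to render it.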

Feature/Functionality: Multiple stages of Acknowledgment  

⁃ Current State (System A): Dashboard currently differentiates between 1st and 2nd line support teams' acknowledgement that Alerts have been recognised.
⁃ Current State (System B): Alarms only have 1 level of acknowledgement (‘Acked’ is a tag or status).

  • Desired Future State: That Alarms have 2 levels of acknowledgement 
  • Gap: Argus currently cannot support multiple teams, which is a functional gap. 2nd line support should not waste time on issues that have already been investigated by 1st line support. Equally, they need to know whether 1st line has addressed a given issue within the time set by the SLA.
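
The two-level acknowledgement described above could be modelled roughly as follows. This is a sketch under stated assumptions, not the data model of either system: the `Alarm` class, its field names and the `first_line_sla_breached` helper are hypothetical, and the SLA window is passed in because no value is given in this document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alarm:
    # Hypothetical model: one acknowledgement timestamp per support line.
    raised_at: datetime
    first_line_acked_at: Optional[datetime] = None
    second_line_acked_at: Optional[datetime] = None

    def ack_first_line(self, when: datetime) -> None:
        self.first_line_acked_at = when

    def ack_second_line(self, when: datetime) -> None:
        self.second_line_acked_at = when

    @property
    def investigated_by_first_line(self) -> bool:
        """True once 1st line has acknowledged, so 2nd line can skip triage."""
        return self.first_line_acked_at is not None

    def first_line_sla_breached(self, now: datetime, sla: timedelta) -> bool:
        """True if 1st line acknowledged late, or has not acknowledged and the window has passed."""
        deadline = self.raised_at + sla
        if self.first_line_acked_at is not None:
            return self.first_line_acked_at > deadline
        return now > deadline
```

For example, with an assumed 15-minute SLA, an alarm that is still unacknowledged 20 minutes after being raised would be flagged for 2nd line, while one acknowledged by 1st line after 5 minutes would not.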

Feature/Functionality: Correlation and Coalescing

⁃ Current State (System A): Supports coalescing and correlating issues for remediation. 
⁃ Current State (System B): Lacks the ability to coalesce and correlate alerts effectively. 

...

Conclusion 

A new and modern OC alarm visualisation platform is required by the Geant NOC/SOC first and second line support teams: one that satisfies the needs of the consumers of the service and can be maintained by the development team.

...