High-level Concurrency Notes
- JMS message groups ensure that the same thread always handles events from the same trap source:
- This means the Status part of a trap source's state is guaranteed to be thread-safe, since it is always handled by the same thread (as are changes to lastEventSysuptime, lastDnaIp, etc.)
- A ReadWriteLock is used to protect, among other things, the addition and removal of events for trap sources (it also protects the addition of new on-the-fly trap sources, the establishment of their relationships to existing trap sources, network map swapover, etc.):
- A read-write lock allows a resource to be accessed by multiple readers or by a single writer at a time, but not both.
- The read lock is non-exclusive (multiple read locks can be held at any one time, provided the write lock is not held)
- "Event-handling" threads acquire the READ lock:
- Each one only alters the state of a particular trap source (its status, its registered events, and other state such as lastEventSysupTime), so there is no mutable state shared between these event-handling threads and no risk of them interfering with each other (race conditions).
- BUT the correlator thread (and certain other threads, e.g. those which create trap sources on the fly) manipulates the state of multiple trap sources (principally deregistering events):
- The correlator thread does share mutable state with all event-handling threads, so a locking mechanism is needed which prevents the correlator from altering state while event-handling threads are doing so, and vice versa
- Therefore the correlator thread acquires the write lock, which is exclusive. No thread can acquire the write lock while either the read or the write lock is held by another thread, and similarly no thread can acquire the read lock while the write lock is held by another thread. So while one or more event-handling threads are manipulating trap source state, the correlator cannot, and while the correlator is doing so, no event-handling thread can (nor can the post-finalisation clearance thread).
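The read/write split described above can be sketched roughly as follows. This is a minimal illustration, not the actual network map code: `NetworkMapSketch`, `registerEvent`, `deregisterAllEvents` and `eventCount` are hypothetical names, trap sources are reduced to string ids, and a concurrent set is used per source so the sketch stays safe even without the JMS message group guarantee.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the in-memory network map (names are illustrative).
class NetworkMapSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // trap source id -> events currently registered on that source;
    // the map itself is never modified after construction in this sketch
    private final Map<String, Set<String>> eventsBySource = new HashMap<>();

    NetworkMapSketch(Collection<String> sourceIds) {
        for (String id : sourceIds) {
            eventsBySource.put(id, ConcurrentHashMap.<String>newKeySet());
        }
    }

    // Event-handling thread: touches a single trap source, so the shared READ lock suffices.
    void registerEvent(String sourceId, String eventId) {
        lock.readLock().lock();
        try {
            Set<String> events = eventsBySource.get(sourceId);
            if (events == null) {
                throw new IllegalArgumentException("Unknown trap source: " + sourceId);
            }
            events.add(eventId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Correlator thread: touches many trap sources, so it takes the exclusive WRITE lock.
    // Returns the number of events deregistered.
    int deregisterAllEvents() {
        lock.writeLock().lock();
        try {
            int removed = 0;
            for (Set<String> events : eventsBySource.values()) {
                removed += events.size();
                events.clear();
            }
            return removed;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read-only query, also under the READ lock.
    int eventCount(String sourceId) {
        lock.readLock().lock();
        try {
            Set<String> events = eventsBySource.get(sourceId);
            return events == null ? 0 : events.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Any number of threads can be inside `registerEvent` at once, but `deregisterAllEvents` waits until they have all released the read lock and blocks new ones until it finishes.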
- Event-handling threads do NOT touch the DB unless they are handling up events which arrive after finalisation of the corresponding dashboard alarm. They primarily alter the state of individual trap sources in the in-memory network map.
- The correlator does NOT alter the status of trap sources, but it does:
- Establish relationships between the events grouped on the trap sources
- Create source alarm database entities corresponding to the source events registered on the trap sources
- Create high-level correlated alarm entities (instances of DashboardAlarmEntity stored in dashboard_alarms table) based on these groupings of source alarms
- Group these high-level correlated alarms to form a single coalesced alarm (made up of multiple rows in the dashboard_alarms table and grouped together by ref_id).
- ClearanceLock is used to protect the state of multiple related entities in the DB at the same time. Multiple related post-finalisation ups result in the addition of multiple up source alarms to down source alarms and, crucially, in multiple (potentially conflicting) check-then-change attempts on their related dashboard alarms. Other options were considered to ensure data integrity for these post-finalisation ups, but were ruled out for the following reasons:
- In an ideal world, DB transactionality alone could handle this, but during source alarm bursts (up to 200 or so) the locks for many DB entities would potentially have to be acquired, and all threads would have to acquire them in exactly the same order to prevent deadlock. Low-level JPA locking mechanisms could be used to control lock ordering (sorting source alarms first, then acquiring ALL dashboard alarm locks before source down locks before source up locks), but in practice, where there is a large number of locks to acquire, this is very slow and results in much lower throughput, lock timeouts, etc. It also appears that it may be impossible to guarantee the locks are acquired in the order specified, because of a minor bug in JPA's low-level pessimistic locking.
- Another option is to handle post-finalisation ups in a similar way to other events (buffer them and then have a cross-cutting thread process after a pre-defined interval), but this:
- Results in delayed processing of ups which could otherwise be processed immediately
- Introduces its own problems around state management
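For contrast with the rejected options, the idea of one coarse lock serialising every check-then-change can be sketched as below. This is an assumption-laden illustration rather than the real ClearanceLock: `ClearanceSketch` and its methods are hypothetical, an in-memory map stands in for the DB entities, and the rule that a dashboard alarm clears once every down has a matching up is assumed purely for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: names are hypothetical and a map stands in for the DB tables.
class ClearanceSketch {
    // One coarse lock instead of many per-entity DB locks, so no lock ordering is needed.
    private final ReentrantLock clearanceLock = new ReentrantLock();
    // dashboard alarm id -> number of down source alarms still without a matching up
    private final Map<String, Integer> outstandingDowns = new HashMap<>();

    void addDashboardAlarm(String alarmId, int downCount) {
        clearanceLock.lock();
        try {
            outstandingDowns.put(alarmId, downCount);
        } finally {
            clearanceLock.unlock();
        }
    }

    // Handles one post-finalisation up. Returns true only for the up that clears the alarm
    // (assumed rule: the alarm clears when no outstanding downs remain).
    boolean handlePostFinalisationUp(String alarmId) {
        clearanceLock.lock();
        try {
            Integer remaining = outstandingDowns.get(alarmId);   // check
            if (remaining == null || remaining == 0) {
                return false;                                    // unknown or already cleared
            }
            int updated = remaining - 1;
            outstandingDowns.put(alarmId, updated);              // change
            return updated == 0;
        } finally {
            clearanceLock.unlock();
        }
    }
}
```

Because the check and the change happen under the same lock, two ups arriving together cannot both observe "one down left" and both try to clear the alarm.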
- To avoid a situation where multiple threads in the correlator and network map code dynamically create two different trap sources representing the same conceptual network element (e.g. the same router interface), we prevent creation of more than one trap source instance for a given "real" trap source by:
- Limiting access to trap source constructors so that only classes in the same package can instantiate them
- Forcing developers to use a TrapSourceAccessor to retrieveOrCreate() trap sources, which performs an atomic check-then-create.
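One way an atomic check-then-create could look is sketched below, using `ConcurrentHashMap.computeIfAbsent`. The names TrapSourceAccessor and retrieveOrCreate() come from the notes above, but the string key, the method signature and the choice of `computeIfAbsent` are assumptions made for illustration. The real implementation may differ.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified stand-in: the real trap source carries far more state than a key.
class TrapSource {
    private final String key;

    // Package-private constructor: code outside this package cannot instantiate directly.
    TrapSource(String key) {
        this.key = key;
    }

    String getKey() {
        return key;
    }
}

class TrapSourceAccessor {
    // Assumed key: one string per "real" network element, e.g. "router1/eth0".
    private final ConcurrentMap<String, TrapSource> sources = new ConcurrentHashMap<>();

    // computeIfAbsent performs the check and the create as one atomic step per key,
    // so concurrent callers asking for the same key all receive the same instance.
    TrapSource retrieveOrCreate(String key) {
        return sources.computeIfAbsent(key, TrapSource::new);
    }
}
```

With this shape, two threads calling `retrieveOrCreate("router1/eth0")` at the same moment get the same object, and the constructor runs at most once per key.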