This document describes performance measurement and verification scenarios that have been identified by SIG-PMV, with corresponding challenges and the (broad) solution space for each.
The document is a "living" text that is under periodic review and updates as the work of SIG-PMV progresses. It is one of the outputs ("KPIs") of the SIG.
We accepted a wide range of scenarios at our first meeting in Zurich, and discussed them further in Amsterdam, narrowing the scope a little and fleshing out a number of the scenarios.
If there are any scenarios that should be added, please get in touch with the SIG-PMV group.
The following existing scenarios have been identified by SIG-PMV.
Data Intensive Science Transfers
- Researchers from a growing number of disciplines are moving increasingly large volumes of data between systems, locally, nationally and internationally.
- Likely to see Science DMZ model deployed
- Identifying poor performance and troubleshooting the causes, which may lie in end systems or on the network path
- perfSONAR (widely used by WLCG / GridPP)
- In-application monitoring (e.g. FTS reports)
- GTS FIONA DTNs; open soon for testing
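Tools such as perfSONAR measure achievable end-to-end throughput by streaming data memory-to-memory between test points. As a rough illustration of the principle (this is not perfSONAR itself; sizes and names are illustrative), the following sketch transfers a buffer over a loopback TCP connection and reports the achieved rate:

```python
import socket
import threading
import time

PAYLOAD = b"x" * 65536          # 64 KiB per send call
TOTAL_BYTES = 16 * 1024 * 1024  # 16 MiB transfer (illustrative size)

def sink(server, received):
    """Accept one connection and count bytes until the sender closes."""
    conn, _ = server.accept()
    with conn:
        while True:
            chunk = conn.recv(1 << 20)
            if not chunk:
                break
            received[0] += len(chunk)

def measure_loopback_throughput():
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # ephemeral port
    server.listen(1)
    received = [0]
    t = threading.Thread(target=sink, args=(server, received))
    t.start()

    client = socket.socket()
    client.connect(server.getsockname())
    start = time.perf_counter()
    sent = 0
    while sent < TOTAL_BYTES:
        client.sendall(PAYLOAD)
        sent += len(PAYLOAD)
    client.close()  # end-of-stream signals the sink to stop
    t.join()
    elapsed = time.perf_counter() - start
    server.close()
    gbps = received[0] * 8 / elapsed / 1e9
    return received[0], gbps
```

On a real network path the sender and receiver would of course run on different hosts, which is where end-system tuning and the Science DMZ placement of test points come in.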
Multi domain monitoring - toolkits
- Monitoring network performance between multiple administrative domains
- Understanding in which domains issues lie
- Focus is on the networking aspect.
- Likely to need multiple measurement systems deployed
- Coordination between the administrative domains
- Understand how it can be automated (alongside provisioning)
- GEANT T4 work heading towards solutions
- (Should this merge with data intensive science scenario?)
Wireless Network Monitoring
- Measuring the utilisation and performance of a site’s local WiFi infrastructure
- Probably providing eduroam if at an academic site
- (At the moment not including 5G, IoT tech, but might do…)
- Difficult to run tests from an end user’s system when that is likely to be a BYOD device
- High variability in performance depending on exact location
- Multiple frequency channels and standards, including the emerging 802.11ac
- RF interference
- Crowd-sourced measurement data (WiFiMon)
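Crowd-sourced approaches such as WiFiMon aggregate measurements reported by many client devices rather than running tests from a managed probe. A minimal sketch of the aggregation step, using hypothetical per-location throughput samples, might look like:

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical crowd-sourced samples: (location label, throughput in Mbit/s)
samples = [
    ("library", 42.0), ("library", 38.5), ("library", 45.1),
    ("lecture-hall", 8.2), ("lecture-hall", 11.7), ("lecture-hall", 6.9),
]

def summarise(samples):
    """Group samples by location and return (mean, stdev) per location."""
    by_loc = defaultdict(list)
    for loc, mbps in samples:
        by_loc[loc].append(mbps)
    return {loc: (mean(v), stdev(v)) for loc, v in by_loc.items()}
```

Even this simple per-location summary makes the location-dependent variability visible before any RF-level diagnosis is attempted.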
Layer 2 Monitoring
- Measurement of L2 performance, below IP layer
- Includes Ethernet, MPLS, Carrier Ethernet
- Variety of L2 media
- Work reported in GEANT JRA1/2 in 2013 (Cyan, Juniper, Ciena, Accedian equipment)
- Embedded probes (e.g. CFM/Y.1731)
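Y.1731 delay measurement uses a DMM/DMR exchange carrying four timestamps, so the responder's processing time can be subtracted out. A sketch of the two-way delay calculation (variable names are illustrative):

```python
def two_way_frame_delay(t1, t2, t3, t4):
    """Two-way frame delay from a Y.1731 DMM/DMR exchange.

    t1: DMM transmitted by the initiator
    t2: DMM received by the responder
    t3: DMR transmitted by the responder
    t4: DMR received by the initiator
    Subtracting the responder residence time (t3 - t2) means the
    initiator and responder clocks need not be synchronised.
    """
    return (t4 - t1) - (t3 - t2)
```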
Measurements on virtual network environments
- Measurement of performance on VM infrastructure
- May include measurements to/from cloud services; AWS, Azure, Google Cloud Platform
- Increasingly important as university / research services deployed to cloud
- Abstraction of systems, impact of hypervisor, etc
- Variability of cloud performance depending on instance; e.g. AWS performance will vary depending on specific platform/size
- Tunnelling to the cloud, e.g. Microsoft ExpressRoute; extending address space into the cloud
- JRA2 Task1 connection services might be applicable
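One way to quantify the variability of cloud instance performance is to repeat the same test and report the coefficient of variation (standard deviation over mean). A sketch with hypothetical per-run throughput figures:

```python
from statistics import mean, pstdev

# Hypothetical throughput (Gbit/s) from repeated identical test runs
# against the same nominal cloud instance type
runs = [9.4, 9.1, 6.2, 9.3, 5.8]

def coefficient_of_variation(samples):
    """Relative spread of repeated measurements: stdev / mean."""
    return pstdev(samples) / mean(samples)
```

A coefficient of variation around 0.2, as in this invented example, would indicate substantial run-to-run variability for nominally identical instances.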
Measuring IPv6 traffic levels
- Desire to measure growth of IPv6 deployment and usage, and relative performance to IPv4
- It may not be possible to differentiate IPv4 and IPv6 traffic in all devices, given the current state of MIB support
- Operation in an IPv6-only environment
- IETF moving towards YANG
- (In theory, everything we do in SIG-PMV should be IP version agnostic, i.e., feature equivalent)
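Given per-flow traffic records, the IPv6 share of traffic can be computed by classifying addresses with the standard library; the flow data below is invented for illustration:

```python
import ipaddress

# Invented flow records: (source address, bytes transferred)
flows = [
    ("2001:db8::1", 120_000),
    ("192.0.2.10", 80_000),
    ("2001:db8::2", 50_000),
    ("198.51.100.7", 150_000),
]

def ipv6_byte_share(flows):
    """Fraction of bytes carried over IPv6."""
    total = sum(nbytes for _, nbytes in flows)
    v6 = sum(nbytes for addr, nbytes in flows
             if ipaddress.ip_address(addr).version == 6)
    return v6 / total
```

This sidesteps the MIB-support issue by working from flow export rather than per-protocol interface counters, where such flow data is available.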
Overlay Network Monitoring solutions
- (Not sure of original intent here - need to clarify)
- Measurement of performance of overlay networks
- Do we mean the overlay, or the infrastructure over which it runs (e.g. under an L2VPN)? Both! Understanding in which layer issues lie
- MD-VPN (used in ~20 NRENs)
- Separation of overlay and underlying infrastructure
- Difficult for a network like GEANT to “peer into” tunnels
- User has no way to understand where the problem is
IP Multicast Monitoring
- Monitor the successful performance and delivery of multicast traffic
- May be within a site, or inter-domain
- Apparently minimal use of multicast in the NRENs?
- Superseded to some extent by multi-point VPNs
- Multicast beacons
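A multicast beacon sends sequence-numbered packets to a group, and receivers infer loss from gaps in the sequence numbers they observe. A sketch of the receiver-side calculation (assuming in-order delivery over the observed window):

```python
def beacon_loss(received_seqs):
    """Estimate loss fraction from the beacon sequence numbers seen.

    Assumes in-order delivery and strictly increasing sequence numbers;
    the window observed runs from the first to the last number seen.
    """
    expected = received_seqs[-1] - received_seqs[0] + 1
    return 1 - len(received_seqs) / expected
```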
The following scenarios are emerging areas where SIG-PMV believes that solutions will be required.
100G and beyond
- Performance measurement at 100Gbps +
- How to monitor/sniff/measure at such line rates
- Knowing vendor-specific tricks; tuning and the performance of end systems; do 10G recipes still work at 100G? They may not
- Building a generic model; so we become service oriented rather than technology oriented
- Transport technology may move at a different pace from CPU technology; other end-to-end elements, such as firewalls, also matter
- Mixed speeds – 10G <-> 100G
- Existing systems, e.g. perfSONAR, with appropriate tuning / configuration?
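The scale of the challenge can be seen from the per-frame time budget: with 64-byte frames (plus 20 bytes of preamble and minimum inter-frame gap on the wire), a monitor has roughly 67 ns per frame at 10G but only about 6.7 ns at 100G. A small calculation:

```python
def packet_budget_ns(link_gbps, frame_bytes):
    """Per-frame time budget at Ethernet line rate, in nanoseconds.

    Adds 20 bytes per frame for the preamble (8 B) and the
    minimum inter-frame gap (12 B) on the wire.
    """
    wire_bits = (frame_bytes + 20) * 8
    return wire_bits / link_gbps  # bits / (Gbit/s) gives nanoseconds
```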
SDN controlled Monitoring
(Not wholly sure what was meant here)
- Monitoring a dynamically configured network?
- Service differences?
- What is different from a standard IP service?
- Tools like traceroute in an OpenFlow network
- Monitoring traffic may follow different paths from application traffic
- Some related work in GEANT project; JRA2, maybe JRA1
Monitoring autonomic networks
- Measuring performance in self-configuring networks
- Solution needs to also be self-configuring
- Network operating systems that move flows very dynamically; flow may not have a static path
Monitoring as a Service / NMS as a Service
- Includes OSS and BSS, with monitoring and performance verification
- Provision, and automatically monitor
- JRA2 T2 is doing something in this area