Session 1 (chairs: Tim Chown and Ivana Golub)

Data Plane Programming / In Band Telemetry, Mauro Campanella, (GARR)

Abstract: The Data Plane Programming task in GN4-3 focuses on two use cases: simple DDoS identification and In-Band Telemetry (INT) using the P4 programming language. The talk reports on the ongoing INT experience, which provides new insight into network behaviour, and on the challenges of data collection and presentation.

...

Abstract: Programmable data plane platforms such as Tofino-based switches and FPGA linecards enable the implementation of new solutions for network traffic handling. The presentation will report the experience of the WP6 Data Plane Programming task in the use of “sketch” algorithms implemented directly in the data plane for DDoS traffic detection and traffic monitoring. Sketch structures provide memory-efficient collection of summarised traffic statistics and have interesting benefits compared with other monitoring techniques. They allow all incoming packets to be processed at wire speed, because a sketch requires only a very limited set of actions per packet. This means that all processed packets can contribute to traffic statistics without any performance penalty. Sketches are a great tool for scalable, fine-grained, inline, on-switch network analytics with millisecond or lower latency. The benefits, use cases and limitations of deploying P4-based sketch structures will be summarised based on our implementation and testing on the Tofino switch chip.
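
As a rough illustration of why a sketch needs only a few actions per packet, here is a minimal Python model of a Count-Min sketch, one well-known sketch structure. The abstract does not say which sketches the task implemented, so the choice of Count-Min, the class name, and the hash construction are all assumptions made for illustration; a P4 implementation on Tofino would use register arrays and hardware hash units instead.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters; each key updates one counter per row."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash per row, derived by salting with the row number (illustrative choice).
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        # Per-packet work: `depth` hash computations and `depth` counter increments.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate counters, so the minimum is the tightest estimate.
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )
```

Memory use is fixed at width × depth counters regardless of how many distinct flows are seen, and an estimate can overcount but never undercount.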

Bio: Damian Parniewicz is a researcher at the Poznan Supercomputing and Networking Center. He has participated in many European Union research and development projects. His major areas of interest are network control planes, network monitoring, programmable chipsets, SDN/NFV, Big Data technologies, edge computing, and ML/DL applied to network security.


Session 2 (chairs: Simon Leinen and Mauro Campanella)

Scalable and Cost-Efficient Generation of Unsampled NetFlow, Alexander Gall (SWITCH)

...

Bio: Fabio Farina has a PhD in Computer Science and has worked at GARR since 2010. Fabio works on European projects, on the creation of new services, and on NFV, edge and orchestration under the GARR network evolution framework. In particular, during the last year Fabio contributed to the refactoring and automation of the monitoring and logging software stacks adopted by the GARR Infrastructure and System Support departments.


Session 3 (chairs: Simon Leinen and Mauro Campanella)

Community Shared Telemetry, Karl Newell (Internet2)

...

Bio: Alex Moura is a Network Engineer and Science Engagement Specialist at RNP, the Brazilian National Research and Education Network, and holds a master's degree in information systems and computer networks from Unirio.


Session 4 (chairs: Pavle Vuletic and Ivana Golub)

Network Telemetry at AmLight, Jeronimo Bezerra (AmLight)

Abstract

...

Funded by the U.S. National Science Foundation (NSF), AmLight is a distributed academic exchange point connecting national and regional research and education networks in Latin America to the U.S. and Africa. AmLight is responsible for transporting science data related to most telescopes in Chile and for supporting the Large Hadron Collider Tier 2 data center in Brazil, as well as many other science projects.

AmLight has operated as an SDN network since 2014 and is being migrated to a white-box infrastructure to support P4Runtime and In-band Network Telemetry (INT). In 2018, Florida International University (FIU) was funded by NSF to evaluate telemetry opportunities over AmLight links to enable real-time monitoring of data science flows, including flows from the Vera Rubin Observatory, formerly known as the Large Synoptic Survey Telescope (LSST).

Currently, seven Tofino-based white boxes are deployed at AmLight using the NoviWare network operating system to gather and export telemetry reports. With this presentation, we aim to share our experience, achievements, and struggles/challenges.

In-band Network Telemetry on the Tofino chip enables switches to export, per packet, the IP+TCP/UDP headers and INT metadata. The INT metadata currently supported includes ingress port ID, egress port ID, ingress timestamp, egress timestamp, hop delay, egress queue ID, and egress queue occupancy. Each Tofino-based switch in the path adds its INT metadata to user packets. The Tofino chip exports the data directly from the data plane, in real time, to an INT Collector.
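
To make the per-hop metadata concrete, the following Python sketch models one record per switch and summarises a path. The field names, the nanosecond units, the `switch` label, and the assumption that hop delay equals egress timestamp minus ingress timestamp are all illustrative; they are not taken from the AmLight collector or from the INT report format.

```python
from dataclasses import dataclass


@dataclass
class IntHop:
    switch: str  # hypothetical label, not one of the metadata fields listed above
    ingress_port: int
    egress_port: int
    ingress_ts_ns: int
    egress_ts_ns: int
    egress_queue_id: int
    egress_queue_occupancy: int

    @property
    def hop_delay_ns(self):
        # Assumed definition: time the packet spent inside this switch.
        return self.egress_ts_ns - self.ingress_ts_ns


def summarize(path):
    """Return (total delay across hops in ns, switch with the fullest egress queue)."""
    total = sum(hop.hop_delay_ns for hop in path)
    busiest = max(path, key=lambda hop: hop.egress_queue_occupancy)
    return total, busiest.switch
```

Because every switch on the path contributes a record, the ordered list of records also shows which devices the packet actually traversed.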

We created several tools for data analysis and visualization/correlation of events.

Real-time visibility of interface buffers/queues gives us an understanding of where the points of attention are. Also, we have proof-of-transit per packet, equivalent to a layer 1/2 traceroute.

A typical Vera Rubin telescope data transfer will consist of 5-second bursts of 9-Kbyte packets at 40+ Gbps from Chile to the U.S. throughout the night. Each burst creates a telemetry flow of 1.4 Gbps at 487 kpps and a total of 900 MB of telemetry data to be processed, stored, and shared. The challenge is receiving 487,000 256-byte packets per second in a single flow, on a single NIC queue and a single CPU core, and processing them in real time. Without kernel bypass, most CPU cores will operate at 100% and drop more than 80% of the packets due to the high CPU utilization. And this is just one flow over AmLight.
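
The figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the abstract (1.4 Gbps, 5-second bursts, 487 kpps); the function names are hypothetical, and the per-packet budget assumes all packets are handled by one core, as the abstract describes.

```python
TELEMETRY_RATE_BPS = 1.4e9    # telemetry flow rate quoted in the abstract
BURST_SECONDS = 5             # burst duration quoted in the abstract
PACKETS_PER_SECOND = 487_000  # telemetry packet rate quoted in the abstract


def burst_volume_mb(rate_bps=TELEMETRY_RATE_BPS, seconds=BURST_SECONDS):
    """Telemetry data produced by one burst, in megabytes (10**6 bytes)."""
    return rate_bps * seconds / 8 / 1e6


def per_packet_budget_us(pps=PACKETS_PER_SECOND):
    """Average time available to handle one packet on a single core, in microseconds."""
    return 1e6 / pps
```

With these inputs one burst yields about 875 MB, in line with the roughly 900 MB stated, and a single core has only about 2 microseconds per packet.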

Bio: Jeronimo Bezerra is the AmLight Network Architect.

DPDK + Kafka: Multi-MPPS Telemetry Data Ingest and Stream Processing at ESnet, Richard Cziva (ESnet)

Abstract: We will introduce ESnet’s per-packet telemetry collection system, which uses Xilinx FPGAs.

The main focus will be on our DPDK application, called fastcapa-ng, which takes telemetry packets from the wire and pushes them to Kafka. The application implements filtering, down-sampling (with a user-specified 1:X ratio) and user-configurable histogram generation. We will also show the Prometheus/Grafana integration used to monitor our pipeline.
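
For readers unfamiliar with these two operations, here is a small Python model of 1:X down-sampling and of a histogram with user-supplied bucket edges. It is not the fastcapa-ng code (a DPDK application would be written against the DPDK C API); the class names and the choice of deterministic "every X-th packet" sampling are assumptions for illustration.

```python
import bisect


class DownSampler:
    """Keeps one packet out of every `ratio` packets (a 1:X down-sampler)."""

    def __init__(self, ratio):
        if ratio < 1:
            raise ValueError("ratio must be at least 1")
        self.ratio = ratio
        self.seen = 0

    def keep(self):
        # Keeps the 1st, (X+1)-th, (2X+1)-th, ... packet.
        self.seen += 1
        return (self.seen - 1) % self.ratio == 0


class Histogram:
    """Counts values into buckets delimited by user-configured edges."""

    def __init__(self, edges):
        self.edges = sorted(edges)
        # One bucket below the first edge, one between each pair, one above the last.
        self.counts = [0] * (len(self.edges) + 1)

    def add(self, value):
        # A value equal to an edge falls into the bucket above that edge.
        self.counts[bisect.bisect_right(self.edges, value)] += 1
```

Both operations reduce the volume handed to Kafka: down-sampling forwards fewer packets, while a histogram replaces many per-packet values with a handful of counters.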

I will show how we can run stream processing applications using the Kafka Streams API. A simple code example for SYN flood detection will be shown.
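
The talk's own example uses the Kafka Streams API; the Python sketch below only illustrates the general idea behind one simple windowed detector, and is not the speaker's code. It counts SYN packets per destination in fixed (tumbling) time windows and reports windows that exceed a threshold. The event tuple layout, the function name, and the default window and threshold are hypothetical.

```python
from collections import Counter


def detect_syn_flood(events, window_s=1.0, threshold=100):
    """Flag (window_start, dst_ip) pairs that received more than `threshold` SYNs.

    `events` is an iterable of (timestamp_s, dst_ip, is_syn) tuples.
    """
    counts = Counter()
    for ts, dst_ip, is_syn in events:
        if is_syn:
            # Align the timestamp to the start of its tumbling window.
            window_start = int(ts // window_s) * window_s
            counts[(window_start, dst_ip)] += 1
    return sorted(key for key, n in counts.items() if n > threshold)
```

In a streaming framework the same logic would typically be expressed as a filter on SYN packets, a group-by on destination, and a windowed count.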

Ingest rate challenges will be highlighted.

Bio: Richard Cziva is a software engineer at ESnet. He has a range of technical interests including traffic and performance analysis, data-plane programmability, high-speed packet processing, software-defined networking, and network function virtualization.
Prior to joining ESnet in 2018, Richard was a Research Associate at University of Glasgow, where he looked at how advanced services (e.g., personalized firewalls, intrusion detection modules, measurement functions) can be implemented and managed inside wide area networks with programmable edge capabilities.
Richard holds a BSc in Computer Engineering (2013) from Budapest University of Technology and Economics, Hungary and a Ph.D. in Computer Science (2018) from University of Glasgow, United Kingdom.

NetSage Use Cases and Scalability, Doug Southworth (Indiana University)

Abstract: NetSage is a unified, open, privacy-aware measurement, analysis, and visualization service designed to address the needs of today’s research and education (R&E) data sharing collaborations. The innovative aspect of NetSage is not in the individual pieces but rather in the integration of data sources to support objective performance observations as a whole. NetSage uses a combination of passive and active measurement data to provide longitudinal performance visualizations via performance Dashboards. The Dashboards can be used to identify changes in the behavior of monitored resources, new patterns in data transfers, or unexpected data movement, helping researchers achieve better performance for inter-institutional data sharing.

Bio: Doug Southworth is a Network Systems Analyst for International Networks at Indiana University, working with EPOC, perfSONAR, and NetSage in both developer and science engagement roles, with a focus on performance analysis. Prior to working at IU, Southworth held senior systems engineer positions with several state and federal agencies, most recently with the United States Courts.

A Proposal towards sFlow Monitoring Dashboards for AI-controlled NRENs, Mariam Kiran (ESnet)

Abstract: Network monitoring collects heterogeneous performance data, such as TCP transfers, packet-related checks, bandwidth, download speeds, and more, usually through passive and active probing of the network. Multiple monitoring tools can help collect these disparate metrics, but mostly by probing the network, which introduces extra noise or packets that are also recorded. Additionally, a visualization tool that brings all this data together is challenging to build. In this paper, we start by discussing NetGraf, a tool we were developing to visualize the output of multiple network monitoring tools using Grafana, and discuss the key findings and challenges we faced. As a result, we propose further work towards an sFlow monitoring dashboard to address these network monitoring challenges. This paper contributes to the theme of automating the setup of open-source network monitoring tools and improving their usability for researchers looking to deploy an end-to-end monitoring stack on their own testbeds.

Bio: Mariam Kiran is a research scientist with shared positions in the Energy Sciences Network and the Scientific Data Management (SDM) group in the Computational Research Division. Her work concentrates on using advanced software and machine learning techniques to advance system architectures, particularly high-speed networks such as DOE networks.

Her current work explores reinforcement learning, unsupervised clustering and classification techniques to optimally control distributed network resources, improving high-speed big data transfers for exascale science applications and optimizing how current network infrastructure is utilized. Kiran received the DOE ASCR Early Career Award in 2017. Before joining LBNL in 2016, Kiran held positions as a lecturer and research fellow at the Universities of Sheffield and Leeds in the UK. She earned her undergraduate and PhD degrees in software engineering and computer science from the University of Sheffield, UK in 2011.