...

AmLight currently operates seven Tofino-based white-box switches running the NoviWare network operating system to gather and export telemetry reports. In this presentation, we aim to share our experience, achievements, and challenges.

In response to the conference's questions:

• Which data are collected and how

In-band Network Telemetry (INT) on the Tofino chip enables switches to export, for every packet, the IP and TCP/UDP headers along with INT metadata. The INT metadata currently supported includes ingress port ID, egress port ID, ingress timestamp, egress timestamp, hop delay, egress queue ID, and egress queue occupancy. Each Tofino-based switch in the path adds its INT metadata to user packets. The Tofino chip exports the data directly from the data plane, in real time, to an INT Collector.

• Tools used for the analysis and presentation/visualization/storage of data

We created several tools for data analysis, visualization, and event correlation.
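
As a rough illustration of the kind of per-hop record such tools work with, the sketch below models the INT metadata fields listed earlier and derives a per-packet path and the most congested hop. This is not the AmLight tooling: the class name, field names, helper functions, and sample values are all hypothetical, and the timestamps are assumed to be plain integers that do not wrap around.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HopMetadata:
    """INT metadata added by one switch (hypothetical field names)."""
    ingress_port: int
    egress_port: int
    ingress_ts: int              # assumed: integer timestamp, no wraparound
    egress_ts: int
    egress_queue_id: int
    egress_queue_occupancy: int

    @property
    def hop_delay(self) -> int:
        # Time spent inside the switch, in the same unit as the timestamps.
        return self.egress_ts - self.ingress_ts


def transit_path(hops: List[HopMetadata]) -> List[Tuple[int, int]]:
    """Return the (ingress port, egress port) pair recorded at every hop."""
    return [(h.ingress_port, h.egress_port) for h in hops]


def most_congested_hop(hops: List[HopMetadata]) -> int:
    """Return the index of the hop reporting the highest queue occupancy."""
    return max(range(len(hops)), key=lambda i: hops[i].egress_queue_occupancy)


# Made-up values for a packet that crossed three switches.
sample_hops = [
    HopMetadata(1, 2, 1000, 1400, 0, 10),
    HopMetadata(3, 4, 2000, 2900, 0, 75),
    HopMetadata(5, 6, 3000, 3200, 1, 5),
]

print(transit_path(sample_hops))
worst = most_congested_hop(sample_hops)
print(worst, sample_hops[worst].hop_delay)
```

Deriving the hop delay from the two timestamps is only a convenience for the sketch; the switches described here report hop delay as its own metadata field.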

• Benefits

Real-time visibility of interface buffers and queues shows us which points in the network need attention. We also obtain per-packet proof of transit, equivalent to a layer 1/2 traceroute.

• Issues, challenges, and gaps - what you would like to be able to do but cannot

A typical Vera Rubin telescope data transfer will consist of 5-second bursts of 9-Kbyte packets at 40+ Gbps from Chile to the U.S. throughout the night. Each burst creates a telemetry flow of 1.4 Gbps at 487 kpps and a total of 900 MB of telemetry data to be processed, stored, and shared. The challenge is receiving 487,000 256-byte packets per second in a single flow, on a single NIC queue and a single CPU core, and processing them in real time. Without kernel bypass, most CPU cores will run at 100% and drop more than 80% of the packets because of the high CPU utilization. And this is just one flow over AmLight.
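
To make the scale of this concrete, the short calculation below turns the figures quoted above into a per-packet time budget for a single core and a per-burst data volume. It is back-of-the-envelope arithmetic only; the constant and function names are chosen for illustration, and megabytes are taken as 10^6 bytes.

```python
# Figures quoted in the paragraph above.
PACKETS_PER_SECOND = 487_000   # telemetry packets per second
TELEMETRY_RATE_BPS = 1.4e9     # telemetry flow rate, bits per second
BURST_SECONDS = 5              # duration of one burst


def per_packet_budget_us(pps: float) -> float:
    """Microseconds available to handle one packet on a single core."""
    return 1e6 / pps


def burst_volume_mb(rate_bps: float, seconds: float) -> float:
    """Telemetry produced by one burst, in megabytes (10**6 bytes)."""
    return rate_bps * seconds / 8 / 1e6


budget = per_packet_budget_us(PACKETS_PER_SECOND)
volume = burst_volume_mb(TELEMETRY_RATE_BPS, BURST_SECONDS)

print(f"{budget:.2f} us per packet")   # 2.05 us per packet
print(f"{volume:.0f} MB per burst")    # 875 MB per burst
```

A budget of about 2 microseconds per packet leaves very little room for per-packet kernel overhead on one core, and the computed burst volume of roughly 875 MB is in line with the approximately 900 MB quoted above.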

...