This Wiki page aims to provide a concise review of the state of the art in the use of Data Transfer Node (DTN) infrastructures by NRENs, primarily those in Europe, although activities by R&E networks outside Europe are also referenced. The review includes experiments, testbeds, architectural components (including transfer software and user interfaces), lessons learnt, and a gap analysis.


Introduction

Transferring large science data sets over wide area networks requires making full use of the available network throughput, using a combination of transfer tools for high-speed, large-file, and multiple-file data movement. Data sources come from multiple distributed teams with complex science workflows, and scaling and spanning resources across multiple sites to store or process the data is becoming a challenge for hardware and software architecture.

To improve data transfer performance between sites, dedicated computer systems and architectures are used. Data Transfer Nodes (DTNs) are one such approach: dedicated (usually Linux-based) servers with specific high-end hardware components and dedicated transfer tools, configured specifically for wide area data transfer.

In the science community, many research groups employ a number of DTN instances with dedicated network pipes for multiple large data file transfers that bypass network firewalls, filtering services, BGP or QoS restrictions, etc. The challenge that research groups are facing is: “that despite the high performance of the hardware equipment, data transfers are much lower than the bandwidth provided (especially with bandwidth beyond 40Gbit/s)”.


Why do large-scale data transfers matter for NRENs?

  • It is a very common use case for the NRENs, supporting data-intensive science
  • Short distance transfer
  • Long distance transfer
  • Very sensitive to packet loss
  • Even a fraction of 1% packet loss leads to poor performance
  • Congestion / provisioning
  • Optimize network (Optimize network path, MTU, …)
  • Deploy Science DMZ at the end sites / campuses - https://fasterdata.es.net/science-dmz/
  • Other providers have the same concern, e.g., Google created an algorithm “Bottleneck Bandwidth and Round-trip time (BBR)”, which is far less sensitive to packet loss
  • Established data-intensive transfer users (mainly the CERN community and WLCG) have already addressed this concern (File Transfer Service, and the same principles as Science DMZ).
  • Long tail science, and emerging science disciplines, are subject to a poor experience
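The sensitivity to packet loss noted in the list above can be illustrated with the widely cited Mathis et al. approximation for the steady-state throughput of a single loss-based TCP flow: rate ≤ (MSS / RTT) × (C / √p). This model is not referenced on this page; it is used here only as a rough sketch. The MSS, RTT, and loss values below are illustrative assumptions, not measurements.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Approximate upper bound (bits/s) for one loss-based TCP flow.

    Mathis et al. model: rate <= (MSS / RTT) * (C / sqrt(p)).
    It applies to Reno-style congestion control, not to BBR.
    """
    if rtt_s <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_s must be > 0 and loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

if __name__ == "__main__":
    # Assumed values: 1460-byte MSS (1500-byte MTU), 100 ms long-distance RTT.
    for loss in (0.00001, 0.0001, 0.001, 0.01):
        rate = mathis_throughput_bps(1460, 0.100, loss)
        print(f"loss {loss:.3%}: about {rate / 1e6:.1f} Mbit/s per flow")
```

Under these assumptions, a loss rate of 0.1% caps a single flow at a few Mbit/s over a 100 ms path, whatever the link capacity. The same formula shows why shorter paths and a larger MTU help: the bound scales with MSS and inversely with RTT.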

What is a DTN?

  • The term DTN is used with several meanings:
    • Hardware – physical machine and its specification
    • Software – the transfer tools used
    • Orchestration – managing the flows, possibly configuring bandwidth on demand

DTNs usually mount a connected file system, whether on a Storage Area Network (SAN) or a High Performance Computing (HPC) network, and have a network interface to either transmit or receive data files. Dedicated tools like GridFTP, Xrootd, XDD, FDT, BBCP, etc. are installed on a DTN instance to achieve better input/output performance for data transfer.


  • All functions must be “tuned”
    • CPU is dedicated for data transfer
    • File transfer (large buffers/chunks of data), tuning TCP parameters
    • Logical (sequential) order in sending data
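One common starting point for the "large buffers" and TCP tuning mentioned above is the bandwidth-delay product (BDP): the amount of data that must be in flight to keep a path full. The sketch below is a minimal illustration; the link speeds and round-trip times are assumed example values, and the page does not prescribe specific buffer sizes.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: bits in flight on the path, divided by 8."""
    if bandwidth_bps <= 0 or rtt_s <= 0:
        raise ValueError("bandwidth_bps and rtt_s must be positive")
    return bandwidth_bps * rtt_s / 8

if __name__ == "__main__":
    # Assumed example paths: (label, link speed in bit/s, round-trip time in s).
    paths = [
        ("10 Gbit/s, 10 ms", 10e9, 0.010),
        ("40 Gbit/s, 100 ms", 40e9, 0.100),
        ("100 Gbit/s, 150 ms", 100e9, 0.150),
    ]
    for label, bw, rtt in paths:
        print(f"{label}: BDP is about {bdp_bytes(bw, rtt) / 1e6:.1f} MB")
```

A single TCP flow cannot fill the path unless its socket buffers can hold roughly one BDP, which is why default buffer limits of a few megabytes fall short on fast long-distance links.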

Since DTNs are placed outside the DMZ (demilitarized zone, an additional layer of security for an organization's LAN) or on a local storage network, for security reasons only software for dedicated data transfers is installed on the servers, with "allow" access only to the endpoint sites (not open to normal Internet traffic).
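The "allow"-only access model described above can be sketched as a simple allow-list check. In practice this policy would normally be enforced by router ACLs or a host firewall rather than in application code; the snippet only illustrates the logic. The `ALLOWED_PEERS` list and `is_allowed` helper are hypothetical names, and the prefixes are reserved documentation ranges, not real endpoint sites.

```python
import ipaddress

# Hypothetical allow-list of remote endpoint sites (documentation prefixes only).
ALLOWED_PEERS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:10::/48"),
]

def is_allowed(peer: str) -> bool:
    """Return True only if the peer address falls inside an allowed prefix."""
    addr = ipaddress.ip_address(peer)
    # Membership tests across IP versions simply evaluate to False.
    return any(addr in net for net in ALLOWED_PEERS)

if __name__ == "__main__":
    for peer in ("192.0.2.17", "198.51.100.5", "2001:db8:10::1"):
        verdict = "allow" if is_allowed(peer) else "deny"
        print(f"{peer}: {verdict}")
```

Anything not explicitly listed is denied by default, which matches the idea that a DTN is reachable only from its known transfer partners.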

Example of DTN architecture:


GÉANT NREN Survey

WP6 T1 is currently investigating how European NRENs support their user communities in making optimal use of their networks for large scale data transfers, including the use of Data Transfer Nodes (DTNs). Survey results are available at: GÉANT NREN Survey Page



Contact Us

We are very interested to hear from members of the community working on DTN deployment, whether you wish to share your knowledge or find out more. 

You can email the GN4-3 WP6 DTN team at: gn4-3-wp6-t2-dtn@lists.geant.org