
Data Transfer Node (DTN)

This wiki page aims to provide a concise review of the state of the art in the use of DTN infrastructures by NRENs, primarily those in Europe, although activities by R&E networks outside Europe are also referenced. The review covers experiments, testbeds, architectural components (including transfer software and user interfaces), lessons learnt, and a gap analysis.


Transferring large science data sets over wide area networks requires making maximum use of the available network throughput, with a combination of transfer tools for high-speed, big-file and multi-file data movement. The complexity of data sources from multiple, distributed teams, complex science workflows, and the need to scale and span resources across multiple sites to store or process the data are becoming a challenge for hardware and software architectures.

To improve data transfer performance between sites, dedicated computer systems and architectures are used. Data Transfer Nodes (DTNs) address this problem: they are dedicated (usually Linux-based) servers, equipped with specific high-end hardware components and dedicated transfer tools, and configured specifically for wide area data transfer.

In the science community, many research groups employ a number of DTN instances with dedicated network pipes for multiple high-volume file transfers, bypassing network firewalls, filtering services, BGP or QoS restrictions, etc. The challenge these research groups face is that, despite the high performance of the hardware, actual transfer rates are much lower than the bandwidth provisioned (especially at bandwidths beyond 40 Gbit/s).
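The gap between provisioned bandwidth and achieved throughput can be illustrated with the well-known Mathis et al. model for loss-based TCP congestion control (Reno-style, not BBR). The sketch below is illustrative, not taken from this page; the MSS, RTT and loss-rate values are assumptions chosen to show how even a tiny loss rate caps throughput far below a 40 Gbit/s link:

```python
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on steady-state loss-based TCP throughput,
    per the Mathis et al. model: rate <= (MSS / RTT) * C / sqrt(p)."""
    C = math.sqrt(3.0 / 2.0)  # model constant for periodic loss
    return (mss_bytes * 8 / rtt_s) * C / math.sqrt(loss_rate)

# Assumed values: standard 1460-byte MSS, 100 ms RTT (intercontinental
# path), 0.01% packet loss -- well below "a fraction of 1%".
rate = mathis_throughput_bps(1460, 0.100, 0.0001)
print(f"{rate / 1e9:.3f} Gbit/s")  # roughly 0.014 Gbit/s on a 40 Gbit/s link
```

Under these assumptions the model predicts only about 14 Mbit/s, which is why loss-tolerant congestion control (such as BBR, mentioned below) and clean, firewall-free paths matter so much for DTNs.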

Why do large scale data transfers matter to NRENs?

Research projects, such as those in high energy physics, genomics or astronomy, need to transfer large amounts of data to complete calculations and get results in a relatively short period of time. In the past, physically shipping hard disks full of data was the fastest option. With the high-bandwidth pipes offered by research and education networks and the use of DTNs, the transfer can be done easily using the appropriate tools. There are many reasons to use dedicated DTNs in research and education networks. For instance:

  • To support data-intensive science projects;
  • To support short distance transfers of large data sets and examine them in terms of optimization parameters;
  • To support long distance transfers and examine them in terms of optimization parameters;
  • To avoid performance problems caused by elements like firewalls, bandwidth management equipment, LANs, etc.;
  • To avoid the problems related to packet loss in TCP, as big data transfers are very sensitive to it, especially over long distances (in most cases, even a fraction of 1% packet loss leads to poor performance, although there are exceptions, such as Google's congestion-based control algorithm, "Bottleneck Bandwidth and Round-trip propagation time" (TCP-BBR), which is far less sensitive to packet loss);
  • For congestion management / provisioning of big data transfers in terms of hardware usage (e.g. CPU, memory, buffers) at the servers/clients and network usage (e.g. QoS, bandwidth, data rate) at the network equipment;
  • To examine optimizations of the network (network path, MTU, etc.);
  • To deploy a Science DMZ at the end sites / campuses;
  • To examine the practices of established data-intensive transfer communities (like the WLCG community) that have already addressed this concern (the File Transfer Service, built on the same principles as the Science DMZ) [FTS-CERN];
  • To improve performance for long tail science and emerging science disciplines, which currently have a poor experience with big data transfers.
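One concrete reason DTNs need careful host tuning, tied to the packet loss and long distance points above, is that TCP can only fill a long fat pipe if its buffers can hold a full bandwidth-delay product (BDP) of data in flight. A minimal sketch of the arithmetic; the 100 Gbit/s capacity and 90 ms RTT are illustrative assumptions, not figures from this page:

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: the number of bytes that must be
    in flight (and bufferable) to keep the path fully utilised."""
    return bandwidth_bps * rtt_s / 8

# Assumed path: 100 Gbit/s capacity, 90 ms round-trip time
buf = bdp_bytes(100e9, 0.090)
print(f"{buf / 2**20:.0f} MiB")  # about 1073 MiB of socket buffer needed
```

A default Linux TCP buffer is orders of magnitude smaller than this, which is why DTN tuning guides raise the socket buffer limits on both ends of a transfer.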
