This feature is also known as "Segmentation Offload", "TCP Segmentation Offload (TSO)", "[TCP] Multidata Transmit (MDT)", or "TCP Large Send".
From Microsoft's document, Windows Network Task Offload:
With Segmentation Offload, or TCP Large Send, TCP can pass a buffer to be transmitted that is bigger than the maximum transmission unit (MTU) supported by the medium. Intelligent adapters implement large sends by using the prototype TCP and IP headers of the incoming send buffer to carve out segments of required size. Copying the prototype header and options, then calculating the sequence number and checksum fields creates TCP segment headers. All other information, such as options and flag values, are preserved except where noted.
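The carving step described in the quotation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular adapter implements it: the `Segment` type and the function `segment_large_send` are hypothetical names, checksum calculation is omitted, and the choice to carry FIN and PSH only on the last segment is an assumption about what "except where noted" covers.

```python
from dataclasses import dataclass

# TCP flag bits (values from the TCP header definition).
FIN = 0x01
PSH = 0x08
ACK = 0x10


@dataclass
class Segment:
    seq: int        # sequence number of the first payload byte
    flags: int      # TCP flag bits carried by this segment
    payload: bytes  # at most one MSS worth of data


def segment_large_send(seq, flags, payload, mss):
    """Split one oversized send buffer into MSS-sized segments.

    Hypothetical helper: each segment reuses the prototype header's
    flags, and only the sequence number advances. Assumption: FIN and
    PSH are kept on the final segment only.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        is_last = offset + mss >= len(payload)
        seg_flags = flags if is_last else flags & ~(FIN | PSH)
        # Sequence numbers are 32-bit and wrap around.
        segments.append(Segment((seq + offset) % 2**32, seg_flags, chunk))
    return segments


if __name__ == "__main__":
    for seg in segment_large_send(1000, ACK | PSH, bytes(4000), 1460):
        print(seg.seq, hex(seg.flags), len(seg.payload))
```

With a 4000-byte buffer and an MSS of 1460 bytes, the host hands over one buffer and the sketch produces three segments, which is the reduction in host-to-adapter transactions that LSO is after.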
Large Send Offload can be seen as doing for output what interrupt coalescence combined with large-receive offload does for input: it reduces the number of (bus/interrupt) transactions between CPUs and network adapters by bundling multiple packets into larger transactions (scatter/gather).
Hardware (network adapter) support for LSO is a refinement of transmit chaining, where multiple transmitted frames can be sent from the host to the adapter in a single transaction.
Issues with Large Send Offload
Timing and Burstiness
Like Interrupt Coalescence, LSO can affect packet timing and increase burstiness. One illustration of this effect is a patch that modified LSO (TSO, as it is called in Linux) to bound the time that outgoing segments can be held while trying to accumulate a larger transfer unit. The accompanying message to the netdev mailing list includes some graphs that show the impact of (pre-patch) TSO on RTTs over a low-speed link.
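To get a feel for why a large burst matters on a low-speed link, the time needed to put the burst on the wire can be worked out directly. The sketch below is plain arithmetic; the function name `serialization_delay_ms`, the 64 KiB burst size, and the 1 Mbit/s link rate are illustrative assumptions, not figures taken from the patch or its graphs.

```python
def serialization_delay_ms(burst_bytes, link_bits_per_second):
    """Time in milliseconds to transmit burst_bytes at the given link rate."""
    return burst_bytes * 8 * 1000 / link_bits_per_second


if __name__ == "__main__":
    link = 1_000_000  # assumed 1 Mbit/s bottleneck link
    print(serialization_delay_ms(1500, link))       # one MTU-sized frame
    print(serialization_delay_ms(64 * 1024, link))  # one 64 KiB burst
```

Under these assumptions a single 1500-byte frame occupies the link for 12 ms, while a 64 KiB burst occupies it for roughly 524 ms, during which packets of other flows queue behind it.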
In Linux, the burstiness issue was addressed in 2013 in a TSO autosizing patch by Eric Dumazet.
(Transport) Protocol Fossilization
As defined by most of the industry, LSO needs to be aware of the transport protocols. In particular, it must be able to split over-large transport segments into suitable sub-segments, and generate transport (e.g. TCP) headers for these sub-segments. This function is typically implemented in the adapter's firmware, and only for a popular transport protocol such as TCP. This makes it hard to implement additional functions such as IPsec or the TCP MD5 Authentication option, or to support other transport protocols such as SCTP.
There is a weakened form of LSO that requires the host operating system to prepare the segmentation and construct the headers. This allows for "dumber" network adapters, and in particular it doesn't require them to be transport protocol-aware. It still provides a significant performance improvement because multiple segments can be transferred between host and adapter in a single transaction, which reduces bus occupancy and other overhead. Sun's Solaris operating system supports this variant of LSO under the name of "MDT" (Multidata Transmit), and the Linux kernel added something similar as part of "GSO" in 2.6.18 (September 2006) for IPv4 and in 2.6.35 (August 2010) for IPv6.
Under Linux, LSO/TSO can be controlled using the -K option to the ethtool command, which can also be used to control other offloading features. It is typically enabled by default if kernel/driver and adapter support it.
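As a minimal sketch of how this might be scripted, the Python helper below builds and runs an `ethtool -K` invocation. The function names `ethtool_offload_cmd` and `set_offload` and the interface name `eth0` are assumptions made for illustration; `tso` and `gso` are the feature keywords ethtool uses for TCP and generic segmentation offload. Changing offload settings normally requires root privileges.

```python
import subprocess


def ethtool_offload_cmd(interface, feature, enabled):
    """Build the argument list for `ethtool -K <interface> <feature> on|off`."""
    return ["ethtool", "-K", interface, feature, "on" if enabled else "off"]


def set_offload(interface, feature, enabled):
    """Run ethtool to change one offload feature; raises if ethtool fails."""
    subprocess.run(ethtool_offload_cmd(interface, feature, enabled), check=True)


if __name__ == "__main__":
    # Only print the command here; call set_offload("eth0", "tso", False)
    # to actually disable TSO on the assumed interface eth0.
    print(" ".join(ethtool_offload_cmd("eth0", "tso", False)))
```

The current settings can be inspected with the lowercase `ethtool -k` option on the same interface before and after the change.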
- Windows Network Task Offload, Microsoft Web site, December 2001, http://www.microsoft.com/whdc/device/network/taskoffload.mspx
- Thread: LRO Implementation, June 2008, discussion on the OpenSolaris networking forum that includes a debate of the relative merits of LSO vs. MDT.
- tcp: TSO packets automatic sizing, LWN.net, Eric Dumazet, August 2013
- TSO sizing and the FQ scheduler, LWN.net, Jonathan Corbet, August 2013