Tuneable Linux Kernel IP Parameters

See the TCP Man Pages and Oscar Andreasson's Ipsysctl Tutorial for more information on Linux Kernel parameters.

Changes in newer Linux kernel versions

Newer Linux kernels have changed how these parameters behave, most notably by adding TCP buffer auto-tuning. As of Linux 2.6.17, auto-tuning is implemented for both the send and receive directions, which made it possible to raise the default buffer size limits dramatically. On a 2.6.17 or newer kernel, it should therefore be possible to achieve good TCP throughput over networks with large bandwidth*delay products without tuning any of these parameters.
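To make "large bandwidth*delay product" concrete, the following minimal sketch computes how many bytes must be in flight to keep a path full. The function name and the example path (1 Gbit/s, 100 ms round-trip time) are illustrative assumptions, not values from this page.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth*delay product in bytes.

    bits/s * ms / 1000 gives bits in flight; dividing by 8 gives bytes.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    return bandwidth_bits_per_s * rtt_ms // 8000


if __name__ == "__main__":
    # Hypothetical path: 1 Gbit/s with a 100 ms round-trip time.
    print(bdp_bytes(1_000_000_000, 100))  # 12500000 bytes, about 12.5 MB
```

A socket buffer smaller than this value limits throughput on such a path, which is why the buffer size limits matter.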

NOTE! If the application sets the socket buffer size manually, buffer auto-tuning is disabled for that connection. Manual selection can now give worse (burstier) performance in some cases, especially when there is congestion-related packet loss!
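As an illustration of what "manually selecting the socket buffer size" looks like in an application, here is a minimal sketch using the standard SO_RCVBUF socket option. The helper name open_tcp_socket and the 256 KiB value are assumptions made for the example. On Linux, the kernel caps the request at net/core/rmem_max and, according to socket(7), reports back twice the stored value.

```python
import socket


def open_tcp_socket(rcvbuf_bytes=None):
    """Create a TCP socket; optionally pin its receive buffer size.

    Leaving rcvbuf_bytes as None keeps the kernel defaults, so receive
    buffer auto-tuning stays active for the connection.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if rcvbuf_bytes is not None:
        # Explicitly setting SO_RCVBUF is the "manual selection" case:
        # the buffer is fixed and is no longer auto-tuned.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    return sock


if __name__ == "__main__":
    auto = open_tcp_socket()
    manual = open_tcp_socket(256 * 1024)  # hypothetical fixed size
    for label, s in (("auto-tuned", auto), ("manual", manual)):
        print(label, s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
        s.close()
```

Unless there is a specific reason to fix the size, leaving the option unset is usually the safer choice on kernels with auto-tuning.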

Description of individual parameters

net/core/rmem_default

The default general socket receive buffer size (overridden by tcp_rmem)

net/core/wmem_default

The default general socket send buffer size (overridden by tcp_wmem)

net/core/rmem_max

The maximum socket receive buffer size (not overridden by tcp_rmem)

net/core/wmem_max

The maximum socket send buffer size (not overridden by tcp_wmem)
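The parameter names used on this page correspond to files below /proc/sys on a running Linux system. The sketch below reads the four net/core buffer limits described above; the function names are hypothetical, and the root argument exists only so the code can be pointed at another directory (for example, in a test).

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the raw text of a parameter such as 'net/core/rmem_max'."""
    return (Path(root) / name).read_text().strip()


def read_core_buffer_limits(root="/proc/sys"):
    """Read the general socket buffer defaults and maxima, in bytes."""
    names = ("rmem_default", "wmem_default", "rmem_max", "wmem_max")
    return {n: int(read_sysctl(f"net/core/{n}", root)) for n in names}


if __name__ == "__main__":
    for key, value in read_core_buffer_limits().items():
        print(f"net/core/{key} = {value}")
```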

net/core/netdev_max_backlog

The maximum number of socket buffers (a socket buffer is the kernel's internal representation of a packet) that can be queued on the input side when an interface receives packets faster than the kernel can process them. The queue is drained during a softirq (soft interrupt), which is when the protocol handlers are run. See Gianluca Insolvibile's Inside the Linux Kernel for a more in-depth description.

net/ipv4/tcp_mem

from the TCP Man Pages

This is a vector of 3 integers: (low, pressure, high). These bounds are used by TCP to track its memory usage. The defaults are calculated at boot time from the amount of available memory.
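Unlike the buffer sizes elsewhere on this page, the tcp(7) man page gives the tcp_mem values in units of the system page size rather than bytes. The following sketch converts such a vector to bytes; the function name is an assumption, and the input string would normally come from /proc/sys/net/ipv4/tcp_mem.

```python
import os


def tcp_mem_to_bytes(text, page_size=None):
    """Convert a 'low pressure high' tcp_mem string from pages to bytes."""
    if page_size is None:
        # Ask the running system for its page size (commonly 4096).
        page_size = os.sysconf("SC_PAGE_SIZE")
    low, pressure, high = (int(field) for field in text.split())
    return {
        "low": low * page_size,
        "pressure": pressure * page_size,
        "high": high * page_size,
    }
```

For example, with 4096-byte pages a tcp_mem value of "1 2 3" corresponds to 4096, 8192 and 12288 bytes.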

net/ipv4/tcp_rmem

from the TCP Man Pages

This is a vector of 3 integers: (min, default, max). These parameters are used by TCP to regulate receive buffer sizes. TCP dynamically adjusts the size of the receive buffer from the defaults listed below, in the range of these sysctl variables, depending on memory available in the system.

net/ipv4/tcp_wmem

from the TCP Man Pages

This is a vector of 3 integers: (min, default, max). These parameters are used by TCP to regulate send buffer sizes. TCP dynamically adjusts the size of the send buffer from the default values listed below, in the range of these sysctl variables, depending on memory available.
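The (min, default, max) vectors for tcp_rmem and tcp_wmem can be parsed the same way. This sketch turns one such line into a named tuple and checks whether the maximum is large enough for a given window. The class and function names, and the sample values, are hypothetical; real values come from /proc/sys/net/ipv4/tcp_rmem or tcp_wmem and are in bytes.

```python
from typing import NamedTuple


class TcpBufferLimits(NamedTuple):
    minimum: int
    default: int
    maximum: int


def parse_tcp_buffer_limits(text: str) -> TcpBufferLimits:
    """Parse a 'min default max' line as found in tcp_rmem or tcp_wmem."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 integers, got {len(fields)}")
    return TcpBufferLimits(*(int(f) for f in fields))


def can_autotune_to(limits: TcpBufferLimits, window_bytes: int) -> bool:
    """True if auto-tuning may grow the buffer to window_bytes."""
    return limits.maximum >= window_bytes


if __name__ == "__main__":
    limits = parse_tcp_buffer_limits("4096 87380 6291456")  # sample values
    print(limits)
    print(can_autotune_to(limits, 12_500_000))
```

With these sample values, a window of 12,500,000 bytes exceeds the maximum, so the maximum would have to be raised for auto-tuning to fill such a path.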

net/ipv4/tcp_sack

The tcp_sack variable enables Selective Acknowledgements (SACK) as they are defined in RFC 2018 - TCP Selective Acknowledgement Options and RFC 2883 - An Extension to Selective Acknowledgement (SACK) Option for TCP. These RFC documents contain information on a TCP option that was especially developed to handle lossy connections.

If this variable is turned on, our host will set the SACK option in the TCP option field in the TCP header when it sends out a SYN packet. This tells the server we are connecting to that we are able to handle SACK. From then on, if the server knows how to handle SACK, it will send ACK packets with the SACK option turned on. This option selectively acknowledges each segment in a TCP window. This is especially good on very lossy connections (connections that lose a lot of data in the transfer), since it makes it possible to retransmit only the specific parts of the TCP window that lost data, and not the whole TCP window as the old standards told us to do. This means that if a certain segment of a TCP window is not received, the receiver will not return a SACK for that segment. The sender will then know which packets were not received by the receiver, and will hence retransmit those packets.

For redundancy, this option will fill up all the space possible within the option space, 40 bytes per segment. Each SACK'ed packet takes up 2 32-bit unsigned integers, and hence the option space can contain 4 SACK'ed segments. However, normally the timestamp option is used in conjunction with this option. The timestamp option takes up 10 bytes of data, and hence only 3 segments may be SACK'ed in each packet in normal operation. ...

The tcp_sack option takes a boolean value. It is set to 1 (turned on) by default. This is generally a good idea and should cause no problems.

from Oscar Andreasson's Ipsysctl Tutorial
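The option-space arithmetic in the passage above can be checked with a short sketch. The 40-byte option space, the 8 bytes per SACK block (two 32-bit integers) and the 10-byte timestamp option come from the text; the 2-byte kind/length header of the SACK option comes from RFC 2018 and is not mentioned in the quoted passage. The function and constant names are illustrative.

```python
TCP_OPTION_SPACE_BYTES = 40   # maximum TCP option space per segment
SACK_HEADER_BYTES = 2         # option kind + length (RFC 2018)
SACK_BLOCK_BYTES = 8          # two 32-bit sequence numbers per block
TIMESTAMP_OPTION_BYTES = 10   # size of the timestamp option


def max_sack_blocks(other_option_bytes: int = 0) -> int:
    """Number of SACK blocks that fit next to other TCP options."""
    available = TCP_OPTION_SPACE_BYTES - other_option_bytes - SACK_HEADER_BYTES
    return max(available // SACK_BLOCK_BYTES, 0)


if __name__ == "__main__":
    print(max_sack_blocks())                        # 4 blocks, SACK alone
    print(max_sack_blocks(TIMESTAMP_OPTION_BYTES))  # 3 blocks with timestamps
```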

-- Main.TobyRodwell - 17 Feb 2006

-- Main.SimonLeinen - 09 Aug 2006

-- Main.PekkaSavola - 25 Oct 2006