TCP performance tuning - how to tune Linux
The short summary
The default Linux TCP window sizing parameters suck.
The short fix [wirespeed for gigE within 5 ms RTT and fastE within 50 ms RTT]:
ifconfig ethN txqueuelen 1000
In /etc/sysctl.conf:
net/core/rmem_max = 8738140
net/core/wmem_max = 8738140
net/ipv4/tcp_rmem = 4096 873814 8738140
net/ipv4/tcp_wmem = 4096 873814 8738140
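To try the values out before committing them to /etc/sysctl.conf, they can also be set at runtime. The sketch below is one way to do that and is not part of the original recipe: the run helper, the DRY_RUN variable and the eth0 interface name are assumptions made for illustration. sysctl is given the dotted form of the names, which it accepts as equivalent to the slash form. By default the script only prints the commands; with DRY_RUN=0 (as root) it runs them.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 (and be root) to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# apply_tcp_tuning DEVICE: the short fix, applied to the running kernel.
apply_tcp_tuning() {
    dev=$1
    run ifconfig "$dev" txqueuelen 1000
    run sysctl -w net.core.rmem_max=8738140
    run sysctl -w net.core.wmem_max=8738140
    run sysctl -w "net.ipv4.tcp_rmem=4096 873814 8738140"
    run sysctl -w "net.ipv4.tcp_wmem=4096 873814 8738140"
}

apply_tcp_tuning eth0   # eth0 is a placeholder interface name
```

Settings made this way are lost at reboot; the entries in /etc/sysctl.conf are what make them permanent, and sysctl -p re-reads that file.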
The background
TCP throughput is limited by latency and window size (and by overhead, which reduces the effective window size). The window is how much data can be "in transit" over the link at any given moment, so the upper bound is window_size/RTT.
To get the actual transfer speed possible, subtract the overhead from the window and divide the result by the latency (RTT in seconds).
The overhead is: window/2^tcp_adv_win_scale (tcp_adv_win_scale default is 2).
So for the Linux default parameters for the receive window (tcp_rmem): 87380 - (87380 / 2^2) = 65535.
Given a transatlantic link (150 ms RTT), the maximum performance ends up at: 65535/0.150 = 436900 bytes/s, or about 400 kbyte/s, which is really slow today.
With the increased default size: (873814 - 873814/2^2)/0.150 = 4369070 bytes/s, or about 4 Mbytes/s, which is reasonable for a modern network. Note that this is only the default: if the sender is configured with a larger window size, it will happily scale up to 10 times this (8738140*0.75/0.150 = ~40 Mbytes/s), pretty good for a modern network.
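The arithmetic above can be replayed for other buffer sizes and round-trip times with a few shell functions. This is a rough sketch using integer arithmetic (so results are rounded down); the function names are made up for this example. The last function runs the calculation the other way around, from link speed and RTT to the window needed, which is presumably where the "gigE within 5 ms, fastE within 50 ms" figures in the short fix come from.

```shell
#!/bin/sh
# effective_window BUFFER [ADV_WIN_SCALE]
# Usable window in bytes: the buffer minus the buffer/2^scale overhead share.
effective_window() {
    buf=$1
    scale=${2:-2}
    echo $(( $buf - $buf / (1 << $scale) ))
}

# max_throughput BUFFER RTT_MS [ADV_WIN_SCALE]
# Upper bound in bytes/s: effective window divided by the round-trip time.
max_throughput() {
    win=$(effective_window "$1" "${3:-2}")
    echo $(( $win * 1000 / $2 ))
}

# required_window BITS_PER_SEC RTT_MS
# Window in bytes needed to keep a link of that speed full at that RTT.
required_window() {
    echo $(( $1 / 8 * $2 / 1000 ))
}

effective_window 87380          # 65535    (old default buffer)
max_throughput 87380 150        # 436900   (old default, 150 ms RTT)
max_throughput 873814 150       # 4369073  (increased default)
max_throughput 8738140 150      # 43690700 (increased maximum)
required_window 1000000000 5    # 625000   (gigE at 5 ms RTT)
required_window 100000000 50    # 625000   (fastE at 50 ms RTT)
```

Both link cases need 625000 bytes in flight, just under the roughly 655360 bytes that remain of the 873814-byte default after overhead.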
For the txqueuelen, this is mostly relevant for gigE, but should not hurt anything else. The default might actually be 1000 with newer kernel versions, but this is not always the case, and a txqueuelen of 100 or less has been shown by experience to be too small.
net/core/[rw]mem_max is in bytes and is the largest possible window size. net/ipv4/tcp_[rw]mem is in bytes and is "min default max" for the TCP windows; the window actually used is negotiated between sender and receiver. "r" is for when this machine is on the receiving end, "w" for when this machine is the one sending.
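The three fields can be read back from a running system to check what is actually in effect. A small sketch: show_tcp_buf is a made-up helper name, and the /proc path in the last comment is where Linux exposes net/ipv4/tcp_rmem.

```shell
#!/bin/sh
# show_tcp_buf NAME "MIN DEFAULT MAX"
# Label the three whitespace-separated fields of a tcp_rmem/tcp_wmem value.
show_tcp_buf() {
    name=$1
    set -- $2          # split the value into $1 (min), $2 (default), $3 (max)
    echo "$name: min=$1 default=$2 max=$3 bytes"
}

show_tcp_buf tcp_rmem "4096 873814 8738140"
# On a live system:
#   show_tcp_buf tcp_rmem "$(cat /proc/sys/net/ipv4/tcp_rmem)"
```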
There are more tuning parameters; for the Linux kernel they are documented in Documentation/networking/ip-sysctl.txt, but in our experience only the parameters above need tuning to get good TCP performance.
So, what's the downside?
None, pretty much. It uses a bit more kernel memory, but this is well regulated by a tuning parameter (net/ipv4/tcp_mem) that has good defaults (a percentage of physical RAM). Note that you shouldn't touch that unless you really know what you are doing. If you change it and set it too high, you might end up with no memory left for processes and everything else.
If you go above the middle value of net/ipv4/tcp_mem, you enter tcp_memory_pressure, which means that new TCP windows won't grow until memory use has dropped back below the low (first) value. Allowing bigger windows means that it takes fewer connections for someone evil to make the rest of the TCP streams go slow.
What you remove is an artificial limit on TCP performance. Without that limit you are bounded by the available end-to-end bandwidth and loss. So you might end up saturating your uplink more effectively, but TCP is good at handling this.
The txqueuelen increase will eat about 1.5 megabytes of memory at most, given an MTU of 1500 bytes (normal Ethernet): 1000 queued packets of up to 1500 bytes each.
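The estimate is simply queue length times packet size. As a quick check, assuming every slot in the queue holds a full-size 1500-byte Ethernet packet and ignoring any per-packet kernel bookkeeping (queue_bytes is a made-up name):

```shell
#!/bin/sh
# queue_bytes QUEUE_LEN PACKET_BYTES
# Worst case: every slot in the transmit queue holds a full-size packet.
queue_bytes() {
    echo $(( $1 * $2 ))
}

queue_bytes 1000 1500   # 1500000 bytes, about 1.5 megabytes
queue_bytes 100 1500    # 150000 bytes for a queue of 100
```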
Notes for other operating systems
Find out how to set the tcp window sizing options, then increase them to sensible values. Most operating systems have the same horribly small default values.
AIX example:
no -p -o sb_max=8738140 -o rfc1323=1 -o tcp_recvspace=873814 -o tcp_sendspace=873814 -o use_isno=0
AIX 5.2 sets tuning parameters permanently like this; otherwise you have to set them at boot in /etc/rc.net. See "man no" for more information on AIX.
Solaris example:
ndd -set /dev/tcp tcp_xmit_hiwat 873814
ndd -set /dev/tcp tcp_recv_hiwat 873814
ndd -set /dev/tcp tcp_max_buf 8738140
See also: http://www.psc.edu/networking/projects/tcptune/ - long page with lots of info on obscure operating systems.
About this document: Written by Mattias Wadenstein <maswan@acc.umu.se>, please send suggestions for improvements. Feel free to use in original or modified form or link here.