Quick Start: Battle-Tested Configuration for High-Load Servers
Is your server slow to respond or dropping connections when traffic spikes? Below are the sysctl.conf parameters I frequently use to resolve these issues. This configuration helped my API system remain stable as CCU (Concurrent Users) surged from 1,000 to 15,000 during flash sales.
First, open the system configuration file:
sudo nano /etc/sysctl.conf
Paste the following optimization configuration at the end of the file:
# Expand port range to avoid ephemeral port exhaustion
net.ipv4.ip_local_port_range = 10000 65535
# Reuse TIME_WAIT sockets faster
net.ipv4.tcp_tw_reuse = 1
# Increase queue for connections in the handshake state (SYN)
net.ipv4.tcp_max_syn_backlog = 8192
# Increase queue limit for established connections
net.core.somaxconn = 8192
# Expand buffer size to 16MB for high-speed connections
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Enable BBR algorithm to reduce congestion and increase throughput
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
To apply the changes immediately without restarting the server, run the following command:
sudo sysctl -p
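On systemd-based distributions you can also keep these settings out of the main file: drop them into a file under /etc/sysctl.d/ (any name works, e.g. 99-tuning.conf, an arbitrary example) and reload everything with:
sudo sysctl --system
Either way, a quick spot-check confirms the kernel accepted the values:
sysctl net.core.somaxconn
sysctl net.ipv4.tcp_congestion_control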
Why Default Configurations Are the “Enemy” of High-Traffic Servers
Distributions like Ubuntu or CentOS are designed to run well on most hardware, so their default settings are deliberately conservative. When running Nginx, Redis, or Node.js under heavy load, these limits quickly become bottlenecks.
Experience shows that the TIME_WAIT state is the most common cause of server hangs. When a connection closes, Linux holds the socket in TIME_WAIT for 60 seconds so that stray packets from the old connection can be handled safely. At 1,000 outbound requests per second (a reverse proxy talking to its backends, for example), you will quickly exhaust the ephemeral port range, and the server starts refusing new connections even though the CPU is idle.
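To see how close you are to that limit, count the sockets currently stuck in TIME_WAIT; on recent iproute2 versions the -H flag drops the header line so wc counts only sockets:
ss -Htn state time-wait | wc -l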
Backlog Queues and Packet Dropping
Every time a client sends a SYN packet, the half-open connection waits in the queue sized by tcp_max_syn_backlog. If that queue holds only 128 entries (a common default), it fills up in milliseconds during a SYN flood or traffic spike. Consequently, users see “Connection Refused” errors or endless loading spinners.
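You can check whether drops are already happening by grepping the kernel’s TCP statistics (the exact wording of the counters varies slightly between kernel versions):
netstat -s | grep -i listen
# look for lines like "SYNs to LISTEN sockets dropped"
# and "times the listen queue of a socket overflowed"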
Analysis of Key Parameters
1. Ephemeral Ports: Expanding Connection Space
By default, Linux reserves only about 28,000 ports (32768–60999) for outgoing connections. If your server acts as a reverse proxy connecting to multiple backend services, that is too few. Expanding the range to 10,000–65,535 gives the system roughly 55,000 ephemeral ports, allowing far more parallel data streams.
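Two quick checks, as a rough sketch: the first prints the configured range, the second counts open TCP sockets (a loose proxy for port pressure; tail skips the ss header line):
sysctl net.ipv4.ip_local_port_range
ss -tan | tail -n +2 | wc -l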
2. Leveraging tcp_tw_reuse
Instead of letting sockets “hibernate” for 60 seconds, tcp_tw_reuse allows the kernel to reuse TIME_WAIT sockets for new outgoing connections once TCP timestamps confirm it is safe. This is a much safer solution than the old tcp_tw_recycle, which was removed in Linux 4.12 because it broke clients behind NAT.
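Note that on kernels from 4.15 onward the default is already 2, meaning reuse is enabled for loopback connections only; you can confirm what your system runs:
sysctl net.ipv4.tcp_tw_reuse
# 0 = disabled, 1 = enabled globally, 2 = loopback connections only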
3. Optimizing Buffers to Exploit 1Gbps+
Default rmem and wmem values are often too small to saturate the bandwidth of modern network cards. By increasing the buffer to 16MB, TCP Window Scaling has enough room to auto-adjust, significantly speeding up large data transfers.
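A useful sanity check for buffer sizing is the bandwidth-delay product (BDP): how much data can be “in flight” on the link at once. For example, a 1 Gbps path with 100 ms RTT (integer shell arithmetic):
# BDP = bandwidth (bytes/s) x RTT (s)
# 1 Gbps = 125,000,000 bytes/s; 100 ms = 0.1 s
echo $(( 125000000 / 10 ))   # 12500000 bytes, roughly 12.5 MB
The 16MB ceiling in the configuration above leaves a bit of headroom beyond that figure.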
BBR: “Heavy Weapon” from Google
BBR (Bottleneck Bandwidth and Round-trip propagation time) completely changes how congestion is managed. Instead of waiting for packet loss to slow down, BBR actively estimates bandwidth to maintain the highest possible speed.
In real-world tests on high-latency international links, BBR can increase throughput by up to 40% and noticeably reduce latency. You can check your current algorithm with the command:
sysctl net.ipv4.tcp_congestion_control
If you see bbr, your system is ready for the largest traffic surges.
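If the check prints something else, verify that your kernel is 4.9 or newer (the first version to ship BBR), see which algorithms are available, and load the BBR module if needed:
uname -r
sysctl net.ipv4.tcp_available_congestion_control
sudo modprobe tcp_bbr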
Important Operational Notes
- Balance RAM Usage: Each socket consumes a certain amount of buffer memory. If your server has only 1GB of RAM, do not set the buffer to 64MB, or you will hit Out of Memory (OOM) errors very quickly.
- Monitor Queues: Run the ss -plnt command frequently. For a listening socket, Recv-Q shows how many connections are waiting in the accept queue and Send-Q shows the configured backlog limit; if Recv-Q keeps climbing toward Send-Q, it's time to raise the backlog or scale your server cluster.
- Check TIME_WAIT: Monitor the number of lingering connections using the command:
ss -ant | awk '{print $1}' | sort | uniq -c
- Firewall: Ensure net.netfilter.nf_conntrack_max is large enough (e.g., 262144) so that the firewall doesn't accidentally drop valid connections because the tracking table is full; see the check right after this list.
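A quick way to watch conntrack usage (both sysctls appear only while the nf_conntrack module is loaded):
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max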
Optimizing the TCP stack is a process of continuous tuning. No single set of parameters is perfect for every application. Start with the values I suggested, then monitor your dashboards to find the optimal balance for your system.
