How TCP and Servers Queue Requests
When a server is busy, incoming requests don’t immediately disappear. Instead, they wait in a series of buffers across the network stack. Understanding where this queuing happens helps explain latency spikes, timeouts, and dropped connections, even when CPU usage looks healthy.
The Kernel Layer (TCP Queues)
Before your application code even knows a request exists, the Operating System handles the handshake.
- SYN Backlog: This is the waiting room for connections mid-way through the TCP three-way handshake. If it fills up (e.g., during a SYN flood), the server may delay or drop new connection attempts.
- Accept Queue: Once the handshake is finished, the connection is "Established" and moved here. It stays in this queue until the application calls accept() to pull it into the app layer.
- Impact: If these queues fill up, clients see "Connection Refused" errors or connection timeouts (a minimal sketch of a small accept queue backing up follows this list).
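To make the accept queue concrete, here is a minimal Python sketch of a server that accepts connections slowly. The port, the backlog of 4, and the one-second delay are illustrative; once a few handshakes have completed and are waiting, additional clients hang or get refused depending on the operating system.

```python
import socket
import time

# A deliberately slow server with a tiny accept queue.
# The backlog value (4) caps this socket's accept queue; the effective
# limit is also bounded by the OS (e.g. net.core.somaxconn on Linux).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 8080))
server.listen(4)  # completed handshakes wait here until accept() is called

while True:
    conn, addr = server.accept()   # pulls one connection out of the accept queue
    print("accepted", addr)
    time.sleep(1)                  # simulate a slow application: the queue backs up
    conn.close()
```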
The Application Layer
Once the application accepts a connection, the request enters an internal work queue.
- Thread Pools: In languages like Java or Python (WSGI), a fixed number of worker threads pick up requests. If all workers are busy, the request waits in a local buffer (see the sketch after this list).
- Event Loops: In Node.js, pending work queues on the event loop; in Go, the runtime scheduler queues goroutines.
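Here is a minimal Python sketch of the thread-pool pattern: a fixed pool of workers draining a bounded local buffer. The pool size of 4, the buffer of 100, and the simulated work time are illustrative, not the defaults of any particular server.

```python
import queue
import threading
import time

# A fixed worker pool pulling from a bounded local buffer.
work_queue = queue.Queue(maxsize=100)    # requests wait here when all workers are busy

def worker():
    while True:
        request = work_queue.get()       # blocks until a request is available
        time.sleep(0.05)                 # simulate doing the actual work
        work_queue.task_done()

for _ in range(4):                       # fixed-size pool, like a WSGI worker pool
    threading.Thread(target=worker, daemon=True).start()

for i in range(200):
    work_queue.put(f"req-{i}")           # blocks once the buffer holds 100 requests
work_queue.join()
print("all requests processed")
```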
The Infrastructure Layer (Proxies & Load Balancers)
Most modern apps sit behind Nginx, HAProxy, or a Cloud Load Balancer.
- Surge Queues: These buffers hold requests at the proxy during traffic spikes, so the backend isn't hit with more work than it can handle at once.
- Connection Limits: Proxies often cap how many concurrent connections they will send to a single backend instance to prevent cascading failures (a sketch of such a limit follows this list).
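A rough Python sketch of the connection-limit idea, independent of any particular proxy; the limit of 10 and the function names are illustrative.

```python
import threading

# A per-backend connection limit, as a proxy might enforce.
MAX_CONNS_PER_BACKEND = 10
backend_slots = threading.BoundedSemaphore(MAX_CONNS_PER_BACKEND)

def forward_to_backend(request):
    # Try to take a slot without waiting; if none is free, shed load
    # instead of piling more work onto an already-saturated instance.
    if not backend_slots.acquire(blocking=False):
        return 503, "backend at connection limit"
    try:
        return 200, f"proxied {request}"   # placeholder for the real upstream call
    finally:
        backend_slots.release()
```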
The Client Side
We often forget that the client (the browser or a mobile app) has its own limits.
- Browser Limits: Most browsers allow only about six concurrent HTTP/1.1 connections per host; additional requests wait in the browser's own queue.
- Connection Pooling: If your microservice uses an HTTP client to call another service, requests may queue locally because the client's connection pool is exhausted (see the sketch after this list).
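As an example of client-side pooling, here is a small sketch using the Python requests library; the pool sizes, timeout, and URL are illustrative.

```python
import requests
from requests.adapters import HTTPAdapter

# pool_maxsize caps concurrent connections per host; pool_block=True makes
# extra callers wait for a free connection instead of opening new ones.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=10, pool_block=True)
session = requests.Session()
session.mount("https://", adapter)

# Under heavy concurrency, the 11th simultaneous call to the same host
# queues inside the client until one of the 10 pooled connections frees up.
response = session.get("https://example.com/api/health", timeout=2)
print(response.status_code)
```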
What Happens When Everything Is Full?
When queues reach their limit, the system must choose:
- Tail Drop: Reject the newest incoming requests immediately.
- Increased Latency: Let the queue grow, causing requests to time out before they are even processed.
It is almost always better to fail fast with an HTTP 503 than to let a queue grow indefinitely. This allows the client to retry or fail gracefully rather than hanging forever.
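The difference between the two strategies can be sketched in a few lines of Python; the queue size of 50, the 5-second timeout, and the status codes are illustrative.

```python
import queue

pending = queue.Queue(maxsize=50)

def admit_tail_drop(request):
    """Fail fast: reject immediately and let the client retry."""
    try:
        pending.put(request, block=False)
        return 202, "accepted"
    except queue.Full:
        return 503, "server busy, retry later"

def admit_unbounded_wait(request):
    """Let latency grow: the caller blocks until space frees up or it gives up."""
    try:
        pending.put(request, block=True, timeout=5)
        return 202, "accepted"
    except queue.Full:          # raised only after the full 5-second wait
        return 504, "timed out waiting for queue space"
```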