Load Balancing
L4 vs L7, algorithms (round-robin, least connections, IP hash), sticky sessions
Load balancers distribute incoming traffic across multiple backend instances to maximise throughput, minimise latency, and ensure availability. Layer 4 (L4) load balancers operate on TCP/UDP and route on IP address and port without inspecting application data. Because they do less work per packet, they offer higher throughput and lower latency. Layer 7 (L7) load balancers inspect HTTP headers, cookies, and paths, which enables content-based routing, A/B testing, authentication offload, and SSL/TLS termination. AWS offers NLB (L4, built for millions of requests per second at very low latency) and ALB (L7, HTTP/2, WebSocket, path-based routing).
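The L4/L7 split comes down to what data the balancer can see when it picks a pool. The minimal Python sketch below uses a simplified request model. The hypothetical `l4_route` can key only on transport data (here, the destination port), because the payload is opaque to it. The hypothetical `l7_route` evaluates ordered predicates over the path and headers, loosely in the spirit of ALB listener rules. All names, pool labels, and the `X-Canary` header are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    client_ip: str
    dst_port: int
    # The fields below live inside the HTTP payload: only an L7 balancer parses them.
    path: str = "/"
    headers: dict[str, str] = field(default_factory=dict)


def l4_route(req: Request, pools_by_port: dict[int, str]) -> str:
    """L4: choose a pool from transport-level data only (here, the destination port)."""
    return pools_by_port[req.dst_port]


# A rule pairs a predicate over the parsed request with a target pool name.
Rule = tuple[Callable[[Request], bool], str]


def l7_route(req: Request, rules: list[Rule], default_pool: str) -> str:
    """L7: evaluate content-based rules in priority order; the first match wins."""
    for matches, pool in rules:
        if matches(req):
            return pool
    return default_pool


# Hypothetical rules: a header-based canary first, then path-based routing.
RULES: list[Rule] = [
    (lambda r: r.headers.get("X-Canary") == "1", "canary-pool"),
    (lambda r: r.path.startswith("/api/"), "api-pool"),
    (lambda r: r.path.startswith("/static/"), "static-pool"),
]

if __name__ == "__main__":
    req = Request("203.0.113.7", 443, "/api/orders")
    print(l4_route(req, {443: "https-pool"}))  # https-pool
    print(l7_route(req, RULES, "web-pool"))    # api-pool
```

An L4 balancer cannot express the `/api/` rule at all, because it never parses the HTTP request line. That parsing is part of the extra per-request work an L7 balancer does.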
Key Points
- Round-robin distributes requests sequentially across the backend pool — simple and effective for homogeneous, stateless backends with similar request cost.
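- Weighted round-robin assigns a weight to each backend and splits traffic in proportion to the weights. During a canary deployment, for example, you can route 20% of requests to the new version and 80% to the old, then shift the weights as confidence grows.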
- Least Connections routes each new request to the backend with the fewest active connections. It suits variable-cost requests (e.g., long-running file uploads mixed with fast API calls), and its weighted form also handles backends of differing capacity.
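- IP Hash (sticky by client IP): deterministically maps the client IP to a backend via a hash, which gives session affinity without cookies. It breaks when a client's IP changes (e.g., a mobile device switching networks). It also piles load onto one backend when many clients share a NAT address. With simple `hash mod N`, it remaps most clients whenever the pool size changes.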
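- Sticky sessions (cookie-based affinity): ALB inserts an `AWSALB` cookie that encapsulates encrypted information about the selected target, so the client returns to the same backend. This is more reliable than IP hash, but clients with long-lived sessions stay pinned, which can leave load unevenly distributed.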
- Health checks: ALB performs HTTP health checks every 30 s (configurable); unhealthy threshold of 2 failures removes the backend from the pool within 60 s — tune for fast failure detection vs flapping.
- Connection draining (deregistration delay): ALB waits up to 300 s (configurable) for in-flight requests to complete before removing a deregistering target — critical for zero-downtime deployments.
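- AWS Global Accelerator advertises static Anycast IP addresses from AWS edge locations. Users enter the AWS network at the nearest edge PoP, and traffic then travels over the AWS backbone to the regional ALB. This typically lowers global TTFB compared with public-internet routing, though the gain varies by user location and network path.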
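The selection rules in the list above can be sketched in a few lines of Python. This is an illustrative sketch, not how ALB or any particular balancer implements them, and the `Backend` type and function names are hypothetical. Weighted round-robin uses the "smooth" variant used by nginx, which interleaves a 1:4 split instead of sending bursts. IP hash uses plain `hash mod N`, which is exactly why resizing the pool remaps most clients.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int = 1
    active_conns: int = 0    # maintained by the proxy as connections open and close
    current_weight: int = 0  # running state for smooth weighted round-robin


class RoundRobin:
    """Hand out backends in a fixed cyclic order."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends
        self.next_index = 0

    def pick(self) -> Backend:
        backend = self.backends[self.next_index % len(self.backends)]
        self.next_index += 1
        return backend


class SmoothWeightedRoundRobin:
    """Smooth WRR: each backend is picked `weight` times per cycle of sum(weights) picks."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends

    def pick(self) -> Backend:
        total = sum(b.weight for b in self.backends)
        for b in self.backends:
            b.current_weight += b.weight  # every backend earns its weight
        best = max(self.backends, key=lambda b: b.current_weight)
        best.current_weight -= total      # the winner pays back the total
        return best


def least_connections(backends: list[Backend]) -> Backend:
    """Pick the backend with the fewest in-flight connections (ties go to the first listed)."""
    return min(backends, key=lambda b: b.active_conns)


def ip_hash(backends: list[Backend], client_ip: str) -> Backend:
    """Map a client IP to a backend with hash(ip) mod N; stable only while N is fixed."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


if __name__ == "__main__":
    # Canary split: 20% to v2, 80% to v1.
    wrr = SmoothWeightedRoundRobin([Backend("v2", weight=1), Backend("v1", weight=4)])
    print([wrr.pick().name for _ in range(5)])  # ['v1', 'v1', 'v2', 'v1', 'v1']
```

With weights 1 and 4, the smooth variant places the canary pick in the middle of each five-request cycle rather than after four consecutive `v1` picks. This keeps the canary's share steady over short windows. Growing the IP-hash pool from 3 to 4 backends keeps a client on the same backend only when `h mod 3 == h mod 4`, which is roughly a quarter of clients.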
| Algorithm | How It Works | Best For | Pitfall |
|---|---|---|---|
| Round-Robin | Sequential distribution | Stateless, homogeneous backends | Unequal load if requests vary in cost |
| Weighted Round-Robin | Proportional distribution by weight | Canary/blue-green traffic splitting | Static weights need manual adjustment |
| Least Connections | Routes to backend with fewest open conns | Variable request duration (uploads) | Doesn't account for request cost/weight |
| Least Response Time | Routes to backend with lowest observed avg latency | Latency-sensitive APIs | Needs continuous latency tracking; can herd traffic onto one fast backend |
| IP Hash | Hash(client IP) → fixed backend | L4 session affinity, game servers | Breaks on IP change or pool resize |
| Cookie Affinity | ALB cookie pins client to backend | Stateful HTTP apps (shopping cart) | Uneven distribution with long sessions |
| Random | Uniform random selection | Simple, stateless selection; evens out over many requests | Ignores current backend load; short-term imbalance |
| Consistent Hashing | Virtual ring, minimal rehashing on resize | Cache clusters, stateful partitions | Complex implementation, hot spots |
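The Consistent Hashing row contrasts with plain IP hash: when a node joins a ring, only about 1/N of keys move, and they all move to the new node. Below is a minimal sketch in Python. It assumes SHA-256 for ring positions and 100 virtual nodes per backend, both arbitrary choices for illustration. `ConsistentHashRing` is a hypothetical name, not a library class.

```python
import bisect
import hashlib
from collections.abc import Iterable


def ring_position(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Hash ring with virtual nodes: each backend owns many small arcs of the ring."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        self._positions: list[int] = []         # positions only, for bisect
        for node in nodes:
            self.add(node)

    def _rebuild(self) -> None:
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    def add(self, node: str) -> None:
        # Place `vnodes` copies of the node at pseudo-random ring positions.
        self._ring.extend((ring_position(f"{node}#{i}"), node) for i in range(self.vnodes))
        self._rebuild()

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]
        self._rebuild()

    def get(self, key: str) -> str:
        """Return the owner of `key`: the first virtual node clockwise from its position."""
        if not self._ring:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._positions, ring_position(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
    print(ring.get("user:42"))
```

Virtual nodes are what tame the "hot spots" pitfall. With a single position per backend, arc sizes (and so load) vary widely. With many small arcs per backend, each backend's share tends toward 1/N.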
Real-World Example
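Netflix built Ribbon, a client-side load balancer used across its microservices and now in maintenance mode. Its `WeightedResponseTimeRule` periodically computes a weight for each server from its average response time. It then picks servers at random in proportion to those weights, so faster servers receive more calls. Compared with round-robin, this tends to reduce tail latency when server speeds diverge, though the size of the gain depends on the workload.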
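Response-time weighting can be sketched in a few lines of Python. This follows the spirit of the rule described above but is not a port of Ribbon's Java code. The hypothetical `ResponseTimeWeightedPicker` keeps an exponentially weighted moving average (EWMA) of each server's latency and samples servers with probability proportional to 1 / average latency. The smoothing factor `alpha=0.2` and the inverse-latency weighting are illustrative assumptions; Ribbon derives its weights differently.

```python
import random
from collections.abc import Iterable
from typing import Optional


class ResponseTimeWeightedPicker:
    """Weighted-random choice where servers with lower EWMA latency get more traffic."""

    def __init__(self, servers: Iterable[str], alpha: float = 0.2,
                 rng: Optional[random.Random] = None) -> None:
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.avg_ms: dict[str, Optional[float]] = {s: None for s in servers}
        self.rng = rng if rng is not None else random.Random()

    def record(self, server: str, latency_ms: float) -> None:
        """Fold one observed latency into the server's EWMA."""
        prev = self.avg_ms[server]
        if prev is None:
            self.avg_ms[server] = latency_ms  # first sample seeds the average
        else:
            self.avg_ms[server] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def pick(self) -> str:
        known = [v for v in self.avg_ms.values() if v is not None]
        # Unmeasured servers borrow the best known average so they still get probed.
        fallback = min(known) if known else 1.0
        servers = list(self.avg_ms)
        weights = []
        for s in servers:
            avg = self.avg_ms[s]
            # Clamp to avoid dividing by zero on a 0 ms sample.
            weights.append(1.0 / max(avg if avg is not None else fallback, 1e-3))
        return self.rng.choices(servers, weights=weights, k=1)[0]


if __name__ == "__main__":
    picker = ResponseTimeWeightedPicker(["fast", "slow"], rng=random.Random(42))
    picker.record("fast", 10.0)
    picker.record("slow", 40.0)
    print(picker.pick())
```

With averages of 10 ms and 40 ms, the weights are 0.1 and 0.025, so the fast server gets about 80% of calls. Always routing to the single fastest server tends to herd every client onto whichever server was fast a moment ago. Sampling in proportion to speed still favours fast servers while spreading load.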