Load Balancing

Load balancers distribute incoming traffic across multiple backend instances to maximise throughput, minimise latency, and keep the service available when individual instances fail. Layer 4 (L4) load balancers operate on TCP/UDP: they route on IP address and port without inspecting application data, which gives them higher throughput and lower latency. Layer 7 (L7) load balancers parse HTTP, so they can route on headers, cookies, and paths, enabling content-based routing, A/B testing, authentication offload, and TLS (SSL) termination. On AWS, NLB is the L4 option (added latency on the order of 100 μs) and ALB is the L7 option (HTTP/2, WebSocket, path-based routing).
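
To make the L4/L7 distinction concrete, here is a minimal Python sketch of what each layer can base a routing decision on. The backend addresses, pool names, and route table are hypothetical, and real load balancers (including NLB and ALB) use their own internal algorithms; the sketch only illustrates the kind of information available at each layer.

```python
import hashlib
from dataclasses import dataclass

L4_BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical pool


@dataclass(frozen=True)
class Connection:
    """What an L4 balancer sees: the connection tuple, not the payload."""
    protocol: str  # "tcp" or "udp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def l4_pick(conn: Connection, backends: list) -> str:
    """Pick a backend by hashing the connection tuple (no payload inspection)."""
    key = f"{conn.protocol}|{conn.src_ip}|{conn.src_port}|{conn.dst_ip}|{conn.dst_port}"
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


# What an L7 balancer can additionally use: the HTTP request itself.
# Hypothetical route table mapping path prefixes to backend pools.
ROUTES = [
    ("/api/", ["api-1", "api-2"]),
    ("/static/", ["static-1"]),
]
DEFAULT_POOL = ["web-1", "web-2"]


def l7_pool(path: str) -> list:
    """Return the backend pool for the first matching path prefix."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


conn = Connection("tcp", "203.0.113.7", 51234, "198.51.100.10", 443)
print("L4 choice:", l4_pick(conn, L4_BACKENDS))
print("L7 pool for /api/users:", l7_pool("/api/users"))
```

The L4 function never looks at the request, which is why it is cheap; the L7 function needs the parsed path, which is why it can do content-based routing.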

Key Points

Round-robin distributes requests sequentially across the backend pool — simple and effective for homogeneous, stateless backends with similar request cost.
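Weighted round-robin assigns a weight to each backend and distributes requests in proportion to it. During a canary deployment, for example, route 20% to the new version and 80% to the old, then shift the weights as confidence in the new version grows.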
Least Connections routes new requests to the backend with the fewest active connections — optimal for heterogeneous backends or variable-cost requests (e.g., long-running file uploads mixed with fast API calls).
IP Hash (sticky by client IP): deterministically maps client IP to backend via hash — provides session affinity without cookies but breaks if client IP changes (NAT, mobile) or if backend pool size changes.
Sticky sessions (cookie-based affinity): ALB sets an `AWSALB` cookie that identifies the target. This is more reliable than IP hash, but load can become uneven when some clients hold long-lived sessions.
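Health checks: ALB performs HTTP health checks every 30 s by default (configurable). With an unhealthy threshold of 2 consecutive failures, a failed backend is removed from the pool after roughly 2 × 30 s = 60 s. Shorter intervals and lower thresholds detect failures faster but increase the risk of flapping.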
Connection draining (deregistration delay): ALB waits up to 300 s (configurable) for in-flight requests to complete before removing a deregistering target — critical for zero-downtime deployments.
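AWS Global Accelerator uses Anycast routing to direct users to the nearest AWS edge PoP, then carries traffic over the AWS backbone to the regional ALB. This can cut global TTFB substantially compared with public internet routing (figures around 50% are cited, but the actual gain depends on client location and network path).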
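
The round-robin, weighted round-robin, and least-connections points above can be sketched in a few lines of Python. This is an illustrative sketch, not how any particular product implements them; the class names and backend labels are hypothetical, and the weighted variant uses the "smooth" weighted round-robin scheme popularised by nginx so that picks for the heavier backend are interleaved rather than bunched together.

```python
import itertools


class RoundRobin:
    """Cycle through backends in order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class SmoothWeightedRoundRobin:
    """Each backend is picked in proportion to its weight, interleaved."""

    def __init__(self, weights):
        self._weights = dict(weights)
        self._current = {b: 0 for b in self._weights}
        self._total = sum(self._weights.values())

    def pick(self):
        # Raise every backend's running score by its weight, choose the
        # highest score, then penalise the winner by the total weight.
        for backend, weight in self._weights.items():
            self._current[backend] += weight
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


class LeastConnections:
    """Route to the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def acquire(self):
        chosen = min(self.active, key=self.active.get)
        self.active[chosen] += 1
        return chosen

    def release(self, backend):
        self.active[backend] -= 1


rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])

# An 80/20 split between two hypothetical versions.
wrr = SmoothWeightedRoundRobin({"v1": 8, "v2": 2})
print([wrr.pick() for _ in range(10)])
```

Adjusting a canary split then amounts to changing the weights passed to `SmoothWeightedRoundRobin`; `LeastConnections` needs the caller to invoke `release` when a request finishes, otherwise its counts drift.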
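
The health-check point above trades detection speed against flapping. The following Python sketch shows the threshold logic under stated assumptions: the unhealthy threshold of 2 and the 30 s interval come from the text, while the healthy threshold of 5 and all names (`HealthTracker`, `approx_detection_seconds`) are assumptions made for illustration, not ALB's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    unhealthy_threshold: int = 2  # consecutive failures before removal (from the text)
    healthy_threshold: int = 5    # consecutive successes before re-adding (assumed)
    healthy: bool = True
    _fails: int = 0
    _oks: int = 0

    def record(self, ok: bool) -> bool:
        """Record one health-check result and return the current state."""
        if ok:
            self._oks += 1
            self._fails = 0
            if not self.healthy and self._oks >= self.healthy_threshold:
                self.healthy = True
        else:
            self._fails += 1
            self._oks = 0
            if self.healthy and self._fails >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


def approx_detection_seconds(interval_s: int, unhealthy_threshold: int) -> int:
    """Rough upper bound on time to mark a backend unhealthy.

    Ignores the per-check timeout, which adds to the real figure.
    """
    return interval_s * unhealthy_threshold


tracker = HealthTracker()
for result in [False, False, True, True, True, True, True]:
    print(result, "->", "healthy" if tracker.record(result) else "unhealthy")
print("approx detection time:", approx_detection_seconds(30, 2), "s")
```

Requiring several consecutive successes before re-adding a backend is what damps flapping: a single lucky check does not put a struggling instance back into rotation.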

Algorithm	How It Works	Best For	Pitfall
Round-Robin	Sequential distribution	Stateless, homogeneous backends	Unequal load if requests vary in cost
Weighted Round-Robin	Proportional distribution by weight	Canary/blue-green traffic splitting	Static weights need manual adjustment
Least Connections	Routes to backend with fewest open conns	Variable request duration (uploads)	Doesn't account for request cost/weight
Least Response Time	Routes to backend with lowest avg latency	Latency-sensitive APIs	Requires per-backend latency measurement
IP Hash	Hash(client IP) → fixed backend	L4 session affinity, game servers	Breaks on IP change or pool resize
Cookie Affinity	ALB cookie pins client to backend	Stateful HTTP apps (shopping cart)	Uneven distribution with long sessions
Random	Uniform random selection	Simple, stateless selection across large pools	No consideration of backend health/load
Consistent Hashing	Virtual ring, minimal rehashing on resize	Cache clusters, stateful partitions	Complex implementation, hot spots
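
The "breaks on pool resize" pitfall of IP Hash and the "minimal rehashing" property of Consistent Hashing in the table above can be compared directly. The Python sketch below grows a hypothetical pool from four backends to five and measures how many clients are remapped under each scheme. The backend names, client IPs, and the choice of 100 virtual nodes per backend are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def modulo_pick(client_ip: str, backends: list) -> str:
    """Plain IP hash: hash(client IP) modulo pool size."""
    return backends[stable_hash(client_ip) % len(backends)]


class HashRing:
    """Consistent hashing with virtual nodes placed on a sorted ring."""

    def __init__(self, backends, vnodes: int = 100):
        points = [(stable_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)]
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [b for _, b in points]

    def pick(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._owners[idx]


def moved_fraction(pick_before, pick_after, keys) -> float:
    """Fraction of keys whose backend changes between two pickers."""
    moved = sum(1 for k in keys if pick_before(k) != pick_after(k))
    return moved / len(keys)


clients = [f"10.0.{i // 256}.{i % 256}" for i in range(2000)]
old_pool = ["b1", "b2", "b3", "b4"]
new_pool = old_pool + ["b5"]

old_ring, new_ring = HashRing(old_pool), HashRing(new_pool)
modulo_moved = moved_fraction(
    lambda k: modulo_pick(k, old_pool), lambda k: modulo_pick(k, new_pool), clients
)
ring_moved = moved_fraction(old_ring.pick, new_ring.pick, clients)
print(f"modulo: {modulo_moved:.0%} of clients remapped, ring: {ring_moved:.0%}")
```

In expectation, the modulo scheme remaps about four fifths of clients when going from four to five backends, while the ring remaps only about one fifth, and every remapped client lands on the new backend. The exact percentages depend on the hash and the number of virtual nodes.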

Real-World Example

Netflix uses Ribbon, a client-side load balancer, with a least-response-time algorithm across its microservices: the client maintains a rolling average of each server's response time and routes new calls to the fastest server. This approach is credited with reducing p99 latency by about 20% versus round-robin.
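
The rolling-average idea described above can be sketched in Python as follows. This is not Ribbon's API or implementation (Ribbon is a Java library); the class name, the use of an exponentially weighted moving average, and the smoothing factor `alpha` are assumptions chosen to keep the example short.

```python
class LeastResponseTime:
    """Client-side picker that favours the server with the lowest average latency."""

    def __init__(self, servers, alpha: float = 0.2):
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.avg_ms = {s: None for s in servers}

    def pick(self) -> str:
        # Try servers with no samples first so every server gets measured.
        for server, avg in self.avg_ms.items():
            if avg is None:
                return server
        return min(self.avg_ms, key=self.avg_ms.get)

    def observe(self, server: str, latency_ms: float) -> None:
        """Fold a new latency sample into the server's rolling average."""
        prev = self.avg_ms[server]
        if prev is None:
            self.avg_ms[server] = latency_ms
        else:
            self.avg_ms[server] = (1 - self.alpha) * prev + self.alpha * latency_ms


lb = LeastResponseTime(["a", "b"])
lb.observe(lb.pick(), 100.0)  # first call goes to "a"
lb.observe(lb.pick(), 50.0)   # second call goes to "b", which has no samples yet
print(lb.pick())              # "b" now has the lower average
lb.observe("b", 400.0)        # a slow response raises b's average
print(lb.pick(), lb.avg_ms)
```

A known weakness of always choosing the single fastest server is that many clients can converge on it at once; production implementations commonly add randomisation or weighting to spread the load.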
