Load balancers distribute incoming traffic across multiple backend instances to maximise throughput, minimise latency, and ensure availability. Layer 4 (L4) load balancers operate on TCP/UDP, routing on IP address and port without inspecting application data; this gives them higher throughput and lower latency than L7. Layer 7 (L7) load balancers inspect HTTP headers, cookies, and paths, which enables content-based routing, A/B testing, authentication offload, and SSL/TLS termination. AWS offers NLB (L4, ~100μs latency) and ALB (L7, with HTTP/2, WebSocket, and path-based routing).
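
To make the L4/L7 distinction concrete, here is a minimal sketch of the two routing decisions. The pool names, the `X-Experiment` header, and the rule tables are hypothetical and chosen only for illustration; a real load balancer works on live connections, not on a `Request` object like this one.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    client_ip: str
    dest_port: int
    path: str = "/"
    headers: dict = field(default_factory=dict)


# Hypothetical pool names, for illustration only.
L4_LISTENERS = {443: "web-pool", 5432: "db-pool"}
L7_RULES = [("/api/", "api-pool"), ("/static/", "static-pool")]


def route_l4(req: Request) -> str:
    """L4: decide from transport-level data only (here, the destination port)."""
    return L4_LISTENERS[req.dest_port]


def route_l7(req: Request) -> str:
    """L7: inspect application data; the first matching rule wins."""
    # Header-based rule, e.g. for A/B testing (header name is an assumption).
    if req.headers.get("X-Experiment") == "b":
        return "experiment-pool"
    # Path-based rules.
    for prefix, pool in L7_RULES:
        if req.path.startswith(prefix):
            return pool
    return "web-pool"  # default pool
```

The L4 function never looks at `path` or `headers`, which is why it cannot split `/api/` from `/static/` traffic arriving on the same port.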

Key Points

  • Round-robin distributes requests sequentially across the backend pool — simple and effective for homogeneous, stateless backends with similar request cost.
  • Weighted round-robin assigns a weight to each backend and distributes requests in proportion to it; during a canary deployment, for example, route 20% to the new version and 80% to the old, then shift the weights as confidence grows.
  • Least Connections routes new requests to the backend with the fewest active connections — optimal for heterogeneous backends or variable-cost requests (e.g., long-running file uploads mixed with fast API calls).
  • IP Hash (sticky by client IP): deterministically maps client IP to backend via hash — provides session affinity without cookies but breaks if client IP changes (NAT, mobile) or if backend pool size changes.
  • Sticky sessions (cookie-based affinity): ALB inserts `AWSALB` cookie with a backend identifier — more reliable than IP hash but prevents even load distribution if some clients have long-lived sessions.
  • Health checks: ALB performs HTTP health checks every 30 s by default (configurable); with an unhealthy threshold of 2 consecutive failures, a failing backend is removed from the pool after roughly 60 s (2 × 30 s). Shorter intervals detect failures faster but increase the risk of flapping.
  • Connection draining (deregistration delay): ALB waits up to 300 s (configurable) for in-flight requests to complete before removing a deregistering target — critical for zero-downtime deployments.
  • AWS Global Accelerator uses anycast routing to direct users to the nearest AWS edge PoP, then carries traffic over the AWS backbone to the regional ALB; this can reduce global TTFB by up to roughly 50% vs public internet routing, depending on client location and network path.
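
The health-check thresholds described in the list above can be sketched as a small state machine. This is an illustration of the counting logic only, not how ALB is implemented; the class and function names are hypothetical, and the healthy threshold of 5 is an assumed value.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    unhealthy_threshold: int = 2  # consecutive failures before removal
    healthy_threshold: int = 5    # consecutive successes before re-adding (assumed)
    healthy: bool = True
    _streak: int = 0              # consecutive results contradicting the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one health-check result and return the resulting state."""
        if check_passed == self.healthy:
            # Result agrees with the current state: reset the counter.
            self._streak = 0
        else:
            self._streak += 1
            limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self._streak >= limit:
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


def approx_detection_seconds(interval_s: float, unhealthy_threshold: int) -> float:
    """Rough time to detect a failure: interval multiplied by threshold."""
    return interval_s * unhealthy_threshold
```

Requiring consecutive results before a state change is what damps flapping: a single failed check in a run of successes resets the counter instead of removing the backend.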

| Algorithm | How It Works | Best For | Pitfall |
|---|---|---|---|
| Round-Robin | Sequential distribution | Stateless, homogeneous backends | Unequal load if requests vary in cost |
| Weighted Round-Robin | Proportional distribution by weight | Canary/blue-green traffic splitting | Static weights need manual adjustment |
| Least Connections | Routes to backend with fewest open conns | Variable request duration (uploads) | Doesn't account for request cost/weight |
| Least Response Time | Routes to backend with lowest avg latency | Latency-sensitive APIs | Requires active latency probing |
| IP Hash | Hash(client IP) → fixed backend | L4 session affinity, game servers | Breaks on IP change or pool resize |
| Cookie Affinity | ALB cookie pins client to backend | Stateful HTTP apps (shopping cart) | Uneven distribution with long sessions |
| Random | Uniform random selection | Simple, predictable distribution | No consideration of backend health/load |
| Consistent Hashing | Virtual ring, minimal rehashing on resize | Cache clusters, stateful partitions | Complex implementation, hot spots |
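
Several of the algorithms compared above fit in a few lines each. The following is a minimal sketch, not a production implementation: all function and class names are hypothetical, SHA-256 is an arbitrary choice of hash, and 100 virtual nodes per backend is an assumed value.

```python
import bisect
import hashlib
import itertools


def round_robin(backends):
    """Yield backends in a fixed repeating order."""
    return itertools.cycle(backends)


def weighted_round_robin(weights):
    """Yield backends in proportion to integer weights (simple expanded cycle)."""
    expanded = [b for b, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Pick the backend with the fewest active connections."""
    return min(active, key=active.get)


def _hash(key: str) -> int:
    # First 8 bytes of SHA-256 as an integer; stable across runs and processes.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def modulo_hash(client_ip, backends):
    """IP hash: the mapping is stable only while the pool size is unchanged."""
    return backends[_hash(client_ip) % len(backends)]


class HashRing:
    """Consistent hashing with virtual nodes on a sorted ring."""

    def __init__(self, backends, vnodes=100):
        self._ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The difference between the last two shows up on a pool resize: with `modulo_hash`, changing `len(backends)` can remap most clients, while with `HashRing`, adding a backend only moves keys onto the new backend and leaves the rest where they were.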

Real-World Example

Netflix's Ribbon (a client-side load balancer) can apply a response-time-based rule (`WeightedResponseTimeRule`) across microservices: the client maintains a rolling average of each server's response time and favours the faster servers for new calls, which reportedly reduces p99 latency by ~20% vs round-robin.
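
The idea of a client-side rolling average can be sketched as follows. This is not Ribbon's code (Ribbon is a Java library); it is a simplified Python illustration that uses an exponentially weighted moving average and always picks the fastest server. The class name and the smoothing factor `alpha` are assumptions.

```python
class ResponseTimeBalancer:
    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha                        # smoothing factor (assumed value)
        self.avg_ms = {s: None for s in servers}  # None means no samples yet

    def record(self, server, elapsed_ms):
        """Fold one observed response time into the server's rolling average."""
        prev = self.avg_ms[server]
        if prev is None:
            self.avg_ms[server] = elapsed_ms
        else:
            self.avg_ms[server] = self.alpha * elapsed_ms + (1 - self.alpha) * prev

    def choose(self):
        """Return an unsampled server if any, else the lowest-average server."""
        for server, avg in self.avg_ms.items():
            if avg is None:
                return server
        return min(self.avg_ms, key=self.avg_ms.get)
```

A strictly greedy choice like this one stops sending traffic to a slow server, so its average is never refreshed; weighting the selection by response time instead of always taking the minimum is one way to keep every server sampled.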