Latency Optimization

Latency optimization targets the delays on the critical path of a user request: DNS resolution, TCP handshake, TLS negotiation, network transit, server processing, and response transmission. Latency has measurable business impact: Amazon found that every 100ms of added latency cost 1% of sales, and Google found that a 500ms delay reduced search traffic by 20%. Techniques such as geographic co-location, connection keep-alive, HTTP/2 multiplexing, and DNS prefetching are standard practice at high-performance web services.
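To see why connection setup dominates on long-haul links, the following back-of-the-envelope model counts round trips for a single HTTPS request. It is a sketch under stated assumptions, not a measurement: every phase is assumed to cost whole round trips, an uncached DNS lookup is assumed to cost one RTT (real resolvers are often much closer than the server), and TCP Fast Open, slow start, and body transfer time are ignored. The function and parameter names are illustrative.

```python
# Simplified model: each phase costs whole round trips (RTTs). Ignores TCP Fast
# Open, slow start, resolver proximity, and response body transfer time.
TLS_HANDSHAKE_RTTS = {"1.2": 2, "1.3": 1}  # full handshakes, no session resumption


def time_to_first_byte_ms(rtt_ms: float, server_ms: float, *, tls: str = "1.3",
                          reuse_connection: bool = False,
                          dns_cached: bool = False) -> float:
    """Estimate time to first response byte for one HTTPS request, in ms."""
    rtts = 1  # the HTTP request goes out and the response headers come back
    if not reuse_connection:
        rtts += 1                        # TCP three-way handshake
        rtts += TLS_HANDSHAKE_RTTS[tls]  # TLS handshake
        if not dns_cached:
            rtts += 1                    # DNS lookup, assumed to cost one RTT
    return rtts * rtt_ms + server_ms


for label, kwargs in [("cold, TLS 1.2", {"tls": "1.2"}),
                      ("cold, TLS 1.3", {"tls": "1.3"}),
                      ("reused connection", {"reuse_connection": True})]:
    print(f"{label}: {time_to_first_byte_ms(100, 20, **kwargs)} ms")
# cold, TLS 1.2: 520 ms
# cold, TLS 1.3: 420 ms
# reused connection: 120 ms
```

With a 100ms RTT, the DNS, TCP, and TLS round trips account for 300 to 400ms of a cold request, which is why connection reuse and TLS 1.3 pay off most for distant or mobile clients.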

Key Points

Geographic co-location: deploy application servers in the same AWS region, and ideally the same Availability Zone (AZ), as their primary database to avoid cross-region round trips, which typically cost 50–100ms each.
HTTP keep-alive (persistent connections): reuse established TCP connections for multiple requests, avoiding the TCP three-way handshake (~1 RTT), plus the TLS handshake on HTTPS, for every request after the first.
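HTTP/2 multiplexing: carries multiple concurrent requests as independent streams over a single TCP connection, eliminating HTTP/1.1's application-level head-of-line blocking for parallel resource loading. A lost packet still stalls every stream at the TCP layer.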
TLS 1.3: the full handshake takes one round trip instead of TLS 1.2's two, cutting TLS overhead from 2 RTT to 1 RTT for new connections. Session resumption can send 0-RTT early data, which is replayable and should be limited to idempotent requests.
DNS prefetch: `<link rel="dns-prefetch" href="//api.example.com">` resolves a cross-origin hostname ahead of time, before the browser issues requests to it.
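Preconnect: `<link rel="preconnect" href="https://api.example.com">` performs DNS resolution, the TCP handshake, and the TLS handshake for a known origin early in page load, which can save 200–400ms on the first request to that origin over high-latency connections.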
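HTTP/3 (QUIC): HTTP over QUIC, a UDP-based transport that delivers each stream independently, so a lost packet delays only its own stream instead of the whole connection as with TCP. This especially benefits mobile users on lossy networks; HTTP/3 is supported by Google, Cloudflare, and Fastly.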
Server-Sent Events and WebSockets: push data to clients instead of having them poll, cutting update latency for real-time features from up to a full polling interval to roughly one network transit (often under 100ms).
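Cross-region latency compounds when one request makes several dependent database calls, because each query waits for the previous result. A minimal arithmetic sketch, assuming 20 sequential queries, a 70ms cross-region RTT, and a 1ms same-AZ RTT; these figures and the function name are illustrative, not measurements.

```python
def sequential_queries_ms(n_queries: int, rtt_ms: float, query_ms: float = 0) -> float:
    """Latency of n dependent queries issued one after another, in ms."""
    # Each query starts only after the previous result arrives,
    # so network round trips add up linearly.
    return n_queries * (rtt_ms + query_ms)


print(sequential_queries_ms(20, rtt_ms=70))  # cross-region -> 1400
print(sequential_queries_ms(20, rtt_ms=1))   # same AZ (assumed ~1ms RTT) -> 20
```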
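The effect of connection reuse can be checked with Python's standard library: a local `http.server` counts the TCP connections it accepts while `http.client` sends several requests, either over one persistent connection or over a fresh connection each time. The handler and function names are hypothetical. Note that `http.server` keeps connections open only when the handler declares `protocol_version = "HTTP/1.1"`, and responses must be length-delimited (here via `Content-Length`) so the client knows where each one ends.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.0 (the default) closes after each response
    connections = 0                # socketserver creates one handler per TCP connection

    def setup(self):
        type(self).connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # lets the socket be reused
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo quiet
        pass


def fetch_many(n: int, reuse: bool = True) -> int:
    """Send n GETs to a local server; return how many TCP connections it accepted."""
    CountingHandler.connections = 0
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        for _ in range(n):
            if not reuse:
                conn.close()  # drop the socket; the next request reconnects
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before reusing the socket
    finally:
        conn.close()  # lets the server's handler see EOF and return
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print(fetch_many(5), fetch_many(5, reuse=False))
```

Expect one connection with reuse and five without; over a real network, each avoided connection saves at least the TCP handshake round trip, plus the TLS handshake on HTTPS.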
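A rough model of why multiplexing helps when a page loads many small resources from one host. It assumes HTTP/1.1 without pipelining, a browser limit of six connections per host (a common default), connections that are already open, and that the server's HTTP/2 concurrent-stream limit is not reached; bandwidth, server time, and congestion control are ignored. The function is illustrative only.

```python
import math


def fetch_time_ms(n_resources: int, rtt_ms: float, *, http2: bool,
                  conns_per_host: int = 6) -> float:
    """Round-trip-bound time to fetch n small resources from one host."""
    if n_resources <= 0:
        return 0
    if http2:
        # All requests go out at once as concurrent streams on one connection.
        return rtt_ms
    # HTTP/1.1: each connection carries one request at a time,
    # so requests proceed in waves of conns_per_host.
    return math.ceil(n_resources / conns_per_host) * rtt_ms


print(fetch_time_ms(30, 50, http2=False))  # 5 waves -> 250
print(fetch_time_ms(30, 50, http2=True))   # 1 round trip -> 50
```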
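Pages that call several third-party origins often emit resource hints from a template. A minimal sketch of a helper that renders `preconnect` with a `dns-prefetch` fallback for each origin; the function name is hypothetical. Hints apply to origins, so any path is stripped, and the optional `crossorigin` attribute is needed when the connection will serve CORS-mode fetches such as web fonts.

```python
from html import escape
from urllib.parse import urlsplit


def resource_hints(urls: list[str], crossorigin: bool = False) -> str:
    """Render <link> hints that warm up connections to each URL's origin."""
    tags = []
    for url in urls:
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"  # hints target origins, not paths
        attr = " crossorigin" if crossorigin else ""
        # preconnect does DNS + TCP + TLS; dns-prefetch is a cheaper fallback
        # for browsers that ignore preconnect.
        tags.append(f'<link rel="preconnect" href="{escape(origin)}"{attr}>')
        tags.append(f'<link rel="dns-prefetch" href="{escape(origin)}">')
    return "\n".join(tags)


print(resource_hints(["https://api.example.com/v1/users"]))
```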
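On the server side, Server-Sent Events are a long-lived HTTP response with `Content-Type: text/event-stream` into which the server writes messages as updates occur. A minimal sketch of the message encoder, following the field format of the event stream; the function name is hypothetical, and `event` and `event_id` are assumed not to contain newlines.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")  # delivered to listeners for this event type
    # Normalize line endings, then send each payload line as its own `data:`
    # field; the browser rejoins them with "\n".
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    lines.extend(f"data: {line}" for line in normalized.split("\n"))
    return "\n".join(lines) + "\n\n"  # a blank line ends the message
```

Setting an `id` lets a reconnecting `EventSource` send a `Last-Event-ID` header, so the server can resume the stream instead of making the client poll for missed updates.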

Real-World Example

LinkedIn reduced global median API latency by 60% by co-locating services in the same datacenter rack and implementing HTTP/2 throughout their service mesh. Their "Performance Budgets" system enforces p99 latency SLOs at the individual endpoint level during code review.
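An endpoint-level budget check of this kind can be sketched as a CI step that compares measured p99 latency against per-endpoint limits. This is a hypothetical illustration, not LinkedIn's system: the endpoint names, budgets, sample data, and the nearest-rank percentile method are all assumptions.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def over_budget(latencies_ms: dict[str, list[float]], budgets_ms: dict[str, float],
                p: float = 99) -> dict[str, float]:
    """Return {endpoint: observed percentile} for endpoints exceeding their budget."""
    failures = {}
    for endpoint, samples in latencies_ms.items():
        budget = budgets_ms.get(endpoint)
        if budget is None:
            continue  # no budget declared for this endpoint
        observed = percentile(samples, p)
        if observed > budget:
            failures[endpoint] = observed
    return failures


# Hypothetical benchmark results: 100 samples of 1..100 ms per endpoint.
samples = [float(ms) for ms in range(1, 101)]
print(over_budget({"/feed": samples, "/profile": samples},
                  {"/feed": 80.0, "/profile": 120.0}))  # {'/feed': 99.0}
```

A CI job would fail the change when the returned dict is non-empty.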
