Performance
Response time targets, throughput, latency percentiles (P50 / P95 / P99)
Performance describes how a system behaves under load, measured primarily by response time, throughput, and latency percentiles. Latency is conventionally reported as P50 (median) together with the P95 and P99 tail percentiles, because averages hide the slowest requests. Targets are workload-specific; tail-latency goals in the low hundreds of milliseconds are common for interactive services, but they are not a universal standard. Throughput is measured in requests per second (RPS) or transactions per second (TPS), and capacity planning requires understanding both average load and peak bursts. Amdahl's Law bounds the speedup achievable through parallelism by the serial fraction of the work, so serial bottlenecks are usually the first target in performance optimization.
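To make the Amdahl's Law bound concrete, here is a minimal Python sketch. It assumes the usual form of the law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n is the number of workers. The function name and the 90% figure are illustrative assumptions, not values from any particular system.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when `parallel_fraction` of the work runs on `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is not sped up; only the parallel part is divided among workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 90% parallelizable, 10% serial.
    for n in (1, 2, 8, 64, 1024):
        print(f"{n:>5} workers -> {amdahl_speedup(0.9, n):.2f}x")
```

With 10% serial work the speedup approaches, but never reaches, 10× no matter how many workers are added, which is why shrinking the serial portion tends to pay off before adding hardware.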
Key Points
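- P50 (median) latency hides tail pain, so always measure P95 and P99; a P99 spike means roughly 1 in 100 requests is slow, and a user who issues many requests per session is likely to hit at least one of them.
- Throughput and latency are linked by Little's Law, L = λW, where L is the average number of requests in flight, λ is the arrival rate (throughput), and W is the average latency; once concurrency L is saturated, throughput can only rise if latency falls.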
- Latency budgets decompose end-to-end SLOs into per-service targets (e.g., 200 ms total → 50 ms auth + 80 ms DB + 70 ms business logic).
- CPU-bound services scale with more cores/faster CPUs; I/O-bound services benefit more from async I/O, connection pooling, and caching.
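- Cache hit ratio is a leading indicator: with a 99% hit rate, a 1 ms cache in front of a 10 ms DB lookup gives an average of about 1.09 ms (0.99 × 1 ms + 0.01 × 10 ms), roughly a 9× improvement; the 1% of misses still pay the full DB cost and show up in the tail.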
- Connection pool exhaustion is a common P99 spike source — tune pool sizes based on observed concurrency, not defaults.
- Benchmark with realistic production traffic shapes (diurnal patterns, spike events) — synthetic benchmarks routinely miss real bottlenecks.
- Use flame graphs and distributed traces to identify hot paths; avoid premature optimization until profiling confirms the bottleneck.
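The arithmetic behind several of the points above (Little's Law, the latency budget, and the cache hit ratio) can be sketched in a few lines of Python. All function names and the 500 RPS / 200 ms inputs are hypothetical; the cache model assumes a miss costs only the backend lookup, with no added cache-check time.

```python
def little_concurrency(arrival_rate_rps: float, avg_latency_s: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in flight."""
    return arrival_rate_rps * avg_latency_s


def effective_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average latency for a cache in front of a backend.

    Assumption: a miss pays only `backend_ms` (the cache check itself is ignored).
    """
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backend_ms


def budget_slack_ms(total_ms: float, per_service_ms: dict) -> float:
    """Remaining headroom after summing per-service budgets; negative means overspent."""
    return total_ms - sum(per_service_ms.values())


if __name__ == "__main__":
    # Hypothetical service: 500 requests/s at 200 ms average latency.
    print("in-flight requests:", little_concurrency(500, 0.2))

    # Figures from the cache bullet: 99% hits, 1 ms cache, 10 ms DB.
    avg = effective_latency_ms(0.99, 1.0, 10.0)
    print(f"effective latency: {avg:.2f} ms, improvement: {10.0 / avg:.1f}x")

    # Figures from the latency-budget bullet.
    budget = {"auth": 50, "db": 80, "business_logic": 70}
    print("budget slack (ms):", budget_slack_ms(200, budget))
```

The in-flight figure from Little's Law is also a reasonable first estimate when sizing a connection pool from observed concurrency, though real pools usually need headroom for bursts.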
| Percentile | What It Represents | Typical Target (interactive API) | Example System |
|---|---|---|---|
| P50 (Median) | 50% of requests are faster than this | < 50 ms | Twitter feed API |
| P75 | 75% of requests are faster | < 100 ms | Google Search |
| P90 | 90% of requests are faster | < 150 ms | Amazon product page |
| P95 | 95% of requests are faster; key SLO tier | < 200 ms | Stripe payment API |
| P99 | 99% of requests are faster; tail latency signal | < 500 ms | Netflix streaming start |
| P99.9 (P999) | 99.9% of requests are faster; 1 in 1,000 is slower | < 1,000 ms | High fan-out services |
| Max | Worst single observed request | Monitor for outliers | Any system |
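As a rough illustration of how the percentiles in the table are derived from raw measurements, the sketch below uses the nearest-rank method on a hypothetical sample of 100 request latencies. Monitoring systems often use interpolation or histogram approximations instead, so treat this as one possible definition rather than the one any specific tool uses.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample: most requests are fast, a few are very slow.
latencies_ms = [20] * 93 + [120] * 5 + [900] * 2

if __name__ == "__main__":
    mean = sum(latencies_ms) / len(latencies_ms)
    print(f"mean = {mean:.1f} ms")                      # mean = 42.6 ms
    for p in (50, 95, 99):
        print(f"P{p} = {percentile(latencies_ms, p)} ms")  # P50 = 20, P95 = 120, P99 = 900
```

The mean of 42.6 ms looks healthy, while P99 exposes the 900 ms requests, which is the reason percentile targets are tracked separately from averages.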
Real-World Example
Amazon is widely cited as having found that every additional 100 ms of latency cost about 1% in sales, the kind of sensitivity that motivates investment in low-latency data services such as ElastiCache and DynamoDB. Google's Core Web Vitals are assessed at the 75th percentile of real-user page loads and feed into search ranking, making performance a revenue-critical NFR for any web property.