Performance describes how a system behaves under load, measured primarily by response time, throughput, and latency percentiles. Latency is conventionally reported as P50 (median), P95, and P99; Google's SRE book recommends defining latency SLOs on percentiles like these rather than on averages, and a target such as P99 < 200 ms for an interactive service is a typical example rather than a universal rule. Throughput is measured in requests per second (RPS) or transactions per second (TPS), and capacity planning requires understanding both average load and peak bursts. Amdahl's Law bounds the theoretical speedup achievable through parallelism: if a fraction p of the work can be parallelized across N workers, speedup is at most 1 / ((1 - p) + p/N), so the serial portion is usually the first target in performance optimization.
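
To make the Amdahl's Law bound concrete, here is a minimal Python sketch. The function names `amdahl_speedup` and `amdahl_limit` and the 90% parallel fraction are illustrative assumptions, not values from any particular system.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup when `parallel_fraction` of the work is spread over `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work parallelizes, 10% stays serial.
    for n in (2, 8, 64):
        print(f"{n:>3} workers -> {amdahl_speedup(0.9, n):.2f}x")
    print(f"limit      -> {amdahl_limit(0.9):.2f}x")
```

Under these assumptions the speedup flattens out near 10x no matter how many workers are added, which is why shrinking the serial 10% tends to pay off more than adding cores.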

Key Points

  • P50 (median) latency hides tail pain — always measure P95 and P99; a P99 spike means 1 in 100 users has a bad experience.
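  • Throughput and latency are linked by Little's Law: L = λW, where L is the average number of requests in flight, λ is the arrival rate (throughput), and W is average latency. Once concurrency L is capped at saturation, any rise in latency directly lowers achievable throughput.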
  • Latency budgets decompose end-to-end SLOs into per-service targets (e.g., 200 ms total → 50 ms auth + 80 ms DB + 70 ms business logic).
  • CPU-bound services scale with more cores/faster CPUs; I/O-bound services benefit more from async I/O, connection pooling, and caching.
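  • Cache hit ratio is a leading indicator of latency: with a 1 ms cache in front of a 10 ms DB lookup, a 99% hit rate gives an average of about 1.09 ms (0.99 × 1 ms + 0.01 × 10 ms), roughly a 9× improvement over hitting the DB on every request.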
  • Connection pool exhaustion is a common P99 spike source — tune pool sizes based on observed concurrency, not defaults.
  • Benchmark with realistic production traffic shapes (diurnal patterns, spike events) — synthetic benchmarks routinely miss real bottlenecks.
  • Use flame graphs and distributed traces to identify hot paths; avoid premature optimization until profiling confirms the bottleneck.
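
The arithmetic behind the Little's Law, connection pool, and cache hit ratio points can be sketched in a few lines of Python. The function names, the 1.5 headroom factor, and the sample traffic numbers are hypothetical choices for illustration; real pool sizes should come from observed concurrency.

```python
import math


def required_concurrency(arrival_rate_rps: float, avg_latency_s: float) -> float:
    """Little's Law: L = lambda * W (average number of requests in flight)."""
    return arrival_rate_rps * avg_latency_s


def suggested_pool_size(arrival_rate_rps: float, avg_latency_s: float,
                        headroom: float = 1.5) -> int:
    """Starting point for a pool size: in-flight requests times an assumed headroom factor."""
    return math.ceil(required_concurrency(arrival_rate_rps, avg_latency_s) * headroom)


def effective_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average lookup latency when hits cost `cache_ms` and misses cost `backend_ms`."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backend_ms


if __name__ == "__main__":
    # Hypothetical service: 80 requests/s, each holding a connection for 0.25 s.
    print("in-flight requests:", required_concurrency(80, 0.25))
    print("suggested pool size:", suggested_pool_size(80, 0.25))

    # Cache example: 1 ms cache in front of a 10 ms DB lookup.
    for ratio in (0.50, 0.90, 0.99):
        avg = effective_latency_ms(ratio, cache_ms=1.0, backend_ms=10.0)
        print(f"hit ratio {ratio:.2f}: {avg:.2f} ms average, {10.0 / avg:.1f}x vs. DB only")
```

This simple model charges a miss only the backend cost; if a miss also pays for the failed cache lookup, the miss cost becomes `cache_ms + backend_ms` and the average shifts slightly upward.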

| Percentile | What It Represents | Typical Target (interactive API) | Example System |
|---|---|---|---|
| P50 (Median) | 50% of requests are faster than this | < 50 ms | Twitter feed API |
| P75 | 75% of requests are faster | < 100 ms | Google Search |
| P90 | 90% of requests are faster | < 150 ms | Amazon product page |
| P95 | 95% of requests are faster; key SLO tier | < 200 ms | Stripe payment API |
| P99 | 99% of requests are faster; tail latency signal | < 500 ms | Netflix streaming start |
| P99.9 (P999) | 1 in 1,000 worst requests | < 1,000 ms | High-frequency trading |
| Max | Worst single observed request | Monitor for outliers | Any system |
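
As a rough illustration of how the percentiles in the table are derived from raw measurements, the sketch below uses the nearest-rank method on a list of latency samples. The `percentile` helper and the sample data are assumptions made for the example; production monitoring systems typically use histograms or streaming sketches instead of sorting raw samples.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical data: 98 fast requests at 20 ms and 2 slow ones at 900 ms.
    latencies_ms = [20.0] * 98 + [900.0] * 2
    mean = sum(latencies_ms) / len(latencies_ms)
    print(f"mean = {mean:.1f} ms")
    for p in (50, 95, 99):
        print(f"P{p} = {percentile(latencies_ms, p):.0f} ms")
```

With this sample, the median and P95 both report the fast path while P99 exposes the slow requests, which is the tail behavior that an average or a median alone would hide.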

Real-World Example

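A widely cited Amazon finding is that every 100 ms of added latency cost roughly 1% in sales, which helps explain the company's emphasis on low-latency data services such as ElastiCache and DynamoDB. Google's Core Web Vitals are assessed at the 75th percentile (P75) of real-user page loads and are used as a search ranking signal, making performance a revenue-critical non-functional requirement (NFR) for any web property.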