Performance | Non-Functional Requirements | System Design

Performance defines how a system behaves under load, measured primarily by response time (latency) and throughput. Latency is conventionally reported as percentiles: P50 (median) plus the P95 and P99 tails. Targets are workload-specific; a P99 of a few hundred milliseconds or less is a commonly quoted goal for interactive services. Throughput is measured in requests per second (RPS) or transactions per second (TPS), and capacity planning requires understanding both the average load and peak bursts. Amdahl's Law bounds the theoretical speedup achievable through parallelism: the serial fraction of the work caps the gain, which makes serial bottlenecks the first target in performance optimization.
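To make the Amdahl's Law bound concrete, here is a minimal Python sketch. The function names and the 90% parallel fraction are illustrative assumptions, not figures from any particular system.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup when parallel_fraction of the work is spread over workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the worker count grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Assumed workload: 90% of the work can run in parallel.
    for n in (1, 2, 8, 64):
        print(n, "workers ->", round(amdahl_speedup(0.9, n), 2))
    print("limit ->", round(amdahl_limit(0.9), 2))
```

With 90% of the work parallelizable, 8 workers give a speedup of about 4.7× and no number of workers can exceed 10×, which is why shrinking the serial 10% usually pays off more than adding cores.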

Key Points

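P50 (median) latency hides tail pain, so always measure P95 and P99; a P99 spike means roughly 1 in 100 requests is slow, and a user who issues many requests per session is likely to hit at least one.
Throughput and latency are linked by Little's Law: L = λW, where L is the number of concurrent (in-flight) requests, λ is the arrival rate, and W is the average latency. Once concurrency is capped (saturation), throughput can rise only if latency falls; additional arrivals simply queue.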
Latency budgets decompose end-to-end SLOs into per-service targets (e.g., 200 ms total → 50 ms auth + 80 ms DB + 70 ms business logic).
CPU-bound services scale with more cores/faster CPUs; I/O-bound services benefit more from async I/O, connection pooling, and caching.
Cache hit ratio is a leading indicator: with a 99% hit rate, a 1 ms cache in front of a 10 ms DB lookup averages about 1.1 ms per request, roughly a 9× improvement, while the 1% of misses still pay the full DB cost.
Connection pool exhaustion is a common P99 spike source — tune pool sizes based on observed concurrency, not defaults.
Benchmark with realistic production traffic shapes (diurnal patterns, spike events) — synthetic benchmarks routinely miss real bottlenecks.
Use flame graphs and distributed traces to identify hot paths; defer optimization until profiling confirms the bottleneck.
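The Little's Law and cache hit ratio points above can be checked with simple arithmetic. The following Python sketch uses hypothetical function names and illustrative numbers (500 RPS, 80 ms latency, a pool of 20 connections), and it assumes a cache miss pays for the cache check plus the DB lookup.

```python
def concurrency(arrival_rate_rps: float, avg_latency_s: float) -> float:
    """Little's Law, L = lambda * W: average number of in-flight requests."""
    return arrival_rate_rps * avg_latency_s


def max_throughput(concurrency_limit: int, avg_latency_s: float) -> float:
    """Little's Law rearranged, lambda = L / W: the RPS ceiling for a fixed
    number of in-flight slots (for example, a connection pool)."""
    return concurrency_limit / avg_latency_s


def effective_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average latency when a miss costs the cache check plus the backend lookup."""
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * (cache_ms + backend_ms)


if __name__ == "__main__":
    # 500 RPS at 80 ms average latency keeps about 40 requests in flight.
    print("in flight:", round(concurrency(500, 0.080), 1))
    # A pool of 20 connections at 80 ms per query tops out near 250 RPS.
    print("pool ceiling (RPS):", round(max_throughput(20, 0.080), 1))
    # 1 ms cache in front of a 10 ms DB at two different hit ratios.
    for ratio in (0.99, 0.90):
        avg = effective_latency_ms(ratio, 1.0, 10.0)
        print(f"hit ratio {ratio:.2f}: {avg:.2f} ms, {10.0 / avg:.1f}x faster than DB only")
```

Under these assumptions, dropping the hit ratio from 99% to 90% nearly doubles average latency (about 1.1 ms to 2.0 ms), and a pool sized below the observed concurrency of 40 forces requests to queue, which shows up first in P99.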

Percentile	What It Represents	Typical Target (interactive API)	Example System
P50 (Median)	50% of requests are faster than this	< 50 ms	Twitter feed API
P75	75% of requests are faster	< 100 ms	Google Search
P90	90% of requests are faster	< 150 ms	Amazon product page
P95	95% of requests are faster; key SLO tier	< 200 ms	Stripe payment API
P99	99% of requests are faster; tail latency signal	< 500 ms	Netflix streaming start
P99.9 (P999)	99.9% of requests are faster; the slowest 1 in 1,000	< 1,000 ms	High-frequency trading (real budgets there are far tighter, typically microseconds)
Max	Worst single observed request	Monitor for outliers	Any system
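As a rough illustration of how the percentiles in the table are derived from raw measurements, the Python sketch below uses the nearest-rank method. The helper name and the sample data are made up for the example, and production monitoring systems typically use histograms or sketches rather than sorting every sample.

```python
import math
from statistics import mean


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # round() guards against float noise such as 999.0000000000001.
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in ms: 98 fast requests and 2 slow ones.
    latencies_ms = [20.0] * 98 + [900.0] * 2
    print("mean:", round(mean(latencies_ms), 1))
    for p in (50, 95, 99):
        print(f"P{p}:", percentile(latencies_ms, p))
```

In this made-up data set P50 and P95 are both 20 ms while P99 is 900 ms, so the slow requests are invisible until the tail percentiles are reported.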

Real-World Example

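Amazon is widely reported to have found that every 100 ms of added latency cost about 1% in sales, a figure often cited to justify investment in low-latency infrastructure such as caches (ElastiCache) and key-value stores (DynamoDB). Google's Core Web Vitals are assessed at the 75th percentile (P75) of real-user page loads and feed into search ranking, making performance a revenue-critical NFR for any web property.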
