Performance Testing | Performance & Scalability | System Design

Performance testing validates system behavior under a range of load conditions, from normal operational levels up to the breaking point. Unlike functional testing, it targets non-functional defects: responses that are too slow, crashes under load, and degradation over time. The five primary test types (load, stress, soak, spike, and volume) each probe a different failure mode. Performance budgets, such as p99 latency < 200 ms and error rate < 0.1%, translate non-functional requirements into enforceable CI gates using tools like k6 with thresholds.
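As a rough illustration of how a performance budget becomes a pass/fail gate, the sketch below checks a batch of latency samples against the two budget figures quoted above. The names `Budget`, `percentile`, and `check_budget` are hypothetical, and the nearest-rank percentile method is an assumption; tools such as k6 compute percentiles their own way.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Budget:
    # Defaults mirror the budget quoted in the text; adjust per service.
    p99_ms: float = 200.0          # p99 latency must stay below 200 ms
    max_error_rate: float = 0.001  # error rate must stay below 0.1%


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (assumed method, not any tool's exact algorithm)."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_budget(latencies_ms: Sequence[float], failed: int, total: int,
                 budget: Budget = Budget()) -> List[str]:
    """Return human-readable violations; an empty list means the gate passes."""
    if total <= 0:
        raise ValueError("total request count must be positive")
    violations = []
    p99 = percentile(latencies_ms, 99)
    if p99 >= budget.p99_ms:
        violations.append(
            f"p99 latency {p99:.0f} ms is not below {budget.p99_ms:.0f} ms")
    error_rate = failed / total
    if error_rate >= budget.max_error_rate:
        violations.append(
            f"error rate {error_rate:.2%} is not below {budget.max_error_rate:.2%}")
    return violations
```

In a CI job, a wrapper script could exit with a non-zero status whenever `check_budget` returns any violations, which is what blocks the pipeline.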

Key Points

Load testing: simulate the expected number of concurrent users and request rates to verify that the system meets its SLOs under normal operating conditions.
Stress testing: increase load beyond the expected maximum to find the breaking point, that is, the concurrency at which errors spike and the component that fails first.
Soak testing (endurance): run at moderate load for 24–72 hours to reveal memory leaks, connection pool exhaustion, and gradual disk fill that only appear over time.
Spike testing: apply a sudden 10x traffic burst for 30–60 seconds to validate auto-scaling speed, queue back-pressure, and graceful degradation under sudden demand.
Volume testing: process extremely large data sets to validate that DB query plans don't degrade, exports don't run out of memory (OOM), and batch jobs complete within their SLA windows.
Baseline benchmarks: establish p50/p95/p99 latency and throughput at N concurrent users, then rerun after every major release to detect performance regressions.
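k6 thresholds: declare pass criteria in the script's `options.thresholds`, for example `http_req_duration: ['p(99)<200']` and `http_req_failed: ['rate<0.001']` (matching the 0.1% error budget). When a threshold is violated, k6 exits with a non-zero code, so CI can enforce performance SLOs as gates.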
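Distributed load generation: a single load-generator machine eventually saturates its own CPU, memory, network, or ephemeral ports (roughly 10,000 concurrent connections is a common rule of thumb, though the real limit depends on hardware, OS tuning, and script complexity). For higher concurrency, use k6 Cloud, Locust distributed mode, or Artillery Cloud.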
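The baseline-benchmark idea above can be sketched as a small comparison step: store the metrics from a known-good run, then flag any metric in a new run that moved in the wrong direction by more than a tolerance. The metric names, the 10% tolerance, and the function `detect_regressions` are illustrative assumptions, not part of any particular tool.

```python
from typing import Dict

# Metrics where a larger value is an improvement (assumed naming convention).
HIGHER_IS_BETTER = {"throughput_rps"}


def detect_regressions(baseline: Dict[str, float],
                       current: Dict[str, float],
                       tolerance: float = 0.10) -> Dict[str, float]:
    """Return {metric: fractional worsening} for metrics beyond the tolerance."""
    regressions = {}
    for metric, base in baseline.items():
        now = current[metric]
        if metric in HIGHER_IS_BETTER:
            change = (base - now) / base   # a drop in throughput is worse
        else:
            change = (now - base) / base   # a rise in latency is worse
        if change > tolerance:
            regressions[metric] = round(change, 3)
    return regressions


# Hypothetical numbers for illustration only.
baseline = {"p50_ms": 40, "p95_ms": 120, "p99_ms": 180, "throughput_rps": 1000}
current = {"p50_ms": 42, "p95_ms": 125, "p99_ms": 230, "throughput_rps": 980}
print(detect_regressions(baseline, current))  # {'p99_ms': 0.278}
```

Comparing tail percentiles separately from the median matters here: in the made-up numbers above, p50 barely moves while p99 worsens by about 28%, which an average-only check would likely miss.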

Test Type	Load Level	Duration	What It Finds	Tool Examples
Load	Expected peak (1x)	10–30 minutes	SLO compliance under normal conditions	k6, Locust, JMeter
Stress	Beyond peak (2x–5x+)	10–30 minutes	Breaking point, first failure component	k6, Gatling
Soak/Endurance	Moderate (70% of peak)	24–72 hours	Memory leaks, connection exhaustion, disk fill	Locust, JMeter
Spike	Sudden 10x burst	30–60 seconds	Auto-scaling speed, queue handling, graceful degradation	k6, Artillery
Volume	Expected users, large data	1–4 hours	Data-scale DB degradation, batch job SLAs	JMeter, custom scripts
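To make the table concrete, the following sketch turns each test type into a list of ramp stages, in the spirit of the staged load profiles that tools like k6 accept. Each stage is `(duration_seconds, target_users)` and is assumed to ramp linearly from the previous target. The function name `build_profile`, the ramp-up and ramp-down times, the stress step sizes, and the choice to apply the 10x spike multiplier to expected peak are all assumptions chosen to fall inside the ranges in the table.

```python
from typing import List, Tuple

Stage = Tuple[int, int]  # (duration_seconds, target_concurrent_users)

MINUTE = 60
HOUR = 60 * MINUTE


def build_profile(test_type: str, peak_users: int) -> List[Stage]:
    """Build an illustrative stage list for one of the five test types."""
    if peak_users <= 0:
        raise ValueError("peak_users must be positive")
    if test_type == "load":
        # Ramp to expected peak (1x), hold 20 minutes, ramp down.
        return [(2 * MINUTE, peak_users), (20 * MINUTE, peak_users), (MINUTE, 0)]
    if test_type == "stress":
        # Step through 2x..5x of peak, 5 minutes per step, then ramp down.
        steps = [(5 * MINUTE, peak_users * m) for m in (2, 3, 4, 5)]
        return steps + [(MINUTE, 0)]
    if test_type == "soak":
        # Hold 70% of peak for 24 hours (lower end of the 24-72 hour range).
        target = peak_users * 7 // 10
        return [(5 * MINUTE, target), (24 * HOUR, target), (5 * MINUTE, 0)]
    if test_type == "spike":
        # Steady state, a near-instant jump to 10x held for 60 s, then recovery.
        burst = peak_users * 10
        return [(5 * MINUTE, peak_users), (10, burst), (60, burst),
                (10, peak_users), (5 * MINUTE, peak_users)]
    if test_type == "volume":
        # Expected users for 2 hours; the large data set itself is not modeled.
        return [(2 * MINUTE, peak_users), (2 * HOUR, peak_users), (MINUTE, 0)]
    raise ValueError(f"unknown test type: {test_type}")
```

For example, `build_profile("spike", 100)` peaks at 1,000 users, while `build_profile("soak", 1000)` holds 700 users for 86,400 seconds.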

Real-World Example

Spotify runs automated k6 load tests in their CI pipeline — every service must pass p99 latency and error rate thresholds before merging. Their "Performance as Code" initiative embeds load test scripts alongside service code, ensuring tests evolve with the service.
