Performance Testing
Latency benchmarks, throughput, stress, soak, spike tests
Performance testing validates system behavior under a range of load conditions, from normal operating levels up to the breaking point. Unlike functional testing, it targets non-functional defects: responses that are too slow, crashes under load, and degradation over time. The five primary test types (load, stress, soak, spike, and volume) each probe a different failure mode. Performance budgets, such as p99 latency <200ms and error rate <0.1%, translate non-functional requirements into enforceable CI gates using tools like k6 with thresholds.
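As a minimal sketch of what such a budget gate checks, the following plain-Python example evaluates the two budgets named above against recorded results. It is not k6 itself: the names `Budget`, `percentile`, and `check_budget` are hypothetical, and the nearest-rank percentile method is an assumption, since load tools may interpolate percentiles differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical performance budget mirroring the figures in the text."""
    p99_ms: float = 200.0           # p99 latency must stay below 200 ms
    max_error_rate: float = 0.001   # 0.1% expressed as a fraction


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_budget(latencies_ms, failed, total, budget=Budget()):
    """Return a list of violation messages; an empty list means the gate passes."""
    if total <= 0:
        raise ValueError("total must be positive")
    violations = []
    p99 = percentile(latencies_ms, 99)
    if p99 >= budget.p99_ms:
        violations.append(
            f"p99 latency {p99:.1f} ms is not below {budget.p99_ms:.0f} ms"
        )
    error_rate = failed / total
    if error_rate >= budget.max_error_rate:
        violations.append(
            f"error rate {error_rate:.3%} is not below {budget.max_error_rate:.3%}"
        )
    return violations
```

A CI step could call `check_budget` on the results of a test run and exit with a non-zero status whenever the returned list is non-empty, which is the same pass/fail contract a threshold-based load tool provides.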
Key Points
- Load testing: simulate expected concurrent users and request rates to verify system meets SLOs under normal operating conditions.
- Stress testing: increase load beyond expected maximum to find the breaking point — identifies at what concurrency errors spike and which component fails first.
- Soak testing (endurance): run at moderate load for 24–72 hours — reveals memory leaks, connection pool exhaustion, and slow disk fill that only appear over time.
- Spike testing: sudden 10x traffic burst for 30–60 seconds — validates auto-scaling speed, queue back-pressure, and graceful degradation under sudden demand.
- Volume testing: process extremely large data sets — validates DB query plans don't degrade, exports don't OOM, and batch jobs complete within SLA windows.
- Baseline benchmarks: establish p50/p95/p99 latency and throughput at N concurrent users — run after every major release to detect performance regressions.
- k6 thresholds: declare pass criteria such as `http_req_duration: ['p(99)<200']` and `http_req_failed: ['rate<0.001']` (0.1%); k6 exits with a non-zero code when a threshold is not met, which lets CI enforce performance SLOs as gates.
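- Distributed load generation: a single load generator is bounded by CPU, memory, network bandwidth, and OS limits (file descriptors, ephemeral ports), so its practical ceiling depends on the tool and the script; use k6 Cloud, Locust distributed mode, or Artillery Cloud when one machine cannot generate the required concurrency.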
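The baseline-benchmark point above can be sketched as a simple comparison between a stored baseline and the current run. This is an illustrative example only: `find_regressions`, the metric names, and the 10% tolerance are assumptions, not values taken from any particular tool.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Compare latency percentiles (in ms) from the current run against a baseline.

    Returns {metric: (baseline_value, current_value)} for every metric that is
    more than `tolerance` (a fraction, 0.10 = 10%) slower than its baseline.
    Metrics missing from the current run are skipped rather than flagged.
    """
    regressions = {}
    for metric, base_value in baseline.items():
        new_value = current.get(metric)
        if new_value is None:
            continue
        if new_value > base_value * (1 + tolerance):
            regressions[metric] = (base_value, new_value)
    return regressions


# Hypothetical numbers, measured at the same N concurrent users in both runs.
baseline_run = {"p50": 40.0, "p95": 120.0, "p99": 180.0}
current_run = {"p50": 42.0, "p95": 125.0, "p99": 230.0}
regressed = find_regressions(baseline_run, current_run)
```

With these sample numbers only `p99` exceeds the tolerance, since 230 ms is more than 10% above 180 ms while the other two percentiles stay within it. The comparison is only meaningful when both runs use the same load level and environment.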
| Test Type | Load Level | Duration | What It Finds | Tool Examples |
|---|---|---|---|---|
| Load | Expected peak (1x) | 10–30 minutes | SLO compliance under normal conditions | k6, Locust, JMeter |
| Stress | Beyond peak (2x–5x+) | 10–30 minutes | Breaking point, first component to fail | k6, Gatling |
| Soak/Endurance | Moderate (70% of peak) | 24–72 hours | Memory leaks, connection exhaustion, disk fill | Locust, JMeter |
| Spike | Sudden 10x burst | 30–60 seconds | Auto-scaling speed, queue handling, graceful degradation | k6, Artillery |
| Volume | Expected users, large data | 1–4 hours | Data-scale DB degradation, batch job SLAs | JMeter, custom scripts |
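The load levels and durations in the table can be expressed as staged load profiles, similar in spirit to the ramping stages many load tools accept. The sketch below is a hedged illustration: `PROFILES`, `users_at`, and the specific ramp times are hypothetical choices that fall within the ranges in the table, with load given as a multiple of expected peak.

```python
# Each profile is a list of (duration_seconds, target_multiple_of_peak) stages.
# Load moves linearly from the previous target to the new one within a stage.
PROFILES = {
    "load":   [(300, 1.0), (1200, 1.0), (300, 0.0)],               # 30 min at 1x
    "stress": [(300, 1.0), (300, 2.0), (300, 3.0),
               (300, 4.0), (300, 5.0), (300, 0.0)],                # step up to 5x
    "soak":   [(600, 0.7), (24 * 3600, 0.7), (600, 0.0)],          # 24 h at 70%
    "spike":  [(60, 1.0), (120, 1.0), (10, 10.0),
               (60, 10.0), (10, 1.0), (120, 1.0)],                 # 60 s at 10x
}


def users_at(stages, t, start=0.0):
    """Return the target load multiple at time t (seconds) for a staged profile."""
    prev_target = start
    elapsed = 0.0
    for duration, target in stages:
        if t < elapsed + duration:
            # Linear interpolation inside the current stage.
            frac = (t - elapsed) / duration
            return prev_target + (target - prev_target) * frac
        elapsed += duration
        prev_target = target
    # After the last stage, hold the final target.
    return prev_target


def concurrent_users(test_type, t, expected_peak):
    """Convert the multiple into a whole number of simulated users."""
    return round(users_at(PROFILES[test_type], t) * expected_peak)
```

For example, `concurrent_users("spike", 200, 500)` falls inside the 10x hold and yields 5000 users for an expected peak of 500. Volume tests are omitted because they vary data size rather than user count.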
Real-World Example
Spotify reportedly runs automated k6 load tests in its CI pipeline: every service must pass p99 latency and error-rate thresholds before changes are merged. Its "Performance as Code" initiative keeps load test scripts alongside service code, so the tests evolve with the service.