Performance testing validates system behavior under load, from normal operating levels up to the breaking point. Unlike functional testing, it targets non-functional defects: responses that are too slow, crashes under load, and degradation over time. The five primary test types (load, stress, soak, spike, and volume) each probe a different failure mode. Performance budgets, such as p99 latency < 200 ms and error rate < 0.1%, translate non-functional requirements into enforceable CI gates using tools like k6 with thresholds.
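To make the idea of a performance budget concrete, here is a minimal sketch of a gate that checks recorded results against the two figures quoted above. It is not how k6 evaluates thresholds internally. The `Budget` class, the `check_budget` function, and the nearest-rank percentile method are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hypothetical performance budget; defaults mirror the figures in the text."""
    p99_ms: float = 200.0          # p99 latency must stay below this value
    max_error_rate: float = 0.001  # 0.1% expressed as a fraction


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_budget(latencies_ms, failed_requests, budget=Budget()):
    """Return a list of violation messages; an empty list means the gate passes."""
    violations = []
    p99 = percentile(latencies_ms, 99)
    error_rate = failed_requests / len(latencies_ms)
    if p99 >= budget.p99_ms:
        violations.append(f"p99 {p99:.0f} ms >= {budget.p99_ms:.0f} ms")
    if error_rate >= budget.max_error_rate:
        violations.append(
            f"error rate {error_rate:.2%} >= {budget.max_error_rate:.2%}"
        )
    return violations
```

A CI step could call `check_budget` on the latencies collected during a run and exit with a non-zero status when the returned list is not empty.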

Key Points

  • Load testing: simulate expected concurrent users and request rates to verify system meets SLOs under normal operating conditions.
  • Stress testing: increase load beyond expected maximum to find the breaking point — identifies at what concurrency errors spike and which component fails first.
  • Soak testing (endurance): run at moderate load for 24–72 hours — reveals memory leaks, connection pool exhaustion, and slow disk fill that only appear over time.
  • Spike testing: sudden 10x traffic burst for 30–60 seconds — validates auto-scaling speed, queue back-pressure, and graceful degradation under sudden demand.
  • Volume testing: process extremely large data sets — validates DB query plans don't degrade, exports don't OOM, and batch jobs complete within SLA windows.
  • Baseline benchmarks: establish p50/p95/p99 latency and throughput at N concurrent users — run after every major release to detect performance regressions.
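  • k6 thresholds: declare pass criteria such as `http_req_duration: ['p(99)<200']` and `http_req_failed: ['rate<0.001']` (0.1%, matching the error budget); when a threshold is crossed, k6 exits with a non-zero code, which enforces performance SLOs as CI gates.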
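  • Distributed load generation: a single load generator is bounded by its CPU, memory, network bandwidth, and file-descriptor limits, often around 10,000 concurrent connections depending on the tool and script; use k6 Cloud, Locust distributed mode, or Artillery Cloud for higher concurrency.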
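The baseline-benchmark point in the list above can be sketched as a simple comparison between a stored baseline and the latest run. The function name `find_regressions`, the 10% tolerance, and the sample figures are illustrative assumptions, not values from any particular tool.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Flag latency percentiles (ms) that grew by more than `tolerance`.

    Both arguments map a percentile name such as "p95" to a latency in
    milliseconds. Returns {name: relative_increase} for each regression.
    """
    regressions = {}
    for name, base_value in baseline.items():
        change = (current[name] - base_value) / base_value
        if change > tolerance:
            regressions[name] = round(change, 3)
    return regressions


# Hypothetical numbers: the baseline from the last release vs. the current build.
baseline = {"p50": 40, "p95": 120, "p99": 180}
current = {"p50": 42, "p95": 150, "p99": 185}
print(find_regressions(baseline, current))
```

Comparing relative change rather than absolute milliseconds lets one tolerance apply to every percentile. It only gives a meaningful signal when both runs use the same N concurrent users and the same environment.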

| Test Type | Load Level | Duration | What It Finds | Tool Examples |
|---|---|---|---|---|
| Load | Expected peak (1x) | 10–30 minutes | SLO compliance under normal conditions | k6, Locust, JMeter |
| Stress | Beyond peak (2x–5x+) | 10–30 minutes | Breaking point, first failure component | k6, Gatling |
| Soak/Endurance | Moderate (70% of peak) | 24–72 hours | Memory leaks, connection exhaustion, disk fill | Locust, JMeter |
| Spike | Sudden 10x burst | 30–60 seconds | Auto-scaling speed, queue handling, graceful degradation | k6, Artillery |
| Volume | Expected users, large data | 1–4 hours | Data-scale DB degradation, batch job SLAs | JMeter, custom scripts |
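
The load levels in the table can be read as different shapes of "target users over time". The sketch below encodes one possible shape per test type. The multipliers (1x, up to 5x, 70%, 10x) come from the table. The ramp length, the step interval, the burst window, and the choice of 1x as the spike baseline are assumptions made only for illustration.

```python
def target_users(test_type, elapsed_s, peak_users):
    """Return the target concurrent users at `elapsed_s` seconds into a run."""
    if test_type == "load":
        # Assumed 5-minute linear ramp to expected peak, then hold at 1x.
        return round(peak_users * min(1.0, elapsed_s / 300))
    if test_type == "stress":
        # Assumed step of +1x peak every 5 minutes, capped at 5x.
        step = min(5, 1 + int(elapsed_s // 300))
        return peak_users * step
    if test_type == "soak":
        # Constant 70% of peak for the whole run.
        return round(peak_users * 0.7)
    if test_type == "spike":
        # Assumed 1x baseline with a 10x burst from t=60s to t=120s.
        return peak_users * 10 if 60 <= elapsed_s < 120 else peak_users
    if test_type == "volume":
        # Expected users; the stress comes from data size, not concurrency.
        return peak_users
    raise ValueError(f"unknown test type: {test_type}")
```

A load generator's stage configuration could be derived by sampling this function at fixed intervals. In practice the shape would be tuned to the system's real traffic pattern.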

Real-World Example

Spotify reportedly runs automated k6 load tests in its CI pipeline: every service must pass p99 latency and error-rate thresholds before changes are merged. Its "Performance as Code" initiative keeps load test scripts alongside service code, so the tests evolve with the service.