Rate Limiting & Throttling

Rate limiting protects services from overload, prevents abuse, and enforces fair-usage quotas by restricting how many requests a client can make in a given time window. The five common algorithms (token bucket, leaky bucket, fixed window counter, sliding window log, and sliding window counter) trade accuracy, memory, and burst tolerance differently. Redis is a common choice for the shared state because its Lua scripts run atomically, which lets many application servers enforce one consistent limit. Large API providers such as Stripe, Cloudflare, and GitHub rate-limit at the API gateway layer, typically with token bucket or sliding window algorithms.

Key Points

Token bucket: bucket holds up to N tokens, replenished at R tokens/second; each request consumes one token; allows burst up to bucket capacity — most widely used algorithm.
Leaky bucket: requests enter a bounded queue (the bucket) and are processed at a fixed rate; requests that arrive while the queue is full are dropped. This enforces a strict output rate with no burst allowance.
Fixed window counter: count requests per client per time window (e.g., 1000/hour); simple but vulnerable to boundary bursts: a client can send 1000 requests in the last minute of one window and 1000 more in the first minute of the next, i.e. 2000 requests in 2 minutes.
Sliding window log: store timestamp of each request; count requests in the sliding window — accurate but memory-intensive (O(requests) per client).
Sliding window counter: hybrid — blend current and previous window counts by time overlap fraction — O(1) memory, near-accurate, used by Cloudflare.
Redis atomic rate limiting: use Lua scripts for atomic check-and-increment; INCR + EXPIRE pattern for fixed windows; Sorted Set for sliding window logs.
HTTP 429 Too Many Requests: correct status code for rate limit violations; include a `Retry-After` header (seconds to wait, or an HTTP date) and, by convention, `X-RateLimit-*` headers for quota visibility.
Distributed rate limiting: a shared store such as Redis keeps limits consistent across all API gateway nodes. Without centralized state, each of N nodes enforces the limit independently, so a client spread across them can get up to N times the intended quota.
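
The sliding window counter from the list above can be sketched in a few lines. This is a minimal single-process illustration, not any vendor's implementation; the class name `SlidingWindowCounter`, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class SlidingWindowCounter:
    """Hypothetical in-memory sliding window counter (O(1) state per client)."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        # client -> (window index, count in that window, count in the previous window)
        self._state: Dict[str, Tuple[int, int, int]] = {}

    def allow(self, client: str) -> bool:
        now = self.clock()
        idx = int(now // self.window_s)
        cur_idx, cur, prev = self._state.get(client, (idx, 0, 0))
        if idx == cur_idx + 1:
            # Rolled into the next window: the current count becomes the previous one.
            prev, cur = cur, 0
        elif idx > cur_idx + 1:
            # At least one whole window passed with no traffic.
            prev, cur = 0, 0
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now % self.window_s) / self.window_s
        estimated = prev * overlap + cur
        allowed = estimated < self.limit
        if allowed:
            cur += 1
        self._state[client] = (idx, cur, prev)
        return allowed
```

With a limit of 10 per 60 seconds, a client that uses its full quota at t = 59 s is still weighted at about 9.8 requests at t = 61 s, so only one more request passes. A plain fixed window counter would have reset at t = 60 s and admitted another 10.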

Figure: Token bucket. Tokens refill at rate R per second up to capacity N; each request consumes one token; an empty bucket returns HTTP 429.
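
As a rough illustration of the token bucket and the 429 response, the following single-process sketch refills lazily on each call. The names `TokenBucket`, `try_acquire`, and `handle` are hypothetical, and the exact `X-RateLimit-*` header names vary between APIs.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Hypothetical in-memory token bucket: capacity N, refill rate R tokens/second."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.updated = clock()

    def _refill(self) -> None:
        # Lazy refill: add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        earned = (now - self.updated) * self.refill_rate
        self.tokens = min(float(self.capacity), self.tokens + earned)
        self.updated = now

    def try_acquire(self) -> Tuple[bool, float]:
        """Return (allowed, seconds until one token is available)."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        return False, (1 - self.tokens) / self.refill_rate


def handle(bucket: TokenBucket) -> Tuple[int, Dict[str, str]]:
    """Map a bucket decision to an HTTP status and rate limit headers."""
    allowed, retry_after = bucket.try_acquire()
    headers = {
        "X-RateLimit-Limit": str(int(bucket.capacity)),
        "X-RateLimit-Remaining": str(int(bucket.tokens)),
    }
    if allowed:
        return 200, headers
    headers["Retry-After"] = str(math.ceil(retry_after))
    return 429, headers
```

Refilling lazily from the elapsed time avoids a background timer per client; the bucket only needs its token count and last update time.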

Real-World Example

Stripe's API rate limiter uses a token bucket with Redis as the centralized token store; each API key has a configurable bucket size and refill rate. Their Lua script atomically decrements the token count, so every API server sees the same balance; this keeps limits consistent across hundreds of API servers handling 1M+ requests per minute.
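
A Redis-backed token bucket along these lines might look like the sketch below. It is not Stripe's actual script. It assumes a redis-py style client exposing `eval(script, numkeys, *keys_and_args)`, a hypothetical key layout `ratelimit:<api_key>`, and a timestamp supplied by the caller.

```python
import time
from typing import Any, Optional

# Lua runs atomically inside Redis, so refill + check + decrement cannot interleave
# with the same script executing for another API server.
TOKEN_BUCKET_LUA = """
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local state = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1])
local ts = tonumber(state[2])
if tokens == nil or ts == nil then
  tokens = capacity  -- first request for this key: start with a full bucket
  ts = now
end
tokens = math.min(capacity, tokens + math.max(0, now - ts) * refill_rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
-- An idle bucket is full again after capacity / refill_rate seconds, so it can expire.
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill_rate) * 2)
return allowed
"""


def take_token(client: Any, api_key: str, capacity: int, refill_rate: float,
               now: Optional[float] = None) -> bool:
    """Try to consume one token for api_key; True means the request may proceed."""
    if now is None:
        now = time.time()
    key = f"ratelimit:{api_key}"  # hypothetical key naming scheme
    result = client.eval(TOKEN_BUCKET_LUA, 1, key, capacity, refill_rate, now)
    return result == 1
```

Passing `now` from the application server keeps the script deterministic but relies on reasonably synchronized clocks across servers. Calling it as `take_token(redis_client, "key_123", capacity=100, refill_rate=10.0)` would allow bursts of 100 and a sustained 10 requests per second for that key.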
