Reverse Proxy & API Gateway

A reverse proxy sits between clients and backend servers and handles cross-cutting concerns such as TLS termination, request routing, rate limiting, authentication offload, compression, and caching. nginx uses an event-driven architecture that lets a single worker process serve 10,000+ concurrent connections; Envoy (a CNCF project) is a widely used data-plane proxy in service meshes and API gateways; Kong adds a plugin ecosystem (OAuth2, rate limiting, JWT validation) on top of nginx/OpenResty. AWS API Gateway and Azure API Management are managed L7 proxy-as-a-service offerings.

Key Points

nginx worker model: `worker_processes auto;` starts one worker per CPU core, and each worker handles thousands of connections through non-blocking epoll/kqueue. A common production config pairs it with `worker_connections 10240;` inside the `events` block.
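nginx upstream keepalive: `keepalive 64;` in an `upstream` block lets each worker process keep up to 64 idle persistent connections to that upstream, avoiding TCP (and TLS) setup overhead for high-QPS backends. For HTTP upstreams it only takes effect together with `proxy_http_version 1.1;` and `proxy_set_header Connection "";`.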
Envoy's xDS (Discovery Service) APIs, including LDS (listeners), RDS (routes), CDS (clusters), and EDS (endpoints), allow dynamic reconfiguration without a restart. The Istio control plane uses them to push routing, cluster, and endpoint changes to sidecars.
Kong plugins run on every request in an order determined by plugin priority (typically authentication → rate limiting → logging). They are written natively in Lua, or in other languages such as Go, JavaScript, or Python through external plugin servers.
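Rate limiting at the reverse proxy layer (nginx `limit_req`, Kong rate-limiting plugin) protects backends from overload without requiring application code changes. nginx `limit_req` uses a leaky-bucket algorithm with state held in shared memory on a single instance; when a limit must hold across several proxy instances, keep the counters in a shared store such as Redis (Kong's rate-limiting plugin offers a `redis` policy), for example as a sliding window.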
API Gateway request transformation: modify request headers (add `X-Request-ID`, strip `Authorization` before forwarding), transform query params, or aggregate multiple backend calls into one response.
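AWS API Gateway throttling: the default account-level limit is 10,000 requests per second with a burst of 5,000, applied per Region. Stage- and method-level limits can be set lower but cannot exceed the account-level limit. Throttled requests receive 429 Too Many Requests; a self-managed gateway should likewise return 429, ideally with a `Retry-After` header.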
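Reverse proxy caching: nginx `proxy_cache_path` defines an on-disk cache whose keys live in a shared-memory zone, and `proxy_cache` enables it for a location. When the backend sends `Cache-Control: public, max-age=300`, responses are served from the cache for up to five minutes. For read-heavy APIs with stable data this can cut backend QPS substantially (60–80% is a commonly quoted range, but the real figure depends on the cache hit ratio).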
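The sliding-window rate limiting mentioned in the key points can be sketched in a few lines. This is a minimal, single-process illustration, assuming a "sliding-window log" variant; the class name `SlidingWindowLimiter` and its parameters are hypothetical, and the in-memory dict stands in for the Redis store a distributed deployment would use.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        # Assumption: a local dict replaces the shared store (e.g. Redis)
        # that a multi-instance proxy tier would need.
        self._hits = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: the proxy would answer 429
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=3, window=1.0)
# Four quick requests, then two more after the oldest ones expire.
decisions = [limiter.allow("client-a", t) for t in (0.0, 0.1, 0.2, 0.3, 1.05, 1.15)]
print(decisions)  # [True, True, True, False, True, True]
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read a clock and would key on something like client IP or API key.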
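The request-transformation point above (add `X-Request-ID`, strip `Authorization`) might look roughly like this at the gateway. The function name `transform_headers` is hypothetical, and the sketch assumes headers arrive as a plain dict; real gateways express the same idea as plugin or mapping configuration.

```python
import uuid


def transform_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return the headers to forward to the backend."""
    # Header names are case-insensitive, so compare in lower case.
    out = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    # Keep an existing request ID so traces stay correlated; otherwise mint one.
    if not any(k.lower() == "x-request-id" for k in out):
        out["X-Request-ID"] = str(uuid.uuid4())
    return out


forwarded = transform_headers(
    {"Authorization": "Bearer example-token", "Accept": "application/json"}
)
print(sorted(forwarded))  # ['Accept', 'X-Request-ID']
```

Stripping `Authorization` only makes sense once the gateway has already validated the credential and, if needed, replaced it with an internal identity header.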
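The rate-plus-burst throttling described for AWS API Gateway follows the token-bucket idea: tokens refill at the steady rate and the bucket size caps the burst. The sketch below is an illustration of that idea only, not AWS's implementation; `TokenBucket` and `try_acquire` are hypothetical names, and the small numbers are chosen for readability.

```python
import math


class TokenBucket:
    """Token bucket: `rate` tokens per second, at most `burst` tokens stored."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.updated = now

    def try_acquire(self, now: float):
        """Return (status, headers) for one request arriving at time `now`."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 200, {}
        # Seconds until one full token is available, rounded up because
        # Retry-After carries whole seconds.
        wait = (1.0 - self.tokens) / self.rate
        return 429, {"Retry-After": str(math.ceil(wait))}


bucket = TokenBucket(rate=2.0, burst=3)
statuses = [bucket.try_acquire(now=0.0)[0] for _ in range(4)]
print(statuses)  # [200, 200, 200, 429]
```

With the defaults quoted above, the same structure would use `rate=10_000` and `burst=5_000`; the status and header pair is what a self-managed gateway could return to a throttled client.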

Real-World Example

Cloudflare reports handling 55+ million HTTP requests per second across its global network. Its edge proxies were historically built on nginx and are being replaced by Pingora, an in-house proxy framework written in Rust. They apply WAF rules, DDoS mitigation, and rate limiting before traffic ever reaches origin servers.
