Performance & Scalability
Caching, load balancing, async offloading, and capacity planning
Caching Strategies: Cache-aside, read-through, write-through, write-back, refresh-ahead
Cache Technologies: Redis (data structures, cluster mode), Memcached, Hazelcast, Ehcache
CDN & Edge Caching: Edge nodes, origin shield, cache-control headers, purge APIs
Cache Invalidation: TTL, event-driven invalidation, versioned cache keys, stampede prevention
Database Performance: Index selection, slow-query analysis, connection pooling, batch operations
Read Scaling: Read replicas, CQRS, materialized views, denormalization
Write Scaling: Sharding, async writes, write-behind caching, append-only logs
Horizontal vs Vertical: Stateless design, shared-nothing architecture, session externalization
Auto-Scaling: Reactive, predictive, scheduled scaling; cooldown periods, min/max fleet sizing
Async Processing: Background jobs (Celery, Sidekiq, SQS consumers), event-driven offloading
LB Algorithms: Round-robin, least connections, IP hash, consistent hashing, least response time
Latency Optimization: Geographic co-location, keep-alive, HTTP/2 multiplexing, prefetching
Content Delivery: Gzip/Brotli compression, image optimization (WebP, AVIF), minification
Capacity Planning: Load testing (k6, Locust, JMeter), traffic modeling, burst handling
Performance Testing: Latency benchmarks, throughput, stress, soak, spike tests
Bottleneck Identification: Profiling (CPU, memory, I/O), flame graphs, distributed trace analysis
Rate Limiting & Throttling: Token bucket, leaky bucket, fixed/sliding window; per-user vs global limits
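
Example: the token bucket technique listed under Rate Limiting & Throttling, with per-user limits, might look roughly like the sketch below. This is a minimal single-process illustration, not a production limiter. The names TokenBucket and PerUserLimiter, the rate and capacity parameters, and the injectable clock are all assumptions made for this example; a multi-instance deployment would likely keep the bucket state in a shared store instead of process memory.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable so tests can control time
        self.tokens = float(capacity)    # start with a full bucket
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerUserLimiter:
    """Per-user limits: one independent bucket per user id (hypothetical helper)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, user_id):
        bucket = self.buckets.get(user_id)
        if bucket is None:
            # Create the bucket lazily on the user's first request.
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[user_id] = bucket
        return bucket.allow()
```

A global limit would use a single TokenBucket shared by all callers; the per-user variant keeps one noisy client from exhausting everyone else's budget, at the cost of memory that grows with the number of distinct users.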