Scalability
Vertical vs horizontal scaling, elasticity, auto-scaling strategies
Scalability is the ability of a system to handle growing load by adding resources, either by making existing nodes bigger (vertical scaling) or by adding more nodes (horizontal scaling). Elasticity extends this to automatic, on-demand scaling, both up and down, in response to real-time load signals. True horizontal scalability requires stateless services, effective load balancing, and partitioned data stores; a stateful monolith cannot scale horizontally without architectural changes. Auto-scaling policies must account for instance warm-up time and cooldown periods to avoid oscillation, and can use predictive scaling to provision capacity ahead of known demand.
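To make the warm-up and cooldown point concrete, here is a minimal sketch of a target-tracking scaler. The class names, the 70% target, and the timing values are illustrative assumptions, not any cloud provider's API; a real auto-scaler also handles metric aggregation, health checks, and instance lifecycle.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values below are illustrative assumptions, not recommendations.
    target_cpu: float = 0.70   # average CPU utilization to aim for
    min_nodes: int = 2
    max_nodes: int = 20
    cooldown_s: float = 300.0  # minimum gap between any two scaling actions
    warmup_s: float = 120.0    # time a new node needs before it serves traffic


class AutoScaler:
    """Hypothetical scaler: sizes the fleet so average CPU nears the target."""

    def __init__(self, policy: ScalingPolicy, nodes: int):
        self.policy = policy
        self.nodes = nodes
        self.last_action_at = -math.inf
        self.last_action_was_scale_out = False

    def desired_nodes(self, avg_cpu: float) -> int:
        # Proportional rule: total load stays fixed, so nodes scale with cpu/target.
        raw = math.ceil(self.nodes * avg_cpu / self.policy.target_cpu)
        return max(self.policy.min_nodes, min(self.policy.max_nodes, raw))

    def step(self, now: float, avg_cpu: float) -> int:
        # Hold the current size during cooldown; after a scale-out, also wait
        # for warm-up so metrics from half-started nodes do not trigger a flap.
        hold = self.policy.cooldown_s
        if self.last_action_was_scale_out:
            hold = max(hold, self.policy.warmup_s)
        if now - self.last_action_at < hold:
            return self.nodes

        target = self.desired_nodes(avg_cpu)
        if target != self.nodes:
            self.last_action_was_scale_out = target > self.nodes
            self.nodes = target
            self.last_action_at = now
        return self.nodes
```

Without the hold window, a brief dip in CPU right after a scale-out would immediately trigger a scale-in, and the fleet would oscillate between sizes.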
Key Points
- Vertical scaling (scale-up) hits hardware limits and leaves a single point of failure; at the time of writing, the largest EC2 instances offered roughly 448 vCPUs and 24 TB of RAM.
- Horizontal scaling (scale-out) requires a stateless application tier; session state must live in a shared store (Redis, DynamoDB), not in-process.
- Auto-scaling triggers: CPU utilization above a threshold (e.g., 70%), request queue depth, custom CloudWatch metrics, or scheduled and predictive scaling for known traffic patterns.
- Database scaling lags application scaling: read replicas handle read-heavy workloads, sharding handles write-heavy growth, and CQRS separates the read and write models so each side can scale independently.
- Elasticity differs from scalability: scalability is about peak capacity; elasticity is about right-sizing dynamically to minimize cost during low traffic.
- Amdahl's Law limits parallel speedup: if 5% of the work is inherently serial, the maximum speedup is 1/0.05 = 20×, no matter how many cores you add.
- Consistent hashing minimizes data remapping when nodes are added/removed — used by Cassandra, DynamoDB, and Memcached ring topologies.
- Load testing at 2-3× expected peak traffic validates scale headroom before launch; chaos experiments validate scale-in correctness.
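The Amdahl's Law point above can be checked numerically. The sketch below uses the standard formula, speedup = 1 / (s + (1 - s) / n), where s is the serial fraction and n the number of workers; the function name is chosen here for illustration.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Theoretical speedup for a workload with the given serial fraction."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 5% serial work the speedup approaches, but never reaches, 20x.
    for n in (1, 16, 1024, 1_000_000):
        print(f"{n:>9} workers -> {amdahl_speedup(0.05, n):.2f}x")
```

The curve flattens quickly: most of the achievable gain arrives with the first few dozen workers, which is why reducing the serial fraction usually pays off more than adding nodes.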
Vertical vs horizontal scaling — horizontal scaling requires stateless services and a load balancer.
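As a rough illustration of why statelessness matters, the sketch below routes requests round-robin across application nodes that keep no session data of their own. `SessionStore` is an in-memory stand-in for a shared store such as Redis; all class and method names are hypothetical.

```python
class SessionStore:
    """In-memory stand-in for a shared session store (assumption for the demo)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = dict(session)


class AppNode:
    """Stateless application node: every request reads and writes the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # the only state lives outside the node

    def handle(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.put(session_id, {"cart": cart})
        return {"served_by": self.name, "cart": cart}


class RoundRobinBalancer:
    """Sends each request to the next node in turn; nodes can be added at any time."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0

    def add_node(self, node):
        self.nodes.append(node)

    def route(self, session_id, item):
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node.handle(session_id, item)


store = SessionStore()
balancer = RoundRobinBalancer([AppNode("a", store), AppNode("b", store)])
balancer.route("s1", "book")   # handled by one node
balancer.route("s1", "pen")    # handled by another; the cart is still intact
```

Because no node owns the session, the balancer needs no sticky routing, and a node added mid-session can serve the very next request.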
Real-World Example
Netflix uses horizontal auto-scaling on AWS. During the Stranger Things Season 4 premiere, it scaled to ~10,000 EC2 instances within minutes, using predictive scaling based on historical premiere traffic patterns. Stateless microservices let new instances take traffic as soon as they are healthy, and Cassandra's consistent hashing limits how much data has to move when storage nodes are added.
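The consistent hashing idea referenced here can be sketched as a hash ring with virtual nodes. This is a simplified illustration, not Cassandra's actual partitioner; the `HashRing` class, the MD5-based hash, and the choice of 100 virtual nodes per server are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Any stable, well-mixed hash works; MD5 is used here only for spread.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per server smooth out the load
        self._ring = []       # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point at or after the key's hash, wrapping around the end.
        idx = bisect.bisect_right(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(2000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.lookup(k) for k in keys}

ring.add("node-e")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a fifth node")
```

Only keys that now belong to the new node change owner, so the expected share of moved keys is about one fifth. With naive `hash(key) % node_count` placement, going from four to five nodes would remap most keys.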