Scalability is the ability of a system to handle growing load by adding resources, either by making existing nodes bigger (vertical scaling) or by adding more nodes (horizontal scaling). Elasticity extends this to automatic, on-demand scaling in response to real-time load signals. True horizontal scalability requires stateless services, effective load balancing, and partitioned data stores; a stateful monolith cannot scale horizontally without architectural changes. Auto-scaling policies must account for instance warm-up time and use cooldown periods, or predictive scaling, to avoid oscillation (repeatedly adding and removing capacity as load fluctuates).
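The cooldown idea can be illustrated with a minimal sketch, assuming a simple step-scaling policy driven by average CPU. The class names, the 70%/30% thresholds, the 300-second cooldown, and the node limits are hypothetical illustration values, not settings from any cloud provider's API. Warm-up is not modeled separately; the cooldown simply blocks new decisions until a previous change has had time to take effect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingPolicy:
    # Hypothetical illustration values, not any provider's defaults.
    scale_out_cpu: float = 70.0  # add a node when average CPU % exceeds this
    scale_in_cpu: float = 30.0   # remove a node when average CPU % drops below this
    cooldown_s: float = 300.0    # seconds to wait after a change before acting again
    min_nodes: int = 2
    max_nodes: int = 20


class ThresholdAutoscaler:
    """Step-scaling sketch: changes capacity one node at a time, gated by a cooldown."""

    def __init__(self, policy: ScalingPolicy, nodes: int) -> None:
        self.policy = policy
        self.nodes = nodes
        self.last_change_s: Optional[float] = None

    def decide(self, now_s: float, avg_cpu: float) -> int:
        """Return the node count after observing avg_cpu at time now_s."""
        p = self.policy
        if self.last_change_s is not None and now_s - self.last_change_s < p.cooldown_s:
            # Still cooling down: let the last change (and new nodes' warm-up) settle.
            return self.nodes
        target = self.nodes
        if avg_cpu > p.scale_out_cpu:
            target = min(self.nodes + 1, p.max_nodes)
        elif avg_cpu < p.scale_in_cpu:
            target = max(self.nodes - 1, p.min_nodes)
        if target != self.nodes:
            self.nodes = target
            self.last_change_s = now_s
        return self.nodes


scaler = ThresholdAutoscaler(ScalingPolicy(), nodes=4)
print(scaler.decide(0, 85.0))    # 5: CPU above 70%, scale out
print(scaler.decide(60, 90.0))   # 5: still inside the 300 s cooldown
print(scaler.decide(400, 90.0))  # 6: cooldown expired, scale out again
print(scaler.decide(800, 10.0))  # 5: CPU below 30%, scale in
```

The wide gap between the scale-out and scale-in thresholds acts as hysteresis, a second guard against oscillation alongside the cooldown.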

Key Points

  • Vertical scaling (scale-up) hits hardware limits and concentrates the workload on one machine, a single point of failure; even very large instances have a fixed ceiling (e.g. EC2 u-24tb1.metal offers 448 vCPUs and 24 TB of RAM).
  • Horizontal scaling (scale-out) requires a stateless application tier; session state must live in a shared store (e.g. Redis or DynamoDB), not in process memory.
  • Auto-scaling triggers: CPU utilization above a threshold (e.g. 70%), request queue depth, custom CloudWatch metrics, scheduled scaling for known traffic patterns, or predictive scaling driven by historical load.
  • Database scaling lags application scaling because data is stateful: read replicas handle read-heavy workloads, while sharding (or CQRS, which separates the read and write paths) handles write-heavy growth.
  • Elasticity differs from scalability: scalability is about peak capacity; elasticity is about right-sizing dynamically to minimize cost during low traffic.
  • Amdahl's Law limits parallel speedup: if 5% of code is serial, max speedup is 20×, regardless of how many cores you add.
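  • Consistent hashing minimizes data remapping when nodes are added or removed (on average only about K/N of K keys move in an N-node cluster); it is used by Cassandra, Amazon's Dynamo (the design behind DynamoDB), and client-side Memcached ring topologies.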
  • Load testing at 2-3× expected peak traffic validates scale headroom before launch; chaos experiments validate scale-in correctness.
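Amdahl's Law gives the speedup on N workers as S(N) = 1 / (s + (1 - s) / N), where s is the serial fraction; as N grows, S(N) approaches 1/s, which is 20× for s = 0.05. A small sketch (the function name is hypothetical, chosen for illustration):

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Amdahl's Law: S(N) = 1 / (s + (1 - s) / N)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # The serial part never shrinks; only the parallel part is divided by N.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# With 5% serial code, speedup flattens out just below 20x as workers grow.
for n in (1, 8, 64, 1024, 1_000_000):
    print(f"{n:>9} workers -> {amdahl_speedup(0.05, n):6.2f}x")
```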
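The remapping property of consistent hashing can be shown with a minimal hash ring using virtual nodes. This is a sketch under stated assumptions: MD5 is used only for an even spread, and the class name and the 100 virtual nodes per server are hypothetical choices. Real systems differ in hash function and token assignment (Cassandra's default partitioner, for example, uses Murmur3). The demo compares how many keys move when a fourth node joins, against naive `hash % N` placement.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    # Stable 64-bit hash from MD5 (used for distribution, not security).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hash ring with virtual nodes; a key maps to the first node point clockwise."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point with hash >= key hash; wrap to index 0 past the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(10_000)]
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.get_node(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
ring_moved = sum(before[k] != after[k] for k in keys)

# Naive modulo placement: going from 3 to 4 buckets reshuffles most keys.
naive_moved = sum(_hash(k) % 3 != _hash(k) % 4 for k in keys)

print(f"consistent hashing moved {ring_moved / len(keys):.1%} of keys")
print(f"hash % N moved {naive_moved / len(keys):.1%} of keys")
```

Expect roughly a quarter of keys to move with the ring (all of them to `node-d`) versus roughly three quarters with modulo placement; exact percentages depend on the hash values.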
[Figure: Vertical scaling (a 2 CPU / 4 GB server replaced by a 32 CPU / 128 GB server; hardware ceiling, max ~24 TB RAM on EC2 u-24tb1) vs. horizontal scaling (a load balancer distributes traffic across servers as new nodes are added; scales to thousands of nodes; Google runs millions of commodity servers).]

Vertical vs horizontal scaling: horizontal scaling requires stateless services and a load balancer.

Real-World Example

Netflix uses horizontal auto-scaling on AWS — during the Stranger Things Season 4 premiere they scaled to ~10,000 EC2 instances within minutes using predictive scaling based on historical premiere traffic patterns. Their stateless microservices and Cassandra's consistent hashing made this seamless.