Vertical scaling (scale up) adds more CPU, RAM, or storage to an existing server. It is simple, but it is capped by hardware maximums and usually requires downtime while the hardware or instance size changes. Horizontal scaling (scale out) adds more servers to a pool and distributes load across them. For any node to serve any request, applications must be stateless and share-nothing. The Twelve-Factor App methodology mandates stateless processes for this reason. Cloud providers (AWS, GCP, Azure) enable both strategies on demand, but horizontal scaling is the architectural prerequisite for internet-scale systems.
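
To make the stateless requirement concrete, here is a minimal sketch in which three web nodes share one external session store. The names `ExternalSessionStore` and `WebNode` are hypothetical, and a plain dictionary stands in for a networked store such as Redis; a real deployment would talk to that store over the network.

```python
import json
import uuid


class ExternalSessionStore:
    """Stand-in for a shared store such as Redis (a plain dict here, by assumption)."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, session):
        # Serialize the session, since a networked store holds strings, not live objects.
        self._data[session_id] = json.dumps(session)

    def load(self, session_id):
        raw = self._data.get(session_id)
        return json.loads(raw) if raw is not None else {}


class WebNode:
    """A stateless application node: it keeps no session data between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        # All session state is read from and written back to the shared store.
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return {"served_by": self.name, "cart": session["cart"]}


store = ExternalSessionStore()
nodes = [WebNode(f"node-{i}", store) for i in range(1, 4)]
session_id = str(uuid.uuid4())

# Each request lands on a different node, yet the cart stays consistent.
first = nodes[0].add_to_cart(session_id, "book")
second = nodes[2].add_to_cart(session_id, "pen")
print(second)
```

Because no node holds the cart in its own memory, any node can be added, removed, or restarted without losing the user's session.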

Key Points

  • Vertical scaling ceiling: even one of the largest AWS instances (u-24tb1.metal) tops out at 448 vCPUs and 24 TB of RAM. Past the largest machine available, only horizontal scaling is possible.
  • Stateless design: no in-process session state — externalize sessions to Redis, files to S3, queues to SQS so any instance can handle any request.
  • Share-nothing architecture: each node has its own CPU, memory, and local disk — avoids distributed locking and shared state contention.
  • Session externalization: replace sticky sessions with Redis-backed session stores — enables true horizontal scaling of web tiers.
  • Horizontal scaling math: 10 servers × 1,000 RPS each = 10,000 RPS total. Under ideal linear scaling, each added server contributes another 1,000 RPS; in practice, shared dependencies such as the database or load balancer keep real gains somewhat below that.
  • Vertical scaling use cases: databases, stateful legacy applications, and single-threaded workloads that cannot be parallelized. For databases in particular, scaling up is often the right choice.
  • Cloud elasticity: AWS Auto Scaling Groups add or remove EC2 instances in roughly 2–3 minutes; the Kubernetes Horizontal Pod Autoscaler (HPA) can scale pods in under 60 seconds based on CPU or memory metrics.
  • Cost comparison: one 32-core server vs 8 × 4-core servers. The horizontal option is often cheaper because commodity machines cost less per core and redundancy comes from the fleet rather than from paying extra to harden a single point of failure.
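
The scaling arithmetic and the autoscaling bullet above can be worked through in a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, the capacity functions assume ideal linear scaling, and `hpa_desired_replicas` implements only the core ratio rule from the Kubernetes HPA documentation, ignoring its tolerance band and stabilization window.

```python
import math


def ideal_capacity(servers, rps_per_server):
    """Total throughput under ideal linear scaling."""
    return servers * rps_per_server


def servers_needed(target_rps, rps_per_server):
    """Smallest whole number of servers that covers the target load."""
    return math.ceil(target_rps / rps_per_server)


def hpa_desired_replicas(current_replicas, current_metric, target_metric):
    """Core HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)


print(ideal_capacity(10, 1_000))        # 10000
print(servers_needed(12_500, 1_000))    # 13
print(hpa_desired_replicas(4, 90, 60))  # 6
```

The last line reads: 4 pods averaging 90% CPU against a 60% target would be scaled to 6 pods.
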

[Figure: Horizontal vs Vertical Scaling. Vertical (scale up): a 4 CPU / 16 GB RAM machine is replaced by a bigger 32 CPU / 256 GB RAM machine, limited by the hardware maximum. Horizontal (scale out): a load balancer in front of Node 1, Node 2, and Node 3 (4 CPU each), with nodes added as needed.]

Vertical scaling adds resources to one machine; horizontal scaling adds more machines behind a load balancer
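
The load-balanced layout described in the caption can be sketched with a simple round-robin balancer. `RoundRobinBalancer` is a hypothetical name used for illustration; production load balancers also handle health checks, weights, and connection draining, which are omitted here.

```python
from collections import Counter


class RoundRobinBalancer:
    """Hands out nodes in rotation so each one gets an equal share of requests."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0

    def add_node(self, node):
        # Scaling out: the new node joins the rotation immediately.
        self.nodes.append(node)

    def route(self):
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node


balancer = RoundRobinBalancer(["node-1", "node-2", "node-3"])
before = Counter(balancer.route() for _ in range(300))

balancer.add_node("node-4")
after = Counter(balancer.route() for _ in range(400))

print(before)  # each of 3 nodes handles 100 of 300 requests
print(after)   # each of 4 nodes handles 100 of 400 requests
```

With a fourth node in the pool, the same per-node load of 100 requests now covers 400 requests in total, which is the scale-out effect in miniature.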

Real-World Example

Google Search is the canonical horizontal scaling example: the index is split into shards spread across a very large fleet of commodity servers (reportedly millions, not supercomputers) in datacenters worldwide, with each server handling one shard. No single server is critical; failures are expected and handled transparently.
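
To illustrate what "each handling a shard of the index" means, here is a toy sharded index with scatter-gather search. It is a generic sketch, not a description of Google's actual implementation: `shard_for` and `ShardedIndex` are hypothetical names, shards are in-process dictionaries rather than separate servers, and documents are placed by hashing their ID.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document ID to a shard using a stable hash."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedIndex:
    def __init__(self, num_shards):
        # Each shard maps a word to the set of document IDs containing it.
        self.shards = [dict() for _ in range(num_shards)]

    def add(self, doc_id, text):
        shard = self.shards[shard_for(doc_id, len(self.shards))]
        for word in text.lower().split():
            shard.setdefault(word, set()).add(doc_id)

    def search(self, word):
        # Scatter the query to every shard, then gather and merge the results.
        results = set()
        for shard in self.shards:
            results |= shard.get(word.lower(), set())
        return sorted(results)


index = ShardedIndex(num_shards=4)
index.add("doc-1", "horizontal scaling adds servers")
index.add("doc-2", "vertical scaling adds resources")
index.add("doc-3", "load balancers distribute requests")

print(index.search("scaling"))  # ['doc-1', 'doc-2']
```

Each shard holds only part of the index, so capacity grows by adding shards, and losing one shard affects only the documents stored on it.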