Horizontal vs Vertical | Performance & Scalability | System Design

Vertical scaling (scale up) adds more CPU, RAM, or storage to an existing server — simple but limited by hardware maximums and incurs downtime for hardware changes. Horizontal scaling (scale out) adds more servers to a pool, distributing load — requires applications to be stateless and share-nothing to enable any node to serve any request. The Twelve-Factor App methodology mandates stateless processes for this reason. Cloud providers (AWS, GCP, Azure) enable both strategies on demand, but horizontal scaling is the architecture prerequisite for internet-scale systems.

Key Points

Vertical scaling ceiling: the largest AWS instance (u-24tb1.metal) has 448 vCPUs and 24TB RAM — beyond that, only horizontal scaling is possible.
Stateless design: no in-process session state — externalize sessions to Redis, files to S3, queues to SQS so any instance can handle any request.
Share-nothing architecture: each node has its own CPU, memory, and local disk — avoids distributed locking and shared state contention.
Session externalization: replace sticky sessions with Redis-backed session stores — enables true horizontal scaling of web tiers.
Horizontal scaling math: 10 servers × 1,000 RPS each = 10,000 RPS total; adding a server adds exactly 1,000 RPS (ideal linear scaling).
Vertical scaling use cases: databases, stateful legacy applications, single-threaded workloads that cannot be parallelized — vertical scaling is often the right choice for databases.
Cloud elasticity: AWS Auto Scaling Groups add/remove EC2 instances in 2–3 minutes; Kubernetes HPA scales pods in <60 seconds based on CPU/memory metrics.
Cost comparison: one 32-core server vs 8 × 4-core servers — horizontal often cheaper due to commodity pricing and no single-point-of-failure premium.

Vertical scaling adds resources to one machine; horizontal scaling adds more machines behind a load balancer

Real-World Example

Google Search is the canonical horizontal scaling example — millions of commodity servers (not supercomputers) in datacenters worldwide, each handling a shard of the index. No single server is critical; failures are expected and handled transparently.

←PreviousWrite Scaling NextAuto-Scaling→