Write Scaling
Sharding, async writes, write-behind caching, append-only logs
Write scaling is harder than read scaling: reads can be spread across replicas, but conflicting writes to the same data must be ordered to maintain consistency, and sharding introduces distributed transaction complexity. Database sharding partitions data across multiple database nodes by a shard key. Cassandra, MongoDB, and Vitess handle this largely transparently; PostgreSQL requires application-level sharding or an extension such as Citus. Async writes via queues (Kafka, SQS) absorb write bursts without overwhelming the database, while write-behind caching and append-only logs (event sourcing) further decouple write latency from storage persistence.
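The queue-based decoupling described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: an in-process `queue.Queue` stands in for Kafka or SQS (it is not durable), a plain dict stands in for the database, and the names `enqueue_write`, `db_consumer`, and `start_consumer` are hypothetical.

```python
import queue
import threading

# Stand-in for a durable broker such as Kafka or SQS. Assumption: an
# in-process queue is NOT durable; it only illustrates the decoupling.
write_queue: queue.Queue = queue.Queue()

# Stand-in for the database. Only the consumer thread writes to it.
database: dict = {}


def enqueue_write(key: str, value: str) -> None:
    """Producer path: returns as soon as the write is queued."""
    write_queue.put((key, value))


def db_consumer(batch_size: int = 100) -> None:
    """Consumer path: drains the queue in batches at the database's pace.

    A None item is used as a stop sentinel.
    """
    stopping = False
    while not stopping:
        batch = [write_queue.get()]            # block until work arrives
        while len(batch) < batch_size:
            try:
                batch.append(write_queue.get_nowait())
            except queue.Empty:
                break
        if None in batch:
            stopping = True
            batch = [item for item in batch if item is not None]
        database.update(batch)                 # one bulk write per batch


def start_consumer() -> threading.Thread:
    """Run the consumer in a background thread and return the thread."""
    worker = threading.Thread(target=db_consumer, daemon=True)
    worker.start()
    return worker
```

The producer never waits on the database, so a burst only lengthens the queue. The trade-off is that a write is not visible in the database until the consumer reaches it.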
Key Points
- Horizontal sharding: partition data by shard key (user_id % N, geographic region) across N database nodes, so that each shard handles roughly 1/N of the write load when keys are evenly distributed.
- Shard key selection is critical: choose a key with high cardinality and even distribution that never changes after insert. Poor shard keys create hot shards that negate the scaling benefits.
- Async writes via queues: write to Kafka/SQS first (fast, durable), DB consumer processes at its own pace — decouples producer write latency from DB throughput.
- Write-behind caching: write to Redis first, flush to DB asynchronously. Acknowledged write latency can drop to the sub-millisecond range, at the cost of losing unflushed writes if the cache crashes.
- Append-only logs (Event Sourcing): never update or delete rows, only append new events. This avoids lock contention on in-place updates, keeps events naturally ordered, and enables time-travel queries by replaying the log up to a past point.
- Vitess (YouTube's sharding middleware): transparent MySQL sharding with connection pooling, query rewriting, and topology management — open-sourced and used by Slack, GitHub.
- DynamoDB write scaling: partition-based with auto-sharding; capacity units provision write throughput — use write sharding (add suffix to keys) to avoid hot partitions.
- Two-phase commit (2PC) for cross-shard transactions: expensive and blocking — prefer Saga pattern (distributed compensating transactions) for better availability.
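The shard-key routing and hot-partition points above can be made concrete with a small sketch. It assumes a hypothetical cluster of 4 shards and a hypothetical fan-out of 8 suffixes for hot keys; `shard_for`, `sharded_write_key`, and `all_read_keys` are illustrative names, not part of any database's API.

```python
import hashlib
import random

NUM_SHARDS = 4          # hypothetical cluster size
HOT_KEY_SUFFIXES = 8    # hypothetical fan-out for a known hot key


def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    digest is used to keep routing stable across restarts.
    """
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def sharded_write_key(hot_key: str, suffixes: int = HOT_KEY_SUFFIXES) -> str:
    """Spread writes for one hot logical key over several physical keys."""
    return f"{hot_key}#{random.randrange(suffixes)}"


def all_read_keys(hot_key: str, suffixes: int = HOT_KEY_SUFFIXES) -> list:
    """Readers must query every suffixed key and merge the results."""
    return [f"{hot_key}#{i}" for i in range(suffixes)]
```

Hashing the key avoids the skew that sequential IDs or timestamps can cause, but plain modulo routing means most keys move when the shard count changes. Suffixing trades cheaper writes for more expensive reads.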
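The append-only log idea can be illustrated with an in-memory sketch. The account-balance domain and the `Event` and `EventLog` names are assumptions chosen for illustration; a real event store would persist the log durably.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    seq: int        # position in the log, assigned on append
    account: str
    delta: int      # positive = deposit, negative = withdrawal


class EventLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: list = []

    def append(self, account: str, delta: int) -> Event:
        event = Event(seq=len(self._events), account=account, delta=delta)
        self._events.append(event)
        return event

    def balance(self, account: str, up_to_seq: Optional[int] = None) -> int:
        """Fold events into state; up_to_seq replays only a past prefix."""
        total = 0
        for event in self._events:
            if up_to_seq is not None and event.seq > up_to_seq:
                break
            if event.account == account:
                total += event.delta
        return total
```

For example, after appending deltas of 100, -30, and 10 for the same account, `balance(account)` returns 80, while `balance(account, up_to_seq=1)` returns the earlier state of 70.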
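The Saga alternative to 2PC can be sketched as a list of local steps, each paired with a compensating action. This is a simplified, single-process illustration; `run_saga` and the `Step` alias are hypothetical names, and a production saga would also persist its progress and retry failed compensations.

```python
from typing import Callable

# Each step pairs a local action with the compensation that undoes it.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: list) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse.

    Returns True if every step succeeded, False if the saga was rolled
    back through compensations.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

Unlike 2PC, no shard holds locks while waiting for the others, so intermediate states are briefly visible to other readers. That is the availability-for-isolation trade the pattern makes.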
Real-World Example
Discord handles 26 million concurrent users with Cassandra for message storage. Cassandra is a write-optimized distributed database whose log-structured storage appends writes instead of updating data in place. Their migration from MongoDB to Cassandra, described in a 2017 engineering blog post, scaled writes from 1M to 26M messages/day without architectural changes beyond sharding strategy.