NoSQL Databases

NoSQL databases trade the relational model for specialized data models that offer horizontal scalability, schema flexibility, and purpose-built access patterns. The four major families (document, key-value, wide-column, and graph) each optimize for different query shapes and consistency trade-offs. Most NoSQL systems default to BASE semantics (Basically Available, Soft-state, Eventually consistent) rather than strict ACID, though several modern engines now support ACID transactions across multiple records, for example multi-document transactions in MongoDB 4.0+ and multi-item transactions in DynamoDB.

Key Points

Document stores (MongoDB, Firestore, Couchbase) embed related data as JSON/BSON sub-documents, optimizing for reads that fetch an entity and its children in a single round-trip.
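Key-value stores (Redis, DynamoDB in KV mode, Aerospike) provide O(1) average-case lookups by key. In the pure key-value model the value is opaque, so there are no secondary indexes or range scans on values unless the engine adds them as an extra feature; key design determines every access pattern.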
Wide-column stores (Cassandra, HBase, Bigtable) organize data as a sparse, distributed map sorted by row key; column families group co-accessed columns physically on disk.
Graph databases (Neo4j, Amazon Neptune, TigerGraph) store vertices and edges natively, so a traversal like "friends-of-friends within 3 hops" follows stored edges hop by hop and can often be answered in milliseconds, whereas the equivalent SQL query needs one self-join per hop.
DynamoDB single-table design collapses multiple entity types into one table using composite primary keys and GSIs to avoid expensive JOIN-equivalent operations.
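Cassandra write path: each write is appended to the CommitLog (for durability) and applied to the in-memory Memtable; full Memtables are flushed to immutable SSTables on disk, which are later merged by compaction using the size-tiered (STCS) or leveled (LCS) strategy. Because writes are sequential appends rather than in-place updates, this path is optimized for write-heavy workloads.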
MongoDB caps a single BSON document at 16 MB; use GridFS for larger binary objects. Updates to a single document are atomic, so array operators such as $push and $pull modify embedded arrays atomically.
Vector databases (Pinecone, Weaviate, Qdrant) are an emerging additional family: they store high-dimensional embeddings and serve approximate nearest-neighbor (ANN) queries.
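
The single-table idea from the key points can be sketched without any database at all. The toy class below is an in-memory stand-in, not DynamoDB's API: the class name SingleTable, the attribute names PK and SK, and the key formats such as CUSTOMER#42 and ORDER#<date> are all illustrative assumptions. It shows how one partition key groups several entity types and how a sort-key prefix selects one of them in a single query. GSIs are not modeled.

```python
from bisect import bisect_left
from collections import defaultdict


class SingleTable:
    """Toy in-memory table keyed by (partition key, sort key). Hypothetical, for illustration."""

    def __init__(self):
        # partition key -> list of (sort key, item), kept sorted by sort key
        self._partitions = defaultdict(list)

    def put(self, pk, sk, attrs):
        items = self._partitions[pk]
        keys = [k for k, _ in items]
        i = bisect_left(keys, sk)
        item = {"PK": pk, "SK": sk, **attrs}
        if i < len(items) and items[i][0] == sk:
            items[i] = (sk, item)        # same primary key: overwrite
        else:
            items.insert(i, (sk, item))  # keep the partition sorted by sort key

    def query(self, pk, sk_prefix=""):
        """Return items in one partition whose sort key starts with sk_prefix."""
        items = self._partitions.get(pk, [])
        keys = [k for k, _ in items]
        i = bisect_left(keys, sk_prefix)  # jump to the first candidate key
        out = []
        while i < len(items) and items[i][0].startswith(sk_prefix):
            out.append(items[i][1])
            i += 1
        return out


# Two entity types (profile and orders) share one table and one partition.
table = SingleTable()
table.put("CUSTOMER#42", "PROFILE", {"name": "Ada"})
table.put("CUSTOMER#42", "ORDER#2024-01-15", {"total": 30})
table.put("CUSTOMER#42", "ORDER#2024-03-02", {"total": 75})
table.put("CUSTOMER#7", "PROFILE", {"name": "Lin"})

orders = table.query("CUSTOMER#42", "ORDER#")  # only this customer's orders
everything = table.query("CUSTOMER#42")        # profile plus orders, one lookup
```

The second query is the JOIN replacement: a customer and all of their orders come back from one partition read because they were written under the same partition key. In a real table, the equivalent of the prefix filter would be a key condition on the sort key.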

Type	Examples	Data Model	Best Use Cases	Weak At
Document	MongoDB, Firestore, Couchbase	JSON/BSON documents with nested objects	Content management, catalogs, user profiles, mobile apps	Multi-document joins, relational integrity
Key-Value	Redis, DynamoDB, Aerospike	Opaque value behind a flat key	Sessions, caching, leaderboards, shopping carts	Complex queries, range scans on values
Wide-Column	Cassandra, HBase, Bigtable	Rows with dynamic columns grouped into column families	Time-series, IoT, activity feeds, write-heavy logs	Ad-hoc queries, secondary access patterns
Graph	Neo4j, Neptune, TigerGraph	Vertices + typed edges + properties	Social graphs, fraud detection, recommendations	High-volume aggregate analytics
Search Engine	Elasticsearch, OpenSearch, Solr	Inverted index over JSON docs	Full-text search, log analytics, faceted search	Transactional writes, strong consistency
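
The write-heavy strength listed for wide-column stores comes from the log-structured write path described for Cassandra in the key points. The sketch below is a generic, heavily simplified illustration of that pattern, not Cassandra's implementation: the class name TinyLSM and the memtable_limit parameter are hypothetical, the commit log is a Python list rather than a file, and compaction merges every run at once instead of following STCS or LCS.

```python
class TinyLSM:
    """Toy log-structured store: commit log + memtable + sorted immutable runs."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []  # append-only record of unflushed writes (stand-in for an on-disk log)
        self.memtable = {}    # in-memory buffer of recent writes
        self.sstables = []    # immutable key-sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, so the write survives a crash
        self.memtable[key] = value            # 2. then apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. persist the memtable as a new sorted run; existing runs are never modified
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log.clear()  # flushed writes no longer need replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        # 4. merge all runs into one, keeping the newest value for each key
        merged = {}
        for run in self.sstables:  # oldest to newest, so later values overwrite earlier ones
            merged.update(run)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.write("sensor:1", 20.5)
db.write("sensor:2", 18.0)  # memtable reaches the limit and is flushed
db.write("sensor:1", 21.0)  # newer value for an existing key
db.write("sensor:3", 19.5)  # second flush, so two runs now exist
runs_before = len(db.sstables)
db.compact()
runs_after = len(db.sstables)
```

Writes never touch existing runs, which is why this design favors ingest throughput. The cost shows up on reads, which may have to check several runs until compaction merges them.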

Real-World Example

LinkedIn uses Apache Cassandra for member activity streams (billions of writes/day). Airbnb uses MongoDB for listing documents where all listing attributes are embedded. Twitter/X uses a mix of MySQL (core tweets) and Redis (timeline cache). Uber uses Cassandra for trip records due to its tunable consistency and multi-region active-active replication.
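
Tunable consistency, mentioned above for Cassandra, generally means choosing per request how many replicas must acknowledge a write (W) and how many must answer a read (R) out of N copies. A read is guaranteed to see the latest acknowledged write when R + W > N, because the two replica sets must then share at least one node. The helper names below are hypothetical and the snippet only checks the arithmetic; it does not talk to a cluster.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def overlaps(n: int, w: int, r: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n


N = 3
q = quorum(N)                                # 2 of 3 replicas
strong = overlaps(N, w=q, r=q)               # quorum writes + quorum reads
fast_but_eventual = overlaps(N, w=1, r=1)    # one replica each way

print(q, strong, fast_but_eventual)          # prints: 2 True False
```

Lowering R or W reduces latency and improves availability, at the price of reads that may briefly return stale data.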
