NoSQL Databases
Document, key-value, wide-column, graph databases and their use cases
NoSQL databases trade the relational model for specialized data models that offer horizontal scalability, schema flexibility, and purpose-built access patterns. The four major families (document, key-value, wide-column, and graph) each optimize for different query shapes and consistency trade-offs. Many NoSQL systems favor BASE semantics (Basically Available, Soft state, Eventually consistent) over strict ACID, though several modern engines now support ACID transactions that span multiple documents or items (MongoDB 4.0+, DynamoDB transactions).
Key Points
- Document stores (MongoDB, Firestore, Couchbase) embed related data as JSON/BSON sub-documents, optimizing for reads that fetch an entity and its children in a single round-trip.
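- Key-value stores (Redis, DynamoDB in KV mode, Aerospike) provide O(1) average-case lookups by key. In the pure key-value model the value is opaque, so there are no secondary indexes or range scans on values, and every access pattern has to be encoded in the key. Some products add indexing on top of this model (for example, DynamoDB's GSIs).
- Wide-column stores (Cassandra, HBase, Bigtable) organize data as a sparse, distributed map addressed by row key. HBase and Bigtable keep rows sorted by row key, while Cassandra hashes the partition key and sorts rows within each partition by clustering key. Column families group co-accessed columns physically on disk.
- Graph databases (Neo4j, Amazon Neptune, TigerGraph) store vertices and edges natively. A traversal like "friends-of-friends within 3 hops" follows stored edges directly and can return in milliseconds, whereas the equivalent SQL query needs one self-join per hop and slows down as hops are added.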
- DynamoDB single-table design collapses multiple entity types into one table using composite primary keys and GSIs to avoid expensive JOIN-equivalent operations.
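- Cassandra write path: each write is appended to the CommitLog (for durability) and applied to the in-memory Memtable; a full Memtable is flushed to an immutable SSTable on disk, and SSTables are later merged by compaction using the Size-Tiered (STCS) or Leveled (LCS) strategy. Because writes are sequential appends with no read-before-write, the path is optimized for write-heavy workloads.
- MongoDB caps a single BSON document at 16 MB; use GridFS for larger binary objects. Array fields inside a document can be updated atomically with the $push and $pull operators.
- Vector databases (Pinecone, Weaviate, Qdrant) are an emerging additional family: they store high-dimensional embeddings and serve approximate nearest-neighbor (ANN) queries.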
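The Cassandra write path listed in the key points can be sketched as a toy in-memory model. This is an illustration under simplifying assumptions, not Cassandra's implementation: the class name `ToyLSMStore`, the `memtable_limit` threshold, and the single "merge everything" compaction step are hypothetical, and the merge shown is neither STCS nor Leveled compaction.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLSMStore:
    """Toy model of a log-structured write path (not Cassandra's real code)."""

    memtable_limit: int = 3                         # hypothetical flush threshold
    commit_log: list = field(default_factory=list)  # append-only durability log
    memtable: dict = field(default_factory=dict)    # in-memory, mutable
    sstables: list = field(default_factory=list)    # immutable sorted runs, oldest first

    def write(self, key, value):
        # 1. Append to the commit log first so the write could be replayed after a crash.
        self.commit_log.append((key, value))
        # 2. Apply to the memtable; the write path never reads existing data.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. Persist the memtable as an immutable, key-sorted SSTable.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        # 4. Merge all SSTables into one, keeping the newest value per key.
        merged = {}
        for table in self.sstables:  # oldest first, so later tables overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


store = ToyLSMStore(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    store.write(key, value)

print(len(store.sstables))  # two flushed SSTables, both containing key "a"
print(store.read("a"))      # the newer SSTable shadows the older value
store.compact()
print(store.sstables)       # one merged SSTable with one entry per key
```

The sketch shows why reads are the expensive side of this design: a write touches only the log and the memtable, while a read may have to consult several SSTables until compaction merges them.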
| Type | Examples | Data Model | Best Use Cases | Weak At |
|---|---|---|---|---|
| Document | MongoDB, Firestore, Couchbase | JSON/BSON documents with nested objects | Content management, catalogs, user profiles, mobile apps | Multi-document joins, relational integrity |
| Key-Value | Redis, DynamoDB, Aerospike | Opaque value behind a flat key | Sessions, caching, leaderboards, shopping carts | Complex queries, range scans on values |
| Wide-Column | Cassandra, HBase, Bigtable | Rows with dynamic columns grouped into column families | Time-series, IoT, activity feeds, write-heavy logs | Ad-hoc queries, secondary access patterns |
| Graph | Neo4j, Neptune, TigerGraph | Vertices + typed edges + properties | Social graphs, fraud detection, recommendations | High-volume aggregate analytics |
| Search Engine | Elasticsearch, OpenSearch, Solr | Inverted index over JSON docs | Full-text search, log analytics, faceted search | Transactional writes, strong consistency |
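The DynamoDB single-table design mentioned in the key points can be illustrated with a small in-memory stand-in. This is a sketch only: `ToySingleTable`, the `CUSTOMER#`/`ORDER#` key prefixes, and the attribute names are hypothetical, the class does not use the DynamoDB API, and GSIs are left out. It assumes string sort keys and mimics a partition-key query with a "begins with" condition on the sort key.

```python
from bisect import bisect_left


class ToySingleTable:
    """In-memory stand-in for one table with a composite (pk, sk) primary key."""

    def __init__(self):
        self._keys = []   # (pk, sk) tuples, kept sorted
        self._items = {}  # (pk, sk) -> item attributes

    def put(self, pk, sk, **attrs):
        key = (pk, sk)
        if key not in self._items:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._items[key] = {"pk": pk, "sk": sk, **attrs}

    def query(self, pk, sk_prefix=""):
        """Return items in one partition whose sort key starts with sk_prefix."""
        start = bisect_left(self._keys, (pk, sk_prefix))
        results = []
        for key in self._keys[start:]:
            # Matching keys are contiguous in sorted order, so stop at the first miss.
            if key[0] != pk or not key[1].startswith(sk_prefix):
                break
            results.append(self._items[key])
        return results


table = ToySingleTable()
# Two entity types (customer profile and orders) share one table and one partition.
table.put("CUSTOMER#42", "PROFILE", name="Ada")
table.put("CUSTOMER#42", "ORDER#2024-03-02", total=12)
table.put("CUSTOMER#42", "ORDER#2024-01-15", total=30)
table.put("CUSTOMER#7", "PROFILE", name="Lin")

# One query returns the customer's orders, sorted by sort key.
for item in table.query("CUSTOMER#42", "ORDER#"):
    print(item["sk"], item["total"])

# Omitting the prefix returns the profile and all orders in a single lookup.
print(len(table.query("CUSTOMER#42")))
```

Because related items share a partition key and are ordered by sort key, the "customer plus orders" read that would need a join in a relational schema becomes one range scan over adjacent keys.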
Real-World Examples
LinkedIn has reportedly used Apache Cassandra for member activity streams (billions of writes per day). Airbnb has reportedly used MongoDB for listing documents, with all listing attributes embedded in a single document. Twitter/X has used a mix of MySQL (core tweets) and Redis (timeline cache). Uber has used Cassandra for trip records because of its tunable consistency and multi-region active-active replication.
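Tunable consistency, mentioned above for Cassandra, is usually reasoned about with the overlap rule R + W > N: with N replicas, a read that contacts R replicas is guaranteed to intersect a write acknowledged by W replicas only when the two counts sum to more than N. The following is a minimal sketch of that arithmetic; the function names are hypothetical and the code does not talk to any database.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority quorum: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def read_write_sets_overlap(n: int, w: int, r: int) -> bool:
    """True when any R replicas must share at least one node with any W replicas."""
    return r + w > n


n = 3  # replication factor
q = quorum(n)
print(q)                                   # 2
print(read_write_sets_overlap(n, q, q))    # True: quorum writes + quorum reads
print(read_write_sets_overlap(n, 1, 1))    # False: single-replica writes and reads
print(read_write_sets_overlap(n, n, 1))    # True: write to all, read from one
```

Lower values of R and W reduce latency and improve availability at the cost of possibly reading stale data, which is the trade-off the per-query consistency level exposes.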