Vector Databases
Embeddings, ANN search, HNSW index, pgvector, Pinecone, Weaviate
Vector databases store and query high-dimensional embedding vectors, enabling semantic similarity search — the backbone of RAG (Retrieval-Augmented Generation), recommendation systems, and multimodal search. Instead of exact key lookups, they perform Approximate Nearest Neighbor (ANN) search, trading 100% recall for sub-10ms query latency at billion-vector scale. The dominant index structure is HNSW (Hierarchical Navigable Small World), a graph-based index that achieves O(log N) query complexity with controllable recall vs speed trade-offs via the ef parameter.
Key Points
- Embeddings: dense float32 vectors (typically 384–3072 dimensions) produced by models like text-embedding-ada-002 (1536-d), Cohere Embed (1024-d), or open-source BGE (768-d).
- HNSW index: a multi-layer graph where higher layers are sparse (long-range links) and the bottom layer is dense; query traverses from top layer down, greedily following closest neighbors — recall >95% achievable.
- IVFFlat (Inverted File Flat): k-means clusters the vector space into nlist centroids; query finds the nearest nprobe centroids and searches only their assigned vectors — faster build but lower recall than HNSW.
- pgvector extension adds VECTOR type and HNSW/IVFFlat index support to PostgreSQL; enables vector search alongside relational data without a separate infrastructure — ideal for <100M vectors.
- Pinecone: fully managed, serverless vector DB; supports namespaces (tenant isolation), metadata filters, and sparse-dense hybrid search; optimized for RAG use cases.
- Weaviate: open-source, supports hybrid search (BM25 + vector), multi-tenancy, and module ecosystem (OpenAI, Cohere, CLIP integrations); deployed on K8s or as managed cloud.
- Qdrant: Rust-based, supports payload filtering during ANN search (pre-filtering vs post-filtering trade-off); scalar and product quantization for memory reduction.
- Quantization: reducing float32 (4 bytes/dim) to int8 (1 byte/dim) or binary (1 bit/dim) cuts memory 4–32x with 1–5% recall loss; Product Quantization (PQ) is standard in Faiss and Qdrant.
Real-World Example
Notion uses pgvector on PostgreSQL for their AI Q&A feature — semantic search over workspace pages using OpenAI embeddings; the relational schema (user_id, page_id, embedding) enables filtered search without a separate vector DB. Spotify's Annoy (Approximate Nearest Neighbors Oh Yeah) library powers song recommendation by finding the 100 nearest songs in a 40-dimensional acoustic feature space for Discover Weekly.