OLTP vs OLAP
Transactional vs analytical workloads, HTAP databases
OLTP (Online Transaction Processing) systems handle high-frequency, short-duration transactions touching few rows — optimized for row-level access, high write throughput, and low latency (P99 < 10ms). OLAP (Online Analytical Processing) systems run complex aggregation queries scanning millions to billions of rows — optimized for columnar storage, vectorized execution, and massively parallel processing (MPP). HTAP (Hybrid Transactional/Analytical Processing), exemplified by TiDB, SingleStore, and Google AlloyDB, attempts to serve both workloads from a single engine using row + columnar storage replicas.
Key Points
- OLTP: row-oriented storage (InnoDB, PostgreSQL heap) is optimal because a transaction reads/writes a complete row — all columns of one entity in sequence on disk.
- OLAP: columnar storage (Parquet, ORC, Redshift, BigQuery) stores all values of one column contiguously — a COUNT(sales) over 1 billion rows reads only the sales column, skipping all others.
- Columnar compression rates: run-length encoding on sorted data can achieve 10–100x compression (a "status" column with 3 distinct values compresses extremely well).
- OLAP MPP engines (Redshift, Snowflake, BigQuery) distribute a query across hundreds of workers, each reading a different partition of the data in parallel.
- The ETL/ELT pipeline traditionally separated OLTP from OLAP: nightly batch jobs loaded data from operational DBs into the warehouse; this created 12–24 hour data freshness gaps.
- HTAP with TiDB: TiKV (row store, Raft-based) handles OLTP; TiFlash (columnar replica) handles OLAP; the engine routes queries to the appropriate storage automatically.
- Materialized views and summary tables are a lightweight HTAP workaround: pre-compute aggregates in the OLTP DB for common reporting queries, refreshed every 1–15 minutes.
- Star schema (OLAP) vs 3NF (OLTP): star schema denormalizes dimension data into wide fact tables to eliminate joins at query time; typical star schemas have 50–200 columns in fact tables.
Real-World Example
Uber migrated from PostgreSQL to a custom OLAP system (Apache Hive, then ClickHouse) for analytical workloads as their operational DB could not sustain analytical queries without impacting trip booking latency. Netflix uses Druid for real-time OLAP (event-level data, sub-second queries) and Hive/Spark for historical batch analytics.