Data Lake & Lakehouse

The data lakehouse pattern combines the low-cost, schema-flexible storage of a data lake (files on object storage: S3, GCS, ADLS) with the ACID transactions, schema enforcement, and query optimization traditionally associated with data warehouses. The three leading open table formats, Delta Lake (Databricks), Apache Iceberg (Netflix, Apple), and Apache Hudi (Uber), keep transactional metadata (a commit log or a tree of snapshots) next to the data files on object storage. This metadata layer enables ACID writes, time travel, schema evolution, and partition pruning without a proprietary warehouse engine.
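To make the transaction-log idea concrete, here is a minimal sketch of replaying a commit log to find the data files visible at a given version. The `COMMITS` list and the `live_files` helper are hypothetical names for illustration; the `add`/`remove` actions are loosely modeled on Delta Lake's JSON commits, which carry many more fields in practice.

```python
import json

# Hypothetical commit log: one JSON-lines string per commit, loosely modeled
# on Delta Lake's "add" / "remove" actions. Real commits carry more fields.
COMMITS = [
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    '{"add": {"path": "part-002.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-003.parquet"}}',
]


def live_files(commits, version=None):
    """Replay commits 0..version (inclusive) and return the visible data files."""
    if version is None:
        version = len(commits) - 1  # default to the latest version
    if not 0 <= version < len(commits):
        raise ValueError(f"unknown version {version}")
    files = set()
    for commit in commits[: version + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


if __name__ == "__main__":
    print(live_files(COMMITS))     # files in the latest snapshot
    print(live_files(COMMITS, 1))  # time travel: files as of version 1
```

Because data files are never modified in place, reading an older version only means stopping the replay earlier. This is why time travel comes almost for free once the log exists.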

Key Points

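Delta Lake stores a _delta_log directory alongside Parquet data files; each transaction writes a new, sequentially numbered JSON commit file recording adds, removes, and metadata changes. A commit succeeds only if no other writer has already created that version, which gives optimistic concurrency and ACID guarantees without a centralized lock manager.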
Apache Iceberg uses a tree of metadata: table metadata → snapshot → manifest list → manifest files → data files; each snapshot is immutable, enabling time-travel queries and concurrent writers.
Apache Hudi supports two table types: Copy-On-Write (COW, rewrites Parquet on each update — better for read-heavy) and Merge-On-Read (MOR, appends delta logs, merges at read time — better for write-heavy).
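Time travel: SELECT * FROM orders TIMESTAMP AS OF '2024-01-01' or SELECT * FROM orders VERSION AS OF 12 (Delta Lake, where the version is a commit number). Iceberg accepts the same clauses in Spark SQL, with VERSION AS OF taking a snapshot ID such as 12345; Trino spells it FOR VERSION AS OF. Time travel is critical for audits, ML feature reproducibility, and debugging pipelines.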
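Schema evolution: adding, renaming, or dropping columns without rewriting data files. Iceberg assigns every column a unique field ID and tracks schema history in metadata, so old files are read under the current schema by ID rather than by column name or position; Delta Lake offers comparable rename and drop support through its column mapping feature.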
Partition evolution in Iceberg allows changing the partitioning scheme (e.g., from partition by month to partition by day) without rewriting existing data: new writes use the new partition spec, while existing files keep the spec they were written with.
Open table formats decouple storage from compute: the same Iceberg table can be queried by Spark, Flink, Trino, Dremio, Athena, Snowflake external tables, and BigQuery Omni.
Data compaction (small file problem): streaming writes produce many small Parquet files; scheduled compaction jobs merge them into larger files (128 MB–1 GB target) to improve read performance.
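The compaction step can be sketched as a bin-packing problem: group small files into batches that each fit under the target size, then rewrite every batch as one larger file. The sketch below only plans the batches. `DataFile`, `plan_compaction`, and the 128 MB default are illustrative assumptions, not the API of any table format, and real compaction jobs also respect partition boundaries and commit the rewrite atomically.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    path: str
    size_bytes: int


def plan_compaction(files, target_bytes=128 * MB):
    """Group files smaller than the target into batches of at most target_bytes.

    Uses first-fit decreasing bin packing. Batches holding a single file are
    dropped, because rewriting one file on its own gains nothing.
    """
    small = sorted(
        (f for f in files if f.size_bytes < target_bytes),
        key=lambda f: f.size_bytes,
        reverse=True,
    )
    batches = []  # each entry is [total_bytes, [files]]
    for f in small:
        for batch in batches:
            if batch[0] + f.size_bytes <= target_bytes:
                batch[0] += f.size_bytes
                batch[1].append(f)
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([f.size_bytes, [f]])
    return [members for _, members in batches if len(members) > 1]


if __name__ == "__main__":
    files = [
        DataFile("a.parquet", 200 * MB),  # already large, left alone
        DataFile("b.parquet", 60 * MB),
        DataFile("c.parquet", 50 * MB),
        DataFile("d.parquet", 40 * MB),
        DataFile("e.parquet", 30 * MB),
        DataFile("f.parquet", 10 * MB),
    ]
    for batch in plan_compaction(files):
        print([f.path for f in batch], sum(f.size_bytes for f in batch) // MB, "MB")
```

First-fit decreasing is a simple heuristic rather than an optimal packing, but it is usually good enough here because the goal is fewer, larger files rather than a perfectly filled target.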

Real-World Example

Netflix built Iceberg to solve concurrent-write correctness on its S3 data lake, where Spark jobs could conflict without ACID guarantees, and later open-sourced it as an Apache project. Uber built Apache Hudi to enable incremental data ingestion with upsert semantics, updating existing records in its 100+ PB Hive data lake without full rewrites.
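The trade-off behind Hudi's upserts, Copy-On-Write versus Merge-On-Read, can be illustrated with a toy in-memory model. The classes below are hypothetical and use plain dicts in place of Parquet base files and log files. They are a sketch of the idea under those assumptions, not Hudi's implementation or API.

```python
class CopyOnWriteTable:
    """Every upsert rewrites the base 'file' (here, a dict keyed by record key)."""

    def __init__(self, rows):
        self.base = dict(rows)
        self.files_rewritten = 0

    def upsert(self, rows):
        merged = dict(self.base)  # copy the whole base file,
        merged.update(rows)       # apply the changes,
        self.base = merged        # and swap in the new version
        self.files_rewritten += 1

    def read(self):
        return dict(self.base)    # cheap read: nothing to merge


class MergeOnReadTable:
    """Upserts append to a log; readers merge base and log on the fly."""

    def __init__(self, rows):
        self.base = dict(rows)
        self.log = []             # append-only list of change batches
        self.files_rewritten = 0

    def upsert(self, rows):
        self.log.append(dict(rows))  # cheap write: append only

    def read(self):
        view = dict(self.base)
        for batch in self.log:    # apply batches in commit order, latest wins
            view.update(batch)
        return view

    def compact(self):
        """Fold the log into the base so later reads are cheap again."""
        self.base = self.read()
        self.log.clear()
        self.files_rewritten += 1


if __name__ == "__main__":
    initial = {"order-1": "pending", "order-2": "pending"}
    cow, mor = CopyOnWriteTable(initial), MergeOnReadTable(initial)
    for table in (cow, mor):
        table.upsert({"order-1": "shipped"})
        table.upsert({"order-3": "pending"})
    print(cow.read() == mor.read())                    # both expose the same rows
    print(cow.files_rewritten, mor.files_rewritten)    # rewrite cost so far
```

Both tables return the same rows. The difference is where the merge cost lands: on every write for Copy-On-Write, and on reads (until compaction) for Merge-On-Read.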
