Databases & Data Architecture
Storage engines, data models, distributed data, and modern data platforms
SQL Databases: PostgreSQL, MySQL; ACID, transactions, isolation levels, MVCC
NoSQL Databases: Document, key-value, wide-column, graph databases and their use cases
Database Selection: Consistency needs, query patterns, scale, operational complexity, cost
Data Modeling: Normalization (1NF–3NF, BCNF), denormalization, schema evolution
Indexing Strategies: B-tree, hash, composite, covering, partial, full-text, vector indexes
Query Optimization: EXPLAIN plans, N+1 problem, eager vs lazy loading, query hints
ACID Transactions: Atomicity, consistency, isolation, durability; distributed transactions (2PC)
CAP Applied to Databases: Choosing CP vs AP databases for specific use cases
Consistent Hashing: Virtual nodes, hot-spot avoidance, rebalancing on node changes
Sharding & Partitioning: Horizontal sharding, range vs hash vs directory-based, cross-shard queries
Replication: Master-slave, multi-master, read replicas, replication lag, conflict resolution
OLTP vs OLAP: Transactional vs analytical workloads, HTAP databases
Data Warehousing: Star/snowflake schema, columnar storage (Redshift, BigQuery, Snowflake)
Data Lake & Lakehouse: Delta Lake, Apache Iceberg, Apache Hudi; ACID on object storage
Data Pipelines & ETL/ELT: Batch vs streaming, Apache Spark, Flink, dbt, Airflow
Event Streaming as Storage: Kafka log compaction, event sourcing on Kafka, schema registry
Database Reliability: Connection pooling, backups, point-in-time recovery, failover
Vector Databases: Embeddings, ANN search, HNSW index, pgvector, Pinecone, Weaviate
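
To make the Consistent Hashing topic (virtual nodes, rebalancing on node changes) concrete, here is a minimal Python sketch of a hash ring. It is an illustration under stated assumptions, not a reference implementation: the class name `HashRing`, the node names `db-a` through `db-d`, the default of 100 virtual nodes per physical node, and the use of MD5 for ring placement are all hypothetical choices made for this example.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit ring position (MD5 is used for spread, not security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; each physical node owns `vnodes` points."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place several virtual nodes per physical node to even out the load.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue  # position already taken (very unlikely); keep existing owner
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        # Drop every virtual node that belongs to the departing physical node.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First ring position clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Usage: adding a node only moves keys onto the new node.
ring = HashRing(["db-a", "db-b", "db-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("db-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to db-d:",
      all(after[k] == "db-d" for k in moved))
```

The property this shows is the one that matters for rebalancing: when a node joins, a key either keeps its owner or moves to the new node, so existing nodes never trade keys with each other. Raising `vnodes` generally smooths the distribution across nodes at the cost of a larger ring to store and search.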