Event-Driven Architecture
Choreography vs orchestration, event sourcing, event schema versioning
Event-driven architecture (EDA) decouples producers and consumers through events — immutable records of facts that occurred. The two coordination models are choreography (services react to events independently, no central coordinator) and orchestration (a central workflow engine directs service interactions). Event sourcing persists the complete history of domain events rather than current state, enabling time-travel debugging, audit trails, and projection of any read model. CQRS (Command Query Responsibility Segregation) is frequently paired with event sourcing: commands produce events, events build read-optimized projections, queries hit projections.
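The relationship between commands, events, and projections can be made concrete with a small in-memory sketch. Everything here is illustrative rather than a reference implementation: `Event`, `EventStore`, `handle_withdraw`, and `balances` are hypothetical names, and a real system would use a durable, concurrency-safe log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An immutable fact. The fields are illustrative assumptions."""
    account_id: str
    kind: str      # "Deposited" or "Withdrawn"
    amount: int


class EventStore:
    """Append-only in-memory log standing in for a durable event store."""

    def __init__(self):
        self._log = []

    def append(self, event):
        self._log.append(event)

    def read(self, upto=None):
        # Reading a prefix of the log gives the history as of an earlier point.
        return list(self._log if upto is None else self._log[:upto])


def balances(events):
    """Query side: fold events into a read model of balance per account."""
    view = {}
    for e in events:
        delta = e.amount if e.kind == "Deposited" else -e.amount
        view[e.account_id] = view.get(e.account_id, 0) + delta
    return view


def handle_withdraw(store, account_id, amount):
    """Command side: validate against replayed state, then record an event."""
    if balances(store.read()).get(account_id, 0) < amount:
        raise ValueError("insufficient funds")
    store.append(Event(account_id, "Withdrawn", amount))


store = EventStore()
store.append(Event("acc-1", "Deposited", 100))
handle_withdraw(store, "acc-1", 30)
store.append(Event("acc-1", "Deposited", 5))

current = balances(store.read())                   # state after all events
as_of_second_event = balances(store.read(upto=2))  # state after the first two
```

The command handler never mutates a balance directly; it only appends a fact. Any number of read models can be derived from the same log, and replaying a prefix of the log reconstructs past state.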
Key Points
- Choreography is highly decoupled but harder to debug — tracing a business process requires correlating events across multiple services' logs; distributed tracing with correlation IDs is essential.
- Orchestration (Temporal, Conductor, AWS Step Functions) provides explicit business process visibility and easier debugging but centralizes control, which can become a bottleneck.
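- Event sourcing stores events as the source of truth in an append-only log; current state is computed by replaying events, often starting from a periodic snapshot so long streams stay cheap to load. This gives a complete audit trail and lets derived state be corrected retroactively by replaying the log with fixed logic.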
- Event schema evolution is non-trivial — once an event is published and consumed by multiple downstreams, schema changes require backward-compatible additions or versioned event types.
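- Apache Kafka is typically run with at-least-once delivery, with consumers tracking progress through committed offsets; idempotent producers and transactions can strengthen this to exactly-once semantics for processing that stays within Kafka. Kafka Streams and ksqlDB enable stateful stream processing (joins, aggregations, windowing) on event streams.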
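- Exactly-once effects in practice mean at-least-once delivery plus deduplication: idempotent consumers (deduplicate by event ID) absorb redeliveries, and a transactional outbox (write the state change and an outbox row in the same database transaction, then have a separate relay publish that row to the broker) ensures an event is published if and only if the state change committed.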
- Event-driven systems are eventually consistent by design — designing UX for "your order is being processed" rather than "your order is placed" is an organizational as much as a technical challenge.
- Dead Letter Queue (DLQ): events that cannot be processed after max retries go to a DLQ for inspection and manual replay — essential for production reliability.
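Several of the points above (versioned schemas, deduplication by event ID, bounded retries, and a DLQ) meet in the consumer. The following is a minimal in-memory sketch under stated assumptions: the event shape, the `Consumer` class, the `upcast` helper, and the "USD" default for version 1 events are all hypothetical, and the outbox relay is not shown.

```python
def upcast(event):
    """Upgrade a hypothetical v1 event to v2, which added a `currency` field."""
    if event.get("version", 1) == 1:
        # Assumed default: v1 events predate multi-currency support.
        return {**event, "version": 2, "currency": "USD"}
    return event


class Consumer:
    """Wraps a handler with deduplication, bounded retries, and a DLQ."""

    def __init__(self, handler, max_retries=3):
        self.handler = handler
        self.max_retries = max_retries
        self.seen_ids = set()     # stand-in for a durable deduplication table
        self.dead_letters = []    # stand-in for a DLQ topic

    def consume(self, event):
        """Return "processed", "duplicate", or "dead-lettered"."""
        if event["id"] in self.seen_ids:
            return "duplicate"
        last_error = None
        for _ in range(self.max_retries):
            try:
                self.handler(upcast(event))
            except Exception as exc:  # broad on purpose: any failure is retried
                last_error = exc
            else:
                self.seen_ids.add(event["id"])
                return "processed"
        # Retries exhausted: park the original event for inspection and replay.
        self.dead_letters.append({"event": event, "error": repr(last_error)})
        return "dead-lettered"


totals = {}


def add_to_totals(event):
    """Example handler: sum amounts per currency, rejecting bad input."""
    if event["amount"] < 0:
        raise ValueError("negative amount")
    totals[event["currency"]] = totals.get(event["currency"], 0) + event["amount"]


consumer = Consumer(add_to_totals)
results = [
    consumer.consume({"id": "e1", "version": 1, "amount": 10}),
    consumer.consume({"id": "e1", "version": 1, "amount": 10}),  # redelivery
    consumer.consume({"id": "e2", "version": 2, "amount": 5, "currency": "EUR"}),
    consumer.consume({"id": "e3", "version": 2, "amount": -1, "currency": "EUR"}),
]
```

In a real consumer, recording the event ID and applying the handler's side effect would need to happen in one transaction; otherwise a crash between the two steps reintroduces duplicates or losses. Retries here are immediate for brevity, whereas production consumers usually back off between attempts.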
Real-World Example
Uber's Marketplace uses event choreography via Apache Kafka for the core trip lifecycle: 30+ services react independently to "TripRequested", "DriverAccepted", "TripStarted", and "TripCompleted" events. This decoupling allows Uber to deploy individual services (Surge Pricing, ETA, Maps) without coordinating across all teams, a critical enabler for their 4,000-engineer organization.
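The choreography pattern in this example can be sketched generically. This is not Uber's implementation: `EventBus` is a hypothetical in-memory stand-in for a broker such as Kafka, the three services and the fare value are invented for illustration, and only the event names come from the text above. Each service subscribes on its own, and a correlation ID travels with every event so the flow can be traced afterwards.

```python
from collections import defaultdict, deque


class EventBus:
    """In-memory stand-in for a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._queue = deque()
        self.trace = []   # (correlation_id, event_type) in delivery order

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, correlation_id, **data):
        event = {"type": event_type, "correlation_id": correlation_id, **data}
        self._queue.append(event)

    def run(self):
        """Deliver queued events until no service emits anything further."""
        while self._queue:
            event = self._queue.popleft()
            self.trace.append((event["correlation_id"], event["type"]))
            for handler in self._subscribers[event["type"]]:
                handler(event)


bus = EventBus()
quotes = {}
notified = []


def matching_service(event):
    # Reacts to a request by emitting a follow-up event with the same ID.
    bus.publish("DriverAccepted", event["correlation_id"], driver="driver-7")


def pricing_service(event):
    # Reacts to the same request independently; the fare is a made-up value.
    quotes[event["correlation_id"]] = 12.5


def notification_service(event):
    notified.append((event["correlation_id"], event["driver"]))


# No service calls another directly; the only coupling is the event names.
bus.subscribe("TripRequested", matching_service)
bus.subscribe("TripRequested", pricing_service)
bus.subscribe("DriverAccepted", notification_service)

bus.publish("TripRequested", "trip-42", rider="rider-1")
bus.run()
```

Adding another reaction to "TripRequested" means adding one more `subscribe` call, with no change to existing services. The cost is that the overall flow exists only implicitly, which is why `trace` records the correlation ID for every delivery.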