Saga Pattern
Distributed transactions; choreography-based vs orchestration-based sagas
The Saga pattern manages a distributed transaction across multiple microservices without Two-Phase Commit (2PC). It decomposes the transaction into a sequence of local transactions, each of which commits in its own service and then publishes an event or message that triggers the next step. If a step fails, the saga runs compensating transactions that semantically undo the already-completed steps in reverse order. This matters most for long-running business processes that span hours or days, where holding database locks is impractical. Sagas are either choreography-based (event-driven, decentralized) or orchestration-based (a central coordinator, usually easier to trace and debug).
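The choreography variant described above can be sketched with a tiny in-memory event bus. This is only an illustration under stated assumptions: the EventBus class, the event names (OrderCreated, InventoryReserved, PaymentFailed, and so on), and the three services are hypothetical, and a real system would use a message broker with durable, at-least-once delivery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Choreography sketch: services react to events; there is no central coordinator. */
class ChoreographySagaSketch {

    /** Hypothetical in-memory stand-in for a message broker. */
    static class EventBus {
        private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
        final List<String> published = new ArrayList<>(); // event trace for inspection

        void subscribe(String event, Consumer<String> handler) {
            handlers.computeIfAbsent(event, k -> new ArrayList<>()).add(handler);
        }

        void publish(String event, String orderId) {
            published.add(event);
            for (Consumer<String> handler : handlers.getOrDefault(event, List.of())) {
                handler.accept(orderId);
            }
        }
    }

    /** Wires three hypothetical services together and runs one order through them. */
    static List<String> run(boolean paymentSucceeds) {
        EventBus bus = new EventBus();

        // Inventory service: reserves stock on OrderCreated, releases it on PaymentFailed.
        bus.subscribe("OrderCreated", id -> bus.publish("InventoryReserved", id));
        bus.subscribe("PaymentFailed", id -> bus.publish("InventoryReleased", id));

        // Payment service: charges once inventory is reserved, or reports failure.
        bus.subscribe("InventoryReserved", id ->
                bus.publish(paymentSucceeds ? "PaymentCharged" : "PaymentFailed", id));

        // Order service: finalizes the order based on the outcome events.
        bus.subscribe("PaymentCharged", id -> bus.publish("OrderConfirmed", id));
        bus.subscribe("InventoryReleased", id -> bus.publish("OrderCancelled", id));

        bus.publish("OrderCreated", "order-1");
        return bus.published;
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

With this wiring, run(true) produces the trace OrderCreated, InventoryReserved, PaymentCharged, OrderConfirmed, while run(false) produces OrderCreated, InventoryReserved, PaymentFailed, InventoryReleased, OrderCancelled. No component knows the whole flow; the compensation path exists only as a chain of subscriptions, which is what makes choreography decentralized and also harder to trace.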
Key Points
- Choreography-based saga: each service publishes an event on success; downstream services subscribe and execute their step; on failure, a service publishes a failure event triggering upstream compensations — fully decentralized.
- Orchestration-based saga: a saga orchestrator (for example, an Order Saga Orchestrator) sends commands to each service and waits for success or failure replies; on failure, it sends compensation commands in reverse order, which keeps the flow explicit and debuggable.
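- Compensating transactions must be idempotent, because they may be retried, and commutative where possible, because messages can arrive out of order. A compensating transaction is not a database rollback: it is a new transaction that semantically reverses the effect (for example, a CancelReservation command rather than a DELETE of the row).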
- Temporal.io and AWS Step Functions are popular saga orchestration platforms: they persist execution state, handle retries with exponential backoff, and provide visual workflow inspection.
- Saga execution log: record each step outcome (STARTED, COMPLETED, COMPENSATING, COMPENSATED) in a durable store to survive orchestrator crashes and support recovery without re-executing completed steps.
- Isolation in sagas: unlike ACID transactions, saga steps are visible to other transactions as they complete (no global isolation); handle dirty reads via reservation/hold patterns (reserve inventory before confirming order).
- Pivot transaction: the go/no-go point of a saga. Steps before the pivot are compensatable; once the pivot commits, the saga is committed to finishing, so steps after it must be retriable until they succeed. The pivot thus separates the compensatable phase from the retriable phase.
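- Countermeasures for lack of isolation: semantic locks (flag a record as in-process until the saga finishes), commutative updates (operations that give the same result in any order), pessimistic view (reorder steps so the riskiest updates happen last and dirty reads do less harm), reread value (verify that a record is unchanged before updating it, and abort if it changed).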
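The saga execution log from the list above can be sketched as follows. This is a minimal illustration, not a production design: the SagaLogSketch and Step names are hypothetical, the log is an in-memory map standing in for a durable store, and a step left in STARTED after a crash would need an idempotent retry or a status lookup that this sketch does not cover.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a saga execution log; a real one would live in a durable store. */
class SagaLogSketch {

    enum Status { STARTED, COMPLETED, COMPENSATING, COMPENSATED }

    /** Hypothetical step: a forward action plus its compensation. */
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable compensation;

        Step(String name, Runnable action, Runnable compensation) {
            this.name = name;
            this.action = action;
            this.compensation = compensation;
        }
    }

    // Step name -> last recorded status, kept in execution order.
    private final Map<String, Status> log = new LinkedHashMap<>();

    Status statusOf(String stepName) {
        return log.get(stepName);
    }

    /** Runs the steps, skipping any the log already marks COMPLETED. Returns true on success. */
    boolean run(List<Step> steps) {
        List<Step> done = new ArrayList<>();
        for (Step step : steps) {
            if (log.get(step.name) == Status.COMPLETED) {
                done.add(step); // finished in an earlier run: do not re-execute
                continue;
            }
            log.put(step.name, Status.STARTED);
            try {
                step.action.run();
                log.put(step.name, Status.COMPLETED);
                done.add(step);
            } catch (RuntimeException e) {
                compensate(done); // the failed step itself stays in STARTED
                return false;
            }
        }
        return true;
    }

    /** Compensates completed steps in reverse order, recording each transition. */
    private void compensate(List<Step> done) {
        for (int i = done.size() - 1; i >= 0; i--) {
            Step step = done.get(i);
            log.put(step.name, Status.COMPENSATING);
            step.compensation.run();
            log.put(step.name, Status.COMPENSATED);
        }
    }
}
```

Calling run a second time on the same log, as a restarted orchestrator would, skips steps already marked COMPLETED but still includes them in the reverse-order compensation if a later step fails.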
Orchestration-based saga using Temporal.io; compensating transactions run in reverse on failure
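// Orchestration-based Order Saga (Temporal.io workflow, Java SDK)
import io.temporal.activity.ActivityOptions;
import io.temporal.failure.ActivityFailure;
import io.temporal.failure.ApplicationFailure;
import io.temporal.workflow.Workflow;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;
import java.time.Duration;

@WorkflowInterface
interface OrderSaga {
    @WorkflowMethod
    OrderResult processOrder(OrderRequest req);
}

class OrderSagaImpl implements OrderSaga {
    // Activity stubs; the 30-second timeout is an illustrative value, not a recommendation
    private final ActivityOptions options = ActivityOptions.newBuilder()
            .setStartToCloseTimeout(Duration.ofSeconds(30))
            .build();
    private final PaymentActivity payment = Workflow.newActivityStub(PaymentActivity.class, options);
    private final InventoryActivity inventory = Workflow.newActivityStub(InventoryActivity.class, options);
    private final ShippingActivity shipping = Workflow.newActivityStub(ShippingActivity.class, options);

    @Override
    public OrderResult processOrder(OrderRequest req) {
        String paymentId = null;
        String reservationId = null;
        try {
            // Step 1: Reserve inventory (compensatable)
            reservationId = inventory.reserve(req.items());
            // Step 2: Charge payment (compensatable)
            paymentId = payment.charge(req.customerId(), req.amount());
            // Step 3 (pivot): Confirm & ship; nothing is compensated once this succeeds
            return shipping.schedule(reservationId, req.address());
        } catch (ActivityFailure e) {
            // Compensate in reverse order; the null checks skip steps that never completed
            if (paymentId != null) payment.refund(paymentId);
            if (reservationId != null) inventory.release(reservationId);
            // ApplicationFailure has no public constructor; use its factory method
            throw ApplicationFailure.newFailureWithCause("Order saga failed", "OrderSagaFailed", e);
        }
    }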
}
Real-World Example
Requesting a ride at Uber can be modeled as a saga: reserve a driver, charge the rider, confirm the trip, with a compensating transaction for each step (release the driver, refund the rider, cancel the trip). Uber's orchestration engine for such workflows is Cadence, the open-source project from which Temporal was forked. Foodpanda processes order sagas across restaurant confirmation, payment, and delivery assignment using AWS Step Functions, with compensations for partial failures.