Trade-Off Analysis
Cost vs complexity vs risk matrices, feasibility assessment, PoC scoping
Every architectural decision involves trade-offs. The ability to make them explicit, quantify them, and communicate them clearly separates a senior architect from a junior one. Trade-off analysis uses structured frameworks: decision matrices score options across weighted criteria; cost vs complexity vs risk triangles force explicit priority conversations; and feasibility assessments validate whether a proposed solution can realistically be built given team skills, time, and budget. Proofs of concept (PoCs) are time-boxed experiments (typically about one week, two at most) used to validate the highest-risk assumption in a design before committing to a full implementation.
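The decision-matrix scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the option names (`managed_queue`, `self_hosted_kafka`), the scores, and the helper functions `weighted_total` and `rank_options` are all hypothetical, and the weights are an example split that sums to 100%.

```python
# Weighted decision matrix: options are rows, weighted criteria are columns.
# Option names and scores are hypothetical and exist only for illustration.

WEIGHTS = {
    "performance": 0.30,
    "cost": 0.25,
    "operational_complexity": 0.25,
    "team_familiarity": 0.20,
}

# Scores run 1-5 and higher is always better
# (a 5 on "cost" means cheapest, a 5 on "operational_complexity" means simplest).
SCORES = {
    "managed_queue": {
        "performance": 3,
        "cost": 4,
        "operational_complexity": 5,
        "team_familiarity": 4,
    },
    "self_hosted_kafka": {
        "performance": 5,
        "cost": 3,
        "operational_complexity": 2,
        "team_familiarity": 2,
    },
}


def weighted_total(scores, weights):
    """Return the sum of score * weight over all criteria for one option."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    if set(scores) != set(weights):
        raise ValueError("every criterion needs exactly one score")
    if any(not 1 <= value <= 5 for value in scores.values()):
        raise ValueError("scores must be between 1 and 5")
    return sum(scores[criterion] * weight for criterion, weight in weights.items())


def rank_options(matrix, weights):
    """Return (option, total) pairs sorted from best to worst."""
    totals = {name: weighted_total(row, weights) for name, row in matrix.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank_options(SCORES, WEIGHTS):
        print(f"{name}: {total:.2f}")
```

With these made-up scores the managed option ranks first, mainly because operational complexity and team familiarity together carry 45% of the weight. Changing the weights changes the winner, which is why the weights should be agreed with stakeholders before anyone scores the options.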
Key Points
- Decision matrix: list options as rows and weighted criteria as columns (for example performance 30%, cost 25%, operational complexity 25%, team familiarity 20%, summing to 100%); score each cell 1–5, multiply by the column weight, and sum per option; the highest weighted total is the recommendation
- Three-axis trade-off: cost (build/run/operate), complexity (implementation risk, operational burden), and business risk (security, compliance, scalability ceiling); explicitly state which axis is the priority for this decision
- CAP theorem as a trade-off framework: during a network partition a distributed system must give up either consistency or availability, so choosing CP (strong consistency) vs AP (high availability) is a business trade-off; financial ledgers need CP, shopping carts tolerate AP, and the architect must surface this choice to product
- PoC scoping: a PoC answers one specific risky question, not all questions; define success criteria before starting ("PoC succeeds if we can achieve < 10ms p99 with 10k concurrent connections"); timebox it strictly, ideally to 5 working days and never beyond 2 weeks
- Reversibility is a key trade-off dimension: prefer reversible decisions (microservice boundary adjustments) over irreversible ones (choice of primary database); use the "one-way door vs two-way door" framework (Jeff Bezos)
- Total cost of ownership (TCO) vs initial cost: a cheap OSS solution with high operational complexity often costs more over 3 years than a paid SaaS with no ops overhead; always project 3-year TCO
- Second-order effects: every architectural choice has downstream impacts (choosing Kafka brings ZooKeeper or KRaft expertise, a schema registry, and consumer group monitoring with it); enumerate these before the decision
- Document rejected alternatives: Architecture Decision Records (ADRs) must include the alternatives considered and why they were rejected; this prevents the team from revisiting the same decision every 6 months when membership changes
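The TCO point above is easiest to see with numbers. The sketch below compares initial cost against projected 3-year TCO for two options. Every figure is invented for illustration (it is not a benchmark or a vendor quote), and `CostProfile`, `initial_cost`, `tco`, and the assumed blended engineering rate of 100 per hour are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProfile:
    name: str
    upfront: float           # one-time build or migration cost
    annual_license: float    # subscription or support fees per year
    annual_infra: float      # hosting cost per year
    annual_ops_hours: float  # engineer hours per year spent operating it


def initial_cost(profile):
    """Cash outlay visible on day one: upfront cost plus first-year license."""
    return profile.upfront + profile.annual_license


def tco(profile, years=3, hourly_rate=100.0):
    """Total cost of ownership over `years`, including operational labor."""
    annual = (
        profile.annual_license
        + profile.annual_infra
        + profile.annual_ops_hours * hourly_rate
    )
    return profile.upfront + years * annual


# Hypothetical figures: the OSS option is cheap to start but labor-heavy to run.
oss = CostProfile("self_hosted_oss", upfront=5_000, annual_license=0,
                  annual_infra=15_000, annual_ops_hours=600)
saas = CostProfile("paid_saas", upfront=10_000, annual_license=40_000,
                   annual_infra=0, annual_ops_hours=40)

if __name__ == "__main__":
    for option in (oss, saas):
        print(f"{option.name}: initial={initial_cost(option):,.0f} "
              f"3-year TCO={tco(option):,.0f}")
```

Under these assumptions the self-hosted option looks ten times cheaper on day one (5,000 vs 50,000) but costs 230,000 over three years against 142,000 for the SaaS option, because operational labor dominates. The conclusion flips if the ops-hours estimate is much lower, so that estimate deserves the most scrutiny.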
Real-World Example
Netflix explicitly chose AP over CP for their recommendation and user preference systems (Cassandra, eventual consistency). This deliberate trade-off means a customer's "continue watching" list may be briefly stale but still loads, preserving streaming availability as the top-priority NFR.