FinOps & Cost Observability
Cost allocation tags, anomaly detection, rightsizing, budget alerts
FinOps (Cloud Financial Operations) is the practice of bringing financial accountability to the variable spend model of cloud infrastructure; it requires engineering, finance, and product teams to collaborate on cost decisions. Cost allocation tags are the foundation: every cloud resource must be tagged with team, service, environment, and cost-center so that spend can be attributed precisely for chargeback and accountability. Cost anomaly detection (AWS Cost Anomaly Detection, Azure Cost Management anomaly alerts) catches unexpected spend spikes before they become month-end surprises. The FinOps Foundation's maturity model has three levels, Crawl, Walk, and Run, assessed separately for each FinOps capability; for cost visibility the progression typically runs from Crawl (basic tagging) to Walk (showback to teams) to Run (real-time cost optimization embedded in engineering workflows).
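The tagging requirement above can be expressed as a simple admission check. This is a minimal sketch in plain Python, not the API of AWS Config, Azure Policy, or GCP Organization Policies; the function names and sample resources are hypothetical, and only the four required keys come from the text.

```python
# Minimal sketch of a tag-or-block admission check (hypothetical helper names).
# The required keys match the tagging standard described above.
REQUIRED_TAGS = ("team", "service", "environment", "cost-center")


def missing_tags(tags: dict[str, str]) -> list[str]:
    """Return required tag keys that are absent or have blank values."""
    return [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]


def admit(resource_id: str, tags: dict[str, str]) -> tuple[bool, str]:
    """Allow creation only when every required tag is present and non-empty."""
    missing = missing_tags(tags)
    if missing:
        return False, f"{resource_id}: blocked, missing tags {', '.join(missing)}"
    return True, f"{resource_id}: allowed"


if __name__ == "__main__":
    # Hypothetical resources: one fully tagged, one with gaps.
    print(admit("i-0abc", {"team": "payments", "service": "checkout",
                           "environment": "prod", "cost-center": "CC-1042"}))
    print(admit("i-0def", {"team": "search", "environment": ""}))
```

Treating a blank value the same as a missing key matters in practice: a tag like `environment=""` satisfies a naive "key exists" check but is useless for allocation.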
Key Points
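- Cost allocation tags: enforce preventively with tag-or-block guardrails (Azure Policy with a deny effect, AWS service control policies or tag policies, GCP Organization Policies) so untagged resources cannot be created; back these up with detective checks such as the AWS Config `required-tags` rule to catch resources that slip through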
- Showback vs chargeback: showback reports costs to teams without transferring budget; chargeback transfers actual cloud spend to team P&Ls — requires organizational maturity and trust in tag accuracy
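- Reserved Instances / Savings Plans / Committed Use Discounts: 1-year commitments typically save roughly 30–40% and 3-year commitments roughly 50–60% versus on-demand; Compute Savings Plans are the most flexible, applying across EC2 instance families, sizes, and Regions as well as to Fargate and Lambda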
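- Spot/Preemptible instances: 60–90% discount; suitable for batch jobs, ML training, and CI runners; because the provider can reclaim capacity with short notice (two minutes on AWS Spot, 30 seconds on GCP Spot/preemptible VMs), workloads need checkpointing or graceful interruption handling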
- Rightsizing: analyze CPU/memory utilization over 14–30 days; AWS Compute Optimizer and Azure Advisor provide automated rightsizing recommendations; target 50–70% average CPU utilization
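- Cost anomaly detection thresholds: a common starting point is to alert when a service's or team's daily spend exceeds its 7-day rolling average by 10–15%; route alerts to Slack so they arrive well before the monthly bill; AWS Cost Anomaly Detection replaces the fixed threshold with ML-based detection, and Kubecost covers Kubernetes spend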
- Kubernetes cost visibility: Kubecost and OpenCost (a CNCF project) allocate cluster costs down to the namespace and deployment level by combining each container's resource requests and actual usage with node pricing; critical for multi-tenant clusters
- FinOps culture: engineers must see cost as a first-class architectural quality attribute; add cost impact to PR templates, architecture review checklists, and incident postmortems
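The showback-versus-chargeback bullet hinges on tag accuracy. A minimal sketch, assuming billing line items have already been exported as dicts with a `cost` and a `tags` map (a hypothetical shape, not any provider's export format), is to roll cost up by `team` and report how much spend is attributable at all:

```python
from collections import defaultdict


def showback_report(line_items):
    """Sum cost per team tag; items without a team tag land in an 'untagged' bucket."""
    totals = defaultdict(float)
    for item in line_items:
        team = item.get("tags", {}).get("team") or "untagged"
        totals[team] += item["cost"]
    return dict(totals)


def tag_coverage(report):
    """Fraction of total spend that could be attributed to a team."""
    total = sum(report.values())
    return 1.0 - report.get("untagged", 0.0) / total if total else 0.0


if __name__ == "__main__":
    items = [  # hypothetical line items in USD
        {"cost": 120.0, "tags": {"team": "payments"}},
        {"cost": 30.0, "tags": {"team": "search"}},
        {"cost": 50.0, "tags": {}},
    ]
    report = showback_report(items)
    print(report, f"coverage={tag_coverage(report):.0%}")
```

A low coverage figure is a signal to stay with showback; charging teams for spend that cannot be attributed reliably undermines the trust chargeback depends on.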
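Whether a commitment pays off depends on utilization. Under a simplified model (ignoring upfront-payment options, and treating a Savings Plan's $/hour commitment like reserved instance-hours), a commitment at discount $d$ is billed for every hour while on-demand is billed only for hours used, so the commitment wins once utilization exceeds $1 - d$. The rates below are hypothetical:

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def on_demand_cost(hourly_rate, hours_used):
    """On-demand: pay only for the hours actually used."""
    return hourly_rate * hours_used


def committed_cost(hourly_rate, discount, hours=HOURS_PER_MONTH):
    """Commitment: pay the discounted rate for every hour in the period, used or not."""
    return hourly_rate * (1 - discount) * hours


def break_even_utilization(discount):
    """rate * (1 - d) * H < rate * U * H  simplifies to  U > 1 - d."""
    return 1 - discount


if __name__ == "__main__":
    rate, discount, used = 0.10, 0.35, 500  # hypothetical $/h, 1-year discount, hours used
    print(f"on-demand: ${on_demand_cost(rate, used):.2f}")
    print(f"committed: ${committed_cost(rate, discount):.2f}")
    print(f"break-even utilization: {break_even_utilization(discount):.0%}")
```

With a 35% discount the break-even is 65% utilization; 500 of 730 hours (about 68%) is just above it, which is why commitments are usually sized to the steady baseline rather than to peak usage.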
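For the Spot bullet, graceful interruption handling usually means saving progress when a termination signal arrives and resuming from the last checkpoint on the replacement instance. This sketch assumes the platform delivers SIGTERM before reclaiming capacity (for example, a Kubernetes pod evicted from a draining Spot node); the checkpoint path, file format, and the doubling "work" are hypothetical stand-ins.

```python
import json
import os
import signal
import tempfile

stop_requested = False  # flipped by the SIGTERM handler


def _on_sigterm(signum, frame):
    """Record the interruption; the main loop checks the flag between items."""
    global stop_requested
    stop_requested = True


def load_checkpoint(path):
    """Return the index of the next unprocessed item (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write progress atomically: write a temp file, then rename it over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run_batch(items, path, should_stop, checkpoint_every=100):
    """Resume from the last checkpoint; on interruption, save progress and return early."""
    i = load_checkpoint(path)
    done = []
    while i < len(items):
        if should_stop():
            save_checkpoint(path, i)
            return done, False
        done.append(items[i] * 2)  # stand-in for the real unit of work
        i += 1
        if i % checkpoint_every == 0:
            save_checkpoint(path, i)
    save_checkpoint(path, i)
    return done, True


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, _on_sigterm)
    ckpt = os.path.join(tempfile.gettempdir(), "batch-checkpoint.json")  # hypothetical location
    done, finished = run_batch(list(range(1000)), ckpt, lambda: stop_requested)
    print(f"processed {len(done)} items this run, finished={finished}")
```

Passing `should_stop` as a callable keeps the loop testable without sending real signals. The checkpoint records only the position; a real job would also persist the outputs produced so far, and periodic checkpoints bound the lost work if the process is killed before the handler runs.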
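The rightsizing bullet's 50–70% target can be turned into a size recommendation. This is a simplified sketch, not how AWS Compute Optimizer or Azure Advisor work internally: it assumes CPU demand stays fixed so utilization scales with the ratio of vCPU counts, uses a hypothetical size ladder, takes 70% as the ceiling for average CPU, and adds a p95 guard (an assumption, not from the text) so bursty workloads are not squeezed. Memory is ignored here.

```python
from statistics import mean

SIZES_VCPU = [2, 4, 8, 16, 32]  # hypothetical size ladder within one instance family


def p95(samples):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def recommend_vcpus(samples, current_vcpus, avg_ceiling=70.0, peak_ceiling=90.0):
    """Smallest size whose projected average CPU <= avg_ceiling and projected p95 <= peak_ceiling.

    Assumes fixed CPU demand, so utilization scales by current_vcpus / candidate_vcpus.
    Returns None if no size in the ladder fits (scale out instead).
    """
    avg, peak = mean(samples), p95(samples)
    for size in SIZES_VCPU:
        scale = current_vcpus / size
        if avg * scale <= avg_ceiling and peak * scale <= peak_ceiling:
            return size
    return None


if __name__ == "__main__":
    # Hypothetical hourly CPU % samples: mostly 30%, with 10% of hours at 80%.
    samples = [30.0] * 90 + [80.0] * 10
    print(recommend_vcpus(samples, current_vcpus=8))
```

In this example, average CPU alone would allow halving to 4 vCPUs (projected 70% average), but the p95 of 80% would double to 160%, so the guard keeps the current 8 vCPUs. The same function can also recommend upsizing when a host runs hot.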
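The fixed-threshold rule in the anomaly-detection bullet (spend more than 10–15% above the trailing 7-day average) is easy to sketch. This is the simple static rule, not the ML model AWS Cost Anomaly Detection uses; the function names and spend data are hypothetical, and actually posting to Slack is left out.

```python
def detect_anomalies(daily_spend, window=7, threshold=0.15):
    """Flag days whose spend exceeds the trailing `window`-day average by more than `threshold`."""
    alerts = []
    for day in range(window, len(daily_spend)):
        baseline = sum(daily_spend[day - window:day]) / window
        if daily_spend[day] > baseline * (1 + threshold):
            alerts.append((day, daily_spend[day], baseline))
    return alerts


def format_alert(team, day, spend, baseline):
    """Build a short message body, e.g. for a Slack incoming webhook (posting not shown)."""
    pct = (spend / baseline - 1) * 100
    return (f"[cost anomaly] {team}: day {day} spend ${spend:,.2f} "
            f"is {pct:.0f}% above trailing average ${baseline:,.2f}")


if __name__ == "__main__":
    spend_by_team = {  # hypothetical daily spend in USD, one series per team
        "payments": [100.0] * 7 + [110.0, 130.0],
        "search": [50.0] * 9,
    }
    for team, series in spend_by_team.items():
        for day, spend, baseline in detect_anomalies(series):
            print(format_alert(team, day, spend, baseline))
```

Running the check per team or service, rather than on the account total, keeps a spike in one small budget from being diluted by everyone else's steady spend.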
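The Kubernetes bullet's allocation idea can be illustrated by charging each container for the larger of its request and its usage, which is how the OpenCost specification defines allocation, and summing per namespace. The per-core and per-GiB prices below are hypothetical blended figures; real tools derive them from each node's price.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical blended prices derived from node pricing (USD).
CPU_CORE_HOUR = 0.031
MEM_GIB_HOUR = 0.004


@dataclass
class Container:
    namespace: str
    cpu_request: float      # cores
    cpu_usage: float        # average cores actually used
    mem_request_gib: float
    mem_usage_gib: float


def namespace_costs(containers, hours):
    """Charge each container for max(request, usage) of CPU and memory, summed per namespace."""
    costs = defaultdict(float)
    for c in containers:
        cpu = max(c.cpu_request, c.cpu_usage)
        mem = max(c.mem_request_gib, c.mem_usage_gib)
        costs[c.namespace] += (cpu * CPU_CORE_HOUR + mem * MEM_GIB_HOUR) * hours
    return dict(costs)


if __name__ == "__main__":
    pods = [
        Container("checkout", cpu_request=2.0, cpu_usage=1.0, mem_request_gib=4.0, mem_usage_gib=3.0),
        Container("search", cpu_request=0.5, cpu_usage=1.5, mem_request_gib=2.0, mem_usage_gib=1.0),
    ]
    for ns, cost in namespace_costs(pods, hours=10).items():
        print(f"{ns}: ${cost:.3f}")
```

Using the larger of request and usage means over-requesting teams pay for capacity they reserved, and under-requesting teams pay for what they actually consumed. Node capacity claimed by no container is left out here; tools such as Kubecost report it as a separate idle allocation.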
Real-World Example
Spotify built Cost Insights, a plugin for its Backstage developer portal, which shows each engineering team its daily cloud spend on the team's page in Backstage; embedding cost visibility directly into the developer workflow reduced annual cloud spend by 30% within 18 months.