Cloud cost management (FinOps) is the discipline of bringing financial accountability to variable cloud spend. The FinOps Foundation maturity model defines three stages: Crawl (basic tagging, cost reports), Walk (chargeback/showback, rightsizing), and Run (automated anomaly response, unit economics). AWS Compute Savings Plans offer up to 66% off On-Demand rates in exchange for a 1- or 3-year commitment to a fixed $/hour spend level, which is billed whether or not it is fully used. Spot Instances offer up to 90% savings (commonly 60–90%) but can be reclaimed by AWS with a two-minute interruption notice.
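
The commitment trade-off can be made concrete with a little arithmetic. The sketch below is a simplification, not AWS's billing algorithm: it assumes a single flat discount for all covered usage (real Savings Plan rates vary by instance family, Region, term, and payment option), and the function name and parameters are illustrative.

```python
def hourly_cost_with_savings_plan(on_demand_usage: float,
                                  commitment: float,
                                  discount: float) -> float:
    """Estimate the billed cost for one hour under a Savings Plan.

    on_demand_usage: what the hour's usage would cost at On-Demand rates ($).
    commitment:      committed spend in $/hour, billed even if unused.
    discount:        assumed flat fractional discount, e.g. 0.5 for 50%.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    if on_demand_usage < 0 or commitment < 0:
        raise ValueError("usage and commitment must be non-negative")

    # Each committed dollar buys 1 / (1 - discount) dollars of
    # On-Demand-equivalent usage.
    covered = commitment / (1 - discount)

    # Usage beyond what the commitment covers is billed at On-Demand rates.
    overflow = max(0.0, on_demand_usage - covered)
    return commitment + overflow
```

With a hypothetical flat 50% discount, a $4/hour commitment covers $8/hour of On-Demand-equivalent usage: an hour with $10 of usage costs $6, while an hour with only $5 of usage still costs the full $4 commitment. Over-committing is therefore the main risk to model before buying.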

Key Points

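  • Savings Plans vs Reserved Instances: Compute Savings Plans are the most flexible option (the committed $/hr applies across instance family, size, Region, OS, and tenancy, and also covers Fargate and Lambda). EC2 Instance Savings Plans discount more deeply (up to 72%) but lock to one instance family in one Region, and Standard RIs lock to specific instance attributes. Prefer Compute Savings Plans for architectures whose instance mix changes often.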
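  • Spot Instance interruption rates vary by instance type and AZ. Use Spot Fleet or an Auto Scaling group with a mixed instances policy that spans several instance types and AZs, so that reclamation of one capacity pool does not remove all capacity. Figures such as ~95% Spot availability during capacity crunches are a rule of thumb, not a guarantee; keep an On-Demand base for workloads that cannot tolerate interruption.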
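  • AWS Cost Anomaly Detection (part of AWS Cost Management, alongside Cost Explorer) uses ML to flag spend that deviates from expected baselines. Create a cost monitor and an alert subscription: individual alerts can go to an SNS topic, while daily or weekly summaries go by email. Because billing data can lag by up to about a day, treat detection within roughly 24 hours as the realistic target.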
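  • Resource tagging strategy: enforce mandatory tags (Environment, Team, CostCenter, Product) via AWS Config rules, SCPs, or tag policies on AWS, and via Azure Policy on Azure. On AWS, tags must also be activated as cost allocation tags before they appear in billing reports. Untagged resources cannot be attributed to a team or budget.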
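  • EC2 rightsizing: review CloudWatch CPU and memory utilization over a 14-day window (memory metrics require the CloudWatch agent; they are not collected by default). Instances consistently below 20% CPU should be downsized, or moved to ARM-based Graviton instances (e.g. Graviton3), which AWS advertises as delivering up to 40% better price-performance than comparable x86 instances, provided the workload runs on ARM.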
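  • NAT Gateway data processing charges ($0.045/GB in us-east-1; other Regions differ) are often a hidden cost driver. Use gateway VPC Endpoints for S3 and DynamoDB, which carry no additional charge, and analyse VPC Flow Logs to identify which sources and destinations account for excessive NAT Gateway traffic.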
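  • S3 Intelligent-Tiering automatically moves objects between frequent, infrequent, and archive instant access tiers with no retrieval fee, in exchange for a small per-object monitoring and automation charge. Objects smaller than 128 KB are not monitored and stay at Frequent Access rates, so it is most cost-effective for objects >128 KB with unknown or changing access patterns.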
  • FinOps unit economics: measure cost per API request, cost per user, cost per transaction — these metrics tie cloud spend to business value and drive prioritisation of optimisation efforts.
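
Several of the checks above reduce to small rules that can be expressed as pure functions. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the inputs are plain Python values rather than live CloudWatch or billing data, and "consistently below 20%" is interpreted here as every daily average in a full 14-day window falling under the threshold.

```python
MANDATORY_TAGS = ("Environment", "Team", "CostCenter", "Product")


def missing_tags(tags: dict) -> list:
    """Return the mandatory tag keys that are absent or empty."""
    return [key for key in MANDATORY_TAGS if not tags.get(key)]


def is_rightsizing_candidate(daily_avg_cpu: list,
                             threshold: float = 20.0,
                             window_days: int = 14) -> bool:
    """Flag an instance whose daily average CPU (%) stayed below the
    threshold on every day of the most recent window.

    Returns False when fewer than `window_days` samples are available,
    so young instances are not flagged on thin evidence.
    """
    if len(daily_avg_cpu) < window_days:
        return False
    recent = daily_avg_cpu[-window_days:]
    return all(sample < threshold for sample in recent)


def unit_cost(total_cost: float, units: int) -> float:
    """Cost per unit of business output (request, user, transaction)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units
```

For example, a resource tagged only with Environment and Team is reported as missing CostCenter and Product, and $500 of spend serving 1,000 transactions gives a unit cost of $0.50 per transaction. In practice the utilisation samples would come from CloudWatch or AWS Compute Optimizer rather than a hand-built list.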

Real-World Example

Lyft reportedly reduced its AWS bill by $20M/year by implementing a FinOps practice with three parts: mandatory tagging enforcement via SCPs, automated rightsizing triggered by Lambda on 14-day low-utilisation alerts, and migration of 60% of batch workloads to Spot Instances.
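
Tag enforcement via SCPs, as in the example above, usually means denying resource creation when a required tag is absent from the request. The sketch below generates such a policy document for EC2 instance launches. It is an illustration of the general pattern, not the policy any particular company uses: the function name is hypothetical, only `ec2:RunInstances` is covered, and tag keys are assumed to be alphanumeric so they can be reused in the `Sid`. One statement is emitted per tag because condition keys inside a single statement are ANDed, which would only deny requests missing all tags at once.

```python
import json


def build_tag_enforcement_scp(required_tags: list) -> dict:
    """Build an SCP document that denies ec2:RunInstances when any
    required tag is missing from the launch request."""
    statements = []
    for tag in required_tags:
        statements.append({
            "Sid": f"DenyRunInstancesWithout{tag}",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            # "Null": "true" matches when the request carries no such tag.
            "Condition": {"Null": {f"aws:RequestTag/{tag}": "true"}},
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_tag_enforcement_scp(
        ["Environment", "Team", "CostCenter", "Product"]
    )
    print(json.dumps(policy, indent=2))
```

The resulting JSON can be attached to an organizational unit through AWS Organizations. Test it in a sandbox OU first, since a Deny at the SCP level also blocks automation and administrators in the affected accounts.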