Cloud-Native Monitoring
CloudWatch, Azure Monitor, Google Cloud Operations Suite
Each major cloud provider offers a native monitoring suite that is deeply integrated with its managed services, which removes much of the operational burden of running Prometheus or ELK yourself. Amazon CloudWatch provides metrics, logs, traces (via X-Ray), and dashboards, with metric resolution down to 1 second for high-resolution custom metrics. Azure Monitor consolidates metrics, logs (Log Analytics Workspace, queried with KQL), Application Insights (APM), and alerting. Google Cloud Operations Suite (formerly Stackdriver, now branded Google Cloud Observability) integrates with GKE, Cloud Run, and BigQuery, and offers Managed Service for Prometheus for container workloads. All three support cross-service correlation and threshold-based alerting. Their anomaly-detection features differ by provider, and budget or billing alerts are available on each platform, in some cases through a separate cost-management service.
Key Points
- CloudWatch Metrics: 1-minute standard resolution, 1-second high-resolution; custom metrics via PutMetricData API; metric math enables cross-metric ratio computation
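- CloudWatch Logs Insights: purpose-built, pipe-delimited query language over log groups (commands such as fields, filter, stats, sort); Contributor Insights, a separate CloudWatch feature, identifies the top-N IP addresses or users contributing to traffic spikes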
- AWS X-Ray: distributed tracing with service map visualization; integrates with Lambda, API Gateway, ECS, EKS via X-Ray SDK or ADOT (AWS Distro for OpenTelemetry)
- Azure Monitor: Log Analytics Workspace stores logs in columnar format queryable with KQL; Application Insights provides end-to-end APM for web apps with automatic dependency tracking
- Azure Monitor Workbooks: interactive reports combining metrics, logs, and parameters; used for monthly SLO reporting shared with stakeholders
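- Google Cloud Monitoring: MQL (Monitoring Query Language) and PromQL for dashboards and alerts; Managed Service for Prometheus (GMP) for Kubernetes workloads, with long-term storage in Monarch, the same backend that serves Cloud Monitoring, rather than in storage you operate yourself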
- Google Cloud Logging: structured log viewer with resource-based filtering; Log-based metrics convert log patterns to Cloud Monitoring metrics
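- Cost consideration: CloudWatch custom metrics cost about $0.30/metric/month at the first pricing tier, and every unique combination of metric name and dimension values counts as a separate metric; high-cardinality designs can generate thousands of metrics and unexpected bills. EMF (Embedded Metric Format) lets Lambda emit metrics through structured log lines without PutMetricData API calls, but the extracted metrics are still billed as custom metrics, so keep high-cardinality fields as log properties rather than dimensions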
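
The cost point above can be made concrete with a small sketch. It assumes the flat $0.30/metric/month figure quoted in these notes (real pricing is tiered and varies by region) and builds an EMF log line following the published `_aws` / `CloudWatchMetrics` layout. The helper names `estimated_monthly_cost` and `emf_record`, and the `Checkout` namespace, are hypothetical and chosen only for illustration.

```python
import json
import time

# Figure quoted in these notes; treat it as an assumption, not a price list.
PRICE_PER_METRIC_USD = 0.30


def estimated_monthly_cost(metric_names, dimension_cardinalities,
                           price=PRICE_PER_METRIC_USD):
    """Rough upper bound: every (metric name, dimension values) combination
    is billed as its own custom metric."""
    combinations = 1
    for distinct_values in dimension_cardinalities:
        combinations *= distinct_values
    return metric_names * combinations * price


def emf_record(namespace, dimensions, metrics, properties=None,
               timestamp_ms=None):
    """Build one Embedded Metric Format log line as a JSON string.

    dimensions: {name: value}, keep these low-cardinality
    metrics:    {name: (value, unit)}
    properties: extra fields (e.g. user IDs) that stay searchable in logs
                but are not turned into metrics
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set
                "Metrics": [{"Name": name, "Unit": unit}
                            for name, (_, unit) in metrics.items()],
            }],
        },
    }
    record.update(dimensions)
    record.update({name: value for name, (value, _) in metrics.items()})
    record.update(properties or {})
    return json.dumps(record)


if __name__ == "__main__":
    # One metric with a 10,000-value userId dimension vs. a 5-value Service one.
    print(round(estimated_monthly_cost(1, [10_000]), 2))
    print(round(estimated_monthly_cost(1, [5]), 2))
    # In Lambda, printing the line to stdout is enough for it to reach CloudWatch Logs.
    print(emf_record("Checkout", {"Service": "cart"},
                     {"Latency": (42, "Milliseconds")},
                     properties={"userId": "u-123"}))
```

Putting `userId` in `properties` rather than `dimensions` is the design choice that matters here: the value remains queryable in the log line, while the number of billed metrics stays tied to the small `Service` dimension.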
| Capability | AWS CloudWatch | Azure Monitor | Google Cloud Operations |
|---|---|---|---|
| Metrics Store | CloudWatch Metrics (15-month retention) | Azure Metrics (93 days default) | Cloud Monitoring Metrics (6 weeks default) |
| Log Storage | CloudWatch Log Groups (configurable retention) | Log Analytics Workspace (KQL) | Cloud Logging (bucket-based retention) |
| APM / Tracing | AWS X-Ray + Service Map | Application Insights | Cloud Trace + Cloud Profiler |
| Query Language | CloudWatch Metrics Insights (SQL-like) / Logs Insights query language | KQL (Kusto Query Language) | MQL / PromQL (via Managed Prometheus) |
| Alerting | CloudWatch Alarms → SNS → PagerDuty | Azure Alerts → Action Groups | Cloud Monitoring Alerting Policies |
| Dashboards | CloudWatch Dashboards | Azure Monitor Workbooks + Grafana | Cloud Monitoring Dashboards + Grafana |
| Kubernetes | Container Insights (EKS) | Container Insights (AKS) | Managed Service for Prometheus (GKE) |
| Anomaly Detection | CloudWatch Anomaly Detection (ML) | Dynamic Thresholds (metric alerts) + Application Insights Smart Detection | Metric-absence and forecast alert conditions |
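
To illustrate the CloudWatch query and alerting rows, here is a minimal sketch of the metric math ratio mentioned under Key Points: two raw metrics are fetched as hidden inputs and a third query divides them into an error rate. The dictionaries follow the `MetricDataQueries` shape accepted by the CloudWatch `GetMetricData` API. The helper name `error_rate_queries` and the load balancer identifier are hypothetical; the ALB metric names are used only as an example.

```python
def error_rate_queries(namespace, error_metric, total_metric, dimensions,
                       period_s=60):
    """Build MetricDataQueries that return errors / requests as a percentage."""

    def raw_sum(query_id, metric_name):
        # Input series: summed per period, not returned to the caller.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": name, "Value": value}
                                   for name, value in dimensions.items()],
                },
                "Period": period_s,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return [
        raw_sum("errors", error_metric),
        raw_sum("requests", total_metric),
        {
            # Metric math expression referencing the two Ids above.
            "Id": "error_rate",
            "Expression": "100 * errors / requests",
            "Label": "Error rate (%)",
            "ReturnData": True,
        },
    ]


if __name__ == "__main__":
    queries = error_rate_queries(
        "AWS/ApplicationELB",
        "HTTPCode_Target_5XX_Count",
        "RequestCount",
        {"LoadBalancer": "app/example/0123456789abcdef"},  # hypothetical ID
    )
    for query in queries:
        print(query["Id"], query.get("Expression", "raw metric"))
```

With boto3 installed and credentials configured, the list could be passed as `boto3.client("cloudwatch").get_metric_data(MetricDataQueries=queries, StartTime=..., EndTime=...)`. Only the `error_rate` series comes back because the inputs set `ReturnData` to `False`.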
Real-World Example
Lyft runs a hybrid model: CloudWatch for AWS-native service alarms and billing visibility, combined with a self-managed Prometheus/Grafana stack for application-layer SLOs. This gives them cloud-provider-agnostic dashboards for their multi-region deployments.