Each major cloud provider offers a native monitoring suite deeply integrated with its managed services, removing much of the operational burden of running Prometheus or ELK yourself. Amazon CloudWatch provides unified metrics, logs, traces (X-Ray), and dashboards, with metric resolution down to 1 second for high-resolution metrics. Azure Monitor consolidates metrics, logs (Log Analytics Workspace, queried with KQL), Application Insights (APM), and alerting. Google Cloud Operations Suite (formerly Stackdriver) integrates with GKE, Cloud Run, and BigQuery, and offers Managed Prometheus for container workloads. All three support cross-service correlation, anomaly detection, and budget-based alerting.
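As a rough illustration of the 1-second resolution mentioned above, the sketch below builds the request body for CloudWatch's PutMetricData API with StorageResolution set to 1. The namespace, metric name, and dimension are hypothetical, and the snippet only constructs the payload; actually sending it assumes boto3 and AWS credentials are available.

```python
from datetime import datetime, timezone


def high_resolution_datum(name, value, unit="Milliseconds", dimensions=None):
    """Build one MetricData entry stored at 1-second resolution."""
    return {
        "MetricName": name,
        "Dimensions": [
            {"Name": key, "Value": val} for key, val in (dimensions or {}).items()
        ],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # 1 = high resolution (1-second); 60 = standard resolution (the default).
        "StorageResolution": 1,
    }


# Hypothetical namespace, metric, and dimension names, for illustration only.
request = {
    "Namespace": "Example/Checkout",
    "MetricData": [
        high_resolution_datum(
            "RequestLatency", 42.5, dimensions={"Service": "checkout"}
        )
    ],
}

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_data(**request)
```

Omitting StorageResolution (or setting it to 60) stores the same datum at the standard 1-minute resolution.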

Key Points

  • CloudWatch Metrics: 1-minute standard resolution, 1-second high-resolution; custom metrics via PutMetricData API; metric math enables cross-metric ratio computation
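  • CloudWatch Logs Insights: interactive queries over log groups using a pipe-delimited query language (fields, filter, stats); Contributor Insights, a separate CloudWatch feature, identifies the top-N IP addresses or users contributing to traffic spikes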
  • AWS X-Ray: distributed tracing with service map visualization; integrates with Lambda, API Gateway, ECS, EKS via X-Ray SDK or ADOT (AWS Distro for OpenTelemetry)
  • Azure Monitor: Log Analytics Workspace stores logs in columnar format queryable with KQL; Application Insights provides end-to-end APM for web apps with automatic dependency tracking
  • Azure Monitor Workbooks: interactive reports combining metrics, logs, and parameters; used for monthly SLO reporting shared with stakeholders
  • Google Cloud Monitoring: MQL (Monitoring Query Language) for dashboards; Managed Service for Prometheus (GMP) for Kubernetes workloads, with long-term storage in Cloud Monitoring's own backend (Monarch) rather than on cluster-local disks
  • Google Cloud Logging: structured log viewer with resource-based filtering; Log-based metrics convert log patterns to Cloud Monitoring metrics
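  • Cost consideration: CloudWatch custom metrics cost about $0.30/metric/month at the first pricing tier; each unique combination of metric name and dimension values is billed as a separate metric, so high-cardinality designs can generate thousands of metrics and unexpected bills. EMF (Embedded Metric Format) lets Lambda emit metrics as structured log lines without PutMetricData API calls, but the extracted metrics are still billed as custom metrics, so keep high-cardinality fields out of the dimensions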
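The cost point can be made concrete with a small sketch. The first function estimates a worst-case monthly bill from dimension cardinalities, using the $0.30 figure quoted above as a flat price (real pricing is tiered and varies by region). The second builds an Embedded Metric Format log line in which high-cardinality values are carried as plain properties instead of dimensions. All namespaces, metric names, and field names here are hypothetical.

```python
import json
import math
import time

PRICE_PER_METRIC_MONTH = 0.30  # USD; flat-rate assumption, real pricing is tiered


def estimated_monthly_cost(metric_names, dimension_cardinalities,
                           price=PRICE_PER_METRIC_MONTH):
    """Worst case: every metric name occurs with every dimension-value combination."""
    combinations = math.prod(dimension_cardinalities.values())
    return metric_names * combinations * price


def emf_record(namespace, dimensions, metrics, properties=None, timestamp_ms=None):
    """Build one EMF log line.

    dimensions: {name: value}          -> each distinct combination is a billed metric
    metrics:    {name: (value, unit)}
    properties: {name: value}          -> searchable in logs, never become metrics
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],
                "Metrics": [
                    {"Name": name, "Unit": unit}
                    for name, (_, unit) in metrics.items()
                ],
            }],
        },
    }
    record.update(dimensions)
    record.update({name: value for name, (value, _) in metrics.items()})
    record.update(properties or {})
    return json.dumps(record)


# 5 metric names x 3 environments x 20 endpoints x 500 customers = 150,000 metrics.
print(estimated_monthly_cost(5, {"env": 3, "endpoint": 20, "customer": 500}))

# In a Lambda function, printing the line to stdout is enough for it to reach
# CloudWatch Logs, where the metric is extracted.
print(emf_record(
    "Example/Checkout",
    dimensions={"Service": "checkout"},
    metrics={"Latency": (42.5, "Milliseconds")},
    properties={"customerId": "c-1001", "requestId": "abc-123"},
))
```

Moving customerId from a dimension to a property is the design choice that matters: the value stays queryable in the log line, but it no longer multiplies the number of billed metrics.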

| Capability | AWS CloudWatch | Azure Monitor | Google Cloud Operations |
|---|---|---|---|
| Metrics Store | CloudWatch Metrics (15-month retention) | Azure Metrics (93 days default) | Cloud Monitoring Metrics (6 weeks default) |
| Log Storage | CloudWatch Log Groups (configurable retention) | Log Analytics Workspace (KQL) | Cloud Logging (bucket-based retention) |
| APM / Tracing | AWS X-Ray + Service Map | Application Insights | Cloud Trace + Cloud Profiler |
| Query Language | CloudWatch Metrics Insights / Logs Insights SQL | KQL (Kusto Query Language) | MQL / PromQL (via Managed Prometheus) |
| Alerting | CloudWatch Alarms → SNS → PagerDuty | Azure Alerts → Action Groups | Cloud Monitoring Alerting Policies |
| Dashboards | CloudWatch Dashboards | Azure Monitor Workbooks + Grafana | Cloud Monitoring Dashboards + Grafana |
| Kubernetes | Container Insights (EKS) | Container Insights (AKS) | Google Managed Prometheus (GKE) |
| Anomaly Detection | CloudWatch Anomaly Detection (ML) | Azure Monitor Metrics Smart Detection | Cloud Monitoring Metric Absence Alerts |
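The 15-month CloudWatch retention figure applies only to aggregated data: CloudWatch rolls data points up to coarser periods as they age. The sketch below encodes the documented tiers (sub-minute data for 3 hours, 1-minute for 15 days, 5-minute for 63 days, 1-hour for 455 days) and returns the finest period still available for data of a given age. The function name is illustrative, and the 1-second tier assumes the metric was published at high resolution.

```python
from datetime import timedelta

# (maximum age of the data, finest period in seconds still retained at that age)
RETENTION_TIERS = [
    (timedelta(hours=3), 1),       # high-resolution metrics only
    (timedelta(days=15), 60),
    (timedelta(days=63), 300),
    (timedelta(days=455), 3600),   # roughly 15 months
]


def finest_period_seconds(age):
    """Return the smallest period still queryable for data of this age, or None if expired."""
    for max_age, period in RETENTION_TIERS:
        if age <= max_age:
            return period
    return None


for age in (timedelta(hours=1), timedelta(days=10),
            timedelta(days=30), timedelta(days=400), timedelta(days=500)):
    print(age, "->", finest_period_seconds(age))
```

In practice this means a dashboard looking back six months can show hourly aggregates at best, which is worth keeping in mind when comparing the retention figures across providers.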

Real-World Example

Lyft runs a hybrid model: CloudWatch for AWS-native service alarms and billing visibility, combined with a self-managed Prometheus/Grafana stack for application-layer SLOs. This gives them cloud-provider-agnostic dashboards across their multi-region deployments.