Cloud Observability
CloudWatch, Azure Monitor, Cloud Logging, X-Ray, Application Insights
Cloud-native observability integrates logs, metrics, and traces into the managed platform. AWS CloudWatch provides metrics (1-minute granularity by default, down to 1 second for high-resolution custom metrics), logs (with CloudWatch Logs Insights for ad-hoc queries), and alarms (which can notify through SNS); AWS X-Ray provides distributed tracing with service maps. Azure Monitor unifies metrics and logs; Application Insights provides APM for web applications. All support OpenTelemetry for vendor-neutral instrumentation, which eases migration away from proprietary agents.
Key Points
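- CloudWatch Metrics retention: high-resolution (sub-minute, down to 1-second) data points are retained for 3 hours, 1-minute for 15 days, 5-minute for 63 days, and 1-hour for 455 days (about 15 months). Older data is only available at the coarser granularity, so factor this into historical analysis requirements.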
- CloudWatch Logs Insights uses a custom query language; queries against large log groups can be expensive (about $0.005 per GB scanned; check current regional pricing). For complex analytics, use subscription filters to stream logs to OpenSearch.
- AWS X-Ray sampling: the default rule records the first request each second (a reservoir of 1 request/second) plus a fixed rate of 5% of any additional requests. Adjust sampling rules for high-throughput services to control cost while maintaining trace fidelity.
- CloudWatch Container Insights, once enabled, collects CPU, memory, disk, and network metrics from ECS/EKS clusters and surfaces them in pre-built CloudWatch dashboards.
- Azure Application Insights Live Metrics provides near-real-time telemetry with roughly one-second latency, which is useful during deployments and incident response.
- An OpenTelemetry Collector (for example ADOT, the AWS Distro for OpenTelemetry; Azure's counterpart is the Azure Monitor OpenTelemetry Distro) allows routing traces to multiple backends (Jaeger, Grafana Tempo, X-Ray, Application Insights) without re-instrumenting applications.
- CloudWatch Embedded Metric Format (EMF) allows applications to emit high-cardinality metrics as structured log lines — processed asynchronously, avoiding the PutMetricData API rate limit.
- Google Cloud's operations suite (formerly Stackdriver, now branded Google Cloud Observability) auto-discovers GKE workloads and surfaces per-pod metrics and logs, reducing the instrumentation burden for Kubernetes workloads.
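The CloudWatch retention tiers listed in the key points can be turned into a small lookup when planning historical analysis. The following is a minimal sketch, not an AWS API: the names `RETENTION_TIERS` and `finest_period` are hypothetical, and the 1-second tier only applies if high-resolution metrics are actually being published.

```python
from datetime import timedelta

# Retention tiers as stated in the key points:
# (maximum age of a data point, finest period still available at that age).
# Assumption: the 1-second tier applies only to high-resolution custom metrics.
RETENTION_TIERS = [
    (timedelta(hours=3), timedelta(seconds=1)),
    (timedelta(days=15), timedelta(minutes=1)),
    (timedelta(days=63), timedelta(minutes=5)),
    (timedelta(days=455), timedelta(hours=1)),
]


def finest_period(age: timedelta):
    """Return the finest period still queryable for data of this age,
    or None if the data is older than the longest retention tier."""
    for max_age, period in RETENTION_TIERS:
        if age <= max_age:
            return period
    return None


if __name__ == "__main__":
    for age in (timedelta(hours=1), timedelta(days=30), timedelta(days=500)):
        print(age, "->", finest_period(age))
```

For example, a 30-day-old incident can only be reviewed at 5-minute granularity under these tiers, which may be too coarse for spike analysis unless the data was exported elsewhere beforehand.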
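To see why X-Ray sampling rules matter for high-throughput services, it can help to estimate trace volume. The sketch below assumes the default rule behaves as a reservoir of 1 request per second plus a 5% fixed rate on the remainder, and it deliberately ignores per-host and multi-rule effects. The function name `expected_traces_per_second` is hypothetical and not part of any AWS SDK.

```python
def expected_traces_per_second(requests_per_second: float,
                               reservoir: float = 1.0,
                               fixed_rate: float = 0.05) -> float:
    """Rough estimate of sampled traces per second for one sampling rule.

    Assumption: up to `reservoir` requests per second are always traced,
    and `fixed_rate` of the remaining requests are traced on top of that.
    """
    reserved = min(requests_per_second, reservoir)
    remainder = max(requests_per_second - reservoir, 0.0)
    return reserved + remainder * fixed_rate


if __name__ == "__main__":
    for rps in (0.5, 10, 1000):
        print(f"{rps} req/s -> ~{expected_traces_per_second(rps):.2f} traces/s")
```

Under this model, traced volume grows almost linearly with traffic once the reservoir is exhausted, so a lower fixed rate on busy, well-understood endpoints is usually where cost savings come from.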
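The Embedded Metric Format point can be illustrated with a hand-built log line. This is a minimal sketch using only the standard library: the helper `emf_record` and the sample namespace, metric, and `requestId` values are hypothetical. It also assumes the line reaches CloudWatch Logs (for example via Lambda stdout or the CloudWatch agent), where the metric is extracted asynchronously.

```python
import json
import time


def emf_record(namespace, metric_name, value, unit, dimensions, extra=None):
    """Build one EMF-style JSON log line.

    The `_aws` block declares which top-level keys are metrics and which are
    dimensions; other keys (such as a request ID) stay searchable in the logs
    without becoming metric dimensions.
    """
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions.keys())],
                "Metrics": [{"Name": metric_name, "Unit": unit}],
            }],
        },
        metric_name: value,
    }
    record.update(dimensions)   # dimension values live at the top level
    record.update(extra or {})  # high-cardinality context, logs only
    return json.dumps(record)


line = emf_record(
    "CheckoutService", "Latency", 123.0, "Milliseconds",
    dimensions={"Service": "checkout"},
    extra={"requestId": "abc-123"},
)
print(line)
```

Keeping high-cardinality fields like `requestId` outside the declared dimensions is the design choice that makes this pattern cheap: they remain queryable in Logs Insights without multiplying the number of distinct metrics.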
Real-World Example
Expedia Group standardised on OpenTelemetry across 4,000+ microservices, routing traces to both AWS X-Ray (for AWS native integration) and Grafana Tempo (for cross-cloud analysis), demonstrating vendor-neutral observability.