APM
End-to-end transaction tracing, dependency maps, error tracking
Application Performance Monitoring (APM) provides end-to-end visibility into application behavior, from the user's browser or mobile device through the full backend service chain. Modern APM platforms (Datadog APM, New Relic, Dynatrace, and Elastic APM) instrument applications automatically using agents or bytecode instrumentation, capturing traces, profiling data, and error stack traces with minimal configuration. Dependency maps (service graphs) are generated automatically from trace data and show near-real-time call relationships, error rates, and latency between each pair of communicating services. Error-tracking tools (Sentry, Rollbar, Datadog Error Tracking) group exceptions by fingerprint and alert on new or regressed errors.
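As a rough illustration of how a dependency map can be derived from trace data, the sketch below joins each span to its parent and aggregates request count, error rate, and p95 latency per caller/callee pair. The `Span` fields, the `build_service_graph` helper, and the sample service names are hypothetical and chosen for this example; they are not any vendor's data model, and real platforms compute these aggregates over streaming data.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical, simplified span record (not a vendor schema).
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    service: str
    duration_ms: float
    is_error: bool = False


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def build_service_graph(spans):
    """Aggregate per-edge stats for every caller -> callee service pair."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    durations = defaultdict(list)
    errors = defaultdict(int)

    for span in spans:
        if span.parent_id is None:
            continue
        parent = by_id.get((span.trace_id, span.parent_id))
        # Skip orphaned spans and calls that stay inside one service.
        if parent is None or parent.service == span.service:
            continue
        edge = (parent.service, span.service)
        durations[edge].append(span.duration_ms)
        errors[edge] += int(span.is_error)

    return {
        edge: {
            "requests": len(vals),
            "error_rate": errors[edge] / len(vals),
            "p95_ms": percentile(vals, 95),
        }
        for edge, vals in durations.items()
    }


# Two small traces: web -> checkout -> payments.
spans = [
    Span("t1", "a", None, "web", 120.0),
    Span("t1", "b", "a", "checkout", 90.0),
    Span("t1", "c", "b", "payments", 60.0, is_error=True),
    Span("t2", "a", None, "web", 80.0),
    Span("t2", "b", "a", "checkout", 50.0),
    Span("t2", "c", "b", "payments", 20.0),
]
graph = build_service_graph(spans)
for (caller, callee), stats in sorted(graph.items()):
    print(f"{caller} -> {callee}: {stats}")
```

Each edge's latency here is the duration of the callee's span, so a slow edge points at the called service rather than the caller.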
Key Points
- End-to-end trace: APM agents propagate trace context on outbound HTTP/gRPC calls and message queue publishes, and record database queries as spans, so the full request path from frontend to database is captured in one trace
- Real User Monitoring (RUM): JavaScript agent captures browser-side performance metrics (FCP, LCP, CLS, TTFB) and links browser sessions to backend traces via trace propagation headers
- Profiling: continuous profiling (Datadog Continuous Profiler, Pyroscope) captures CPU flame graphs and memory allocation profiles in production with low overhead (often quoted at under 2% CPU, though this varies by language and profiler); identifies hotspots without code changes
- Dependency map (service graph): auto-generated from span relationships; shows p95 latency, request rate, and error rate per edge; invaluable for identifying which downstream dependency is the latency bottleneck
- Error tracking: groups exceptions by stack trace fingerprint; deduplicates high-volume errors; surfaces "new" vs "regressed" errors post-deploy; integrates with GitHub to show the responsible commit
- Datadog APM integrates with Logs and Infrastructure: one-click navigation from a slow trace to the correlated container metrics and log entries for that exact time window
- Dynatrace: OneAgent collects trace and metric data, and the Davis AI engine analyzes it to identify the root cause of performance degradations automatically
- APM cost model: most vendors charge per host/container per month or per ingested span; tail-based sampling and span filtering are critical cost controls at scale
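The tail-based sampling mentioned in the cost model point can be sketched as a decision made after a trace has completed: keep every trace that errored or was slow, and retain only a small deterministic fraction of the rest. The `TraceSummary` type, the `keep_trace` function, the 500 ms threshold, and the 5% baseline rate are all assumptions made for this illustration, not defaults of any vendor or collector.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceSummary:
    # Hypothetical summary available once all spans of a trace have arrived.
    trace_id: str
    duration_ms: float
    has_error: bool


def keep_trace(trace, slow_threshold_ms=500.0, baseline_rate=0.05):
    """Decide, after the trace has completed, whether to retain it."""
    # Always keep the traces most likely to matter during an investigation.
    if trace.has_error or trace.duration_ms >= slow_threshold_ms:
        return True
    # Hash the trace ID into [0, 1) so the decision is deterministic:
    # any collector that sees this trace ID makes the same choice.
    digest = hashlib.sha256(trace.trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < baseline_rate


traces = [
    TraceSummary("trace-ok", duration_ms=42.0, has_error=False),
    TraceSummary("trace-slow", duration_ms=1800.0, has_error=False),
    TraceSummary("trace-failed", duration_ms=35.0, has_error=True),
]
for t in traces:
    print(t.trace_id, "keep" if keep_trace(t) else "drop")
```

Because the decision needs the whole trace, a tail-based sampler has to buffer spans until the trace finishes, which is the main operational cost compared with head-based sampling at the first service.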
Real-World Example
Shopify uses Datadog APM to monitor its checkout platform across 10,000+ storefronts. During Black Friday 2023, Shopify reported peak sales of $4.2 million per minute. APM dependency maps provided real-time visibility that allowed its SRE team to scale bottleneck services proactively, before customers experienced degradation.