Metrics are numeric measurements sampled over time and stored as time-series data; they are the foundation of dashboards and alerting at scale. Prometheus-style systems define four core metric types: counters (monotonically increasing, e.g., request count), gauges (point-in-time value, e.g., memory used), histograms (bucketed distributions for latency percentiles), and summaries (client-side quantile calculation). Two frameworks guide which metrics to instrument: the RED method (Rate, Errors, Duration), for request-driven services, and the USE method (Utilization, Saturation, Errors), for resources such as CPUs, disks, and queues.
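As a rough illustration of why counters are read through their rate of change rather than their raw value, the sketch below computes an average per-second rate from raw counter samples using only the Python standard library. The function name per_second_rate and the sample data are made up for this example. PromQL's rate() additionally extrapolates to the edges of the query window, which this sketch does not attempt.

```python
from typing import List, Tuple

# One scrape of a counter: (unix timestamp in seconds, cumulative value).
Sample = Tuple[float, float]


def per_second_rate(samples: List[Sample]) -> float:
    """Average per-second increase of a counter over the given samples.

    Any drop in value is treated as a reset to zero (for example after a
    process restart), so the post-reset value counts as new increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")

    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Counter went backwards: assume it restarted from zero.
            increase += curr

    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed


# Hypothetical scrapes every 15 s; the counter resets between t=30 and t=45.
scrapes = [(0, 100), (15, 130), (30, 160), (45, 20), (60, 60)]
print(per_second_rate(scrapes))
```

A gauge needs no such treatment: its latest sample is already the value of interest, which is why values that can decrease belong in gauges rather than counters.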

Key Points

  • Counter: cumulative value that only increases (it restarts from zero when the process restarts); use rate() in PromQL to get a per-second rate, which also compensates for resets; never use a counter for values that can decrease
  • Gauge: snapshot value that can go up or down; suitable for queue depth, active connections, memory usage
  • Histogram: pre-defined buckets (e.g., 0.005s, 0.01s, 0.025s, 0.05s, 0.1s, 0.5s, 1s); enables server-side quantile calculation with histogram_quantile(); preferred over summaries in distributed systems
  • Summary: quantiles (e.g., p50, p90, p99) computed client-side over a sliding window; the precomputed quantiles cannot be meaningfully aggregated across instances, so avoid summaries for horizontally scaled services
  • RED method targets: Rate (requests/sec), Errors (error rate %), Duration (latency histogram); apply per service endpoint
  • USE method targets: Utilization (% busy), Saturation (queue/wait depth), Errors (hardware/driver errors); apply per resource (CPU, disk, network, DB connection pool)
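  • Cardinality explosion: each unique combination of label values creates a new time series, so series counts multiply (10 endpoints × 5 methods × 6 status codes = 300 series; adding a user_id label with 100,000 values turns that into 30 million); unbounded labels (user_id, request_id) can make Prometheus run out of memory; as a rough rule of thumb, keep the number of label combinations per metric under ~1000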
  • Exemplars (OpenMetrics extension) attach a trace_id to a specific histogram observation, enabling direct navigation from a p99 spike to the offending trace
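To make the histogram and summary bullets above more concrete, here is a minimal sketch of how a quantile can be estimated from cumulative buckets, and why buckets from several instances can simply be summed first. It follows the general linear-interpolation approach that histogram_quantile() is documented to use, but it is a simplified stand-in, not Prometheus's implementation; the names merge_instances and estimate_quantile and all bucket counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One cumulative bucket: (upper bound "le" in seconds, observations <= le).
Bucket = Tuple[float, float]


def merge_instances(per_instance: List[List[Bucket]]) -> List[Bucket]:
    """Sum cumulative bucket counts across instances sharing one bucket layout."""
    totals: Dict[float, float] = {}
    for buckets in per_instance:
        for le, count in buckets:
            totals[le] = totals.get(le, 0) + count
    return sorted(totals.items())


def estimate_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile by linear interpolation inside one bucket."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # Overflow bucket (or an empty one): fall back to the
                # highest finite bound seen so far.
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical latency buckets scraped from two replicas of one service.
instance_a = [(0.05, 30), (0.1, 45), (0.5, 50), (1.0, 50), (math.inf, 50)]
instance_b = [(0.05, 20), (0.1, 35), (0.5, 45), (1.0, 50), (math.inf, 50)]

merged = merge_instances([instance_a, instance_b])
print("fleet-wide p90 estimate (s):", round(estimate_quantile(0.9, merged), 3))
```

The estimate is only as precise as the bucket boundaries allow, which is the trade-off against summaries: a summary's p90 is more exact per instance, but two per-instance p90 values cannot be combined into a fleet-wide p90 the way bucket counts can.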

| Aspect | RED Method | USE Method |
|---|---|---|
| Focus | Request-driven microservices | Infrastructure resources (CPU, disk, network, pools) |
| R / U | Rate: requests per second per endpoint | Utilization: % of time the resource is busy |
| E | Errors: % of requests returning 5xx / failures | Errors: device errors, dropped packets, driver faults |
| D / S | Duration: latency distribution (p50/p95/p99) | Saturation: queue depth, wait time, pending tasks |
| Primary Tool | Prometheus + Grafana (http_requests_total counter, http_request_duration_seconds histogram) | node_exporter, cAdvisor, cloud-native metrics |
| Alert Example | Error rate > 1% for 5 min on /api/checkout | CPU utilization > 80% sustained for 10 min |
| Blind Spot | Does not reveal the cause (e.g., a resource constraint) | Does not reveal user-facing impact directly |
| Best Used By | On-call SRE during active incident | Capacity planners and infrastructure engineers |
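
The RED alert example in the comparison (error rate > 1% for 5 min) combines a threshold with a hold period. The sketch below shows one way that logic could be evaluated over per-minute request and error counts; in Prometheus itself this would normally be an alerting rule with a `for:` clause rather than application code. The function name should_fire, its parameters, and the sample data are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# One evaluation window: (unix timestamp in seconds,
#                         requests in the window, failed requests in the window)
Window = Tuple[float, int, int]


def should_fire(
    windows: List[Window],
    threshold: float = 0.01,
    hold_seconds: float = 300,
) -> bool:
    """True if the error ratio has stayed above `threshold` continuously
    for at least `hold_seconds`, up to and including the latest window."""
    breach_started: Optional[float] = None
    for timestamp, total, errors in windows:
        ratio = errors / total if total else 0.0
        if ratio > threshold:
            if breach_started is None:
                breach_started = timestamp
        else:
            # Any healthy window restarts the hold period.
            breach_started = None

    if breach_started is None:
        return False
    return windows[-1][0] - breach_started >= hold_seconds


# Hypothetical per-minute counts for /api/checkout.
sustained = [
    (0, 1000, 5),    # 0.5% errors: healthy
    (60, 1000, 20),  # 2% errors: breach begins
    (120, 1000, 25),
    (180, 1000, 30),
    (240, 1000, 22),
    (300, 1000, 18),
    (360, 1000, 21),  # still breaching 300 s after it began
]
recovered = sustained[:4] + [(240, 1000, 4)] + sustained[5:]

print(should_fire(sustained))
print(should_fire(recovered))
```

The hold period keeps a single noisy minute from paging anyone; the same structure applies to the USE example by swapping the error ratio for a utilization gauge and the thresholds for 80% and 10 minutes.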

Real-World Example

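Google's SRE practice popularized the Four Golden Signals (latency, traffic, errors, saturation), which the SRE book recommends as the minimum set to measure for any user-facing system before adding custom metrics. The RED method, proposed by Tom Wilkie, is a request-focused adaptation of these signals; the USE method was developed separately by Brendan Gregg for diagnosing resource bottlenecks.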