Service Mesh Integration
Traffic management, circuit breaking, retry policies via Istio/Envoy
A service mesh adds a dedicated infrastructure layer for service-to-service communication by injecting a sidecar proxy (Envoy) alongside each service instance. The sidecar intercepts all inbound and outbound traffic, enabling mTLS encryption, traffic management (routing, retries, timeouts, circuit breaking), and observability (metrics, traces, access logs) with little or no application code change. Istio is one of the most widely deployed service mesh control planes; its control plane (istiod) pushes configuration to thousands of Envoy sidecars from a central point over the xDS API.
Key Points
- Sidecar pattern: every pod gets an Envoy proxy container injected at admission time by Istio's mutating admission webhook; iptables rules in the pod's network namespace redirect inbound and outbound TCP traffic through the proxy, so interception is transparent to the application.
- mTLS in service mesh: Istio's built-in CA (part of istiod, formerly the separate Citadel component) issues short-lived X.509 certificates (24h by default) to each workload; sidecars negotiate mTLS automatically, giving a basis for zero-trust networking within the cluster without app changes.
- Traffic management: VirtualService (routing rules), DestinationRule (load balancing policy, circuit breaker config, connection pool), Gateway (ingress/egress) — declarative YAML applied via Kubernetes CRDs.
- Canary routing with Istio: route 5% of traffic to v2 by weight in a VirtualService, gradually shift to 100%, and roll back by resetting the weights. A plain Kubernetes Service spreads traffic across all ready pods, so a split can only be approximated there by adjusting replica counts.
- Circuit breaking via Envoy: connection pool limits in a DestinationRule cap concurrent connections and pending requests, while OutlierDetection ejects unhealthy hosts after N consecutive 5xx errors; ejected hosts are removed from load balancing for a configurable interval, which helps prevent cascading failures.
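- Retry policy: configure per-route retries in a VirtualService (retries: attempts: 3, perTryTimeout: 5s, retryOn: gateway-error,connect-failure,retriable-4xx); Envoy retries transparently, improving resilience without app changes. Limit retries to idempotent requests, since a retried request may be executed more than once.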
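- Observability: Envoy emits Prometheus metrics (request count, latency histograms, circuit breaker state), distributed trace spans (exportable via OpenTelemetry), and structured access logs per request, covering the golden signals for every service pair. Tracing is the one exception to "no app changes": the application must forward trace context headers from inbound to outbound requests so that spans join into a single trace.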
- Linkerd vs Istio: Linkerd is simpler, using its own Rust-based micro-proxy (linkerd2-proxy) instead of Envoy, with lower resource overhead (roughly 10 MB RAM per proxy vs roughly 50 MB for Envoy, depending on configuration and load); Istio has richer traffic management and a broader ecosystem. Choose based on complexity needs.
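The canary, retry, and circuit-breaking points above can be sketched as a pair of Istio manifests. The following is a minimal illustration, not a production configuration: it builds the manifests as Python dicts and prints them as JSON (which kubectl accepts alongside YAML). The host name reviews, the subset names v1/v2, the version labels, the connection pool limits, and the outlier detection thresholds are all assumptions chosen for the example, and the apiVersion may need adjusting for your Istio release.

```python
import json

# Assumption: v1beta1 is served by your Istio release; newer releases also serve v1.
API_VERSION = "networking.istio.io/v1beta1"


def canary_virtual_service(host, canary_percent, namespace="default"):
    """Build a VirtualService sending canary_percent of traffic to subset v2."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": API_VERSION,
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-canary", "namespace": namespace},
        "spec": {
            "hosts": [host],
            "http": [{
                # Weighted split between the stable and canary subsets.
                "route": [
                    {"destination": {"host": host, "subset": "v1"},
                     "weight": 100 - canary_percent},
                    {"destination": {"host": host, "subset": "v2"},
                     "weight": canary_percent},
                ],
                # Retry settings taken from the retry-policy bullet.
                "retries": {
                    "attempts": 3,
                    "perTryTimeout": "5s",
                    "retryOn": "gateway-error,connect-failure,retriable-4xx",
                },
            }],
        },
    }


def destination_rule(host, namespace="default"):
    """Build a DestinationRule with subsets, pool limits, and outlier detection."""
    return {
        "apiVersion": API_VERSION,
        "kind": "DestinationRule",
        "metadata": {"name": host, "namespace": namespace},
        "spec": {
            "host": host,
            "trafficPolicy": {
                # Illustrative limits; tune to the service's real capacity.
                "connectionPool": {
                    "tcp": {"maxConnections": 100},
                    "http": {"http1MaxPendingRequests": 10},
                },
                # Eject a host after 5 consecutive 5xx responses.
                "outlierDetection": {
                    "consecutive5xxErrors": 5,
                    "interval": "10s",
                    "baseEjectionTime": "30s",
                    "maxEjectionPercent": 50,
                },
            },
            # Subsets map the names used in the VirtualService to pod labels.
            "subsets": [
                {"name": "v1", "labels": {"version": "v1"}},
                {"name": "v2", "labels": {"version": "v2"}},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(canary_virtual_service("reviews", 5), indent=2))
    print(json.dumps(destination_rule("reviews"), indent=2))
```

Shifting the canary forward is then a matter of regenerating the VirtualService with a larger canary_percent and re-applying it; rolling back means applying it again with 0.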
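To make the outlier-detection behaviour in the circuit-breaking bullet concrete, here is a small simulation of consecutive-5xx ejection. It is a simplified sketch of the idea, not Envoy's implementation: the class and method names are hypothetical, time is passed in explicitly, and details such as the maximum ejection percentage and periodic sweep interval are left out. It does assume that ejection time grows with the number of times a host has been ejected, which mirrors how Envoy scales its base ejection time.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    consecutive_5xx: int = 0
    ejected_until: float = 0.0  # host is excluded while now < ejected_until
    ejections: int = 0          # how many times this host has been ejected


class OutlierDetector:
    """Hypothetical, simplified model of consecutive-5xx host ejection."""

    def __init__(self, hosts, consecutive_5xx_errors=5, base_ejection_time=30.0):
        self.hosts = {name: Host(name) for name in hosts}
        self.threshold = consecutive_5xx_errors
        self.base_ejection_time = base_ejection_time

    def record(self, name, status, now):
        """Record one response status for a host at time `now` (seconds)."""
        host = self.hosts[name]
        if 500 <= status <= 599:
            host.consecutive_5xx += 1
            if host.consecutive_5xx >= self.threshold:
                # Eject; repeat offenders stay out proportionally longer.
                host.ejections += 1
                host.ejected_until = now + self.base_ejection_time * host.ejections
                host.consecutive_5xx = 0
        else:
            # Any non-5xx response resets the streak.
            host.consecutive_5xx = 0

    def healthy_hosts(self, now):
        """Return the hosts currently eligible for load balancing."""
        return [h.name for h in self.hosts.values() if now >= h.ejected_until]


if __name__ == "__main__":
    detector = OutlierDetector(["a", "b"], consecutive_5xx_errors=3)
    for t in range(3):
        detector.record("a", 503, now=float(t))
    print(detector.healthy_hosts(now=3.0))   # host "a" is ejected here
    print(detector.healthy_hosts(now=40.0))  # host "a" has been readmitted
```

Because the failing host drops out of rotation while the rest keep serving, callers stop piling requests onto an instance that is already struggling, which is the mechanism behind the cascading-failure protection described above.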
Real-World Example
Lyft built Envoy and open-sourced it in 2016; they run Envoy as a sidecar for all their microservices, processing millions of RPS with sub-millisecond proxy overhead. Google Cloud's Anthos Service Mesh is a managed Istio-based offering, whereas AWS App Mesh is built on Envoy with AWS's own control plane rather than Istio. Airbnb reportedly uses Envoy (without Istio) as their edge proxy and service-to-service proxy, running 100k+ Envoy instances across their fleet.