DNS
Resolution process, record types, TTL, GeoDNS, Route 53
DNS (Domain Name System) translates human-readable hostnames into IP addresses through a hierarchical, distributed database. On a cache miss, the recursive resolver (run by the ISP, or a public one such as 8.8.8.8) queries the root servers (13 named root server identities, served from 1,000+ anycast instances), which refer it to the TLD nameserver (.com, .io), which in turn refers it to the authoritative nameserver (Route 53, Azure DNS, Cloudflare) that returns the answer. The resolver caches each response for its TTL (Time To Live), so later lookups skip the steps whose results are still cached. Low TTLs (60 s) enable fast failover but increase resolver and authoritative load; high TTLs (3600 s) reduce load but slow the propagation of changes.
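The resolution walk and TTL caching described above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the `RecursiveResolver` class, the in-memory referral tables, the server labels, and the address `192.0.2.10` (a documentation range). No real DNS traffic is sent, and the injected clock exists only so cache expiry can be demonstrated deterministically.

```python
import time

# Toy delegation data; all names and addresses are illustrative.
ROOT_REFERRALS = {"com": "tld-com"}
TLD_REFERRALS = {"tld-com": {"example.com": "ns-example"}}
AUTHORITATIVE = {"ns-example": {"www.example.com": ("192.0.2.10", 60)}}  # (address, TTL s)


class RecursiveResolver:
    """Caches answers for their TTL; on a miss, walks root -> TLD -> authoritative."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._cache = {}   # name -> (address, expires_at)
        self.trace = []    # servers contacted, kept for demonstration

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached and cached[1] > now:
            return cached[0]  # cache hit: no upstream queries

        labels = name.split(".")
        tld = labels[-1]
        zone = ".".join(labels[-2:])

        self.trace.append("root")
        tld_server = ROOT_REFERRALS[tld]                  # root refers to the TLD server
        self.trace.append(tld_server)
        auth_server = TLD_REFERRALS[tld_server][zone]     # TLD refers to the zone's server
        self.trace.append(auth_server)
        address, ttl = AUTHORITATIVE[auth_server][name]   # authoritative answer

        self._cache[name] = (address, now + ttl)
        return address


fake_now = [0.0]
resolver = RecursiveResolver(clock=lambda: fake_now[0])
resolver.resolve("www.example.com")   # full walk: three upstream queries
resolver.resolve("www.example.com")   # served from cache
fake_now[0] = 61.0                    # past the 60 s TTL
resolver.resolve("www.example.com")   # entry expired: full walk again
```

A real resolver also caches the referrals themselves (NS records for the TLD and zone), so in practice most lookups skip the root and TLD steps.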
Key Points
- DNS record types: A (IPv4), AAAA (IPv6), CNAME (alias to another hostname), MX (mail exchange), TXT (SPF, DKIM, DMARC, domain verification), NS (nameserver delegation), SOA (zone authority), SRV (service locator).
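- TTL trade-off: set TTL to 300 s (5 min) normally; lower to 60 s before a planned failover or migration. Make the reduction at least 2× the old TTL ahead of the change (for a 300 s TTL, at least 10 minutes before), so that resolvers still holding the record under the old TTL have expired it by the time of the cutover.
- CNAME cannot be used at the zone apex (root domain): the apex must carry SOA and NS records, and RFC 1034 does not allow a CNAME to coexist with other records at the same name, so a CNAME for `example.com` is invalid. Use ALIAS (Route 53), ANAME (DNS Made Easy), or CNAME flattening (Cloudflare) instead.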
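- Route 53 routing policies: Simple, Weighted (splits traffic in proportion to per-record weights, useful for A/B tests and gradual rollouts), Latency-based (routes to the lowest-latency region), Geolocation (by country/continent), Failover (primary/secondary with health check), Geoproximity, Multivalue answer, and IP-based.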
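- Latency-based routing (Route 53 latency policy, Cloudflare Load Balancing traffic steering) chooses a target by measured network latency rather than geographic distance. Route 53 relies on latency data collected over time between networks and AWS regions, not on a per-query RTT measurement, and it locates the user by the resolver's IP address (or the EDNS Client Subnet, when the resolver sends one), so the answer reflects the resolver's vantage point.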
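- DNSSEC (DNS Security Extensions) adds digital signatures to DNS records so that validating resolvers can detect forged or tampered responses, which defeats cache poisoning attacks. U.S. federal agencies are required to sign their .gov zones. The cost is larger responses, extra validation work, and operational complexity (key management and rollovers).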
- DNS over HTTPS (DoH, RFC 8484) and DNS over TLS (DoT, RFC 7858) encrypt queries between the client and the recursive resolver, preventing interception or tampering by ISPs and on-path middleboxes; Cloudflare 1.1.1.1 and Google 8.8.8.8 support both.
- DNS negative caching: NXDOMAIN responses are cached for the lesser of the SOA record's own TTL and the SOA `minimum` field (RFC 2308). If a record is accidentally deleted, resolvers cache its absence for up to that many seconds even after the record is restored, causing hard-to-diagnose failures.
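Three of the rules in the bullets above reduce to small calculations: how far ahead to lower a TTL, how a weighted policy splits traffic, and how long an NXDOMAIN is cached. The sketch below is illustrative only. The function names, the `safety_factor` parameter, and the `blue`/`green` target labels are hypothetical, and `pick_weighted` models the idea of weighted routing (probability = weight / sum of weights), not Route 53's internal implementation.

```python
import random
from typing import Dict, Optional


def ttl_lowering_lead_time(old_ttl_s: int, safety_factor: float = 2.0) -> float:
    """Seconds before a cutover by which the lowered TTL should already be published."""
    return old_ttl_s * safety_factor


def pick_weighted(weights: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Return one target, chosen with probability weight / sum(weights)."""
    rng = rng or random.Random()
    targets = list(weights)
    return rng.choices(targets, weights=[weights[t] for t in targets], k=1)[0]


def negative_cache_ttl(soa_ttl_s: int, soa_minimum_s: int) -> int:
    """RFC 2308: a negative answer is cached for min(SOA record TTL, SOA MINIMUM)."""
    return min(soa_ttl_s, soa_minimum_s)


# Lower a 300 s TTL at least this many seconds before the migration.
lead = ttl_lowering_lead_time(300)

# Send roughly 10% of lookups to the "green" deployment.
target = pick_weighted({"blue": 90, "green": 10})

# SOA record TTL of 3600 s with a MINIMUM field of 900 s.
nx_ttl = negative_cache_ttl(3600, 900)
```

Resolvers may additionally cap negative-cache lifetimes with their own configured maximum, so the value from `negative_cache_ttl` is an upper bound set by the zone, not a guarantee.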
Real-World Example
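Shopify uses Route 53 latency-based routing to direct each user to the lowest-latency AWS region (us-east-1, eu-west-1, ap-southeast-2), combined with 30-second TTLs on A records to enable regional failover within ~60 seconds: the time for health checks to detect the failure (for example, three failed checks at a 10-second interval) plus up to one TTL for cached records to expire.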
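One way to arrive at a figure of about 60 seconds is to add the health-check detection time to the record TTL. The sketch below assumes a 10-second check interval and a failure threshold of 3; neither value is stated in the example above, and `failover_time_s` is a hypothetical helper rather than anything Route 53 provides.

```python
def failover_time_s(record_ttl_s: int, check_interval_s: int, failure_threshold: int) -> int:
    """Rough worst-case failover time in seconds.

    detection: consecutive failed health checks needed before the endpoint
               is marked unhealthy (interval * threshold).
    expiry:    a resolver that cached the old answer just before the switch
               keeps serving it for up to one full TTL.
    """
    detection = check_interval_s * failure_threshold
    return detection + record_ttl_s


# 30 s TTL, assumed 10 s interval and threshold of 3.
estimate = failover_time_s(record_ttl_s=30, check_interval_s=10, failure_threshold=3)
```

The estimate ignores the time for health-check status to propagate and any clients or resolvers that hold records longer than the published TTL, so observed failover can take longer.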