Webhooks are HTTP callbacks that notify external systems of events by sending an HTTP POST to a pre-registered URL when an event occurs — inverting the polling pattern (the provider pushes data rather than the consumer pulling it). The critical challenges are delivery guarantees (the subscriber endpoint may be down), security (verifying the payload came from the legitimate provider), and idempotency (the consumer must handle duplicate deliveries from retry logic). Production webhook systems must implement exponential backoff retry, signature verification, and event ID deduplication.

Key Points

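  • Retry strategy: on a non-2xx response or a connection timeout, retry with exponential backoff (1s, 2s, 4s, 8s, ... with the delay between attempts capped, e.g. at 24h); once the retry window is exhausted (total windows of roughly 5 to 72 hours are typical), move the event to a failed-event queue for operator review.
  • HMAC signature verification: the provider signs the raw request body with a shared secret (HMAC-SHA256) and sends the signature in a header; the consumer recomputes the signature over the exact bytes it received and compares the two with a constant-time comparison; on mismatch, reject with 401. This prevents spoofed webhook calls.
  • Stripe signature: the Stripe-Signature header contains a timestamp (t=...) and one or more signatures (v1=...); the timestamp is part of the signed payload (timestamp, a ".", then the raw body), so a captured request cannot be replayed later with a fresh timestamp; reject if the timestamp is more than 5 minutes old (the default tolerance in Stripe's libraries).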
  • Idempotency: consumer must store processed event IDs (in Redis or DB) and ignore duplicates — retry logic will deliver the same event multiple times on intermittent failures.
  • Event ordering: webhooks are not guaranteed to arrive in order; include sequence numbers or timestamps in payloads; for critical ordering, fetch current state from the API after receiving a webhook rather than relying solely on the event payload.
  • Webhook fan-out: a single internal event triggers webhooks to multiple subscribers; use a dedicated webhook delivery service (queue-backed) to decouple from the main application and retry independently per subscriber.
  • Circuit breaker per subscriber: if a subscriber endpoint returns 500 consistently, stop retrying (circuit open) and alert the subscriber; avoids wasting retry budget on a permanently broken endpoint.
  • Webhook catalog / management: provide a subscriber portal to view delivery history, retry failed events, inspect payloads, and update endpoint URLs; this is critical for enterprise subscribers.
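
The retry schedule in the first bullet can be sketched as a small generator. This is a minimal illustration, not a production scheduler: the function name backoff_delays and its parameters are hypothetical, and the optional jitter is an addition not mentioned above (it spreads out retries so that many failed deliveries do not fire at the same instant).

```python
import random


def backoff_delays(base=1.0, cap=24 * 3600.0, max_attempts=20, jitter=False, rng=None):
    """Yield the wait (in seconds) before each retry: base, 2*base, 4*base, ...

    Each delay is capped at `cap` (24h here, matching the bullet above).
    All names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # Assumption, not from the text: "full jitter" picks a random
            # wait in [0, delay] to avoid synchronized retry bursts.
            delay = rng.uniform(0, delay)
        yield delay


# First five waits without jitter.
print(list(backoff_delays(max_attempts=5)))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

When the generator is exhausted, the delivery service would hand the event to the failed-event queue rather than retrying again.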

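On the consumer side, signature verification, the replay window, and event ID deduplication usually sit together at the top of the handler. The sketch below uses only the Python standard library. The function names, the "timestamp.body" signing format, and the in-memory set are assumptions for illustration; a real consumer would follow its provider's documented scheme and keep processed IDs in Redis or a database.

```python
import hashlib
import hmac
import time
from typing import Optional, Set, Tuple

TOLERANCE_SECONDS = 300  # 5-minute replay window, as in the bullets above


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Provider side (assumed format): HMAC-SHA256 over b"<timestamp>.<body>"."""
    signed_payload = str(timestamp).encode() + b"." + body
    return hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str,
           now: Optional[float] = None) -> bool:
    """Consumer side: check the timestamp window, then the signature."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False  # too old (or too far in the future): possible replay
    expected = sign(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())


def handle_webhook(secret: bytes, timestamp: int, body: bytes, signature: str,
                   event_id: str, seen_ids: Set[str],
                   now: Optional[float] = None) -> Tuple[int, str]:
    """Return (HTTP status, outcome). `seen_ids` stands in for Redis or a DB."""
    if not verify(secret, timestamp, body, signature, now):
        return 401, "rejected"
    if event_id in seen_ids:
        # Duplicate from a retry: acknowledge so the provider stops retrying,
        # but do not apply the side effects a second time.
        return 200, "duplicate"
    # ... apply business side effects here ...
    seen_ids.add(event_id)  # record only after processing succeeded
    return 200, "processed"


# Demo with made-up values.
demo_secret = b"example-shared-secret"
demo_body = b'{"id": "evt_1", "type": "invoice.paid"}'
demo_sig = sign(demo_secret, 1000, demo_body)
demo_seen: Set[str] = set()
print(handle_webhook(demo_secret, 1000, demo_body, demo_sig, "evt_1", demo_seen, now=1010))
print(handle_webhook(demo_secret, 1000, demo_body, demo_sig, "evt_1", demo_seen, now=1020))
```

The check-then-add on seen_ids is not safe under concurrent deliveries of the same event; with a shared store, an atomic "set if absent" operation or a unique-key constraint would close that gap.
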
Real-World Example

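GitHub webhooks deliver 200M+ events per day to CI/CD systems, code review tools, and chatbots. GitHub does not automatically redeliver failed deliveries; instead it provides a delivery history UI (and API) for debugging and for redelivering recent events manually, so consumers that need retries must trigger redelivery themselves. Stripe's webhook infrastructure processes billions of payment events monthly; it signs events with HMAC-SHA256, its libraries default to a 5-minute timestamp tolerance, and it retries failed deliveries with exponential backoff (for up to three days in live mode) while maintaining retry state independently per endpoint.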
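
Keeping retry state per endpoint is also what makes the per-subscriber circuit breaker from the key points possible. The following is a rough sketch under stated assumptions: the class name SubscriberBreaker, the threshold of 5 consecutive failures, and the 10-minute cooldown are all hypothetical, and the cooldown probe ("half-open" state) is a common addition that the text above does not describe.

```python
import time
from typing import Dict, Optional


class SubscriberBreaker:
    """Tracks consecutive delivery failures for one subscriber endpoint."""

    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 600.0):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.consecutive_failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow(self, now: Optional[float] = None) -> bool:
        """True if a delivery attempt should be made right now."""
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        # Assumption: after the cooldown, let a probe request through.
        return now - self.opened_at >= self.cooldown_seconds

    def record(self, status_code: int, now: Optional[float] = None) -> None:
        """Update state from the subscriber's HTTP status code."""
        now = time.monotonic() if now is None else now
        if 200 <= status_code < 300:
            self.consecutive_failures = 0
            self.opened_at = None  # success closes the circuit
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Open (or re-open) the circuit; a real system would also
            # alert the subscriber at this point.
            self.opened_at = now


# One breaker per endpoint URL, so one broken subscriber never blocks another.
breakers: Dict[str, SubscriberBreaker] = {}


def breaker_for(endpoint_url: str) -> SubscriberBreaker:
    return breakers.setdefault(endpoint_url, SubscriberBreaker())
```

A delivery worker would call breaker_for(url).allow() before each attempt and record() after it, leaving events queued while the circuit is open.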