Retries and Failure Handling

Networks fail, deploys happen, downstream services go down. When a webhook delivery doesn't succeed, we retry it on a backoff schedule so you don't lose events just because your endpoint was unreachable for a few minutes.

What counts as a successful delivery

A delivery is considered successful when your endpoint returns an HTTP status code in the 2xx range (200–299) within the delivery timeout.

Anything else is a failure:

  • Any 4xx or 5xx status code
  • A connection error (DNS failure, TCP reset, TLS handshake error, etc.)
  • A timeout — your endpoint didn't respond in time

3xx redirects are not followed. If your endpoint responds with a redirect, the delivery is treated as a failure and will be retried. Update your registered URL instead of relying on redirects.
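
The rules above can be sketched as a single predicate. This is only an illustration of the classification, not our delivery code; the function name and its parameters are hypothetical.

```python
from typing import Optional


def is_successful_delivery(status_code: Optional[int], timed_out: bool = False) -> bool:
    """Return True only for a 2xx response that arrived before the timeout.

    status_code is None when no HTTP response arrived at all
    (DNS failure, TCP reset, TLS handshake error, and so on).
    """
    if timed_out or status_code is None:
        return False
    # 3xx falls outside this range too: redirects are not followed.
    return 200 <= status_code <= 299
```

Note that a 301 or 302 is classified the same way as a 500 here: it is a failed attempt that will be retried.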

Retry schedule

If the first attempt fails, we retry on the following schedule. Each interval is measured from the previous failure.

Attempt        Time since previous failure
1 (initial)    n/a
2              5 seconds
3              5 minutes
4              30 minutes
5              2 hours
6              5 hours
7              10 hours
8 (final)      10 hours

If every attempt fails, we give up after the eighth. In total, we spend roughly 27 and a half hours trying to reach your endpoint (the intervals above add up to 27 hours, 35 minutes, and 5 seconds, before jitter) before considering the delivery lost.

A small amount of random jitter is applied to each interval to smooth out thundering-herd effects if many endpoints are recovering at once.
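
To see when each retry lands relative to the first failure, the intervals can be added up. The sketch below uses the nominal intervals from the schedule above and ignores jitter and the time each failed attempt itself takes; the names RETRY_INTERVALS and cumulative_offsets are made up for this example.

```python
from datetime import timedelta
from itertools import accumulate

# Nominal wait before attempts 2 through 8, taken from the schedule above.
RETRY_INTERVALS = [
    timedelta(seconds=5),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=5),
    timedelta(hours=10),
    timedelta(hours=10),
]


def cumulative_offsets(intervals):
    """Elapsed time from the first failure to each retry, ignoring jitter."""
    return list(accumulate(intervals))


if __name__ == "__main__":
    for attempt, offset in enumerate(cumulative_offsets(RETRY_INTERVALS), start=2):
        hours = offset.total_seconds() / 3600
        print(f"attempt {attempt}: {hours:.2f} h after the first failure")
```

With these intervals, the final attempt falls 27 hours, 35 minutes, and 5 seconds after the first failure. An endpoint that is down for a few hours still has several attempts ahead of it.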

What you'll see during retries

  • The same event retains the same svix-id across retries. This is the idempotency key you should use.
  • The svix-timestamp header reflects when this particular attempt was sent, not the original attempt. Because of this, a retry sent hours after the first attempt still falls inside the 5-minute freshness window used in signature verification, so keep that check enabled for retries.
  • The signature on each retry is computed fresh for that attempt.
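
A minimal sketch of the freshness check on svix-timestamp follows. It assumes the header carries Unix time in seconds as a decimal string; the function name and the now parameter are hypothetical, and the signature comparison itself is left out.

```python
import time
from typing import Optional

TOLERANCE_SECONDS = 5 * 60  # the 5-minute freshness window


def is_fresh(svix_timestamp: str, now: Optional[float] = None) -> bool:
    """Check that an attempt's timestamp is within the freshness window.

    Assumption: svix_timestamp is Unix time in seconds, as a decimal string.
    """
    try:
        sent_at = int(svix_timestamp)
    except (TypeError, ValueError):
        return False  # missing or malformed header
    if now is None:
        now = time.time()
    # Reject timestamps too far in the past or in the future.
    return abs(now - sent_at) <= TOLERANCE_SECONDS
```

Since every retry carries its own timestamp, compare against the header on the current request rather than against the time the event was first created.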

Handling duplicates

Because we retry, and because an attempt can succeed on your side even though your response never reaches us, you should plan on occasionally seeing the same event more than once. We guarantee at-least-once delivery, not exactly-once.

The right pattern is to use svix-id as an idempotency key:

on webhook:
    verify signature
    if we've already processed svix-id:
        return 200
    process event
    record svix-id as processed
    return 200

A database unique constraint on svix-id is a simple way to enforce this — if the insert fails with a duplicate-key error, you know you've already handled this event.
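
Here is one way the unique-constraint approach might look, using SQLite from the Python standard library. The table name, the function names, and the process callback are placeholders for this sketch; adapt them to your own database and handler.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the dedup table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_webhooks (svix_id TEXT PRIMARY KEY)"
    )
    return conn


def handle_once(conn: sqlite3.Connection, svix_id: str, process: Callable[[], None]) -> bool:
    """Run process() at most once per svix_id.

    Returns True if the event was processed now, False if it was a duplicate.
    """
    with conn:  # commits on success, rolls back if process() raises
        try:
            conn.execute(
                "INSERT INTO processed_webhooks (svix_id) VALUES (?)", (svix_id,)
            )
        except sqlite3.IntegrityError:
            return False  # duplicate key: this event was already handled
        process()
    return True
```

In this sketch the insert and the processing share one transaction, so a failure inside process() rolls the insert back and the next retry is handled as a new event. Either way the handler should return 200 for a duplicate, as in the pseudocode above.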

Failing fast vs. failing slow

Some failure modes are worse than others:

  • Returning 4xx quickly is fine if the request is genuinely malformed or the signature fails to verify. We'll retry, which in those cases won't help, but that's the correct behavior — you don't want to silently accept bad requests.
  • Returning 5xx quickly is the right response to transient internal errors. We'll retry, and your next deploy or the recovery of your downstream dependency will likely resolve it.
  • Timeouts are the most expensive failure. They block one of our delivery workers for the full timeout window and they give you no opportunity to recover. If your processing is going to take more than a couple of seconds, enqueue the event and return 200 immediately.
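
The enqueue-and-acknowledge pattern from the last point could be sketched like this. It uses an in-process queue purely for illustration; a production setup would more likely use a durable queue. The names handle_webhook, worker, and the signature_ok flag are assumptions of this example.

```python
import logging
import queue
from typing import Callable

# Bounded so a stalled worker surfaces as a fast 503 instead of unbounded memory use.
events: "queue.Queue[bytes]" = queue.Queue(maxsize=1000)


def handle_webhook(body: bytes, signature_ok: bool) -> int:
    """Return the HTTP status to send back. No slow work happens here."""
    if not signature_ok:
        return 400  # fail fast on requests that do not verify
    try:
        events.put_nowait(body)
    except queue.Full:
        return 503  # transient: ask for a retry instead of timing out
    return 200


def worker(process: Callable[[bytes], None]) -> None:
    """Drain the queue in a background thread, outside the request path."""
    while True:
        body = events.get()
        try:
            process(body)
        except Exception:
            logging.exception("webhook processing failed")
        finally:
            events.task_done()
```

A typical setup would start the worker with threading.Thread(target=worker, args=(process,), daemon=True).start() at application startup. The request handler then only verifies, enqueues, and responds, which keeps it well inside the delivery timeout.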

When to check delivery status

If you suspect deliveries are failing:

  1. Check the status code and response body your endpoint returned to us — often the issue is a 401 from your own auth middleware blocking the webhook route, or a 5xx from a dependency.
  2. Verify your registered URL still points where you expect and is reachable from the public internet.
  3. Check that your signature verification isn't rejecting everything — a recently rotated secret that your handler hasn't picked up is a common cause.

For help investigating specific failed deliveries, reach out to the Allfly team.