In a perfect world, every network request would succeed instantly. But in the real world, APIs fail, networks drop, and servers hiccup. Rather than giving up on the first failure, retrying with an intelligent strategy can make our applications more resilient.
Enter exponential backoff: a proven algorithm for handling retries efficiently and politely.
Why Do Requests Fail?
APIs can fail for many transient reasons:
- Network timeouts
- Temporary internet issues
- Rate limiting (429 Too Many Requests)
- Server-side overload (5xx errors)
In many of these cases, retrying the request after a delay can succeed.
The Problem with Naive Retries
Imagine if every client instantly retried after failure: the server would be flooded, making recovery even harder.
So we need to retry smarter, not harder.
What Is Exponential Backoff?
Exponential backoff increases the wait time between retries exponentially after each failure.
Formula:
delay = baseDelay * (2 ^ attemptNumber)
Visualization: Exponential Growth of Delays
For example, with baseDelay = 500ms:
Attempt Delay (ms)
------- ----------
1 500
2 1000
3 2000
4 4000
5 8000
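You can reproduce that schedule with a quick loop (note that the attempt number in the exponent starts at 0, which is why the first delay equals baseDelay):

// Print the backoff schedule for baseDelay = 500 ms
const baseDelay = 500;
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`Attempt ${attempt + 1}: ${baseDelay * 2 ** attempt} ms`);
}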
And with jitter, those values will vary slightly to avoid thundering herds.
Add a Bit of Jitter
Without randomness, clients can retry at the exact same time, causing a thundering herd problem. Adding jitter (random noise) avoids this.
const jitter = Math.random() * 100;
const delay = baseDelay * 2 ** attempt + jitter;
Sample Implementation: JavaScript (Node.js)
async function fetchWithRetry(url, options = {}, maxRetries = 5, baseDelay = 500) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);
      // Treat 5xx and 429 as transient and retry; return everything else as-is
      if (response.status >= 500 || response.status === 429) {
        throw new Error(`Retryable error: ${response.status}`);
      }
      return response;
    } catch (error) {
      if (attempt === maxRetries) {
        throw new Error(`Failed after ${maxRetries + 1} attempts: ${error.message}`);
      }
      // Exponential backoff plus up to 100 ms of jitter
      const backoff = baseDelay * 2 ** attempt;
      const jitter = Math.random() * 100;
      const delay = backoff + jitter;
      console.warn(`Attempt ${attempt + 1} failed. Retrying in ${delay.toFixed(0)}ms...`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
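Usage is then a drop-in replacement for fetch (the URL below is just a placeholder):

// Inside an async function (or an ES module with top-level await)
const response = await fetchWithRetry('https://api.example.com/items');
const data = await response.json();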
Retryable vs Non-Retryable Errors
Not every error deserves a retry.
Error Code | Description | Retry?
---|---|---
500 | Internal Server Error | Yes
503 | Service Unavailable | Yes
429 | Too Many Requests (rate limit) | Yes
400 | Bad Request | No
401/403 | Unauthorized/Forbidden | No
Also, check for the Retry-After header in rate-limited responses:
HTTP/1.1 429 Too Many Requests
Retry-After: 120
You can honor that delay before retrying.
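A small helper for that might look like this; it's a sketch that assumes the delay-seconds form of the header (Retry-After can also be an HTTP date):

// Prefer the server's Retry-After hint over our own computed backoff
function getRetryDelay(response, attempt, baseDelay = 500) {
  const retryAfter = response.headers.get('Retry-After');
  if (retryAfter && !Number.isNaN(Number(retryAfter))) {
    return Number(retryAfter) * 1000; // header value is in seconds
  }
  return baseDelay * 2 ** attempt + Math.random() * 100;
}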
Real-World Examples
- Stripe API: Automatically retries on 409, 429, and 5xx with exponential backoff.
- Google Cloud APIs: Recommend exponential backoff with jitter for handling transient errors.
- AWS SDKs: Use full jitter to randomize retries even more (see the sketch below).
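For reference, "full jitter" replaces the fixed backoff with a uniformly random delay between zero and the exponential cap; a minimal sketch:

// Full jitter: randomize across the whole window instead of adding a small offset
function fullJitterDelay(attempt, baseDelay = 500, maxDelay = 30000) {
  const cap = Math.min(maxDelay, baseDelay * 2 ** attempt);
  return Math.random() * cap;
}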
Best Practices
- Retry only for transient failures
- Use maximum retry caps to avoid infinite loops
- Add jitter to prevent synchronized retries
- Use headers like Retry-After if available
- Log failures with retry metadata for debugging
Adding Redis for Retry Tracking
When retrying critical or expensive operations, storing retry metadata in Redis can prevent duplicate retries or help in logging/debugging:
// pseudo-code: GET returns a string (or null), so coerce before doing math
const retryKey = `retry:${jobId}`;
const attempt = Number(await redis.get(retryKey)) || 0;
if (attempt > maxRetries) throw new Error('Too many retries');
await redis.set(retryKey, attempt + 1, 'EX', retryExpirySeconds);
Redis helps us:
- Persist retry counts across process restarts
- Coordinate retries in distributed systems
- Add TTLs to automatically clear retry metadata
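Note that a read-then-write like the pseudo-code above can race when two workers retry the same job at once. A sketch using node-redis's atomic INCR avoids that; the key format and TTL here are illustrative assumptions:

const { createClient } = require('redis');

const redis = createClient(); // assumes a local Redis; call await redis.connect() once at startup

async function guardRetries(jobId, maxRetries = 5, ttlSeconds = 3600) {
  const key = `retry:${jobId}`;
  const attempt = await redis.incr(key); // atomic, so concurrent workers can't double-count
  if (attempt === 1) await redis.expire(key, ttlSeconds); // TTL clears stale metadata
  if (attempt > maxRetries) throw new Error(`Job ${jobId}: too many retries`);
  return attempt;
}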
Test APIs to Try This Locally
1. httpstat.us
Returns custom HTTP status codes and delays.
Examples:
- https://httpstat.us/503 → Simulates 503 Service Unavailable
- https://httpstat.us/500?sleep=2000 → Simulates a 500 with a 2-second delay
- https://httpstat.us/429 → Simulates rate limiting
- https://httpstat.us/200 → Simulates success
2. reqres.in
REST-style fake API for testing:
- https://reqres.in/api/users/2 → Valid request
- https://reqres.in/api/users/23 → Returns 404
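To see the backoff in action, point the fetchWithRetry helper from earlier at one of the failing endpoints:

(async () => {
  try {
    // 503 is retryable, so expect a few backoff warnings before it gives up
    const res = await fetchWithRetry('https://httpstat.us/503', {}, 3, 500);
    console.log('Status:', res.status);
  } catch (err) {
    console.error('Gave up:', err.message);
  }
})();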
Final Thoughts
In today's distributed systems, failures are not exceptions; they are expected. Retrying failed requests with exponential backoff makes our systems resilient, polite, and production-grade.
If you're building a system that talks to APIs, don't just retry blindly. Use backoff, use jitter, and always fail gracefully when it's time to stop.
Bonus: Want to Plug This into Axios?
Here's a wrapper using Axios and exponential backoff:
const axios = require('axios');

async function axiosRetry(url, options = {}, retries = 3, baseDelay = 300) {
  for (let i = 0; i <= retries; i++) {
    try {
      return await axios(url, options);
    } catch (err) {
      // Give up once retries are exhausted or the error isn't transient
      if (i === retries || !shouldRetry(err)) throw err;
      const delay = baseDelay * 2 ** i + Math.random() * 100; // backoff + jitter
      await new Promise(r => setTimeout(r, delay));
    }
  }
}

function shouldRetry(err) {
  return [429, 500, 503].includes(err.response?.status);
}
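Calling it is the same pattern, here against one of the httpstat.us endpoints from above:

axiosRetry('https://httpstat.us/500', {}, 3)
  .then((res) => console.log('Status:', res.status))
  .catch((err) => console.error('Failed permanently:', err.message));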
Reference Blogs, Docs & Good Reads
- AWS: Exponential Backoff and Jitter
- Google Cloud: Retry Guidelines
- Stripe: Idempotent Requests
- Fetch API Error Handling
- Redis Node.js Client
- httpstat.us - HTTP Testing
- reqres.in - Fake API
- Mocky - Custom Mock API