In a perfect world, every network request would succeed instantly. But in the real world, APIs fail, networks drop, and servers hiccup. Rather than giving up on the first failure, retrying with an intelligent strategy can make our applications more resilient.
Enter exponential backoff: a proven algorithm for handling retries efficiently and politely.
Why Do Requests Fail?
APIs can fail for many transient reasons:
- Network timeouts
- Temporary internet issues
- Rate limiting (429 Too Many Requests)
- Server-side overload (5xx errors)
In many of these cases, retrying the request after a delay can succeed.
The Problem with Naive Retries
Imagine if every client instantly retried after failure: the server would be flooded, making recovery even harder.
So we need to retry smarter, not harder.
What Is Exponential Backoff?
Exponential backoff increases the wait time between retries exponentially after each failure.
Formula:
delay = baseDelay * (2 ^ attemptNumber)
Visualization: Exponential Growth of Delays
For example, with baseDelay = 500ms:
Attempt Delay (ms)
------- ----------
1 500
2 1000
3 2000
4 4000
5 8000
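You can reproduce that schedule with a quick loop (note that the attempt number in the exponent starts at 0, which is why the first delay equals baseDelay):

// Print the backoff schedule for baseDelay = 500 ms
const baseDelay = 500;
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`Attempt ${attempt + 1}: ${baseDelay * 2 ** attempt} ms`);
}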
And with jitter, those values will vary slightly to avoid thundering herds.
Add a Bit of Jitter
Without randomness, clients can retry at the exact same time, causing a thundering herd problem. Adding jitter (random noise) avoids this.
const jitter = Math.random() * 100;
const delay = baseDelay * 2 ** attempt + jitter;
Sample Implementation: JavaScript (Node.js)
async function fetchWithRetry(url, options = {}, maxRetries = 5, baseDelay = 500) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);
      // Treat 5xx and 429 as transient and retry; return everything else as-is
      if (response.status >= 500 || response.status === 429) {
        throw new Error(`Retryable error: ${response.status}`);
      }
      return response;
    } catch (error) {
      if (attempt === maxRetries) {
        throw new Error(`Failed after ${maxRetries + 1} attempts: ${error.message}`);
      }
      // Exponential backoff plus up to 100 ms of jitter
      const backoff = baseDelay * 2 ** attempt;
      const jitter = Math.random() * 100;
      const delay = backoff + jitter;
      console.warn(`Attempt ${attempt + 1} failed. Retrying in ${delay.toFixed(0)}ms...`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
}
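Usage is then a drop-in replacement for fetch (the URL below is just a placeholder):

// Inside an async function (or an ES module with top-level await)
const response = await fetchWithRetry('https://api.example.com/items');
const data = await response.json();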
Retryable vs Non-Retryable Errors
Not every error deserves a retry.
Error Code | Description | Retry?
---|---|---
500 | Internal Server Error | Yes
503 | Service Unavailable | Yes
429 | Too Many Requests (rate limit) | Yes
400 | Bad Request | No
401/403 | Unauthorized/Forbidden | No
Also, check for the Retry-After header in rate-limited responses:
HTTP/1.1 429 Too Many Requests
Retry-After: 120
You can honor that delay before retrying.
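A small helper for that might look like this; it's a sketch that assumes the delay-seconds form of the header (Retry-After can also be an HTTP date):

// Prefer the server's Retry-After hint over our own computed backoff
function getRetryDelay(response, attempt, baseDelay = 500) {
  const retryAfter = response.headers.get('Retry-After');
  if (retryAfter && !Number.isNaN(Number(retryAfter))) {
    return Number(retryAfter) * 1000; // header value is in seconds
  }
  return baseDelay * 2 ** attempt + Math.random() * 100;
}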
Real-World Examples
- Stripe API: Automatically retries on 409, 429, and 5xx with exponential backoff.
- Google Cloud APIs: Recommend exponential backoff with jitter for handling transient errors.
- AWS SDKs: Use full jitter to randomize retries even more (see the sketch below).
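For reference, "full jitter" replaces the fixed backoff with a uniformly random delay between zero and the exponential cap; a minimal sketch:

// Full jitter: randomize across the whole window instead of adding a small offset
function fullJitterDelay(attempt, baseDelay = 500, maxDelay = 30000) {
  const cap = Math.min(maxDelay, baseDelay * 2 ** attempt);
  return Math.random() * cap;
}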
Best Practices
- Retry only for transient failures
- Use maximum retry caps to avoid infinite loops
- Add jitter to prevent synchronized retries
- Use headers like Retry-After if available
- Log failures with retry metadata for debugging
Adding Redis for Retry Tracking
When retrying critical or expensive operations, storing retry metadata in Redis can prevent duplicate retries or help in logging/debugging:
// pseudo-code: GET returns a string (or null), so coerce before doing math
const retryKey = `retry:${jobId}`;
const attempt = Number(await redis.get(retryKey)) || 0;
if (attempt > maxRetries) throw new Error('Too many retries');
await redis.set(retryKey, attempt + 1, 'EX', retryExpirySeconds);
Redis helps us:
- Persist retry counts across process restarts
- Coordinate retries in distributed systems
- Add TTLs to automatically clear retry metadata
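Note that a read-then-write like the pseudo-code above can race when two workers retry the same job at once. A sketch using node-redis's atomic INCR avoids that; the key format and TTL here are illustrative assumptions:

const { createClient } = require('redis');

const redis = createClient(); // assumes a local Redis; call await redis.connect() once at startup

async function guardRetries(jobId, maxRetries = 5, ttlSeconds = 3600) {
  const key = `retry:${jobId}`;
  const attempt = await redis.incr(key); // atomic, so concurrent workers can't double-count
  if (attempt === 1) await redis.expire(key, ttlSeconds); // TTL clears stale metadata
  if (attempt > maxRetries) throw new Error(`Job ${jobId}: too many retries`);
  return attempt;
}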
Test APIs to Try This Locally
1. httpstat.us
Returns custom HTTP status codes and delays.
Examples:
- https://httpstat.us/503 → Simulates 503 Service Unavailable
- https://httpstat.us/500?sleep=2000 → Simulates a 500 with a 2-second delay
- https://httpstat.us/429 → Simulates rate limiting
- https://httpstat.us/200 → Simulates success
2. reqres.in
REST-style fake API for testing:
- https://reqres.in/api/users/2 → Valid request
- https://reqres.in/api/users/23 → Returns 404
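To see the backoff in action, point the fetchWithRetry helper from earlier at one of the failing endpoints:

(async () => {
  try {
    // 503 is retryable, so expect a few backoff warnings before it gives up
    const res = await fetchWithRetry('https://httpstat.us/503', {}, 3, 500);
    console.log('Status:', res.status);
  } catch (err) {
    console.error('Gave up:', err.message);
  }
})();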
Final Thoughts
In today's distributed systems, failures are not exceptions; they are expected. Retrying failed requests with exponential backoff makes our systems resilient, polite, and production-grade.
If you're building a system that talks to APIs, don't just retry blindly. Use backoff, use jitter, and always fail gracefully when it's time to stop.
Bonus: Want to Plug This into Axios?
Here's a wrapper using Axios and exponential backoff:
const axios = require('axios');

async function axiosRetry(url, options = {}, retries = 3, baseDelay = 300) {
  for (let i = 0; i <= retries; i++) {
    try {
      return await axios(url, options);
    } catch (err) {
      // Give up once retries are exhausted or the error isn't transient
      if (i === retries || !shouldRetry(err)) throw err;
      const delay = baseDelay * 2 ** i + Math.random() * 100; // backoff + jitter
      await new Promise(r => setTimeout(r, delay));
    }
  }
}

function shouldRetry(err) {
  return [429, 500, 503].includes(err.response?.status);
}
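Calling it is the same pattern, here against one of the httpstat.us endpoints from above:

axiosRetry('https://httpstat.us/500', {}, 3)
  .then((res) => console.log('Status:', res.status))
  .catch((err) => console.error('Failed permanently:', err.message));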
Reference Blogs, Docs & Good Reads
- AWS: Exponential Backoff and Jitter
- Google Cloud: Retry Guidelines
- Stripe: Idempotent Requests
- Fetch API Error Handling
- Redis Node.js Client
- httpstat.us - HTTP Testing
- reqres.in - Fake API
- Mocky - Custom Mock API