The Incident That Woke Us Up
It was 3 AM when the alerts started flooding in.
Our high-traffic API, which usually handled 5,000 requests per second (RPS), was suddenly timing out for 30% of users. Database queries were slow, CPU usage spiked to 95%, and the event loop was lagging by 500ms.
After a frantic hour of checking logs, we realized: Node.js itself wasn’t the problem—our code was.
We had fallen into common Node.js performance traps that only show up under real-world load.
Here’s what we learned—and how we fixed it.
1. Blocking the Event Loop: The Silent Killer
The Symptom:
- High latency during peak traffic.
- Unresponsive API even with low CPU usage.
The Culprit:
A synchronous CSV parser in our user-import feature:
const fs = require('fs');

function parseLargeCSV(filePath) {
  const data = fs.readFileSync(filePath, 'utf8'); // 😱 Blocking call!
  return data.split('\n').map(processRow);
}
Even though it was a rarely used admin feature, it blocked the entire Node.js process for 2-3 seconds, causing request delays.
The Fix:
- Use fs.promises + streams for large files:
const fs = require('fs');
const readline = require('readline');

async function parseLargeCSV(filePath) {
  const rl = readline.createInterface({ input: fs.createReadStream(filePath) });
  for await (const line of rl) {
    processRow(line); // Handle one row at a time without blocking the event loop
  }
}
- Offload CPU-heavy tasks to Worker Threads.
Result: API latency dropped by 40% during peak loads.
2. Memory Leaks: The Slow Death
The Symptom:
- Gradual slowdowns over days.
- Restarts temporarily fixed issues (a classic red flag).
The Culprit:
A misconfigured cache kept growing indefinitely:
const cache = {};
app.get('/data/:id', (req, res) => {
if (!cache[req.params.id]) {
cache[req.params.id] = fetchData(req.params.id); // 🚀 Grows forever!
}
res.json(cache[req.params.id]);
});
The Fix:
- Use an LRU cache with a size cap (like lru-cache). A WeakMap also works, but only when the keys are objects that can be garbage-collected—not string IDs like ours:
const LRU = require('lru-cache');
const cache = new LRU({ max: 1000 }); // Automatically evicts old entries
- Monitor heap usage with the --inspect flag + Chrome DevTools.
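For continuous monitoring between DevTools sessions, a lightweight check built on `process.memoryUsage()` can be logged periodically; this sketch (helper name is ours) just rounds the heap figures to megabytes:

```javascript
// Snapshot current heap usage in MB, suitable for periodic logging or metrics.
function heapSnapshotMB() {
  const { heapUsed, heapTotal } = process.memoryUsage();
  return {
    usedMB: Math.round(heapUsed / 1024 / 1024),
    totalMB: Math.round(heapTotal / 1024 / 1024),
  };
}
```

Emitting these numbers to your metrics system makes the "slow creep to 2GB" pattern visible days before it becomes an outage.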
Result: Memory usage stabilized at 300MB instead of creeping to 2GB+.
3. Promise Hell: Uncontrolled Concurrency
The Symptom:
- Database timeouts under load.
- High pendingPromises count in metrics.
The Culprit:
An unbatched Promise.all fetching 10,000 rows at once:
async function fetchAllUsers(userIds) {
return Promise.all(userIds.map(id => db.query('SELECT * FROM users WHERE id = ?', [id])));
} // 💥 Database gets hammered!
The Fix:
- Batch with p-limit or bluebird's Promise.map:
const limit = require('p-limit');
const concurrency = limit(10); // Max 10 DB queries at once
async function fetchAllUsers(userIds) {
return Promise.all(userIds.map(id => concurrency(() => db.query('...', [id]))));
}
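If you'd rather avoid a dependency, the same idea can be sketched with plain batching—process IDs in fixed-size groups and await each group before starting the next (`fetchOne` here is a hypothetical per-row query function):

```javascript
// Dependency-free concurrency control: run at most `batchSize` queries at once.
async function fetchInBatches(ids, fetchOne, batchSize = 10) {
  const results = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    const batch = ids.slice(i, i + batchSize);
    // Only this batch runs concurrently; the next waits for it to finish
    results.push(...(await Promise.all(batch.map(fetchOne))));
  }
  return results;
}
```

Note that p-limit keeps a rolling window of 10 in-flight tasks, while this batching waits for each whole group, so p-limit typically gives slightly better throughput.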
Result: Database load reduced by 70%, no more timeouts.
4. Poorly Optimized Logging
The Symptom:
- High disk I/O during traffic spikes.
- console.log in production (yes, we did it 😅).
The Culprit:
app.use((req, res, next) => {
console.log(`Incoming: ${req.method} ${req.url}`); // ⚠️ Sync + unbuffered!
next();
});
The Fix:
- Use winston or pino (async, structured logging):
const logger = require('pino')();
app.use((req, res, next) => {
logger.info({ method: req.method, url: req.url }, 'Request');
next();
});
Result: Logging overhead reduced from 15ms to <1ms per request.
Key Takeaways
✅ Never block the event loop (use streams, workers).
✅ Cache wisely (LRU, TTL, WeakMap).
✅ Control concurrency (avoid unlimited Promise.all).
✅ Log efficiently (pino > console.log).
Our API now handles 10,000 RPS without breaking a sweat.
What Node.js performance traps have YOU faced? Let’s discuss in the comments! 👇