Rethinking Database Scalability with Serverless Patterns
Introduction
Scalability often looks simple on paper: if demand grows, just add more servers. But in the real world, especially when databases are involved, things rarely scale so neatly.
In one of my past roles, we built a large-scale crawler that ingested thousands of web pages every night. The crawler scaled out effortlessly on EC2 — we could spin up thousands of nodes without issue. The real challenge appeared behind the scenes: the database.
Every database server has a finite connection limit. While the EC2 fleet could scale horizontally into the thousands, the database could not: as crawlers scaled out, we hit the connection ceiling, and before long the system was throwing connection timeouts.
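To make the connection-count mismatch concrete, here is a back-of-the-envelope sketch. The numbers are purely illustrative, not our actual production figures: even a small connection pool per crawler, multiplied across thousands of nodes, dwarfs a typical managed-database connection limit.

```python
# Illustrative numbers only -- not the actual production figures.
crawler_nodes = 2_000       # EC2 instances scaled out for a nightly run
conns_per_node = 5          # even a small pool per crawler adds up
max_db_connections = 500    # a typical ceiling on a managed database

demanded = crawler_nodes * conns_per_node
print(f"connections demanded: {demanded}, limit: {max_db_connections}")
# Once demanded exceeds the limit, new crawlers start seeing connection timeouts.
assert demanded > max_db_connections
```

The point is that the database's connection limit is effectively fixed while the fleet is elastic, so any design that ties one (or more) long-lived connections to each crawler node is doomed past a certain scale.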
We explored the usual options, but none of them addressed the root cause. What we faced was a classic scalability mismatch: crawlers scaling out by the thousands versus a database that couldn't handle the surge in connections.
The Shift in Thinking
At some point, we realized the problem wasn't really "how do we make the database bigger?" It was "how do we take the pressure off the database altogether?"
The crawlers didn’t actually need to write raw HTML directly into the DB. What the business needed was structured insights extracted from those pages. The raw HTML was just an intermediate step — and since no one was ever going to query it, there was no reason for it to live in the database at all.
Worse, keeping the entire raw HTML in the DB meant the storage footprint kept growing, adding to both cost and operational overhead. We were essentially paying premium database prices to store something we didn’t even use.
This shift in mindset opened the door to decoupling. Instead of forcing thousands of EC2 instances to hold long-lived DB connections, we looked at how serverless patterns could absorb the raw data, process it, and only store what was truly valuable.
The New Architecture
We redesigned the crawler pipeline around three AWS building blocks: S3, SQS, and Lambda.
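Here is a sketch of the crawler (producer) side of that pipeline. The bucket name, key scheme, and message shape are my illustrative assumptions, not the actual implementation: each crawler writes the raw HTML to S3 and enqueues only a small pointer message on SQS, so the database never sees the payload.

```python
import hashlib
import json

def build_crawl_message(bucket: str, key: str, url: str) -> str:
    """Small pointer message for SQS: where the raw HTML lives, not the HTML itself."""
    return json.dumps({"bucket": bucket, "key": key, "source_url": url})

def s3_key_for(url: str) -> str:
    """Deterministic S3 key so re-crawls of the same URL overwrite rather than duplicate."""
    return f"raw-html/{hashlib.sha256(url.encode()).hexdigest()}.html"

# With boto3 available, and hypothetical bucket/queue names, a crawler would then do:
#   s3 = boto3.client("s3"); sqs = boto3.client("sqs")
#   key = s3_key_for(url)
#   s3.put_object(Bucket="crawler-raw-html", Key=key, Body=html.encode())
#   sqs.send_message(QueueUrl=queue_url,
#                    MessageBody=build_crawl_message("crawler-raw-html", key, url))
```

Because the SQS message carries only a reference, the queue stays cheap and fast regardless of how large the fetched pages are.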
With this design, crawlers still queried the DB for the list of URLs to crawl — but only through short, read-only connections. They no longer kept long-lived sessions open to write every raw HTML page they fetched. Instead, the database only saw quick writes from Lambda containing the structured insights we actually needed.
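On the consuming side, a minimal sketch of what such a Lambda handler might look like. The message shape, the title-only "insight," and the injected `fetch_html`/`save_insight` callbacks are all simplifying assumptions for illustration: the handler reads each SQS record, pulls the raw HTML from S3, extracts the structured fields, and performs one short database write.

```python
import json
import re

def extract_insights(html: str, source_url: str) -> dict:
    """Toy 'structured insight': just the page title. The real parsing was richer."""
    m = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return {"source_url": source_url, "title": m.group(1).strip() if m else None}

def handler(event, fetch_html, save_insight):
    """SQS-triggered Lambda sketch. fetch_html and save_insight are injected so the
    S3 read and the short DB write can be stubbed out in tests."""
    for record in event["Records"]:
        body = json.loads(record["body"])               # pointer message from the crawler
        html = fetch_html(body["bucket"], body["key"])  # e.g. s3.get_object under the hood
        save_insight(extract_insights(html, body["source_url"]))  # one quick DB write
```

In a real deployment the Lambda entry point is `handler(event, context)` with boto3 clients created at module scope; injecting the I/O here is purely to keep the sketch runnable without AWS credentials.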
The Outcomes
The impact of this redesign was immediate: the database stopped drowning in connections, the timeouts disappeared, and the pipeline scaled as smoothly as the crawlers themselves.
Cost Impact
The redesign wasn’t just about performance — it had a direct impact on cost as well.
Overall, the system delivered better scale at a fraction of the previous cost, proving that the right architecture can save both timeouts and dollars.
Final Thoughts
Looking back, the real breakthrough wasn't a bigger database or more powerful servers — it was realizing the database didn't need to do all that work in the first place. By separating crawling from parsing, shifting raw storage to S3, and introducing SQS and Lambda, we turned an overloaded system into one that scaled smoothly, cost less, and was easier to operate.
For me, this reinforced a core principle of cloud architecture: true scalability doesn’t come from adding more resources, it comes from rethinking the workflow.