Rethinking Database Scalability with Serverless Patterns

Introduction

Scalability often looks simple on paper: if demand grows, just add more servers. But in the real world, especially when databases are involved, things rarely scale so neatly.

In one of my past roles, we built a large-scale crawler that ingested thousands of web pages every night. The crawler scaled out effortlessly on EC2 — we could spin up thousands of nodes without issue. The real challenge appeared behind the scenes: the database.

Every database server has a finite connection limit. EC2 could scale horizontally into the thousands, but the database couldn’t keep up: each new crawler opened more connections, the limit was exhausted, and before long the system was throwing connection timeouts.

We explored the usual options:

  • Read replicas? Not helpful, since the workload was write-heavy.
  • Vertical scaling? We kept moving to bigger machines, but it only delayed the inevitable.
  • Sharding? Possible in theory, but far too complex and impractical for our use case.

What we faced was a classic scalability mismatch: crawlers scaling out by the thousands versus a database that couldn’t handle the surge in connections.

The Shift in Thinking

At some point, we realized the problem wasn’t really “how do we make the database bigger?” It was “how do we take pressure off the database altogether?”

The crawlers didn’t actually need to write raw HTML directly into the DB. What the business needed was structured insights extracted from those pages. The raw HTML was just an intermediate step — and since no one was ever going to query it, there was no reason for it to live in the database at all.

Worse, keeping the entire raw HTML in the DB meant the storage footprint kept growing, adding to both cost and operational overhead. We were essentially paying premium database prices to store something we didn’t even use.

This shift in mindset opened the door to decoupling. Instead of forcing thousands of EC2 instances to hold long-lived DB connections, we looked at how serverless patterns could absorb the raw data, process it, and only store what was truly valuable.

The New Architecture

We redesigned the crawler pipeline around three AWS building blocks: S3, SQS, and Lambda.

  • S3 for raw storage – Instead of dumping raw HTML into the database, crawlers uploaded it to S3. Cheap, durable, and infinitely scalable storage was a perfect fit for data we didn’t need to query directly.
  • SQS for decoupling – After uploading HTML to S3, each crawler pushed the file path into an SQS queue. This separated the “producers” (crawlers) from the “consumers” (processors), smoothing out load spikes and ensuring no data was lost (a sketch of this producer side follows the list).
  • Lambda for processing – An AWS Lambda function subscribed to the SQS queue. For each message, it fetched the HTML from S3, extracted only the meaningful information, and inserted structured data into the DB. Each Lambda connection to the DB lasted only a few milliseconds, instead of crawlers holding open long-lived sessions. The design also let us scale up to thousands of Lambda functions in parallel, ensuring elasticity without stressing the database.
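
To make this concrete, here is a minimal sketch of the producer side in Python with boto3. The bucket name, queue URL, and key scheme are illustrative assumptions, not our actual production values:

    import hashlib
    import json

    import boto3

    # Illustrative resource names -- substitute your own bucket and queue.
    RAW_BUCKET = "crawler-raw-html"
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/html-to-parse"

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")

    def store_and_enqueue(url: str, html: str) -> None:
        """Upload raw HTML to S3, then hand its pointer to the processing queue."""
        # A stable key derived from the URL means a re-crawl overwrites in place.
        key = "raw/" + hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
        s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=html.encode("utf-8"))

        # Only the pointer travels through SQS (message bodies are capped
        # at 256 KB); the HTML payload itself stays in S3.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"url": url, "s3_key": key}),
        )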


[Figure: Simplified Architecture]

With this design, crawlers still queried the DB for the list of URLs to crawl — but only through short, read-only connections. They no longer kept long-lived sessions open to write every raw HTML page they fetched. Instead, the database only saw quick writes from Lambda containing the structured insights we actually needed.
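
A matching sketch of the consumer side, again under stated assumptions: a Python Lambda, psycopg2 packaged as a layer, and a hypothetical page_insights table standing in for whatever schema the real pipeline used:

    import json
    import os

    import boto3
    import psycopg2  # assumption: driver packaged as a Lambda layer

    s3 = boto3.client("s3")  # created once per container, reused across invocations

    def extract_insights(html: str) -> str:
        """Stand-in for the real parsing logic (e.g. a BeautifulSoup extraction)."""
        return html[:200]

    def handler(event, context):
        """SQS-triggered: fetch raw HTML from S3, parse it, write structured rows."""
        rows = []
        for record in event["Records"]:
            msg = json.loads(record["body"])
            obj = s3.get_object(Bucket=os.environ["RAW_BUCKET"], Key=msg["s3_key"])
            html = obj["Body"].read().decode("utf-8")
            rows.append((msg["url"], extract_insights(html)))

        # Connect as late as possible and close immediately, so the database
        # sees only milliseconds of connection time per invocation.
        conn = psycopg2.connect(os.environ["DB_DSN"])
        try:
            with conn, conn.cursor() as cur:
                cur.executemany(
                    "INSERT INTO page_insights (url, summary) VALUES (%s, %s)",
                    rows,
                )
        finally:
            conn.close()

Opening the connection only after parsing is done, and closing it in a finally block, is what keeps each invocation’s hold on the database down to milliseconds.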

The Outcomes

The impact of this redesign was immediate:

  • Database load dropped dramatically – With crawlers no longer writing raw HTML, the DB handled only short, purposeful connections from Lambda.
  • Connection timeouts disappeared – Short-lived connections (milliseconds per Lambda) eliminated the surge of long-lived sessions that had been overwhelming the DB.
  • Elastic scalability – Thousands of Lambda functions could process messages in parallel, keeping up with crawler scale without stressing the database.
  • Operational simplicity – The system became more resilient: if a Lambda failed, SQS automatically retried the message instead of losing data midstream (see the redrive sketch after this list).
  • Cleaner database footprint – Only structured insights were stored, instead of bloated raw HTML that no one ever queried.
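
The retry behavior above is standard SQS semantics: a message that isn’t deleted before its visibility timeout expires becomes visible again and is retried, and a redrive policy moves repeat failures to a dead-letter queue. A minimal configuration sketch, with illustrative names and limits:

    import json

    import boto3

    sqs = boto3.client("sqs")

    # A dead-letter queue catches messages that keep failing.
    dlq_url = sqs.create_queue(QueueName="html-to-parse-dlq")["QueueUrl"]
    dlq_arn = sqs.get_queue_attributes(
        QueueUrl=dlq_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Main queue: an undeleted message reappears after the visibility timeout
    # and is retried; after five failed receives it moves to the DLQ.
    sqs.create_queue(
        QueueName="html-to-parse",
        Attributes={
            "VisibilityTimeout": "120",  # longer than the Lambda timeout
            "RedrivePolicy": json.dumps(
                {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
            ),
        },
    )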

Cost Impact

The redesign wasn’t just about performance — it had a direct impact on cost as well.

  • Database savings – By removing raw HTML storage, the DB footprint shrank significantly. We no longer needed to keep scaling into larger (and more expensive) database instances just to handle unnecessary data.
  • Reduced connection overhead – Crawlers only made quick, read-only queries, while Lambda handled short-lived writes. This meant we could operate with a smaller DB instance size than before.
  • Serverless efficiency – Lambda scaled up to thousands of functions in parallel when needed, but incurred cost only while running. There were no idle EC2 crawlers holding DB connections open and wasting compute.
  • Storage efficiency – S3 was far cheaper per GB than database storage, making it the ideal place to keep raw HTML if we ever needed to revisit it (a back-of-the-envelope comparison follows this list).
  • Right-sizing for workload timing – This entire pipeline was a scheduled job that only ran once a night. Paying for oversized database infrastructure 24/7 made little sense, while serverless let us pay only during actual processing.
  • Lighter EC2 footprint – Parsing data while crawling was adding overhead on the EC2 fleet and making them run longer. By decoupling parsing and shifting it to Lambda, crawlers became leaner, faster, and cheaper to operate.
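
As a back-of-the-envelope illustration of the storage point: the prices below are assumptions chosen only to show the shape of the arithmetic, not quoted AWS rates, so check current pricing for real figures:

    # Back-of-the-envelope only: assumed example prices, not quoted AWS rates.
    S3_PER_GB_MONTH = 0.023   # assumption: S3 Standard storage
    DB_PER_GB_MONTH = 0.115   # assumption: provisioned database (gp2) storage

    raw_html_gb = 500  # hypothetical footprint of accumulated raw HTML

    print(f"S3:       ${raw_html_gb * S3_PER_GB_MONTH:8.2f} / month")
    print(f"Database: ${raw_html_gb * DB_PER_GB_MONTH:8.2f} / month")
    print(f"Ratio:    {DB_PER_GB_MONTH / S3_PER_GB_MONTH:.1f}x cheaper in S3")

The exact ratio will vary with region and storage class, but the direction holds: paying database prices for blobs no one queries is the expensive way to store them.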

Overall, the system delivered better scale at a fraction of the previous cost, proving that the right architecture can save both timeouts and dollars.

Final Thoughts

Looking back, the real breakthrough wasn’t a bigger database or more powerful servers — it was realizing the database didn’t need to do all that work in the first place. By separating crawling from parsing, shifting raw storage to S3, and introducing SQS and Lambda, we turned an overloaded system into one that scaled smoothly, cost less, and was easier to operate.

For me, this reinforced a core principle of cloud architecture: true scalability doesn’t come from adding more resources; it comes from rethinking the workflow.
