
I am learning backend development by building a simple URL shortener project. My goal is to learn backend concepts and try to do things in simple but better ways, keeping scalability and reliability in mind.

Tech Stack:

  • Python
  • FastAPI
  • PostgreSQL
  • SQLAlchemy

a) Handling Click Events:

Currently, when a user clicks a short URL:

  • I record the click in a clicks table (url_id, clicked_at, and other metadata).
  • After that, the user is redirected to the long URL (a simplified sketch of this handler follows below).
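
Roughly, the handler looks like this today (simplified; the table and column names here are illustrative placeholders, not my exact schema):

from datetime import datetime, timezone

from fastapi import FastAPI, HTTPException
from fastapi.responses import RedirectResponse
from sqlalchemy import create_engine, text

app = FastAPI()
engine = create_engine("postgresql+psycopg2://user:pass@localhost/shortener")  # placeholder DSN

@app.get("/{slug}")
def follow_short_url(slug: str):
    with engine.begin() as conn:
        row = conn.execute(
            text("SELECT id, long_url FROM urls WHERE slug = :slug"),
            {"slug": slug},
        ).first()
        if row is None:
            raise HTTPException(status_code=404)
        # Synchronous insert on every click -- the part I worry about at scale.
        conn.execute(
            text("INSERT INTO clicks (url_id, clicked_at) VALUES (:url_id, :ts)"),
            {"url_id": row.id, "ts": datetime.now(timezone.utc)},
        )
    return RedirectResponse(row.long_url, status_code=307)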

This direct DB insert works fine for a few users. But with thousands or millions of clicks, this becomes a bottleneck, as frequent DB writes are expensive.

I started thinking about using background tasks and eventual consistency. The idea was:

  • Do the important updates immediately.
  • Push less critical work to a task queue (e.g. Celery).

However, creating one background task per click is too heavy. So I thought of introducing a buffer system:

  • Use Redis to capture click data quickly.
  • Periodically fetch click data in bulk from Redis (via a scheduled task) and bulk-insert it into the DB (a sketch of this follows below).
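
A rough sketch of what I have in mind (the Redis key, batch size, and schedule are placeholders; record_click would be called from the redirect handler instead of the direct insert):

import json
from datetime import datetime, timezone

import redis
from sqlalchemy import create_engine, text

r = redis.Redis()
engine = create_engine("postgresql+psycopg2://user:pass@localhost/shortener")  # placeholder DSN

CLICKS_KEY = "clicks:buffer"

def record_click(url_id: int) -> None:
    # Called from the redirect handler: an O(1) push, no DB round trip.
    r.rpush(CLICKS_KEY, json.dumps({
        "url_id": url_id,
        "clicked_at": datetime.now(timezone.utc).isoformat(),
    }))

def flush_clicks(batch_size: int = 1000) -> None:
    # Periodic task (e.g. Celery beat every few seconds): drain the buffer and bulk-insert.
    rows = []
    for _ in range(batch_size):
        raw = r.lpop(CLICKS_KEY)
        if raw is None:
            break
        rows.append(json.loads(raw))
    if rows:
        with engine.begin() as conn:
            conn.execute(
                text("INSERT INTO clicks (url_id, clicked_at) VALUES (:url_id, :clicked_at)"),
                rows,  # executemany-style bulk insert
            )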

This looks reasonable for my scale, but I’m not sure if this is the best approach. I have read about Kafka and event-driven architectures, but they feel too complex for my current learning level.

My questions for this part:

  • Is the Redis buffer -> periodic bulk insert approach a good and simple way for this use case?
  • Are there any better or simpler ways to handle click recording efficiently in small-to-medium projects?

b) Handling Sequentially Related Operations:

When a user is deactivated, all their URLs should also be deactivated. A simple approach is:

  • Update user status.
  • Update all related URLs (a minimal sketch of this is shown below).
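
In code, the simple version is just two updates in one transaction (column names are placeholders):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/shortener")  # placeholder DSN

def deactivate_user(user_id: int) -> None:
    # Both updates commit together, so URLs can never be out of sync with the user.
    with engine.begin() as conn:
        conn.execute(text("UPDATE users SET status = 'inactive' WHERE id = :uid"),
                     {"uid": user_id})
        conn.execute(text("UPDATE urls SET is_active = FALSE WHERE user_id = :uid"),
                     {"uid": user_id})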

This works fine for a few users. But if many users are deactivated at once (e.g. 1000 users with 100 URLs each), this will be slow.

So I thought of making the user update instant, and handling URL updates asynchronously.

Idea 1: Per-user Tasks

  • When a user is deactivated, update the user status immediately.
  • Push a background task (via Celery) to deactivate all URLs of that user.
  • The same pattern applies for reactivation (a sketch follows below).
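
A sketch of Idea 1 with Celery (the broker URL and table/column names are placeholders):

from celery import Celery
from sqlalchemy import create_engine, text

celery_app = Celery("shortener", broker="redis://localhost:6379/0")  # placeholder broker
engine = create_engine("postgresql+psycopg2://user:pass@localhost/shortener")  # placeholder DSN

@celery_app.task
def deactivate_urls(user_id: int) -> None:
    # Runs in a Celery worker, outside the request/response cycle.
    with engine.begin() as conn:
        conn.execute(text("UPDATE urls SET is_active = FALSE WHERE user_id = :uid"),
                     {"uid": user_id})

def deactivate_user(user_id: int) -> None:
    # The user row is updated immediately; the URL fan-out happens in the background.
    with engine.begin() as conn:
        conn.execute(text("UPDATE users SET status = 'inactive' WHERE id = :uid"),
                     {"uid": user_id})
    deactivate_urls.delay(user_id)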

This works, but introduces a race condition:

  • What if the user is reactivated before the first task runs?
  • Example: user status changes from active → deactivated → active within a short time.

Now the queued deactivate task might still run and disable the URLs incorrectly. After that, the queued reactivate task will also run and re-enable them. So two operations are executed just to end up back in the original state.

I thought of using a versioning system:

  • Maintain a version number for each user status change.
  • Each task carries the version it was created for.
  • Before updating URLs, the task checks if the current user version still matches.
  • If not, it skips the update.

# Deactivation worker (pseudocode made concrete; `url` and `user` are the ORM rows
# for one URL and its owner, `task` is the queued payload carrying the version)
def apply_deactivation(url, user, task):
    if url.last_user_version >= task.version or user.status_version > task.version:
        return  # a newer status change exists; skip this outdated task
    url.is_active = False
    url.last_user_version = task.version

# Reactivation worker (same guard, opposite action)
def apply_reactivation(url, user, task):
    if url.last_user_version >= task.version or user.status_version > task.version:
        return  # skip outdated task
    url.is_active = True
    url.last_user_version = task.version

This prevents outdated tasks from applying invalid updates. If there are many users, will the task queue be overloaded?

Idea 2: Scheduled Task Approach:

Instead of per-user tasks, I can have a periodic scheduled task that:

  • Scans a small user_status table (user_id, status, version, updated_at), where every status change is recorded per user.
  • Compares the current version with the last applied version.
  • Applies URL updates only where needed.

If there are no changes, the task still runs but finds nothing to apply.
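
A rough sketch of that scheduled task (the user_status columns, including a last_applied_version marker, are my own placeholders based on the description above):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/shortener")  # placeholder DSN

def sync_url_status() -> None:
    # Periodic task: apply any user status change not yet reflected on that user's URLs.
    with engine.begin() as conn:
        pending = conn.execute(text(
            "SELECT user_id, status, version FROM user_status "
            "WHERE version > last_applied_version"
        )).all()
        for row in pending:
            conn.execute(text("UPDATE urls SET is_active = :active WHERE user_id = :uid"),
                         {"active": row.status == "active", "uid": row.user_id})
            conn.execute(text("UPDATE user_status SET last_applied_version = :ver WHERE user_id = :uid"),
                         {"ver": row.version, "uid": row.user_id})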

As a variation on Idea 1, I thought of using Redis again as an intermediate buffer:

  • Store the user_id in a list only if it is not present.
    • Suppose user deactivates, we add the user_id to buffer.
    • If user reactivates, we check the buffer and remove the user_id.
  • We avoid per-user tasks here; instead, a periodic task scans the buffer (see the sketch below).
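
A sketch of that buffer, using a Redis set to get the "only if not present" behaviour for free (key name and schedule are placeholders):

import redis
from sqlalchemy import create_engine, text

r = redis.Redis()
engine = create_engine("postgresql+psycopg2://user:pass@localhost/shortener")  # placeholder DSN

PENDING_KEY = "users:pending_deactivation"  # a Redis set, so each user_id is stored at most once

def on_user_deactivated(user_id: int) -> None:
    r.sadd(PENDING_KEY, user_id)

def on_user_reactivated(user_id: int) -> None:
    # If the deactivation was never applied, just drop it from the buffer.
    # (Does not handle re-enabling URLs if the deactivation was already flushed.)
    r.srem(PENDING_KEY, user_id)

def apply_pending_deactivations() -> None:
    # Periodic task: deactivate URLs for every user still sitting in the buffer.
    while True:
        member = r.spop(PENDING_KEY)
        if member is None:
            break
        with engine.begin() as conn:
            conn.execute(text("UPDATE urls SET is_active = FALSE WHERE user_id = :uid"),
                         {"uid": int(member)})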

Questions:

  • Is this Redis buffer + periodic sync approach suitable for this kind of workflow?
  • How can I avoid long delays between user deactivation and URL deactivation (so that redirection stops quickly)?

Is there a simpler or better design to handle these at small-to-medium scale?

Comments:

  • You make a lot of statements about performance. Do you have the profiling data to confirm these? Or are they just unverified assumptions? If you do have profiling data, that would be useful to include. Commented Oct 15 at 9:04
  • You are drastically overcomplicating things without evidence that you need this complexity. My rule of thumb: don't worry too much about per-request performance until you hit something like 100 requests per second. It is normal for databases to handle more than that, especially if most queries are relatively cheap. If you ever hit performance problems here, you might find that Postgres isn't the best database for recording clicks, and might switch to some NoSQL solution that's optimized for such append-only event streams. Introducing caches is likely to make things slower and more fragile. Commented Oct 15 at 10:38

2 Answers


But with thousands or millions of clicks, this becomes a bottleneck

Well, have you tested it to be able to assert that?

You can't just guess bottlenecks in software development—chances are, you'll get it all wrong most of the time.

When it comes to optimizing existing code, you start by profiling it. A profiler will show you where the bottleneck is. Then you focus your optimization efforts on that part.

When it comes to code that doesn't yet exist, you start by writing it and testing it at a given scale. If your business requirement is to support, say, two million clicks per second, do an actual test where two million clicks are simulated. Hint: you'll likely see that it's not the database that's your problem. ;)

as frequent DB writes are expensive.

How expensive? Compared to what? Here again, you're just making hypotheses, but formulating them as assertions. This is not a correct approach.

I started thinking about using background tasks and eventual consistency.

The previous hypotheses led you to start thinking about solutions to the wrong problem. This would lead you to an unnecessarily complex solution. Essentially, you're violating the KISS and YAGNI principles here.

Here's one way to approach such projects instead.

Start by making clear business requirements, and think about the simplest system you can get to fulfill those requirements, without thinking yet about the scalability.

Basically, if the need is to have a trace of what URLs were clicked the most, you don't need a single line of code for that. Set up Nginx and use one of the popular tools that can process the HTTP logs and extract the information you need.

Once you have a system that (1) fulfills the requirements, and (2) is as simple as possible, you can get back to the requirements and start thinking about scalability. How many URLs are there? How many clicks are expected per second? What do we want to do with the records of those clicks?

Then start testing. Maybe your solution already works for the case where eight billion people simultaneously access your URL shortener service. Or maybe at some point you'll see that things start to get ugly, such as running out of sockets. Once you have identified the issue, you have, in general, three ways to cope with it:

  1. Vertical scalability. Say, if something takes too much CPU time, you get a more powerful CPU. If something uses all the 128 GB of memory on your server, you buy the additional 128 GBs. Naturally, it has its limits, as at some point, your hardware would start costing a lot.

  2. Horizontal scalability. Instead of buying a more powerful server, you just buy an inexpensive second one to share the load. Compared to the first approach, this one has the benefit of being linearly scalable. If your application is hosted on fifty servers, each costing you N USD per month, and you want twice the capacity, you simply scale up to one hundred servers, now paying twice as much per month.

    There are a bunch of techniques used to ensure a given system can be horizontally scaled. One of them is sharding: based on some key, the load goes to a given machine. For a URL shortener, for instance, one may route the URLs starting with certain letters to the first server and the rest to the second server (a toy example follows after this list).

  3. System optimization. If, for some reason, it appears that it is less expensive to spend time optimizing, rather than paying for more hardware, then you go back to the actual system, and see, with the help of profiling and stress testing, how to fix the issue while continuing to use the same hardware.
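
To illustrate the sharding idea from point 2, a toy example (the hostnames are made up):

# Map each slug deterministically to a server by its first character.
SHARDS = ["https://node-a.example", "https://node-b.example"]  # made-up hostnames

def shard_for(slug: str) -> str:
    return SHARDS[ord(slug[0]) % len(SHARDS)]

# shard_for("abc123") and shard_for("bcd456") may land on different servers,
# but the same slug always resolves to the same shard.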


Is this Redis buffer + periodic sync approach suitable for this kind of workflow?

Yes, although I would try to avoid the extra Redis dependency if possible. You could do the same with a local in-memory cache.

Also, consider whether it's worth storing the timestamp of every click. Could you sum up all the clicks for a URL per second instead? I.e. if you get 899 clicks at 12:00:01, then instead of writing 899 rows (12:00:01.001, 12:00:01.002, etc.) you write a single row saying 899 clicks at 12:00:01.
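
A sketch of that aggregation with a plain in-process counter (the table name and flush schedule are made up; as written this is not safe across multiple worker processes):

from collections import Counter
from datetime import datetime, timezone

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/shortener")  # placeholder DSN
counts: Counter = Counter()  # (url_id, second-resolution timestamp) -> click count

def record_click(url_id: int) -> None:
    ts = datetime.now(timezone.utc).replace(microsecond=0)
    counts[(url_id, ts)] += 1

def flush_counts() -> None:
    # Periodic task: one row per URL per second instead of one row per click.
    global counts
    batch, counts = counts, Counter()
    rows = [{"url_id": u, "bucket": t, "clicks": n} for (u, t), n in batch.items()]
    if rows:
        with engine.begin() as conn:
            conn.execute(
                text("INSERT INTO click_counts (url_id, bucket, clicks) VALUES (:url_id, :bucket, :clicks)"),
                rows,
            )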

Also consider whether SQL is the best technology for time-series data. You might be better off writing clicks to a log and ingesting it into Elasticsearch or the reporting stack of your choice. Let the third-party software handle the buffering.

How can I avoid long delays between user deactivation and URL deactivation (so that redirection stops quickly)?

Update the user status and add a WHERE clause (user.status = 'active') when you select the URL, rather than updating hundreds of URL rows.
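
In other words, push the check into the redirect query itself (a sketch, with made-up table and column names):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/shortener")  # placeholder DSN

def resolve_slug(slug: str):
    # Returns the long URL only if the owning user is still active; no bulk update needed.
    with engine.begin() as conn:
        return conn.execute(text(
            "SELECT u.long_url FROM urls u "
            "JOIN users o ON o.id = u.user_id "
            "WHERE u.slug = :slug AND o.status = 'active'"
        ), {"slug": slug}).scalar_one_or_none()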

As others have pointed out, it's odd that you are saying the DB is slow. Are you using keys and indexing correctly?
