DEV Community

Jonathan

How to Handle Millions of Rows in SQL Without Killing Performance

When working with massive datasets, an inefficient SQL query can significantly degrade database performance. One common mistake? Using a giant IN (...) clause to filter a large set of values.

This post shows you why JOIN with a temporary table is a smarter, faster alternative and how to apply it in a real use case.

The Problem: IN with Too Many Values

Imagine you have a huge table transactions with more than 5 billion rows and you're trying to retrieve all the transactions for 10,000 specific users.

The first approach that comes to mind is something like this:

SELECT *
FROM transactions
WHERE user_id IN (101, 102, 103, ..., 10000);

It may look simple, but it can lead to serious performance issues.

  • The database may fall back to scanning the entire table
  • Very long IN lists are often not optimized well by the query planner
  • Performance degrades quickly as the list grows
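To make this concrete, here is a minimal Python sketch of the naive approach using the stdlib sqlite3 module. The in-memory database, the `amount` column, and the small ID range stand in for the real 5-billion-row `transactions` table; the point is how the query string grows one placeholder per ID (and note that many engines cap the number of bind parameters):

```python
import sqlite3

# In-memory demo database standing in for the real `transactions` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (user_id, amount) VALUES (?, ?)",
    [(uid, uid * 1.5) for uid in range(1, 501) for _ in range(3)],
)

wanted = list(range(101, 151))  # 50 user IDs here; imagine 10,000 instead

# The naive approach: one bind parameter per ID. The SQL text grows
# linearly with the list, and engines limit how many parameters you
# can bind (SQLite caps host parameters, for example).
placeholders = ", ".join("?" * len(wanted))
rows = conn.execute(
    f"SELECT * FROM transactions WHERE user_id IN ({placeholders})",
    wanted,
).fetchall()
```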

The Solution: JOIN with a Temp Table

Instead of passing a huge list, use a temporary table:

CREATE TEMP TABLE temp_user_ids (user_id BIGINT);

Keep in mind that temporary tables exist only for the duration of the session or transaction, depending on the database.

Next, insert your 10,000 user IDs:

INSERT INTO temp_user_ids (user_id) VALUES
(101), (102), (103), ..., (10000);
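In application code you would usually load the IDs with a batched, parameterized insert rather than hand-building a 10,000-entry VALUES literal. A minimal sqlite3 sketch (in SQLite, TEMP tables are scoped to the connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# TEMP tables in SQLite live only as long as this connection; other
# engines scope them to the session or transaction.
conn.execute("CREATE TEMP TABLE temp_user_ids (user_id INTEGER)")

user_ids = list(range(1, 10001))  # the 10,000 IDs from the example

# executemany runs one prepared statement over all rows instead of
# concatenating a giant VALUES list into the SQL text.
conn.executemany(
    "INSERT INTO temp_user_ids (user_id) VALUES (?)",
    [(uid,) for uid in user_ids],
)

count = conn.execute("SELECT COUNT(*) FROM temp_user_ids").fetchone()[0]
```

For very large ID sets, most databases also offer a bulk-load path (for example COPY in PostgreSQL) that is faster still.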

Now query:

SELECT t.*
FROM transactions t
JOIN temp_user_ids u ON t.user_id = u.user_id;

Why is this better?

  • The database can use an index on transactions.user_id.
  • It's easier to parallelize.
  • It avoids large memory buffers and complex query plans.
  • It's faster and more predictable.
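Putting the pieces together, here is a runnable sqlite3 sketch of the full temp-table flow, including an index on the lookup side, and a check that it returns exactly the same rows as the IN version (the table contents are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
# Index on the join column, so the planner can do index lookups.
conn.execute("CREATE INDEX idx_tx_user ON transactions(user_id)")
conn.executemany(
    "INSERT INTO transactions (user_id, amount) VALUES (?, ?)",
    [(uid, 10.0) for uid in range(1, 1001) for _ in range(2)],
)

wanted = list(range(101, 201))

# Temp table with a PRIMARY KEY (which indexes it), then a plain join.
conn.execute("CREATE TEMP TABLE temp_user_ids (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO temp_user_ids VALUES (?)", [(u,) for u in wanted])

join_rows = conn.execute(
    "SELECT t.* FROM transactions t "
    "JOIN temp_user_ids u ON t.user_id = u.user_id"
).fetchall()

# Same result set as the IN version, but expressed as a join the
# planner can optimize with indexes on both sides.
in_rows = conn.execute(
    f"SELECT * FROM transactions WHERE user_id IN ({', '.join('?' * len(wanted))})",
    wanted,
).fetchall()
```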

Conclusion

If your SQL query is slow and you're using a massive IN (...) list, switch to a JOIN with a temp table. It's faster, cleaner, and lets your database do what it does best: optimize joins using indexes.

With thousands of values, the database may end up checking each IN entry one by one and fail to use indexes properly. Storing the values in a temporary table (optionally with an index on it) lets the database pick an efficient join strategy instead, so the query runs faster, uses less memory, and scales better as the data grows.

Top comments (1)

Smit

Nice! This is an impressive way to solve a problem. I think using IN with many values is easy and less time consuming when the list is short. But as the number of values grows, this method is a much better and cleaner alternative. Thanks for sharing, I’ll definitely try this approach next time.