DEV Community

levi
levi

Posted on • Edited on

How Joins work?

Joins are like set operations.

I know math can be intimidating, but this part isn’t that scary.

INNER JOIN : get rows common to both table, matching key( primary and foreign key)

id name
1 Alice
id customer_id product
10 1 Book
11 1 Laptop
SELECT c.name, o.product
FROM customers c
JOIN orders o ON c.id = o.customer_id;
Enter fullscreen mode Exit fullscreen mode

output :

name product
Alice Book
Alice Laptop

On clause specifies matching condition.

Intuition:

  1. Identify the table join: Custumors and Orders
  2. we compare each row of table 1 with table 2 and look for the matching condition
  3. for every matching pair database combines the table columns
  4. If it's an INNER JOIN, only matched pairs are included.
  5. If it's a LEFT JOIN, all rows from the left table are included; if no match is found in the right table, NULLs fill the right side.
  6. For other join types, the logic changes slightly (like including unmatched rows on either side).

For obvious reasons, databases don't compare every row with every other row blindly.

🔍 What Happens Under the Hood During a Join

  1. Cost Estimation – The database estimates the CPU, memory, and I/O cost of different join strategies.
  2. Join Reordering – It may reorder joins to start with the most selective table.
  3. Bloom Filters – In some systems, these help reduce unnecessary data scanning.

What are strategies Db uses for Joins?

  1. Nested Loop Join: for each row in outer table, the database searches for matching rows in inner table.
    Con: can be very slow on large datasets.

  2. Hash Joins: db creates hash table(usually smaller one) on the key used for join. and then we lookup other table key in hash table.

pro: No index required.
con: uses extra memory for hashtable

Drum roll ....

  1. Merge-join: A merge join scans two sorted tables in order, comparing rows based on the join key. When keys match, it returns the joined rows, moving pointers forward to continue merging efficiently. example:
sale_id store_id amount
1 100 200
2 101 300
3 102 250
id name
100 Store A
101 Store B
102 Store C
  • Start at the first row of sales (store_id = 100) and the first row of stores (id = 100).
  • Compare them:
  • 100 = 100 → MATCH → return this joined row: sale_id=1, store_id=100, amount=200, name=Store A
  • Move both pointers forward.
  • 101 = 101 → MATCH → return: sale_id=2, store_id=101, amount=300, name=Store B
  • Next: 102 = 102 → MATCH → return: sale_id=3, store_id=102, amount=250, name=Store C
  • Efficient, fast, and only one scan through each table.

✅ In Summary:

  • Joins combine tables by matching rows based on keys.
  • INNER JOIN returns only matching rows.
  • Different strategies (nested loop, hash, merge) help optimize performance behind the scenes.

Top comments (0)