Joins are like set operations.
I know math can be intimidating, but this part isn’t that scary.
INNER JOIN : get rows common to both table, matching key( primary and foreign key)
id | name |
---|---|
1 | Alice |
id | customer_id | product |
---|---|---|
10 | 1 | Book |
11 | 1 | Laptop |
SELECT c.name, o.product
FROM customers c
JOIN orders o ON c.id = o.customer_id;
output :
name | product |
---|---|
Alice | Book |
Alice | Laptop |
On
clause specifies matching condition.
Intuition:
- Identify the table join: Custumors and Orders
- we compare each row of table 1 with table 2 and look for the matching condition
- for every matching pair database combines the table columns
- If it's an INNER JOIN, only matched pairs are included.
- If it's a LEFT JOIN, all rows from the left table are included; if no match is found in the right table, NULLs fill the right side.
- For other join types, the logic changes slightly (like including unmatched rows on either side).
For obvious reasons, databases don't compare every row with every other row blindly.
🔍 What Happens Under the Hood During a Join
- Cost Estimation – The database estimates the CPU, memory, and I/O cost of different join strategies.
- Join Reordering – It may reorder joins to start with the most selective table.
- Bloom Filters – In some systems, these help reduce unnecessary data scanning.
What are strategies Db uses for Joins?
Nested Loop Join: for each row in outer table, the database searches for matching rows in inner table.
Con: can be very slow on large datasets.Hash Joins: db creates hash table(usually smaller one) on the key used for join. and then we lookup other table key in hash table.
pro: No index required.
con: uses extra memory for hashtable
Drum roll ....
- Merge-join: A merge join scans two sorted tables in order, comparing rows based on the join key. When keys match, it returns the joined rows, moving pointers forward to continue merging efficiently. example:
sale_id | store_id | amount |
---|---|---|
1 | 100 | 200 |
2 | 101 | 300 |
3 | 102 | 250 |
id | name |
---|---|
100 | Store A |
101 | Store B |
102 | Store C |
- Start at the first row of sales (store_id = 100) and the first row of stores (id = 100).
- Compare them:
- 100 = 100 → MATCH → return this joined row: sale_id=1, store_id=100, amount=200, name=Store A
- Move both pointers forward.
- 101 = 101 → MATCH → return: sale_id=2, store_id=101, amount=300, name=Store B
- Next: 102 = 102 → MATCH → return: sale_id=3, store_id=102, amount=250, name=Store C
- Efficient, fast, and only one scan through each table.
✅ In Summary:
- Joins combine tables by matching rows based on keys.
- INNER JOIN returns only matching rows.
- Different strategies (nested loop, hash, merge) help optimize performance behind the scenes.
Top comments (0)