How Joins work?

#explainlikeimfive #java #database #beginners

Joins are like set operations.

I know math can be intimidating, but this part isn’t that scary.

INNER JOIN : get rows common to both table, matching key( primary and foreign key)

id	name
1	Alice

id	customer_id	product
10	1	Book
11	1	Laptop

SELECT c.name, o.product
FROM customers c
JOIN orders o ON c.id = o.customer_id;

output :

name	product
Alice	Book
Alice	Laptop

On clause specifies matching condition.

Intuition:

Identify the table join: Custumors and Orders
we compare each row of table 1 with table 2 and look for the matching condition
for every matching pair database combines the table columns
If it's an INNER JOIN, only matched pairs are included.
If it's a LEFT JOIN, all rows from the left table are included; if no match is found in the right table, NULLs fill the right side.
For other join types, the logic changes slightly (like including unmatched rows on either side).

For obvious reasons, databases don't compare every row with every other row blindly.

Cost Estimation – The database estimates the CPU, memory, and I/O cost of different join strategies.
Join Reordering – It may reorder joins to start with the most selective table.
Bloom Filters – In some systems, these help reduce unnecessary data scanning.

Nested Loop Join: for each row in outer table, the database searches for matching rows in inner table.
Con: can be very slow on large datasets.
Hash Joins: db creates hash table(usually smaller one) on the key used for join. and then we lookup other table key in hash table.

pro: No index required.
con: uses extra memory for hashtable

Drum roll ....

Merge-join: A merge join scans two sorted tables in order, comparing rows based on the join key. When keys match, it returns the joined rows, moving pointers forward to continue merging efficiently. example:

Start at the first row of sales (store_id = 100) and the first row of stores (id = 100).
Compare them:
100 = 100 → MATCH → return this joined row: sale_id=1, store_id=100, amount=200, name=Store A
Move both pointers forward.
101 = 101 → MATCH → return: sale_id=2, store_id=101, amount=300, name=Store B
Next: 102 = 102 → MATCH → return: sale_id=3, store_id=102, amount=250, name=Store C
Efficient, fast, and only one scan through each table.

Joins combine tables by matching rows based on keys.
INNER JOIN returns only matching rows.
Different strategies (nested loop, hash, merge) help optimize performance behind the scenes.

DEV Community