Assign unique values in a set-based approach

Question

Simplifying, I have the following data:

Col1	Col2
A	X
A	Y
A	Z
B	X
B	Y
B	Z
C	Z

I need to receive the following result:

Col1	Col2
A	X
B	Y
C	Z

In other words: For each value in the left column, I need to assign the minimum UNUSED value from the right column (no duplicates). This was easy to do iteratively, i.e., with cursors. However, I would like something that's a thousand times faster.
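For reference, the iterative rule described above can be sketched as a loop that runs once per distinct Col1 value rather than once per row. This is only a sketch under assumptions: it assumes SQL Server (T-SQL), a table named MyTable with columns Col1 and Col2, and varchar(50) columns; the names @Result and @L are made up for illustration.

```sql
-- Assumption: SQL Server (T-SQL) and a table MyTable(Col1, Col2).
-- @Result and @L are hypothetical names used only in this sketch.
DECLARE @Result TABLE (
    Col1 varchar(50) NOT NULL PRIMARY KEY,  -- each left value is assigned at most once
    Col2 varchar(50) NOT NULL UNIQUE        -- each right value is used at most once
);
DECLARE @L varchar(50);

-- Start with the smallest left value.
SELECT @L = MIN(Col1) FROM MyTable;

WHILE @L IS NOT NULL
BEGIN
    -- Assign the smallest right value for @L that has not been used yet.
    -- The HAVING clause skips the insert when every candidate is already taken.
    INSERT INTO @Result (Col1, Col2)
    SELECT @L, MIN(t.Col2)
    FROM MyTable AS t
    WHERE t.Col1 = @L
      AND NOT EXISTS (SELECT 1 FROM @Result AS r WHERE r.Col2 = t.Col2)
    HAVING MIN(t.Col2) IS NOT NULL;

    -- Move on to the next left value; MIN() yields NULL when none is left.
    SELECT @L = MIN(Col1) FROM MyTable WHERE Col1 > @L;
END;

SELECT Col1, Col2 FROM @Result ORDER BY Col1;
```

Like the cursor version, this is greedy: it never revisits an earlier choice, so a Col1 value whose candidates are all taken simply gets no row. On the seven sample rows above it should return (A, X), (B, Y), (C, Z).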

What I've tried

Unsurprisingly, SELECT L, MIN(R) grouped by L gives the wrong result, because the same R value can be the minimum for more than one L. I've tried partitioning over several window functions, but I can't get the right combination. I always get the following incorrect result:

Col1	Col2
A	X
B	X
C	Z

I've loaded some of the data into https://dbfiddle.uk/6HbpdlYd.

It contains 142 rows, built from 139 distinct L values and 139 distinct R values.

Since the input data is produced by a join, there is always exactly one correct solution.

What if you don't have row (B,Y), but have row (C,Y)? Will the result be (A,X),(B,Z),(C,Y)? — ValNik
– ValNik, Commented Aug 22 at 16:00
And what if there is only one row for C, (C,X)? Should it "look backward" and change the min Col2 you already picked for A? — Aaron Bertrand
– Aaron Bertrand, Commented Aug 22 at 17:25
So, (A,X),(B,Y),(C,Z) is a possible solution, but (A,Y),(B,X),(C,Z) would be just as good, and you don't care which of the two solutions you get, correct? In your case there are three distinct Col1 values and three distinct Col2 values, and we happen to be able to find solutions where we get three pairs. But what if this is not possible, because there are more distinct values in Col1 or in Col2, or because there is simply no solution that gets exactly three pairs from the 3 + 3 values? Please tell us the exact and comprehensive rule set needed. — Thorsten Kettner
– Thorsten Kettner, Commented Aug 22 at 17:51
I should specify that there is always exactly one correct solution. This solution can be achieved every time with a rudimentary iterative approach [ORDER BY L,R; iterate row-by-row: (if R not on blacklist then accept row and add R to blacklist, else discard row) ]. However, I am desperate to get this working via set-based operations. — Hammy
– Hammy, Commented Aug 22 at 18:51
This doesn't seem possible then. You say there is only one possible solution, because you iterate through the rows in a particular order and reach (A,X),(B,Y),(C,Z) in your example. Although (A,Y),(B,X),(C,Z) also couples all values without duplicates, it is not a possible solution, because it represents another order. Well, set-based means that you are working with (unordered) sets, so this cannot work by definition. (You can still solve this with SQL, namely with a recursive query, because a recursive query is how SQL iterates through lists, but that would not be set-based anymore.) — Thorsten Kettner
– Thorsten Kettner, Commented Aug 22 at 20:37

Jonas Metzler · Accepted Answer · 2025-08-22 15:21:36Z

I would use this query:

SELECT l.Col1, r.Col2
FROM (
    SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS rn
    FROM MyTable
    GROUP BY Col1
) l
JOIN (
    SELECT Col2, ROW_NUMBER() OVER (ORDER BY Col2) AS rn
    FROM MyTable
    GROUP BY Col2
) r
ON l.rn = r.rn
ORDER BY l.Col1;

Explanation:

Each subquery builds a distinct, ordered list of values (Col1 or Col2) and numbers them with ROW_NUMBER().
Joining on the row number pairs the first Col1 with the first Col2, the second with the second, etc.
I would do it this way, rather than with the more obvious alternatives, because:
- ROW_NUMBER() OVER (PARTITION BY Col1 …) only picks one Col2 per Col1, not a global "zip".
- Using DISTINCT and ROW_NUMBER() in the same query doesn’t deduplicate correctly, since ROW_NUMBER() is assigned before DISTINCT.
If one list is longer, only as many rows as the shorter list appear; use LEFT JOIN if you want to keep all Col1s.
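The LEFT JOIN variant mentioned in the last point might look like the sketch below. It assumes the same MyTable(Col1, Col2) as the query above; any Col1 value without a positional partner comes back with NULL in Col2.

```sql
-- Sketch: same zip as above, but every distinct Col1 is kept.
SELECT l.Col1, r.Col2
FROM (
    -- Distinct Col1 values, numbered in ascending order
    SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS rn
    FROM MyTable
    GROUP BY Col1
) l
LEFT JOIN (
    -- Distinct Col2 values, numbered in ascending order
    SELECT Col2, ROW_NUMBER() OVER (ORDER BY Col2) AS rn
    FROM MyTable
    GROUP BY Col2
) r
ON l.rn = r.rn
ORDER BY l.Col1;
```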

Tested with your sample data on this db<>fiddle

A small example:

Input:

Col1 | Col2
-----+-----
A    | X
A    | Y
B    | Z
C    | Y

Distinct lists after numbering:

Col1 list        Col2 list
---------        ---------
A | 1            X | 1
B | 2            Y | 2
C | 3            Z | 3

Final result:

Col1 | Col2
-----+-----
A    | X
B    | Y
C    | Z
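To reproduce this small example, and to see which zipped pairs actually occur as rows in the input, something like the following could be used. It is a sketch under assumptions: the table is named MyTable, the varchar(10) column types are a guess, and the CTE name zipped is made up for illustration.

```sql
-- Assumed schema; the column types are a guess, not taken from the question.
CREATE TABLE MyTable (Col1 varchar(10), Col2 varchar(10));

INSERT INTO MyTable (Col1, Col2) VALUES
    ('A', 'X'),
    ('A', 'Y'),
    ('B', 'Z'),
    ('C', 'Y');

-- Zip the two distinct lists by position, then list the zipped pairs
-- that do not exist as rows in MyTable.
WITH zipped AS (
    SELECT l.Col1, r.Col2
    FROM (
        SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS rn
        FROM MyTable
        GROUP BY Col1
    ) l
    JOIN (
        SELECT Col2, ROW_NUMBER() OVER (ORDER BY Col2) AS rn
        FROM MyTable
        GROUP BY Col2
    ) r
    ON l.rn = r.rn
)
SELECT z.Col1, z.Col2
FROM zipped z
WHERE NOT EXISTS (
    SELECT 1
    FROM MyTable t
    WHERE t.Col1 = z.Col1
      AND t.Col2 = z.Col2
)
ORDER BY z.Col1;
```

For the four rows above, this check lists (B, Y) and (C, Z). The zip pairs values purely by position, so it only reflects the source rows when every positional pair also happens to exist in the data.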


9 Comments

Bart McEndree Aug 22 at 16:16

What if there is no A,X in MyTable? dbfiddle.uk/zFNat5MD

Jonas Metzler Aug 22 at 16:45

You would need to ask the OP about this. As I read the question, it doesn't matter and the result would still be correct. But I might be wrong.

Hammy Aug 22 at 18:48

Thank you. This does indeed work on the data I provided. However, it looks like the data I provided was not representative. I've updated the OP with an actual sample.

Dale K Aug 22 at 19:46

Does it work with your additional data? If not, are you able to adapt it to do so?

Hammy Aug 22 at 22:45

I can conceive of it working. Zipping feels like it could be the solution. But no luck thus far.

Jonas Metzler Aug 22 at 20:13

I don't get what you mean. You asked a clear question, and this query produces the outcome you requested. Now you suddenly talk about more than a hundred further rows and about a wrong result, without showing the expected result. I think you had better undo the changes to your question and then create a new one. There you should clearly explain what you need, with the minimum sample input data and expected result required to understand the logic. I can't imagine more than 100 rows are required for that.

Hammy Aug 22 at 22:21

Yes, I asked the question wrong in the first place. Namely, the ABC XYZ was where I distorted things.

Jonas Metzler Aug 22 at 20:22

Maybe you want something like this?: db<>fiddle. But as I said, the question is far too vague now to tell, and this would be better analyzed in another question.

Hammy Aug 22 at 22:23

This works, yes.
