Simplifying, I have the following data:

Col1 Col2
A X
A Y
A Z
B X
B Y
B Z
C Z

I need to receive the following result:

Col1 Col2
A X
B Y
C Z

In other words: For each value in the left column, I need to assign the minimum UNUSED value from the right column (no duplicates). This was easy to do iteratively, i.e., with cursors. However, I would like something that's a thousand times faster.
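For reference, the row-by-row rule can be sketched in a few lines of Python. This is only an illustration of the slow iterative baseline, not the set-based solution I am looking for. It assumes the rows are visited in (Col1, Col2) order and that each left value receives at most one right value. The function name assign_min_unused is made up for this sketch.

```python
def assign_min_unused(rows):
    """Greedy pass: each left value gets the smallest right value not yet used.

    Assumption for this sketch: rows are (left, right) tuples and are
    processed in sorted (left, right) order, like ORDER BY Col1, Col2.
    """
    used = set()      # right values that are already taken
    assigned = {}     # left value -> chosen right value
    for left, right in sorted(rows):
        if left in assigned or right in used:
            continue  # left already served, or right already taken
        assigned[left] = right
        used.add(right)
    return assigned


# Sample data from the question
sample = [
    ("A", "X"), ("A", "Y"), ("A", "Z"),
    ("B", "X"), ("B", "Y"), ("B", "Z"),
    ("C", "Z"),
]
print(assign_min_unused(sample))
```

On the sample data this pairs A with X, B with Y and C with Z. A left value whose candidates are all taken is simply left unassigned.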

What I've tried

Unsurprisingly, select L, min(R) (where L is the left column and R the right column) gives the wrong result. I've tried partitioning over several window functions, but I can't find the right combination. I always get the following incorrect result:

Col1 Col2
A X
B X
C Z

I've loaded some of the data into https://dbfiddle.uk/6HbpdlYd.

Here are 142 rows, created from 139 distinct L values, and 139 distinct R values.

Since the input data is produced by a join, there is always exactly one correct solution.

  • What if you don't have row (B,Y), but have row (C,Y)? Will the result be (A,X),(B,Z),(C,Y)? Commented Aug 22 at 16:00
  • And what if there is only one row for C, (C,X)? Should it "look backward" and change the min Col2 you already picked for A? Commented Aug 22 at 17:25
  • So, (A,X),(B,Y),(C,Z) is a possible solution, but (A,Y),(B,X),(C,Z) would be just as good, and you don't care which of the two solutions you get, correct? In your case there are three distinct Col1 values and three distinct Col2 values, and we happen to be able to find solutions where we get three pairs. But what if this is not possible, because there are more distinct values in Col1, or more distinct values in Col2, or there is simply no solution that gets exactly three pairs from the 3 + 3 values? Please tell us the exact and comprehensive rule set needed. Commented Aug 22 at 17:51
  • I should specify that there is always exactly one correct solution. This solution can be achieved every time with a rudimentary iterative approach [ORDER BY L,R; iterate row-by-row: (if R not on blacklist then accept row and add R to blacklist, else discard row)]. However, I am desperate to get this working via set-based operations. Commented Aug 22 at 18:51
  • This doesn't seem possible then. You say there is only one possible solution, because you iterate through the rows in a particular order and reach (A,X),(B,Y),(C,Z) in your example. Although (A,Y),(B,X),(C,Z) also couples all values without duplicates, it is not a possible solution, because it represents another order. Well, set-based means that you are working with (unordered) sets, so this cannot work by definition. (You can still solve this with SQL, namely with a recursive query, because a recursive query is how SQL iterates through lists, but that would not be set-based anymore.) Commented Aug 22 at 20:37

1 Answer

I would use this query:

SELECT l.Col1, r.Col2
FROM (
    SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS rn
    FROM MyTable
    GROUP BY Col1
) l
JOIN (
    SELECT Col2, ROW_NUMBER() OVER (ORDER BY Col2) AS rn
    FROM MyTable
    GROUP BY Col2
) r
ON l.rn = r.rn
ORDER BY l.Col1;

Explanation:

  1. Each subquery builds a distinct, ordered list of values (Col1 or Col2) and numbers them with ROW_NUMBER().

  2. Joining on the row number pairs the first Col1 with the first Col2, the second with the second, etc.

  3. I would do it this way, rather than with the obvious alternatives, because:

    • ROW_NUMBER() OVER (PARTITION BY Col1 …) only picks one Col2 per Col1, not a global "zip".

    • Using DISTINCT and ROW_NUMBER() in the same query doesn’t deduplicate correctly, since ROW_NUMBER() is assigned before DISTINCT.

  4. If one list is longer, only as many rows as the shorter list appear; use LEFT JOIN if you want to keep all Col1s.
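To try the zip behaviour without a database server, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. SQLite is only a stand-in engine here (an assumption, it needs version 3.25 or later for window functions), and the helper name zip_pairs and the TEXT column types are made up for the illustration. The rows are the small example input used in this answer.

```python
import sqlite3

# Small example input; the table name MyTable follows the query above.
ROWS = [("A", "X"), ("A", "Y"), ("B", "Z"), ("C", "Y")]

ZIP_QUERY = """
SELECT l.Col1, r.Col2
FROM (
    SELECT Col1, ROW_NUMBER() OVER (ORDER BY Col1) AS rn
    FROM MyTable
    GROUP BY Col1
) l
JOIN (
    SELECT Col2, ROW_NUMBER() OVER (ORDER BY Col2) AS rn
    FROM MyTable
    GROUP BY Col2
) r
ON l.rn = r.rn
ORDER BY l.Col1;
"""


def zip_pairs(rows):
    """Load rows into a throwaway in-memory table and run the zip query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE MyTable (Col1 TEXT, Col2 TEXT)")
        conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
        return conn.execute(ZIP_QUERY).fetchall()
    finally:
        conn.close()


for col1, col2 in zip_pairs(ROWS):
    print(col1, col2)
```

Note that the pairing is purely positional: the query never checks whether a produced pair, such as (B, Y) here, exists as a row in MyTable.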

Tested with your sample data on this db<>fiddle

A small example:

Input:

Col1 | Col2
-----+-----
A    | X
A    | Y
B    | Z
C    | Y

Distinct lists after numbering:

Col1 list        Col2 list
---------        ---------
A | 1            X | 1
B | 2            Y | 2
C | 3            Z | 3

Final result:

Col1 | Col2
-----+-----
A    | X
B    | Y
C    | Z

9 Comments

What if there is no A,X in MyTable? dbfiddle.uk/zFNat5MD
You would need to ask the OP that. As I read the question, this doesn't matter and the result would still be correct. But I might be wrong.
Thank you. This does indeed work on the data I provided. However, it looks like the data I provided was not representative. I've updated the OP with an actual sample.
Does it work with your additional data? If not, are you able to adapt it to do so?
I can conceive of it working. Zipping feels like it could be the solution. But no luck thus far.
I don't get what you mean. You asked a clear question and this query produces the outcome you requested. Now you suddenly talk about more than a hundred further rows and about a wrong result, without showing the expected result. I think you had better undo the changes to your question and then create a new question. There you should clearly explain what you need, with the minimum sample input data and expected result required to understand the logic. I can't imagine more than 100 rows are required for that.
Yes, I asked the question wrong in the first place. Namely, the ABC XYZ was where I distorted things.
Maybe you want something like this: db<>fiddle. But as I said, the question is far too vague now to tell, and this would be better analyzed in another question.
This works, yes.
