I have 2 columns A & B in a SQL Server table that contain a string aggregated list of codes.
The codes in the lists are separated by a semicolon.
The string aggregated lists can take any length and the codes can be in any order.
What I would like to do is to create a new column C which contains the codes that appear somewhere in the same row of the lists of Column A and Column B.
Example:
| Column_A | Column_B | Column_C |
|---|---|---|
| a;b;c;d;e | c;a;e;i;k | c;a;e |
| d;e;f;g | e;h;i;j;d | e;d |
The example above returns "c", "a" and "e" for the first row of column C, because these codes are present in the same row of both Column A and Column B.
The same for the second row, here "e" and "d" are overlapping in Column A and B and thus returned in Column C.
I have tried something that works, but it does not seem like the best solution in terms of efficiency and performance. Especially because I have many (1m+) rows to check this for and the length of code lists to compare can be very long.
SELECT
STRING_AGG(CAST([value] AS NVARCHAR(MAX)),'; ') AS Overlapping_Code
FROM
(SELECT a.value
FROM MyTable t
CROSS APPLY STRING_SPLIT(t.Column_A, ';') a
INTERSECT
SELECT b.value
FROM MyTable t
CROSS APPLY STRING_SPLIT(t.Column_B, ';') b
) ab
I am looking for a better solution in terms of performance and elegance to compare the string aggregated lists for two columns across (many) rows.
A, another with all the values forB(both with respective IDs) and then a 3rd with the IDs that are linked.