Output all duplicate rows (SQL Server)

Question

I have a table that holds what I consider duplicate rows. The values in these records may not be exactly the same, but fuzzy logic has flagged them as possible duplicates. For example:

RecordCD    key_in  key_out
---------------------------
1           1       2
2           2       2
3           3       3
4           4       6
5           5       5
6           6       6
7           7       7
8           8       11
9           9       9
10          10      10
11          11      11

The key_in column holds the unique ID of the record.

The key_out column points to a possible duplicate when it is not equal to key_in.

I need my output to look like this and list all of the possible duplicates:

RecordCD    key_in  key_out
---------------------------
1           1       2
2           2       2
4           4       6
6           6       6
8           8       11
11          11      11

I’m struggling to construct a query that would do that.

Thanks.

Gordon Linoff · Accepted Answer · 2018-11-26 21:01:19Z

2

I think this is what you want:

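-- Keep a row when another row with a different key_in shares its key_out,
-- i.e. the row belongs to a group of possible duplicates.
select t.*
from t
where exists (select 1
              from t t2
              where t2.key_out = t.key_out and t2.key_in <> t.key_in
             )
order by t.key_out;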

Here is a db<>fiddle.
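
For readers without access to the fiddle, here is a minimal, self-contained sketch of the same query run against the sample data from the question. The column types and the primary key are assumptions, since the question does not show the table definition, and the second sort column follows the suggestion made in the comments on this answer.

```sql
-- Assumed definition: the question shows only the data, not the schema.
CREATE TABLE t (
    RecordCD int NOT NULL,
    key_in   int NOT NULL PRIMARY KEY,  -- unique ID of the record
    key_out  int NOT NULL               -- key of the possible duplicate
);

-- Sample rows copied from the question.
INSERT INTO t (RecordCD, key_in, key_out) VALUES
    (1, 1, 2), (2, 2, 2), (3, 3, 3), (4, 4, 6), (5, 5, 5), (6, 6, 6),
    (7, 7, 7), (8, 8, 11), (9, 9, 9), (10, 10, 10), (11, 11, 11);

-- A row qualifies when some other row shares its key_out.
SELECT t.*
FROM t
WHERE EXISTS (SELECT 1
              FROM t AS t2
              WHERE t2.key_out = t.key_out
                AND t2.key_in <> t.key_in)
ORDER BY t.key_out, t.key_in;  -- keeps each group of possible duplicates together
```

On this sample data the query returns the rows with key_in 1, 2, 4, 6, 8 and 11, in that order, which is the output requested in the question.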

edited Nov 26, 2018 at 21:01

answered Nov 26, 2018 at 20:38


3 Comments

Tony Over a year ago

Hi @Gordon, looks like this might work. Is it possible to output the rows grouping them by possible duplicates like I have them listed in my question? The way you have it, it would just randomly order the output. For example, I might have 3 possible duplicates and 1st one would be in row 10, 2nd one in row 520 and the 3rd one in row 20,145. I want them to be output in the consecutive rows (like rows 10, 11, and 12). Like I said, they're not exact duplicates so it's hard to order them by any specific field (fuzzy logic compares about 5 fields). Thanks!

xQbert Over a year ago

@tony order by t.key_out, t.key_in

Tony Over a year ago

@xQbert Works! Thank you!

EoinS · Accepted Answer · 2018-11-26 20:50:46Z

1

It seems that when key_in and key_out don't match, you want to pull all rows where key_in has either value.

I would create a temp table with all values from the rows with mismatched key_in and key_out, and call this value bad_match (the query below uses a subquery instead).

If either the key_in or the key_out value of a row matches this value, include the row in the output.

select mytable.*
from mytable
where key_in in (
    -- bad_match: every key that appears in a row where key_in <> key_out
    select key_in as bad_match from mytable where key_in <> key_out
    union all
    select key_out from mytable where key_in <> key_out
);

This sample builds your schema and returns the desired output.
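
The temp-table variant described above could look roughly like the following. It is only a sketch under a few assumptions: the table definition is guessed from the sample data, #bad_match is a hypothetical temp table named after the alias in the query, UNION replaces UNION ALL so each key is stored once, and the ORDER BY is borrowed from the comments on the other answer.

```sql
-- Assumed definition: the question shows only the data, not the schema.
CREATE TABLE mytable (
    RecordCD int NOT NULL,
    key_in   int NOT NULL,
    key_out  int NOT NULL
);

-- Sample rows copied from the question.
INSERT INTO mytable (RecordCD, key_in, key_out) VALUES
    (1, 1, 2), (2, 2, 2), (3, 3, 3), (4, 4, 6), (5, 5, 5), (6, 6, 6),
    (7, 7, 7), (8, 8, 11), (9, 9, 9), (10, 10, 10), (11, 11, 11);

-- Collect both keys of every mismatched row into a temp table.
-- In T-SQL the INTO clause belongs to the first SELECT of the UNION.
SELECT key_in AS bad_match
INTO #bad_match
FROM mytable
WHERE key_in <> key_out
UNION
SELECT key_out
FROM mytable
WHERE key_in <> key_out;

-- Return every row whose key_in is one of the collected keys.
SELECT m.*
FROM mytable AS m
WHERE m.key_in IN (SELECT bad_match FROM #bad_match)
ORDER BY m.key_out, m.key_in;

DROP TABLE #bad_match;
```

Like the single-statement query, this relies on each key_out value also existing as the key_in of some row, which holds for the sample data in the question.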

edited Nov 26, 2018 at 20:50

answered Nov 26, 2018 at 20:36


2 Comments

Tony Over a year ago

Hi @EoinS, yes your query works well, but I'm facing the same exact issue as I did with Gordon's answer above. Both of your queries return the same # of records which is good, but the resultset is not ordered/grouped the way I want it (group possible duplicates together in consecutive rows). The reason it actually groups them properly in your example is because I listed them in the proper order when in fact they're randomly positioned in the table. Thank you!

Tony Over a year ago

xQbert above was kind enough to sort (pun intended) this out for me. I appreciate your help.
