SQL Server 2008: delete duplicate rows

Question

I have duplicate rows in my table. How can I delete them based on a single column's value?

E.g.

uniqueid, col2, col3 ...
1, john, simpson
2, sally, roberts
1, johnny, simpson

Delete any rows with a duplicate uniqueid, to get:

1, john, simpson
2, sally, roberts

Which would you keep? johnny or john?

– Hart CO, Commented Aug 15, 2013 at 15:37
I don't mind which I keep.

– Fearghal, Commented Aug 15, 2013 at 15:43

Hart CO · Accepted Answer · 2013-08-15 16:42:15Z

You can DELETE from a cte:

;WITH cte AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY uniqueid ORDER BY col2) AS RowRank
              FROM MyTable)  -- MyTable stands in for your table name
DELETE FROM cte
WHERE RowRank > 1;

The ROW_NUMBER() function assigns a number to each row. PARTITION BY restarts the numbering for each group, so in this case every distinct value of uniqueid starts at 1 and counts up from there. ORDER BY determines the order in which the numbers are assigned within each group. Since each uniqueid is numbered starting at 1, any record with a ROW_NUMBER() greater than 1 has a duplicate uniqueid.

To get an understanding of how the ROW_NUMBER() function works, just try it out:

SELECT *, ROW_NUMBER() OVER (PARTITION BY uniqueid ORDER BY col2) AS RowRank
FROM MyTable  -- MyTable stands in for your table name
ORDER BY uniqueid;
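To see the numbering on the question's sample rows, here is a minimal self-contained sketch. The table variable @people and its column types are assumptions made for illustration; they are not from the original post.

```sql
-- Hypothetical table variable holding the question's three sample rows.
DECLARE @people TABLE (
    uniqueid INT,
    col2     VARCHAR(20),
    col3     VARCHAR(20)
);

INSERT INTO @people (uniqueid, col2, col3) VALUES (1, 'john',   'simpson');
INSERT INTO @people (uniqueid, col2, col3) VALUES (2, 'sally',  'roberts');
INSERT INTO @people (uniqueid, col2, col3) VALUES (1, 'johnny', 'simpson');

-- Numbering restarts at 1 for each uniqueid; within a uniqueid,
-- rows are numbered in col2 order.
SELECT uniqueid, col2, col3,
       ROW_NUMBER() OVER (PARTITION BY uniqueid ORDER BY col2) AS RowRank
FROM @people
ORDER BY uniqueid, RowRank;
```

With this data, john should get RowRank 1 and johnny RowRank 2 (they share uniqueid 1), while sally gets 1, so only johnny would match RowRank > 1.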

You can change the PARTITION BY and ORDER BY clauses of ROW_NUMBER() to control which record is kept and which is removed.

For instance, if you'd like to do this in multiple steps, first deleting records that share a last name but have different first names, you could add the last name to the PARTITION BY:

;WITH cte AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY uniqueid, col3 ORDER BY col2) AS RowRank
              FROM MyTable)  -- MyTable stands in for your table name
DELETE FROM cte
WHERE RowRank > 1;

Can you explain what 'SELECT *,ROW_NUMBER() OVER(PARTITION BY ID, ORDER BY col2)'RowRank'FROM Table' does?
Sure thing, updated the answer to include a description of ROW_NUMBER()
By ID do you mean UniqueId (PARTITION BY ID)? Also, why ORDER BY col2? I don't care if col2 is duplicated; I want to remove duplicates of UniqueId, not caring which is left behind.
Yeah, ID = UniqueId in your case. The ORDER BY could just as well be ORDER BY (SELECT 1) to make it arbitrary. Again, the PARTITION BY defines the field that will be numbered from 1-n, the ORDER BY is required in the ROW_NUMBER() function, so in effect it determines which duplicate gets deleted and which doesn't.
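
The arbitrary-order variant mentioned in the last comment might look like the sketch below. The table variable @t and its column types are illustrative assumptions, and because the ordering is arbitrary, which duplicate survives is not guaranteed.

```sql
-- Hypothetical sample data based on the rows in the question.
DECLARE @t TABLE (uniqueid INT, col2 VARCHAR(20), col3 VARCHAR(20));

INSERT INTO @t (uniqueid, col2, col3) VALUES (1, 'john',   'simpson');
INSERT INTO @t (uniqueid, col2, col3) VALUES (2, 'sally',  'roberts');
INSERT INTO @t (uniqueid, col2, col3) VALUES (1, 'johnny', 'simpson');

-- ROW_NUMBER() requires an ORDER BY; (SELECT 1) satisfies the syntax
-- without imposing a meaningful order within each uniqueid.
;WITH cte AS (
    SELECT uniqueid,
           ROW_NUMBER() OVER (PARTITION BY uniqueid ORDER BY (SELECT 1)) AS RowRank
    FROM @t
)
DELETE FROM cte
WHERE RowRank > 1;

-- One row per uniqueid remains; for uniqueid 1 it may be john or johnny.
SELECT uniqueid, col2, col3
FROM @t;
```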

CowboyBebop · Accepted Answer · 2014-11-26 06:07:13Z

You probably have a row id that is assigned by the DB upon insertion and is actually unique. I'll call this rowId in my example.

rowId | uniqueid | col2   | col3
----- | -------- | ------ | -------
1     | 10       | john   | simpson
2     | 20       | sally  | roberts
3     | 10       | johnny | simpson

You can remove duplicates by grouping on whatever is supposed to be unique (one column or many), taking one rowId from each group, and deleting every row whose rowId is not among them. The inner query returns exactly one rowId per uniqueid, so the rowIds of the duplicate rows are missing from its result.

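SELECT *   -- run as a SELECT first to preview the rows that would be removed
--DELETE   -- then swap SELECT * for DELETE
FROM MyTable
WHERE rowId NOT IN
(SELECT MIN(rowId)
 FROM MyTable
 GROUP BY uniqueid);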

You could also use MAX instead of MIN with similar results; that keeps the highest rowId in each group instead of the lowest.
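
As a self-contained sketch of this approach, the example below assumes rowId is an IDENTITY column. The table variable @MyTable and the column types are illustrative assumptions rather than part of the original answer.

```sql
-- Hypothetical table variable; rowId is generated by the database on insert.
DECLARE @MyTable TABLE (
    rowId    INT IDENTITY(1,1) PRIMARY KEY,
    uniqueid INT,
    col2     VARCHAR(20),
    col3     VARCHAR(20)
);

INSERT INTO @MyTable (uniqueid, col2, col3) VALUES (10, 'john',   'simpson');
INSERT INTO @MyTable (uniqueid, col2, col3) VALUES (20, 'sally',  'roberts');
INSERT INTO @MyTable (uniqueid, col2, col3) VALUES (10, 'johnny', 'simpson');

-- Keep the lowest rowId in each uniqueid group; delete every other row.
DELETE FROM @MyTable
WHERE rowId NOT IN (SELECT MIN(rowId)
                    FROM @MyTable
                    GROUP BY uniqueid);

-- Show what is left after the delete.
SELECT rowId, uniqueid, col2, col3
FROM @MyTable
ORDER BY rowId;
```

With MIN, the john and sally rows remain; switching to MAX would keep johnny instead of john.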

user123 · Accepted Answer · 2013-08-15 15:48:01Z

DECLARE @du TABLE (
    id INT,  
    Name VARCHAR(4)
)

INSERT INTO @du VALUES(1,'john')
INSERT INTO @du VALUES(2,'jane')
INSERT INTO @du VALUES(1,'john')

;WITH dup (id,dp)
AS
(SELECT id
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY Name) AS dp
FROM @du)
DELETE FROM dup
WHERE dp > 1

SELECT *
FROM @du


DhruvJoshi · Accepted Answer · 2016-07-08 11:17:50Z

Here is a simple trick for removing duplicates: UNION discards duplicate rows, so the combined result can be written to a new table.

select * into NewTable from ExistingTable
union
select * from ExistingTable;
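
Note that UNION compares entire rows, so this only collapses rows that are identical in every column. A small sketch of that behaviour, using a hypothetical table variable @Existing and temp table #NewTable (both names are assumptions for illustration):

```sql
-- Hypothetical sample data: two fully identical rows plus one row that
-- shares only the uniqueid.
DECLARE @Existing TABLE (uniqueid INT, col2 VARCHAR(20));

INSERT INTO @Existing (uniqueid, col2) VALUES (1, 'john');
INSERT INTO @Existing (uniqueid, col2) VALUES (1, 'john');
INSERT INTO @Existing (uniqueid, col2) VALUES (1, 'johnny');
INSERT INTO @Existing (uniqueid, col2) VALUES (2, 'sally');

-- UNION removes rows that match in every column while copying into #NewTable.
SELECT * INTO #NewTable FROM @Existing
UNION
SELECT * FROM @Existing;

-- The two identical john rows collapse into one, but johnny is still
-- present because its col2 value differs.
SELECT uniqueid, col2
FROM #NewTable
ORDER BY uniqueid, col2;

DROP TABLE #NewTable;
```

For the question's data, where the duplicates differ in col2, this approach would therefore leave both rows for uniqueid 1 in place.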

– Ata Ul Wadood Bhatti, answered Aug 17, 2014 at 14:07; edited by DhruvJoshi, Jul 8, 2016 at 11:17

waka · Accepted Answer · 2013-08-15 15:40:22Z

DELETE FROM MyTable WHERE uniqueid='1' AND col2='john'

Or change col2='john' to col2='johnny', depending on which record you want to delete.

How did you end up with two same "unique" IDs in the first place?


Vindicare Over a year ago

This only answers the very specific example that the OP gave. As it was only an example, they are presumably hoping for a more generic solution.

Fearghal Over a year ago

This is hard-coded for the example, no? I'm still trying to figure out how I ended up with the dupes, tbh.

Ganesh Kumar · Accepted Answer · 2015-12-24 18:25:32Z

There are several ways to delete the duplicate records; some of them are below.

Different ways to delete duplicate records

Using the ROW_NUMBER() function and a CTE

;WITH CTE (DuplicateCount) AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY UniqueId ORDER BY UniqueId) AS DuplicateCount
    FROM Table1
)
DELETE FROM CTE
WHERE DuplicateCount > 1;

Without using a CTE

DELETE DuplicateCount
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY UniqueId ORDER BY UniqueId) AS Dup
    FROM Table1
) DuplicateCount
WHERE DuplicateCount.Dup > 1;

Without using ROW_NUMBER() or a CTE

DELETE FROM Subject
WHERE RowId NOT IN (SELECT MIN(RowId)
                    FROM Subject
                    GROUP BY UniqueId);
