How to remove duplicate rows in SQL Server?

Question

Environment:

OS: Windows Server 2012 DataCenter
DBMS: SQL Server 2012
Hardware (VPS): Xeon E5530 4 cores + 4GB RAM

Question:

I have a large table with 140 million rows. Some rows are considered duplicates, and I want to remove them. For example:

id   name   value   timestamp
---------------------------------------
001  dummy1 10      2015-7-27 10:00:00
002  dummy1 10      2015-7-27 10:00:00    <-- duplicate
003  dummy1 20      2015-7-27 10:00:00

The second row is considered a duplicate of the first because it has the same name, value and timestamp, even though its id differs.

Note: the first two rows are duplicates not because all of their columns are identical, but because of a self-defined rule.

I tried to remove the duplicates by using a window function:

select 
    id, name, value, timestamp
from
   (select 
        id, name, value, timestamp,
        DATEDIFF(SECOND, lag(timestamp, 1) over (partition by name order by timestamp),
        timestamp) [TimeDiff]
    from table) tab

But after an hour of execution, the lock resources were exhausted and an error was raised:

Msg 1204, Level 19, State 4, Line 2
The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.

How could I remove such duplicate rows in an efficient way?

Do you want to delete the duplicates or select all non-duplicates?
– ForguesR, Commented Jul 27, 2015 at 16:55
@ForguesR Sorry about the ambiguity. I want to select all non-duplicates for other queries and leave the original table intact.
– Zelong, Commented Jul 27, 2015 at 17:08
Do you have a covering index with leading columns name, timestamp? What isolation level is the query running at?
– Martin Smith, Commented Jul 27, 2015 at 17:25
This is going to be slow unless you have an index on the columns used by your self-defined rule. Also, if this is a run-once query, it might be better to put all the non-duplicates into a new temporary table instead of working with such a huge result set.
– ForguesR, Commented Jul 27, 2015 at 17:56
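
As a rough illustration of the index suggested in the comments, the sketch below builds a nonclustered index on the three columns that define a duplicate. The table name SomeTable and the index name are placeholders, not names from the question, and building an index on 140 million rows will itself take noticeable time and space.

```sql
-- Sketch only. Assumptions: the table is called SomeTable, and
-- IX_SomeTable_NameValueTimestamp is a hypothetical index name.
-- The key columns follow the self-defined duplicate rule (name, value, timestamp).
CREATE NONCLUSTERED INDEX IX_SomeTable_NameValueTimestamp
    ON SomeTable ([name], [value], [timestamp])
    INCLUDE (id);  -- keeps the index covering even if id is not the clustering key
```

With such an index in place, a query that partitions or groups by these columns can read the rows in index order instead of sorting the whole table.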

Sean Lange · Accepted Answer · 2015-07-27 16:29:53Z

4

What about using a CTE? Something like this:

with DeDupe as
(
    select id
        , [name]
        , [value]
        , [timestamp]
        , ROW_NUMBER() over (partition by [name], [value], [timestamp] order by id) as RowNum
    from SomeTable
)

Delete DeDupe
where RowNum > 1;
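
Since the comments on the question clarify that the goal is to select the non-duplicates and leave the original table intact, the same CTE can be read from instead of deleted from. This is a minimal sketch, assuming the placeholder table SomeTable from above. The temp table name #NonDuplicates is hypothetical.

```sql
-- Sketch only: keeps the first row (lowest id) of each duplicate group
-- and copies it into a hypothetical temp table, leaving SomeTable untouched.
with DeDupe as
(
    select id
        , [name]
        , [value]
        , [timestamp]
        , ROW_NUMBER() over (partition by [name], [value], [timestamp] order by id) as RowNum
    from SomeTable
)
select id, [name], [value], [timestamp]
into #NonDuplicates
from DeDupe
where RowNum = 1;
```

Later queries can then read from #NonDuplicates rather than recomputing the window function over the full table each time.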


DhruvJoshi · Accepted Answer · 2015-07-27 16:47:13Z

1

If you only need to select the non-duplicate rows from the table, consider using this query:

SELECT MIN(id), name, value, timestamp FROM table GROUP BY name, value, timestamp

If you need to delete duplicate rows:

DELETE FROM table  WHERE id NOT IN ( SELECT MIN(id) FROM table GROUP BY name, value, timestamp)

or

DELETE t FROM table t INNER JOIN 
table t2  ON
t.name=t2.name AND 
t.value=t2.value AND 
t.timestamp=t2.timestamp AND 
t2.id<t.id
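
To see what the MIN(id) approach returns, here is a small worked example on the three sample rows from the question. The temp table #Sample and the column types are assumptions made for illustration. The question does not state the real schema.

```sql
-- Hypothetical temp table holding the three sample rows from the question.
-- Column types are assumed, not taken from the real table.
CREATE TABLE #Sample
(
    id          char(3)     NOT NULL,
    [name]      varchar(20) NOT NULL,
    [value]     int         NOT NULL,
    [timestamp] datetime    NOT NULL
);

INSERT INTO #Sample (id, [name], [value], [timestamp])
VALUES ('001', 'dummy1', 10, '2015-07-27T10:00:00'),
       ('002', 'dummy1', 10, '2015-07-27T10:00:00'),
       ('003', 'dummy1', 20, '2015-07-27T10:00:00');

-- One row per (name, value, timestamp) group, keeping the lowest id.
SELECT MIN(id) AS id, [name], [value], [timestamp]
FROM #Sample
GROUP BY [name], [value], [timestamp];
-- Returns two rows: ids 001 and 003. Row 002 is dropped as the duplicate.

DROP TABLE #Sample;
```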


APH · Accepted Answer · 2015-07-27 16:38:47Z

1

Try something like this - determine the lowest ID for each set of values, then delete rows that have an ID other than the lowest one.

Select Name, Value, TimeStamp, min(ID) as LowestID
into #temp1
From MyTable
group by Name, Value, TimeStamp

Delete a
from MyTable a
inner join #temp1 b
on a.Name = b.Name 
  and a.Value = b.Value 
  and a.Timestamp = b.timestamp 
  and a.ID <> b.LowestID
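
On a table of this size, a single large delete can hold a very large number of locks, which may be related to the Msg 1204 error in the question. One possible variation is to delete in small batches. The sketch below assumes #temp1 has already been populated as above. The variable @BatchSize and its value are hypothetical and would need tuning.

```sql
-- Sketch only: deletes duplicates in small batches so that each statement
-- takes a limited number of locks. @BatchSize is a hypothetical tuning value.
DECLARE @BatchSize int = 4000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize) a
    FROM MyTable a
    INNER JOIN #temp1 b
        ON a.Name = b.Name
       AND a.Value = b.Value
       AND a.Timestamp = b.Timestamp
       AND a.ID <> b.LowestID;

    -- Stop once a pass finds nothing left to delete.
    IF @@ROWCOUNT = 0 BREAK;
END
```

Outside an explicit transaction, each batch commits on its own, so an interrupted run can simply be restarted and will continue with the remaining duplicates.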
