Environment:
- OS: Windows Server 2012 DataCenter
- DBMS: SQL Server 2012
- Hardware (VPS): Xeon E5530 4 cores + 4GB RAM
Question:
I have a large table with 140 million rows. Some rows are considered duplicates, and I want to remove them. For example:
id   name    value  timestamp
---------------------------------------
001  dummy1  10     2015-7-27 10:00:00
002  dummy1  10     2015-7-27 10:00:00  <-- duplicate
003  dummy1  20     2015-7-27 10:00:00
The second row is considered a duplicate because it has the same name, value and timestamp as the first row, even though its id differs.
Note: the first two rows are duplicates NOT because all of their columns are identical, but because of this self-defined rule.
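To make the rule concrete, here is a minimal sketch that marks duplicates with ROW_NUMBER(). The table variable @Sample and the alias rn are made up for illustration, and the sketch assumes the row with the lowest id in each (name, value, timestamp) group is the one to keep; the rule above does not say which copy should survive.

```sql
-- Hypothetical sample data mirroring the three rows shown above.
DECLARE @Sample TABLE (
    id          char(3)     NOT NULL,
    name        varchar(50) NOT NULL,
    value       int         NOT NULL,
    [timestamp] datetime    NOT NULL
);

INSERT INTO @Sample (id, name, value, [timestamp])
VALUES ('001', 'dummy1', 10, '2015-07-27T10:00:00'),
       ('002', 'dummy1', 10, '2015-07-27T10:00:00'),
       ('003', 'dummy1', 20, '2015-07-27T10:00:00');

-- Number the rows inside each (name, value, timestamp) group.
-- rn = 1 is the row to keep (assumed: lowest id); rn > 1 marks a duplicate.
SELECT id, name, value, [timestamp]
FROM (
    SELECT id, name, value, [timestamp],
           ROW_NUMBER() OVER (PARTITION BY name, value, [timestamp]
                              ORDER BY id) AS rn
    FROM @Sample
) AS numbered
WHERE rn = 1;
```

With the sample rows this keeps ids 001 and 003 and drops 002.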
I tried to remove these duplicates using a window function:
select
    id, name, value, timestamp
from
    (select
        id, name, value, timestamp,
        DATEDIFF(SECOND, lag(timestamp, 1) over (partition by name order by timestamp),
                 timestamp) [TimeDiff]
    from table) tab
But after an hour of execution, the lock resources were exhausted and this error was raised:
Msg 1204, Level 19, State 4, Line 2
The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.
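For context, error 1204 is about the number of locks the instance can track, so one thing that may be worth trying is asking for a single table-level lock instead of many row or page locks. The sketch below is only an assumption-laden illustration: #Readings stands in for the real table, #Deduplicated is a hypothetical target, and whether the TABLOCK hint actually avoids the error depends on the isolation level and the memory available to the instance.

```sql
-- Show the session's current settings, including the isolation level.
DBCC USEROPTIONS;

-- Hypothetical stand-in for the real 140-million-row table.
CREATE TABLE #Readings (
    id          char(3)     NOT NULL,
    name        varchar(50) NOT NULL,
    value       int         NOT NULL,
    [timestamp] datetime    NOT NULL
);

INSERT INTO #Readings (id, name, value, [timestamp])
VALUES ('001', 'dummy1', 10, '2015-07-27T10:00:00'),
       ('002', 'dummy1', 10, '2015-07-27T10:00:00'),
       ('003', 'dummy1', 20, '2015-07-27T10:00:00');

-- Copy one row per (name, value, timestamp) group into a separate table,
-- leaving the source untouched. WITH (TABLOCK) requests one table-level
-- lock on the source rather than many row or page locks.
SELECT id, name, value, [timestamp]
INTO #Deduplicated
FROM (
    SELECT id, name, value, [timestamp],
           ROW_NUMBER() OVER (PARTITION BY name, value, [timestamp]
                              ORDER BY id) AS rn
    FROM #Readings WITH (TABLOCK)
) AS numbered
WHERE rn = 1;

SELECT id, name, value, [timestamp]
FROM #Deduplicated
ORDER BY id;
```

A table-level shared lock blocks concurrent writers for the duration of the query, so this trades concurrency for a much smaller lock count.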
How could I remove such duplicate rows in an efficient way?
Comments:
- (...) delete duplicate or select all non-duplicate?
- select all non-duplicate for other queries and leave the original table intact.
- (...) name, timestamp? What isolation level is the query running at?