How do I remove duplicates from a table that is set up in the following way?

unique_ID | worker_ID | date | type_ID

A worker can have multiple type_IDs associated with them, and I want to remove any duplicate types. If there is a duplicate, I want to remove the row with the most recent date.

4 Answers

10

A textbook candidate for the window function row_number():

;WITH x AS (
    SELECT unique_ID
          ,row_number() OVER (PARTITION BY worker_ID,type_ID ORDER BY date) AS rn
    FROM   tbl
    )
DELETE FROM tbl
FROM   x
WHERE  tbl.unique_ID = x.unique_ID
AND    x.rn > 1

This also takes care of the situation where a set of dupes on (worker_ID,type_ID) shares the same date.
See the simplified demo on data.SE.
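
When several rows in a group share the same date, row_number() still numbers them, but which one gets rn = 1 is not guaranteed. A minimal sketch of a deterministic variant, assuming unique_ID is unique and that lower values were inserted earlier (that ordering is an assumption, not something stated in the question):

```sql
;WITH x AS (
    SELECT unique_ID
          -- unique_ID is added as a second sort key so ties on date
          -- are always resolved the same way (assumption: lower ID = older row)
          ,row_number() OVER (PARTITION BY worker_ID, type_ID
                              ORDER BY date, unique_ID) AS rn
    FROM   tbl
    )
DELETE FROM tbl
FROM   x
WHERE  tbl.unique_ID = x.unique_ID
AND    x.rn > 1
```

If the tie-break should follow some other rule, swap unique_ID for whatever column expresses it.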

Update with simpler version

It turns out this can be simplified: in SQL Server you can delete from the CTE directly:

;WITH x AS (
    SELECT unique_ID
          ,row_number() OVER (PARTITION BY worker_ID,type_ID ORDER BY date) AS rn
    FROM   tbl
    )
DELETE x
WHERE  rn > 1
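
Before running either DELETE, it may be worth previewing the rows it would remove. A sketch that reuses the same CTE and only swaps the DELETE for a SELECT (tbl is the placeholder table name used above):

```sql
;WITH x AS (
    SELECT unique_ID, worker_ID, type_ID, date
          ,row_number() OVER (PARTITION BY worker_ID, type_ID ORDER BY date) AS rn
    FROM   tbl
    )
-- Rows with rn > 1 are the ones the DELETE would remove;
-- the rn = 1 row of each (worker_ID, type_ID) group is kept.
SELECT unique_ID, worker_ID, type_ID, date
FROM   x
WHERE  rn > 1
ORDER  BY worker_ID, type_ID, date
```

If the result looks right, changing the final SELECT back to DELETE x WHERE rn > 1 should remove exactly those rows, provided the data has not changed in between.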

5 Comments

I must be doing something wrong because I get the same "The multi-part identifier (unique ID) could not be bound" error.
@Ryan: unique_ID != unique ID - I see a missing underscore.
I just put 'unique ID' in the comment to represent my table's specific unique ID.
@Ryan: have you had a look at the link I provided: data.stackexchange.com/stackoverflow/q/118656/demo-fro-ryan You can test and see that it works there. This is SQL Server 2008 R2. Are you running an older installation maybe?
Yes, and thank you. I came back and tried it again today and it worked. I'm not sure what I was doing yesterday.
2
-- SQL Server needs the alias named as the DELETE target; "tbl" stands in for the real table name
delete t
  from tbl t
 where exists ( select 1 from tbl t2
                 where t2.worker_id = t.worker_id
                   and t2.type_id = t.type_id
                   and t2.date < t.date )

HTH

4 Comments

I get the error "The multi-part identifier ... could not be bound". Every change I try to make results in either nothing being returned or more errors. When I turned this into a SELECT statement for testing I didn't get the error, only when attempting to delete.
@Ryan: this is probably due to a typo. should be from table t t2 instead of from table t2.
If I do that I get "Incorrect Syntax" errors on 'where' and 't2'.
This ended up working well, with the exception of not being able to detect duplicates that share the same date.
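
The same-date limitation mentioned in the last comment can probably be worked around by adding a tie-break to the EXISTS condition. A sketch, assuming unique_ID is unique and that the row with the lower unique_ID should be kept when dates are equal (tbl is a placeholder table name):

```sql
-- T-SQL form: the alias t is the DELETE target, the table is named in FROM
DELETE t
FROM   tbl AS t
WHERE  EXISTS (SELECT 1
               FROM   tbl AS t2
               WHERE  t2.worker_ID = t.worker_ID
               AND    t2.type_ID   = t.type_ID
               -- an older row exists, or a row with the same date and a lower ID
               AND   (t2.date < t.date
                  OR (t2.date = t.date AND t2.unique_ID < t.unique_ID)))
```

Each group keeps exactly one row this way: the earliest date, and among equal dates the lowest unique_ID.
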
2
DELETE FROM @t WHERE unique_Id IN 
(
    SELECT unique_Id FROM 
    (   
        SELECT  unique_Id
                ,Type_Id 
                ,ROW_NUMBER() OVER (PARTITION BY worker_Id, type_Id ORDER BY date) AS rn 
        FROM @t 
    ) Q 
    WHERE rn > 1
)

And to test...

DECLARE @t TABLE
(
    unique_ID  INT IDENTITY,
    worker_ID  INT,
    date  DATETIME,
    type_ID INT
)

INSERT INTO @t VALUES (1, DATEADD(DAY, 1, GETDATE()), 1)
INSERT INTO @t VALUES (1, GETDATE(), 1)
INSERT INTO @t VALUES (2, GETDATE(), 1)
INSERT INTO @t VALUES (1, DATEADD(DAY, 2, GETDATE()), 1)
INSERT INTO @t VALUES (1, DATEADD(DAY, 3, GETDATE()), 2)

SELECT * FROM @t

DELETE FROM @t WHERE unique_Id IN 
(
    SELECT unique_Id FROM 
    (   
        SELECT  unique_Id
                ,Type_Id 
                ,ROW_NUMBER() OVER (PARTITION BY worker_Id, type_Id ORDER BY date) AS rn 
        FROM @t 
    ) Q 
    WHERE rn > 1
)

SELECT * FROM @t

1

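You may use this query:

delete from worker
where  unique_id in (
    select max(unique_id)
    from   worker
    group  by worker_ID, type_ID
    having count(type_id) > 1)

Here I am assuming that worker is your table name. Note that this removes only the row with the highest unique_id in each duplicated group, and it treats the highest unique_id as the most recent entry rather than checking date.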

1 Comment

And the date check requested by the OP?
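
One way the GROUP BY idea could take the date into account is to compute the earliest date per group and delete everything later than it. A sketch only, using the table name worker assumed in this answer; k and first_date are names made up for the example:

```sql
DELETE w
FROM   worker AS w
JOIN  (SELECT worker_ID, type_ID, MIN(date) AS first_date
       FROM   worker
       GROUP  BY worker_ID, type_ID) AS k   -- earliest date per group
  ON   k.worker_ID = w.worker_ID
 AND   k.type_ID   = w.type_ID
WHERE  w.date > k.first_date                -- anything newer than the earliest row goes
```

Like the EXISTS answer, this leaves duplicates in place when they share the earliest date, so it only fits data where dates within a group are distinct.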
