T-SQL Duplicate Records

Question

I am trying to delete every other record, because they are duplicates. My SELECT query returns every other record as a duplicate. tblPoints.ptUser_ID is the unique ID.

SELECT *, u.usMembershipID
  FROM [ABCRewards].[dbo].[tblPoints]
  inner join tblUsers u on u.User_ID = tblPoints.ptUser_ID
  where ptUser_ID in (select user_id from tblusers where Client_ID = 8)
  and ptCreateDate >= '3/9/2016'
  and ptDesc = 'December Anniversary'

What if there are three or four? Do you want to delete all but one? Or just one of the duplicated group... or something else?
– JosephStyons, Commented Mar 10, 2016 at 21:18
@JosephStyons Every other record is a duplicate; for example, ptUser_ID=50 is a duplicate of 51, and so on.
– TSCAmerica.com, Commented Mar 10, 2016 at 21:20
We need to clarify due to language boundaries. "Every other" can mean even or odd (ex. 1, 3, 5). Do you mean all duplicates, leaving only one original record? You could say "all other", which would be more correct than "every other".
– Jeff.Clark, Commented Mar 10, 2016 at 21:30
@Jeff.Clark Yes, all duplicates, leaving only the one original record.
– TSCAmerica.com, Commented Mar 10, 2016 at 21:31
What is the primary key of your table [ABCRewards].[dbo].[tblPoints]?
– AXMIM, Commented Mar 10, 2016 at 21:59

Fuzzy · Accepted Answer · 2016-03-10 21:31:04Z

1

Duplicates returned by an INNER JOIN usually suggest an issue with the query, but if you are certain that your join is correct, then this would do it:

;WITH CTE
     AS (SELECT *
              , ROW_NUMBER() OVER(PARTITION BY t.ptUser_ID ORDER BY t.ptUser_ID) AS rn
         FROM [ABCRewards].[dbo].[tblPoints] AS t)

/*Uncomment below to Review duplicates*/
     --SELECT *
     --FROM CTE
     --WHERE rn > 1;

/*Uncomment below to Delete duplicates*/
    --DELETE 
    --FROM CTE
    --WHERE rn > 1;
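
As a rough sketch of where the question's filters could go: the conditions from the original SELECT can be placed inside the CTE, so that only the filtered rows are numbered and considered for deletion. The partition columns (ptUser_ID, ptDesc), the ordering by ptCreateDate, and reading '3/9/2016' as March 9, 2016 are all assumptions about the data, not something the question confirms.

```sql
;WITH CTE
     AS (SELECT t.*
                -- Assumption: rows with the same user and description are duplicates,
                -- and the earliest ptCreateDate is the "original" to keep (rn = 1).
              , ROW_NUMBER() OVER(PARTITION BY t.ptUser_ID, t.ptDesc
                                  ORDER BY t.ptCreateDate) AS rn
         FROM [ABCRewards].[dbo].[tblPoints] AS t
         WHERE t.ptUser_ID IN (SELECT u.User_ID
                               FROM tblUsers AS u
                               WHERE u.Client_ID = 8)
           AND t.ptCreateDate >= '20160309'   -- assumed to mean March 9, 2016
           AND t.ptDesc = 'December Anniversary')
-- Review first; replace this SELECT with "DELETE FROM CTE WHERE rn > 1;"
-- once the listed rows are the ones that should go.
SELECT *
FROM CTE
WHERE rn > 1;
```

Because the CTE reads from a single base table, the DELETE form removes rows from tblPoints itself, and rows outside the WHERE filter are never touched.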

answered Mar 10, 2016 at 21:31

7 Comments

TSCAmerica.com Over a year ago

Where should I put the WHERE condition?

TSCAmerica.com Over a year ago

Also, u.usMembershipID is missing from the query.

Fuzzy Over a year ago

@user580950, if you believe your table [ABCRewards].[dbo].[tblPoints] has duplicates, then you want to dedupe the whole table, don't you? I recommend running the query with the SELECT uncommented to see whether you do have duplicates. My suspicion is that your join might be the culprit.

TSCAmerica.com Over a year ago

The joins are correct, as the SELECT query returns the data.

Fuzzy Over a year ago

OK, so if uncommenting the SELECT returns duplicate data and tblPoints.ptUser_ID is indeed the unique key, you can simply comment out the SELECT, uncomment the DELETE, and run it. This will dedupe your whole table. Just as a precaution, you may want to back up your table first :)


AXMIM · Accepted Answer · 2016-03-10 22:03:47Z

When cleaning up duplicated data, I have always used the same query pattern to delete all the duplicates and keep the wanted row (the original, the most recent, whatever). The query pattern below deletes all duplicates and keeps the one you wish to keep.

Just replace every [] placeholder with your table and fields.

[Field(s)ToDetectDuplications]: Put here the field(s) whose matching values mean that rows are duplicates.
[Field(s)ToChooseWhichDupplicationIsKept ]: Put here the field(s) that decide which duplicate is kept, for example the one with the biggest value or the most recent one.


DELETE [YourTableName]
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
            FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1

I recommend looking at what will be deleted first. To do so, just replace the DELETE statement with a SELECT, as below.

SELECT  T.I,
        [YourTableName].*
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
            FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1
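
One cautious way to try the pattern, sketched here on a throwaway temp table rather than on the real data: run the DELETE inside a transaction, check how many rows it removed, and roll back until the result looks right. The table #Points, its PointID key, and the sample rows are hypothetical stand-ins; the real primary key of tblPoints is not known from the question.

```sql
-- Hypothetical demo table standing in for tblPoints.
CREATE TABLE #Points (
    PointID      INT         NOT NULL PRIMARY KEY,  -- assumed key column
    ptUser_ID    INT         NOT NULL,
    ptDesc       VARCHAR(50) NOT NULL,
    ptCreateDate DATETIME    NOT NULL
);

INSERT INTO #Points (PointID, ptUser_ID, ptDesc, ptCreateDate)
VALUES (1, 50, 'December Anniversary', '20160309 10:00'),
       (2, 50, 'December Anniversary', '20160309 10:05'),
       (3, 51, 'December Anniversary', '20160309 10:00');

BEGIN TRANSACTION;

-- Same pattern as above; ASC keeps the earliest row of each group.
DELETE P
FROM #Points AS P
INNER JOIN (SELECT PointID,
                   I = ROW_NUMBER() OVER(PARTITION BY ptUser_ID, ptDesc ORDER BY ptCreateDate ASC)
            FROM #Points) AS T ON P.PointID = T.PointID
                              AND T.I > 1;

-- Capture the count immediately; @@ROWCOUNT is reset by the next statement.
DECLARE @deleted INT = @@ROWCOUNT;
SELECT @deleted AS RowsDeleted;   -- 1 for the sample rows (PointID 2)
SELECT * FROM #Points;            -- inspect what is left

ROLLBACK TRANSACTION;             -- switch to COMMIT TRANSACTION once the result is right

DROP TABLE #Points;
```

Rolling back restores the deleted row, so the script can be rerun with different partition or ordering fields until the right rows are selected.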

Explanation:
Here we use ROW_NUMBER() with PARTITION BY and ORDER BY to detect duplicates. PARTITION BY groups together the rows that share the same values in the partition fields. Choose the partition fields so that each partition holds exactly one row when the data is correct; bad data then shows up as partitions with more than one row. ROW_NUMBER() numbers the rows within each partition, so any number greater than 1 means there is a duplicate in that partition. ORDER BY tells ROW_NUMBER() in what order to assign the numbers. Row number 1 is kept; all others are deleted.
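
To make the numbering concrete, here is a small self-contained sketch using a table variable. The column names follow the question, but the PointID key and the sample rows are made up for illustration.

```sql
-- Hypothetical sample data; PointID is an assumed key column.
DECLARE @Points TABLE (
    PointID      INT         NOT NULL PRIMARY KEY,
    ptUser_ID    INT         NOT NULL,
    ptDesc       VARCHAR(50) NOT NULL,
    ptCreateDate DATETIME    NOT NULL
);

INSERT INTO @Points (PointID, ptUser_ID, ptDesc, ptCreateDate)
VALUES (1, 50, 'December Anniversary', '20160309 10:00'),
       (2, 50, 'December Anniversary', '20160309 10:05'),
       (3, 50, 'December Anniversary', '20160309 10:10'),
       (4, 51, 'December Anniversary', '20160309 10:00');

-- One partition per (ptUser_ID, ptDesc); the newest row in each gets I = 1.
SELECT PointID,
       ptUser_ID,
       ptCreateDate,
       I = ROW_NUMBER() OVER(PARTITION BY ptUser_ID, ptDesc ORDER BY ptCreateDate DESC)
FROM @Points
ORDER BY ptUser_ID, I;
-- User 50: PointID 3 gets I = 1 (kept); PointIDs 2 and 1 get I = 2 and 3 (would be deleted).
-- User 51: PointID 4 gets I = 1 (kept).
```

Switching DESC to ASC would keep the oldest row of each group instead of the newest.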

Example with the OP's schema and specification
Here I attempted to fill in the pattern with guesses I have made about your database schema.

DECLARE @userID INT
SELECT @userID = 8

-- [YourTablePrimaryKey] is still a placeholder: replace it with the real key of tblPoints.
SELECT  T.I,
        [ABCRewards].[dbo].[tblPoints].*
FROM [ABCRewards].[dbo].[tblPoints]
INNER JOIN (SELECT P.[YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY P.ptDesc, P.ptUser_ID ORDER BY P.ptCreateDate DESC)
            FROM [ABCRewards].[dbo].[tblPoints] AS P  -- inner table aliased as P; T names the derived table
            WHERE P.ptCreateDate >= '3/9/2016'
            AND P.ptDesc = 'December Anniversary'
            AND P.ptUser_ID = @userID
            ) AS T ON [ABCRewards].[dbo].[tblPoints].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1

I tried to rewrite the query but got stuck here: SELECT *, u.usMembershipID FROM [ABCRewards].[dbo].[tblPoints] INNER JOIN (SELECT [YourTablePrimaryKey], ) How do I add this subquery: select user_id from tblusers where Client_ID = 8?
@user580950, see the example that I have just added at the end of my answer. Keep in mind that checking for a specific user may be irrelevant: if you are cleaning up for one user, why not check whether other users have the same issue?
