
I am trying to delete every other record, because they are duplicates: my SELECT query returns every other record as a duplicate. (tblPoints.ptUser_ID is the unique ID.)

SELECT *, u.usMembershipID
  FROM [ABCRewards].[dbo].[tblPoints]
  inner join tblUsers u on u.User_ID = tblPoints.ptUser_ID
  where ptUser_ID in (select user_id from tblusers where Client_ID = 8)
  and ptCreateDate >= '3/9/2016'
  and ptDesc = 'December Anniversary'
  • What if there are three or four? Do you want to delete all but one? Or just one of the duplicated group... or something else? Commented Mar 10, 2016 at 21:18
  • @JosephStyons Every other record is a duplicate, so for example ptUser_ID=50 is a duplicate of 51, and so on. Commented Mar 10, 2016 at 21:20
  • We need to clarify because of the language barrier. "Every other" can mean the even or the odd ones (e.g. 1, 3, 5). Do you mean all duplicates, leaving only one original record? You could say "all other", which would be more correct than "every other". Commented Mar 10, 2016 at 21:30
  • @Jeff.Clark Yes, all duplicates, leaving only the one original record. Commented Mar 10, 2016 at 21:31
  • What is the primary key of your table [ABCRewards].[dbo].[tblPoints]? Commented Mar 10, 2016 at 21:59
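Before deleting anything, it can help to answer the first comment's question (only pairs, or groups of three or four?) by counting the copies in each group. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server, a hypothetical ptID surrogate key, invented sample rows, and that rows sharing ptUser_ID and ptDesc count as duplicates:

```python
import sqlite3

# Hypothetical schema and data: only ptUser_ID, ptDesc and ptCreateDate come
# from the question; ptID and the row values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblPoints (
    ptID INTEGER PRIMARY KEY,
    ptUser_ID INTEGER,
    ptDesc TEXT,
    ptCreateDate TEXT)""")
conn.executemany(
    "INSERT INTO tblPoints (ptUser_ID, ptDesc, ptCreateDate) VALUES (?, ?, ?)",
    [
        (50, "December Anniversary", "2016-03-09"),
        (50, "December Anniversary", "2016-03-09"),
        (51, "December Anniversary", "2016-03-09"),
        (52, "December Anniversary", "2016-03-09"),
        (52, "December Anniversary", "2016-03-09"),
        (52, "December Anniversary", "2016-03-09"),
    ],
)

# One row per duplicated group: how many copies exist, and how many would
# have to go if exactly one original is kept.
dup_groups = conn.execute("""
    SELECT ptUser_ID, ptDesc, COUNT(*) AS copies, COUNT(*) - 1 AS to_delete
    FROM tblPoints
    GROUP BY ptUser_ID, ptDesc
    HAVING COUNT(*) > 1
    ORDER BY ptUser_ID
""").fetchall()

for row in dup_groups:
    print(row)
```

Here user 52 has three copies, so deleting "every other" row in that group would remove only one and leave a duplicate behind, while "keep one per group" removes two.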

2 Answers


Duplicates returned by an INNER JOIN usually suggest an issue with the query, but if you are certain that your join is correct, then this would do it:

;WITH CTE
     AS (SELECT *
              , ROW_NUMBER() OVER(PARTITION BY t.ptUser_ID ORDER BY t.ptUser_ID) AS rn
         FROM [ABCRewards].[dbo].[tblPoints] AS t)

/*Uncomment below to Review duplicates*/
     --SELECT *
     --FROM CTE
     --WHERE rn > 1;

/*Uncomment below to Delete duplicates*/
    --DELETE 
    --FROM CTE
    --WHERE rn > 1;
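The same ROW_NUMBER() approach can be tried end to end in a throwaway database. A minimal sketch, assuming Python's sqlite3 module (window functions need SQLite 3.25 or later) and invented rows. SQLite cannot delete through a CTE the way SQL Server can, so the numbered rows are matched back to the table by the implicit rowid; ORDER BY rowid is also swapped in for ORDER BY ptUser_ID so that the kept row is deterministic rather than arbitrary:

```python
import sqlite3

# Invented sample rows; "Birthday Bonus" is a hypothetical, legitimately
# different points record for user 50.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblPoints (ptUser_ID INTEGER, ptDesc TEXT)")
conn.executemany(
    "INSERT INTO tblPoints VALUES (?, ?)",
    [
        (50, "December Anniversary"),
        (50, "December Anniversary"),
        (50, "Birthday Bonus"),
        (51, "December Anniversary"),
        (51, "December Anniversary"),
    ],
)

# Number the rows within each ptUser_ID partition, then delete every row
# numbered 2 or higher by matching its rowid.
conn.execute("""
    WITH CTE AS (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY ptUser_ID ORDER BY rowid) AS rn
        FROM tblPoints
    )
    DELETE FROM tblPoints
    WHERE rowid IN (SELECT rid FROM CTE WHERE rn > 1)
""")
conn.commit()

remaining = conn.execute(
    "SELECT ptUser_ID, ptDesc FROM tblPoints ORDER BY rowid"
).fetchall()
print(remaining)
```

Because the partition is ptUser_ID alone, the "Birthday Bonus" row is deleted as well: each user ends up with exactly one row in the whole table. If a user can legitimately have several points rows, the distinguishing columns (for example ptDesc) belong in the PARTITION BY.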

7 Comments

Where should I put the WHERE condition?
Also, u.usMembershipID is missing from the query.
@user580950 If you believe your table [ABCRewards].[dbo].[tblPoints] has duplicates, then you want to dedupe the whole table, don't you? I recommend running the query with the SELECT uncommented to see whether you indeed have duplicates. My suspicion is that your join might be the culprit.
The joins are correct, as the SELECT query returns the data.
OK, so if uncommenting the SELECT returns duplicate data and tblPoints.ptUser_ID is indeed the unique key, you can simply comment out the SELECT, uncomment the DELETE and run it. This will dedupe your whole table. Just as a precaution, you may want to back up your table first :)
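Following the advice to back up first, one cautious workflow is: copy the table, run the delete inside a transaction, and commit only if the number of deleted rows matches what the review SELECT reported. A sketch in Python's sqlite3 with invented rows and a hypothetical tblPoints_backup table name; in SQL Server the copy would typically be SELECT * INTO ... FROM ..., with BEGIN TRAN / COMMIT around the DELETE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblPoints (ptUser_ID INTEGER, ptDesc TEXT)")
conn.executemany(
    "INSERT INTO tblPoints VALUES (?, ?)",
    [(50, "December Anniversary"), (50, "December Anniversary"),
     (51, "December Anniversary")],
)
conn.commit()

# 1. Back up the table before touching it (hypothetical backup table name).
conn.execute("CREATE TABLE tblPoints_backup AS SELECT * FROM tblPoints")

# Number of duplicates the review SELECT reported (one extra copy for user 50).
expected_deletes = 1

# 2. Delete inside a transaction: "with conn" commits if the block succeeds
#    and rolls back if an exception escapes it.
with conn:
    cur = conn.execute("""
        DELETE FROM tblPoints
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY ptUser_ID, ptDesc
                                          ORDER BY rowid) AS rn
                FROM tblPoints)
            WHERE rn > 1)
    """)
    if cur.rowcount != expected_deletes:
        raise RuntimeError(f"expected {expected_deletes} deletions, got {cur.rowcount}")

live = conn.execute("SELECT COUNT(*) FROM tblPoints").fetchone()[0]
backup = conn.execute("SELECT COUNT(*) FROM tblPoints_backup").fetchone()[0]
print(live, backup)
```

If the count check fails, the delete is rolled back and the backup table is still there for comparison.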

When cleaning up duplicated data, I have always used the same query pattern to delete all the duplicates and keep the wanted one (the original, the most recent, or whatever). The query pattern below deletes all duplicates and keeps the one you wish to keep.

Just replace every [] placeholder with your own table and field names.

  • [Field(s)ToDetectDuplications] : Put here the field(s) whose identical values mark rows as duplicates of each other.

  • [Field(s)ToChooseWhichDupplicationIsKept ] : Put here the field(s) used to choose which duplicate is kept. For example, the one with the biggest value or the most recent one.

DELETE [YourTableName]
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
            FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1
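For a concrete instance, here is the pattern filled in with the question's tblPoints columns: duplicates are detected on ptUser_ID and ptDesc, and ptCreateDate DESC keeps the newest copy. A sketch in Python's sqlite3, assuming a hypothetical ptID primary key and invented rows. SQLite has no DELETE ... FROM ... JOIN, so the join on the primary key becomes an IN over the same numbered subquery, which selects the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblPoints (
    ptID INTEGER PRIMARY KEY,      -- hypothetical primary key
    ptUser_ID INTEGER,
    ptDesc TEXT,
    ptCreateDate TEXT)""")
conn.executemany(
    "INSERT INTO tblPoints VALUES (?, ?, ?, ?)",
    [
        (1, 50, "December Anniversary", "2016-03-09"),
        (2, 50, "December Anniversary", "2016-03-10"),
        (3, 51, "December Anniversary", "2016-03-09"),
        (4, 51, "December Anniversary", "2016-03-11"),
        (5, 51, "December Anniversary", "2016-03-10"),
    ],
)

# [Field(s)ToDetectDuplications]            -> ptUser_ID, ptDesc
# [Field(s)ToChooseWhichDupplicationIsKept ] -> ptCreateDate DESC (newest kept)
conn.execute("""
    DELETE FROM tblPoints
    WHERE ptID IN (
        SELECT ptID FROM (
            SELECT ptID,
                   ROW_NUMBER() OVER (PARTITION BY ptUser_ID, ptDesc
                                      ORDER BY ptCreateDate DESC) AS I
            FROM tblPoints) AS T
        WHERE T.I > 1)
""")
conn.commit()

kept = [row[0] for row in conn.execute("SELECT ptID FROM tblPoints ORDER BY ptID")]
print(kept)
```

Each user keeps only the row with the latest ptCreateDate (ptID 2 for user 50, ptID 4 for user 51).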

I recommend looking at what will be deleted first. To do so, just replace the DELETE statement with a SELECT, as below.

SELECT  T.I,
        [YourTableName].*
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
            FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                                                 AND T.I > 1

Explanation:
Here we use ROW_NUMBER() with PARTITION BY and ORDER BY to detect duplicates. PARTITION BY groups rows together: choose the partition fields so that, when the data is correct, each partition contains exactly one row. That way, bad data shows up as partitions with more than one row. ROW_NUMBER() then numbers the rows within each partition, starting at 1; a number greater than 1 means there is a duplicate in that partition. ORDER BY tells ROW_NUMBER() in what order to assign the numbers, and therefore which row gets number 1. Number 1 is kept; all others are deleted.
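To see the numbering described above, this sketch lists every row with its I value instead of only the ones to delete. It assumes Python's sqlite3, a hypothetical ptID key and invented rows, with ptUser_ID as the partition field and ptCreateDate DESC as the ordering:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblPoints (ptID INTEGER PRIMARY KEY, ptUser_ID INTEGER, ptCreateDate TEXT)"
)
conn.executemany(
    "INSERT INTO tblPoints VALUES (?, ?, ?)",
    [
        (1, 50, "2016-03-09"), (2, 50, "2016-03-10"),
        (3, 51, "2016-03-09"),
        (4, 52, "2016-03-09"), (5, 52, "2016-03-11"), (6, 52, "2016-03-10"),
    ],
)

# Number rows inside each ptUser_ID partition, newest first.
rows = conn.execute("""
    SELECT ptUser_ID, ptID, ptCreateDate,
           ROW_NUMBER() OVER (PARTITION BY ptUser_ID
                              ORDER BY ptCreateDate DESC) AS I
    FROM tblPoints
    ORDER BY ptUser_ID, I
""").fetchall()

for user_id, pt_id, created, i in rows:
    status = "keep" if i == 1 else "delete"
    print(user_id, pt_id, created, i, status)
```

User 51's partition has a single row, which is what correct data looks like; the partitions for users 50 and 52 have rows numbered 2 and 3, and those are the rows the pattern deletes.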

Example with OP's schema and specification
Here I have attempted to fill in the pattern with guesses about your database schema.

-- Client_ID = 8 comes from the question's query: only that client's users are checked.
DECLARE @clientID INT
SELECT @clientID = 8

-- [YourTablePrimaryKey] is still a placeholder: replace it with the primary key of tblPoints.
SELECT  T.I,
        P.*
FROM [ABCRewards].[dbo].[tblPoints] AS P
INNER JOIN (SELECT P2.[YourTablePrimaryKey],
                   I = ROW_NUMBER() OVER(PARTITION BY P2.ptDesc, P2.ptUser_ID ORDER BY P2.ptCreateDate DESC)
            FROM [ABCRewards].[dbo].[tblPoints] AS P2
            WHERE P2.ptCreateDate >= '3/9/2016'
            AND P2.ptDesc = 'December Anniversary'
            AND P2.ptUser_ID IN (SELECT User_ID FROM tblUsers WHERE Client_ID = @clientID)
            ) AS T ON P.[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
                 AND T.I > 1
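A runnable sketch of this review-then-delete flow for the OP's case, including the Client_ID = 8 filter from the question's original query and the usMembershipID column the comments ask about. Assumptions: Python's sqlite3 as a stand-in for SQL Server; a hypothetical ptID primary key (the real key of tblPoints was never given); invented users, membership IDs and rows; ISO date strings instead of '3/9/2016', because SQLite compares these dates as text; and an extra ptID tie-breaker in the ORDER BY (my addition) so that exact duplicates sharing a date are numbered deterministically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblUsers (
        User_ID INTEGER PRIMARY KEY,
        Client_ID INTEGER,
        usMembershipID TEXT
    );
    CREATE TABLE tblPoints (
        ptID INTEGER PRIMARY KEY,          -- hypothetical primary key
        ptUser_ID INTEGER,
        ptDesc TEXT,
        ptCreateDate TEXT                  -- ISO dates compare correctly as text
    );
    INSERT INTO tblUsers VALUES (50, 8, 'M-50'), (51, 8, 'M-51'), (60, 9, 'M-60');
    INSERT INTO tblPoints VALUES
        (1, 50, 'December Anniversary', '2016-03-09'),
        (2, 50, 'December Anniversary', '2016-03-09'),
        (3, 51, 'December Anniversary', '2016-03-10'),
        (4, 60, 'December Anniversary', '2016-03-09'),
        (5, 60, 'December Anniversary', '2016-03-09');
""")

# Review step: rows numbered 2+ within each (ptDesc, ptUser_ID) partition,
# restricted to the users of one client, with the membership ID joined in.
preview = conn.execute("""
    SELECT T.I, P.ptID, P.ptUser_ID, u.usMembershipID
    FROM tblPoints AS P
    INNER JOIN tblUsers AS u ON u.User_ID = P.ptUser_ID
    INNER JOIN (
        SELECT P2.ptID,
               ROW_NUMBER() OVER (PARTITION BY P2.ptDesc, P2.ptUser_ID
                                  ORDER BY P2.ptCreateDate DESC, P2.ptID) AS I
        FROM tblPoints AS P2
        WHERE P2.ptCreateDate >= ?
          AND P2.ptDesc = ?
          AND P2.ptUser_ID IN (SELECT User_ID FROM tblUsers WHERE Client_ID = ?)
    ) AS T ON P.ptID = T.ptID AND T.I > 1
    ORDER BY P.ptID
""", ("2016-03-09", "December Anniversary", 8)).fetchall()
print(preview)

# Delete step: remove exactly the rows the review step listed.
with conn:
    conn.executemany("DELETE FROM tblPoints WHERE ptID = ?",
                     [(row[1],) for row in preview])

remaining = [r[0] for r in conn.execute("SELECT ptID FROM tblPoints ORDER BY ptID")]
print(remaining)
```

User 60 also has duplicates but belongs to client 9, so the client filter leaves those rows alone.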

2 Comments

I tried to rewrite the query but got stuck here: SELECT *, u.usMembershipID FROM [ABCRewards].[dbo].[tblPoints] INNER JOIN (SELECT [YourTablePrimaryKey], ) How do I add this subquery: select user_id from tblusers where Client_ID = 8?
@user580950, see the example I have just added at the end of my answer. Keep in mind that restricting the check to specific users may be unnecessary: if you are cleaning up data for one user, why not check whether other users have the same issue?
