1

Hi, I have a login table that has some duplicated usernames. Yes, I know I should have put a constraint on it, but it's a bit too late for that now!

So essentially what I want to do is to first identify the duplicates. I can't just delete them since I can't be too sure which account is the correct one. The accounts have the same username and both of them have roughly the same information with a few small variances.

Is there any way to efficiently script it so that I can add "_duplicate" to only one of the accounts per duplicate?

1
  • Have you identified the duplicates? Do you have a query so far? Commented Dec 18, 2017 at 0:18

3 Answers

1

You can use ROW_NUMBER with a PARTITION BY in the OVER() clause to find the duplicates and an updateable CTE to change the values accordingly:

-- Sample data: 'Peter' occurs three times, 'Jane' twice
DECLARE @dummyTable TABLE(ID INT IDENTITY, UserName VARCHAR(100));
INSERT INTO @dummyTable VALUES('Peter'),('Tom'),('Jane'),('Victoria')
                             ,('Peter')        ,('Jane')
                             ,('Peter');
-- Number the rows per UserName; row 1 keeps its name, every later row gets the suffix
WITH UpdateableCTE AS
(
    SELECT t.UserName AS OldValue
          ,t.UserName + CASE WHEN ROW_NUMBER() OVER(PARTITION BY t.UserName ORDER BY t.ID)=1 THEN '' ELSE '_duplicate' END AS NewValue
    FROM @dummyTable AS t
)
-- Updating the CTE writes through to the underlying table
UPDATE UpdateableCTE SET OldValue = NewValue;

SELECT * FROM @dummyTable;

The result

ID  UserName
1   Peter
2   Tom
3   Jane
4   Victoria
5   Peter_duplicate
6   Jane_duplicate
7   Peter_duplicate

You might include ROW_NUMBER() as another column to see each duplicate's ordinal. If you've got a sort clause that numbers the earliest (or the most current) row with 1, it should be easy to find and correct the duplicates.
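Selecting the row number as its own column could look roughly like this. It is only a sketch on the same sample data; the names DupOrdinal and DupCount are made up for illustration:

```sql
-- Same sample data as in the example above
DECLARE @dummyTable TABLE(ID INT IDENTITY, UserName VARCHAR(100));
INSERT INTO @dummyTable VALUES('Peter'),('Tom'),('Jane'),('Victoria')
                             ,('Peter'),('Jane'),('Peter');

WITH Numbered AS
(
    SELECT t.ID
          ,t.UserName
          ,ROW_NUMBER() OVER(PARTITION BY t.UserName ORDER BY t.ID) AS DupOrdinal -- 1 = lowest ID per name
          ,COUNT(*)     OVER(PARTITION BY t.UserName)               AS DupCount   -- rows sharing this name
    FROM @dummyTable AS t
)
SELECT ID, UserName, DupOrdinal, DupCount
FROM Numbered
WHERE DupCount > 1                 -- only names that occur more than once
ORDER BY UserName, DupOrdinal;
```

Rows with a DupOrdinal above 1 are the candidates for renaming. If a name can occur three or more times, appending the ordinal (for example '_duplicate' + CAST(DupOrdinal AS VARCHAR(10))) instead of a fixed suffix keeps the renamed values distinct from each other.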

Once you've cleaned up this mess, you should make sure you don't get new dups. But you know this already :-D
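Preventing new duplicates usually comes down to a unique constraint. A minimal sketch, assuming a stand-in temp table #Login (the real table name, the column name and the constraint name UQ_Login_UserName are all placeholders here):

```sql
-- Hypothetical stand-in for the real login table
CREATE TABLE #Login (ID INT IDENTITY PRIMARY KEY, UserName VARCHAR(100) NOT NULL);
INSERT INTO #Login (UserName) VALUES ('Peter'),('Tom'),('Jane'),('Peter_duplicate');

-- 1) Check that no duplicates are left; this should return no rows
SELECT UserName, COUNT(*) AS Cnt
FROM #Login
GROUP BY UserName
HAVING COUNT(*) > 1;

-- 2) Enforce uniqueness from now on (this fails if duplicates still exist)
ALTER TABLE #Login
    ADD CONSTRAINT UQ_Login_UserName UNIQUE (UserName);

-- 3) A second 'Peter' is now rejected
BEGIN TRY
    INSERT INTO #Login (UserName) VALUES ('Peter');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();         -- unique constraint violation
END CATCH;

DROP TABLE #Login;
```

Running the check in step 1 first is worthwhile, because the ALTER TABLE is refused as long as any username still appears twice.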


0

There is no easy way to get rid of this nightmare. Some manual work is required.
First, identify the duplicates.

-- All rows whose username occurs more than once
select * from dbo.users
where username in
(select username from dbo.users
   group by username
   having count(*) > 1)
order by username

Next, identify "useless" users (for example, those who registered but never placed an order) and remove them.
Rerun the query above. Out of this list, find duplicates which are really the same user (by email, for example) and combine them into a single record. If they did something useful previously (for example, placed orders), first assign these orders to the user that survives, then remove the others.
Continue with other criteria until you get rid of all duplicates.
Then set a unique constraint on the username field. It is also a good idea to set a unique constraint on the email field.
Again, it is not easy and not automatic.
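The "combine into a single record" step might be scripted along these lines. This is a sketch under assumptions: the temp tables #users and #orders and their columns are hypothetical stand-ins, and the rule "the lowest userId per username and email survives" is just one possible choice:

```sql
-- Hypothetical schema and sample data
CREATE TABLE #users  (userId INT PRIMARY KEY, username VARCHAR(100), email VARCHAR(200));
CREATE TABLE #orders (orderId INT PRIMARY KEY, userId INT);

INSERT INTO #users  VALUES (1,'peter','p@example.com'),(2,'peter','p@example.com'),(3,'tom','t@example.com');
INSERT INTO #orders VALUES (10,1),(11,2),(12,3);

-- Map every user to the survivor (lowest userId) of its username/email group
SELECT u.userId
      ,MIN(u.userId) OVER(PARTITION BY u.username, u.email) AS survivorId
INTO #map
FROM #users AS u;

BEGIN TRANSACTION;

-- Move the orders of the non-surviving users to the survivor
UPDATE o
SET    o.userId = m.survivorId
FROM   #orders AS o
JOIN   #map    AS m ON m.userId = o.userId
WHERE  m.userId <> m.survivorId;

-- Remove the non-surviving users
DELETE u
FROM   #users AS u
JOIN   #map   AS m ON m.userId = u.userId
WHERE  m.userId <> m.survivorId;

COMMIT TRANSACTION;

SELECT * FROM #users;   -- users 1 and 3 remain
SELECT * FROM #orders;  -- order 11 now belongs to user 1

DROP TABLE #map, #orders, #users;
```

Accounts with the same username but different emails are left untouched by this rule, so they still need the manual review described above.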

0

In this case, where your duplicates and the original names have some variance, it is practically impossible to select the non-duplicate rows, since you don't know which row is real and which is the duplicate.

I think the best thing to do is to correct your data and then fix the source these slightly varying duplicates come from.

1 Comment

If you read the question, the OP's need is exactly what you've described (identify the duplicates and rework them manually). But the question is: how can this be done?
