0

I'm almost sure this is impossible, but I've been amazed so often at the power of the SQL language that I just thought I'd ask, because this would save me weeks of work.

Let's say you have a table like this, with random numbers of duplicate records:

First   Last    Title   Grade   Replicate
Mike    Smith   Manager 2          1
Jenn    Jones   Sales   1          1
Bill    Bennett CEO 2   1          1
Jeff    Miller  Sales   4          1
Harry   James   Manager 6          1
Beth    Samuals Dock    1          1
Cathy   Gordon  Sales   2          1
Bill    Bennett CEO 2   1          1
Jeff    Miller  Sales   4          1
Harry   James   Manager 6          1
Beth    Samuals Dock    1          1
Cathy   Gordon  Sales   2          1
Bill    Bennett CEO 2   1          1
Jeff    Miller  Sales   4          1
Harry   James   Manager 6          1
Jeff    Miller  Sales   4          1
Harry   James   Manager 6          1

The 'Replicate' field is there to make these records unique. The replicates need to be incremented by one each time a new duplicate is found, so that the other records can be duplicated but the Replicate field will contain 1, 2, 3.... and so on.

This is a simplified example of my problem - about 40,000 records in a table with about 30 fields, and 16 fields that must be made unique by using the 'Replicate' field. Sorting them on the 16 'uniqueness' fields is easy, of course. But is there any way to tell SQL to update the Replicate field, adding one each time a duplicate is found?

As I said, I suspect that this isn't possible or that if it is, that the solution is above my skill level, but I've been wrong about that before.

Thanks.

2
  • You're using SQL Server? Which version? And do you have a field that would be used to indicate which is the first, or is it arbitrary? Commented Mar 13, 2015 at 22:42
  • SQL Server 2008. And it doesn't matter which 'duplicate set' gets which Replicate value, the Replicates just need to make the 16-field combination unique. Commented Mar 13, 2015 at 22:44

1 Answer 1

1

You can use ROW_NUMBER() for this:

;WITH cte AS (SELECT *,ROW_NUMBER() OVER(PARTITION BY [First],[Last],[Title],[Grade] ORDER BY Replicate) AS UPD_Replicate
              FROM Table1
              )
SELECT *
FROM cte

Demo: SQL Fiddle

Or to UPDATE the field:

;WITH cte AS (SELECT *,ROW_NUMBER() OVER(PARTITION BY [First],[Last],[Title],[Grade] ORDER BY Replicate) AS UPD_Replicate
              FROM Table1
              )
UPDATE cte
SET Replicate = UPD_Replicate

The ROW_NUMBER() function assigns a number to each row. PARTITION BY is optional, but used to start the numbering over for each value in a given group, ie: if you PARTITION BY Some_Date then for each unique date value the numbering would start over at 1. ORDER BY of course is used to define how the counting should go, and is required in the ROW_NUMBER() function.

Sign up to request clarification or add additional context in comments.

3 Comments

Hart, this is obviously the answer to my problem, but I need to dissect it and teach myself how to use CTEs (which I've been meaning to do anyway). I'll spend the weekend on this until I get and apply it to my larger problem. Thank you so much, I can't tell you how much this will help me. I'm mentally buying you a pint right now.
My pleasure. Simply, you can think of a cte as a temporary result set or subquery, they just have some added benefits, like use in recursion and ability to UPDATE directly.
@GordonLinoff Thanks for that, can you point me to an example? Always thought an update with a subquery had to utilize multiple references to the table, either JOIN or correlated subquery.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.