7

Say i have duplicate rows in my table and well my database design is of 3rd class :-

Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Cinthol','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Lux','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (1,'Crowning Glory','cosmetic soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (2,'Cinthol','nice soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (3,'Lux','nice soap','soap');
Insert Into tblProduct (ProductId,ProductName,Description,Category) Values (3,'Lux','nice soap','soap');

I want only 1 instance of each row should be present in my table. Thus 2nd, 3rd and last row whcih are completely identical should be deleted. What query can i write for this? Can it be done without creating temp tables? Just in one single query?

Thanks in advance :)

2
  • The first record based on insertion order? For what version of SQL Server? Commented Jul 27, 2010 at 15:41
  • I'm going to guess that you mean 3rd normal form when you say 3rd class. If it's allowing complete duplicates in a table, it's not in 3NF by definition ;) Commented Jul 27, 2010 at 15:47

4 Answers 4

18

Try this - it will delete all duplicates from your table:

;WITH duplicates AS
(
    SELECT 
       ProductID, ProductName, Description, Category,
       ROW_NUMBER() OVER (PARTITION BY ProductID, ProductName
                          ORDER BY ProductID) 'RowNum'
    FROM dbo.tblProduct
)
DELETE FROM duplicates
WHERE RowNum > 1
GO

SELECT * FROM dbo.tblProduct
GO

Your duplicates should be gone now: output is:

ProductID   ProductName   DESCRIPTION        Category
   1          Cinthol         cosmetic soap      soap
   1          Lux             cosmetic soap      soap
   1          Crowning Glory  cosmetic soap      soap
   2          Cinthol         nice soap          soap
   3          Lux             nice soap          soap
Sign up to request clarification or add additional context in comments.

4 Comments

Nice Marc_s, is this a CTE query? If so, is it not necessary in CTE query to have a union clause?
@Nitesh Panchal: yes, CTE's are one of the underused features of SQL Server - as is the OVER() clause :-)
+1: I wasn't sure that you could issue a delete against a CTE like that, and before I could test it you had your answer posted :)
@Tom H. Even i wasn't sure that Delete could be issued in CTE. I was under the impression that CTE's are only used for recursive queries.
4
DELETE tblProduct 
FROM tblProduct 
LEFT OUTER JOIN (
   SELECT MIN(ProductId) as ProductId, ProductName, Description, Category
   FROM tblProduct 
   GROUP BY ProductName, Description, Category
) as KeepRows ON
   tblProduct.ProductId= KeepRows.ProductId
WHERE
   KeepRows.ProductId IS NULL

Stolen from How can I remove duplicate rows?

UPDATE:

This will only work if ProductId is a Primary Key (which it is not). You are better off using @marc_s' method, but I'll leave this up in case someone using a PK comes across this post.

4 Comments

@Abe: rowid was the primary key for the table; I thought this was Oracle syntax for a moment till I saw the link.
I was assuming that ProductId was a primary key in his table. I've updated it with his column names to help avoid any confusion.
@Abe Miessler, i thought it would work but it did sound confusing to me. So i tested in Managemenet Studio and indeed it is not working. It says 0 rows deleted. Can you fix the query?
@Nitesh, I assumed (mistakenly) that ProductId would be a unique identifier. Since it is not you are better off using @marc_s' method. Sorry for the confusion!
1

I had to do this a few weeks back... what version of SQL Server are you using? In SQL Server 2005 and up, you can use Row_Number as part of your select, and only select where Row_Number is 1. I forget the exact syntax, but it's well documented... something along the lines of:

Select t0.ProductID, 
       t0.ProductName, 
       t0.Description, 
       t0.Category
Into   tblCleanData
From   (
    Select ProductID, 
           ProductName, 
           Description, 
           Category, 
           Row_Number() Over (
               Partition By ProductID, 
                            ProductName, 
                            Description, 
                            Category
               Order By     ProductID,
                            ProductName,
                            Description,
                            Category
           ) As RowNumber
    From   MyTable
) As t0
Where t0.RowNumber = 1

Check out http://msdn.microsoft.com/en-us/library/ms186734.aspx, that should get you going in the right direction.

Comments

0

First use a SELECT... INTO:

SELECT DISTINCT ProductID, ProductName, Description, Category
    INTO tblProductClean
    FROM tblProduct

The drop the first table.

1 Comment

From the OP: "Can it be done without creating temp tables? Just in one single query?"

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.