How would I delete all duplicate data from a MySQL Table?

For example, with the following data:

SELECT * FROM names;

+----+--------+
| id | name   |
+----+--------+
| 1  | google |
| 2  | yahoo  |
| 3  | msn    |
| 4  | google |
| 5  | google |
| 6  | yahoo  |
+----+--------+

I would use SELECT DISTINCT name FROM names; if it were a SELECT query.

How would I do this with DELETE to only remove duplicates and keep just one record of each?

3
  • 43
    Duplicate of stackoverflow.com/questions/3311903/… and stackoverflow.com/questions/2867530/… (Ironically.) Commented Jan 13, 2011 at 21:03
  • 30
    It's not an exact duplicate: this question asks specifically for a DELETE command that does what adding a unique index with an ALTER command would otherwise make MySQL do automatically, namely remove duplicate rows. Here, we choose exactly how the duplicates are deleted. Commented Jan 8, 2013 at 22:37
  • 1
    So a question about duplicates has duplicates? Hmm. Commented Sep 21, 2017 at 3:24

2 Answers 2

1043

Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.

NB - You need to do this first on a test copy of your table!

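When I tried it, joining on n1.name = n2.name alone deleted every row in the table, because every row matches itself. The condition also has to exclude that self-match (AND n1.id <> n2.id); the strict > and < comparisons in the queries below already do.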

  1. If you want to keep the row with the lowest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
    
  2. If you want to keep the row with the highest id value:

    DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
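
Before running either DELETE, it may help to preview which rows it would remove. The following is a minimal sketch, assuming the `names` table from the question; `names_test` is a hypothetical name for the test copy that the NB above recommends.

```sql
-- Make a test copy first (names_test is a hypothetical name).
CREATE TABLE names_test LIKE names;
INSERT INTO names_test SELECT * FROM names;

-- Preview the rows that variant 1 (keep the lowest id) would delete:
-- every row for which another row with the same name and a lower id exists.
SELECT DISTINCT n1.id, n1.name
  FROM names_test n1
  JOIN names_test n2
    ON n1.name = n2.name
   AND n1.id > n2.id
 ORDER BY n1.id;
```

With the sample data from the question, this should list ids 4, 5 and 6. If the list looks right, the same join condition can be used in the DELETE.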
    

I used this method in MySQL 5.1; I'm not sure about other versions.


Update: Since people Googling for how to remove duplicates end up here, please be advised that, although the OP's question is about DELETE, using INSERT with DISTINCT is much faster. On a database with 8 million rows, the query below took 13 minutes, while the DELETE ran for more than 2 hours and still hadn't completed.

INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
    SELECT DISTINCT cellId,attributeId,entityRowId,value
    FROM tableName;
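
For the `names` table from the question, the same idea could look roughly like the sketch below. It is an assumption-laden illustration, not a tested recipe: `names_dedup` and `names_old` are hypothetical table names, and because `id` differs on every row, a plain SELECT DISTINCT over all columns would not collapse anything, so the sketch keeps the lowest id per name with GROUP BY instead.

```sql
-- Empty table with the same structure (names_dedup is a hypothetical name).
CREATE TABLE names_dedup LIKE names;

-- Copy one row per name, keeping the lowest id.
-- Use MAX(id) instead to keep the highest.
INSERT INTO names_dedup (id, name)
SELECT MIN(id), name
  FROM names
 GROUP BY name;

-- Swap the tables in a single statement and keep the original as a backup.
RENAME TABLE names TO names_old, names_dedup TO names;
```

Keeping `names_old` around until the result has been checked makes it possible to roll back by renaming the tables again.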

30 Comments

Excellent solution. It worked perfectly. I have one suggestion, though: swap the conditions. Instead of [WHERE n1.id > n2.id AND n1.name = n2.name], write [WHERE n1.name = n2.name AND n1.id > n2.id]; it will improve performance when there is a lot of data.
FYI: This ignores rows where column "name" is null.
The NB in this answer is VERY IMPORTANT, kids. But this is an excellent use of MySQL. Note that for tables that could have duplicates repeated more than once, you will also want a GROUP BY n1.id clause.
I love this solution, but do you have a suggestion for optimizing it on larger tables?
This took 171 seconds on a 10,000 record table with 450 duplicates. The answer by OMG Ponies took 4 seconds.
254

If you want to keep the row with the lowest id value:

DELETE FROM names
 WHERE id NOT IN (SELECT *
                    FROM (SELECT MIN(n.id)
                            FROM names n
                        GROUP BY n.name) x);

If you want to keep the row with the highest id value:

DELETE FROM names
 WHERE id NOT IN (SELECT *
                    FROM (SELECT MAX(n.id)
                            FROM names n
                        GROUP BY n.name) x);

The nested subquery is necessary in MySQL; without it, you'll get error 1093 ("You can't specify target table 'names' for update in FROM clause").
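
To make the difference concrete, here is a sketch of the form that fails next to the form that works, assuming the `names` table from the question. The alias `keep_ids` and the column name `keep_id` are hypothetical; they play the role of `x` above.

```sql
-- Fails with error 1093: the subquery reads directly from the table
-- that the DELETE is modifying.
-- DELETE FROM names
--  WHERE id NOT IN (SELECT MIN(n.id) FROM names n GROUP BY n.name);

-- Works: the inner SELECT is wrapped in a derived table, which must have
-- an alias (keep_ids here), so the DELETE no longer reads names directly.
DELETE FROM names
 WHERE id NOT IN (SELECT keep_id
                    FROM (SELECT MIN(n.id) AS keep_id
                            FROM names n
                        GROUP BY n.name) AS keep_ids);
```

Naming the column and the alias explicitly is only a readability choice; the statement is otherwise the same as the keep-the-lowest-id version above.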

20 Comments

What does the 'x' do?
@GDmac It serves as an alias for the inner query. If it is not specified, an error will be thrown.
But what's the x for? (just kidding)
It seems this SQL deletes unique rows too. Actually, all rows.
x could be written as AS x or AS temp or AS `temp` to make it clearer that it's an arbitrary alias for the table generated by the outer select. MySQL won't let you use the same table for both an action (DELETE, UPDATE) and a condition, so wrapping the condition in another SELECT allows a temp table to be created and given an alias.