85

We have a table of photos with the following columns:

id, merchant_id, url 

This table contains duplicate values for the combination (merchant_id, url), so the same combination can appear in several rows:

234 some_merchant  http://www.some-image-url.com/abscde1213
235 some_merchant  http://www.some-image-url.com/abscde1213
236 some_merchant  http://www.some-image-url.com/abscde1213

What is the best way to delete those duplications? (I use PostgreSQL 9.2 and Rails 3.)

3
  • 2
    Is your ID column unique? I see 234 3 times but you say your merchant_id and url are the duplicate values. Commented Jan 23, 2013 at 2:27
  • 1
    Possible duplicate of stackoverflow.com/questions/1746213/… Commented Jan 23, 2013 at 2:51
  • 1
    Sorry for the confusion. The id in the example above should be unique. Thanks for the correct edit. The solution at stackoverflow.com/questions/1746213/… doesn't work for my case. Commented Jan 23, 2013 at 8:26

3 Answers

152

Here is my take on it.

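-- Number the rows within each (merchant_id, url) group, lowest id first.
-- Every row numbered above 1 is a duplicate of the first row in its group.
SELECT *
FROM (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS row_num
  FROM Photos
) dups
WHERE dups.row_num > 1;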

Adjust the ORDER BY to control which row in each group comes first (and is kept) and which rows are flagged as duplicates to delete.
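The query above only lists the duplicates. One way to turn it into a delete, as a sketch that assumes id is unique and that the lowest id in each (merchant_id, url) group is the row to keep, is to feed the flagged ids into a DELETE. The row_num alias is just an illustrative name.

```sql
-- Delete every row that is not the first of its (merchant_id, url) group.
-- Assumption: id is unique, so "id IN (...)" targets exactly the flagged rows.
DELETE FROM Photos
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS row_num
    FROM Photos
  ) dups
  WHERE dups.row_num > 1
);
```

Switching to ORDER BY id DESC would keep the highest id in each group instead.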

SQL Fiddle => http://sqlfiddle.com/#!15/d6941/1/0


SQL Fiddle no longer supports Postgres 9.2, so the fiddle has been updated to Postgres 9.3.


8 Comments

This works like a charm, but how do you delete the duplicates found by this query?
If the same row repeats 3 times, both the 2nd and the 3rd copy appear in the result. How can I resolve that?
That is intended: the 2nd and 3rd copies are the duplicates, and the first copy is never returned. That's precisely why you check for row numbers > 1. See the SQL Fiddle.
As a non-DB guy, I found this explanation really good: postgresqltutorial.com/postgresql-row_number
10

The second part of sgeddes's answer doesn't work on Postgres (the fiddle uses MySQL). Here is an updated version of his answer using Postgres: http://sqlfiddle.com/#!12/6b1a7/1

DELETE FROM Photos AS P1  
USING Photos AS P2
WHERE P1.id > P2.id
   AND P1.merchant_id = P2.merchant_id  
   AND P1.url = P2.url;  
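To check the result before making it permanent, the delete can be run inside a transaction and followed by a duplicate count. This is a sketch under the same assumptions as the statement above (unique id, keep the lowest id); the copies alias is illustrative.

```sql
BEGIN;

-- Same self-join delete as above: remove any row that has a twin with a lower id.
DELETE FROM Photos AS P1
USING Photos AS P2
WHERE P1.id > P2.id
  AND P1.merchant_id = P2.merchant_id
  AND P1.url = P2.url;

-- Should return zero rows if every (merchant_id, url) pair is now unique.
SELECT merchant_id, url, COUNT(*) AS copies
FROM Photos
GROUP BY merchant_id, url
HAVING COUNT(*) > 1;

COMMIT;  -- or ROLLBACK; if the check still returns rows
```

Rows where merchant_id or url is NULL are never matched by the = comparisons in the delete, so such rows would be left in place and could still show up in the check.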


6

I see a couple of options for you.

For a quick way of doing it, use something like this (it assumes your ID column is not unique as you mention 234 multiple times above):

CREATE TABLE tmpPhotos AS SELECT DISTINCT * FROM Photos;
DROP TABLE Photos;
ALTER TABLE tmpPhotos RENAME TO Photos;

Here is the SQL Fiddle.

You will need to add your constraints back to the table if you have any.
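CREATE TABLE ... AS copies only the column data, so constraints, indexes, and column defaults all have to be recreated by hand. A minimal sketch of adding constraints back, assuming id was the primary key and that id really is unique after the DISTINCT copy (the constraint name is hypothetical):

```sql
-- Fails if any id still occurs more than once after the DISTINCT copy.
ALTER TABLE Photos ADD PRIMARY KEY (id);

-- Hypothetical constraint name. Rejects future rows that repeat an
-- existing (merchant_id, url) pair; fails if such duplicates still exist.
ALTER TABLE Photos
  ADD CONSTRAINT photos_merchant_id_url_key UNIQUE (merchant_id, url);
```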

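If your ID column is unique, you could do something like this to keep the lowest id in each group (note that this is MySQL's multi-table DELETE syntax; as the comments below point out, it fails on PostgreSQL):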

DELETE FROM P1  
USING Photos P1, Photos P2
WHERE P1.id > P2.id
   AND P1.merchant_id = P2.merchant_id  
   AND P1.url = P2.url;  

And the Fiddle.

2 Comments

The id is unique in my case; I just got it wrong in my example code. But I get an error if I try to use your second solution: ERROR: relation "p1" does not exist
@StefanSchmidt I fixed it to run on Postgres instead of MySQL: sqlfiddle.com/#!12/6b1a7/1
