Finding multiple duplicates in Postgres

Question

We've had an accident where multiple rows with duplicate values have been inserted into a table, and I need to find which rows in a rather specific format. So far, I have this query:

    SELECT p2.id
    FROM assignmentobject p1, assignmentobject p2
    WHERE ST_Equals(p1.the_geom, p2.the_geom) AND
    p1.id <> p2.id and p1.assignmentid = 15548
    group by p1.id, p2.id

which compares the geometries of the rows and spits it out if it's the same. The IDs are primary keys, and are sequentially created.

However, this presents a problem, as this small segment of the result shows:

p1.id   p2.id
35311   35314
35311   35315
35314   35311
35314   35315
35315   35311
35315   35314

As can be seen here, 35311, 35314, and 35315 have the same geometries, and because of this, all of the combinations between them are included in the result. What I'm aiming to achieve is to have the lowest or highest ID used as the "base" and to ignore the other combinations that don't involve this "base". I.e., the result shown above would become:

p1.id    p2.id
35311    35314
35311    35315

Here, the combinations between 35314 and 35315 are left out. Is this possible to achieve using pure SQL?

If this is a one-off accident fix, then instead of writing the most perfect, complex query I would use temporary tables, put indexes on these tables, and run cleanup queries on the temp tables until I get the final listing of interest.
– regilero, Commented Sep 12, 2013 at 16:12
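
The temporary-table route suggested in the comment above might look roughly like the sketch below. It is only an illustration under stated assumptions: the table `obj`, its `geom_key` column, and the `dup_pair` table with `base_id` and `dup_id` are hypothetical names, and plain text equality stands in for `ST_Equals` so that the example runs without PostGIS.

```sql
-- Hypothetical stand-in for assignmentobject; geom_key replaces the_geom.
CREATE TEMP TABLE obj (
    id       integer PRIMARY KEY,
    geom_key text    NOT NULL
);
INSERT INTO obj (id, geom_key) VALUES
    (35311, 'A'), (35314, 'A'), (35315, 'A');

-- Step 1: materialise every duplicate pair once (lower id on the left).
CREATE TEMP TABLE dup_pair AS
SELECT p1.id AS base_id, p2.id AS dup_id
FROM obj p1
JOIN obj p2 ON p1.geom_key = p2.geom_key   -- stands in for ST_Equals
           AND p1.id < p2.id;

-- Index the column used by the cleanup step.
CREATE INDEX ON dup_pair (dup_id);

-- Step 2: cleanup. Drop pairs whose "base" is itself a duplicate
-- of a row with a lower id.
DELETE FROM dup_pair d
USING dup_pair lower_pair
WHERE lower_pair.dup_id = d.base_id;

-- Final listing: (35311, 35314) and (35311, 35315).
SELECT base_id, dup_id
FROM dup_pair
ORDER BY base_id, dup_id;
```

On the real table, the join condition would use `ST_Equals(p1.the_geom, p2.the_geom)` together with whatever `assignmentid` filter applies.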

Clodoaldo Neto · Accepted Answer · 2013-09-12 16:25:26Z

Just change the <> operator to <, so that each pair is reported only once, with the lower ID on the left:

WHERE ST_Equals(p1.the_geom, p2.the_geom) AND
p1.id < p2.id and p1.assignmentid = 15548

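If the duplicates are spread over several assignmentid values and you want all of them at once, join the table to itself on assignmentid instead of filtering on a single value: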

select p2.id
from
    assignmentobject p1
    inner join
    assignmentobject p2 using (assignmentid)
where
    st_equals(p1.the_geom, p2.the_geom) and
    p1.id < p2.id
group by p1.id, p2.id
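
To see what the `<` comparison does on the IDs from the question, here is a small self-contained sketch. The inline `obj` rows and the `geom_key` column are hypothetical stand-ins for `assignmentobject` and `the_geom`, with text equality replacing `ST_Equals` so that it runs without PostGIS.

```sql
-- Three rows that share one (stand-in) geometry.
WITH obj (id, geom_key) AS (
    VALUES (35311, 'A'), (35314, 'A'), (35315, 'A')
)
SELECT p1.id AS p1_id, p2.id AS p2_id
FROM obj p1
JOIN obj p2 ON p1.geom_key = p2.geom_key   -- stands in for ST_Equals
WHERE p1.id < p2.id                        -- each unordered pair once
ORDER BY p1.id, p2.id;
-- Returns three rows:
--   35311 | 35314
--   35311 | 35315
--   35314 | 35315
```

The mirrored rows are gone, but the (35314, 35315) pair is still listed, so if only pairs anchored on the lowest ID are wanted, an additional filter on the left-hand side appears to be needed.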

edited Sep 12, 2013 at 16:25

answered Sep 12, 2013 at 16:18


joop · Accepted Answer · 2013-09-12 16:44:12Z

CREATE TABLE pair (
        ll INTEGER NOT NULL
        , rr INTEGER NOT NULL
        , PRIMARY KEY (ll , rr)
        ) ;

INSERT INTO pair (ll,rr) VALUES
(35311,35314) ,(35311,35315)
,(35314,35311) ,(35314,35315)
,(35315,35311) ,(35315,35314)
        ;

SELECT p1.ll AS p1, p1.rr AS p2
FROM pair p1
WHERE p1.ll < p1.rr -- tie breaker
AND NOT EXISTS (
        SELECT * FROM pair nx
        WHERE nx.ll < nx.rr
        AND nx.rr = p1.ll
        )
        ;

The same with the original geo query packed into a CTE:

WITH pair AS (
  SELECT p1.id AS ll
       , p2.id AS rr
  FROM assignmentobject p1
  JOIN assignmentobject p2 ON ST_Equals(p1.the_geom, p2.the_geom)
                          -- not sure if you want this ...
                          AND p1.assignmentid = p2.assignmentid 
  WHERE p1.id <> p2.id and p1.assignmentid = 15548
  -- group by seems to make no sense here
  -- group by p1.id, p2.id
   )                                                      
SELECT pp.ll AS p1, pp.rr AS p2
FROM pair pp
WHERE pp.ll < pp.rr -- tie breaker
AND NOT EXISTS (
        SELECT * FROM pair nx
        WHERE nx.ll < nx.rr
        AND nx.rr = pp.ll
        )
        ;
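
The same idea can probably be expressed without the intermediate pair set, by requiring directly that the left-hand row has no lower-ID duplicate. The following is a minimal sketch under assumptions: `obj` and `geom_key` are hypothetical stand-ins for `assignmentobject` and `the_geom`, and text equality replaces `ST_Equals` so that it runs without PostGIS.

```sql
-- Two groups of duplicates ('A' and 'B') plus one unique row ('C').
WITH obj (id, geom_key) AS (
    VALUES (35311, 'A'), (35314, 'A'), (35315, 'A'),
           (40001, 'B'), (40002, 'B'),
           (40003, 'C')
)
SELECT p1.id AS p1, p2.id AS p2
FROM obj p1
JOIN obj p2 ON p1.geom_key = p2.geom_key   -- stands in for ST_Equals
           AND p1.id < p2.id               -- tie breaker
WHERE NOT EXISTS (
        -- p1 only counts as the "base" if nothing with a lower id matches it
        SELECT 1
        FROM obj smaller
        WHERE smaller.geom_key = p1.geom_key
          AND smaller.id < p1.id
      )
ORDER BY p1.id, p2.id;
-- Returns:
--   35311 | 35314
--   35311 | 35315
--   40001 | 40002
```

Flipping both `<` comparisons to `>` would use the highest ID as the base instead.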
