
I have a table "votes" with the following columns: voter, election_year, election_type, party. I need to remove all duplicate rows for each combination of voter and election_year, and I'm having trouble figuring out how to do this.

I ran the following:

WITH CTE AS(
SELECT voter, 
       election_year,
       ROW_NUMBER()OVER(PARTITION BY voter, election_year ORDER BY voter) as RN

FROM votes
)
DELETE
FROM CTE where RN>1

I based this on another StackOverflow answer, but it seems to be specific to SQL Server. I've seen ways to do this using unique IDs, but this particular table doesn't have that luxury. How can I adapt the above script to remove the duplicates I need? Thanks!

EDIT: As requested, here is the table definition with some example data:

CREATE TABLE public.votes
(
    voter varchar(10),
    election_year smallint,
    election_type varchar(2),
    party varchar(3)
);

INSERT INTO votes
    (voter, election_year, election_type, party)
VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED')
;
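Before deleting anything, it can help to confirm which (voter, election_year) combinations are actually duplicated. A minimal sketch, assuming PostgreSQL and the sample rows above; the scratch setup (not part of the question) loads them into a temporary table so that public.votes is left untouched:

```sql
-- Scratch setup (hypothetical): a TEMP copy of the sample data, so the real
-- public.votes table is never modified; the DROP only targets pg_temp.
DROP TABLE IF EXISTS pg_temp.votes;
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);
INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

-- List every (voter, election_year) combination that occurs more than once.
SELECT voter, election_year, count(*) AS copies
FROM votes
GROUP BY voter, election_year
HAVING count(*) > 1
ORDER BY voter, election_year;
-- Expected for the sample data:
--   voter      | election_year | copies
--   2435871347 |          2016 |      2
--   2435871347 |          2018 |      3
```

A group with n copies should lose n - 1 rows, so 3 of the 6 sample rows are expected to be removed.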
  • "based on another StackOverflow answer, but it seems this is specific to SQL Server." This query looks like perfectly valid PostgreSQL syntax to me; PostgreSQL supports WITH ... AS (common table expressions) and ROW_NUMBER() OVER (...) just fine. "How can I adopt the above script to remove the duplicates I need? Thanks!" That is very hard to say without the table structure and example data; see the section "Help others reproduce the problem" at stackoverflow.com/help/how-to-ask. Commented Aug 19, 2018 at 1:42
  • I apologize, the error given is "[42P01] ERROR: relation "cte" does not exist Position: 157" Commented Aug 19, 2018 at 1:52
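For context on that error: the WITH part of the query is valid PostgreSQL, but the target of a DELETE must be an actual table. A CTE is only a named query result, so PostgreSQL looks for a table called cte and reports that the relation does not exist. Running the same CTE as a plain SELECT is a harmless way to preview which rows have RN > 1. A sketch, assuming PostgreSQL and a temporary copy of the sample data (the scratch setup is not part of the question):

```sql
-- Scratch setup (hypothetical): TEMP copy of the sample data; public.votes is untouched.
DROP TABLE IF EXISTS pg_temp.votes;
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);
INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

-- Same CTE as in the question, but read from instead of deleted from.
WITH cte AS (
    SELECT voter,
           election_year,
           election_type,
           ROW_NUMBER() OVER (PARTITION BY voter, election_year ORDER BY voter) AS rn
    FROM votes
)
SELECT voter, election_year, election_type, rn
FROM cte
WHERE rn > 1          -- the rows a working DELETE should remove
ORDER BY voter, election_year;
```

For the sample data this returns three rows and deletes nothing.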

3 Answers


Here's an option, using a self-join on the ctid system column:

DELETE FROM votes T1
    USING   votes T2
WHERE   T1.ctid < T2.ctid 
    AND T1.voter = T2.voter 
    AND T1.election_year  = T2.election_year;

See http://sqlfiddle.com/#!15/4d45d/5
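To check what the self-join does on the sample data: for each pair of rows that share voter and election_year, the row with the smaller ctid is deleted. Only the row with the largest ctid in each group has no "later" partner, so exactly one row per group survives. A sketch, assuming PostgreSQL and a temporary copy of the sample data (hypothetical scratch setup):

```sql
-- Scratch setup (hypothetical): TEMP copy of the sample data; public.votes is untouched.
DROP TABLE IF EXISTS pg_temp.votes;
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);
INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

-- The answer's statement: delete every row that has a partner with the
-- same voter and election_year and a larger ctid.
DELETE FROM votes t1
    USING votes t2
WHERE t1.ctid < t2.ctid
  AND t1.voter = t2.voter
  AND t1.election_year = t2.election_year;
-- Expected for the sample data: 3 rows deleted, 3 rows left.

-- Verify: each (voter, election_year) combination should now appear once.
SELECT voter, election_year, count(*) AS copies
FROM votes
GROUP BY voter, election_year
ORDER BY voter, election_year;
```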


Deleting from or updating CTEs doesn't work in Postgres; see the accepted answer to "PostgreSQL with-delete “relation does not exists”".

Since you have no primary key you may (ab)use the ctid pseudo column to identify the rows to delete.

WITH
cte
AS
(
SELECT ctid,
       row_number() OVER (PARTITION BY voter,
                                       election_year
                          ORDER BY voter) rn
       FROM votes
)
DELETE FROM votes
       USING cte
       WHERE cte.rn > 1
             AND cte.ctid = votes.ctid;

db<>fiddle
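Note that ORDER BY voter in the CTE does not choose which row is kept: voter has the same value in every row of a partition, so which row gets rn = 1 is arbitrary. If it matters, order by a column that differs within the partition. The sketch below assumes, purely as a hypothetical rule, that the general-election row ('GE') should be kept, and appends PostgreSQL's RETURNING clause so the statement lists the rows it removed:

```sql
-- Scratch setup (hypothetical): TEMP copy of the sample data; public.votes is untouched.
DROP TABLE IF EXISTS pg_temp.votes;
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);
INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

WITH cte AS (
    SELECT ctid,
           row_number() OVER (PARTITION BY voter, election_year
                              -- Hypothetical preference: rank the 'GE' row first.
                              -- true sorts after false, hence DESC; NULLS LAST keeps
                              -- rows with a NULL election_type from ranking first.
                              ORDER BY (election_type = 'GE') DESC NULLS LAST,
                                       election_type) AS rn
    FROM votes
)
DELETE FROM votes
USING cte
WHERE cte.rn > 1
  AND cte.ctid = votes.ctid
RETURNING votes.voter, votes.election_year, votes.election_type;
-- Expected for the sample data (order not guaranteed): the 2018 'PO' and 'RU'
-- rows and the 2016 'PO' row of voter 2435871347.
```

Without such a preference, any column (or list of tie-breaking columns) that varies within a group can go into that ORDER BY.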

You should probably also consider introducing a primary key.
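Following that suggestion, here is a sketch of what it could look like once the duplicates are gone, assuming PostgreSQL 10 or later (for identity columns). The column name id and the constraint name votes_voter_election_year_key are hypothetical. The UNIQUE constraint is an optional extra that stops new (voter, election_year) duplicates from being inserted; it can only be added after the cleanup.

```sql
-- Scratch setup (hypothetical): TEMP table holding the sample data after
-- deduplication, i.e. one row per (voter, election_year).
DROP TABLE IF EXISTS pg_temp.votes;
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);
INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

-- Surrogate primary key; existing rows are numbered from the identity sequence.
ALTER TABLE votes
    ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;

-- Optional: reject any future duplicate (voter, election_year) pair.
ALTER TABLE votes
    ADD CONSTRAINT votes_voter_election_year_key UNIQUE (voter, election_year);

-- This insert would now fail with
--   ERROR:  duplicate key value violates unique constraint "votes_voter_election_year_key"
-- INSERT INTO votes (voter, election_year, election_type, party)
-- VALUES ('2435871347', 2018, 'PO', 'EV');
```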

4 Comments

I tried this, but I'm getting [42703] ERROR: column cte.ctid does not exist
@JGrindal: Are you sure you copied the statement completely? Or did you just edit yours? If so, did you also add ctid to the SELECT in the CTE?
Yup, forgot ctid in my CTE. Thanks!
+1 for db<>fiddle :D

ctid is a system column that exists in every PostgreSQL table. It denotes the physical location of the row version (tuple) within the table, so it is unique for each row in that table. You almost had it right; you just need ctid, since you have no unique ID for each row:

WITH CTE AS (
SELECT ctid, voter,
       election_year,
       ROW_NUMBER() OVER (PARTITION BY voter, election_year ORDER BY voter) AS RN
FROM votes
)
DELETE FROM votes v WHERE v.ctid IN (SELECT CTE.ctid FROM CTE WHERE CTE.RN > 1);

http://sqlfiddle.com/#!17/4d45d/14
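One caveat about ctid: it identifies a row version's physical position, not the logical row, so an UPDATE (which writes a new row version) or a table rewrite such as VACUUM FULL can give the same row a different ctid. It is safe to rely on within a single statement like the DELETE above, but not as a key to store and reuse later. A small demonstration, assuming PostgreSQL; the temporary tables votes and ctid_before are hypothetical scratch objects:

```sql
-- Scratch setup (hypothetical): TEMP copy of the sample data; public.votes is untouched.
DROP TABLE IF EXISTS pg_temp.votes;
DROP TABLE IF EXISTS pg_temp.ctid_before;
CREATE TEMP TABLE votes (
    voter         varchar(10),
    election_year smallint,
    election_type varchar(2),
    party         varchar(3)
);
INSERT INTO votes (voter, election_year, election_type, party) VALUES
    ('2435871347', 2018, 'PO', 'EV'),
    ('2435871347', 2018, 'RU', 'EV'),
    ('2435871347', 2018, 'GE', 'EV'),
    ('2435871347', 2016, 'PO', 'EV'),
    ('2435871347', 2016, 'GE', 'EV'),
    ('10215121/8', 2016, 'GE', 'ED');

-- Remember the current ctid of one row.
CREATE TEMP TABLE ctid_before AS
SELECT ctid AS old_ctid FROM votes WHERE voter = '10215121/8';

-- Even an UPDATE that changes no values writes a new row version.
UPDATE votes SET party = party WHERE voter = '10215121/8';

-- The same logical row now has a different ctid.
SELECT v.ctid AS new_ctid, b.old_ctid
FROM votes v, ctid_before b
WHERE v.voter = '10215121/8';
```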
