Using Postgres 9.5, I have a table properties:

CREATE TABLE properties (
    id serial PRIMARY KEY,
    property_id integer,
    state character(2),
    record_type character(1),
    ...
);
  • id is my unique internal id.
  • property_id comes from a third party. Properties in different states may share the same property_id, but a property_id is unique within a state. This is because the properties table holds all states together instead of one state per table, and the property_id counter starts from 1 for each state.
  • state is the US state abbreviation (e.g. MA, CA, NY). When concatenated with property_id it references one property, e.g. 12345NY.
  • record_type can be A (add), C (change), or D (delete).

When new properties are added to the table, their record_type is A. Over time a property's details change, and new rows are added to the table with C as their record_type.

Example:

id,   property_id, state, record_type, ...
7353, 6001,        'MA',  'A',         ...
7354, 6001,        'MA',  'C',         ...
7355, 6001,        'MA',  'C',         ...

Here's the problem: I want to keep only the most recent row for each property (regardless of record_type) and delete all the older ones. In the example, that means keeping only the last row. There's no date column, but we can assume that the higher the id, the newer the record. As a side note, all rows with record_type D have already been removed, so we're only dealing with add and change record types.
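To see which properties are affected before deleting anything, a query along these lines could help. It is only a sketch that assumes the columns shown in the table definition above; the aliases row_count and newest_id are illustrative names, not part of the schema.

```sql
-- List every (property_id, state) pair that has more than one row,
-- together with the id of the row that would be kept (the highest id).
SELECT property_id,
       state,
       count(*) AS row_count,
       max(id)  AS newest_id
FROM properties
GROUP BY property_id, state
HAVING count(*) > 1
ORDER BY state, property_id;
```

For the example rows, this would report property 6001 in MA with three rows and 7355 as the id to keep.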

  • It's not clear whether record_type (and state) is relevant to the question. Do you just want to keep one record per property_id, with the rest of the fields being irrelevant? Commented Mar 11, 2016 at 22:18
  • @leonbloy The state is relevant because, combined with the property_id, it references one unique property. I guess the record_type could be considered extraneous information if we're only concerned with keeping the latest record irrespective of record_type. Commented Mar 11, 2016 at 22:48

2 Answers

WITH cte AS (
    -- Rank each property's rows from newest (rn = 1) to oldest by id
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY property_id, state
                              ORDER BY id DESC) AS rn
    FROM properties
)
DELETE FROM properties
WHERE id IN (SELECT id FROM cte WHERE rn > 1);
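One way to try this safely is to run it inside a transaction and look at what it would remove. The sketch below is the same technique with a RETURNING clause added; the CTE name ranked is arbitrary. Note that the WITH clause and the DELETE must be sent as a single statement: a CTE is only visible to the statement it is attached to, so running the DELETE on its own is one possible cause of a "relation does not exist" error.

```sql
BEGIN;

WITH ranked AS (
    -- rn = 1 marks the newest row for each (property_id, state) pair
    SELECT id,
           row_number() OVER (PARTITION BY property_id, state
                              ORDER BY id DESC) AS rn
    FROM properties
)
DELETE FROM properties
WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
RETURNING id, property_id, state;  -- shows the rows that were removed

-- Inspect the returned rows, then replace ROLLBACK with COMMIT to keep the change.
ROLLBACK;
```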

3 Comments

This is much more elegant (and probably efficient) than my answer.
What a beautiful solution
@Mihai I'm getting ERROR: relation "cte" does not exist. Any ideas?

If you just want to keep one record per (property_id, state) pair, irrespective of the other fields, this should be enough:

DELETE FROM properties p1
WHERE p1.id <> (
    SELECT max(p2.id)
    FROM properties p2
    WHERE p2.property_id = p1.property_id
      AND p2.state = p1.state
);
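The same idea can also be written as a self-join with PostgreSQL's DELETE ... USING, which removes every row for which a newer row of the same property exists. This is an alternative sketch, not the query from the answer, and the index name is hypothetical; whether the index actually helps depends on the table size and data.

```sql
-- Optional: an index that may speed up the per-property lookup on a large table.
CREATE INDEX properties_property_state_id_idx
    ON properties (property_id, state, id);

-- Delete each row p1 that has a newer row p2 (higher id)
-- with the same property_id and state.
DELETE FROM properties p1
USING properties p2
WHERE p2.property_id = p1.property_id
  AND p2.state = p1.state
  AND p2.id > p1.id;
```

One difference worth knowing: equality comparisons never match NULL, so rows with a NULL property_id or state are left untouched by this query and by the max(id) version, whereas a window function's PARTITION BY groups NULLs together and would deduplicate them.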

3 Comments

Sorry I am on smartphone and just touched the wrong place, already undone it (:
@leonbloy I'm not the downvoter, but I think there is an issue with that query where it can unintentionally delete properties from other states. The true property id is really the property_id and state combined. There may be multiple different properties that share the same property_id.
@Tyler It seems you are right. I think it's fixed now.
