Given a table

CREATE TABLE data(
 irs_number VARCHAR (50),
 mop_up INTEGER,
 ou VARCHAR (50)
);

How would I return all matching records that...

  • have at least one other row with the identical irs_number AND
  • at least one of the rows sharing that irs_number has mop_up set to 1 AND
  • the ou values of the rows sharing that irs_number are not all identical, i.e. irs_numbers whose rows all have the same ou should not be returned.

... so that all rows of each qualifying irs_number are returned (not only the individual rows for which the conditions are true - see example below).

I tried this, but the query does not finish within a reasonable time:

SELECT irs_number, mop_up, ou
FROM data outer_data
WHERE (SELECT count(*)
FROM data inner_data
WHERE inner_data.irs_number = outer_data.irs_number
AND inner_data.mop_up = 1 OR outer_data.mop_up = 1
AND inner_data.ou <> outer_data.ou
);

I also tried variations of the duplicate counts described here: How to find duplicate records in PostgreSQL - but they always just return the duplicates, without the proper filter applied.


edit:

Example data:

INSERT INTO data VALUES 
('0001', 1, 'abc'),
('0001', 0, 'abc'),
('0001', 0, 'cde'),
('0001', 0, 'abc'),
('0002', 1, 'abc'),
('0002', 0, 'abc'),
('0003', 0, 'abc'),
('0003', 0, 'xyz')
;

SQLFiddle: http://sqlfiddle.com/#!17/be28f

A query should ideally return:

irs_number  mop_up  ou
-----------------------
0001        1       abc
0001        0       abc
0001        0       cde
0001        0       abc

(order not important), meaning it should return all rows whose irs_number satisfies the conditions above.

2 Answers

You should be able to do this with a simple EXISTS clause:

SELECT irs_number, mop_up, ou
FROM data d
WHERE EXISTS (SELECT 1
              FROM data d2
              WHERE d2.irs_number = d.irs_number AND
                    d2.mop_up = 1 AND
                    d2.ou <> d.ou
             );

EDIT:

The above misinterpreted the question. It assumed that the row with mop_up = 1 needed to be on a different ou. As I read the question, this is ambiguous but doesn't appear to be what you want. So, two EXISTS clauses address this:

SELECT irs_number, mop_up, ou
FROM data d
WHERE EXISTS (SELECT 1
              FROM data d2
              WHERE d2.irs_number = d.irs_number AND
                    d2.mop_up = 1
             ) AND
     EXISTS (SELECT 1
              FROM data d2
              WHERE d2.irs_number = d.irs_number AND
                    d2.ou <> d.ou
             );

Here is a db<>fiddle.

Both these solutions will be able to take advantage of an index on (irs_number, mop_up, ou).
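
A minimal sketch of such an index, assuming PostgreSQL and the data table from the question. The index name data_irs_mop_ou_idx is made up for illustration, and whether the planner actually picks the index depends on table size and statistics, so it is worth checking the plan with EXPLAIN:

```sql
-- Hypothetical index name; the column order follows the suggestion above.
CREATE INDEX data_irs_mop_ou_idx ON data (irs_number, mop_up, ou);

-- Refresh planner statistics so the new index is costed sensibly.
ANALYZE data;

-- Show the chosen plan for the two-EXISTS query (the query is not executed).
EXPLAIN
SELECT irs_number, mop_up, ou
FROM data d
WHERE EXISTS (SELECT 1
              FROM data d2
              WHERE d2.irs_number = d.irs_number AND
                    d2.mop_up = 1
             ) AND
      EXISTS (SELECT 1
              FROM data d2
              WHERE d2.irs_number = d.irs_number AND
                    d2.ou <> d.ou
             );
```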


6 Comments

This has nothing to do with the requirements of the question.
this would return only those with mop_up set to 1, but not all rows of an irs_number where at least one mop_up was set to 1
@dh762 . . . Not at all. This would return all irs_numbers that have a corresponding row with mop_up = 1 -- which is what you are asking for. I have no idea why you are confusing the where clause in the correlated subquery with what the outer query returns.
correct, but this would only serve as a subquery because the final query should return all records with that subqueried irs_number - see example - your query returns only 1 row with 0001 instead of all 4 with 0001. It might not be obvious from the pre-edit question (will update)
@dh762 . . . I see. I interpreted the second and third bullet points differently from what you intended. I've adjusted the answer.

I think this query will do:

SELECT * FROM data 
WHERE irs_number in (
  SELECT irs_number
  FROM data d
  WHERE EXISTS (SELECT 1
    FROM data 
    WHERE irs_number = d.irs_number
    AND (mop_up = 1 OR d.mop_up = 1)
    AND ou <> d.ou
  )
)

See the demo
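
As a cross-check, the same per-irs_number condition can probably also be written with grouping. This is only a sketch and not the query from this answer: it assumes PostgreSQL (bool_or is a PostgreSQL aggregate) and that mop_up and ou contain no NULLs, since NULLs are handled differently here than in the EXISTS version:

```sql
-- Pick the irs_numbers that qualify as a group, then return all their rows.
SELECT d.irs_number, d.mop_up, d.ou
FROM data d
JOIN (
  SELECT irs_number
  FROM data
  GROUP BY irs_number
  HAVING bool_or(mop_up = 1)        -- at least one row has mop_up = 1
     AND count(DISTINCT ou) > 1     -- the rows do not all share one ou
) g ON g.irs_number = d.irs_number;
```

On the sample data from the question this should return the four 0001 rows: 0002 is dropped because both of its rows have the same ou, and 0003 because none of its rows has mop_up = 1.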

7 Comments

it does return also rows that have just a single irs_number @forpas
This joins rows with the same irs_number and different ou, meaning different rows with the same irs_number. Can you post sample data and expected results so to be clear?
correction to the comment: it does return one row only if there are duplicate irs_numbers within the same ou (at least one has mop_up = true). happy to add sample data
Why should 0001 0 abc be returned? There is not a row with the same irs_number and different ou with mop_up = 1 to any of the 2 rows.