POSTGRES - How do i optimize this join query

Question

I have one table as below which stores employees details

empid	attribute	value	start_date	end_date
E1	Active	FALSE	2020-01-01	2022-05-05
E1	Active	TRUE	2022-06-06	9999-12-31
E1	Branch	NYC	2022-01-01	9999-12-31
E2	Active	TRUE	2020-01-01	9999-12-31

Then another table which have address details

empid	city
E1	CON

What I want is

List of all the employee ids which do not have entries in address table

AND

All those employees whose current value(today's value) of attribute ACTIVE is TRUE.

I have written below query but its taking too much time to execute (3-4mins). Is there any way i can optimize this query.

select distinct(emp.empid) from schema1.employees emp, schema2.address add 
where emp.empid = add.empid 
  and (emp.attribute ='ACTIVE' and emp.val in ('TRUE') 
  and CURRENT_DATE BETWEEN emp.start_date and emp.end_date)
  and emp.emp_id not in (select empid from schema2.address add2)
limit 20

#1 select * from employees e left join address a on a.empid = e.empid where a.empid is null. — The Impaler
– The Impaler, Commented Dec 27, 2022 at 14:36
@TheImpaler I want a single query which satisfies both the condition. i cannot have two queries — Johny
– Johny, Commented Dec 27, 2022 at 14:37
Is DISTINCT really necessary? I would guess there's going to be only one ACTIVE row for an employee in any specific date. — The Impaler
– The Impaler, Commented Dec 27, 2022 at 14:47
Unrelated to your problem, but: distinct is not a function. It always applies to all columns in the select list. Enclosing one of the columns with parentheses won't change anything and is useless. distinct (a),b is the same as distinct a,(b) or distinct a,b — user330315
– user330315, Commented Dec 27, 2022 at 15:05

The Impaler · Accepted Answer · 2022-12-27 19:01:43Z

1

I made only minor modifications to your query that could help performance.

select emp.empid
from schema1.employees emp
left join schema2.address add on emp.empid = add.empid 
where emp.attribute ='ACTIVE' 
  and emp.val = 'TRUE'
  and emp.start_date <= CURRENT_DATE 
  and emp.end_date >= CURRENT_DATE 
  and add.empid is null;

The following indexes that can be useful as well. If you don't already have them, add them and try again.

create index ix1 on schema1.employees (attribute, val, start_date, end_date);

create index ix2 on schema2.address (empid);

edited Dec 27, 2022 at 19:01

answered Dec 27, 2022 at 14:53

The Impaler

49.3k10 gold badges50 silver badges90 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

The Impaler Over a year ago

@ErwinBrandstetter I don't know where vpd came from (probably some gremlin). Now, the predicate add.empid is null is there to implement an anti-join; that is, all rows in the left table that do not have a match in the right table.

Johny Over a year ago

This did improved performance. Thanks

Erwin Brandstetter · Accepted Answer · 2022-12-27 18:45:43Z

From the description it's unclear how your two conditions should be combined.

To get all employees that pass either of your two conditions:

SELECT e.empid
FROM   schema1.employees e
WHERE  e.attribute = 'ACTIVE'
AND    e.val  -- = true
AND    CURRENT_DATE BETWEEN e.start_date AND e.end_date
OR     NOT EXISTS (SELECT FROM schema2.address a WHERE a.empid = e.empid);

To get all employees that pass both of your two conditions:

...
AND    NOT EXISTS (SELECT FROM schema2.address a WHERE a.empid = e.empid);

Assuming you only added DISTINCT (incorrectly with parentheses) to deal with duplicates introduced by the join. Not duplicating rows with an EXISTS expression in the first place, we also don't need DISTINCT.

Collectives™ on Stack Overflow

POSTGRES - How do i optimize this join query

2 Answers 2

2 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Related