I have a very slow query (30+ minutes) that I think can be sped up with more efficient code. Below are the code and the resulting query plan. I am looking for ways to speed up this query, which performs several joins on large tables.

drop table if exists totalshad;
create temporary table totalshad as
select pricedate, hour, sum(cast(price as numeric)) as totalprice from 
pjm.rtcons
where
rtcons.pricedate >= '2017-12-01'
--  and
--  rtcons.pricedate <= '2018-01-23'
group by pricedate, hour
order by pricedate, hour;
-----------------------------
drop table if exists percshad;
create temporary table percshad as
select totalshad.pricedate, totalshad.hour, facility,
round(sum(cast(price as numeric)),2) as cons_shad,
round(sum(cast(totalprice as numeric)),2) as total_shad,
round(cast(price/totalprice as numeric),4) as per_shad
from totalshad
join pjm.rtcons on
rtcons.pricedate = totalshad.pricedate
and
rtcons.hour = totalshad.hour
and
facility = 'ETOWANDA-NMESHOPP ETL 1057  A  115 KV'
where totalprice <> 0 and totalshad.pricedate > '2017-12-01'
group by totalshad.pricedate, totalshad.hour, facility, (price/totalprice)
order by per_shad desc
limit 5;

EXPLAIN select facility, percshad.pricedate, percshad.hour, per_shad,
minmcc.rtmcc, minnode.nodename, maxmcc.rtmcc, maxnode.nodename
from percshad
join pjm.prices minmcc on
minmcc.pricedate = percshad.pricedate
and
minmcc.hour = percshad.hour
and
minmcc.rtmcc = (select min(rtmcc) from pjm.prices
where pricedate = percshad.pricedate and hour = percshad.hour)
join pjm.nodes minnode on
minnode.node_id = minmcc.node_id
join pjm.prices maxmcc on
maxmcc.pricedate = percshad.pricedate
and
maxmcc.hour = percshad.hour
and
maxmcc.rtmcc = (select max(rtmcc) from pjm.prices
where pricedate = percshad.pricedate and hour = percshad.hour)
join pjm.nodes maxnode on
maxnode.node_id = maxmcc.node_id
order by per_shad desc
limit 5

And here is the EXPLAIN output:

[image: EXPLAIN output]

UPDATE: I have now simplified my code to the following. But as the EXPLAIN shows, it still takes forever to find the node_id in the last select statement.

drop table if exists totalshad;
create temporary table totalshad as
select pricedate, hour, sum(cast(price as numeric)) as totalprice from 
pjm.rtcons
where
rtcons.pricedate >= '2017-12-01'
--  and
--  rtcons.pricedate <= '2018-01-23'
group by pricedate, hour
order by pricedate, hour;
-----------------------------
drop table if exists percshad;
create temporary table percshad as
select totalshad.pricedate, totalshad.hour, facility,
round(sum(cast(price as numeric)),2) as cons_shad,
round(sum(cast(totalprice as numeric)),2) as total_shad,
round(cast(price/totalprice as numeric),4) as per_shad
from totalshad
join pjm.rtcons on
rtcons.pricedate = totalshad.pricedate
and
rtcons.hour = totalshad.hour
and
facility = 'ETOWANDA-NMESHOPP ETL 1057  A  115 KV'
where totalprice <> 0 and totalshad.pricedate > '2017-12-01'
group by totalshad.pricedate, totalshad.hour, facility, (price/totalprice)
order by per_shad desc
limit 5;

drop table if exists mincong;
create temporary table mincong as
select pricedate, hour, min(rtmcc) as rtmcc
from pjm.prices JOIN percshad USING (pricedate, hour)
group by pricedate, hour;

EXPLAIN select distinct on (pricedate, hour) prices.node_id from mincong 
JOIN pjm.prices USING (pricedate, hour, rtmcc)
group by pricedate, hour, node_id

[image: second EXPLAIN output]

  • The first thing that I would try is putting indexes on percshad. (Yes, you can do that with temporary tables.) Commented Feb 19, 2018 at 23:04
  • You might not be gaining speed by using temp tables. Commented Feb 20, 2018 at 6:11
  • Tell us which of the columns used in your query are indexed. Commented Feb 20, 2018 at 7:18
  • For future questions: please do not show us the plan as an image (which doesn't even show the full plan). Please use formatted text, not screenshots. The output of explain (analyze, buffers) is usually much more helpful. You can also upload the plan to explain.depesz.com Commented Feb 20, 2018 at 9:11
  • Using PgAdmin4, when I hit F7 for EXPLAIN I just get: syntax error at or near "drop". Additionally, I cannot copy and paste the explain output. Hence my need to use the screenshot. If there's another way, I'm all ears. Commented Feb 20, 2018 at 21:22
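
The indexing and plan-format suggestions in these comments can be sketched together. This is a minimal sketch, assuming percshad has already been created as in the question; the aggregate under EXPLAIN is only a stand-in for whichever single statement is being diagnosed. EXPLAIN applies to one statement at a time, so it goes in front of the final SELECT rather than in front of the whole script, and EXPLAIN (ANALYZE, ...) actually executes that statement.

```sql
-- Index the temporary table on the join columns, then refresh planner statistics.
-- Autovacuum does not process temporary tables, so ANALYZE has to be run by hand.
CREATE INDEX ON percshad (pricedate, hour);
ANALYZE percshad;

-- Text-format plan with actual timings and buffer usage. The output is plain
-- text, so it can be pasted into a question or uploaded to explain.depesz.com.
-- The SELECT below is a stand-in (assumption) for the statement under study.
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.pricedate, p.hour, min(pr.rtmcc) AS rtmcc
FROM percshad AS p
JOIN pjm.prices AS pr USING (pricedate, hour)
GROUP BY p.pricedate, p.hour;
```

Since percshad holds at most five rows here, the index may matter less than the ANALYZE, which lets the planner see how small the table really is.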

1 Answer

The problem is the subselects in the join condition; they have to be executed for every row joined.

If you cannot get rid of them, try to create an index that will support the subselects as well as possible:

CREATE INDEX ON pjm.prices(pricedate, hour, rtmcc);

4 Comments

Would it help to join with the subselect instead of using it in the join condition?
Joining with a subselect is usually much better than having one in a join condition.
@LaurenzAlbe I tried incorporating your suggestions. It still runs slow. See my updated question. Also, I cannot INDEX the pjm.prices as I am not the owner.
Well, you can get the DBA to create an index. But for a hash join you don't need it. You should ANALYZE the temporary tables before using them. Can you provide EXPLAIN (ANALYZE, BUFFERS) output?
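
A sketch of the "join with a subselect" idea from these comments, assuming the tables and columns shown in the question. The aliases s, ps, pr and ext and the output names min_rtmcc, max_rtmcc, min_node and max_node are introduced here for illustration only. The derived table computes both extremes once per (pricedate, hour) instead of once per joined row. It has not been run against the asker's data, so treat it as a starting point rather than a verified fix.

```sql
-- Give the planner row estimates for the temporary table first.
ANALYZE percshad;

-- EXPLAIN (ANALYZE, BUFFERS) executes the query and reports actual timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.facility, s.pricedate, s.hour, s.per_shad,
       minmcc.rtmcc AS min_rtmcc, minnode.nodename AS min_node,
       maxmcc.rtmcc AS max_rtmcc, maxnode.nodename AS max_node
FROM percshad AS s
JOIN (
    -- Min and max rtmcc per (pricedate, hour), restricted to the hours
    -- that appear in percshad. Duplicate hours in percshad do not change
    -- the result of min() or max().
    SELECT pr.pricedate, pr.hour,
           min(pr.rtmcc) AS min_rtmcc,
           max(pr.rtmcc) AS max_rtmcc
    FROM pjm.prices AS pr
    JOIN percshad AS ps
      ON ps.pricedate = pr.pricedate AND ps.hour = pr.hour
    GROUP BY pr.pricedate, pr.hour
) AS ext
  ON ext.pricedate = s.pricedate AND ext.hour = s.hour
-- Look up the price rows (and node names) that carry the extreme values.
JOIN pjm.prices AS minmcc
  ON minmcc.pricedate = s.pricedate AND minmcc.hour = s.hour
 AND minmcc.rtmcc = ext.min_rtmcc
JOIN pjm.nodes AS minnode ON minnode.node_id = minmcc.node_id
JOIN pjm.prices AS maxmcc
  ON maxmcc.pricedate = s.pricedate AND maxmcc.hour = s.hour
 AND maxmcc.rtmcc = ext.max_rtmcc
JOIN pjm.nodes AS maxnode ON maxnode.node_id = maxmcc.node_id
ORDER BY s.per_shad DESC
LIMIT 5;
```

As in the original query, ties on the minimum or maximum rtmcc within an hour produce more than one output row for that hour.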
