
So I have a query, and when I run it on the worst-case scenario it takes 10-12 minutes. If I remove the time check from the WHERE clause it goes down to 20-30 seconds, so I was wondering how to optimize it.

I tried adding an index on the conversion of the timestamp to time, but it didn't really help. The rs table (register_status) has over 70 million rows, register_date around 280k, and the cp one less than 1k.
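
The expression index I tried looked roughly like this (the index name is just illustrative):

-- expression index on the time-of-day part of the timestamp (illustrative name)
CREATE INDEX idx_register_date_time ON register_date ((date::time));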

The idea of the query is to get the counts for each CP (charge point), grouped by status, over a range of dates and within a time-of-day range. The case below is the worst-case scenario: the date range starts at the first date in the database and the user selects the whole day as the time range. The query is the following:

explain analyze SELECT
    COUNT(rs.status) filter (where rs.status = 'Occ') as total_occ,
    COUNT(rs.status) filter (where rs.status = 'Part') as total_part,
    COUNT(rs.status) filter (where rs.status = 'OOS') as total_oos,
    COUNT(rs.status) filter (where rs.status = 'OOC') as total_ooc,
    cp.id as charge_point_id,
    cp.address,
    cp.type as charge_point_type,
    cp.latitude,
    cp.longitude 
FROM register_date rd 
    inner join register_status rs on rs.fk_register_date = rd.id
    inner join charge_point cp on cp.id = rs.fk_charge_point
WHERE 
 rd.date::date >= '2016-11-01' and rd.date::date <= '2019-08-01'
 AND
 rd.date::time >= time '00:00' AND rd.date::time <= time '23:59'
group by cp.id

And the EXPLAIN ANALYZE result is the following; I can see a lot of disk usage in the sort...

"Finalize GroupAggregate  (cost=34412.78..34536.10 rows=780 width=124) (actual time=689440.380..699740.172 rows=813 loops=1)"
"  Group Key: cp.id"
"  ->  Gather Merge  (cost=34412.78..34519.27 rows=722 width=124) (actual time=689421.445..699736.996 rows=1579 loops=1)"
"        Workers Planned: 1"
"        Workers Launched: 1"
"        ->  Partial GroupAggregate  (cost=33412.77..33438.04 rows=722 width=124) (actual time=649515.576..659674.461 rows=790 loops=2)"
"              Group Key: cp.id"
"              ->  Sort  (cost=33412.77..33414.57 rows=722 width=96) (actual time=649496.720..654001.697 rows=24509314 loops=2)"
"                    Sort Key: cp.id"
"                    Sort Method: external merge  Disk: 2649104kB"
"                    Worker 0:  Sort Method: external merge  Disk: 2652840kB"
"                    ->  Nested Loop  (cost=0.56..33378.49 rows=722 width=96) (actual time=1.343..504948.423 rows=24509314 loops=2)"
"                          ->  Parallel Seq Scan on register_date rd  (cost=0.00..6443.69 rows=4 width=4) (actual time=0.021..294.724 rows=139760 loops=2)"
"                                Filter: (((date)::date >= '2016-11-01'::date) AND ((date)::date <= '2019-08-01'::date) AND ((date)::time without time zone >= '00:00:00'::time without time zone) AND ((date)::time without time zone <= '23:59:00'::time without time zone))"
"                          ->  Nested Loop  (cost=0.56..6725.90 rows=780 width=100) (actual time=0.077..3.574 rows=175 loops=279519)"
"                                ->  Seq Scan on charge_point cp  (cost=0.00..21.80 rows=780 width=92) (actual time=0.002..0.077 rows=813 loops=279519)"
"                                ->  Index Only Scan using register_status_fk_charge_point_fk_register_date_status_key on register_status rs  (cost=0.56..8.58 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=227248947)"
"                                      Index Cond: ((fk_charge_point = cp.id) AND (fk_register_date = rd.id))"
"                                      Heap Fetches: 49018627"
"Planning Time: 0.506 ms"
"Execution Time: 700065.010 ms"
1 Comment
  • So you have a timestamp column called date. Commented Dec 4, 2019 at 22:40

2 Answers


This might be faster with a lateral join:

SELECT cp.*, rd.*
FROM charge_point cp CROSS JOIN LATERAL
     (SELECT COUNT(*) filter (where rs.status = 'Occ') as total_occ,
             COUNT(*) filter (where rs.status = 'Part') as total_part,
             COUNT(*) filter (where rs.status = 'OOS') as total_oos,
             COUNT(*) filter (where rs.status = 'OOC') as total_ooc
      FROM register_date rd JOIN
           register_status rs 
           ON rs.fk_register_date = rd.id
      WHERE cp.id = rs.fk_charge_point AND
            rd.date >= '2016-11-01' and
             rd.date < date '2019-08-01' + interval '1 day'
     ) rd;

Indexes on register_status(fk_charge_point, fk_register_date, status) and register_date(id, date) are recommended.
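
In CREATE INDEX form, that is roughly the following (index names are illustrative; per the execution plan above, register_status already has a suitable composite key on those columns):

CREATE INDEX idx_rs_cp_rd_status ON register_status (fk_charge_point, fk_register_date, status);
CREATE INDEX idx_rd_id_date ON register_date (id, date);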

Note that I changed the date comparisons so they are more index-friendly. I see no reason to filter by time, so I removed those conditions.
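
For comparison, here is the cast-based predicate from the question versus the rewritten one (only the date part shown):

-- original form: the cast prevents a plain index on register_date(date) from being used
WHERE rd.date::date >= '2016-11-01' AND rd.date::date <= '2019-08-01'

-- rewritten form: compares the raw column, so a btree index on (date) can be used
WHERE rd.date >= date '2016-11-01' AND rd.date < date '2019-08-01' + interval '1 day'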


3 Comments

Note: ) cp; [plus: I don't think you need the LATERAL here; just not splitting the timestamps would suffice, IMO]
I do need the time filter, because the user might want only the results where the time of day is between 5am and 6pm, for example. As I said, I'm running the query on the worst-case scenario, where the user selects from 00:00 to 23:59, which wouldn't make much sense but is still possible.
@user3107720 . . . Then add the condition back in. The lateral join should still use the index for the charge point and date.
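
For example, keeping the structure of the answer above, the time condition (using the 5am-6pm range from the comment) could go back into the lateral subquery roughly like this (a sketch, not tested):

SELECT cp.*, rd.*
FROM charge_point cp CROSS JOIN LATERAL
     (SELECT COUNT(*) filter (where rs.status = 'Occ') as total_occ,
             COUNT(*) filter (where rs.status = 'Part') as total_part,
             COUNT(*) filter (where rs.status = 'OOS') as total_oos,
             COUNT(*) filter (where rs.status = 'OOC') as total_ooc
      FROM register_date rd JOIN
           register_status rs
           ON rs.fk_register_date = rd.id
      WHERE cp.id = rs.fk_charge_point AND
            rd.date >= date '2016-11-01' AND
            rd.date < date '2019-08-01' + interval '1 day' AND
            rd.date::time >= time '05:00' AND
            rd.date::time <= time '18:00'
     ) rd;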

I used Gordon's approach to develop a new query and it ended up being way faster, from 10-12 min to 20-40 seconds:

SELECT cp.*, rd.* from charge_point cp cross join lateral
    (select
        COUNT(rs.status) filter (where rs.status = 'Occ') as total_occ,
        COUNT(rs.status) filter (where rs.status = 'Part') as total_part,
        COUNT(rs.status) filter (where rs.status = 'OOS') as total_oos,
        COUNT(rs.status) filter (where rs.status = 'OOC') as total_ooc,
        rs.fk_charge_point as cpid
    FROM register_date rd 
        inner join register_status rs on rs.fk_register_date = rd.id
    WHERE 
        rd.date::date >= '2019-02-01' and rd.date::date <= '2019-08-01'
        AND
        rd.date::time >= time '00:00' AND rd.date::time <= time '23:59'
    group by rs.fk_charge_point) rd
where cp.id = rd.cpid

I still need to check whether adding any indexes will make it even faster, but it's looking good so far.
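
The candidate index I'm considering looks roughly like this (name is illustrative; the composite index on register_status(fk_charge_point, fk_register_date, status) already exists according to the original plan):

-- expression index matching the ::date cast used in the WHERE clause
CREATE INDEX idx_register_date_date ON register_date ((date::date));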

