Function to_date and BETWEEN slow query on a large table

Question

I am using PostgreSQL, and I use the following query:

SELECT r.name,count(r.name)
from rooms r
where to_date(dateinput,'YYYYMMDD') between r.start_date and r.end_date
or to_char(r.end_time,'HH24:MI:SS')<> '00:00:00')
and (r.name in ('nameA','nameB'))
group by r.name

My table has 900,000 rows, and the query is very slow. I created an index on the columns start_date, end_date and name; with it, the query executes in 1543 ms. end_time is of data type time without time zone.

But when I change the query to

SELECT r.name,count(r.name)
from rooms r
where cast(dateinput as date) >= r.start_date
  and cast(dateinput as date) < r.end_date
  and r.name in ('nameA','nameB')
or to_char(r.end_time,'HH24:MI:SS')<> '00:00:00')
and (r.name in ('nameA','nameB'))
group by r.name

the execution time drops to 786 ms. I think to_date and BETWEEN prevent the index from being used, but I can't find any documentation or example that explains why. I also don't understand why changing the query reduces the time to 786 ms. Can anyone help me?

The index cannot be used for either query, and the time difference might be by accident or because the second query found more of the table in cache than the first. Both queries you show are syntactically incorrect and will not run. Please show the correct queries!
– Laurenz Albe, Commented Apr 13, 2021 at 2:41
@LaurenzAlbe "because the second query found more of the table in cache than the first": I don't understand this. Could you explain? How do you know this?
– BaoTrung Tran, Commented Apr 13, 2021 at 2:51

Laurenz Albe · Accepted Answer · 2021-04-14 10:47:20Z

The index cannot be used for either query, and the difference in execution time is probably because the first query had to read more data from disk, while the same data were already cached in RAM (shared buffers) during the second query.
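One way to check the caching explanation is to run a query twice with EXPLAIN (ANALYZE, BUFFERS) and compare the buffer counts. The statement below is only a sketch: it uses a simplified filter on the rooms table from the question, not the original query (which does not parse as posted).

```sql
-- Run this twice in a row and compare the "Buffers:" lines of both runs.
-- "shared hit"  = pages found in PostgreSQL's shared buffers
-- "shared read" = pages that had to be fetched from outside shared buffers
--                 (operating system cache or disk)
EXPLAIN (ANALYZE, BUFFERS)
SELECT r.name, count(r.name)
FROM rooms r
WHERE r.name IN ('nameA', 'nameB')
GROUP BY r.name;
```

If the second run shows far more "shared hit" and far fewer "shared read" pages than the first, a good part of the time difference likely comes from caching rather than from the way the condition is written.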

The strange OR condition makes it difficult to make this query efficient, and to_char(r.end_time,'HH24:MI:SS') is impossible to index (and I don't understand its meaning).

You will have to rewrite the query without OR (use a UNION) and express the condition on end_time differently; then you can use indexes to speed it up.

I would rewrite the query like this:

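SELECT r.name, count(r.name)
FROM (-- "id" is an assumed primary key of rooms; without a unique column,
      -- UNION would collapse all rows with the same name into a single row
      SELECT r.id, r.name
      FROM rooms r
      WHERE to_date(dateinput,'YYYYMMDD') <@ daterange(r.start_date, r.end_date, '[]')
        AND r.name IN ('nameA','nameB')
      UNION
      SELECT r.id, r.name
      FROM rooms r
      WHERE r.end_time <> TIME '00:00:00'
        AND r.name IN ('nameA','nameB')
     ) AS r
GROUP BY r.name;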
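To see that the range condition matches the original BETWEEN, here is a small self-contained check. The dates are made-up sample values, not taken from the question.

```sql
-- Compare BETWEEN with range containment on four sample dates.
SELECT d,
       d BETWEEN DATE '2021-04-01' AND DATE '2021-04-30'          AS via_between,
       d <@ daterange(DATE '2021-04-01', DATE '2021-04-30', '[]') AS via_closed_range,
       d <@ daterange(DATE '2021-04-01', DATE '2021-04-30')       AS via_default_range
FROM (VALUES (DATE '2021-03-31'),
             (DATE '2021-04-01'),
             (DATE '2021-04-30'),
             (DATE '2021-05-01')) AS t(d);
```

The via_between and via_closed_range columns agree on every row, because '[]' includes both bounds just like BETWEEN. The default bounds '[)' exclude the upper bound, so via_default_range is false for 2021-04-30. Two differences are worth keeping in mind: daterange raises an error when the lower bound is greater than the upper bound (BETWEEN just returns false), and a NULL bound means "unbounded" in a range, whereas BETWEEN yields NULL.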

These indexes could help:

CREATE INDEX ON rooms USING gist (daterange(start_date, end_date, '[]')) WHERE name IN ('nameA','nameB');

CREATE INDEX ON rooms (name) WHERE end_time <> TIME '00:00:00' AND name IN ('nameA','nameB');
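To try this without touching the real table, the following sketch builds a throwaway table and asks the planner what it would do. The table rooms_demo, its id column, the column types and the generated data are all assumptions for illustration; only the column names come from the question.

```sql
-- Hypothetical stand-in for the real table; types and data are assumptions.
CREATE TEMP TABLE rooms_demo (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name       text NOT NULL,
    start_date date NOT NULL,
    end_date   date NOT NULL,
    end_time   time without time zone NOT NULL
);

-- 100000 rows with names nameA .. nameZ and three-day date ranges.
INSERT INTO rooms_demo (name, start_date, end_date, end_time)
SELECT 'name' || chr(65 + (i % 26)),
       DATE '2021-01-01' + (i % 365),
       DATE '2021-01-01' + (i % 365) + 3,
       CASE WHEN i % 10 = 0 THEN TIME '12:00:00' ELSE TIME '00:00:00' END
FROM generate_series(1, 100000) AS g(i);

-- Same two partial indexes as above, on the demo table.
CREATE INDEX ON rooms_demo USING gist (daterange(start_date, end_date, '[]'))
    WHERE name IN ('nameA', 'nameB');
CREATE INDEX ON rooms_demo (name)
    WHERE end_time <> TIME '00:00:00' AND name IN ('nameA', 'nameB');

ANALYZE rooms_demo;

-- First branch of the UNION, with a literal date in place of dateinput.
EXPLAIN
SELECT id, name
FROM rooms_demo
WHERE DATE '2021-04-13' <@ daterange(start_date, end_date, '[]')
  AND name IN ('nameA', 'nameB');

-- Second branch of the UNION.
EXPLAIN
SELECT id, name
FROM rooms_demo
WHERE end_time <> TIME '00:00:00'
  AND name IN ('nameA', 'nameB');
```

Whether the planner actually picks the indexes depends on the data distribution, the statistics and the PostgreSQL version, so the plans on the real table may differ. Note that the WHERE conditions in the queries repeat the index predicates; a partial index can only be used when the planner can prove that the query condition implies the index predicate.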

Thank you for your answer. My end_time column has type time without time zone. I am using to_char(r.end_time,'HH24:MI:SS') because I want to convert it to a string and compare it with the begin time. Can you explain why my query takes much less time when I remove the to_date function? Any suggestions for rewriting the query?
Thank you for the suggestions; they helped me a lot. I have one question: can I create the index on rooms using btree? Why do I need USING gist here?
Because the index is on a range type, and a B-tree index on a range cannot support the containment operator <@ used in the query; a GiST index can.
