I have schema like thisa table station_logs in a PostgreSQL 9.6 database:
Column | Type |
---------------+-----------------------------+
id | integerbigint | bigserial
station_id | integer | not null
submitted_at | timestamp without time zone |
level_sensor | double precision |
Indexes:
"station_logs_pkey" PRIMARY KEY, btree (id)
"uniq_sid_sat" UNIQUE CONSTRAINT, btree (station_id, submitted_at)
I'm trying to get the last level_sensor value based on submitted_at, for each station_id. There are around 400 unique station_id values, and around 20k rows per day per station_id.
Before creating index:
EXPLAIN ANALYZE
SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor
FROM station_logs ORDER BY station_id, submitted_at DESC;
Unique (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
-> Sort (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
Sort Key: station_id, submitted_at DESC
Sort Method: external merge Disk: 681040kB
-> Seq Scan on station_logs (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
Planning time: 0.072 ms
Execution time: 27690.644 ms
Unique (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
-> Sort (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
Sort Key: station_id, submitted_at DESC
Sort Method: external merge Disk: 681040kB
-> Seq Scan on station_logs (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
Planning time: 0.072 ms
Execution time: 27690.644 ms
After creating index:
EXPLAIN ANALYZE SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor FROM station_logs ORDER BY station_id, submitted_at DESC;
Unique (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
-> Index Scan using station_id__submitted_at on station_logs (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
Planning time: 0.206 ms
Execution time: 16263.490 ms
I'm trying to get last level_sensor value based on submitted_at, for each station_id.the same query:
Unique (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
-> Index Scan using station_id__submitted_at on station_logs (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
Planning time: 0.206 ms
Execution time: 16263.490 ms
Is there a way to make this query faster? (1Like 1 sec for example, 27-1616 sec is still too much)
We use PostgreSQL 9.6.
There are around 400 unique station_id values, and around 20k rows per day per station_id.