added [query-performance] to 2412 questions - Shog9 (Id=1924)

Link

edited Jan 7, 2020 at 22:23

Community Bot

1

deleted 46 characters in body

Source Link

edited Jun 26, 2017 at 14:44

Erwin Brandstetter

186.5k
28
465
638

 EXPLAIN ANALYZE
 SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor
 FROM station_logs ORDER BY station_id, submitted_at DESC;

 Unique  (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
   ->  Sort  (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
         Sort Key: station_id, submitted_at DESC
         Sort Method: external merge  Disk: 681040kB
         ->  Seq Scan on station_logs  (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
 Planning time: 0.072 ms
 Execution time: 27690.644 ms

 Unique  (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
   ->  Sort  (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
         Sort Key: station_id, submitted_at DESC
         Sort Method: external merge  Disk: 681040kB
         ->  Seq Scan on station_logs  (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
 Planning time: 0.072 ms
 Execution time: 27690.644 ms

Unique  (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
   ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
 Planning time: 0.206 ms
 Execution time: 16263.490 ms

 Unique  (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
   ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
 Planning time: 0.206 ms
 Execution time: 16263.490 ms

 EXPLAIN ANALYZE
 SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor
 FROM station_logs ORDER BY station_id, submitted_at DESC;

 Unique  (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
   ->  Sort  (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
         Sort Key: station_id, submitted_at DESC
         Sort Method: external merge  Disk: 681040kB
         ->  Seq Scan on station_logs  (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
 Planning time: 0.072 ms
 Execution time: 27690.644 ms

Unique  (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
   ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
 Planning time: 0.206 ms
 Execution time: 16263.490 ms

EXPLAIN ANALYZE
SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor
FROM station_logs ORDER BY station_id, submitted_at DESC;

 Unique  (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
   ->  Sort  (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
         Sort Key: station_id, submitted_at DESC
         Sort Method: external merge  Disk: 681040kB
         ->  Seq Scan on station_logs  (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
 Planning time: 0.072 ms
 Execution time: 27690.644 ms

 Unique  (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
   ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
 Planning time: 0.206 ms
 Execution time: 16263.490 ms

Make table definition consistent, format, clean up

Source Link

edited Jun 26, 2017 at 14:10

Erwin Brandstetter

186.5k
28
465
638

How to make DISTINCT ON faster onin PostgreSQL?

I have schema like thisa table station_logs in a PostgreSQL 9.6 database:

    Column     |            Type             |    
---------------+-----------------------------+
 id            | integerbigint                      | bigserial
 station_id    | integer                     | not null
 submitted_at  | timestamp without time zone | 
 level_sensor  | double precision            | 
Indexes:
    "station_logs_pkey" PRIMARY KEY, btree (id)
    "uniq_sid_sat" UNIQUE CONSTRAINT, btree (station_id, submitted_at)

I'm trying to get the last level_sensor value based on submitted_at, for each station_id. There are around 400 unique station_id values, and around 20k rows per day per station_id.

Before creating index:

 EXPLAIN ANALYZE 
 SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor 
 FROM station_logs ORDER BY station_id, submitted_at DESC;
 
 Unique  (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
   ->  Sort  (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
         Sort Key: station_id, submitted_at DESC
         Sort Method: external merge  Disk: 681040kB
         ->  Seq Scan on station_logs  (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
 Planning time: 0.072 ms
 Execution time: 27690.644 ms

 Unique  (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
   ->  Sort  (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
         Sort Key: station_id, submitted_at DESC
         Sort Method: external merge  Disk: 681040kB
         ->  Seq Scan on station_logs  (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
 Planning time: 0.072 ms
 Execution time: 27690.644 ms

After creating index:

EXPLAIN ANALYZE SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor FROM station_logs ORDER BY station_id, submitted_at DESC;

Unique  (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
   ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
 Planning time: 0.206 ms
 Execution time: 16263.490 ms

I'm trying to get last level_sensor value based on submitted_at, for each station_id.the same query:

Unique  (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
   ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
 Planning time: 0.206 ms
 Execution time: 16263.490 ms

Is there a way to make this query faster? (1Like 1 sec for example, 27-1616 sec is still too much)

We use PostgreSQL 9.6.

There are around 400 unique station_id values, and around 20k rows per day per station_id.

How to DISTINCT ON faster on PostgreSQL?

I have schema like this:

    Column     |            Type             |    
---------------+-----------------------------+
 id            | integer                     | bigserial
 station_id    | integer                     | not null
 submitted_at  | timestamp without time zone | 
 level_sensor  | double precision            | 
Indexes:
    "station_logs_pkey" PRIMARY KEY, btree (id)
    "uniq_sid_sat" UNIQUE CONSTRAINT, btree (station_id, submitted_at)

Before creating index:

 EXPLAIN ANALYZE SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor FROM station_logs ORDER BY station_id, submitted_at DESC;
 
 Unique  (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
   ->  Sort  (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
         Sort Key: station_id, submitted_at DESC
         Sort Method: external merge  Disk: 681040kB
         ->  Seq Scan on station_logs  (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
 Planning time: 0.072 ms
 Execution time: 27690.644 ms

After creating index:

EXPLAIN ANALYZE SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor FROM station_logs ORDER BY station_id, submitted_at DESC;

Unique  (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
   ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
 Planning time: 0.206 ms
 Execution time: 16263.490 ms

I'm trying to get last level_sensor value based on submitted_at, for each station_id.

Is there a way to make this query faster? (1 sec for example, 27-16 sec is too much)

We use PostgreSQL 9.6.

There are around 400 unique station_id values, and around 20k rows per day per station_id.

How to make DISTINCT ON faster in PostgreSQL?

I have a table station_logs in a PostgreSQL 9.6 database:

    Column     |            Type             |    
---------------+-----------------------------+
 id            | bigint                      | bigserial
 station_id    | integer                     | not null
 submitted_at  | timestamp without time zone | 
 level_sensor  | double precision            | 
Indexes:
    "station_logs_pkey" PRIMARY KEY, btree (id)
    "uniq_sid_sat" UNIQUE CONSTRAINT, btree (station_id, submitted_at)

I'm trying to get the last level_sensor value based on submitted_at, for each station_id. There are around 400 unique station_id values, and around 20k rows per day per station_id.

Before creating index:

 EXPLAIN ANALYZE 
 SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor 
 FROM station_logs ORDER BY station_id, submitted_at DESC;

 Unique  (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
   ->  Sort  (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
         Sort Key: station_id, submitted_at DESC
         Sort Method: external merge  Disk: 681040kB
         ->  Seq Scan on station_logs  (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
 Planning time: 0.072 ms
 Execution time: 27690.644 ms

After creating index, for the same query:

Unique  (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
   ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
 Planning time: 0.206 ms
 Execution time: 16263.490 ms

Is there a way to make this query faster? Like 1 sec for example, 16 sec is still too much.