How to make DISTINCT ON faster in PostgreSQL?

I have a table station_logs in a PostgreSQL 9.6 database:

    Column     |            Type             |  Modifiers
---------------+-----------------------------+-------------
 id            | bigint                      | bigserial
 station_id    | integer                     | not null
 submitted_at  | timestamp without time zone | 
 level_sensor  | double precision            | 
Indexes:
    "station_logs_pkey" PRIMARY KEY, btree (id)
    "uniq_sid_sat" UNIQUE CONSTRAINT, btree (station_id, submitted_at)

I'm trying to get the last level_sensor value based on submitted_at, for each station_id. There are around 400 unique station_id values, and around 20k rows per day per station_id.

Before creating index:

 EXPLAIN ANALYZE 
 SELECT DISTINCT ON(station_id) station_id, submitted_at, level_sensor 
 FROM station_logs ORDER BY station_id, submitted_at DESC;
 Unique  (cost=4347852.14..4450301.72 rows=89 width=20) (actual time=22202.080..27619.167 rows=98 loops=1)
   ->  Sort  (cost=4347852.14..4399076.93 rows=20489916 width=20) (actual time=22202.077..26540.827 rows=20489812 loops=1)
         Sort Key: station_id, submitted_at DESC
         Sort Method: external merge  Disk: 681040kB
         ->  Seq Scan on station_logs  (cost=0.00..598895.16 rows=20489916 width=20) (actual time=0.023..3443.587 rows=20489812 loops=$
 Planning time: 0.072 ms
 Execution time: 27690.644 ms
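
The next plan uses an index named `station_id__submitted_at`, whose definition is not shown in the question. Judging by the name and by the fact that the explicit sort step disappears, it is presumably a multicolumn B-tree matching the `ORDER BY` — an assumption, something like:

```sql
-- Presumed index definition (not shown in the original question).
-- A B-tree on (station_id, submitted_at DESC) matches the ORDER BY,
-- letting the planner replace Seq Scan + external merge sort
-- with a single Index Scan.
CREATE INDEX station_id__submitted_at
    ON station_logs (station_id, submitted_at DESC);
```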

After creating index, for the same query:

Unique  (cost=0.56..2156367.51 rows=89 width=20) (actual time=0.184..16263.413 rows=98 loops=1)
   ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..2105142.98 rows=20489812 width=20) (actual time=0.181..1$
 Planning time: 0.206 ms
 Execution time: 16263.490 ms

Is there a way to make this query faster? Around 1 second would be ideal; 16 seconds is still too much.

tags
Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 638
Loading
Tweeted twitter.com/StackDBAs/status/878621570373226496
edited tags
Source Link
ypercubeᵀᴹ
  • 100k
  • 13
  • 217
  • 306
Loading
Source Link
Kokizzu
  • 1.4k
  • 6
  • 18
  • 35
Loading