Skip to main content
links, format
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 638
SELECT s.station_id, l.submitted_at, l.level_sensor
FROM   station s
CROSS  JOIN LATERAL (
   SELECT submitted_at, level_sensor
   FROM   station_logs
   WHERE  station_id = s.station_id
   ORDER  BY submitted_at DESC NULLS LAST
   LIMIT  1
   ) l;
SELECT s.station_id, l.submitted_at, l.level_sensor
FROM   station s
CROSS  JOIN LATERAL (
   SELECT submitted_at, level_sensor
   FROM   station_logs
   WHERE  station_id = s.station_id
   ORDER  BY submitted_at DESC NULLS LAST
   LIMIT  1
   ) l;

dbfiddle here
   (comparing plans for this query, Abelisto's alternative and your original)

 Nested Loop  (cost=0.56..356.65 rows=102 width=20) (actual time=0.034..0.979 rows=98 loops=1)
   ->  Seq Scan on stations s  (cost=0.00..3.02 rows=102 width=4) (actual time=0.009..0.016 rows=102 loops=1)
   ->  Limit  (cost=0.56..3.45 rows=1 width=16) (actual time=0.009..0.009 rows=1 loops=102)
         ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..664062.38 rows=230223 width=16) (actual time=0.009$
               Index Cond: (station_id = s.id)
 Planning time: 0.542 ms
 Execution time: 1.013 ms  -- !!
Nested Loop  (cost=0.56..356.65 rows=102 width=20) (actual time=0.034..0.979 rows=98 loops=1)
   ->  Seq Scan on stations s  (cost=0.00..3.02 rows=102 width=4) (actual time=0.009..0.016 rows=102 loops=1)
   ->  Limit  (cost=0.56..3.45 rows=1 width=16) (actual time=0.009..0.009 rows=1 loops=102)
         ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..664062.38 rows=230223 width=16) (actual time=0.009$
               Index Cond: (station_id = s.id)
 Planning time: 0.542 ms
 Execution time: <b>1.013 ms</b>  -- !!
CREATE TABLE station AS
WITH RECURSIVE cte AS (
   (
   SELECT station_id
   FROM   station_logs
   ORDER  BY station_id
   LIMIT  1
   )
   UNION ALL
   SELECT l.station_id
   FROM   cte c
   ,      LATERAL (   
      SELECT station_id
      FROM   station_logs
      WHERE  station_id > c.station_id
      ORDER  BY station_id
      LIMIT  1
      ) l
   )
TABLE cte;
CREATE TABLE station AS
WITH RECURSIVE cte AS (
   (
   SELECT station_id
   FROM   station_logs
   ORDER  BY station_id
   LIMIT  1
   )
   UNION ALL
   SELECT l.station_id
   FROM   cte c
   ,      LATERAL (   
      SELECT station_id
      FROM   station_logs
      WHERE  station_id > c.station_id
      ORDER  BY station_id
      LIMIT  1
      ) l
   )
TABLE cte;
SELECT s.station_id, l.submitted_at, l.level_sensor
FROM   station s
CROSS  JOIN LATERAL (
   SELECT submitted_at, level_sensor
   FROM   station_logs
   WHERE  station_id = s.station_id
   ORDER  BY submitted_at DESC NULLS LAST
   LIMIT  1
   ) l;

dbfiddle here
 (comparing plans for this query, Abelisto's alternative and your original)

 Nested Loop  (cost=0.56..356.65 rows=102 width=20) (actual time=0.034..0.979 rows=98 loops=1)
   ->  Seq Scan on stations s  (cost=0.00..3.02 rows=102 width=4) (actual time=0.009..0.016 rows=102 loops=1)
   ->  Limit  (cost=0.56..3.45 rows=1 width=16) (actual time=0.009..0.009 rows=1 loops=102)
         ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..664062.38 rows=230223 width=16) (actual time=0.009$
               Index Cond: (station_id = s.id)
 Planning time: 0.542 ms
 Execution time: 1.013 ms  -- !!
CREATE TABLE station AS
WITH RECURSIVE cte AS (
   (
   SELECT station_id
   FROM   station_logs
   ORDER  BY station_id
   LIMIT  1
   )
   UNION ALL
   SELECT l.station_id
   FROM   cte c
   ,      LATERAL (   
      SELECT station_id
      FROM   station_logs
      WHERE  station_id > c.station_id
      ORDER  BY station_id
      LIMIT  1
      ) l
   )
TABLE cte;
SELECT s.station_id, l.submitted_at, l.level_sensor
FROM   station s
CROSS  JOIN LATERAL (
   SELECT submitted_at, level_sensor
   FROM   station_logs
   WHERE  station_id = s.station_id
   ORDER  BY submitted_at DESC NULLS LAST
   LIMIT  1
   ) l;

dbfiddle here  (comparing plans for this query, Abelisto's alternative and your original)

Nested Loop  (cost=0.56..356.65 rows=102 width=20) (actual time=0.034..0.979 rows=98 loops=1)
   ->  Seq Scan on stations s  (cost=0.00..3.02 rows=102 width=4) (actual time=0.009..0.016 rows=102 loops=1)
   ->  Limit  (cost=0.56..3.45 rows=1 width=16) (actual time=0.009..0.009 rows=1 loops=102)
         ->  Index Scan using station_id__submitted_at on station_logs  (cost=0.56..664062.38 rows=230223 width=16) (actual time=0.009$
               Index Cond: (station_id = s.id)
 Planning time: 0.542 ms
 Execution time: <b>1.013 ms</b>  -- !!
CREATE TABLE station AS
WITH RECURSIVE cte AS (
   (
   SELECT station_id
   FROM   station_logs
   ORDER  BY station_id
   LIMIT  1
   )
   UNION ALL
   SELECT l.station_id
   FROM   cte c
   ,      LATERAL (   
      SELECT station_id
      FROM   station_logs
      WHERE  station_id > c.station_id
      ORDER  BY station_id
      LIMIT  1
      ) l
   )
TABLE cte;
clarify
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 638

Resulting EXPLAIN ANALYZE by ThreadStarter (providedas provided by the OP):

I added NULLS LAST to ORDER BY because submitted_at isn't defined NOT NULL (which it probably should be).

The only index you need is the one you created: station_id__submitted_at. The uniqueUNIQUE constraint uniq_sid_sat also does the job, basically. Maintaining both seems like a waste of disk space and write performance.

I added NULLS LAST to ORDER BY in the query because submitted_at isn't defined NOT NULL. Ideally, youif applicable!, add a NOT NULL constraint to the column submitted_at (if that's applicable!) and, drop the additional index and theremove NULLS LAST clausefrom the query.

If that's not an option (submitted_at can be NULL), create a uniquethis UNIQUE index to replace both your current index and unique constraint:

(I I use that in the fiddle as well.) You could also use a similar query to solve your task directly, without station table - if you can't be convinced to create it.

###Optimize index

Optimize index

It might make sense to add level_sensor as last column to the index to allow index-only scans, like joanolo commented.
Con: It makes the index bigger - which adds a little cost to all queries using it.
Pro: If you actually get index only scans out of it, the query at hand does not have to visit heap pages at all, which makes it about twice as fast. But that may be an insubstantial gain for the very fast query now.

More importantlyHowever, I don't expect that to work for your case. You mentioned:

Typically, that would indicate unceasing write load (1 per station_id every 5 seconds). And you are interested in the latest row. Index-only scans only work for heap pages that are visible to all transactions (bit in the visibility map is set). You would have to run extremely aggressive VACUUM settings for the table to keep up with the write load, and it would still not work most of the time. If my assumptions are correct, index-only scans are out, don't add level_sensor to the index.

EXPLAIN ANALYZE by ThreadStarter (provided by OP):

I added NULLS LAST to ORDER BY because submitted_at isn't defined NOT NULL (which it probably should be).

The only index you need is the one you created: station_id__submitted_at. The unique constraint uniq_sid_sat also does the job, basically. Maintaining both seems like a waste of disk space and write performance.

Ideally, you add a NOT NULL constraint to the column submitted_at (if that's applicable!) and drop the additional index and the NULLS LAST clause.

If that's not an option (submitted_at can be NULL), create a unique index to replace index and unique constraint:

(I use that in the fiddle as well.) You could also use a similar query to solve your task directly, without station table - if you can't be convinced to create it.

###Optimize index

It might make sense to add level_sensor as last column to the index to allow index-only scans, like joanolo commented.
Con: It makes the index bigger - which adds a little cost to all queries using it.
Pro: If you actually get index only scans out of it, the query at hand does not have to visit heap pages at all, which makes it about twice as fast. But that may be an insubstantial gain for the very fast query.

More importantly, I don't expect that to work for your case. You mentioned:

Typically, that would indicate unceasing write load (1 per station_id every 5 seconds). And you are interested in the latest row. Index-only scans only work for heap pages that are visible to all transactions (bit in the visibility map is set). You would have to run extremely aggressive VACUUM settings for the table to keep up with the write load, and it would still not work most of the time. If my assumptions are correct index-only scans are out, don't add level_sensor to the index.

Resulting EXPLAIN ANALYZE as provided by the OP:

The only index you need is the one you created: station_id__submitted_at. The UNIQUE constraint uniq_sid_sat also does the job, basically. Maintaining both seems like a waste of disk space and write performance.

I added NULLS LAST to ORDER BY in the query because submitted_at isn't defined NOT NULL. Ideally, if applicable!, add a NOT NULL constraint to the column submitted_at, drop the additional index and remove NULLS LAST from the query.

If submitted_at can be NULL, create this UNIQUE index to replace both your current index and unique constraint:

I use that in the fiddle as well. You could use a similar query to solve your task directly, without station table - if you can't be convinced to create it.

Optimize index

It might make sense to add level_sensor as last column to the index to allow index-only scans, like joanolo commented.
Con: It makes the index bigger - which adds a little cost to all queries using it.
Pro: If you actually get index only scans out of it, the query at hand does not have to visit heap pages at all, which makes it about twice as fast. But that may be an insubstantial gain for the very fast query now.

However, I don't expect that to work for your case. You mentioned:

Typically, that would indicate unceasing write load (1 per station_id every 5 seconds). And you are interested in the latest row. Index-only scans only work for heap pages that are visible to all transactions (bit in the visibility map is set). You would have to run extremely aggressive VACUUM settings for the table to keep up with the write load, and it would still not work most of the time. If my assumptions are correct, index-only scans are out, don't add level_sensor to the index.

clarify index / unique constraint
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 638
Loading
adjust format, typo
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 638
Loading
discuss index optimization
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 638
Loading
added 1056 characters in body
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 638
Loading
Source Link
Erwin Brandstetter
  • 186.5k
  • 28
  • 465
  • 638
Loading