Postgres query with order by causes Seq Scan on large portion of table

Question

I have a model with two tables and an identifier which we are sorting by:

Explain analyze
SELECT          "item"."id"
FROM            "item"
INNER JOIN      "spec"
ON              (
                                "item"."spec_id" = "spec"."id")
WHERE "spec"."project_id" = '70d49665-b041-4afc-8d97-fde42904892f'
ORDER BY       "item"."identifier" DESC
LIMIT 50;

The field identifier is indexed, and "item"."spec_id" as well, but planner is yet scanning multiple rows (much more than exists in the same project):

 Limit  (cost=0.83..101.93 rows=50 width=20) (actual time=53.365..53.885 rows=50     loops=1)
   ->  Nested Loop  (cost=0.83..100806.47 rows=49854 width=20) (actual time=53.364..53.861 rows=50 loops=1)
         ->  Index Scan Backward using item_identifier on item  (cost=0.42..42273.79 rows=109215 width=36) (actual time=0.013..17.292 rows=19332 loops=1)
         ->  Index Scan using spec_pkey on spec  (cost=0.41..0.53 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=19332)
               Index Cond: (id = item.spec_id)
               Filter: (container_id = '70d49665-b041-4afc-8d97-fde42904892f'::uuid)
               Rows Removed by Filter: 1
 Planning time: 0.442 ms
 Execution time: 53.932 ms
(9 rows)

I guess that the optimiser need to sort the identifier first and only then run the filters, but why doesn't it run the filters and only then the do the sorting?

== edit == When I remove the order by stmt the

Index Scan Backward using item_identifier on item  (cost=0.42..42273.79 rows=109215 width=36) (actual time=0.013..17.292 rows=19332 loops=1)

is removed as well from plan

It doesn't do any sorting at all (there is no sort step in your plan) — user1822
– user1822, Commented Sep 27, 2021 at 8:13
@a_horse_with_no_name The thing is once I remove the ORDER BY statement the Index Scan Backward using item_identifier is removed as well from plan. So I assume this is the case here — gCoh
– gCoh, Commented Sep 27, 2021 at 8:18
The ORDER BY-LIMIT interaction together with possibly skewed data or statistics seems to lead the optimizer to the idea that it can find the matching rows soon (that the rows satisfying the other conditions are common in the dataset). Do you have an index on (spec_id, identifier) ? It won't help much with actual sorting but it might give the optimizer better stat info. Or (identifier, spec_id) on the other hand might at least allow index-only scan. — jkavalik
– jkavalik, Commented Sep 27, 2021 at 9:09
@jkavalik I created the two indexes and the planner output is exactly the same, do you have any other insights? — gCoh
– gCoh, Commented Sep 27, 2021 at 11:26
Does this answer your question? postgres Poor performance on ORDER BY "id" DESC LIMIT 1 — mustaccio
– mustaccio, Commented Sep 27, 2021 at 13:40

Laurenz Albe · Accepted Answer · 2021-09-27 12:10:37Z

The optimizer knows that many rows satisfy the condition spec.project_id = '70d49665-b041-4afc-8d97-fde42904892f', so it decides that it would be more effective to scan the rows in ORDER BY order using a nested loop join and discard rows that don't meet the above condition until it has found 50 matching rows.

Using the index on container_id would require to calculate the whole join, sort and pick the first 50 rows.

I cannot say if the strategy works or not, because I don't know how the other plan would perform. It has to scan 19332 rows and takes 54 milliseconds, which doesn't seem too bad to be.

To force the alternative plan, you could ORDER BY item.identifier || '' and see if that performs better.

Stack Exchange Network

Postgres query with order by causes Seq Scan on large portion of table

1 Answer 1

Linked

Hot Network Questions

Postgres query with order by causes Seq Scan on large portion of table

1 Answer 1

Linked

Related

Hot Network Questions