
I have two tables: book and sold_book

book: id, price, author_id

sold_book: id, date, book_id, buyer_id

book 1:M sold_book

I want to find the maximum price among an author's sold books. This is my query now:

SELECT max(b.price)
FROM book b
JOIN sold_book sb ON b.id = sb.book_id
WHERE b.author_id = 1;

The problem is that I have millions of books and millions of sold-book rows, and I want the most efficient query possible. I use PostgreSQL. Can my query be made more efficient?

Execution plan

Finalize Aggregate  (cost=47504.43..47504.44 rows=1 width=4) (actual time=513.959..522.363 rows=1 loops=1)
  Buffers: shared hit=134940
  ->  Gather  (cost=47504.39..47504.43 rows=3 width=4) (actual time=509.061..522.351 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        Buffers: shared hit=134940
        ->  Partial Aggregate  (cost=47404.39..47404.40 rows=1 width=4) (actual time=502.675..502.682 rows=1 loops=4)
              Buffers: shared hit=134940
              ->  Parallel Hash Join  (cost=42572.03..47398.10 rows=12566 width=4) (actual time=401.608..502.117 rows=6134 loops=4)
                    Hash Cond: (sb.book_id = b.id)
                    Buffers: shared hit=134940
                    ->  Parallel Seq Scan on sold_book sb  (cost=0.00..4632.79 rows=368149 width=4) (actual time=0.010..35.459 rows=285316 loops=4)
                          Buffers: shared hit=9513
                    ->  Parallel Hash  (cost=41111.17..41111.17 rows=139130 width=8) (actual time=379.915..379.916 rows=215480 loops=4)
                          Buckets: 1048576  Batches: 1  Memory Usage: 41952kB
                          Buffers: shared hit=125337
                          ->  Parallel Bitmap Heap Scan on book b  (cost=4733.21..41111.17 rows=139130 width=8) (actual time=121.184..291.225 rows=215480 loops=4)
                                Recheck Cond: (author_id = 1)
                                Heap Blocks: exact=30831
                                Buffers: shared hit=125337
                                ->  Bitmap Index Scan on book_author_id_price_key  (cost=0.00..4691.47 rows=834778 width=0) (actual time=82.198..82.198 rows=874616 loops=1)
                                      Index Cond: (author_id = 1)
                                      Buffers: shared hit=806
Planning:
  Buffers: shared hit=9
Planning Time: 0.366 ms
Execution Time: 522.436 ms 
  • @Luuk it takes 1.2 secs approximately. It is slow enough Commented Feb 22, 2022 at 14:35
  • You could add an index on these two fields: author_id, price Commented Feb 22, 2022 at 14:44
  • Please add EXPLAIN (ANALYZE, BUFFERS) output. Commented Feb 22, 2022 at 14:56
  • @a_horse_with_no_name added Commented Feb 22, 2022 at 15:44
  • Does 'book' represent one title/SKU/ISBN, or one hunk of dead tree? Commented Feb 22, 2022 at 17:45

3 Answers


What a bizarre data set. One author wrote nearly a million books. And the vast majority of them have zero sales. And apparently every sale for a given book takes place at the same price.

Your best hope is probably changing the way the query is written.

SELECT b.price
FROM book b
WHERE b.author_id = 1
  AND EXISTS (SELECT 1 FROM sold_book sb WHERE b.id = sb.book_id)
ORDER BY b.price DESC
LIMIT 1;

This might be efficiently supported by indexes:

create index on book (author_id, price, id);
create index on sold_book (book_id);

The efficient plan would walk backwards down the price list for the specified author, testing each book to see if it has sold any copies, and stopping as soon as it finds one sale. But who knows, maybe 99% of this author's works cost several million dollars each, which is why he has so few sales, and it isn't until you get down to $15 that you find a sale. Then this query plan might not work so well.
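If that pessimistic price distribution does apply, one hedged alternative (a sketch only, assuming the same schema as above) is to start from the sold side instead: deduplicate the sold book ids first, then aggregate the matching prices, so the work is bounded by the number of distinct sold titles rather than the author's catalogue:

```sql
-- Sketch: aggregate over the distinct set of sold books.
-- Cheap when few distinct books have sold, regardless of price distribution.
SELECT max(b.price)
FROM (SELECT DISTINCT book_id FROM sold_book) s
JOIN book b ON b.id = s.book_id
WHERE b.author_id = 1;
```

An index on sold_book (book_id) lets the DISTINCT come from the index; whether this beats the backwards index walk depends on the actual data, so compare both with EXPLAIN (ANALYZE, BUFFERS).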




You don't need to join to every sold_book record, just one of them:

SELECT max(b.price)
FROM book b
WHERE EXISTS (SELECT FROM sold_book sb WHERE sb.book_id = b.id)
  AND b.author_id = 1;

Indexes on book.author_id and sold_book.book_id would help.
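A minimal sketch of those two indexes (the index names here are illustrative, not from the original post):

```sql
-- Narrows the book scan to one author's rows
CREATE INDEX book_author_id_idx ON book (author_id);

-- Supports the EXISTS probe: one index lookup per candidate book
CREATE INDEX sold_book_book_id_idx ON sold_book (book_id);
```

Adding price to the first index, as suggested in the comments, would additionally let PostgreSQL read the price without visiting the heap.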



You can create an index for the author_id.

For example,

CREATE INDEX book_author_id_index ON book (author_id);

For more information, please refer to, https://www.postgresql.org/docs/14/performance-tips.html

2 Comments

It is a column of book, and I already have one. Should have mentioned that, my bad.
@VladimirSafonov OK, then you can create indexes on the book table, which will improve your query's execution speed.
