
I have two tables: book and sold_book

book: id, price, author_id

sold_book: id, date, book_id, buyer_id

book 1:M sold_book

I want to find the maximum price among an author's sold books. This is my query now:

SELECT max(b.price)
FROM book b
JOIN sold_book sb ON b.id = sb.book_id
WHERE b.author_id = 1;

The problem is that I have millions of books and millions of sold-book rows, and I want the most efficient query possible. I use PostgreSQL. Can my query be made more efficient?

Execution plan

Finalize Aggregate  (cost=47504.43..47504.44 rows=1 width=4) (actual time=513.959..522.363 rows=1 loops=1)
  Buffers: shared hit=134940
  ->  Gather  (cost=47504.39..47504.43 rows=3 width=4) (actual time=509.061..522.351 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        Buffers: shared hit=134940
        ->  Partial Aggregate  (cost=47404.39..47404.40 rows=1 width=4) (actual time=502.675..502.682 rows=1 loops=4)
              Buffers: shared hit=134940
              ->  Parallel Hash Join  (cost=42572.03..47398.10 rows=12566 width=4) (actual time=401.608..502.117 rows=6134 loops=4)
                    Hash Cond: (sb.book_id = b.id)
                    Buffers: shared hit=134940
                    ->  Parallel Seq Scan on sold_book sb  (cost=0.00..4632.79 rows=368149 width=4) (actual time=0.010..35.459 rows=285316 loops=4)
                          Buffers: shared hit=9513
                    ->  Parallel Hash  (cost=41111.17..41111.17 rows=139130 width=8) (actual time=379.915..379.916 rows=215480 loops=4)
                          Buckets: 1048576  Batches: 1  Memory Usage: 41952kB
                          Buffers: shared hit=125337
                          ->  Parallel Bitmap Heap Scan on book b  (cost=4733.21..41111.17 rows=139130 width=8) (actual time=121.184..291.225 rows=215480 loops=4)
                                Recheck Cond: (author_id = 1)
                                Heap Blocks: exact=30831
                                Buffers: shared hit=125337
                                ->  Bitmap Index Scan on book_author_id_price_key  (cost=0.00..4691.47 rows=834778 width=0) (actual time=82.198..82.198 rows=874616 loops=1)
                                      Index Cond: (author_id = 1)
                                      Buffers: shared hit=806
Planning:
  Buffers: shared hit=9
Planning Time: 0.366 ms
Execution Time: 522.436 ms 
  • @Luuk it takes 1.2 secs approximately. It is slow enough Commented Feb 22, 2022 at 14:35
  • You could add an index on these two fields: author_id, price Commented Feb 22, 2022 at 14:44
  • Please add EXPLAIN (ANALYZE, BUFFERS) output. Commented Feb 22, 2022 at 14:56
  • @a_horse_with_no_name added Commented Feb 22, 2022 at 15:44
  • Does 'book' represent one title/SKU/ISBN, or one hunk of dead tree? Commented Feb 22, 2022 at 17:45

3 Answers


What a bizarre data set. One author wrote nearly a million books. And the vast majority of them have zero sales. And apparently every sale for a given book takes place at the same price.

Your best hope is probably changing the way the query is written.

SELECT b.price
FROM book b
WHERE b.author_id = 1
  AND EXISTS (SELECT 1 FROM sold_book sb WHERE b.id = sb.book_id)
ORDER BY b.price DESC
LIMIT 1;

This might be efficiently supported by indexes:

create index on book (author_id, price, id);
create index on sold_book (book_id);

The efficient plan would walk backwards down the price list for the specified author, testing each book to see if it has sold any copies, and stopping as soon as it finds one sale. But who knows, maybe 99% of this author's works cost several million dollars each, which is why he has so few sales, and it isn't until you get down to $15 that you find a sale. Then this query plan might not work so well.
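If that pessimistic price distribution does apply, one hedged alternative (a sketch only, assuming the same schema as above) is to start from the sold side instead: deduplicate the sold book ids first, then aggregate the matching prices, so the work is bounded by the number of distinct sold titles rather than the author's catalogue:

```sql
-- Sketch: aggregate over the distinct set of sold books.
-- Cheap when few distinct books have sold, regardless of price distribution.
SELECT max(b.price)
FROM (SELECT DISTINCT book_id FROM sold_book) s
JOIN book b ON b.id = s.book_id
WHERE b.author_id = 1;
```

An index on sold_book (book_id) lets the DISTINCT come from the index; whether this beats the backwards index walk depends on the actual data, so compare both with EXPLAIN (ANALYZE, BUFFERS).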




You don't need to join to every sold_book record, just one of them:

SELECT max(b.price)
FROM book b
WHERE EXISTS (SELECT FROM sold_book sb WHERE sb.book_id = b.id)
  AND b.author_id = 1;

Indexes on book.author_id and sold_book.book_id would help.
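A minimal sketch of those two indexes (the index names here are illustrative, not from the original post):

```sql
-- Narrows the book scan to one author's rows
CREATE INDEX book_author_id_idx ON book (author_id);

-- Supports the EXISTS probe: one index lookup per candidate book
CREATE INDEX sold_book_book_id_idx ON sold_book (book_id);
```

Adding price to the first index, as suggested in the comments, would additionally let PostgreSQL read the price without visiting the heap.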



You can create an index for the author_id.

For example,

CREATE INDEX book_author_id_index ON book (author_id);

For more information, please refer to, https://www.postgresql.org/docs/14/performance-tips.html

2 Comments

It is a column of book, and I already have one. Should have mentioned that, my bad.
@VladimirSafonov OK, then you can create indexes on the book table, which will improve your query's execution speed.
