I have two tables: book and sold_book
book: id, price, author_id
sold_book: id, date, book_id, buyer_id
book 1:M sold_book
I want to find the maximum price among the sold books of a given author. This is my current query:
SELECT max(b.price)
FROM book b
JOIN sold_book sb ON b.id = sb.book_id
WHERE b.author_id = 1;
The problem is that I have millions of books and millions of sold books, and I want this query to be as efficient as possible. I use PostgreSQL. Can my query be made more efficient?
Execution plan
Finalize Aggregate (cost=47504.43..47504.44 rows=1 width=4) (actual time=513.959..522.363 rows=1 loops=1)
  Buffers: shared hit=134940
  ->  Gather (cost=47504.39..47504.43 rows=3 width=4) (actual time=509.061..522.351 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        Buffers: shared hit=134940
        ->  Partial Aggregate (cost=47404.39..47404.40 rows=1 width=4) (actual time=502.675..502.682 rows=1 loops=4)
              Buffers: shared hit=134940
              ->  Parallel Hash Join (cost=42572.03..47398.10 rows=12566 width=4) (actual time=401.608..502.117 rows=6134 loops=4)
                    Hash Cond: (sb.book_id = b.id)
                    Buffers: shared hit=134940
                    ->  Parallel Seq Scan on sold_book sb (cost=0.00..4632.79 rows=368149 width=4) (actual time=0.010..35.459 rows=285316 loops=4)
                          Buffers: shared hit=9513
                    ->  Parallel Hash (cost=41111.17..41111.17 rows=139130 width=8) (actual time=379.915..379.916 rows=215480 loops=4)
                          Buckets: 1048576 Batches: 1 Memory Usage: 41952kB
                          Buffers: shared hit=125337
                          ->  Parallel Bitmap Heap Scan on book b (cost=4733.21..41111.17 rows=139130 width=8) (actual time=121.184..291.225 rows=215480 loops=4)
                                Recheck Cond: (author_id = 1)
                                Heap Blocks: exact=30831
                                Buffers: shared hit=125337
                                ->  Bitmap Index Scan on book_author_id_price_key (cost=0.00..4691.47 rows=834778 width=0) (actual time=82.198..82.198 rows=874616 loops=1)
                                      Index Cond: (author_id = 1)
                                      Buffers: shared hit=806
Planning:
  Buffers: shared hit=9
Planning Time: 0.366 ms
Execution Time: 522.436 ms
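The book table has an index on (author_id, price), which appears in the plan as book_author_id_price_key. The plan above is the EXPLAIN (ANALYZE, BUFFERS) output.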
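Since the join is only there to check that a book has at least one sale, one direction that may be worth testing is a semi-join that walks the author's books from the highest price down and stops at the first one that was sold. The sketch below is untested against this data. It assumes that book_author_id_price_key covers (author_id, price) in that order, and the index name sold_book_book_id_idx is hypothetical. Whether it beats the hash join depends on how far down the price order the first sold book sits.

```sql
-- Assumption: sold_book has no index on book_id yet (the plan shows a
-- sequential scan on it). The index name here is made up for illustration.
CREATE INDEX IF NOT EXISTS sold_book_book_id_idx ON sold_book (book_id);

-- Variant 1: same result as the original query, but expressed as a
-- semi-join, so each book is checked for "has at least one sale" only once.
SELECT max(b.price)
FROM book b
WHERE b.author_id = 1
  AND EXISTS (SELECT 1 FROM sold_book sb WHERE sb.book_id = b.id);

-- Variant 2: read the author's books in descending price order and stop at
-- the first one that has a sale. The IS NOT NULL filter is needed because
-- ORDER BY ... DESC sorts NULL prices first by default in PostgreSQL.
SELECT b.price
FROM book b
WHERE b.author_id = 1
  AND b.price IS NOT NULL
  AND EXISTS (SELECT 1 FROM sold_book sb WHERE sb.book_id = b.id)
ORDER BY b.price DESC
LIMIT 1;
```

Note that variant 2 returns no row at all when the author has no sold books, whereas max() returns a single NULL row. Comparing both variants with EXPLAIN (ANALYZE, BUFFERS) against the plan above would show whether either one actually helps.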