Bumped by Community user

occurred Sep 4 at 1:02

added 52 characters in body

Source Link

edited Oct 10, 2023 at 21:17

Ian Ling

101
1

I have a PostgreSQL table with millions of rows. The table is around 130GB.

I'm filtering this table on a tsvector column. The tsvector column has a GIN index. The index is around 6GB.

Source Link

asked Oct 10, 2023 at 21:10

Ian Ling

101
1

PostgreSQL not keeping most things cached between queries

I have a PostgreSQL table with millions of rows.

I'm filtering this table on a tsvector column. The tsvector column has a GIN index.

Indexes:
    "packages_text_search_vector_idx" gin (text_search_vector)

The following query is very slow.

SELECT
 name
FROM
  packages
WHERE
  text_search_vector @@ to_tsquery('value');

After using EXPLAIN (ANALYZE, BUFFERS), it appears that this query is slow due to the amount of data that is being retrieved from disk, rather than from the cache.

                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=19651.30..6514076.08 rows=2297168 width=504) (actual time=443.159..8388.212 rows=2293550 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=348 read=2094387
   I/O Timings: read=4294.699
   ->  Parallel Bitmap Heap Scan on packages  (cost=18651.30..6283359.28 rows=957153 width=504) (actual time=441.121..7950.227 rows=764517 loops=3)
         Recheck Cond: (text_search_vector @@ to_tsquery('value'::text))
         Rows Removed by Index Recheck: 2009576
         Heap Blocks: exact=5371 lossy=676087
         Buffers: shared hit=348 read=2094387
         I/O Timings: read=4294.699
         ->  Bitmap Index Scan on packages_text_search_vector_idx  (cost=0.00..18077.01 rows=2297168 width=0) (actual time=432.139..432.140 rows=2293550 loops=1)
               Index Cond: (text_search_vector @@ to_tsquery('value'::text))
               Buffers: shared hit=4 read=617
               I/O Timings: read=2.083
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.147 ms
 Execution Time: 8530.372 ms
(19 rows)

I would expect Postgres to leave this data buffered/cached after the query is finished, so that subsequent executions of the same query would be much faster. However, this doesn't seem to be the case. The query takes about the same amount of time each time I run it, and the Buffers: line in the EXPLAIN output still says it is performing around 2 millions shared buffer reads each time.

Am I misunderstanding something? Or are there some parameters that I should tune to get better performance? This is running on an AWS RDS instance with 64GB of RAM.

work_mem: 4MB
shared_buffers: 16284208kB (~16GB)
effective_cache_size: 32568416kB (~32GB)

postgresql

Stack Exchange Network

Return to Question

PostgreSQL not keeping most things cached between queries