3

I have 2 tables scheduling_flownode and xact_message with weak relation between them. I am trying to execute the following query

set search_path='ad_96d5be';
explain analyze 
SELECT f.id, f.target_object_id 
FROM "scheduling_flownode" f, 
     "xact_message" m 
where f.target_object_id = m.id 
and f.root_node=True 
AND f.state=1 
and m.state=4 
and m.templatelanguage_id IN (17, 18, 19, 20, 21, 22, 23, 24);

On execution, I get the following query plan

  Gather  (cost=252701.26..1711972.04 rows=374109 width=8) (actual time=17737.908..164181.063 rows=441130 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=35705 read=1346425, temp read=18190 written=18148
   ->  Hash Join  (cost=251701.26..1673561.14 rows=155879 width=8) (actual time=18805.587..163991.468 rows=147043 loops=3)
         Hash Cond: (f.target_object_id = m.id)
         Buffers: shared hit=35705 read=1346425, temp read=18190 written=18148
         ->  Parallel Bitmap Heap Scan on scheduling_flownode f  (cost=124367.21..1523127.76 rows=2061083 width=8) (actual time=963.910..155466.840 rows=1642157 loops=3)
               Recheck Cond: (state = 1)
               Rows Removed by Index Recheck: 44
               Filter: root_node
               Rows Removed by Filter: 12406874
               Heap Blocks: exact=10570 lossy=427078
               Buffers: shared read=1328631
               ->  Bitmap Index Scan on "root-node-and-state"  (cost=0.00..123130.57 rows=4946600 width=0) (actual time=955.044..955.045 rows=4926472 loops=1)
                     Index Cond: ((root_node = true) AND (state = 1))
                     Buffers: shared read=13464
         ->  Hash  (cost=120677.64..120677.64 rows=405712 width=4) (actual time=7124.131..7124.131 rows=441128 loops=3)
               Buckets: 131072  Batches: 8  Memory Usage: 2966kB
               Buffers: shared hit=35591 read=17793, temp written=3384
               ->  Bitmap Heap Scan on xact_message m  (cost=7893.56..120677.64 rows=405712 width=4) (actual time=61.307..6925.456 rows=441128 loops=3)
                     Recheck Cond: (state = 4)
                     Filter: (templatelanguage_id = ANY ('{17,18,19,20,21,22,23,24}'::integer[]))
                     Rows Removed by Filter: 4
                     Heap Blocks: exact=16585
                     Buffers: shared hit=35591 read=17793
                     ->  Bitmap Index Scan on "state-index"  (cost=0.00..7792.13 rows=421826 width=0) (actual time=58.781..58.781 rows=441132 loops=3)
                           Index Cond: (state = 4)
                           Buffers: shared hit=2420 read=1209
 Planning time: 1.382 ms
 Execution time: 164289.481 ms
(31 rows)


scheduling_flownode here has over 400,00,000 entries and xact_message has about 50,00,000 rows. Working on postgres 10, am I wrong to believe that this much load should be handled easily by postgres ? If it can, am I doing something wrong with the query here ?

7
  • 1
    The statistics are way off. Does running analyze scheduling_flownode; change anything? Commented Jul 4, 2019 at 14:51
  • It depends on your hardware, your other load, and your indexing. Commented Jul 4, 2019 at 15:04
  • Hardware includes AWS RDS server 8GB RAM, 200 GB Memory and indexes are as per the query plan. There's absolutely no other load on the system. Single query running Commented Jul 4, 2019 at 15:06
  • \d scheduling_flownode, \d xact_message are needed. I would say that target_object_id should be indexed, also your execution plan looks strange - what is (root_node AND (date_part('month'::text, created) = '6'::double precision))??? Please call explain once again, with provided query. Commented Jul 4, 2019 at 15:16
  • Updated the comment to remove datepart. Sorry for the confusion. Just been trying too many things. Also, there's an index on target_object_id already but postgres isnt using it Commented Jul 4, 2019 at 15:25

1 Answer 1

2

You didn't show what indexes you have, but I strongly recommend that your indexes will cover all of the columns that you are filtering on.

In Postgres 11 that can be done by using COVERING indexes, so for example on table scheduling_flownode you will have an index like:

CREATE INDEX ix_scheduling_flownode_target_object_id 
  ON scheduling_flownode(target_object_id) 
    INCLUDE (state, root_node);

In Postgres 10 simply include the columns in the index:

CREATE INDEX ix_scheduling_flownode_target_object_id 
  ON scheduling_flownode(target_object_id, state, root_node);

Do the same for table xact_message with templatelanguage_id and state.

Sign up to request clarification or add additional context in comments.

1 Comment

We added this query but it seems that postgres doesnt use it while querying

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.