table structure
Column | Type | Collation | Nullable | Default
------------+--------------------------------+-----------+----------+------------------------------------
id | integer | | not null | nextval('events_id_seq'::regclass)
event_name | character varying(255) | | not null |
data | jsonb | | not null |
Indexes:
"events_pkey" PRIMARY KEY, btree (id)
"events_expr_idx" btree ((data -> 'program_id'::text))
"idxginp" gin (data)
what 'data' might look like
{
"label":"Take Survey Button",
"sent_at":"2018-07-25",
"user_id":123456,
"order_id":7654321,
"event_time":"2018-07-25 00:09:41",
"program_id":4,
"user_agent":"Mozilla/5.0 ...",
"destination":"http://www.google.com",
"communication_kind":"email",
"communication_name":"welcome"
}
Query that I need to run:
select label, count(*) as clicks
from (select distinct "data" ->> 'label' as "label", "data" ->> 'user_id' as "user_id"
from "events"
where "event_name" = 'communication_link_clicked'
and data @> '{"communication_kind": "email"}'
and data @> '{"communication_name": "program_welcome"}'
and CAST ((data ->> 'sent_at') as date ) between '2018-07-25' and '2019-05-18'
and CAST ((data ->> 'program_id') as int) = 4) as sub
group by "label";
This query on my small VM takes around 1 minute.
Records on this table: ~4 million. Records where data ->> program_id = 4 (~1 million.)
I've tried creating individual indexes for the columns included in the where condition but I din't see major improvements.
Should I convert the json attributes used in the WHERE clause to recular columns? or Am I just missing more indexes and tune my query?
EXPLAIN:
EXPLAIN (ANALYZE, BUFFERS) select label, count(*) as clicks
from (select distinct "data" ->> 'label' as "label", "data" ->> 'user_id' as "user_id"
from "events"
where "event_name" = 'communication_link_clicked'
and data @> '{"communication_kind": "email"}'
and data @> '{"communication_name": "program_welcome"}'
and CAST ((data ->> 'sent_at') as date ) between '2018-07-25' and '2019-05-18'
and CAST ((data ->> 'program_id') as int) = 4) as sub
group by "label";
GroupAggregate (cost=104.24..104.28 rows=1 width=40) (actual time=43735.844..43735.844 rows=1 loops=1)
Group Key: ((events.data ->> 'label'::text))
Buffers: shared hit=839 read=143489, temp read=210 written=210
-> Unique (cost=104.24..104.25 rows=1 width=64) (actual time=43719.789..43731.616 rows=46772 loops=1)
Buffers: shared hit=839 read=143489, temp read=210 written=210
-> Sort (cost=104.24..104.25 rows=1 width=64) (actual time=43719.787..43724.865 rows=55219 loops=1)
Sort Key: ((events.data ->> 'label'::text)), ((events.data ->> 'user_id'::text))
Sort Method: external merge Disk: 1680kB
Buffers: shared hit=839 read=143489, temp read=210 written=210
-> Bitmap Heap Scan on events (cost=84.05..104.23 rows=1 width=64) (actual time=448.903..43328.010 rows=55219 loops=1)
Recheck Cond: ((data @> '{"communication_kind": "email"}'::jsonb) AND (data @> '{"communication_name": "program_welcome"}'::jsonb))
Rows Removed by Index Recheck: 1434250
Filter: (((event_name)::text = 'communication_link_clicked'::text) AND (((data ->> 'sent_at'::text))::date >= '2018-07-25'::date) AND (((data ->> 'sent_at'::text))::date > 'program_id'::text))::integer = 4))
Rows Removed by Filter: 264849
Heap Blocks: exact=42839 lossy=99688
Buffers: shared hit=839 read=143489
-> Bitmap Index Scan on idxginp (cost=0.00..84.05 rows=5 width=0) (actual time=437.235..437.235 rows=320068 loops=1)
Index Cond: ((data @> '{"communication_kind": "email"}'::jsonb) AND (data @> '{"communication_name": "program_welcome"}'::jsonb))
Buffers: shared hit=839 read=962
Planning time: 0.890 ms
Execution time: 43736.182 ms