
I am new to database optimisation. My table has around 29 million rows; when I run the query below in pgAdmin it takes 9 seconds. What can I do to improve performance?

SELECT 
  F."Id", 
  F."Name", 
  F."Url", 
  F."CountryModel", 
  F."RegionModel", 
  F."CityModel", 
  F."Street", 
  F."Phone", 
  F."PostCode", 
  F."Images", 
  F."Rank", 
  F."CommentCount", 
  F."PageRank", 
  F."Description", 
  F."Properties", 
  F."IsVerify", 
  count(*) AS Counter 
FROM 
  public."Firms" F, 
  LATERAL unnest(F."Properties") AS P 
WHERE 
  F."CountryId" = 1 
  AND F."RegionId" = 7 
  AND F."CityId" = 4365 
  AND P = ANY (ARRAY[126, 128]) 
  AND F."Deleted" = FALSE 
GROUP BY 
  F."Id" 
ORDER BY 
  Counter DESC, 
  F."IsVerify" DESC, 
  F."PageRank" DESC 
OFFSET 10 ROWS FETCH FIRST 20 ROWS ONLY

That's my query plan:

  ->  Sort  (cost=11945.20..11948.15 rows=1178 width=369) (actual time=8981.514..8981.515 rows=30 loops=1)
        Sort Key: (count(*)) DESC, f."IsVerify" DESC, f."PageRank" DESC
        Sort Method: top-N heapsort  Memory: 58kB
        ->  HashAggregate  (cost=11898.63..11910.41 rows=1178 width=369) (actual time=8981.234..8981.305 rows=309 loops=1)
              Group Key: f."Id"
              Batches: 1  Memory Usage: 577kB
              ->  Nested Loop  (cost=7050.07..11886.85 rows=2356 width=360) (actual time=79.408..8980.167 rows=322 loops=1)
                    ->  Bitmap Heap Scan on "Firms" f  (cost=7050.06..11716.04 rows=1178 width=360) (actual time=78.414..8909.649 rows=56071 loops=1)
                          Recheck Cond: (("CityId" = 4365) AND ("RegionId" = 7))
                          Filter: ((NOT "Deleted") AND ("CountryId" = 1))
                          Heap Blocks: exact=55330
                          ->  BitmapAnd  (cost=7050.06..7050.06 rows=1178 width=0) (actual time=70.947..70.947 rows=0 loops=1)
                                ->  Bitmap Index Scan on "IX_Firms_CityId"  (cost=0.00..635.62 rows=58025 width=0) (actual time=11.563..11.563 rows=56072 loops=1)
                                      Index Cond: ("CityId" = 4365)
                                ->  Bitmap Index Scan on "IX_Firms_RegionId"  (cost=0.00..6413.60 rows=588955 width=0) (actual time=57.795..57.795 rows=598278 loops=1)
                                      Index Cond: ("RegionId" = 7)
                    ->  Function Scan on unnest p  (cost=0.00..0.13 rows=2 width=0) (actual time=0.001..0.001 rows=0 loops=56071)
                          Filter: (p = ANY ('{126,128}'::integer[]))
                          Rows Removed by Filter: 2
Planning Time: 0.351 ms
Execution Time: 8981.725 ms
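(For future tuning: a buffers-annotated plan shows whether the time goes to heap I/O or CPU. The `Heap Blocks: exact=55330` line above suggests most of the 9 s is spent fetching heap pages. A minimal sketch, assuming PostgreSQL 9.0 or later:

```sql
-- BUFFERS adds shared-block read/hit counts to each plan node,
-- making it easy to see which node is doing the disk reads.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- the query above
```
)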
    The unnest is probably unnecessary but without knowing what exactly you are trying to achieve there this is really hard to tell. What exactly is the data type of the column "Firms."."Properties"? Commented Apr 9, 2022 at 11:23
  • datatype is integer[] Commented Apr 9, 2022 at 15:05
  • Are the predicates on country and region necessary? Commented Apr 9, 2022 at 16:26
  • The query slowed down even further without the country and region predicates :( Commented Apr 9, 2022 at 22:13
  • Why the unnest()? What are you trying to achieve there? Commented Apr 10, 2022 at 10:22

1 Answer


Create a GIN index on F."Properties":

create index on "Firms" using gin ("Properties");

then add a clause to your WHERE:

...
AND P = ANY (ARRAY[126, 128]) 
AND "Properties" && ARRAY[126, 128]
...

That added clause is redundant with the one preceding it, but the planner is not smart enough to reason that out, so you need to make it explicit.
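Putting it together, the WHERE clause would look like this (a sketch; note the `P = ANY (...)` condition is kept, not replaced, so the count stays correct):

```sql
-- The "Properties" && ARRAY[...] overlap test matches the GIN index,
-- letting the planner narrow rows by array contents before unnesting.
WHERE
  F."CountryId" = 1
  AND F."RegionId" = 7
  AND F."CityId" = 4365
  AND P = ANY (ARRAY[126, 128])              -- original condition, kept
  AND F."Properties" && ARRAY[126, 128]      -- added, index-friendly
  AND F."Deleted" = FALSE
```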

Sign up to request clarification or add additional context in comments.

5 Comments

Thanks for the answer. This method sped up my query a lot, but now the most relevant rows do not appear at the top when ordering by Counter.
I don't understand that; it should not change the output at all. Did you just add the new condition, or did you replace the P=ANY... condition with the new one? If you replaced it, then yes, the count will no longer be the same. That is why you need to add, not replace.
Now the query is running fast as it is, but when I tried it without the region and city predicates, it slowed down. How should I edit it?
Without the condition on city, you essentially have an entirely different question. Maybe you could post it as a new question with new query plans. I'm certainly not surprised it would slow down as it should be returning vastly more rows, but without the GIN index it would probably also slow down if you dropped city condition, perhaps by even more.
I tried opening it as a different question, but it was closed as a duplicate :(
