Running spatial query on huge table [duplicate]

Question

I've got two tables containing geometry: one containing big polygons (regions, srid=4326) and another containing smaller polygons (pand, srid=28992, containing ~9.9 million rows).

Using one of the big polygons, I want to find all the smaller polygons within it. In the end I want the result set to be grouped by a column, but for now I just want to count the number of rows (= the number of objects within the big region).

This is the query I use at the moment:

WITH regio AS (  SELECT ST_Transform(the_geom,28992) as the_geom 
                 FROM public.regions WHERE gid = 622
              )
SELECT
   count(*) AS count
FROM public.smaller AS p
WHERE
    ST_Within ( p.geovlak, 
                (SELECT the_geom FROM regio)
              )

But somehow this query takes ages to complete. This is the Queryplan:

Aggregate  (cost=3018553.41..3018553.42 rows=1 width=0)
  CTE regio
    ->  Index Scan using regions_prim_key_gid2 on regions  (cost=0.28..8.29 rows=1 width=1458)
          Index Cond: (gid = 622)
  InitPlan 2 (returns $1)
    ->  CTE Scan on regio  (cost=0.00..0.02 rows=1 width=32)
  ->  Seq Scan on smaller p  (cost=0.00..3010265.32 rows=3311910 width=0)
        Filter: st_within(geovlak, $1)

I'm no expert in reading these query plans, but the second-to-last row says there are ~3.3 million rows instead of the ~9.9 million. So there is already some filtering in place. But
1 - what filtering is this? And
2 - I can't figure out where the problem in this query lies.

When I change count(*) to * and limit the result to 10, the query takes 99 seconds to run, so that's not fast enough either. I can understand that it takes some time, but it is still running :(

Spatial indexes:

CREATE INDEX smaller_geom_idx
  ON smaller
  USING gist
  (geovlak);

Do you have spatial indexes on both tables? How selective is the query: how many rows out of 9.9 million are within the big polygon?
– user30184, Commented May 8, 2015 at 12:16
I do have a spatial index on geovlak, but I don't have one on the bigger polygon. Is the latter needed?
– stUrb, Commented May 8, 2015 at 12:30
I don't know the number of rows within the region. That's why I want to run this query :). It's probably around 750,000. Does it matter that there is an ST_Transform on one of the spatially indexed columns?
– stUrb, Commented May 8, 2015 at 12:33
I asked in order to get a rough estimate. If the query ends up selecting almost everything, an index does not make it faster. In your case it should have an effect, even though with close to 10 percent of the rows in the result set the query is not especially selective. You can estimate the fastest possible speed by doing a bounding-box-based selection with && as described in gis.stackexchange.com/questions/131612/…. ST_Within must use the real geometries for comparison and is much slower, especially if the big polygon has a huge number of vertices.
– user30184, Commented May 8, 2015 at 12:52
Using the && operator instead of ST_Within, the numbers differ from 300k to 1.5m. The query takes as little as 10 seconds to run. So might it be better to first create a subset of the big table and use that with the ST_Within function? The big polygon does indeed have many vertices.
– stUrb, Commented May 8, 2015 at 13:12

Community · Accepted Answer · 2020-06-11 15:27:48Z

3

The question is a duplicate of the one here, but that may not be obvious, so let me explain.

ST_Within is defined as:

-- Inlines index magic
CREATE OR REPLACE FUNCTION ST_Within(geom1 geometry, geom2 geometry)
    RETURNS boolean
    AS 'SELECT $1 && $2 AND _ST_Contains($2,$1)'
    LANGUAGE 'sql' IMMUTABLE;

As you can see, the function uses &&, and because that operator can use an index, the function is fast when it is inlined. In your query the call doesn't get inlined, because sometimes the query planner just doesn't think it's worth it. The result is the Seq Scan in your plan, which evaluates st_within as a filter on every row instead of using the index.

Rewrite the query so there is no subquery as a function argument:

WITH 
  regio AS (  
   SELECT 
     ST_Transform(the_geom,28992) as the_geom 
   FROM 
     public.regions 
   WHERE
     gid = 622
  )
SELECT
   count(*) AS count
FROM
  public.smaller AS p
  JOIN regio AS r
    ON ST_Within ( p.geovlak,r.the_geom)
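
One way to check whether the rewrite took effect is to look at the plan again. This is a minimal sketch, assuming the same tables and the smaller_geom_idx index from the question: prefix the query with EXPLAIN and check whether the scan on smaller is now an index scan on smaller_geom_idx rather than the Seq Scan with an st_within filter shown in the question. The exact node names and costs depend on your PostgreSQL and PostGIS versions.

```sql
-- EXPLAIN shows the plan without actually running the query.
EXPLAIN
WITH regio AS (
  SELECT ST_Transform(the_geom, 28992) AS the_geom
  FROM public.regions
  WHERE gid = 622
)
SELECT count(*) AS count
FROM public.smaller AS p
JOIN regio AS r
  ON ST_Within(p.geovlak, r.the_geom);
```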

If the planner is still being stubborn, inline the function manually.
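
Manual inlining here means writing out the body of ST_Within yourself, following the definition quoted above. This is a sketch, not a tested query: it assumes your PostGIS version exposes _ST_Contains with the same argument order as in that definition. Functions prefixed with an underscore are internal, so this may differ between versions.

```sql
WITH regio AS (
  SELECT ST_Transform(the_geom, 28992) AS the_geom
  FROM public.regions
  WHERE gid = 622
)
SELECT count(*) AS count
FROM public.smaller AS p
JOIN regio AS r
  -- Bounding-box overlap: the part that can use the GiST index on geovlak.
  ON p.geovlak && r.the_geom
  -- Exact geometry test, applied to the candidates from the bounding-box step.
  -- Note the swapped argument order, as in the ST_Within definition.
 AND _ST_Contains(r.the_geom, p.geovlak);
```

Writing the && condition explicitly means the index-capable part of the test no longer depends on the planner inlining anything.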

edited Jun 11, 2020 at 15:27 by CommunityBot

answered May 11, 2015 at 15:05 by Jakub Kania

Please correct the link to the other question; it's currently the same as the link to the ST_Within definition.
– Toby Speight, Commented May 13, 2015 at 10:44
You're a lifesaver. The planner indeed thought it was unnecessary. The query now works rather quickly. Still not lightning fast, but hey, ~9.9 million rows; what would you expect :)
– stUrb, Commented May 21, 2015 at 9:13
