0

I had a shapefile that I needed to integrate into my SQL Server DB. To do so, I used the very useful Shape2Sql.exe with which I chose 'Set SRID 4326' in the settings.

I now have a table that contains a geometry column while another table is showing to me the latitude and longitude of different addresses along with other useful information. This can be summed up as follows:

CREATE TABLE dbo.myTable1 (
       FULL_ADDRESS varchar(120) NULL,
       LONGITUDE varchar(50) NULL,
       LATITUDE varchar(50) NULL
                           )

CREATE TABLE dbo.myTable2 (
       POLYGON_NAME nvarchar(255) NULL,
       geom geometry NULL
                           )

What I would like is to join the two tables based on the sets of coordinates that match a certain polygon. I took some inspiration from here and I basically did this:

SELECT DISTINCT
      mt1.FULL_ADDRESS
    , mt1.LONGITUDE
    , mt1.LATITUDE
    , mt2.POLYGON_NAME
INTO #TEMP_TABLE
FROM myTable1 mt1
LEFT JOIN myTable2 mt2 
  ON geom.STIntersects(geometry::STGeomFromWKB((geography::Point(mt2.LATITUDE, mt2.LONGITUDE, 4326)).STAsBinary(), 4326)) = 1

(the DISTINCT is useful because, for some reason that I can't explain yet, I get several matches for a single set)

The results thereof are exactly what I expect but, for 10,000 sets of coordinates, it almost took 3 hours. Problem is, I'm supposed to find the polygons of 25,000,000 sets (which means 300 days of computation...). I therefore need to find a trick to improve that query performance by several orders of magnitude.

BONUS QUESTION:

When I don't store the results of my query inside a temp table (i.e. when I want to see the results), I get the following error message:

A .NET Framework error occurred during execution of user-defined routine or aggregate "No extended information available.": .

But everything is fine when using a SELECT... INTO #TEMP_TABLE instead of a plain and simple SELECT. I couldn't find any piece of information about this message. Any idea to solve it?

1 Answer 1

2

The best way to increase spatial join is to create spatial index.

CREATE INDEX table_gist ON table USING GIST(geom);

Since your first table has coordinates but no geometry you can't create a spatial index. That's why it takes so much time.

You should consider adding a geometry column on your first table based on your coordinate.

ALTER TABLE mytable1
ADD COLUMN geom geometry;

UPDATE mytable1
SET geom=ST_MakePoint(longitude, latitude);

Then you can create an index on mytable1. Your ST_Intersects between your two geometries will be much faster.

PS : I haven't talk about geography/geometry and srid but you should handle it :)

6
  • I'm using SQL Server, which means I wrote the following: ALTER TABLE mytable1 ADD geom geometry; UPDATE mytable1 SET geom=geometry::STGeomFromWKB((geography::Point(LATITUDE, LONGITUDE, 4326)).STAsBinary(), 4326); CREATE NONCLUSTERED INDEX table_gist ON mytable1(geom);. But I got the following message: Column 'geom' in table 'mytable1' is of a type that is invalid for use as a key column in an index or statistics. Commented Oct 19, 2018 at 9:04
  • Obviously on SQL server spatial index are created as following : CREATE SPATIAL INDEX table_index ON table(geom); Commented Oct 19, 2018 at 10:03
  • Yep sorry, I didn't know spatial indexes were a thing in SQL Server. I ended up writing CREATE SPATIAL INDEX table_gist ON myTable1(geom) WITH (BOUNDING_BOX = (minLong, minLat, maxLong, maxLat)) ON [PRIMARY]; and the same query took 2h30 instead of the initial 3h. That's an improvement but not quite sufficient to be fully satisfactory :( Commented Oct 22, 2018 at 13:25
  • Have you created an index on both tables ? Commented Oct 23, 2018 at 9:03
  • 1
    I can confirm the perf was improved by several orders of magnitude, which is exactly what I wanted. Thank you so much! Commented Oct 27, 2018 at 13:19

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.