I have a Postgres table ("dist_mx") that stores the distances between pairs of points in geographic space. The points are identified by the "hex_0" and "hex_1" columns. The table will eventually have 10^7 to 10^8 rows. The table is structured as such:
One purpose of this table is to find the shortest distance from each point in a list (thousands of points) to the points that correspond to locations of interest. For example, I want to know the shortest distance from each point to the nearest grocery store (we know which point ID each grocery store corresponds to).
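The goal described above (the nearest location of interest for each of thousands of points) can usually be written as one set-based query instead of one subquery per point. The sketch below uses Python's built-in sqlite3 module as a stand-in so it runs without a Postgres server. The SQL itself is plain enough to run on Postgres too. It assumes the column names used in the query (point_id_0, point_id_1, distances) rather than hex_0/hex_1. The helper tables origins and pois, and all of the data, are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the Postgres table; column names follow the
# query in the post (point_id_0, point_id_1, distances). Data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dist_mx (point_id_0 TEXT, point_id_1 TEXT, distances REAL);
    -- Hypothetical helper tables: the origin points and the points of interest.
    CREATE TEMP TABLE origins (point_id TEXT PRIMARY KEY);
    CREATE TEMP TABLE pois (point_id TEXT PRIMARY KEY);
""")
conn.executemany("INSERT INTO dist_mx VALUES (?, ?, ?)", [
    ("a", "s1", 5.0), ("s2", "a", 3.0),   # pair order is arbitrary
    ("b", "s1", 1.0), ("b", "s2", 4.0),
    ("c", "x", 0.5),                      # x is not a point of interest
])
conn.executemany("INSERT INTO origins VALUES (?)", [("a",), ("b",), ("c",)])
conn.executemany("INSERT INTO pois VALUES (?)", [("s1",), ("s2",)])

# One statement for every origin: match each stored pair in both directions
# (a UNION ALL of two joins replaces the OR), then take the minimum per origin.
NEAREST_SQL = """
    SELECT origin, MIN(distances) AS min_dist
    FROM (
        SELECT d.point_id_0 AS origin, d.distances
        FROM dist_mx d
        JOIN origins o ON o.point_id = d.point_id_0
        JOIN pois p    ON p.point_id = d.point_id_1
        UNION ALL
        SELECT d.point_id_1 AS origin, d.distances
        FROM dist_mx d
        JOIN origins o ON o.point_id = d.point_id_1
        JOIN pois p    ON p.point_id = d.point_id_0
    ) AS both_directions
    GROUP BY origin
    ORDER BY origin
"""
rows = conn.execute(NEAREST_SQL).fetchall()
print(rows)  # [('a', 3.0), ('b', 1.0)]
```

Points with no stored pair to any point of interest (c here) drop out of the result. An outer join from origins could keep them with a NULL distance if that matters.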
I'm running the query as one SELECT per point, combined with UNION. The OR is needed because the order of the points within a pair is arbitrary (i.e., each pair is stored only once, not repeated in reverse order). See below:
```sql
-- one SELECT per origin point; the OR matches the pair in either column order
SELECT MIN(distances) FROM dist_mx
WHERE ((point_id_0 = '8829abb139fffff' AND point_id_1 IN ('8829abb555fffff', ...))
   OR (point_id_1 = '8829abb139fffff' AND point_id_0 IN ('8829abb555fffff', ...)))
UNION
SELECT MIN(distances) FROM dist_mx
WHERE ((point_id_0 = '8829abb469fffff' AND point_id_1 IN ('8829abb555fffff', ...))
   OR (point_id_1 = '8829abb469fffff' AND point_id_0 IN ('8829abb555fffff', ...)))
...
```
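UNION has one behaviour worth knowing here: it removes duplicate rows, and each branch of the query above returns only a bare MIN. If two points happen to have the same shortest distance, UNION returns a single row for both, and no row says which point it belongs to. The minimal sketch below uses SQLite as a stand-in with made-up IDs. It compares plain UNION with UNION ALL plus a point-ID column. The helpers branch() and run() are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dist_mx (point_id_0 TEXT, point_id_1 TEXT, distances REAL)")
# Made-up data: p1 and p2 are both exactly 2.0 from their nearest store.
conn.executemany("INSERT INTO dist_mx VALUES (?, ?, ?)", [
    ("p1", "s1", 2.0), ("s1", "p2", 2.0), ("p2", "s2", 7.0),
])
STORES = ("s1", "s2")

def branch(point, labelled):
    """Hypothetical helper: one SELECT per origin point, shaped like the query above."""
    label = "? AS point_id, " if labelled else ""
    sql = (f"SELECT {label}MIN(distances) FROM dist_mx "
           "WHERE (point_id_0 = ? AND point_id_1 IN (?, ?)) "
           "OR (point_id_1 = ? AND point_id_0 IN (?, ?))")
    # Values are bound as parameters instead of being pasted into the SQL text.
    params = ([point] if labelled else []) + [point, *STORES, point, *STORES]
    return sql, params

def run(points, labelled, op):
    """Hypothetical helper: join the per-point SELECTs with UNION or UNION ALL."""
    parts = [branch(p, labelled) for p in points]
    sql = f" {op} ".join(s for s, _ in parts)
    params = [x for _, ps in parts for x in ps]
    return conn.execute(sql, params).fetchall()

plain = run(["p1", "p2"], labelled=False, op="UNION")
tagged = run(["p1", "p2"], labelled=True, op="UNION ALL")
print(plain)           # [(2.0,)]
print(sorted(tagged))  # [('p1', 2.0), ('p2', 2.0)]
```

UNION ALL also skips the duplicate-elimination step, which is wasted work when every branch already returns a single row.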
The query seems to be working as intended, but it is slow: it takes 20 minutes to run for a list of ~4,500 points. I have tried chunking the query so that each statement contains only 500 subqueries (connected by UNION), but this does not significantly change performance.
I'm relatively new to Postgres, so I'm hoping there is a fairly simple speedup (or, failing that, a not-so-simple one).


`\d dist_mx` in psql will do it.