Problem statement
My goal is to filter the observations table and find all points that intersect with the grid cells of my area of interest. I'm using the dbplyr package in R to write and send queries to a PostGIS database from R code. This works for the most part, but I'm having trouble with spatial filtering: I want to filter points/centroids that intersect with my area of interest.
Normally I would use the sf package for this, but it seems you can't use sf functions to query a PostGIS database directly. I checked other posts, but they only use SQL statements directly, and those solutions don't work with a lazy-table setup like mine. About lazy tables/evaluation: the most important difference between ordinary data frames and remote database queries is that your R code is translated into SQL and executed in the database on the remote server, not in R on your local machine. When working with databases, dplyr tries to be as lazy as possible: it never pulls data into R unless you explicitly ask for it. Additionally, it delays doing any work until the last possible moment: it collects together everything you want to do and then sends it to the database in one step (source: https://smithjd.github.io/sql-pet/chapter-lazy-evaluation-queries.html)
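To illustrate the laziness, here's a minimal sketch using dbplyr's simulated PostgreSQL backend, so no real database connection is needed (lazy_frame() and the column names here are purely for demonstration):

```r
library(dplyr)
library(dbplyr)

# A simulated lazy table: no database connection required
lf <- lazy_frame(x = 1, y = 2, con = simulate_postgres())

# Nothing is executed yet; the pipeline only builds up SQL
lf |>
  filter(x > 0) |>
  distinct(x, y) |>
  show_query()
# With a real connection, collect() would be the step that
# actually sends the query and pulls the result into R
```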
Code examples
I'm aware that the code below is not reproducible, but hopefully it still provides enough insight into what I'm trying to do. In my case, I first had to create two different connections to finally get the grid cells I'm interested in:
## Connection "con_va": names of the areas of interest, used to get x-y
all_grids <-
  tbl(con_va, "all_grids") |>
  filter(
    area_name == "my_area"
  ) |>
  distinct(x, y)
## Connection 2 'con_new' with the PostGIS geometry
area_of_interest <-
  tbl(con_new, dbplyr::in_schema("geodata", "gridceells")) |>
  inner_join(
    y = all_grids,
    by = join_by(am_x == x, am_y == y),
    copy = TRUE
  ) |>
  select(
    the_geom
  )
The second connection (con_new) also contains the table with all the observations, and I want to filter those observations that fall within my area of interest. The observations table has a column 'epsg28992_centroid' which is of type 'pq_gmtry'.
all_obs <-
  tbl(con_new, dbplyr::in_schema("observation", "observations")) |>
  select(
    subject_txn_id, osn_id, epsg28992_centroid
  )
As far as I know, spatial filtering of PostGIS data is currently only possible through raw SQL statements. But because 'area_of_interest' is a lazy table, it can't be referenced directly in an SQL statement (as far as I'm aware). I have tried the code below, but obviously that doesn't work:
all_obs |>
  inner_join(
    y = area_of_interest,
    by = join_by(sql("ST_Intersects(area_of_interest.the_geom, epsg28992_centroid)"))
  )
## Different approach; similar results
all_obs |>
  filter(
    sql("ST_Intersects(area_of_interest.the_geom, epsg28992_centroid)")
  )
Question
My goal is to filter the observations table and find all points that intersect with the grid cells of my area of interest. What is the appropriate way to write these types of spatial-filtering queries from R? Can I still use the lazy table I have with the area of interest? The observations table is massive, so it's not something I want (or even can) read into memory. The location tables are much smaller, something I could potentially pull into memory if needed, but I doubt that's the best way to go about it.
Comments
?st_read shows reading data from a Postgres database with an SQL query to subset.
What's a "lazy-table"? Some dbplyr-specific thing?
st_read has a 'query' parameter, but I'm not sure if that would work across databases. Or even across schemas.