2

I have raster data where each pixel is assigned a feature id and the features form connected pixel regions. We could think of this as a rasterized map of different countries.

raster = np.array([
    [0, 1, 1, 1, 2, 2],
    [0, 1, 1, 2, 2, 3],
    [0, 1, 4, 4, 4, 3],
    [0, 0, 0, 0, 0, 0]
])

This can be turned into a GeoPandas GeoDataFrame using rasterio.features.shapes:

geo_json_shapes, feature_ids = zip(*rio.features.shapes(raster, connectivity=4))
shapes = [shapely.geometry.shape(g) for g in geo_json_shapes]
gdf = gpd.GeoDataFrame(dict(id=feature_ids, geometry=shapes))
gdf.plot(column='id', edgecolor='red')

plot of GeoDataFrame

What I would like to do from this starting point is basically the inverse of geopandas.GeoSeries.polygonize, which takes a series of LineStrings and puts them together to form Polygons. That is, I want to extract from this a list of borders between countries. For example the border between countries with ids 1 and 2 would be a LineString with coordinates (4,0), (4,1), (3,1), (3,2). Hence, counting pixels outside the raster as sixth region with id None, this would results in a series of 12 LineStrings, each of them equipped with two ids for the two countries it is the common boundary of.

Is there any function of GeoPandas or similar that can be used for this?

In the end the goal is to simplify these borders by applying some line generalisation algorithm (with parameters depending on which feature ids are involved) to the individual LineStrings and then put everything back together into a GeoDataFrame of simplified features.

1 Answer 1

2

Set the geometry to the dataframes boundary, intersect the df with itself, select rows where ids are different:

df["id"] = df["id"].astype(int)
df.geometry = df.boundary
df = gpd.overlay(df1=df, df2=df, how="intersection")
df = df.loc[df["id_1"] < df["id_2"]]

enter image description here

2
  • Thank you, this was indeed almost the solution. In order to also obtain the linework of the outer boundary, I added an additional row with geometry df.union_all().boundary to the boundary dataframe. However, many of the resulting geometries are actually MultiLineStrings. This can be fixed with a simple df.geometry = df.geometry.line_merge(), but I don't know why it is happening in the first place. Commented Sep 26, 2024 at 12:05
  • 1
    Seems to be a shapely problem: even (a := LineString(((0,0), (0,1), (1,1)))).intersection(a) results in a MultiLineString. Commented Sep 26, 2024 at 12:45

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.