0

Assuming there is a raster file that contains multiple bands, is there a simple way to extract the pixel value for each band and store this information in a dataframe for later use in machine learning modeling?

2
  • What do you want the dataframe to look like? One row for each pixel and one column for each band value? Commented Apr 2 at 13:20
  • Exactly. The dataframe should have pixels as rows and bands as columns. For example, a raster of shape [3, 2, 2] (3 bands over a 2 x 2 extent) has 4 pixels with 3 band values each, so it would become a dataframe of shape [4, 3]. Commented Apr 2 at 13:40

1 Answer

0

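One solution is to combine rioxarray (raster handling) with GeoPandas (dataframe handling). The function below reads the raster as a (band, y, x) array and writes each band to its own column, giving one row per pixel.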

import geopandas as gpd
import numpy as np
import rioxarray

def raster_to_gdf(r_path):
    # Read in the raster as a DataArray shaped (band, y, x)
    raster = rioxarray.open_rasterio(r_path)

    # Create an empty gdf with one row per pixel
    gdf = gpd.GeoDataFrame(index=np.arange(raster.shape[1] * raster.shape[2]))

    # Add each band's values as a column (pixels flattened row by row)
    for band in range(raster.shape[0]):
        gdf[band] = raster[band].values.flatten()

    return gdf
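
If no geometry column is needed, the same reshaping can be done with NumPy and pandas alone. The following is only a sketch under a few assumptions: the band values are already in memory as a (bands, rows, cols) array (for example from `raster.values`), and the function name `bands_to_dataframe`, the `band_N` column names, and the extra `row`/`col` columns are hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd


def bands_to_dataframe(arr, band_names=None):
    """Flatten a (bands, rows, cols) array into one row per pixel."""
    arr = np.asarray(arr)
    if arr.ndim != 3:
        raise ValueError("expected an array shaped (bands, rows, cols)")
    n_bands, n_rows, n_cols = arr.shape
    if band_names is None:
        # Hypothetical default names: band_1, band_2, ...
        band_names = [f"band_{i + 1}" for i in range(n_bands)]

    # Each band becomes one column; pixels are flattened in row-major order.
    table = arr.reshape(n_bands, n_rows * n_cols).T
    df = pd.DataFrame(table, columns=band_names)

    # Keep the pixel position so model output can be mapped back to the grid.
    rows, cols = np.divmod(np.arange(n_rows * n_cols), n_cols)
    df.insert(0, "row", rows)
    df.insert(1, "col", cols)
    return df


# Synthetic 3-band, 2 x 2 raster like the one described in the comments.
demo = np.arange(12).reshape(3, 2, 2)
df = bands_to_dataframe(demo)
```

Here `df` has 4 rows (one per pixel) and 5 columns: the two position columns plus one column per band. Dropping `row` and `col` leaves the [4, 3] feature table described in the question.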
