0

I try to extract a small spatial subset (defined by a lon/lat bounding box) of data from a satellite image (VIIRS) using python/xarray. The original data provides 2D-Arrays with pixel lines and rows, and corresponding 2D-Arrays with longitude and latitude coordinates. Essentially, the data is not axis-aligned with the lon/lat-Grid and thus not with my bounding box.

This leaves me with two options:

  1. Subset the original grid to cells, such that my bounding box is included, and mask all values that are within the grid, but not within the bounding box. This approach might keep much more data than required, bloating the file size.
  2. Select only the data points that fall within the bounding box and somehow create variables of them.

I try to take the second approach, following the cf convention "Compression by Gathering" as described here: https://cfconventions.org/cf-conventions/cf-conventions.html#compression-by-gathering

I fail to do that in a manner, that Panoply would allow me to plot it as geo-referenced data

Let's take one concrete example: Within my bounding box (lon: 7 - 11°, lat: 47 - 50°), I have datapoints at 619811 locations. They are distributed over 575123 longitudes coordinates and 421930 latitude coordinates. I only select the channel I04 and the corresponding data.

In my code I have the following variables

n = 619811 # The number of observations in my bounding box.
i04 # xr.DataArray with the observations. Length n. Dimension "locations"
lat # xr.DataArray with a latitude for each observation. Length n. Dimension "locations"
lon # xr.DataArray with a longitutde for each observation. Length n. Dimension "locations"
ulat # xr.DataArray with unique and sorted latitudes. Length 421930. Dimension "latitude"
ulon # xr.DataArray with unique and sorted longitutdes. Length 575123. Dimension "longitude"
inv_lat_map # np.array with a mapping from lat[i] -> j; ulat[j] == lat[i], Length n.
inv_lon_map # np.array with a mapping from lon[i] -> j; ulon[j] == lon[i], Length n.
locations # xr.DataArray with inv_lat_map * len(ulon) + inv_lon_map; With the attribute compress="latitude longitude"

Of course, all lon/lat-Variables come with unit=degree_[north/east].

Now, I need to know how I correctly assembly xr.DataSet and the NetCDF-file, such that Panoply properly detects the variable I04 as geo-referenced variable.

I tried:

xr.DataSet(
    data_vars={
      "I04":i04,
      "locations":locations,
      "latitude":ulat,
      "longitude":ulon,
    }
)

I also tried, putting latitude and longitude, or locations, or all three of them to "coords". I also assigned I04 an attribute: "coordinates" = "latitude longitude", or coordinates = locations, or assiging the coordinates = "latitude longitude" to the locations... All to no avail! Panoply consistently refuses to treat I04 as georeferenced variable.

This leads me to the questions:

  1. Does Panoply support georeferenced variables that are "compressed by gathering"?
  2. If so - how would I properly assemble my dataset for this to work.
1
  • 1
    The underlying library that Panoply uses for a decent chunk of its I/O / conventions handling is netCDF-Java, and that does not currently support this feature of the CF conventions. It's possible Panoply has code to handle it, but if not, the netCDF-Java library won't be able to handle it under-the-hood. However, if you were able to get me an example dataset, we'd love to be able to add that support. Commented Sep 24 at 14:25

0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.