I have NetCDF files containing subdatasets with 4 dimensions: time, height, latitude, longitude.

Here is an example of the output of GDAL GetSubDatasets():

('NETCDF:"Martinique_ALT22.nc":height', '[30x30x30] height (32-bit floating-point)'), 
('NETCDF:"Martinique_ALT22.nc":latitude', '[30x30] latitude (32-bit floating-point)'), 
('NETCDF:"Martinique_ALT22.nc":longitude', '[30x30] longitude (32-bit floating-point)'), 
('NETCDF:"Martinique_ALT22.nc":Water_Vapor_Concentration', '[2x30x30x30] Water_Vapor_Concentration (32-bit floating-point)')

When I open these subdatasets in Python with gdal.Open() and ReadAsArray(), the first two dimensions (time and height) are flattened into a single axis and I get a 3-dimensional NumPy array.

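>>> from osgeo import gdal
>>> dataset = gdal.Open("Martinique_ALT22.nc")
>>> band = gdal.Open(dataset.GetSubDatasets()[-1][0])  # Water_Vapor_Concentration
>>> array = band.ReadAsArray()
>>> print(array.shape)
(60, 30, 30)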

I read somewhere that this is due to the GDAL raster format accepting only 3 dimensions: bands, rows, columns.

Is there a way to keep the first two dimensions separate and extract these subdatasets as 4-dimensional NumPy arrays?

  • I can't check whether this works without the actual file, hence just a comment: I think you can simply reshape to get 4 dimensions again, i.e. array = band.ReadAsArray().reshape((2, 30, 30, 30), order='C'). You might have to change the order from C to either F or A (docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html); I'm not sure about that. Otherwise, I would definitely consider using the netCDF4 package (unidata.github.io/netcdf4-python), which can easily handle many dimensions. Commented Mar 15, 2019 at 13:42
  • I think xarray is going to handle this case much better. Commented Mar 15, 2019 at 14:01
  • @BertCoerver The reshape() method worked for me. Strangely enough, the order option had no impact on the output. I had not worked with NetCDF files in a while and forgot about the netCDF4 library: thanks for the reminder, I am totally getting back to this library. If you want to repost your comment as an answer I would gladly accept it. Commented Mar 15, 2019 at 15:45

1 Answer

Reshaping the array returned by ReadAsArray() restores the original four dimensions (time, height, latitude, longitude):

array = band.ReadAsArray().reshape((2, 30, 30, 30))
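
To see why the reshape recovers the two leading axes, here is a minimal sketch that uses a synthetic array in place of band.ReadAsArray(). It assumes the 60 bands are stored with time as the slowest-varying index and height as the fastest, which is what NumPy's default C order expects; that ordering is an assumption to check against your own file, and the names flat, cube, n_time and n_height are made up for the illustration.

```python
import numpy as np

# Hypothetical stand-in for band.ReadAsArray(): 2 time steps x 30 heights
# flattened into 60 "bands" of 30 x 30 pixels each.
n_time, n_height, n_lat, n_lon = 2, 30, 30, 30
flat = np.arange(n_time * n_height * n_lat * n_lon, dtype=np.float32).reshape(
    n_time * n_height, n_lat, n_lon
)

# Default C (row-major) order splits the band axis into (time, height),
# with height varying fastest. This assumes the bands are ordered that way.
cube = flat.reshape((n_time, n_height, n_lat, n_lon))

# Under that assumption, band index b maps to (t, h) = divmod(b, n_height).
t, h = divmod(47, n_height)
print(cube.shape)                            # (2, 30, 30, 30)
print((t, h))                                # (1, 17)
print(np.array_equal(cube[t, h], flat[47]))  # True
```

If the values at a known (time, height) pair do not match what you expect from the file, the band ordering differs from this assumption and the axes need to be rearranged accordingly.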

As a side note, it may be worth looking into the netCDF4 package (https://unidata.github.io/netcdf4-python/) or the xarray package (http://xarray.pydata.org/en/stable/). Both can easily handle variables with many dimensions.
