I'm using h5py to store the outputs of a large number of simulations. These simulations are parametrized, so I also need to store which parameters were used for each simulation output.

At first, I wanted to give each simulation a codename and keep a table somewhere that maps codenames to parameter values, but then I realized that as the number of simulations grows, lookups in that table would quickly become expensive.

I then thought about creating a group named after a concatenation of all my parameter values, but that doesn't really address my need to retrieve the parameter values quickly: I would still have to parse the group's name and work out which value belongs to which parameter. I am therefore now contemplating creating one level of groups per parameter in the HDF5 structure, so that I can read off the parameter values directly when accessing the simulated time series. That would require an ungodly number of subgroups, though, since my parameters take essentially real values (up to floating-point precision, of course).

Does that last proposal sound reasonable (it doesn't to me, but maybe it's not as bad as I imagine), or are there better ways to do this, some good practices I'm unaware of that would address my problem?

Thanks in advance!

1 Answer

I suggest saving the parameters as attributes at the file (root group) level. That way, anyone who opens the file can easily retrieve the parameters and their values. Here is a simple example that shows how to create the attributes:

import h5py
import numpy as np

# Attributes set on the File object are stored on the root group
with h5py.File('sim_file.h5', 'w') as h5f:
    h5f.attrs['param1'] = 10 # an int
    h5f.attrs['param2'] = 125.25 # a float
    h5f.attrs['param3'] = 'Average' # a string
    h5f.attrs['param4'] = np.array([0.25, 1.5, 0.75]) # an array

You retrieve the values in a similar way.

import h5py

with h5py.File('sim_file.h5', 'r') as h5f:
    param1 = h5f.attrs['param1']
    param2 = h5f.attrs['param2']
    param3 = h5f.attrs['param3']
    param4 = h5f.attrs['param4']
    # or, using keys to loop and access:
    for k in h5f.attrs.keys():
        print(f'Param {k} = {h5f.attrs[k]}')

Here is a previous answer that shows how to create and retrieve attributes for all object types (file, group, and dataset): How to read HDF5 attributes (metadata) with Python and h5py
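Since attributes also work on groups, the same idea can be applied when several simulations share one file. Below is a minimal sketch of that variant, not a definitive layout: the file name all_sims.h5, the group names sim_0001 and sim_0002, the parameter names alpha and beta, and the helper find_runs are all hypothetical, and the zero-filled dataset only stands in for real output.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical parameter sets; names and values are illustrative only.
runs = {
    "sim_0001": {"alpha": 0.25, "beta": 1.5},
    "sim_0002": {"alpha": 0.75, "beta": 1.5},
}

# Write to a temporary directory so the sketch does not clobber real files.
path = os.path.join(tempfile.mkdtemp(), "all_sims.h5")

with h5py.File(path, "w") as h5f:
    for name, params in runs.items():
        grp = h5f.create_group(name)
        # Parameters are stored as attributes on the group, not in its name.
        for key, value in params.items():
            grp.attrs[key] = value
        # Placeholder time series standing in for the real simulation output.
        grp.create_dataset("timeseries", data=np.zeros(10))


def find_runs(filename, **wanted):
    """Return names of top-level objects whose attributes match all of `wanted`."""
    matches = []
    with h5py.File(filename, "r") as h5f:
        for name, grp in h5f.items():
            # np.isclose avoids exact equality tests on floating-point values.
            if all(
                key in grp.attrs and np.isclose(grp.attrs[key], value)
                for key, value in wanted.items()
            ):
                matches.append(name)
    return matches


matching = find_runs(path, alpha=0.75, beta=1.5)
print(matching)
```

Note that this lookup visits every top-level group, so its cost grows linearly with the number of simulations in the file.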

2 Comments

Amazing, this is exactly the type of thing I was looking for, thank you so much! Now I just need to find a proper way to name the sim_file, haha, but I'll find a way since I can omit the parameters now!
A simple naming convention is to use the basename of the input file for all output files. For example, if the input is simulation_1.inp, the output H5 file would be simulation_1.h5.
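That naming convention could be sketched with the standard library's pathlib as follows. The helper name output_name is hypothetical, and the sketch assumes the input file name carries a single extension such as .inp.

```python
from pathlib import Path


def output_name(input_file):
    """Return the input path with its extension swapped for .h5."""
    return Path(input_file).with_suffix(".h5")


out = output_name("simulation_1.inp")
print(out)
```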