Saving dictionary of numpy arrays

Question

So I have a DB with a couple of years' worth of site data. I am now attempting to use that data for analytics: plotting and sorting of advertising costs by keyword, etc.

One of the data grabs from the DB takes minutes to complete. While I could spend some time optimizing the SQL statements I use to get the data, I'd prefer to simply leave that class and its SQL alone, grab the data, and save the results to a data file for faster retrieval later. Most of this DB data isn't going to change, so I could write a separate Python script to update the file every 24 hours and then use that file for this long-running task.

The data is being returned as a dictionary of numpy arrays. When I use numpy.save('data', data) the file is saved just fine. When I use data2 = numpy.load('data.npy') it loads the file without error. However, the output data2 does not equal the original data.

Specifically, the line data == data2 returns False. Additionally, if I use the following:

for key, key_data in data.items():
  print(key)

it works. But when I replace data.items() with data2.items() then I get an error:

AttributeError: 'numpy.ndarray' object has no attribute 'items'

Using type(data) I get dict. Using type(data2) I get numpy.ndarray.

So how do I fix this? I want the loaded data to equal the data I passed in for saving. Is there an argument to numpy.save to fix this or do I need some form of simple reformatting function to reformat the loaded data into the proper structure?

Attempts to get into the ndarray via for loops or indexing all lead to errors about indexing a 0-d array. Casting with dict(data2) also fails, with an error about iterating over a 0-d array. However, Spyder shows the value of the array and it includes the data I saved. I just can't figure out how to get to it.

If I need to reformat the loaded data I'd appreciate some example code on how to do this.

Note to future self: np.savez('arrs', **my_dict) is simpler to use than np.save(my_dict) for a flat dict of arrays, because the result of np.load('arrs.npz') can be directly indexed like a Python dict.
– Yibo Yang, Commented Dec 8, 2019 at 6:03
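
A minimal sketch of the np.savez approach from the comment above. The dictionary contents and the temporary file path are made up for illustration; only np.savez and np.load come from the comment. It assumes a flat dict whose keys are strings and whose values are arrays.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the dictionary returned by the DB query.
my_dict = {"clicks": np.arange(5), "cost": np.linspace(0.0, 1.0, 4)}

# Write to a temporary directory so the example does not clutter the cwd.
path = os.path.join(tempfile.mkdtemp(), "arrs.npz")

# Each dictionary key becomes the name of one array inside the archive.
np.savez(path, **my_dict)

# np.load returns an NpzFile, which can be indexed by key like a dict.
with np.load(path) as archive:
    loaded = {key: archive[key] for key in archive.files}

print(sorted(loaded))
```

No pickling is involved here, so nothing has to be unwrapped after loading.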

hpaulj · Accepted Answer · 2015-06-12 21:39:02Z

65

Let's look at a small example:

In [819]: N
Out[819]: 
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])

In [820]: data={'N':N}

In [821]: np.save('temp.npy',data)

In [822]: data2=np.load('temp.npy')

In [823]: data2
Out[823]: 
array({'N': array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])}, dtype=object)

np.save is designed to save numpy arrays. data is a dictionary, so np.save wrapped it in an object array and used pickle to save that object. Your data2 probably has the same character.

You get at the array with:

In [826]: data2[()]['N']
Out[826]: 
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])
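
For anyone running this on a recent NumPy: as far as I know, np.load no longer unpickles object arrays by default, so the load step needs allow_pickle=True. Here is a sketch of the full round trip under that assumption. The temporary path is only for illustration, and pickled files should only be loaded from sources you trust.

```python
import os
import tempfile

import numpy as np

N = np.arange(12, dtype=float).reshape(3, 4)
data = {"N": N}

path = os.path.join(tempfile.mkdtemp(), "temp.npy")

# The dict is wrapped in a 0-d object array and pickled into the file.
np.save(path, data)

# Recent NumPy versions refuse to unpickle unless this flag is passed.
wrapped = np.load(path, allow_pickle=True)
print(wrapped.shape, wrapped.dtype)

# Indexing with an empty tuple unwraps the single element: a plain dict.
data2 = wrapped[()]
print(type(data2))
```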

edited Jun 12, 2015 at 21:39

answered Jun 12, 2015 at 21:10

hpaulj


6 Comments

Gabe Spradlin Over a year ago

Would never have occurred to me to use [()]. I don't think I've ever seen an index like that in any other language I've used.

TNT Over a year ago

If you only save a dictionary of arrays, and numpy.save uses pickle anyways, you might as well use pickle directly?

hpaulj Over a year ago

np.save is the most direct and compact way to save an array. For a dictionary of arrays, I'd prefer np.savez. For a dictionary, which might have stuff besides arrays, np.save as used here, or pickle are probably equivalent. Funny thing is, I'm having trouble running pickle (without reading the docs) to make a comparison.

hpaulj Over a year ago

Indexing takes a tuple, one value for each dimension of the array. For a 0d array that means you can index with an empty tuple. It works just like [(1,2)], [1,2], just for 0d arrays.

blue Over a year ago

Instead of the weird [()] notation (which corresponds to indexing with an empty tuple, I guess), numpy offers a method that does exactly the same thing: item().
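
To make the last two comments concrete, here is a small sketch of tuple indexing on arrays of different dimensionality. The arrays are invented for the example; it only illustrates that a 0-d array takes an empty tuple as its index and that item() gets at the same element.

```python
import numpy as np

scalar = np.array(3.5)              # a 0-d array: its shape is ()
grid = np.arange(6).reshape(2, 3)   # a 2-d array

# One index per dimension: two for a 2-d array, none for a 0-d array.
two_d_value = grid[(1, 2)]          # same element as grid[1, 2]
zero_d_value = scalar[()]

# A 0-d object array holding a dict, like the one np.load hands back
# after np.save was given a dictionary.
wrapped = np.array({"a": np.arange(3)}, dtype=object)
via_index = wrapped[()]
via_item = wrapped.item()

print(type(via_index), type(via_item))
```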


SeF · Answer · 2022-02-18 14:42:31Z

When saving a dictionary with numpy, the dictionary is encoded into an array. To have what you need, you can do as in this example:

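import numpy as np

my_dict = {'a': np.array(range(3)), 'b': np.array(range(4))}

np.save('my_dict.npy', my_dict)

# allow_pickle=True is required by recent NumPy versions to load object arrays
my_dict_back = np.load('my_dict.npy', allow_pickle=True)

# .item() unwraps the 0-d object array back into the original dict
print(my_dict_back.item().keys())
print(my_dict_back.item().get('a'))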

So you are probably missing .item() for the reloaded dictionary. Check this out:

for key, key_d in data2.item().items():
    print(key, key_d)

The comparison my_dict == my_dict_back.item() works only for dictionaries that do not have lists or arrays in their values, because comparing arrays with == is elementwise and does not produce a single True or False.
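
If you do need to check that a dict of arrays survived the round trip, one option is to compare key by key with np.array_equal. This is only a sketch: dicts_of_arrays_equal is a hypothetical helper name, not something numpy provides, and it assumes every value is an array or array-like.

```python
import numpy as np


def dicts_of_arrays_equal(a, b):
    """Return True if both dicts have the same keys and equal arrays."""
    if a.keys() != b.keys():
        return False
    # np.array_equal returns one bool per pair, also checking the shapes.
    return all(np.array_equal(a[key], b[key]) for key in a)


my_dict = {"a": np.arange(3), "b": np.arange(4)}
same = {"a": np.arange(3), "b": np.arange(4)}
different = {"a": np.arange(3), "b": np.zeros(4)}

print(dicts_of_arrays_equal(my_dict, same))
print(dicts_of_arrays_equal(my_dict, different))
```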

EDIT: for the item() issue mentioned above, I think it is a better option to save dictionaries with the library pickle rather than with numpy.
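
A minimal sketch of the pickle route, assuming the same kind of dictionary as above. The file name and temporary directory are placeholders. The object that comes back is already a dict, so no .item() is needed. As with any pickle, only load files you trust.

```python
import os
import pickle
import tempfile

import numpy as np

my_dict = {"a": np.arange(3), "b": np.arange(4)}
path = os.path.join(tempfile.mkdtemp(), "my_dict.pkl")

# Pickle files must be opened in binary mode.
with open(path, "wb") as fh:
    pickle.dump(my_dict, fh, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as fh:
    my_dict_back = pickle.load(fh)

print(type(my_dict_back), sorted(my_dict_back))
```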

SECOND EDIT: if you are not happy with pickle, and all the types in the dictionary are JSON-compatible, json is an option as well.
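
A sketch of the json option. numpy arrays are not JSON-serializable as they are, so this assumes it is acceptable to convert each array to a nested list with tolist() on the way out and back with np.asarray on the way in. The file name is a placeholder, and the original dtype is not stored in the file, so it may not come back exactly as it went in.

```python
import json
import os
import tempfile

import numpy as np

my_dict = {"a": np.arange(3), "b": np.arange(4)}
path = os.path.join(tempfile.mkdtemp(), "my_dict.json")

# Convert every array to a plain (nested) list before dumping.
with open(path, "w") as fh:
    json.dump({key: value.tolist() for key, value in my_dict.items()}, fh)

# Rebuild arrays from the lists after loading.
with open(path) as fh:
    my_dict_back = {key: np.asarray(value) for key, value in json.load(fh).items()}

print(sorted(my_dict_back))
```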

Ben Usman · Answer · 2016-06-27 21:38:28Z

7

I really liked deepdish (it saves them in HDF5 format):

>>> import numpy as np
>>> import deepdish as dd
>>> d = {'foo': np.arange(10), 'bar': np.ones((5, 4, 3))}
>>> dd.io.save('test.h5', d)

$ ddls test.h5
/bar                       array (5, 4, 3) [float64]
/foo                       array (10,) [int64]

>>> d = dd.io.load('test.h5')

In my experience, though, it seems to be partially broken for large datasets :(

edited Jun 27, 2016 at 21:38

answered Jun 16, 2016 at 19:49

Ben Usman


2 Comments

Hannes Landeholm Over a year ago

In which way is it broken for large datasets?

Ben Usman Over a year ago

I encountered some issues with saving extremely large lists of objects
