
I am using numpy.loadtxt to generate a structured NumPy array from a CSV data file, which I would like to save to a MAT file for colleagues who are more familiar with MATLAB than Python.
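
For context, the array comes from a call along these lines (the file name and dtype here are just placeholders, not my real data):

import numpy as np

# Passing a structured dtype to loadtxt yields a 1-D structured array
# with one named field per CSV column.
mydata = np.loadtxt('data.csv', delimiter=',',
                    dtype=[('foo', 'i'), ('bar', 'f')])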

Sample case:

import numpy as np
import scipy.io

mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])
scipy.io.savemat('test.mat', mydata)

When I attempt to use scipy.io.savemat on this array, the following error is thrown:

Traceback (most recent call last):
  File "C:/Project Data/General Python/test.py", line 6, in <module>
    scipy.io.savemat('test.mat', mydata)
  File "C:\python35\lib\site-packages\scipy\io\matlab\mio.py", line 210, in savemat
    MW.put_variables(mdict)
  File "C:\python35\lib\site-packages\scipy\io\matlab\mio5.py", line 831, in put_variables
    for name, var in mdict.items():
AttributeError: 'numpy.ndarray' object has no attribute 'items'

I'm a Python novice (at best), but I assume this happens because savemat expects a dict mapping variable names to arrays, and a structured array is not a dict.

I can get around this error by pulling my data into a dict:

tmp = {}
for varname in mydata.dtype.names:
    tmp[varname] = mydata[varname]

scipy.io.savemat('test.mat', tmp)

This loads into MATLAB fine:

>> mydata = load('test.mat')

mydata = 

    foo: [1 2]
    bar: [1 2]

But this seems like a very inefficient method since I'm duplicating the data in memory. Is there a smarter way to accomplish this?

1 Comment

Don't worry about potential data copies. savemat has to manipulate the data so it can write it in a MATLAB-compatible form; file writing takes more time than an array copy. Focus on the best MATLAB data structure.

1 Answer


You can do scipy.io.savemat('test.mat', {'mydata': mydata}).

This creates a struct mydata with fields foo and bar in the file.
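A minimal sketch of that call, reusing the toy data from the question:

import numpy as np
import scipy.io

mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])

# savemat wants a dict mapping MATLAB variable names to values; the
# structured array's fields become the fields of a struct array.
scipy.io.savemat('test.mat', {'mydata': mydata})

In MATLAB, load('test.mat') should then give a struct array mydata with fields foo and bar, one element per row of the original array.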

Alternatively, you can collapse your loop into a dict comprehension:

tmp = {varname: mydata[varname] for varname in mydata.dtype.names}

I don't think creating a temporary dictionary duplicates the data in memory, because Python generally only stores references, and NumPy in particular tries to create views into the original data whenever possible.
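
If you want to verify that, numpy.shares_memory can check whether the per-field arrays alias the original buffer (a quick sketch, not part of the original answer):

import numpy as np

mydata = np.array([(1, 1.0), (2, 2.0)], dtype=[('foo', 'i'), ('bar', 'f')])
tmp = {varname: mydata[varname] for varname in mydata.dtype.names}

# Indexing by field name returns a view, so the dict values share
# memory with the original structured array rather than copying it.
print(np.shares_memory(mydata, tmp['foo']))  # expected: True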


1 Comment

In quick time tests, saving tmp is faster than saving mydata. But time shouldn't be the big issue here.
