1

I have two TF-IDF matrices. Checking their shapes with

tr_tfidf_q1.shape, tr_tfidf_q2.shape

gives

((404288, 83766), (404288, 83766))

Now I save it using

np.save('tr_tfidf_q1.npy', tr_tfidf_q1)

But when I load the file, the shape comes back empty:

f = np.load('tr_tfidf_q1.npy')
f.shape  # returns an empty tuple instead of (404288, 83766)
()

Thanks in advance.

2
  • What's the size of the file (from OS)? Commented Apr 17, 2017 at 16:58
  • It's around 37 MB. But I can see it now as an array as well. Commented Apr 17, 2017 at 17:09

2 Answers

1
In [172]: from scipy import sparse
In [173]: M=sparse.csr_matrix(np.eye(10))
In [174]: np.save('test.npy',M)


In [175]: f=np.load('test.npy')
In [176]: f
Out[176]: 
array(<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 10 stored elements in Compressed Sparse Row format>, dtype=object)

Note the dtype=object wrapper: the loaded array is 0-dimensional, with shape (). A sparse matrix is neither a regular NumPy array nor a subclass of one, so np.save falls back to wrapping it in an object array and lets the object's own pickle method handle the writing.

In [177]: f.item()
Out[177]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 10 stored elements in Compressed Sparse Row format>
In [178]: f.shape
Out[178]: ()
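
The same round trip can be sketched as a standalone script. This is a minimal sketch, not part of the original session: the temporary directory and the names wrapper and restored are illustrative assumptions. It also passes allow_pickle=True to np.load, which NumPy 1.16.3 and later require before they will unpickle an object array.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Small stand-in for the TF-IDF matrix from the question.
M = sparse.csr_matrix(np.eye(10))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.npy")  # hypothetical file name
    # np.save cannot store a sparse matrix natively, so it pickles it
    # inside a 0-d object array.
    np.save(path, M)
    # Recent NumPy versions refuse to unpickle unless allow_pickle=True.
    wrapper = np.load(path, allow_pickle=True)

# .item() unwraps the single element held by the 0-d array.
restored = wrapper.item()

print(wrapper.shape)   # shape of the wrapper, not of the matrix
print(restored.shape)  # shape of the sparse matrix itself
```

Only load pickled files from sources you trust, since unpickling can execute arbitrary code.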

Using pickle directly:

In [181]: with open('test.pkl','wb') as f:
     ...:     pickle.dump(M,f)

In [182]: with open('test.pkl','rb') as f:
     ...:     M1=pickle.load(f)    
In [183]: M1
Out[183]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 10 stored elements in Compressed Sparse Row format>

SciPy 0.19 and later provide dedicated functions for saving and loading sparse matrices, scipy.sparse.save_npz and scipy.sparse.load_npz:

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.save_npz.html
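
A minimal sketch of that approach, assuming SciPy 0.19 or later is installed. The temporary directory, the file name, and the variable M2 are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Small stand-in for the TF-IDF matrix from the question.
M = sparse.csr_matrix(np.eye(10))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.npz")  # hypothetical file name
    # save_npz writes the sparse matrix's underlying arrays to a .npz
    # archive instead of pickling the object.
    sparse.save_npz(path, M)
    # load_npz rebuilds a sparse matrix in the same format.
    M2 = sparse.load_npz(path)

print(M2.shape)             # the matrix shape, no wrapper involved
print((M != M2).nnz == 0)   # True when every entry matches
```

Because the loaded object is the sparse matrix itself, M2.shape reports the matrix dimensions directly and no .item() call is needed.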

0

I solved it myself.

f = np.load('tr_tfidf.npy')
f ## returns the below.

array(<404288x83766 sparse matrix of type '<class 'numpy.float64'>'
with 2117757 stored elements in Compressed Sparse Row format>, dtype=object)

I believe XYZ.shape works with references as well.

1 Comment

A csr_matrix is not a regular array, so np.save does not store it directly. Instead, np.save wraps it in a 0d object array and pickles the sparse matrix. So f.shape is the shape of that wrapper, and f.item() should give you the sparse matrix itself.
