Numpy: creating indices for a sparse matrix

Question

I have a list of numpy arrays. Each array holds the column indices where a 1 should go in a sparse binary matrix, and the array's position in the list is the row index for all of its values. I am trying to build the matrix with scipy.sparse.csr_matrix, so I need two numpy arrays: one with the column indices and one with the row indices. Here is an example of what I need:

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([10, 11, 12])
c = np.array([60, 100])
d = [a, b, c]

column = np.array([1, 2, 3, 4, 5, 6, 10, 11, 12, 60, 100])
row = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2])
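To make the required mapping explicit, here is a plain-Python loop that produces the two target arrays from d. It is only a slow reference sketch for checking the vectorized answers below, not something taken from the question.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([10, 11, 12])
c = np.array([60, 100])
d = [a, b, c]

# Reference construction: one (row, column) pair per value in d.
rows, cols = [], []
for i, arr in enumerate(d):   # i is the row index = position of arr in the list
    for j in arr:             # each value in arr is a column index
        rows.append(i)
        cols.append(int(j))

row = np.array(rows)
column = np.array(cols)
print(row.tolist())     # [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
print(column.tolist())  # [1, 2, 3, 4, 5, 6, 10, 11, 12, 60, 100]
```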

Adam Stone · Accepted Answer · 2015-02-05 19:49:07Z


column is just a flattened vector of d, so try:

column = np.hstack(d)

For row, this should work:

# dtype=int keeps the row indices integral; np.ones returns float64 by default
row = np.hstack([np.ones(len(arr), dtype=int) * i for i, arr in enumerate(d)])

Basically: step through d, make an array of ones the same length as each item, multiply it by that item's index in d, and then join all of these into a single vector.
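The same row vector can also be produced with np.repeat, which repeats each row number once per element of the corresponding array instead of building one temporary array per row. This is a swapped-in alternative, not the approach from this answer; it assumes every item of d is a 1-D array.

```python
import numpy as np

d = [np.array([1, 2, 3, 4, 5, 6]), np.array([10, 11, 12]), np.array([60, 100])]

column = np.hstack(d)

# Repeat row number i exactly len(d[i]) times.
lengths = [len(arr) for arr in d]
row = np.repeat(np.arange(len(d)), lengths)

print(row.tolist())  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2]
```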

Comment from hpaulj:

np.concatenate does the same with these 1d arrays.
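A quick check of the comment: for a list of 1-D arrays, np.hstack and np.concatenate (default axis=0) give identical results. The input d is the example from the question.

```python
import numpy as np

d = [np.array([1, 2, 3, 4, 5, 6]), np.array([10, 11, 12]), np.array([60, 100])]

h = np.hstack(d)       # stacks 1-D inputs along their only axis
c = np.concatenate(d)  # default axis=0

print(np.array_equal(h, c))  # True
```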

hpaulj · Accepted Answer · 2015-02-06 01:00:43Z

A couple of other methods of generating the row array:

row = np.concatenate([np.ones_like(x)*i for i,x in enumerate(d)])

row = np.concatenate([[i]*len(x) for i,x in enumerate(d)])

For this small example the latter (list replication) is quite a bit faster, but with large arrays the timings may go the other way.
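A small sketch for comparing the two row-building methods on your own data. The function names are hypothetical; the script only checks that both give the same result and prints timings without assuming which will win, since that depends on array sizes and the NumPy version.

```python
import timeit
import numpy as np

d = [np.array([1, 2, 3, 4, 5, 6]), np.array([10, 11, 12]), np.array([60, 100])]

def rows_ones_like(d):
    # An array shaped like each x, filled with its row index i.
    return np.concatenate([np.ones_like(x) * i for i, x in enumerate(d)])

def rows_list_repeat(d):
    # A plain Python list [i, i, ...] for each x; concatenate converts them.
    return np.concatenate([[i] * len(x) for i, x in enumerate(d)])

assert np.array_equal(rows_ones_like(d), rows_list_repeat(d))

for f in (rows_ones_like, rows_list_repeat):
    print(f.__name__, timeit.timeit(lambda: f(d), number=1000))
```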

For 1d arrays like this, hstack is the same as the default concatenate.

col = np.concatenate(d)

The full sparse call is then (for a matrix of shape (N, M)):

from scipy import sparse

sparse.csr_matrix((np.ones_like(col), (row, col)), shape=(N, M))
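An end-to-end sketch using the question's data. The answer leaves N and M open; here it is assumed that N is the number of arrays in d and M is one more than the largest column index, which is the smallest shape that fits all entries.

```python
import numpy as np
from scipy import sparse

d = [np.array([1, 2, 3, 4, 5, 6]), np.array([10, 11, 12]), np.array([60, 100])]

col = np.concatenate(d)
row = np.concatenate([[i] * len(x) for i, x in enumerate(d)])

# Assumed shape: one row per array, enough columns for the largest index.
N, M = len(d), int(col.max()) + 1
m = sparse.csr_matrix((np.ones_like(col), (row, col)), shape=(N, M))

print(m.shape)  # (3, 101)
print(m.nnz)    # 11
# Row sums equal the length of each array in d.
print(np.asarray(m.sum(axis=1)).ravel().tolist())  # [6, 3, 2]
```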

coo, csr and csc all accept this (data, (row, col)) style of input. coo differs in that it stores the arrays exactly as given, so it is fast. The other two do some sorting, sum any duplicate (row, col) pairs, and clean up the result.
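To illustrate the duplicate handling described above, here is a tiny hypothetical input in which the pair (0, 1) appears twice. COO keeps all three stored entries, while building a CSR matrix from the same triplets sums the duplicate into a single entry.

```python
import numpy as np
from scipy import sparse

# Hypothetical triplets: (0, 1) is listed twice.
row = np.array([0, 0, 1])
col = np.array([1, 1, 2])
data = np.ones(3, dtype=int)

coo = sparse.coo_matrix((data, (row, col)), shape=(2, 3))
csr = sparse.csr_matrix((data, (row, col)), shape=(2, 3))

print(coo.nnz)                 # 3: stored exactly as given
print(csr.nnz)                 # 2: duplicates summed
print(csr.toarray().tolist())  # [[0, 2, 0], [0, 0, 1]]
```

For a binary matrix this means a repeated column index within one array would produce a 2 rather than a 1.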
