0

I have a list of NumPy arrays. Each array holds the column indices at which a sparse binary matrix should contain a 1, and the position of the array in the list is the row index for every value in that array. I am trying to build the matrix with scipy.sparse.csr_matrix, so I need two NumPy arrays: one for the column indices and one for the row indices. Here is an example of what I need:

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([10, 11, 12])
c = np.array([60, 100])
d = [a, b, c]

column = np.array([1, 2, 3, 4, 5, 6, 10, 11, 12, 60, 100])
row = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2])

2 Answers

3

column is just a flattened vector of d, so try:

column = np.hstack(d)

For row, this should work:

row = np.hstack([np.ones(len(arr), dtype=int)*i for i, arr in enumerate(d)])  # dtype=int keeps the row indices integral

Basically: step through d, make an array of ones of length of each item in d, multiply by its index in d, and then flatten all these into a vector.
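As an alternative sketch (not part of the answer above), the same row vector can probably be built without a Python-level loop over arrays of ones by using np.repeat, which repeats each row number by the length of the corresponding array. The variable name lengths is introduced here for illustration only.

```python
import numpy as np

# Example data from the question.
a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([10, 11, 12])
c = np.array([60, 100])
d = [a, b, c]

# Number of column indices contributed by each array (hypothetical helper name).
lengths = [len(arr) for arr in d]

# Repeat row number i exactly len(d[i]) times; the result is an integer array.
row = np.repeat(np.arange(len(d)), lengths)

# Column indices are simply the arrays joined end to end.
column = np.hstack(d)
```

Because np.arange yields integers, row can be used as an index array without any cast.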


1 Comment

np.concatenate does the same with these 1d arrays.
0

A couple of other methods of generating the row array:

row = np.concatenate([np.ones_like(x)*i for i,x in enumerate(d)])

row = np.concatenate([[i]*len(x) for i,x in enumerate(d)])

For this small example the latter, which uses list replication, is quite a bit faster. With large arrays the timings might go the other way.

For 1d arrays like this, hstack is the same as the default concatenate.

col = np.concatenate(d)

The full sparse call is then (for shape (N, M)):

sparse.csr_matrix((np.ones_like(col),(row, col)),shape=(N,M))
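Putting the pieces together for the data in the question, a minimal end-to-end sketch might look like the following. The shape is an assumption: N is taken as the number of arrays and M as the largest column index plus one, since the question does not state the intended shape.

```python
import numpy as np
from scipy import sparse

# Example data from the question.
d = [np.array([1, 2, 3, 4, 5, 6]),
     np.array([10, 11, 12]),
     np.array([60, 100])]

# Column indices: all arrays joined end to end.
col = np.concatenate(d)

# Row indices: list position i repeated once per element of d[i].
row = np.concatenate([[i] * len(x) for i, x in enumerate(d)])

# Assumed shape: one row per array, just enough columns for the largest index.
N, M = len(d), int(col.max()) + 1

# One stored value of 1 per (row, col) pair.
mat = sparse.csr_matrix((np.ones_like(col), (row, col)), shape=(N, M))
```

If the real matrix has more rows or columns than the data implies, pass that shape explicitly instead of inferring it.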

coo, csr and csc all accept this style of input. coo is different in that it assigns the attributes exactly as given (so is fast). The others do some sorting and summing (allowing for duplicate row,col pairs), and cleaning.
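The difference in duplicate handling can be seen with a small made-up input in which one (row, col) pair appears twice. This is only an illustrative sketch; the index values are invented and are not from the question.

```python
import numpy as np
from scipy import sparse

# Invented indices: the pair (0, 2) is listed twice on purpose.
row = np.array([0, 0, 1])
col = np.array([2, 2, 0])
data = np.ones_like(col)

# coo keeps the three entries exactly as supplied.
coo = sparse.coo_matrix((data, (row, col)), shape=(2, 3))

# csr merges the duplicate pair by summing its values.
csr = sparse.csr_matrix((data, (row, col)), shape=(2, 3))

coo_stored = coo.nnz       # number of stored entries in coo
csr_stored = csr.nnz       # number of stored entries in csr
summed_value = csr[0, 2]   # value at the duplicated position
```

For a strictly binary matrix this means duplicated pairs end up as values greater than 1 in csr, so the input may need deduplicating first, or the stored values may need resetting to 1 afterwards.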
