
I have a huge sparse matrix in SciPy and I would like to replace many of its elements with a given value (say, -1).

Is there a more efficient way to do it than using:

SM[[rows],[columns]]=-1

Here is an example:

import numpy as np
from scipy import sparse

Nr=seg.shape[0] #size ~=50000

Im1=sparse.csr_matrix(np.append(np.array([-1]),np.zeros([1,Nr-1])))
Im1=sparse.csr_matrix(sparse.vstack([Im1,sparse.eye(Nr)]))
Im1[prev[1::]-1,Num[1::]-1]=-1 # this line is very slow

Im2=sparse.vstack([sparse.csr_matrix(np.zeros([1,Nr])),sparse.eye(Nr)])

IM=sparse.hstack([Im1,Im2]) #final result
  • What do you mean by numerous? How are they arranged? Do you want to replace an entire row or column? Commented May 26, 2014 at 13:01
  • By numerous I mean about 40,000 elements to replace. Specifically, I have to replace one element per row or per column. Commented May 26, 2014 at 13:03
  • Do they have the same values? Or is it every time only an isolated random entry with a random value? Maybe you could supply an example? Commented May 26, 2014 at 13:04
  • Ok, can you in addition supply - by editing your question - a minimal working example? Commented May 26, 2014 at 13:26
  • I just added an example Commented May 26, 2014 at 14:03

1 Answer

I've played around with your sparse arrays. I'd encourage you to do some timings on smaller sizes, to see how different methods and sparse formats behave. I like to use timeit in IPython.

Nr=10 # seg.shape[0] #size ~=50000
Im2=sparse.vstack([sparse.csr_matrix(np.zeros([1,Nr])),sparse.eye(Nr)])

Im2 has a zero first row, and offset diagonal on the rest. So it's simpler, though not much faster, to start with an empty sparse matrix:

X = sparse.vstack([sparse.csr_matrix((1,Nr)),sparse.eye(Nr)])

Or use diags to construct the offset diagonal directly:

X = sparse.diags([1],[-1],shape=(Nr+1, Nr))

Im1 is similar, except it has a -1 in the (0,0) slot. How about stacking 2 diagonal matrices?

X = sparse.vstack([sparse.diags([-1],[0],(1,Nr)),sparse.eye(Nr)])

Or build the offset diagonal (or copy Im2) and modify [0,0]. Assigning into a csr matrix gives an efficiency warning that recommends the lil format. Converting with tolil() does take some time itself, though.

X = sparse.diags([1],[-1],shape=(Nr+1, Nr)).tolil()
X[0,0] = -1  # slow warning with csr

Let's try your larger insertions:

prev = np.arange(Nr-2)  # what are these like?
Num = np.arange(Nr-2)
Im1[prev[1::]-1,Num[1::]-1]=-1

With Nr=10, and various Im1 formats:

lil - 267 us
csr - 1.44 ms
coo - not supported
todense - 25 us
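
Outside IPython, a comparison like the one above can be sketched with the standard timeit module. This is only a rough sketch: make_im1 is a hypothetical helper introduced here, prev and Num are the same placeholder arrays as above, and the absolute numbers will depend on the machine and SciPy version. Each repeat gets a fresh matrix, because a second assignment into the same csr matrix no longer has to change its sparsity structure.

```python
import timeit
import warnings

import numpy as np
from scipy import sparse

Nr = 10
# Placeholder index arrays copied from the answer; real data will differ.
prev = np.arange(Nr - 2)
Num = np.arange(Nr - 2)
rows, cols = prev[1:] - 1, Num[1:] - 1


def make_im1(fmt):
    """Hypothetical helper: build the (Nr+1, Nr) starting matrix in format fmt."""
    first_row = sparse.csr_matrix(([-1.0], ([0], [0])), shape=(1, Nr))
    return sparse.vstack([first_row, sparse.eye(Nr)], format=fmt)


timings = {}
results = {}
with warnings.catch_warnings():
    # csr assignment that creates new nonzeros may emit SparseEfficiencyWarning.
    warnings.simplefilter("ignore", sparse.SparseEfficiencyWarning)
    for fmt in ("lil", "csr"):
        timer = timeit.Timer(
            stmt="M[rows, cols] = -1",
            setup="M = make_im1(fmt)",  # setup runs once per repeat: fresh matrix
            globals={"make_im1": make_im1, "fmt": fmt, "rows": rows, "cols": cols},
        )
        timings[fmt] = min(timer.repeat(repeat=50, number=1))
        # Keep one assigned matrix per format to check both give the same result.
        M = make_im1(fmt)
        M[rows, cols] = -1
        results[fmt] = M

for fmt, seconds in timings.items():
    print(f"{fmt}: {seconds * 1e6:.0f} us per assignment")
```

If lil wins at your real size, one option is to assign in lil and convert back with tocsr() once at the end.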

OK, I've picked prev and Num such that I end up modifying diagonals of Im1. In this case it would be faster to construct those diagonals right from the start.

X2=Im1.todia()
print(X2.data)
[[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [-1. -1. -1. -1. -1. -1. -1.  0.  0.  0.]]
print(X2.offsets)
[-1  0]
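
For this particular pattern, the two diagonals printed above could be passed straight to sparse.diags instead of being assigned afterwards. The sketch below assumes the Nr=10 placeholder case, where the -1 entries fill the first k=7 slots of the main diagonal; k and the variable names are introduced here for illustration, and real prev/Num arrays may not reduce to diagonals at all.

```python
import numpy as np
from scipy import sparse

Nr = 10
k = 7  # number of -1 entries on the main diagonal in the Nr=10 example

# Main diagonal: k entries of -1 followed by zeros; sub-diagonal: all ones.
main = np.zeros(Nr)
main[:k] = -1
sub = np.ones(Nr)

X = sparse.diags([sub, main], offsets=[-1, 0], shape=(Nr + 1, Nr), format="csr")

# Reference: the vstack construction followed by assignment in lil format.
first_row = sparse.csr_matrix(([-1.0], ([0], [0])), shape=(1, Nr))
ref = sparse.vstack([first_row, sparse.eye(Nr)], format="lil")
idx = np.arange(k)
ref[idx, idx] = -1

same = bool((X.toarray() == ref.toarray()).all())
print(same)
```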

You may have to learn how various sparse formats are stored. csr and csc are a bit complex, designed for fast linear algebra operations. lil, dia, coo are simpler to understand.
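
As a small illustration of how these formats store data, the sketch below builds one tiny matrix (chosen arbitrarily here) and prints the underlying arrays of its coo, csr and lil versions.

```python
import numpy as np
from scipy import sparse

dense = np.array([[0, 2, 0],
                  [3, 0, 4]])

coo = sparse.coo_matrix(dense)
csr = coo.tocsr()
lil = coo.tolil()

# coo: three parallel arrays, one (row, col, value) triple per nonzero.
print(coo.row, coo.col, coo.data)

# csr: the entries of row i sit in indices/data[indptr[i]:indptr[i + 1]].
print(csr.indptr, csr.indices, csr.data)

# lil: per row, one Python list of column indices and one list of values.
print(lil.rows, lil.data)
```

Inserting a new nonzero into csr means shifting indices and data and updating indptr, which is why assignment there is costly, whereas lil only has to insert into one row's lists.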
