
How would I replace values less than 4 with 0 in this array without triggering a SparseEfficiencyWarning and without reducing its sparsity?

from scipy import sparse
x = sparse.csr_matrix(
    [[0, 1, 2, 3, 4],
     [1, 2, 3, 4, 5],
     [0, 0, 0, 2, 5]])
x[x < 4] = 0
x.toarray()  # verifies that this works

Note also that x initially has 11 stored elements, which rises to 15 stored elements after the masking.

1 Answer

Manipulate the matrix's data array directly, which touches only the stored elements:

from scipy import sparse
x = sparse.csr_matrix(
    [[0, 1, 2, 3, 4],
     [1, 2, 3, 4, 5],
     [0, 0, 0, 2, 5]])

x.data[x.data < 4] = 0

>>> x.toarray()
array([[0, 0, 0, 0, 4],
       [0, 0, 0, 4, 5],
       [0, 0, 0, 0, 5]])

>>> x.data
array([0, 0, 0, 4, 0, 0, 0, 4, 5, 0, 5])

Note that the number of stored elements is unchanged: the masked entries remain in x.data as explicit zeros until you run x.eliminate_zeros().

x.eliminate_zeros()
>>> x.data
array([4, 4, 5, 5])
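
To make the difference between stored elements and nonzero values concrete, here is a minimal sketch that tracks both counts through the steps above. It uses x.nnz (stored entries, including explicit zeros) and x.count_nonzero() (entries whose value is actually nonzero); the variable names are only illustrative.

```python
from scipy import sparse

x = sparse.csr_matrix(
    [[0, 1, 2, 3, 4],
     [1, 2, 3, 4, 5],
     [0, 0, 0, 2, 5]])

stored_before = x.nnz                    # stored entries in the original matrix

x.data[x.data < 4] = 0                   # mask in place; indices/indptr are untouched
stored_after_mask = x.nnz                # explicit zeros still count as stored entries
nonzero_after_mask = x.count_nonzero()   # counts only entries with a nonzero value

x.eliminate_zeros()                      # drop the explicit zeros from the structure
stored_after_cleanup = x.nnz

print(stored_before, stored_after_mask, nonzero_after_mask, stored_after_cleanup)
# prints: 11 11 4 4
```

So the in-place masking never increases the number of stored elements, and eliminate_zeros() is what actually shrinks it.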

If for some reason you don't want to use a boolean mask & fancy indexing in numpy, you can loop over the array with numba:

import numba

@numba.jit(nopython=True)
def _set_array_less_than_to_zero(array, value):
    # Modify array in place: zero every element smaller than value.
    for i in range(len(array)):
        if array[i] < value:
            array[i] = 0

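This avoids allocating the temporary boolean mask, and once the function has been JIT-compiled on its first call it may also be faster than the numpy indexing; benchmark on your own data to confirm.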

import numpy as np

array = np.arange(10)
_set_array_less_than_to_zero(array, 5)

>>> array
array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9])
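
If the full boolean mask is too large to allocate and numba is not an option, one possible alternative is to apply the mask one slice at a time, so that the temporary boolean array never exceeds a fixed size. This is only a sketch: the function name set_less_than_to_zero_chunked and the chunk_size parameter are made up for illustration, and the default chunk size is an arbitrary assumption to tune for your memory budget.

```python
import numpy as np

def set_less_than_to_zero_chunked(data, threshold, chunk_size=1_000_000):
    """Zero every element of `data` below `threshold`, in place, chunk by chunk.

    Hypothetical helper: name and chunk_size default are illustrative only.
    """
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]   # basic slicing returns a view, not a copy
        chunk[chunk < threshold] = 0             # mask holds at most chunk_size booleans

array = np.arange(10)
set_less_than_to_zero_chunked(array, 5, chunk_size=3)
print(array)
# prints: [0 0 0 0 0 5 6 7 8 9]
```

Passing x.data from the sparse matrix instead of array would modify the matrix's stored values in place, just like the one-line mask above.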

4 Comments

I hit a memory error when doing this; is there some way to reduce memory usage on the x.data[x.data < i] step?
The only reason you'd get a memory error here is if x.data is copy-on-write or if the boolean array x.data < i can't be allocated. If x.data < i is the problem, you could iterate over the array to change it instead of building a boolean mask and indexing with it. It's probably a copy-on-write issue, though, which you can only fix by changing other code that isn't shown here.
I'm pretty sure that the error is emerging from the boolean mask; how would I iterate over the array to effect the same masking?
Easiest way I can think of is to write a simple for loop to iterate over the array and JIT compile it with numba.
