I'm interested in the best/fastest way to do array ops (dot, outer, add, etc.) while ignoring some values in the array. I'm mostly interested in cases where a substantial fraction (maybe 30%-50%) of the values are ignored and are effectively zero, with moderately large arrays, maybe 100,000 to 1,000,000 elements. There are a number of solutions I can think of, but none seem to actually exploit the fact that some values can be ignored. For example:
import numpy as np

dim = 1000                               # example size
A = np.ones((dim, dim))                  # the array to modify
B = np.random.randint(0, 2, (dim, dim))  # the values to ignore are 0
C = B.astype(bool)
D = np.random.random((dim, dim))         # the array which will be used to modify A
# Option 1: zero some values using multiplication.
# some initial tests show this is the fastest
A += B * D
# Option 2: use indexing
# this seems to be the slowest
A[C] += D[C]
# Option 3: use masked arrays
A = np.ma.array(np.ones((dim, dim)), mask=(B - 1).astype(bool))
A += D
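For reference, here is a small self-contained check that the three options agree at the kept positions (the concrete dim and the seeded generator are my choices, not essential to the question):

```python
import numpy as np

dim = 1000
rng = np.random.default_rng(0)
B = rng.integers(0, 2, (dim, dim))   # 0 = ignore
C = B.astype(bool)
D = rng.random((dim, dim))

# Option 1: zero the ignored values by multiplication
A1 = np.ones((dim, dim))
A1 += B * D

# Option 2: boolean indexing
A2 = np.ones((dim, dim))
A2[C] += D[C]

# Option 3: masked array (masked where B == 0)
A3 = np.ma.array(np.ones((dim, dim)), mask=(B - 1).astype(bool))
A3 += D

print(np.allclose(A1, A2))                        # True
print(np.allclose(A1[C], np.ma.getdata(A3)[C]))   # True
```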
Edit 1:
As suggested by cyborg, sparse arrays may be another option. Unfortunately I'm not very familiar with the package and haven't been able to get the speed advantage I was hoping for. For example, if I have a weighted graph with restricted connectivity defined by a sparse matrix A, another sparse matrix B which defines the connectivity (1 = connected, 0 = not connected), and a dense numpy array C, I'd like to be able to do something like A = A + B.multiply(C) and take advantage of A and B being sparse.
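A minimal sketch of that sparse setup (dim, the 1% density, and the seeded generator are my choices; note that whether multiply() with a dense operand returns a sparse or dense result depends on the SciPy version — recent versions return sparse):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
dim = 1000

# B: 0/1 connectivity pattern as a sparse matrix (~1% nonzero)
B = sp.random(dim, dim, density=0.01, format="csr", random_state=0)
B.data[:] = 1.0

# A: sparse weights restricted to B's connectivity pattern
A = B.multiply(rng.random((dim, dim))).tocsr()

# C: dense update
C = rng.random((dim, dim))

# The update can never leave B's pattern, since B.multiply(C) is
# zero wherever B is zero
A_new = A + B.multiply(C)
```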
Comment: A += B.multiply(C) is the right thing to do. Why do you say you don't get an advantage? B is sparse and C is dense.

Reply: B.multiply(C) is a dense matrix, and I imagine any benefits of using sparse matrices will be overcome by the cost of having to keep converting the result back to a sparse matrix before adding to A. Maybe there is a better way to do this, but as I said, I'm not very familiar with scipy.sparse. Also, my syntax was wrong: += isn't defined for sparse matrices in scipy, it should be A = A + B.multiply(C).
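If the multiply() result does come back dense on your SciPy version, one way to sidestep the dense-to-sparse conversion cost is to gather only the entries of C at B's stored positions and build the update matrix directly (a sketch; names, sizes, and density are my choices):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
dim = 1000

# B: 0/1 connectivity pattern, A0: sparse weights on that pattern
B = sp.random(dim, dim, density=0.01, format="csr", random_state=1)
B.data[:] = 1.0
A0 = B.multiply(rng.random((dim, dim))).tocsr()

C = rng.random((dim, dim))   # dense update

# Pick out C only at B's stored (row, col) positions, so the
# elementwise product is never materialized as a dense matrix
rows, cols = B.nonzero()
update = sp.csr_matrix((C[rows, cols], (rows, cols)), shape=B.shape)
A = A0 + update
```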