I'm interested in the best/fastest way to do array ops (dot, outer, add, etc.) while ignoring some values in the array. I'm mostly interested in cases where a substantial fraction (maybe 30%-50%) of the values are ignored and are effectively zero, with moderately large arrays, maybe 100,000 to 1,000,000 elements. There are a number of solutions I can think of, but none seem to actually exploit the fact that some values can be ignored. For example:
import numpy as np

dim = 1000                               # example size
A = np.ones((dim, dim))                  # the array to modify
B = np.random.randint(0, 2, (dim, dim))  # the values to ignore are 0
C = B.astype(bool)
D = np.random.random((dim, dim))         # the array which will be used to modify A
# Option 1: zero some values using multiplication.
# some initial tests show this is the fastest
A += B * D
# Option 2: use indexing
# this seems to be the slowest
A[C] += D[C]
# Option 3: use masked arrays
A = np.ma.array(np.ones((dim, dim)), mask=(B - 1).astype(bool))
A += D
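For reference, here is a small self-contained check that the three options agree at the kept positions (the concrete dim and the seeded generator are my choices, not essential to the question):

```python
import numpy as np

dim = 1000
rng = np.random.default_rng(0)
B = rng.integers(0, 2, (dim, dim))   # 0 = ignore
C = B.astype(bool)
D = rng.random((dim, dim))

# Option 1: zero the ignored values by multiplication
A1 = np.ones((dim, dim))
A1 += B * D

# Option 2: boolean indexing
A2 = np.ones((dim, dim))
A2[C] += D[C]

# Option 3: masked array (masked where B == 0)
A3 = np.ma.array(np.ones((dim, dim)), mask=(B - 1).astype(bool))
A3 += D

print(np.allclose(A1, A2))                        # True
print(np.allclose(A1[C], np.ma.getdata(A3)[C]))   # True
```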
Edit 1:
As suggested by cyborg, sparse arrays may be another option. Unfortunately I'm not very familiar with the package and haven't been able to get the speed advantage I was hoping for. For example, if I have a weighted graph with restricted connectivity defined by a sparse matrix A, another sparse matrix B which defines the connectivity (1 = connected, 0 = not connected), and a dense numpy array C, I'd like to be able to do something like A = A + B.multiply(C) and take advantage of A and B being sparse.
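A minimal sketch of that sparse setup (dim, the 1% density, and the seeded generator are my choices; note that whether multiply() with a dense operand returns a sparse or dense result depends on the SciPy version — recent versions return sparse):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
dim = 1000

# B: 0/1 connectivity pattern as a sparse matrix (~1% nonzero)
B = sp.random(dim, dim, density=0.01, format="csr", random_state=0)
B.data[:] = 1.0

# A: sparse weights restricted to B's connectivity pattern
A = B.multiply(rng.random((dim, dim))).tocsr()

# C: dense update
C = rng.random((dim, dim))

# The update can never leave B's pattern, since B.multiply(C) is
# zero wherever B is zero
A_new = A + B.multiply(C)
```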
Comment: A += B.multiply(C) is the right thing to do. Why do you say you don't get an advantage? B is sparse and C is dense.

Reply: B.multiply(C) is a dense matrix, and I imagine any benefits of using sparse matrices will be overcome by the cost of having to keep converting the result back to a sparse matrix before adding to A. Maybe there is a better way to do this, but as I said, I'm not very familiar with scipy.sparse. Also, my syntax was wrong: += isn't defined for sparse matrices in scipy, it should be A = A + B.multiply(C).
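If the multiply() result does come back dense on your SciPy version, one way to sidestep the dense-to-sparse conversion cost is to gather only the entries of C at B's stored positions and build the update matrix directly (a sketch; names, sizes, and density are my choices):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(1)
dim = 1000

# B: 0/1 connectivity pattern, A0: sparse weights on that pattern
B = sp.random(dim, dim, density=0.01, format="csr", random_state=1)
B.data[:] = 1.0
A0 = B.multiply(rng.random((dim, dim))).tocsr()

C = rng.random((dim, dim))   # dense update

# Pick out C only at B's stored (row, col) positions, so the
# elementwise product is never materialized as a dense matrix
rows, cols = B.nonzero()
update = sp.csr_matrix((C[rows, cols], (rows, cols)), shape=B.shape)
A = A0 + update
```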