Is there a smart way to vectorize a nested for loop of inner products, where the inner index is bounded below by the outer index?
Here's a simple example. Say that `arr1` and `arr2` are numpy arrays each containing `N` vectors of length 3, i.e. they are of shape `(N,3)`. `N` is usually in the range between 500 and 2000. I want to compute all possible combinations of inner products, but I also know that the mathematical problem is designed in such a way that the inner product of the `i`-th vector in `arr1` and the `j`-th vector in `arr2` is equal to the inner product of the `j`-th vector in `arr1` and the `i`-th vector in `arr2`.
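For the special case where `arr1` and `arr2` are identical, this is just the symmetry of the Gram matrix, which is easy to check with some random data:

```python
import numpy as np

# Sanity check of the symmetry claim for the special case arr1 == arr2:
# the Gram matrix G[i, j] = <arr1[i], arr2[j]> is then symmetric.
rng = np.random.default_rng(0)
arr1 = rng.random((5, 3))
arr2 = arr1.copy()
G = arr1 @ arr2.T
print(np.allclose(G, G.T))  # True
```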
In an explicit nested for-loop this would look something like this:

```python
import numpy as np

N = 2000
# For simplicity, I choose arr1 and arr2 to be the same here, but this need not always be the case.
arr1 = np.arange(N*3).reshape((N,3))
arr2 = arr1.copy()

inner_products = np.zeros((arr1.shape[0], arr2.shape[0]))
for idx1 in range(arr1.shape[0]):
    vec1 = arr1[idx1]
    for idx2 in range(idx1, arr2.shape[0]):
        vec2 = arr2[idx2]
        inner_products[idx1, idx2] = np.dot(vec1, vec2)
inner_products = inner_products + inner_products.T
inner_products[np.diag_indices(arr1.shape[0])] /= 2
```
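A quick way to convince yourself that the symmetrize-and-halve trick above is correct is to compare the result against the full matrix product (with a smaller, hypothetical `N` to keep it fast):

```python
import numpy as np

# Same triangular loop as above, but checked against the full product.
N = 100
arr1 = np.arange(N*3, dtype=float).reshape((N, 3))
arr2 = arr1.copy()

inner_products = np.zeros((N, N))
for idx1 in range(N):
    vec1 = arr1[idx1]
    for idx2 in range(idx1, N):
        inner_products[idx1, idx2] = np.dot(vec1, arr2[idx2])
inner_products = inner_products + inner_products.T
inner_products[np.diag_indices(N)] /= 2

print(np.allclose(inner_products, arr1 @ arr2.T))  # True
```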
Is there a smart way to vectorize this in a single command, without carrying out twice as many calculations as needed?
So far, I vectorized the inner loop with:

```python
inner_products = np.zeros((arr1.shape[0], arr2.shape[0]))
for idx1 in range(arr1.shape[0]):
    vec1 = arr1[idx1]
    inner_products[idx1, idx1:] = np.dot(vec1.reshape(1,3), arr2[idx1:].T)
inner_products = inner_products + inner_products.T
inner_products[np.diag_indices(arr1.shape[0])] /= 2
```
But this variant is still slower than a single `np.dot` call that computes all `N*N` inner products, even though the latter performs twice as many calculations:

```python
inner_products = np.dot(arr1, arr2.T)
```
Numba compilation also does not change the outcome; the single `np.dot` command remains the fastest.
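For reference, one fully vectorized variant that touches only the N(N+1)/2 unique pairs gathers the upper-triangle index pairs with `np.triu_indices` and computes row-wise dots with `np.einsum`. This is only a sketch (with hypothetical random data), and I suspect the fancy indexing and index bookkeeping cost more than the saved multiplications:

```python
import numpy as np

N = 500
rng = np.random.default_rng(0)
arr1 = rng.random((N, 3))
arr2 = arr1.copy()

# Index pairs (i, j) with j >= i: only N*(N+1)/2 pairs instead of N*N.
iu = np.triu_indices(N)

# Row-wise inner products of the gathered vector pairs.
vals = np.einsum('ij,ij->i', arr1[iu[0]], arr2[iu[1]])

# Scatter into the upper triangle, then symmetrize as before.
inner_products = np.zeros((N, N))
inner_products[iu] = vals
inner_products = inner_products + inner_products.T
inner_products[np.diag_indices(N)] /= 2
```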
Is there a smart way to vectorize this sort of problem while keeping the number of computations to a minimum?