Short intro
I have two paired lists of 2D numpy arrays (see below), paired in the sense that index 0 in arr1 corresponds to index 0 in arr2. For each pair I want to get all combinations of the rows of the two 2D arrays, as answered by Divakar here.
Array example
import numpy as np

arr1 = [
    np.vstack([[1,6,3,9], [8,5,6,7]]),
    np.vstack([[1,6,3,9]]),
    np.vstack([[1,6,3,9], [8,5,6,7],[8,5,6,7]])
]
arr2 = [
    np.vstack([[8,8,8,8]]),
    np.vstack([[8,8,8,8]]),
    np.vstack([[1,6,3,9], [8,5,6,7],[8,5,6,7]])
]
Working code
Note: unlike the linked answer, my number of columns is fixed (always 4), so I replaced the shape lookups with the hardcoded value 4 (or 8 in np.zeros).
def merge(a1, a2):
    # From: https://stackoverflow.com/questions/47143712/combination-of-all-rows-in-two-numpy-arrays
    m1 = a1.shape[0]
    m2 = a2.shape[0]
    out = np.zeros((m1, m2, 8), dtype=int)
    # Left half: each row of a1, broadcast across the m2 axis
    out[:, :, :4] = a1[:, None, :]
    # Right half: all rows of a2, broadcast across the m1 axis
    out[:, :, 4:] = a2
    # Flatten the (m1, m2) grid into m1 * m2 rows
    out.shape = (m1 * m2, -1)
    return out

total = np.concatenate([merge(arr1[i], arr2[i]) for i in range(len(arr1))])
print(total)
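To make the expected result concrete, here is a minimal sketch of an equivalent formulation using np.repeat and np.tile, applied to the first pair only. The helper name merge_repeat_tile is made up for illustration and is not from the linked answer. It should produce the same row order as merge, but I have not compared its speed.

```python
import numpy as np

def merge_repeat_tile(a1, a2):
    # Hypothetical helper (not from the linked answer): pairs every row
    # of a1 with every row of a2, in the same order as merge().
    m1, m2 = a1.shape[0], a2.shape[0]
    left = np.repeat(a1, m2, axis=0)   # each a1 row repeated m2 times in a row
    right = np.tile(a2, (m1, 1))       # the whole of a2 stacked m1 times
    return np.hstack([left, right])

# First pair from the example lists above
a1 = np.array([[1, 6, 3, 9], [8, 5, 6, 7]])
a2 = np.array([[8, 8, 8, 8]])

pair0 = merge_repeat_tile(a1, a2)
print(pair0)
# [[1 6 3 9 8 8 8 8]
#  [8 5 6 7 8 8 8 8]]
```

For the full example lists, total has 2*1 + 1*1 + 3*3 = 12 rows and 8 columns.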
Question
While the above works fine, it looks inefficient to me as it:
- involves looping through the arrays
- "appends" (in the list comprehension) to the total array, requiring it to allocate memory each time
- creates multiple zeros arrays (in the merge function), whereas I could create an empty one at the start? This is related to the point above.
I perform this operation thousands of times on arrays with millions of elements. Any suggestions on how to transform this code into something more efficient?
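The idea of allocating the output once up front could look roughly like the sketch below. The function name merge_all and its structure are my own assumptions, not taken from the linked answer. It still loops over the pairs, but it writes each block straight into a single preallocated array instead of building intermediate arrays and concatenating them. Whether that is actually faster would need timing on real data.

```python
import numpy as np

def merge_all(arr1, arr2):
    # Hypothetical helper: one allocation for the whole result.
    split = arr1[0].shape[1]                 # columns taken from arr1 (4 here)
    n_cols = split + arr2[0].shape[1]        # total columns (8 here)
    counts = [a.shape[0] * b.shape[0] for a, b in zip(arr1, arr2)]
    total = np.empty((sum(counts), n_cols), dtype=int)

    start = 0
    for a, b, n in zip(arr1, arr2, counts):
        # A row slice of a C-contiguous array is contiguous, so reshape
        # returns a view and the writes below land directly in total.
        block = total[start:start + n].reshape(a.shape[0], b.shape[0], n_cols)
        block[:, :, :split] = a[:, None, :]  # broadcast rows of a
        block[:, :, split:] = b              # broadcast rows of b
        start += n
    return total

arr1 = [
    np.array([[1, 6, 3, 9], [8, 5, 6, 7]]),
    np.array([[1, 6, 3, 9]]),
    np.array([[1, 6, 3, 9], [8, 5, 6, 7], [8, 5, 6, 7]]),
]
arr2 = [
    np.array([[8, 8, 8, 8]]),
    np.array([[8, 8, 8, 8]]),
    np.array([[1, 6, 3, 9], [8, 5, 6, 7], [8, 5, 6, 7]]),
]

total = merge_all(arr1, arr2)
print(total.shape)
# (12, 8)
```

np.empty is used rather than np.zeros because every element is overwritten before the array is returned.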
Use np.array([[1,2,3,4]]) instead of vstack. That's the more conventional way of defining an array. And for those of us who can't run your code mentally, or are too lazy to copy to our own computer, show some results. For example one or more of the merge calls, as well as the total.