Combination of all rows in two numpy arrays

Question

I have two arrays, for example one with shape (3,2) and the other with shape (10,7). I want to pair every row of the first array with every row of the second, so that I end up with a 9-column array.

How can I do this? I am not using meshgrid correctly as far as I can tell.

Based on previous posts, I was under the impression that

a1 = np.zeros((10,7))
a2 = np.zeros((3,2))
r = np.array(np.meshgrid(a1, a2)).T.reshape(-1, a1.shape[1] + a2.shape[1])

would work, but that gives me dimensions of (84,10).

What's the shape of the expected output array?
– Divakar, Commented Nov 6, 2017 at 18:52
It would be (30,9)

– Ian Fiddes, Commented Nov 6, 2017 at 19:09

Divakar · Accepted Answer · 2017-11-06 20:02:57Z

Approach #1

With a focus on performance, here's one approach that initializes the output array once and fills it with broadcasted assignments -

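m1,n1 = a1.shape
m2,n2 = a2.shape
# (m1, m2) grid of combined rows; dtype=int assumes integer inputs
out = np.zeros((m1,m2,n1+n2),dtype=int)
out[:,:,:n1] = a1[:,None,:]   # a1 rows broadcast along axis 1
out[:,:,n1:] = a2             # a2 block broadcast along axis 0
out.shape = (m1*m2,-1)        # flatten the grid to (m1*m2, n1+n2)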

Explanation :

The trick lies in the two steps :

out[:,:,:n1] = a1[:,None,:]
out[:,:,n1:] = a2

Step #1 :

In [227]: np.random.seed(0)

In [228]: a1 = np.random.randint(1,9,(3,2))

In [229]: a2 = np.random.randint(1,9,(2,7))

In [230]: m1,n1 = a1.shape
     ...: m2,n2 = a2.shape
     ...: out = np.zeros((m1,m2,n1+n2),dtype=int)
     ...: 

In [231]: out[:,:,:n1] = a1[:,None,:]

In [232]: out[:,:,:n1]
Out[232]: 
array([[[5, 8],
        [5, 8]],

       [[6, 1],
        [6, 1]],

       [[4, 4],
        [4, 4]]])

In [233]: a1[:,None,:]
Out[233]: 
array([[[5, 8]],

       [[6, 1]],

       [[4, 4]]])

So we assign the elements of a1 with its first axis aligned to the first axis of the output, while the new axis added to a1 lets its values be broadcast along the second axis of the output. This is the crux: it is efficient because no intermediate repeated or tiled copy of a1 is allocated, which explicit repeating/tiling methods would require.
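To make the "no extra copy" point visible, here is a small sketch. It uses np.broadcast_to purely for inspection (the assignment in Step #1 does not call it explicitly), and m2 = 2 is an assumed row count for the second array.

```python
import numpy as np

a1 = np.array([[5, 8], [6, 1], [4, 4]])
m1, n1 = a1.shape
m2 = 2  # assumed number of rows in the second array, for illustration only

# Read-only broadcast view of a1 with shape (m1, m2, n1); no data is copied.
view = np.broadcast_to(a1[:, None, :], (m1, m2, n1))
print(view.strides[1])              # 0: the broadcast axis reuses the same memory
print(np.shares_memory(view, a1))   # True

# Explicit repetition, by contrast, allocates a full (m1, m2, n1) array.
repeated = np.repeat(a1[:, None, :], m2, axis=1)
print(np.shares_memory(repeated, a1))  # False
```

Both objects hold the same values; only the repeated one occupies its own memory.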

Step #2 :

In [237]: out[:,:,n1:] = a2

In [238]: out[:,:,n1:]
Out[238]: 
array([[[4, 8, 2, 4, 6, 3, 5],
        [8, 7, 1, 1, 5, 3, 2]],

       [[4, 8, 2, 4, 6, 3, 5],
        [8, 7, 1, 1, 5, 3, 2]],

       [[4, 8, 2, 4, 6, 3, 5],
        [8, 7, 1, 1, 5, 3, 2]]])

In [239]: a2
Out[239]: 
array([[4, 8, 2, 4, 6, 3, 5],
       [8, 7, 1, 1, 5, 3, 2]])

Here, we are basically broadcasting that block a2 along the first axis of the output array without explicitly making repeated copies.

Sample input, output for completeness -

In [242]: a1
Out[242]: 
array([[5, 8],
       [6, 1],
       [4, 4]])

In [243]: a2
Out[243]: 
array([[4, 8, 2, 4, 6, 3, 5],
       [8, 7, 1, 1, 5, 3, 2]])

In [244]: out
Out[244]: 
array([[[5, 8, 4, 8, 2, 4, 6, 3, 5],
        [5, 8, 8, 7, 1, 1, 5, 3, 2]],

       [[6, 1, 4, 8, 2, 4, 6, 3, 5],
        [6, 1, 8, 7, 1, 1, 5, 3, 2]],

       [[4, 4, 4, 8, 2, 4, 6, 3, 5],
        [4, 4, 8, 7, 1, 1, 5, 3, 2]]])
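The two steps can be wrapped into a reusable function, sketched below. The name row_combinations is hypothetical, and the use of np.empty with np.result_type (instead of np.zeros with dtype=int) is an assumption made so that float inputs are not truncated.

```python
import numpy as np

def row_combinations(a1, a2):
    """Pair every row of a1 with every row of a2 (hypothetical helper name)."""
    m1, n1 = a1.shape
    m2, n2 = a2.shape
    # Assumption: choose a dtype that can hold both inputs rather than int.
    out = np.empty((m1, m2, n1 + n2), dtype=np.result_type(a1, a2))
    out[:, :, :n1] = a1[:, None, :]   # broadcast a1 rows along axis 1
    out[:, :, n1:] = a2               # broadcast the a2 block along axis 0
    return out.reshape(m1 * m2, n1 + n2)

a1 = np.array([[5, 8], [6, 1], [4, 4]])
a2 = np.array([[4, 8, 2, 4, 6, 3, 5],
               [8, 7, 1, 1, 5, 3, 2]])
res = row_combinations(a1, a2)
print(res.shape)  # (6, 9)
print(res[1])     # [5 8 8 7 1 1 5 3 2]
```

With the shapes from the question, (10,7) and (3,2), the same function returns a (30,9) array.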

Approach #2

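Another approach, with tiling/repeating -

# Each row of a1 repeated m2 times -> (m1*m2, n1)
parte1 = np.repeat(a1[:,None,:],m2,axis=0).reshape(-1,n1)
# Whole block a2 stacked m1 times -> (m1*m2, n2)
parte2 = np.repeat(a2[None],m1,axis=0).reshape(-1,n2)
out = np.c_[parte1, parte2]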
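As a sanity check, the repeat-based variant can be run on arrays where the row and column counts all differ, so that a wrong reshape width would show up immediately. This is only a sketch: the sample arrays are made up, and the first part is reshaped to n1 columns (the column count of a1).

```python
import numpy as np

a1 = np.arange(6).reshape(3, 2)          # shape (m1, n1) = (3, 2)
a2 = np.arange(28).reshape(4, 7) * 10    # shape (m2, n2) = (4, 7)
m1, n1 = a1.shape
m2, n2 = a2.shape

# Each a1 row repeated m2 times in a row -> (m1*m2, n1)
part1 = np.repeat(a1[:, None, :], m2, axis=1).reshape(-1, n1)
# The whole a2 block stacked m1 times -> (m1*m2, n2)
part2 = np.repeat(a2[None], m1, axis=0).reshape(-1, n2)
out = np.c_[part1, part2]

print(out.shape)  # (12, 9)
print(out[1])     # [  0   1  70  80  90 100 110 120 130]
```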

Your answers are very elegant, but also abstract. It would help a lot if you add some explanation to it.
@Chiel Tried my best to explain. Hope this will help the readers.

B. M. · Accepted Answer · 2017-11-06 19:36:09Z

A solution with np.tile and np.repeat :

import numpy as np

a1 = np.arange(20).reshape(5,4)
a2 = np.arange(6).reshape(3,2)

# a1 is tiled as a block; each row of a2 is repeated len(a1) times
res = np.hstack((np.tile(a1,(len(a2),1)),np.repeat(a2,len(a1),0)))

# array([[ 0,  1,  2,  3,  0,  1],
#        [ 4,  5,  6,  7,  0,  1],
#        [ 8,  9, 10, 11,  0,  1],
#        [12, 13, 14, 15,  0,  1],
#        [16, 17, 18, 19,  0,  1],
#        [ 0,  1,  2,  3,  2,  3],
#        [ 4,  5,  6,  7,  2,  3],
#        [ 8,  9, 10, 11,  2,  3],
#        [12, 13, 14, 15,  2,  3],
#        [16, 17, 18, 19,  2,  3],
#        [ 0,  1,  2,  3,  4,  5],
#        [ 4,  5,  6,  7,  4,  5],
#        [ 8,  9, 10, 11,  4,  5],
#        [12, 13, 14, 15,  4,  5],
#        [16, 17, 18, 19,  4,  5]])
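Note that in the output above the rows of a1 cycle fastest. If the other ordering is wanted (each a1 row held fixed while a2 cycles), swapping the roles of np.tile and np.repeat should do it. A sketch, with variable names chosen here for illustration:

```python
import numpy as np

a1 = np.arange(20).reshape(5, 4)
a2 = np.arange(6).reshape(3, 2)

# Order used in the answer: a1 cycles fastest, a2 changes every len(a1) rows.
a1_fastest = np.hstack((np.tile(a1, (len(a2), 1)),
                        np.repeat(a2, len(a1), axis=0)))

# Swapped roles: a1 changes every len(a2) rows, a2 cycles fastest.
a2_fastest = np.hstack((np.repeat(a1, len(a2), axis=0),
                        np.tile(a2, (len(a1), 1))))

print(a1_fastest[:2])  # rows [0 1 2 3 0 1] and [4 5 6 7 0 1]
print(a2_fastest[:2])  # rows [0 1 2 3 0 1] and [0 1 2 3 2 3]
```

Both arrays contain the same 15 rows, only in a different order.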

hpaulj · Accepted Answer · 2017-11-06 20:13:54Z

meshgrid can be used, but indirectly, generating row indexing:

In [796]: A = np.arange(6).reshape(3,2)
In [797]: B = np.arange(12).reshape(4,3)*10    # reduced size

Mixed indexing of the rows of the 2 arrays:

In [798]: idx=np.meshgrid(np.arange(3), np.arange(4),indexing='ij')
In [799]: idx
Out[799]: 
[array([[0, 0, 0, 0],
        [1, 1, 1, 1],
        [2, 2, 2, 2]]), 
 array([[0, 1, 2, 3],
        [0, 1, 2, 3],
        [0, 1, 2, 3]])]

This replicates rows of A several times; similarly for B:

In [800]: A[idx[0],:]
Out[800]: 
array([[[0, 1],
        [0, 1],
        [0, 1],
        [0, 1]],

       [[2, 3],
        [2, 3],
        [2, 3],
        [2, 3]],

       [[4, 5],
        [4, 5],
        [4, 5],
        [4, 5]]])

Now concatenate these on the last dimension, producing a (3,4,5) array. Finally reshape to (12,5):

In [802]: np.concatenate((A[idx[0],:],B[idx[1],:]), axis=-1).reshape(12,5)
Out[802]: 
array([[  0,   1,   0,  10,  20],
       [  0,   1,  30,  40,  50],
       [  0,   1,  60,  70,  80],
       [  0,   1,  90, 100, 110],
       [  2,   3,   0,  10,  20],
       [  2,   3,  30,  40,  50],
       [  2,   3,  60,  70,  80],
       [  2,   3,  90, 100, 110],
       [  4,   5,   0,  10,  20],
       [  4,   5,  30,  40,  50],
       [  4,   5,  60,  70,  80],
       [  4,   5,  90, 100, 110]])
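The same idea can be written without hard-coding the row counts or the final shape. The sketch below sizes the index grids from the arrays themselves and flattens them before indexing, which is a small variation on the session above rather than the exact code shown there.

```python
import numpy as np

A = np.arange(6).reshape(3, 2)
B = np.arange(12).reshape(4, 3) * 10

# Row-index grids sized from the arrays rather than hard-coded 3 and 4.
i, j = np.meshgrid(np.arange(A.shape[0]), np.arange(B.shape[0]), indexing='ij')

# Flattening the grids first gives the 2-D result directly, so no reshape is needed.
res = np.concatenate((A[i.ravel()], B[j.ravel()]), axis=-1)

print(res.shape)  # (12, 5)
print(res[1])     # [ 0  1 30 40 50]
```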
