numpy 2d array combination

Question

Suppose I have two arrays:

a = np.array(
[[0, 1],
 [2, 3],
 [4, 5],
 [6, 7]])

b = np.array(
[[2, 3],
 [6, 7],
 [0, 1],
 [4, 5]])

As you can see, one array is simply a shuffle of the other. I need to combine these two arrays to form a third array, c, such that:

1. The first part of c (up to a random index i) consists of the first part of a (up to index i), so (c[:i] == a[:i]).all() must be True.
2. The rest of c is filled with the values from b that are not already in c, in the same order in which they appear in b.

Given that index i is set to 2, the desired output for above arrays a and b in the code should be:

> c
[[0, 1],
 [2, 3],
 [6, 7],
 [4, 5]]

Array c must be of the same length as both array b and array a, and there is a possibility that two elements within either array a or array b are the same. Array c must also consist of the same elements that are in a and b, (i.e. it behaves somewhat like a shuffle).

I've tried multiple solutions, but none give the desired result. The closest was this:

import copy

import numpy as np

a = np.arange(10).reshape(5, 2)
np.random.shuffle(a)

b = np.arange(10).reshape(5, 2)
b_part = b[:4]

temp = []

for part in a:
    if part in b_part:
        continue
    else:
        temp.append(part)

temp = np.array(temp)

c = copy.deepcopy(np.vstack((b_part, temp)))

However, it sometimes results in array c being smaller than arrays a and b, because the elements in either list can sometimes repeat.

If elements are not unique, your rules imply that c can be shorter. Example: a = [(0,1),(2,3),(2,3),(4,5)], b = [(2,3),(4,5),(2,3),(0,1)], i = 2. You'd pick a[:i], which is [(0,1),(2,3)], and from b what has not occurred yet, which is (4,5). This c would be [(0,1),(2,3),(4,5)], which is shorter.
– Paul Panzer, Commented Mar 29, 2019 at 21:39
@PaulPanzer I understand that is what's causing the issue, but I don't know how I can address it myself (which is why I'm asking the question).
– Sergey Ronin, Commented Mar 29, 2019 at 21:43
The first thing would be to decide what your desired answer would be in this case.
– Paul Panzer, Commented Mar 29, 2019 at 21:46
imho your code is working though (just create a_part instead of b_part and swap the arrays in both the loop and vstack((a_part, temp))), but your issue with the size of c comes from your problem definition when a, which is the basis for c, contains duplicates. Imagine splitting a at index i so that c gets just one copy of a duplicated value in a: c then cannot have the same size as a or b, because you cannot add the other duplicate from b.
– vldbnc, Commented Mar 29, 2019 at 22:07
@KomronAripov using your pic example, try index [:4] instead of [:2] and give us the result.
– vldbnc, Commented Mar 29, 2019 at 22:37

Paul Panzer · Accepted Answer · 2019-03-29 23:02:38Z

2

The following should handle duplicates alright.

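import numpy as np

def mix(a, b, i):
    # Indirectly sort the rows of a and b (indices only, the arrays are untouched).
    sa, sb = map(np.lexsort, (a.T, b.T))
    mb = np.empty(len(a), '?')  # boolean mask over the rows of b
    # i Falses then len(a)-i Trues, carried from a's row order to b's row order.
    mb[sb] = np.arange(2, dtype='?').repeat((i, len(a)-i))[sa]
    return np.concatenate([a[:i], b[mb]], 0)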

It:

1. indirectly sorts a and b
2. creates a mask which is True at the positions not taken from a, i.e. has i Falses and then len(a)-i Trues
3. uses the sort orders to map that mask to b
4. filters b with the mask and appends the result to a[:i]
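The four steps can be traced on the small arrays from the question. This is only an illustrative sketch: the variable name from_b is made up here, and the mask is built with a plain comparison rather than the arange/repeat expression used in mix.

```python
import numpy as np

a = np.array([[0, 1], [2, 3], [4, 5], [6, 7]])
b = np.array([[2, 3], [6, 7], [0, 1], [4, 5]])
i = 2

# Step 1: indirect sort. sa and sb are row indices that list the rows of
# a and b in the same sorted order (b is a permutation of a).
sa = np.lexsort(a.T)
sb = np.lexsort(b.T)

# Step 2: mask over positions of a, False for the first i rows, True after.
from_b = np.arange(len(a)) >= i

# Step 3: carry the mask from a's positions to b's positions.
mb = np.empty(len(a), dtype=bool)
mb[sb] = from_b[sa]

# Step 4: keep the flagged rows of b, in b's order, after a[:i].
c = np.concatenate([a[:i], b[mb]], axis=0)

print(sa, sb, mb)
print(c)
```

Here mb ends up True exactly at the rows [6, 7] and [4, 5] of b, which gives the c requested in the question.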

Example (transposed to save space):

a.T
# array([[2, 2, 0, 2, 3, 0, 2, 0, 0, 1],
#        [0, 1, 2, 0, 1, 0, 3, 0, 0, 0]])
b.T
# array([[0, 0, 2, 1, 0, 0, 2, 2, 2, 3],
#        [0, 0, 0, 0, 2, 0, 1, 3, 0, 1]])
mix(a, b, 6).T
# array([[2, 2, 0, 2, 3, 0, 0, 1, 0, 2],
#        [0, 1, 2, 0, 1, 0, 0, 0, 0, 3]])
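One way to sanity-check the sort-based approach is to compare it with a plain loop. The sketch below assumes that duplicates are handled per occurrence: every row taken from a[:i] cancels exactly one equal row of b. The names mix_sorted and mix_loop are hypothetical, and mix_sorted restates the function above with the mask written as a comparison.

```python
from collections import Counter

import numpy as np


def mix_sorted(a, b, i):
    # Restatement of the sort-based approach; the mask is built with a comparison.
    sa, sb = np.lexsort(a.T), np.lexsort(b.T)
    mb = np.empty(len(a), dtype=bool)
    mb[sb] = (np.arange(len(a)) >= i)[sa]
    return np.concatenate([a[:i], b[mb]], axis=0)


def mix_loop(a, b, i):
    # Hypothetical reference: each row taken from a[:i] cancels one equal row of b.
    taken = Counter(map(tuple, a[:i].tolist()))
    rest = []
    for row in b.tolist():
        key = tuple(row)
        if taken[key] > 0:
            taken[key] -= 1      # this occurrence is already covered by a[:i]
        else:
            rest.append(row)     # keep b's order for everything else
    return np.array(a[:i].tolist() + rest)


a = np.array([[0, 1], [2, 3], [0, 0], [2, 3], [4, 5], [6, 7]])
b = np.array([[2, 3], [6, 7], [0, 1], [4, 5], [2, 3], [0, 0]])
c_sorted = mix_sorted(a, b, 3)
c_loop = mix_loop(a, b, 3)
print(c_sorted.tolist())
print(np.array_equal(c_sorted, c_loop))
```

Under this reading of the requirements, both versions keep c at the same length as a and b even when rows repeat.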

answered Mar 29, 2019 at 23:02

Paul Panzer

3 Comments

fountainhead Over a year ago

If a = np.array([[0, 1], [2, 3], [0, 0], [2, 3], [4, 5], [6, 7]]) and b = np.array([[2, 3], [6, 7], [0, 1], [4, 5], [2, 3], [0, 0]]), then for i=3 this solution gives c = [[0, 1], [2, 3], [0, 0], [6, 7], [4, 5], [2, 3]]. The last [2, 3] probably violates OP's requirement, since it was already picked up from a and must not be picked up again from b.

Paul Panzer Over a year ago

@fountainhead No, that's the other [2, 3] ;-] -- More seriously, OP's clarifications as to how to handle dupes are spread out across several comments.

fountainhead Over a year ago

Several comments and several external links too. :-)

fountainhead · Accepted Answer · 2019-03-29 23:38:43Z

Here's one solution:

full_len = len(a)

b_not_in_a_part = ~np.all(np.isin(b, a[:i+1]), axis=1)       # Boolean mask to apply on b (a[:i+1] is rows 0..i inclusive)
b_part_len = full_len-i-1                                    # Length of the b part of c (not used below)

c = np.concatenate((a[:i+1], b[b_not_in_a_part,:]), axis=0)  # Construct c, using the mask for the b part.

Testing it out:

import numpy as np
a = np.array(
[[0, 1],
 [2, 3],
 [0, 0],
 [2, 3],
 [4, 5],
 [6, 7]])
b = np.array(
[[2, 3],
 [6, 7],
 [0, 1],
 [4, 5],
 [2, 3],
 [0, 0]])

i = 2

print ("a is:\n", a)
print ("b is:\n", b)

full_len = len(a)

b_not_in_a_part = ~np.all(np.isin(b, a[:i+1]), axis=1)       # Boolean mask to apply on b (a[:i+1] is rows 0..i inclusive)
b_part_len = full_len-i-1                                    # Length of the b part of c (not used below)

c = np.concatenate((a[:i+1], b[b_not_in_a_part,:]), axis=0)  # Construct c, using the mask for the b part.
print ("c is:\n", c)

Output:

a is:
 [[0 1]
 [2 3]
 [0 0]
 [2 3]
 [4 5]
 [6 7]]
b is:
 [[2 3]
 [6 7]
 [0 1]
 [4 5]
 [2 3]
 [0 0]]
c is:
 [[0 1]
 [2 3]
 [0 0]
 [6 7]
 [4 5]]

Note: For this example, c has a length of only 5, even though a and b have a length of 6. This is because, due to high duplication in b, there aren't enough values left in b, that are eligible to be used for c.
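One caveat with the mask above: np.isin tests individual numbers, not whole rows, so a row can be reported as present when its values merely occur somewhere in a[:i+1]. A possible row-wise variant is sketched below. The names a_part, probe and row_in_a_part are made up for this illustration, and the slice a[:i+1] follows this answer's convention.

```python
import numpy as np

a = np.array([[0, 1], [2, 3], [0, 0], [2, 3], [4, 5], [6, 7]])
b = np.array([[2, 3], [6, 7], [0, 1], [4, 5], [2, 3], [0, 0]])
i = 2
a_part = a[:i + 1]  # rows 0..i inclusive, as in the answer

# [1, 0] is not a row of a_part, but both numbers occur in it.
probe = np.array([[1, 0]])
elementwise = np.all(np.isin(probe, a_part), axis=1)
rowwise = (probe[:, None, :] == a_part[None, :, :]).all(axis=2).any(axis=1)
print(elementwise, rowwise)

# Row-wise mask for b: compare every row of b with every row of a_part.
row_in_a_part = (b[:, None, :] == a_part[None, :, :]).all(axis=2).any(axis=1)
c = np.concatenate((a_part, b[~row_in_a_part]), axis=0)
print(c)
```

For the test arrays used here both masks select the same rows, so c still has only 5 rows; the difference shows up only for rows like the probe.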

@KomronAripov: This error occurs because, due to high duplication in b, there aren't enough values in b that are not already used from a. I can fix it if you tell me what your requirement is for this scenario.
In such a scenario, is it ok if c is shorter than a or b? (There are not enough values to reach the full length.)
@KomronAripov, fixed the error, on the assumption that it's ok for c to be shorter if high duplication in b leaves too few values eligible to go into c.

BigH · Accepted Answer · 2019-03-29 22:26:04Z

0

Just use numpy.concatenate(), and keep in mind that NumPy slicing goes up to but does not include the end index, so add 1 to the index if row i itself should be included (see below). (Edit: it seems you modified your a, b and c arrays, so I'll change my code below to accommodate.)

import numpy as np

a = np.array(
[[0, 1],
 [2, 3],
 [4, 5],
 [6, 7]])

b = np.array(
[[2, 3],
 [6, 7],
 [0, 1],
 [4, 5]])


i = 2
c = a[0:i]
for k in b:
    # Compare whole rows: `k not in c` would match on any single equal element.
    if not (c == k).all(axis=1).any():
        c = np.concatenate((c, [k]))

print(c)

Output:

[[0 1]
 [2 3]
 [6 7]
 [4 5]]
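A note on membership tests with 2-D arrays: for a NumPy array c, the expression k in c behaves like (c == k).any(), so a single matching element is enough. The tiny sketch below, with made-up values for c and k, shows the difference from a whole-row comparison.

```python
import numpy as np

c = np.array([[0, 1], [2, 3]])
k = np.array([0, 5])  # not a row of c, but shares the 0 in column 0

elementwise_hit = k in c                    # behaves like (c == k).any()
row_hit = bool((c == k).all(axis=1).any())  # True only if a whole row equals k

print(elementwise_hit, row_hit)
```

With the arrays in this answer the two tests happen to agree, which is why the printed output is the same either way.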

edited Mar 29, 2019 at 22:26

answered Mar 29, 2019 at 21:46

BigH

5 Comments

Sergey Ronin Over a year ago

after a slight modification to array b (now updated in the answer), the output is not as expected.

Sergey Ronin Over a year ago

this results in duplicates (even though there were none to begin with), which is undesired...

BigH Over a year ago

Ok I get ya, how about now?

Sergey Ronin Over a year ago

this still does not address the issue where there is a possibility that either array a or array b can contain duplicates.

BigH Over a year ago

can you provide an example of what c should look like if a and b contains the duplicates as you have mentioned?

panktijk · Accepted Answer · 2019-03-29 22:27:28Z

0

For i=2, get your first part of the result:
```
c = a[:i]
```

Get "uncommon" elements between b and c:

diff = np.array([x for x in b if x not in c])

Select random elements from diff and concatenate to the original array:

s = len(a) - i
np.concatenate([c, diff[np.random.choice(diff.shape[0], size=s, replace=False), :]], axis=0)

OUTPUT:

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

edited Mar 29, 2019 at 22:27

answered Mar 29, 2019 at 21:57

panktijk

6 Comments

panktijk Over a year ago

You can add replace=False when you select random rows from b. Updated the code.

Sergey Ronin Over a year ago

TypeError: randint() got an unexpected keyword argument 'replace'

panktijk Over a year ago

@KomronAripov Sorry, my bad. You are also supposed to change randint to choice. Try now?

Sergey Ronin Over a year ago

ValueError: Cannot take a larger sample than population when 'replace=False'

panktijk Over a year ago

This is probably your data issue where you are trying to select a larger number of elements than your parent set. It works with the data you have provided in the question.
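The ValueError from the comments can be reproduced with the duplicate-heavy arrays used elsewhere on this page. This is a sketch under assumptions: i = 3 is chosen here for illustration, and rows are compared as whole rows rather than with `x not in c`.

```python
import numpy as np

a = np.array([[0, 1], [2, 3], [0, 0], [2, 3], [4, 5], [6, 7]])
b = np.array([[2, 3], [6, 7], [0, 1], [4, 5], [2, 3], [0, 0]])
i = 3

c = a[:i]
# Rows of b that do not equal any row of c.
is_new = ~(b[:, None, :] == c[None, :, :]).all(axis=2).any(axis=1)
diff = b[is_new]
s = len(a) - i  # number of rows still needed

try:
    np.random.choice(diff.shape[0], size=s, replace=False)
    raised = False
except ValueError:
    # Only 2 candidate rows exist, but 3 are requested without replacement.
    raised = True

print(diff.shape[0], s, raised)
```

When a contains duplicates, diff can hold fewer rows than len(a) - i, so sampling without replacement cannot fill c.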
