How to find row of 2d array in 3d numpy array

Question

I'm trying to find the rows in which a 2d array appears in a 3d numpy ndarray. Here's an example of what I mean. Given:

arr = [[[0, 3], [3, 0]],
       [[0, 0], [0, 0]],
       [[3, 3], [3, 3]],
       [[0, 3], [3, 0]]]

I'd like to find all occurrences of:

[[0, 3], [3, 0]]

The result I'd like is:

[0, 3]

I tried to use argwhere but that unfortunately got me nowhere. Any ideas?

score 5 · Accepted Answer · 2016-04-03 04:44:03Z

Try

np.argwhere(np.all(arr==[[0,3], [3,0]], axis=(1,2)))
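For reference, here is a minimal runnable sketch of the one-liner above. It assumes `arr` is first converted to an ndarray (the question shows it as a nested list, and `==` only compares element-wise on arrays); the names `pattern`, `mask` and `indices` are illustrative and not from the original post.

```python
import numpy as np

# The question's data, converted to an ndarray so that == compares element-wise.
arr = np.array([[[0, 3], [3, 0]],
                [[0, 0], [0, 0]],
                [[3, 3], [3, 3]],
                [[0, 3], [3, 0]]])
pattern = np.array([[0, 3], [3, 0]])  # illustrative name for the 2d array to find

# True where a whole 2x2 block equals the pattern.
mask = np.all(arr == pattern, axis=(1, 2))

# argwhere returns one row per match (shape (2, 1) here); ravel flattens it to plain indices.
indices = np.argwhere(mask).ravel()
print(indices.tolist())  # [0, 3]
```

Note that `np.argwhere` alone gives `[[0], [3]]`; the `ravel()` call is only there to get the flat `[0, 3]` asked for in the question.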

How it works:

arr == [[0,3], [3,0]] returns

array([[[ True,  True],
        [ True,  True]],

       [[ True, False],
        [False,  True]],

       [[False,  True],
        [ True, False]],

       [[ True,  True],
        [ True,  True]]], dtype=bool)

This is a three-dimensional array whose innermost axis is axis 2. The values along this axis are:

[True, True]
[True, True]
[True, False]
[False, True]
[False, True]
[True, False]
[True, True]
[True, True]

Now with np.all(arr==[[0,3], [3,0]], axis=2) you check whether both elements in each row are True, which reduces the shape from (4, 2, 2) to (4, 2). Like this:

array([[ True,  True],
       [False, False],
       [False, False],
       [ True,  True]], dtype=bool)

You need one more reduction step, because both rows ([0, 3] and [3, 0]) have to match. You can apply np.all again to the result, whose innermost axis is now axis 1 (here test stands for the comparison result arr == [[0,3], [3,0]]):

np.all(np.all(test, axis = 2), axis=1)

Or you can also do it by giving a tuple for the axis parameter to do the same thing step by step (first innermost, then one step higher). The result will be:

array([ True, False, False,  True], dtype=bool)
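The two variants described above can be checked against each other. This is a small sketch, assuming `arr` is an ndarray; `step1`, `step2` and `both` are illustrative names. It shows how each reduction removes one axis and that the tuple form gives the same mask as the nested calls.

```python
import numpy as np

arr = np.array([[[0, 3], [3, 0]],
                [[0, 0], [0, 0]],
                [[3, 3], [3, 3]],
                [[0, 3], [3, 0]]])

# Element-wise comparison; the 2x2 pattern is broadcast over axis 0.
test = arr == np.array([[0, 3], [3, 0]])   # shape (4, 2, 2)

step1 = np.all(test, axis=2)               # collapse columns -> shape (4, 2)
step2 = np.all(step1, axis=1)              # collapse rows    -> shape (4,)
both = np.all(test, axis=(1, 2))           # collapse both axes in one call

print(test.shape, step1.shape, step2.shape)  # (4, 2, 2) (4, 2) (4,)
print(np.array_equal(step2, both))           # True
```

Passing `axis=(1, 2)` simply tells `np.all` to reduce over every axis except axis 0, so one boolean is left per 2d block.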

BRILLIANT!! The whole idea of axes is so confusing to me. Could you explain to me why axis=(1,2) works?

Eelco Hoogendoorn · Accepted Answer · 2016-04-03 07:56:42Z

2

The 'contains' function in the numpy_indexed package (disclaimer: I am its author) can be used to make queries of this kind. It implements a solution similar to the one offered by Saullo.

import numpy as np
import numpy_indexed as npi
test = [[[0, 3], [3, 0]]]
# check which elements of arr are present in test (checked along axis=0 by default)
flags = npi.contains(test, arr)
# if you want the indexes:
idx = np.flatnonzero(flags)


2 Comments

Saullo G. P. Castro Over a year ago

Good to know Eelco (+1), does it come in NumPy?

Eelco Hoogendoorn Over a year ago

I originally intended to make a numpy EP out of this; but backwards compatibility would grate a bit, and I decided I wanted this functionality sooner than the numpy release cycle would allow. But yeah I think it would make a lot of sense if the functionality in this package will find its way into numpy.

Saullo G. P. Castro · Accepted Answer · 2016-04-03 10:00:43Z

You can use np.in1d after defining a new data type that has the memory size of one 2-D block (one entry along axis 0) of your arr. To define such a data type:

mydtype = np.dtype((np.void, arr.dtype.itemsize*arr.shape[1]*arr.shape[2]))

then you have to convert your arr to a 1-D array where each row will have arr.shape[1]*arr.shape[2] elements:

aView = np.ascontiguousarray(arr).flatten().view(mydtype)
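To see what the view does, the sketch below builds the same dtype and view for the question's data and prints their sizes. It assumes the array holds 64-bit integers (set explicitly here, since the default integer size can differ by platform); `n_bytes` is an illustrative name.

```python
import numpy as np

arr = np.array([[[0, 3], [3, 0]],
                [[0, 0], [0, 0]],
                [[3, 3], [3, 3]],
                [[0, 3], [3, 0]]], dtype=np.int64)

# Bytes occupied by one 2x2 block: 8 bytes per element * 2 * 2.
n_bytes = arr.dtype.itemsize * arr.shape[1] * arr.shape[2]
mydtype = np.dtype((np.void, n_bytes))

# Flatten to 1-D, then reinterpret every n_bytes-sized chunk as a single opaque item.
aView = np.ascontiguousarray(arr).flatten().view(mydtype)

print(mydtype.itemsize)  # 32
print(aView.shape)       # (4,)
```

Each of the four items in `aView` now stands for one whole 2d block, which is what lets a 1-D membership test compare blocks as units.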

You are now ready to look for your 2-D array pattern [[0, 3], [3, 0]], which also has to be converted to the same dtype:

bView = np.array([[0, 3], [3, 0]]).flatten().view(mydtype)

You can now check the occurrences of bView in aView:

np.in1d(aView, bView)
#array([ True, False, False,  True], dtype=bool)

This mask is easily converted to indices using np.where, for example.
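As a small illustration of that last step, here is a sketch that turns a boolean mask into indices. The mask is written out by hand to match the output shown above, so the snippet does not depend on the view-based lookup; `idx_where` and `idx_flat` are illustrative names.

```python
import numpy as np

# Hand-written copy of the mask shown above.
mask = np.array([True, False, False, True])

# np.where on a 1-D mask returns a 1-tuple of index arrays; take its first entry.
idx_where = np.where(mask)[0]

# np.flatnonzero returns the index array directly.
idx_flat = np.flatnonzero(mask)

print(idx_where.tolist())  # [0, 3]
print(idx_flat.tolist())   # [0, 3]
```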

Timings (updated)

The following function is used to implement this approach:

def check2din3d(b, a):
    """
    Return where `b` (2D array) appears in `a` (3D array) along `axis=0`
    """
    mydtype = np.dtype((np.void, a.dtype.itemsize*a.shape[1]*a.shape[2]))
    aView = np.ascontiguousarray(a).flatten().view(mydtype)
    bView = np.ascontiguousarray(b).flatten().view(mydtype)
    return np.in1d(aView, bView)

The updated timings, taking @ayhan's comments into account, showed that this method can be faster than np.argwhere, but the difference is not significant, and for large arrays like the one below, @ayhan's approach is considerably faster:

arrLarge = np.concatenate([arr]*10000000)
arrLarge = np.concatenate([arrLarge]*10, axis=2)

pattern = np.ascontiguousarray([[0,3]*10, [3,0]*10])

%timeit np.argwhere(np.all(arrLarge==pattern, axis=(1,2)))
#1 loops, best of 3: 2.99 s per loop

%timeit check2din3d(pattern, arrLarge)
#1 loops, best of 3: 4.65 s per loop

I don't know much about performance but I believe you also need to account for aView construction, which takes about 80% of the time np.all takes. I wasn't able to reproduce 2x or 6x performance improvements at higher dimensions you mentioned.
Creating the views should be O(1) if the arrays are already contiguous; which they should be.
Maybe it's not creating the views but there seems to be something that is not O(1). Anything I am doing incorrectly here? i.imgur.com/79u344W.png
@ayhan I had to give you my upvote.... your method showed to be faster than this one for very large arrays. For intermediate arrays they are very close. Regards
Thanks. I was thinking maybe the random numbers were generating the worst case scenario. I learned a lot from your approach btw.
