4

I'm trying to find the rows in which a 2D array appears in a 3D numpy ndarray. Here's an example of what I mean. Given:

arr = [[[0, 3], [3, 0]],
       [[0, 0], [0, 0]],
       [[3, 3], [3, 3]],
       [[0, 3], [3, 0]]]

I'd like to find all occurrences of:

[[0, 3], [3, 0]]

The result I'd like is:

[0, 3]

I tried to use argwhere but that unfortunately got me nowhere. Any ideas?

3 Answers

5

Try

np.argwhere(np.all(arr==[[0,3], [3,0]], axis=(1,2)))

How it works:

arr == [[0,3], [3,0]] returns

array([[[ True,  True],
        [ True,  True]],

       [[ True, False],
        [False,  True]],

       [[False,  True],
        [ True, False]],

       [[ True,  True],
        [ True,  True]]], dtype=bool)

This is a three-dimensional array whose innermost axis is axis 2. The values along this axis are:

[True, True]
[True, True]
[True, False]
[False, True]
[False, True]
[True, False]
[True, True]
[True, True]

Now with np.all(arr==[[0,3], [3,0]], axis=2) you are checking whether both elements in each row are True, and the shape of the result is reduced from (4, 2, 2) to (4, 2). Like this:

array([[ True,  True],
       [False, False],
       [False, False],
       [ True,  True]], dtype=bool)

You need one more reduction step because both rows have to match (both [0, 3] and [3, 0]). You can do it either by reducing the result again (its innermost axis is now axis 1):

np.all(np.all(arr==[[0,3], [3,0]], axis=2), axis=1)

Or you can also do it by giving a tuple for the axis parameter to do the same thing step by step (first innermost, then one step higher). The result will be:

array([ True, False, False,  True], dtype=bool)
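The steps above can be put together in one runnable sketch. It assumes arr is converted to an ndarray first, because the question shows it as a nested list and comparing a plain list with == does not broadcast. The names pattern, eq, rows_equal, two_step and one_step are illustrative only, not from the answer.

```python
import numpy as np

# Convert the nested list from the question into an ndarray so that ==
# compares element-wise (with broadcasting) instead of comparing whole lists.
arr = np.asarray([[[0, 3], [3, 0]],
                  [[0, 0], [0, 0]],
                  [[3, 3], [3, 3]],
                  [[0, 3], [3, 0]]])
pattern = np.asarray([[0, 3], [3, 0]])

eq = arr == pattern                     # shape (4, 2, 2): element-wise comparison
rows_equal = np.all(eq, axis=2)         # shape (4, 2): each inner row fully matches
two_step = np.all(rows_equal, axis=1)   # shape (4,): every row of the 2D block matches
one_step = np.all(eq, axis=(1, 2))      # shape (4,): same reduction in a single call

# np.argwhere returns one row per match, here shape (2, 1); ravel flattens it.
indices = np.argwhere(one_step).ravel()
print(indices)
```

Both reductions give the same mask. Flattening the argwhere result yields the flat list of indices the question asks for.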

2 Comments

BRILLIANT!! The whole idea of axes is so confusing to me. Could you explain to me why axis=(1,2) works?
Sure, I'll try to explain.
2

The 'contains' function in the numpy_indexed package (disclaimer: I am its author) can be used to make queries of this kind. It implements a solution similar to the one offered by Saullo.

import numpy as np
import numpy_indexed as npi
test = [[[0, 3], [3, 0]]]
# check which elements of arr are present in test (checked along axis=0 by default)
flags = npi.contains(test, arr)
# if you want the indexes:
idx = np.flatnonzero(flags)

2 Comments

Good to know Eelco (+1), does it come in NumPy?
I originally intended to make a numpy EP out of this; but backwards compatibility would grate a bit, and I decided I wanted this functionality sooner than the numpy release cycle would allow. But yeah I think it would make a lot of sense if the functionality in this package will find its way into numpy.
0

You can use np.in1d after defining a new data type that has the memory size of each row in your arr. To define such a data type:

mydtype = np.dtype((np.void, arr.dtype.itemsize*arr.shape[1]*arr.shape[2]))

then you have to convert your arr to a 1-D array where each row will have arr.shape[1]*arr.shape[2] elements:

aView = np.ascontiguousarray(arr).flatten().view(mydtype)

You are now ready to look for your 2-D array pattern [[0, 3], [3, 0]] which also has to be converted to dtype:

bView = np.array([[0, 3], [3, 0]]).flatten().view(mydtype)

You can now check the occurrences of bView in aView:

np.in1d(aView, bView)
#array([ True, False, False,  True], dtype=bool)

This mask is easily converted to indices using np.where, for example.
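As a minimal sketch of that conversion, the boolean mask below is typed in by hand to match the output shown above. The names idx_where and idx_flat are illustrative only.

```python
import numpy as np

# Boolean mask as produced above: True where the 2D pattern was found.
mask = np.array([True, False, False, True])

# np.where with a single argument returns a tuple with one index array per
# dimension; the mask is 1-D, so take the first (and only) entry.
idx_where = np.where(mask)[0]

# np.flatnonzero returns the same indices directly as a 1-D array.
idx_flat = np.flatnonzero(mask)

print(idx_where, idx_flat)
```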

Timings (updated)

The following function is used to implement this approach:

def check2din3d(b, a):
    """
    Return where `b` (2D array) appears in `a` (3D array) along `axis=0`
    """
    mydtype = np.dtype((np.void, a.dtype.itemsize*a.shape[1]*a.shape[2]))
    aView = np.ascontiguousarray(a).flatten().view(mydtype)
    bView = np.ascontiguousarray(b).flatten().view(mydtype)
    return np.in1d(aView, bView)

The updated timings, taking @ayhan's comments into account, showed that this method can be faster than np.argwhere, but the difference is not significant, and for large arrays like the one below @ayhan's approach is considerably faster:

arrLarge = np.concatenate([arr]*10000000)
arrLarge = np.concatenate([arrLarge]*10, axis=2)

pattern = np.ascontiguousarray([[0,3]*10, [3,0]*10])

%timeit np.argwhere(np.all(arrLarge==pattern, axis=(1,2)))
#1 loops, best of 3: 2.99 s per loop

%timeit check2din3d(pattern, arrLarge)
#1 loops, best of 3: 4.65 s per loop

6 Comments

I don't know much about performance but I believe you also need to account for aView construction, which takes about 80% of the time np.all takes. I wasn't able to reproduce 2x or 6x performance improvements at higher dimensions you mentioned.
Creating the views should be O(1) if the arrays are already contiguous; which they should be.
Maybe it's not creating the views but there seems to be something that is not O(1). Anything I am doing incorrectly here? i.imgur.com/79u344W.png
@ayhan I had to give you my upvote.... your method showed to be faster than this one for very large arrays. For intermediate arrays they are very close. Regards
Thanks. I was thinking maybe the random numbers were generating the worst case scenario. I learned a lot from your approach btw.