Numpy 2-D array boolean masking

Question

I don't understand one example in this numpy tutorial.

a = np.arange(12).reshape(3,4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

Then why will a[b1,b2] return array([4, 10])? Shouldn't it return array([[4, 6], [8, 10]])?

Any detailed explanation is appreciated!

alkasm · Accepted Answer · 2018-08-26 01:15:53Z

When you index an array with multiple arrays, NumPy pairs up the elements of the indexing arrays and returns one element per pair:

>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b1
array([False,  True,  True], dtype=bool)
>>> b2
array([ True, False,  True, False], dtype=bool)
>>> a[b1, b2]
array([ 4, 10])

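Each boolean array is first converted to the integer positions of its True values, (1, 2) for b1 and (0, 2) for b2, so this is equivalent to: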

>>> a[(1, 2), (0, 2)]
array([ 4, 10])

which picks the elements at a[1, 0] and a[2, 2]:

>>> a[1, 0]
4
>>> a[2, 2]
10

Because of this pairwise behavior, you cannot in general index with arrays of different lengths (they have to be broadcastable against each other). Your example only works by coincidence: both boolean arrays happen to contain exactly two True values. If one had three True values, for example, you'd get an error:

>>> b3 = np.array([True, True, True, False])
>>> a[b1, b3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)

The error is telling you that the indexing arrays must be broadcastable to a common shape, so that NumPy can pair the indices up. For example, if one indexing array held just a single value, that value would be repeated for each value of the other indexing array.
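To make the pairing and broadcasting rules concrete, here is a minimal sketch using integer index arrays. The names rows, cols, paired, single_col and block are only illustrative; rows and cols hold the positions of the True values in b1 and b2 from the question.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

rows = np.array([1, 2])  # positions of True in b1
cols = np.array([0, 2])  # positions of True in b2

# Equal-length index arrays are paired element by element:
# this picks a[1, 0] and a[2, 2].
paired = a[rows, cols]

# A length-1 index array is broadcast against the other one:
# this picks a[1, 0] and a[2, 0].
single_col = a[rows, np.array([0])]

# A (2, 1) column of row indices broadcasts against a (2,) array of
# column indices to shape (2, 2), giving every row/column combination.
block = a[rows[:, None], cols]

print(paired)
print(single_col)
print(block)
```

The last form is what turns pairwise selection into the rectangular block the question expected.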

To get the result you expected, you can index in two steps, selecting the rows first and then the columns:

>>> a[b1][:, b2]
array([[ 4,  6],
       [ 8, 10]])

Alternatively, you could combine the two masks into a single 2D boolean array with the same shape as a. Note that the result is then a flat 1D array, because a 2D mask can select any number of elements and they need not form a rectangle:

>>> a[np.outer(b1, b2)]
array([ 4,  6,  8, 10])

kevinkayaks · Accepted Answer · 2018-08-26 02:59:25Z

The indices of the True values in the first array are

>>> i = np.where(b1)[0]
>>> i
array([1, 2])

For the second array they are

>>> j = np.where(b2)[0]
>>> j
array([0, 2])

Using these index arrays together,

>>> a[i, j]
array([ 4, 10])

edited Aug 26, 2018 at 2:59

answered Aug 26, 2018 at 1:09

seralouk · Accepted Answer · 2019-06-20 14:54:19Z

Another way to apply a general boolean 2D mask on a 2D numpy array is the following:

Use matrix element-wise multiplication:

import numpy as np

n = 100
mask = np.identity(n)
data = np.random.rand(n,n)

data_masked = data * mask

In this example, only the elements on the diagonal are kept and every other element becomes 0; the result still has the same n by n shape as data. The mask could be any n by n array of 0s and 1s (or booleans), though.
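Multiplying by a mask zeroes elements rather than removing them, which matters for statistics such as the mean. The sketch below contrasts three options on a small array; the seed, the size n = 4 and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so the sketch is repeatable
n = 4
data = rng.random((n, n))
mask = np.identity(n, dtype=bool)  # boolean version of the diagonal mask

# Multiplication keeps the (n, n) shape; unselected entries become 0.
zeroed = data * mask

# Boolean indexing returns only the selected values as a 1D array.
selected = data[mask]

# np.where keeps the shape but marks unselected entries as NaN.
nan_filled = np.where(mask, data, np.nan)

mean_zeroed = zeroed.mean()        # diluted by the zeros
mean_selected = selected.mean()    # mean of the selected values only
mean_nan = np.nanmean(nan_filled)  # same value, ignoring the NaNs

print(mean_zeroed, mean_selected, mean_nan)
```

Which form fits depends on whether the array shape has to be preserved and whether 0 can stand in for "missing" in the later computation.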

4 Comments

Michael Berk Over a year ago

This still leaves 0's in the matrix, so it doesn't work for mean calculations.

seralouk Over a year ago

what do you mean? what is your goal? this is just an example about how to apply a 2d masking.

Michael Berk Over a year ago

Fair my comment is out of the scope of the question, so probably not the place for this. My point is that multiplying a 2d array by a 2d mask doesn't actually subset the array/replace unwanted elements with np.nan. So this works if you can treat 0 as a null e.g. if you're summing but not averaging.

seralouk Over a year ago

The OP asked something else. However, you can get what you need in one line: data[mask.astype(bool)].mean()
