I don't understand one example in this numpy tutorial.

a = np.arange(12).reshape(3,4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

Then why will a[b1,b2] return array([4, 10])? Shouldn't it return array([[4, 6], [8, 10]])?

Any detailed explanation is appreciated!

3 Answers

2

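When you index an array with multiple arrays, NumPy pairs up corresponding elements of the indexing arrays rather than taking every combination. Boolean arrays are first converted to the integer positions of their True values, so b1 becomes (1, 2) and b2 becomes (0, 2):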

>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> b1
array([False,  True,  True], dtype=bool)
>>> b2
array([ True, False,  True, False], dtype=bool)
>>> a[b1, b2]
array([ 4, 10])

Notice that this is equivalent to:

>>> a[(1, 2), (0, 2)]
array([ 4, 10])

which are the elements at a[1, 0] and a[2, 2]:

>>> a[1, 0]
4
>>> a[2, 2]
10

Because of this pairwise behavior, you cannot in general index with arrays of different lengths (they have to be able to broadcast against each other). This example works only by coincidence: both boolean arrays happen to have exactly two True values. If one of them had three True values, for example, you'd get an error:

>>> b3 = np.array([True, True, True, False])
>>> a[b1, b3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)

The error message tells you that the indexing arrays must be broadcastable against each other, so that NumPy can pair up the indices. For example, if one indexing array held just a single value, that value would be paired with every value from the other indexing array.
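A minimal sketch of that broadcasting case, using explicit integer index arrays instead of boolean masks (np.flatnonzero gives the positions of the True values; the names rows, single and picked are only for illustration):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])

rows = np.flatnonzero(b1)   # positions of True in b1: [1, 2]
single = np.array([0])      # a single column index, shape (1,)

# Shapes (2,) and (1,) broadcast to (2,), so column 0 is paired with each row:
# the result holds a[1, 0] and a[2, 0].
picked = a[rows, single]
print(picked)               # [4 8]
```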

To get the result you expect, you can index in two steps, first the rows and then the columns:

>>> a[b1][:, b2]
array([[ 4,  6],
       [ 8, 10]])

Alternatively, you can combine the two masks into a single 2D boolean array with the same shape as a. Note that the result is then a flat 1D array, because a 2D mask can select any number of elements and they need not form a rectangle:

>>> a[np.outer(b1, b2)]
array([ 4,  6,  8, 10])

1

The indices of the True values in the first array are

>>> i = np.where(b1)[0]
>>> i
array([1, 2])

For the second array they are

>>> j = np.where(b2)[0]
>>> j
array([0, 2])

Indexing with these two integer arrays together pairs them up elementwise, which picks a[1, 0] and a[2, 2]:

>>> a[i, j]
array([ 4, 10])

1

Another way to apply a general boolean 2D mask on a 2D numpy array is the following:

Use matrix element-wise multiplication:

import numpy as np

n = 100
mask = np.identity(n)
data = np.random.rand(n,n)

data_masked = data * mask

In this random example, only the elements on the diagonal are kept; every other element becomes 0. The mask could be any n by n matrix of 0s and 1s, though.
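Multiplying by the mask keeps the array's shape and sets the unselected entries to 0, which matters for statistics such as the mean. A hedged sketch comparing it with boolean indexing and a NaN fill (n = 4, the seed, and the names zeroed, selected and nan_filled are illustrative choices, not part of the answer):

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
data = rng.random((n, n))
mask = np.identity(n).astype(bool)          # True on the diagonal

zeroed = data * mask                        # same shape, off-diagonal entries are 0
selected = data[mask]                       # 1D array holding only the n kept values
nan_filled = np.where(mask, data, np.nan)   # same shape, unwanted entries are NaN

print(zeroed.mean())            # diluted by the n*n - n zeros
print(selected.mean())          # mean of the kept values only
print(np.nanmean(nan_filled))   # NaNs are ignored, so this matches selected.mean()
```

Which form fits depends on whether the zeros are meaningful: sums are unaffected by them, while means are not.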

4 Comments

This still leaves 0s in the matrix, so it doesn't work for mean calculations.
What do you mean? What is your goal? This is just an example of how to apply a 2D mask.
Fair, my comment is outside the scope of the question, so this is probably not the place for it. My point is that multiplying a 2D array by a 2D mask doesn't actually subset the array or replace unwanted elements with np.nan. So this works if you can treat 0 as a null, e.g. if you're summing but not averaging.
The OP asked something else. However, you can get what you need in half a second: data[mask.astype(bool)].mean()
