
I am experimenting with the numpy.where(condition[, x, y]) function.
From the numpy documentation, I learn that if you give just one array as input, it should return the indices where the array is non-zero (i.e. "True"):

If only condition is given, return the tuple condition.nonzero(), the indices where condition is True.

But if I try it, it returns a tuple that looks like it has two elements, where the first is the wanted list of indices and the second is a null element:

>>> import numpy as np
>>> array = np.array([1,2,3,4,5,6,7,8,9])
>>> np.where(array>4)
(array([4, 5, 6, 7, 8]),) # notice the comma before the last parenthesis

So the question is: why? What is the purpose of this behaviour? In what situation is it useful? To get the wanted list of indices I have to add indexing, as in np.where(array>4)[0], which seems... "ugly".


ADDENDUM

I understand (from some answers) that it is actually a tuple of just one element. Still, I don't understand why the output is given in this way. To illustrate how this is not ideal, consider the following error (which motivated my question in the first place):

>>> import numpy as np
>>> array = np.array([1,2,3,4,5,6,7,8,9])
>>> pippo = np.where(array>4)
>>> pippo + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate tuple (not "int") to tuple

so that you need to do some indexing to access the actual array of indices:

>>> pippo[0] + 1
array([5, 6, 7, 8, 9])
How about np.argwhere? Commented Nov 17, 2015 at 7:12
What you want is np.flatnonzero, which does a.ravel().nonzero()[0]. Commented Nov 17, 2015 at 7:40

3 Answers


In Python (1) means just 1. Parentheses can be freely added to group numbers and expressions for human readability (e.g. compare (1+3)*3 with (1+3,)*3). So to denote a 1 element tuple Python displays (1,) (and requires you to write it that way as well).

Thus

(array([4, 5, 6, 7, 8]),)

is a one element tuple, that element being an array.
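
To make the grouping-versus-tuple distinction concrete, here is a minimal sketch in plain Python (no numpy needed; the variable names are just for illustration):

```python
# Parentheses alone only group an expression; the trailing comma makes a tuple.
grouped = (1 + 3) * 3     # arithmetic on the int 4
repeated = (1 + 3,) * 3   # repetition of the 1 element tuple (4,)

print(grouped)            # 12
print(repeated)           # (4, 4, 4)
print(type((1)).__name__, type((1,)).__name__)   # int tuple
```

So the trailing comma in the where output is just how Python displays a tuple of length one.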

If you applied where to a 2d array, the result would be a 2 element tuple.

The result of where is such that it can be plugged directly into an indexing slot, e.g.

a[where(a>0)]
a[a>0]

should return the same things

as would

I,J = where(a>0)   # a is 2d
a[I,J]
a[(I,J)]
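
As a rough illustration of the 2d case, assuming a small made-up array a, something like this shows the 2 element tuple and the equivalent indexing forms:

```python
import numpy as np

# Hypothetical 2d example array (not from the question)
a = np.array([[0, 1, 0],
              [2, 0, 3]])

idx = np.where(a > 0)      # one index array per dimension
print(len(idx))            # 2

I, J = idx                 # row indices and column indices
print(I)                   # [0 1 1]
print(J)                   # [1 0 2]

# All three indexing forms pick out the same elements
print(a[idx])              # [1 2 3]
print(a[a > 0])            # [1 2 3]
print(a[I, J])             # [1 2 3]
```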

Or with your example:

In [278]: a=np.array([1,2,3,4,5,6,7,8,9])
In [279]: np.where(a>4)
Out[279]: (array([4, 5, 6, 7, 8], dtype=int32),)  # tuple

In [280]: a[np.where(a>4)]
Out[280]: array([5, 6, 7, 8, 9])

In [281]: I=np.where(a>4)
In [282]: I
Out[282]: (array([4, 5, 6, 7, 8], dtype=int32),)
In [283]: a[I]
Out[283]: array([5, 6, 7, 8, 9])

In [286]: i, = np.where(a>4)   # note the , on LHS
In [287]: i
Out[287]: array([4, 5, 6, 7, 8], dtype=int32)  # not tuple
In [288]: a[i]
Out[288]: array([5, 6, 7, 8, 9])
In [289]: a[(i,)]
Out[289]: array([5, 6, 7, 8, 9])

======================

np.flatnonzero shows the correct way of returning just one array, regardless of the dimensions of the input array.

In [299]: np.flatnonzero(a>4)
Out[299]: array([4, 5, 6, 7, 8], dtype=int32)
In [300]: np.flatnonzero(a>4)+10
Out[300]: array([14, 15, 16, 17, 18], dtype=int32)

Its docs say:

This is equivalent to a.ravel().nonzero()[0]

In fact that is literally what the function does.

Flattening a removes the question of what to do with multiple dimensions. The function then takes the result out of the tuple, giving you a plain array. Because it always flattens, it doesn't have to make a special case for 1d arrays.
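
Here is a small sketch of what that means for a 2d input, assuming a made-up array. The indices refer to the flattened array, and np.unravel_index is one way to map them back to row/column form if you need that:

```python
import numpy as np

# Hypothetical 2d example array (not from the question)
a = np.array([[0, 1, 0],
              [2, 0, 3]])

flat = np.flatnonzero(a > 0)           # positions in a.ravel()
print(flat)                            # [1 3 5]

# Same result as the expression quoted from the docs
same = (a > 0).ravel().nonzero()[0]
print(np.array_equal(flat, same))      # True

# Convert flat positions back to per-dimension indices
rows, cols = np.unravel_index(flat, a.shape)
print(rows, cols)                      # [0 1 1] [1 0 2]
```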

===========================

@Divakar suggests np.argwhere:

In [303]: np.argwhere(a>4)
Out[303]: 
array([[4],
       [5],
       [6],
       [7],
       [8]], dtype=int32)

which does np.transpose(np.where(a>4))

Or if you don't like the column vector, you could transpose it again

In [307]: np.argwhere(a>4).T
Out[307]: array([[4, 5, 6, 7, 8]], dtype=int32)

except now it is a 1xn array.

We could just as well have wrapped where in array:

In [311]: np.array(np.where(a>4))
Out[311]: array([[4, 5, 6, 7, 8]], dtype=int32)

There are lots of ways of taking an array out of the where tuple ([0], i,=, transpose, array, etc).
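
For a 2d input the difference between these layouts is easier to see. A minimal sketch, again with a made-up array:

```python
import numpy as np

# Hypothetical 2d example array (not from the question)
a = np.array([[0, 1, 0],
              [2, 0, 3]])

pairs = np.argwhere(a > 0)             # one (row, col) pair per match
print(pairs.tolist())                  # [[0, 1], [1, 0], [1, 2]]

# argwhere gives the same values as transposing the where tuple
same = np.transpose(np.where(a > 0))
print(np.array_equal(pairs, same))     # True

# Wrapping where in np.array stacks the index arrays as rows instead
stacked = np.array(np.where(a > 0))
print(pairs.shape, stacked.shape)      # (3, 2) (2, 3)
```

Note that neither pairs nor stacked can be used directly as a[...] to get the matching elements back; only the tuple form does that.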


5 Comments

hi @hpaulj, thank you for your answer. I understand, it is a tuple of 1 element. Still, you need the indexing to access the actual array, and I don't understand why, i.e. why to give the output in that way... See the addendum in the edited question.
The goal is consistency across arrays of all dimensions. np.flatnonzero shows the correct way of returning an array instead of a tuple.
So, why in the documentation does it say "Returns: out : ndarray An array with elements..." That's what confused me.
@Bill, that out is for the 3 parameter use. On SO we often use it with only a condition parameter. The docs now recommend using np.nonzero for that purpose.
Ah yes, Thank you @hpaulj. Now I see the note near the top of the page which says "The rest of this documentation covers only the case where all three arguments are provided."
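
To illustrate the distinction made in these comments, here is a short sketch using the array from the question. The 0 used as the fill value in the three-argument call is an arbitrary choice for the example:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Condition-only form: a tuple of index arrays, same as np.nonzero
w = np.where(a > 4)
n = np.nonzero(a > 4)
print(type(w).__name__, type(n).__name__)   # tuple tuple
print(np.array_equal(w[0], n[0]))           # True

# Three-argument form: an ndarray chosen elementwise from x or y
picked = np.where(a > 4, a, 0)
print(picked)                               # [0 0 0 0 5 6 7 8 9]
```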

Short answer: np.where is designed to have consistent output regardless of the dimension of the array.

A two-dimensional array has two indices, so the result of np.where is a length-2 tuple containing the relevant indices. This generalizes to a length-3 tuple for 3-dimensions, a length-4 tuple for 4 dimensions, or a length-N tuple for N dimensions. By this rule, it is clear that in 1 dimension, the result should be a length-1 tuple.
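
A quick way to check this rule yourself, sketched with arbitrary example shapes:

```python
import numpy as np

# Arbitrary example shapes for 1, 2 and 3 dimensions
lengths = []
for shape in [(4,), (2, 3), (2, 3, 4)]:
    x = np.arange(np.prod(shape)).reshape(shape)
    result = np.where(x % 2 == 1)        # indices of the odd entries
    lengths.append((x.ndim, len(result)))

print(lengths)   # [(1, 1), (2, 2), (3, 3)]
```

The tuple length always matches the number of dimensions, so code that unpacks or indexes with the result works the same way for any N.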

2 Comments

I don't understand: why not return an n-d array whose nth axis corresponds to the nth dimension? Can different elements of the tuple be of different lengths?
Indexing with arrays has different semantics than indexing with tuples. If the result were an ndarray rather than a tuple, x[np.where(x == 0)] would not return the zero elements in the array.
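
The difference described in that last comment can be seen with a small sketch (x is a made-up 3x3 array, chosen so that both forms are valid indices):

```python
import numpy as np

# Hypothetical example array with zeros at (0, 0), (0, 2) and (1, 1)
x = np.array([[0, 5, 0],
              [7, 0, 9],
              [1, 2, 3]])

idx_tuple = np.where(x == 0)         # tuple: one index array per axis
idx_array = np.array(idx_tuple)      # the same numbers as a single 2x3 ndarray

by_tuple = x[idx_tuple]              # picks individual elements
by_array = x[idx_array]              # indexes axis 0 only, selecting whole rows

print(by_tuple)                      # [0 0 0]
print(by_array.shape)                # (2, 3, 3)
```

With the tuple, each array indexes its own axis; with a single ndarray, every value is treated as a row number.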

Just use the np.asarray function. In your case:

>>> import numpy as np
>>> array = np.array([1,2,3,4,5,6,7,8,9])
>>> pippo = np.asarray(np.where(array>4))
>>> pippo + 1
array([[5, 6, 7, 8, 9]])

1 Comment

Why is np.asarray() used instead of np.array()?
