python numpy where returning unexpected warning

Question

Using python 2.7, scipy 1.0.0-3

Apparently I have a misunderstanding of how the numpy where function is supposed to operate or there is a known bug in its operation. I'm hoping someone can tell me which and explain a work-around to suppress the annoying warning that I am trying to avoid. I'm getting the same behavior when I use the pandas Series where().

To keep it simple, I'll use a numpy array as my example. Say I want to apply np.log() to the array, but only where the value is a valid input, i.e., myArray>0.0. Where the function should not be applied, I want to set the output to a flag value of -999.9:

myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
np.where(myArray>0.0, np.log(myArray), -999.9)

I expected numpy.where() to not complain about the 0.0 value in the array since the condition is False there, yet it does and it appears to actually execute for that False condition:

-c:2: RuntimeWarning: divide by zero encountered in log 
array([  0.00000000e+00,  -2.87682072e-01,  -6.93147181e-01,
        -1.38629436e+00,  -9.99900000e+02])

The numpy documentation states:

If x and y are given and input arrays are 1-D, where is equivalent to: [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]

I beg to differ with this statement since

[np.log(val) if val>0.0 else -999.9 for val in myArray]

provides no warning at all:

[0.0, -0.2876820724517809, -0.69314718055994529, -1.3862943611198906, -999.9]

So, is this a known bug? I don't want to suppress the warning for my entire code.

Paul Panzer · Accepted Answer · 2018-04-10 18:20:28Z

You can have the log evaluated at the relevant places only using its optional where parameter

np.where(myArray>0.0, np.log(myArray, where=myArray>0.0), -999.9)

or more efficiently

mask = myArray > 0.0
np.where(mask, np.log(myArray, where=mask), -999.9)

or if you find the "double where" ugly

np.log(myArray, where=myArray>0.0, out=np.full(myArray.shape, -999.9))

Any one of those three should suppress the warning.
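A minimal sketch for checking this claim, assuming the same myArray as in the question. It promotes the divide-by-zero condition to an exception with np.errstate (a check added here, not part of the original answer), so any evaluation of log(0) would fail loudly instead of only warning:

```python
import numpy as np

myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
mask = myArray > 0.0

# Inside this block, log(0) would raise FloatingPointError instead of warning.
with np.errstate(divide="raise"):
    # "Double where": np.log skips the masked-out slot (leaving it
    # uninitialized), and np.where then overwrites that slot with the flag.
    double_where = np.where(mask, np.log(myArray, where=mask), -999.9)

    # Single call: pre-fill the output with the flag and let log write into it.
    single_call = np.log(myArray, where=mask, out=np.full(myArray.shape, -999.9))

print(double_where)
print(single_call)
```

Both calls complete without raising, which suggests that log is never evaluated at the 0.0 entry.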

hpaulj · Accepted Answer · 2018-04-10 21:33:11Z

This behavior of where should be understandable given a basic understanding of Python. This is a Python expression that uses a couple of numpy functions.

What happens in this expression?

np.where(myArray>0.0, np.log(myArray), -999.9)

The interpreter first evaluates all the arguments of the function, and then passes the results to the where. Effectively then:

cond = myArray>0.0
A = np.log(myArray)
B = -999.9
np.where(cond, A, B)

The warning is produced in the 2nd line, not in the 4th.
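One way to confirm where the warning comes from is to turn it into an exception for a limited scope. The sketch below uses np.errstate for that; the variable reached_where is only an illustrative flag, not something from the original answer:

```python
import numpy as np

myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
cond = myArray > 0.0

# Promote divide-by-zero to an exception for this block only.
with np.errstate(divide="raise"):
    try:
        A = np.log(myArray)      # fails here, before np.where is ever called
        reached_where = True
    except FloatingPointError as exc:
        reached_where = False
        print("raised by np.log:", exc)

# With the condition ignored instead, log(0) quietly yields -inf,
# and np.where simply replaces that entry with the flag.
with np.errstate(divide="ignore"):
    A = np.log(myArray)
result = np.where(cond, A, -999.9)
print(result)
```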

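The 4th line is equivalent to the following, once the scalar B is broadcast to the length of cond (zip cannot iterate over a bare float):

[xv if c else yv for (c,xv,yv) in zip(cond, A, B)]

or, leaving B as a scalar,

[A[i] if c else B for i,c in enumerate(cond)]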

np.where is most often used with one argument, where it is a synonym for np.nonzero. We don't see this three-argument form that often on SO. It isn't that useful, in part because it doesn't save on calculations.

Masked assignment is more common, especially if there are more than 2 alternatives.

In [123]: mask = myArray>0
In [124]: out = np.full(myArray.shape, np.nan)
In [125]: out[mask] = np.log(myArray[mask])
In [126]: out
Out[126]: array([ 0.        , -0.28768207, -0.69314718, -1.38629436,         nan])

Paul Panzer showed how to do the same with the where parameter of log. That feature isn't being used as much as it could be.

In [127]: np.log(myArray, where=mask, out=out)
Out[127]: array([ 0.        , -0.28768207, -0.69314718, -1.38629436,         nan])
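As a sketch of the "more than 2 alternatives" case, masked assignment might look like the following. The input array and the choice of nan as a marker for negative values are assumptions made for illustration:

```python
import numpy as np

values = np.array([1.0, 0.5, 0.0, -2.0])   # hypothetical input with a negative entry

out = np.empty(values.shape)
positive = values > 0
zero = values == 0
negative = values < 0

out[positive] = np.log(values[positive])   # log only ever sees valid inputs
out[zero] = -999.9                         # flag value for log(0)
out[negative] = np.nan                     # separate marker for negative inputs
print(out)
```

Because log is applied only to the already-filtered values, no warning is produced, and each additional case is one more mask and one more assignment.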

jpp · Accepted Answer · 2018-04-11 14:17:08Z

This is not a bug. See this related answer to a similar question. The example in the docs is misleading, but that answer looks at it in detail.

The issue is that a conditional expression (x if c else y) is part of the language syntax and evaluates only the branch that is selected, while numpy.where is a regular function whose arguments are all evaluated before it is called. Therefore, a conditional expression can short-circuit, whereas this is not possible when the arguments are computed beforehand.

In other words, the arguments of numpy.where are calculated before the Boolean array is processed.
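The difference can be seen without numpy at all. In this sketch, traced_log and pick are hypothetical helpers: the first records every input it receives, and the second is a plain function standing in for np.where on scalars:

```python
import math

calls = []

def traced_log(x):
    """Hypothetical helper: record each input, then take the log."""
    calls.append(x)
    return math.log(x) if x > 0 else float("-inf")

def pick(cond, a, b):
    """Plain function standing in for np.where on scalars."""
    return a if cond else b

x = 0.0

# Conditional expression: the unselected branch is never evaluated.
lazy = traced_log(x) if x > 0 else -999.9
calls_after_lazy = list(calls)

# Function call: every argument is evaluated before pick() runs.
eager = pick(x > 0, traced_log(x), -999.9)
calls_after_eager = list(calls)

print(calls_after_lazy, calls_after_eager)
```

Both forms return the flag value, but only the function-call form ever invokes traced_log on the invalid input.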

You may think this is inefficient: why build 2 separate arrays and then use a 3rd Boolean array to decide which item to choose? Surely that's double the work / double the memory?

However, this inefficiency is more than offset by the vectorisation provided by numpy functions acting on an entire array, e.g. np.log(arr).

Consider the example provided in the docs:

If x and y are given and input arrays are 1-D, where is equivalent to:
    [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]

Notice the inputs are arrays. Try running:

c = np.array([0])

result = [xv if c else yv for (c, xv, yv) in zip(c==0, np.array([1]), np.log(c))]

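You will notice that this produces the same RuntimeWarning: np.log(c) is evaluated for the whole array before zip ever pairs it with the condition.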
