
I am trying to figure out how np.where works. I create a simple df:

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10, size=(3, 4)), columns=list('ABCD'))
print(df)

   A  B  C  D
0  5  8  9  5
1  0  0  1  7
2  6  9  2  4

Now when I run:

print(np.where(df.values, 1, np.nan))

I receive:

[[  1.   1.   1.   1.]
 [ nan  nan   1.   1.]
 [  1.   1.   1.   1.]]

But when I create an array from df.values with np.empty_like and pass it to np.where, I receive this:

print(np.where(np.empty_like(df.values), 1, np.nan))

[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]

I could really use help understanding how np.where works when it is given a single array.

  • Why are you using np.empty_like? Note that its values are not guaranteed to be 0, and here none of them are falsy, which is why np.where returns an ndarray of ones. Commented Mar 13, 2019 at 12:50
  • Hello @yatu, in my case empty_like actually produces zeros, so the result is all nan, unlike the OP's. I can't reproduce the problem. Commented Mar 13, 2019 at 12:58
  • empty_like creates an array of arbitrary data, so yes, some of the time none of it is 0. Commented Mar 13, 2019 at 13:03
  • I found np.empty_like in an example online while searching for my problem, but then realized it works fine without it. So how should I read the first version, np.where(array, 1, np.nan)? Commented Mar 13, 2019 at 13:04

1 Answer


np.empty_like()

Docs:

numpy.empty_like(prototype, dtype=None, order='K', subok=True)

Return a new array with the same shape and type as a given array.

>>> a = ([1,2,3], [4,5,6])                         # a is array-like
>>> np.empty_like(a)
array([[-1073741821, -1073741821,           3],    # uninitialized
       [          0,           0, -1073741821]])

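np.empty_like() creates an array with the same shape and dtype as the given array, but its entries are not initialized: they are whatever values happened to be in that block of memory, so they may or may not be zero. This array is then passed to np.where().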
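To see why the two calls can disagree, here is a minimal sketch (assuming NumPy and pandas are installed) that rebuilds the question's df and contrasts np.empty_like, whose contents are unspecified, with np.zeros_like and np.ones_like, whose contents are fixed. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10, size=(3, 4)), columns=list('ABCD'))

# Contents are whatever was already in memory: do not rely on them.
uninitialized = np.empty_like(df.values)
print(np.where(uninitialized, 1, np.nan))  # may differ between runs and machines

# Deterministic alternatives with the same shape and dtype as df.values.
all_zeros = np.zeros_like(df.values)  # every element is 0, so every condition is False
all_ones = np.ones_like(df.values)    # every element is nonzero, so every condition is True

from_zeros = np.where(all_zeros, 1, np.nan)  # all nan
from_ones = np.where(all_ones, 1, np.nan)    # all 1.0
print(from_zeros)
print(from_ones)
```

If you need a predictable starting array, np.zeros_like or np.full_like is the safer choice.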

numpy.where()

Docs:

numpy.where(condition[, x, y])

Return elements that are chosen from x or y depending on condition.

Example:

>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.where(a < 5, a, 10*a)
array([ 0,  1,  2,  3,  4, 50, 60, 70, 80, 90])
>>> np.where(a, 1, np.nan)
array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
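The element-wise selection that np.where performs can be spelled out as a plain loop. The helper where_loop below is hypothetical and only handles equal-length 1-D inputs (the real np.where also broadcasts scalars and works on any shape); it reproduces both doc examples above.

```python
import numpy as np

def where_loop(condition, x, y):
    """Hypothetical loop-based equivalent of np.where(condition, x, y).

    Only handles 1-D sequences of equal length; no broadcasting.
    """
    out = []
    for c, xi, yi in zip(condition, x, y):
        # Each condition element is treated as a bool: nonzero -> True.
        out.append(xi if c else yi)
    return np.array(out)

a = np.arange(10)

# Mirrors np.where(a < 5, a, 10*a).
picked = where_loop(a < 5, a, 10 * a)

# Mirrors np.where(a, 1, np.nan); the scalars are expanded by hand
# because where_loop does not broadcast.
picked_nan = where_loop(a, np.ones(10), np.full(10, np.nan))

print(picked)
print(picked_nan)
```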

In Python, any number other than zero is considered True (truthy), whereas zero is considered False (falsy).
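A quick sketch of that rule, first for plain Python numbers and then for a NumPy array converted element-wise with astype(bool). The array here is simply the second row of the question's df.

```python
import numpy as np

# Plain Python: zero is falsy, any other number is truthy.
print(bool(0), bool(5), bool(-3), bool(0.0))  # False True True False

# NumPy applies the same rule to each element when converting to bool.
row = np.array([0, 0, 1, 7])
mask = row.astype(bool)
print(mask)  # [False False  True  True]
```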

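When np.where() receives an array as its condition, it treats each element as a boolean: the condition is True where the element is nonzero and False where it is 0. The result then holds 1 at every True position and np.nan at every False position. That is why row 1 of df, which contains two zeros, gets two nans. With np.empty_like, the uninitialized values happened to be all nonzero in your run, so you got all ones.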
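Applied to the question's df, passing the array directly as the condition gives the same result as spelling out the nonzero test. This sketch assumes NumPy and pandas; the names implicit and explicit are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10, size=(3, 4)), columns=list('ABCD'))

implicit = np.where(df.values, 1, np.nan)       # array used directly as the condition
explicit = np.where(df.values != 0, 1, np.nan)  # the same test written out

# Both have nan exactly where df holds a 0 and 1.0 everywhere else.
np.testing.assert_array_equal(implicit, explicit)
print(implicit)
```

Writing the comparison explicitly (df.values != 0) is often clearer to readers than relying on implicit truthiness.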

References:

  1. numpy.where()
  2. numpy.empty_like()

1 Comment

"... any number other than zero is considered to be TRUE whereas zero is considered to FALSE." - that is what I was missing, great !
