Re-assign values with multiple if statements Numpy

Question

I have a large Numpy ndarray, here is a sample of that:

myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[np.nan,0.2,0.3,4.2,15.1]])
myarray

array([[ 1.01,  9.4 ,  0.0 ,  6.9 ,  5.7 ],
       [ 1.9 ,  2.6 ,   nan,  4.7 ,  -2.45],
       [  nan,  0.2 ,  0.3 ,  4.2 , 15.1 ]])

As you can see, my array contains floats: positive, negative, zeros and NaNs. I would like to re-assign (re-class) the values in the array based on multiple if statements. I've read many answers and docs, but all the ones I've seen deal with a simple one or two conditions, which can easily be resolved using np.where, for example. I have multiple conditions; for the sake of simplicity, let's say I have four (the desired solution should be able to handle more). My conditions are:

if x > 6*y:
    x=3
elif x < 4*z:
    x=2
elif x == np.nan:
    x=np.nan # maybe pass is better?
else: 
    x=0

where x is a value in the array, and y and z are variables that will change among arrays. For example, array #1 will have y=5, z=2, array #2 will have y=0.9, z=0.5, etc. The condition for np.nan just means that if a value is nan, do not alter it; keep it nan.

Note that this needs to be executed at the same time, because if I use several np.where calls one after the other, then condition #2 will overwrite condition #1.

I tried to create a function and then apply it to the array, but with no success. It seems that in order to apply a function to an array, the function must take only one argument (the array), whereas the function I need would take three arguments: the array and the y and z values.

What would be the most efficient way to achieve my goal?

While you can apply tests like this to elements of x, you can't apply them to x itself. myarray>6 is a boolean array, which doesn't work in an if context (nor with the and / or operators). Another caution: don't use == np.nan.
– hpaulj, Commented Feb 13, 2019 at 8:15
Thanks. I had tried this nested np.where before and it did not work, but now I've copy-pasted the syntax from the answer you linked, changed it accordingly, and it seems to work. In case I have multiple large arrays, is there a more efficient way to achieve that?
– user88484, Commented Feb 13, 2019 at 9:53
It depends on your use case. Without further knowledge, I would say that you can build your conditions and choices also with multiple arrays.
– Thomas Kühn, Commented Feb 13, 2019 at 10:28

hpaulj · Accepted Answer · 2019-02-13 21:14:07Z

In [11]: myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[
    ...: np.nan,0.2,0.3,4.2,15.1]])
In [13]: y, z = 0.9, 0.5

If I perform one of your tests on the whole array:

In [14]: mask1 = myarray >6*y
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in greater

It's the np.nan values that cause this warning.
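If the warning is only noise, one option is to silence it locally. The following is a minimal sketch, assuming you want to suppress it for this single comparison; np.errstate is a standard NumPy context manager, and recent NumPy releases may not emit this warning at all. Either way, a comparison against nan evaluates to False.

```python
import numpy as np

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])
y = 0.9

# Suppress "invalid value" floating-point warnings only inside this block.
with np.errstate(invalid="ignore"):
    mask1 = myarray > 6 * y

# nan compares False against any number, so the nan slots are never selected.
nan_selected = mask1[np.isnan(myarray)].any()
print(nan_selected)  # False
```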

So let's first identify those nan values (and replace them):

In [25]: mask0 = np.isnan(myarray)
In [26]: mask0
Out[26]: 
array([[False, False, False, False, False],
       [False, False,  True, False, False],
       [ True, False, False, False, False]])
In [27]: arr = myarray.copy()
In [28]: arr[mask0] = 0     # temp replace the nan with 0

myarray == np.nan does not work; it produces False everywhere.

arr = np.nan_to_num(myarray) also works, replacing the nan with 0.
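As a quick check of the two remarks above, here is a small sketch on the sample data. Note that np.nan_to_num also replaces infinities with large finite numbers, so the two routes only coincide because this array contains none.

```python
import numpy as np

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])

# nan is not equal to anything, including itself, so this mask is all False.
eq_mask = (myarray == np.nan)
print(eq_mask.any())                 # False

# np.isnan is the reliable test.
mask0 = np.isnan(myarray)
print(mask0.sum())                   # 2

# Route 1: copy, then overwrite the nan slots through the mask.
arr_a = myarray.copy()
arr_a[mask0] = 0

# Route 2: let nan_to_num do the replacement (it returns a new array).
arr_b = np.nan_to_num(myarray)

print(np.array_equal(arr_a, arr_b))  # True
```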

Now find the masks for the y and z tests. It doesn't matter how these handle the original nan (now 0). Calculate both masks first to reduce mutual interference.

In [29]: mask1 = arr > 6*y
In [30]: mask2 = arr < 4*z
In [31]: arr[mask1]
Out[31]: array([ 9.4,  6.9,  5.7, 15.1])
In [32]: arr[mask2]
Out[32]: array([ 1.01,  0.  ,  1.9 ,  0.  , -2.45,  0.  ,  0.2 ,  0.3 ])
In [33]: arr[mask0]
Out[33]: array([0., 0.])
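To see why the masks are computed before any assignment, consider this small sketch. The array and the y, z values are hypothetical, chosen so that the replacement value 3 itself falls below the second threshold.

```python
import numpy as np

values = np.array([1.0, 9.0])
y, z = 1.0, 1.25              # hypothetical: thresholds are 6*y = 6 and 4*z = 5

# Masks computed up front: each test sees the original values.
mask1 = values > 6 * y        # [False,  True]
mask2 = values < 4 * z        # [ True, False]
upfront = values.copy()
upfront[mask1] = 3
upfront[mask2] = 2
print(upfront)                # [2. 3.]

# Tests interleaved with assignments: the second test sees the 3 just written.
lazy = values.copy()
lazy[lazy > 6 * y] = 3        # -> [1., 3.]
lazy[lazy < 4 * z] = 2        # 3 < 5, so the freshly assigned 3 becomes 2
print(lazy)                   # [2. 2.]
```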

Since you want everything else to be 0, let's initialize an array of zeros:

In [34]: res = np.zeros_like(arr)

Now apply the 3 masks:

In [35]: res[mask1] = 3
In [36]: res[mask2] = 2
In [37]: res[mask0] = np.nan
In [38]: res
Out[38]: 
array([[ 2.,  3.,  2.,  3.,  3.],
       [ 2.,  0., nan,  0.,  2.],
       [nan,  2.,  2.,  0.,  3.]])

I could have applied the masks to arr:

In [40]: arr[mask1] = 3        # np.where(mask1, 3, arr) should also work
In [41]: arr[mask2] = 2
In [42]: arr[mask0] = np.nan
In [43]: arr
Out[43]: 
array([[2. , 3. , 2. , 3. , 3. ],
       [2. , 2.6, nan, 4.7, 2. ],
       [nan, 2. , 2. , 4.2, 3. ]])

With this approach I would still have to use some logic to combine the masks and identify the slots that are supposed to be 0.
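A minimal sketch of that combining step, assuming the "else" slots are simply those where none of the three masks is True:

```python
import numpy as np

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])
y, z = 0.9, 0.5

mask0 = np.isnan(myarray)
arr = np.where(mask0, 0.0, myarray)   # new array with nan temporarily set to 0
mask1 = arr > 6 * y
mask2 = arr < 4 * z

# Slots hit by none of the three tests form the "else" branch.
else_mask = ~(mask0 | mask1 | mask2)

arr[mask1] = 3
arr[mask2] = 2
arr[else_mask] = 0
arr[mask0] = np.nan                   # restore nan last
print(arr)
```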
