I have a large NumPy ndarray; here is a sample of it:

import numpy as np

myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[np.nan,0.2,0.3,4.2,15.1]])
myarray

array([[ 1.01,  9.4 ,  0.0 ,  6.9 ,  5.7 ],
       [ 1.9 ,  2.6 ,   nan,  4.7 ,  -2.45],
       [  nan,  0.2 ,  0.3 ,  4.2 , 15.1 ]])

As you can see, my array contains floats: positive values, negative values, zeros and NaNs. I would like to re-assign (re-class) the values in the array based on multiple if statements. I've read many answers and docs, but all of the ones I've seen deal with only one or two conditions, which can easily be resolved using np.where, for example. I have multiple conditions; for the sake of simplicity let's say I have four (the desired solution should be able to handle more). My conditions are:

if x > 6*y:
    x=3
elif x < 4*z:
    x=2
elif x == np.nan:
    x=np.nan # maybe pass is better?
else: 
    x=0

where x is a value in the array, and y and z are variables that change from array to array. For example, array #1 will have y=5, z=2, array #2 will have y=0.9, z=0.5, etc. The condition for np.nan just means that if a value is nan, it should not be altered; it stays nan.

Note that the conditions need to be evaluated together, because if I use several np.where calls one after the other, then condition #2 will overwrite the result of condition #1.

I tried to create a function and then apply it to the array, but with no success. It seems that in order to apply a function to an array, the function must take only one argument (the array), whereas my function would need three arguments: the array and the y and z values.

What would be the most efficient way to achieve my goal?

  • While you can apply tests like this to elements of x, you can't apply them to x itself. myarray>6 is a boolean array, which doesn't work in an if context (nor with and or or). Another caution: don't use == np.nan. Commented Feb 13, 2019 at 8:15
  • Please see my EDITED question. Thanks Commented Feb 13, 2019 at 8:29
  • See this answer. Commented Feb 13, 2019 at 8:46
  • Thanks. I've tried this nested np.where before and it did not work, but now I've copy-pasted the syntax from the answer you linked and adapted it, and it seems to work. If I have multiple large arrays, is there a more efficient way to achieve this? Commented Feb 13, 2019 at 9:53
  • It depends on your use case. Without further knowledge, I would say that you can build your conditions and choices also with multiple arrays. Commented Feb 13, 2019 at 10:28

1 Answer

In [11]: myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[
    ...: np.nan,0.2,0.3,4.2,15.1]])
In [13]: y, z = 0.9, 0.5

If I perform one of your tests on the whole array:

In [14]: mask1 = myarray >6*y
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in greater

It's the np.nan values that cause this warning.
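If the warning is the only concern, one option is to silence it locally rather than replace the nan values. The following is a minimal sketch, assuming the same sample array and y as above; whether the warning appears at all depends on the NumPy version. Comparisons involving nan still evaluate to False, so the mask itself is unchanged.

```python
import numpy as np

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])
y = 0.9

# errstate only suppresses the "invalid value" warning inside the block;
# nan > anything is still False, so nan slots are never selected.
with np.errstate(invalid="ignore"):
    mask1 = myarray > 6 * y

print(mask1)
```

This keeps the original array untouched, at the cost of relying on nan comparing as False in every test.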

So let's first identify those nan values (and replace them):

In [25]: mask0 = np.isnan(myarray)
In [26]: mask0
Out[26]: 
array([[False, False, False, False, False],
       [False, False,  True, False, False],
       [ True, False, False, False, False]])
In [27]: arr = myarray.copy()
In [28]: arr[mask0] = 0     # temp replace the nan with 0

myarray == np.nan does not work; it produces False everywhere.

arr = np.nan_to_num(myarray) also works, replacing the nan with 0.
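A short sketch comparing the three calls mentioned here, using the same sample array. Note that np.nan_to_num, with its default arguments, also replaces infinities with large finite numbers, which does not matter for this sample but may matter for other data.

```python
import numpy as np

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])

# nan never compares equal to anything, including itself,
# so this mask is False everywhere.
eq_mask = myarray == np.nan

# np.isnan is the reliable way to locate nan values.
nan_mask = np.isnan(myarray)

# Returns a copy with nan replaced by 0.0; myarray is not modified.
arr = np.nan_to_num(myarray)

print(eq_mask.any(), nan_mask.sum(), arr[1, 2], arr[2, 0])
```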

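Now find the masks for the y and z tests. It doesn't matter how these handle the original nan (now 0), because those slots are set back to nan at the end. Calculate both masks before assigning anything, so that one assignment cannot change the outcome of the other test.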

In [29]: mask1 = arr > 6*y
In [30]: mask2 = arr < 4*z
In [31]: arr[mask1]
Out[31]: array([ 9.4,  6.9,  5.7, 15.1])
In [32]: arr[mask2]
Out[32]: array([ 1.01,  0.  ,  1.9 ,  0.  , -2.45,  0.  ,  0.2 ,  0.3 ])
In [33]: arr[mask0]
Out[33]: array([0., 0.])

Since you want everything else to be 0, let's initialize an array of zeros:

In [34]: res = np.zeros_like(arr)

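Now apply the 3 masks (with these y and z, mask1 and mask2 do not overlap, so the order of the first two assignments does not matter here):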

In [35]: res[mask1] = 3
In [36]: res[mask2] = 2
In [37]: res[mask0] = np.nan
In [38]: res
Out[38]: 
array([[ 2.,  3.,  2.,  3.,  3.],
       [ 2.,  0., nan,  0.,  2.],
       [nan,  2.,  2.,  0.,  3.]])
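The same reclassification can also be written with np.select, which picks the first condition that matches, much like an if/elif chain, and scales to more conditions by extending the two lists. The sketch below wraps it in a function that takes the array plus y and z; the name reclass is hypothetical, and the nan test is placed first on the assumption that nan values should always stay nan.

```python
import numpy as np

def reclass(arr, y, z):
    """Hypothetical helper: the first matching condition wins, as in if/elif."""
    nan_mask = np.isnan(arr)
    # Temporarily replace nan with 0 so the comparisons never see nan.
    filled = np.where(nan_mask, 0.0, arr)
    conditions = [nan_mask, filled > 6 * y, filled < 4 * z]
    choices = [np.nan, 3.0, 2.0]
    # Elements that match no condition get the default value 0.
    return np.select(conditions, choices, default=0.0)

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])

res = reclass(myarray, 0.9, 0.5)
print(res)
```

Because np.select uses the first matching condition, a value that satisfies both the y test and the z test is assigned 3, which matches the if/elif order in the question. Each array can be passed with its own y and z.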

Alternatively, I could have applied the masks to arr directly:

In [40]: arr[mask1] = 3        # np.where(mask1, 3, arr) should also work
In [41]: arr[mask2] = 2
In [42]: arr[mask0] = np.nan
In [43]: arr
Out[43]: 
array([[2. , 3. , 2. , 3. , 3. ],
       [2. , 2.6, nan, 4.7, 2. ],
       [nan, 2. , 2. , 4.2, 3. ]])

With this approach I would still need some logic that combines the masks to identify the slots that are supposed to be 0 (here 2.6, 4.7 and 4.2 are left unchanged).
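One way to build that combined mask is to negate the union of the three masks. This is a sketch under the same sample values; else_mask is a name introduced here for illustration. The mask for the first condition is assigned last so that it wins if the y and z tests ever overlap, mirroring the if/elif order in the question.

```python
import numpy as np

myarray = np.array([[1.01, 9.4, 0.0, 6.9, 5.7],
                    [1.9, 2.6, np.nan, 4.7, -2.45],
                    [np.nan, 0.2, 0.3, 4.2, 15.1]])
y, z = 0.9, 0.5

mask0 = np.isnan(myarray)
arr = myarray.copy()
arr[mask0] = 0                  # temporarily replace nan with 0

# Compute every mask before modifying arr.
mask1 = arr > 6 * y
mask2 = arr < 4 * z
else_mask = ~(mask0 | mask1 | mask2)   # slots matched by no condition

arr[else_mask] = 0
arr[mask2] = 2
arr[mask1] = 3                  # assigned after mask2 so it takes priority
arr[mask0] = np.nan             # restore the original nan slots

print(arr)
```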
