How to replace 0 values in a NumPy array with other values based on column range?

Question

I have a dataset in the following format:

[[ 226   600   3.33   915.   92.6   98.6 ]
 [ 217   700   3.34   640.   93.7   98.5 ]
 [ 213   900   3.35   662.   88.8   96.  ]
 ...
 [ 108   600   2.31   291.   64.    70.4 ]
 [ 125   800   3.36  1094.   65.5   84.1 ]
 [ 109   400   2.44   941.   52.3   68.7 ]]

Each column is a separate criterion with its own value range. How can I impute the values that are 0 with a value greater than zero, based on the column's range? In other words, with the worst (smallest) value in the column other than 0.

I have written the following code, but it can only change the 0 to either the minimum value in the column (which is of course 0) or the maximum. The maximum varies by column. Thanks for your help!

import numpy as np

# Impute 0 values -- give them the worst value for that column
I, J = np.nonzero(scores == 0)
scores[I, J] = scores.min(axis=0)[J]  # can only do min or max

To clarify: more than 0 but less than the max, so in other words the worst value in a column other than 0. Sorry for the confusion.
– asleniovas, Commented May 27, 2019 at 12:02

yatu · Accepted Answer · 2019-05-27 12:15:51Z

1

One way would be to use a masked array to find the minimum value along the columns, masking the values that are <= 0, and then replace the 0s in the array with the corresponding column minimum using np.where:

min_gt0 = np.ma.array(r, mask=r<=0).min(0)
np.where(r == 0, min_gt0, r)  # returns a new array; assign it to keep the result

Here's an example:

r = np.random.randint(0,5,(5,5))

r
array([[2, 1, 3, 0, 4],
       [0, 4, 4, 2, 2],
       [4, 0, 0, 0, 1],
       [1, 2, 2, 2, 2],
       [2, 0, 4, 4, 2]])

min_gt0 = np.ma.array(r, mask=r<=0).min(0)
np.where(r == 0, min_gt0, r)

array([[2, 1, 3, 2, 4],
       [1, 4, 4, 2, 2],
       [4, 1, 2, 2, 1],
       [1, 2, 2, 2, 2],
       [2, 1, 4, 4, 2]])
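As a minimal sketch, the same idea can be applied to float data shaped like the question's scores array. The values below are made up for illustration, and the `.filled(0.0)` fallback (which leaves a column untouched if it contains no positive value at all) is an assumption, not part of the original answer:

```python
import numpy as np

# Hypothetical data in the shape of the question's `scores`; values are invented.
scores = np.array([
    [226.0, 600.0, 3.33,   0.0, 92.6],
    [  0.0, 700.0, 0.0,  640.0, 93.7],
    [213.0,   0.0, 3.35, 662.0,  0.0],
])

# Column-wise minimum over entries > 0. A column with no positive entries
# would come back masked, so fall back to 0.0 there (assumption).
min_gt0 = np.ma.array(scores, mask=scores <= 0).min(axis=0).filled(0.0)

# np.where builds a new array, so the result has to be assigned.
imputed = np.where(scores == 0, min_gt0, scores)

# Equivalent in-place variant using the question's index-based style.
I, J = np.nonzero(scores == 0)
in_place = scores.copy()
in_place[I, J] = min_gt0[J]

print(imputed)
```

The one-dimensional min_gt0 broadcasts across rows in np.where, so each zero picks up the minimum of its own column.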

edited May 27, 2019 at 12:15

answered May 27, 2019 at 12:08


2 Comments

asleniovas Over a year ago

Thanks for the solution. I was having issues but then realised I need to declare a variable for np.where...

yatu Over a year ago

Yes you have to assign it to a variable. You're welcome @asleniovas :)

Bi Ao · Accepted Answer · 2019-05-27 12:24:32Z

1

I think the numpy.ma.masked_equal function is what you need.

Consider an array:

a = np.array([4, 3, 8, 0, 5])
m = np.ma.masked_equal(a, 0) # mask = [0, 0, 0, 1, 0]

Now you can call m.min(), which returns the smallest unmasked value, i.e. the smallest non-zero value in the array (here, the second smallest value overall):

m.min() # 3
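A rough sketch of how this could be extended to the two-dimensional, per-column case from the question. The array values are made up, and combining masked_equal with np.where in this way is one possible approach rather than something stated in the answer above:

```python
import numpy as np

# Invented 2-D example; each column has its own range.
a = np.array([
    [4, 0, 7],
    [3, 9, 0],
    [0, 5, 2],
])

m = np.ma.masked_equal(a, 0)     # mask every entry equal to 0
col_min = m.min(axis=0)          # per-column minimum of the unmasked entries

# Wherever an entry was masked (i.e. was 0), take that column's minimum.
# filled(0) keeps the 0 if a whole column happened to be masked (assumption).
result = np.where(np.ma.getmaskarray(m), col_min.filled(0), a)
print(result)
```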

answered May 27, 2019 at 12:24

