I have data loaded from a CSV file with read_csv(), so it is now a pandas DataFrame. It may look like the following:

1      2  11
inf    2   1
1    inf   3

I used the code:

df = df.replace('inf', 1000000)

or

df.replace('inf', 1000000, inplace=True)

Neither of these replaces the inf values with the scalar 1000000.

How can I replace inf with 1000000?

  • Are there finite values larger than 1000000? I am asking because that might imply there are better solutions than the replace you are trying to do. Commented Jan 20, 2019 at 9:47
  • The inf just means infinity. For convenient handling, I chose a number, for example 1000000, to replace the inf values. There are no values larger than 1000000. Commented Jan 20, 2019 at 9:56
  • I know what it means. I am asking whether there is a number larger than 1000000 in your dataframe. If not, replace is a poor choice because you can use faster cython functions. See my answer for performance timings. Regardless of whether 1000000 is the largest number in your array or not, there are better alternatives to replace. Commented Jan 20, 2019 at 9:57
  • Yes, there is no number larger than 1000000 in the data frame, so I chose 1000000. Commented Jan 20, 2019 at 10:00
  • Okay, you can take a look at this answer; all the solutions there are valid. If it was useful, please let me know. Commented Jan 20, 2019 at 10:01

2 Answers


Use np.inf, because 'inf' is only the string representation of inf; read_csv has already parsed the values into actual float infinities:

print (df)
          a         b   c
0  1.000000  2.000000  11
1       inf  2.000000   1
2  1.000000       inf   3

import numpy as np

df = df.replace(np.inf, 1000000)

print (df)
           a          b   c
0        1.0        2.0  11
1  1000000.0        2.0   1
2        1.0  1000000.0   3
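
For context, here is a minimal, self-contained sketch (the CSV text and column names a, b, c are assumptions matching the example above) showing that read_csv parses the literal text inf straight into the float np.inf, which is why replacing the string 'inf' never matches anything:

import io
import numpy as np
import pandas as pd

# Assumed CSV content mirroring the example frame above
csv_text = "a,b,c\n1,2,11\ninf,2,1\n1,inf,3"
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)           # a and b are float64: the text 'inf' became float inf
print(np.isinf(df).any())  # True for columns a and b, so match on np.inf, not 'inf'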

Comments

Is inf a keyword in numpy?
@LYY It is a constant; check the link.
@LYY replace works, but you can probably do better. Are you trying to change inf to another, smaller maximum value in your dataframe?
The inf in the data just stands for a very large number, so replacing it with a large numeric value is the natural choice.
inf means infinity.

I would suggest df.clip_upper if you want to establish an upper bound:

df.clip_upper(1000000)

           a          b     c
0        1.0        2.0  11.0
1  1000000.0        2.0   1.0
2        1.0  1000000.0   3.0
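
Note that clip_upper was deprecated and removed in pandas 1.0, so on current versions the equivalent call (assuming a purely numeric DataFrame) is:

df.clip(upper=1000000)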

Otherwise, you can use np.isfinite and set values:

df.where(np.isfinite(df), 1000000)
# df.mask(~np.isfinite(df), 1000000)

           a          b   c
0        1.0        2.0  11
1  1000000.0        2.0   1
2        1.0  1000000.0   3

If NaNs should not be affected, use

df.where(np.isfinite(df) | np.isnan(df), 1000000)

           a          b   c
0        1.0        2.0  11
1  1000000.0        2.0   1
2        1.0  1000000.0   3

You can also do this with isin:

df.where(~df.isin([np.inf]), 1000000)
# df.where(~np.isin(df, np.inf), 1000000)
# df.mask(df.isin([np.inf]), 1000000)

           a          b   c
0        1.0        2.0  11
1  1000000.0        2.0   1
2        1.0  1000000.0   3
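
One caveat: isin([np.inf]) matches only positive infinity. If the data might also contain -inf and it should get the same treatment (an assumption, since the sample only shows positive inf), include both values:

df.where(~df.isin([np.inf, -np.inf]), 1000000)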

There is an in-place version of the above using np.where:

df[:] = np.where(np.isin(df, np.inf), 1000000, df)

Or,

pd.DataFrame(np.where(np.isin(df, np.inf), 1000000, df), 
             index=df.index, 
             columns=df.columns)

           a          b   c
0        1.0        2.0  11
1  1000000.0        2.0   1
2        1.0  1000000.0   3
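
Boolean-mask assignment is another in-place option, sketched here under the assumption that every column is numeric; it overwrites the infinite cells without constructing a new DataFrame:

import numpy as np

df[np.isinf(df)] = 1000000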

Performance

df_ = df.copy()
df = pd.concat([df_] * 10000, ignore_index=True)

%timeit df.replace(np.inf, 1000000)
%timeit df.where(np.isfinite(df) | np.isnan(df), 1000000)
%timeit df.where(np.isfinite(df), 1000000)
%timeit df.clip_upper(1000000)

9.44 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.26 ms ± 38.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.37 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
605 µs ± 17.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
