2

I have the following code which produces a df with 7 columns and 40000 rows:

df = pd.DataFrame(np.random.random(size=(40000, 7)), columns=list('ABCDEFG'))

How do I replace every value less than 1/3 with "a", every value between 1/3 and 2/3 with "b", and every value above 2/3 (and below 1) with "c"? I have tried pd.cut(), but it seems to work on only one column at a time. I have also tried:

df[df <= 1/3] = "a"
df[(df > 1/3) & (df < 2/3)] = "b"
df[df > 2/3] = "c"
1
  • Does your actual problem also have limits that are all either integers or fractions with denominator 3? Or are they more complex? Commented Oct 6, 2020 at 18:00

3 Answers

3

You can use np.select, which accepts as many conditions and choices as you need. The comparison methods are df.lt (less than), df.gt (greater than), df.le (less than or equal to), and df.ge (greater than or equal to).

import numpy as np
import pandas as pd

np.random.seed(0) # for reproducing same results
df = pd.DataFrame(np.random.random(size=(40000, 7)), columns=list('ABCDEFG'))
df.head()

          A         B         C         D         E         F         G
0  0.548814  0.715189  0.602763  0.544883  0.423655  0.645894  0.437587
1  0.891773  0.963663  0.383442  0.791725  0.528895  0.568045  0.925597
2  0.071036  0.087129  0.020218  0.832620  0.778157  0.870012  0.978618
3  0.799159  0.461479  0.780529  0.118274  0.639921  0.143353  0.944669
4  0.521848  0.414662  0.264556  0.774234  0.456150  0.568434  0.018790

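# ge (not gt) so that a value of exactly 1/3 maps to 'b' instead of falling through to 'c'
condlist = [df.lt(1/3), df.ge(1/3) & df.lt(2/3)]
choicelist = ['a', 'b']
# np.select returns a plain array, so restore the original index and column labels
df = pd.DataFrame(np.select(condlist, choicelist, 'c'), index=df.index, columns=df.columns)
df.head()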
    A   B   C   D   E   F   G
0   b   c   b   b   b   b   b
1   c   c   b   c   b   b   c
2   a   a   a   c   c   c   c
3   c   b   c   a   b   a   c
4   b   b   a   c   b   b   a

Or use df.apply with pd.cut, which runs pd.cut on each column in turn:

# Using the original numeric df from above (before the np.select step overwrote it).
df.apply(pd.cut,
         bins=[0, 1/3, 2/3, 1], 
         labels=['a', 'b', 'c']
        )

   A  B  C  D  E  F  G
0  b  c  b  b  b  b  b
1  c  c  b  c  b  b  c
2  a  a  a  c  c  c  c
3  c  b  c  a  b  a  c
4  b  b  a  c  b  b  a
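
To check that the two approaches agree, here is a minimal self-contained sketch. It is not part of the original answer: the names via_select and via_cut are made up for illustration, the frame is smaller than the 40000 rows in the question, and include_lowest=True is an assumption added so that a value of exactly 0 would still land in the first bin. The two methods treat values of exactly 1/3 or 2/3 differently (pd.cut bins are closed on the right by default), which random floats are very unlikely to hit.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random(size=(1000, 7)), columns=list("ABCDEFG"))

# Approach 1: np.select on boolean arrays, with 'c' as the fallback label.
conditions = [
    df.lt(1/3).to_numpy(),
    (df.ge(1/3) & df.lt(2/3)).to_numpy(),
]
via_select = pd.DataFrame(
    np.select(conditions, ["a", "b"], default="c"),
    index=df.index,
    columns=df.columns,
)

# Approach 2: pd.cut applied column by column; extra keyword arguments
# given to df.apply are forwarded to pd.cut.
via_cut = df.apply(
    pd.cut,
    bins=[0, 1/3, 2/3, 1],
    labels=["a", "b", "c"],
    include_lowest=True,
)

# Compare the labels as plain strings, since via_cut holds categorical columns.
same = np.array_equal(
    via_select.to_numpy().astype(str),
    via_cut.to_numpy().astype(str),
)
print(same)
```

Note that pd.cut returns categorical columns while np.select yields plain strings, so pick whichever dtype suits the later steps.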

2 Comments

Wow, this was the best answer I have ever seen. It really helped :)
I compared a dataframe against multiple conditions and then wanted to replace values in that df based on those conditions. This helped me a lot!
2

You are probably hitting an error in the second step: the first assignment has already replaced some values with strings, so the next comparison tries to compare strings with floats. Try this:

    t1 = df <= 1/3
    t2 = (df > 1/3) & (df < 2/3)
    t3 = df > 2/3
    df[t1] = "a"
    df[t2] = "b"
    df[t3] = "c"

Here we compute all the comparisons first and save the boolean masks, and only then make the replacements.
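
As a self-contained sketch of the same mask-first idea (a variation, not the exact code above): the object-dtype copy named out and the >= on the last mask are assumptions added here, so that strings are not written into float columns and a value of exactly 2/3 is not left unlabeled.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random(size=(1000, 7)), columns=list("ABCDEFG"))

# Build every mask from the numeric data before changing anything.
t1 = df <= 1/3
t2 = (df > 1/3) & (df < 2/3)
t3 = df >= 2/3  # assumption: >= so that exactly 2/3 also gets a label

# Assumption: write into an object-dtype copy instead of the float frame,
# so the original numeric df stays untouched.
out = df.astype(object)
out[t1] = "a"
out[t2] = "b"
out[t3] = "c"

print(out.head())
```

Because the masks are computed up front, the order of the three assignments no longer matters.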

1 Comment

Yeah, that's the exact issue. I just didn't know how to get around it, thanks a lot!
2

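Use applymap, which applies a function to every element of the DataFrame (see the pandas applymap documentation). In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which works the same way.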

def remap(x):
    if x <= 1/3:
        return 'a'
    elif 1/3 < x < 2/3:
        return 'b'
    else:
        return 'c'

df.applymap(remap)

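Whenever you want to replace each item in an array with another value, an element-wise map is usually the right tool. Note that it returns a new DataFrame rather than modifying df in place, so assign the result if you want to keep it.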
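
A small runnable sketch of the element-wise approach that works on both older and newer pandas. The hasattr check and the name elementwise are illustrative choices, not something from the answer; they pick DataFrame.map when it exists and fall back to applymap otherwise. The three-row frame is made-up sample data.

```python
import pandas as pd

def remap(x):
    """Map a float in [0, 1] to a label by thirds."""
    if x <= 1/3:
        return "a"
    elif x < 2/3:
        return "b"
    return "c"

df = pd.DataFrame({"A": [0.1, 0.5, 0.9], "B": [0.7, 0.2, 0.4]})

# Prefer DataFrame.map where available; otherwise use applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
labels = elementwise(remap)

print(labels)
```

This calls a Python function once per cell, so on a 40000 x 7 frame it will typically be slower than the vectorized np.select or pd.cut approaches.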
