Is there an elegant way to assign values based on multiple columns in a dataframe in pandas? Let's say I have a dataframe with 2 columns: FruitType and Color.

import pandas as pd
df = pd.DataFrame({'FruitType':['apple', 'banana','kiwi','orange','loquat'],
'Color':['red_black','yellow','greenish_yellow', 'orangered','orangeyellow']})

I would like to assign the value of a third column, 'isYellowSeedless', based on both 'FruitType' and 'Color' columns.

I have a list of fruits that I consider seedless, and would like to check whether the Color column contains the string "yellow".

seedless = ['banana', 'loquat']

How do I string this all together elegantly?

This is my attempt that didn't work:

df[(df['FruitType'].isin(seedless)) & (df['Color'].str.contains("yellow"))]['isYellowSeedless'] = True

2 Answers

Use loc with a boolean mask:

m = (df['FruitType'].isin(seedless)) & (df['Color'].str.contains("yellow"))

df.loc[m, 'isYellowSeedless'] = True
print (df)
             Color FruitType isYellowSeedless
0        red_black     apple              NaN
1           yellow    banana             True
2  greenish_yellow      kiwi              NaN
3        orangered    orange              NaN
4     orangeyellow    loquat             True
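
The attempt in the question fails because df[mask]['isYellowSeedless'] = True is chained indexing: df[mask] returns a new object, so the assignment lands on a temporary copy instead of on df. Here is a minimal sketch comparing the two on the question's data. The names attempt and fixed are only for illustration, and the warning filter is there because the exact warning pandas emits depends on the version:

```python
import warnings
import pandas as pd

df = pd.DataFrame({'FruitType': ['apple', 'banana', 'kiwi', 'orange', 'loquat'],
                   'Color': ['red_black', 'yellow', 'greenish_yellow',
                             'orangered', 'orangeyellow']})
seedless = ['banana', 'loquat']
m = df['FruitType'].isin(seedless) & df['Color'].str.contains('yellow')

# Chained indexing: attempt[m] is a new DataFrame, so the new column is
# created on that temporary object and never reaches `attempt`.
attempt = df.copy()
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas usually warns about this pattern
    attempt[m]['isYellowSeedless'] = True
print('isYellowSeedless' in attempt.columns)  # False

# A single .loc call selects rows and column on the original frame.
fixed = df.copy()
fixed.loc[m, 'isYellowSeedless'] = True
print(fixed)
```

Rows outside the mask are left as NaN, which matches the output above.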

If you need True and False output:

df['isYellowSeedless'] = m
print (df)
             Color FruitType  isYellowSeedless
0        red_black     apple             False
1           yellow    banana              True
2  greenish_yellow      kiwi             False
3        orangered    orange             False
4     orangeyellow    loquat              True
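
One caveat that goes beyond the question's data: if Color contained missing values, str.contains would return NaN for those rows by default, and the mask would no longer be cleanly boolean. Passing na=False treats a missing color as a non-match. A small sketch with a hypothetical frame, df_missing, which is not from the question:

```python
import pandas as pd

# Hypothetical data: like the question's frame, but one Color is missing.
df_missing = pd.DataFrame({'FruitType': ['banana', 'loquat', 'kiwi'],
                           'Color': ['yellow', None, 'greenish_yellow']})
seedless = ['banana', 'loquat']

# na=False makes str.contains return False instead of NaN for missing values.
m = (df_missing['FruitType'].isin(seedless)
     & df_missing['Color'].str.contains('yellow', na=False))

df_missing['isYellowSeedless'] = m
print(df_missing)
```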

For an if-else choice between two scalars, use numpy.where:

import numpy as np
df['isYellowSeedless'] = np.where(m, 'a', 'b')
print (df)
             Color FruitType isYellowSeedless
0        red_black     apple                b
1           yellow    banana                a
2  greenish_yellow      kiwi                b
3        orangered    orange                b
4     orangeyellow    loquat                a

And to convert to 0 and 1:

df['isYellowSeedless'] = m.astype(int)
print (df)
             Color FruitType  isYellowSeedless
0        red_black     apple                 0
1           yellow    banana                 1
2  greenish_yellow      kiwi                 0
3        orangered    orange                 0
4     orangeyellow    loquat                 1

1 Comment

Really nice solution(s). Thank you!

Or you can try:

df['isYellowSeedless']=df.loc[df.FruitType.isin(seedless),'Color'].str.contains('yellow')
df
Out[546]: 
             Color FruitType isYellowSeedless
0        red_black     apple              NaN
1           yellow    banana             True
2  greenish_yellow      kiwi              NaN
3        orangered    orange              NaN
4     orangeyellow    loquat             True
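
This works because the right-hand side is a Series covering only the seedless rows (index 1 and 4). On assignment, pandas aligns it to df by index and leaves the other rows as NaN. Below is a sketch of that alignment. The reindex step and the column name isYellowSeedlessBool are additions for illustration, not part of the answer, for cases where a plain True/False column is wanted:

```python
import pandas as pd

df = pd.DataFrame({'FruitType': ['apple', 'banana', 'kiwi', 'orange', 'loquat'],
                   'Color': ['red_black', 'yellow', 'greenish_yellow',
                             'orangered', 'orangeyellow']})
seedless = ['banana', 'loquat']

# Only the seedless rows survive the .loc selection.
partial = df.loc[df['FruitType'].isin(seedless), 'Color'].str.contains('yellow')
print(partial.index.tolist())  # [1, 4]

# Assignment aligns on the index; rows absent from `partial` become NaN.
df['isYellowSeedless'] = partial

# Optional (illustrative): fill the absent rows with False instead of NaN.
df['isYellowSeedlessBool'] = partial.reindex(df.index, fill_value=False)
print(df)
```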

2 Comments

I really like this one too. Didn't know they could be chained together like that! Thanks!
@J.W. They are connected by the index. Yw :-)
