How to apply a function across two columns in pandas?

Question

I am writing a function to check whether two columns satisfy a condition and, if so, return a new column containing a label. I thought I could just use df.apply(function), but it does not seem to work.

def bucketing(df):
    if df['NATIONALITY'] == 'RU' and df['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'

merged.apply(bucketing, axis = 1)

This is my error:

TypeError: unsupported operand type(s) for |: 'str' and 'str'

My expected output would be a new column with the string 'High Risk' returned if the above condition is met.

Is there a more efficient way of doing this?

Thanks

Please explain more clearly what you are trying to do. Post your dataframe and your expected output.
– gtomer, Commented Mar 8, 2022 at 13:05
Thanks. Essentially, my expected output is a new column containing the string 'High Risk' wherever the condition in my statement above is met.
– work_python, Commented Mar 8, 2022 at 13:07

gtomer · Accepted Answer · 2022-03-08 13:58:35Z

1

Here is an easier way:

import numpy as np
df['new col'] = np.where(
    (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'), 'High Risk',
    np.where((df['NATIONALITY'] == 'UK') & (df['CTRY_OF_RESIDENCE'] == 'Ukraine'), 'Medium Risk', ''))

edited Mar 8, 2022 at 13:58

answered Mar 8, 2022 at 13:08

gtomer


6 Comments

work_python Over a year ago

Thanks, this is the error it returns: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

gtomer Over a year ago

I have corrected the answer
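For context, that ValueError typically appears when Python's `and`/`or` is applied to whole Series rather than the element-wise `&`/`|`. The sketch below reproduces it on a made-up two-row frame (the data is an assumption, not the asker's real table) and then shows the element-wise form that np.where expects:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real frame.
df = pd.DataFrame({
    'NATIONALITY': ['RU', 'UK'],
    'CTRY_OF_RESIDENCE': ['Russia', 'Ukraine'],
})

# `and` asks each Series for a single True/False, which pandas refuses to guess.
try:
    (df['NATIONALITY'] == 'RU') and (df['CTRY_OF_RESIDENCE'] == 'Russia')
    raised = False
except ValueError:
    raised = True

# `&` combines the two comparisons row by row. The parentheses are required
# because `&` binds more tightly than `==`.
mask = (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia')
```

The resulting `mask` is a boolean Series with one entry per row, which is what np.where takes as its first argument.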

work_python Over a year ago

Thanks very much, this is awesome! Can I create multiple rules in this one where statement? For example, a second rule saying that if df['Nationality'] != 'RU' and df['CTRY_of_Residence'] == 'RUSSIA', the result is 'Medium Risk'.

gtomer Over a year ago

Sure. You can run it as many times as you like

work_python Over a year ago

Is it possible to merge it within the same statement, so the one column which is returned, returns either High Risk or Medium Risk?
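One way to keep several rules in a single statement is np.select, which is an alternative to nesting np.where calls and not what the answer itself uses. This is only a sketch: the sample rows, the 'RISK' column name, and the consistent 'Russia' spelling are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's real frame.
df = pd.DataFrame({
    'NATIONALITY': ['RU', 'UK', 'RU'],
    'CTRY_OF_RESIDENCE': ['Russia', 'Russia', 'America'],
})

# One boolean mask per rule, listed in priority order.
conditions = [
    (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'),
    (df['NATIONALITY'] != 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'),
]
labels = ['High Risk', 'Medium Risk']

# The first matching condition wins; rows matching none get the default.
df['RISK'] = np.select(conditions, labels, default='')
```

Adding a further rule then means appending one mask and one label, rather than nesting another call.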


kawuel · Accepted Answer · 2022-03-08 13:14:01Z

0

If you still want to use your own code, I think this would work, but a sample DataFrame would help to check:

def bucketing(row):
    if row['NATIONALITY'] == 'RU' & row['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'
df['NEW COLUMN'] = df.apply(bucketing, axis=1)

answered Mar 8, 2022 at 13:14

kawuel


2 Comments

work_python Over a year ago

Thanks, this is the error I receive: TypeError: unsupported operand type(s) for &: 'str' and 'str'

kawuel Over a year ago

It would be good to see a small part of the DataFrame so I can understand why you get that.
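A likely explanation for that TypeError, independent of the data: `&` binds more tightly than `==`, so the condition is parsed as `row['NATIONALITY'] == ('RU' & row['CTRY_OF_RESIDENCE']) == 'Russia'`, and `&` is not defined between two strings. Inside a row-wise function each value is a plain scalar, so the ordinary `and` is enough. A minimal sketch, with made-up sample rows and an assumed blank label for non-matching rows:

```python
import pandas as pd

# Hypothetical sample data, not the asker's real frame.
df = pd.DataFrame({
    'NATIONALITY': ['RU', 'NA'],
    'CTRY_OF_RESIDENCE': ['Russia', 'America'],
})

def bucketing(row):
    # Each row[...] is a single string here, so `and` is the right operator.
    if row['NATIONALITY'] == 'RU' and row['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'
    return ''  # assumption: blank label for rows that do not match

# axis=1 passes one row at a time to the function.
df['NEW COLUMN'] = df.apply(bucketing, axis=1)
```

Row-wise apply is usually slower than a vectorised mask on large frames, but it keeps the original function shape.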

ArchAngelPwn · Accepted Answer · 2022-03-08 13:17:07Z

0

I would use np.where() to get what you are looking for:

import numpy as np
import pandas as pd

data = {'Name': ['John Smith', 'Jane Doe'],
        'NATIONALITY':  ['RU', 'NA'],
        'CTRY_OF_RESIDENCE': ['Russia', 'America']
        }

df = pd.DataFrame(data)
df['new col'] = np.where((df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'), 'High Risk', '')
df

answered Mar 8, 2022 at 13:17

ArchAngelPwn

