
I am writing a function to check whether two columns satisfy a condition and, if they do, return a new column containing a label. I thought I could just use df.apply(function), but it does not seem to work!

def bucketing(df):
    if df['NATIONALITY'] == 'RU' and df['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'

merged.apply(bucketing, axis = 1)

This is my error:

TypeError: unsupported operand type(s) for |: 'str' and 'str'
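The message mentions `|` even though the posted function uses `and`, so it most likely came from a variant of the condition written with `|` (or `&`) and no parentheses; that variant is an assumption, since it is not shown. In Python, `|` and `&` bind more tightly than `==`, so `a == 'RU' | b == 'Russia'` is parsed as the chained comparison `a == ('RU' | b) == 'Russia'`, and `|` is not defined for two strings. A minimal sketch with plain strings, where `nationality` and `residence` are hypothetical stand-ins for one row's values:

```python
# Hypothetical stand-ins for one row's NATIONALITY and CTRY_OF_RESIDENCE values
nationality = 'RU'
residence = 'Russia'

# `|` binds tighter than `==`, so Python evaluates 'RU' | residence first,
# which raises: TypeError: unsupported operand type(s) for |: 'str' and 'str'
error_message = None
try:
    nationality == 'RU' | residence == 'Russia'
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# With scalar values, the boolean keywords `and` / `or` are the right operators
is_high_risk = nationality == 'RU' and residence == 'Russia'
print(is_high_risk)  # True
```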

My expected output would be a new column with the string 'High Risk' returned if the above condition is met.

Is there a more efficient way of doing this?

Thanks

  • Please explain more clearly what you are trying to do. Post your dataframe and your expected output. Commented Mar 8, 2022 at 13:05
  • Thanks. Essentially, my expected output is a new column containing the string 'High Risk' wherever the condition in my function is met. Commented Mar 8, 2022 at 13:07

3 Answers


Here is an easier way:

import numpy as np

# Outer np.where: 'High Risk' rule; inner np.where: 'Medium Risk' rule; otherwise ''
df['new col'] = np.where((df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'), 'High Risk',
                         np.where((df['NATIONALITY'] == 'UK') & (df['CTRY_OF_RESIDENCE'] == 'Ukraine'), 'Medium Risk', ''))
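Using `and` on whole columns raises `ValueError: The truth value of a Series is ambiguous...`, because `and` needs a single True/False and pandas refuses to collapse a boolean Series into one. The element-wise `&` avoids that, and each comparison needs its own parentheses because `&` binds more tightly than `==`. A small sketch with made-up rows:

```python
import pandas as pd

# Made-up sample rows, for illustration only
df = pd.DataFrame({
    'NATIONALITY': ['RU', 'RU', 'FR'],
    'CTRY_OF_RESIDENCE': ['Russia', 'France', 'Russia'],
})

# `and` asks the first Series for a single truth value, which pandas refuses to give
and_error = None
try:
    (df['NATIONALITY'] == 'RU') and (df['CTRY_OF_RESIDENCE'] == 'Russia')
except ValueError as exc:
    and_error = str(exc)
print(and_error)

# `&` combines the two boolean Series row by row; the parentheses are required
mask = (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia')
print(mask.tolist())  # [True, False, False]
```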

6 Comments

Thanks, this is the error it returns: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have corrected the answer.
Thanks very much, this is awesome! Can I create multiple rules in this one where statement? For example, a second rule saying that if df['NATIONALITY'] != 'RU' and df['CTRY_OF_RESIDENCE'] == 'Russia', the result is 'Medium Risk'.
Sure. You can run it as many times as you like.
Is it possible to combine them in the same statement, so that the single returned column contains either High Risk or Medium Risk?
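One way to keep several rules in a single statement is `np.select`, which takes a list of conditions and a matching list of choices and, for each row, returns the choice of the first condition that is True (or `default` if none are). This sketch assumes the second rule from the comment above (a nationality other than 'RU' with residence in Russia gives 'Medium Risk'); the sample rows are made up:

```python
import numpy as np
import pandas as pd

# Made-up sample rows, for illustration only
df = pd.DataFrame({
    'NATIONALITY': ['RU', 'FR', 'RU'],
    'CTRY_OF_RESIDENCE': ['Russia', 'Russia', 'France'],
})

is_ru = df['NATIONALITY'] == 'RU'
lives_in_russia = df['CTRY_OF_RESIDENCE'] == 'Russia'

# Conditions are checked in order; the first one that is True for a row wins
conditions = [
    is_ru & lives_in_russia,    # rule 1: RU national living in Russia
    ~is_ru & lives_in_russia,   # rule 2: any other nationality living in Russia
]
choices = ['High Risk', 'Medium Risk']

df['new col'] = np.select(conditions, choices, default='')
print(df['new col'].tolist())  # ['High Risk', 'Medium Risk', '']
```

Adding a rule means appending one condition and one choice, which stays readable as the rule list grows, whereas nested `np.where` calls get deeper with every rule.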

If you still want to use your own code, I think this would work, but a sample DataFrame would help to check:

def bucketing(row):
    if row['NATIONALITY'] == 'RU' & row['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'
df['NEW COLUMN'] = df.apply(bucketing, axis=1)

2 Comments

Thanks, this is the error I receive: TypeError: unsupported operand type(s) for &: 'str' and 'str'
It would be good to see a short sample of the DataFrame so I can understand why you get that.
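The error can be explained without the data: `row['NATIONALITY'] == 'RU' & row['CTRY_OF_RESIDENCE'] == 'Russia'` is parsed as `row['NATIONALITY'] == ('RU' & row['CTRY_OF_RESIDENCE']) == 'Russia'`, because `&` binds more tightly than `==`, and `&` is not defined for two strings. Each value in a row is a scalar, so the keyword `and` is the right operator there. A minimal sketch of the row-wise version; the sample rows and the empty-string fallback are assumptions:

```python
import pandas as pd

# Made-up sample rows, for illustration only
df = pd.DataFrame({
    'NATIONALITY': ['RU', 'RU', 'NA'],
    'CTRY_OF_RESIDENCE': ['Russia', 'France', 'America'],
})

def bucketing(row):
    # Each value in `row` is a scalar string, so use `and`, not `&`
    if row['NATIONALITY'] == 'RU' and row['CTRY_OF_RESIDENCE'] == 'Russia':
        return 'High Risk'
    return ''  # explicit fallback instead of an implicit None

# axis=1 passes one row at a time to bucketing
df['NEW COLUMN'] = df.apply(bucketing, axis=1)
print(df['NEW COLUMN'].tolist())  # ['High Risk', '', '']
```

This calls a Python function once per row, so on large frames the vectorized `np.where` approach is usually much faster.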

I would use np.where() to get what you are looking for:

import numpy as np
import pandas as pd

data = {'Name': ['John Smith', 'Jane Doe'],
        'NATIONALITY': ['RU', 'NA'],
        'CTRY_OF_RESIDENCE': ['Russia', 'America']
        }

df = pd.DataFrame(data)
# 'High Risk' where both conditions hold, empty string everywhere else
df['new col'] = np.where((df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia'), 'High Risk', '')
print(df)
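Another vectorized option, not used in the answers here, is boolean-mask assignment with `.loc`: start the column with a default value, then overwrite only the rows where the condition holds. A sketch on the same sample data, with the empty string as an assumed default:

```python
import pandas as pd

data = {'Name': ['John Smith', 'Jane Doe'],
        'NATIONALITY': ['RU', 'NA'],
        'CTRY_OF_RESIDENCE': ['Russia', 'America']}
df = pd.DataFrame(data)

# Default label for rows that match no rule
df['new col'] = ''

# Overwrite only the rows that satisfy both conditions
high_risk = (df['NATIONALITY'] == 'RU') & (df['CTRY_OF_RESIDENCE'] == 'Russia')
df.loc[high_risk, 'new col'] = 'High Risk'

print(df['new col'].tolist())  # ['High Risk', '']
```

More rules can be added as further `.loc` lines; when rules overlap, a later line overwrites an earlier one, so order them from lowest to highest priority.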

