Add new column to a dataframe with values based on multiple conditions

Question

import numpy as np
import pandas as pd
df = pd.DataFrame({'salary': [2000, 5000, 7000, 3500, 8000], 'rate': [2, 4, 6.5, 7, 5], 'other': [4000, 2500, 4200, 5000, 3000],
                   'name': ['bob', 'sam', 'ram', 'jam', 'flu'], 'last_name': ['bob', 'gan', 'ram', np.nan, 'flu']})

I have a dataframe df and I need to populate a new column with values based on the conditions below:

If 'name' is equal to 'last_name' then 'salary'+'other'
If 'last_name' is null then 'salary'+'other'
If 'name' is not equal to 'last_name' then ('rate' * 'other')+'salary'

I tried the below code but it is not giving the correct result:

if np.where(df["name"] == df["last_name"]) is True:
    df['new_col'] = df['salary'] + df['other']
else:
    df['new_col'] = (df['rate'] * df['other']) + df['salary']

Steven Rouk · Accepted Answer · 2020-08-19 23:11:10Z

You can do these one at a time using pandas DataFrame filtering. When you do something like df["name"] == df["last_name"], you create a boolean Series (called a "mask") that you can then use to index into the DataFrame.

# condition 1 - name == last name
name_equals_lastname = df["name"] == df["last_name"]  # first, create the boolean mask
df.loc[name_equals_lastname, "new_col"] = df["salary"] + df["other"]  # then, use the mask to index into the DataFrame at the correct positions and just set those values

# condition 2 - last name is null
last_name_is_null = df["last_name"].isnull()
df.loc[last_name_is_null, "new_col"] = df["salary"] + df["other"]

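# condition 3 - name != last name (null last names are excluded; "jam" != NaN is True, so they would otherwise overwrite condition 2)
name_not_equal_to_last_name = (df["name"] != df["last_name"]) & df["last_name"].notnull()
df.loc[name_not_equal_to_last_name, "new_col"] = (df["rate"] * df["other"]) + df["salary"]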
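
A note on missing values in these masks: comparing a string to NaN with != yields True, so a plain inequality mask also selects rows where last_name is null. The sketch below uses the question's data to show which rows each version of the mask flags. The variable names plain_not_equal and strict_not_equal are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "salary": [2000, 5000, 7000, 3500, 8000],
    "rate": [2, 4, 6.5, 7, 5],
    "other": [4000, 2500, 4200, 5000, 3000],
    "name": ["bob", "sam", "ram", "jam", "flu"],
    "last_name": ["bob", "gan", "ram", np.nan, "flu"],
})

# A plain inequality is also True where last_name is NaN.
plain_not_equal = df["name"] != df["last_name"]

# Requiring a non-null last_name keeps only genuine mismatches.
strict_not_equal = plain_not_equal & df["last_name"].notnull()

# Collect the row labels each mask selects so they can be compared.
flagged_plain = plain_not_equal[plain_not_equal].index.tolist()
flagged_strict = strict_not_equal[strict_not_equal].index.tolist()
print(flagged_plain)
print(flagged_strict)
```

If the null rows should get salary + other, either use the stricter mask for the third condition or assign the third condition first and the null condition last.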

You could also use df.apply() with a custom function, like this:

def my_logic(row):
    if row["name"] == row["last_name"]:
        return row["salary"] + row["other"]
    elif ...  # you can fill in the rest of the logic here

df["new_col"] = df.apply(my_logic, axis=1)  # you need axis=1 to pass rows rather than columns
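
For reference, here is one way the remaining branches might be filled in, following the three conditions in the question. This is a sketch rather than the only possible version: the null check is placed in the first branch as an assumption about how the conditions should be ordered, and it uses pd.isnull so that it also works on older pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "salary": [2000, 5000, 7000, 3500, 8000],
    "rate": [2, 4, 6.5, 7, 5],
    "other": [4000, 2500, 4200, 5000, 3000],
    "name": ["bob", "sam", "ram", "jam", "flu"],
    "last_name": ["bob", "gan", "ram", np.nan, "flu"],
})

def my_logic(row):
    # Conditions 1 and 2: last_name is missing, or it matches name.
    if pd.isnull(row["last_name"]) or row["name"] == row["last_name"]:
        return row["salary"] + row["other"]
    # Condition 3: the names differ.
    return row["rate"] * row["other"] + row["salary"]

# axis=1 passes each row to my_logic as a Series.
df["new_col"] = df.apply(my_logic, axis=1)
print(df)
```

Note that apply calls the function once per row, so on large frames the mask-based approach will usually be faster.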

Thanks, but isn't there any other solution using if statements? Just asking.
@mathew -- The only solution using an "if" statement that I might recommend is writing a function with your logic and then passing it to df.apply(my_func) (where my_func contains your logic). While there might be another if/else solution, pandas isn't really meant to be used that way. I'll add the df.apply() method to my solution right now though, so you can see it.

Andy L. · Accepted Answer · 2020-08-19 23:22:56Z

Given your conditions, you don't need if-else. Just use np.where with combined boolean masks:

c1 = df["name"] == df["last_name"]
c2 = df["last_name"].isna()

df['new_col'] = np.where(c1 | c2,
                         df['salary'] + df['other'],
                         df['rate'] * df['other'] + df['salary'])

Out[159]:
   salary  rate  other name last_name  new_col
0    2000   2.0   4000  bob       bob   6000.0
1    5000   4.0   2500  sam       gan  15000.0
2    7000   6.5   4200  ram       ram  11200.0
3    3500   7.0   5000  jam       NaN   8500.0
4    8000   5.0   3000  flu       flu  11000.0
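
If the number of distinct outcomes grows beyond two, np.select is a possible alternative to nesting np.where calls. This is a different function from the one used above, shown only as a sketch on the question's data. It takes a list of conditions and a matching list of choices, and for each row uses the first condition that is True, so the null check listed second wins over the inequality listed third.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "salary": [2000, 5000, 7000, 3500, 8000],
    "rate": [2, 4, 6.5, 7, 5],
    "other": [4000, 2500, 4200, 5000, 3000],
    "name": ["bob", "sam", "ram", "jam", "flu"],
    "last_name": ["bob", "gan", "ram", np.nan, "flu"],
})

# Conditions are checked in order; the first True one decides the row.
conditions = [
    df["name"] == df["last_name"],
    df["last_name"].isnull(),
    df["name"] != df["last_name"],
]
# One choice per condition, in the same order.
choices = [
    df["salary"] + df["other"],
    df["salary"] + df["other"],
    df["rate"] * df["other"] + df["salary"],
]

# default is used for rows where no condition matches.
df["new_col"] = np.select(conditions, choices, default=np.nan)
print(df)
```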


mathew: @Andy, I got the following error with your suggestion: AttributeError: 'Series' object has no attribute 'isna'

Andy L.: It seems you have an old version of pandas. In that case, try isnull instead: c2 = df["last_name"].isnull()
