Python Pandas Conditional Only Executes First "If" Statement

Question

I have a dataset 'df' like so:

a     b     c    e  f  g
1     Nan   Nan  2  3  4
Nan   1     Nan  2  3  4
Nan   Nan   1    2  3  4

I want to create a new column 'd' in my dataframe 'df' like so:

   a     b     c    d  e  f  g 
   1     Nan   Nan  1  2  3  4
   Nan   1     Nan  1  2  3  4
   Nan   Nan   1    1  2  3  4

So basically, 'd' is the amalgamation of columns 'a', 'b', and 'c' for any non-NaN values those three columns have. Between 'a', 'b', and 'c', if one column is non-NaN, the other columns are NaN. I need columns 'e', 'f', and 'g' in my dataset, but they do not have anything to do with constructing column 'd'

I created this function below:

def func(df):
    if df['a'] is not None:
        return df['a']
    if df['b'] is not None:
        return df['b']
    if df['c'] is not None:
        return df['c']

and then used concatenated_df['d']=concatenated_df.apply(func, axis=1) to apply the conditional statements for each row of the dataframe. However, I am only getting the first if statement to successfully execute. The rest returns NaN values to the new column 'd'

I also tried modifying the function to

def func(df):
    if df['a'] is not None:
        return df['a']
    elif df['b'] is not None:
        return df['b']
    elif df['c'] is not None:
        return df['c']

but it still only executed the first conditional statement

So you just want the first value that isn't None from a b and c? — user3483203
– user3483203, Commented Jul 9, 2018 at 18:51
I will update the question with an example to make it more clear — crayfishcray
– crayfishcray, Commented Jul 9, 2018 at 18:51
Are you looking for not-None or not-NaN? (They are not the same.) — DYZ
– DYZ, Commented Jul 9, 2018 at 18:51
Possible duplicate of stackoverflow.com/questions/31828240/… — DYZ
– DYZ, Commented Jul 9, 2018 at 19:02

user3483203 · Accepted Answer · 2018-07-09 19:01:25Z

2

You can use first_valid_index

df['d'] = df.apply(lambda x: x[x.first_valid_index()], axis=1)

     a    b    c  d
0    1  NaN  NaN  1
1  NaN    1  NaN  1
2  NaN  NaN    1  1

answered Jul 9, 2018 at 19:01

user3483203

51.3k10 gold badges72 silver badges104 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

crayfishcray Over a year ago

thanks for this answer - I updated my question to be more specific - there are columns in my dataset that I don't want to use for constructing column 'd' (I only want to use columns 'a', 'b', and 'c'), so I think the other answer is better for my purposes. Thanks for your help though!

user3483203 Over a year ago

If you'd like to apply this to a subset, you can just index your dataframe first: df[['a', 'b', 'c']].apply(lambda x: x[x.first_valid_index()], axis=1)

DYZ · Accepted Answer · 2018-07-09 19:04:54Z

1

One more option not mentioned in the duplicated question:

df['d'] = df['a'].combine_first(df['b']).combine_first(df['c'])
#0    1.0
#1    1.0
#2    1.0

answered Jul 9, 2018 at 19:04

DYZ

57.3k10 gold badges73 silver badges101 bronze badges

Collectives™ on Stack Overflow

Python Pandas Conditional Only Executes First "If" Statement

2 Answers 2

2 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Linked

Related