0

I have a dataset 'df' like so:

a     b     c    e  f  g
1     Nan   Nan  2  3  4
Nan   1     Nan  2  3  4
Nan   Nan   1    2  3  4

I want to create a new column 'd' in my dataframe 'df' like so:

   a     b     c    d  e  f  g 
   1     Nan   Nan  1  2  3  4
   Nan   1     Nan  1  2  3  4
   Nan   Nan   1    1  2  3  4

So basically, 'd' is the amalgamation of columns 'a', 'b', and 'c' for any non-NaN values those three columns have. Between 'a', 'b', and 'c', if one column is non-NaN, the other columns are NaN. I need columns 'e', 'f', and 'g' in my dataset, but they do not have anything to do with constructing column 'd'

I created this function below:

def func(df):
    if df['a'] is not None:
        return df['a']
    if df['b'] is not None:
        return df['b']
    if df['c'] is not None:
        return df['c']

and then used concatenated_df['d']=concatenated_df.apply(func, axis=1) to apply the conditional statements for each row of the dataframe. However, I am only getting the first if statement to successfully execute. The rest returns NaN values to the new column 'd'

I also tried modifying the function to

def func(df):
    if df['a'] is not None:
        return df['a']
    elif df['b'] is not None:
        return df['b']
    elif df['c'] is not None:
        return df['c']

but it still only executed the first conditional statement

5
  • So you just want the first value that isn't None from a b and c? Commented Jul 9, 2018 at 18:51
  • I will update the question with an example to make it more clear Commented Jul 9, 2018 at 18:51
  • 1
    Are you looking for not-None or not-NaN? (They are not the same.) Commented Jul 9, 2018 at 18:51
  • @DyZ maybe that is my issue let me try fixing that Commented Jul 9, 2018 at 18:53
  • Possible duplicate of stackoverflow.com/questions/31828240/… Commented Jul 9, 2018 at 19:02

2 Answers 2

2

You can use first_valid_index

df['d'] = df.apply(lambda x: x[x.first_valid_index()], axis=1)

     a    b    c  d
0    1  NaN  NaN  1
1  NaN    1  NaN  1
2  NaN  NaN    1  1
Sign up to request clarification or add additional context in comments.

2 Comments

thanks for this answer - I updated my question to be more specific - there are columns in my dataset that I don't want to use for constructing column 'd' (I only want to use columns 'a', 'b', and 'c'), so I think the other answer is better for my purposes. Thanks for your help though!
If you'd like to apply this to a subset, you can just index your dataframe first: df[['a', 'b', 'c']].apply(lambda x: x[x.first_valid_index()], axis=1)
1

One more option not mentioned in the duplicated question:

df['d'] = df['a'].combine_first(df['b']).combine_first(df['c'])
#0    1.0
#1    1.0
#2    1.0

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.