7

I have the following pandas DataFrame:

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'], 
                   'Date0': ['01/01/1999','01/06/1999','01/01/1979'], 'Age0': [29,44,21],
                   'Date1': ['08/01/1999','07/01/2014','01/01/2016'],'Age1': [35, 45, 47],
                   'Date2': [None,'01/06/2035','08/01/1979'],'Age2': [47, None, 74],
                   'Last_age': [47,45,74]})

I would like to add a new column containing, for each row, the date that corresponds to the value in 'Last_age', to get something like this:

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'], 
                   'Date0': ['01/01/1999','01/06/1999','01/01/1979'], 'Age0': [29,44,21],
                   'Date1': ['08/01/1999','07/01/2014','01/01/2016'],'Age1': [35, 45, 47],
                   'Date2': [None,'01/06/2035','08/01/1979'],'Age2': [47, None, 74],
                   'Last_age': [47,45,74],
                   'Last_age_date': ['Error no date','07/01/2014','08/01/1979']})

3 Answers

3

I would just use wide_to_long to reshape your df:

s=pd.wide_to_long(df.reset_index(),['Date','Age'],i=['Last_age','index'],j='Drop')
s.loc[s.Age==s.index.get_level_values(0),'Date']
Out[199]: 
Last_age  index  Drop
47        0      2             None
45        1      1       07/01/2014
74        2      2       08/01/1979
Name: Date, dtype: object
df['Last_age_date']=s.loc[s.Age==s.index.get_level_values(0),'Date'].values
df
Out[201]: 
  Last_Name       Date0  Age0      ...       Age2  Last_age Last_age_date
0     Smith  01/01/1999    29      ...       47.0        47          None
1      None  01/06/1999    44      ...        NaN        45    07/01/2014
2     Brown  01/01/1979    21      ...       74.0        74    08/01/1979
[3 rows x 9 columns]
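
The `.values` assignment above relies on the filtered rows coming back in the same order as df, with exactly one match per row. A minimal sketch of an index-aligned variant, assuming the same df as in the question (the names `long_df` and `matches` are made up for this example, and keeping the last match per row is an arbitrary choice):

```python
import pandas as pd

df = pd.DataFrame({
    'Last_Name': ['Smith', None, 'Brown'],
    'Date0': ['01/01/1999', '01/06/1999', '01/01/1979'], 'Age0': [29, 44, 21],
    'Date1': ['08/01/1999', '07/01/2014', '01/01/2016'], 'Age1': [35, 45, 47],
    'Date2': [None, '01/06/2035', '08/01/1979'], 'Age2': [47, None, 74],
    'Last_age': [47, 45, 74],
})

# Reshape to long form: one row per (original row, column suffix) pair.
long_df = pd.wide_to_long(df.reset_index(), ['Date', 'Age'],
                          i=['Last_age', 'index'], j='Drop')

# Keep the dates whose Age equals Last_age (a level of the long index).
last_age = long_df.index.get_level_values('Last_age').to_numpy()
matches = long_df.loc[long_df['Age'].to_numpy() == last_age, 'Date']

# Index by the original row label only, keeping one match per row.
matches = matches.droplevel(['Last_age', 'Drop'])
matches = matches[~matches.index.duplicated(keep='last')]

# Assignment aligns on the index, so row order no longer matters.
df['Last_age_date'] = matches
df['Last_age_date'] = df['Last_age_date'].fillna('Error no date')
print(df[['Last_age', 'Last_age_date']])
```

The final fillna also covers rows where no age matched at all, since those rows are simply absent from `matches`.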

2 Comments

this should be higher as this is the only vectorized approach amongst all the answers
@JohnJohn first look at wide_to_long pandas.pydata.org/pandas-docs/stable/reference/api/…
1

Something like this should do what you are looking for:

import numpy as np

# get the age and date columns, last pair first (you might have more than just the 2)
age_columns = [c for c in df.columns if 'Age' in c][::-1]
date_columns = [c for c in df.columns if 'Date' in c][::-1]

def get_last_age_date(row):
    # return the date paired with the last non-null age in the row
    for age, date in zip(age_columns, date_columns):
        if not pd.isna(row[age]):
            return row[date]
    return np.nan

# apply the function to all the rows in the dataframe
df['Last_age_date'] = df.apply(get_last_age_date, axis=1)

# replace the missing values with 'Error no date'
df['Last_age_date'] = df['Last_age_date'].fillna('Error no date')
print(df)

4 Comments

Thanks it works. May I ask how the function works? I understand that I get two lists (one for age and one for date) that you check if the value for row[age] is null or not and if not you get the date. But I don't understand how you get the right date.
you can simply get the Date and Age columns by doing df.filter(regex='Date|Age')
Sorry, I mean I don't understand how you get the right date in your code. Does the zip function take the first elements of each list, then the second, and so on?
Yes, exactly. zip is a handy iterator that pairs up corresponding elements of its inputs. With list1 = [1, 2, 3, 4] and list2 = ['a', 'b', 'c', 'd'], list(zip(list1, list2)) gives [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]. The actual matching happens in the if statement: if the age in the current pair is not null, return the corresponding date. Because the lists are reversed, the last pair (Age2, Date2) is checked first. If nothing is found (the last line of the function), return NaN.
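
To illustrate the pairing discussed in these comments, here is a small sketch. The two lists are written out by hand to mirror what the reversed list comprehensions in the answer would produce for the question's df:

```python
# Hand-written stand-ins for the reversed column lists built in the answer.
age_columns = ['Age2', 'Age1', 'Age0']
date_columns = ['Date2', 'Date1', 'Date0']

# zip yields one tuple per position: first with first, second with second, ...
pairs = list(zip(age_columns, date_columns))
print(pairs)
# [('Age2', 'Date2'), ('Age1', 'Date1'), ('Age0', 'Date0')]
```

So on each loop iteration the function looks at one age column together with the date column at the same position.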
0

Welcome to Stack Overflow! You can achieve this with a small function. Your input dataframe looks like this:

df
        Last_Name       Date0  Age0       Date1  Age1       Date2  Age2  Last_age
      0     Smith  01/01/1999    29  08/01/1999    35        None  47.0        47
      1      None  01/06/1999    44  07/01/2014    45  01/06/2035   NaN        45
      2     Brown  01/01/1979    21  01/01/2016    47  08/01/1979  74.0        74

Write a function like this:

def last_Age(row):
    if row['Last_age'] == row['Age2']:
        return row['Date2']
    elif row['Last_age'] == row['Age1']:
        return row['Date1']
    elif row['Last_age'] == row['Age0']:
        return row['Date0']
df['Last_age_date']=df.apply(last_Age, axis = 1)
df
   Last_Name       Date0  Age0       Date1  Age1       Date2  Age2  Last_age  Last_age_date
 0     Smith  01/01/1999    29  08/01/1999    35        None  47.0        47          None
 1      None  01/06/1999    44  07/01/2014    45  01/06/2035   NaN        45    07/01/2014
 2     Brown  01/01/1979    21  01/01/2016    47  08/01/1979  74.0        74    08/01/1979
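
If there are more than three Age/Date pairs, the if/elif chain can be replaced by a loop. A rough sketch, assuming the columns always follow the AgeN/DateN naming from the question (`suffixes`, `last_age_date`, and the `default` parameter are names invented for this example):

```python
import pandas as pd

df = pd.DataFrame({
    'Last_Name': ['Smith', None, 'Brown'],
    'Date0': ['01/01/1999', '01/06/1999', '01/01/1979'], 'Age0': [29, 44, 21],
    'Date1': ['08/01/1999', '07/01/2014', '01/01/2016'], 'Age1': [35, 45, 47],
    'Date2': [None, '01/06/2035', '08/01/1979'], 'Age2': [47, None, 74],
    'Last_age': [47, 45, 74],
})

# Collect the numeric suffixes of the AgeN columns, e.g. ['0', '1', '2'].
suffixes = sorted((c[len('Age'):] for c in df.columns if c.startswith('Age')),
                  key=int)

def last_age_date(row, default='Error no date'):
    # Check the highest suffix first, like the if/elif chain does.
    for s in reversed(suffixes):
        if row['Last_age'] == row[f'Age{s}']:
            date = row[f'Date{s}']
            # A matching age with a missing date falls back to the default.
            return default if pd.isna(date) else date
    # No age column matched Last_age.
    return default

df['Last_age_date'] = df.apply(last_age_date, axis=1)
print(df[['Last_age', 'Last_age_date']])
```

Unlike the function above, this version returns 'Error no date' instead of None when the date is missing or nothing matches, which is what the question asked for.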
