How to find the index of a value by row in a dataframe in python and extract the value of the following column

Question

I have the following pandas dataframe:

import pandas as pd
df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'Date0': ['01/01/1999','01/06/1999','01/01/1979'], 'Age0': [29,44,21],
                   'Date1': ['08/01/1999','07/01/2014','01/01/2016'],'Age1': [35, 45, 47],
                   'Date2': [None,'01/06/2035','08/01/1979'],'Age2': [47, None, 74],
                   'Last_age': [47,45,74]})

I would like to add a new column containing the date that corresponds to the value in 'Last_age' for each row, to get something like this:

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'], 
                   'Date0': ['01/01/1999','01/06/1999','01/01/1979'], 'Age0': [29,44,21],
                   'Date1': ['08/01/1999','07/01/2014','01/01/2016'],'Age1': [35, 45, 47],
                   'Date2': [None,'01/06/2035','08/01/1979'],'Age2': [47, None, 74],
                   'Last_age': [47,45,74],
                   'Last_age_date': ['Error no date','07/01/2014','08/01/1979']})

BENY · Accepted Answer · 2019-03-22 13:55:32Z

I would just use wide_to_long to reshape your df

s=pd.wide_to_long(df.reset_index(),['Date','Age'],i=['Last_age','index'],j='Drop')
s.loc[s.Age==s.index.get_level_values(0),'Date']
Out[199]: 
Last_age  index  Drop
47        0      2             None
45        1      1       07/01/2014
74        2      2       08/01/1979
Name: Date, dtype: object
df['Last_age_date']=s.loc[s.Age==s.index.get_level_values(0),'Date'].values
df
Out[201]: 
  Last_Name       Date0  Age0      ...       Age2  Last_age Last_age_date
0     Smith  01/01/1999    29      ...       47.0        47          None
1      None  01/06/1999    44      ...        NaN        45    07/01/2014
2     Brown  01/01/1979    21      ...       74.0        74    08/01/1979
[3 rows x 9 columns]
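
As a rough, self-contained sketch of the same reshaping idea, the snippet below melts the frame with wide_to_long, keeps the dates whose age equals Last_age, and fills in the 'Error no date' placeholder the question asked for. The names long_df, matches and slot are made up for illustration, and it assumes that if several slots match for one row, keeping any one of them is acceptable.

```python
import pandas as pd

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'Date0': ['01/01/1999', '01/06/1999', '01/01/1979'], 'Age0': [29, 44, 21],
                   'Date1': ['08/01/1999', '07/01/2014', '01/01/2016'], 'Age1': [35, 45, 47],
                   'Date2': [None, '01/06/2035', '08/01/1979'], 'Age2': [47, None, 74],
                   'Last_age': [47, 45, 74]})

# One row per (original row, slot); Last_Name and Last_age are carried along as columns.
long_df = pd.wide_to_long(df.reset_index(), stubnames=['Date', 'Age'], i='index', j='slot')

# Keep the dates whose age equals Last_age, indexed by the original row label only.
matches = long_df.loc[long_df['Age'] == long_df['Last_age'], 'Date'].droplevel('slot')

# Assumption: if several slots match for one row, any single match will do.
matches = matches[~matches.index.duplicated(keep='first')]

# Align on the original index; rows with no match or a missing date get the placeholder.
df['Last_age_date'] = matches.reindex(df.index).fillna('Error no date')
print(df[['Last_age', 'Last_age_date']])
```

Aligning with reindex rather than assigning .values means the result does not depend on every row having exactly one match.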

This should be higher, as it is the only vectorized approach among all the answers.
@JohnJohn first look at wide_to_long pandas.pydata.org/pandas-docs/stable/reference/api/…

nick · Accepted Answer · 2019-03-22 13:39:52Z

Something like this should do what you are looking for:

import numpy as np

# get the age and date columns, highest suffix first (you might have more than just the 2)
age_columns = [c for c in df.columns if 'Age' in c][::-1]
date_columns = [c for c in df.columns if 'Date' in c][::-1]

def get_last_age_date(row):
    # return the date paired with the last non-missing age in the row
    for age, date in zip(age_columns, date_columns):
        if not np.isnan(row[age]):
            return row[date]
    return np.nan

# apply the function to all the rows in the dataframe
df['Last_age_date'] = df.apply(get_last_age_date, axis=1)

# replace missing dates with 'Error no date'
df['Last_age_date'] = df['Last_age_date'].fillna('Error no date')
print(df)

4 Comments

John John Over a year ago

Thanks, it works. May I ask how the function works? I understand that I get two lists (one for ages and one for dates) and that you check whether the value of row[age] is null, and if it is not, you return the date. But I don't understand how you get the right date.

gold_cy Over a year ago

you can simply get the Date and Age columns by doing df.filter(regex='Date|Age')
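
A small sketch of this suggestion on a cut-down version of the question's dataframe. The variable names age_columns and date_columns are borrowed from the answer above, and both is made up here. The pattern is case-sensitive, so Last_age is not picked up.

```python
import pandas as pd

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'Date0': ['01/01/1999', '01/06/1999', '01/01/1979'], 'Age0': [29, 44, 21],
                   'Date1': ['08/01/1999', '07/01/2014', '01/01/2016'], 'Age1': [35, 45, 47],
                   'Last_age': [35, 45, 47]})

# filter(regex=...) keeps the columns whose label matches the pattern (case-sensitive search).
both = df.filter(regex='Date|Age').columns.tolist()

# Anchoring the pattern gives the two lists used in the answer, in left-to-right order.
age_columns = df.filter(regex='^Age').columns.tolist()
date_columns = df.filter(regex='^Date').columns.tolist()

print(both)
print(age_columns, date_columns)
```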

John John Over a year ago

Sorry, I mean I don't understand how you get the right date in your code. Does the zip function take the first elements of each list, then the second, and so on?

nick Over a year ago

Yes. zip is a super handy iterator that pairs up corresponding elements of the lists. So if list1 = [1,2,3,4] and list2 = ['a','b','c','d'], then list(zip(list1, list2)) = [(1,'a'),(2,'b'),(3,'c'),(4,'d')]. The actual matching happens in the if statement. It says: if I find an age whose value is not NaN, return the corresponding date. If I find nothing (the last line in the function), return NaN.
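
To make the pairing concrete, here is a minimal sketch that runs the same loop on a plain dict standing in for the second row of the dataframe. The dict row and the use of math.isnan in place of np.isnan are assumptions made to keep the example free of dependencies.

```python
import math

# Column lists in reversed order, as built in the answer (highest suffix first).
age_columns = ['Age2', 'Age1', 'Age0']
date_columns = ['Date2', 'Date1', 'Date0']

# zip pairs the lists element by element: first with first, second with second, ...
pairs = list(zip(age_columns, date_columns))
print(pairs)

# A plain dict standing in for the second dataframe row (Age2 is missing).
row = {'Date0': '01/06/1999', 'Age0': 44,
       'Date1': '07/01/2014', 'Age1': 45,
       'Date2': '01/06/2035', 'Age2': float('nan')}

def get_last_age_date(row):
    # Walk the (age, date) pairs from the last slot backwards and
    # return the date next to the first age that is not NaN.
    for age, date in zip(age_columns, date_columns):
        if not math.isnan(row[age]):
            return row[date]
    return None

result = get_last_age_date(row)
print(result)
```

Age2 is NaN, so the loop skips the first pair and returns the date paired with Age1.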

Karthik Katragadda · Accepted Answer · 2019-03-22 14:24:45Z

Welcome to Stack Overflow! You can write a small function to achieve this. Your input dataframe looks like this.

df
        Last_Name       Date0  Age0       Date1  Age1       Date2  Age2  Last_age
      0     Smith  01/01/1999    29  08/01/1999    35        None  47.0        47
      1      None  01/06/1999    44  07/01/2014    45  01/06/2035   NaN        45
      2     Brown  01/01/1979    21  01/01/2016    47  08/01/1979  74.0        74

Write a function like this:

def last_Age(row):
    if row['Last_age'] == row['Age2']:
        return row['Date2']
    elif row['Last_age'] == row['Age1']:
        return row['Date1']
    elif row['Last_age'] == row['Age0']:
        return row['Date0']
df['Last_age_date']=df.apply(last_Age, axis = 1)
df
   Last_Name       Date0  Age0       Date1  Age1       Date2  Age2  Last_age  Last_age_date
 0     Smith  01/01/1999    29  08/01/1999    35        None  47.0        47          None
 1      None  01/06/1999    44  07/01/2014    45  01/06/2035   NaN        45    07/01/2014
 2     Brown  01/01/1979    21  01/01/2016    47  08/01/1979  74.0        74    08/01/1979
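
If there are more than three Age/Date pairs, the chain of if/elif branches can be replaced by a loop. The following is only a sketch: the helper name date_for_last_age and its default argument are made up, and it assumes every AgeN column has a DateN column with the same suffix. It checks the highest-numbered slot first, like the function above, and returns the question's 'Error no date' placeholder when the matching date is missing or nothing matches.

```python
import pandas as pd

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'Date0': ['01/01/1999', '01/06/1999', '01/01/1979'], 'Age0': [29, 44, 21],
                   'Date1': ['08/01/1999', '07/01/2014', '01/01/2016'], 'Age1': [35, 45, 47],
                   'Date2': [None, '01/06/2035', '08/01/1979'], 'Age2': [47, None, 74],
                   'Last_age': [47, 45, 74]})

# Age columns in their original left-to-right order: Age0, Age1, Age2, ...
age_columns = [c for c in df.columns if c.startswith('Age')]

def date_for_last_age(row, default='Error no date'):
    # Hypothetical helper: look for the slot whose age equals Last_age,
    # starting from the last slot, and return the date with the same suffix.
    for age_col in reversed(age_columns):
        if row[age_col] == row['Last_age']:
            date = row['Date' + age_col[len('Age'):]]
            return date if pd.notna(date) else default
    return default

df['Last_age_date'] = df.apply(date_for_last_age, axis=1)
print(df[['Last_age', 'Last_age_date']])
```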
