Create new column in a DataFrame using values from a different row

Question

I am trying to create a new column in a pandas DataFrame that holds the score for the same ID in the next year. See the sample original data below:

Year  ID    Score
2018  785   8.4 
2018  770   -1.2
2017  733   3.2
2017  785   7.9
2018  733   3.9

If there is no data for the next year, it should be filled with NA. The output I'm looking for is:

Year  ID    Score col
2018  785   8.4   NA
2018  770   -1.2  NA
2017  733   3.2   3.9
2017  785   7.9   8.4
2018  733   3.9   NA

The data is not currently ordered.

Quang Hoang · Accepted Answer · 2019-10-17 20:45:23Z

3

If your data has consecutive years for every ID (for example, no ID that has only 2016 and 2018), then you can do:

 df['col'] = df.sort_values('Year').groupby('ID').Score.shift(-1)

Output:

   Year   ID  Score  col
0  2018  785    8.4  NaN
1  2018  770   -1.2  NaN
2  2017  733    3.2  3.9
3  2017  785    7.9  8.4
4  2018  733    3.9  NaN
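For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same one-liner. The DataFrame construction is my own reconstruction of the sample table; the rest follows the line above.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Year": [2018, 2018, 2017, 2017, 2018],
    "ID": [785, 770, 733, 785, 733],
    "Score": [8.4, -1.2, 3.2, 7.9, 3.9],
})

# Sort by year, then take the following row's Score within each ID.
# The result keeps the original index labels, so assigning it back
# aligns on the index and the original row order is preserved.
df["col"] = df.sort_values("Year").groupby("ID")["Score"].shift(-1)

print(df)
```

Because the assignment aligns on the index, there is no need to sort `df` itself or to restore the order afterwards.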

If years are not guaranteed to be consecutive, then do a merge:

df.merge(df.assign(Year=lambda x: x.Year - 1),
         on=['Year', 'ID'],
         suffixes=('', '_new'),
         how='left')

Output:

   Year   ID  Score  Score_new
0  2018  785    8.4        NaN
1  2018  770   -1.2        NaN
2  2017  733    3.2        3.9
3  2017  785    7.9        8.4
4  2018  733    3.9        NaN
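As a self-contained sketch of the merge approach, the version below rebuilds the sample data and stores the result in a variable. The names `next_year` and `result`, and renaming the new column to `col`, are my own choices for illustration. It also assumes each (Year, ID) pair appears at most once; with duplicate pairs the left merge would produce extra rows.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Year": [2018, 2018, 2017, 2017, 2018],
    "ID": [785, 770, 733, 785, 733],
    "Score": [8.4, -1.2, 3.2, 7.9, 3.9],
})

# Move every score back one year, so a 2018 score is keyed as 2017.
# Renaming Score up front avoids the need for merge suffixes.
next_year = df.assign(Year=df["Year"] - 1).rename(columns={"Score": "col"})

# A left merge keeps every original row; rows with no match get NaN.
result = df.merge(next_year, on=["Year", "ID"], how="left")

print(result)
```

Since the match is on the exact year, a missing year simply yields NaN instead of picking up a score from a later year.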

edited Oct 17, 2019 at 20:45

answered Oct 17, 2019 at 20:41


2 Comments

Bill K Over a year ago

Awesome! And if there is a gap in years will it just be an NaN?

Quang Hoang Over a year ago

The first solution won't work in that case because it ignores the year. See the update.
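To illustrate the point about gaps, here is a small sketch with a hypothetical ID that has scores for 2016 and 2018 only. The frame `gap` and the variable names are made up for this example and are not from the original data.

```python
import pandas as pd

# Hypothetical data: one ID with a missing year (no 2017 row).
gap = pd.DataFrame({
    "Year": [2016, 2018],
    "ID": [785, 785],
    "Score": [7.9, 8.4],
})

# shift(-1) takes the next row per ID regardless of how far apart
# the years are, so the 2016 row picks up the 2018 score.
by_shift = gap.sort_values("Year").groupby("ID")["Score"].shift(-1)

# The merge matches on Year + 1 exactly, so the 2016 row gets NaN.
by_merge = gap.merge(
    gap.assign(Year=gap["Year"] - 1),
    on=["Year", "ID"],
    suffixes=("", "_new"),
    how="left",
)["Score_new"]

print(by_shift.tolist())
print(by_merge.tolist())
```

With the shift approach the 2016 row receives 8.4 (the 2018 score), while the merge leaves it as NaN, which is what the question asks for.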
