
I am trying to create a new column in a pandas DataFrame that holds the score for the same ID in the next year. See the sample original data below:

Year  ID    Score
2018  785   8.4 
2018  770   -1.2
2017  733   3.2
2017  785   7.9
2018  733   3.9

If there is no data for the next year, the new column should be filled with NA. So the output I'm looking for would be:

Year  ID    Score col
2018  785   8.4   NA
2018  770   -1.2  NA
2017  733   3.2   3.9
2017  785   7.9   8.4
2018  733   3.9   NA

The data is not currently sorted.

1 Answer

If every ID in your data has consecutive years (for example, no ID with only 2016 and 2018), you can sort by year and shift the scores within each ID:

df['col'] = df.sort_values('Year').groupby('ID')['Score'].shift(-1)

Output:

   Year   ID  Score  col
0  2018  785    8.4  NaN
1  2018  770   -1.2  NaN
2  2017  733    3.2  3.9
3  2017  785    7.9  8.4
4  2018  733    3.9  NaN
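
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the shift. The only assumption is the DataFrame construction itself; the column names come from the question.

```python
import pandas as pd

# Sample data from the question, deliberately left unsorted
df = pd.DataFrame({
    "Year": [2018, 2018, 2017, 2017, 2018],
    "ID": [785, 770, 733, 785, 733],
    "Score": [8.4, -1.2, 3.2, 7.9, 3.9],
})

# Sort by year, then pull the following row's score up within each ID.
# The result keeps the original index labels, so the assignment lines up
# with the unsorted frame.
df["col"] = df.sort_values("Year").groupby("ID")["Score"].shift(-1)

print(df)  # col is NaN, NaN, 3.9, 8.4, NaN
```

The original row order is preserved because pandas aligns the assigned Series on the index, not on position.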

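If years are not guaranteed to be consecutive, use a merge instead: shift each row's Year back by one in a copy of the data, then left-join that copy onto the original on Year and ID: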

# merge returns a new DataFrame, so assign the result
df = df.merge(df.assign(Year=lambda x: x.Year - 1),  # relabel next year's rows as this year
              on=['Year', 'ID'],
              suffixes=('', '_new'),
              how='left')

Output:

   Year   ID  Score  Score_new
0  2018  785    8.4        NaN
1  2018  770   -1.2        NaN
2  2017  733    3.2        3.9
3  2017  785    7.9        8.4
4  2018  733    3.9        NaN
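
As a rough check that the merge really does respect gaps, here is a small sketch on made-up data (the IDs, years, and scores below are invented for illustration and are not from the question). ID 1 has 2016 and 2018 but no 2017; ID 2 has 2017 and 2018.

```python
import pandas as pd

# Hypothetical data: ID 1 skips 2017, ID 2 has consecutive years
df = pd.DataFrame({
    "Year": [2016, 2018, 2017, 2018],
    "ID": [1, 1, 2, 2],
    "Score": [5.0, 6.0, 1.0, 2.0],
})

# Copy of the data with every row relabelled as the previous year,
# so a row for year Y + 1 becomes a candidate match for year Y
next_year = df.assign(Year=df["Year"] - 1)

# Left join keeps every original row; unmatched rows get NaN in Score_new
result = df.merge(next_year, on=["Year", "ID"], how="left", suffixes=("", "_new"))

print(result)
```

Only the 2017 row for ID 2 gets a value; the 2016 row for ID 1 stays NaN because there is no 2017 row to match. This assumes at most one row per (Year, ID) pair; with duplicates the merge would produce extra rows.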

2 Comments

Awesome! And if there is a gap in years, will it just be NaN?
The first solution won't handle that; it ignores the actual Year values and just takes the next row for each ID. See the update.
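
To illustrate the point about gaps, here is a sketch of what the plain shift does on data with a missing year, plus one possible way to make it year-aware. The helper name next_year_score and the sample data are hypothetical, and this variant is an alternative to the merge, not part of the original answer.

```python
import pandas as pd

def next_year_score(df):
    """Hypothetical helper: next year's Score per ID, NaN when Year + 1 is missing."""
    ordered = df.sort_values(["ID", "Year"])
    grouped = ordered.groupby("ID")
    next_score = grouped["Score"].shift(-1)
    next_year = grouped["Year"].shift(-1)
    # Keep the shifted score only when the following row is exactly one year later
    return next_score.where(next_year == ordered["Year"] + 1)

# Made-up data: ID 1 has 2016 and 2018 only, ID 2 has 2017 and 2018
df = pd.DataFrame({
    "Year": [2016, 2018, 2017, 2018],
    "ID": [1, 1, 2, 2],
    "Score": [5.0, 6.0, 1.0, 2.0],
})

# Plain shift: the 2016 row for ID 1 wrongly picks up the 2018 score
df["naive"] = df.sort_values("Year").groupby("ID")["Score"].shift(-1)

# Year-aware shift: the same row stays NaN
df["col"] = next_year_score(df)

print(df)
```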
