I have a large dataframe with the columns ID and DOB:

import pandas as pd 
df = pd.read_csv('data.csv')

df.head() 

ID      DOB
223725  1975.0
223725  1975.0
223725  1975.0
223725  1975.0
223725  1975.0

DOB contains 63 distinct years. I want to replace each year with its position in sorted order: the lowest year, 1911, becomes 1, the second lowest becomes 2, the third lowest becomes 3, and so on.

How do I make this change fast?

1 Answer

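You can use Series.rank with method='dense', which gives equal years the same rank and leaves no gaps between consecutive ranks. The result is a float column: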

df['DOB1'] = df['DOB'].rank(method='dense')
print(df)
       ID     DOB  DOB1
0  223725  1911.0   1.0
1  223725  2000.0   3.0
2  223725  2006.0   4.0
3  223725  1985.0   2.0
4  223725  1911.0   1.0
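
If integer codes are preferred over floats, the following is a minimal sketch of two ways to get them. It assumes DOB has no missing values (rank leaves NaN as NaN, so the integer cast would fail). The sample frame and the column names DOB_rank and DOB_code are made up for illustration and stand in for data.csv. The second variant swaps in pd.factorize with sort=True, which is a different technique from rank but should give the same numbering here.

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "ID": [223725] * 5,
    "DOB": [1911.0, 2000.0, 2006.0, 1985.0, 1911.0],
})

# Dense rank: equal years share a rank and no rank numbers are skipped.
# rank() returns floats, so cast to int (assumes no NaN in DOB).
df["DOB_rank"] = df["DOB"].rank(method="dense").astype(int)

# Alternative: factorize with sort=True yields 0-based codes in sorted
# order of the unique values; add 1 to start the numbering at 1.
codes, uniques = pd.factorize(df["DOB"], sort=True)
df["DOB_code"] = codes + 1

print(df)
```

Both columns should hold the same values. With factorize, a missing DOB would get code 0 after the shift rather than raising, which may or may not be what you want.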