I have a column with the following data:

df['Exp'] = ['10+ years', '8 years', '6 years', '7 years', '5 years','1 year', '< 1 year', '4 years', '3 years', '2 years', '9 years']

I need to convert this column to int dtype.

How can I do it?

Thanks !

  • pd.factorize()? Commented Dec 11, 2020 at 19:15
  • Do you need to differentiate between 1 year, < 1 year or should they both be 1? Commented Dec 11, 2020 at 19:17
  • No need: '< 1 year' should just be 1, '2 years' should be 2, etc. Commented Dec 11, 2020 at 19:20

2 Answers

import pandas as pd

df = pd.DataFrame({'Exp': ['10+ years', '8 years', '6 years', '7 years', '5 years', '1 year', '< 1 year', '4 years', '3 years', '2 years', '9 years']})
# Strip every non-digit character, then cast the remaining digits to int
df['Exp'] = df['Exp'].replace(r'\D', '', regex=True).astype(int)

Output

    Exp
0    10
1     8
2     6
3     7
4     5
5     1
6     1
7     4
8     3
9     2
10    9

This should do the trick:

df.Exp.str.extract(r'(\d{1,})').astype(int)

For clarity: \d matches a single digit, and {1,} requires at least one of them (equivalent to +), so the group captures the first run of digits in each string.
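To see what the pattern captures before any casting, here is a minimal sketch on a three-row sample (the Series `s` is made up for illustration). It passes `expand=False`, which is my choice rather than part of the answer, so that `str.extract` returns a Series instead of a one-column DataFrame:

```python
import pandas as pd

# Hypothetical sample covering the three shapes in the question
s = pd.Series(['10+ years', '< 1 year', '2 years'])

# The single capture group keeps the first run of digits in each string
digits = s.str.extract(r'(\d{1,})', expand=False)
print(digits.tolist())

# The captured values are still strings, so cast them afterwards
years = digits.astype(int)
print(years.tolist())
```

The symbols around the number ('+', '<') are ignored because the group only matches digits.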

EDIT: (Sorry, I misread the question.) To convert the column in place, you could do:

df['Exp'] = df.Exp.str.extract(r'(\d{1,})').astype(int)

If some rows are empty (or contain no digits), extract returns NaN for them and astype(int) fails. Assuming you want those rows filled with -1, you could do:

df['Exp'] = df.Exp.str.extract(r'(\d{1,})').fillna(-1).astype(int)
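As a rough sketch of how the missing-row case behaves, the example below uses a made-up frame with one `None` and one digit-free string. It converts with `pd.to_numeric` before filling, which is a variation on the line above rather than the exact same call, and it also shows pandas' nullable `Int64` dtype as one possible alternative if you would rather keep missing values than use a -1 sentinel:

```python
import pandas as pd

# Hypothetical data: two rows have no digits to extract
df = pd.DataFrame({'Exp': ['10+ years', '< 1 year', None, 'n/a', '3 years']})

# Rows without a match come back as NaN
extracted = df['Exp'].str.extract(r'(\d+)', expand=False)

# Option 1: sentinel value, plain int dtype
with_sentinel = pd.to_numeric(extracted).fillna(-1).astype(int)

# Option 2: keep the gaps as <NA> using the nullable integer dtype
nullable = pd.to_numeric(extracted).astype('Int64')

print(with_sentinel.tolist())
print(nullable.isna().tolist())
```

Which option fits depends on whether downstream code can tolerate missing values or needs a plain integer column.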

4 Comments

When I tried this method, the following error popped up: ValueError: cannot convert float NaN to integer
@Nlk What is the desired outcome when there is an empty row?
Well, there are about 1000 rows of data, and each row needs to be converted to a proper int, like [1,1,2,10,0.....etc]
Makes sense, I updated the answer to fill blank rows with -1.