1

I have a column in my pandas DataFrame called last_pymnt that holds dates in the format 17-Mar, 13-Dec, etc. Doing a string replace for each value would be too tedious because there are so many unique dates, so I tried to create a dictionary that replaces the month name with an integer wherever it appears. However, it does not seem to work. This is what I have:

integers = {'-Jan': 1, '-Feb': 2, '-Mar': 3, '-Apr': 4, '-May': 5, '-Jun': 6, '-Jul': 7, '-Aug': 8, 
'-Sep': 9, '-Oct': 10, '-Nov': 11, '-Dec': 12,}

data.replace({'-Jan': integers, '-Feb': integers, '-Mar': integers, '-Apr': integers, '-May': 
integers, '-Jun': integers, '-Jul': integers, '-Aug': integers, '-Sep': integers, '-Oct': integers, 
'-Nov': integers, '-Dec': integers})

The code was supposed to go through the entire DataFrame and replace the partial matches with an integer, so the date 17-Mar should have become 173, but I still get 17-Mar.

2 Answers

1

IIUC, I would parse these values as datetimes rather than handle the dates any other way.

For instance:

Data

import pandas as pd

df = pd.DataFrame({'last_pymnt': ['17-Mar', '12-Dec']})
df

I would go with:

df['last_pymnt'] = pd.to_datetime(df['last_pymnt'], format='%d-%b').dt.strftime('%m-%d')
df

If that isn't what you want, try:

df=pd.DataFrame({'last_pymnt':['17-Mar', '12-Dec']})
df.last_pymnt=df.last_pymnt.str.replace('-','')
df['last_pymnt'] = pd.to_datetime(df['last_pymnt'], format='%d%b').dt.strftime('%d%m')
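Note that strftime zero-pads the month, so 17-Mar comes out as 1703 rather than the 173 the question asks for. A minimal sketch of one way to get the unpadded form, assuming the values really are day-month pairs with English month abbreviations (the column name last_pymnt_num and the extra sample value are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'last_pymnt': ['17-Mar', '13-Dec', '16-Aug']})

# Parse day and abbreviated month name; the year defaults to 1900,
# and %b depends on the locale (English month names assumed here).
parsed = pd.to_datetime(df['last_pymnt'], format='%d-%b')

# Build the string from the integer day and month so that
# neither part is zero-padded.
df['last_pymnt_num'] = parsed.dt.day.astype(str) + parsed.dt.month.astype(str)
print(df)
```

With this, 17-Mar becomes 173, 13-Dec becomes 1312 and 16-Aug becomes 168.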



0

You can do this with regular expressions.
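The parentheses around \d+ make it a capturing group, which you then reference with \1 in the substitution string. The space after \1 keeps the month number from being read as part of the group reference (\13 would mean group 13); it is stripped out afterwards.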

import re
import pandas as pd

df = pd.DataFrame({'last_pymnt':['17-Mar','13-Dec']})
repl_dict = {re.compile(r'^(\d+)[-]Jan$'):r'\1 1', 
             re.compile(r'^(\d+)[-]Feb$'):r'\1 2', 
             re.compile(r'^(\d+)[-]Mar$'):r'\1 3', 
             re.compile(r'^(\d+)[-]Apr$'):r'\1 4', 
             re.compile(r'^(\d+)[-]May$'):r'\1 5', 
             re.compile(r'^(\d+)[-]Jun$'):r'\1 6', 
             re.compile(r'^(\d+)[-]Jul$'):r'\1 7', 
             re.compile(r'^(\d+)[-]Aug$'):r'\1 8', 
             re.compile(r'^(\d+)[-]Sep$'):r'\1 9', 
             re.compile(r'^(\d+)[-]Oct$'):r'\1 10', 
             re.compile(r'^(\d+)[-]Nov$'):r'\1 11', 
             re.compile(r'^(\d+)[-]Dec$'):r'\1 12',}  
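# regex=True is needed on pandas 2.0+, where str.replace defaults to literal matching
df['last_pymnt_repl'] = df['last_pymnt'].replace(repl_dict, regex=True).str.replace(r'\s+', '', regex=True)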

Result:

In [149]: df                                                                                        
Out[149]: 
  last_pymnt last_pymnt_repl
0     17-Mar             173
1     13-Dec            1312
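The same idea can probably be written more compactly with a single pattern and a replacement function, which also avoids the temporary space. This is only a sketch: MONTHS, month_num and pattern are names made up for the example, and it assumes every value looks like digits, a hyphen, then an English three-letter month.

```python
import pandas as pd

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
# Map each month abbreviation to its number as a string: 'Jan' -> '1', ...
month_num = {name: str(i) for i, name in enumerate(MONTHS, start=1)}

df = pd.DataFrame({'last_pymnt': ['17-Mar', '13-Dec', '16-Aug', '18-May']})

# Group 1 captures the leading digits, group 2 the month abbreviation.
pattern = r'^(\d+)-(' + '|'.join(MONTHS) + r')$'

# A callable replacement receives the match object, so the digits and
# the month number can be concatenated directly.
df['last_pymnt_repl'] = df['last_pymnt'].str.replace(
    pattern,
    lambda m: m.group(1) + month_num[m.group(2)],
    regex=True,
)
print(df)
```

Values that do not match the pattern are left unchanged, which makes unexpected entries easy to spot afterwards.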

1 Comment

Thank you, but this code only worked for 17-Mar; for the other dates I get NaN. For example, I also have dates such as 16-Aug, 16-Jun, 17-Oct, 18-May, etc., but instead of replacing 16-Aug with 168 it now shows NaN for every value except 17-Mar and 13-Dec.
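The cause of the NaN values can't be determined from the comment alone. One thing worth checking is whether every entry is a clean string: the .str methods return NaN for non-string entries, and stray whitespace stops an anchored pattern from matching. A small diagnostic sketch, using made-up sample data and hypothetical names (s, pattern, matches, bad):

```python
import re
import pandas as pd

# Made-up sample: clean values, padded values, a missing value, a number.
s = pd.Series(['17-Mar', ' 16-Aug', '16-Jun ', None, 1703], dtype=object)

pattern = r'\d+-(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)'

# True only for entries that are strings AND match the whole pattern.
matches = s.map(
    lambda v: isinstance(v, str) and re.fullmatch(pattern, v) is not None
)

# Entries that would not be converted, with their Python types.
bad = s[~matches]
print(bad.map(type))
```

If the offending entries turn out to be padded strings, calling .str.strip() on the column before the replacement may be enough.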
