Handle ValueError while creating date in pd

Question

I'm reading a CSV file with the columns p, day, and month into a DataFrame. The goal is to build a date from the day, the month, and the current year, but I run into this error for 29 February:

ValueError: cannot assemble the datetimes: day is out of range for month

When this error occurs, I would like to replace the day with the previous day. How can I do that? Below are a few lines of my DataFrame; the datex column at the end is what I would like to get:

        p  day month  year datex
0      p1  29    02  2021  28Feb-2021
1      p2  18    07  2021  18Jul-2021
2      p3  12    09  2021  12Sep-2021

Right now, my code for the date is only the line below, so I get NaT where the date doesn't exist.

df['datex'] = pd.to_datetime(df[['year', 'month', 'day']], errors='coerce')

I can't, because after that I want to calculate the difference in days between this datex and another date, which is in 2021.
– owce, Commented Mar 5, 2021 at 9:28
Well, you could use a conditional, like if month == 2 && day == 29 then use day = 28
– FObersteiner, Commented Mar 5, 2021 at 9:38

LeoGlt · Accepted Answer · 2021-03-05 09:44:51Z

You could try something like this:

df['datex'] = pd.to_datetime(df[['year', 'month', 'day']], errors='coerce')

As expected, the invalid date becomes NaT:

    p  day  year  month      datex
0  p1   29  2021      2        NaT
1  p2   18  2021      7 2021-07-18
2  p3   12  2021      9 2021-09-12

You can then handle these NaT rows as a special case:

df.loc[df.datex.isnull(), 'previous_day'] = df.day - 1

    p  day  year  month      datex  previous_day
0  p1   29  2021      2        NaT          28.0
1  p2   18  2021      7 2021-07-18           NaN
2  p3   12  2021      9 2021-09-12           NaN

df.loc[df.datex.isnull(), 'datex'] = pd.to_datetime(df[['previous_day', 'year', 'month']].rename(columns={'previous_day': 'day'}))

    p  day  year  month      datex  previous_day
0  p1   29  2021      2 2021-02-28          28.0
1  p2   18  2021      7 2021-07-18           NaN
2  p3   12  2021      9 2021-09-12           NaN

The extra previous_day column is needed if you want to keep day = 29 in the day column.
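
Put together, the steps above might look like the sketch below. It is a minimal version under a few assumptions: the sample frame is rebuilt by hand from the question, the month is stored as an integer, and the intermediate names failed and retry are only illustrative. It retries the failed rows with day - 1 without adding a previous_day column.

```python
import pandas as pd

# Sample data mirroring the question (month stored as an integer here).
df = pd.DataFrame({
    "p": ["p1", "p2", "p3"],
    "day": [29, 18, 12],
    "month": [2, 7, 9],
    "year": [2021, 2021, 2021],
})

# First pass: an invalid date (29 Feb 2021) becomes NaT instead of raising.
df["datex"] = pd.to_datetime(df[["year", "month", "day"]], errors="coerce")

# Second pass: for the failed rows only, retry with the previous day.
failed = df["datex"].isna()
retry = df.loc[failed, ["year", "month", "day"]].assign(day=lambda d: d["day"] - 1)
df.loc[failed, "datex"] = pd.to_datetime(retry, errors="coerce")

print(df)
```

The day column still holds 29 for the first row, and only datex is adjusted. A single retry is enough for 29 February; a day such as 31 in a 28-day month would need more than one step back.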

Why not do a simple cleanup first? E.g. df['day'] = np.where((df['day']==29)&(df['month']==2), df['day']-1, df['day']); then you need neither an extra column nor special treatment of NaT.
It seems that the day column must still contain 29 for this row in the expected output. But indeed, if it is acceptable to have the new day (28) for this row after the process, your code is clearer.
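
Both points from these comments can probably be reconciled by applying the cleanup to a temporary Series rather than to the day column itself. The sketch below is only an illustration: the sample frame is rebuilt from the question, and the names is_feb_29, safe_day, month_start, clamped_day, and datex_clamped are hypothetical. The second variant is not from this thread; it swaps the explicit 29 February check for a clamp to the month length.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question (month stored as an integer here).
df = pd.DataFrame({
    "p": ["p1", "p2", "p3"],
    "day": [29, 18, 12],
    "month": [2, 7, 9],
    "year": [2021, 2021, 2021],
})

# Variant 1: the np.where cleanup from the comment, applied to a temporary
# array so that df["day"] keeps its original value of 29.
# Assumption: no leap years in the data, so 29 Feb is always invalid.
is_feb_29 = (df["day"] == 29) & (df["month"] == 2)
safe_day = np.where(is_feb_29, df["day"] - 1, df["day"])
df["datex"] = pd.to_datetime(df[["year", "month"]].assign(day=safe_day))

# Variant 2 (not from the thread): clamp the day to the length of its month,
# which also handles leap years and cases such as 31 April.
month_start = pd.to_datetime(df[["year", "month"]].assign(day=1))
clamped_day = df["day"].clip(upper=month_start.dt.days_in_month)
df["datex_clamped"] = pd.to_datetime(
    df[["year", "month"]].assign(day=clamped_day)
)

print(df)
```

For the sample data both variants give the same dates. The clamp is arguably safer if the year ever changes, because 29 February stays untouched in a leap year.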
