
I have a dataframe that looks like this:

         Date      Region     Data
   0   200201        A        8.8
   1   200201        B        14.3
                    ...     
1545   202005        C        7.3
1546   202005        D        131

I want to convert the Date column (dtype: object) to a datetime index without the time part. yyyymm, yyyymmdd or yyyy-mm-dd are all fine, as long as the time part is gone.

I've searched Stack Overflow and tried the following code:

# (1) 
df["Date"] = pd.to_datetime(df["Date"], format = "%Y%m", errors = "coerce", utc = False)
# (2)
df["Date"] = pd.to_datetime(df["Date"], format = "%Y%m")
df["Date"] = df["Date"].dt.normalize()
# (3)
df["Date"] = pd.to_datetime(df["Date"], format = "%Y%m")
df["Date"] = df["Date"].dt.date
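For reference, here is a minimal sketch of attempt (1) using pandas' actual keyword, `utc` (its default is already `False`, so it can simply be dropped). The sample values, including the malformed `"bad"` entry, are made up to show what `errors="coerce"` does: anything that doesn't match `%Y%m` becomes `NaT` instead of raising an error.

```python
import pandas as pd

# Made-up sample values; "bad" does not match the %Y%m format
df = pd.DataFrame({"Date": ["200201", "202005", "bad"]})

# utc=False is the default; errors="coerce" turns unparseable entries into NaT
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m", errors="coerce", utc=False)

print(df["Date"].dtype)            # a datetime64 dtype
print(df["Date"].isna().tolist())  # only the "bad" row is NaT
```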

For (1) and (2), the Date column still shows a time part, like yyyy-mm-dd 00:00:00.

For (3), I do get dates without the time part, but the dtype is object.
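Checking the dtypes shows what (2) and (3) actually produce. This sketch uses made-up sample strings in the question's yyyymm format. `.dt.normalize()` keeps a datetime64 column (the values were already at midnight, so nothing changes), while `.dt.date` converts each value to a Python `datetime.date` object, which is why the dtype becomes `object`.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["200201", "200201", "202005"]})
parsed = pd.to_datetime(df["Date"], format="%Y%m")

normalized = parsed.dt.normalize()  # attempt (2): still datetime64
as_date = parsed.dt.date            # attempt (3): datetime.date objects

print(normalized.dtype)  # a datetime64 dtype
print(as_date.dtype)     # object
```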

I can't use pd.date_range because the same date repeats across several rows.
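Repeated dates are not a problem for the index itself. Setting a parsed datetime column as the index gives a `DatetimeIndex`, which may contain duplicates. Here is a minimal sketch using the sample rows shown above (with `131` written as a float):

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "Date": ["200201", "200201", "202005", "202005"],
    "Region": ["A", "B", "C", "D"],
    "Data": [8.8, 14.3, 7.3, 131.0],
})
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m")
df = df.set_index("Date")  # DatetimeIndex; duplicate dates are allowed

print(type(df.index).__name__)  # DatetimeIndex
print(df.index.is_unique)       # False
print(df.loc["2002-01"])        # partial-string selection: both January 2002 rows
```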

Is there any way to convert yyyymm (object) to yyyymmdd (datetime) in Python?

Thanks in advance.

  • Welcome to SO! It's generally better to paste your program's output into the question as text instead of linking screenshots. For more information, check out the How to Ask page. Commented Jul 24, 2020 at 3:01
  • On my system the commands work correctly. Commented Jul 24, 2020 at 3:03
  • @a.deshpande012 Okay, thanks for your tips! Commented Jul 24, 2020 at 4:23
  • @bigbounty I think something's wrong with my Spyder installation. Thanks! Commented Jul 24, 2020 at 4:23

2 Answers


It could be a display configuration issue in how your editor shows DataFrames. The simplest way to get the data in the right format is:

df['Date'] = pd.to_datetime(df['Date'], format = '%Y%m')
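To check the display explanation, here is a sketch with the question's sample values. When every value in a datetime column falls at midnight, pandas' default text output leaves out `00:00:00`. If you need a date-only string (e.g. for export), `.dt.strftime` can write one to a separate column without changing the datetime dtype. The `Date_str` column name is hypothetical.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "Date": ["200201", "200201", "202005", "202005"],
    "Region": ["A", "B", "C", "D"],
    "Data": [8.8, 14.3, 7.3, 131.0],
})
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m")

# All values are at midnight, so the text output shows dates only
print(df.to_string())

# Hypothetical extra column with date-only strings; "Date" stays datetime64
df["Date_str"] = df["Date"].dt.strftime("%Y-%m-%d")
```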

Below are the results from repl.it with your DataFrame and this code. The date is displayed without the time component, and the column has the proper datetime64 dtype.

        Date Region  Data
0 2002-01-01      A   8.8

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Date    1 non-null      datetime64[ns]
 1   Region  1 non-null      object        
 2   Data    1 non-null      float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 152.0+ bytes

You can also try a more convoluted route: go from datetime to Python date objects and then back to datetime.

df['Date'] = pd.to_datetime(df['Date'], format = '%Y%m').dt.date
df['Date'] = df['Date'].astype('datetime64[ns]')

The final display and dtypes are the same.
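The round trip through `.dt.date` produces the same values as `.dt.normalize()`, which stays in datetime64 throughout. This sketch assumes sample values and converts back with `pd.to_datetime` instead of `astype`:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["200201", "202005"]), format="%Y%m")

round_trip = pd.to_datetime(s.dt.date)  # datetime64 -> date objects -> datetime64
normalized = s.dt.normalize()           # stays datetime64 the whole time

print((round_trip == normalized).all())
```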

        Date Region  Data
0 2002-01-01      A   8.8

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Date    1 non-null      datetime64[ns]
 1   Region  1 non-null      object        
 2   Data    1 non-null      float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 152.0+ bytes

  • Thanks, I think something's wrong with my Spyder. I tried the same code in Colab and it shows in yyyy-mm-dd format. Thank you so much for your help and great explanation. I really do appreciate it.

The Date column in the question has the format YYYYMM (with no day). pd.to_datetime() implicitly sets the day to 1.
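Here is a sketch that confirms the implicit day with assumed sample values. It also shows an alternative for data that fits month-end dates better: `pd.offsets.MonthEnd(0)` moves each date forward to the last day of its month.

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["200201", "202005"]), format="%Y%m")
print(s.dt.day.tolist())  # [1, 1]: the missing day defaults to 1

# Alternative: anchor each month at its last day instead
month_end = s + pd.offsets.MonthEnd(0)
print(month_end.dt.strftime("%Y-%m-%d").tolist())  # ['2002-01-31', '2020-05-31']
```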

The function pd.Period() converts dates in the format YYYYMM to pandas periods. Note that df['Date'] can be strings or 6-digit integers.

df['Date'].apply(lambda x: pd.Period(x, freq='M'))

0    2002-01
1    2002-01
2    2020-05
3    2020-05
Name: Date, dtype: period[M]
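Instead of the row-wise `apply`, you can parse the whole column once and convert it with `.dt.to_period('M')`. If you later need a datetime column again, `.dt.to_timestamp()` converts the periods back to start-of-month timestamps. This sketch uses assumed sample values:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["200201", "200201", "202005", "202005"]})

# Parse once (vectorized), then convert to monthly periods
periods = pd.to_datetime(df["Date"], format="%Y%m").dt.to_period("M")
print(periods.dtype)  # period[M]

# Back to timestamps; by default each period maps to its first day
back = periods.dt.to_timestamp()
```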

  • Thanks for your help! I really appreciate it!
