4

Given a timestamp such as 2018-06-01 06:36:40.047883+00:00, I want to remove the microseconds and strip everything from the '+' onward. Most of my dataset contains values like 2018-06-04 11:30:00+00:00, without the microsecond part.

How can I get a common datetime format for all values?

  • What have you done so far? Commented Aug 27, 2018 at 6:07
  • What is the datatype of the time data? Is it a string or a datetime? And where are you stuck? Commented Aug 27, 2018 at 6:22
  • The data type is string. Commented Aug 27, 2018 at 7:02

3 Answers

5

Let's say you have a mix of different formats that looks like this:

import pandas as pd

df = pd.DataFrame()
df['time'] = ['2018-06-01 06:36:40.047883+00:00', '2018-06-01 06:36:40.047883+00:00', '2018-06-04 11:30:00+00:00', '2018-06-01 06:36:40.047883']

Corresponding output:

                               time
0  2018-06-01 06:36:40.047883+00:00
1  2018-06-01 06:36:40.047883+00:00
2         2018-06-04 11:30:00+00:00
3        2018-06-01 06:36:40.047883

You want a common format with the microseconds and everything from the + onward removed. In short, every value should be in YYYY-MM-DD HH:MM:SS format.

I'll assume the column currently holds strings. First convert it to datetime, then set the microseconds part of each value to 0 so it no longer appears.

# Parse the strings into pandas Timestamps
df['time'] = pd.to_datetime(df['time'])
# Zero out the microseconds on each value
df['time'] = df['time'].apply(lambda x: x.replace(microsecond=0))

Output:

                 time
0 2018-06-01 06:36:40
1 2018-06-01 06:36:40
2 2018-06-04 11:30:00
3 2018-06-01 06:36:40
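
On recent pandas releases the two lines above may behave differently: strings that mix layouts can be rejected, and the +00:00 offset may be kept as a timezone in the result. The following is a minimal sketch rather than a definitive fix. It assumes pandas 2.0 or later (where `format='ISO8601'` is available) and assumes that treating values without an offset as UTC is acceptable for your data:

```python
import pandas as pd

df = pd.DataFrame({
    'time': [
        '2018-06-01 06:36:40.047883+00:00',
        '2018-06-04 11:30:00+00:00',
        '2018-06-01 06:36:40.047883',
    ]
})

# Assumption: pandas >= 2.0, where format='ISO8601' accepts ISO 8601 strings
# that differ in detail (with or without microseconds or an offset).
# utc=True treats offset-less values as UTC, which is an assumption about the data.
parsed = pd.to_datetime(df['time'], format='ISO8601', utc=True)

# Drop the timezone, then round down to whole seconds (vectorized, no apply).
df['time'] = parsed.dt.tz_localize(None).dt.floor('s')

print(df['time'].dt.strftime('%Y-%m-%d %H:%M:%S').tolist())
```

Using the `.dt` accessor keeps the work vectorized, which tends to matter on large columns.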

1 Comment

Very nice solution using Pandas - Can be used for a large dataset.
2

Another way to achieve that is by using str.split:

t = "2018-06-04 11:30:00+00:00"
t.split('+')[0]  # '2018-06-04 11:30:00'
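
Note that splitting on '+' alone leaves any microseconds in place and does not match a negative offset such as -05:00. As an alternative sketch using only the standard library, assuming Python 3.7 or later (where `datetime.fromisoformat` exists), the value can be parsed and re-formatted. `normalize` is a hypothetical helper name, not something from the question:

```python
from datetime import datetime

def normalize(value):
    """Return value as 'YYYY-MM-DD HH:MM:SS', dropping microseconds and any UTC offset."""
    parsed = datetime.fromisoformat(value)
    # replace() zeroes the microseconds and discards the offset
    # without shifting the clock time.
    trimmed = parsed.replace(microsecond=0, tzinfo=None)
    return trimmed.strftime('%Y-%m-%d %H:%M:%S')

for t in ["2018-06-01 06:36:40.047883+00:00", "2018-06-04 11:30:00+00:00"]:
    print(normalize(t))
```

This discards the offset rather than converting to another timezone, which matches the "strip the value after '+'" request.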


1

I'm answering on the assumption that the data is a string.

If you are having trouble handling different formats such as "2018-06-01 06:36:40.047883+00:00" and "2018-06-04 11:30:00+00:00", you can use split():

str_data_time.split("+")[0].split(".")[0]

For example:

for str_data_time in ["2018-06-01 06:36:40.047883+00:00", "2018-06-04 11:30:00+00:00"]:
    output = str_data_time.split("+")[0].split(".")[0]
    print(output)

The output of the above script is:

2018-06-01 06:36:40
2018-06-04 11:30:00
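
To see what each [0] selects, it can help to print the intermediate lists that split() returns. The variable names `by_plus` and `by_dot` are illustrative only:

```python
str_data_time = "2018-06-01 06:36:40.047883+00:00"

# split() returns a list of the pieces found between separators
by_plus = str_data_time.split("+")
print(by_plus)   # ['2018-06-01 06:36:40.047883', '00:00']

# [0] picks the piece before the '+'; splitting that on '.' isolates the seconds
by_dot = by_plus[0].split(".")
print(by_dot)    # ['2018-06-01 06:36:40', '047883']

output = by_dot[0]
print(output)    # 2018-06-01 06:36:40
```

When the separator is absent, split() returns a one-item list holding the whole string, so [0] still works for values that have no microseconds.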

3 Comments

Thank you. Can you explain the index [0] after split? Sorry, I am new to this language and still learning.
split() returns a list of the pieces between separators. Here the string contains one separator, so the list has two items: [0] is the part before the separator and [1] is the part after it.
Your assumption was solid, and nailed my use-case. A great example of the legendary Open Source community. Definitely warrants the accepted answer. Thanks for saving me time.
