I need to parse RFC 3339 strings like "2008-09-03T20:56:35.450686Z" into Python's datetime type.
I have found strptime in the Python standard library, but it is not very convenient.
What is the best way to do this?
isoparse function from python-dateutil
The python-dateutil package has dateutil.parser.isoparse, which parses not only RFC 3339 datetime strings like the one in the question, but also other ISO 8601 date and time strings that don't comply with RFC 3339 (such as ones with no UTC offset, or ones that represent only a date).
>>> import dateutil.parser
>>> dateutil.parser.isoparse('2008-09-03T20:56:35.450686Z') # RFC 3339 format
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=tzutc())
>>> dateutil.parser.isoparse('2008-09-03T20:56:35.450686') # ISO 8601 extended format
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686)
>>> dateutil.parser.isoparse('20080903T205635.450686') # ISO 8601 basic format
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686)
>>> dateutil.parser.isoparse('20080903') # ISO 8601 basic format, date only
datetime.datetime(2008, 9, 3, 0, 0)
The python-dateutil package also has dateutil.parser.parse. Compared with isoparse, it is presumably less strict, but both of them are quite forgiving and will attempt to interpret the string that you pass in. If you want to eliminate the possibility of any misreads, you need to use something stricter than either of these functions.
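For readers who do need something stricter, here is a minimal sketch of a validator that accepts only the RFC 3339 date-time shape. The name parse_rfc3339_strict is hypothetical and is not part of dateutil or the standard library. It assumes an uppercase or lowercase T separator (no space), truncates fractional seconds beyond microseconds, and rejects leap seconds because datetime cannot represent a second value of 60.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper: a regex for the RFC 3339 "date-time" shape only.
_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"      # full-date
    r"[Tt]"                         # separator (assumption: no space allowed)
    r"(\d{2}):(\d{2}):(\d{2})"      # partial-time
    r"(?:\.(\d+))?"                 # optional fractional seconds
    r"([Zz]|[+-]\d{2}:\d{2})"       # mandatory offset, colon required
)


def parse_rfc3339_strict(text: str) -> datetime:
    """Parse text as RFC 3339 or raise ValueError; nothing is guessed."""
    match = _RFC3339.fullmatch(text)
    if match is None:
        raise ValueError(f"not an RFC 3339 date-time: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction = match.group(7) or ""
    # Pad or truncate the fraction to exactly six digits (microseconds).
    microsecond = int(fraction[:6].ljust(6, "0")) if fraction else 0
    offset = match.group(8)
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    # datetime() itself rejects out-of-range fields such as month 13.
    return datetime(year, month, day, hour, minute, second, microsecond, tzinfo=tz)
```

Strings such as '20080903' or '2008-09-03T20:56:35' (no offset) raise ValueError here, whereas isoparse would accept them.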
datetime.datetime.fromisoformat
dateutil.parser.isoparse is a full ISO 8601 format parser, but in Python ≤ 3.10 fromisoformat is deliberately not. In Python 3.11, fromisoformat supports almost all strings in valid ISO 8601. See fromisoformat's docs for this cautionary caveat. (See this answer).
The package is python-dateutil, not dateutil, so: pip install python-dateutil.
dateutil.parser is intentionally hacky: it tries to guess the format and makes inevitable assumptions (customizable by hand only) in ambiguous cases. So ONLY use it if you need to parse input of unknown format and are okay to tolerate occasional misreads.
Since Python 3.11, the standard library's datetime.datetime.fromisoformat supports most valid ISO 8601 input (and some non-valid input, see docs). In earlier versions it only parses a specific subset, see the cautionary note at the end of the docs. If you are using Python 3.10 or earlier on strings that don't fall into that subset (like in the question), see other answers for functions from outside the standard library.
The current docs (so exceptions listed are still valid for Python 3.13):
classmethod datetime.fromisoformat(date_string):
Return a datetime corresponding to a date_string in any valid ISO 8601 format, with the following exceptions:
- Time zone offsets may have fractional seconds.
- The T separator may be replaced by any single unicode character.
- Fractional hours and minutes are not supported.
- Reduced precision dates are not currently supported (YYYY-MM, YYYY).
- Extended date representations are not currently supported (±YYYYYY-MM-DD).
- Ordinal dates are not currently supported (YYYY-OOO).
Examples:
>>> from datetime import datetime
>>> datetime.fromisoformat('2011-11-04')
datetime.datetime(2011, 11, 4, 0, 0)
>>> datetime.fromisoformat('20111104')
datetime.datetime(2011, 11, 4, 0, 0)
>>> datetime.fromisoformat('2011-11-04T00:05:23')
datetime.datetime(2011, 11, 4, 0, 5, 23)
>>> datetime.fromisoformat('2011-11-04T00:05:23Z')
datetime.datetime(2011, 11, 4, 0, 5, 23, tzinfo=datetime.timezone.utc)
>>> datetime.fromisoformat('20111104T000523')
datetime.datetime(2011, 11, 4, 0, 5, 23)
>>> datetime.fromisoformat('2011-W01-2T00:05:23.283')
datetime.datetime(2011, 1, 4, 0, 5, 23, 283000)
>>> datetime.fromisoformat('2011-11-04 00:05:23.283')
datetime.datetime(2011, 11, 4, 0, 5, 23, 283000)
>>> datetime.fromisoformat('2011-11-04 00:05:23.283+00:00')
datetime.datetime(2011, 11, 4, 0, 5, 23, 283000, tzinfo=datetime.timezone.utc)
>>> datetime.fromisoformat('2011-11-04T00:05:23+04:00')
datetime.datetime(2011, 11, 4, 0, 5, 23, tzinfo=datetime.timezone(datetime.timedelta(seconds=14400)))
New in version 3.7.
Changed in version 3.11: Previously, this method only supported formats that could be emitted by date.isoformat() or datetime.isoformat().
If you only need dates, and not datetimes, you can use datetime.date.fromisoformat:
>>> from datetime import date
>>> date.fromisoformat("2024-01-31")
datetime.date(2024, 1, 31)
datetime may contain a tzinfo, and thus output a timezone, but datetime.fromisoformat() doesn't parse the tzinfo? Seems like a bug.
isoformat. It doesn't accept the example in the question "2008-09-03T20:56:35.450686Z" because of the trailing Z, but it does accept "2008-09-03T20:56:35.450686".
To support the trailing Z, the input can be modified with date_string.replace("Z", "+00:00").
datetime.fromisoformat seems to expect another format. I just tested both versions and while it works fine with +00:00, I get "ValueError: Invalid isoformat string" with +0000.
Note that in Python 2.6+ and Py3K, the %f directive catches microseconds.
>>> datetime.datetime.strptime("2008-09-03T20:56:35.450686Z", "%Y-%m-%dT%H:%M:%S.%fZ")
See issue here
datetime.datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%S.%f') so this did the trick
With a hard-coded Z, you'll get back a "naive" datetime object with no timezone, instead of a "timezone-aware" one with UTC as the timezone, which would be more correct.
As of Python 3.7, you can basically (caveats below) get away with using datetime.datetime.strptime to parse RFC 3339 datetimes, like this:
from datetime import datetime
def parse_rfc3339(datetime_str: str) -> datetime:
    try:
        return datetime.strptime(datetime_str, "%Y-%m-%dT%H:%M:%S.%f%z")
    except ValueError:
        # Perhaps the datetime has a whole number of seconds with no decimal
        # point. In that case, this will work:
        return datetime.strptime(datetime_str, "%Y-%m-%dT%H:%M:%S%z")
It's a little awkward, since we need to try two different format strings in order to support both datetimes with a fractional number of seconds (like 2022-01-01T12:12:12.123Z) and those without (like 2022-01-01T12:12:12Z), both of which are valid under RFC 3339. But as long as we do that single fiddly bit of logic, this works.
Some caveats to note about this approach:
- RFC 3339 lets you use a space instead of T to separate the date from the time, even though RFC 3339 purports to be a profile of ISO 8601 and ISO 8601 does not allow this. If you want to support this silly quirk of RFC 3339, you could add datetime_str = datetime_str.replace(' ', 'T') to the start of the function.
- The function accepts timezone offsets like +0500 without a colon, which RFC 3339 does not support. If you don't merely want to parse known-to-be-RFC-3339 datetimes but also want to rigorously validate that the datetime you're getting is RFC 3339, use another approach or add in your own logic to validate the timezone offset format.
- It does not support the wider range of ISO 8601 formats. (For example, 2009-W01-1 is a valid ISO 8601 date.)
- Before Python 3.7, the %z specifier only matches timezone offsets like +0500 or -0430 or +0000, not RFC 3339 timezone offsets like +05:00 or -04:30 or Z.
Since Python 3.11, fromisoformat parses Z directly:
from datetime import datetime
s = "2008-09-03T20:56:35.450686Z"
datetime.fromisoformat(s)
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=datetime.timezone.utc)
A simple option from one of the comments: replace 'Z' with '+00:00' - and use fromisoformat:
from datetime import datetime
s = "2008-09-03T20:56:35.450686Z"
datetime.fromisoformat(s.replace('Z', '+00:00'))
# datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=datetime.timezone.utc)
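If the same code has to run on both older and newer interpreters, the two variants can be combined. The sketch below is only an illustration: parse_iso_utc_z is a hypothetical name, and it rewrites only a trailing Z (not every Z in the string) and only on Python older than 3.11. On those older versions, fromisoformat still expects fractional seconds with exactly 3 or 6 digits.

```python
import sys
from datetime import datetime


def parse_iso_utc_z(text: str) -> datetime:
    """Hypothetical helper: fromisoformat that tolerates a trailing 'Z' on Python 3.7+."""
    if sys.version_info < (3, 11) and text.endswith("Z"):
        # Older fromisoformat only understands numeric offsets such as +00:00.
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


print(repr(parse_iso_utc_z("2008-09-03T20:56:35.450686Z")))
```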
Why prefer fromisoformat?
Although strptime's %z can parse the 'Z' character to UTC, fromisoformat is roughly 40 times faster (about 60 times on Python 3.11):
from datetime import datetime
from dateutil import parser
s = "2008-09-03T20:56:35.450686Z"
# Python 3.11+
%timeit datetime.fromisoformat(s)
85.1 ns ± 0.473 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
# Python 3.7 to 3.10
%timeit datetime.fromisoformat(s.replace('Z', '+00:00'))
134 ns ± 0.522 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
%timeit parser.isoparse(s)
4.09 µs ± 5.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%timeit datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f%z')
5 µs ± 9.26 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
%timeit parser.parse(s)
28.5 µs ± 99.2 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
(Python 3.11.3 x64 on GNU/Linux)
See also: A faster strptime
fromisoformat parses +00:00 but not Z to aware datetime with tzinfo being UTC. If your input e.g. ends with Z+00:00, you can just remove the Z before feeding it into fromisoformat. Other UTC offsets like e.g. +05:30 will then be parsed to a static UTC offset (not an actual time zone).
fromisoformat of datetime.date is more explicit: "Return a date corresponding to a date_string given in any valid ISO 8601 format... " ... and it gives some surprising examples of strings which work which are not simple YYYY-MM-DD.
Try the iso8601 module; it does exactly this.
There are several other options mentioned on the WorkingWithTime page on the python.org wiki.
iso8601.parse_date("2008-09-03T20:56:35.450686Z")
Starting from Python 3.7, strptime supports colon delimiters in UTC offsets (source). So you can then use:
import datetime
def parse_date_string(date_string: str) -> datetime.datetime:
    try:
        return datetime.datetime.strptime(date_string, '%Y-%m-%dT%H:%M:%S.%f%z')
    except ValueError:
        # no fractional seconds in the input
        return datetime.datetime.strptime(date_string, '%Y-%m-%dT%H:%M:%S%z')
EDIT:
As pointed out by Martijn, if you created the datetime object using isoformat(), you can simply use datetime.fromisoformat().
EDIT 2:
As pointed out by Mark Amery, I added a try..except block to account for missing fractional seconds.
datetime.fromisoformat() which handles strings like your input automatically: datetime.datetime.fromisoformat('2018-01-31T09:24:31.488670+00:00').
datetime.fromisoformat() and datetime.isoformat()
ValueError: time data '2018-01-31T09:24:31.488670+00:00' does not match format '%Y-%m-%dT%H:%M:%S.%f%z' that's due to %z not matching +00:00. However, +0000 matches %z; see the Python docs: docs.python.org/3.6/library/…
fromisoformat() and it can handle the Z timezone now: datetime.fromisoformat('2018-01-31T09:24:31Z') produces datetime.datetime(2018, 1, 31, 9, 24, 31, tzinfo=datetime.timezone.utc).
What is the exact error you get? Is it like the following?
>>> datetime.datetime.strptime("2008-08-12T12:20:30.656234Z", "%Y-%m-%dT%H:%M:%S.Z")
ValueError: time data did not match format: data=2008-08-12T12:20:30.656234Z fmt=%Y-%m-%dT%H:%M:%S.Z
If yes, you can split your input string on ".", and then add the microseconds to the datetime you got.
Try this:
>>> def gt(dt_str):
...     dt, _, us = dt_str.partition(".")
...     dt = datetime.datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S")
...     us = int(us.rstrip("Z"), 10)
...     return dt + datetime.timedelta(microseconds=us)
...
>>> gt("2008-08-12T12:20:30.656234Z")
datetime.datetime(2008, 8, 12, 12, 20, 30, 656234)
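One thing to watch with the split-on-"." approach: treating the digits after the dot as a plain integer is only right when there are exactly six of them. The sketch below is a hypothetical variant (parse_with_fraction is not from the answer above) that pads or truncates the fraction to six digits and also tolerates a missing fraction. Like the original, it assumes a trailing Z and returns a naive datetime.

```python
from datetime import datetime, timedelta


def parse_with_fraction(dt_str: str) -> datetime:
    """Hypothetical variant of gt() that handles fractions of any length."""
    base, _, fraction = dt_str.rstrip("Z").partition(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    # ".5" means half a second, so pad to six digits before converting;
    # digits beyond microsecond precision are dropped.
    micros = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return dt + timedelta(microseconds=micros)


print(parse_with_fraction("2008-08-12T12:20:30.5Z"))
```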
datetime.fromisoformat which will handle most iso8601 and rfc3339 formats. docs.python.org/3.11/library/…
import re
import datetime
s = "2008-09-03T20:56:35.450686Z"
d = datetime.datetime(*map(int, re.split(r'[^\d]', s)[:-1]))
datetime.datetime(*map(int, re.findall(r'\d+', s)))
These days, Arrow can also be used as a third-party solution:
>>> import arrow
>>> date = arrow.get("2008-09-03T20:56:35.450686Z")
>>> date.datetime
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=tzutc())
Just use the python-dateutil module:
>>> import dateutil.parser as dp
>>> t = '1984-06-02T19:05:00.000Z'
>>> parsed_t = dp.parse(t)
>>> parsed_t
datetime.datetime(1984, 6, 2, 19, 5, tzinfo=tzutc())
dateutil.parser.parse will accept formats that are definitely not ISO 8601, like "Sat Oct 11 17:13:46 UTC 2003". If you specifically want ISO 8601 parsing, you would probably rather use dateutil.parser.isoparse instead, as Flimms's answer recommends.
I have found ciso8601 to be the fastest way to parse ISO 8601 timestamps.
It also has full support for RFC 3339, and a dedicated function for strict parsing RFC 3339 timestamps.
Example usage:
>>> import ciso8601
>>> ciso8601.parse_datetime('2014-01-09T21')
datetime.datetime(2014, 1, 9, 21, 0)
>>> ciso8601.parse_datetime('2014-01-09T21:48:00.921000+05:30')
datetime.datetime(2014, 1, 9, 21, 48, 0, 921000, tzinfo=datetime.timezone(datetime.timedelta(seconds=19800)))
>>> ciso8601.parse_rfc3339('2014-01-09T21:48:00.921000+05:30')
datetime.datetime(2014, 1, 9, 21, 48, 0, 921000, tzinfo=datetime.timezone(datetime.timedelta(seconds=19800)))
The GitHub Repo README shows their speedup versus all of the other libraries listed in the other answers.
My personal project involved a lot of ISO 8601 parsing. It was nice to be able to just switch the call and go faster. :)
Edit: I have since become a maintainer of ciso8601. It's now faster than ever!
datetime.strptime() is the next fastest solution. Thanks for putting all that info together!
datetime.strptime() is not a full ISO 8601 parsing library. If you are on Python 3.7, you can use the datetime.fromisoformat() method, which is a little more flexible. You might be interested in this more complete list of parsers which should be merged into the ciso8601 README soon.
If you are working with Django, it provides the dateparse module that accepts a bunch of formats similar to ISO format, including the time zone.
If you are not using Django and you don't want to use one of the other libraries mentioned here, you could probably adapt the Django source code for dateparse to your project.
DateTimeField uses this when you set a string value.
If you don't want to use dateutil, you can try this function:
import datetime
def from_utc(utcTime, fmt="%Y-%m-%dT%H:%M:%S.%fZ"):
    """
    Convert a UTC time string to a naive datetime.datetime
    """
    # the trailing Z is matched literally, so the result carries no tzinfo
    return datetime.datetime.strptime(utcTime, fmt)
Test:
from_utc("2007-03-04T21:08:12.123Z")
Result:
datetime.datetime(2007, 3, 4, 21, 8, 12, 123000)
strptime. This is a bad idea because it will fail to parse any datetime with a different UTC offset and raise an exception. See my answer that describes how parsing RFC 3339 with strptime is in fact impossible.
toISOString method. But there's no mention of the limitation to Zulu time dates in this answer, nor did the question indicate that that's all that's needed, and just using dateutil is usually equally convenient and less narrow in what it can parse.
I've coded up a parser for the ISO 8601 standard and put it on GitHub: https://github.com/boxed/iso8601. This implementation supports everything in the specification except for durations, intervals, periodic intervals, and dates outside the supported date range of Python's datetime module.
Tests are included! :P
This works for stdlib on Python 3.2 onwards (assuming all the timestamps are UTC):
from datetime import datetime, timezone, timedelta
datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ").replace(
tzinfo=timezone(timedelta(0)))
For example,
>>> datetime.utcnow().replace(tzinfo=timezone(timedelta(0)))
... datetime.datetime(2015, 3, 11, 6, 2, 47, 879129, tzinfo=datetime.timezone.utc)
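On current Python 3 versions the same idea can be written with timezone.utc directly, and datetime.now(timezone.utc) is an aware alternative to utcnow(). A minimal sketch, assuming (as above) that every timestamp ends in a literal Z and has fractional seconds; parse_utc_z is a hypothetical name:

```python
from datetime import datetime, timezone


def parse_utc_z(timestamp: str) -> datetime:
    """Hypothetical helper: parse a '...Z' timestamp into an aware UTC datetime."""
    # The Z is matched literally, so the UTC tzinfo is attached by hand.
    naive = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


print(repr(parse_utc_z("2015-03-11T06:02:47.879129Z")))
# An aware "now" in UTC, without going through utcnow():
print(datetime.now(timezone.utc).tzinfo)
```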
strptime. This is a bad idea because it will fail to parse any datetime with a different UTC offset and raise an exception. See my answer that describes how parsing RFC 3339 with strptime is in fact impossible.
timezone.utc instead of timezone(timedelta(0)). Also, the code works in Python 2.6+ (at least) if you supply utc tzinfo object
%Z for timezone in the most recent versions of Python.
One straightforward way to convert an ISO 8601-like date string to a UNIX timestamp or datetime.datetime object in all supported Python versions without installing third-party modules is to use the date parser of SQLite.
#!/usr/bin/env python
from __future__ import with_statement, division, print_function
import sqlite3
import datetime
testtimes = [
"2016-08-25T16:01:26.123456Z",
"2016-08-25T16:01:29",
]
db = sqlite3.connect(":memory:")
c = db.cursor()
for timestring in testtimes:
    c.execute("SELECT strftime('%s', ?)", (timestring,))
    converted = c.fetchone()[0]
    print("%s is %s after epoch" % (timestring, converted))
    dt = datetime.datetime.fromtimestamp(int(converted))
    print("datetime is %s" % dt)
Output:
2016-08-25T16:01:26.123456Z is 1472140886 after epoch
datetime is 2016-08-25 12:01:26
2016-08-25T16:01:29 is 1472140889 after epoch
datetime is 2016-08-25 12:01:29
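Note that the output above is shifted to the local time zone of the machine (16:01 became 12:01), because fromtimestamp without a tz argument returns local time. A minimal Python 3 sketch that keeps the result in UTC follows. The function name sqlite_parse_utc is hypothetical, and the fractional seconds are still discarded by strftime('%s').

```python
import sqlite3
from datetime import datetime, timezone


def sqlite_parse_utc(timestring: str) -> datetime:
    """Hypothetical helper: parse via SQLite and return an aware UTC datetime."""
    db = sqlite3.connect(":memory:")
    try:
        row = db.execute("SELECT strftime('%s', ?)", (timestring,)).fetchone()
    finally:
        db.close()
    if row[0] is None:
        # SQLite returns NULL for strings it cannot interpret as a date.
        raise ValueError("SQLite could not parse %r" % timestring)
    # Passing tz avoids the implicit conversion to local time.
    return datetime.fromtimestamp(int(row[0]), tz=timezone.utc)


print(sqlite_parse_utc("2016-08-25T16:01:29"))
```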
Another way is to use a specialized ISO 8601 parser, the isoparse function of the dateutil parser:
from dateutil import parser
date = parser.isoparse("2008-09-03T20:56:35.450686+01:00")
print(date)
Output:
2008-09-03 20:56:35.450686+01:00
This function is also mentioned in the documentation for the standard Python function datetime.fromisoformat:
A more full-featured ISO 8601 parser, dateutil.parser.isoparse is available in the third-party package dateutil.
Django's parse_datetime() function supports dates with UTC offsets:
parse_datetime('2016-08-09T15:12:03.65478Z') =
datetime.datetime(2016, 8, 9, 15, 12, 3, 654780, tzinfo=<UTC>)
So it could be used for parsing ISO 8601 dates in fields within entire project:
from django.utils import formats
from django.forms.fields import DateTimeField
from django.utils.dateparse import parse_datetime
class DateTimeFieldFixed(DateTimeField):
    def strptime(self, value, format):
        if format == 'iso-8601':
            return parse_datetime(value)
        return super().strptime(value, format)
DateTimeField.strptime = DateTimeFieldFixed.strptime
formats.ISO_INPUT_FORMATS['DATETIME_INPUT_FORMATS'].insert(0, 'iso-8601')
If pandas is used anyway, I can recommend Timestamp from pandas. There you can
ts_1 = pd.Timestamp('2020-02-18T04:27:58.000Z')
ts_2 = pd.Timestamp('2020-02-18T04:27:58.000')
Rant: It is just unbelievable that we still need to worry about things like date string parsing in 2021.
datetime.fromisoformat('2021-01-01T00:00:00+01:00').tzinfo.utc and pandas.Timestamp('2021-01-01T00:00:00+01:00').tzinfo.utc : Not the same at all.
datetime.fromisoformat() is improved in Python 3.11 to parse most ISO 8601 formats
datetime.fromisoformat() can now be used to parse most ISO 8601 formats, barring only those that support fractional hours and minutes. Previously, this method only supported formats that could be emitted by datetime.isoformat().
>>> from datetime import datetime
>>> datetime.fromisoformat('2011-11-04T00:05:23Z')
datetime.datetime(2011, 11, 4, 0, 5, 23, tzinfo=datetime.timezone.utc)
>>> datetime.fromisoformat('20111104T000523')
datetime.datetime(2011, 11, 4, 0, 5, 23)
>>> datetime.fromisoformat('2011-W01-2T00:05:23.283')
datetime.datetime(2011, 1, 4, 0, 5, 23, 283000)
Because ISO 8601 allows many variations of optional colons and dashes, basically CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm], you need to strip out those variations first if you want to use strptime.
The goal is to generate a UTC datetime object.
For a UTC timestamp with a Z suffix, like 2016-06-29T19:36:29.3453Z:
datetime.datetime.strptime(timestamp.replace(':', '').replace('-', ''), "%Y%m%dT%H%M%S.%fZ")
For timestamps with a UTC offset, like 2016-06-29T19:36:29.3453-0400 or 2008-09-03T20:56:35.450686+05:00, use the following. It converts all variations into something without variable delimiters, like 20080903T205635.450686+0500, making them more consistent and easier to parse.
import re
# this regex removes all colons and all
# dashes EXCEPT for the dash indicating + or - utc offset for the timezone
conformed_timestamp = re.sub(r"[:]|([-](?!((\d{2}[:]\d{2})|(\d{4}))$))", '', timestamp)
datetime.datetime.strptime(conformed_timestamp, "%Y%m%dT%H%M%S.%f%z" )
If your system does not support the %z strptime directive (you see something like ValueError: 'z' is a bad directive in format '%Y%m%dT%H%M%S.%f%z'), then you need to manually offset the time from Z (UTC). Note that %z may not work on your system in Python versions < 3, as it depended on the C library support, which varies across system and Python build type (e.g. Jython, Cython, etc.).
import re
import datetime
# this regex removes all colons and all
# dashes EXCEPT for the dash indicating + or - utc offset for the timezone
conformed_timestamp = re.sub(r"[:]|([-](?!((\d{2}[:]\d{2})|(\d{4}))$))", '', timestamp)
# split on the offset to remove it. use a capture group to keep the delimiter
split_timestamp = re.split(r"([+-])", conformed_timestamp)
main_timestamp = split_timestamp[0]
if len(split_timestamp) == 3:
    sign = split_timestamp[1]
    offset = split_timestamp[2]
else:
    sign = None
    offset = None
# parse the local wall-clock time, ignoring the offset for now
output_datetime = datetime.datetime.strptime(main_timestamp + "Z", "%Y%m%dT%H%M%S.%fZ")
if offset:
    # create timedelta based on offset
    offset_delta = datetime.timedelta(hours=int(sign + offset[:-2]), minutes=int(sign + offset[-2:]))
    # subtract the offset to convert the local time to UTC
    output_datetime = output_datetime - offset_delta
timestamp is '2016-06-29T19:36:29.123Z' or '2016-06-29T19:36:29+00:00', both of which are valid RFC 3339 and ISO 8601 datetimes.
Nowadays there's Maya: Datetimes for Humans™, from the author of the popular Requests: HTTP for Humans™ package:
>>> import maya
>>> str = '2008-09-03T20:56:35.450686Z'
>>> maya.MayaDT.from_rfc3339(str).datetime()
datetime.datetime(2008, 9, 3, 20, 56, 35, 450686, tzinfo=<UTC>)
The python-dateutil will throw an exception if parsing invalid date strings, so you may want to catch the exception.
from dateutil import parser
ds = '2012-60-31'
try:
    dt = parser.parse(ds)
except ValueError as e:
    print('"%s" is an invalid date' % ds)
For something that works with the 2.X standard library try:
calendar.timegm(time.strptime(date.split(".")[0]+"UTC", "%Y-%m-%dT%H:%M:%S%Z"))
calendar.timegm is the missing gm version of time.mktime.
. character), like 2022-10-09T15:49:22-07:00. Such a value is a valid RFC 3339 and ISO 8601 date time string, so a parser shouldn't choke on it.
Thanks to Mark Amery's great answer, I devised a function to account for all possible ISO formats of datetime:
import re
from datetime import datetime, timedelta, tzinfo
class FixedOffset(tzinfo):
    """Fixed offset in minutes: `time = utc_time + utc_offset`."""
    def __init__(self, offset):
        self.__offset = timedelta(minutes=offset)
        hours, minutes = divmod(offset, 60)
        #NOTE: the last part is to remind about deprecated POSIX GMT+h timezones
        # that have the opposite sign in the name;
        # the corresponding numeric value is not used e.g., no minutes
        self.__name = '<%+03d%02d>%+d' % (hours, minutes, -hours)
    def utcoffset(self, dt=None):
        return self.__offset
    def tzname(self, dt=None):
        return self.__name
    def dst(self, dt=None):
        return timedelta(0)
    def __repr__(self):
        return 'FixedOffset(%d)' % (self.utcoffset().total_seconds() / 60)
    def __getinitargs__(self):
        return (self.__offset.total_seconds()/60,)
def parse_isoformat_datetime(isodatetime):
    try:
        return datetime.strptime(isodatetime, '%Y-%m-%dT%H:%M:%S.%f')
    except ValueError:
        pass
    try:
        return datetime.strptime(isodatetime, '%Y-%m-%dT%H:%M:%S')
    except ValueError:
        pass
    # remove the colon from a trailing +hh:mm / -hh:mm offset
    pat = r'(.*?[+-]\d{2}):(\d{2})'
    temp = re.sub(pat, r'\1\2', isodatetime)
    naive_date_str = temp[:-5]
    offset_str = temp[-5:]
    naive_dt = datetime.strptime(naive_date_str, '%Y-%m-%dT%H:%M:%S.%f')
    offset = int(offset_str[-4:-2])*60 + int(offset_str[-2:])
    if offset_str[0] == "-":
        offset = -offset
    return naive_dt.replace(tzinfo=FixedOffset(offset))
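On Python 3.2 and later, the standard library's datetime.timezone already represents a fixed UTC offset, so a hand-written tzinfo subclass like FixedOffset may not be needed. A small sketch of the equivalent construction; the helper name fixed_offset is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def fixed_offset(minutes: int) -> timezone:
    """Hypothetical helper: a fixed-offset tzinfo from a signed number of minutes."""
    return timezone(timedelta(minutes=minutes))


aware = datetime(2022, 10, 9, 15, 49, 22, tzinfo=fixed_offset(-7 * 60))
print(aware.isoformat())  # 2022-10-09T15:49:22-07:00
```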
Since I never found a valid python implementation that is able to parse all kind of unusual date time format like
and all kind of time zone format like:
I finally wrote this method (that could certainly be improved):
from datetime import datetime
def parse_datetime(str_time: str) -> datetime:
    """
    parse date, ignoring nanoseconds
    """
    # length can be variable (truncated when last digits are 0)
    try:
        z_index = str_time.index('Z')
    except ValueError:
        z_index = str_time.index('+')
    # standardize nanoseconds / milliseconds / seconds format
    if z_index >= 26:
        # remove nanoseconds
        str_time = str_time[0:26] + str_time[z_index:]
        z_index = 26
    elif z_index == 19:
        # add milliseconds
        str_time = str_time[0:19] + ".000" + str_time[z_index:]
        z_index = 23
    # convert timezone format
    if str_time[z_index] == '+':
        # convert +02:00 to +0200, to match strptime '%z'
        str_time = str_time[0:z_index] + str_time[z_index:].replace(':', '')
    elif str_time[-1] == 'Z':
        # add explicit UTC timezone, to make strptime happy
        str_time = str_time[0:z_index] + '+0000'
    return datetime.strptime(str_time, '%Y-%m-%dT%H:%M:%S.%f%z')
Initially I tried with:
from operator import neg, pos
from time import strptime, mktime
from datetime import datetime, tzinfo, timedelta
class MyUTCOffsetTimezone(tzinfo):
    @staticmethod
    def with_offset(offset_no_signal, signal):  # type: (str, str) -> MyUTCOffsetTimezone
        return MyUTCOffsetTimezone((pos if signal == '+' else neg)(
            (datetime.strptime(offset_no_signal, '%H:%M') - datetime(1900, 1, 1))
            .total_seconds()))
    def __init__(self, offset, name=None):
        self.offset = timedelta(seconds=offset)
        self.name = name or self.__class__.__name__
    def utcoffset(self, dt):
        return self.offset
    def tzname(self, dt):
        return self.name
    def dst(self, dt):
        return timedelta(0)
def to_datetime_tz(dt):  # type: (str) -> datetime
    fmt = '%Y-%m-%dT%H:%M:%S.%f'
    if dt[-6] in frozenset(('+', '-')):
        dt, sign, offset = strptime(dt[:-6], fmt), dt[-6], dt[-5:]
        return datetime.fromtimestamp(mktime(dt),
                                      tz=MyUTCOffsetTimezone.with_offset(offset, sign))
    elif dt[-1] == 'Z':
        return datetime.strptime(dt, fmt + 'Z')
    return datetime.strptime(dt, fmt)
But that didn't work on negative timezones. This however I got working fine, in Python 3.7.3:
from datetime import datetime
def to_datetime_tz(dt):  # type: (str) -> datetime
    fmt = '%Y-%m-%dT%H:%M:%S.%f'
    if dt[-6] in frozenset(('+', '-')):
        return datetime.strptime(dt, fmt + '%z')
    elif dt[-1] == 'Z':
        return datetime.strptime(dt, fmt + 'Z')
    return datetime.strptime(dt, fmt)
Some tests; note that the output differs from the input only in the precision of the fractional seconds. I got 6 digits of precision on my machine, but YMMV:
for dt_in, dt_out in (
        ('2019-03-11T08:00:00.000Z', '2019-03-11T08:00:00'),
        ('2019-03-11T08:00:00.000+11:00', '2019-03-11T08:00:00+11:00'),
        ('2019-03-11T08:00:00.000-11:00', '2019-03-11T08:00:00-11:00')
):
    isoformat = to_datetime_tz(dt_in).isoformat()
    assert isoformat == dt_out, '{} != {}'.format(isoformat, dt_out)
Why frozenset(('+', '-'))? Shouldn't a normal tuple like ('+', '-') be able to accomplish the same thing?
Two issues with the to_datetime_tz function: 1. datetime strings without a decimal point in the seconds (like 2019-03-11T08:00:00+11:00) trigger exceptions despite being valid ISO 8601 and RFC 3339 datetimes, and 2. timezone offset Z is treated differently from +00:00 even though they are supposed to mean the same thing.
It is doubtful that a frozenset lookup is going to be faster with only two items, especially when you're actually having to construct and iterate over an equivalent 2-item tuple anyway as part of the construction of the frozenset. And even if it were faster, the cost of doing a lookup in a 2-item collection is never going to matter.