
I am trying to use pandas and groupby to extract the month from a date field for further manipulation. Line 40 of my script is where I apply dateutil to extract the year, month, and day.

My code:

df = pandas.DataFrame.from_records(defects, columns=headers)
df['date'] = pandas.to_datetime(df['date'], format="%Y-%m-%d")
df['date'] = df['date'].apply(dateutil.parser.parse, yearfirst=True)
 ....
print df.groupby(['month']).groups.keys()

And I'm getting:

Traceback (most recent call last):
 File "jira-sandbox.py", line 40, in <module>
 defects_df['created'] =    defects_df['created'].apply(dateutil.parser.parse, yearfirst=True)
  File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 2294, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/src/inference.pyx", line 1207, in pandas.lib.map_infer (pandas/lib.c:66124)
  File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 2282, in <lambda>
    f = lambda x: func(x, *args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 697, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 301, in parse
    res = self._parse(timestr, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 349, in _parse
    l = _timelex.split(timestr)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 143, in split
    return list(cls(s))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 137, in next
    token = self.get_token()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 68, in get_token
    nextchar = self.instream.read(1)
AttributeError: 'Timestamp' object has no attribute 'read'

1 Answer

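I do not think you need the dateutil operation. The column already holds datetime values after the pandas.to_datetime() call, so dateutil.parser.parse receives a Timestamp instead of a string, which is what raises the AttributeError. Here is one way to construct a column that can be used by groupby().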

Code:

# build a test dataframe
import datetime as dt
import pandas as pd
df = pd.DataFrame([dt.datetime.now() + dt.timedelta(days=x*15)
                   for x in range(10)],
                  columns=['date'])
print(df)

# add a year/month column to allow grouping
df['month'] = df.date.apply(lambda x: x.year * 100 + x.month)

# show a groupby
print(df.groupby(['month']).groups.keys())

Results:

                     date
0 2017-03-17 14:30:24.344
1 2017-04-01 14:30:24.344
2 2017-04-16 14:30:24.344
3 2017-05-01 14:30:24.344
4 2017-05-16 14:30:24.344
5 2017-05-31 14:30:24.344
6 2017-06-15 14:30:24.344
7 2017-06-30 14:30:24.344
8 2017-07-15 14:30:24.344
9 2017-07-30 14:30:24.344

[201704, 201705, 201706, 201707, 201703]
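
As a possible alternative, the same year/month key can probably be built without apply() by using the .dt accessor on the datetime column. This is only a minimal sketch: the sample dates are made up for illustration, and the column names 'date' and 'month' are assumed from the question.

```python
import pandas as pd

# Hypothetical sample data; the column name 'date' follows the question.
df = pd.DataFrame({'date': ['2017-03-17', '2017-04-01',
                            '2017-04-16', '2017-05-01']})

# to_datetime already yields Timestamp values, so no dateutil call is needed.
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Vectorized year/month key (e.g. 201703) built with the .dt accessor.
df['month'] = df['date'].dt.year * 100 + df['date'].dt.month

# Collect the group keys as plain ints, sorted for a stable order.
month_keys = sorted(int(k) for k in df.groupby('month').groups.keys())
print(month_keys)
```

The keys are sorted here because the order returned by groups.keys() is not something to rely on, as the unsorted output above shows.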