How to group a pandas dataframe by a defined time interval?

Question

I have a DataFrame like this. I would like to group it into 60-minute intervals, starting the grouping at 06:30.

                           data
index
2017-02-14 06:29:57    11198648
2017-02-14 06:30:01    11198650
2017-02-14 06:37:22    11198706
2017-02-14 23:11:13    11207728
2017-02-14 23:21:43    11207774
2017-02-14 23:22:36    11207776

I am using:

df.groupby(pd.TimeGrouper(freq='60Min'))

I get this grouping:

                      data
index       
2017-02-14 06:00:00     x1
2017-02-14 07:00:00     x2
2017-02-14 08:00:00     x3
2017-02-14 09:00:00     x4
2017-02-14 10:00:00     x5
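For reference, pd.TimeGrouper was deprecated in pandas 0.21 and later removed; pd.Grouper(freq=...) is its replacement. A minimal sketch, assuming a recent pandas, that rebuilds the sample data and reproduces the on-the-hour grouping above:

```python
import pandas as pd

# Sample data from the question
idx = pd.to_datetime([
    "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
    "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=idx,
)

# pd.Grouper(freq=...) stands in for the removed pd.TimeGrouper.
# Without any shift, hourly bins are aligned to the top of the hour.
hourly = df.groupby(pd.Grouper(freq="60min")).size()
print(hourly.index[0])  # 2017-02-14 06:00:00
```

All three morning rows land in the 06:00 bin, which is why the question's output starts at 06:00.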

But I am looking for this result:

                      data
index       
2017-02-14 06:30:00     x1
2017-02-14 07:30:00     x2
2017-02-14 08:30:00     x3
2017-02-14 09:30:00     x4
2017-02-14 10:30:00     x5

How can I tell the function to start grouping at 6:30 at one-hour intervals?

If it cannot be done with .groupby(pd.TimeGrouper(freq='60Min')), what is the best way to do it?

Greetings, and thanks very much in advance.

John Zwinck · Accepted Answer · 2019-06-11 05:06:15Z

35

Use the base=30 and label='right' parameters of pd.Grouper together.

base=30 shifts the bin edges to 30 minutes past each hour (base defaults to 0, i.e. bins start on the hour), so the bins become [05:30, 06:30), [06:30, 07:30), and so on. Specifying label='right' labels each bin by its right (higher) edge, so the first bin, which contains 06:29:57, is shown as 6:30 rather than 5:30.

For example, to take the first value of every group:

df.groupby(pd.Grouper(freq='60Min', base=30, label='right')).first()
# same thing using resample - df.resample('60Min', base=30, label='right').first()

yields:

                           data
index                          
2017-02-14 06:30:00  11198648.0
2017-02-14 07:30:00  11198650.0
2017-02-14 08:30:00         NaN
2017-02-14 09:30:00         NaN
2017-02-14 10:30:00         NaN
2017-02-14 11:30:00         NaN
2017-02-14 12:30:00         NaN
2017-02-14 13:30:00         NaN
2017-02-14 14:30:00         NaN
2017-02-14 15:30:00         NaN
2017-02-14 16:30:00         NaN
2017-02-14 17:30:00         NaN
2017-02-14 18:30:00         NaN
2017-02-14 19:30:00         NaN
2017-02-14 20:30:00         NaN
2017-02-14 21:30:00         NaN
2017-02-14 22:30:00         NaN
2017-02-14 23:30:00  11207728.0
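To make the bin edges concrete, here is a sketch assuming pandas >= 1.1, where offset='30min' stands in for base=30 (base has since been removed). It counts rows per bin with both labelling choices: the bins are identical, only the label moves from the left edge to the right edge.

```python
import pandas as pd

# Sample data from the question
idx = pd.to_datetime([
    "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
    "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=idx,
)

# offset="30min" puts the bin edges at hh:30, as base=30 did
left = df.resample("60min", offset="30min").size()  # label="left" is the default
right = df.resample("60min", offset="30min", label="right").size()

# Same bins and counts; only the labels differ by one hour
print(left.index[0], right.index[0])  # 2017-02-14 05:30:00 2017-02-14 06:30:00
print((left.to_numpy() == right.to_numpy()).all())  # True
```

Note that with label='right' the rows at 06:30:01 and 06:37:22 are reported under 07:30; if you want each group labelled by its start time, keep the default label='left'.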

edited Jun 11, 2019 at 5:06

John Zwinck

answered Feb 15, 2017 at 17:20

Nickil Maveli

5 Comments

EduardoRL Over a year ago

Why is there no documentation for this function in pandas? Is there any way to see the code of the pd.TimeGrouper function? I remember that in R you could see the code by writing the function name without parentheses. Is there something like that in Python?

Nickil Maveli Over a year ago

Complete code of pd.TimeGrouper. It also inherits some of its **kwargs (e.g. base) from resample; see the complete code of resample.

Nickil Maveli Over a year ago

If you're on Jupyter (or in any Python session), help(pd.TimeGrouper) also gives you a short description of its usage, data descriptors, allowed parameters, defined methods, etc.
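On the "see the code" question: a sketch using the standard-library inspect module, which is Python's rough counterpart to typing an R function name without parentheses. It uses pd.Grouper because pd.TimeGrouper no longer exists in recent pandas; inspect.getsource only works for objects written in Python, not compiled extensions. In IPython/Jupyter, pd.Grouper?? shows the same source.

```python
import inspect

import pandas as pd

# Source text of the class, as a string
src = inspect.getsource(pd.Grouper)
print(src[:300])  # first few lines of the class definition

# The file the class is defined in
print(inspect.getsourcefile(pd.Grouper))
```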

matanox Over a year ago

There's a deprecation warning now. I guess you should use pandas.pydata.org/pandas-docs/stable/generated/… instead if you want upgrades to stay simple.

Stefano Giannini Over a year ago

base is now deprecated; you should use offset instead.
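A sketch of the accepted answer's pd.Grouper call with offset in place of base, assuming pandas >= 1.1 (where pd.Grouper accepts offset). For a 60-minute frequency, base=30 corresponds to offset="30min".

```python
import pandas as pd

# Sample data from the question
idx = pd.to_datetime([
    "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
    "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=idx,
)

# base=30 becomes offset="30min"; label="right" is unchanged
out = df.groupby(pd.Grouper(freq="60min", offset="30min", label="right")).first()
print(out.head(3))
```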

Erfan · Accepted Answer · 2020-01-27 16:30:02Z

Use DataFrame.resample, which is a dedicated method for resampling time series; this way we don't need DataFrame.groupby and pd.Grouper:

df.resample('60min', base=30, label='right').first()

Output

                           data
index                          
2017-02-14 06:30:00  11198648.0
2017-02-14 07:30:00  11198650.0
2017-02-14 08:30:00         NaN
2017-02-14 09:30:00         NaN
2017-02-14 10:30:00         NaN
2017-02-14 11:30:00         NaN
2017-02-14 12:30:00         NaN
2017-02-14 13:30:00         NaN
2017-02-14 14:30:00         NaN
2017-02-14 15:30:00         NaN
2017-02-14 16:30:00         NaN
2017-02-14 17:30:00         NaN
2017-02-14 18:30:00         NaN
2017-02-14 19:30:00         NaN
2017-02-14 20:30:00         NaN
2017-02-14 21:30:00         NaN
2017-02-14 22:30:00         NaN
2017-02-14 23:30:00  11207728.0

Note: when your DataFrame has multiple columns and you only want to aggregate one of them, select that column before aggregating:

df.resample('60min', base=30, label='right')['data'].first()
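A sketch with a hypothetical second column named other (not in the question), using offset because base has been removed in recent pandas. Selecting ['data'] returns a Series; passing a dict to agg lets you aggregate each column differently.

```python
import pandas as pd

# Sample data from the question plus a hypothetical "other" column
idx = pd.to_datetime([
    "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
    "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {
        "data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776],
        "other": [1, 2, 3, 4, 5, 6],  # hypothetical extra column
    },
    index=idx,
)

r = df.resample("60min", offset="30min", label="right")

# One column, one aggregation -> Series
first_data = r["data"].first()

# A different aggregation per column -> DataFrame
summary = r.agg({"data": "first", "other": "sum"})
print(summary.head(2))
```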

FutureWarning: 'base' in .resample() and in Grouper() is deprecated. The new arguments that you should use are 'offset' or 'origin'.

This is great, and sort of what I am looking for. However, I want it to group by hour starting at the first value in the dataframe. E.g. if the first timestamp is 9:17, I want it to group by 9:17-10:17. Would this just mean changing base to the minute of the first timestamp in the dataframe?

greenLeopard · Accepted Answer · 2024-09-25 08:11:26Z

To continue riffing on this question: since pandas has updated its resampling, grouping and rolling APIs, the current working solution is this:

df.resample(rule='60min', offset='30m', label='right').first()
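Another option, sketched under the assumption of pandas >= 1.1: pass an explicit Timestamp as origin so the bin edges are anchored at 06:30 directly instead of being expressed as an offset from midnight. The bins match offset='30min' because 06:30 is 30 minutes past a whole hour.

```python
import pandas as pd

# Sample data from the question
idx = pd.to_datetime([
    "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
    "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=idx,
)

by_offset = df.resample("60min", offset="30min", label="right").first()

# Anchor the bin edges at an explicit timestamp instead
by_origin = df.resample(
    "60min", origin=pd.Timestamp("2017-02-14 06:30"), label="right"
).first()

print(by_origin.equals(by_offset))  # True
```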

Additionally, if you want to group in 30-minute intervals starting at the first observation, you can use the origin='start' argument, with label='left' so that each interval is labelled by its lower boundary:

df.resample(rule='30min', origin='start', label='left').first()

Note, however, that this uses the full hh:mm:ss of the first timestamp in the index. So if you want the bins to start on a whole minute, you may need to preprocess your index so that the seconds are removed (at least on the first observation).
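An alternative to preprocessing the index, not taken from the answer above: floor only the timestamp you pass as origin, which leaves the data untouched. A sketch assuming pandas >= 1.1; df.index.min() is used so an unsorted index still gives the earliest observation.

```python
import pandas as pd

# Sample data from the question
idx = pd.to_datetime([
    "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
    "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=idx,
)

# origin="start" would anchor at 06:29:57; flooring the earliest timestamp
# to the minute anchors the bins at 06:29:00 instead
start_minute = df.index.min().floor("min")
out = df.resample("30min", origin=start_minute, label="left").first()
print(out.index[0])  # 2017-02-14 06:29:00
```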

Read more in the pandas DataFrame.resample docs; they have good working examples.
