I have a DataFrame like the one below. I would like to group it into 60-minute intervals, with the intervals starting at 06:30.

                           data
index
2017-02-14 06:29:57    11198648
2017-02-14 06:30:01    11198650
2017-02-14 06:37:22    11198706
2017-02-14 23:11:13    11207728
2017-02-14 23:21:43    11207774
2017-02-14 23:22:36    11207776

I am using:

df.groupby(pd.TimeGrouper(freq='60Min'))

I get this grouping:

                      data
index       
2017-02-14 06:00:00     x1
2017-02-14 07:00:00     x2
2017-02-14 08:00:00     x3
2017-02-14 09:00:00     x4
2017-02-14 10:00:00     x5

but I am looking for this result:

                      data
index       
2017-02-14 06:30:00     x1
2017-02-14 07:30:00     x2
2017-02-14 08:30:00     x3
2017-02-14 09:30:00     x4
2017-02-14 10:30:00     x5

How can I tell the function to group at one-hour intervals starting at 06:30?

If it cannot be done with .groupby(pd.TimeGrouper(freq='60Min')), what is the best way to do it?

Thanks very much in advance.

3 Answers

Use the base=30 and label='right' parameters of pd.Grouper.

base defaults to 0, so hourly bins normally start on the hour; base=30 shifts the bin edges by 30 minutes, to hh:30. label='right' names each bin after its right (later) edge, so the first bin, which runs from 05:30 to 06:30, is labelled 06:30 rather than 05:30.

Suppose you want the first element of every group:

df.groupby(pd.Grouper(freq='60Min', base=30, label='right')).first()
# same thing using resample - df.resample('60Min', base=30, label='right').first()

yields:

                           data
index                          
2017-02-14 06:30:00  11198648.0
2017-02-14 07:30:00  11198650.0
2017-02-14 08:30:00         NaN
2017-02-14 09:30:00         NaN
2017-02-14 10:30:00         NaN
2017-02-14 11:30:00         NaN
2017-02-14 12:30:00         NaN
2017-02-14 13:30:00         NaN
2017-02-14 14:30:00         NaN
2017-02-14 15:30:00         NaN
2017-02-14 16:30:00         NaN
2017-02-14 17:30:00         NaN
2017-02-14 18:30:00         NaN
2017-02-14 19:30:00         NaN
2017-02-14 20:30:00         NaN
2017-02-14 21:30:00         NaN
2017-02-14 22:30:00         NaN
2017-02-14 23:30:00  11207728.0
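
As the comments below point out, base has since been deprecated in favour of offset. The following is a minimal sketch, assuming pandas 1.1 or later, that rebuilds the sample data from the question and uses offset='30min' in place of base=30. It compares label='left' with label='right' to show that label changes only the name of each bin, not which rows fall into it.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=pd.to_datetime([
        "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
        "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
    ]),
)
df.index.name = "index"

# In both cases the bins run from hh:30 to the next hh:30; only the label differs.
left = df.resample("60min", offset="30min", label="left").first()
right = df.resample("60min", offset="30min", label="right").first()

print(left.head(2))   # the first bin is labelled with its left edge
print(right.head(2))  # the same bin is labelled with its right edge
```

With label='left', the row labelled 06:30 holds the first value recorded at or after 06:30, which may be closer to what the question asks for; with label='right', that row holds the last hour's first value instead.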

5 Comments

Why is there no documentation for this function in pandas? Is there any way to see the code of the pd.TimeGrouper function? I remember that in R you could see the code by writing the function name without parentheses. Is there something like that in Python?
Here is the complete code of pd.TimeGrouper. It also inherits some of its **kwargs from resample (e.g. base).
If you're on Jupyter, help(pd.TimeGrouper) will also give you a short description of its usage, data descriptors, allowed parameters, defined methods, etc.
There's a deprecation warning now. I guess you should use pandas.pydata.org/pandas-docs/stable/generated/… now if you want upgrades to go smoothly.
base is now deprecated; you should use offset instead.

You can use DataFrame.resample, the dedicated method for resampling time series. That way we don't need DataFrame.groupby and pd.Grouper:

df.resample('60min', base=30, label='right').first()

Output

                           data
index                          
2017-02-14 06:30:00  11198648.0
2017-02-14 07:30:00  11198650.0
2017-02-14 08:30:00         NaN
2017-02-14 09:30:00         NaN
2017-02-14 10:30:00         NaN
2017-02-14 11:30:00         NaN
2017-02-14 12:30:00         NaN
2017-02-14 13:30:00         NaN
2017-02-14 14:30:00         NaN
2017-02-14 15:30:00         NaN
2017-02-14 16:30:00         NaN
2017-02-14 17:30:00         NaN
2017-02-14 18:30:00         NaN
2017-02-14 19:30:00         NaN
2017-02-14 20:30:00         NaN
2017-02-14 21:30:00         NaN
2017-02-14 22:30:00         NaN
2017-02-14 23:30:00  11207728.0

Note: if your DataFrame has multiple columns and you only want to aggregate one of them, select that column before aggregating:

df.resample('60min', base=30, label='right')['data'].first()
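
As a rough illustration of the column selection above, here is a sketch that assumes pandas 1.1 or later and therefore uses offset='30min' instead of the deprecated base=30. The events column is hypothetical and was added only so that the frame has more than one column. It also shows one possible way to apply a different aggregation to each column with agg.

```python
import pandas as pd

idx = pd.to_datetime([
    "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
    "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
])
df = pd.DataFrame(
    {
        "data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776],
        "events": [1, 1, 1, 1, 1, 1],  # hypothetical second column, not in the question
    },
    index=idx,
)

resampler = df.resample("60min", offset="30min", label="right")

# Selecting one column yields a Series with one value per bin
first_data = resampler["data"].first()

# A dict passed to agg applies a different aggregation to each column
summary = resampler.agg({"data": "first", "events": "sum"})

print(first_data.head(3))
print(summary.head(3))
```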

2 Comments

FutureWarning: 'base' in .resample() and in Grouper() is deprecated. The new arguments that you should use are 'offset' or 'origin'.
This is great, and sort of the thing I am looking for. However, I want to group by hour starting at the first timestamp in the DataFrame. E.g. if the first timestamp is 9:17, I want the first group to be 9:17-10:17. Would this just mean changing base to the minute of the first timestamp in the DataFrame?

To continue riffing on this question: since pandas has reworked its resample, grouping and rolling functionality, the current working solution is this:

df.resample(rule='60min', offset='30min', label='right').first()
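
Since the question asked about groupby specifically, the sketch below shows what the same call might look like with pd.Grouper, which accepts offset as well. It assumes pandas 1.1 or later and reuses the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=pd.to_datetime([
        "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
        "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
    ]),
)

# groupby with pd.Grouper, using offset instead of the deprecated base
by_grouper = df.groupby(
    pd.Grouper(freq="60min", offset="30min", label="right")
).first()

# The resample call from the answer, for comparison
by_resample = df.resample(rule="60min", offset="30min", label="right").first()

print(by_grouper.head(2))
print(by_resample.head(2))
```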

Additionally, if you want to group in 30-minute intervals starting at the time of the first observation, you can use the origin argument, with label='left' to label each interval with its lower boundary:

df.resample(rule='30min', origin='start', label='left').first()

Note, however, that this uses the full hh:mm:ss of the first timestamp in the index. If you want the intervals to start on a whole minute (hh:mm), you may need to preprocess your index so that the seconds are removed (at least on the first observation).

Read more in the pandas DataFrame.resample docs; they have great working examples.
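
One possible way to get whole-minute boundaries without modifying the index is to floor the first timestamp and pass it as origin, which also accepts a Timestamp. This is only a sketch, assuming pandas 1.1 or later and the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"data": [11198648, 11198650, 11198706, 11207728, 11207774, 11207776]},
    index=pd.to_datetime([
        "2017-02-14 06:29:57", "2017-02-14 06:30:01", "2017-02-14 06:37:22",
        "2017-02-14 23:11:13", "2017-02-14 23:21:43", "2017-02-14 23:22:36",
    ]),
)

# origin='start' anchors the bins at the exact first timestamp, seconds included
exact = df.resample(rule="30min", origin="start", label="left").first()

# Flooring the earliest timestamp to the minute anchors the bins at hh:mm:00
origin = df.index.min().floor("min")
floored = df.resample(rule="30min", origin=origin, label="left").first()

print(exact.index[0])    # first bin starts at the first observation
print(floored.index[0])  # first bin starts at the floored minute
```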
