54

I had a DataFrame, grouped it by FIPS, and summed the groups. That worked fine.

kl = ks.groupby('FIPS')

kl.aggregate(np.sum)

I just want a normal DataFrame back, but I have a pandas.core.groupby.DataFrameGroupBy object.

2
  • 14
    The question title indicates that the question is about how to generally convert a groupby object back to a data frame, yet the question and the accepted answer are only about one special case (sum aggregation). Both the question and the accepted answer would be a lot more helpful if they were about how to generally convert a groupby object to a data frame, without performing any numeric processing on it. Commented Nov 7, 2019 at 10:03
  • To get a single group as a DataFrame, use something like ks.groupby('FIPS').get_group("whatever groupby value you have"). Commented May 27, 2020 at 14:22
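
The get_group suggestion in the comment above can be sketched as follows. The data is made up for illustration: the FIPS codes and the `pop` column are assumptions, not values from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `ks` frame.
ks = pd.DataFrame({
    "FIPS": ["20001", "20001", "20003", "20003", "20005"],
    "pop": [10, 20, 30, 40, 50],
})

kl = ks.groupby("FIPS")

# get_group returns the rows of one group as an ordinary DataFrame.
one_group = kl.get_group("20001")
print(type(one_group))
print(one_group)
```

Note that this pulls out one group at a time; it does not convert the whole GroupBy object in a single call.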

6 Answers

29
 df_g.apply(lambda x: x) 

will return the original dataframe.


7 Comments

But why is this needed?
This still returns a DataFrameGroupBy.
@cs95 This is equivalent to pd.DataFrame(grouped.groups). The GroupBy.apply function applies func to every group and combines the results into a DataFrame.
@C.K. I understand that, thank you. However, my point was more about why we need this method to return the original DataFrame if df_g itself is the original DataFrame? If it's a question of what apply does and how to apply a function to every group, that's a discussion for another post. 2c
@cs95 Yep, you're right. I upvoted your comment the first time I saw this answer, because I thought there must be an easier way, like grouped.to_df(). However, after I checked the API of the GroupBy object, I found there wasn't such a function, so I came back to tell everyone this is the easiest way to do that. lol.
25

The result of kl.aggregate(np.sum) is a normal DataFrame; you just have to assign it to a variable to use it further. With some random data:

>>> import numpy as np
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
...                         'foo', 'bar', 'foo', 'foo'],
...                  'B' : ['one', 'one', 'two', 'three',
...                         'two', 'two', 'one', 'three'],
...                  'C' : randn(8), 'D' : randn(8)})
>>> grouped = df.groupby('A')
>>> grouped
<pandas.core.groupby.DataFrameGroupBy object at 0x04E2F630>
>>> test = grouped.aggregate(np.sum)
>>> test
            C         D
A                      
bar -1.852376  2.204224
foo -3.398196 -0.045082
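
As a script rather than a REPL session, the same point can be checked with a type comparison. This is a sketch under a few assumptions: it uses the `.sum()` method on explicitly selected numeric columns instead of `aggregate(np.sum)`, and a seeded generator so the run is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": rng.standard_normal(8),
    "D": rng.standard_normal(8),
})

grouped = df.groupby("A")          # a DataFrameGroupBy, not a DataFrame
test = grouped[["C", "D"]].sum()   # aggregating yields a DataFrame

print(type(grouped).__name__)
print(type(test).__name__)
print(test.index.name)             # the group key ends up in the index
```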

7 Comments

Actually, many DataFrameGroupBy methods (such as apply, transform, aggregate, head, first, and last) return a DataFrame object. I used the filter method in one of my blog posts.
It's not a completely normal DataFrame. For example, if you try to call the .info() method on a GroupBy object, you get AttributeError: Cannot access callable attribute 'info' of 'DataFrameGroupBy' objects, try using the 'apply' method.
Call .reset_index() to turn the group keys in the index back into regular columns.
+1 @hungryMind - that is the answer. Re Joris's answer - it may be a "dataframe", but it's not normal: you can see that A sits on a different header level from C and D, which causes plots etc. to fail when you use it as a normal dataframe. It needs collapsing with .reset_index() to make it proper!
kl.count() returns a DataFrame
1

Using pd.concat, just like this:

   pd.concat(map(lambda x: x[1], groups))

Or also keep index aligned:

   pd.concat(map(lambda x: x[1], groups)).sort_index()
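
A self-contained sketch of the concat approach, assuming `groups` in the snippets above is the GroupBy object itself (the sample frame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4],
})
groups = df.groupby("A")

# Iterating a GroupBy yields (key, sub-DataFrame) pairs; keep the frames.
stacked = pd.concat([frame for _, frame in groups])

# Rows come out grouped by key; sorting the index restores the original order.
restored = stacked.sort_index()

print(stacked)
print(restored.equals(df))
```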


1

You can save the output of the groupby to a variable by calling .head(n), where n is a number of rows.

Ex: df2 = grouped.head(100)

Now you have a pandas DataFrame, df2, holding up to the first 100 rows of each group.
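
A small sketch of what head does on a GroupBy, with made-up data. The row limit applies per group, so the result contains every row only when the limit is at least the size of the largest group.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo"],
    "C": [1, 2, 3, 4, 5],
})
grouped = df.groupby("A")

first_two = grouped.head(2)   # at most 2 rows from each group
print(first_two)
print(len(first_two), "of", len(df), "rows kept")
```

Here the third "foo" row is dropped, which is why a generous limit such as 100 is used in the answer above.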


0

The cleanest solution is using reset_index().

df = grouped_df.reset_index()

Docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.reset_index.html
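
A minimal sketch of where reset_index fits, assuming grouped_df above is the already-aggregated result rather than the GroupBy object. The `as_index=False` variant is an alternative not mentioned in the answer, and the sample data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4],
})

summed = df.groupby("A").sum()    # "A" is the index here
flat = summed.reset_index()       # "A" becomes a regular column

# Alternative: keep the key as a column from the start.
flat_alt = df.groupby("A", as_index=False).sum()

print(summed.columns.tolist())
print(flat.columns.tolist())
print(flat_alt.columns.tolist())
```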

1 Comment

The question is how to convert a DataFrameGroupBy to a DataFrame. DataFrameGroupBy has no reset_index (see pandas.pydata.org/docs/reference/groupby.html); the doc you link to is misleading.
0
df_agg = df[['Col1','Col2']].groupby(['Col1','Col2']).sum().reset_index()

type(df_agg)

Returns

pandas.core.frame.DataFrame

And df_agg has 2 columns: Col1 and Col2.

