Split pandas dataframe based on values in a column using groupby

Question

I want to split the following dataframe based on column ZZ

df = 
        N0_YLDF  ZZ        MAT
    0  6.286333   2  11.669069
    1  6.317000   6  11.669069
    2  6.324889   6  11.516454
    3  6.320667   5  11.516454
    4  6.325556   5  11.516454
    5  6.359000   6  11.516454
    6  6.359000   6  11.516454
    7  6.361111   7  11.516454
    8  6.360778   7  11.516454
    9  6.361111   6  11.516454

As output, I want a new DataFrame with the N0_YLDF column split into 4, one new column for each unique value of ZZ. How do I go about this? I can do groupby, but do not know what to do with the grouped object.

qwwqwwq · Accepted Answer · 2014-05-16 01:15:12Z

183

gb = df.groupby('ZZ')    
[gb.get_group(x) for x in gb.groups]
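
A minimal runnable sketch of the above, assuming the question's frame is rebuilt by hand (the names `frames` and `sizes` are illustrative, not from the answer):

```python
import pandas as pd

# Rebuild the frame shown in the question.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

gb = df.groupby("ZZ")

# gb.groups maps each distinct ZZ value to the row labels in that group;
# get_group(key) returns the matching rows as a new DataFrame.
frames = [gb.get_group(key) for key in gb.groups]

# Number of rows per ZZ value, read back from the split frames.
sizes = {int(frame["ZZ"].iloc[0]): len(frame) for frame in frames}
print(sizes)
```

Each element of `frames` keeps the original row labels, so the rows can still be traced back to `df`.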

answered May 16, 2014 at 1:15

qwwqwwq


2 Comments

maximus Over a year ago

Great answer! How do we extract the respective dataframes from gb?

qwwqwwq Over a year ago

The method get_group(x) returns a new DataFrame object containing only the rows where column ZZ == x

Anton vBR · Accepted Answer · 2019-06-16 19:12:07Z

43

There is another alternative: iterating over a groupby object yields (key, DataFrame) pairs, so we can simply use a list comprehension to keep the second element (the frame).

dfs = [x for _, x in df.groupby('ZZ')]

edited Jun 16, 2019 at 19:12

answered Jun 14, 2018 at 22:24

Anton vBR


2 Comments

DataPlug Over a year ago

Would this one-liner work if I want to apply specific aggregations to every DataFrame?

Anton vBR Over a year ago

This one-liner simply stores the DataFrames in a list. What you do next is up to you. Maybe have a look at ALollz's answer to access the groups by key.
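
As a hedged sketch of the aggregation case raised in this comment thread: the split frames can be aggregated one by one, or the aggregation can be applied to the groupby object directly without splitting first. The choice of `mean`, `max` and `count`, and the names `per_frame` and `summary`, are assumptions made for illustration.

```python
import pandas as pd

# Rebuild the frame shown in the question.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

# Option 1: split first, then aggregate each frame separately.
per_frame = {
    key: frame["N0_YLDF"].agg(["mean", "max"])
    for key, frame in df.groupby("ZZ")
}

# Option 2: let groupby aggregate directly; one row per ZZ value.
summary = df.groupby("ZZ")["N0_YLDF"].agg(["mean", "max", "count"])
print(summary)
```

Option 2 is usually enough when the same aggregation applies to every group; option 1 leaves room for different logic per frame.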

Jeff Mandell · Accepted Answer · 2017-03-13 02:55:17Z

12

In R there is a function called split that works on data frames. This is for all the R users out there:

def split(df, group):
    gb = df.groupby(group)
    return [gb.get_group(x) for x in gb.groups]

answered Mar 13, 2017 at 2:55

Jeff Mandell


5 Comments

Adam Over a year ago

shouldn't you put it all into a series? ending with pd.Series(...)

rsmith54 Over a year ago

This is amazing. Is there an easy way to get the key which identifies each group, so I can return a list of tuples, like [(key, gb.get_group(x)) for x in gb.groups]?

rsmith54 Over a year ago

I found this, which makes this easy: stackoverflow.com/questions/42513049/…

de1 Over a year ago

Just to provide an answer to the comment (which is explained in more detail in the link): [(key, gb.get_group(key)) for key in gb.groups]

Jonatas Eduardo Over a year ago

The same solution but with iterators:

def split(df, group):
    gb = df.groupby(group)
    for g in gb.groups:
        yield gb.get_group(g)
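
A possible variant that combines the two ideas from this comment thread, yielding the key together with each frame. This is only a sketch: `split_iter` is a hypothetical name, and it relies on the fact that iterating over a groupby object yields (key, DataFrame) pairs.

```python
import pandas as pd

# Rebuild the frame shown in the question.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})


def split_iter(df, group):
    """Lazily yield (key, frame) pairs, one per distinct value of `group`."""
    for key, frame in df.groupby(group):
        yield key, frame


# Collect the pairs only to inspect them; a plain for loop works as well.
pairs = list(split_iter(df, "ZZ"))
for key, frame in pairs:
    print(key, frame.shape)
```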

ALollz · Accepted Answer · 2019-06-27 17:04:06Z

Store them in a dict, which allows you access to the group DataFrames based on the group keys.

d = dict(tuple(df.groupby('ZZ')))
d[6]

#    N0_YLDF  ZZ        MAT
#1  6.317000   6  11.669069
#2  6.324889   6  11.516454
#5  6.359000   6  11.516454
#6  6.359000   6  11.516454
#9  6.361111   6  11.516454

If you need only a subset of the DataFrame, in this case just the 'N0_YLDF' Series, you can build the dict from a generator expression instead.

d = dict((idx, gp['N0_YLDF']) for idx, gp in df.groupby('ZZ'))
d[6]
#1    6.317000
#2    6.324889
#5    6.359000
#6    6.359000
#9    6.361111
#Name: N0_YLDF, dtype: float64
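
The question asks for a single DataFrame with one N0_YLDF column per ZZ value. One way to get there from the dict of Series, offered as a sketch rather than as part of the original answer, is `pd.concat` along the columns. The names `wide` and `compact` are illustrative.

```python
import pandas as pd

# Rebuild the frame shown in the question.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

# One Series of N0_YLDF values per distinct ZZ value.
d = {key: grp["N0_YLDF"] for key, grp in df.groupby("ZZ")}

# Concatenating a dict along axis=1 uses the dict keys as column labels.
# Rows are aligned on the original index, so each row has one non-NaN cell.
wide = pd.concat(d, axis=1)

# Assumption: if the original row labels do not matter, resetting each
# Series' index first packs the values to the top of each column.
compact = pd.concat(
    {key: s.reset_index(drop=True) for key, s in d.items()}, axis=1
)
print(compact)
```

Columns shorter than the longest group are padded with NaN in `compact`.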

Mykola Zotko · Accepted Answer · 2023-10-10 07:33:18Z

0

You can iterate over unique values and get groups using loc or query:

[df.loc[df['ZZ'] == i] for i in df['ZZ'].unique()]

or

[df.query('ZZ == @i') for i in df['ZZ'].unique()]
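
One difference worth knowing, sketched below with the `loc` form only: `unique()` returns values in order of first appearance, while `groupby` sorts the keys by default. Passing `sort=False` to `groupby` should give the same order as the mask-based split. The names `by_loc`, `by_groupby` and `order` are illustrative.

```python
import pandas as pd

# Rebuild the frame shown in the question.
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

# Boolean-mask split: one frame per value, in order of first appearance.
by_loc = [df.loc[df["ZZ"] == value] for value in df["ZZ"].unique()]

# groupby with sort=False keeps the same first-appearance order.
by_groupby = [frame for _, frame in df.groupby("ZZ", sort=False)]

# ZZ value of each frame, in the order the frames were produced.
order = [int(frame["ZZ"].iloc[0]) for frame in by_loc]
print(order)
```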

answered Oct 10, 2023 at 7:33

Mykola Zotko


mpal09 · Accepted Answer · 2024-01-29 08:21:27Z

0

Adding to qwwqwwq's answer:

gb = df.groupby('ZZ')
# ZZ holds integers in the question's data, so the keys must be integers too
df_six = gb.get_group(6)    # new DataFrame with the rows where ZZ == 6
df_seven = gb.get_group(7)  # new DataFrame with the rows where ZZ == 7
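
A short sketch of why the key type matters, assuming ZZ is an integer column as it is printed in the question: `get_group` looks the key up as given, so a string key does not match integer group labels. The flag `string_key_failed` is only there for illustration.

```python
import pandas as pd

# Rebuild the frame shown in the question (ZZ assumed to be integers).
df = pd.DataFrame({
    "N0_YLDF": [6.286333, 6.317000, 6.324889, 6.320667, 6.325556,
                6.359000, 6.359000, 6.361111, 6.360778, 6.361111],
    "ZZ": [2, 6, 6, 5, 5, 6, 6, 7, 7, 6],
    "MAT": [11.669069, 11.669069] + [11.516454] * 8,
})

gb = df.groupby("ZZ")

# Check the column dtype before choosing the key type.
print(df["ZZ"].dtype)

# An integer key matches the integer group labels.
df_six = gb.get_group(6)

# A string key does not match any integer label and raises KeyError.
string_key_failed = False
try:
    gb.get_group("6")
except KeyError:
    string_key_failed = True
print(string_key_failed)
```

If ZZ had been read in as text (for example from a CSV with mixed values), the string form would be the right key instead.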

answered Jan 29, 2024 at 8:21

mpal09

