Save a pandas dataframe inside another dataframe

Question

I have a tricky case. Can't wrap my head around it.

I have a pandas dataframe like below:

In [3]: df = pd.DataFrame({'stat_101':[31937667515, 47594388534, 43568256234], 'group_id_101':[1,1,1], 'level_101':[1,2,2], 'stat_102':['00005@60-78','00005@60-78','00005@60-78'], 'avg_104':[27305.34552, 44783.49401, 22990.77442]})

In [4]: df
Out[4]: 
      stat_101  group_id_101  level_101     stat_102      avg_104
0  31937667515             1          1  00005@60-78  27305.34552
1  47594388534             1          2  00005@60-78  44783.49401
2  43568256234             1          2  00005@60-78  22990.77442

I want to group this on the 'group_id_101' and 'stat_102' columns and create another dataframe that stores each group's rows as a dataframe inside it.

Expected output:

In [27]: res = pd.DataFrame({'new_stat_101':[1], 'stat_102':['00005@60-78'], 'new_avg':['Dataframe_obj']})

In [28]: res
Out[28]: 
   new_stat_101     stat_102        new_avg
0             1  00005@60-78  Dataframe_obj

Where the Dataframe_obj will be another dataframe with rows like below:

      stat_101  level_101      avg_104
0  31937667515          1  27305.34552
1  47594388534          2  44783.49401
2  43568256234          2  22990.77442

What is the best way to do this? Should I save a dataframe inside another dataframe, or is there a cleaner way of doing it?

Hope my question is clear.

How about setting the two columns as the index: m = df.set_index(['group_id_101','stat_102'])? Then you can access a group through the index: m.loc[(1,"00005@60-78")]. Would that meet your requirement?
– anky, Commented Sep 13, 2020 at 14:54
You can indeed nest dataframes by just passing a dataframe as an element. But you should rethink your design, as it's unnecessarily complex.
– runDOSrun, Commented Sep 13, 2020 at 15:02
Saving DataFrames in the entries of another doesn't seem like a good idea to me (I've worked with dictionaries inside DataFrames and it was not fun). You won't be able to save it as a CSV file, for example. What about lists of DataFrames?
– user2317421, Commented Sep 13, 2020 at 15:02
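
The two alternatives raised in the comments could look roughly like the sketch below. It rebuilds the question's df, then shows the MultiIndex lookup anky suggests and, as one possible reading of the "lists of DataFrames" idea, a plain dict of sub-frames keyed by group. The names keys, m, group_rows and groups are chosen here for illustration and do not appear in the original thread.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'stat_101': [31937667515, 47594388534, 43568256234],
    'group_id_101': [1, 1, 1],
    'level_101': [1, 2, 2],
    'stat_102': ['00005@60-78', '00005@60-78', '00005@60-78'],
    'avg_104': [27305.34552, 44783.49401, 22990.77442],
})

keys = ['group_id_101', 'stat_102']  # illustrative name

# Option 1 (anky's comment): move the grouping columns into the index,
# then select all rows of one group by its key tuple.
m = df.set_index(keys)
group_rows = m.loc[(1, '00005@60-78')]

# Option 2 (assumed variant of the "lists of DataFrames" idea):
# keep the sub-frames in an ordinary dict instead of inside a DataFrame.
groups = {
    key: sub.drop(columns=keys).reset_index(drop=True)
    for key, sub in df.groupby(keys)
}
one_group = groups[(1, '00005@60-78')]
```

Both options keep every frame as a regular pandas object, so nothing has to be stored as an opaque cell value.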

cs95 · Accepted Answer · 2020-09-13 20:28:18Z

Let's try

g = ['group_id_101', 'stat_102']
idx, dfs = zip(*df.groupby(g))
pd.DataFrame({'new_avg': dfs}, index=pd.MultiIndex.from_tuples(idx, names=g))

                                                                    new_avg
group_id_101 stat_102                                                      
1            00005@60-78        stat_101  group_id_101  level_101     st...

"new_avg" is a column of DataFrames accessible by index.

Obligatory disclaimer: This is blatant abuse of DataFrames; you should typically not store objects that cannot take advantage of pandas vectorization.
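
If the flat layout from the question's expected output is wanted instead of a MultiIndex, one possible variation is sketched below. It is not part of the answer above: the loop, the helper lists, and the choice to drop the grouping columns from each sub-frame are assumptions made to match the columns shown in the question (new_stat_101, stat_102, new_avg).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'stat_101': [31937667515, 47594388534, 43568256234],
    'group_id_101': [1, 1, 1],
    'level_101': [1, 2, 2],
    'stat_102': ['00005@60-78', '00005@60-78', '00005@60-78'],
    'avg_104': [27305.34552, 44783.49401, 22990.77442],
})

keys = ['group_id_101', 'stat_102']  # illustrative name

group_ids, stats, frames = [], [], []
for (group_id, stat), sub in df.groupby(keys):
    group_ids.append(group_id)
    stats.append(stat)
    # Keep only the non-grouping columns, as in the question's Dataframe_obj.
    frames.append(sub.drop(columns=keys).reset_index(drop=True))

# One row per group; the 'new_avg' column holds DataFrame objects.
res = pd.DataFrame({
    'new_stat_101': group_ids,
    'stat_102': stats,
    'new_avg': frames,
})

inner = res.loc[0, 'new_avg']
```

The same caveat applies: the new_avg column has object dtype, so vectorized operations and CSV export will not work on it.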


2 Comments

Mayank Porwal Over a year ago

Thanks for the answer. How do I access the new_avg column from the resulting dataframe?

cs95 Over a year ago

@MayankPorwal with loc or at, as you would any normal pandas df cell.
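
As a rough illustration of that reply, the sketch below rebuilds the nested frame from the answer and pulls the stored DataFrame back out with loc and at. The variable names nested, key, inner and inner_at are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'stat_101': [31937667515, 47594388534, 43568256234],
    'group_id_101': [1, 1, 1],
    'level_101': [1, 2, 2],
    'stat_102': ['00005@60-78', '00005@60-78', '00005@60-78'],
    'avg_104': [27305.34552, 44783.49401, 22990.77442],
})

# Same construction as in the answer.
g = ['group_id_101', 'stat_102']
idx, dfs = zip(*df.groupby(g))
nested = pd.DataFrame(
    {'new_avg': dfs},
    index=pd.MultiIndex.from_tuples(idx, names=g),
)

# The row label is the full (group_id_101, stat_102) tuple.
key = (1, '00005@60-78')
inner = nested.loc[key, 'new_avg']    # label-based lookup of one cell
inner_at = nested.at[key, 'new_avg']  # scalar accessor for the same cell

# The cell is an ordinary DataFrame and can be used as such.
highest_avg = inner['avg_104'].max()
```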
