I have a tricky case that I can't wrap my head around.

I have a pandas dataframe like below:

In [3]: df = pd.DataFrame({'stat_101':[31937667515, 47594388534, 43568256234], 'group_id_101':[1,1,1], 'level_101':[1,2,2], 'stat_102':['00005@60-78','00005@60-78','00005@60-78'], 'avg_104':[27305.34552, 44783.49401, 22990.77442]})

In [4]: df
Out[4]: 
      stat_101  group_id_101  level_101     stat_102      avg_104
0  31937667515             1          1  00005@60-78  27305.34552
1  47594388534             1          2  00005@60-78  44783.49401
2  43568256234             1          2  00005@60-78  22990.77442

I want to group this on the 'group_id_101' and 'stat_102' columns and create another dataframe that stores each group's rows (as a dataframe) inside it.

Expected output:

In [27]: res = pd.DataFrame({'new_stat_101':[1], 'stat_102':['00005@60-78'], 'new_avg':['Dataframe_obj']})

In [28]: res
Out[28]: 
   new_stat_101     stat_102        new_avg
0             1  00005@60-78  Dataframe_obj

Where the Dataframe_obj will be another dataframe with rows like below:

      stat_101  level_101      avg_104
0  31937667515          1  27305.34552
1  47594388534          2  44783.49401
2  43568256234          2  22990.77442

What is the best way to do this? Should I be saving a dataframe inside another dataframe, or is there a cleaner way of doing it?

Hope my question is clear.

  • How about setting the two columns as the index: m = df.set_index(['group_id_101','stat_102'])? Then you can access a group through the index: m.loc[(1,"00005@60-78")]. Would that meet your requirement? Commented Sep 13, 2020 at 14:54
  • You can indeed nest dataframes by just passing a dataframe as an element. But you should rethink your design, as it's unnecessarily complex. Commented Sep 13, 2020 at 15:02
  • Saving DataFrames in the entries of another doesn't seem like a good idea to me (I've worked with dictionaries inside DataFrames and it was not fun). You won't be able to save it as a CSV file, for example. What about lists of DataFrames? Commented Sep 13, 2020 at 15:02
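The alternatives suggested in these comments can be sketched roughly as follows. This is only an illustration, not the commenters' exact code: the names `keys`, `m`, `group_rows`, `groups` and `group_df` are made up here, and a dict keyed by the group tuple is swapped in for the "list of DataFrames" idea so that each group can be looked up by its key.

```python
import pandas as pd

df = pd.DataFrame({
    'stat_101': [31937667515, 47594388534, 43568256234],
    'group_id_101': [1, 1, 1],
    'level_101': [1, 2, 2],
    'stat_102': ['00005@60-78', '00005@60-78', '00005@60-78'],
    'avg_104': [27305.34552, 44783.49401, 22990.77442],
})

keys = ['group_id_101', 'stat_102']  # hypothetical name for the grouping columns

# Option 1: move the grouping columns into a MultiIndex and select rows by key.
m = df.set_index(keys)
group_rows = m.loc[(1, '00005@60-78')]

# Option 2: keep one plain DataFrame per group in a dict keyed by the group tuple,
# dropping the grouping columns since the key already carries them.
groups = {
    key: sub.drop(columns=keys).reset_index(drop=True)
    for key, sub in df.groupby(keys)
}
group_df = groups[(1, '00005@60-78')]
```

Either way the data stays in ordinary, flat DataFrames, so each piece can still be written to CSV or processed with vectorized operations.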

1 Answer

Let's try

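g = ['group_id_101', 'stat_102']
# groupby yields (key, sub-DataFrame) pairs; zip(*...) splits them into keys and frames
idx, dfs = zip(*df.groupby(g))
# one row per group, with that group's DataFrame stored as an object in 'new_avg'
pd.DataFrame({'new_avg': dfs}, index=pd.MultiIndex.from_tuples(idx, names=g))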

                                                                    new_avg
group_id_101 stat_102                                                      
1            00005@60-78        stat_101  group_id_101  level_101     st...

"new_avg" is a column of DataFrames accessible by index.

Obligatory disclaimer: This is blatant abuse of DataFrames; you should typically not store objects that cannot take advantage of pandas vectorization.

2 Comments

Thanks for the answer. How to access the new_avg column from the resultant dataframe?
@MayankPorwal With loc or at, as you would for any normal pandas DataFrame cell.
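For illustration, here is a minimal sketch of that kind of access, assuming the result from the answer is bound to a name (`res`, `inner` and `same` are names introduced here, not taken from the answer). The row key is the full group tuple because the result is indexed by a MultiIndex.

```python
import pandas as pd

df = pd.DataFrame({
    'stat_101': [31937667515, 47594388534, 43568256234],
    'group_id_101': [1, 1, 1],
    'level_101': [1, 2, 2],
    'stat_102': ['00005@60-78', '00005@60-78', '00005@60-78'],
    'avg_104': [27305.34552, 44783.49401, 22990.77442],
})

# Rebuild the nested result from the answer and keep it in a variable.
g = ['group_id_101', 'stat_102']
idx, dfs = zip(*df.groupby(g))
res = pd.DataFrame({'new_avg': dfs},
                   index=pd.MultiIndex.from_tuples(idx, names=g))

# .at returns the single cell, which here is the group's DataFrame.
inner = res.at[(1, '00005@60-78'), 'new_avg']

# .loc with the same row key and column label reaches the same cell.
same = res.loc[(1, '00005@60-78'), 'new_avg']

print(inner)
```

Note that each stored DataFrame still contains the grouping columns, since iterating over a groupby yields the full rows of each group.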
