I have a tricky case. Can't wrap my head around it.
I have a pandas dataframe like below:
In [3]: df = pd.DataFrame({'stat_101':[31937667515, 47594388534, 43568256234], 'group_id_101':[1,1,1], 'level_101':[1,2,2], 'stat_102':['00005@60-78','00005@60-78','00005@60-78'], 'avg_104':[27305.34552, 44783.49401, 22990.77442]})
In [4]: df
Out[4]:
      stat_101  group_id_101  level_101     stat_102      avg_104
0  31937667515             1          1  00005@60-78  27305.34552
1  47594388534             1          2  00005@60-78  44783.49401
2  43568256234             1          2  00005@60-78  22990.77442
I want to group this by the 'group_id_101' and 'stat_102' columns and create another dataframe that stores each group's rows inside it.
Expected output:
In [27]: res = pd.DataFrame({'new_stat_101':[1], 'stat_102':['00005@60-78'], 'new_avg':['Dataframe_obj']})
In [28]: res
Out[28]:
   new_stat_101     stat_102        new_avg
0             1  00005@60-78  Dataframe_obj
Here Dataframe_obj is another dataframe with rows like the following:
      stat_101  level_101      avg_104
0  31937667515          1  27305.34552
1  47594388534          2  44783.49401
2  43568256234          2  22990.77442
What is the best way to do this? Should I be saving a dataframe inside another dataframe, or is there a cleaner way of doing it?
Hope my question is clear.
You could do m = df.set_index(['group_id_101','stat_102']), then access a group through the index: m.loc[(1, "00005@60-78")]. Would that meet your requirement?
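To make the set_index suggestion concrete, here is a minimal sketch using the sample data from the question. The names keys, value_cols, group and groups are only illustrative, and the dict of sub-dataframes in the second half is one possible alternative to nesting dataframes, not necessarily the best fit for every use case.

```python
import pandas as pd

df = pd.DataFrame({
    'stat_101': [31937667515, 47594388534, 43568256234],
    'group_id_101': [1, 1, 1],
    'level_101': [1, 2, 2],
    'stat_102': ['00005@60-78', '00005@60-78', '00005@60-78'],
    'avg_104': [27305.34552, 44783.49401, 22990.77442],
})

keys = ['group_id_101', 'stat_102']            # grouping columns (illustrative name)
value_cols = ['stat_101', 'level_101', 'avg_104']

# Option 1: move the grouping columns into a MultiIndex and look a group up by key.
m = df.set_index(keys)
group = m.loc[(1, '00005@60-78'), :]           # all rows belonging to that key
print(group)

# Option 2: keep one sub-dataframe per group in a plain dict keyed by the group tuple.
groups = {
    key: sub[value_cols].reset_index(drop=True)
    for key, sub in df.groupby(keys)
}
print(groups[(1, '00005@60-78')])
```

Both options keep the data in ordinary structures. Storing dataframes as cell values of another dataframe is possible, but it turns the column into object dtype and most vectorised pandas operations stop being useful on it, so a MultiIndex or a dict is usually easier to work with.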