Access a pandas DataFrame column with two header levels

Question

I created a dataframe using groupby and pd.cut to calculate the mean, the standard deviation, and the number of elements inside each bin. I used agg() with this command:

df_bin = df.groupby(pd.cut(df.In_X, ranges, include_lowest=True)).agg(['mean', 'std', 'size'])

df_bin looks like this:

                 X                  Y
                 mean   std size   mean         std  size
In_X                    
(10.424, 10.43] 10.425  NaN  1      0.003786    NaN   1
(10.43, 10.435] 10.4    NaN  0      NaN         NaN   0

I want to create an array with the values of the mean under the first header, X. If I didn't have two header levels, I would use something like:

mean=np.array(df_bin['mean'])

How can I do that with two header levels?

r.ook · Accepted Answer · 2020-05-23 00:17:46Z

2

This documentation would serve you well: https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html

To answer your question, if you just want a particular column:

mean = np.array(df_bin['X', 'mean'])

But if you want to slice on the second level (this returns the mean columns for both X and Y):

mean = np.array(df_bin.loc[:, (slice(None), 'mean')])

Or:

mean = np.array(df_bin.loc[:, pd.IndexSlice[:, 'mean']])
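To make the difference between the tuple key and the second-level slice concrete, here is a minimal sketch. The small df_bin built below is a hypothetical stand-in (the bin labels and the third row are invented for illustration), since the question's df and ranges are not shown:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_bin: two top-level headers (X, Y),
# each with the three statistics produced by agg().
columns = pd.MultiIndex.from_product([["X", "Y"], ["mean", "std", "size"]])
df_bin = pd.DataFrame(
    [[10.425, np.nan, 1, 0.003786, np.nan, 1],
     [np.nan, np.nan, 0, np.nan, np.nan, 0],
     [10.437, 0.001, 2, 0.0041, 0.0002, 2]],
    index=["bin_a", "bin_b", "bin_c"],
    columns=columns,
)

# A tuple key selects exactly one column and gives a 1-D array.
x_mean = df_bin[("X", "mean")].to_numpy()

# Slicing the second level keeps one 'mean' column per top-level header,
# so the result is 2-D with one column for X and one for Y.
all_means = df_bin.loc[:, pd.IndexSlice[:, "mean"]].to_numpy()

print(x_mean.shape)     # (3,)
print(all_means.shape)  # (3, 2)
```

If only the values under X are needed, the tuple key is the simpler choice; the slice is useful when the same statistic is wanted for every top-level header at once.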

edited May 23, 2020 at 0:17

answered May 23, 2020 at 0:01


4 Comments

ziulfer Over a year ago

Thanks for the documentation. Your solution takes the mean values from both the X and Y main headers.

r.ook Over a year ago

Oh I misunderstood. If you just want the 'X' then mean = np.array(df_bin['X', 'mean']) would work already.

ziulfer Over a year ago

Worked great. Perhaps I should open a new question, but how do I use dropna() to drop only the rows where the mean of X is NaN?

r.ook Over a year ago

You could always apply dropna() directly to df_bin['X', 'mean'] before you pass it into an array: mean = df_bin['X', 'mean'].dropna().values
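A short sketch of that suggestion, again on a hypothetical stand-in for df_bin (the bin labels and values are invented for illustration). It also shows one possible way to drop the matching rows from the whole frame with a boolean mask, which is an addition here and not something the comment proposes:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_bin with the same two-level header layout.
columns = pd.MultiIndex.from_product([["X", "Y"], ["mean", "std", "size"]])
df_bin = pd.DataFrame(
    [[10.425, np.nan, 1, 0.003786, np.nan, 1],
     [np.nan, np.nan, 0, np.nan, np.nan, 0],
     [10.437, 0.001, 2, 0.0041, 0.0002, 2]],
    index=["bin_a", "bin_b", "bin_c"],
    columns=columns,
)

# Drop NaN entries from the single column only, then convert to an array.
x_mean = df_bin[("X", "mean")].dropna().to_numpy()

# Alternative (assumption: the whole rows should go, not just the values):
# keep only the rows of df_bin where X's mean is not NaN.
kept_rows = df_bin[df_bin[("X", "mean")].notna()]

print(x_mean.shape)           # (2,)
print(list(kept_rows.index))  # ['bin_a', 'bin_c']
```

The first form only shortens the extracted array; the second keeps the remaining columns aligned with the surviving bins, which matters if the std or size values are used afterwards.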

BENY · Accepted Answer · 2020-05-22 23:58:58Z

1

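We can stack the top header level into the index and then select the mean column (note that this returns the means for both X and Y):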

df_bin.stack(level=0)['mean'].values
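As a rough illustration of what stack(level=0) does here, the sketch below uses a hypothetical stand-in for df_bin (invented bin labels and values). The xs() call that narrows the result to X is an addition for illustration, not part of the answer; depending on the pandas version, stack() may also emit a FutureWarning about its new implementation:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_bin with the same two-level header layout.
columns = pd.MultiIndex.from_product([["X", "Y"], ["mean", "std", "size"]])
df_bin = pd.DataFrame(
    [[10.425, np.nan, 1, 0.003786, np.nan, 1],
     [np.nan, np.nan, 0, np.nan, np.nan, 0],
     [10.437, 0.001, 2, 0.0041, 0.0002, 2]],
    index=["bin_a", "bin_b", "bin_c"],
    columns=columns,
)

# Move the top header level (X, Y) from the columns into the row index.
# The remaining columns are the statistics: mean, std, size.
stacked = df_bin.stack(level=0)

# One entry per (bin, header) pair, so X and Y means are both present.
both = stacked["mean"].to_numpy()

# Narrow to the X header only by cross-section on the last index level.
x_only = stacked["mean"].xs("X", level=-1).to_numpy()

print(both.shape)    # (6,)
print(x_only.shape)  # (3,)
```

For pulling out a single column, the tuple key df_bin['X', 'mean'] is more direct; stacking is mainly convenient when a long, one-row-per-header layout is wanted anyway.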

answered May 22, 2020 at 23:58