How to sort multi-index pandas data frame using one top level column?

Question

I have a DataFrame with multi-index columns like this:

                                           mean             std
                                Happiness Score Happiness Score
Region                                                         
Australia and New Zealand              7.302500        0.020936
Central and Eastern Europe             5.371184        0.578274
Eastern Asia                           5.632333        0.502100
Latin America and Caribbean            6.069074        0.728157
Middle East and Northern Africa        5.387879        1.031656
North America                          7.227167        0.179331
Southeastern Asia                      5.364077        0.882637
Southern Asia                          4.590857        0.535978
Sub-Saharan Africa                     4.150957        0.584945
Western Europe                         6.693000        0.777886

I would like to sort it by standard deviation.

My attempt:

import numpy as np
import pandas as pd

df1.sort_values(by=('Region','std'))

How can I fix this?

Sorry for the late reply. df.columns gives MultiIndex(levels=[['mean', 'std'], ['Happiness Score']], labels=[[0, 1], [0, 0]])
– BhishanPoudel, Commented Dec 24, 2018 at 17:42
OK, I think you're good to go. Try either of the two solutions in my answer below.
– cs95, Commented Dec 24, 2018 at 17:43

cs95 · Accepted Answer · 2018-12-24 17:31:33Z

Setup

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 2)))
df.columns = pd.MultiIndex.from_arrays([['mean', 'std'], ['Happiness Score'] * 2])

df
             mean             std
  Happiness Score Happiness Score
0               5               0
1               3               3
2               7               9
3               3               5
4               2               4

You can use argsort and reindex df:

df.loc[:, ('std', 'Happiness Score')].argsort().values
# array([0, 1, 4, 3, 2])

df.iloc[df.loc[:, ('std', 'Happiness Score')].argsort().values]
# df.iloc[np.argsort(df.loc[:, ('std', 'Happiness Score')])]

             mean             std
  Happiness Score Happiness Score
0               5               0
1               3               3
4               2               4
3               3               5
2               7               9
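As a small sketch of why the argsort result goes to iloc rather than loc: argsort returns integer positions, not row labels. The string row labels below are an assumption added for illustration (the setup above uses the default 0..4 index), and the values are hard-coded from the setup output.

```python
import numpy as np
import pandas as pd

# Same values as the setup above, but with string row labels (an assumption
# made here to show that argsort yields positions, not labels).
df = pd.DataFrame(
    [[5, 0], [3, 3], [7, 9], [3, 5], [2, 4]],
    index=list("abcde"),
    columns=pd.MultiIndex.from_arrays([["mean", "std"], ["Happiness Score"] * 2]),
)

# Select the single column with a full (level 0, level 1) tuple.
key = df[("std", "Happiness Score")].to_numpy()
order = np.argsort(key, kind="stable")  # integer positions: [0, 1, 4, 3, 2]

ascending = df.iloc[order]         # positions go to iloc, not loc
descending = df.iloc[order[::-1]]  # reverse the positions for descending order
```

With a default RangeIndex, positions and labels coincide, which is why the version above works either way.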

Another solution is sort_values, passing a tuple:

df.sort_values(by=('std', 'Happiness Score'), axis=0)

             mean             std
  Happiness Score Happiness Score
0               5               0
1               3               3
4               2               4
3               3               5
2               7               9
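Applied to a frame shaped like the one in the question, the same call might look like the sketch below. The frame is rebuilt by hand from four rows of the question's output (an assumption, since the original df1 is not available), with Region as the row index.

```python
import pandas as pd

# Column levels as reported by df.columns in the question's comments.
columns = pd.MultiIndex.from_arrays([["mean", "std"], ["Happiness Score"] * 2])

# Hand-copied subset of the question's table, deliberately not in sorted order.
df1 = pd.DataFrame(
    [
        [6.693000, 0.777886],
        [7.302500, 0.020936],
        [5.387879, 1.031656],
        [7.227167, 0.179331],
    ],
    index=pd.Index(
        [
            "Western Europe",
            "Australia and New Zealand",
            "Middle East and Northern Africa",
            "North America",
        ],
        name="Region",
    ),
    columns=columns,
)

# The key is a full column tuple; 'Region' is the index name, not a column level.
by_std = df1.sort_values(by=("std", "Happiness Score"))
by_std_desc = df1.sort_values(by=("std", "Happiness Score"), ascending=False)
```

sort_values returns a new frame, so assign the result if you want to keep it.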

I think you had the right idea, but the tuple was wrong: it must list the column levels from top to bottom, i.e. ('std', 'Happiness Score').
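One way to find the right tuple is to list the column keys and copy one of them. The sketch below reuses the setup values (hard-coded here) and also shows a multi-key sort with a list of tuples, which is an extension beyond what the question asked.

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 0], [3, 3], [7, 9], [3, 5], [2, 4]],
    columns=pd.MultiIndex.from_arrays([["mean", "std"], ["Happiness Score"] * 2]),
)

# Each column is addressed by a (level 0, level 1) tuple.
keys = df.columns.tolist()
# [('mean', 'Happiness Score'), ('std', 'Happiness Score')]

valid = ("std", "Happiness Score") in df.columns  # True
invalid = ("Region", "std") in df.columns         # False: not a column key

# Several sort keys: pass a list of tuples, one tuple per column.
by_mean_then_std = df.sort_values(
    by=[("mean", "Happiness Score"), ("std", "Happiness Score")]
)
```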

Solution 1 gives: AttributeError: 'DataFrame' object has no attribute 'argsort'
@astro123 OK, so it looks like your actual data is different from the sample posted here. Please figure out what changes are needed to get it to work. If the columns are different, you will certainly need to change the column names before accessing them. As for the argsort error, that is bizarre, and all I can do is ask you to check that you ran my code correctly. Thanks.
@astro123 Make sure you passed a tuple, ('std', 'Happiness Score'), and not a list. That is important.
Yes, you are right, sir! There was a small mismatch in the data. Now it works.
