I have a multi-index dataset like this:

                                           mean             std
                                Happiness Score Happiness Score
Region                                                         
Australia and New Zealand              7.302500        0.020936
Central and Eastern Europe             5.371184        0.578274
Eastern Asia                           5.632333        0.502100
Latin America and Caribbean            6.069074        0.728157
Middle East and Northern Africa        5.387879        1.031656
North America                          7.227167        0.179331
Southeastern Asia                      5.364077        0.882637
Southern Asia                          4.590857        0.535978
Sub-Saharan Africa                     4.150957        0.584945
Western Europe                         6.693000        0.777886

I would like to sort it by standard deviation.

My attempt:

import numpy as np
import pandas as pd

df1.sort_values(by=('Region','std'))

How can I fix this?

  • What is df.columns for you? Commented Dec 24, 2018 at 17:34
  • Sorry for late reply, df.columns gives MultiIndex(levels=[['mean', 'std'], ['Happiness Score']], labels=[[0, 1], [0, 0]]) Commented Dec 24, 2018 at 17:42
  • OK, think you're good to go. Just try any of the two solutions given below in my answer. Commented Dec 24, 2018 at 17:43

1 Answer

Setup

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 2)))
df.columns = pd.MultiIndex.from_arrays([['mean', 'std'], ['Happiness Score'] * 2])

df
             mean             std
  Happiness Score Happiness Score
0               5               0
1               3               3
2               7               9
3               3               5
4               2               4

You can use argsort and reorder df positionally with iloc:

df.loc[:, ('std', 'Happiness Score')].argsort().values
# array([0, 1, 4, 3, 2])

df.iloc[df.loc[:, ('std', 'Happiness Score')].argsort().values]
# df.iloc[np.argsort(df.loc[:, ('std', 'Happiness Score')])]

             mean             std
  Happiness Score Happiness Score
0               5               0
1               3               3
4               2               4
3               3               5
2               7               9

Another solution is sort_values, passing a tuple:

df.sort_values(by=('std', 'Happiness Score'), axis=0)

             mean             std
  Happiness Score Happiness Score
0               5               0
1               3               3
4               2               4
3               3               5
2               7               9

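I think you had the right idea, but the tuple was wrong. The key must list the column levels from top to bottom, i.e. ('std', 'Happiness Score'). 'Region' is the name of the row index, not a column level.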
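For reference, here is a minimal sketch of the same sort_values call applied to a frame shaped like the one in the question. The four rows are copied from the question's output, and the name df1 follows the question; how the frame was originally built is an assumption.

```python
import pandas as pd

# Column layout as reported by df.columns in the question's comments.
columns = pd.MultiIndex.from_tuples(
    [("mean", "Happiness Score"), ("std", "Happiness Score")]
)

# A few rows copied from the question; 'Region' is the row index name.
df1 = pd.DataFrame(
    [
        [7.302500, 0.020936],
        [5.371184, 0.578274],
        [7.227167, 0.179331],
        [4.150957, 0.584945],
    ],
    index=pd.Index(
        [
            "Australia and New Zealand",
            "Central and Eastern Europe",
            "North America",
            "Sub-Saharan Africa",
        ],
        name="Region",
    ),
    columns=columns,
)

# The key is a tuple ordered (top level, second level).
result = df1.sort_values(by=("std", "Happiness Score"))
print(result)
```

sort_values returns a new frame, so assign the result (as above) if you want to keep the sorted order.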

4 Comments

solution1: AttributeError: 'DataFrame' object has no attribute 'argsort'
@astro123 OK, so it looks like your actual data is different from the sample posted here. Please figure out what changes are needed to get it to work. If the columns are different, you will certainly need to change the column names before accessing them. Also, regarding the argsort error, that is bizarre, and all I can do is ask you to check that you ran my code correctly. Thanks.
@astro123 Make sure you passed a tuple, ('std', 'Happiness Score'), and not a list. That is important.
Yeah, you are right, sir! There was a little mismatch in the data. Now it works.
