Pandas Dataframe Mutli index sorting by level and column value

Question

I have a pandas dataframe which looks like this:

                         value
           Id              
2014-03-13 1          -3
           2          -6
           3          -3.2
           4          -3.1
           5          -5
2014-03-14 1          -3.4
           2          -6.2
           3          -3.2
           4          -3.2
           5          -5.9

which is basically a groupby object with two levels of multi-index.

I want to sort it in ascending order according to the value column, but keeping the level 0 (dates) untouched so that the result should look like this:

                         value
           Id              
2014-03-13 2          -6
           5          -5
           3          -3.2
           4          -3.1
           1          -3
2014-03-14 2          -6.2
           5          -5.9
           1          -3.4
           3          -3.2
           4          -3.2

Here is the code to generate the initial data:

import pandas as pd

dates = [pd.to_datetime('2014-03-13', format='%Y-%m-%d'), pd.to_datetime('2014-03-13', format='%Y-%m-%d'), pd.to_datetime('2014-03-13', format='%Y-%m-%d'), pd.to_datetime('2014-03-13', format='%Y-%m-%d'),
         pd.to_datetime('2014-03-13', format='%Y-%m-%d'),pd.to_datetime('2014-03-14', format='%Y-%m-%d'), pd.to_datetime('2014-03-14', format='%Y-%m-%d'), pd.to_datetime('2014-03-14', format='%Y-%m-%d'), 
         pd.to_datetime('2014-03-14', format='%Y-%m-%d'), pd.to_datetime('2014-03-14', format='%Y-%m-%d')]

values = [-3,-6,-3.2,-3.1,-5,-3.4,-6.2,-3.2,-3.2,-5.9]
Ids = [1,2,3,4,5,1,2,3,4,5]
df = pd.DataFrame({'Id': pd.Series(Ids, index=dates),
                   'value': pd.Series(values, index=dates)})

df = df.groupby([df.index,'Id']).sum()

jezrael · Accepted Answer · 2018-05-17 08:11:53Z

For me works reset_index + sort_values + set_index + rename_axis:

df = df.reset_index() \
       .sort_values(['level_0','value']) \
       .set_index(['level_0','Id']) \
       .rename_axis([None, 'Id'])
print (df)
               value
           Id       
2014-03-13 2    -6.0
           5    -5.0
           3    -3.2
           4    -3.1
           1    -3.0
2014-03-14 2    -6.2
           5    -5.9
           1    -3.4
           3    -3.2
           4    -3.2

Another solution with sort_values + swaplevel + sort_index:

df = df.sort_values('value')
       .swaplevel(0,1)
       .sort_index(level=1, sort_remaining=False)
       .swaplevel(0,1)
print (df)
               value
           Id       
2014-03-13 2    -6.0
           5    -5.0
           3    -3.2
           4    -3.1
           1    -3.0
2014-03-14 2    -6.2
           5    -5.9
           1    -3.4
           3    -3.2
           4    -3.2

Swap levels is necessary because:

print (df.sort_values('value').sort_index(level=0, sort_remaining=False))
               value
           Id       
2014-03-13 1    -3.0
           2    -6.0
           3    -3.2
           4    -3.1
           5    -5.0
2014-03-14 1    -3.4
           2    -6.2
           3    -3.2
           4    -3.2
           5    -5.9

For pandas 0.23.0 is possible sort columns and index levels together:

df.index.names = ['level1','level2']
print (df.sort_values(['level1','value']))
                   value
level1     level2       
2014-03-13 2        -6.0
           5        -5.0
           3        -3.2
           4        -3.1
           1        -3.0
2014-03-14 2        -6.2
           5        -5.9
           1        -3.4
           3        -3.2
           4        -3.2

this solution assumes that outer level is "sortable", is there a way to sort inner level of a multiindex respecting outer level? My outer level is consists of strings and I do not want to sort outer level alphabetically.
Hard question without data. But maybe need something like df = df.reset_index().groupby('level_0', sort=False).apply(lambda x : x.sort_values('level_1')).set_index(['level_1'], append=True) should working. If not, can you add some data?
Thank you, Sir. That solves the issue, but has two unexpected consequences... it would be great if you could help me understand how come (1) index becomes a part of the table (2) one if the columns gets duplicated after running df.reset_index().groupby('level_0', sort=False).apply(lambda x : x.sort_values('level_1')) here is the before and after. The general structure of the data frame is outlined here
I obviously solved those issues by dropping the columns, I am just trying to figure out what happens behind the scene.
Yoi can try add parameter group_keys=False to groupby. Not sure if it working, bwcause now on phone only.

elPastor · Accepted Answer · 2017-05-17 11:17:29Z

To my knowledge, a simultaneous sort on both index and column isn't possible, but a simple workaround would be the following:

df = df.reset_index().sort_values(by = ['level_0','values']).set_index(['level_0','Id'])

...and if you need to get rid of the 'level_0' index label:

df.index.names = [None, 'Id']

Setup:

import pandas as pd
import io

c = io.StringIO(u'''
                Id      value
2014-03-13      1       -3
2014-03-13      2       -6
2014-03-13      3       -3.2                                                                                                                      2014-03-13      4       -3.1
2014-03-13      5       -5
2014-03-14      1       -3.4
2014-03-14      2       -6.2
2014-03-14      3       -3.2
2014-03-14      4       -3.2
2014-03-14      5       -5.9
''')

df = pd.read_csv(c, delim_whitespace = True)
df = df.groupby([df.index,'Id']).max()

Initial df:

               value
           Id
2014-03-13 1    -3.0
           2    -6.0
           3    -3.2
           4    -3.1
           5    -5.0
2014-03-14 1    -3.4
           2    -6.2
           3    -3.2
           4    -3.2
           5    -5.9

Ouput:

               value
           Id
2014-03-13 2    -6.0
           5    -5.0
           3    -3.2
           4    -3.1
           1    -3.0
2014-03-14 2    -6.2
           5    -5.9
           1    -3.4
           3    -3.2
           4    -3.2

Right, in my setup I included a date column, which I now realize you don't have. Working on it.
Since my index including the dates does not have name, it will end up as level_0. But even replacing date by level_0 does not change anything.
@JefeBelfort - I ran this and it worked perfectly. Used level_0 and got values sorted as requested.
I get an error at line 5: pandas.io.common.CParserError: Error tokenizing data.

Collectives™ on Stack Overflow

Pandas Dataframe Mutli index sorting by level and column value

2 Answers 2

6 Comments

11 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

6 Comments

11 Comments

Linked

Related