5

I have a pandas dataframe which looks like this:

                         value
           Id              
2014-03-13 1          -3
           2          -6
           3          -3.2
           4          -3.1
           5          -5
2014-03-14 1          -3.4
           2          -6.2
           3          -3.2
           4          -3.2
           5          -5.9

which is basically a groupby object with two levels of multi-index.

I want to sort it in ascending order according to the value column, but keeping the level 0 (dates) untouched so that the result should look like this:

                         value
           Id              
2014-03-13 2          -6
           5          -5
           3          -3.2
           4          -3.1
           1          -3
2014-03-14 2          -6.2
           5          -5.9
           1          -3.4
           3          -3.2
           4          -3.2

Here is the code to generate the initial data:

import pandas as pd

dates = [pd.to_datetime('2014-03-13', format='%Y-%m-%d'), pd.to_datetime('2014-03-13', format='%Y-%m-%d'), pd.to_datetime('2014-03-13', format='%Y-%m-%d'), pd.to_datetime('2014-03-13', format='%Y-%m-%d'),
         pd.to_datetime('2014-03-13', format='%Y-%m-%d'),pd.to_datetime('2014-03-14', format='%Y-%m-%d'), pd.to_datetime('2014-03-14', format='%Y-%m-%d'), pd.to_datetime('2014-03-14', format='%Y-%m-%d'), 
         pd.to_datetime('2014-03-14', format='%Y-%m-%d'), pd.to_datetime('2014-03-14', format='%Y-%m-%d')]

values = [-3,-6,-3.2,-3.1,-5,-3.4,-6.2,-3.2,-3.2,-5.9]
Ids = [1,2,3,4,5,1,2,3,4,5]
df = pd.DataFrame({'Id': pd.Series(Ids, index=dates),
                   'value': pd.Series(values, index=dates)})

df = df.groupby([df.index,'Id']).sum()
0

2 Answers 2

9

For me works reset_index + sort_values + set_index + rename_axis:

df = df.reset_index() \
       .sort_values(['level_0','value']) \
       .set_index(['level_0','Id']) \
       .rename_axis([None, 'Id'])
print (df)
               value
           Id       
2014-03-13 2    -6.0
           5    -5.0
           3    -3.2
           4    -3.1
           1    -3.0
2014-03-14 2    -6.2
           5    -5.9
           1    -3.4
           3    -3.2
           4    -3.2

Another solution with sort_values + swaplevel + sort_index:

df = df.sort_values('value')
       .swaplevel(0,1)
       .sort_index(level=1, sort_remaining=False)
       .swaplevel(0,1)
print (df)
               value
           Id       
2014-03-13 2    -6.0
           5    -5.0
           3    -3.2
           4    -3.1
           1    -3.0
2014-03-14 2    -6.2
           5    -5.9
           1    -3.4
           3    -3.2
           4    -3.2

Swap levels is necessary because:

print (df.sort_values('value').sort_index(level=0, sort_remaining=False))
               value
           Id       
2014-03-13 1    -3.0
           2    -6.0
           3    -3.2
           4    -3.1
           5    -5.0
2014-03-14 1    -3.4
           2    -6.2
           3    -3.2
           4    -3.2
           5    -5.9

For pandas 0.23.0 is possible sort columns and index levels together:

df.index.names = ['level1','level2']
print (df.sort_values(['level1','value']))
                   value
level1     level2       
2014-03-13 2        -6.0
           5        -5.0
           3        -3.2
           4        -3.1
           1        -3.0
2014-03-14 2        -6.2
           5        -5.9
           1        -3.4
           3        -3.2
           4        -3.2
Sign up to request clarification or add additional context in comments.

6 Comments

this solution assumes that outer level is "sortable", is there a way to sort inner level of a multiindex respecting outer level? My outer level is consists of strings and I do not want to sort outer level alphabetically.
Hard question without data. But maybe need something like df = df.reset_index().groupby('level_0', sort=False).apply(lambda x : x.sort_values('level_1')).set_index(['level_1'], append=True) should working. If not, can you add some data?
Thank you, Sir. That solves the issue, but has two unexpected consequences... it would be great if you could help me understand how come (1) index becomes a part of the table (2) one if the columns gets duplicated after running df.reset_index().groupby('level_0', sort=False).apply(lambda x : x.sort_values('level_1')) here is the before and after. The general structure of the data frame is outlined here
I obviously solved those issues by dropping the columns, I am just trying to figure out what happens behind the scene.
Yoi can try add parameter group_keys=False to groupby. Not sure if it working, bwcause now on phone only.
|
0

To my knowledge, a simultaneous sort on both index and column isn't possible, but a simple workaround would be the following:

df = df.reset_index().sort_values(by = ['level_0','values']).set_index(['level_0','Id'])

...and if you need to get rid of the 'level_0' index label:

df.index.names = [None, 'Id']

Setup:

import pandas as pd
import io

c = io.StringIO(u'''
                Id      value
2014-03-13      1       -3
2014-03-13      2       -6
2014-03-13      3       -3.2                                                                                                                      2014-03-13      4       -3.1
2014-03-13      5       -5
2014-03-14      1       -3.4
2014-03-14      2       -6.2
2014-03-14      3       -3.2
2014-03-14      4       -3.2
2014-03-14      5       -5.9
''')

df = pd.read_csv(c, delim_whitespace = True)
df = df.groupby([df.index,'Id']).max()

Initial df:

               value
           Id
2014-03-13 1    -3.0
           2    -6.0
           3    -3.2
           4    -3.1
           5    -5.0
2014-03-14 1    -3.4
           2    -6.2
           3    -3.2
           4    -3.2
           5    -5.9

Ouput:

               value
           Id
2014-03-13 2    -6.0
           5    -5.0
           3    -3.2
           4    -3.1
           1    -3.0
2014-03-14 2    -6.2
           5    -5.9
           1    -3.4
           3    -3.2
           4    -3.2

11 Comments

This does not work, I get the same as my input data!
Right, in my setup I included a date column, which I now realize you don't have. Working on it.
Since my index including the dates does not have name, it will end up as level_0. But even replacing date by level_0 does not change anything.
@JefeBelfort - I ran this and it worked perfectly. Used level_0 and got values sorted as requested.
I get an error at line 5: pandas.io.common.CParserError: Error tokenizing data.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.