How to sort a pandas DataFrame by two or more columns?

Question

Suppose I have a dataframe with columns a, b and c. I want to sort the dataframe by column b in ascending order, and by column c in descending order. How do I do this?

Does this answer your question? Pandas sort by group aggregate and column
– vestland, commented Aug 14, 2020 at 5:57

Andy Hayden · Accepted Answer · 2017-05-18 19:10:08Z

922

As of the 0.17.0 release, the sort method was deprecated in favor of sort_values. The sort method was removed entirely in the 0.20.0 release. The arguments (and results) remain the same:

df.sort_values(['a', 'b'], ascending=[True, False])
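Applied to the columns in the question (b ascending, c descending), a minimal sketch with made-up data; the ties in b are what make the second key matter:

```python
import pandas as pd

# Made-up data with repeated 'b' values so the secondary key ('c') matters
df = pd.DataFrame({
    'a': [10, 20, 30, 40, 50],
    'b': [2, 1, 2, 1, 3],
    'c': [5, 7, 9, 6, 8],
})

# Sort by 'b' ascending; among rows with equal 'b', sort by 'c' descending
result = df.sort_values(['b', 'c'], ascending=[True, False])
print(result)
```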

Before 0.17.0, you could use the ascending argument of sort:

df.sort(['a', 'b'], ascending=[True, False])

For example:

In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])

In [12]: df1.sort_values(['a', 'b'], ascending=[True, False])
Out[12]:
   a  b
2  1  4
7  1  3
1  1  2
3  1  2
4  3  2
6  4  4
0  4  3
9  4  3
5  4  1
8  4  1

As commented by @renadeen:

Sort isn't in place by default! So you should assign the result of the sort method to a variable or add inplace=True to the method call.

That is, if you want to reuse df1 as a sorted DataFrame:

df1 = df1.sort_values(['a', 'b'], ascending=[True, False])

or

df1.sort_values(['a', 'b'], ascending=[True, False], inplace=True)
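To make the difference concrete, a small sketch with made-up data: without inplace=True the original frame keeps its order and a new sorted frame is returned; with it, the frame itself is sorted and the method returns None.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [3, 1, 2], 'b': [1, 2, 3]})

# Default: returns a new sorted DataFrame, df1 keeps its original row order
sorted_copy = df1.sort_values(['a', 'b'], ascending=[True, False])
unchanged = df1.index.tolist()  # still [0, 1, 2]

# inplace=True: df1 itself is reordered and the call returns None
ret = df1.sort_values(['a', 'b'], ascending=[True, False], inplace=True)

print(sorted_copy.index.tolist(), unchanged, ret, df1.index.tolist())
```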

edited May 18, 2017 at 19:10

answered Jun 17, 2013 at 6:43

Andy Hayden

5 Comments

renadeen Over a year ago

Sort isn't in place by default! So you should assign the result of the sort method to a variable or add inplace=True to the method call.

Andy Hayden Over a year ago

@renadeen very good point, I've updated my answer with that comment.

Andy Hayden Over a year ago

@Snoozer Yeah, I don't think sort's ever going to go away (mainly as it's used extensively in Wes' book), but there have been some big changes in calling sort. Thanks! ... I really need to automate going through all my 1000s of pandas answers for deprecations!

Iphy Kelvin Over a year ago

Is there a way for the sort to be 1,3,4 instead of 1,1,1,1,3,4,4,4,4?

santiago arizti Over a year ago

I was using tuples; that is why it failed for me. It feels like tuples should be allowed.

Kyle Heuton · 2015-11-20 23:11:35Z

93

As of pandas 0.17.0, DataFrame.sort() is deprecated (it was later removed in 0.20.0). The way to sort a DataFrame by its values is now DataFrame.sort_values.

As such, the answer to your question would now be

df.sort_values(['b', 'c'], ascending=[True, False], inplace=True)
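A short sketch (made-up data) of the other forms ascending accepts: a single bool applies to every column in the list, and a list must have exactly one entry per column, otherwise pandas raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'b': [1, 2, 1], 'c': [4, 5, 6]})

# A single bool applies to every column in `by`: here both 'b' and 'c' descend
both_desc = df.sort_values(['b', 'c'], ascending=False)

# A list needs one entry per column in `by`; a length mismatch raises ValueError
error_message = None
try:
    df.sort_values(['b', 'c'], ascending=[True])
except ValueError as exc:
    error_message = str(exc)

print(both_desc)
print(error_message)
```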

answered Nov 20, 2015 at 23:11

Kyle Heuton

jpp · 2020-02-05 15:02:28Z

For large dataframes of numeric data, you may see a significant performance improvement via numpy.lexsort, which performs an indirect sort using a sequence of keys:

import pandas as pd
import numpy as np

np.random.seed(0)

df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])
df1 = pd.concat([df1]*100000)

def pdsort(df1):
    return df1.sort_values(['a', 'b'], ascending=[True, False])

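def lex(df1):
    arr = df1.values
    # lexsort treats the last key as primary: sort by 'a', then by -'b' ('b' descending)
    order = np.lexsort((-arr[:, 1], arr[:, 0]))
    # keep the original column labels on the result
    return pd.DataFrame(arr[order], columns=df1.columns)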

assert (pdsort(df1).values == lex(df1).values).all()

%timeit pdsort(df1)  # 193 ms per loop
%timeit lex(df1)     # 143 ms per loop

One peculiarity is that numpy.lexsort uses its keys in reverse order: the last key in the sequence is the primary sort key, so (-'b', 'a') sorts by series a first and then by b. We negate series b so that it is sorted in descending order.
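A tiny sketch with made-up arrays illustrating that key order:

```python
import numpy as np

a = np.array([2, 1, 2, 1])
b = np.array([10, 20, 30, 40])

# The LAST key (a) is the primary key; -b makes the secondary key descend
order = np.lexsort((-b, a))
print(order)
```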

Be aware that this approach only works with numeric values, because descending order is obtained by negating a column: with strings, -arr[:, 1] raises TypeError: bad operand type for unary -: 'str'. pd.DataFrame.sort_values works with either string or numeric values.
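If you still want lexsort with a string column, one possible workaround (a sketch, not part of the original answer) is to replace the strings with integer ranks from np.unique(..., return_inverse=True), which preserves their sort order, and negate the ranks instead. The frame and column names below are made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [2, 1, 2, 1], 'name': ['x', 'y', 'z', 'w']})

# np.unique sorts the unique strings; `codes` gives each entry's rank in that order
_, codes = np.unique(df['name'].to_numpy(), return_inverse=True)

# Negate the integer ranks (not the strings) so 'name' sorts descending
order = np.lexsort((-codes, df['a'].to_numpy()))
result = df.iloc[order]

# The equivalent pandas call, for comparison
expected = df.sort_values(['a', 'name'], ascending=[True, False])
print(result)
```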

cottontail · 2024-01-29 04:11:10Z

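sort_values has a stable sorting option, which can be invoked by passing kind='stable'. To build a multi-column order this way, sort by the columns in reverse order: first by the secondary key, then by the primary key with a stable sort, which preserves the existing order among rows with equal primary keys.

So the following two approaches produce the same ordering, i.e. df1 and df2 are equivalent: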

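import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(100,2)), columns=['a', 'b'])

df1 = df.sort_values(['a', 'b'], ascending=[True, False])  # sort by 'a' then 'b'

df2 = (
    df
    .sort_values('b', ascending=False)                     # sort by 'b' first
    .sort_values('a', ascending=True, kind='stable')       # then by 'a'
)

# Compare row by row: rows with identical (a, b) values may carry different
# index labels, and df1.eq(df2) would align on labels rather than positions
assert df1.reset_index(drop=True).equals(df2.reset_index(drop=True))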

This is especially useful when you need a more involved sorting key.

Say, given the df below, you want to sort by 'date' and 'value', but treat 'date' as datetime values even though they are stored as strings. A plain sort_values on both columns compares the dates as strings, so '10/1/2024' ends up before '2/23/2024', which is wrong; calling sort_values twice, with the sorting key on the stable second pass, produces the correct output.

df = pd.DataFrame({'date': ['10/1/2024', '10/1/2024', '2/23/2024'], 'value': [0, 1, 0]})

df1 = df.sort_values(['date', 'value'], ascending=[True, False])  # <--- wrong output

df2 = (
    df
    .sort_values('value', ascending=False)
    .sort_values('date', ascending=True, kind='stable', key=pd.to_datetime) 
)  # <--- correct output

N.B. We can get the same output by assigning a new datetime column and using it as a sort-by column, but IMO the stable sort with the sorting key is much cleaner.

df3 = df.assign(dummy=pd.to_datetime(df['date'])).sort_values(['dummy', 'value'], ascending=[True, False]).drop(columns='dummy')
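Another option, assuming pandas 1.1 or later (where sort_values gained the key argument): with several sort-by columns, key is applied to each column separately and receives it as a Series, so a single call can convert only the 'date' column by checking the Series name. The helper name date_aware_key is hypothetical, and the explicit month/day/year format is an assumption about the data:

```python
import pandas as pd

df = pd.DataFrame({'date': ['10/1/2024', '10/1/2024', '2/23/2024'], 'value': [0, 1, 0]})

def date_aware_key(s: pd.Series) -> pd.Series:
    # Called once per column in `by`; only the 'date' column is converted
    if s.name == 'date':
        return pd.to_datetime(s, format='%m/%d/%Y')
    return s  # other columns are sorted on their raw values

out = df.sort_values(['date', 'value'], ascending=[True, False], key=date_aware_key)
print(out)
```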

Muhammad Yasirroni · 2023-06-25 15:46:31Z

For those who come here with multi-level (MultiIndex) columns: identify the column to sort by with a tuple containing one element per level:

import pandas as pd

d = {}
d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                data=[[10, 0.89, 0.98, 0.31],
                                      [20, 0.34, 0.78, 0.34]]).set_index('idx')
d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                 data=[[10, 0.29, 0.63, 0.99],
                                       [20, 0.23, 0.26, 0.98]]).set_index('idx')

# The dict keys become the outer level of the column MultiIndex
df = pd.concat(d, axis=1)
df.sort_values(('second_level', 'b'))
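To sort by two or more such columns, which is what the question asks, the same idea extends to a list of tuples. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up two-level column data, built the same way as in the answer
d = {
    'first_level': pd.DataFrame({'a': [1, 1, 2], 'b': [5, 7, 6]}),
    'second_level': pd.DataFrame({'a': [9, 8, 7], 'b': [3, 4, 2]}),
}
df = pd.concat(d, axis=1)

# A list of tuples: each tuple names one column in the MultiIndex
out = df.sort_values(
    [('first_level', 'a'), ('second_level', 'b')],
    ascending=[True, False],
)
print(out)
```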
