How to sort pandas DataFrame by two columns collectively, rather than one then the other?

Question

Say I have a dataframe, df, as below:

     Client    Number (#)    Volume ($)    Num. z-Score    Vol. z-Score

0    ABC       63            131.22        1.17            0.68
1    DEF       44            98.71         2.68            1.35
2    JKL       17            64.15         0.45            0.57
3    PQR       75            180.47        0.88            1.43
4    XYZ       28            75.93         0.23            3.96

I would like to sort it such that the maximum values of either of the last two columns appear as the first row. As z-Score tracks deviation from the mean with respect to SD, I am looking for the greatest deviation for either of my two measures (number, volume) and would rather not prioritise them.

For instance, the current method I am using: df.sort_values(['Num. z-Score','Vol. z-Score'], ascending=False) , is discriminatory as it sorts by 'Num. z-Score' first and would ONLY look at 'Vol. z-Score' if there were any equal values.

How can I instead sort the column in such a way that the final result looks as below:

     Client    Number (#)    Volume ($)    Num. z-Score    Vol. z-Score

4    XYZ       28            75.93         0.23            3.96
1    DEF       44            98.71         2.68            1.35
0    ABC       63            131.22        1.17            0.68
3    PQR       75            180.47        0.88            1.03
2    JKL       17            64.15         0.45            0.57

Any help would be greatly appreciated :)

Yes, row 0 has a max z-score of 1.17 (num) whereas row 3 has a max of 1.03 (vol) — vicefreak04
– vicefreak04, Commented Nov 1, 2022 at 0:22

jezrael · Accepted Answer · 2022-10-31 06:28:14Z

Use DataFrame.sort_index with maximum of both columns:

df = df.sort_index(key=df[['Vol. z-Score','Num. z-Score']].max(axis=1).get, ascending=False)
print (df)
  Client  Number (#)  Volume ($)  Num. z-Score  Vol. z-Score
4    XYZ          28       75.93          0.23          3.96
1    DEF          44       98.71          2.68          1.35
3    PQR          75      180.47          0.88          1.43
0    ABC          63      131.22          1.17          0.68
2    JKL          17       64.15          0.45          0.57

Alternative solution with Series.argsort and negative values for descending order:

df = df.iloc[(-df[['Vol. z-Score','Num. z-Score']].max(axis=1)).argsort()]
print (df)
  Client  Number (#)  Volume ($)  Num. z-Score  Vol. z-Score
4    XYZ          28       75.93          0.23          3.96
1    DEF          44       98.71          2.68          1.35
3    PQR          75      180.47          0.88          1.43
0    ABC          63      131.22          1.17          0.68
2    JKL          17       64.15          0.45          0.57

crx91 · Accepted Answer · 2022-10-31 06:01:40Z

1

Looks like you're trying to find the max value between 'Num. z-Score' and 'Vol. z-Score'.

df['max_value'] = df[['Num. z-Score','Vol. z-Score']].max(axis=1)
df.sort_values(['max_value'], ascending=False)

answered Oct 31, 2022 at 6:01

crx91

4832 silver badges8 bronze badges

Collectives™ on Stack Overflow

How to sort pandas DataFrame by two columns collectively, rather than one then the other?

2 Answers 2

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Related