Pandas: How to custom-sort on multiple columns? [duplicate]

Question

I have a pandas dataframe with data like:

+-----------+-----------------+---------+
| JOB-NAME  |  Status         | SLA     |
+-----------+-----------------+---------+
| job_1     | YET_TO_START    | ---     |
| job_3     | COMPLETED       | MET     |
| job_4     | RUNNING         | MET     |
| job_2     | YET_TO_START    | LATE    |
| job_6     | RUNNING         | LATE    |
| job_5     | FAILED          | LATE    |
| job_7     | YET_TO_START    | ---     |
| job_8     | COMPLETED       | NOT_MET |
+-----------+-----------------+---------+

I need to sort this table based on the Status and SLA states, like for Status: FAILED will be top on the table, then YET_TO_START, then RUNNING, and finally COMPLETED. Similarly for SLA the order will be LATE, ---, NOT_MET, and MET. Like this:

+-----------+-----------------+---------+
| JOB-NAME  |  Status         | SLA     |
+-----------+-----------------+---------+
| job_5     | FAILED          | LATE    |
| job_2     | YET_TO_START    | LATE    |
| job_1     | YET_TO_START    | ---     |
| job_7     | YET_TO_START    | ---     |
| job_6     | RUNNING         | LATE    |
| job_4     | RUNNING         | MET     |
| job_8     | COMPLETED       | NOT_MET |
| job_3     | COMPLETED       | MET     |
+-----------+-----------------+---------+

I am able to do this custom sorting priority-based only on single column Status, but unable to do for multiple columns.

sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3}
joined_df = joined_df.sort_values(by=['status'], key=lambda x: x.map(sort_order_dict))

A solution is given here, but its for single column, not multiple column.

jezrael · Accepted Answer · 2022-11-29 09:24:36Z

You can extend dictionary by values from another columns, only necessary different keys in both columns for correct working like mentioned mozway in comments:

sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3,
                   "LATE":4, "---":5, "NOT_MET":6, "MET":7}
df = df.sort_values(by=['Status','SLA'], key=lambda x: x.map(sort_order_dict))
print (df)
  JOB-NAME        Status      SLA
5    job_5        FAILED     LATE
3    job_2  YET_TO_START     LATE
0    job_1  YET_TO_START      ---
6    job_7  YET_TO_START      ---
4    job_6       RUNNING     LATE
2    job_4       RUNNING      MET
7    job_8     COMPLETED  NOT_MET
1    job_3     COMPLETED      MET

Or use ordered Categorical:

df['Status'] = pd.Categorical(df['Status'], ordered=True, 
                              categories=['FAILED', 'YET_TO_START', 'RUNNING', 'COMPLETED'])
                                          
df['SLA'] = pd.Categorical(df['SLA'], ordered=True, 
                              categories= ['LATE', '---', 'NOT_MET', 'MET'])
df = df.sort_values(by=['Status','SLA'])
print (df)
  JOB-NAME        Status      SLA
5    job_5        FAILED     LATE
3    job_2  YET_TO_START     LATE
0    job_1  YET_TO_START      ---
6    job_7  YET_TO_START      ---
4    job_6       RUNNING     LATE
2    job_4       RUNNING      MET
7    job_8     COMPLETED  NOT_MET
1    job_3     COMPLETED      MET

I like the idea of using a single dictionary, but the caveat is that if you have common items in both columns and want a different order, you won't be able to have duplicate keys.
@mozway - Then is possible use another solution. Also obiously different values.

mozway · Accepted Answer · 2022-11-29 09:12:02Z

Use numpy.lexsort, you can use any number of parameters easily:

import numpy as np

sort_order_dict = {"FAILED":0, "YET_TO_START":1, "RUNNING":2, "COMPLETED":3}
sort_order_dict2 = {'LATE': 0, '---': 1, 'NOT_MET': 2, 'MET': 3}

order = np.lexsort([df['SLA'].map(sort_order_dict2),
                    df['Status'].map(sort_order_dict),
                    ])

out = df.iloc[order]

Output:

  JOB-NAME        Status      SLA
5    job_5        FAILED     LATE
3    job_2  YET_TO_START     LATE
0    job_1  YET_TO_START      ---
6    job_7  YET_TO_START      ---
4    job_6       RUNNING     LATE
2    job_4       RUNNING      MET
7    job_8     COMPLETED  NOT_MET
1    job_3     COMPLETED      MET

Great solution! A small tip is that the primary key that needs to be sorted should be placed at the end.
@Jeremy yes, lexsort reverses the order of the key. In fact it's logical in terms of the sorting steps assuming successive sorts, the last sorting step is the significant one.

Collectives™ on Stack Overflow

Pandas: How to custom-sort on multiple columns? [duplicate]

2 Answers 2

3 Comments

2 Comments

Hot Network Questions