Pandas DataFrame ordering and sorting of column values

Question

Does anyone know a good way to sort a pandas DataFrame in the following way?

I have the following data, in no particular order, with an id that appears multiple times and a label that is either 0 or 1:

id | label
------ | ------ 
1 | 1
1 | 0
1 | 0
2 | 1
2 | 0
2 | 0
3 | 0
3 | 0
3 | 0
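
For anyone who wants to reproduce this, the table above can be built as a DataFrame. This is only a minimal sketch: it assumes both columns hold plain integers and that the frame is called `df`.

```python
import pandas as pd

# Sample data copied from the table above; the name `df` is an assumption.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "label": [1, 0, 0, 1, 0, 0, 0, 0, 0],
})
print(df)
```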

I would like to sort the labels in ascending order and also sort the ids in ascending order, but without grouping the rows by id, like this:

id | label
------ | ------ 
1 | 0
2 | 0
3 | 0
1 | 0
2 | 0
3 | 0
3 | 0
1 | 1
2 | 1

Thanks in advance!

Scott Boston · Accepted Answer · 2017-08-17 13:42:37Z

3

First sort by id and label, then use cumcount to number the rows within each id (0, 1, 2, ...) and set that numbering as the index. Then sort on the index and finally by label.

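# Within each id, put label 0 rows before label 1 rows.
df_sorted = df.sort_values(by=['id', 'label'])
# Number the rows of each id on the sorted frame so the numbering matches the row order.
# mergesort is stable, so each later sort keeps the earlier order among ties.
df_out = df_sorted.set_index(df_sorted.groupby('id').cumcount())\
  .sort_index(kind='mergesort')\
  .sort_values(by='label', kind='mergesort')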

Output:

   id  label
0   1      0
0   2      0
0   3      0
1   1      0
1   2      0
1   3      0
2   3      0
2   1      1
2   2      1
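
The same ordering can also be sketched with a helper column and a single multi-column sort, which avoids relying on the stability of chained sorts. This is an illustrative variant, not the code from the answer: the function name `interleave_sort` and the column name `rank` are made up for the example, and it assumes the frame has no existing `rank` column.

```python
import pandas as pd


def interleave_sort(df: pd.DataFrame) -> pd.DataFrame:
    """Order rows by label, then by position within each id, then by id."""
    # Within each id, put label 0 rows before label 1 rows.
    ordered = df.sort_values(["id", "label"])
    # Number the rows of each id 0, 1, 2, ... in that order.
    # "rank" is a hypothetical helper column, dropped before returning.
    ordered = ordered.assign(rank=ordered.groupby("id").cumcount())
    # One multi-column sort replaces the two chained sorts.
    return (
        ordered.sort_values(["label", "rank", "id"])
        .drop(columns="rank")
        .reset_index(drop=True)
    )


df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "label": [1, 0, 0, 1, 0, 0, 0, 0, 0],
})
result = interleave_sort(df)
# Shuffling the input first should not change the outcome.
shuffled_result = interleave_sort(df.sample(frac=1, random_state=0))
print(result)
```

Because the helper column is computed on the already sorted frame, the input can arrive in any row order. The result uses a fresh 0..n-1 index rather than the repeated cumcount index shown above.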


3 Comments

SirTobi Over a year ago

That works perfectly, thank you very much! You are awesome.

Bharath M Shetty Over a year ago

I knew it was definitely related to cumcount. I tried but failed. This is fantastic using sort_index.

Scott Boston Over a year ago

Thank you. I recognized the pattern in OP's results. To me this is why people need to explain their logic and put sample inputs with expected outputs.
