Does anyone know a good way to sort a pandas DataFrame in the following way?

I have the following randomly sorted data, in which each id appears multiple times and each label is either 0 or 1:

id | label
------ | ------ 
1 | 1
1 | 0
1 | 0
2 | 1
2 | 0
2 | 0
3 | 0
3 | 0
3 | 0

I would like to sort by label in ascending order and, within each label, cycle through the ids in ascending order instead of grouping identical ids together, like this:

id | label
------ | ------ 
1 | 0
2 | 0
3 | 0
1 | 0
2 | 0
3 | 0
3 | 0
1 | 1
2 | 1

Thanks in advance!

1 Answer

First sort by id and label, then use groupby('id').cumcount() to build an index that numbers each id's rows 0, 1, 2, ..., then sort on that index and finally by label.

df_out = (
    df.sort_values(by=['id', 'label'])
      .set_index(df.groupby('id').cumcount())  # running count of rows per id
      .sort_index()
      .sort_values(by='label')
)

Output:

   id  label
0   1      0
0   2      0
0   3      0
1   1      0
1   2      0
1   3      0
2   3      0
2   1      1
2   2      1
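
For anyone who wants to try this end to end, here is a self-contained sketch of the same idea, assuming the sample data from the question (shuffled here to mimic randomly sorted input). It differs from the snippet above in two ways: the counter is computed after the initial sort, and the final ordering is done with one multi-key sort, so the result should not depend on the incoming row order or on sort stability. The helper name `round_robin_sort` and the temporary `_pos` column are made up for illustration, and the closing `reset_index` is optional.

```python
import pandas as pd

# Sample data from the question, shuffled to mimic "randomly sorted" input.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "label": [1, 0, 0, 1, 0, 0, 0, 0, 0],
}).sample(frac=1, random_state=0)


def round_robin_sort(frame: pd.DataFrame) -> pd.DataFrame:
    """Sort by label, cycling through ids within each label (hypothetical helper)."""
    # Order each id's rows by label so the counter below is assigned consistently.
    ordered = frame.sort_values(["id", "label"])
    # Position of each row within its id: 0, 1, 2, ...
    ordered = ordered.assign(_pos=ordered.groupby("id").cumcount())
    # A single multi-key sort: label first, then the per-id position, then id.
    return (
        ordered.sort_values(["label", "_pos", "id"])
               .drop(columns="_pos")
               .reset_index(drop=True)
    )


result = round_robin_sort(df)
print(result)
```

With the question's data, the id column of `result` should read 1, 2, 3, 1, 2, 3, 3, 1, 2, with the seven 0 labels first and the two 1 labels last.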
3 Comments

That works perfectly, thank you very much! You are awesome.
I knew it was definitely related to cumcount. I tried but failed. This is fantastic using sort_index.
Thank you. I recognized the pattern in OP's results. To me this is why people need to explain their logic and put sample inputs with expected outputs.
