See the example below: sorting an already sorted DataFrame by the same column gives a different row order. Why does this happen, and how can I prevent it?

>>> df = pd.DataFrame({'a': list(range(150)), 'b': [1, 2, 3] * 50})
>>> df.sort_values('b').equals(df.sort_values('b').sort_values('b'))
False
>>> df.sort_values('b').head()
       a  b
0      0  1
39    39  1
42    42  1
45    45  1
132  132  1
>>> df.sort_values('b').sort_values('b').head()
       a  b
0      0  1
87    87  1
120  120  1
84    84  1
81    81  1

2 Answers

It works for me if I specify mergesort, which is the only stable sorting method in DataFrame.sort_values. When sorting by only one column, the default method is kind='quicksort', which is not stable:

kind{‘quicksort’, ‘mergesort’, ‘heapsort’}, default quicksort

Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.

When sorting by multiple columns, the default is mergesort.

print(df.sort_values('b', kind='mergesort').head())
     a  b
0    0  1
3    3  1
6    6  1
9    9  1
12  12  1

print(df.sort_values('b', kind='mergesort').sort_values('b', kind='mergesort').head())
     a  b
0    0  1
3    3  1
6    6  1
9    9  1
12  12  1
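
The same check as in the question can be repeated with the stable algorithm. This is only a minimal sketch, assuming a pandas version that accepts kind='mergesort'; the names once, twice, is_idempotent and first_rows are illustrative and not part of pandas:

```python
import pandas as pd

# Same data as in the question: column b has many tied values.
df = pd.DataFrame({'a': list(range(150)), 'b': [1, 2, 3] * 50})

# Sort once and then sort the result again, both times with the stable algorithm.
once = df.sort_values('b', kind='mergesort')
twice = once.sort_values('b', kind='mergesort')

# A stable sort keeps tied rows in their input order,
# so sorting an already sorted frame should change nothing.
is_idempotent = once.equals(twice)
first_rows = once['a'].head().tolist()

print(is_idempotent)  # True
print(first_rows)     # [0, 3, 6, 9, 12]
```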

1 Comment

Okay thanks for the explanation! I wish they made the stable algorithm the default, seems to me like premature optimization with a hard to debug side-effect.

This should be a comment, but it is too long.

According to the docs for DataFrame.sort_values:

kind: .. mergesort is the only stable algorithm.

You are getting different results for column a because there is no guarantee that the order of equal elements in column b will be retained during sorting. Since column b contains only three distinct values (1, 2 and 3), each repeated 50 times, the relative order of rows that share the same b value is undetermined. You can either use mergesort, as suggested by jezrael, or sort by column b and then by column a.
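
The second option could look roughly like the sketch below. It relies on column a being unique in the question's data, so every (b, a) pair is distinct and no ties remain; the variable names are illustrative only:

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'a': list(range(150)), 'b': [1, 2, 3] * 50})

# Sort by b first and use a as a tie-breaker.
# Because a is unique here, the resulting order is fully determined.
once = df.sort_values(['b', 'a'])
twice = once.sort_values(['b', 'a'])

same = once.equals(twice)
head_a = once['a'].head().tolist()

print(same)    # True
print(head_a)  # [0, 3, 6, 9, 12]
```

If the tie-breaker column itself contained duplicates, some rows would still be tied, so this approach depends on the extra column(s) making each row's sort key unique.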

Also, please see Quick Sort vs Merge Sort for additional info. The most important point regarding your question is:

  1. Stability : Merge sort is stable as two elements with equal value appear in the same order in sorted output as they were in the input unsorted array.
    Quick sort is unstable in this scenario.
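
To make the stability property concrete, here is a small sketch using NumPy directly, assuming a NumPy version where np.argsort accepts kind='stable'. The array keys is made-up example data:

```python
import numpy as np

# Keys with ties: the value 1 is at positions 1, 3, 5 and the value 2 at 0, 2, 4.
keys = np.array([2, 1, 2, 1, 2, 1])

# A stable argsort returns tied positions in their original left-to-right order.
order = np.argsort(keys, kind='stable')

print(order.tolist())  # [1, 3, 5, 0, 2, 4]
```

An unstable algorithm would still put all the 1s before the 2s, but it gives no promise about the order of positions within each group, which is what shows up as the shuffled column a in the question.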
