
While studying algorithms at my university, I came across an algorithm called Smoothsort. It is a great algorithm: it is adaptive, so its time complexity ranges from linear to linearithmic (n log n), and it is an in-place algorithm as well. However, resources on this algorithm are very scarce, and those that are available are quite hard to understand. The reason I am posting this question is to learn about its use cases: when should it be used and when should it not be used?

That's all I wanted to know and thank you very much for answering or reviewing this question.


2 Answers


musl libc uses Smoothsort to implement qsort(). This Stack Overflow answer from the library author describes the reasoning: it's worst-case O(n log n) (unlike Quicksort), in-place (unlike mergesort), and adaptive (unlike heapsort). I think Smoothsort doesn't get used so much for some combination of

  • Smoothsort is not stable, and in practice stability is often more desirable than sorting in place, because it means the output order in the presence of equal keys is completely determined. musl has the freedom allowed by the C standard not to be stable, and one of its design goals is not to allocate memory except as explicitly requested (this is why musl strstr() doesn't use KMP, for example).

    Note that there exist in-place, stable, O(n log n)-time sorting algorithms, but the constant factor is too big to be practical.

  • Adaptivity is more of a nice-to-have, and Smoothsort is more complicated than heapsort.

  • Smoothsort isn't widely known due to its omission from almost all undergraduate algorithms courses (and even graduate courses – I didn't see it in school, for example).
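If you want to see what the adaptivity point buys in practice, here is a minimal sketch. It does not use Smoothsort itself (there is no Smoothsort in the Python standard library); it uses CPython's built-in sorted(), whose Timsort is also adaptive, purely as a stand-in to compare a random input against a nearly-sorted one:

```python
import random
import time

def time_sort(data):
    """Wall-clock seconds for sorted() on the given list."""
    start = time.perf_counter()
    sorted(data)
    return time.perf_counter() - start

n = 1_000_000
random_input = [random.random() for _ in range(n)]

# Build a nearly-sorted input: sort, then disturb it with a few random swaps.
nearly_sorted = sorted(random_input)
for _ in range(50):
    i, j = random.randrange(n), random.randrange(n)
    nearly_sorted[i], nearly_sorted[j] = nearly_sorted[j], nearly_sorted[i]

print("random input:       ", time_sort(random_input))
print("nearly-sorted input:", time_sort(nearly_sorted))
```

The second timing should come out noticeably lower; an adaptive O(n log n) sort like Smoothsort gives you that kind of behaviour while also being in-place.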


3 Comments

I had no idea anyone used this algorithm in practice! When I last did a deep dive into it the performance numbers weren’t good enough to justify its use. Seems like the library is using it as a replacement for insertion sort after quick sorting down to a certain block size, which is something I hadn’t thought to try. Thanks for sharing!
I didn't test it thoroughly, but it seems that Smoothsort suffers from the same poor locality of reference as heapsort does, which makes it equally impractical.
@David, thanks for answering and making the correction. And I am extremely sorry: I wanted to put n log n in brackets beside the linearithmic term that you added, but after I made the change, your name disappeared. I am sorry for that.

I just came across a very nice use case playing bitburner*, so I figured why not share it.

Imagine you want to optimally invest a sum of money among different investment items, I_1, I_2, ..., I_p, with varying yields. The solution is pretty easy: sort the items by yield, in time p log(p), and invest your money in items from best to worst. Some time later, the weather changes and so do the yields of the items, but continuously so. Solving the problem again with an adaptive sort, starting from the already sorted array, allows a complexity of about p instead of p log(p).
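A minimal sketch of that basic version (all the names, yields and capacities below are made up for illustration, and Python's adaptive sorted() stands in for Smoothsort):

```python
import random

# Hypothetical data: p items, each with a yield per unit of money and a cap
# on how much can be invested in it.
p = 8
yields = {f"I_{i}": random.uniform(0.01, 0.10) for i in range(1, p + 1)}
capacity = {name: random.randint(100, 500) for name in yields}

def allocate(budget, order):
    """Greedily pour money into items from best yield to worst."""
    plan = {}
    for name in order:
        if budget <= 0:
            break
        amount = min(budget, capacity[name])
        plan[name] = amount
        budget -= amount
    return plan

# Initial solve: one full sort, p log(p).
order = sorted(yields, key=yields.get, reverse=True)
print(allocate(1000, order))

# The weather changes: yields drift a little, so the old order is almost correct
# and an adaptive sort re-sorts it in close to p.
for name in yields:
    yields[name] += random.uniform(-0.005, 0.005)
order = sorted(order, key=yields.get, reverse=True)
print(allocate(1000, order))
```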

The full problem is more complicated, but similar. Each item I_i has variants I_i_1, I_i_2, ..., I_i_k, each successive variant with a substantially bigger volume and a slightly lower marginal yield. A brute-force way to solve the problem would be to treat all the marginal steps I_i_(j+1) - I_i_j as different investments and reuse the previous solution, with complexity n log(n), where n = p*k.
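As a sketch, the brute-force version just flattens all the marginal steps into one big list and sorts it (the yields below are fabricated, decreasing within each item by construction):

```python
import random

p, k = 6, 4  # hypothetical sizes

# marginal[i][j]: marginal yield of item i's variant j+1 over variant j.
# Within an item the marginal yields decrease, by assumption.
marginal = [sorted((random.uniform(0.01, 0.10) for _ in range(k)), reverse=True)
            for _ in range(p)]

# Brute force: forget the structure, treat all n = p*k marginal steps as
# independent investments and sort them, n log(n).
steps = [(marginal[i][j], i, j) for i in range(p) for j in range(k)]
steps.sort(reverse=True)
print(steps[:5])  # invest in these steps first
```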

A first improvement is to observe that for any i, the list I_i_1, I_i_2, ..., I_i_k is already in order, so a merge sort of these p lists only costs p * k * log(p).
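With heapq.merge that first improvement is nearly a one-liner; every element goes through a heap of size p, hence the p * k * log(p). Same fabricated data as above, regenerated so the snippet stands alone:

```python
import heapq
import random

p, k = 6, 4
marginal = [sorted((random.uniform(0.01, 0.10) for _ in range(k)), reverse=True)
            for _ in range(p)]

# Each item's list is already in (decreasing) order, so a p-way merge is enough.
# Yields are negated because heapq.merge produces ascending output.
per_item = [[(-marginal[i][j], i, j) for j in range(k)] for i in range(p)]
best_first = [(-y, i, j) for (y, i, j) in heapq.merge(*per_item)]
print(best_first[:5])
```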

If p is much bigger than k, you may want to do the following instead: sort I_1_1, I_2_1, ..., I_p_1, then, for each j, sort I_1_j, I_2_j, ..., I_p_j with an adaptive sort, starting from the order of the previous variants. Then merge everything back. The total cost should be p log(p) + k * p + p * k * log(k), which is dominated by p * k * log(k).
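Here is a rough sketch of that variant. The yields are made up so that the item ordering barely changes from one variant layer to the next (the "continuous" assumption above), which is what lets the adaptive re-sort of each layer run in about p:

```python
import heapq
import random

p, k = 200, 5  # hypothetical: many items, few variants

# marginal[i][j]: marginal yield of variant j of item i, decreasing in j,
# and varying smoothly enough that the item ordering changes little per layer.
base = [random.uniform(0.05, 0.10) for _ in range(p)]
marginal = [[base[i] * (0.9 ** j) * random.uniform(0.98, 1.02) for j in range(k)]
            for i in range(p)]

# Layer 0: full sort of the first variants, p log(p).
order = sorted(range(p), key=lambda i: marginal[i][0], reverse=True)
layers = [[(-marginal[i][0], i, 0) for i in order]]

# Layers 1..k-1: re-sort the previous order by the next variant's yield.
# That order is already nearly correct, so an adaptive sort does it in about p.
for j in range(1, k):
    order = sorted(order, key=lambda i: marginal[i][j], reverse=True)
    layers.append([(-marginal[i][j], i, j) for i in order])

# Merge the k sorted layers of length p each: about p * k * log(k).
best_first = [(-y, i, j) for (y, i, j) in heapq.merge(*layers)]
print(best_first[:5])
```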

Since we're only taking things in order, we don't actually need a full merge sort; we only need to merge priority queues/heaps. Since we may end up popping all, or at least a non-negligible fraction, of the variants, it makes no difference: merging heaps here is just one particular way to merge ordered trees.

A third approach, capitalizing on this observation, is to hold only (at most) p elements in the heap at any time, one per item. You initialize the heap with I_1_1, I_2_1, ..., I_p_1 (with the best marginal yield at the root). Then, as long as you have money, pop the root and push the next best variant of that item. Depending on which kind of max-heap you use, building it costs between p and p log(p); at least one of popping and pushing is O(log(p)), and the two can be combined into a decrease-key operation. So overall, the worst-case scenario should be p * k * log(p).
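A sketch of that third approach with Python's binary-heap heapq (a Leonardo heap would do the same job; the budget and per-step capacity are made up):

```python
import heapq
import random

p, k = 6, 4
budget, step_capacity = 1500, 100   # hypothetical numbers
marginal = [sorted((random.uniform(0.01, 0.10) for _ in range(k)), reverse=True)
            for _ in range(p)]

# At most one entry per item in the heap: (negated yield, item, variant index).
heap = [(-marginal[i][0], i, 0) for i in range(p)]
heapq.heapify(heap)                              # building it: O(p)

plan = []
while budget > 0 and heap:
    y, i, j = heapq.heappop(heap)                # best remaining step, O(log p)
    amount = min(budget, step_capacity)
    plan.append((i, j, amount))
    budget -= amount
    if j + 1 < k:
        heapq.heappush(heap, (-marginal[i][j + 1], i, j + 1))  # O(log p)

print(plan)
```

When the pop is always followed by a push, heapq.heapreplace combines the two into a single sift, which is the binary-heap counterpart of the decrease-key operation mentioned above.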

In my understanding, a Leonardo max-heap, the data structure underlying Smoothsort, has a very low cost for the non-standard decrease-key operation when the decrease does not change the order of elements much: much closer to O(1) than to O(log(p)). I haven't looked into many other heaps, but it stands to reason that the one yielding an almost adaptive sorting algorithm would have one of the best decrease-key operations (again, in a max-heap, where increase-key is the easier operation). This should give a linear solution, of complexity p * k.
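The Leonardo-heap machinery is too much to show here, but the intuition behind a cheap decrease-key is easy to see even on a plain binary max-heap (this is not Smoothsort's structure, just an illustration): when the new key is only slightly smaller, the sift-down stops almost immediately.

```python
def decrease_root(heap, new_value):
    """Decrease the root of a binary max-heap stored as a list, sift it down,
    and return the number of swaps performed."""
    heap[0] = new_value
    i, swaps, n = 0, 0, len(heap)
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return swaps
        heap[i], heap[largest] = heap[largest], heap[i]
        i, swaps = largest, swaps + 1

heap = [100, 60, 80, 30, 50, 70, 75]   # a valid max-heap
print(decrease_root(heap, 99))          # tiny decrease: 0 swaps, order unchanged
print(decrease_root(heap, 40))          # big decrease: the sift has to go further
```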

There should be an even better solution, in about p * log(p * k) to p * log^2(p * k), using a binary search on a marginal yield threshold, if I'm not mistaken, but it doesn't involve much sorting, and I think no Smoothsort at all, so I'll leave it out.

In any case, any solution benefits from adaptive sorting when solving a problem on a data set close to one already solved.

(*) For the curious player, money is RAM, items are servers to hack, and variants are server saturation.
