A while ago, I found a somewhat interesting sorting algorithm named Exact-Sort, which is introduced with the following somewhat bold claim:
No sort algorithm can sort an array with less position changing (relocating) than Exact-Sort.
While the claim is perhaps not as impressive as the fact that the website is still hosted on Geocities, it still caught my interest and I decided to have a look at it. Note that even though it performs few moves, it performs an unreasonable number of comparisons and uses additional memory, so it's only suitable to sort objects that are really expensive to move and cheap to compare.
How does it work?
Here is how the algorithm works: it picks the first element and counts the number of elements smaller than this one to compute its final position once everything is sorted (plus some additional trickery to handle duplicate elements). Then it puts the element at that position in a temporary variable and moves the first element in its final position. Then it starts again with the element that was at the final position of the first elements. It does so until the final position of one element corresponds to the first position. Then, it looks for another unsorted element and performs another such cycle. Then it continues to do so until the collection is sorted.
Yes, but...
Unfortunately, the algorithm doesn't (really) hold its promise: in this form, it does roughly a number of moves equivalent to that of a selection sort. The problem is that it swaps the elements once it has found its position, which means that it always has to move an element to a temporary location before moving it back into the collection. In other words, while it indeed relocates an object only once, it still has to swap it with a temporary location, which means that it performs two additional moves. Ideally, we should have to store only one element per cycle in a temporary location and move all the other ones directly from their original position to their final position.
To solve this problem, I have devised a variant of the algorithm that stacks the positions of the elements to move and actually moves them only once the end of a cycle has been reached. It takes more memory to store the positions, but it should always perform between \$0\$ and \$\frac{2}{3}(n - 1)\$ move operations to sort the collection (where \$n\$ is the size of the collection). In comparison, a typical selection sort performs \$3n\$ move operations.
Implementation
Enough talk, here is my C++14 implementation of this variant of the algorithm:
#include <cstddef>
#include <iterator>
#include <stack>
#include <utility>
#include <vector>
// Returns the index of the position of the nth element of the
// collection not already in its place
template<typename RandomAccessIterator>
auto real_index(RandomAccessIterator first, RandomAccessIterator last,
                const std::vector<bool>& sorted, std::size_t n)
    -> RandomAccessIterator
{
    // Number of encountered elements not already sorted
    std::size_t unsorted = 0;
    for (std::size_t i = 0 ; i < sorted.size() ; ++i)
    {
        if (not sorted[i])
        {
            ++unsorted;
        }
        if (unsorted == n + 1)
        {
            return first + i;
        }
    }
    return last;
}
// Returns the index of the first element in the collection
// that hasn't been sorted yet
template<typename RandomAccessIterator>
auto first_false(RandomAccessIterator first, RandomAccessIterator last,
                 const std::vector<bool>& sorted)
    -> RandomAccessIterator
{
    return real_index(first, last, sorted, 0);
}
// Returns the destination of the given value, where the destination
// corresponds to the final position of the given value once the whole
// collection has been sorted
template<typename RandomAccessIterator, typename T, typename Compare>
auto get_destination(RandomAccessIterator first, RandomAccessIterator last,
                     Compare compare, const std::vector<bool>& sorted,
                     const T& value, RandomAccessIterator start)
    -> RandomAccessIterator
{
    // Number of unsorted elements smaller elements than value
    std::size_t count = 0;
    for (auto it = first ; it != last ; ++it)
    {
        if (not sorted[it - first] && compare(*it, value) && it != start)
        {
            ++count;
        }
    }
    return real_index(first, last, sorted, count);
}
template<typename RandomAccessIterator, typename Compare>
auto exact_sort(RandomAccessIterator first, RandomAccessIterator last,
                Compare compare)
    -> void
{
    if (first == last) return;
    // Which elements are already sorted, and which ones still
    // need to be sorted
    std::vector<bool> sorted(std::distance(first, last), false);
    // Element where the current cycle starts
    RandomAccessIterator start = first;
    // Stack of elements, top is the current element
    std::stack<
        RandomAccessIterator,
        std::vector<RandomAccessIterator>
    > positions;
    while (true)
    {
        RandomAccessIterator dest; // Final destination of the current element
        if (positions.empty())
        {
            dest = get_destination(first, last, compare, sorted,
                                   *start, start);
        }
        else
        {
            dest = get_destination(first, last, compare, sorted,
                                   *positions.top(), start);
        }
        // There is nothing else to sort
        if (dest == last) return;
        // Mark the destination as "sorted"
        sorted[dest - first] = true;
        // When the beginning of the current cycle is the same as the
        // destination of the element to sort, we have reached the end
        // of the cycle
        if (dest == start)
        {
            // If the stack is empty, it means that the starting point
            // is already in its final position, do nothing
            if (not positions.empty())
            {
                // Move elements to their final positions
                auto tmp = std::move(*dest);
                while (not positions.empty())
                {
                    *dest = std::move(*positions.top());
                    dest = positions.top();
                    positions.pop();
                }
                *dest = std::move(tmp);
            }
            // The next cycle starts at the first unsorted element
            // of the collection
            start = first_false(first, last, sorted);
            // If there is no such element, it means that the collection
            // is already sorted
            if (start == last) return;
        }
        else
        {
            // Push the destination on the top of the stack
            positions.push(dest);
        }
    }
}
My original implementation also handles projections but it's a bit difficult to explain and trivial to add (and almost impossible to get wrong), so I skipped it for the review. All of these functions live in a detail namespace and are only given values from other functions (the users never use them directly), otherwise I would also have provided a version that doesn't take a comparator.
Conclusion
To sum up, this algorithm should always perform an optimal number of move operations (the ideal algorithm to sort fridges by price says the original description on Geocities), but always performs \$O(n²)\$ comparisons and uses \$O(n)\$ auxiliary memory.
Is there any way I could improve this algorithm (performance, efficiency, correctness, etc...) without losing its move-optimal property?
Trivia: Apparently, the original Exact-Sort corresponds to another sorting algorithm known as cycle sort, except that Exact-Sort stores a boolean array to perform an optimized lookup. Otherwise, the cycles logic and the aim of both algorithms (minimal number of writes to the original array) are pretty much the same. I realize that my algorithm has a slightly different goal, so it might be a good idea to find another name for it.
