I run into a performance bottleneck in a program. I have to generate all permutations for a given sequence with size up to n=11.
My code:
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>
int factorial(int n)
{
    if (n == 0) {
        return 1;
    }
    return n * factorial(n - 1);
}
std::vector<std::vector<int>> generatePermutations(int size)
{
    std::vector<int> sequence(size);
    std::iota(sequence.begin(), sequence.end(), 1);
    std::vector<std::vector<int>> permutations;
    permutations.reserve(factorial(sequence.size()));
    do {
        permutations.emplace_back(sequence);
    } while (std::next_permutation(sequence.begin(), sequence.end()));
    return permutations;
}
int main()
{
    for (int i = 2; i <= 11; ++i) {
        auto t1 = std::chrono::high_resolution_clock::now();
        auto permutations = generatePermutations(i);
        auto t2 = std::chrono::high_resolution_clock::now();
        std::cout << "i:\t" << i << "\tpermutation count:\t" << permutations.size()
                  << "\ttime ms:\t"
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t2 -
                                                                     t1).count() << '\n';
    }
}
The Output:
i:      2       permutation count:      2       time ms:        0
i:      3       permutation count:      6       time ms:        0
i:      4       permutation count:      24      time ms:        0
i:      5       permutation count:      120     time ms:        0
i:      6       permutation count:      720     time ms:        0
i:      7       permutation count:      5040    time ms:        1
i:      8       permutation count:      40320   time ms:        10
i:      9       permutation count:      362880  time ms:        94
i:      10      permutation count:      3628800 time ms:        944
i:      11      permutation count:      39916800        time ms:        10569
As you can see the permutations count gets big very fast because the permutation count is array.size!, e.g. 1*2*3*4*5*6*7*8*9*10*11 for n=11
I need to be a lot faster than 10s for n = 11.
I wonder is there a faster way than using std::next_permutation?
Or is there an faster approach with threading?
edit:
Since the requirements were stated as not clear for performance optimizations I want to summarize them here:
-generate all permutations of a given sequence.
-generated permutations don't need to be in a specific order. Currently I use std::next_permuation which generates them in order. Is there an algorithm which does it faster in not specific order?
-generated permutations must be possible to traverse over. In my code I used a std::vector<int> to store each permutation. Any other container is fine too if it is faster.
-generated permutations need to be read but never modified or deleted.
-generation needs to be very fast because the count of permutations gets very high (~40mio permutations for n=11)
generatePermutations()be used for? Otherwise, have a look at stdlib's implementation and, say, Python's "on-demand" one, and guess how much room for improvement is left. \$\endgroup\$main? \$\endgroup\$