I run into a performance bottleneck in a program. I have to generate all permutations for a given sequence with size up to n=11.
My code:
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>
int factorial(int n)
{
if (n == 0) {
return 1;
}
return n * factorial(n - 1);
}
std::vector<std::vector<int>> generatePermutations(int size)
{
std::vector<int> sequence(size);
std::iota(sequence.begin(), sequence.end(), 1);
std::vector<std::vector<int>> permutations;
permutations.reserve(factorial(sequence.size()));
do {
permutations.emplace_back(sequence);
} while (std::next_permutation(sequence.begin(), sequence.end()));
return permutations;
}
int main()
{
for (int i = 2; i <= 11; ++i) {
auto t1 = std::chrono::high_resolution_clock::now();
auto permutations = generatePermutations(i);
auto t2 = std::chrono::high_resolution_clock::now();
std::cout << "i:\t" << i << "\tpermutation count:\t" << permutations.size()
<< "\ttime ms:\t"
<< std::chrono::duration_cast<std::chrono::milliseconds>(t2 -
t1).count() << '\n';
}
}
The Output:
i: 2 permutation count: 2 time ms: 0
i: 3 permutation count: 6 time ms: 0
i: 4 permutation count: 24 time ms: 0
i: 5 permutation count: 120 time ms: 0
i: 6 permutation count: 720 time ms: 0
i: 7 permutation count: 5040 time ms: 1
i: 8 permutation count: 40320 time ms: 10
i: 9 permutation count: 362880 time ms: 94
i: 10 permutation count: 3628800 time ms: 944
i: 11 permutation count: 39916800 time ms: 10569
As you can see the permutations count gets big very fast because it is array.size!.
I need to be a lot faster than 10s for n = 11.
I wonder is there a faster way than using std::next_permutation?
Or is there an faster approach with threading?
edit: Other requirements I forgot which maybe help.
The generated permuations can stay const the rest of the program I don't modify them at all after.
Data structure can change. It does not need to be necessary std::vector.
I need to read the generated permutations frequently.