Code Review
I have some small nitpicks, but otherwise it looks good.
Use string_view
As Deduplicator mentioned, it is better to clarify that the function does not expect exactly a std::string, just a contiguous sequence of chars with a size, so take std::string_view instead.
Structured binding
Even though the map is short-lived in this case, it is better to give the .first and .second members descriptive names:
for (const auto& [value, occurrence_count]: counter_map)
Misspelling std::size_t
This is an even more minor one, but perhaps with the advent of modules this might change.
Consumption
Although this is not intended to be consumed as a library, it would be great to have a basic CMakeLists.txt file that specifies the build requirements.
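A minimal sketch of such a file (the project name, target name, and source file name are my assumptions, not taken from the repo):

```cmake
cmake_minimum_required(VERSION 3.16)
project(is_palindrome_benchmark CXX)

# std::pmr::monotonic_buffer_resource and structured bindings require C++17.
add_executable(benchmark main.cpp)
target_compile_features(benchmark PRIVATE cxx_std_17)
```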
Benchmarks
Without further ado, here is my benchmark code (with some slight modifications to the reviewed functions):
#include <algorithm>
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <memory_resource>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>
#include "input_gen.h"
// Based on https://codereview.stackexchange.com/q/272128
// from https://codereview.stackexchange.com/users/58360/coderodde
bool is_permutation_palindrome_original(const std::string& text)
{
const std::size_t buffer_size = 8 * 1024;
std::array<std::byte, buffer_size> scratch;
std::pmr::monotonic_buffer_resource resource(scratch.data(), buffer_size);
std::pmr::unordered_map<char, std::size_t> counter_map(&resource);
// counter_map.reserve(256);
for (const auto ch : text)
++counter_map[ch];
std::size_t number_of_odd_chars = 0;
for (const auto &pair: counter_map)
if (pair.second % 2 == 1)
{
++number_of_odd_chars;
if (number_of_odd_chars > 1)
return false;
}
return true;
}
// based on https://codereview.stackexchange.com/a/272130
// from https://codereview.stackexchange.com/users/42409/deduplicator
bool is_permutation_palindrome_array(std::string_view s) noexcept {
unsigned char counts[1u + (unsigned char)-1] {};
for (unsigned char c : s)
++counts[c];
return std::count_if(std::begin(counts), std::end(counts), [](auto a){ return a % 2; }) < 2;
}
// https://codereview.stackexchange.com/a/272129
// from https://codereview.stackexchange.com/users/129343/g-sliepen
bool is_permutation_palindrome_bitset(const std::string& text)
{
std::bitset<256> odd_characters;
for (const auto ch : text)
odd_characters.flip(static_cast<std::uint8_t>(ch));
return odd_characters.count() <= 1;
}
#include <chrono>
namespace chrono = std::chrono;
struct sample_t {
chrono::nanoseconds original_time;
chrono::nanoseconds array_time;
chrono::nanoseconds bitset_time;
};
sample_t measure(const std::string& input) {
sample_t sample{};
{
auto start_time = chrono::steady_clock::now();
volatile bool is_pal = is_permutation_palindrome_original(input);
auto end_time = chrono::steady_clock::now();
sample.original_time = chrono::duration_cast<chrono::nanoseconds>(end_time - start_time);
}
{
auto start_time = chrono::steady_clock::now();
volatile bool is_pal = is_permutation_palindrome_array(input);
auto end_time = chrono::steady_clock::now();
sample.array_time = chrono::duration_cast<chrono::nanoseconds>(end_time - start_time);
}
{
auto start_time = chrono::steady_clock::now();
volatile bool is_pal = is_permutation_palindrome_bitset(input);
auto end_time = chrono::steady_clock::now();
sample.bitset_time = chrono::duration_cast<chrono::nanoseconds>(end_time - start_time);
}
return sample;
}
struct metric_t {
chrono::nanoseconds min = chrono::nanoseconds(std::numeric_limits<std::int64_t>::max());
chrono::nanoseconds max = chrono::nanoseconds(0);
chrono::nanoseconds sum = chrono::nanoseconds(0);
void update(chrono::nanoseconds new_sample) {
if (min > new_sample) {
min = new_sample;
}
if (max < new_sample) {
max = new_sample;
}
sum += new_sample;
}
};
#include <sstream>
#include <fstream>
int main(int argc, char* argv[]) {
if (argc != 4) {
std::cerr << "usage: " << argv[0] << " <input-size> <run-count> <output-file>\n";
return EXIT_FAILURE;
}
const std::size_t size = std::stoull(argv[1]);
const std::size_t target_amount_of_runs = std::stoull(argv[2]);
std::vector<sample_t> samples;
samples.reserve(target_amount_of_runs);
for (std::size_t i = 0; i < target_amount_of_runs; ++i) {
auto input = shino::generate_random_input(size);
auto sample = measure(input);
samples.push_back(sample);
}
metric_t metrics_for_og;
metric_t metrics_for_array;
metric_t metrics_for_bitset;
for (const auto& sample: samples) {
metrics_for_og.update(sample.original_time);
metrics_for_array.update(sample.array_time);
metrics_for_bitset.update(sample.bitset_time);
}
chrono::nanoseconds avg_time_og = metrics_for_og.sum / target_amount_of_runs;
chrono::nanoseconds avg_time_array = metrics_for_array.sum / target_amount_of_runs;
chrono::nanoseconds avg_time_bitset = metrics_for_bitset.sum / target_amount_of_runs;
std::vector<chrono::nanoseconds> og_samples;
og_samples.reserve(target_amount_of_runs);
std::vector<chrono::nanoseconds> array_samples;
array_samples.reserve(target_amount_of_runs);
std::vector<chrono::nanoseconds> bitset_samples;
bitset_samples.reserve(target_amount_of_runs);
for (const auto& sample: samples) {
og_samples.push_back(sample.original_time);
array_samples.push_back(sample.array_time);
bitset_samples.push_back(sample.bitset_time);
}
std::sort(og_samples.begin(), og_samples.end());
std::sort(array_samples.begin(), array_samples.end());
std::sort(bitset_samples.begin(), bitset_samples.end());
std::ostringstream oss;
oss << "original function metrics (ns):\n"
<< "\tmin: " << metrics_for_og.min.count() << '\n'
<< "\tmax: " << metrics_for_og.max.count() << '\n'
<< "\tavg: " << avg_time_og.count() << '\n'
<< "\t98th percentile: " << og_samples[target_amount_of_runs * 0.98].count()
<< "\n\n";
oss << "array based function metrics (ns):\n"
<< "\tmin: " << metrics_for_array.min.count() << '\n'
<< "\tmax: " << metrics_for_array.max.count() << '\n'
<< "\tavg: " << avg_time_array.count() << '\n'
<< "\t98th percentile: " << array_samples[target_amount_of_runs * 0.98].count()
<< "\n\n";
oss << "bitset based function metrics (ns):\n"
<< "\tmin: " << metrics_for_bitset.min.count() << '\n'
<< "\tmax: " << metrics_for_bitset.max.count() << '\n'
<< "\tavg: " << avg_time_bitset.count() << '\n'
<< "\t98th percentile: " << bitset_samples[target_amount_of_runs * 0.98].count()
<< "\n\n";
std::ofstream output_file(argv[3]);
if (!output_file) {
std::cerr << "opening output file failed, printing results to stderr:\n"
<< oss.str();
return EXIT_FAILURE;
}
output_file << oss.str();
}
And input_gen.h:
#ifndef IS_PAL_INPUT_GEN_H
#define IS_PAL_INPUT_GEN_H
#include <algorithm>
#include <climits>
#include <cstddef>
#include <random>
#include <string>
namespace shino {
// the input is under 1MiB; only request one input at a time
inline std::string generate_random_input(std::size_t size) {
std::minstd_rand0 generator;
// uniform_int_distribution<char> is not a valid instantiation per the
// standard, so draw ints in char's range and narrow the results
std::uniform_int_distribution<int> dist(CHAR_MIN, CHAR_MAX);
std::string result(size, '\0');
std::generate(result.begin(), result.end(),
[&dist, &generator]() {
return static_cast<char>(dist(generator));
});
return result;
}
}
#endif //IS_PAL_INPUT_GEN_H
Alright, with that out of the way, here is how and why I benchmarked the code:
Benchmark metric: I used latency as the metric. OS scheduling, I/O interrupts, and other noise inevitably get into the numbers, so I also looked at the 98th percentile to see whether the deviation is too big. All three versions were pretty close to their averages at the 98th percentile, albeit slower than the average sample.
Sampling: I decided to do a basic run-and-measure using std::chrono::steady_clock, because both std::chrono::high_resolution_clock and std::chrono::system_clock failed me once (they readjusted and invalidated my results). I did not use a benchmarking library because they don't really measure latency, but throughput.
Metrics list:
Fastest sample
Slowest sample
Average sample
98th percentile sample
Results:
I'll separate the answers' versions from the changes I made, so here are the results for Deduplicator's and G. Sliepen's versions:
array based function metrics (ns):
min: 402
max: 30785
avg: 418
98th percentile: 460
bitset based function metrics (ns):
min: 705
max: 41427
avg: 768
98th percentile: 916
Verbatim version (perhaps with small changes, I do not remember exactly):
original function metrics (ns):
min: 4055
max: 83646
avg: 4169
98th percentile: 4303
Reserve:
original function metrics (ns):
min: 4350
max: 69625
avg: 4546
98th percentile: 4678
Reserve seems to have a negative effect.
PMR version with no reserve:
original function metrics (ns):
min: 4814
max: 91582
avg: 5129
98th percentile: 5539
The PMR version seems to be slower.
The original PMR version, with an oversized buffer on the stack:
original function metrics (ns):
min: 5111
max: 582302
avg: 5345
98th percentile: 5489
Even worse.
PMR with an 8 KiB buffer and reserve:
min: 5160
max: 74025
avg: 5416
98th percentile: 5572
Worse still.
I have no idea why everything I do to std::unordered_map makes it perform worse. It is clear that the array-based version is the fastest: a lot faster than the map version and considerably faster than the bitset version.
I also ran into the functions being optimized out entirely, but storing the results in a volatile solved that problem.
Repo: https://github.com/simmplecoder/is-palindrome-benchmark