Code Review

I have some small nitpicks, but otherwise looks good.

Use string_view

As Deduplicator mentioned, it is better to make it clear that the function does not require exactly std::string, just a contiguous sequence of chars with a known size.
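A minimal sketch of the suggested signature change (the name follows the benchmarked versions below, without their suffixes; the body stays the same):

    #include <string_view>

    // accepts std::string, string literals and any other contiguous char sequence without copying
    bool is_permutation_palindrome(std::string_view text) noexcept;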

Structured binding

Although the map is not really useful in this case, if it stays it would be better to give the .first and .second members proper names with a structured binding:

    for (const auto& [value, occurrence_count] : counter_map)
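Applied to the odd-count loop from the benchmark code below, that would read roughly like this (it also avoids the second lookup through counter_map[...]):

    std::size_t number_of_odd_chars = 0;

    for (const auto& [value, occurrence_count] : counter_map)  // value itself is unused here
        if (occurrence_count % 2 == 1)
        {
            ++number_of_odd_chars;

            if (number_of_odd_chars > 1)
                return false;
        }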

Misspelling std::size_t

This is an even more minor one (the original code spells it size_t rather than std::size_t), but perhaps with the advent of modules this might change.
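For example, in the original function:

    #include <cstddef>  // guarantees std::size_t; the unqualified ::size_t is not guaranteed to be declared

    std::size_t number_of_odd_chars = 0;  // rather than: size_t number_of_odd_chars = 0;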

Consumption

Although this is not intended to be consumed as a library, it would be great to write a basic CMakeLists.txt file to specify the build requirements.
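A minimal sketch of what that could look like (the target and source names are just placeholders; the key part is pinning the language standard the code actually needs, e.g. std::pmr and std::string_view require C++17):

    cmake_minimum_required(VERSION 3.14)
    project(is_palindrome_benchmark CXX)

    add_executable(benchmark main.cpp)
    target_compile_features(benchmark PRIVATE cxx_std_17)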

Benchmarks

Without too much talking, I will dump my benchmark code (there are some slight modifications):

#include <iostream>
#include <string>
#include <string_view>
#include <unordered_map>
#include <algorithm>
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <memory_resource>
#include <vector>

#include "input_gen.h"

// Based on https://codereview.stackexchange.com/q/272128
// from https://codereview.stackexchange.com/users/58360/coderodde
bool is_permutation_palindrome_original(const std::string& text)
{
    const std::size_t buffer_size = 8 * 1024;
    std::array<std::byte, buffer_size> scratch;
    std::pmr::monotonic_buffer_resource resource(scratch.data(), buffer_size);
    std::pmr::unordered_map<char, size_t> counter_map(&resource);
//    counter_map.reserve(256);

    for (const auto ch : text)
        ++counter_map[ch];

    size_t number_of_odd_chars = 0;

    for (const auto &pair: counter_map)
        if (counter_map[pair.first] % 2 == 1)
        {
            ++number_of_odd_chars;

            if (number_of_odd_chars > 1)
                return false;
        }

    return true;
}

// based on https://codereview.stackexchange.com/a/272130
// from https://codereview.stackexchange.com/users/42409/deduplicator
bool is_permutation_palindrome_array(std::string_view s) noexcept {
    unsigned char counts[1u + (unsigned char)-1] {};
    for (unsigned char c : s)
        ++counts[c];
    return std::count_if(std::begin(counts), std::end(counts), [](auto a){ return a % 2; }) < 2;
}

// https://codereview.stackexchange.com/a/272129
// from https://codereview.stackexchange.com/users/129343/g-sliepen
bool is_permutation_palindrome_bitset(const std::string& text)
{
    std::bitset<256> odd_characters;

    for (const auto ch : text)
        odd_characters.flip(static_cast<std::uint8_t>(ch));

    return odd_characters.count() <= 1;
}

#include <chrono>
namespace chrono = std::chrono;

struct sample_t {
    chrono::nanoseconds original_time;
    chrono::nanoseconds array_time;
    chrono::nanoseconds bitset_time;
};

// times one call of each implementation on the same input;
// the volatile results keep the calls from being optimized away
sample_t measure(const std::string& input) {
    sample_t sample{};
    {
        auto start_time = chrono::steady_clock::now();
        volatile bool is_pal = is_permutation_palindrome_original(input);
        auto end_time = chrono::steady_clock::now();
        sample.original_time = chrono::duration_cast<chrono::nanoseconds>(end_time - start_time);
    }


    {
        auto start_time = chrono::steady_clock::now();
        volatile bool is_pal = is_permutation_palindrome_array(input);
        auto end_time = chrono::steady_clock::now();
        sample.array_time = chrono::duration_cast<chrono::nanoseconds>(end_time - start_time);
    }

    {
        auto start_time = chrono::steady_clock::now();
        volatile bool is_pal = is_permutation_palindrome_bitset(input);
        auto end_time = chrono::steady_clock::now();
        sample.bitset_time = chrono::duration_cast<chrono::nanoseconds>(end_time - start_time);
    }

    return sample;
}

// keeps a running minimum, maximum and sum of the samples seen so far
struct metric_t {
    chrono::nanoseconds min = chrono::nanoseconds(std::numeric_limits<std::int64_t>::max());
    chrono::nanoseconds max = chrono::nanoseconds(0);
    chrono::nanoseconds sum = chrono::nanoseconds(0);

    void update(chrono::nanoseconds new_sample) {
        if (min > new_sample) {
            min = new_sample;
        }

        if (max < new_sample) {
            max = new_sample;
        }

        sum += new_sample;
    }
};


#include <sstream>
#include <fstream>

int main(int argc, char* argv[]) {
    if (argc != 4) {
        std::cerr << "usage: " << argv[0] << " <input-size> <run-count> <output-file>\n";
        return EXIT_FAILURE;
    }

    const std::size_t size = std::stoull(argv[1]);
    const std::size_t target_amount_of_runs = std::stoull(argv[2]);

    std::vector<sample_t> samples;
    samples.reserve(target_amount_of_runs);
    for (std::size_t i = 0; i < target_amount_of_runs; ++i) {
        auto input = shino::generate_random_input(size);
        auto sample = measure(input);
        samples.push_back(sample);
    }

    metric_t metrics_for_og;
    metric_t metrics_for_array;
    metric_t metrics_for_bitset;
    for (const auto& sample: samples) {
        metrics_for_og.update(sample.original_time);
        metrics_for_array.update(sample.array_time);
        metrics_for_bitset.update(sample.bitset_time);
    }

    chrono::nanoseconds avg_time_og = metrics_for_og.sum / target_amount_of_runs;
    chrono::nanoseconds avg_time_array = metrics_for_array.sum / target_amount_of_runs;
    chrono::nanoseconds avg_time_bitset = metrics_for_bitset.sum / target_amount_of_runs;

    std::vector<chrono::nanoseconds> og_samples;
    og_samples.reserve(target_amount_of_runs);
    std::vector<chrono::nanoseconds> array_samples;
    array_samples.reserve(target_amount_of_runs);
    std::vector<chrono::nanoseconds> bitset_samples;
    bitset_samples.reserve(target_amount_of_runs);
    for (const auto& sample: samples) {
        og_samples.push_back(sample.original_time);
        array_samples.push_back(sample.array_time);
        bitset_samples.push_back(sample.bitset_time);
    }

    std::sort(og_samples.begin(), og_samples.end());
    std::sort(array_samples.begin(), array_samples.end());
    std::sort(bitset_samples.begin(), bitset_samples.end());

    std::ostringstream oss;
    oss << "original function metrics (ns):\n"
        << "\tmin: " << metrics_for_og.min.count() << '\n'
        << "\tmax: " << metrics_for_og.max.count() << '\n'
        << "\tavg: " << avg_time_og.count() << '\n'
        << "\t98th percentile: " << og_samples[target_amount_of_runs * 0.98].count()
              << "\n\n";

    oss << "array based function metrics (ns):\n"
        << "\tmin: " << metrics_for_array.min.count() << '\n'
        << "\tmax: " << metrics_for_array.max.count() << '\n'
        << "\tavg: " << avg_time_array.count() << '\n'
        << "\t98th percentile: " << array_samples[target_amount_of_runs * 0.98].count()
              << "\n\n";

    oss << "bitset based function metrics (ns):\n"
        << "\tmin: " << metrics_for_bitset.min.count() << '\n'
        << "\tmax: " << metrics_for_bitset.max.count() << '\n'
        << "\tavg: " << avg_time_bitset.count() << '\n'
        << "\t98th percentile: " << bitset_samples[target_amount_of_runs * 0.98].count()
              << "\n\n";

    std::ofstream output_file(argv[3]);
    if (!output_file) {
        std::cerr << "opening output file failed, printing results to stderr:\n"
                  << oss.str();
        return EXIT_FAILURE;
    }

    output_file << oss.str();
}

And input_gen.h:

#ifndef IS_PAL_INPUT_GEN_H
#define IS_PAL_INPUT_GEN_H

#include <string>
#include <random>
#include <algorithm>
#include <cstddef>
#include <limits>

namespace shino {

// generates one random input at a time;
// the inputs are intended to stay under 1MiB
inline std::string generate_random_input(std::size_t size) {
    std::minstd_rand0 generator;
    // std::uniform_int_distribution is not specified for char, so draw ints and narrow below
    std::uniform_int_distribution<int> dist(std::numeric_limits<char>::min(),
                                            std::numeric_limits<char>::max());
    std::string result(size, '\0');
    std::generate(result.begin(), result.end(),
                  [&dist, &generator]() {
        return static_cast<char>(dist(generator));
    });

    return result;
}

}

#endif //IS_PAL_INPUT_GEN_H

Alright, with that out of the way, here is how and why I benchmarked the code:

Benchmark metric: I used latency as the metric. Of course OS scheduling, I/O interrupts and other noise end up in the measurements too, so I also looked at the 98th percentile to check whether the deviation gets too big. For all three versions the 98th percentile was fairly close to the average, albeit slower than the average sample.

Sampling: I went with a basic run-and-measure approach using std::chrono::steady_clock, because both std::chrono::high_resolution_clock and std::chrono::system_clock have failed me before (they readjusted and invalidated my results). I did not use a benchmarking library because they don't really measure latency, but throughput.

Metrics list:

  • Fastest sample

  • Slowest sample

  • Average sample

  • 98th percentile sample

Results:

I'm going to keep the answers' versions separate from the changes I made, so here are the results for Deduplicator's and G. Sliepen's versions first:

array based function metrics (ns):
        min: 402
        max: 30785
        avg: 418
        98th percentile: 460

bitset based function metrics (ns):
        min: 705
        max: 41427
        avg: 768
        98th percentile: 916

Verbatim version of the question's code (perhaps with small changes, I do not remember exactly):

original function metrics (ns):
        min: 4055
        max: 83646
        avg: 4169
        98th percentile: 4303

With reserve() added:

original function metrics (ns):
        min: 4350
        max: 69625
        avg: 4546
        98th percentile: 4678

Reserving seems to have a negative effect.

PMR version with no reserve:

original function metrics (ns):
        min: 4814
        max: 91582
        avg: 5129
        98th percentile: 5539

PMR version seems to be slower.

The original PMR version, with an oversized buffer on the stack:

original function metrics (ns):
        min: 5111
        max: 582302
        avg: 5345
        98th percentile: 5489

Even worse.

PMR with an 8 KiB buffer and reserve:

        min: 5160
        max: 74025
        avg: 5416
        98th percentile: 5572

Worse still.


I have no idea why everything I try with std::unordered_map makes it perform worse. It is clear that the array-based version is the fastest: a lot faster than the map version and considerably faster than the bitset version.

I also ran into the functions being optimized out entirely, but storing the results in volatile bools solved that problem.


Repo: https://github.com/simmplecoder/is-palindrome-benchmark
