False optimization (and makes it hard to read).
template <typename T>
constexpr unsigned num_bits(T = 0) {
return sizeof(T) << 3;
}
This should be sizeof(T) * 8 that's so much easier to read. And if it quicker to shift that is what the compiler will actually do. Don't try and outsmart the compiler the only thing that will is happen is that you will outsmart yourself.
Also its not actually 8. TO be compliant and accurate use CHAR_BIT.
This is a bit inefficient.
template <typename T>
unsigned num_bits_set(T x) {
unsigned cnt = 0;
// behavior of signed type underflow is unspecified
std::make_unsigned_t<T> v = x;
while (v) {
++cnt;
v &= v - 1;
}
return cnt;
}
You are using a runtime loop (over each bit). You could use a compile time loop over each byte. Because the number of bits set for each specific value for a byte it is easy to pre-calculate.
Note: For each byte there are only CHAR_BIT number of bits (usually 8). Which is 256 values. So you can swap space for time here for a single byte.
unsigned num_bits_set_count_value[] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, // 000-015
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, // 016-031
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, // 032-047
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, // 048-063
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, // 064-079
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, // 080-095
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, // 096-111
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, // 112-127
// ---
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, // 128-143
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, // 144-159
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, // 160-175
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, // 176-191
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, // 192-207
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, // 208-223
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, // 224-239
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8}; // 240-255
// Use compile time meta program
// to unwind a multi byte value into a set of adds
// these will all be inlined.
template <int x>
unsigned num_bits_set_count(char const* byteStream) {
return num_bits_set_count_value[*byteStream]
+ num_bits_set_count<x - 1>(byteStream + 1);
}
template<> unsigned num_bits_set_count<0>(char const*) {return 0;}
template <typename T>
unsigned num_bits_set(T x) {
return num_bits_set_count<sizeof(T)>(reinterpret_cast<char const*>(&x));
}