Return to Revisions

4 of 10

The naive algorithm is still far more than five times faster.

edited Jan 6, 2015 at 14:18

20.2k
3
69
132

Gray codes addition

As some of you know, I have been working with Gray codes and am trying to design a good gray_code class in modern C++. I will end up posting several parts of the implementation, but what I want to be reviewed is mostly the addition function. Anyway, here is the code of the class essentials:

template<typename Unsigned>
struct gray_code
{
    static_assert(std::is_unsigned<Unsigned>::value,
                  "gray code only supports built-in unsigned integers");

    // Underlying unsigned integer
    using value_type = Unsigned;

    // Variable containing the gray code
    value_type value;

    ////////////////////////////////////////////////////////////
    // Constructors operations

    // Default constructor
    constexpr gray_code() noexcept:
        value(0u)
    {}

    /**
     * @brief Construction from an unsigned integer.
     *
     * The integer is converted to gray code. The value
     * is preserved. The representation is not.
     *
     * @param value Unsigned integer to convert
     */
    constexpr explicit gray_code(value_type value) noexcept:
        value( (value >> 1) ^ value )
    {}

    ////////////////////////////////////////////////////////////
    // Assignment operations

    /**
     * @brief Assigns an unsigned integer.
     *
     * It works the same way as the equivalent
     * constructor does.
     */
    auto operator=(value_type other) & noexcept
        -> gray_code&
    {
        value = (other >> 1) ^ other;
        return *this;
    }

    ////////////////////////////////////////////////////////////
    // Conversion operations

    /**
     * @brief Conversion to the underlying type.
     *
     * See http://www.dspguru.com/dsp/tricks/gray-code-conversion
     */
    operator value_type() const noexcept
    {
        value_type res = value;
        for (value_type mask = std::numeric_limits<value_type>::digits / 2
             ; mask ; mask >>= 1)
        {
            res ^= res >> mask;
        }
        return res;
    }
};

As you can see, a gray_code<T> is just a wrapper around an unsigned built-in type T. The main operation are the construction and the conversion to and from the type T; the underlying value can be accessed thanks to the public variable value. The purpose of this class is to be easy to use. Thanks to the conversion operator, a gray_code can be used wherever the corresponding integer can be used. One of my goals is to provide operations specialized for gray_code only if the operation can be faster than converting the Gray code to the corresponding integer, performing the operation on the integer and converting the integer back to a Gray code. Conversions are really fast, so operations that are faster than the double conversion are hard to write. I did write some of them, not included here (comparison, bitwise operations, increment and decrement; assume that these functions work between gray_codes for the rest of the question) and I am currently working on the addition.

My addition algorithm is based on the pseudo-code algorithm provided in this paper. Here is the original pseudo-code addition algorithm (⊕ means xor):

procedure add (n: integer; A,B:word; PA,PB:bit;
               var S:word; var PS:bit; var CE, CF:bit);
var i: integer; E, F, T: bit;
begin
   E := PA; F := PB;
   for i:= 0 to n-1 do begin {in parallel, using previous inputs}
       S[i] := (E and F) ⊕ A[i] ⊕ B[i];
       E := (E and (not F)) ⊕ A[i];
       F := ((not E) and F) ⊕ B[i];
   end;
   CE := E; CF := F;
end;

I translated the algorithm in C++. Here is the naive implementation (your can find the corresponding C version here):

template<typename Unsigned>
auto operator+(gray_code<Unsigned> lhs, gray_code<Unsigned> rhs) noexcept
    -> gray_code<Unsigned>
{
    // parity of lhs and rhs
    bool lhs_p = is_odd(lhs);
    bool rhs_p = is_odd(rhs);

    gray_code<Unsigned> res{};
    for (Unsigned i{} ; i < std::numeric_limits<Unsigned>::digits ; ++i)
    {
        // Get the ith bit of rhs and lhs
        bool lhs_i = (lhs.value >> i) & 1u;
        bool rhs_i = (rhs.value >> i) & 1u;

        // Copy lhs_p and rhs_p (see {in parallel} in the original algorithm)
        bool lhs_p_cpy = lhs_p;
        bool rhs_p_cpy = rhs_p;

        // Set the ith bit of res
        Unsigned res_i = (lhs_p_cpy & rhs_p_cpy) ^ lhs_i ^ rhs_i;
        res |= (res_i << i);

        // Update e and f
        lhs_p = (lhs_p_cpy & (!rhs_p_cpy)) ^ lhs_i;
        rhs_p = ((!lhs_p_cpy) & rhs_p_cpy) ^ rhs_i;
    }
    return res;
}

This algorithm works. However, it is not as efficient as converting the Gray codes to integers, adding them and converting the result back to a Gray code. Therefore, I tried to optimize it. Here is what I did:

First, we can see that rhs_p has been copied for symmetry with lhs_p but that we can actually get rid of it.
Also, for each bit, lhs_i ^ rhs_i is computed. However, this computation does not depend on the previous iterations of the loop (it does not depend on the previous values of lhs_p and rhs_p). Actually, lhs_i ^ rhs_i can be computed in one operation and stored into res directly:
```
  gray_code<Unsigned> res = rhs ^ lhs;
```
Therefore, we can use operator^= instead of operator|= to set the bits of res in the loop:
```
  Unsigned res_i = lhs_p & rhs_p;
  res ^= res_i << i;
```

Here is the optimized version of the algorithm:

template<typename Unsigned>
auto operator+(gray_code<Unsigned> lhs, gray_code<Unsigned> rhs) noexcept
    -> gray_code<Unsigned>
{
    bool lhs_p = is_odd(lhs);
    bool rhs_p = is_odd(rhs);

    gray_code<Unsigned> res = lhs ^ rhs;
    for (Unsigned i{} ; i < std::numeric_limits<Unsigned>::digits ; ++i)
    {
        Unsigned res_i = lhs_p & rhs_p;
        res ^= res_i << i;

        bool tmp = lhs_p;
        bool lhs_i = (lhs.value >> i) & 1u;
        bool rhs_i = (rhs.value >> i) & 1u;
        lhs_p = (tmp & not rhs_p) ^ lhs_i;
        rhs_p = (rhs_p & not tmp) ^ rhs_i;
    }
    return res;
}

This version of the addition algorithm is 20% faster than the previous on my computer but is still many times slower than the double conversion solution (all optimizations turned on). Is there any way to speed up the algorithm or is there any Gray code-specific algorithm that does not require a double conversion to and from regular integers and that may perform faster?

In case you want it, here is the implementation of is_odd, even though it does not add that much overhead to the addition since it is only performed once. The parity of a Gray code corresponds to the parity of the number of bits set. The function uses compiler intrinsics when possible and a \$ O(\log n) \$ algorithm to count bits otherwise.

template<typename Unsigned>
auto is_odd(gray_code<Unsigned> code)
    -> bool
{
    #if defined(__GNUC__) || defined(__clang__)

        return bool(__builtin_parity(code.value));

    #else

        unsigned nb_bits{};
        for (auto val = code.value ; val ; ++nb_bits)
        {
            // clear the least significant bit set
            val &= val - 1;
        }
        return bool(nb_bits % 2);

    #endif
}

c++performance algorithm bitwise c++14

asked Nov 6, 2014 at 23:31

Morwenn

20.2k
3
69
132