Skip to main content
8 of 8
fix verb
luser droog
  • 2.1k
  • 15
  • 32

It can be done simply with 8 ANDs, 8 SHIFTs, and 6 ORs.

A naive counting of the delta-swap's operations is 1 AND, 2 SHIFTs and 3 XORs which if called 4 times yields 4 ANDs, 8 SHIFTs, and 12 XORs for the whole rotation. So the optimum may well depend upon the particular instructions' speeds on a particular hardware implementation. More ANDs and less (X)ORs or vice versa?

uint64_t
rotateXY_CW( uint64_t b ){
    uint64_t c =
            (b & 0x0033003300330033) << 2 |
            (b & 0x00cc00cc00cc00cc) << 8 |
            (b & 0xcc00cc00cc00cc00) >> 2 |
            (b & 0x3300330033003300) >> 8
            ;
    uint64_t d =
            (c & 0x0505050505050505) << 1 |
            (c & 0x0a0a0a0a0a0a0a0a) << 4 |
            (c & 0xa0a0a0a0a0a0a0a0) >> 1 |
            (c & 0x5050505050505050) >> 4
            ;
    return  d;
}

There is literature on this subject but it may not be immediately obvious that the results are applicable here. A bit-cube can be considered as a stack of bit-planes. And rotation of a bitmap is a topic that underwent much study at Xerox PARC when they invented WIMP and GUI. An algorithm is presented in the [Byte Magazine 1981 SmallTalk Issue] (https://archive.org/details/byte-magazine-1981-08) on page 188.

As the image at the bottom of that page illustrates, the steps of the algorithm are to:

  • rotate the quadrants
  • rotate the quadrants of the quadrants
  • rotate the quadrants of the quadrants of the quadrants
  • ..... until the quadrants just moved were pixel-sized, then we're done.

More details

So, assume for simplicity that we have a 4x4 bitmap to rotate, mapped into 16 bits.

With 4 bits in a row, we only need 2 steps of taking quadrants before we're down to pixels.

uint16_t x = 0x1234;
x = (x & 0x0033) << 2
  | (x & 0x00cc) << 8
  | (x & 0xcc00) >> 2
  | (x & 0x3300) >> 8;
x = (x & 0x0505) << 1
  | (x & 0x0a0a) << 4
  | (x & 0xa0a0) >> 1
  | (x & 0x5050) >> 4;

Notice that in the second step, operations parallelize over all sub-quadrants by selecting more pieces with the mask.

   0 1  2 3
                   >>         >    >
0  0 0  1 0     0 1  0 0     1 0V 1 0V   1 0  1 0
1  1 1  0 0     1 0  1 1    ^0<1 ^1<0    0 1  1 0
              ^^        VV    
2  0 1  0 0     0 0  1 0     0>0V 0>1V   0 0  0 1
3  1 0  0 0     0 0  0 0    ^0 0 ^0 0    0 0  0 0
                   <<         <    <

Masks for each quadrant:

1100          0011          0000          0000
1100          0011          0000          0000
0000          0000          0011          1100
0000 = 0033   0000 = 00cc   0011 = cc00   1100 = 3300

Shifting each mask yields the next mask

0033 << 2 == 00cc
00cc << 8 == cc00
cc00 >> 2 == 3300
3300 >> 8 == 0033

Masks for each sub-quadrant

1010          0101          0000          0000
0000          0000          0101          1010
1010          0101          0000          0000
0000 = 0505   0000 = 0a0a   0101 = a0a0   1010 = 5050

Now watch what happens if we extend this to a stack of 2 bit-planes. 2 4x4 bitmaps in 32 bits.

uint32_t x = 0x00001234;
x = (x & 0x00330033) << 2
  | (x & 0x00cc00cc) << 8
  | (x & 0xcc00cc00) >> 2
  | (x & 0x33003300) >> 8;
x = (x & 0x05050505) << 1
  | (x & 0x0a0a0a0a) << 4
  | (x & 0xa0a0a0a0) >> 1
  | (x & 0x50505050) >> 4;

The number of operations doesn't increase! only the lengths of the masks increase.

And by replicating the mask from the 4x4 code 4 times it applies to a 4x4x4 cube.

Reversing the direction

Rotating in the other direction can be done by changing the directions of all the shifts.

uint64_t
rotateXY_CCW( uint64_t b ){
    uint64_t c =
            (b & 0x0033003300330033) << 8 |
            (b & 0x3300330033003300) << 2 |
            (b & 0xcc00cc00cc00cc00) >> 8 |
            (b & 0x00cc00cc00cc00cc) >> 2 |
            ;
    uint64_t d =
            (c & 0x0505050505050505) << 4 |
            (c & 0x5050505050505050) << 1 |
            (c & 0xa0a0a0a0a0a0a0a0) >> 4 |
            (c & 0x0a0a0a0a0a0a0a0a) >> 1
            ;
    return  d;
}

Rotating on a different axis

uint64_t
rotateXZ_CW( uint64_t b ){
    uint64_t c =
            (b & 0x0000000033333333) <<  2 |
            (b & 0x00000000cccccccc) << 32 |
            (b & 0xcccccccc00000000) >>  2 |
            (b & 0x3333333300000000) >> 32
            ;
    uint64_t d =
            (b & 0x0000555500005555) <<  1 |
            (b & 0x0000aaaa0000aaaa) << 16 |
            (b & 0xaaaa0000aaaa0000) >>  1 |
            (b & 0x5555000055550000) >> 16
            ;
    return  d;
}

uint64_t
rotateXZ_CCW( uint64_t b ){
    uint64_t c =
            (b & 0x0000000033333333) << 32 |
            (b & 0x3333333300000000) <<  2 |
            (b & 0xcccccccc00000000) >> 32 |
            (b & 0x00000000cccccccc) >>  2
            ;
    uint64_t d =
            (b & 0x0000555500005555) << 16 |
            (b & 0x5555000055550000) <<  1 |
            (b & 0xaaaa0000aaaa0000) >> 16 |
            (b & 0x0000aaaa0000aaaa) >>  1
            ;
    return  d;
}


quadrant masks
    1100 1100 0000 0000   
    1100 1100 0000 0000
    1100 1100 0000 0000
    1100 1100 0000 0000

    0011 0011 0000 0000
    ...

    0000 0000 0011 0011
    ...

    0000 0000 1100 1100
    ...

subquadrant masks
    1010 0000 1010 0000
    0101 0000 0101 0000
    0000 0101 0000 0101
    0000 1010 0000 1010

Rotating on the other, other axis

This one was harder for me to wrap my head around, but my ascii art of the masks seems to make sense.

uint64_t
rotateYZ_CW( uint64_t b ){
    uint64_t c =
            (b & 0x0000000000ff00ff) << 32 |
            (b & 0x00ff00ff00000000) <<  8 |
            (b & 0xff00ff0000000000) >> 32 |
            (b & 0x00000000ff00ff00) >>  8
            ;
    uint64_t d =
            (c & 0x00000f0f00000f0f) << 16 |
            (c & 0x0f0f00000f0f0000) <<  4 |
            (c & 0xf0f00000f0f00000) >> 16 |
            (c & 0x0000f0f00000f0f0) >>  4
            ;
    return  d;
}

uint64_t
rotateYZ_CCW( uint64_t b ){
    uint64_t c =
            (b & 0x0000000000ff00ff) <<  8 |
            (b & 0x00000000ff00ff00) << 32 |
            (b & 0xff00ff0000000000) >>  8 |
            (b & 0x00ff00ff00000000) >> 32
            ;
    uint64_t d =
            (c & 0x00000f0f00000f0f) <<  4 |
            (c & 0x0000f0f00000f0f0) << 16 |
            (c & 0xf0f00000f0f00000) >>  4 |
            (c & 0x0f0f00000f0f0000) >> 16
            ;
    return  d;
}

YZ quadrant masks
    1111 1111 0000 0000
    1111 1111 0000 0000
    0000 0000 0000 0000
    0000 0000 0000 0000

    0000 0000 1111 1111
    0000 0000 1111 1111
    0000 0000 0000 0000
    0000 0000 0000 0000

    0000 0000 0000 0000
    0000 0000 0000 0000
    0000 0000 1111 1111
    0000 0000 1111 1111

    0000 0000 0000 0000
    0000 0000 0000 0000
    1111 1111 0000 0000
    1111 1111 0000 0000

subquadrant masks
    1111 0000 1111 0000
    0000 0000 0000 0000
    1111 0000 1111 0000
    0000 0000 0000 0000

    0000 1111 0000 1111
    0000 0000 0000 0000
    0000 1111 0000 1111
    0000 0000 0000 0000

    0000 0000 0000 0000
    0000 1111 0000 1111
    0000 0000 0000 0000
    0000 1111 0000 1111

    0000 0000 0000 0000
    1111 0000 1111 0000
    0000 0000 0000 0000
    1111 0000 1111 0000
luser droog
  • 2.1k
  • 15
  • 32