Bitarray representation of Connect Four for Zobrist hashing

Question

This began primarily as a though experiment on how best to do this, but I've gone too deep to just forget about it.

The question I began with was: How is it best to save a game tree of Connect Four for alpha-beta minimax?

Rather than just implement an off-the-shelf Zobrist hash, I settled on something (size-wise) smaller.

Devised method

A Connect Four game can be boiled down to just a board. Rules can be applied to the game state on the board, but the board can exist separate from the rules. (State values, such as win/lose and favors, could be stored with the board, but they are not necessary. The case could be made to expand the board later.) The Connect Four board is a size of 7 columns by 6 rows, for 42 cells. Each cell can have 3 states: empty, yellow, or red. As such, each cell requires 2 bits. (Yes, I know that isn't true - I'm wasting 1/4 the space - but I will address that later.) Since the board is 42 cells, each board will require a bitarray of 84 bits.

Note: This though experiment, at least for me, is language-independent. Each language will have its own overhead for the implementation of a bitarray, but exact schemantics do not matter as much as general mechanics.

This allows a one-to-one conversion between the bitarray and the board state (if that is the term to use here). Every two-bit set can automatically be converted to its cells contents, and adding a piece (changing a cell from "empty" to a specific "player-color") just requires changing those two bits.

Unused space

To talk about the wasted space, the 1/4 space out of every 2 bits. Repeated over 84 bits, that constitutes a large value that is never utilized. So can this be made more efficient?

Shrinking the bitarray

My first thought was to not make this more efficient, but rather make the entire bitarray more space efficient. A Connect Four board consists of six rows. Player pieces may stack up to reach the higher rows, but not all rows will be utilized. For the first move of the game, while the played column may change, it will always be on the bottom row. The top five rows must be empty for a valid board configuration. The second move can have the game extend to the second-to-bottom row, or remain in the bottom, but the upper rows will remain empty. All games will follow this cycle.

This brought me to variable-width encoding. In a similar vein, boards could be - not ranked by most common, there are likely too many possibilities, even if common human strategies are factored in - but pared down to the most permutations. All games will use the bottom row, then the one on top of it, etc., until the top row. So why not just encode what was necessary?

Instead of the normal method for saving the Connect Four board - a two-dimensional array ([6][7]) or a flattened one-dimentional with the total length - the board can be broken down to its individual rows. For the first move (and potentially for the next six), only the bottom row needs be stored, resulting in an array of 7 cells, or a bitarray of length 14. The board is thus little endian-esque. (Mentally, I've arranged it that it starts from the left, completes the row, then can move to the row above.) The rows on top can be added as needed, maintaining a bittarray with a length mutiple of 14. To reconstruct a full board, the board would begin populating from the bottom row, and continue as far as values permit.

Ternary calculations

But can it be efficiently commpressed even more? Returning to the 1/4 problem, we can attempt to remove the wasted space.

In the current representation of board, each cell has three possibilities: empty, red, or yellow. (No further data has been stored (at least as of now), such as player turn, game favor, etc.) This would favor a ternary format over our current binary format. However, modern computers operate using binary, so its only use is the math.

The likelies groupings are: single cell (3^1 = 3), two-cell pairing (3^2 = 9), single-row grouping (3^7 = 2,187), double-row grouping (3^14 = 4,472,969), and entire board (3^42 = 1.09e+20). (Yes, I know that both binary and ternary representations sllow for impossible board configuations, but I know of no easy algorithm to generate all valid boards and assign them keys.) Those numbers each need to be converted to binary. They will be listed in the format of (number of bits when two used for each cell : bits required using ternary-generated number). They are: single (2:2), two-cell (4:4), single-row (14:12), double-row (28:23), and entire (84:68). As can be seen from this, a single bit can be saved for each three cells when grouped together (technically slightly more, but requires larger groups). (A set calculation can be made to unpack groupings into their cell content combinations.)

However, this loses the ease of conversion between board and array that the original method allows. To make practical use of this method requires groupings, and to have any real effect would require at least the double-row grouping, but that will likely mainly cause loss, as it may generate an unused row and requires calculations between board and hash. Additionally, it prevents any calculations from dealing with the hash array as an expanded board.

Other compression method

During the process of writing this post, I discovered another method of compression. It will also use little endian-esque styles, but instead of necessarily using row-based hashing with 2 bits per cell, it will only encode what is necessary to save any existing pieces on the board. In the 2 bit style, the four possibilities for each cell can be explained thus: 00 empty cell, 01 unused state, 10 yellow, and 11 red. (I deliberately moved the "unused state" designation to 01 to help explain the transition to the new method.) In the new method, it loses one-to-one conversion (it will no longer be possible to understand a cell and contents when jumping to any position), and requires all bits in the array to be read in a left-to-right manner. This way, a 0 denotes a blank cell, while a 1 changes the next bit to mean yellow (0) or red (1). This way, worst case it requires 84 bits similar to the 2 bit style. If a single piece is placed in the next-to-left bottom cell, the bitarray will contain 010.

However, as this process loses the ability for one-to-one quick conversions, I would think it is less useful because of its inability for a calculation to treat it as the board itself.

Conclusion

I have yet to completely explain how the bitarray is to be used for Zobrist hashing. The board bitarray will be used as the key for a Hashmap-style structure. Its values will be a list of the bitarrays of the potential boards that can result from the next move. As such, a large tree of gamestates can be saved in a small space.

All of this hinges on the ability to use a bitarray as the key for a HashMap-style structure (or dict, etc.). This will require a wrapper class that also allows for equality testing between bitarrays. Depending on implementation, the wrapper class may not be a true wrapper, as it can also contain flags for "game completed", "player turn", etc.

Would this be a practical implementation for a hash? Do you see any problems (in a language-independent context), or any ways to improve it?

Nagy Fathy · Accepted Answer · 2024-11-01 08:31:23Z

0

Have you implemented this? I think Zobrist hashing using 128 bit hash key with random representation would be much faster. I don't figure out any difficulties to implement this in C++, as it has full set of bitwise operators. In addition, if your main goal is to reduce the size of the key, why don't you use Huffman coding (as a variable length code) for the different rows. I did not calculate it, but obviously, the key will be less than 64 bit, which will improve the performance.

answered Nov 1, 2024 at 8:31

Nagy Fathy

212 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Bitarray representation of Connect Four for Zobrist hashing

Devised method

Unused space

Shrinking the bitarray

Ternary calculations

Other compression method

Conclusion

1 Answer 1

Comments

Hot Network Questions

Collectives™ on Stack Overflow

Devised method

Unused space

Shrinking the bitarray

Ternary calculations

Other compression method

Conclusion

1 Answer 1

Comments

Related