The Application of Intelligent Algorithms in Reversi Game Program
24 Oct 2016
Reading time ~11 minutes
Master - 1st year - Project
|Bowen LIU||Mr. Emmanuel FLEURY|
This reversi project is proposed by Mr. Emmanuel FLEURY in University of Bordeaux. The complete program is able to take a move and play according to the reversi game rules.
Reversi is a strategy board game for two players, played on an 2x2 / 4x4 / 6x6 / 8x8 unchecked board. There are sixty-four identical game pieces called disks (often spelled “discs”), which are light on one side and dark on the other. Players take turns placing disks on the board with their assigned color facing up. During a play, any disks to the opponent’s color that is in a straight line and bounded by the disk just placed and another disk to the current player’s color is turned over to the current player’s color. And the game player could be a human or a computer which implement with intelligent algorithms. The object of the game is to have the majority of disks turned to display your color when the last playable empty square is filled.
- Tree Traversal
- References and Tools
List of Algorithms
List of Figures
List of Tables
The constitution of the reversi game using a specific optimal data-structure – bitboard. And the part of AI player used intelligence algorithm to choose a best move and apply it.
The experience of playing reversi tells us that when the reversi process in a certain step there are several moves to choose from, players should make decisions, choose the most beneficial to their own way, then how to determine whether it is beneficial? Approach is to find an evaluation function, this function gives a valuation of each situation, the higher the valuation of the more favorable to their own. The evaluation of the state S noted as V(S), and if the evaluation function is good enough to fully reflect the current state of its own advantages and disadvantages, then it is clear that the state S, the best action should be taken:
However, such a function must have a very complex form, it is not possible to parse it. In order to improve the program’s strength, but also make up for this deficiency, there are generally two solutions:
- Multi-step search, in the termination of the search process and then call the evaluation function;
- Use the machine learning method to learn a better evaluation function.
A bitboard is a data structure commonly used in computer systems that play board games.
A bitboard, often used for boardgames such as chess, checkers, othello and word games, is a specialization of the bit array data structure, where each bit represents a game position or state, designed for optimization of speed and/or memory or disk use in mass calculations. Bits in the same bitboard relate to each other in the rules of the game, often forming a game position when taken together. Other bitboards are commonly used as masks to transform or answer queries about positions. The “game” may be any game-like system where information is tightly packed in a structured form with “rules” affecting how the individual units or pieces relate.
2.1 The Board of Sets
To represent the board we typically need one bitboard for each color - likely encapsulated inside a class or structure, or as an array of bitboards as part of a position object. An one-bit inside a bitboard implies the existence of a piece of black or white stone on a certain square - one to one associated by the bit-position.
2.2 Basic Bitboard
Of course bitboards are not only about the existence of pieces - it is a general purpose, set-wise data-structure fitting in one 64-bit register. For example, a bitboard can represent things like attack- and defend sets, move-target sets and so on.
2.3 Special Bitboard
Except the general bitboard like checkerboard and moves. There is also several special bitboard which is used to compute the weight of AI’s move, for example, the bitboard for corner.
The weight of the game board is the valor in the table below:
3 Tree Traversal
Reversi is a typical two-agent, complete information, zero-sum game process, each player is fully familiar with the environment and their own and each other all possible ways to move and impact. Game player’s strategy search process can be expressed as a tree structure, called the game tree, the tree node on the board for the state may occur, the parent node through a step-by-step action derived from all successor nodes for its sub-nodes. In order to select the best course of action from a variety of alternative courses of action, we need to analyze the current situation and what is to happen, and select the best walking step by a search algorithm. In the game problem, each pattern has a lot of action options to choose from, so it will generate a very large game tree, trying to get through the search until the end of the best move is not possible, it can only move forward Search for a limited number of steps, the most basic search strategy known as the MinMax Search.
3.1 Evaluation Function
A simplified version, based on the reversi grid table and the mobility of the valuation. reversi table is in fact the weight of the table on different positions of the table, one side of the action is the location of the total number of possible. If the state of the board is denoted by S (state can be defined as the presence and absence of the pawns and colors of each grid), then based on the above model we have:
color[i, j] represents the color of the i-th row and j-th column on the board, which can be either my color, opp color or blank. w[i, j] is the weight at (i, j), is the empirical data. Mobility(my color) is the size of one’s own mobility.
3.2 MinMax Search
We assume that the two players involved in the game are Black and White. From Black’s point of view, each step of Black’s action always strives for its maximum benefit, called the Max process. The nodes on the corresponding game tree are called Max nodes. As the white is not known by black. So we can only think that each step of White always make the minimum of his income, called the Min process, the node corresponding to the game tree is called the Min node; So reversi is actually Max and Min alternating process (of course, exception when one has no available move). Consider the previous one-step search model, we assume that it’s black player’s turn at this time, black finds that the node with the highest evaluated value is , then the maximum gain is , and the corresponding action is the optimal solution that can be found by single step search.
But if Black searches two steps forward, he may find that the minimum value of derived from is smaller than the minimum value in the child node derived from some other node, let be the largest node in the minimum of the child nodes derived from the peer node, then he should choose the action corresponding to as his best action, since the worst case for choosing is not Less than , and this is the best case. If the search continues to deepen, then the choice will be more “wise”.
A practical example is shown in Figure 2.
We could easily get:
The nodes of current player correspond to the Max node, the nodes of anti player correspond to the Min node;
- The leaf node invokes the evaluation function and propagates its value upwards;
- a. The Max node is assigned the maximum of all its children;
- b. The Min node is assigned the minimum of all its children;
- The root node selects the action corresponding to the node with the highest value as the best action in its child node. The implementation of the algorithm is similar to the depth-first traversal of the tree. The Max (Min) node takes the largest (smallest) value of the successor node as its estimate and returns it to the parent node. Pseudo-code see the algorithm 1, 2, 3.
The Evaluate function in the code is V (S) as defined earlier.
3.3 NegaMax Search
The Minmax Search check all the possible moves, so the excution is too slow to get the best move in a limited time (5s).
We may have a better way to achieve, NegaMax, which is shown in Figure 3.
In the code of reversi, the NegaMax method is not the nest method. The code of NegaMax is in the file named “negaMax.c”. It’s possible that it runs better with the method Alpha-Beta which is base on the method similar as NegaMax.
3.4 AlphaBeta Search
Based on NegaMax Search, we could now use the method Alpha-Beta Search.
In Figure 4, A is the Max node, which will be assigned the maximum value in B, C, D. To know the value of A, we first need to know the value of B, for which we search for the child of B and assign B to 6. Because B is the Min node. Then start to determine the value of C, we first search E, the value of the two sub-nodes are 4 and -2, and E is a Max node, so the value of E is 4. Because C is the Min node, we know that the value of C is definitely less than or equal to 4, so C will not be selected by A because A is the Max node, and now a sub-node B has a value greater than 4. So there is no need to search for F and G. That is, pruning occurs.
When the algorithm is implemented, two values are introduced for each node: and , which denote the lower and upper bounds of the node’s estimate, respectively, which form an interval called - window. The length of the window is defined as - - 1. At the beginning, = -∞, = ∞ of the root node, and and of other nodes are inherited from the parent node during depth-first search. For the above example, after searching for B, we know that the final estimate of A will not be less than the value of B (since A is the Max node), which is the value of A, = 6. (C is the Min node), then the value of C is 4, note that the of C is also 6 (which is inherited from A), and the value of C is 0. From the previous discussion, C pruning occurred at this time, in fact, when ≤ , pruning occurs.
We see that both and are updated during the depth-first search. For the Max node, if the return value of the child node is greater than , is updated to the return value. For the Min node, the process is just the opposite, and is updated with the return value of the smaller child node. The value of Max node will not decrease and the value of Min node will not increase. Pruning occurs when ≤ of a node. Max node pruning pruning occurs, called Min Pruning Pruning. According to the return value of sub-node and , , can be divided into three sub-nodes (Max process as an example, S represents the corresponding Max node, represents a child node):
- Poor node: , will certainly not be selected, you need to continue to search for the next sub-node;
- Good node: , pruning; occurred;
- Main node: , pruning can not occur, but may be selected, also need to continue the search.
4.1 Primary Work
In this paper, we introduce the application of artificial intelligence in the game of reversi, and propose a basic scheme to solve this problem, namely the game tree search. In the introduction of the game tree search, that is a very small maximum search. Firstly, the calculation model and the commonly used algorithm are introduced. Combining with the practice of Othello, it is proved that the method is the basis of strategy choice and the feasibility of this method is proved.
4.2 Problem during Coding
- The realization of reversi rules;
- The conversion from pointer to bitboard;
- The duration of computing for choosing a move;
- The usage of mask.
The program is not so optimal for now, and it’s could only win with chance of 50%.
So next, the program will be improved and a better algorithme will be implemented inside.
5 References and Tools
NO CODE GIVEN.