9° CONTECSI - International Conference on Information Systems and Technology Management

PS-67 SELECTION OF EFFICIENT EVALUATION FUNCTION FOR OTHELLO

Yousef Al-Ohali (King Saud University, Riyadh, Saudi Arabia) - [email protected]
Lolowah R. Alssum (King Saud University, Riyadh, Saudi Arabia) - [email protected]
Basit Shahzad (King Saud University, Riyadh, Saudi Arabia) - [email protected]

Computer games have grown steadily in importance over time. The intelligence level of these games has improved to the point that game engines have defeated humans several times at games such as Chess, Go, Othello and Checkers. 'Time per move' and 'number of wins/losses' are two different yardsticks used to measure the effectiveness of the evaluation functions used to implement a game. This paper concentrates on finding the most efficient evaluation function for Othello. Extensive experimentation (144,000 experiments in total) has been carried out to measure the effectiveness of each evaluation function. The online engine is capable of playing the game up to ten levels, but in this paper we cover the comparisons up to level 4. By making comparisons based on time, complexity, wins and draws, it is easy to vindicate or reconsider the choice of evaluation function. After thorough experimentation, the 'Standard Weighted Piece Counter (S-WPC)' proved the most efficient of the available strategies.
Keywords: Othello evaluation functions, Othello experiments, Othello engine intelligence

TECSI - Laboratório de Tecnologia e Sistemas de Informação, FEA USP - www.tecsi.fea.usp.br

1. Introduction

The game of Othello is generally divided into three phases: the beginning, the middle and the end [1]. Although it is hard to reach concrete consensus on where these phases begin and end, a rough guideline can be established. Roughly the first 15 moves are commonly considered the beginning of the game; Brian Rose considers the opening over once one of the edge squares is taken [2]. The beginning and end game can be handled relatively easily: the first by using an opening book, that is, a collection of opening move sequences prepared and memorized before the game begins, and the second by maximizing the player's pieces while minimizing the opponent's. The middle game is more complex than the beginning and ending phases, and there are two main strategies for playing it. The first is the positional strategy, in which the position of the pieces on the board is paramount: the corners and edges of the board are more valuable than other positions, especially the corners, because once taken they can never be recaptured by the opponent [1]. In this strategy the player uses two evaluation functions; the first is applied during the opening and middle game to maximize the player's valuable positions while minimizing the opponent's.
The second strategy is the mobility strategy. Its goal is to maximize the number of moves the player can choose from while minimizing the number available to the opponent; by limiting the opponent's choices, the opponent can be forced into making a bad move. Like the positional strategy, this strategy uses two evaluation functions, but differs in the first one: it is applied during the opening and middle game to maximize the player's mobility and the number of corner squares occupied by the player's pieces, while minimizing the opponent's mobility and corner squares.

2. Implementation Strategies

The game can be implemented using one of many strategies available nowadays. The available techniques include, but are not limited to, the following:

a. Artificial Neural Networks (ANNs) are powerful computational systems consisting of many simple processing units (neurons, or nodes) connected by weighted connections according to a specific network architecture to perform a specific task. Each unit receives inputs and processes them (by applying an activation function) to form an output. ANNs can learn and generalize from training data by adjusting the connection weights between units according to some learning rule [4].

b. Weighted Piece Counter (WPC) is a linear evaluation function that gives each board position a weight. The evaluation is computed by summing the products of the input values and the weights [5].

c. Multilayer Perceptron Networks (MLP) are feed-forward neural networks, meaning data flows in one direction from the input layer to the output layer. They can contain more than one hidden layer between the input and output layers.
Each layer in the network contains a set of units that receive inputs from units in the previous layer and send outputs to units in the next layer. There are no connections between units in the same layer, and each layer is fully connected to the next [6].

d. Spatially Aware Multilayer Perceptron Networks (SMLP) are MLPs that differ in the first hidden layer: the input layer is not fully connected to it. In the SMLP of [7], the spatial processing layer consists of 91 units, one for each sub-board of the Othello board (from 8×8 down to 3×3 sub-boards). In [8], the spatial processing layer consists of 65 units, one for each Q-position pattern (64 queen-position patterns and one 8×8 board).

e. In N-Tuple Networks, each n-tuple has n inputs taken from specific locations on the Othello board. Each input has 8 or 4 equivalent squares under reflection and rotation (symmetric sampling). Each n-tuple is associated with a lookup table (LUT) that contains weights; the values of each input and its equivalents are used to compute an index into the table. The value function is computed by summing all the LUT weights indexed by all n-tuples [9].

f. Temporal Difference Learning (TDL) is a learning method used to solve prediction problems by estimating the next action of the system from past experience. Learning happens when the system's state changes and is driven by the error between temporally successive predictions; the goal of learning is to bring the previous prediction closer to the current one [10].
g. Monte Carlo Methods learn directly from online or simulated experience. The basic idea is to play the game from each possible move in the current board position randomly to the end a fixed number of times; the average result is then computed for each possible move, and the move with the highest average value is selected [10][11].

h. Evolutionary Algorithms (EAs) are optimization techniques inspired by biology. The basic idea is that the individuals of a population (candidate solutions) that are fitter than others survive (as parents) and reproduce (by recombination and mutation) to generate new individuals (offspring). The offspring are in turn evaluated for fitness and compete for survival. There are many types of evolutionary algorithms, including Genetic Algorithms (GA), Genetic Programming (GP), Evolutionary Strategies (ES) and Evolutionary Programming (EP) [12].

i. Co-Evolutionary Algorithms are EAs in which evaluation is based on interactions between individuals. Standard EAs are used when the fitness value can be specified, but in some cases, such as games, this value is unspecified or unknown. This problem can be solved with a Co-Evolutionary Algorithm, which determines fitness through interaction (competition) with other individuals [12].

3. Experimentation

This scientific comparative study establishes an effective selection of evaluation function to guide the choice of the best available move. This section presents a comparative investigation of the selected evaluation functions, focusing on the pros and cons of each technique. To make the comparison, a user-friendly customized online engine has been developed, which can be accessed at http://gameee.ksu.edu.sa.
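As a concrete illustration of the simplest technique listed in Section 2, the linear Weighted Piece Counter evaluation can be sketched in a few lines of Python. The weight values below are illustrative assumptions only (a high corner weight, a negative weight for corner-adjacent squares), not the weights actually used by the engine:

```python
# Illustrative WPC weights: corners are valuable, corner-adjacent squares
# are dangerous (they can give a corner away), edges are mildly good.
CORNER, EDGE, ADJ, INNER = 100, 10, -20, 1  # hypothetical values

def build_weights():
    """Build an 8x8 weight matrix favouring corners and edges."""
    w = [[INNER] * 8 for _ in range(8)]
    for r in range(8):
        for c in range(8):
            if r in (0, 7) or c in (0, 7):
                w[r][c] = EDGE
    for r, c in [(0, 0), (0, 7), (7, 0), (7, 7)]:
        w[r][c] = CORNER
        # square diagonally adjacent to each corner
        w[1 if r == 0 else 6][1 if c == 0 else 6] = ADJ
    return w

def wpc_eval(board, weights):
    """Linear evaluation: sum of weight * piece over all 64 squares.
    board[r][c] is +1 for the player, -1 for the opponent, 0 if empty."""
    return sum(weights[r][c] * board[r][c]
               for r in range(8) for c in range(8))
```

Because the evaluation is a single dot product over the 64 squares, WPC-style functions are among the cheapest to compute per move, which is consistent with the efficiency results reported later in this paper.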
To conduct this experimentation, the evaluation functions are evaluated on the basis of time per game and time per move at different levels. The evaluation environment was tested on a system running Windows Vista Service Pack 2, with an Intel Core 2 Duo P8400 CPU at 2.26 GHz, 4 GB of RAM, a 320 GB hard disk, and a 32-bit operating system.

3.1 Experimental Design

In this evaluation, we used the criterion of time consumption to examine the six evaluation functions at four different levels of search (depth of the game tree). Space consumption was not used as a criterion, because it would yield variable results depending on the platform used for experimentation, and the type of search algorithm used in this research would produce no significant differences. To obtain more realistic results, 144,000 experiments were run. This extensive experimentation helped ensure that the results are realistic and unbiased, as the game is random in nature: when more than one legal move has the same evaluation value, the best move is chosen at random. At the end of each game we recorded the following information:

a. the time consumed (in milliseconds) by each function to finish the game, and
b. the time consumed (in milliseconds) by each function to make one move.

4. Results

This section presents the results of the experiments conducted on the evaluation environment and explains the variables used in these results.
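The "level of search" in these experiments corresponds to the depth of the game tree explored before the evaluation function is applied at the leaves. A minimal depth-limited minimax sketch, in which every helper callable (evaluate, legal_moves, apply_move) is a hypothetical placeholder, since the engine's actual search implementation is not specified here:

```python
def minimax(state, depth, maximizing, evaluate, legal_moves, apply_move):
    """Depth-limited minimax: 'level N' corresponds to searching N plies
    deep, then scoring the leaf positions with the evaluation function.
    All helper callables are hypothetical placeholders."""
    moves = legal_moves(state, maximizing)
    if depth == 0 or not moves:
        return evaluate(state)
    if maximizing:
        return max(minimax(apply_move(state, m, True), depth - 1, False,
                           evaluate, legal_moves, apply_move)
                   for m in moves)
    return min(minimax(apply_move(state, m, False), depth - 1, True,
                       evaluate, legal_moves, apply_move)
               for m in moves)
```

Because the number of leaf evaluations grows roughly exponentially with depth, time per move can be expected to rise steeply from level 1 to level 4, regardless of which evaluation function is used.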
The experiment measures the time taken to complete the moves, and ultimately the game, in order to gauge the efficiency of each evaluation function. To achieve realistic results, 144,000 experiments have been run collectively, comprising 6,000 experiments for each evaluation function at each level. The results of these experiments are given in Table 1:

Level   Function        ATG (ms)    ATM (ms)
1       Standard WPC      53.935      1.87
1       CTDL              56.72       1.945
1       Ensemble          58.465      1.965
1       Mobility          64.555      2.125
1       MLPNN             64.515      2.145
1       Lolo Engine       64.16       2.14
2       Standard WPC      91.51       3.095
2       CTDL             105.135      3.525
2       Ensemble          98.32       3.285
2       Mobility         121.145      4.015
2       MLPNN            119.78       4.01
2       Lolo Engine      122.65       4.08
3       Standard WPC     264.815      9
3       CTDL             405.825     13.6
3       Ensemble         302.97      10.215
3       Mobility         498.86      16.55
3       MLPNN            454.75      15.11
3       Lolo Engine      467.555     15.545
4       Standard WPC    1170.985     39.735
4       CTDL            1617.135     54.245
4       Ensemble        1092.325     36.585
4       Mobility        2127.375     70.81
4       MLPNN           1626.765     54.285
4       Lolo Engine     1916.81      63.505

Table 1: Average Time per Game (ATG) and Average Time per Move (ATM), in milliseconds, for the evaluation functions at all levels.

Function        Cumulative ATG   Cumulative ATM
Standard WPC        395.3113        13.425
CTDL                546.2038        18.32875
Ensemble            388.02          13.0125
Mobility            702.9838        23.375
MLPNN               566.4525        18.8875
Lolo Engine         642.7938        21.3175

Table 2: Cumulative (per-level average) ATG and ATM for the evaluation functions across all levels.
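The cumulative figures in Table 2 are the per-level averages of the Table 1 values, which can be checked directly (the values below are transcribed from Table 1; only the ATG column is shown for brevity):

```python
# Reproduce Table 2's cumulative ATG figures as the average of the
# per-level ATG values from Table 1 (levels 1 through 4, in ms).
atg = {
    "Standard WPC": [53.935, 91.51, 264.815, 1170.985],
    "CTDL":         [56.72, 105.135, 405.825, 1617.135],
    "Ensemble":     [58.465, 98.32, 302.97, 1092.325],
    "Mobility":     [64.555, 121.145, 498.86, 2127.375],
    "MLPNN":        [64.515, 119.78, 454.75, 1626.765],
    "Lolo Engine":  [64.16, 122.65, 467.555, 1916.81],
}

cumulative = {name: sum(times) / len(times) for name, times in atg.items()}
# cumulative["Standard WPC"] is ~395.3113 and cumulative["Ensemble"]
# is ~388.02, matching Table 2; Ensemble has the lowest cumulative ATG.
```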
Table 2 describes the cumulative ATG and ATM, which is helpful in determining the overall time consumed by each evaluation function across all four levels.

Figure 1: Cumulative Time per Game for the evaluation functions at all levels.

Figure 2: Cumulative Time per Move for the evaluation functions at all levels.

Tables 1 and 2 describe the time required to complete a game and a move, respectively. It can be observed that the 'Ensemble' evaluation function requires the shortest time to complete both the games and the individual moves; Standard WPC is a close second in this regard.

5. Conclusion

This paper has discussed the parameters for measuring the effectiveness of the evaluation functions used to implement the Othello environment. Six specific evaluation functions were considered for this purpose and evaluated for time efficiency. The 'Ensemble' evaluation function was found to be the most efficient of the evaluation functions under consideration. The assessment might differ slightly if the decision were made on the basis of games won and lost. For an efficient solution, the 'Ensemble' evaluation function is recommended.

References

[1] D. Moriarty and R. Miikkulainen, "Evolving Complex Othello Strategies Using Marker-Based Genetic Encoding of Neural Networks", 1993.
[2] B. Rose, Othello: A Minute to Learn...
A Lifetime to Master, http://othellogateway.com/rose/book.pdf.
[3] N. Eck and M. Wezel, "Reinforcement Learning and its Application to Othello", 2004.
[4] K. Gurney, An Introduction to Neural Networks, Routledge, 1997.
[5] E. Manning, "Temporal Difference Learning of an Othello Evaluation Function for a Small Neural Network with Shared Weights", IEEE Symposium on Computational Intelligence and Games, 2007.
[6] D. Michie, D. Spiegelhalter and C. Taylor, Machine Learning, Neural and Statistical Classification, 1994.
[7] S. Chong, M. Tan and J. White, "Observing the Evolution of Neural Networks Learning to Play the Game of Othello", IEEE Transactions on Evolutionary Computation, 9(3), 2005, pp. 240-251.
[8] S. Lin and J. White, "Spatial Processing Layer Effects on the Evolution of Neural Networks to Play the Game of Othello", IEEE Congress on Evolutionary Computation, 2009, pp. 646-651.
[9] E. Manning, "Using Resource-Limited Nash Memory to Improve an Othello Evaluation Function", IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, 2010.
[10] R. Sutton and A. Barto, Introduction to Reinforcement Learning, MIT Press, 1998.
[11] P. Hingston and M. Masek, "Experiments with Monte Carlo Othello", IEEE Congress on Evolutionary Computation, 2007, pp. 4059-4064.
[12] A. Eiben and J. Smith, Introduction to Evolutionary Computing, Springer, 2003.
[13] K. Kim and S. Cho, "Ensemble Approach in Evolutionary Game Strategies: A Case Study in Othello", IEEE Symposium on Computational Intelligence and Games, 2008, pp. 212-219.
[14] J. Nijssen, "Playing Othello Using Monte Carlo", 2007.
[15] http://www.worldothellofederation.com/woc2010/

Dr Yousef Al-Ohali is privileged to serve as an Associate Professor in the Computer Science Department at King Saud University (KSU), Riyadh, Saudi Arabia. KSU is regarded as the best university in the region and among the leading universities in the world.
Dr Al-Ohali has published a number of research papers in conferences and journals of international repute. His current research areas include, but are not limited to, Artificial Intelligence, Social Networking and Pattern Recognition.