9° CONTECSI - International Conference on Information Systems and Technology Management
PS-67
SELECTION OF EFFICIENT EVALUATION FUNCTION FOR OTHELLO
Yousef Al-Ohali (King Saud University, Riyadh, Saudi Arabia) - [email protected]
Lolowah R. Alssum (King Saud University, Riyadh, Saudi Arabia) - [email protected]
Basit Shahzad (King Saud University, Riyadh, Saudi Arabia) - [email protected]
Computer games have become increasingly important over time. The intelligence level of these
games has improved, and humans have been defeated several times by game engines for ‘Chess’,
‘Go’, ‘Othello’ and ‘Checkers’. ‘Time per move’ and ‘number of wins/losses’ are two different
yardsticks used to measure the effectiveness of the evaluation functions with which a game is
implemented. This paper concentrates on finding the most efficient evaluation function for
Othello. Extensive experimentation (approaching 144,000 experiments collectively) has been
carried out to measure the effectiveness of each evaluation function. The online engine is
capable of playing the game up to ten levels, but in this paper we cover the comparisons up
to level 4. By making comparisons based on time, complexity, wins and draws, it is easy to
vindicate or reconsider the choice of evaluation function. After thorough experimentation it
is concluded that the ‘Standard Weighted Piece Counter (S-WPC)’ is the most efficient of the
available strategies.
Keywords: Othello evaluation functions, Othello experiments, Othello engine intelligence
1. Introduction
The game of Othello is commonly divided into three components, namely the beginning, the middle
and the end [1]. Although it is hard to reach consensus in concrete terms about the beginning, middle
and ending parts of the game, a rough guideline can nonetheless be established. It is common to say that
roughly the first 15 moves constitute the beginning of the game. Brian Rose considers the beginning game
to be over when one of the edge squares is taken [2]. The beginning and end game can be handled easily:
the first by using an opening book, that is, moves that are prepared and memorized before the game begins
(a collection of openings), and the second by maximizing the player's pieces and minimizing the opponent's
pieces. The middle part of the game is more complex than the beginning and ending phases. There are two
main strategies that can be used in the middle game. The first is the positional strategy, in which the
position of the pieces on the board is very important; for example, the corners and the edges of the board
are more valuable than other board positions, especially the corners, because once taken they can never be
recaptured by the opponent [1]. In this strategy the player can use two evaluation functions; the first is
used during the start and the middle of the game to maximize the player's valuable positions while
minimizing the opponent's valuable positions.
The second strategy is the mobility strategy, whose goal is to maximize the number of moves the player can
choose from while minimizing the opponent's moves. This strategy can therefore force the opponent to make a
bad move, because its choice of moves is limited. It is similar to the positional strategy in that it also
uses two evaluation functions, but it differs in the first one: the first evaluation function is used during
the beginning and the middle of the game to maximize the player's mobility and the number of corner squares
occupied by the player's pieces, while minimizing the opponent's mobility and the number of corner squares
occupied by the opponent's pieces.
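As a hedged illustration of this idea (not the authors' implementation), the sketch below scores a position by the difference in mobility plus a weighted difference in corners held. It assumes a hypothetical game interface providing legal_moves(board, player) for a board stored as a list of 64 values (+1 player's disc, -1 opponent's disc, 0 empty); the names and the corner weight are illustrative assumptions, not taken from the paper.

    CORNERS = (0, 7, 56, 63)  # a1, h1, a8, h8 on a 64-square board

    def mobility_evaluate(board, player, game, corner_weight=10):
        """Mobility-strategy evaluation: own mobility minus opponent
        mobility, plus a bonus for corners held. `game.legal_moves` is
        an assumed, hypothetical interface, not from the paper."""
        own_moves = len(game.legal_moves(board, player))
        opp_moves = len(game.legal_moves(board, -player))
        own_corners = sum(1 for c in CORNERS if board[c] == player)
        opp_corners = sum(1 for c in CORNERS if board[c] == -player)
        # corner_weight = 10 is an illustrative constant.
        return (own_moves - opp_moves) + corner_weight * (own_corners - opp_corners)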
2. Implementation Strategies
The game can be implemented using one of the many strategies available nowadays. The available
techniques include, but are not limited to, the following:
a. Artificial Neural Networks (ANNs) are powerful computational systems consisting of many simple
processing units ("neurons" or "nodes") connected together by weighted connections according to a
specific network architecture to perform a specific task. Each unit receives inputs and processes
them (by applying an activation function) to form an output. ANNs can learn and generalize from
training data by adjusting the weights of the connections between units according to some learning
rule [4].
b. Weighted Piece Counter (WPC) is a linear evaluation function that gives each board position a
weight. The linear evaluation function is computed by summing the products of the input values and
the weights [5] (a minimal sketch follows this list).
c. Multilayer Perceptron Networks (MLP) are feed-forward neural networks, which means that the data
flow in one direction from the input layer to the output layer. An MLP can contain more than one
hidden layer between the input and output layers. Each layer in the network contains a set of units
which receive inputs from units in the previous layer and send outputs to units in the next layer.
There are no connections between units in the same layer, and each layer is fully connected to the
next layer [6].
d. Spatially Aware Multilayer Perceptron Networks (SMLP) are another class of MLP; the difference
is in the first hidden layer, to which the input layer is not fully connected. In the SMLP of [7],
the spatial processing layer consists of 91 units, one for each sub-board of the Othello board
(from 8×8 down to 3×3 sub-boards); in [8], the spatial processing layer consists of 65 units, one
for each Q-position pattern (64 Queen position patterns and one 8×8 board).
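The following is a minimal WPC sketch; the weights are illustrative, hand-picked values, not the weights of the paper's engine. The board is a list of 64 values in {+1, -1, 0} as above, and the weight vector favors corners while penalizing the squares adjacent to them, a common WPC heuristic.

    # Minimal Weighted Piece Counter (WPC) sketch; illustrative weights only.
    CORNER, NEXT_TO_CORNER, OTHER = 1.0, -0.25, 0.1

    def default_weights():
        """Build a 64-entry weight vector that values corners highly and
        penalizes the squares next to them."""
        w = [OTHER] * 64
        for c in (0, 7, 56, 63):                                  # corners
            w[c] = CORNER
        for sq in (1, 8, 9, 6, 14, 15, 48, 49, 57, 54, 55, 62):   # squares next to corners
            w[sq] = NEXT_TO_CORNER
        return w

    def wpc_evaluate(board, weights):
        """Linear evaluation: sum over squares of board value times weight."""
        return sum(b * w for b, w in zip(board, weights))

    # Example: player holds corner a1, opponent holds the adjacent square b2.
    board = [0] * 64
    board[0], board[9] = 1, -1
    print(wpc_evaluate(board, default_weights()))  # 1*1.0 + (-1)*(-0.25) = 1.25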
e. In N-Tuple Networks, each n-tuple has n inputs taken from specific locations on the Othello
board. Each input has 8 or 4 equivalent squares under reflection and rotation (symmetric sampling).
Each n-tuple is associated with a lookup table (LUT) that contains weights. The values of each
input and its equivalents are used to calculate the index into the table. The value function is
computed by summing all the LUT weights indexed by all n-tuples [9].
f. Temporal Difference Learning (TDL) is a learning method used to solve prediction problems by
estimating the next action of the system from past experience. Learning happens when there is a
change in the system's state, and it is based on the error between temporally successive
predictions. The goal of learning is to bring the previous prediction closer to the current
prediction [10].
g. Monte Carlo Methods are learning methods that learn directly from online or simulated
experience. The basic idea is to play the game from each possible move in the current board
position randomly to the end of the game a fixed number of times. The average result is then
returned for each possible move, and the move with the highest value is selected [10][11]
(see the sketch after this list).
h. Evolutionary Algorithms (EAs) are optimization techniques inspired by biology. The basic idea
is that the individuals of a population (candidate solutions) that are fitter than others survive
(as parents) and reproduce (by recombination and mutation) to generate new individuals (offspring).
The offspring are evaluated for fitness and compete for survival. There are many types of
evolutionary algorithms, including Genetic Algorithms (GA), Genetic Programming (GP), Evolution
Strategies (ES) and Evolutionary Programming (EP) [12].
i. Co-Evolutionary Algorithms are evolutionary algorithms (EAs) in which evaluation is based on
interactions between individuals. EAs are used when a fitness value can be specified, but in some
cases, such as games, this value is unspecified or unknown. This problem can be solved using a
Co-Evolutionary Algorithm, which determines fitness through interaction with other individuals
(they compete with each other) [12].
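As a hedged sketch of the Monte Carlo selection rule in (g), the function below assumes the same kind of hypothetical game interface as the earlier sketches, here with apply(board, player, move) returning the resulting board and random_playout(board, player_to_move) playing randomly to the end of the game and returning +1, 0 or -1 for a win, draw or loss of the original player; none of these names come from the paper.

    def monte_carlo_move(board, player, game, n_playouts=100):
        """Pick the move whose random playouts score best on average.
        `game` is an assumed, hypothetical interface (see lead-in)."""
        best_move, best_avg = None, float("-inf")
        for move in game.legal_moves(board, player):
            next_board = game.apply(board, player, move)
            # Run a fixed number of random games to the end from this position.
            total = sum(game.random_playout(next_board, -player)
                        for _ in range(n_playouts))
            avg = total / n_playouts
            if avg > best_avg:
                best_move, best_avg = move, avg
        return best_move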
3. Experimentation
A scientific comparative study supports an effective selection of the evaluation function that guides
the choice of the best available move. This section presents a comparative investigation of the selected
evaluation functions, focusing on the pros and cons of each evaluation technique. To make the comparison,
a user-friendly customized online engine has been developed, which can be accessed at
http://gameee.ksu.edu.sa. In this experimentation, the evaluation functions are evaluated on the basis of
the time per game and the time per move at different levels.
The evaluation environment was tested on a system running the Windows Vista Service Pack 2 operating
system. The computer specification included: Processor: Intel Core 2 Duo CPU P8400 at 2.26 GHz; Memory
(RAM): 4 GB; Hard Disk: 320 GB; System type: 32-bit operating system.
3.1 Experimental Design
In this evaluation, we used the criterion of time consumption to examine the six evaluation functions at
four different levels of search (depths of the game tree).
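For context, a 'level' here is the depth of the game-tree search at which the evaluation function is applied at the leaves. The following depth-limited negamax sketch (an illustration, not the paper's engine) shows where an evaluation function such as the WPC plugs in; it reuses the hypothetical game interface assumed in the earlier sketches and, for simplicity, treats a position with no legal moves as terminal rather than passing the turn.

    def negamax(board, player, depth, game, evaluate):
        """Depth-limited negamax search. `evaluate` scores a board from
        the +1 player's point of view (e.g. wpc_evaluate with fixed
        weights); `game` is the assumed, hypothetical interface."""
        moves = game.legal_moves(board, player)
        if depth == 0 or not moves:
            # Score the leaf from the current player's point of view.
            return player * evaluate(board)
        return max(-negamax(game.apply(board, player, m), -player,
                            depth - 1, game, evaluate)
                   for m in moves)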
The criterion of space consumption has not been used in these experiments, as it would yield variable
results depending on the platform used for experimentation, and there would be no significant difference
in the results because of the type of search algorithm used in this research. In order to get more
realistic results, 144,000 experiments were run. The extensive experimentation helped ensure that the
results are realistic and uninfluenced, as the game is random in
nature. When there is more than one legal move with the same evaluation value, the choice of the best
move is made at random among them.
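A minimal sketch of this tie-breaking rule (a detail consistent with, but not taken from, the paper):

    import random

    def pick_best_move(scored_moves):
        """scored_moves: list of (move, value) pairs. Return a random
        choice among the moves that share the maximum value."""
        best = max(value for _, value in scored_moves)
        candidates = [move for move, value in scored_moves if value == best]
        return random.choice(candidates)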
At the end of each game we have recorded the following information (a measurement sketch follows this list):
a. the time consumed (in milliseconds) by each function to finish the game, and
b. the time consumed (in milliseconds) by each function to make one move.
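The paper does not give its instrumentation, so the following is an assumed sketch of how the two quantities could be measured; play_move is a hypothetical callable that makes one move and returns the new board, or None when the game is over.

    import time

    def timed_game(play_move, initial_board, max_moves=60):
        """Run one game and return (time per game, time per move), both
        in milliseconds. `play_move` is a hypothetical callable."""
        move_times = []
        board = initial_board
        for _ in range(max_moves):
            start = time.perf_counter()
            board = play_move(board)
            move_times.append((time.perf_counter() - start) * 1000.0)
            if board is None:
                break
        total_ms = sum(move_times)
        return total_ms, total_ms / len(move_times)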
4. Results
This section presents the results of the experiments that we have conducted on the evaluation environment
and explains the variables used in these results. The experiments measure the time taken to complete the
moves, and ultimately the game, in order to assess the efficiency of each evaluation function.
In order to achieve realistic results, 144,000 experiments have been run collectively, comprising 6,000
experiments for each evaluation function at each level. The evaluation functions are evaluated based on
the efficiency they exhibit. The results of these experiments are given in Table 1:
Level     Function Name   Average Time per Game (ATG)   Average Time per Move (ATM)
Level 1   Standard WPC    53.935                        1.87
Level 1   CTDL            56.72                         1.945
Level 1   Ensemble        58.465                        1.965
Level 1   Mobility        64.555                        2.125
Level 1   MLPNN           64.515                        2.145
Level 1   Lolo Engine     64.16                         2.14
Level 2   Standard WPC    91.51                         3.095
Level 2   CTDL            105.135                       3.525
Level 2   Ensemble        98.32                         3.285
Level 2   Mobility        121.145                       4.015
Level 2   MLPNN           119.78                        4.01
Level 2   Lolo Engine     122.65                        4.08
Level 3   Standard WPC    264.815                       9
Level 3   CTDL            405.825                       13.6
Level 3   Ensemble        302.97                        10.215
Level 3   Mobility        498.86                        16.55
Level 3   MLPNN           454.75                        15.11
Level 3   Lolo Engine     467.555                       15.545
Level 4   Standard WPC    1170.985                      39.735
Level 4   CTDL            1617.135                      54.245
Level 4   Ensemble        1092.325                      36.585
Level 4   Mobility        2127.375                      70.81
Level 4   MLPNN           1626.765                      54.285
Level 4   Lolo Engine     1916.81                       63.505
Table 1: ATG and ATM (in milliseconds) for the evaluation functions at all levels.
Function       Cumulative ATG (ms)   Cumulative ATM (ms)
Standard WPC   395.3113              13.425
CTDL           546.2038              18.32875
Ensemble       388.02                13.0125
Mobility       702.9838              23.375
MLPNN          566.4525              18.8875
Lolo Engine    642.7938              21.3175
Table 2: Cumulative ATG and ATM for the evaluation functions at all levels.
Table 2 describes the cumulative ATG and ATM, which is helpful in determining the overall time consumed
by the evaluation functions across all four levels.
Figure 1: Cumulative Time per Game (ATG) for the evaluation functions at all levels (bar chart of the
ATG column of Table 2).
Figure 2: Cumulative Time per Move (ATM) for the evaluation functions at all levels (bar chart of the
ATM column of Table 2).
Tables 1 and 2 describe the time required to complete a game and a move, respectively. It can be observed
that the 'Ensemble' evaluation function requires the shortest cumulative time to complete both the games
and the moves; Standard WPC is a close second in this regard.
5. Conclusion
This paper has discussed the parameters for measuring the effectiveness of the evaluation functions used
to implement the Othello environment. Six specific evaluation functions were considered for this purpose
and were evaluated for time efficiency. The Ensemble evaluation function has been found to be the most
efficient among the evaluation functions under consideration. The choice of evaluation function might be
slightly different if the
decision had to be made on the basis of games won and lost. For an efficient solution, the 'Ensemble'
evaluation function should be used.
References
[1] D. Moriarty and R. Miikkulainen, "Evolving Complex Othello Strategies Using Marker-Based Genetic
Encoding of Neural Networks", 1993.
[2] B. Rose, Othello: A Minute to Learn... A Lifetime to Master, http://othellogateway.com/rose/book.pdf.
[3] N. Eck and M. Wezel, "Reinforcement Learning and its Application to Othello", 2004.
[4] K. Gurney, An Introduction to Neural Networks, Routledge, 1997.
[5] E. Manning, "Temporal Difference Learning of an Othello Evaluation Function for Small Neural Network
with Shared Weights", IEEE Symposium on Computational Intelligence and Games, 2007.
[6] D. Michie, D. Spiegelhalter and C. Taylor, Machine Learning, Neural and Statistical Classification, 1994.
[7] S. Chong, M. Tan and J. White, "Observing the Evolution of Neural Networks Learning to Play the Game
of Othello", IEEE Transactions on Evolutionary Computation, 9(3), 2005, pp. 240-251.
[8] S. Lin and J. White, "Spatial Processing Layer Effects on the Evolution of Neural Networks to Play
the Game of Othello", IEEE Congress on Evolutionary Computation, 2009, pp. 646-651.
[9] E. Manning, "Using Resource-Limited Nash Memory to Improve an Othello Evaluation Function", IEEE
Transactions on Computational Intelligence and AI in Games, vol. 2, 2010.
[10] R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
[11] P. Hingston and M. Masek, "Experiments with Monte Carlo Othello", IEEE Congress on Evolutionary
Computation, 2007, pp. 4059-4064.
[12] A. Eiben and J. Smith, Introduction to Evolutionary Computing, Springer, 2003.
[13] K. Kim and S. Cho, "Ensemble Approach in Evolutionary Game Strategies: A Case Study in Othello",
IEEE Symposium on Computational Intelligence and Games, 2008, pp. 212-219.
[14] J. Nijssen, "Playing Othello Using Monte Carlo", 2007.
[15] http://www.worldothellofederation.com/woc2010/
Dr Yousef Al-Ohali is privileged to serve as an Associate Professor in the Computer Science Department at
King Saud University (KSU), Riyadh, Saudi Arabia. KSU is known as the best university in the region and is
among the leading universities in the world. Dr Al-Ohali has published a number of research papers in
conferences and journals of international repute. His current areas of research include, but are not
limited to, Artificial Intelligence, Social Networking and Pattern Recognition.