Download Machine Learning in Board Games

Neural Networks, Genetic Algorithms and Propositional Nets Jeff Zurita Why Board Games?  Board games are a good proving ground for AI because  Complex Problems  Clearly defined inputs and results  Game environment is easily amenable to computer simulation  Like the Turing Test, board games are a good example of human cognition which AI would like to replicate Game Over – You Lose!  But aren’t board games a solved problem?  Minimax (and other tree pruning methods, alpha beta, Negamax, SSS*, etc) and evaluation functions make this an engineering problem, not an AI problem  Two words: Deep Blue! Game Over – Not so fast!  One word: Go  The Humans Strike Back!  Arimmaa  Octi  Havannah  Human Expertise vs. Machine Learning  That’s what AI is for  Humans can make mistakes! Case Study: Samuel’s Checkers  Program: Arthur Samuel’s Checkers (1956) [1].  Game: Checkers  Implementation: Polynomial evaluation function  Learning Method:  “rote learning” : Iterative deepening of search tree based on past board positions  "learning by generalization“: Polynomial Coefficients changed  Results: Moderate strength player  IBM Stock jumped 15 points Samuel’s Learning Method  Evaluation function had 39 features, each individually weighted. Sixteen features use at any particular time.  Learning program compared value of the current position with a future position  If the difference was positive, weights would increase  If negative, weights would decrease  If the gap could not be closed, features would be swapped.  Many games played against a static version to arrive at optimal weight values Reproduced from [2] Case Study: TD-Gammon  Program: TD-Gammon (1992) [3]  Game: Backgammon  Implementation: MLP evaluation function  Training Method: TD(λ) aka Temporal Difference Learning  Results:  Strong player  Revised expert opinions on game opening TD-Gammon Evaluation Function  The Perceptron  Inputs: 1D Vector of board and piece positions  Outputs: 1D, 4-element Vector containing predicted winner and whether the win was a gammon Reproduced from [4] Neural Network approximation of a function x0 x1 xn [xi] x0 w00 w01 w0i w00 x1 [xi] ∑ f [w0i] w01 xn f . .f . w0i [w0i] f f . .f . . .f . ∑ f f [w1ij] TD-Gammon Learning Method - TD(λ)  The “Temporal Difference” dt is defined as : dt = Yt+1 – Yt where Y is the neural network evaluation function of the game board at turn t. If we knew the real evaluation function, then dt = 0.  At the end of the game YT is known. Then adjust weights matrices to minimize dt with a limiting learning parameter α.  The parameter λ is the decay parameter, which limits the amount of weight change according to time (turns) away from the game end. Case Study: Blondie24  Program: Blondie24 [5] and [6]  Game: Checkers  Implementation: MLP evaluation function  Training Method: Genetic Algorithm  Results: Success! Blondie24: Evaluation Function  Inputs: 1D Vector of board and piece positions  Output: Value of the board passed to minimax (alpha beta) evaluation Reproduced from [5] Blondie24: Training by Genetic Algorithm 1. Generate a pool of random MLPs 2. Use MLPs in a round of games between each other 3. Rank the MLPS based on win/loss scores 4. Delete the lowest performing MLPs, keep the best performing MLPs as the basis for the next generation 5. Create a new generation of MLPs by mutating (random variation) of the weights and biases of the successful MLPs 6. Repeat Reproduced from [5] Blondie24: Results  Resulting program was tested against humans in an online gaming site.  Blondie24 ranked 18 of 44,000 registered users  Able to draw a Master rated checkers player. Case Study: Hex Player  Program: Hex Player [8]  Game: Hex  Implementation: MLP evaluation function  Training Method: Genetic Algorithm Hex Player – Implementation Notes  Board represented as a linear vector of n2 elements weightij  MLP included two hidden weightjk weightk layers  Population of 12 nets, 6 survivors in each generation  Mutation governed by a log-normal process where σ is constant depending on the total number of weights Game board represented as onedimensional array 141 Input Neurons 20 Neurons in Medial Layer 1 Reproduced from [8] 20 Neurons in Medial Layer 2 One Output Neuron Hex Player: Results  Learning was observed…  …but it was not human rated.  Lessons learned:  Lots of computer time needed!  Many tradeoffs in board size, population size, number of games, etc.  Troubleshooting alpha beta is hard. Reproduced from [8] GA Limitations in Hex Player  Hex has a much larger search space than Checkers  Hex has no Piece Differential  Hex is a divergent game, as is Go  GA applied to Go also resulted in a single strategy [9]  Board representation is an important factor  Need to capture positional relationships between pieces  Blondie24 (checkers) [5] and Blondie25 (chess) [7] included preprocessing neural nets for key areas of the game board Conditions Required for Temporal Displacement Learning  Conjectures by I. Ghory [10]  These reasons likely apply to the GA method as well. 1. Smoothness of board evaluation function. 2. Divergence rate of boards at single-ply depth. 3. State-space complexity. 4. Forced exploration. Complexity of Select Games Reproduced from [10] The Frontier: Monte Carlo Tree Search  An adversarial search method in which selected nodes are played-out and the results become part of the evaluation  Allows for aheuristic game playing: only legal moves and winning conditions are needed  Machine learning methods can be applied to the node selection step, as explained in reference [13] Reproduced from [11] and [12] The Frontier: General Game Playing  State machine  Implement the rules of a game as a state machine  Implemented commercially in Zillions of Games [14]  Stanford GGP : Logical Proposition Nets  Decompose game into logical propositions  Marking a proposition assigns boolean values – equivalent to a game state  Compute legal moves  Compute consequences (aka new game states)  Compute goal achievement (aka winning) See references [15] through [17] for online course materials Reproduced from [15] References          [1]. “Some Studies in Machine Learning using the game of Checkers”, A. Samuel, IBM Journal, Vol 3, No. 3, July 1959. [2]. Samuel’s Checkers Illustration is reprinted from “Reinforcement Learning: An Introduction’”, R. Sutton and A. Barto, MIT Press, available at: http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node109.html [3]. “Temporal Difference Learning and TD-Gammon”, G. Tesauro, Communications of the ACM, Vol. 38, No. 3, March 1995. [4]. “Reinforcement Learning: An Introduction”, A. Barto. MIT Press, 1998. Available online at: http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html [5]. “Evolving Neural Networks to Play Checkers without Relying on Expert Knowledge”, K. Chellapilla and D. Fogel, IEEE Transactions on Neural Networks, Vol. 10, No. 6, 1999. [6].” Blondie24: Playing at the Edge of AI”, Fogel DB, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 2002 [7]. "A Self-Learning Evolutionary Chess Program," Fogel DB, Hays TJ, Hahn SL, and Quon J, Proceedings of the IEEE, Vol.92 (12), pp. 1947-1954. [8]. “The Hex Player Project: An Experiment in Self Directed Machine Learning”, J. Zurita, MSCS Thesis, Villanova University, 2012. [9]. A downloadable version of Litho, the genetically developed Go player, s is available in this forum discussion: http://lifein19x19.com/forum/viewtopic.php?f=18&t=5368 References (cont’d)  [10]. “Reinforcement learning in board games”, I. Ghory, CSTR-04-004, Department of        Computer Science, University of Bristol, 2004. Available at: http://www.cs.bris.ac.uk/Publications/Papers/2000100.pdf [11]. “Monte-Carlo Tree Search: A New Framework for Game AI”, G. Chaslot, S. Bakkes, I. Szita, P. Spronck, Universiteit Maastricht, The Netherlands. Available at: http://www.personeel.unimaas.nl/g-chaslot/papers/AIIDE08_Chaslot.pdf [12]. Monte Carlo Tree Search (MCTS) research hub, available at: http://mcts.ai/about/index.html [13]. “Evolutionary Learning of Policies for MCTS Simulations”, J. Petit and D. Helmbold. (Made available by personal communication with the authors). [14]. Zillions of Games (commercially sold program). Available at: http://www.zillions-of-games.com/ [15]. “GENERAL GAME PLAYING “, M. Genesereth , M. Thielscher , available on-line at http://logic.stanford.edu/ggp/chapters/cover.html [16]. Stanford General Game Playing Course materials: http://logic.stanford.edu/classes/cs227/2012/index.html [17]. Coursera site for online GGP course: https://www.coursera.org/course/ggp

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Machine Learning in Board Games