Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Neural Networks, Genetic Algorithms and Propositional Nets Jeff Zurita Why Board Games? Board games are a good proving ground for AI because Complex Problems Clearly defined inputs and results Game environment is easily amenable to computer simulation Like the Turing Test, board games are a good example of human cognition which AI would like to replicate Game Over – You Lose! But aren’t board games a solved problem? Minimax (and other tree pruning methods, alpha beta, Negamax, SSS*, etc) and evaluation functions make this an engineering problem, not an AI problem Two words: Deep Blue! Game Over – Not so fast! One word: Go The Humans Strike Back! Arimmaa Octi Havannah Human Expertise vs. Machine Learning That’s what AI is for Humans can make mistakes! Case Study: Samuel’s Checkers Program: Arthur Samuel’s Checkers (1956) [1]. Game: Checkers Implementation: Polynomial evaluation function Learning Method: “rote learning” : Iterative deepening of search tree based on past board positions "learning by generalization“: Polynomial Coefficients changed Results: Moderate strength player IBM Stock jumped 15 points Samuel’s Learning Method Evaluation function had 39 features, each individually weighted. Sixteen features use at any particular time. Learning program compared value of the current position with a future position If the difference was positive, weights would increase If negative, weights would decrease If the gap could not be closed, features would be swapped. Many games played against a static version to arrive at optimal weight values Reproduced from [2] Case Study: TD-Gammon Program: TD-Gammon (1992) [3] Game: Backgammon Implementation: MLP evaluation function Training Method: TD(λ) aka Temporal Difference Learning Results: Strong player Revised expert opinions on game opening TD-Gammon Evaluation Function The Perceptron Inputs: 1D Vector of board and piece positions Outputs: 1D, 4-element Vector containing predicted winner and whether the win was a gammon Reproduced from [4] Neural Network approximation of a function x0 x1 xn [xi] x0 w00 w01 w0i w00 x1 [xi] ∑ f [w0i] w01 xn f . .f . w0i [w0i] f f . .f . . .f . ∑ f f [w1ij] TD-Gammon Learning Method - TD(λ) The “Temporal Difference” dt is defined as : dt = Yt+1 – Yt where Y is the neural network evaluation function of the game board at turn t. If we knew the real evaluation function, then dt = 0. At the end of the game YT is known. Then adjust weights matrices to minimize dt with a limiting learning parameter α. The parameter λ is the decay parameter, which limits the amount of weight change according to time (turns) away from the game end. Case Study: Blondie24 Program: Blondie24 [5] and [6] Game: Checkers Implementation: MLP evaluation function Training Method: Genetic Algorithm Results: Success! Blondie24: Evaluation Function Inputs: 1D Vector of board and piece positions Output: Value of the board passed to minimax (alpha beta) evaluation Reproduced from [5] Blondie24: Training by Genetic Algorithm 1. Generate a pool of random MLPs 2. Use MLPs in a round of games between each other 3. Rank the MLPS based on win/loss scores 4. Delete the lowest performing MLPs, keep the best performing MLPs as the basis for the next generation 5. Create a new generation of MLPs by mutating (random variation) of the weights and biases of the successful MLPs 6. Repeat Reproduced from [5] Blondie24: Results Resulting program was tested against humans in an online gaming site. Blondie24 ranked 18 of 44,000 registered users Able to draw a Master rated checkers player. Case Study: Hex Player Program: Hex Player [8] Game: Hex Implementation: MLP evaluation function Training Method: Genetic Algorithm Hex Player – Implementation Notes Board represented as a linear vector of n2 elements weightij MLP included two hidden weightjk weightk layers Population of 12 nets, 6 survivors in each generation Mutation governed by a log-normal process where σ is constant depending on the total number of weights Game board represented as onedimensional array 141 Input Neurons 20 Neurons in Medial Layer 1 Reproduced from [8] 20 Neurons in Medial Layer 2 One Output Neuron Hex Player: Results Learning was observed… …but it was not human rated. Lessons learned: Lots of computer time needed! Many tradeoffs in board size, population size, number of games, etc. Troubleshooting alpha beta is hard. Reproduced from [8] GA Limitations in Hex Player Hex has a much larger search space than Checkers Hex has no Piece Differential Hex is a divergent game, as is Go GA applied to Go also resulted in a single strategy [9] Board representation is an important factor Need to capture positional relationships between pieces Blondie24 (checkers) [5] and Blondie25 (chess) [7] included preprocessing neural nets for key areas of the game board Conditions Required for Temporal Displacement Learning Conjectures by I. Ghory [10] These reasons likely apply to the GA method as well. 1. Smoothness of board evaluation function. 2. Divergence rate of boards at single-ply depth. 3. State-space complexity. 4. Forced exploration. Complexity of Select Games Reproduced from [10] The Frontier: Monte Carlo Tree Search An adversarial search method in which selected nodes are played-out and the results become part of the evaluation Allows for aheuristic game playing: only legal moves and winning conditions are needed Machine learning methods can be applied to the node selection step, as explained in reference [13] Reproduced from [11] and [12] The Frontier: General Game Playing State machine Implement the rules of a game as a state machine Implemented commercially in Zillions of Games [14] Stanford GGP : Logical Proposition Nets Decompose game into logical propositions Marking a proposition assigns boolean values – equivalent to a game state Compute legal moves Compute consequences (aka new game states) Compute goal achievement (aka winning) See references [15] through [17] for online course materials Reproduced from [15] References [1]. “Some Studies in Machine Learning using the game of Checkers”, A. Samuel, IBM Journal, Vol 3, No. 3, July 1959. [2]. Samuel’s Checkers Illustration is reprinted from “Reinforcement Learning: An Introduction’”, R. Sutton and A. Barto, MIT Press, available at: http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node109.html [3]. “Temporal Difference Learning and TD-Gammon”, G. Tesauro, Communications of the ACM, Vol. 38, No. 3, March 1995. [4]. “Reinforcement Learning: An Introduction”, A. Barto. MIT Press, 1998. Available online at: http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html [5]. “Evolving Neural Networks to Play Checkers without Relying on Expert Knowledge”, K. Chellapilla and D. Fogel, IEEE Transactions on Neural Networks, Vol. 10, No. 6, 1999. [6].” Blondie24: Playing at the Edge of AI”, Fogel DB, Morgan Kaufmann Publishers, Inc., San Francisco, CA, 2002 [7]. "A Self-Learning Evolutionary Chess Program," Fogel DB, Hays TJ, Hahn SL, and Quon J, Proceedings of the IEEE, Vol.92 (12), pp. 1947-1954. [8]. “The Hex Player Project: An Experiment in Self Directed Machine Learning”, J. Zurita, MSCS Thesis, Villanova University, 2012. [9]. A downloadable version of Litho, the genetically developed Go player, s is available in this forum discussion: http://lifein19x19.com/forum/viewtopic.php?f=18&t=5368 References (cont’d) [10]. “Reinforcement learning in board games”, I. Ghory, CSTR-04-004, Department of Computer Science, University of Bristol, 2004. Available at: http://www.cs.bris.ac.uk/Publications/Papers/2000100.pdf [11]. “Monte-Carlo Tree Search: A New Framework for Game AI”, G. Chaslot, S. Bakkes, I. Szita, P. Spronck, Universiteit Maastricht, The Netherlands. Available at: http://www.personeel.unimaas.nl/g-chaslot/papers/AIIDE08_Chaslot.pdf [12]. Monte Carlo Tree Search (MCTS) research hub, available at: http://mcts.ai/about/index.html [13]. “Evolutionary Learning of Policies for MCTS Simulations”, J. Petit and D. Helmbold. (Made available by personal communication with the authors). [14]. Zillions of Games (commercially sold program). Available at: http://www.zillions-of-games.com/ [15]. “GENERAL GAME PLAYING “, M. Genesereth , M. Thielscher , available on-line at http://logic.stanford.edu/ggp/chapters/cover.html [16]. Stanford General Game Playing Course materials: http://logic.stanford.edu/classes/cs227/2012/index.html [17]. Coursera site for online GGP course: https://www.coursera.org/course/ggp