The Implementation of Artificial Intelligence and Temporal Difference Learning Algorithms in a Computerized Chess Program
By James Mannion
Computer Systems Lab 08-09, Period 3

Abstract
• Searching through large sets of data
• Complex, vast domains
• Heuristic searches
• Chess
• Evaluation function
• Machine learning

Introduction
• Games
• Minimax search
• Alpha-beta pruning
• Only look 2-3 moves into the future
• Estimate the strength of a position with an evaluation function
• The heuristic can be improved by learning
• Seems simple, but can become quite complex: chess masters spend their careers learning how to "evaluate" moves
• Purpose: can a computer learn a good evaluation function?

Background
• Claude Shannon, 1950
• Brute force would take too long
• Discusses the evaluation function
• 2-ply algorithm, but looks further into the future for moves that could lead to checkmate
• Possibility of learning in the distant future

Development
• Language: Python
• Stage 1: Text-based chess game
  • Two humans input their moves
  • Illegal moves are not allowed
• Stage 2: Introduce a computer player
  • 2-3 ply search
  • The evaluation function starts out so that choices are based on a simple piece differential, where each piece is weighted equally
• Stage 3: Learning
  • Temporal difference learning
  • Weight adjustment: $w \leftarrow w + \alpha (P_t - P_{t-1}) \frac{\partial P_{t-1}}{\partial w}$
  • Learning rate: $\alpha = \frac{200}{199 + n}$
  • Prediction: $P = \frac{1}{1 + e^{-h}}$
  • Heuristic: $h = w_1(j_1 - k_1) + \dots + w_5(j_5 - k_5)$

Testing: Learning vs. No Learning
• Two equal, piece-differential players are pitted against each other.
• One of them has the ability to learn.
• Multiple games are played.
• Weight values and the win-loss differential are tracked over the length of the test.

Results
[Figure: Change in Weights Over Time. Weight (y-axis) for Pawn, Knight, Bishop, Rook, and Queen versus Turns Taken (10 turns) on the x-axis.]

Results
[Figure: Win Percentage Over Time. Win percentage (y-axis) versus Games Played (x-axis).]

Results
• The weights changed.
• This affected performance.
• Equilibrium values were reached.
• The program actually got worse at chess.
• This was probably due to a code error.

References
Shannon, Claude. “Programming a Computer for Playing Chess.” 1950.
Beal, D. F., and Smith, M. C. “Temporal Difference Learning for Heuristic Search and Game Playing.” 1999.
Moriarty, David E., and Miikkulainen, Risto. “Discovering Complex Othello Strategies Through Evolutionary Neural Networks.”
Huang, Shiu-li, and Lin, Fu-ren. “Using Temporal-Difference Learning for Multi-Agent Bargaining.” 2007.
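The minimax search with alpha-beta pruning listed in the Introduction can be illustrated with a minimal sketch. It assumes a toy game tree written as nested Python lists whose leaves are already-computed evaluation scores, so it does not generate chess moves. The names `minimax`, `alphabeta`, `best_value`, and `EXAMPLE_TREE` are hypothetical and are not taken from the original program. The sketch compares how many leaves each search has to evaluate.

```python
import math

# Hypothetical sketch, not the original program's code.
# Hypothetical 2-ply tree: the root (our move) has three replies, each leading
# to positions already scored by an evaluation function.
EXAMPLE_TREE = [[3, 5], [2, 9], [0, 1]]


def minimax(node, maximizing):
    """Plain minimax: visits every leaf."""
    if not isinstance(node, list):
        return node  # leaf: evaluation-function score
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, alpha, beta, maximizing, stats):
    """Minimax with alpha-beta pruning.

    alpha: best score the maximizer can already guarantee.
    beta: best score the minimizer can already guarantee.
    stats["leaves"] counts how many leaves were actually evaluated.
    """
    if not isinstance(node, list):
        stats["leaves"] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer above will never let play reach here
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer above already has an option at least this good
    return value


def best_value(tree):
    """Return (minimax value, leaves evaluated) for a root where max is to move."""
    stats = {"leaves": 0}
    value = alphabeta(tree, -math.inf, math.inf, True, stats)
    return value, stats["leaves"]


if __name__ == "__main__":
    print(best_value(EXAMPLE_TREE), minimax(EXAMPLE_TREE, True))
```

On the example tree both searches return 3, but alpha-beta evaluates only 4 of the 6 leaves: as soon as a leaf under the second or third move scores at or below 3, the opponent can hold that move to no more than the root already has, so the remaining leaf under it is skipped.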
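The Stage 3 weight adjustment can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code. The deck does not define $j_i$, $k_i$, or $n$. Here $j_i$ and $k_i$ are assumed to be the counts of piece type $i$ (pawn, knight, bishop, rook, queen) for the learning side and its opponent, and $n$ is assumed to count the updates performed so far. All function names are hypothetical. Because $P$ is the logistic function of $h$, the gradient term is $\partial P / \partial w_i = P(1 - P)(j_i - k_i)$.

```python
import math

# Hypothetical sketch, not the original program's code.
# Piece order assumed to match the five weights w1..w5 in the deck.
PIECES = ("pawn", "knight", "bishop", "rook", "queen")


def heuristic(weights, own, opp):
    """h = w1(j1 - k1) + ... + w5(j5 - k5).

    `own` and `opp` are assumed per-piece counts (j_i and k_i) for the
    learning side and its opponent, in PIECES order.
    """
    return sum(w * (j - k) for w, j, k in zip(weights, own, opp))


def predict(weights, own, opp):
    """P = 1 / (1 + e^{-h}): squashes h into a (0, 1) score for the position."""
    h = heuristic(weights, own, opp)
    return 1.0 / (1.0 + math.exp(-h))


def learning_rate(n):
    """alpha = 200 / (199 + n); n is assumed to count updates so far."""
    return 200.0 / (199.0 + n)


def td_update(weights, prev_pos, curr_pos, n):
    """One temporal-difference step:
    w <- w + alpha * (P_t - P_{t-1}) * dP_{t-1}/dw.

    Each position is an (own_counts, opp_counts) pair.
    """
    own_prev, opp_prev = prev_pos
    p_prev = predict(weights, own_prev, opp_prev)
    p_curr = predict(weights, *curr_pos)
    alpha = learning_rate(n)
    # For the logistic P, dP/dw_i = P (1 - P) (j_i - k_i).
    slope = p_prev * (1.0 - p_prev)
    return [
        w + alpha * (p_curr - p_prev) * slope * (j - k)
        for w, j, k in zip(weights, own_prev, opp_prev)
    ]


if __name__ == "__main__":
    weights = [1.0] * len(PIECES)  # Stage 2: every piece weighted equally
    before = ((8, 2, 2, 2, 1), (7, 2, 2, 2, 1))  # opponent is a pawn down
    after = ((8, 2, 2, 2, 1), (7, 1, 2, 2, 1))   # ...and then loses a knight
    weights = td_update(weights, before, after, n=1)
    print(dict(zip(PIECES, weights)))
```

Starting from equal weights, this example raises only the pawn weight (to about 1.029). The gradient is taken at the earlier position, where the pawn term is the only nonzero piece difference, so the knight weight is unchanged even though the knight loss caused the rise in $P$.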