The Implementation of Artificial Intelligence and Temporal Difference Learning Algorithms in a Computerized Chess Program
James Mannion
Computer Systems Lab 08-09, Period 3

Abstract
• Searching through large sets of data
• Complex, vast domains
• Heuristic search
• Chess
• Evaluation functions
• Machine learning

Introduction
• Simple domains allow simple heuristics
• The domain of chess is vast
• Deep Blue: brute force
• Looked at roughly 30^6 moves before making the first
• Ran on a supercomputer
• Too many calculations; not efficient

Introduction (cont'd)
• Minimax search with alpha-beta pruning (sketched in the appendix below)
• Look only 2-3 moves into the future
• Estimate the strength of a position with an evaluation function
• The heuristic can be improved by learning

Introduction (cont'd)
• Evaluation seems simple, but can become quite complex; chess masters spend careers learning how to "evaluate" moves
• Purpose: can a computer learn a good evaluation function?

Background
• Claude Shannon, 1950
• Argued that brute force would take too long
• Discusses an evaluation function
• Proposes a 2-ply algorithm that looks further into the future for moves that could lead to checkmate
• Mentions the possibility of learning in the distant future

Development
• Written in Python
• Stage 1: text-based chess game
• Two humans input their moves
• Illegal moves are not allowed

Development (cont'd)
• Stage 2: introduce a computer player (evaluation sketched in the appendix below)
• Searches 2-3 plies
• The evaluation function starts out making choices based on a simple piece differential in which each piece is weighted equally

Development (cont'd)
• Stage 3: learning through Temporal Difference Learning (update rule sketched in the appendix below)
• Weight adjustment: w_i ← w_i + a · (n_ic − n_ip) / n_ic
• Heuristic function: h = c_1·p_1 + c_2·p_2 + c_3·p_3 + c_4·p_4 + c_5·p_5
• Piece terms: p_i = Σ(w_i) − Σ(b_i), i.e., the number of White's pieces of type i minus Black's

Testing
• Learning vs. no learning: two equal piece-differential players pitted against each other, one with the ability to learn
• Thousands of games (harness sketched in the appendix below)
• Win-loss differential tracked over the length of the test
• By the end, the learner should be winning significantly more games

Data

Data (cont'd)

References
• Shannon, Claude. "Programming a Computer for Playing Chess." 1950.
• Beal, D. F., and Smith, M. C. "Temporal Difference Learning for Heuristic Search and Game Playing." 1999.
• Moriarty, David E., and Miikkulainen, Risto. "Discovering Complex Othello Strategies Through Evolutionary Neural Networks."
• Huang, Shiu-li, and Lin, Fu-ren. "Using Temporal-Difference Learning for Multi-Agent Bargaining." 2007.
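
Appendix: Alpha-Beta Search (sketch)

The Introduction slides name minimax search with alpha-beta pruning, limited to 2-3 plies and backed by an evaluation function. The Python below is a minimal sketch of that technique, not the original program's code; the GameState interface (legal_moves, apply, evaluate, is_terminal) is assumed for illustration.

def alphabeta(state, depth, alpha, beta, maximizing):
    """Depth-limited minimax with alpha-beta pruning."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()              # static evaluation at the leaf
    if maximizing:
        value = float('-inf')
        for move in state.legal_moves():
            value = max(value, alphabeta(state.apply(move), depth - 1,
                                         alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                # beta cutoff: opponent avoids this line
                break
        return value
    else:
        value = float('inf')
        for move in state.legal_moves():
            value = min(value, alphabeta(state.apply(move), depth - 1,
                                         alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:                # alpha cutoff
                break
        return value

def best_move(state, depth=2):
    """Choose the move with the best alpha-beta value (2-3 plies, per the slides)."""
    return max(state.legal_moves(),
               key=lambda m: alphabeta(state.apply(m), depth - 1,
                                       float('-inf'), float('inf'), False))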
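
Appendix: Piece-Differential Evaluation (sketch)

A minimal sketch of the Stage 2 evaluation function, h = c_1·p_1 + ... + c_5·p_5, where each p_i is White's count of piece type i minus Black's. The single-character board encoding (uppercase for White, lowercase for Black) is an assumption for illustration.

PIECE_TYPES = 'PNBRQ'    # pawn, knight, bishop, rook, queen: one term per type

def piece_terms(board):
    """p_i = (White pieces of type i) - (Black pieces of type i)."""
    return [board.count(t) - board.count(t.lower()) for t in PIECE_TYPES]

def evaluate(board, weights):
    """Weighted piece differential: h = sum of c_i * p_i."""
    return sum(c * p for c, p in zip(weights, piece_terms(board)))

# Stage 2 starts with every piece weighted equally:
initial_weights = [1.0] * len(PIECE_TYPES)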
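
Appendix: Temporal Difference Weight Update (sketch)

A minimal sketch of the Stage 3 rule w_i ← w_i + a · (n_ic − n_ip) / n_ic, applied term by term. Reading n_ic and n_ip as the current and previous values of the i-th piece term is an assumption; the slides do not define them.

def td_update(weights, current_terms, previous_terms, a=0.1):
    """Apply w_i <- w_i + a * (n_ic - n_ip) / n_ic to each weight.
    current_terms/previous_terms: assumed to be the p_i values at the
    current and previous positions; a is the learning rate."""
    new_weights = []
    for w, n_c, n_p in zip(weights, current_terms, previous_terms):
        if n_c != 0:                     # guard: the slide's formula divides by n_ic
            w = w + a * (n_c - n_p) / n_c
        new_weights.append(w)
    return new_weights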
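
Appendix: Testing Harness (sketch)

A minimal sketch of the Testing setup: thousands of games between a learner and a fixed piece-differential player, with the win-loss differential tracked over the run. play_game here is a random stand-in, not the real game loop.

import random

def play_game(learning_enabled):
    """Stand-in for one full game; returns +1 (learner win), -1 (loss),
    or 0 (draw). Random here purely so the sketch runs end to end."""
    return random.choice([1, -1, 0])

def run_experiment(num_games=5000):
    """Track the learner's cumulative win-loss differential per game."""
    differential, history = 0, []
    for _ in range(num_games):
        differential += play_game(learning_enabled=True)
        history.append(differential)
    return history       # by the end, a successful learner trends upward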