The Implementation of Artificial Intelligence and Temporal Difference Learning Algorithms in a Computerized Chess Program
James Mannion
Computer Systems Lab 08-09, Period 3

Abstract
• Searching through large sets of data
• Complex, vast domains
• Heuristic search
• Chess
• Evaluation functions
• Machine learning

Introduction
• Simple domains allow simple heuristics
• The domain of chess is vast
• Deep Blue: brute force
• Looked at roughly 30^6 moves before making the first
• Ran on a supercomputer
• Too many calculations; not efficient

Introduction (cont'd)
• Minimax search with alpha-beta pruning (sketched in the appendix below)
• Look only 2-3 moves into the future
• Estimate the strength of a position with an evaluation function
• The heuristic can be improved by learning

Introduction (cont'd)
• Evaluation seems simple, but can become quite complex; chess masters spend careers learning how to "evaluate" moves
• Purpose: can a computer learn a good evaluation function?

Background
• Claude Shannon, 1950
• Argued that brute force would take too long
• Discusses an evaluation function
• Proposes a 2-ply algorithm that looks further into the future for moves that could lead to checkmate
• Mentions the possibility of learning in the distant future

Development
• Written in Python
• Stage 1: text-based chess game
• Two humans input their moves
• Illegal moves are not allowed

Development (cont'd)
• Stage 2: introduce a computer player (evaluation sketched in the appendix below)
• Searches 2-3 plies
• The evaluation function starts out making choices based on a simple piece differential in which each piece is weighted equally

Development (cont'd)
• Stage 3: learning through Temporal Difference Learning (update rule sketched in the appendix below)
• Weight adjustment: w_i ← w_i + a · (n_ic − n_ip) / n_ic
• Heuristic function: h = c_1·p_1 + c_2·p_2 + c_3·p_3 + c_4·p_4 + c_5·p_5
• Piece terms: p_i = Σ(w_i) − Σ(b_i), i.e., the number of White's pieces of type i minus Black's

Testing
• Learning vs. no learning: two equal piece-differential players pitted against each other, one with the ability to learn
• Thousands of games (harness sketched in the appendix below)
• Win-loss differential tracked over the length of the test
• By the end, the learner should be winning significantly more games

Data

Data (cont'd)

References
• Shannon, Claude. "Programming a Computer for Playing Chess." 1950.
• Beal, D. F., and Smith, M. C. "Temporal Difference Learning for Heuristic Search and Game Playing." 1999.
• Moriarty, David E., and Miikkulainen, Risto. "Discovering Complex Othello Strategies Through Evolutionary Neural Networks."
• Huang, Shiu-li, and Lin, Fu-ren. "Using Temporal-Difference Learning for Multi-Agent Bargaining." 2007.
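
Appendix: Alpha-Beta Search (sketch)

The Introduction slides name minimax search with alpha-beta pruning, limited to 2-3 plies and backed by an evaluation function. The Python below is a minimal sketch of that technique, not the original program's code; the GameState interface (legal_moves, apply, evaluate, is_terminal) is assumed for illustration.

def alphabeta(state, depth, alpha, beta, maximizing):
    """Depth-limited minimax with alpha-beta pruning."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()              # static evaluation at the leaf
    if maximizing:
        value = float('-inf')
        for move in state.legal_moves():
            value = max(value, alphabeta(state.apply(move), depth - 1,
                                         alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                # beta cutoff: opponent avoids this line
                break
        return value
    else:
        value = float('inf')
        for move in state.legal_moves():
            value = min(value, alphabeta(state.apply(move), depth - 1,
                                         alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:                # alpha cutoff
                break
        return value

def best_move(state, depth=2):
    """Choose the move with the best alpha-beta value (2-3 plies, per the slides)."""
    return max(state.legal_moves(),
               key=lambda m: alphabeta(state.apply(m), depth - 1,
                                       float('-inf'), float('inf'), False))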
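
Appendix: Piece-Differential Evaluation (sketch)

A minimal sketch of the Stage 2 evaluation function, h = c_1·p_1 + ... + c_5·p_5, where each p_i is White's count of piece type i minus Black's. The single-character board encoding (uppercase for White, lowercase for Black) is an assumption for illustration.

PIECE_TYPES = 'PNBRQ'    # pawn, knight, bishop, rook, queen: one term per type

def piece_terms(board):
    """p_i = (White pieces of type i) - (Black pieces of type i)."""
    return [board.count(t) - board.count(t.lower()) for t in PIECE_TYPES]

def evaluate(board, weights):
    """Weighted piece differential: h = sum of c_i * p_i."""
    return sum(c * p for c, p in zip(weights, piece_terms(board)))

# Stage 2 starts with every piece weighted equally:
initial_weights = [1.0] * len(PIECE_TYPES)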
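
Appendix: Temporal Difference Weight Update (sketch)

A minimal sketch of the Stage 3 rule w_i ← w_i + a · (n_ic − n_ip) / n_ic, applied term by term. Reading n_ic and n_ip as the current and previous values of the i-th piece term is an assumption; the slides do not define them.

def td_update(weights, current_terms, previous_terms, a=0.1):
    """Apply w_i <- w_i + a * (n_ic - n_ip) / n_ic to each weight.
    current_terms/previous_terms: assumed to be the p_i values at the
    current and previous positions; a is the learning rate."""
    new_weights = []
    for w, n_c, n_p in zip(weights, current_terms, previous_terms):
        if n_c != 0:                     # guard: the slide's formula divides by n_ic
            w = w + a * (n_c - n_p) / n_c
        new_weights.append(w)
    return new_weights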
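
Appendix: Testing Harness (sketch)

A minimal sketch of the Testing setup: thousands of games between a learner and a fixed piece-differential player, with the win-loss differential tracked over the run. play_game here is a random stand-in, not the real game loop.

import random

def play_game(learning_enabled):
    """Stand-in for one full game; returns +1 (learner win), -1 (loss),
    or 0 (draw). Random here purely so the sketch runs end to end."""
    return random.choice([1, -1, 0])

def run_experiment(num_games=5000):
    """Track the learner's cumulative win-loss differential per game."""
    differential, history = 0, []
    for _ in range(num_games):
        differential += play_game(learning_enabled=True)
        history.append(differential)
    return history       # by the end, a successful learner trends upward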