Download Lecture-09-20050914 - Kansas State University

Lecture 9 of 42
Game Tree Search II
Wednesday, 14 September 2005
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.kddresearch.org
http://www.cis.ksu.edu/~bhsu
Reading: Chapter 6, Russell and Norvig 2e
CIS 730: Introduction to Artificial Intelligence
Kansas State University
Department of Computing and Information Sciences

Lecture Outline
• Today’s Reading
  – Sections 6.1-6.4, Russell and Norvig 2e
  – Recommended references: Rich and Knight, Winston
• Reading for Next Class: Sections 6.5-6.8, Russell and Norvig
• Games as Search Problems
  – Frameworks: two-player, multi-player; zero-sum; perfect information
  – Minimax algorithm
    • Perfect decisions
    • Imperfect decisions (based upon static evaluation function)
  – Issues
    • Quiescence
    • Horizon effect
  – Need for pruning
• Next Lecture: Alpha-Beta Pruning, Expectiminimax, Current “Hot” Problems
• Next Week: Knowledge Representation – Logics and Production Systems

Overview
• Perfect Play
  – General framework(s)
  – What could agent do with perfect info?
• Resource Limits
  – Search ply
  – Static evaluation: from heuristic search to heuristic game tree search
  – Examples
    • Tic-tac-toe, connect four, checkers, connect-five / Go-Moku / wu3 zi3 qi2
    • Chess, go
• Games with Uncertainty
  – Explicit: games of chance (e.g., backgammon, Monopoly)
  – Implicit: see project suggestions!
Adapted from slides by S. Russell, UC Berkeley

Minimax Algorithm: Decision and Evaluation
[Figure 5.3 p. 126 R&N: minimax algorithm listing, not reproduced here; two slide callouts ask “what’s this?”]
Adapted from slides by S. Russell, UC Berkeley

Properties of Minimax
• Complete?
  – … yes, provided following are finite:
    • Number of possible legal moves (generative breadth of tree)
    • “Length of game” (depth of tree) – more specifically?
  – Perfect vs. imperfect information?
    • Q: What search is perfect minimax analogous to?
    • A: Depth-first exploration of the tree, with values backed up bottom-up from the leaves
• Optimal?
  – … yes, provided perfect info (evaluation function) and opponent is optimal!
  – … otherwise, guaranteed if evaluation function is correct
• Time Complexity?
  – Depth of tree: m
  – Legal moves at each point: b
  – $O(b^m)$
  – NB, m ≈ 100, b ≈ 35 for chess!
• Space Complexity? $O(bm)$ – why?
Adapted from slides by S. Russell, UC Berkeley

Review: Alpha-Beta (α-β) Pruning Example
• What are α, β values here?
[Figure 5.6 p. 131 R&N: two-ply game tree. MAX root bounded ≥3; first MIN node has value 3 (leaves 3, 12, 8); second MIN node is bounded ≤2 after its first leaf (2); third MIN node is bounded ≤14, then ≤5, then takes value 2 (leaves 14, 5, 2)]
Adapted from slides by S. Russell, UC Berkeley

Alpha-Beta (α-β) Pruning: Modified Minimax Algorithm
[Figure: algorithm listing, not reproduced here]
Adapted from slides by S. Russell, UC Berkeley

Digression: Learning Evaluation Functions
• Learning = Improving with Experience at Some Task
  – Improve over task T,
  – with respect to performance measure P,
  – based on experience E.
• Example: Learning to Play Checkers
  – T: play games of checkers
  – P: percent of games won in world tournament
  – E: opportunity to play against self
• Refining the Problem Specification: Issues
  – What experience?
  – What exactly should be learned?
  – How shall it be represented?
  – What specific algorithm to learn it?
• Defining the Problem Milieu
  – Performance element: How shall the results of learning be applied?
  – How shall the performance element be evaluated? The learning system?
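
Worked sketch: minimax and alpha-beta on the review example
The minimax and alpha-beta slides above refer to algorithm listings that are not reproduced in these notes. The following is a minimal Python sketch, not the textbook listing: the nested-list tree encoding and the function names `minimax` and `alphabeta` are assumptions made for illustration, and the two leaves that pruning skips in the second MIN node (4 and 6) are assumed values, since the slide shows only the leaves that are actually examined.

```python
import math

def minimax(node, maximizing=True):
    """Exact minimax value of a nested-list game tree (leaves are numbers)."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value with alpha-beta pruning; records examined leaves in `visited`."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)   # best value MAX can guarantee so far
            if alpha >= beta:           # MIN above will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)         # best value MIN can guarantee so far
        if alpha >= beta:               # MAX above already has something better
            break
    return value

# Two-ply tree: MAX root over three MIN nodes.
# The leaves 4 and 6 are assumed; they are never examined by alpha-beta.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
root_value = alphabeta(tree, visited=visited)
print(minimax(tree), root_value, visited)
```

Both functions recurse depth-first, so only the current path and its siblings are held in memory, which is one way to answer the "why?" on the space-complexity bullet. In this sketch `visited` lists the leaves alpha-beta actually evaluates, so it can be compared against the leaves drawn in the review figure.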

Example: Learning to Play Checkers
• Type of Training Experience
  – Direct or indirect?
  – Teacher or not?
  – Knowledge about the game (e.g., openings/endgames)?
• Problem: Is Training Experience Representative (of Performance Goal)?
• Software Design
  – Assumptions of the learning system: legal move generator exists
  – Software requirements: generator, evaluator(s), parametric target function
• Choosing a Target Function
  – ChooseMove: Board → Move (action selection function, or policy)
  – V: Board → R (board evaluation function)
  – Ideal target $V$; approximated target $\hat{V}$
  – Goal of learning process: operational description (approximation) of V

A Target Function for Learning to Play Checkers
• Possible Definition
  – If b is a final board state that is won, then V(b) = 100
  – If b is a final board state that is lost, then V(b) = -100
  – If b is a final board state that is drawn, then V(b) = 0
  – If b is not a final board state in the game, then V(b) = V(b’) where b’ is the best final board state that can be achieved starting from b and playing optimally until the end of the game
  – Correct values, but not operational
• Choosing a Representation for the Target Function
  – Collection of rules?
  – Neural network?
  – Polynomial function (e.g., linear, quadratic combination) of board features?
  – Other?
• A Representation for Learned Function
  – $\hat{V}(b) = w_0 + w_1\,bp(b) + w_2\,rp(b) + w_3\,bk(b) + w_4\,rk(b) + w_5\,bt(b) + w_6\,rt(b)$
  – bp/rp = number of black/red pieces; bk/rk = number of black/red kings; bt/rt = number of black/red pieces threatened (can be taken on next turn)

A Training Procedure for Learning to Play Checkers
• Obtaining Training Examples
  – the target function: $V(b)$
  – the learned function: $\hat{V}(b)$
  – the training value: $V_{train}(b)$
• One Rule For Estimating Training Values:
  – $V_{train}(b) \leftarrow \hat{V}(Successor(b))$
• Choose Weight Tuning Rule
  – Least Mean Square (LMS) weight update rule:
    REPEAT
    • Select a training example b at random
    • Compute the error(b) for this training example: $error(b) = V_{train}(b) - \hat{V}(b)$
    • For each board feature $f_i$, update weight $w_i$ as follows: $w_i \leftarrow w_i + c \cdot f_i \cdot error(b)$, where c is a small, constant factor to adjust the learning rate

Design Choices for Learning to Play Checkers
• Determine Type of Training Experience
  – Games against experts
  – Games against self
  – Table of correct moves
• Determine Target Function
  – Board → move
  – Board → value
• Determine Representation of Learned Function
  – Polynomial
  – Linear function of six features
  – Artificial neural network
• Determine Learning Algorithm
  – Gradient descent
  – Linear programming
• Completed Design

Knowledge Bases
Adapted from slides by S. Russell, UC Berkeley

Simple Knowledge-Based Agent
[Figure 6.1 p. 152 R&N: not reproduced here]
Adapted from slides by S. Russell, UC Berkeley

Summary Points
• Introduction to Games as Search Problems
  – Frameworks
    • Two-player versus multi-player
    • Zero-sum versus cooperative
    • Perfect information versus partially-observable (hidden state)
  – Concepts
    • Utility and representations (e.g., static evaluation function)
    • Reinforcements: possible role for machine learning
    • Game tree
• Family of Algorithms for Game Trees: Minimax
  – Propagation of credit
  – Imperfect decisions
  – Issues
    • Quiescence
    • Horizon effect
  – Need for pruning

Terminology
• Game Graph Search
  – Frameworks
    • Two-player versus multi-player
    • Zero-sum versus cooperative
    • Perfect information versus partially-observable (hidden state)
  – Concepts
    • Utility and representations (e.g., static evaluation function)
    • Reinforcements: possible role for machine learning
    • Game tree: node/move correspondence, search ply
• Family of Algorithms for Game Trees: Minimax
  – Propagation of credit
  – Imperfect decisions
  – Issues
    • Quiescence
    • Horizon effect
  – Need for (alpha-beta) pruning
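
Worked sketch: LMS weight update for the checkers evaluation function
The LMS rule on the training-procedure slide can be tried on a toy problem. The following is a minimal Python sketch under stated assumptions: the function names (`v_hat`, `lms_update`, `train`), the learning rate c = 0.01, and the two training boards are all hypothetical and chosen only for illustration. In particular, the training values here are fixed at the terminal values 100 and -100 rather than estimated from a successor board as the slide describes.

```python
import random

def v_hat(weights, features):
    """Linear evaluation w0 + w1*f1 + ... + w6*f6 for six board features."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))

def lms_update(weights, features, v_train, c=0.01):
    """One LMS step: w_i <- w_i + c * f_i * error(b), with f_0 = 1 for w_0."""
    error = v_train - v_hat(weights, features)
    updated = [weights[0] + c * error]
    updated += [w + c * f * error for w, f in zip(weights[1:], features)]
    return updated

def train(examples, steps=2000, c=0.01, seed=0):
    """Repeatedly pick a random training example and apply the LMS update."""
    rng = random.Random(seed)
    weights = [0.0] * 7
    for _ in range(steps):
        features, v_train = rng.choice(examples)
        weights = lms_update(weights, features, v_train, c)
    return weights

# Hypothetical examples: features are (bp, rp, bk, rk, bt, rt).
examples = [
    ((3, 0, 1, 0, 0, 0), 100.0),   # assumed won final board
    ((0, 3, 0, 1, 0, 0), -100.0),  # assumed lost final board
]
weights = train(examples)
for features, v_train in examples:
    print(features, v_train, round(v_hat(weights, features), 3))
```

The constant c has to stay small relative to the feature magnitudes; with larger piece counts or a larger c, the same update can overshoot and diverge, which is one reason the slide calls for a small constant factor.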
