Notes 6: Game-Playing
ICS 171, Fall 2006

Overview
• Computer programs that play 2-player games
  – game-playing as search, with the complication of an opponent
• General principles of game-playing and search
  – evaluation functions
  – the minimax principle
  – alpha-beta pruning
  – heuristic techniques
• Status of game-playing systems
  – in chess, checkers, backgammon, Othello, etc., computers routinely defeat leading human players
• Applications?
  – think of "nature" as an opponent
  – economics, war-gaming, medical drug treatment

Solving 2-Player Games
• Two players, perfect information
• Examples: chess, checkers, tic-tac-toe
• A configuration of the board is a unique arrangement of "pieces"
• Statement of the game as a search problem:
  – States = board configurations
  – Operators = legal moves
  – Initial state = the current configuration
  – Goal = a winning configuration
  – Payoff function = gives a numerical value for the outcome of the game
• A working example: Grundy's game
  – Given piles of coins, each player in turn picks a pile and divides it into two unequal piles. The player who moves last loses.

Grundy's Game: A Special Case of Nim

Games vs. Search Problems
• An "unpredictable" opponent: a strategy must specify a move for every possible opponent reply
• Time limits: we are unlikely to search all the way to a goal, so we must approximate

Game Trees

An Optimal Procedure: The Minimax Method
• Designed to find the optimal strategy for Max and the best move:
  1. Generate the whole game tree down to the leaves
  2. Apply the utility (payoff) function to the leaves
  3. Back up values from the leaves toward the root:
     – a Max node takes the maximum of its children's values
     – a Min node takes the minimum of its children's values
  4. When the values reach the root, choose the move leading to the child with the maximum value

Properties of Minimax
• Complete?
Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)
• For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible:
  – Chess: b ≈ 35 (average branching factor), d ≈ 100 (depth of the game tree for a typical game), so b^d ≈ 35^100 ≈ 10^154 nodes!
  – Tic-tac-toe: roughly 5 legal moves per position and at most 9 moves per game, so about 5^9 = 1,953,125 positions; 9! = 362,880 games if the computer goes first, 8! = 40,320 if it goes second
• However: it is impossible to expand the whole search tree. Instead, develop only part of the tree and estimate the promise of the leaves using a static evaluation function.

Static (Heuristic) Evaluation Functions
• An evaluation function estimates how good the current board configuration is for a player
  – typically, score how good the position is for the player and how good it is for the opponent, then subtract the opponent's score from the player's
  – Othello: number of white pieces minus number of black pieces
  – Chess: value of all white pieces minus value of all black pieces
• Typical values run from -infinity (loss) to +infinity (win), or [-1, +1]
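The minimax back-up and static evaluation described above can be put together as a short sketch. The tic-tac-toe encoding (a 9-tuple of 'X', 'O', and None) and the crude win/loss/draw evaluation are illustrative assumptions, not part of the notes:

```python
# Minimal sketch of depth-limited minimax with a static evaluation
# function. Max plays 'X'; the board is a 9-tuple read row by row.
# This representation is an assumption made for illustration.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def evaluate(board):
    """Crude static evaluation from Max's ('X') point of view."""
    w = winner(board)
    if w == 'X':
        return +1
    if w == 'O':
        return -1
    return 0   # draw, or a non-terminal cutoff position

def moves(board):
    return [i for i, cell in enumerate(board) if cell is None]

def minimax(board, depth, maximizing):
    """Back up values: Max nodes take the max, Min nodes take the min."""
    if winner(board) is not None or depth == 0 or not moves(board):
        return evaluate(board)
    values = []
    for m in moves(board):
        child = list(board)
        child[m] = 'X' if maximizing else 'O'
        values.append(minimax(tuple(child), depth - 1, not maximizing))
    return max(values) if maximizing else min(values)
```

From the empty board, a full-depth call returns 0, reflecting that tic-tac-toe is a draw under optimal play; with a small depth limit the same routine acts as the depth-limited, evaluation-backed search described above.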
• If the board evaluation is X for one player, it is -X for the opponent
• Examples: evaluating chess boards, checkers positions, tic-tac-toe positions

Deeper Game Trees

Applying Minimax to Tic-Tac-Toe
• The static evaluation function as a heuristic

Backup Values

Pruning with Alpha-Beta
• In minimax there is a separation between node generation and evaluation

Alpha-Beta Procedure
• Idea:
  – do a depth-first search to generate a partial game tree
  – apply the static evaluation function to the leaves
  – compute bounds on the internal nodes
• Alpha and beta bounds:
  – an alpha value for a Max node means that Max's real value is at least alpha
  – a beta value for a Min node means that Min can guarantee a value no higher than beta
• Computation:
  – the alpha of a Max node is the maximum value of its children seen so far
  – the beta of a Min node is the lowest value of its children seen so far

When to Prune
• Prune below a Min node whose beta value is less than or equal to the alpha value of any of its Max ancestors
• Prune below a Max node whose alpha value is greater than or equal to the beta value of any of its Min ancestors

α-β Pruning Example

Properties of α-β
• Pruning does not affect the final result
• Good move ordering improves the effectiveness of pruning
• With "perfect ordering," time complexity is O(b^(m/2)), which doubles the reachable depth of search
• A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

Effectiveness of Alpha-Beta Search
• Worst case: branches are ordered so that no pruning takes place.
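The alpha-beta procedure and pruning rules described above can be sketched over an abstract game tree. Representing the tree as nested lists with numeric leaves (standing in for static evaluation values) is an illustrative assumption, not part of the notes:

```python
# Sketch of alpha-beta pruning over an abstract game tree: nested lists
# are internal nodes, numbers are leaf evaluations. This encoding is an
# assumption made for illustration.

def alphabeta(node, alpha, beta, maximizing):
    """Return the minimax value of `node`, pruning with alpha/beta bounds."""
    if isinstance(node, (int, float)):   # leaf: its static evaluation
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)    # alpha: Max is guaranteed at least this
            if alpha >= beta:            # a Min ancestor would never allow it
                break                    # beta cutoff: prune remaining children
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)      # beta: Min can hold Max to at most this
            if alpha >= beta:
                break                    # alpha cutoff
        return value

# Max to move at the root, three Min nodes below, static values at the leaves.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

On this tree, `alphabeta(tree, float('-inf'), float('inf'), True)` returns the minimax value 3, and the leaves 4 and 6 are never examined: after the first subtree the root's alpha is 3, so when the second Min node's beta drops to 2 its remaining children are pruned.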
  In this case alpha-beta gives no improvement over exhaustive search.
• Best case: each player's best move is the left-most alternative (i.e., evaluated first)
  – in practice, performance is closer to the best case than the worst case
• In practice we often get O(b^(d/2)) rather than O(b^d)
  – this is the same as having a branching factor of sqrt(b), since (sqrt(b))^d = b^(d/2); i.e., we have effectively gone from b to the square root of b
  – e.g., in chess this takes us from b ≈ 35 to b ≈ 6, which permits much deeper search in the same amount of time

Why Is It Called α-β?
• α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for Max
• If v is worse than α, Max will avoid it, so we can prune that branch
• β is defined similarly for Min

The α-β Algorithm

Resource Limits
• Suppose we have 100 seconds and can explore 10^4 nodes/sec: that is 10^6 nodes per move
• Standard approach:
  – a cutoff test, e.g., a depth limit (perhaps with an added quiescence search)
  – an evaluation function that estimates the desirability of a position

Evaluation Functions
• For chess, typically a linear weighted sum of features:
  Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• e.g., w1 = 9 with f1(s) = (number of white queens) - (number of black queens), etc.

Cutting Off Search
• MinimaxCutoff is identical to MinimaxValue except that:
  1. Terminal? is replaced by Cutoff?
  2. Utility is replaced by Eval
• Does it work in practice? With b^m = 10^6 and b = 35, we get m ≈ 4, and 4-ply lookahead is a hopeless chess player!
  – 4-ply ≈ human novice
  – 8-ply ≈ typical PC, human master
  – 12-ply ≈ Deep Blue, Kasparov

Deterministic Games in Practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
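The linear weighted-sum evaluation Eval(s) = w1 f1(s) + … + wn fn(s) described above can be sketched for the material feature alone. The queen weight of 9 comes from the notes; the other piece weights are conventional values, and the board encoding (a dict from square name to piece letter, uppercase for White, lowercase for Black) is an illustrative assumption:

```python
# Sketch of a linear weighted-sum evaluation using only material features.
# The board encoding and the non-queen weights are assumptions made for
# illustration; only the queen weight (9) is taken from the notes.

PIECE_WEIGHTS = {'q': 9, 'r': 5, 'b': 3, 'n': 3, 'p': 1}  # kings not scored

def material_eval(board):
    """Value of all White pieces minus value of all Black pieces."""
    score = 0
    for piece in board.values():
        weight = PIECE_WEIGHTS.get(piece.lower(), 0)
        score += weight if piece.isupper() else -weight
    return score
```

For example, a position where White has an extra queen and everything else is balanced scores +9, matching the example weight in the notes.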
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searched 200 million positions per second, used a very sophisticated evaluation function, and used undisclosed methods for extending some lines of search up to 40 ply.
• Othello: human champions refuse to compete against computers, which are too good.
• Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.

Iterative (Progressive) Deepening
• In real games, there is usually a time limit T on making a move
• How do we take this into account?
  – with alpha-beta we cannot use "partial" results with any confidence unless the full breadth of the tree has been searched
  – so we could be conservative and set a depth limit that guarantees we will find a move in time less than T
  – the disadvantage is that we may finish early and could have searched more
• In practice, iterative deepening search (IDS) is used
  – IDS runs depth-first search with an increasing depth limit
  – when the clock runs out, we use the solution found at the previous depth limit

Heuristics and Game Tree Search
• The horizon effect
  – sometimes there is a major "effect" (such as a piece being captured) just "below" the depth to which the tree has been expanded
  – the computer cannot see that this major event could happen: it has a "limited horizon"
  – there are heuristics that follow certain branches more deeply to detect such important events
  – this helps avoid catastrophic losses due to "short-sightedness"
• Heuristics for tree exploration
  – it may be better to explore some branches more deeply in the allotted time
  – various heuristics exist to identify "promising" branches

Summary
• Game playing is best modeled as a search problem
• Game trees represent alternating computer/opponent moves
• Evaluation functions estimate the quality
of a given board configuration for the Max player
• Minimax is a procedure that chooses moves by assuming the opponent will always choose the move that is best for them
• Alpha-beta is a procedure that can prune large parts of the search tree, allowing search to go deeper
• For many well-known games, computer algorithms based on heuristic search match or outperform human world experts

Reading: R&N, Chapter 5.
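The iterative-deepening scheme summarized above, run depth-limited searches of increasing depth and keep the move from the last fully completed depth, can be sketched as follows. The `search(state, depth)` callable, returning a (value, move) pair, is an assumed stand-in for any depth-limited minimax or alpha-beta routine:

```python
# Sketch of iterative deepening under a time limit T, as described in the
# notes. The `search` interface is an assumption made for illustration.
import time

def iterative_deepening(state, search, time_limit):
    """Search with increasing depth limits until the clock runs out."""
    deadline = time.monotonic() + time_limit
    best_move = None
    depth = 1
    while time.monotonic() < deadline:
        value, move = search(state, depth)  # a fully completed depth-d search
        best_move = move                    # safe to keep: this depth finished
        depth += 1
    return best_move
```

A production version would also abort a depth-d search the moment the deadline hits and fall back to the depth-(d-1) result, as the notes describe; this sketch only checks the clock between iterations.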