Notes 6: Game-Playing
ICS 171 Fall 2006
ICS-171:Notes 6: 1
Overview
•
Computer programs which play 2-player games
– game-playing as search
– with the complication of an opponent
•
General principles of game-playing and search
– evaluation functions
– minimax principle
– alpha-beta-pruning
– heuristic techniques
•
Status of Game-Playing Systems
– in chess, checkers, backgammon, Othello, etc., computers routinely
defeat leading world players
•
Applications?
– think of “nature” as an opponent
– economics, war-gaming, medical drug treatment
Solving 2-Player Games
•
Two players, perfect information
•
Examples:
– e.g., chess, checkers, tic-tac-toe
•
Configuration of the board = unique arrangement of “pieces”
•
Statement of Game as a Search Problem
– States = board configurations
– Operators = legal moves
– Initial State = current configuration
– Goal = winning configuration
– Payoff function = gives numerical value of outcome of the game
•
A working example: Grundy's game
– Given a set of coins, a player takes a set and divides it into two
unequal sets. The player who plays last loses.
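The "states" and "operators" of Grundy's game can be sketched as a successor function. This is only an illustration: the tuple-of-heap-sizes encoding and the names `grundy_moves` and `is_terminal` are assumptions, not part of the notes.

```python
def grundy_moves(state):
    """Return all states reachable in one move of Grundy's game.

    `state` is a sorted tuple of heap sizes (a hypothetical encoding).
    A move picks one heap and splits it into two unequal, non-empty heaps.
    """
    successors = set()
    for i, heap in enumerate(state):
        rest = state[:i] + state[i + 1:]
        # a < heap - a guarantees the two parts are unequal and non-empty
        for a in range(1, (heap + 1) // 2):
            successors.add(tuple(sorted(rest + (a, heap - a))))
    return successors


def is_terminal(state):
    """True when no heap can be split any further (all heaps are 1 or 2)."""
    return not grundy_moves(state)


print(sorted(grundy_moves((7,))))   # [(1, 6), (2, 5), (3, 4)]
```

Starting from a single heap of 7 coins there are three legal moves; a heap of 6 cannot be split 3 + 3 because the two parts must differ.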
Grundy’s game - special case of nim
Games vs. search problems
•
"Unpredictable" opponent → a solution is a strategy specifying a move
for every possible opponent reply
•
Time limits → unlikely to find goal, must approximate
Game Trees
An optimal procedure: The Min-Max method
•
Designed to find the optimal strategy for Max and find best move:
– 1. Generate the whole game tree to leaves
– 2. Apply utility (payoff) function to leaves
– 3. Back up values from leaves toward the root:
• a Max node computes the max of its child values
• a Min node computes the min of its child values
– 4. When value reaches the root: choose max value and the
corresponding move.
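The four steps above can be sketched on a small explicit tree. The nested-list encoding (a number is a leaf payoff for Max, a list is an internal node) and the function names are assumptions made for this illustration.

```python
def minimax(node, is_max):
    """Back up values from the leaves of an explicit game tree.

    A leaf is a number (its payoff for Max); an internal node is a list
    of child nodes. This nested-list encoding is an assumption.
    """
    if not isinstance(node, list):      # leaf: the payoff is the value
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


def best_move(root):
    """Index of the root child with the highest backed-up value."""
    values = [minimax(child, False) for child in root]
    return max(range(len(values)), key=values.__getitem__)


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # Max to move at the root
print(minimax(tree, True), best_move(tree))  # 3 0
```

The three Min nodes back up 3, 2 and 2, so Max chooses the first move, whose value is 3.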
Properties of minimax
•
Complete? Yes (if tree is finite)
•
Optimal? Yes (against an optimal opponent)
•
Time complexity? $O(b^m)$
•
Space complexity? $O(bm)$ (depth-first exploration)
•
For chess, b ≈ 35, m ≈ 100 for "reasonable" games
→ exact solution completely infeasible
•
Chess:
– b ~ 35 (average branching factor)
– d ~ 100 (depth of game tree for typical game)
– $b^d \approx 35^{100} \approx 10^{154}$ nodes!!
•
Tic-Tac-Toe:
– ~5 legal moves, total of 9 moves
– $5^9$ = 1,953,125
– 9! = 362,880 (Computer goes first)
– 8! = 40,320 (Computer goes second)
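The tree-size figures above are easy to check with exact integer arithmetic. The snippet below is only a sanity check of the numbers quoted in the notes.

```python
import math

chess_nodes = 35 ** 100
# Number of digits minus one gives the power of ten of the leading digit.
print(len(str(chess_nodes)) - 1)                 # 154
print(5 ** 9)                                    # 1953125
print(math.factorial(9), math.factorial(8))      # 362880 40320
```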
An optimal procedure: The Min-Max method
•
However: for most games it is impossible to develop the whole search
tree. Instead, develop part of the tree and evaluate the
promise of its leaves using a static evaluation function.
Static (Heuristic) Evaluation Functions
•
An Evaluation Function:
– estimates how good the current board configuration is for a player.
– Typically, one figures how good it is for the player, and how good it
is for the opponent, and subtracts the opponent's score from the
player's
– Othello: Number of white pieces - Number of black pieces
– Chess: Value of all white pieces - Value of all black pieces
•
Typical values from -infinity (loss) to +infinity (win) or [-1, +1].
•
If the board evaluation is X for a player, it’s -X for the opponent
•
Example:
– Evaluating chess boards,
– Checkers
– Tic-tac-toe
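As one possible illustration for tic-tac-toe, a commonly used textbook heuristic scores a board as the number of lines still open for the player minus the number still open for the opponent. This heuristic, the 9-character board encoding and the function names are assumptions; the notes do not state which function they use.

```python
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def open_lines(board, player):
    """Count lines that contain no piece of the other player."""
    other = "O" if player == "X" else "X"
    return sum(all(board[i] != other for i in line) for line in LINES)


def evaluate(board, player):
    """Player's open lines minus the opponent's open lines."""
    other = "O" if player == "X" else "X"
    return open_lines(board, player) - open_lines(board, other)


board = list("....X....")          # X in the centre, nothing else
print(evaluate(board, "X"))        # 4
print(evaluate(board, "O"))        # -4
```

The two printed values show the "X for one player, -X for the other" property. A fuller version would also return a very large positive or negative value for won or lost boards.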
Deeper Game Trees
Applying MiniMax to tic-tac-toe
•
The static evaluation function heuristic
Backup Values
Pruning with Alpha/Beta
•
In Min-Max there is a separation between node generation and
evaluation.
Alpha Beta Procedure
•
Idea:
– Do depth-first search to generate a partial game tree,
– apply the static evaluation function to the leaves,
– compute bounds on internal nodes.
•
Alpha, Beta bounds:
– An alpha value for a Max node means that Max's real value is at least
alpha.
– A beta value for a Min node means that Min can guarantee a value of at
most beta.
•
Computation:
– Alpha of a Max node is the maximum value of its children seen so far.
– Beta of a Min node is the minimum value of its children seen so far.
When to Prune
•
Pruning
– Below a Min node whose beta value is lower than or equal to the
alpha value of any of its Max node ancestors.
– Below a Max node having an alpha value greater than or equal to
the beta value of any of its Min node ancestors.
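The two pruning rules can be sketched as follows. This is a minimal illustration under assumptions: the nested-list tree encoding, the function name `alphabeta` and the optional `visited` list (used only to show which leaves get evaluated) are not from the notes.

```python
import math


def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node`, skipping branches that cannot matter.

    Leaves are numbers and internal nodes are lists of children (an
    assumed encoding). `visited`, if given, collects evaluated leaves.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:      # a Min ancestor can already hold Max to beta
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:          # a Max ancestor can already secure alpha
            break
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, True, visited=seen), seen)   # 3 [3, 12, 8, 2, 14, 5, 2]
```

On this tree the second Min node is cut off after its first leaf (2 is already no better than the 3 Max has secured), so 7 of the 9 leaves are evaluated and the root value is still 3. If the third Min node's children were listed as 2, 5, 14 it would also be cut off after one leaf.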
α-β pruning example
Properties of α-β
•
Pruning does not affect final result
•
Good move ordering improves effectiveness of pruning
•
With "perfect ordering," time complexity = $O(b^{m/2})$
→ doubles depth of search
•
A simple example of the value of reasoning about which computations are relevant (a
form of metareasoning)
Effectiveness of Alpha-Beta Search
•
Worst-Case
– branches are ordered so that no pruning takes place. In this case
alpha-beta gives no improvement over exhaustive search
•
Best-Case
– each player’s best move is the left-most alternative (i.e., evaluated
first)
– in practice, performance is closer to the best case than to the worst case
•
In practice often get $O(b^{d/2})$ rather than $O(b^d)$
– this is the same as having a branching factor of $\sqrt{b}$,
• since $(\sqrt{b})^d = b^{d/2}$
• i.e., we have effectively gone from b to square root of b
– e.g., in chess go from b ~ 35 to b ~ 6
• this permits much deeper search in the same amount of time
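As a worked check of the chess figure, and of why the best case roughly doubles the reachable depth for a fixed node budget (the symbol $m$ for the depth plain minimax can afford is introduced here only for illustration):

```latex
\[
  \sqrt{35} \approx 5.92 \approx 6,
  \qquad
  \underbrace{b^{m}}_{\text{minimax to depth } m}
  \;=\; b^{(2m)/2}
  \;=\; \underbrace{\left(\sqrt{b}\right)^{2m}}_{\text{best-case } \alpha\text{-}\beta \text{ to depth } 2m} .
\]
```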
Why is it called α-β?
•
α is the value of the best (i.e.,
highest-value) choice found so far
at any choice point along the path
for max
•
If v, the value of the node being examined, is worse than α, max will
avoid it
→ prune that branch
•
Define β similarly for min
The α-β algorithm
Resource limits
Suppose we have 100 secs, explore $10^4$ nodes/sec
→ $10^6$ nodes per move
Standard approach:
•
cutoff test:
e.g., depth limit (perhaps add quiescence search)
•
evaluation function
= estimated desirability of position
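A depth limit as the cutoff test, combined with an evaluation function at the frontier, might look like the sketch below. The function `minimax_cutoff`, its parameters and the toy game used to exercise it are all assumptions made for illustration.

```python
def minimax_cutoff(state, depth, is_max, successors, evaluate):
    """Minimax that stops at a depth limit and scores the frontier with
    a static evaluation function instead of the true payoff.

    `successors(state)` returns the list of next states and
    `evaluate(state)` scores a state for Max; both are caller-supplied
    (hypothetical) functions.
    """
    children = successors(state)
    if depth == 0 or not children:     # cutoff test: depth limit or game over
        return evaluate(state)
    values = [
        minimax_cutoff(child, depth - 1, not is_max, successors, evaluate)
        for child in children
    ]
    return max(values) if is_max else min(values)


# Toy game, purely illustrative: states are integers, each state n < 8 has
# the two successors 2n and 2n + 1, and the "evaluation" is n mod 3.
def toy_successors(n):
    return [2 * n, 2 * n + 1] if n < 8 else []


def toy_eval(n):
    return n % 3


print(minimax_cutoff(1, 2, True, toy_successors, toy_eval))   # 1
print(minimax_cutoff(1, 3, True, toy_successors, toy_eval))   # 2
```

The answer changes between depth 2 and depth 3, which is the usual caveat with a fixed cutoff: the backed-up value is only as good as the evaluation at the frontier.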
Evaluation functions
•
For chess, typically linear weighted sum of features
$\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
•
e.g., $w_1 = 9$ with
$f_1(s)$ = (number of white queens) – (number of black queens), etc.
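The weighted sum can be sketched for material features alone. Only the queen weight of 9 comes from the notes; the other weights are the conventional piece values and are an assumption here, as are the dictionary encoding and the name `material_eval`.

```python
# Conventional material weights; only the queen's 9 comes from the notes,
# the others are a common textbook assumption.
WEIGHTS = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}


def material_eval(white_counts, black_counts):
    """Eval(s) = sum_i w_i * f_i(s), where f_i is the white-minus-black
    count of piece type i."""
    return sum(
        w * (white_counts.get(piece, 0) - black_counts.get(piece, 0))
        for piece, w in WEIGHTS.items()
    )


white = {"Q": 1, "R": 2, "P": 6}
black = {"Q": 0, "R": 2, "B": 1, "P": 7}
print(material_eval(white, black))   # 9*1 + 5*0 + 3*(-1) + 3*0 + 1*(-1) = 5
```

Real chess evaluators add many non-material features (mobility, king safety, pawn structure), each with its own weight in the same sum.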
Cutting off search
MinimaxCutoff is identical to MinimaxValue except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice?
$b^m = 10^6$, $b = 35$ → $m \approx 4$
4-ply lookahead is a hopeless chess player!
– 4-ply ≈ human novice
– 8-ply ≈ typical PC, human master
– 12-ply ≈ Deep Blue, Kasparov
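The 4-ply figure follows from solving $b^m = 10^6$ for $m$. A quick numeric check, assuming the budget of 100 seconds at $10^4$ nodes per second used in these notes (the variable names are illustrative):

```python
import math

budget = 100 * 10 ** 4            # seconds * nodes per second = 10^6 nodes
b = 35                            # average branching factor for chess

m = math.log(budget, b)           # solve b**m == budget for m
print(round(m, 2))                # 3.89
print(b ** 4)                     # 1500625, roughly the same budget
```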
Deterministic games in practice
•
Checkers: Chinook ended 40-year reign of human world champion Marion
Tinsley in 1994. Used a precomputed endgame database defining perfect
play for all positions involving 8 or fewer pieces on the board, a total of 444
billion positions.
•
Chess: Deep Blue defeated human world champion Garry Kasparov in a
six-game match in 1997. Deep Blue searches 200 million positions per
second, uses very sophisticated evaluation, and undisclosed methods for
extending some lines of search up to 40 ply.
•
Othello: human champions refuse to compete against computers, who are
too good.
•
Go: human champions refuse to compete against computers, who are too
bad. In go, b > 300, so most programs use pattern knowledge bases to
suggest plausible moves.
Iterative (Progressive) Deepening
•
In real games, there is usually a time limit T on making a move
•
How do we take this into account?
– using alpha-beta we cannot use “partial” results with any confidence
unless the full breadth of the tree has been searched
– So, we could be conservative and set a conservative depth-limit
which guarantees that we will find a move in time < T
• disadvantage is that we may finish early, could do more search
•
In practice, iterative deepening search (IDS) is used
– IDS runs depth-first search with an increasing depth-limit
– when the clock runs out we use the solution found at the previous
depth limit
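The idea can be sketched as a loop over increasing depth limits that keeps the move from the last completed iteration. This is a simplified illustration: it uses plain depth-limited minimax rather than alpha-beta, checks the clock only between iterations, and all names and the toy game are assumptions.

```python
import time


def depth_limited(state, depth, is_max, successors, evaluate):
    """Depth-limited minimax value (plain minimax keeps the sketch short)."""
    children = successors(state)
    if depth == 0 or not children:
        return evaluate(state)
    values = [depth_limited(c, depth - 1, not is_max, successors, evaluate)
              for c in children]
    return max(values) if is_max else min(values)


def iterative_deepening(state, time_limit, max_depth, successors, evaluate):
    """Search to depth 1, 2, ... and return (move, depth) from the deepest
    iteration that was started before the deadline.

    Simplification: the clock is checked only between iterations, so the
    last iteration may overrun; a real player would also abort mid-search
    and fall back to the previous depth's move.
    """
    deadline = time.monotonic() + time_limit
    moves = successors(state)
    best, reached = moves[0], 0          # fallback if no iteration is run
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break
        scores = [depth_limited(m, depth - 1, False, successors, evaluate)
                  for m in moves]
        best, reached = moves[scores.index(max(scores))], depth
    return best, reached


# Toy game, purely illustrative: state n < 8 has successors 2n and 2n + 1.
def toy_successors(n):
    return [2 * n, 2 * n + 1] if n < 8 else []


def toy_eval(n):
    return n % 3


print(iterative_deepening(1, 5.0, 3, toy_successors, toy_eval))   # (2, 3)
```

The repeated shallow searches cost little compared with the deepest one, because the number of nodes grows exponentially with depth.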
Heuristics and Game Tree Search
•
The Horizon Effect
– sometimes there’s a major “effect” (such as a piece being captured)
which is just “below” the depth to which the tree has been expanded
– the computer cannot see that this major event could happen
– it has a “limited horizon”
– there are heuristics to try to follow certain branches more deeply to
detect such important events
– this helps to avoid catastrophic losses due to “short-sightedness”
•
Heuristics for Tree Exploration
– it may be better to explore some branches more deeply in the
allotted time
– various heuristics exist to identify “promising” branches
Summary
•
Game playing is best modeled as a search problem
•
Game trees represent alternate computer/opponent moves
•
Evaluation functions estimate the quality of a given board configuration
for the Max player.
•
Minimax is a procedure which chooses moves by assuming that the
opponent will always choose the move which is best for them
•
Alpha-Beta is a procedure which can prune large parts of the search
tree and allow search to go deeper
•
For many well-known games, computer algorithms based on heuristic
search match or out-perform human world experts.
•
Reading: R&N Chapter 5.