Artificial Intelligence: Game Playing

State of the art

AI Game
• Game AI is primarily behavioral, since this is how players perceive the intelligence
• Game AI is not focused on winning; it enhances play and enjoyment

Rock-paper-scissors
• The row player (aka player 1) chooses a row
• The column player (aka player 2) simultaneously chooses a column
• A row or column is called an action or (pure) strategy

              Rock     Paper    Scissors
  Rock        0, 0    -1, 1     1, -1
  Paper       1, -1    0, 0    -1, 1
  Scissors   -1, 1     1, -1    0, 0

• Row player's utility is always listed first, column player's second
• Zero-sum game: the utilities in each entry sum to 0 (or a constant)
• A three-player game would be a 3D table with 3 utilities per entry, etc.

"Chicken"
• Two players drive cars towards each other
• If one player goes straight, that player wins
• If both go straight, they both die

  (S = go straight, D = dodge)
         D        S
  D    0, 0    -1, 1
  S    1, -1   -5, -5

• Not zero-sum

Virtua Fighter 2 & 4
• Virtua Fighter 2 opponents could do basic pattern recognition (learn favored moves and sequences) to determine which countermoves to use
• Virtua Fighter 4 allows you to train a fighter to use your moves and combos, then have them fight for you

Creatures 1, 2 & 3
• Creatures uses neural nets to learn behaviors over time, and genetic algorithm concepts to breed neural nets and creatures
• This game is an example of Artificial Life

Half-life
• Uses "schedule-driven state machines" to control behavior
  – The state determines the schedule of behavior available to it
• Uses flocking and formation rules to control squad behavior.
Unreal
• Unreal has enemy flocking behaviors, similar to Half-life
• Unreal has an extensive script language that allows the development of AI bots

The Sims
– Embeds behavior code in the objects themselves
  • How to do the behaviors
  • Conditions for the behaviors
– An object-oriented approach
– Allows a VERY extensible environment

Racing Games
• Dirt Track Racing: uses neural networks to control opponent driving
• Formula 1 Grand Prix 2: opponent driving styles based on real Formula 1 drivers
• General ideas:
  – Most opponents use pre-stored information on how to approach the upcoming track
  – Opponents could be trained based on real drivers with specific styles of driving

Game playing
• Many similarities to search
• Most of the games studied
  – have two players,
  – are zero-sum: what one player wins, the other loses
  – have perfect information: the entire state of the game is known to both players at all times
• E.g., tic-tac-toe, checkers, chess, Go, backgammon, …
• Will focus on these for now
• Recently more interest in other games
  – Esp. games without perfect information; e.g., poker
• Need probability theory, game theory for such games

Types of Games

Perfect Information Games
• We consider two-player perfect-information games
• Perfect information: both players know everything there is to know about the game position
  – no hidden information
  – no random events
  – the two players need not have the same set of moves available
  – examples are Chess, Go, Checkers, O's and X's

Game Trees
• A game tree is like a search tree
  – nodes are search states, with full details about a position
  – edges between nodes correspond to moves
  – leaf nodes correspond to determined positions
    • e.g.
Win/Lose/Draw
    • number of points for or against a player
  – at each node it is one or the other player's turn to move

Basis of Game Playing: search for the best move every time
[Figure: starting from the initial board state, alternating searches over our moves and the opponent's moves lead to board states 2, 3, 4 and 5]

Relation of Games to Search
• Search
  – Solution is a (heuristic) method for finding the goal
  – Heuristics can find an optimal solution
  – Evaluation function: estimate of cost from start to goal through a given node
  – Examples: path planning, scheduling activities
• Games
  – Solution is a strategy (a strategy specifies a move for every possible opponent reply)
  – Time limits force an approximate solution
  – Evaluation function: evaluates the "goodness" of a game position
  – Examples: chess, checkers, Othello, backgammon

Coping with impossibility
• It is usually impossible to solve games completely
• This means we cannot search the entire game tree
  – we have to cut off search at a certain depth
    • like depth-bounded depth-first search, this loses completeness
• Instead we have to estimate the cost of internal nodes
• Do so using an evaluation function

Evaluation functions
• Evaluates how good a "board position" is
  – Based on static features of that board alone
• The zero-sum assumption lets us use one function to describe goodness for both players.
  – f(n) > 0 if we are winning in position n
  – f(n) = 0 if position n is tied
  – f(n) < 0 if our opponent is winning in position n
• Build using expert knowledge
  – Tic-tac-toe: f(n) = (# of 3-lengths open for me) - (# open for you)

Example Chess Score
• Black has:
  – 5 pawns, 1 bishop, 2 rooks
• Score = 1*(5) + 3*(1) + 5*(2) = 5 + 3 + 10 = 18
• White has:
  – 5 pawns, 1 rook
• Score = 1*(5) + 5*(1) = 5 + 5 = 10
• Overall scores for this board state:
  – black = 18 - 10 = 8
  – white = 10 - 18 = -8

Some Chess Positions and their Evaluations
• White to move: f(n) = (9+3) - (5+5+3.25) = -1.25
• … Nxg5??: f(n) = (9+3) - (5+5) = 2
So, considering our opponent's possible responses would be wise.
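The material count from the Example Chess Score slide above can be reproduced in a few lines. This is a minimal sketch, assuming the piece weights used on that slide (pawn 1, bishop 3, rook 5); the names `PIECE_VALUES`, `material` and `evaluate` are illustrative and not from the slides.

```python
# Piece weights taken from the "Example Chess Score" slide: pawn 1, bishop 3, rook 5.
PIECE_VALUES = {"pawn": 1, "bishop": 3, "rook": 5}


def material(pieces):
    """Sum the weighted piece counts for one side."""
    return sum(PIECE_VALUES[name] * count for name, count in pieces.items())


def evaluate(own, opponent):
    """Zero-sum evaluation: own material minus the opponent's material."""
    return material(own) - material(opponent)


black = {"pawn": 5, "bishop": 1, "rook": 2}
white = {"pawn": 5, "rook": 1}

print(material(black), material(white))  # 18 10
print(evaluate(black, white))            # 8
print(evaluate(white, black))            # -8
```

Because the game is treated as zero-sum, one function serves both players: the score from white's side is simply the negation of the score from black's side.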
• Uh-oh: Rxg4+: f(n) = (3) - (5+5) = -7
• And black may force checkmate

Hexapawn: Simplified Game Tree for 2 Moves
[Figure: White moves, then Black moves]

For Trivial Games
• Draw the entire search space
• Put the scores associated with each final board state at the ends of the paths
• Move the scores from the ends of the paths to the starts of the paths
  – Whenever there is a choice, use the minimax assumption
  – This guarantees the scores you can get
• Choose the path with the best score at the top
  – Take the first move on this path as the next move

Entire Search Space

O's and X's
• A simple evaluation function for O's and X's is:
  – Count lines still open for me (maX),
  – Subtract the number of lines still open for you (min)
  – The evaluation at the start of the game is 0
• Evaluation functions are only heuristics
  – e.g. a position might have score -2 but maX can win at the next move:

      O - X
      - O X
      - - -

• Use a combination of evaluation function and search

A Partial Game Tree for Tic-Tac-Toe
[Figure: partial game tree with node evaluations such as f(n) = 6-6 = 0, f(n) = 6-4 = 2, f(n) = 4-4 = 0, f(n) = 4-3 = 1, f(n) = 4-2 = 2, and terminal values -∞ and +∞. Source: CSE 391 - Intro to AI]
• f(n) = (# of potential three-lines for X) - (# of potential three-lines for O) if n is not terminal
• f(n) = 0 if n is a terminal tie
• f(n) = +∞ if n is a terminal win
• f(n) = -∞ if n is a terminal loss

MiniMax
• Assume that both players play perfectly
  – Therefore we cannot optimistically assume the player will miss the winning response to our moves
• Consider Min's strategy
  – wants the lowest possible score, ideally -∞
  – but must account for Max aiming for +∞
  – Min's best strategy is:
    • choose the move that minimizes the score that will result when Max chooses the maximizing move
  – hence the name MiniMax

MINI MAX
• Restrictions:
  – 2 players: MAX (computer) and MIN (opponent)
  – deterministic, perfect information
• Select a depth-bound and evaluation function
[Figure: three-level MAX / MIN / MAX tree with node values 3 2 2 3 1 5 3 1 4 4 3; an arrow marks "Select this move"]
- Construct the tree up till the depth-bound
- Compute the evaluation function for the
leaves
- Propagate the evaluation function upwards:
  - taking minima in MIN
  - taking maxima in MAX

Modified game
• From leaves upward, analyze the best decision for the player at each node, and give the node a value
[Figure: game tree with alternating Player 1 and Player 2 levels and backed-up node values]

Entire Search Space

Moving the scores from the bottom to the top

Moving a score when there's a choice
• Use the minimax assumption
  – Rational choice for the player below the number you're moving

Choosing the best move

Minimax algorithm

Properties of minimax
• Complete? Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible

Multiplayer games
• Games allow more than two players
• Single minimax values become vectors

Exercise
• What is the value at the root?

Game Playing – Example
• Nim (a simple game)
• Start with a single pile of tokens
• At each move the player must select a pile and divide the tokens into two non-empty, non-equal piles
[Figure: state space starting from a pile of 7: 7; 6-1; 5-1-1; 5-2; 4-2-1; 4-1-1-1; 4-3; 3-2-2; 3-2-1-1; 3-1-1-1-1; 3-3-1; 2-2-2-1; 2-2-1-1-1; 2-1-1-1-1-1]

Maximilian vs. Minerva
[Figure: the same states labelled with the player to move: (7,Min); (6,1,Max); (5,2,Max); (4,3,Max); (5,1,1,Min); (4,2,1,Min); (3,2,2,Min); (3,3,1,Min); (4,1,1,1,Max); (3,2,1,1,Max); (2,2,2,1,Max); (3,1,1,1,1,Min); (2,2,1,1,1,Min); (2,1,1,1,1,1,Max)]

Game tree
[Figure from M. T. Jones, Artificial Intelligence: A Systems Approach. Current board: X's move]

Minimax algorithm: Example
[Figure from M. T. Jones, Artificial Intelligence: A Systems Approach. Current board: X's move]

Problem of minimax search
• The number of game states is exponential in the number of moves.
  – Solution: do not examine every node
  – ==> Alpha-beta pruning
    • Remove branches that do not influence the final decision

Alpha-beta pruning
• Basic idea: "If you have an idea that is surely bad, don't take the time to see how truly awful it is." -- Pat Winston
[Figure: MAX root with value >= 2; left MIN child = 2 with leaves 2 and 7; right MIN child <= 1 with leaves 1 and ?]
• We don't need to compute the value at the node marked "?".
• No matter what it is, it can't affect the value of the root node.

Example of Alpha-Beta Pruning
[Figure: tree with alternating player 1 and player 2 levels]
• Depth-first search is a good idea here
  – See notes for explanation

Alpha-Beta Pruning for Player 1
1. Given a node N which can be chosen by player one, then if there is another node, X, along any path, such that (a) X can be chosen by player two, (b) X is on a higher level than N, and (c) X has been shown to guarantee a worse score for player one than N, then the parent of N can be pruned.
2. Given a node N which can be chosen by player two, then if there is a node X along any path such that (a) player one can choose X, (b) X is on a higher level than N, and (c) X has been shown to guarantee a better score for player one than N, then the parent of N can be pruned.
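The small MAX/MIN example above (leaves 2, 7, 1 and an unknown "?") can be checked with plain minimax. The sketch below assumes that tree is a MAX root over two MIN nodes, encoded as nested Python lists; `minimax` and `root_value` are illustrative names, not from the slides.

```python
def minimax(node, maximizing):
    """Return the minimax value of a tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node  # leaf: its value is the evaluation
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def root_value(unknown):
    """MAX root over two MIN nodes: leaves (2, 7) and (1, unknown)."""
    return minimax([[2, 7], [1, unknown]], maximizing=True)


# Try very different values for the "?" leaf; the root value never changes.
results = {x: root_value(x) for x in (-100, 0, 1, 50, 1000)}
print(results)
```

The right MIN node can never be worth more than 1 once the leaf 1 has been seen, and MAX already has 2 from the left branch, so the "?" leaf cannot change the root value. Alpha-beta pruning skips such leaves instead of evaluating them.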
Alpha and Beta values
• A Max node has an α value
  – the alpha value is a lower bound on the exact minimax score
  – with best play Max can guarantee scoring at least α
• A Min node has a β value
  – the beta value is an upper bound on the exact minimax score
  – with best play Min can guarantee scoring no more than β
• At a Max node, if an ancestor Min node has β < α
  – Min's best play must never let Max move to this node
    • therefore this node is irrelevant
  – if β = α, Min can do as well without letting Max get here
    • so again we need not continue

Alpha-Beta Pruning Rule
• Two key points:
  – alpha values can never decrease
  – beta values can never increase
• Search can be discontinued at a node if:
  – It is a Max node and
    • the alpha value is ≥ the beta of any Min ancestor
    • this is a beta cutoff
  – Or it is a Min node and
    • the beta value is ≤ the alpha of any Max ancestor
    • this is an alpha cutoff

Rules for Alpha-beta Pruning
• Alpha Pruning: search can be stopped below any MIN node having a beta value less than or equal to the alpha value of any of its MAX ancestors.
• Beta Pruning: search can be stopped below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN ancestors.
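The cutoff rules above can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not the slides' own algorithm listing: the tree is a nested Python list, and the 16 leaf values are those of the Game Tree Example later in these slides, arranged here as a complete binary tree with MAX at the root (that arrangement is an assumption read off the α/β labels in that example). The `visited` list is a hypothetical helper that records which leaves were actually evaluated.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of a nested-list tree, skipping branches that cannot matter."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record every leaf that is actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)  # alpha never decreases
            if alpha >= beta:
                break  # beta cutoff: a MIN ancestor will not allow this node
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)  # beta never increases
        if beta <= alpha:
            break  # alpha cutoff: a MAX ancestor already has something better
    return value


# Assumed shape: MAX root, then MIN, MAX, MIN levels above the 16 leaves.
tree = [
    [[[10, 11], [9, 12]], [[14, 15], [13, 14]]],
    [[[5, 2], [4, 1]], [[3, 22], [20, 21]]],
]

seen = []
print(alphabeta(tree, True, visited=seen))  # 10
print(seen)                                 # [10, 11, 9, 14, 15, 5, 4]
```

Under these assumptions only 7 of the 16 leaves are evaluated, and the result is the same value plain minimax would return.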
Alpha-Beta Pruning Summary
• Alpha = the value of the best choice we've found so far for MAX (highest)
• Beta = the value of the best choice we've found so far for MIN (lowest)
• When maximizing, cut off values lower than Alpha
• When minimizing, cut off values greater than Beta

α-β pruning example
[Figure sequence: step-by-step pruning example]

Alpha-beta example
[Figure: MAX root with value 3 over MIN nodes; the first MIN node has leaves 3, 12, 8; two branches are marked "prune"; other values shown: 2, 14, 1]

Another α-β Pruning Example
[Figure: tree with nodes A to O and leaf values 6, 5, 8, 10, 2, 1, 15, 18]

With alpha-beta pruning (view presentation for animation)

Minimax with Alpha-Beta pruning
[Figure: tree with alternating "My turn" (Max) and "Opp turn" (Min) levels; annotations: alpha = 3, beta = 3, "5 > beta, so prune", "0 < alpha, so prune", "2 < alpha, so prune", "No more branches, so this is the value"]
[Figure: tree with nodes A to M, with Computer Move (Max) and Opponent Move (Min) levels]

Properties of α-β
• Pruning does not affect the final result
• Good move ordering improves the effectiveness of pruning
• With "perfect ordering," time complexity = O(b^(m/2)), which doubles the depth of search
• A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

Alpha-Beta Example
[Figure: game tree with leaf values 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2]

Alpha-Beta Example
[Figure: the same tree with backed-up values filled in at the internal nodes]

The Game Rules:
1. Red goes first
2. On their turn, a player must move their piece
3. They must move to a neighboring square, or if their opponent is adjacent to them, with a blank on the far side, they can hop over them
4. The player that makes it to the far side first wins.

Game Tree Example
[Figure: tree with alternating MAX and MIN levels over the leaves 10 11 9 12 14 15 13 14 5 2 4 1 3 22 20 21]

Stage 1
[Figure: every node on the leftmost path starts with α = -∞, β = ∞; the first leaf is about to be examined]

Stage 1
[Figure: node labels after the leftmost leaves are examined: α = 10, β = ∞; α = -∞, β = 10; α = 10, β = ∞; backed-up value 10]

Stage 2 – Shallow Pruning
[Figure: node labels: α = -∞, β = 10; α = 10, β = ∞; α = -∞, β = 10; a node with value 9 has α = 10, β = 9]

Game Tree example contd.
[Figure: node labels: α = 10, β = ∞; α = -∞, β = 10; a node with value 14 has α = 14, β = 10]

Game Tree example contd.
[Figure: α = 10, β = ∞ is passed down the path into the right subtree]

Game Tree example contd.
[Figure: a node with value 5 has α = 10, β = 5]

Game Tree example contd.
[Figure: backed-up values 10 and 5; node labels α = 10, β = 5 and α = 10, β = ∞; a node with value 4 has α = 10, β = 4]

The α-β algorithm

Games of chance
• Ex.: Backgammon
• Form of the game tree: games that include chance have chance nodes
• Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16)
• [1,1], [6,6] chance 1/36, all other chance 1/18

"2/3 of the average" game
• Everyone writes down a number between 0 and 100
• The person closest to 2/3 of the average wins
• Example:
  – A says 50
  – B says 10
  – C says 90
  – Average(50, 10, 90) = 50
  – 2/3 of average = 33.33
  – A is closest (|50-33.33| = 16.67), so A wins

The End