M369, Game Theory Part 2, diagrams etc.

Part 2: Solving easy games.

(Several examples are not from AJJ's book, so numbers do not correspond.)

Definition. Solving a game is the operation of finding an optimum strategy for each player, and the payoff each player can expect, against intelligent opponents. Very often this is impossible.

As explained in Part 1, we are only considering matrix games, so we only give the matrices.

Definition. Let M be a matrix. We say that row i dominates row j if every entry of row i is ≥ the corresponding entry in row j. We say that column k dominates column l if every entry of column k is ≤ the corresponding entry in column l.

Example 2.1. Jill vs Mary.

                  Mary
               1    2    3
          1    3    0    1
    Jill  2    2    5    2
          3    3    3   -1

Every entry of col 1 is ≥ the corresponding entry of col 3. So whatever Jill does, Mary will be at least as well off if she plays M3 as if she plays M1. So we can delete the first column. This leaves a 3×2 matrix in which every entry of row 2 is ≥ the corresponding entries of both row 1 and row 3. So Jill should play J2. Mary is intelligent, so she can follow Jill's thinking: she knows that Jill will play J2, and then M3 is better for Mary than M2. So this game is soluble.

Solution: J2, M3, value 2.

Beware of two booby traps when rejecting dominated strategies.
(1) Row i dominates row j if every entry in row i is at least as large, but col i dominates col j if every entry of col i is at least as small.
(2) When you cross out a row or col, don't relabel the ones remaining.

More generally:

Definition. Let X be a player in any sort of game and σ, τ any 2 of X's strategies. If the game has no chance element, we say that σ dominates τ if the payoff to X from σ is always at least as good as the payoff from τ for every possible choice of strategies by all of the other players. If the game has chance elements, we say that σ dominates τ if the mean payoff to X from σ (averaged over all choices by Nature) is always at least as great as that from τ.
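Returning to matrix games for a moment, the deletion procedure used in Example 2.1 can be sketched in Python. This is only an illustration and not part of the original notes: the function names (`row_dominates`, `col_dominates`, `reduce_by_domination`) are hypothetical, and the sketch assumes the entries are payoffs to the row player, as in the example. Surviving rows and columns keep their original labels, in line with booby trap (2).

```python
def row_dominates(m, i, j, cols):
    """Row i dominates row j: every entry of row i is >= the matching entry of row j."""
    return all(m[i][c] >= m[j][c] for c in cols)


def col_dominates(m, k, l, rows):
    """Column k dominates column l: every entry of column k is <= the matching entry."""
    return all(m[r][k] <= m[r][l] for r in rows)


def reduce_by_domination(m):
    """Repeatedly delete one dominated row or column until none is left.

    Returns the surviving row labels and column labels, numbered from 1
    as in the original matrix (nothing is relabelled after a deletion).
    """
    rows = list(range(len(m)))
    cols = list(range(len(m[0])))
    changed = True
    while changed:
        changed = False
        # Look for a row that some other surviving row dominates.
        for j in rows:
            if any(i != j and row_dominates(m, i, j, cols) for i in rows):
                rows.remove(j)
                changed = True
                break
        if changed:
            continue
        # Otherwise look for a column that some other surviving column dominates.
        for l in cols:
            if any(k != l and col_dominates(m, k, l, rows) for k in cols):
                cols.remove(l)
                changed = True
                break
    return [r + 1 for r in rows], [c + 1 for c in cols]


# Matrix of Example 2.1 (rows: Jill, columns: Mary).
example_2_1 = [
    [3, 0, 1],
    [2, 5, 2],
    [3, 3, -1],
]
surviving_rows, surviving_cols = reduce_by_domination(example_2_1)
print(surviving_rows, surviving_cols)
```

On the matrix of Example 2.1 the sketch deletes column 1, then rows 1 and 3, then column 2, leaving J2 and M3 with value 2. In general the reduction stops with more than one row or column left, and then domination alone does not solve the game.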
Two of X's strategies are equivalent if they give the same expected payoff to X for all possible choices of strategies by the other players. Strategy σ strictly dominates τ if σ dominates τ and is not equivalent to it, i.e. if σ sometimes gives a strictly greater payoff to X.

Theorem 2.1. Any finite non-cooperative game of complete information can be solved in finite time by rejecting dominated strategies and pruning the game tree.

Explanation: "in finite time" only means that the calculation terminates; the amount of calculation might still be far too large to carry out. For example in Chess there are 20 possible first moves for either player. If we make a crude estimate by accepting this 20 as typical, after 30 moves by each player there are about 10^80 possible states: more than the estimated number of electrons in the universe.

Proof. Let N be the length of the game, i.e. of the longest possible play; then there is at least one play of that length. Let S be the state just before the last move in that play, and X the player who is due to move at S. Then any state that follows S must be a final state.

Case 1. X has only one legal move, say from S to T. Then we can delete the state T. Whatever payoff any player would have got at T, (s)he gets at S.

Case 2. X has several legal moves at S, say from S to T1, T2, ..., Tk with respective payoffs to X of p1, p2, ..., pk. By renumbering, we may assume that p1 is the largest. Let τ be any strategy of X that says "move from S to Ti". Then X has a strategy σ that is identical to τ at every other state, but says "move from S to T1". Since p1 ≥ pi, τ is dominated by σ, so we can delete it. Hence we can delete any of X's strategies that move to any Ti different from T1. Then all of the states T2, ..., Tk become inaccessible, so we can prune them.

Case 3. X is Nature. We know (on average) what she will do. On arrival at S Nature will move to one of the states T1, T2, ..., Tk with respective probabilities π1, π2, ..., πk.
For each player (Y, say) the mean payoff if the game reaches S will be Σ πi pi (summed over i = 1, ..., k), where πi is the probability that Nature moves to Ti and pi is the payoff to Y at Ti. So again we can prune the tree, removing all the states Ti and attaching these payoffs at state S.

Continuing this way, eventually the game is reduced to a null game (no moves allowed, everybody collects their mean payoff at the initial state). For every state in the original game, the argument of Case 2 finds an optimum move for whoever has the move at that state.

This Theorem is equivalent to Th 2.1 in Jones, but I think the proof is easier.

Example: 2-2 Nim: strategy B3 always wins for B.

Partial Algorithm for finding dominated rows & columns in a matrix game. ("Partial" because it only gets you part way.)

When searching for dominated rows, the following is a good way to start. For each column, let cj be the greatest value in column j. Mark the cells in column j where cj appears. Suppose cj appears in cell p,j and nowhere else in that column. Then row p cannot be dominated by any row that has a lesser value in column j, which here means any other row.

The inset is assumed to be the left hand 3 columns of a larger matrix.

    row    col 1  col 2  col 3
     1      -1      0      2
     2       1      3      2
     3       4     -4     -3
     4       5      0      1
     5      -2      3     -1
     6       0      3      0
     7       3      2     -4
     8       2      1     -2

In column 1 the max value 5 appears in row 4 and nowhere else, so row 4 cannot be dominated. In column 2 the max 3 appears in rows 2, 5, 6 and nowhere else, so these 3 rows cannot be dominated by any others. But in column 3 the max value 2 appears in rows 1 and 2 and nowhere else. Combining these 2 columns: column 2 shows that only rows 5 and 6 could dominate row 2, and column 3 shows that only row 1 could, so Row 2 cannot be dominated.

Similarly for columns. For each row, let ri be the least value in row i. Then the columns that contain value ri (in that row) cannot be dominated by any columns that have larger values in that row. In this way, we try to build a list of "indomitable" rows and columns. In a matrix of any size, checking for domination is tedious, and this process helps to reduce the tedium.

Definition. Let G be an r-person game, and σ1, σ2, ..., σr a set of strategies, one for each of the r players.
We represent this set as a sort of vector σ = (σ1, σ2, ..., σr). A deviation from σ is a set of strategies in which precisely one player (the deviant) chooses a different strategy. We represent a deviation as σ||τk, where k is the number of the player who deviates and τk is the strategy that player switches to.

Definition. A set of strategies σ is an equilibrium set if for each player (k, say) the payoff to k from any deviation σ||τk is never better than the payoff from σ.

Ex 2.2. This game has no dominated row or column, but it has a so-called saddle point at B1-J3.

                    Joe
                1    2    3    row min
           1    2    0   -1      -1
    Bill   2   -5    2   -2      -5
           3    4   -3   -2      -3
    col max     4    2   -1

Definition. Let M be a matrix. For each row, say row i, let ri be the least value in row i, and for each column, say column j, let cj be the greatest value in column j. Then the maximin of M is the largest ri and the minimax of M is the least cj. Alternatively:

    maximin = max_i ( min_j Mij )   and   minimax = min_j ( max_i Mij )

Definition. We say that a matrix game based on matrix M has a saddle point if maximin(M) = minimax(M), i.e. if max(row min) = min(col max).

Another definition: M has a saddle point at cell ij if Mij is both the smallest number in its row and the largest number in its column. (In Ex 2.2 the entry -1 at B1-J3 is the least value in row 1 and the greatest value in column 3.)

Theorem 2.2. If a matrix game with matrix M has a saddle point, say at cell ij, then that row and column are optimum strategies for their respective players.

Theorem 2.3. In any matrix M, maximin(M) is always ≤ minimax(M).

Proof. Define ri and cj as above. Suppose that the greatest row min occurs in row p. Then maximin(M) = rp = min(row p). Likewise, suppose that the least column max occurs in column q, so that minimax(M) = cq = max(col q). Then for any row, say row i:

    Miq = an entry in col q ≤ max(col q) = cq = minimax(M)
    but also
    Miq = an entry in row i ≥ min(row i) = ri

Putting these together: minimax(M) ≥ Miq ≥ ri for every i. In particular, when i = p, we get minimax(M) ≥ rp = maximin(M).

Example 2.3.
2 politicians called Blear and Smut are about to fight an election and they can either spend the last 2 days of the campaign in the city of Rundown or in Burningup or 1 day in each city. Plans must be made in advance & neither knows what the other intends. Estimated gains in seats (by Blear) are:

                      Smut
                 RR    BB    RB    min
           RR    -3     2     6    -3
    Blear  BB     2     0     2     0
           RB     5    -2    -4    -4
    col max       5     0     6

Although I introduced domination before saddle points I think that SP's (if they exist) are both easier to find & more useful. So: given a matrix game, always start by looking for a saddle point. If that fails, look for domination.

Unfortunately, most matrices have neither SP's nor domination: so what next?
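Before moving on, the saddle-point check recommended above can be sketched in Python. This is a minimal illustration rather than anything from the original notes: the function names (`maximin`, `minimax`, `saddle_points`) are hypothetical, the entries are assumed to be payoffs to the row player, and the second matrix (a 2×2 "matching pennies" game) is an added assumption used only to show a case with no saddle point.

```python
def maximin(m):
    """Largest of the row minima: max_i ( min_j Mij )."""
    return max(min(row) for row in m)


def minimax(m):
    """Least of the column maxima: min_j ( max_i Mij )."""
    return min(max(col) for col in zip(*m))


def saddle_points(m):
    """Return the cells (row, column), numbered from 1, at which the entry
    is both the least value in its row and the greatest value in its column."""
    row_min = [min(row) for row in m]
    col_max = [max(col) for col in zip(*m)]
    return [
        (i + 1, j + 1)
        for i, row in enumerate(m)
        for j, entry in enumerate(row)
        if entry == row_min[i] and entry == col_max[j]
    ]


# Matrix of Ex 2.2 (rows: Bill, columns: Joe).
ex_2_2 = [
    [2, 0, -1],
    [-5, 2, -2],
    [4, -3, -2],
]
print(maximin(ex_2_2), minimax(ex_2_2), saddle_points(ex_2_2))

# Hypothetical matrix with no saddle point (not from the notes).
pennies = [
    [1, -1],
    [-1, 1],
]
print(maximin(pennies), minimax(pennies), saddle_points(pennies))
```

For Ex 2.2 both quantities equal -1 and the only cell returned is (1, 3), i.e. B1-J3. For the second matrix maximin is -1 and minimax is 1, which agrees with Theorem 2.3 (maximin ≤ minimax) but leaves no saddle point.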