M369, Game Theory Part 3, diagrams etc.

Part 3: 2×m games.

So far, I have discussed domination and saddle points. Unfortunately most matrix games cannot be solved by either method. If a game is of any size, then typically it has a few dominated rows and columns, and after you have removed all of these, there is still no saddle point.

Consider the following game. Dither chooses a card, either red or black, and plays it face down. Swither must then call red or black. He wins £1 if he called right, loses £1 if he called wrong. The matrix below gives the payoffs to Dither, with the row minima and column maxima.

                  Swither
                  R     B     row min
Dither    R      -1     1       -1
          B       1    -1       -1
col max           1     1

This game has no saddle and no domination, so it cannot be solved in pure strategies.

This impasse is similar to one that occurs in the theory of equations. After solving a few simple quadratic equations, we find that $x^2 + 1 = 0$ has no real solution. This discovery led to the invention of complex numbers.

Going back to Dither/Swither. Clearly if either D or S plays predictably, then the other can win. The only way that either player can defend against an intelligent opponent is to play unpredictably. So in a long sequence of plays, D must choose red and black at random, and by symmetry, he should choose them equally often. Likewise S should call R and B at random, equally often. Then each player will win half and lose half.

Suppose that D deviates, say by playing R 2/3 of the time and B 1/3. If S continues to play correctly, D does not gain anything. Also, as soon as S notices that D is deviating, S can penalise D by calling R more often. So D has no incentive to deviate, and S ditto by the same argument.

Definition. A mixed strategy is a decision by one player to make a choice between some of his pure strategies, regulated by chance.

Definition. Let V be a vector space over the reals and $u_1, u_2, \ldots, u_r$ a set of vectors in V. A convex combination of these vectors is a linear combination $v = \sum_{i=1}^{r} \lambda_i u_i$ where the numbers $\lambda_i$ are all $\ge 0$ and their sum is 1.
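The Dither/Swither argument can be checked numerically. The sketch below is only an illustration, not part of the original notes: the helper name `expected_payoff` and the ordering of the strategies (red first, then black) are assumptions made here. It stores each mixed strategy as a list of probabilities and computes the mean payoff to Dither.

```python
from fractions import Fraction as F

# Payoffs to Dither (row player). Rows: Dither plays red, black.
# Columns: Swither calls red, black.
M = [[-1, 1],
     [1, -1]]

def expected_payoff(matrix, row_mix, col_mix):
    """Mean payoff to the row player when both players mix independently."""
    return sum(row_mix[i] * col_mix[j] * matrix[i][j]
               for i in range(len(row_mix))
               for j in range(len(col_mix)))

half = [F(1, 2), F(1, 2)]        # red and black equally often
deviate = [F(2, 3), F(1, 3)]     # Dither plays red 2/3 of the time
always_red = [F(1), F(0)]        # Swither always calls red

print(expected_payoff(M, half, half))           # 0
print(expected_payoff(M, deviate, half))        # 0: deviating gains nothing
print(expected_payoff(M, deviate, always_red))  # -1/3: Swither penalises Dither
```

Exact fractions are used so that the results can be compared with hand calculation without rounding.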
Then we can regard any mixed strategy as a convex combination of that player's pure strategies.

Now change the numbers: Pamela vs Diana, shown below (payoffs to Pamela).

                Diana
                D1    D2    D3
Pamela   P1      3     0     2
         P2      2     5     1
         P3      3     3    -1

After we delete the dominated column D1 and row P3, there is no saddle point and no dominated row or column, so we look for mixed strategies in the reduced game:

                Diana
                D2    D3
Pamela   P1      0     2
         P2      5     1

Algorithm 3.1. To find correct mixed strategies for a 2×2 matrix game with no saddle point and no dominated rows or columns.

WARNING: This will always give the wrong answer if the matrix has a saddle point or dominated strategies.

Let M be the original matrix. Take Pamela v Diana as the example, and find Diana's strategy first:

     0     2
     5     1

Add an extra row containing row 1 - row 2:

    -5     1

One of these numbers will be < 0; change its sign, giving 5 and 1. Swop the 2 numbers, giving 1 and 5. Call these numbers p and q. To convert these to probabilities, divide by p + q. This gives:

    1/6   5/6

So Diana should play D2 with probability 1/6 and D3 with probability 5/6 (and not D1).

Now do the same for the row player Pamela. Start with M, then add an extra column containing col 1 - col 2:

     0     2   |   -2
     5     1   |    4

Change -2 to 2 and swop, giving 4 and 2. Dividing by 4 + 2 gives the probabilities 2/3 and 1/3. So Pamela should play P1 with prob. 2/3 and P2 with prob. 1/3 and never P3.

Go back to the original 3×3 game, in which we rejected D1 and P3 because they were dominated. Find the payoffs. From the previous work, we know that Pamela should play P1 with prob. 2/3 and P2 with prob. 1/3 and never P3. We can write this as

P* = 2/3 P1 + 1/3 P2 + 0 P3

If she does this then depending on what Diana does, Pamela's payoffs will be given by 2/3 row 1 + 1/3 row 2, which is

Diana     D1     D2     D3
          8/3    5/3    5/3

Note that Diana is penalised if she plays the bad strategy D1. Likewise we found that Diana should play D2 with probability 1/6 and D3 with probability 5/6 (and not D1).
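The steps of Algorithm 3.1 can be sketched in a few lines of Python. This is a minimal illustration under the algorithm's own assumptions (a 2×2 game with no saddle point and no domination); the function name `solve_2x2` is chosen here and does not come from the notes. Taking absolute values stands in for "change the sign of the negative number", which is equivalent only when exactly one of the two differences is negative.

```python
from fractions import Fraction

def solve_2x2(m):
    """Algorithm 3.1 for a 2x2 game; m[i][j] is the payoff to the row player.

    Assumes no saddle point and no dominated row or column.
    Returns (row_mix, col_mix, value).
    """
    (a, b), (c, d) = m
    # Column player: row 1 - row 2, make both entries positive, swop, normalise.
    p, q = abs(a - c), abs(b - d)
    col_mix = (Fraction(q, p + q), Fraction(p, p + q))
    # Row player: col 1 - col 2, make both entries positive, swop, normalise.
    s, t = abs(a - b), abs(c - d)
    row_mix = (Fraction(t, s + t), Fraction(s, s + t))
    # Mean payoff to the row player against the column player's first column.
    value = row_mix[0] * a + row_mix[1] * c
    return row_mix, col_mix, value

# Reduced Pamela v Diana game: rows P1, P2; columns D2, D3.
pamela_mix, diana_mix, value = solve_2x2([[0, 2], [5, 1]])
print(pamela_mix[0], pamela_mix[1])   # 2/3 1/3
print(diana_mix[0], diana_mix[1])     # 1/6 5/6
print(value)                          # 5/3
```

As the WARNING in the algorithm says, the function should not be trusted on a matrix that has a saddle point or a dominated strategy; the sketch does not check for either.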
Diana's strategy can be written as

D* = 0 D1 + 1/6 D2 + 5/6 D3

If Diana does this then depending on what Pamela does, Pamela's payoffs will be given by 1/6 column 2 + 5/6 column 3, which is

Pamela    P1     5/3
          P2     5/3
          P3    -1/3

Definition. Let J be a matrix game with no saddle point. An active strategy is a pure strategy (for either player) that appears with probability > 0 in that player's optimum mixed strategy.

Diana's active strategies are D2 and D3, Pamela's are P1 and P2. If P plays her optimum strategy, D concedes the expected payoff of 5/3 from either of her active strategies, but she does worse (conceding 8/3) with inactive D1. Conversely, if D plays her optimum strategy, P gets 5/3 from either of her active strategies but only -1/3 from inactive P3. This is actually a general result; it is the way you test a mixed strategy to see if it is optimum.

Definition. The value of a matrix game is the mean payoff to the first player if both players play their optimum strategies.

Theorem 3.1. (not proved here) Let M be a matrix game, v the value, and X, Y the 2 players. If Y (the 2nd player) plays her optimum strategy, the mean payoff to X will be
    equal to v if X plays any of her active strategies,
    ≤ v if X plays any of her non-active strategies.
Conversely, if X (the 1st player) plays her optimum strategy, the mean payoff to X will be
    equal to v if Y plays any of her active strategies,
    ≥ v if Y plays any of her non-active strategies.

Now consider 2×m matrix games with m > 2. Given any 2×m matrix game with no saddle point, there is always at least one 2×2 submatrix whose solution gives optimum strategies for the original game.

Example.

                 Elizabeth
                 E1    E2    E3    E4    E5    E6    E7
Henry     H1     -6    -1     1     4     7     4     3
          H2      7    -2     6     3    -2    -5     7

To solve this, draw a ladder diagram as shown below. This has 2 vertical axes marked with the same scale. The LH axis corresponds to H1, the RH axis to H2. For each column of the matrix, say col 1 = $(-6, 7)^T$, put in a line from -6 on the LH axis to 7 on the RH axis. The LH diagram shows the first 2 columns only.
Repeat for all the columns, getting the RH diagram. Mark in double weight the lines that bound this figure from below. Mark the highest point of this underneath border. The lines that cross at this point correspond to the required columns of the matrix: here 1 and 2. So the matrix is reduced to

$M = \begin{pmatrix} -6 & -1 \\ 7 & -2 \end{pmatrix}$, say.

[Figure: two ladder diagrams, each with both vertical axes scaled from -8 to 8. The LH diagram shows the lines for columns 1 and 2 only; the RH diagram shows the lines for columns 1 to 7, with the points A, B, C, D and P marked.]

We solve as in Algorithm 3.1. Find E's strategy: row 1 - row 2 of M is ( -13  1 ). Swop, change the -13 to +13: ( 1  13 ). So E should play E1 with prob 1/14 and E2 with prob 13/14. Depending on what Henry does, his payoff will be 1/14 column 1 + 13/14 column 2, which is

Henry     H1    -19/14
          H2    -19/14

Find H's strategy. Column 1 - column 2 of M is $(-5, 9)^T$. Swop, change the -5 to +5, divide by 14: $(9/14, 5/14)^T$. So Henry must play strategy H1 with prob 9/14 and H2 with prob 5/14. Depending on what Elizabeth does, his payoff will be 9/14 row 1 + 5/14 row 2, which is

Elizabeth    E1        E2        E3       E4       E5       E6       E7
           -19/14    -19/14    39/14    51/14    53/14    11/14    62/14

and again we see that if E plays either of her active strategies, the payoff (to H) is -19/14, but if E plays any of her non-active strategies, H gets a bigger payoff, which is worse for E.

Suppose now that in the diagram the boundary from below has a flat segment. This will happen if (for example) E's 2nd column had been $(-2, -2)^T$ instead of the actual $(-1, -2)^T$. (T means transpose.) Then the ladder diagram will look like the inset, where I have omitted several dominated columns.

[Figure: ladder diagram for the modified game, both axes scaled from -8 to 8, with the points A, B and C marked.]

The lower boundary then has 2 highest corners: A = columns 1 and 2 and B = columns 2 and 6. If you analyse either of these, you will find that E's best strategy will be pure column 2. Against pure E2 Henry can play H1 or H2 or any mixture and he will score -2. But if he plays H1, E can penalise him by playing E1.
H must play somewhere along the line between A and B because that will let him penalise E if she deviates. The best point for H is in line with point C. So H's best mixed strategy is got by solving columns 1 and 6. This is the small table below.

Henry vs Elizabeth II
          E1    E6
   H1     -6     4
   H2      7    -5

Solving for H only: H should play

H* = 6/11 H1 + 5/11 H2

This gives payoff -2 against E's optimum strategy E2, but if E deviates and plays either E1 or E6, his mean payoff rises to -1/11.

Now consider an m×2 matrix, like the first inset. This time the ladder has one line per row. You must find the lowest point on the upper boundary: here this is point A on rows 1 and 6. So the matrix reduces to the 3rd inset. Solving as above: the row player must play R* = 5/8 R1 + 3/8 R6 and the column player must play C* = 5/8 C1 + 3/8 C2 and the average payoff is 5/4.

Row          1     2     3     4     5     6     7
Column 1    -1    -3     0    -3     1     5     2
Column 2     5     1    -3     0    -3    -5    -4

[Figure: ladder diagram with one line per row, both axes scaled from -5 to 5, with the lowest point A of the upper boundary marked.]

          C1    C2
   R1     -1     5
   R6      5    -5

Given 3 lines in a ladder diagram, does line 3 go through the intersection of lines 1 and 2, or above or below it? Answer: suppose that line 1 is the steepest of the 3 lines and goes from a to b; line 2 goes from c to d; and line 3 from e to f.

   line 1   line 2   line 3
     a        c        e
     b        d        f

So they correspond to the columns in the first inset. Form the matrix

$K = \begin{pmatrix} 1 & 1 & 1 \\ a & c & e \\ b & d & f \end{pmatrix}$

and let $\Delta$ be the determinant of K.

Lemma. Line 3 passes above / through / below the intersection of lines 1 and 2 according as $\Delta$ is > 0 or = 0 or < 0.

Proof. Let X be the point where lines 1 and 2 meet. Write A, C, E for the points at heights a, c, e on the LH axis and B, D, F for the points at heights b, d, f on the RH axis.

[Figure: two diagrams of lines 1, 2 and 3 drawn between the LH and RH axes, with the intersection X of lines 1 and 2 marked.]

First, suppose that line 3 does go through X (first diagram). Then triangles AEX, BFX are similar, so AE/BF = AX/BX. Triangles ACX, BDX are similar, so AC/BD = AX/BX = AE/BF. Hence

$(c - a)(b - f) = AC \cdot BF = AE \cdot BD = (e - a)(b - d)$.

Multiply out and collect terms on the RHS:

$0 = eb - ab - ed + ad - cb + cf + ab - af = \Delta$.

Now suppose line 3 passes above X.
Suppose the line fX meets the LH axis at e'. Then $\Delta'$ (with e' in place of e) is 0 by the above. Now $\Delta$ is got by replacing e' by e in K'; expanding along the third column,

$$\Delta = \Delta - \Delta' = \begin{vmatrix} 1 & 1 & 1 \\ a & c & e \\ b & d & f \end{vmatrix} - \begin{vmatrix} 1 & 1 & 1 \\ a & c & e' \\ b & d & f \end{vmatrix} = \begin{vmatrix} 1 & 1 & 0 \\ a & c & e - e' \\ b & d & 0 \end{vmatrix} = (e - e')(b - d)$$

where b > d because line 1 = a to b was the steepest. Since line 3 is above X, e > e' and hence $\Delta > 0$. If line 3 passes below X then e < e' and so $\Delta < 0$.

Part 4 concluded.

Given a pair of supposedly-optimal mixed strategies for Roy and Clara, say r and c, how to check if they are genuinely optimal? Answer: Let v be the value. If R plays r, he presents C with the row vector $r^T M$. The entries of this vector must all be ≥ v, with equality for all of C's active strategies. Then C cannot gain by deviating. If C plays c she presents R with the column $Mc$. All entries of this column must be ≤ v, with equality for all of R's active strategies. Whenever you get an alleged solution, you need to check both r and c.
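The check described above is easy to mechanise. The following sketch is an illustration only: the name `check_optimal` is chosen here, and the candidate value is taken to be $r^T M c$, which is an assumption of this sketch rather than something stated in the notes. With that choice, once every entry of $r^T M$ is ≥ v and every entry of $Mc$ is ≤ v, equality on the active strategies follows automatically, so only the inequalities are tested.

```python
from fractions import Fraction as F

def check_optimal(M, r, c):
    """Test a pair of mixed strategies (r for rows, c for columns).

    M[i][j] is the payoff to the row player. Returns (ok, v), where
    v = r^T M c and ok is True when every entry of r^T M is >= v
    and every entry of M c is <= v.
    """
    rows, cols = len(M), len(M[0])
    v = sum(r[i] * M[i][j] * c[j] for i in range(rows) for j in range(cols))
    rM = [sum(r[i] * M[i][j] for i in range(rows)) for j in range(cols)]
    Mc = [sum(M[i][j] * c[j] for j in range(cols)) for i in range(rows)]
    ok = all(x >= v for x in rM) and all(y <= v for y in Mc)
    return ok, v

# Henry (rows H1, H2) vs Elizabeth (columns E1 to E7).
game = [[-6, -1, 1, 4, 7, 4, 3],
        [7, -2, 6, 3, -2, -5, 7]]
henry = [F(9, 14), F(5, 14)]
elizabeth = [F(1, 14), F(13, 14), 0, 0, 0, 0, 0]

ok, v = check_optimal(game, henry, elizabeth)
print(ok, v)

# A wrong guess for Henry fails the test: E2 would then pay him less than v.
print(check_optimal(game, [F(1, 2), F(1, 2)], elizabeth)[0])
```

For the Henry vs Elizabeth strategies found earlier, the first call should report that the pair passes with v = -19/14, and the second call should report a failure.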