Lecture 5: Mixed strategies and expected payoffs

As we have seen, for example with the Matching Pennies game or the Rock-Paper-Scissors game, some games have no Nash equilibrium. We will see that Nash equilibria do exist if we extend our concept of strategy and allow the players to randomize their strategies. As a first step we review basic ideas from probability and introduce notation which will be useful in the context of game theory.

Mini-review of probability:

Probability: Let us suppose that an experiment results in N possible outcomes, which we denote by {1, · · · , N}. We assign a probability σ(i) to each possible outcome:

    σ(i) = Probability that outcome i occurs.

The numbers σ(i) are such that

    σ(i) ≥ 0,   Σ_{i=1}^N σ(i) = 1.

You may think of σ(i) as a frequency: if the experiment is repeated many times, say X times with X very large, and you observe that outcome i occurs X(i) times out of the X experiments, then σ(i) ≈ X(i)/X (this is the Law of Large Numbers).

We will represent the probabilities of all N outcomes of the experiment by a probability vector

    σ = (σ(1), · · · , σ(N)),   with σ(i) ≥ 0 and Σ_{i=1}^N σ(i) = 1.       (Probability vector)

For example, if N = 2 then the probability vectors can be written as

    σ = (p, 1 − p)   with 0 ≤ p ≤ 1.

For N = 3 the probability vectors can be written as

    σ = (p, q, 1 − p − q)   with 0 ≤ p ≤ 1, 0 ≤ q ≤ 1, and p + q ≤ 1.

Expected value: Suppose that the experiment results in a certain reward. Let us say that the outcome i leads to the reward f(i) for each i. Then if we repeat the experiment many times, the average reward is given by

    f(1)σ(1) + f(2)σ(2) + · · · + f(N)σ(N).

We can think of f as a function with f(i) being the reward for outcome i. In probability the function f is called a random variable, and the average reward is called the expected value of the random variable f, denoted by E[f], that is,

    E[f] = Σ_{i=1}^N f(i)σ(i).

It will be useful to use vector notation for random variables and expected values. We form a vector f as

    f = (f(1), f(2), · · · , f(N)).

The expected value can be written using the scalar product: if x and y are two vectors in R^N, then the scalar product is given by

    ⟨x, y⟩ = Σ_{i=1}^N x_i y_i = x_1 y_1 + · · · + x_N y_N.

We then have

    E[f] = ⟨f, σ⟩.       (Expected value of the random variable f)

Mixed strategies and expected payoffs: In the context of a game we now allow the players to randomize their strategies. Suppose player R has N strategies at his disposal, which we number 1, 2, 3, · · · , N. From now on we will call these strategies pure strategies. In a pure strategy the player makes a choice which involves no chance or probability.

Definition 1. A mixed strategy for player R is a probability vector

    σ_R = (σ_R(1), · · · , σ_R(N)),   σ_R(i) = Probability that R plays strategy i.

A pure strategy for player R is a special case of a mixed strategy:

    Strategy i ←→ σ_R = (0, · · · , 1, · · · , 0)   with the 1 in the i-th position.

If a player plays a mixed strategy then the players' payoffs become random variables, and we will assume that every player is trying to maximize his expected payoff.
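To make this concrete, here is a minimal Python sketch (assuming numpy is available; the strategy σ, the payoffs f, and all names are arbitrary choices for illustration, not from the notes). It simulates many plays of a mixed strategy σ and checks that the empirical frequencies approach σ and that the average payoff approaches E[f] = ⟨f, σ⟩:

    import numpy as np

    # A mixed strategy sigma over N = 3 pure strategies, and a payoff f(i)
    # for each pure strategy i (values chosen arbitrarily for illustration).
    sigma = np.array([0.2, 0.5, 0.3])   # sigma(i) >= 0 and sum(sigma) = 1
    f = np.array([10.0, 5.0, 4.0])

    # Expected payoff E[f] = <f, sigma> as a scalar product.
    print("E[f] = <f, sigma> =", np.dot(f, sigma))   # 10(0.2) + 5(0.5) + 4(0.3) = 5.7

    # Law of Large Numbers check: play the mixed strategy X times, then compare
    # the empirical frequencies X(i)/X with sigma and the average payoff with E[f].
    rng = np.random.default_rng(seed=0)
    X = 100_000
    plays = rng.choice(len(sigma), size=X, p=sigma)
    print("empirical frequencies:", np.bincount(plays, minlength=len(sigma)) / X)
    print("empirical average payoff:", f[plays].mean())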
To compute expected payoffs it will be useful to use matrix notation. We assume that player R has N strategies to choose from and that player C has M strategies to choose from. Then the payoff matrix P_R has N rows and M columns (and similarly for P_C):

    P_R = [ P_R(1,1)  P_R(1,2)  · · ·  P_R(1,M) ]
          [ P_R(2,1)  P_R(2,2)  · · ·  P_R(2,M) ]
          [    ...       ...              ...   ]
          [ P_R(N,1)  P_R(N,2)  · · ·  P_R(N,M) ]

If we multiply the matrix P_R by the vector σ_C we obtain the vector

    P_R σ_C = [ P_R(1,1)σ_C(1) + P_R(1,2)σ_C(2) + · · · + P_R(1,M)σ_C(M) ]
              [ P_R(2,1)σ_C(1) + P_R(2,2)σ_C(2) + · · · + P_R(2,M)σ_C(M) ]
              [                        ...                               ]
              [ P_R(N,1)σ_C(1) + P_R(N,2)σ_C(2) + · · · + P_R(N,M)σ_C(M) ]

and we see that the i-th entry of P_R σ_C is

    (P_R σ_C)(i) = P_R(i,1)σ_C(1) + · · · + P_R(i,M)σ_C(M) = Expected payoff for R to play i against σ_C.

Further, if R plays a mixed strategy σ_R, then his payoff is (P_R σ_C)(i) with probability σ_R(i), and so we have

    ⟨σ_R, P_R σ_C⟩ = Expected payoff for R to play σ_R against σ_C.

We can argue in the same way to compute payoffs for C, but to do this we should interchange rows and columns, or in other words consider the transpose matrix P_Cᵀ (which has M rows and N columns). If we multiply this matrix by the vector σ_R we find

    P_Cᵀ σ_R = [ P_C(1,1)σ_R(1) + P_C(2,1)σ_R(2) + · · · + P_C(N,1)σ_R(N) ]
               [ P_C(1,2)σ_R(1) + P_C(2,2)σ_R(2) + · · · + P_C(N,2)σ_R(N) ]
               [                        ...                               ]
               [ P_C(1,M)σ_R(1) + P_C(2,M)σ_R(2) + · · · + P_C(N,M)σ_R(N) ]

and we find that

    (P_Cᵀ σ_R)(i) = P_C(1,i)σ_R(1) + · · · + P_C(N,i)σ_R(N) = Expected payoff for C to play i against σ_R.

Averaging over C's strategies with weights σ_C, we find that the expected payoff for C to play σ_C against σ_R is given by ⟨P_Cᵀ σ_R, σ_C⟩. By the property of the transpose matrix we have ⟨P_Cᵀ σ_R, σ_C⟩ = ⟨σ_R, P_C σ_C⟩, and so we have

    ⟨σ_R, P_C σ_C⟩ = Expected payoff for C to play σ_C against σ_R.

Example: Consider the game with the payoff table below, where each cell lists (Robert's payoff, Collin's payoff):

                          Collin
                     A         B          C
    Robert   I    (10, 0)   (5, 1)    (4, −2)
             II   (10, 1)   (5, 0)   (10, −1)

If σ_C = (p, q, 1 − p − q), then the payoffs for R are given by

    P_R σ_C = [ 10   5   4 ] [   p   ]  =  [ 4 + 6p + q ]        (1)
              [ 10   5  10 ] [   q   ]     [  10 − 5q   ]
                             [ 1−p−q ]

so the payoff for Robert to play I against σ_C is 4 + 6p + q, and to play II the payoff is 10 − 5q.

If R plays the mixed strategy σ_R = (r, 1 − r), then the payoffs for C are given by

    P_Cᵀ σ_R = [  0   1 ] [  r  ]  =  [ 1 − r  ]        (2)
               [  1   0 ] [ 1−r ]     [   r    ]
               [ −2  −1 ]             [ −1 − r ]

and so the payoff for Collin to play A against σ_R is 1 − r, to play B it is r, and to play C it is −1 − r.
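As a numerical sanity check on formulas (1) and (2), here is a short Python sketch (again assuming numpy; the particular values of p, q, and r are arbitrary choices for illustration). It recomputes the payoff vectors P_R σ_C and P_Cᵀ σ_R and the expected payoffs ⟨σ_R, P_R σ_C⟩ and ⟨σ_R, P_C σ_C⟩:

    import numpy as np

    # Payoff matrices for the Robert/Collin example above.
    # Rows = Robert's strategies I, II; columns = Collin's strategies A, B, C.
    P_R = np.array([[10.0, 5.0,  4.0],
                    [10.0, 5.0, 10.0]])
    P_C = np.array([[ 0.0, 1.0, -2.0],
                    [ 1.0, 0.0, -1.0]])

    p, q, r = 0.2, 0.3, 0.4                  # arbitrary sample values
    sigma_C = np.array([p, q, 1 - p - q])    # mixed strategy for Collin
    sigma_R = np.array([r, 1 - r])           # mixed strategy for Robert

    # Formula (1): P_R sigma_C should equal (4 + 6p + q, 10 - 5q).
    print(P_R @ sigma_C, "vs", (4 + 6*p + q, 10 - 5*q))

    # Formula (2): P_C^T sigma_R should equal (1 - r, r, -1 - r).
    print(P_C.T @ sigma_R, "vs", (1 - r, r, -1 - r))

    # Expected payoffs for the mixed strategy profile (sigma_R, sigma_C):
    print("Robert:", sigma_R @ P_R @ sigma_C)   # <sigma_R, P_R sigma_C>
    print("Collin:", sigma_R @ P_C @ sigma_C)   # <sigma_R, P_C sigma_C>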