Lecture 5: Mixed strategies and expected payoffs
As we have seen, for example in the Matching Pennies and Rock-Paper-Scissors games, some games have no Nash equilibrium in pure strategies. We will see, however, that Nash equilibria do exist if we extend our concept of strategy and allow the players to randomize their strategies.
As a first step we review basic ideas of probability and introduce notation which will be useful in the context of game theory.
Mini-review of probability:
Probability: Let us suppose that an experiment results in N possible outcomes, which we denote by {1, · · · , N}. We assign a probability σ(i) to each possible outcome:
σ(i) = Probability that outcome i occurs.
The numbers σ(i) are such that
σ(i) ≥ 0 ,   ∑_i σ(i) = 1 .
You may think of σ(i) as a frequency: if the experiment is repeated many times, say X times with X very large, and you observe that outcome i occurs X(i) times out of the X experiments, then σ(i) ≈ X(i)/X (this is the Law of Large Numbers).
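This frequency interpretation can be illustrated with a short simulation (a sketch with a made-up probability vector, not part of the lecture): for large X the observed frequencies X(i)/X approach σ(i).

```python
import random

# Hypothetical probability vector for an experiment with N = 3 outcomes
sigma = [0.5, 0.3, 0.2]

X = 100_000        # number of repetitions of the experiment
counts = [0, 0, 0]  # counts[i] will be X(i), the number of times outcome i occurs
random.seed(0)      # fixed seed so the run is reproducible
for _ in range(X):
    # pick outcome 0, 1, or 2 with probabilities given by sigma
    i = random.choices([0, 1, 2], weights=sigma)[0]
    counts[i] += 1

# empirical frequencies X(i)/X should be close to sigma(i)
freqs = [c / X for c in counts]
print(freqs)
```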
We will represent the probabilities of all the N outcomes of the experiment by a probability vector
σ = (σ(1), · · · , σ(N)) ,   with σ(i) ≥ 0 and ∑_{i=1}^{N} σ(i) = 1 .
For example, if N = 2 then the probability vectors can be written as
σ = (p, 1 − p)   with 0 ≤ p ≤ 1 .
For N = 3 the probability vectors can be written as
σ = (p, q, 1 − p − q)   with p ≥ 0, q ≥ 0, and p + q ≤ 1 .
Expected value: Suppose that the experiment results in a certain reward. Let us say that the outcome i leads to the reward f(i) for each i. Then if we repeat the experiment many times, the average reward is given by
f(1)σ(1) + f(2)σ(2) + · · · + f(N)σ(N) .
We can think of f as a function, with f(i) being the reward for outcome i.
In probability the function f is called a random variable, and the average reward is called the expected value of the random variable f, denoted by E[f]; that is,
E[f] = ∑_{i=1}^{N} f(i)σ(i) .
It will be useful to use vector notation for random variables and expected values. We form a vector f as
f = (f(1), f(2), · · · , f(N)) .
The expected value can be written using the scalar product: if x and y are two vectors in R^N then the scalar product is given by
⟨x, y⟩ = ∑_{i=1}^{N} x_i y_i = x_1 y_1 + · · · + x_N y_N .
We then have
E[f] = ⟨f, σ⟩ .   (Expected value of the random variable f)
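As a small illustration (the probability vector and rewards below are made up), the expected value E[f] = ⟨f, σ⟩ is just a scalar product:

```python
# Made-up example with N = 3 outcomes
sigma = [0.5, 0.3, 0.2]   # probability vector: nonnegative entries summing to 1
f = [10.0, 5.0, 4.0]      # reward f(i) for each outcome i

def scalar_product(x, y):
    """<x, y> = x_1*y_1 + ... + x_N*y_N"""
    return sum(xi * yi for xi, yi in zip(x, y))

# E[f] = <f, sigma>; mathematically 10*0.5 + 5*0.3 + 4*0.2 = 7.3
expected_value = scalar_product(f, sigma)
print(expected_value)
```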
Mixed strategies and expected payoffs: In the context of games we now allow the players to randomize their strategies. Suppose player R has N strategies at his disposal, which we number 1, 2, 3, · · · , N. From now on we will call these strategies pure strategies. In a pure strategy the player makes a choice which involves no chance or probability.
Definition 1. A mixed strategy for player R is a probability vector σ_R = (σ_R(1), · · · , σ_R(N)) with
σ_R(i) = Probability that R plays strategy i .
A pure strategy for player R is a special case of a mixed strategy:
Strategy i ←→ σ_R = (0, · · · , 0, 1, 0, · · · , 0)   (with the 1 in the i-th entry).
If a player plays a mixed strategy then his opponent's payoff becomes a random variable, and we will assume that every player is trying to maximize his expected payoff. To compute this it will be useful to use matrix notation.
We assume that player R has N strategies to choose from and that player C has M strategies to choose from; the payoff matrix P_R then has N rows (one per strategy of R) and M columns (one per strategy of C):

P_R = \begin{pmatrix} P_R(1,1) & P_R(1,2) & \cdots & P_R(1,M) \\ P_R(2,1) & P_R(2,2) & \cdots & P_R(2,M) \\ \vdots & \vdots & & \vdots \\ P_R(N,1) & P_R(N,2) & \cdots & P_R(N,M) \end{pmatrix}
If we multiply the matrix P_R with the vector σ_C we obtain the vector

P_R σ_C = \begin{pmatrix} P_R(1,1) & P_R(1,2) & \cdots & P_R(1,M) \\ P_R(2,1) & P_R(2,2) & \cdots & P_R(2,M) \\ \vdots & \vdots & & \vdots \\ P_R(N,1) & P_R(N,2) & \cdots & P_R(N,M) \end{pmatrix} \begin{pmatrix} σ_C(1) \\ σ_C(2) \\ \vdots \\ σ_C(M) \end{pmatrix}
        = \begin{pmatrix} P_R(1,1)σ_C(1) + \cdots + P_R(1,M)σ_C(M) \\ P_R(2,1)σ_C(1) + \cdots + P_R(2,M)σ_C(M) \\ \vdots \\ P_R(N,1)σ_C(1) + \cdots + P_R(N,M)σ_C(M) \end{pmatrix}

and we see that the i-th entry of P_R σ_C is

P_R σ_C(i) = P_R(i,1)σ_C(1) + · · · + P_R(i,M)σ_C(M) = Expected payoff for R to play i against σ_C .

Further, if R is playing a mixed strategy σ_R then his payoff will be P_R σ_C(i) with probability σ_R(i), and so we have

⟨σ_R, P_R σ_C⟩ = Expected payoff for R to play σ_R against σ_C .
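To make this concrete, here is a short numerical sketch; the matrix P_R is the one from the worked example at the end of this lecture, while the two mixed strategies are made up:

```python
# R has N = 2 pure strategies, C has M = 3 pure strategies
P_R = [[10, 5, 4],
       [10, 5, 10]]          # P_R[i][j] = payoff for R when R plays i and C plays j
sigma_R = [0.5, 0.5]         # made-up mixed strategy for R
sigma_C = [0.2, 0.3, 0.5]    # made-up mixed strategy for C

# P_R sigma_C: entry i is the expected payoff for R to play pure strategy i
# against sigma_C
payoff_per_row = [sum(P_R[i][j] * sigma_C[j] for j in range(3)) for i in range(2)]

# <sigma_R, P_R sigma_C>: expected payoff for R to play sigma_R against sigma_C
expected_payoff_R = sum(sigma_R[i] * payoff_per_row[i] for i in range(2))
print(payoff_per_row, expected_payoff_R)
```

Note that the two entries of `payoff_per_row` agree with the formulas 4 + 6p + q and 10 − 5q derived in the example, evaluated at p = 0.2, q = 0.3.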
We can argue in the same way to compute payoffs for C, but to do this we should interchange rows and columns, or in other words consider the transpose matrix P_C^T (which has M rows and N columns). If we multiply this matrix by the vector σ_R we find

P_C^T σ_R = \begin{pmatrix} P_C(1,1) & P_C(2,1) & \cdots & P_C(N,1) \\ P_C(1,2) & P_C(2,2) & \cdots & P_C(N,2) \\ \vdots & \vdots & & \vdots \\ P_C(1,M) & P_C(2,M) & \cdots & P_C(N,M) \end{pmatrix} \begin{pmatrix} σ_R(1) \\ σ_R(2) \\ \vdots \\ σ_R(N) \end{pmatrix}
          = \begin{pmatrix} P_C(1,1)σ_R(1) + P_C(2,1)σ_R(2) + \cdots + P_C(N,1)σ_R(N) \\ P_C(1,2)σ_R(1) + P_C(2,2)σ_R(2) + \cdots + P_C(N,2)σ_R(N) \\ \vdots \\ P_C(1,M)σ_R(1) + P_C(2,M)σ_R(2) + \cdots + P_C(N,M)σ_R(N) \end{pmatrix}

and we find that

P_C^T σ_R(i) = P_C(1,i)σ_R(1) + · · · + P_C(N,i)σ_R(N) = Expected payoff for C to play i against σ_R .

If we average over the strategy σ_C we find that the expected payoff for C to play σ_C against σ_R is given by ⟨P_C^T σ_R, σ_C⟩. By the property of the transpose matrix we have ⟨P_C^T σ_R, σ_C⟩ = ⟨σ_R, P_C σ_C⟩, and so we have

⟨σ_R, P_C σ_C⟩ = Expected payoff for C to play σ_C against σ_R .
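The transpose identity can be checked numerically; in this sketch the 2×3 matrix P_C matches the example at the end of the lecture, and the mixed strategies are made up:

```python
# P_C[i][j] = payoff for C when R plays i and C plays j (rows = R, columns = C)
P_C = [[0, 1, -2],
       [1, 0, -1]]
sigma_R = [0.4, 0.6]         # made-up mixed strategy for R (length N = 2)
sigma_C = [0.2, 0.3, 0.5]    # made-up mixed strategy for C (length M = 3)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Left side: form P_C^T sigma_R (a vector of length M), then dot with sigma_C
PCt_sigmaR = [sum(P_C[i][j] * sigma_R[i] for i in range(2)) for j in range(3)]
lhs = dot(PCt_sigmaR, sigma_C)

# Right side: form P_C sigma_C (a vector of length N), then dot with sigma_R
PC_sigmaC = [sum(P_C[i][j] * sigma_C[j] for j in range(3)) for i in range(2)]
rhs = dot(sigma_R, PC_sigmaC)

print(lhs, rhs)  # the two expressions agree (up to floating-point rounding)
```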
Example: Consider the game with payoff table (in each cell Robert's payoff is listed first and Collin's second):

                     Collin
               A         B        C
Robert   I   (10, 0)   (5, 1)   (4, −2)
         II  (10, 1)   (5, 0)   (10, −1)
If σ_C = (p, q, 1 − p − q) then the payoffs for R are given by

P_R σ_C = \begin{pmatrix} 10 & 5 & 4 \\ 10 & 5 & 10 \end{pmatrix} \begin{pmatrix} p \\ q \\ 1 - p - q \end{pmatrix} = \begin{pmatrix} 4 + 6p + q \\ 10 - 5q \end{pmatrix}    (1)

so the payoff for Robert to play I against σ_C is 4 + 6p + q and to play II the payoff is 10 − 5q.
If R has the mixed strategy σ_R = (r, 1 − r) then the payoffs for C are given by

P_C^T σ_R = \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ -2 & -1 \end{pmatrix} \begin{pmatrix} r \\ 1 - r \end{pmatrix} = \begin{pmatrix} 1 - r \\ r \\ -1 - r \end{pmatrix}    (2)

and so the payoff for Collin to play A against σ_R is 1 − r, to play B it is r, and to play C it is −1 − r.
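The formulas (1) and (2) can be spot-checked numerically, say at p = q = 0.25 and r = 0.5 (values chosen arbitrarily):

```python
# Payoff matrices from the example (rows = Robert's I, II; columns = Collin's A, B, C)
P_R = [[10, 5, 4],
       [10, 5, 10]]
P_C = [[0, 1, -2],
       [1, 0, -1]]

p, q = 0.25, 0.25           # Collin's mixed strategy is (p, q, 1 - p - q)
r = 0.5                     # Robert's mixed strategy is (r, 1 - r)
sigma_C = [p, q, 1 - p - q]
sigma_R = [r, 1 - r]

# Robert's payoffs against sigma_C; compare with 4 + 6p + q and 10 - 5q
PR_sigmaC = [sum(P_R[i][j] * sigma_C[j] for j in range(3)) for i in range(2)]
print(PR_sigmaC, [4 + 6 * p + q, 10 - 5 * q])

# Collin's payoffs against sigma_R; compare with 1 - r, r, -1 - r
PCt_sigmaR = [sum(P_C[i][j] * sigma_R[i] for i in range(2)) for j in range(3)]
print(PCt_sigmaR, [1 - r, r, -1 - r])
```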