Game Theory
Episode 6
The Grim Trigger
Agenda
• Main ideas
• Key terms
• Basic strategy for solving a repeated game
• Example Problem
– Set-up
– Mathematical detour (infinite series)
– Solution and implications
• Conclusion
Main Ideas
• Cooperation is possible in a repeated game when the game is repeated indefinitely (or infinitely) and players place sufficient value on future payoffs
– Sometimes we talk about the “shadow of the future” being sufficiently long
– Equivalent to saying that the discount rate is sufficiently small, i.e., that the discount factor δ is sufficiently large
– Typically requires a strategy based on the credible threat of punishment in response to defection
• Reintroduces the notion of continuation value
– Typically allows for a multiplicity of equilibria
• Reputation can matter in a repeated game
Key Terms
• Grim Trigger:
– Begin by cooperating; answer any defection with never-ending punishment (defect forever)
• Continuation value:
– present (discounted) value of the payoff stream from a given point onward
• Discount factor:
– degree to which the future is valued, δ, with 0 ≤ δ < 1 (the infinite sums below converge only for δ < 1)
– may be thought of as the probability that you are still in the game next round
• History:
– past actions taken by players
• Subgame-Perfect Equilibrium:
– An equilibrium whose strategies induce a Nash equilibrium in every subgame; particularly appropriate for repeated games
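The two quantitative key terms can be made concrete in a few lines of Python. This is a minimal sketch, assuming payoffs are plain numbers and the stream is truncated to a finite list; `continuation_value` and `expected_total_with_stopping` are hypothetical names chosen for illustration. The second function checks the "probability you're in the game next round" reading of δ by simulation: an undiscounted game that continues with probability δ after each round should have the same expected total as the discounted stream.

```python
import random


def continuation_value(payoffs, delta):
    """Present value of a finite payoff stream: sum of delta**t * payoffs[t]."""
    return sum(delta ** t * u for t, u in enumerate(payoffs))


def expected_total_with_stopping(per_round, delta, trials=200_000, seed=0):
    """Monte Carlo estimate (hypothetical helper) of the undiscounted total payoff
    when, after each round, the game continues with probability delta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += per_round            # the first round is always played
        while rng.random() < delta:   # another round happens with probability delta
            total += per_round
    return total / trials


if __name__ == "__main__":
    print(continuation_value([3, 3, 3], 0.5))     # 3 + 1.5 + 0.75
    print(continuation_value([3] * 200, 0.5))     # approaches 3 / (1 - 0.5) = 6
    print(expected_total_with_stopping(3, 0.5))   # close to 6 as well
```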
Basic Strategy
• Read the problem, noting the strategic setting
• Identify the question
– Usually, “Find conditions sufficient to sustain a certain stable pattern of behavior; describe the equilibrium.”
• FIRST: Guess at what the equilibrium might be
– Usually a grim trigger strategy, where cheating is punished with never-ending defection
• THEN: Check that this strategy satisfies the equilibrium conditions
– That performing the prescribed behavioral pattern is a best response for each player
– Write down the entire set of strategies and sufficient conditions
• Usually the condition is a lower bound on the discount factor δ (equivalently, an upper bound on the discount rate)
Example Problem
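• Consider infinite repetition of the Prisoner’s Dilemma (player 1 chooses the row, player 2 the column; payoffs are listed as player 1, player 2):
                  Player 2
                  C         D
Player 1     C    3, 3      0, 10
             D    10, 0     1, 1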
Each player’s payoff in the infinitely repeated game is the discounted sum of its payoffs in each period (i.e., the standard case we considered in class). Assume that the players have a common discount factor δ, where 0 < δ < 1.
For what values of δ is (C, C) in every period sustainable?
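Before looking at repetition, it helps to confirm why the one-shot game needs it. This is a minimal Python sketch, assuming the payoff table above with player 1 choosing the row and player 2 the column; the dictionary `PAYOFF` and the helper names are illustrative, not from the original. It checks that D is each player's best response to either action (so D is dominant in the one-shot game) even though (C, C) gives both players more than (D, D).

```python
# Stage-game payoffs from the table: PAYOFF[(row action, column action)] = (u1, u2).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 10),
    ("D", "C"): (10, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def best_response_p1(a2):
    """Player 1's payoff-maximizing action against player 2's action a2."""
    return max(ACTIONS, key=lambda a1: PAYOFF[(a1, a2)][0])


def best_response_p2(a1):
    """Player 2's payoff-maximizing action against player 1's action a1."""
    return max(ACTIONS, key=lambda a2: PAYOFF[(a1, a2)][1])


if __name__ == "__main__":
    for a in ACTIONS:
        print(a, best_response_p1(a), best_response_p2(a))  # D is the reply every time
    # Both players prefer mutual cooperation to mutual defection.
    print(PAYOFF[("C", "C")], PAYOFF[("D", "D")])
```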
It’s round 1. Take the role of player 2.
You know that the grim trigger is a potential solution.
Assume player 1 begins by cooperating. Your move:
If player 1 is cooperating, and you defect, you get:
$10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \cdots$
If player 1 is cooperating, and you cooperate, you get:
$3 + \delta \cdot 3 + \delta^2 \cdot 3 + \delta^3 \cdot 3 + \cdots$
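The trade-off here is "more now versus more later." This minimal Python sketch (the function name `first_round_cooperation_leads` is hypothetical) adds up both payoff streams period by period and reports the first round at which cooperating's running discounted total overtakes defecting's, or `None` if it never does. For δ = 0.9 cooperation pulls ahead after a few rounds; for δ = 0.7 the early gain from defecting is never made up.

```python
def first_round_cooperation_leads(delta, max_rounds=1000):
    """First n at which the discounted sum of the first n payoffs from cooperating
    (3 every round) exceeds that from defecting (10 once, then 1 every round).
    Returns None if cooperation never pulls ahead within max_rounds."""
    coop = defect = 0.0
    for t in range(max_rounds):
        weight = delta ** t                  # discount applied to round t + 1
        coop += 3 * weight
        defect += (10 if t == 0 else 1) * weight
        if coop > defect:
            return t + 1
    return None


if __name__ == "__main__":
    for d in (0.5, 0.7, 0.9):
        print(d, first_round_cooperation_leads(d))
```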
Mathematical Detour
If player 1 is cooperating, and you defect, you get:
$P_D = 10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \cdots$
(once you defect, player 1 defects forever under the grim trigger, and defecting is your best response to that, so you defect forever too)
If player 1 is cooperating, and you cooperate, you get:
$P_C = 3 + \delta \cdot 3 + \delta^2 \cdot 3 + \delta^3 \cdot 3 + \cdots$
Notice the summation of an infinite series?
You might think that this sum would be incalculable.
But no. As long as δ < 1, we have ways of dealing with this… it won’t hurt a bit…
Call the infinite series $P_C$, like so: $P_C = 3 + 3\delta + 3\delta^2 + 3\delta^3 + \cdots$
Now multiply $P_C$ by δ, like so: $\delta P_C = 3\delta + 3\delta^2 + 3\delta^3 + 3\delta^4 + \cdots$
Now subtract one from the other: $P_C - \delta P_C = 3$
Factor out $P_C$: $P_C(1 - \delta) = 3$
Divide by $(1 - \delta)$: $P_C = 3/(1 - \delta)$
Therefore, $3 + \delta \cdot 3 + \delta^2 \cdot 3 + \cdots = \dfrac{3}{1 - \delta}$
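The "multiply and subtract" trick can be checked with exact rational arithmetic. This small Python sketch uses the standard-library `fractions` module; `pc_closed_form` and `partial_sum` are illustrative names. It verifies that $3/(1 - \delta)$ satisfies the subtraction step $P_C - \delta P_C = 3$, and that the first n terms sum to $3(1 - \delta^n)/(1 - \delta)$, which approaches $3/(1 - \delta)$ as n grows because $\delta^n \to 0$ when δ < 1.

```python
from fractions import Fraction


def pc_closed_form(delta):
    """Closed form of 3 + 3δ + 3δ² + ... (valid for 0 <= δ < 1)."""
    return 3 / (1 - delta)


def partial_sum(first, delta, n):
    """Exact sum of the first n terms first * δ**t, for t = 0, ..., n - 1."""
    return sum((first * delta ** t for t in range(n)), Fraction(0))


if __name__ == "__main__":
    d = Fraction(1, 2)
    pc = pc_closed_form(d)
    print(pc)                                      # 6
    print(pc - d * pc)                             # the subtraction step leaves 3
    print(float(partial_sum(Fraction(3), d, 20)))  # just below 6
```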
Mathematical Detour
If player 1 is cooperating, and you defect, you get:
$P_D = 10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \cdots$
Set aside the first-period payoff of 10, like so: $P_D - 10 = \delta + \delta^2 + \delta^3 + \cdots$
Now multiply $P_D - 10$ by δ, like so: $\delta(P_D - 10) = \delta^2 + \delta^3 + \delta^4 + \cdots$
Now subtract one from the other: $P_D - 10 - \delta(P_D - 10) = \delta$
Simplify: $P_D - 10 - \delta P_D + 10\delta = \delta$
Get everything with $P_D$ to one side: $P_D - \delta P_D = \delta - 10\delta + 10$
Simplify: $P_D - \delta P_D = 10 - 9\delta$
Factor out $P_D$: $P_D(1 - \delta) = 10 - 9\delta$
Divide by $(1 - \delta)$: $P_D = (10 - 9\delta)/(1 - \delta)$
Therefore, $10 + \delta \cdot 1 + \delta^2 \cdot 1 + \cdots = \dfrac{10 - 9\delta}{1 - \delta}$
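The same exact-arithmetic check works for the defection stream. A minimal sketch, again with `fractions` and a hypothetical `pd_closed_form` helper: it confirms the subtraction step and that $(10 - 9\delta)/(1 - \delta)$ equals 10 now plus $\delta/(1 - \delta)$, the present value of 1 per period starting next period.

```python
from fractions import Fraction


def pd_closed_form(delta):
    """Closed form of 10 + δ + δ² + δ³ + ... (valid for 0 <= δ < 1)."""
    return (10 - 9 * delta) / (1 - delta)


if __name__ == "__main__":
    for d in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
        pd = pd_closed_form(d)
        # Subtraction step of the derivation: (P_D - 10) - δ(P_D - 10) = δ
        assert (pd - 10) - d * (pd - 10) == d
        # Equivalent form: 10 in the first period, then 1 per period from the second on
        assert pd == 10 + d / (1 - d)
        print(d, pd)
```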
Define the Equilibrium Strategy
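If player 1 is cooperating, and you defect, you get:
$P_D = 10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \cdots = \dfrac{10 - 9\delta}{1 - \delta}$
If player 1 is cooperating, and you cooperate, you get:
$P_C = 3 + \delta \cdot 3 + \delta^2 \cdot 3 + \delta^3 \cdot 3 + \cdots = \dfrac{3}{1 - \delta}$
These are the continuation values of these strategies.
Cooperation is sustained if the value of cooperating is at least the value of defecting:
$\dfrac{3}{1 - \delta} \ge \dfrac{10 - 9\delta}{1 - \delta}$, or $3 \ge 10 - 9\delta$, or $9\delta \ge 7$
So if $\delta \ge 7/9$, cooperation can be sustained. Because every round on the cooperative path looks the same from that point forward, this one comparison covers every round.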
Strategy is: Play C as long as opponent played C in all previous rounds;
if opponent played D in a previous round, then play D from now on.
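To see the grim trigger at work, here is a minimal Python simulation sketch. It assumes player 1 follows the strategy just stated and compares two hypothetical strategies for player 2: conditional cooperation (the same grim trigger) and "defect from round 1 on". Payoffs are truncated after a finite number of rounds, which is harmless here because $\delta^t$ becomes negligible. The helper `coop_sustainable` is an illustrative name for the exact condition $3/(1 - \delta) \ge (10 - 9\delta)/(1 - \delta)$.

```python
from fractions import Fraction

# Stage payoff to a player as U[(own action, other's action)], from the table.
U = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 10, ("D", "D"): 1}


def grim_trigger(t, opponent_history):
    """Play C until the opponent has ever played D; then play D forever."""
    return "D" if "D" in opponent_history else "C"


def always_defect(t, opponent_history):
    """Hypothetical deviation: defect in every round regardless of history."""
    return "D"


def player2_value(strategy2, delta, rounds=300):
    """Player 2's discounted payoff (truncated) when player 1 uses grim_trigger."""
    h1, h2, total = [], [], 0.0
    for t in range(rounds):
        a1 = grim_trigger(t, h2)
        a2 = strategy2(t, h1)
        total += delta ** t * U[(a2, a1)]
        h1.append(a1)
        h2.append(a2)
    return total


def coop_sustainable(delta):
    """Exact check of 3/(1 - δ) >= (10 - 9δ)/(1 - δ) for a Fraction δ in (0, 1)."""
    return 3 / (1 - delta) >= (10 - 9 * delta) / (1 - delta)


if __name__ == "__main__":
    for d in (0.7, 0.9):
        print(d, player2_value(grim_trigger, d), player2_value(always_defect, d))
    print(coop_sustainable(Fraction(7, 9)))   # True: indifferent exactly at 7/9
```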
Let’s get funky with it…
Can we sustain (C,D) in odd periods and (D,C) in even periods?
State the strategies:
Player 1: Play C in odd periods as long as the opponent has played C in all previous even periods; play D in odd periods if the opponent has ever played D in an even period. Play D in all even periods.
Player 2: Play C in even periods as long as the opponent has played C in all previous odd periods; play D in even periods if the opponent has ever played D in an odd period. Play D in all odd periods.
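These strategies are mechanical enough to code directly. A minimal Python sketch, assuming periods are numbered from 1 (so period 1 is odd) and each player sees the other's past actions; `player1`, `player2`, and `simulate` are hypothetical names. Player 2's trigger watches player 1's odd-period moves, the periods in which player 1 is supposed to cooperate. On the equilibrium path the simulation reproduces the alternation, and a player 2 who defects in an even period triggers permanent defection by player 1.

```python
def player1(t, p2_history):
    """Period t is 1-based. Player 1 defects in even periods; in odd periods it
    cooperates only while player 2 has played C in every previous even period."""
    if t % 2 == 0:
        return "D"
    betrayed = any(a == "D" for s, a in enumerate(p2_history, start=1) if s % 2 == 0)
    return "D" if betrayed else "C"


def player2(t, p1_history):
    """Mirror image: player 2 defects in odd periods; in even periods it cooperates
    only while player 1 has played C in every previous odd period."""
    if t % 2 == 1:
        return "D"
    betrayed = any(a == "D" for s, a in enumerate(p1_history, start=1) if s % 2 == 1)
    return "D" if betrayed else "C"


def simulate(rounds, p1=player1, p2=player2):
    """Return the list of action profiles (a1, a2) for periods 1..rounds."""
    h1, h2 = [], []
    for t in range(1, rounds + 1):
        a1, a2 = p1(t, h2), p2(t, h1)
        h1.append(a1)
        h2.append(a2)
    return list(zip(h1, h2))


if __name__ == "__main__":
    print(simulate(4))
    # A player 2 who always defects, including in even periods
    print(simulate(4, p2=lambda t, history: "D"))
```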
Player 1’s payoffs would be
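$P_1 = 10 + \delta \cdot 0 + \delta^2 \cdot 10 + \delta^3 \cdot 0 + \cdots = \dfrac{10}{1 - \delta^2}$ (a geometric series with first term 10 and ratio $\delta^2$)
Given that we begin in an even period (alternatively, these are player 2’s payoffs if we start in an odd period).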
Player 2’s payoffs would be
$P_2 = 0 + \delta \cdot 10 + \delta^2 \cdot 0 + \delta^3 \cdot 10 + \cdots = \delta \cdot \dfrac{10}{1 - \delta^2}$
Given that we begin in an even period (alternatively, these are player 1’s payoffs if we start in an odd period).
These are the continuation values of these strategies
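Cheating is defined as playing defect. Given a grim trigger,
Player 2’s payoff for cheating: $P_{2,\text{cheating}} = 1 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \cdots = \dfrac{1}{1 - \delta}$
(In the even period where player 2 defects, player 1 is already playing D, so player 2 gets 1; from then on both players defect in every period.)
Player 2 will play C in even periods as long as $\delta \cdot \dfrac{10}{1 - \delta^2} \ge \dfrac{1}{1 - \delta}$:
$\dfrac{10\delta}{(1 - \delta)(1 + \delta)} \ge \dfrac{1}{1 - \delta}$
$\dfrac{10\delta}{1 + \delta} \ge 1$
$10\delta \ge 1 + \delta$
$9\delta \ge 1$
$\delta \ge 1/9$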
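A quick exact check of this threshold, as a minimal sketch with Python's `fractions` module (the function names are illustrative): at δ = 1/9 player 2 is exactly indifferent between continuing to alternate and cheating, above it alternating is strictly better, and below it cheating pays.

```python
from fractions import Fraction


def p2_alternating(delta):
    """0 + 10δ + 0 + 10δ³ + ... = 10δ / (1 - δ²), evaluated in an even period."""
    return 10 * delta / (1 - delta ** 2)


def p2_cheating(delta):
    """1 + δ + δ² + ... = 1 / (1 - δ): mutual defection from now on."""
    return 1 / (1 - delta)


if __name__ == "__main__":
    for d in (Fraction(1, 10), Fraction(1, 9), Fraction(1, 5)):
        print(d, p2_alternating(d), p2_cheating(d))
```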
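What about the other player, the one who begins with 10?
As before, cheating is defined as playing defect. Given a grim trigger,
Player 1’s payoffs (alternating): $P_{1,\text{alternating}} = 10 + \delta \cdot 0 + \delta^2 \cdot 10 + \delta^3 \cdot 0 + \cdots = \dfrac{10}{1 - \delta^2}$
Player 1’s payoffs (cheating): $P_{1,\text{cheating}} = 10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \cdots = 10 + \dfrac{\delta}{1 - \delta}$
(player 1 collects 10 in the current even period, defects in the next odd period, where player 2 is already defecting, and earns 1 then and in every later period once player 2’s trigger fires)
Player 1 will play C in odd periods as long as $\dfrac{10}{1 - \delta^2} \ge 10 + \dfrac{\delta}{1 - \delta}$:
$\dfrac{10}{(1 + \delta)(1 - \delta)} \ge \dfrac{\delta}{1 - \delta} + \dfrac{10 - 10\delta}{1 - \delta}$
$\dfrac{10}{1 + \delta} \ge \delta + 10 - 10\delta = 10 - 9\delta$
$10 \ge (10 - 9\delta)(1 + \delta)$
$10 \ge 10 + 10\delta - 9\delta - 9\delta^2$
$0 \ge \delta - 9\delta^2$
$9\delta^2 - \delta \ge 0$
$9\delta \ge 1$ (dividing by $\delta > 0$)
$\delta \ge 1/9$, just like before.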
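The algebra above can be double-checked exactly. In this minimal sketch (Python `fractions`; the function names are illustrative), the gap between player 1's alternating and cheating payoffs works out to $(9\delta^2 - \delta)/(1 - \delta^2)$, so its sign is the sign of $9\delta^2 - \delta$, which is zero at δ = 1/9 and positive above it.

```python
from fractions import Fraction


def p1_alternating(delta):
    """10 + 0δ + 10δ² + 0δ³ + ... = 10 / (1 - δ²), evaluated in an even period."""
    return 10 / (1 - delta ** 2)


def p1_cheating(delta):
    """Collect 10 now, defect next period, then 1 per period: 10 + δ / (1 - δ)."""
    return 10 + delta / (1 - delta)


def gap(delta):
    """Alternating payoff minus cheating payoff for player 1."""
    return p1_alternating(delta) - p1_cheating(delta)


if __name__ == "__main__":
    for d in (Fraction(1, 20), Fraction(1, 9), Fraction(1, 2)):
        # The gap matches (9δ² - δ) / (1 - δ²), whose sign is that of 9δ² - δ.
        assert gap(d) == (9 * d ** 2 - d) / (1 - d ** 2)
        print(d, gap(d))
```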
One Last Question
When will the alternating equilibrium be preferred to the fully cooperative one?
Recall that when they fully cooperate, BOTH players get $3/(1 - \delta)$, and in the alternating equilibrium a player gets $10/(1 - \delta^2)$ if it defects first or $\delta \cdot 10/(1 - \delta^2)$ if it cooperates first.
For both players to prefer the alternating equilibrium strategies, both $10/(1 - \delta^2)$ and $\delta \cdot 10/(1 - \delta^2)$ must be greater than $3/(1 - \delta)$.
Since the discount factor is a number between zero and one, $10/(1 - \delta^2) > \delta \cdot 10/(1 - \delta^2)$, and so if $\delta \cdot 10/(1 - \delta^2) > 3/(1 - \delta)$, then $10/(1 - \delta^2) > 3/(1 - \delta)$ as well.
So, for both players to prefer the alternating equilibrium strategies, $\delta \cdot 10/(1 - \delta^2) > 3/(1 - \delta)$ is sufficient.
For both players to prefer the alternating equilibrium strategies, $\delta \cdot 10/(1 - \delta^2) > 3/(1 - \delta)$ is sufficient:
$\dfrac{10\delta}{(1 + \delta)(1 - \delta)} > \dfrac{3}{1 - \delta}$
$\dfrac{10\delta}{1 + \delta} > 3$
$10\delta > 3 + 3\delta$
$7\delta > 3$, so $\delta > 3/7$
Note that the player who begins by defecting in the alternating equilibrium will always prefer it to the cooperative equilibrium:
$\dfrac{10}{1 - \delta^2} > \dfrac{3}{1 - \delta}$ becomes $10 > 3 + 3\delta$, i.e., $\delta < 7/3$, which is true for all allowable δ.
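Both comparisons can be scanned numerically. This is a minimal sketch with Python `fractions` over a grid of δ values from 0 to 0.999 (the grid and the function names are illustrative assumptions): the first defector always beats full cooperation, while the first cooperator does so only for δ > 3/7 ≈ 0.4286, so the smallest grid value at which both players prefer alternating is 0.429.

```python
from fractions import Fraction


def full_coop(d):
    """3 + 3δ + 3δ² + ... = 3 / (1 - δ)."""
    return 3 / (1 - d)


def alt_first_defector(d):
    """10 + 0δ + 10δ² + ... = 10 / (1 - δ²)."""
    return 10 / (1 - d ** 2)


def alt_first_cooperator(d):
    """0 + 10δ + 0δ² + 10δ³ + ... = 10δ / (1 - δ²)."""
    return 10 * d / (1 - d ** 2)


GRID = [Fraction(k, 1000) for k in range(1000)]   # δ = 0, 0.001, ..., 0.999

# Grid values at which both players strictly prefer alternating to full cooperation.
both_prefer = [d for d in GRID
               if alt_first_defector(d) > full_coop(d)
               and alt_first_cooperator(d) > full_coop(d)]

if __name__ == "__main__":
    print(all(alt_first_defector(d) > full_coop(d) for d in GRID))  # True
    print(float(min(both_prefer)))                                  # 0.429
```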
Do you believe in the Grim Trigger?
• Don’t underestimate the power of beliefs
– Beliefs do all the work
– We saw this in Myerson & Weber
• Nash Equilibrium (and SPE) rest on beliefs
– These beliefs provide support for strategies played in equilibrium
• The Grim Trigger hinges upon beliefs
– Belief in enforcement is what makes compliance a best response