Game Theory Episode 6: The Grim Trigger

Agenda
• Main ideas
• Key terms
• Basic strategy for solving a repeated game
• Example Problem
  – Set-up
  – Mathematical detour (infinite series)
  – Solution and implications
• Conclusion

Main Ideas
• Cooperation is possible in a repeated game when it is repeated indefinitely (or infinitely), and when people place sufficient value on future payoffs
  – Sometimes we talk about the “shadow of the future” being sufficiently large
  – Equivalent to saying that the discount rate is sufficiently small (that is, the discount factor δ is sufficiently large)
  – Typically requires a strategy based on the credible threat of punishment in response to defection
• Reintroduces the notion of continuation value
  – Typically allows for a multiplicity of strategies
• Reputation can matter in a repeated game

Key Terms
• Grim Trigger:
  – Begin by cooperating; any defection is answered with never-ending punishment (defect forever after)
• Continuation value:
  – Present value of a payoff stream
• Discount factor:
  – Degree to which the future is valued, δ, with 0 ≤ δ ≤ 1
  – May think of it as the probability you’re in the game next round
• History:
  – Past actions taken by players
• Subgame-Perfect Equilibrium:
  – An equilibrium that specifies a Nash-equilibrium strategy in every subgame; particularly appropriate for repeated games

Basic Strategy
• Read the problem, noting the strategic setting
• Identify the question
  – Usually, “Find conditions sufficient to sustain a certain stable pattern of behavior; describe the equilibrium.”
• FIRST: Guess at what the equilibrium might be
  – Usually a grim trigger strategy, where cheating is answered with never-ending punishment
• THEN: Check to make sure that this strategy fulfills equilibrium conditions
  – That it is a best response for players to perform the prescribed behavioral pattern.
  – Write down the entire set of strategies and sufficient conditions
    • Usually the condition is a bound on the discount factor δ

Example Problem
• Consider infinite repetition of the Prisoner’s Dilemma (Player 1’s payoff is listed first):

                    Player 2
                    C         D
  Player 1    C     3, 3      0, 10
              D     10, 0     1, 1

Each player’s payoff in the infinitely repeated game is the discounted sum of its payoffs in each period (i.e., the standard case we considered in class). Assume that the players have a common discount factor δ, where 0 < δ < 1. For what values of δ is (C, C) sustainable?

It’s round 1. Take the role of player 2. You know that the grim trigger is a potential solution. Assume player 1 begins by cooperating. Your move:

If player 1 is cooperating, and you defect, you get:
$10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \dots$

If player 1 is cooperating, and you cooperate, you get:
$3 + \delta \cdot 3 + \delta^2 \cdot 3 + \delta^3 \cdot 3 + \dots$

Mathematical Detour
If player 1 is cooperating, and you defect, you get:
$P_D = 10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \dots$
(once you defect, you always defect: D is your best response given that the other player defects forever)

If player 1 is cooperating, and you cooperate, you get:
$P_C = 3 + \delta \cdot 3 + \delta^2 \cdot 3 + \delta^3 \cdot 3 + \dots$

Notice the summation of an infinite series? You might think that this sum would be incalculable. But no.
We have ways of dealing with this… it won’t hurt a bit…

Call the infinite series $P_C$, like so:
$P_C = 3 + 3\delta + 3\delta^2 + 3\delta^3 + \dots$
Now multiply $P_C$ by $\delta$, like so:
$\delta P_C = 3\delta + 3\delta^2 + 3\delta^3 + 3\delta^4 + \dots$
Now subtract one from the other:
$P_C - \delta P_C = 3$
Factor out the $P_C$:
$P_C (1 - \delta) = 3$
Divide by $(1 - \delta)$:
$P_C = \frac{3}{1 - \delta}$
Therefore,
$3 + \delta \cdot 3 + \delta^2 \cdot 3 + \dots = \frac{3}{1 - \delta}$

Mathematical Detour
If player 1 is cooperating, and you defect, you get:
$P_D = 10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \dots$

Call the infinite series $P_D$, and move the 10 to the left-hand side, like so:
$P_D - 10 = \delta + \delta^2 + \delta^3 + \dots$
Now multiply $P_D - 10$ by $\delta$, like so:
$\delta (P_D - 10) = \delta^2 + \delta^3 + \delta^4 + \dots$
Now subtract one from the other:
$(P_D - 10) - \delta (P_D - 10) = \delta$
Simplify:
$P_D - 10 - \delta P_D + 10\delta = \delta$
Get everything with $P_D$ to one side:
$P_D - \delta P_D = \delta - 10\delta + 10$
Simplify:
$P_D - \delta P_D = 10 - 9\delta$
Factor out the $P_D$:
$P_D (1 - \delta) = 10 - 9\delta$
Divide by $(1 - \delta)$:
$P_D = \frac{10 - 9\delta}{1 - \delta}$
Therefore,
$10 + \delta \cdot 1 + \delta^2 \cdot 1 + \dots = \frac{10 - 9\delta}{1 - \delta}$

Define the Equilibrium Strategy
If player 1 is cooperating, and you defect, you get:
$P_D = 10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \dots = \frac{10 - 9\delta}{1 - \delta}$

If player 1 is cooperating, and you cooperate, you get:
$P_C = 3 + \delta \cdot 3 + \delta^2 \cdot 3 + \delta^3 \cdot 3 + \dots = \frac{3}{1 - \delta}$

These are the continuation values of these strategies.

Cooperation is sustained if the continuation value of cooperation is at least the continuation value of defection:
$\frac{3}{1 - \delta} \ge \frac{10 - 9\delta}{1 - \delta}$, or
$3 \ge 10 - 9\delta$, or
$9\delta \ge 7$.
If $\delta \ge 7/9$, then cooperation can be sustained.

Strategy is: Play C as long as the opponent played C in all previous rounds; if the opponent played D in any previous round, then play D from now on.

Let’s get funky with it…
Can we sustain (C, D) in odd periods and (D, C) in even periods?

                    Player 2
                    C         D
  Player 1    C     3, 3      0, 10
              D     10, 0     1, 1

State the strategies:
Player 1: Play C in odd periods as long as the opponent has played C in all previous even periods; play D in odd periods if the opponent has ever played D in an even period. Play D in all even periods.
Player 2: Play C in even periods as long as the opponent has played C in all previous odd periods; play D in even periods if the opponent has ever played D in an odd period. Play D in all odd periods.

Player 1’s payoffs would be
$P_1 = 10 + \delta \cdot 0 + \delta^2 \cdot 10 + \delta^3 \cdot 0 + \dots = \frac{10}{1 - \delta^2}$
given that we begin in an even period (alternatively, these are player 2’s payoffs if we start in an odd period).

Player 2’s payoffs would be
$P_2 = 0 + \delta \cdot 10 + \delta^2 \cdot 0 + \delta^3 \cdot 10 + \dots = \frac{10\delta}{1 - \delta^2}$
given that we begin in an even period (alternatively, these are player 1’s payoffs if we start in an odd period).

These are the continuation values of these strategies.

Cheating is defined as playing D in a period where the strategy prescribes C. Given a grim trigger, Player 2’s payoffs for cheating:
$P_{2,\text{cheating}} = 1 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \dots = \frac{1}{1 - \delta}$

Player 2 will play C in even periods as long as
$\frac{10\delta}{1 - \delta^2} \ge \frac{1}{1 - \delta}$
$\frac{10\delta}{(1 - \delta)(1 + \delta)} \ge \frac{1}{1 - \delta}$
$\frac{10\delta}{1 + \delta} \ge 1$
$10\delta \ge 1 + \delta$
$9\delta \ge 1$
$\delta \ge 1/9$

What about the other player, the one who begins with 10? As before, cheating is defined as playing D where C is prescribed, and punishment is a grim trigger. Player 1 collects 10 in the even period either way; the deviation comes in the next (odd) period.

Player 1’s payoffs (alternating):
$P_{1,\text{alternating}} = 10 + \delta \cdot 0 + \delta^2 \cdot 10 + \delta^3 \cdot 0 + \dots = \frac{10}{1 - \delta^2}$
Player 1’s payoffs (cheating):
$P_{1,\text{cheating}} = 10 + \delta \cdot 1 + \delta^2 \cdot 1 + \delta^3 \cdot 1 + \dots = 10 + \frac{\delta}{1 - \delta}$

Player 1 will play C in odd periods as long as
$\frac{10}{1 - \delta^2} \ge 10 + \frac{\delta}{1 - \delta}$
$\frac{10}{(1 + \delta)(1 - \delta)} \ge \frac{\delta}{1 - \delta} + \frac{10 - 10\delta}{1 - \delta}$
$\frac{10}{1 + \delta} \ge \delta + (10 - 10\delta)$
$10 \ge (10 - 9\delta)(1 + \delta)$
$10 \ge 10 + 10\delta - 9\delta - 9\delta^2$
$0 \ge \delta - 9\delta^2$
$9\delta^2 - \delta \ge 0$
$9\delta \ge 1$ (dividing by $\delta > 0$)
$\delta \ge 1/9$, just like before.

One Last Question
When will alternating be preferred to fully cooperative equilibria?
Recall that when they fully cooperate, BOTH players get $\frac{3}{1 - \delta}$. In the alternating equilibrium, a player gets $\frac{10}{1 - \delta^2}$ or $\frac{10\delta}{1 - \delta^2}$, depending on whether that player is the first to defect or the first to cooperate.

For both players to prefer alternating equilibrium strategies, both $\frac{10}{1 - \delta^2}$ and $\frac{10\delta}{1 - \delta^2}$ must be greater than $\frac{3}{1 - \delta}$.

Since the discount factor is a number between zero and one, $\frac{10}{1 - \delta^2} > \frac{10\delta}{1 - \delta^2}$, and so if $\frac{10\delta}{1 - \delta^2} > \frac{3}{1 - \delta}$, then $\frac{10}{1 - \delta^2} > \frac{3}{1 - \delta}$ as well. So, for both players to prefer alternating equilibrium strategies, $\frac{10\delta}{1 - \delta^2} > \frac{3}{1 - \delta}$ is sufficient.

$\frac{10\delta}{1 - \delta^2} > \frac{3}{1 - \delta}$
$\frac{10\delta}{(1 + \delta)(1 - \delta)} > \frac{3}{1 - \delta}$
$\frac{10\delta}{1 + \delta} > 3$
$10\delta > 3 + 3\delta$
$7\delta > 3$, so $\delta > 3/7$

Note that the player who begins by defecting in the alternating equilibrium will always prefer this equilibrium to the cooperative equilibrium:
$\frac{10}{1 - \delta^2} > \frac{3}{1 - \delta}$ becomes $10 > 3 + 3\delta$, or $\delta < 7/3$, which is true for all allowable δ.

Do you believe in the Grim Trigger?
• Don’t underestimate the power of beliefs
  – Beliefs do all the work
  – We saw this in Myerson & Weber
• Nash Equilibrium (and SPE) rest on beliefs
  – These beliefs provide support for strategies played in equilibrium
• The Grim Trigger hinges upon beliefs
  – Belief in enforcement is what makes compliance a best response
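
The three thresholds from the example problem (δ ≥ 7/9 for full cooperation, δ ≥ 1/9 for alternating, δ > 3/7 for both players to prefer alternating) can be checked with a few lines of Python. This is only a sketch under the payoffs used in these slides; the function names (`constant_stream`, `alternating_stream`, `cc_sustainable`, `alternating_sustainable`, `both_prefer_alternating`) are made up for illustration, and exact fractions are used so that the boundary cases compare cleanly.

```python
from fractions import Fraction

# Stage-game payoffs from the slides: (own action, opponent's action) -> own payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 10, ("D", "D"): 1}


def constant_stream(payoff, delta):
    """Present value of `payoff` received every period, starting today."""
    return payoff / (1 - delta)


def alternating_stream(first, second, delta):
    """Present value of first, second, first, second, ... starting today."""
    return (first + delta * second) / (1 - delta**2)


def cc_sustainable(delta):
    """Grim trigger supports (C, C) if cooperating is at least as good as defecting once."""
    cooperate = constant_stream(PAYOFF["C", "C"], delta)
    # 10 today, then mutual defection (1 per period) from tomorrow on.
    defect = PAYOFF["D", "C"] + delta * constant_stream(PAYOFF["D", "D"], delta)
    return cooperate >= defect


def alternating_sustainable(delta):
    """Check both players' incentives in the alternating pattern under a grim trigger."""
    high, low, punish = PAYOFF["D", "C"], PAYOFF["C", "D"], PAYOFF["D", "D"]
    # Player due to cooperate today: 0, 10, 0, ... versus 1 forever.
    cooperator_ok = alternating_stream(low, high, delta) >= constant_stream(punish, delta)
    # Player due to defect today: 10, 0, 10, ... versus 10 today, then 1 forever.
    cheat = high + delta * constant_stream(punish, delta)
    defector_ok = alternating_stream(high, low, delta) >= cheat
    return cooperator_ok and defector_ok


def both_prefer_alternating(delta):
    """The player who cooperates first is worse off; test that player strictly."""
    worse_off = alternating_stream(PAYOFF["C", "D"], PAYOFF["D", "C"], delta)
    return worse_off > constant_stream(PAYOFF["C", "C"], delta)


if __name__ == "__main__":
    for d in (Fraction(1, 10), Fraction(1, 9), Fraction(1, 2), Fraction(7, 9)):
        print(d, cc_sustainable(d), alternating_sustainable(d), both_prefer_alternating(d))
```

Passing `Fraction` values rather than floats keeps the comparisons exact at the thresholds themselves, where the two sides of each inequality are equal.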