Evolution and Repeated Games
D. Fudenberg (Harvard)
E. Maskin (IAS, Princeton)

The Prisoner's Dilemma (Cooperate/Defect):
            C         D
    C     2, 2     -1, 3
    D     3, -1     0, 0

Theory of repeated games important
• central model for explaining how self-interested agents can cooperate
• used in economics, biology, political science, and other fields

But the theory has a serious flaw:
• although cooperative behavior is possible, so is uncooperative behavior (and everything in between)
• the theory doesn't favor one behavior over another
• the theory doesn't make sharp predictions

Evolution (biological or cultural) can promote efficiency
• might hope that uncooperative behavior will be "weeded out"
• this view expressed in Axelrod (1984)

Basic idea:
• start with a population playing the repeated-game strategy Always D
• consider a small group of mutants using Conditional C (play C until someone plays D, thereafter play D)
 – it does essentially the same against Always D as Always D does
 – it does much better against Conditional C than Always D does
• thus Conditional C will invade Always D
• uncooperative behavior is driven out

But consider ALT: alternate between C and D until the pattern is broken, thereafter play D
• ALT can't be invaded by any other strategy
 – the other strategy would have to alternate too, or else it would do much worse against ALT than ALT does
• thus ALT is "evolutionarily stable"
• but ALT is quite inefficient (average payoff 1)

Still, ALT is highly inflexible
• it relies on perfect alternation
• if the pattern is broken, we get D forever
• What if there is a (small) probability of mistake in execution?
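The invasion story above can be checked numerically. Below is a minimal sketch (the strategy encodings, function names, and the long-run-average criterion are mine, not the paper's): Conditional C does essentially as well against Always D as Always D does against itself, does much better against Conditional C, and ALT earns an average payoff of 1 against itself.

```python
# Repeated Prisoner's Dilemma with the slides' stage payoffs:
#        C        D
#  C   2, 2    -1, 3
#  D   3,-1     0, 0
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def always_d(my_hist, their_hist):
    return 'D'

def conditional_c(my_hist, their_hist):
    # Play C until someone plays D, thereafter play D.
    return 'D' if ('D' in my_hist or 'D' in their_hist) else 'C'

def alt(my_hist, their_hist):
    # Alternate C, D, C, D, ... until the pattern is broken, thereafter D.
    for t, (mine, theirs) in enumerate(zip(my_hist, their_hist)):
        want = 'C' if t % 2 == 0 else 'D'
        if mine != want or theirs != want:
            return 'D'
    return 'C' if len(my_hist) % 2 == 0 else 'D'

def average_payoff(s1, s2, periods=1000):
    # Long-run average payoff to player 1 of s1 against s2 (no mistakes).
    h1, h2, total = [], [], 0
    for _ in range(periods):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        total += PAYOFF[(a1, a2)][0]
        h1.append(a1)
        h2.append(a2)
    return total / periods

# Conditional C does essentially as well against Always D as Always D does...
print(average_payoff(conditional_c, always_d))       # ~0 (one -1, then 0s)
print(average_payoff(always_d, always_d))            # 0.0
# ...and much better against Conditional C, so it invades:
print(average_payoff(conditional_c, conditional_c))  # 2.0
print(average_payoff(always_d, conditional_c))       # ~0
# ALT against itself: (C,C),(D,D),(C,C),... -- average payoff 1
print(average_payoff(alt, alt))                      # 1.0
```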
Consider a mutant strategy s identical to ALT except that if (by mistake) the alternating pattern is broken:
• s signals an "intention" to cooperate by playing C in the following period
 – if the other strategy plays C too, s plays C forever
 – if the other strategy plays D, s plays D forever
• s is identical to ALT before the pattern is broken
• s and ALT each get about 0 against ALT after the pattern is broken
• s gets 2 against s; ALT gets about 0 against s, after the pattern is broken
• so s invades ALT

Main results in the paper (for 2-player symmetric repeated games):
(1) If s is evolutionarily stable and
 – the discount rate r is small (future important)
 – the mistake probability p is small (but p > 0),
then s is (almost) "efficient"
(2) If the payoffs (v, v) are "efficient", then there exists an ES strategy s (almost) attaining (v, v), provided
 – r is small
 – p is small relative to r
This generalizes Fudenberg–Maskin (1990), in which r = p = 0.

Finite symmetric 2-player game g : A × A → R²
• if g(a1, a2) = (v1, v2), then g(a2, a1) = (v2, v1)
• normalize payoffs so that min_{a2} max_{a1} g1(a1, a2) = 0
• V = convex hull of { g(a1, a2) : (a1, a2) ∈ A × A }

(v1, v2) ∈ V is strongly efficient if v1 + v2 = w ≡ max_{(v1′, v2′) ∈ V} (v1′ + v2′)
• in the Prisoner's Dilemma, (2, 2) is the unique strongly efficient pair
• in the game
        0, 0   1, 2
        2, 1   0, 0
 the strongly efficient pairs are { (v1, v2) : v1 + v2 = 3, 1 ≤ v1 ≤ 2 }

Repeated game: g repeated infinitely many times
• period-t history h = ((a1(1), a2(1)), …, (a1(t−1), a2(t−1)))
• h̃ = ((a2(1), a1(1)), …, (a2(t−1), a1(t−1))), the same history with the players' roles swapped
• H = set of all histories
• repeated-game strategy s : H → A
 – assume s finitely complex (playable by a finite computer)
• in each period, probability p that player i makes a mistake
 – i.e., chooses ai ≠ si(h), with equal probabilities for all such actions
 – mistakes independent across players

Payoffs:
U1^{r,p}(s1, s2) = E[ (r/(1+r)) Σ_{t=1}^∞ (1/(1+r))^{t−1} g1(a1(t), a2(t)) | s1, s2, p ]
U1^{r,p}(s1, s2 | h) = E[ (r/(1+r)) Σ_{t=1}^∞ (1/(1+r))^{t−1} g1(a1(t), a2(t)) | s1, s2, p, h ]

• informally, s is evolutionarily stable (ES) if no mutant s′ can invade a population with a big proportion of s and a small proportion of s′
• formally, s is ES w.r.t.
(q̄, r, p) if, for all s′ ≠ s and all q ≤ q̄,
 (1 − q) U1^{r,p}(s, s) + q U1^{r,p}(s, s′) ≥ (1 − q) U1^{r,p}(s′, s) + q U1^{r,p}(s′, s′)
• evolutionary stability
 – expressed statically here
 – but can be given a precise dynamic meaning

• population of b players
• time measured in "epochs" T = 1, 2, …
• strategy s_T = state in epoch T
 – most players in the population use s_T
 – a group of mutants (of size a) plays s′
• a drawn randomly from {1, 2, …, ā}, where ā/b ≤ q̄
• s′ drawn randomly from the finitely complex strategies
• M random drawings of pairs of players
 – each pair plays the repeated game
• s_{T+1} = the strategy with the highest average score

Theorem 1: For any (q̄, p, r) and ε > 0, there exists T̄ such that, for all T ≥ T̄, there exists M(T) such that, for all M ≥ M(T),
(i) if s is not ES, Pr{ s_{T+t} ≠ s for all t ≥ T̄ | s_T = s } ≥ 1 − ε
(ii) if s is ES, Pr{ s_{T+t} = s for all t ≥ T̄ | s_T = s } ≥ 1 − ε

Let v* = min{ v ≥ 0 : there exists v′ such that (v, v′) is strongly efficient }

Theorem 2: Given ε > 0 and q̄ > 0, there exist r̄ and p̄ such that, for all r ∈ (0, r̄) and p ∈ (0, p̄), if s is ES w.r.t. (q̄, r, p), then U1^{r,p}(s, s | h) ≥ v* − ε for all h.
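Both the static ES test and the threshold v* in Theorem 2 are easy to compute for the examples in this talk. A minimal sketch (the function names, the limit-of-means payoff table, and the restriction to pure-action cells are mine): `can_invade` checks the ES inequality for the Always D vs. Conditional C example, and `w_and_v_star` recovers (w, v*) for the two example games.

```python
def can_invade(U, s, mutant, q):
    # Mutant s' invades incumbent s iff the ES inequality fails:
    # (1-q) U(s,s) + q U(s,s')  >=  (1-q) U(s',s) + q U(s',s')
    incumbent = (1 - q) * U[(s, s)] + q * U[(s, mutant)]
    invader = (1 - q) * U[(mutant, s)] + q * U[(mutant, mutant)]
    return invader > incumbent

# Limit-of-means payoffs (r -> 0, p = 0) for Always D ('AD') vs
# Conditional C ('CC') from the opening example:
U = {('AD', 'AD'): 0, ('AD', 'CC'): 0, ('CC', 'AD'): 0, ('CC', 'CC'): 2}
print(can_invade(U, 'AD', 'CC', q=0.1))  # True: Conditional C invades
print(can_invade(U, 'CC', 'AD', q=0.1))  # False: Always D can't invade back

def w_and_v_star(cells):
    # w = max payoff sum over pure-action cells; v* = smallest player-1
    # payoff among cells attaining w (for these two games, the strongly
    # efficient face is spanned by such cells).
    w = max(v1 + v2 for v1, v2 in cells)
    v_star = min(v1 for v1, v2 in cells if v1 + v2 == w)
    return w, v_star

pd = [(2, 2), (-1, 3), (3, -1), (0, 0)]
other = [(0, 0), (1, 2), (2, 1), (0, 0)]
print(w_and_v_star(pd))     # (4, 2): (2,2) unique strongly efficient pair
print(w_and_v_star(other))  # (3, 1): v1 + v2 = 3, 1 <= v1 <= 2
```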
Examples:
• in the Prisoner's Dilemma, v* = 2, so U1^{r,p}(s, s | h) ≥ 2 − ε
• in the game
        0, 0   1, 2
        2, 1   0, 0
 v* = 1, so U1^{r,p}(s, s | h) ≥ 1 − ε

Proof: Suppose U1^{r,p}(s, s | h) < v* − ε for some h
• we will construct a mutant s′ that can invade
• let h* = arg min_h U1^{r,p}(s, s | h)
• if s = ALT, h* = any history for which the alternating pattern is broken

Construct s′ so that
• s′(h) = s(h) if h is not a continuation of h* or h̃*
• after h*, strategy s′
 – "signals" willingness to cooperate by playing differently from s for 1 period (assume s is a pure strategy)
 – if the other player responds positively, plays strongly efficiently thereafter
 – if not, plays according to s thereafter
• after h̃* (assume h̃* ≠ h*), s′
 – responds positively if the other strategy has signaled, and thereafter plays strongly efficiently
 – plays according to s otherwise

• because h* is already the worst history, s′ loses for only 1 period by signaling (a small loss if r is small)
• if p is small, the probability that s′ "misreads" the other player's intention is small
• hence, s′ does nearly as well against s as s does against itself (even after h* or h̃*)
• s′ does very well against itself (strong efficiency) after h* or h̃*:
 U1^{r,p}(s′, s′ | h*) ≈ w/2

• it remains to check how well s does against s′
• by definition of h* and the hypothesis, U1^{r,p}(s, s | h*) ≤ w/2 − δ for some δ > 0
• ignoring the effect of p, U1^{r,p}(s, s′ | h*) ≈ U1^{r,p}(s, s | h*) < v* − ε
• also, after a deviation by s′, the punishment is started again, and so U1^{r,p}(s, s′ | h) ≈ U1^{r,p}(s, s | h*) on continuations of h*.
Hence U1^{r,p}(s, s′ | h*) ≤ w/2 − δ < U1^{r,p}(s′, s′ | h*) ≈ w/2
• so s does appreciably worse against s′ than s′ does against s′

Summing up, we have
 (1 − q) U1^{r,p}(s′, s) + q U1^{r,p}(s′, s′) > (1 − q) U1^{r,p}(s, s) + q U1^{r,p}(s, s′)
• so s is not ES ∎

• Theorem 2 implies, for the Prisoner's Dilemma, that for any ε > 0, U1^{r,p}(s, s | h) ≥ 2 − ε for r and p small
• it doesn't rule out punishments of arbitrary (finite) length

• Consider the strategy s with "cooperative" and "punishment" phases
 – in the cooperative phase, play C
 – stay in the cooperative phase until one player plays D, in which case go to the punishment phase
 – in the punishment phase, play D
 – stay in the punishment phase for m periods (and then go back to the cooperative phase), unless at some point some player chooses C, in which case restart the punishment
• For any m, U1^{r,p}(s, s | h) → 2 (efficiency) as r → 0, p → 0

Can sharpen Theorem 2 for the Prisoner's Dilemma: Given q̄, there exist r̄ and p̄ such that, for all r ∈ (0, r̄) and p ∈ (0, p̄), if s is ES w.r.t. (q̄, r, p), then it cannot entail a punishment lasting more than (3 − 2q̄)/(2q̄) periods
Proof: very similar to that of Theorem 2

For r and p too big, an ES strategy s may not be "efficient"
• if r is large, we are back in the one-shot case
• if p ≥ ½, then even fully cooperative strategies in the Prisoner's Dilemma generate payoffs ≤ 1

Theorem 3: Let (v, v) ∈ V with v ≥ v*. For all ε > 0, there exist q̄ > 0 and r̄ > 0 s.t., for all r ≤ r̄, there exists p̄ > 0 s.t., for all p ≤ p̄, there exists s that is ES w.r.t.
(q̄, r, p) and for which |U1^{r,p}(s, s) − v| ≤ ε.

Proof: Construct s so that
• along the equilibrium path of (s, s), payoffs are (approximately) (v, v)
• punishments are nearly strongly efficient
 – the deviating player (say 1) is minimaxed long enough to wipe out the gain
 – thereafter play goes to a strongly efficient point (v′, w − v′)
 – so overall payoffs after a deviation are nearly strongly efficient
• if r and p are small, (s, s) is a subgame-perfect equilibrium

• In the Prisoner's Dilemma, consider the s that
 – plays C the first period
 – thereafter plays C if and only if either both players played C the previous period or neither did
• this strategy s
 – is efficient
 – entails punishments that are as short as possible
 – is a modification of Tit-for-Tat (C the first period; thereafter, do what the other player did the previous period)
• Tit-for-Tat is not ES
 – if the mistake (D, C) occurs, we get a wave of alternating punishments (C, D), (D, C), (C, D), … until another mistake is made

Modified battle of the sexes:
         a       b       c       d
  a    0, 0    4, 1    0, 0    0, 0
  b    1, 4    0, 0    0, 0    0, 0
  c    0, 0    0, 0    0, 0    0, 0
  d    0, 0    0, 0    0, 0    2, 2

Let s = play d as long as, in all past periods, either
 – both players played d, or
 – neither played d
and, if a single player deviates from d,
 – henceforth that player plays b
 – the other player plays a
• s is ES even though inefficient
 – any attempt to improve on efficiency is punished forever
 – can't invade during the punishment, because the punishment is efficient

Consider a potential invader s′
• for any h, s′ cannot do better against s than s does against itself, since (s, s) is an equilibrium; hence, for all h,
 (∗) U1^{r,p}(s′, s | h) ≤ U1^{r,p}(s, s | h)
• otherwise s′ can't invade, so (∗) must hold with near equality:
 (∗∗) U1^{r,p}(s′, s | h) ≈ U1^{r,p}(s, s | h)
• for s′ to invade, we need
 (∗∗∗) U1^{r,p}(s′, s′ | h′) > U1^{r,p}(s, s′ | h′) for some h′
• Claim: (∗∗∗) implies h′ involves a deviation from the equilibrium path of (s, s)
 – the only other possibility is that s′ differs from s on the equilibrium path
 – but then s′ is punished by s forever
 – which violates (∗∗)
• after such a deviation, the punishment pair (a, b) is strongly efficient, so we thus have
 U1^{r,p}(s′, s′ | h′) + U2^{r,p}(s′, s′ | h′) ≤ w ≈ U1^{r,p}(s, s′ | h′) + U2^{r,p}(s, s′ | h′)
• hence, from (∗∗), the right-hand side of (∗∗∗) is already (approximately) as large as the left-hand side can be, so the inequality is not feasible

For Theorem 3 to hold, p must be small
relative to r
• consider modified Tit-for-Tat against itself (play C if and only if both players took the same action last period)
• with every mistake, there is an expected loss of
 – 2 − (½ · 3 + ½ · (−1)) = 1 in the first period
 – 2 − 0 = 2 in the second period
• so overall the expected loss from mistakes is approximately 3p(1+r)/r
• by contrast, a mutant strategy that signals, etc., and doesn't punish at all against itself loses only about p(1+r)/r
• so if r is small enough relative to p, the mutant can invade
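The contrast between Tit-for-Tat's endless wave of alternating punishments and the single punishment period of the modified version can be simulated directly. A minimal sketch (the strategy encodings and the mistake-injection mechanism are mine), forcing one mistaken D and tracing the resulting play:

```python
def tit_for_tat(my_hist, their_hist):
    # C the first period; thereafter do what the other player did last period.
    return their_hist[-1] if their_hist else 'C'

def modified_tft(my_hist, their_hist):
    # C the first period; thereafter C iff both players took the same
    # action last period.
    if not my_hist:
        return 'C'
    return 'C' if my_hist[-1] == their_hist[-1] else 'D'

def run(strategy, periods, mistakes):
    # Symmetric play of `strategy` against itself.
    # mistakes: period -> (player index, forced action), modelling a
    # one-off execution error.
    h = ([], [])
    outcomes = []
    for t in range(periods):
        a = [strategy(h[0], h[1]), strategy(h[1], h[0])]
        if t in mistakes:
            i, action = mistakes[t]
            a[i] = action
        outcomes.append(tuple(a))
        h[0].append(a[0])
        h[1].append(a[1])
    return outcomes

# A single mistaken D at period 2:
print(run(tit_for_tat, 7, {2: (0, 'D')}))
# -> (C,C),(C,C),(D,C),(C,D),(D,C),(C,D),(D,C): alternating forever
print(run(modified_tft, 7, {2: (0, 'D')}))
# -> (C,C),(C,C),(D,C),(D,D),(C,C),(C,C),(C,C): one (D,D) period, then
#    cooperation resumes; each mistake costs about 1 + 2 = 3 in total
```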