Evolution and Repeated Games
D. Fudenberg (Harvard)
E. Maskin (IAS, Princeton)

The Prisoner's Dilemma (Cooperate/Defect):
            C         D
    C     2, 2     -1, 3
    D     3, -1     0, 0

Theory of repeated games important
• central model for explaining how self-interested agents can cooperate
• used in economics, biology, political science, and other fields

But the theory has a serious flaw:
• although cooperative behavior is possible, so is uncooperative behavior (and everything in between)
• the theory doesn't favor one behavior over another
• the theory doesn't make sharp predictions

Evolution (biological or cultural) can promote efficiency
• might hope that uncooperative behavior will be "weeded out"
• this view expressed in Axelrod (1984)

Basic idea:
• start with a population playing the repeated-game strategy Always D
• consider a small group of mutants using Conditional C (play C until someone plays D, thereafter play D)
 – it does essentially the same against Always D as Always D does
 – it does much better against Conditional C than Always D does
• thus Conditional C will invade Always D
• uncooperative behavior is driven out

But consider ALT: alternate between C and D until the pattern is broken, thereafter play D
• ALT can't be invaded by any other strategy
 – the other strategy would have to alternate too, or else it would do much worse against ALT than ALT does
• thus ALT is "evolutionarily stable"
• but ALT is quite inefficient (average payoff 1)

Still, ALT is highly inflexible
• it relies on perfect alternation
• if the pattern is broken, we get D forever
• What if there is a (small) probability of mistake in execution?
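The invasion story above can be checked numerically. Below is a minimal sketch (the strategy encodings, function names, and the long-run-average criterion are mine, not the paper's): Conditional C does essentially as well against Always D as Always D does against itself, does much better against Conditional C, and ALT earns an average payoff of 1 against itself.

```python
# Repeated Prisoner's Dilemma with the slides' stage payoffs:
#        C        D
#  C   2, 2    -1, 3
#  D   3,-1     0, 0
PAYOFF = {('C', 'C'): (2, 2), ('C', 'D'): (-1, 3),
          ('D', 'C'): (3, -1), ('D', 'D'): (0, 0)}

def always_d(my_hist, their_hist):
    return 'D'

def conditional_c(my_hist, their_hist):
    # Play C until someone plays D, thereafter play D.
    return 'D' if ('D' in my_hist or 'D' in their_hist) else 'C'

def alt(my_hist, their_hist):
    # Alternate C, D, C, D, ... until the pattern is broken, thereafter D.
    for t, (mine, theirs) in enumerate(zip(my_hist, their_hist)):
        want = 'C' if t % 2 == 0 else 'D'
        if mine != want or theirs != want:
            return 'D'
    return 'C' if len(my_hist) % 2 == 0 else 'D'

def average_payoff(s1, s2, periods=1000):
    # Long-run average payoff to player 1 of s1 against s2 (no mistakes).
    h1, h2, total = [], [], 0
    for _ in range(periods):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        total += PAYOFF[(a1, a2)][0]
        h1.append(a1)
        h2.append(a2)
    return total / periods

# Conditional C does essentially as well against Always D as Always D does...
print(average_payoff(conditional_c, always_d))       # ~0 (one -1, then 0s)
print(average_payoff(always_d, always_d))            # 0.0
# ...and much better against Conditional C, so it invades:
print(average_payoff(conditional_c, conditional_c))  # 2.0
print(average_payoff(always_d, conditional_c))       # ~0
# ALT against itself: (C,C),(D,D),(C,C),... -- average payoff 1
print(average_payoff(alt, alt))                      # 1.0
```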
Consider a mutant strategy s identical to ALT except that if (by mistake) the alternating pattern is broken:
• s signals an "intention" to cooperate by playing C in the following period
 – if the other strategy plays C too, s plays C forever
 – if the other strategy plays D, s plays D forever
• s is identical to ALT before the pattern is broken
• s and ALT each get about 0 against ALT after the pattern is broken
• s gets 2 against s; ALT gets about 0 against s, after the pattern is broken
• so s invades ALT

Main results in the paper (for 2-player symmetric repeated games):
(1) If s is evolutionarily stable and
 – the discount rate r is small (future important)
 – the mistake probability p is small (but p > 0),
then s is (almost) "efficient"
(2) If the payoffs (v, v) are "efficient", then there exists an ES strategy s (almost) attaining (v, v), provided
 – r is small
 – p is small relative to r
This generalizes Fudenberg–Maskin (1990), in which r = p = 0.

Finite symmetric 2-player game g : A × A → R²
• if g(a1, a2) = (v1, v2), then g(a2, a1) = (v2, v1)
• normalize payoffs so that min_{a2} max_{a1} g1(a1, a2) = 0
• V = convex hull of { g(a1, a2) : (a1, a2) ∈ A × A }

(v1, v2) ∈ V is strongly efficient if v1 + v2 = w ≡ max_{(v1′, v2′) ∈ V} (v1′ + v2′)
• in the Prisoner's Dilemma, (2, 2) is the unique strongly efficient pair
• in the game
        0, 0   1, 2
        2, 1   0, 0
 the strongly efficient pairs are { (v1, v2) : v1 + v2 = 3, 1 ≤ v1 ≤ 2 }

Repeated game: g repeated infinitely many times
• period-t history h = ((a1(1), a2(1)), …, (a1(t−1), a2(t−1)))
• h̃ = ((a2(1), a1(1)), …, (a2(t−1), a1(t−1))), the same history with the players' roles swapped
• H = set of all histories
• repeated-game strategy s : H → A
 – assume s finitely complex (playable by a finite computer)
• in each period, probability p that player i makes a mistake
 – i.e., chooses ai ≠ si(h), with equal probabilities for all such actions
 – mistakes independent across players

Payoffs:
U1^{r,p}(s1, s2) = E[ (r/(1+r)) Σ_{t=1}^∞ (1/(1+r))^{t−1} g1(a1(t), a2(t)) | s1, s2, p ]
U1^{r,p}(s1, s2 | h) = E[ (r/(1+r)) Σ_{t=1}^∞ (1/(1+r))^{t−1} g1(a1(t), a2(t)) | s1, s2, p, h ]

• informally, s is evolutionarily stable (ES) if no mutant s′ can invade a population with a big proportion of s and a small proportion of s′
• formally, s is ES w.r.t.
(q̄, r, p) if, for all s′ ≠ s and all q ≤ q̄,
 (1 − q) U1^{r,p}(s, s) + q U1^{r,p}(s, s′) ≥ (1 − q) U1^{r,p}(s′, s) + q U1^{r,p}(s′, s′)
• evolutionary stability
 – expressed statically here
 – but can be given a precise dynamic meaning

• population of b players
• time measured in "epochs" T = 1, 2, …
• strategy s_T = state in epoch T
 – most players in the population use s_T
 – a group of mutants (of size a) plays s′
• a drawn randomly from {1, 2, …, ā}, where ā/b ≤ q̄
• s′ drawn randomly from the finitely complex strategies
• M random drawings of pairs of players
 – each pair plays the repeated game
• s_{T+1} = the strategy with the highest average score

Theorem 1: For any (q̄, p, r) and ε > 0, there exists T̄ such that, for all T ≥ T̄, there exists M(T) such that, for all M ≥ M(T),
(i) if s is not ES, Pr{ s_{T+t} ≠ s for all t ≥ T̄ | s_T = s } ≥ 1 − ε
(ii) if s is ES, Pr{ s_{T+t} = s for all t ≥ T̄ | s_T = s } ≥ 1 − ε

Let v* = min{ v ≥ 0 : there exists v′ such that (v, v′) is strongly efficient }

Theorem 2: Given ε > 0 and q̄ > 0, there exist r̄ and p̄ such that, for all r ∈ (0, r̄) and p ∈ (0, p̄), if s is ES w.r.t. (q̄, r, p), then U1^{r,p}(s, s | h) ≥ v* − ε for all h.
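Both the static ES test and the threshold v* in Theorem 2 are easy to compute for the examples in this talk. A minimal sketch (the function names, the limit-of-means payoff table, and the restriction to pure-action cells are mine): `can_invade` checks the ES inequality for the Always D vs. Conditional C example, and `w_and_v_star` recovers (w, v*) for the two example games.

```python
def can_invade(U, s, mutant, q):
    # Mutant s' invades incumbent s iff the ES inequality fails:
    # (1-q) U(s,s) + q U(s,s')  >=  (1-q) U(s',s) + q U(s',s')
    incumbent = (1 - q) * U[(s, s)] + q * U[(s, mutant)]
    invader = (1 - q) * U[(mutant, s)] + q * U[(mutant, mutant)]
    return invader > incumbent

# Limit-of-means payoffs (r -> 0, p = 0) for Always D ('AD') vs
# Conditional C ('CC') from the opening example:
U = {('AD', 'AD'): 0, ('AD', 'CC'): 0, ('CC', 'AD'): 0, ('CC', 'CC'): 2}
print(can_invade(U, 'AD', 'CC', q=0.1))  # True: Conditional C invades
print(can_invade(U, 'CC', 'AD', q=0.1))  # False: Always D can't invade back

def w_and_v_star(cells):
    # w = max payoff sum over pure-action cells; v* = smallest player-1
    # payoff among cells attaining w (for these two games, the strongly
    # efficient face is spanned by such cells).
    w = max(v1 + v2 for v1, v2 in cells)
    v_star = min(v1 for v1, v2 in cells if v1 + v2 == w)
    return w, v_star

pd = [(2, 2), (-1, 3), (3, -1), (0, 0)]
other = [(0, 0), (1, 2), (2, 1), (0, 0)]
print(w_and_v_star(pd))     # (4, 2): (2,2) unique strongly efficient pair
print(w_and_v_star(other))  # (3, 1): v1 + v2 = 3, 1 <= v1 <= 2
```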
Examples:
• in the Prisoner's Dilemma, v* = 2, so U1^{r,p}(s, s | h) ≥ 2 − ε
• in the game
        0, 0   1, 2
        2, 1   0, 0
 v* = 1, so U1^{r,p}(s, s | h) ≥ 1 − ε

Proof: Suppose U1^{r,p}(s, s | h) < v* − ε for some h
• we will construct a mutant s′ that can invade
• let h* = arg min_h U1^{r,p}(s, s | h)
• if s = ALT, h* = any history for which the alternating pattern is broken

Construct s′ so that
• s′(h) = s(h) if h is not a continuation of h* or h̃*
• after h*, strategy s′
 – "signals" willingness to cooperate by playing differently from s for 1 period (assume s is a pure strategy)
 – if the other player responds positively, plays strongly efficiently thereafter
 – if not, plays according to s thereafter
• after h̃* (assume h̃* ≠ h*), s′
 – responds positively if the other strategy has signaled, and thereafter plays strongly efficiently
 – plays according to s otherwise

• because h* is already the worst history, s′ loses for only 1 period by signaling (a small loss if r is small)
• if p is small, the probability that s′ "misreads" the other player's intention is small
• hence, s′ does nearly as well against s as s does against itself (even after h* or h̃*)
• s′ does very well against itself (strong efficiency) after h* or h̃*:
 U1^{r,p}(s′, s′ | h*) ≈ w/2

• it remains to check how well s does against s′
• by definition of h* and the hypothesis, U1^{r,p}(s, s | h*) ≤ w/2 − δ for some δ > 0
• ignoring the effect of p, U1^{r,p}(s, s′ | h*) ≈ U1^{r,p}(s, s | h*) < v* − ε
• also, after a deviation by s′, the punishment is started again, and so U1^{r,p}(s, s′ | h) ≈ U1^{r,p}(s, s | h*) on continuations of h*.
Hence U1^{r,p}(s, s′ | h*) ≤ w/2 − δ < U1^{r,p}(s′, s′ | h*) ≈ w/2
• so s does appreciably worse against s′ than s′ does against s′

Summing up, we have
 (1 − q) U1^{r,p}(s′, s) + q U1^{r,p}(s′, s′) > (1 − q) U1^{r,p}(s, s) + q U1^{r,p}(s, s′)
• so s is not ES ∎

• Theorem 2 implies, for the Prisoner's Dilemma, that for any ε > 0, U1^{r,p}(s, s | h) ≥ 2 − ε for r and p small
• it doesn't rule out punishments of arbitrary (finite) length

• Consider the strategy s with "cooperative" and "punishment" phases
 – in the cooperative phase, play C
 – stay in the cooperative phase until one player plays D, in which case go to the punishment phase
 – in the punishment phase, play D
 – stay in the punishment phase for m periods (and then go back to the cooperative phase), unless at some point some player chooses C, in which case restart the punishment
• For any m, U1^{r,p}(s, s | h) → 2 (efficiency) as r → 0, p → 0

Can sharpen Theorem 2 for the Prisoner's Dilemma: Given q̄, there exist r̄ and p̄ such that, for all r ∈ (0, r̄) and p ∈ (0, p̄), if s is ES w.r.t. (q̄, r, p), then it cannot entail a punishment lasting more than (3 − 2q̄)/(2q̄) periods
Proof: very similar to that of Theorem 2

For r and p too big, an ES strategy s may not be "efficient"
• if r is large, we are back in the one-shot case
• if p ≥ ½, then even fully cooperative strategies in the Prisoner's Dilemma generate payoffs ≤ 1

Theorem 3: Let (v, v) ∈ V with v ≥ v*. For all ε > 0, there exist q̄ > 0 and r̄ > 0 s.t., for all r ≤ r̄, there exists p̄ > 0 s.t., for all p ≤ p̄, there exists s that is ES w.r.t.
(q̄, r, p) and for which |U1^{r,p}(s, s) − v| ≤ ε.

Proof: Construct s so that
• along the equilibrium path of (s, s), payoffs are (approximately) (v, v)
• punishments are nearly strongly efficient
 – the deviating player (say 1) is minimaxed long enough to wipe out the gain
 – thereafter play goes to a strongly efficient point (v′, w − v′)
 – so overall payoffs after a deviation are nearly strongly efficient
• if r and p are small, (s, s) is a subgame-perfect equilibrium

• In the Prisoner's Dilemma, consider the s that
 – plays C the first period
 – thereafter plays C if and only if either both players played C the previous period or neither did
• this strategy s
 – is efficient
 – entails punishments that are as short as possible
 – is a modification of Tit-for-Tat (C the first period; thereafter, do what the other player did the previous period)
• Tit-for-Tat is not ES
 – if the mistake (D, C) occurs, we get a wave of alternating punishments (C, D), (D, C), (C, D), … until another mistake is made

Modified battle of the sexes:
         a       b       c       d
  a    0, 0    4, 1    0, 0    0, 0
  b    1, 4    0, 0    0, 0    0, 0
  c    0, 0    0, 0    0, 0    0, 0
  d    0, 0    0, 0    0, 0    2, 2

Let s = play d as long as, in all past periods, either
 – both players played d, or
 – neither played d
and, if a single player deviates from d,
 – henceforth that player plays b
 – the other player plays a
• s is ES even though inefficient
 – any attempt to improve on efficiency is punished forever
 – can't invade during the punishment, because the punishment is efficient

Consider a potential invader s′
• for any h, s′ cannot do better against s than s does against itself, since (s, s) is an equilibrium; hence, for all h,
 (∗) U1^{r,p}(s′, s | h) ≤ U1^{r,p}(s, s | h)
• otherwise s′ can't invade, so (∗) must hold with near equality:
 (∗∗) U1^{r,p}(s′, s | h) ≈ U1^{r,p}(s, s | h)
• for s′ to invade, we need
 (∗∗∗) U1^{r,p}(s′, s′ | h′) > U1^{r,p}(s, s′ | h′) for some h′
• Claim: (∗∗∗) implies h′ involves a deviation from the equilibrium path of (s, s)
 – the only other possibility is that s′ differs from s on the equilibrium path
 – but then s′ is punished by s forever
 – which violates (∗∗)
• after such a deviation, the punishment pair (a, b) is strongly efficient, so we thus have
 U1^{r,p}(s′, s′ | h′) + U2^{r,p}(s′, s′ | h′) ≤ w ≈ U1^{r,p}(s, s′ | h′) + U2^{r,p}(s, s′ | h′)
• hence, from (∗∗), the right-hand side of (∗∗∗) is already (approximately) as large as the left-hand side can be, so the inequality is not feasible

For Theorem 3 to hold, p must be small
relative to r
• consider modified Tit-for-Tat against itself (play C if and only if both players took the same action last period)
• with every mistake, there is an expected loss of
 – 2 − (½ · 3 + ½ · (−1)) = 1 in the first period
 – 2 − 0 = 2 in the second period
• so overall the expected loss from mistakes is approximately 3p(1+r)/r
• by contrast, a mutant strategy that signals, etc., and doesn't punish at all against itself loses only about p(1+r)/r
• so if r is small enough relative to p, the mutant can invade
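The contrast between Tit-for-Tat's endless wave of alternating punishments and the single punishment period of the modified version can be simulated directly. A minimal sketch (the strategy encodings and the mistake-injection mechanism are mine), forcing one mistaken D and tracing the resulting play:

```python
def tit_for_tat(my_hist, their_hist):
    # C the first period; thereafter do what the other player did last period.
    return their_hist[-1] if their_hist else 'C'

def modified_tft(my_hist, their_hist):
    # C the first period; thereafter C iff both players took the same
    # action last period.
    if not my_hist:
        return 'C'
    return 'C' if my_hist[-1] == their_hist[-1] else 'D'

def run(strategy, periods, mistakes):
    # Symmetric play of `strategy` against itself.
    # mistakes: period -> (player index, forced action), modelling a
    # one-off execution error.
    h = ([], [])
    outcomes = []
    for t in range(periods):
        a = [strategy(h[0], h[1]), strategy(h[1], h[0])]
        if t in mistakes:
            i, action = mistakes[t]
            a[i] = action
        outcomes.append(tuple(a))
        h[0].append(a[0])
        h[1].append(a[1])
    return outcomes

# A single mistaken D at period 2:
print(run(tit_for_tat, 7, {2: (0, 'D')}))
# -> (C,C),(C,C),(D,C),(C,D),(D,C),(C,D),(D,C): alternating forever
print(run(modified_tft, 7, {2: (0, 'D')}))
# -> (C,C),(C,C),(D,C),(D,D),(C,C),(C,C),(C,C): one (D,D) period, then
#    cooperation resumes; each mistake costs about 1 + 2 = 3 in total
```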