Download Evolution and Repeated Games

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Evolution and Repeated Games
D. Fudenberg (Harvard)
E. Maskin (IAS, Princeton)
Cooperate
Defect
C
D
2, 2
1,3
3, 1
0, 0
Theory of repeated games important
• central model for explaining how self-interested
agents can cooperate
• used in economics, biology, political science and
other fields
2
But theory has a serious flaw:
• although cooperative behavior possible, so is
uncooperative behavior (and everything in
between)
• theory doesn’t favor one behavior over another
• theory doesn’t make sharp predictions
3
Evolution (biological or cultural) can promote
efficiency
• might hope that uncooperative behavior will be
“weeded out”
• this view expressed in Axelrod (1984)
4
Basic idea:
Cooperate
Defect
•
•
C
D
2, 2
1,3
3, 1
0, 0
Start with population of repeated game strategy Always D
Consider small group of mutants using Conditional C (Play C until someone
plays D, thereafter play D)
– does essentially same against Always D as Always D does
– does much better against Conditional C than Always D does
•
•
Thus Conditional C will invade Always D
uncooperative behavior driven out
5
C
D
C
2, 2
1,3
D
3, 1
0, 0
But consider ALT
Alternate between C and D until pattern broken, thereafter play D
• can’t be invaded by some other strategy
– other strategy would have to alternate or else would do much worse
against ALT than ALT does
• Thus ALT is “evolutionarily stable”
• But ALT is quite inefficient (average payoff 1)
6
• Still, ALT highly inflexible
– relies on perfect alternation
– if pattern broken, get D forever
• What if there is a (small) probability of
mistake in execution?
7
C
D
C
2, 2
1,3
D
3, 1
0, 0
• Consider mutant strategy s identical to ALT except if (by mistake)
alternating pattern broken
– s signals “intention” to cooperate by playing C in following period
– if other strategy plays C too, s plays C forever
– if other strategy plays D, s plays D forever
s identical to ALT before pattern broken
s and ALT each get about 0 against ALT, after pattern broken
s gets 2 against s; ALT gets about 0 against s, after pattern broken
•
•
•
• so s invades ALT
8
Main results in paper (for 2-player symmetric repeated
games)
(1) If s evolutionarily stable and
– discount rate r small (future important)
– mistake probability p small (but p > 0)
then s (almost) “efficient”
(2) If payoffs (v, v) “efficient”,
•
then exists ES strategy s (almost) attaining (v, v) provided
– r small
– p small relative to r
generalizes Fudenberg-Maskin (1990), in which r = p = 0
9
Finite symmetric 2–player game g : A  A  
2
• if g  a1 , a2    v1 , v2  , then g  a2 , a1    v2 , v1 
• normalize payoffs so that min max g1  a1 , a2   0
a2
a1
• V   convex hull  g  a1 , a2   a1 , a2   A  A
10



•  v1 , v2   V strongly efficient if w  v1  v2  max   v1  v2 
 v1 ,v2 V
2, 2
1,3
 2, 2 unique strongly efficient pair
3, 1
0, 0
0, 0
1, 2
 v , v  v  v
1
2,1
2
1
2
 3, 1  v1  2 strongly efficient
0, 0
11
Repeated game: g repeated infinitely many times


 a t 1 , a t 1
,  a  t 1 , a  t 1 
• period t history h  a1 1 , a2 1 ,


•   h   a2 1 , a1 1 ,
1
2
2
1
• H = set of all histories
repeated game strategy s : H   A
– assume finitely complex (playable by finite computer)
• in each period, probability p that i makes mistake
– chooses ai  si  h  (equal probabilities for all actions)
– mistakes independent across players
•
12
t 1



r
 1 
r, p
U1  s1 , s2  
E  
 g1  a1  t  , a2  t   s1 , s2 , p 
1  r  t 1  1  r 

t 1



r
1


r, p
U1  s1 , s2 h  
E  
 g1  a1  t  , a2  t   s1 , s2 , p, h 
1  r  t 1  1  r 

13
• informally, s evolutionarily stable (ES), if no
mutant s can invade population with big
proportion s and small proportion s
• formally, s is ES w.r.t. q , r , pif for all s  s and
all
qq
1  q U1r , p  s, s   qU1r , p  s, s
 1  q U1r  s, s   qU1r , p  s, s
• evolutionary stability
– expressed statically here
– but can be given precise dynamic meaning
14
•
•
•
•
population of b players
suppose time measure in “epochs” T = 1, 2, . . .
strategy sT  state in epoch T
− most players in population use sT
group of mutants (of size a) plays s'
•
a drawn randomly from 1, 2, , a , where a b  q
s' drawn randomly from finitely complex
strategies
M random drawings of pairs of players
− each pair plays repeated game
•
sT 1 = strategy with highest average score
15
Theorem 1: For any q , p, r and   0, there
exists T  such that, for all T   T  , there exists
M  T   such that, for all M  M  T  ,
(i) if s not ES, Pr sT t  s, for all t  T  sT  s  
(ii) if s is ES, Pr sT t  s for all t  T  sT  s  1  
16
Let v  min v  0 there exists v such that  v, v strongly efficient
Theorem 2: Given   0 and q  0, there exist r and p
such that, for all r   0, r  and p   0, p  ,
if s is ES w.r.t. q , r , p
then
U1r , p  s, s h   v   for all h.
17
2, 2
1,3
v  2, So U1r , p  s, s   2  
3, 1
0, 0
0, 0
1, 2
v  1, So U1r , p  s, s   1  
2,1
0, 0
18
Proof:
Suppose U1r , p  s, s h   v   for some h
• will construct mutant s' that can invade

r, p
• let h  arg min U1  s, s h 
• if s = ALT, h= any history for which alternating
pattern broken
19
Construct s' so that
•
•
s  h   s  h  , if h not a continuation of h or   h 
after h , strategy s'
–
•
“signals” willingness to cooperate by playing differently
from s for 1 period (assume s is pure strategy)
– if other player responds positively, plays strongly
efficiently thereafter
– if not, plays according to s thereafter



after  h (assume  h  h ), s
– responds positively if other strategy has signaled, and
thereafter plays strongly efficiently
– plays according to s otherwise
 
 
20
•
•
•
•
because h is already worst history,
s' loses for only 1 period by signaling
(small loss if r small)
if p small, probability that s' “misreads” other
player’s intention is small
hence, s' does nearly as well against s as s does
against itself
(even after h or   h  )
s' does very well against itself (strong efficiency),
after h or   h 




U1r , p s, s h  U1 s, s   h   w
21
•
•
remains to check how well s does against s'
by definition of h ,




U1r , p s, s h  U1r , p s, s   h   w   for some  >0
•
Ignoring effect of p,




U1r , p s, s h  U1r , p s, s h  v   ,
Also, after deviation by s', punishment started again, and so




U1r , p s, s   h   U1r , p s, s   h  .
Hence




U1r , p s, s h  U1r , p s, s   h   w  
•
so s does appreciably worse against s' than s' does against s'
22
• Summing up, we have:
r, p
r, p
1

q
U
s
,
s

q
U
  1  
 s, s
1
 1  q U1r , p  s, s   q U1r , p  s, s
• s is not ES
23
• Theorem 2 implies for Prisoner’s Dilemma that,
for any   0,
U1r , p  s, s h   2   for r and p small
• doesn’t rule out punishments of arbitrary (finite)
length
24
• Consider strategy s with “cooperative” and
“punishment” phases
– in cooperative phase, play C
– stay in cooperative phase until one player plays D, in
which case go to punishment phase
– in punishment phase, play D
– stay in punishment phase for m periods (and then go
back to cooperative phase) unless at some point some
player chooses C, in which case restart punishment
• For any m,


U1r , p s, s h  2 (efficiency), as r  0
p 0
25
Can sharpen Theorem 2 for Prisoner’s Dilemma:
Given q , there exist r and p such that, for all
r   0, r  and p   0, p  , if s is ES w.r.t. q , r , p,
then it cannot entail a punishment lasting more
than 3  2q periods
2q
Proof: very similar to that of Theorem 2
26
For r and p too big, ES strategy s may not be
“efficient”
• if r  , back in one-shot case
• if p  12 , then even fully cooperative
strategies in Prisoner’s Dilemma generate
payoffs  1.
27
Theorem 3: Let  v, v  V  with v  v
For all   0, there exist q  0 and r  0 s.t.
for all r  r there exists p  0 s.t.
for all p  p there exists s ES w.r.t. q , r , p for which
U1r , p  s, s   v  
28
Proof: Construct s so that
• along equilibrium path of (s, s), payoffs are
(approximately) (v, v)
• punishments are nearly strongly efficient
– deviating player (say 1) minimaxed long enough wipe
out gain
– thereafter go to strongly efficient point

v


,
w
  v   
– overall payoffs after deviation: 
• if r and p small (s, s) is a subgame perfect
equilibrium
29
• In Prisoner’s Dilemma, consider s that
– plays C the first period
– thereafter, plays C if and only if either both players played
C previous period or neither did
• strategy s
– is efficient
– entails punishments that are as short as possible
– is modification of Tit-for-Tat (C the first period; thereafter,
do what other player did previous period)
• Tit-for-Tat not ES
– if mistake (D, C) occurs then get wave of alternating
punishments:
(C, D), (D, C), (C, D), ...
until another mistake made
30
•
d
a
b
c
a
0, 0
4,1
0, 0
0, 0
b
1, 4
0, 0
0, 0
0, 0
c
0, 0
0, 0
0, 0
0, 0
d
0, 0
0, 0
0, 0
2, 2
modified battle of sexes
Let s = play d as long as in all past periods
– both players played d
– neither played d
if single player deviates from d
– henceforth, that player plays b
– other player plays a
•
s is ES even though inefficient
– any attempt to improve on efficiency, punished forever
– can’t invade during punishment, because punishment efficient
31
Consider potential invader s'
For any h, s' cannot do better against s than s does against itself, since (s, s)
equilibrium
hence, for all h,
()
U1r , p  s, s h   U1r , p  s, s h  (otherwise s can't invade)
and so
()
U1r , p  s, s h   U1r , p  s, s h 
For s' to invade, need
() U1r , p  s, s h   U1r , p  s, s   h  


 U1r , p  s, s h   U1r , p s, s   h  for some h
Claim: () implies h' involves deviation from equil path of (s, s)
only other possibility:
–
s' different from s on equil path
–
then s' punished by 
– violates ()
r, p
r, p

we thus have U1  s, s h   U1 s, s   h   w
Hence, from (), rhs of ()  w , so inequality not feasible


32
For Theorem 3 to hold, p must be small relative to r
• consider modified Tit-for-Tat against itself
(play C if and only if both players took same action last
period)
• with every mistake, there is an expected loss of
2 – (½ · 3 + ½ (−1)) = 1 the first period
2 – 0 = 2 the second period
• so over-all the expected loss from mistakes is
 1 r 
approximately 3 p 

r


• By contrast, a mutant strategy that signals, etc. and doesn’t
1 r 
punish at all against itself loses only about p 

r


• so if r is small enough relative to p, mutant can invade
33