CS 6840 Algorithmic Game Theory
Spring 2012
Lecture 11 Scribe Notes
Instructor: Éva Tardos
Scribe: Norris Xu

1 Lecture 11, Wednesday 15 February 2012: No-Regret Learning

1.1 No-Regret Learning
Suppose we have a cost game with players $1, \dots, n$. For a strategy vector $s = (s_1, \dots, s_n)$, $c_i(s)$ is the cost for player $i$ if strategy vector $s$ is played.

We play this game on days $1, \dots, T$, with $s^t = (s^t_1, \dots, s^t_n)$ being the strategy vector used on day $t$. The overall cost is therefore $\sum_{t=1}^T \sum_i c_i(s^t)$, and player $i$'s cost is $\sum_{t=1}^T c_i(s^t)$.
We can model Best Response: each day, one player best responds while the rest play the same strategy as before. But best response is unrealistic (e.g., it cycles forever in rock-paper-scissors). In reality, players learn based on the others' strategies.
Definition (No-Regret). A sequence of strategy vectors $s^1, \dots, s^T$ is no-regret for player $i$ if:
$$\sum_{t=1}^T c_i(s^t) \;\le\; \min_x \underbrace{\sum_{t=1}^T c_i(x, s^t_{-i})}_{\text{cost of strategy } x}$$
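To make the definition concrete, the regret of a given play sequence can be computed by brute force over the fixed strategies $x$. A minimal Python sketch (the cost function and the play sequence are illustrative, not from the lecture):

```python
# Regret of player i over a sequence of strategy profiles:
#   regret_i = sum_t c_i(s^t)  -  min_x sum_t c_i(x, s^t_{-i})

def regret(cost_i, plays, i, strategies):
    """cost_i(s) -> cost to player i for a profile s (a tuple)."""
    realized = sum(cost_i(s) for s in plays)
    # cost of playing the fixed strategy x on every day, minimized over x
    best_fixed = min(
        sum(cost_i(s[:i] + (x,) + s[i + 1:]) for s in plays)
        for x in strategies
    )
    return realized - best_fixed

# Example: 2-player mismatch cost for player 0 (pay 1 if choices differ).
c0 = lambda s: 1 if s[0] != s[1] else 0

plays = [(0, 1), (0, 1), (1, 1)]  # player 0 mismatched on two days
# Playing x = 1 every day would have cost 0, so player 0 has regret 2.
print(regret(c0, plays, 0, [0, 1]))  # -> 2
```

The sequence above is not no-regret for player 0: a single fixed strategy would have done strictly better in hindsight.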
Definition (Vanishing Regret). A sequence of strategy vectors $s^1, \dots, s^T$ has vanishing regret for player $i$ if, assuming that $0 \le c_i(s) \le 1$ for any strategy vector $s$:
$$\limsup_{T \to \infty} \left( \frac{1}{T} \sum_{t=1}^T c_i(s^t) - \min_x \frac{1}{T} \sum_{t=1}^T c_i(x, s^t_{-i}) \right) \le 0$$
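The lecture has not yet named an algorithm achieving vanishing regret, but the standard example for costs in $[0,1]$ is the multiplicative-weights (Hedge) rule: keep a weight per strategy and shrink the weights of strategies that turn out costly. A sketch with illustrative costs and learning rate, not from the lecture:

```python
import math

def hedge(num_strategies, cost_vectors, eta=0.1):
    """Multiplicative weights: play each strategy with probability
    proportional to its weight; costs are assumed to lie in [0, 1].
    Returns the total expected cost of the randomized play."""
    w = [1.0] * num_strategies
    total_cost = 0.0
    for costs in cost_vectors:            # costs[x] = c_i(x, s^t_{-i})
        z = sum(w)
        probs = [wi / z for wi in w]
        total_cost += sum(p * c for p, c in zip(probs, costs))
        # down-weight strategies that were costly today
        w = [wi * math.exp(-eta * c) for wi, c in zip(w, costs)]
    return total_cost

# Strategy 1 is always slightly cheaper; Hedge's average cost approaches
# the best fixed strategy's average cost, i.e. the regret vanishes.
T = 2000
costs = [[0.6, 0.4] for _ in range(T)]
avg = hedge(2, costs) / T
print(avg - 0.4)  # small, and shrinking as T grows
```

With these inputs the weights concentrate exponentially fast on the cheaper strategy, so the average regret per day goes to 0 as $T \to \infty$, matching the definition above.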
Theorem. If a cost game is $(\lambda, \mu)$-smooth (see the footnote definition below) and all players have no regret on a sequence $s^1, \dots, s^T$ of plays, then:
$$\sum_{t=1}^T \sum_i c_i(s^t) \;\le\; \underbrace{\frac{\lambda}{1-\mu}}_{\text{same bound as Price of Anarchy}} T \min_s \sum_i c_i(s)$$
Footnote: Definition ($(\lambda, \mu)$-smooth). A cost game is $(\lambda, \mu)$-smooth if for any strategy vectors $s, s^*$:
$$\sum_i c_i(s^*_i, s_{-i}) \;\le\; \lambda \sum_i c_i(s^*) + \mu \sum_i c_i(s)$$
Proof. Let $s^*$ be the minimum-cost strategy vector. For any player $i$, the fixed strategy $s^*_i$ is one candidate in the minimum of the no-regret condition, so:
$$\sum_{t=1}^T c_i(s^t) \;\le\; \min_x \sum_{t=1}^T c_i(x, s^t_{-i}) \;\le\; \sum_{t=1}^T c_i(s^*_i, s^t_{-i})$$
Summing over all players and applying $(\lambda, \mu)$-smoothness to each day $t$:
$$\sum_{t=1}^T \sum_i c_i(s^t) \;\le\; \sum_{t=1}^T \sum_i c_i(s^*_i, s^t_{-i}) \;\le\; \sum_{t=1}^T \left( \lambda \sum_i c_i(s^*) + \mu \sum_i c_i(s^t) \right) = \lambda T \sum_i c_i(s^*) + \mu \sum_{t=1}^T \sum_i c_i(s^t)$$
Rearranging gives
$$\sum_{t=1}^T \sum_i c_i(s^t) \;\le\; \frac{\lambda}{1-\mu}\, T \sum_i c_i(s^*)$$

Lemma. The sequence $s, s, \dots, s$ is no-regret for all players if and only if $s$ is a Nash equilibrium.
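The lemma can be checked directly on a small example: repeating a profile is no-regret exactly when no player can save cost by a fixed unilateral deviation. A sketch with a hypothetical 2x2 cost game (the matrices are illustrative, not from the lecture):

```python
# Hypothetical 2-player cost game:
# cost[i][a][b] = cost to player i when player 0 plays a, player 1 plays b.
cost = [
    [[1, 3], [2, 4]],   # player 0's costs
    [[1, 2], [3, 4]],   # player 1's costs
]

def player_cost(i, s):
    return cost[i][s[0]][s[1]]

def regret_of_repetition(i, s, T=5):
    """Regret of player i when the profile s is played on all T days."""
    realized = T * player_cost(i, s)
    def with_dev(x):  # cost of deviating to the fixed strategy x every day
        dev = (x, s[1]) if i == 0 else (s[0], x)
        return T * player_cost(i, dev)
    return realized - min(with_dev(x) for x in (0, 1))

# (0, 0) is a pure Nash equilibrium here, so repeating it is no-regret:
print(max(regret_of_repetition(i, (0, 0)) for i in (0, 1)))  # -> 0
# (1, 1) is not: player 0 would rather play 0 every day.
print(regret_of_repetition(0, (1, 1)))  # -> 5
```

This mirrors both directions of the lemma: zero regret at the Nash profile, and strictly positive regret at a profile where some player has a profitable deviation.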
1.2 Randomized Strategy
For all players $i$ and strategies $x$, let $p_i(x)$ be the probability of player $i$ playing $x$. Assuming that the players' choices are independent, the probability of playing the strategy vector $s = (s_1, \dots, s_n)$ is then $p(s) = \prod_i p_i(s_i)$. The expected cost for player $i$ is
$$E_s[c_i(s)] \;=\; \sum_s p(s)\, c_i(s) \;=\; \sum_{s = (s_1, \dots, s_n)} \left( \prod_j p_j(s_j) \right) c_i(s).$$
A set of probability distributions $p_i$ forms a Nash equilibrium if for any player $i$ and strategy $x$, $E_s[c_i(s)] \le E_s[c_i(x, s_{-i})]$; equivalently, for any player $i$, $E_s[c_i(s)] \le \min_x E_s[c_i(x, s_{-i})]$.
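The expected-cost formula can be evaluated by summing over all strategy profiles. A minimal sketch for two players, with an illustrative mismatch cost for player 0 (under uniform mixing every pure deviation by player 0 has the same expected cost, so the Nash condition above holds for player 0):

```python
from itertools import product

def expected_cost(cost_i, dists):
    """E_s[c_i(s)] = sum_s (prod_j p_j(s_j)) c_i(s), choices independent."""
    strategy_sets = [range(len(p)) for p in dists]
    total = 0.0
    for s in product(*strategy_sets):
        p = 1.0
        for j, sj in enumerate(s):
            p *= dists[j][sj]          # p(s) = prod_j p_j(s_j)
        total += p * cost_i(s)
    return total

# Mismatch cost for player 0: pay 1 when the two choices differ.
c0 = lambda s: 1.0 if s[0] != s[1] else 0.0

# Uniform mixing by both players:
dists = [[0.5, 0.5], [0.5, 0.5]]
print(expected_cost(c0, dists))                      # -> 0.5
# Each pure deviation x for player 0 also has expected cost 0.5,
# so E_s[c_i(s)] <= min_x E_s[c_i(x, s_{-i})] holds for player 0.
for x in (0, 1):
    dev = [[1.0, 0.0] if x == 0 else [0.0, 1.0], [0.5, 0.5]]
    print(expected_cost(c0, dev))                    # -> 0.5 each
```

The brute-force sum over `product(*strategy_sets)` is exponential in the number of players; it is only meant to make the definition of $p(s)$ and $E_s[c_i(s)]$ concrete.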