Download -robust Nash Equilibria

Document related concepts
no text concepts found
Transcript
-robust Nash Equilibria
Patricia Bouyer, Nicolas Markey and Daniel STAN
CNRS, LSV, ENS Cachan
Fifth Cassting Meeting 2014
1 / 26
1
Concurrent framework
2
Existence of equilibria
3
Linear and robust equilibria
2 / 26
The framework
1
Concurrent framework
2
Existence of equilibria
3
Linear and robust equilibria
3 / 26
The framework
Games with mixed strategies
Concurrent non-zero sum games allow
To modelize heterogeneous systems
Several events to occur simultaneously
Agents’ goals not to be necessarily antagonistic
4 / 26
The framework
Games with mixed strategies
Concurrent non-zero sum games allow
To modelize heterogeneous systems
Several events to occur simultaneously
Agents’ goals not to be necessarily antagonistic
whereas mixed strategies enable
Synthesizing strategies for controllers
with memory
Breaking the symmetry (by randomization)
Equilibrium more likely to occur
4 / 26
The framework
Formal model
Definition (Arena)
D
E
A = States, Agt, Act, Tab, (Allowi )i∈Agt
with
|States|, |Agt|, |Act| < +∞
Tab : States × ActAgt −→ States
∀i ∈ Agt
Allowi : States −→ 2Act \{∅}
5 / 26
The framework
Formal model
Definition (Arena)
D
E
A = States, Agt, Act, Tab, (Allowi )i∈Agt
with
|States|, |Agt|, |Act| < +∞
Tab : States × ActAgt −→ States
∀i ∈ Agt
s1
aa, bb
−−
w1
Allowi : States −→ 2Act \{∅}
ab, ba
s2
aa, bb
−−
w2
5 / 26
The framework
Definition (Game)
G = hA, s, φi
where
A is an arena
s ∈ States is an initial state
φ : Statesω −→ RAgt a utility function
6 / 26
The framework
Definition (Game)
G = hA, s, φi
where
A is an arena
s ∈ States is an initial state
φ : Statesω −→ RAgt a utility function
s1
ab, ba
aa, bb
−−
w1
s2
aa, bb
−−

∗ ω

 (1, 0) if r ∈ States w1
φ(r ) = (0, 1) if r ∈ States∗ w2ω


(0, 0) otherwise
w2
6 / 26
The framework
Family of utility functions
Safety condition
Reachability
Limit average
Terminal reachability
7 / 26
The framework
Family of utility functions
Safety condition
Reachability
Limit average
Terminal reachability
Definition (Final states)
Let F denote the set of states that have no successor except themselves. φ
is a terminal reachability utility function if
∀r φ(r ) 6= 0 ⇔ ∃h ∈ States∗ ∃f ∈ F : r = h · f ω
7 / 26
The framework
Family of utility functions
Safety condition
Reachability
Limit average
Terminal reachability
Definition (Final states)
Let F denote the set of states that have no successor except themselves. φ
is a terminal reachability utility function if
∀r φ(r ) 6= 0 ⇔ ∃h ∈ States∗ ∃f ∈ F : r = h · f ω ∧ φ(r ) = φ(f ω )
7 / 26
The framework
Family of utility functions
Safety condition
Reachability
Limit average
Terminal reachability
Definition (Final states)
Let F denote the set of states that have no successor except themselves. φ
is a terminal reachability utility function if
∀r φ(r ) 6= 0 ⇔ ∃h ∈ States∗ ∃f ∈ F : r = h · f ω ∧ φ(r ) = φ(f ω )
s1
aa, bb
1, 0
ab, ba
s2
aa, bb
0, 1
7 / 26
The framework
Definition (Strategies)
A strategy for player i in arena A is given by σi such that for all
h ∈ States+ ,
σi (h) ∈ Dist(Allowi (last(h)))
We call strategy profile the data of strategies for all players, and any finite
non-empty sequence of states is a history.
8 / 26
The framework
Definition (Strategies)
A strategy for player i in arena A is given by σi such that for all
h ∈ States+ ,
σi (h) ∈ Dist(Allowi (last(h)))
We call strategy profile the data of strategies for all players, and any finite
non-empty sequence of states is a history.
Definition (Expectation)
We consider a game G and a strategy
profile σ. X0 = s,
Q
Xn+1 = Tab(Xn , An ) with An ∼ i σi (X0 . . . Xn ).
Let r = lim X0 . . . Xn ∈ Statesω .
Under some mesurability assumptions, the expectation of φ(r ) exists.
If P(r ∈ hStatesω ) > 0, we write Eσ (φ | h) the conditionnal expectation.
8 / 26
The framework
Nash Equilibrium
Definition
Let σ a strategy profile and h an history, then (σ, h) is a Nash Equilibrium
(NE) if for all agent i and any other strategy for i (deviation) σi0 ,
0
Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h)
9 / 26
The framework
Nash Equilibrium
Definition
Let σ a strategy profile and h an history, then (σ, h) is a Nash Equilibrium
(NE) if for all agent i and any other strategy for i (deviation) σi0 ,
0
Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h)
We can show that we can restrict to deterministic deviation only (for
terminal reachability objectives).
9 / 26
The framework
Nash Equilibrium
Definition
Let σ a strategy profile and h an history, then (σ, h) is a Nash Equilibrium
(NE) if for all agent i and any other strategy for i (deviation) σi0 ,
0
Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h)
We can show that we can restrict to deterministic deviation only (for
terminal reachability objectives).
s1
aa, bb
1, 0
ab, ba
s2
aa, bb
0, 1
The uniform strategy for both players is a NE (payoff (2/3, 1/3)).
9 / 26
Existence of equilibria
1
Concurrent framework
2
Existence of equilibria
3
Linear and robust equilibria
10 / 26
Existence of equilibria
Does a mixed Nash Equilibrium always exist?
hw
hs,rw
1, −1
rs
−1, 1
11 / 26
Existence of equilibria
Does a mixed Nash Equilibrium always exist?
hw
hs,rw
1, −1
hw
rs
−1, 1
hs,rw
2, 0
rs
0, 2
Figure: Hide-or-Run game
11 / 26
Existence of equilibria
Does a mixed Nash Equilibrium always exist?
hw
hs,rw
1, −1
rs
−1, 1
1, 1
hs,rw
2, 0
hw
rs
0, 2
Figure: Hide-or-Run game
Value problem in a zero-sum game is not a special case of Nash
Equilibrium problem with positive terminal rewards
11 / 26
Existence of equilibria
Does a mixed Nash Equilibrium always exist?
hw
hs,rw
1, −1
rs
−1, 1
1, 1
hs,rw
2, 0
hw
rs
0, 2
Figure: Hide-or-Run game
Value problem in a zero-sum game is not a special case of Nash
Equilibrium problem with positive terminal rewards
Theorem
The existence problem is undecidable for 3-player concurrent games with
non-negative terminal rewards and a constrain.
11 / 26
Existence of equilibria
Does a mixed Nash Equilibrium always exist?
hw
hs,rw
1, −1
rs
−1, 1
1, 1
hs,rw
2, 0
hw
rs
0, 2
Figure: Hide-or-Run game
Value problem in a zero-sum game is not a special case of Nash
Equilibrium problem with positive terminal rewards
Theorem
The existence problem is undecidable for 3-player concurrent games with
non-negative terminal rewards and a constrain. Also holds on arbitrary
terminal rewards without constrains.
11 / 26
Existence of equilibria
Existence of equilibria
Theorem (Nash 1950)
Every one-stage game has a Nash Equilibrium in mixed strategies.
12 / 26
Existence of equilibria
Existence of equilibria
Theorem (Nash 1950)
Every one-stage game has a Nash Equilibrium in mixed strategies.
Theorem (Secchi and Sudderth 2001)
NE always exists for safety qualitative objectives. Strategies have finite
memory.
12 / 26
Existence of equilibria
Existence of equilibria
Theorem (Nash 1950)
Every one-stage game has a Nash Equilibrium in mixed strategies.
Theorem (Secchi and Sudderth 2001)
NE always exists for safety qualitative objectives. Strategies have finite
memory.
Theorem (Chatterjee et al. 2004)
For > 0, -Nash Equilibriumalways exists with terminal reward, and
strategies are stationary.
12 / 26
Existence of equilibria
Termination problem
General scheme.
Let M be the set of stationary strategy profiles.
Consider the best response function: BR : M → 2M mapping a to a set of
strategy profiles improving the payoff of each player and show it is
continuous, then apply Kakutani fix-point theorem to show
∃σ σ ∈ BR(σ).
13 / 26
Existence of equilibria
Termination problem
General scheme.
Let M be the set of stationary strategy profiles.
Consider the best response function: BR : M → 2M mapping a to a set of
strategy profiles improving the payoff of each player and show it is
continuous, then apply Kakutani fix-point theorem to show
∃σ σ ∈ BR(σ).
Continuity of BR is based on termination assumptions.
13 / 26
Existence of equilibria
Termination problem
General scheme.
Let M be the set of stationary strategy profiles.
Consider the best response function: BR : M → 2M mapping a to a set of
strategy profiles improving the payoff of each player and show it is
continuous, then apply Kakutani fix-point theorem to show
∃σ σ ∈ BR(σ).
Continuity of BR is based on termination assumptions.
One-stage termination
13 / 26
Existence of equilibria
Termination problem
General scheme.
Let M be the set of stationary strategy profiles.
Consider the best response function: BR : M → 2M mapping a to a set of
strategy profiles improving the payoff of each player and show it is
continuous, then apply Kakutani fix-point theorem to show
∃σ σ ∈ BR(σ).
Continuity of BR is based on termination assumptions.
One-stage termination
Assume no final safety collaboration, bound probability to make
someone loose
13 / 26
Existence of equilibria
Termination problem
General scheme.
Let M be the set of stationary strategy profiles.
Consider the best response function: BR : M → 2M mapping a to a set of
strategy profiles improving the payoff of each player and show it is
continuous, then apply Kakutani fix-point theorem to show
∃σ σ ∈ BR(σ).
Continuity of BR is based on termination assumptions.
One-stage termination
Assume no final safety collaboration, bound probability to make
someone loose
Consider a discounted version
13 / 26
Existence of equilibria
Limit behaviour
−a
s1
s2
a−
b−
1
3, 1
−b
1, 13
14 / 26
Existence of equilibria
Limit behaviour
−a
s1
s2
a−
b−
1
3, 1
NE strategies (probability of
playing b):
{(x, 0), (0, x) | 1 ≥ x > 0}
−b
1, 13
14 / 26
Existence of equilibria
Limit behaviour
−a
s1
s2
a−
b−
1
3, 1
−b
NE strategies (probability of
playing b):
{(x, 0), (0, x) | 1 ≥ x > 0}
NE payoffs: {(1, 0), (0, 1)}
1, 13
14 / 26
Existence of equilibria
Limit behaviour
−a
s1
s2
a−
b−
1
3, 1
NE payoffs: {(1, 0), (0, 1)}
−b
1,
NE strategies (probability of
playing b):
{(x, 0), (0, x) | 1 ≥ x > 0}
1
3
BR function graph not
continuous in (0, 0)
14 / 26
Linear and robust equilibria
1
Concurrent framework
2
Existence of equilibria
3
Linear and robust equilibria
15 / 26
Linear and robust equilibria
Assumptions
From now, we consider stationary memoryless strategies (set M)
16 / 26
Linear and robust equilibria
Assumptions
From now, we consider stationary memoryless strategies (set M)
Definition (Cycling Arena)
Let A be an arena. Assume there exists an state s ∈ States and a mixed
strategy profile σ such that no player can enforce reaching a final state:
s
∀i ∈ Agt ∀σi0 ∈ Mi Pσ[i/σi ] (States∗ Fω | s) = 0
Such a state is called cycling.
Note that such a definition implies a Nash Equilibrium with payoff 0 for all
players.
16 / 26
Linear and robust equilibria
Assumptions
From now, we consider stationary memoryless strategies (set M)
Definition (Cycling Arena)
Let A be an arena. Assume there exists an state s ∈ States and a mixed
strategy profile σ such that no player can enforce reaching a final state:
s
∀i ∈ Agt ∀σi0 ∈ Mi Pσ[i/σi ] (States∗ Fω | s) = 0
Such a state is called cycling.
Note that such a definition implies a Nash Equilibrium with payoff 0 for all
players.
Lemma (Remark)
One can effectively transform every game G into a non-cycling game G 0 ,
such that every Nash Equilibrium in G 0 can be converted into a Nash
Equilibriumin G.
16 / 26
Linear and robust equilibria
Strong components
Definition (Strong Component)
Let A be an arena and C ⊆ States.
C is called a strong component if there exists σ ∈ M such that every state
of C is reachable from another with strategy profile σ:
∀s, s 0 ∈ C Pσ (States∗ s 0 | s) > 0
Such σ will be said to stabilize C . We denote with SC the set of strong
components.
Note that it is equivalent to say that the previous probability is equal to 1.
We can also remark that every strong component intersecting F is reduced
to a singleton.
17 / 26
Linear and robust equilibria
Strong component escaping
From now on, we consider non-cycling games.
18 / 26
Linear and robust equilibria
Strong component escaping
From now on, we consider non-cycling games.
Definition (Exiting actions)
Let C ∈ SC. a ∈ Act is an exiting action from C for state s ∈ C and
player i if:
s
Pσ[i/(s7→a)] (s · (States\C ) | s) > 0
for some stationary σ stabilizing C
We define
Exit(C ) = {(a, i, s) | a is an exiting action from C state s ∈ C for player i}
18 / 26
Linear and robust equilibria
Strong component escaping
From now on, we consider non-cycling games.
Definition (Exiting actions)
Let C ∈ SC. a ∈ Act is an exiting action from C for state s ∈ C and
player i if:
s
Pσ[i/(s7→a)] (s · (States\C ) | s) > 0
for some stationary σ stabilizing C
We define
Exit(C ) = {(a, i, s) | a is an exiting action from C state s ∈ C for player i}
Lemma
For any C ∈ SC, Exit(C ) 6= ∅.
18 / 26
Linear and robust equilibria
Reduced State space
Definition
For any C ∈ SC strong component and > 0, we define




X
σi (a | s) ≥ ∆ (C ) = σ ∈ M 

(a,i,s)∈Exit(C )
We also denote ∆ =
T
S∈max SC ∆ (C ).
19 / 26
Linear and robust equilibria
Reduced State space
Definition
For any C ∈ SC strong component and > 0, we define




X
σi (a | s) ≥ ∆ (C ) = σ ∈ M 

(a,i,s)∈Exit(C )
We also denote ∆ =
T
S∈max SC ∆ (C ).
Lemma
For ≤
1
|Act| ,
∆ 6= ∅. It is also convex.
19 / 26
Linear and robust equilibria
Limit behaviour
−a
s1
s2
a−
b−
1
3, 1
−b
1, 13
20 / 26
Linear and robust equilibria
Limit behaviour
−a
s1
s2
a−
b−
1
3, 1
−b
σ1 (b | s1 ) + σ2 (b | s2 ) ≥ 1, 13
20 / 26
Linear and robust equilibria
Bounding probability of termination
Lemma
If σ ∈ ∆ , then Prσ (States∗ F | s) = 1
21 / 26
Linear and robust equilibria
Bounding probability of termination
Lemma
If σ ∈ ∆ , then Prσ (States∗ F | s) = 1
Theorem
For > 0, there exists p > 0 and k ∈ N such that for any σ ∈ ∆ ,
∀s ∈ States Pσ (Statesk · F | s) ≥ p
That is to say, after k iterations, there is a bounded probability that a final
state is reached.
21 / 26
Linear and robust equilibria
Existence theorem
Definition (Best response function)
Let BR : ∆ → 2∆ with
n
o
0
BR (σ) = σ 0 ∈ ∆ ∀i ∈ Agt ∀s ∈ States, σi0 ∈ argmaxσ0 Eσ[i/σi ] (φi | s)
22 / 26
Linear and robust equilibria
Existence theorem
Definition (Best response function)
Let BR : ∆ → 2∆ with
n
o
0
BR (σ) = σ 0 ∈ ∆ ∀i ∈ Agt ∀s ∈ States, σi0 ∈ argmaxσ0 Eσ[i/σi ] (φi | s)
Theorem
For 0 < ≤
1
|Act| ,
BR has a fixed point.
22 / 26
Linear and robust equilibria
Existence theorem
Definition (Best response function)
Let BR : ∆ → 2∆ with
n
o
0
BR (σ) = σ 0 ∈ ∆ ∀i ∈ Agt ∀s ∈ States, σi0 ∈ argmaxσ0 Eσ[i/σi ] (φi | s)
Theorem
For 0 < ≤
1
|Act| ,
BR has a fixed point.
Proof sketch.
∆ is a non-empty compact convex subset of RN where
N = Agt × States. Moreover BR (σ) is a non-empty convex set and the
graph of BR is continuous.
22 / 26
Linear and robust equilibria
-robust equilibria
Definition (robust equilibria)
Let σ ∈ M, σ is a -robust Nash Equilibrium if for any player i,
0
∀σi0 ∃σi00 d(σi0 , σi00 ) ≤ Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h)
with d(σ, σ 0 ) the maximal distance between distributions.
23 / 26
Linear and robust equilibria
-robust equilibria
Definition (robust equilibria)
Let σ ∈ M, σ is a -robust Nash Equilibrium if for any player i,
0
∀σi0 ∃σi00 d(σi0 , σi00 ) ≤ Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h)
with d(σ, σ 0 ) the maximal distance between distributions.
1
σ ∈ ∆α (σ) is a -robust NE
2
This is not a NE (but the converse is false)
3
This is not a -NE
4
This is stationary (stationary NE may not exist for 3 players.)
23 / 26
Linear and robust equilibria
-robust equilibria
Definition (robust equilibria)
Let σ ∈ M, σ is a -robust Nash Equilibrium if for any player i,
0
∀σi0 ∃σi00 d(σi0 , σi00 ) ≤ Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h)
with d(σ, σ 0 ) the maximal distance between distributions.
1
σ ∈ ∆α (σ) is a -robust NE
2
This is not a NE (but the converse is false)
3
This is not a -NE
4
This is stationary (stationary NE may not exist for 3 players.)
5
No computation method yet
23 / 26
Linear and robust equilibria
Overview
Getting closer to exact NE existence problem (2 players)
24 / 26
Linear and robust equilibria
Overview
Getting closer to exact NE existence problem (2 players)
Using of linear constrains to enforce a non-linear property
24 / 26
Linear and robust equilibria
Overview
Getting closer to exact NE existence problem (2 players)
Using of linear constrains to enforce a non-linear property
New notion of equilibria, not equivalent to previous ones
24 / 26
Linear and robust equilibria
Overview
Getting closer to exact NE existence problem (2 players)
Using of linear constrains to enforce a non-linear property
New notion of equilibria, not equivalent to previous ones
Non-constructive proof (ongoing work)
24 / 26
Linear and robust equilibria
Thank you for your attention
Questions ?
25 / 26
Linear and robust equilibria
Bibliography I
Krishnendu Chatterjee, Marcin Jurdziński, and Rupak Majumdar. On Nash
equilibria in stochastic games. In CSL’04, volume 3210 of LNCS, pages
26–40. Springer, 2004.
John F. Nash. Equilibrium points in n-person games. Proceedings of the
National Academy of Sciences of the United States of America, 36(1):
48–49, 1950.
Piercesare Secchi and William D. Sudderth. Stay-in-a-set games.
International Journal of Game Theory, 30:479–490, 2001.
26 / 26