Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
-robust Nash Equilibria Patricia Bouyer, Nicolas Markey and Daniel STAN CNRS, LSV, ENS Cachan Fifth Cassting Meeting 2014 1 / 26 1 Concurrent framework 2 Existence of equilibria 3 Linear and robust equilibria 2 / 26 The framework 1 Concurrent framework 2 Existence of equilibria 3 Linear and robust equilibria 3 / 26 The framework Games with mixed strategies Concurrent non-zero sum games allow To modelize heterogeneous systems Several events to occur simultaneously Agents’ goals not to be necessarily antagonistic 4 / 26 The framework Games with mixed strategies Concurrent non-zero sum games allow To modelize heterogeneous systems Several events to occur simultaneously Agents’ goals not to be necessarily antagonistic whereas mixed strategies enable Synthesizing strategies for controllers with memory Breaking the symmetry (by randomization) Equilibrium more likely to occur 4 / 26 The framework Formal model Definition (Arena) D E A = States, Agt, Act, Tab, (Allowi )i∈Agt with |States|, |Agt|, |Act| < +∞ Tab : States × ActAgt −→ States ∀i ∈ Agt Allowi : States −→ 2Act \{∅} 5 / 26 The framework Formal model Definition (Arena) D E A = States, Agt, Act, Tab, (Allowi )i∈Agt with |States|, |Agt|, |Act| < +∞ Tab : States × ActAgt −→ States ∀i ∈ Agt s1 aa, bb −− w1 Allowi : States −→ 2Act \{∅} ab, ba s2 aa, bb −− w2 5 / 26 The framework Definition (Game) G = hA, s, φi where A is an arena s ∈ States is an initial state φ : Statesω −→ RAgt a utility function 6 / 26 The framework Definition (Game) G = hA, s, φi where A is an arena s ∈ States is an initial state φ : Statesω −→ RAgt a utility function s1 ab, ba aa, bb −− w1 s2 aa, bb −− ∗ ω (1, 0) if r ∈ States w1 φ(r ) = (0, 1) if r ∈ States∗ w2ω (0, 0) otherwise w2 6 / 26 The framework Family of utility functions Safety condition Reachability Limit average Terminal reachability 7 / 26 The framework Family of utility functions Safety condition Reachability Limit average Terminal reachability Definition (Final states) Let F denote the set of states that have no successor except themselves. φ is a terminal reachability utility function if ∀r φ(r ) 6= 0 ⇔ ∃h ∈ States∗ ∃f ∈ F : r = h · f ω 7 / 26 The framework Family of utility functions Safety condition Reachability Limit average Terminal reachability Definition (Final states) Let F denote the set of states that have no successor except themselves. φ is a terminal reachability utility function if ∀r φ(r ) 6= 0 ⇔ ∃h ∈ States∗ ∃f ∈ F : r = h · f ω ∧ φ(r ) = φ(f ω ) 7 / 26 The framework Family of utility functions Safety condition Reachability Limit average Terminal reachability Definition (Final states) Let F denote the set of states that have no successor except themselves. φ is a terminal reachability utility function if ∀r φ(r ) 6= 0 ⇔ ∃h ∈ States∗ ∃f ∈ F : r = h · f ω ∧ φ(r ) = φ(f ω ) s1 aa, bb 1, 0 ab, ba s2 aa, bb 0, 1 7 / 26 The framework Definition (Strategies) A strategy for player i in arena A is given by σi such that for all h ∈ States+ , σi (h) ∈ Dist(Allowi (last(h))) We call strategy profile the data of strategies for all players, and any finite non-empty sequence of states is a history. 8 / 26 The framework Definition (Strategies) A strategy for player i in arena A is given by σi such that for all h ∈ States+ , σi (h) ∈ Dist(Allowi (last(h))) We call strategy profile the data of strategies for all players, and any finite non-empty sequence of states is a history. Definition (Expectation) We consider a game G and a strategy profile σ. X0 = s, Q Xn+1 = Tab(Xn , An ) with An ∼ i σi (X0 . . . Xn ). Let r = lim X0 . . . Xn ∈ Statesω . Under some mesurability assumptions, the expectation of φ(r ) exists. If P(r ∈ hStatesω ) > 0, we write Eσ (φ | h) the conditionnal expectation. 8 / 26 The framework Nash Equilibrium Definition Let σ a strategy profile and h an history, then (σ, h) is a Nash Equilibrium (NE) if for all agent i and any other strategy for i (deviation) σi0 , 0 Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h) 9 / 26 The framework Nash Equilibrium Definition Let σ a strategy profile and h an history, then (σ, h) is a Nash Equilibrium (NE) if for all agent i and any other strategy for i (deviation) σi0 , 0 Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h) We can show that we can restrict to deterministic deviation only (for terminal reachability objectives). 9 / 26 The framework Nash Equilibrium Definition Let σ a strategy profile and h an history, then (σ, h) is a Nash Equilibrium (NE) if for all agent i and any other strategy for i (deviation) σi0 , 0 Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h) We can show that we can restrict to deterministic deviation only (for terminal reachability objectives). s1 aa, bb 1, 0 ab, ba s2 aa, bb 0, 1 The uniform strategy for both players is a NE (payoff (2/3, 1/3)). 9 / 26 Existence of equilibria 1 Concurrent framework 2 Existence of equilibria 3 Linear and robust equilibria 10 / 26 Existence of equilibria Does a mixed Nash Equilibrium always exist? hw hs,rw 1, −1 rs −1, 1 11 / 26 Existence of equilibria Does a mixed Nash Equilibrium always exist? hw hs,rw 1, −1 hw rs −1, 1 hs,rw 2, 0 rs 0, 2 Figure: Hide-or-Run game 11 / 26 Existence of equilibria Does a mixed Nash Equilibrium always exist? hw hs,rw 1, −1 rs −1, 1 1, 1 hs,rw 2, 0 hw rs 0, 2 Figure: Hide-or-Run game Value problem in a zero-sum game is not a special case of Nash Equilibrium problem with positive terminal rewards 11 / 26 Existence of equilibria Does a mixed Nash Equilibrium always exist? hw hs,rw 1, −1 rs −1, 1 1, 1 hs,rw 2, 0 hw rs 0, 2 Figure: Hide-or-Run game Value problem in a zero-sum game is not a special case of Nash Equilibrium problem with positive terminal rewards Theorem The existence problem is undecidable for 3-player concurrent games with non-negative terminal rewards and a constrain. 11 / 26 Existence of equilibria Does a mixed Nash Equilibrium always exist? hw hs,rw 1, −1 rs −1, 1 1, 1 hs,rw 2, 0 hw rs 0, 2 Figure: Hide-or-Run game Value problem in a zero-sum game is not a special case of Nash Equilibrium problem with positive terminal rewards Theorem The existence problem is undecidable for 3-player concurrent games with non-negative terminal rewards and a constrain. Also holds on arbitrary terminal rewards without constrains. 11 / 26 Existence of equilibria Existence of equilibria Theorem (Nash 1950) Every one-stage game has a Nash Equilibrium in mixed strategies. 12 / 26 Existence of equilibria Existence of equilibria Theorem (Nash 1950) Every one-stage game has a Nash Equilibrium in mixed strategies. Theorem (Secchi and Sudderth 2001) NE always exists for safety qualitative objectives. Strategies have finite memory. 12 / 26 Existence of equilibria Existence of equilibria Theorem (Nash 1950) Every one-stage game has a Nash Equilibrium in mixed strategies. Theorem (Secchi and Sudderth 2001) NE always exists for safety qualitative objectives. Strategies have finite memory. Theorem (Chatterjee et al. 2004) For > 0, -Nash Equilibriumalways exists with terminal reward, and strategies are stationary. 12 / 26 Existence of equilibria Termination problem General scheme. Let M be the set of stationary strategy profiles. Consider the best response function: BR : M → 2M mapping a to a set of strategy profiles improving the payoff of each player and show it is continuous, then apply Kakutani fix-point theorem to show ∃σ σ ∈ BR(σ). 13 / 26 Existence of equilibria Termination problem General scheme. Let M be the set of stationary strategy profiles. Consider the best response function: BR : M → 2M mapping a to a set of strategy profiles improving the payoff of each player and show it is continuous, then apply Kakutani fix-point theorem to show ∃σ σ ∈ BR(σ). Continuity of BR is based on termination assumptions. 13 / 26 Existence of equilibria Termination problem General scheme. Let M be the set of stationary strategy profiles. Consider the best response function: BR : M → 2M mapping a to a set of strategy profiles improving the payoff of each player and show it is continuous, then apply Kakutani fix-point theorem to show ∃σ σ ∈ BR(σ). Continuity of BR is based on termination assumptions. One-stage termination 13 / 26 Existence of equilibria Termination problem General scheme. Let M be the set of stationary strategy profiles. Consider the best response function: BR : M → 2M mapping a to a set of strategy profiles improving the payoff of each player and show it is continuous, then apply Kakutani fix-point theorem to show ∃σ σ ∈ BR(σ). Continuity of BR is based on termination assumptions. One-stage termination Assume no final safety collaboration, bound probability to make someone loose 13 / 26 Existence of equilibria Termination problem General scheme. Let M be the set of stationary strategy profiles. Consider the best response function: BR : M → 2M mapping a to a set of strategy profiles improving the payoff of each player and show it is continuous, then apply Kakutani fix-point theorem to show ∃σ σ ∈ BR(σ). Continuity of BR is based on termination assumptions. One-stage termination Assume no final safety collaboration, bound probability to make someone loose Consider a discounted version 13 / 26 Existence of equilibria Limit behaviour −a s1 s2 a− b− 1 3, 1 −b 1, 13 14 / 26 Existence of equilibria Limit behaviour −a s1 s2 a− b− 1 3, 1 NE strategies (probability of playing b): {(x, 0), (0, x) | 1 ≥ x > 0} −b 1, 13 14 / 26 Existence of equilibria Limit behaviour −a s1 s2 a− b− 1 3, 1 −b NE strategies (probability of playing b): {(x, 0), (0, x) | 1 ≥ x > 0} NE payoffs: {(1, 0), (0, 1)} 1, 13 14 / 26 Existence of equilibria Limit behaviour −a s1 s2 a− b− 1 3, 1 NE payoffs: {(1, 0), (0, 1)} −b 1, NE strategies (probability of playing b): {(x, 0), (0, x) | 1 ≥ x > 0} 1 3 BR function graph not continuous in (0, 0) 14 / 26 Linear and robust equilibria 1 Concurrent framework 2 Existence of equilibria 3 Linear and robust equilibria 15 / 26 Linear and robust equilibria Assumptions From now, we consider stationary memoryless strategies (set M) 16 / 26 Linear and robust equilibria Assumptions From now, we consider stationary memoryless strategies (set M) Definition (Cycling Arena) Let A be an arena. Assume there exists an state s ∈ States and a mixed strategy profile σ such that no player can enforce reaching a final state: s ∀i ∈ Agt ∀σi0 ∈ Mi Pσ[i/σi ] (States∗ Fω | s) = 0 Such a state is called cycling. Note that such a definition implies a Nash Equilibrium with payoff 0 for all players. 16 / 26 Linear and robust equilibria Assumptions From now, we consider stationary memoryless strategies (set M) Definition (Cycling Arena) Let A be an arena. Assume there exists an state s ∈ States and a mixed strategy profile σ such that no player can enforce reaching a final state: s ∀i ∈ Agt ∀σi0 ∈ Mi Pσ[i/σi ] (States∗ Fω | s) = 0 Such a state is called cycling. Note that such a definition implies a Nash Equilibrium with payoff 0 for all players. Lemma (Remark) One can effectively transform every game G into a non-cycling game G 0 , such that every Nash Equilibrium in G 0 can be converted into a Nash Equilibriumin G. 16 / 26 Linear and robust equilibria Strong components Definition (Strong Component) Let A be an arena and C ⊆ States. C is called a strong component if there exists σ ∈ M such that every state of C is reachable from another with strategy profile σ: ∀s, s 0 ∈ C Pσ (States∗ s 0 | s) > 0 Such σ will be said to stabilize C . We denote with SC the set of strong components. Note that it is equivalent to say that the previous probability is equal to 1. We can also remark that every strong component intersecting F is reduced to a singleton. 17 / 26 Linear and robust equilibria Strong component escaping From now on, we consider non-cycling games. 18 / 26 Linear and robust equilibria Strong component escaping From now on, we consider non-cycling games. Definition (Exiting actions) Let C ∈ SC. a ∈ Act is an exiting action from C for state s ∈ C and player i if: s Pσ[i/(s7→a)] (s · (States\C ) | s) > 0 for some stationary σ stabilizing C We define Exit(C ) = {(a, i, s) | a is an exiting action from C state s ∈ C for player i} 18 / 26 Linear and robust equilibria Strong component escaping From now on, we consider non-cycling games. Definition (Exiting actions) Let C ∈ SC. a ∈ Act is an exiting action from C for state s ∈ C and player i if: s Pσ[i/(s7→a)] (s · (States\C ) | s) > 0 for some stationary σ stabilizing C We define Exit(C ) = {(a, i, s) | a is an exiting action from C state s ∈ C for player i} Lemma For any C ∈ SC, Exit(C ) 6= ∅. 18 / 26 Linear and robust equilibria Reduced State space Definition For any C ∈ SC strong component and > 0, we define X σi (a | s) ≥ ∆ (C ) = σ ∈ M (a,i,s)∈Exit(C ) We also denote ∆ = T S∈max SC ∆ (C ). 19 / 26 Linear and robust equilibria Reduced State space Definition For any C ∈ SC strong component and > 0, we define X σi (a | s) ≥ ∆ (C ) = σ ∈ M (a,i,s)∈Exit(C ) We also denote ∆ = T S∈max SC ∆ (C ). Lemma For ≤ 1 |Act| , ∆ 6= ∅. It is also convex. 19 / 26 Linear and robust equilibria Limit behaviour −a s1 s2 a− b− 1 3, 1 −b 1, 13 20 / 26 Linear and robust equilibria Limit behaviour −a s1 s2 a− b− 1 3, 1 −b σ1 (b | s1 ) + σ2 (b | s2 ) ≥ 1, 13 20 / 26 Linear and robust equilibria Bounding probability of termination Lemma If σ ∈ ∆ , then Prσ (States∗ F | s) = 1 21 / 26 Linear and robust equilibria Bounding probability of termination Lemma If σ ∈ ∆ , then Prσ (States∗ F | s) = 1 Theorem For > 0, there exists p > 0 and k ∈ N such that for any σ ∈ ∆ , ∀s ∈ States Pσ (Statesk · F | s) ≥ p That is to say, after k iterations, there is a bounded probability that a final state is reached. 21 / 26 Linear and robust equilibria Existence theorem Definition (Best response function) Let BR : ∆ → 2∆ with n o 0 BR (σ) = σ 0 ∈ ∆ ∀i ∈ Agt ∀s ∈ States, σi0 ∈ argmaxσ0 Eσ[i/σi ] (φi | s) 22 / 26 Linear and robust equilibria Existence theorem Definition (Best response function) Let BR : ∆ → 2∆ with n o 0 BR (σ) = σ 0 ∈ ∆ ∀i ∈ Agt ∀s ∈ States, σi0 ∈ argmaxσ0 Eσ[i/σi ] (φi | s) Theorem For 0 < ≤ 1 |Act| , BR has a fixed point. 22 / 26 Linear and robust equilibria Existence theorem Definition (Best response function) Let BR : ∆ → 2∆ with n o 0 BR (σ) = σ 0 ∈ ∆ ∀i ∈ Agt ∀s ∈ States, σi0 ∈ argmaxσ0 Eσ[i/σi ] (φi | s) Theorem For 0 < ≤ 1 |Act| , BR has a fixed point. Proof sketch. ∆ is a non-empty compact convex subset of RN where N = Agt × States. Moreover BR (σ) is a non-empty convex set and the graph of BR is continuous. 22 / 26 Linear and robust equilibria -robust equilibria Definition (robust equilibria) Let σ ∈ M, σ is a -robust Nash Equilibrium if for any player i, 0 ∀σi0 ∃σi00 d(σi0 , σi00 ) ≤ Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h) with d(σ, σ 0 ) the maximal distance between distributions. 23 / 26 Linear and robust equilibria -robust equilibria Definition (robust equilibria) Let σ ∈ M, σ is a -robust Nash Equilibrium if for any player i, 0 ∀σi0 ∃σi00 d(σi0 , σi00 ) ≤ Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h) with d(σ, σ 0 ) the maximal distance between distributions. 1 σ ∈ ∆α (σ) is a -robust NE 2 This is not a NE (but the converse is false) 3 This is not a -NE 4 This is stationary (stationary NE may not exist for 3 players.) 23 / 26 Linear and robust equilibria -robust equilibria Definition (robust equilibria) Let σ ∈ M, σ is a -robust Nash Equilibrium if for any player i, 0 ∀σi0 ∃σi00 d(σi0 , σi00 ) ≤ Eσ[i/σi ] (φi | h) ≤ Eσ (φi | h) with d(σ, σ 0 ) the maximal distance between distributions. 1 σ ∈ ∆α (σ) is a -robust NE 2 This is not a NE (but the converse is false) 3 This is not a -NE 4 This is stationary (stationary NE may not exist for 3 players.) 5 No computation method yet 23 / 26 Linear and robust equilibria Overview Getting closer to exact NE existence problem (2 players) 24 / 26 Linear and robust equilibria Overview Getting closer to exact NE existence problem (2 players) Using of linear constrains to enforce a non-linear property 24 / 26 Linear and robust equilibria Overview Getting closer to exact NE existence problem (2 players) Using of linear constrains to enforce a non-linear property New notion of equilibria, not equivalent to previous ones 24 / 26 Linear and robust equilibria Overview Getting closer to exact NE existence problem (2 players) Using of linear constrains to enforce a non-linear property New notion of equilibria, not equivalent to previous ones Non-constructive proof (ongoing work) 24 / 26 Linear and robust equilibria Thank you for your attention Questions ? 25 / 26 Linear and robust equilibria Bibliography I Krishnendu Chatterjee, Marcin Jurdziński, and Rupak Majumdar. On Nash equilibria in stochastic games. In CSL’04, volume 3210 of LNCS, pages 26–40. Springer, 2004. John F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences of the United States of America, 36(1): 48–49, 1950. Piercesare Secchi and William D. Sudderth. Stay-in-a-set games. International Journal of Game Theory, 30:479–490, 2001. 26 / 26