Stochastic Games
Krishnendu Chatterjee
CS 294
Game Theory
Games on Components.

Model interaction between components.

Games as models of interaction.

Repeated Games: Reactive Systems.
Games on Graphs.

Today’s Topic:
Games played on game graphs, possibly for an infinite number of rounds.
Winning Objectives:
Reachability.
Safety (the complement of Reachability).
Games.
1 Player Game: Graph G=(V,E). R ⊆ V is the target set.
2 Player Game: G=(V,E,(V□,V◇)). R ⊆ V. (Alternating reachability.)
Games.
1 Player Game: Graph G=(V,E). R ⊆ V is the target set.
1-1/2 Player Game (MDPs): G=(V,E,(V□,V○)). R ⊆ V.
2 Player Game: G=(V,E,(V□,V◇)). R ⊆ V. (Alternating reachability.)
2-1/2 Player Game: G=(V,E,(V□,V◇,V○)). R ⊆ V.
1-1/2 player game
Markov Decision Processes.
A Markov Decision Process (MDP) is defined as follows:
• G=(V,E,(V□,V○),R)
• (V,E) is a graph.
• (V□,V○) is a partition of V.
• R ⊆ V is the set of target nodes.
• Nodes in V○ are random nodes that choose between their successors uniformly at random.
• For simplicity we assume our graphs are binary.
A Markov Decision Process.
[Figure: an example MDP; the target vertices are marked "Target".]
Strategy.
σ₁: V*·V□ → D(V) such that for all x ∈ V* and v ∈ V□, if σ₁(x·v)(v') > 0 then (v,v') ∈ E.
(D(V) denotes the probability distributions over V; the condition says the support of σ₁(x·v) consists of successors of v.)
Subclass of Strategies.
Pure Strategy: chooses one successor rather than a distribution.
Memoryless Strategy: a strategy independent of the history; hence it can be represented as σ₁: V□ → D(V).
Pure Memoryless Strategy: a strategy which is both pure and memoryless; hence it can be represented as σ₁: V□ → V.
Values.
Reach(R) = { s₀s₁… | ∃k. sₖ ∈ R }
v₁(s) = sup_{σ₁ ∈ Σ₁} Pr_{σ₁}(Reach(R))
Optimal Strategy: σ₁ is optimal if v₁(s) = Pr_{σ₁}(Reach(R)).
Values and Strategies.
Pure memoryless optimal strategies exist. [CY'98, FV'97]
The values are computed by the following linear program:
minimize Σ_s x(s) subject to
x(s) ≥ x(s') for (s,s') ∈ E and s ∈ V□
x(s) = 1/2·(x(s') + x(s'')) for (s,s'), (s,s'') ∈ E and s ∈ V○
x(s) ≥ 0
x(s) = 1 for s ∈ R
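
As a concrete illustration, here is a minimal sketch of this linear program in Python using scipy.optimize.linprog. The encoding (player_succ, random_succ) and the small example MDP are assumptions made up for the sketch, not part of the slides.

    # Sketch: the reachability-value LP of an MDP.
    # minimize sum_s x(s)  subject to  x(s) >= x(s') on player edges,
    # x(s) = (x(s') + x(s''))/2 at random nodes, x = 1 on R, x >= 0.
    from scipy.optimize import linprog

    # Hypothetical binary MDP: player node s0 with successors {s1, s0};
    # random node s1 with successors {t, s0}; t is the (absorbing) target.
    nodes = ["s0", "s1", "t"]
    idx = {s: i for i, s in enumerate(nodes)}
    player_succ = {"s0": ["s1", "s0"]}      # V_box: the player chooses
    random_succ = {"s1": ["t", "s0"]}       # V_circle: uniform random
    target = {"t"}

    n = len(nodes)
    c = [1.0] * n                           # objective: minimize sum_s x(s)

    A_ub, b_ub = [], []                     # x(s') - x(s) <= 0
    for s, succs in player_succ.items():
        for sp in succs:
            row = [0.0] * n
            row[idx[sp]] += 1.0
            row[idx[s]] -= 1.0
            A_ub.append(row); b_ub.append(0.0)

    A_eq, b_eq = [], []
    for s, (sp, spp) in random_succ.items():  # x(s) = (x(s')+x(s''))/2
        row = [0.0] * n
        row[idx[s]] = 1.0
        row[idx[sp]] -= 0.5
        row[idx[spp]] -= 0.5
        A_eq.append(row); b_eq.append(0.0)
    for s in target:                          # x(s) = 1 on the target
        row = [0.0] * n
        row[idx[s]] = 1.0
        A_eq.append(row); b_eq.append(1.0)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n)     # x(s) >= 0
    print(dict(zip(nodes, res.x)))            # here: value 1 at every node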
A Markov Decision Process.
[Figure: the example MDP with vertices s0 and s1; from s1 the play reaches the target with probability 1/2.]
Pure Memoryless Optimal Strategy.
At s0 the player chooses s1, and from s1 the play reaches R with probability 1/2.
Hence the probability of not reaching R within n rounds is (1/2)^n.
As n → ∞ this goes to 0, and hence the player reaches R with probability 1.
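
To see the limit concretely, here is a small Monte Carlo sketch of this play; the two-state structure (at s0 choose s1; at s1 a fair coin decides between the target and s0) is read off the example above, and the code itself is only an illustration.

    # Sketch: each visit to s1 reaches the target with probability 1/2,
    # so the probability of never reaching R within n rounds is (1/2)^n.
    import random

    def reach_within(rounds):
        """Simulate: at s0 the player moves to s1; from s1 the play
        reaches the target with probability 1/2, else returns to s0."""
        for _ in range(rounds):
            if random.random() < 0.5:   # random node s1 picks the target
                return True
        return False

    trials = 100_000
    for n in (1, 5, 10):
        est = sum(reach_within(n) for _ in range(trials)) / trials
        print(n, est, 1 - 0.5 ** n)     # empirical vs. exact 1 - (1/2)^n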
The Safety Analysis.
[Figure: the same MDP analysed as a safety game; successive slides highlight the random choices that lead to the target.]
The Safety Analysis.
Consider the random player as an adversary.
Then there is a choice of successors such that the play reaches the target.
The probability of this choice of successors is at least (1/2)^n.
The Key Fact.
The fact about the safety game:
If the MDP is viewed as a safety game for the player, the player loses with probability 1, and the number of nodes is n, then the probability of reaching the target within n steps is at least (1/2)^n.
MDPs.
Pure memoryless optimal strategies exist.
Values can be computed in polynomial time.
The safety game fact.
2-1/2 player games
Simple Stochastic Games.
G=(V,E,(V□,V◇,V○)), R ⊆ V. [Con'92]
Strategy:
σᵢ: V*·Vᵢ → D(V) (as before)
Values:
v₁(s) = sup_{σ₁} inf_{σ₂} Pr_{σ₁,σ₂}(Reach(R))
v₂(s) = sup_{σ₂} inf_{σ₁} Pr_{σ₁,σ₂}(¬Reach(R))
Determinacy.
v₁(s) + v₂(s) = 1. [Martin'98]
Strategy σ₁ for player 1 is optimal if v₁(s) = inf_{σ₂} Pr_{σ₁,σ₂}(Reach(R)).
Our Goal: pure memoryless optimal strategies.
Pure Memoryless Optimal Strategy.
Induction on the number of vertices.
Use pure memoryless strategies for MDPs.
Also use the fact about MDP safety games.
Value Class.
A value class is the set of vertices with the same value v₁. Formally, C(p) = { s | v₁(s) = p }.
We now look at some structural properties of a value class.
Value Class.
[Figure: a value class between its higher and lower value classes; □ maximizes the value and ◇ minimizes it, so edges from □ vertices into the lower value class and from ◇ vertices into the higher value class are crossed out.]
Pure Memoryless Optimal Strategy.
Case 1: there is only one value class.
Case a: R = ∅; any strategy for player ◇ (player 2) suffices.
Case b: R ≠ ∅; since in R player □ (player 1) wins with probability 1, the value class must be the value class of value 1.
One Value Class.
[Figure: a single value class with target set R; restricting to the favorable subgame for player 1 yields a subgame with k vertices.]
Subgame Pure Memoryless Optimal Strategy.
By the induction hypothesis there is a pure memoryless optimal strategy in the subgame.
Fix this memoryless strategy of the subgame.
Now analyse the resulting MDP safety game: for any strategy of player ◇ (player 2), the probability of reaching the boundary within k steps is at least (1/2)^k.
Pure Memoryless Optimal Strategy.
The optimal strategy of the subgame ensures that the probability of reaching the target in the original game within k+1 steps is at least (1/2)^(k+1).
The probability of not reaching the target within (k+1)·n steps is at most (1 - (1/2)^(k+1))^n, which goes to 0 as n → ∞.
More than One Value Class.
[Figure: successive slides show a value class between its higher and lower value classes, and how a vertex of the class can be collapsed.]
Pure Memoryless Optimal Strategy.
Either we can collapse a vertex, in which case we can apply the induction hypothesis.
Otherwise no value class contains a vertex of player 1 (V□ is empty); then the game is an MDP and the pure memoryless optimal strategy of the MDP suffices.
Computing the values.
Given a vertex s and a value v', deciding whether v₁(s) ≥ v' can be done in NP ∩ coNP.
This follows from the existence of pure memoryless optimal strategies and the fact that the values of MDPs can be computed in polynomial time.
Algorithms for determining values.
Algorithms [Con'93]:
Randomized Hoffman-Karp.
Non-linear programming.
All these algorithms are efficient in practice.
Open problem: is there a polynomial-time algorithm to compute the values?
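
Neither of the named algorithms fits in a short sketch, but plain value iteration, which is not one of Condon's algorithms and converges only in the limit rather than terminating exactly, shows what "computing the values" means operationally. The encoding (succ, owner) and the tiny example game are assumptions of the sketch.

    # Sketch: value iteration for a simple stochastic game.  Start from
    # x = 1 on R and 0 elsewhere, and repeatedly apply the one-step
    # operator: max over successors at box nodes, min at diamond nodes,
    # average at random nodes.  The iterates approach the values from below.
    def ssg_values(succ, owner, target, iters=10_000):
        """succ: node -> successor list; owner: node -> 'max'|'min'|'avg'."""
        x = {s: (1.0 if s in target else 0.0) for s in succ}
        for _ in range(iters):
            y = {}
            for s, sps in succ.items():
                if s in target:
                    y[s] = 1.0
                elif owner[s] == "max":      # player 1 (box) maximizes
                    y[s] = max(x[sp] for sp in sps)
                elif owner[s] == "min":      # player 2 (diamond) minimizes
                    y[s] = min(x[sp] for sp in sps)
                else:                        # random node: uniform average
                    y[s] = sum(x[sp] for sp in sps) / len(sps)
            x = y
        return x

    # Hypothetical game: box node a, diamond node b, random node r, target t.
    succ = {"a": ["b", "r"], "b": ["a", "r"], "r": ["t", "a"], "t": ["t"]}
    owner = {"a": "max", "b": "min", "r": "avg", "t": "max"}
    print(ssg_values(succ, owner, {"t"}))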
Limit Average Games.
r: V → ℕ (zero sum).
The payoff is the limit average or mean payoff: lim_{n→∞} (1/n)·Σ_{i=1..n} r(sᵢ). (A sketch follows this slide.)
Two-player mean payoff games can be reduced to Simple Stochastic Reachability Games. [ZP'96]
Two-player mean payoff games can be solved in NP ∩ coNP.
A polynomial-time algorithm is still open.
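
One fact that makes pure memoryless strategies so useful here: once both players fix pure memoryless strategies, the play from any start vertex is a "lasso", a finite prefix followed by a cycle repeated forever, so the limit average equals the plain average of the rewards on that cycle. A minimal sketch, where the successor map next_ and reward map r are made up for the illustration:

    # Sketch: under fixed pure memoryless strategies the play is eventually
    # periodic, so the mean payoff is the average reward on the cycle.
    def mean_payoff(next_, r, s):
        """next_: node -> node (both strategies fixed); r: node -> reward."""
        seen, path = {}, []
        while s not in seen:          # walk until some node repeats
            seen[s] = len(path)
            path.append(s)
            s = next_[s]
        cycle = path[seen[s]:]        # the repeated part of the lasso
        return sum(r[v] for v in cycle) / len(cycle)

    # Hypothetical play graph: a -> b -> c -> b -> c -> ...  Cycle {b, c}.
    print(mean_payoff({"a": "b", "b": "c", "c": "b"},
                      {"a": 5, "b": 1, "c": 3}, "a"))   # (1 + 3)/2 = 2.0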
Re-Search Story.
2-1/2 Player Limit Average Pure Memoryless Strategy:
Gillette '57: wrong version of the proof.
Liggett & Lippman '69: new correct proof.
2 Player Limit Average Pure Memoryless Strategy:
Ehrenfeucht & Mycielski '78: “didn’t understand”.
Gurvich, Karzanov & Khachiyan '88: “typo”.
Zwick & Paterson '96: pseudo-polynomial time algorithm.
Slide due to Marcin Jurdzinski.
N-player games.
Pure memoryless optimal strategies for 2-player zero-sum games can be used to prove the existence of Nash equilibria in n-player games.
Key idea: threat strategies, as in the Folk Theorem. [TR'97]
Nobody has an incentive to deviate, as the others will punish.
We require pure strategies to detect deviation.
Concurrent Games
Concurrent Games.
Previously games were turn-based: either player □ or player ◇ chose the moves, or player ○ chose a successor at random.
Now we allow the players to play concurrently.
G=(S, Moves, Γ₁, Γ₂, δ)
Γᵢ: S → 2^Moves ∖ {∅}
δ: S × Moves × Moves → S
A Concurrent Game.
[Figure: a state where player 1 plays a,b and player 2 plays c,d; the move pairs ad,bc stay in the state, while ac,bd lead to the target.]
Concurrent games.
Concurrent games with reachability objectives. [dAHK'98]
Concurrent games with arbitrary ω-regular winning objectives. [dAH'00, dAM'01]
A Concurrent Game.
Player 1 plays a,b and player 2 plays c,d.
A deterministic (pure) strategy is not good: against a, player 2 answers d; against b, player 2 answers c.
[Figure: the move pairs ad,bc loop in the state; ac,bd lead to the target.]
A Concurrent Game.
Player 1 plays a,b and player 2 plays c,d.
Randomized strategy: play a with probability 1/2 and b with probability 1/2.
Then whichever move (c or d) player 2 picks, the target is reached with probability 1/2 in that round; using the same arguments as before, player 1 wins with probability 1.
[Figure: the state with the 1/2-1/2 mixed move of player 1 against player 2's moves c and d.]
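
A quick simulation of this claim; the transition structure (ad and bc loop, ac and bd reach the target) is read off the figure, and the code is only an illustration.

    # Sketch: with the uniform strategy over {a, b}, the target is reached
    # with probability 1/2 per round whatever player 2 plays, so the play
    # reaches the target with probability 1 in the long run.
    import random

    def play(p2_move, rounds=200):
        """Player 1 mixes a/b uniformly; player 2 plays a fixed move."""
        for _ in range(rounds):
            m1 = random.choice("ab")
            if (m1, p2_move) in {("a", "c"), ("b", "d")}:
                return True            # ac or bd: the target is reached
        return False                   # ad or bc in every round: still looping

    trials = 10_000
    for m2 in "cd":
        wins = sum(play(m2) for _ in range(trials))
        print(m2, wins / trials)       # close to 1.0 against both moves

Against an adaptive player 2 the argument is the same: in every round the mixed move succeeds with probability 1/2 regardless of player 2's choice.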
Concurrent Games and Nash equilibrium.
[Figure: a state where ad loops, ac and bd lead to the target, and bc leads to a losing sink.]
Fact: no strategy for player 1 wins with probability 1.
As long as player 1 plays move “a” deterministically, player 2 plays move “d”; once player 1 plays “b” with positive probability, player 2 plays “c” with positive probability.
Thus (1,0) is not a Nash equilibrium.
Concurrent Game and Nash equilibrium.
[Figure: the same game; player 1 plays a with probability 1-ε and b with probability ε.]
For every positive ε, player 1 can win with probability at least 1-ε.
Why is “c” better?
If player 2 plays “d”, the play reaches the target with probability ε in each round; the probability of not reaching the target within n rounds is (1-ε)^n, which goes to 0 as n → ∞.
If player 2 plays “c”, player 1 reaches the target with probability only 1-ε.
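
The round-by-round analysis also gives a closed form: if player 2 plays “c” with a stationary probability q, then each round the play reaches the target with probability q·(1-ε) + (1-q)·ε (move pairs ac or bd) and loses with probability q·ε (pair bc), so player 1 wins with probability win/(win + lose). A minimal sketch (restricting player 2 to stationary strategies is a simplifying assumption) confirming the worst case is q = 1, which yields exactly 1-ε:

    # Sketch: winning probability of the (a: 1-eps, b: eps) strategy
    # against a stationary player 2 who plays c with probability q.
    def win_prob(eps, q):
        win = q * (1 - eps) + (1 - q) * eps   # ac or bd: target reached
        lose = q * eps                        # bc: losing sink
        return 1.0 if win + lose == 0 else win / (win + lose)

    eps = 0.01
    worst = min(win_prob(eps, q / 100) for q in range(101))
    print(worst, 1 - eps)    # the minimum is at q = 1: exactly 1 - eps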
No Nash Equilibrium.
We saw earlier that (1,0) is not a Nash equilibrium.
For any positive ε, the payoff profile (1-ε, ε) is not a Nash equilibrium either, as player 1 can choose a positive ε' < ε and achieve (1-ε', ε').
Concurrent Game: Borel Winning Condition.
Nash equilibria need not exist, but ε-Nash equilibria do exist for 2-player concurrent zero-sum games for the entire Borel hierarchy. [Martin'98]
The Big Open Problem: existence of ε-Nash equilibria for n-player / 2-player non-zero-sum games.
Safety games: Nash equilibria exist in n-person concurrent safety games. [Secchi & Sudderth'01]
Existence of Nash equilibria and complexity issues for n-person reachability games. (Research project for this course.)
Concurrent Games: Limit Average Winning Condition.
The monumental result of [Vieille'02] shows that ε-Nash equilibria exist for 2-player concurrent non-zero-sum limit average games.
The big open problem: existence of ε-Nash equilibria for n-player limit average games.
Relevant Papers.
1. Complexity of Probabilistic Verification: JACM'98. Costas Courcoubetis and Mihalis Yannakakis.
2. The Complexity of Simple Stochastic Games: Information and Computation'92. Anne Condon.
3. On Algorithms for Simple Stochastic Games: DIMACS'93. Anne Condon.
4. Book: Competitive Markov Decision Processes. 1997. J. Filar and K. Vrieze.
5. Concurrent Reachability Games: FOCS'98. Luca de Alfaro, Thomas A. Henzinger and Orna Kupferman.
6. Concurrent ω-regular Games: LICS'00. Luca de Alfaro and Thomas A. Henzinger.
7. Quantitative Solution of ω-regular Games: STOC'01. Luca de Alfaro and Rupak Majumdar.
8. Determinacy of Blackwell Games: Journal of Symbolic Logic'98. Donald Martin.
9. Stay-in-a-set Games: International Journal of Game Theory'01. P. Secchi and W. Sudderth.
10. Stochastic Games: A Reduction (I, II): Israel Journal of Mathematics'02. N. Vieille.
11. The Complexity of Mean Payoff Games on Graphs: TCS'96. U. Zwick and M. Paterson.
Thank You !!!

http://www.cs.berkeley.edu/~c_krish/