A Generalization of Two-Player Stackelberg Games to Three Players
Garrett Andersen
1 Introduction
Two-player Stackelberg games and their applications to security are currently a very active topic in Algorithmic Game Theory [7]. In a two-player Stackelberg game, instead of having the players move simultaneously, one player is designated as the leader and the other as the follower. The game then works as follows: the leader chooses a (possibly mixed) strategy to commit to, which the follower observes before choosing his response. A large part of the appeal of two-player Stackelberg games is that they can be solved in polynomial time in the size of the game using linear programming [3].
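For reference, the following sketch (my own, not code from [3]) illustrates the standard multiple-LPs approach for solving a two-player Stackelberg game: for each follower pure strategy t, one linear program maximizes the leader's utility subject to t being a best response for the follower, and the best of these solutions is the optimal commitment. The function names and the use of scipy are assumptions of this sketch.

```python
# Sketch of the multiple-LPs approach for two-player Stackelberg games.
# Names and the scipy dependency are my own choices, not the paper's.
import numpy as np
from scipy.optimize import linprog

def optimal_commitment(U_leader, U_follower):
    """U_leader, U_follower: (m x n) payoff matrices (leader rows, follower columns)."""
    m, n = U_leader.shape
    best_value, best_p = -np.inf, None
    for t in range(n):                                   # candidate follower best response
        # Incentive constraints: for every t', p @ (U_F[:, t'] - U_F[:, t]) <= 0
        A_ub = (U_follower - U_follower[:, [t]]).T       # shape (n, m); row t is all zeros
        b_ub = np.zeros(n)
        A_eq = np.ones((1, m))                           # probabilities sum to one
        res = linprog(-U_leader[:, t],                   # maximize leader utility against t
                      A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * m)
        if res.success and -res.fun > best_value:
            best_value, best_p = -res.fun, res.x
    return best_p, best_value
```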
One obvious question, then, is whether this tractability can be extended to Stackelberg games with more players. This question is addressed to some extent in [2]; however, the model presented there relies on the followers obeying a signal from the leader, which is a very strong assumption. In this paper, I propose an alternative model with weaker assumptions and then analyze its complexity.
2 Stackelberg Games With Three Players
In Stackelberg games with more than two players, one player is designated as
the leader and the rest are designated as followers. After the leader chooses
a strategy to commit to, the followers observe this strategy and then respond
simultaneously. Because the followers respond simultaneously, it would be
natural to require the followers’ responses to constitute a Nash Equilibrium.
However, finding a Nash Equilibrium is already known to be NP-hard in the vast majority of scenarios, so in the interest of tractability this requirement is relaxed slightly: the followers are only required to play a correlated equilibrium between themselves. This relaxation is made because a correlated equilibrium of a simultaneous-move game can be found in polynomial time in the size of the game using linear programming [6].
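As an illustration, the sketch below (my own construction, not taken from [6]) writes the correlated equilibrium constraints for a two-player simultaneous-move game as a linear program. The objective W is an arbitrary linear function of the outcome distribution; taking W to be the leader's payoffs yields exactly the adversarial equilibrium selection used in the model described next. All names are assumptions of the sketch.

```python
# Sketch: correlated equilibrium of a two-player game via linear programming,
# minimizing a given linear objective W over outcomes (names are mine).
import numpy as np
from scipy.optimize import linprog

def worst_ce_for_objective(U1, U2, W):
    """U1, U2, W: (n1 x n2) arrays; returns a CE distribution minimizing E[W]."""
    n1, n2 = U1.shape
    idx = lambda c, h: c * n2 + h                         # flatten joint action (c, h)
    A_ub, b_ub = [], []
    # Player 1 incentive constraints: deviating from c to c' must not help.
    for c in range(n1):
        for cp in range(n1):
            row = np.zeros(n1 * n2)
            for h in range(n2):
                row[idx(c, h)] = U1[cp, h] - U1[c, h]     # gain from deviating
            A_ub.append(row); b_ub.append(0.0)
    # Player 2 incentive constraints: deviating from h to h' must not help.
    for h in range(n2):
        for hp in range(n2):
            row = np.zeros(n1 * n2)
            for c in range(n1):
                row[idx(c, h)] = U2[c, hp] - U2[c, h]
            A_ub.append(row); b_ub.append(0.0)
    A_eq = np.ones((1, n1 * n2))                          # probabilities sum to one
    res = linprog(W.flatten(), A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (n1 * n2))
    return res.x.reshape(n1, n2), res.fun
```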
In general, though, there is still the question of which correlated equilibrium the followers will play. From the leader's perspective, a natural choice is to optimize his commitment strategy for the worst case: after observing his commitment, the followers always respond by playing the correlated equilibrium that he prefers the least. This is the model used for the rest of the paper, although only the case with one leader and two followers is considered.
Initially, my hope was that this model could be solved using linear programming; however, in the next section I give an NP-hardness reduction from multi-player minmax games, which implies that the problem cannot be formulated as a linear program.
3 Hardness Reduction
In multi-player minmax games, two players simultaneously attempt to minimize a third player’s utility (without regard for their own utilities). If
correlation between the two punishing players is not allowed, then finding
the optimal multi-player minmax strategy is known to be NP-hard [1, 5].
So, given a three-player normal-form game with players A, B, and C,
suppose that A and B are simultaneously trying to punish C as much as
possible, without being able to correlate their strategies. Note that without
loss of generality, we can entry-wise replace the utilities of A and B with the negative utilities of C, because changing their utilities won't have any effect on the punishment of C. Let $s^{*P}_a$ and $s^{*P}_b$ be optimal multi-player minmax strategies for A and B, and let $s^{*P}_c$ be a best response of C to these strategies. Then, let $u^{*P}_a$, $u^{*P}_b$, and $u^{*P}_c$ be the resultant utilities for each player in this outcome. Note that $u^{*P}_a = u^{*P}_b = -u^{*P}_c$ because of the way the utilities of players A and B were changed.
Now suppose that this same (modified) game is treated as a three-player
Stackelberg game with A as the leader and B and C as the followers. As
in the previous section, after A chooses a strategy to commit to, B and
C observe this strategy and respond by playing the correlated equilibrium
that A would prefer the least. Note that after A commits, B and C are left
playing a two-player zero-sum game with some minimax value v (with B as
the minimizer). And, for any correlated equilibrium of a two-player zero-sum game with minimax value v, the minimizer's utility must be −v and the
maximizer’s utility must be v [4]. So after A commits, B is guaranteed to
get utility −v in any correlated equilibrium of the resultant game, which also
means that player A will get utility −v because they have the same utilities.
Therefore, once player A chooses his commitment strategy, he will always
be indifferent over all correlated equilibrium responses of B and C.
The following algorithm based on this Stackelberg game can be used
to calculate optimal multi-player minmax strategies for A and B. First,
for every outcome, replace A’s and B’s utilities by the negative utility of
C. Then, in the corresponding three-player Stackelberg game with A as
the leader, compute A’s optimal strategy to commit to. Call this strategy
$s^{*S}_a$, and let $u^{*S}_a$, $u^{*S}_b$, and $u^{*S}_c$ be the players' utilities when A commits to this strategy and B and C respond by playing a correlated equilibrium (remember each player's utility is constant over all correlated equilibria that B and C could play). Then, fix $s^{*S}_a$ and run the minimax algorithm on the resulting two-player zero-sum game to find a minimax strategy $s^{*S}_b$ for player B and a maximin strategy $s^{*S}_c$ for player C. If B and C play these strategies, B will get utility −v and C will get utility v, where v is the minimax value of the game, so their utilities must be the same as in any correlated equilibrium of the game. Therefore, if the players play $s^{*S}_a$, $s^{*S}_b$, and $s^{*S}_c$ as their strategies, the resultant utilities for each player must be $u^{*S}_a$, $u^{*S}_b$, and $u^{*S}_c$ (i.e., the same as if B and C responded with a correlated equilibrium). The rest of this section will be dedicated to showing that $s^{*S}_a$ and $s^{*S}_b$ are optimal multi-player minmax strategies for A and B.
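The "run the minimax algorithm" step above is itself a pair of linear programs. The following sketch (my own; the matrix M and the function name are my notation) computes B's minimax strategy, C's maximin strategy, and the value v of the zero-sum game that B and C face once the leader's commitment is fixed.

```python
# Sketch: solving the two-player zero-sum game between B and C by LP.
# M[b, c] is C's (the maximizer's) expected utility given the leader's commitment.
import numpy as np
from scipy.optimize import linprog

def zero_sum_solve(M):
    """Returns (B's minimax strategy, C's maximin strategy, game value v)."""
    nb, nc = M.shape
    # C maximizes v with variables (q_1..q_nc, v): for every b, q @ M[b, :] >= v.
    c_obj = np.zeros(nc + 1); c_obj[-1] = -1.0                 # minimize -v
    A_ub_c = np.hstack([-M, np.ones((nb, 1))])                 # v - q @ M[b, :] <= 0
    res_c = linprog(c_obj, A_ub=A_ub_c, b_ub=np.zeros(nb),
                    A_eq=np.hstack([np.ones((1, nc)), [[0.0]]]), b_eq=[1.0],
                    bounds=[(0, None)] * nc + [(None, None)])
    q, v = res_c.x[:nc], res_c.x[-1]
    # B minimizes v with variables (p_1..p_nb, v): for every c, p @ M[:, c] <= v.
    b_obj = np.zeros(nb + 1); b_obj[-1] = 1.0                  # minimize v
    A_ub_b = np.hstack([M.T, -np.ones((nc, 1))])               # p @ M[:, c] - v <= 0
    res_b = linprog(b_obj, A_ub=A_ub_b, b_ub=np.zeros(nc),
                    A_eq=np.hstack([np.ones((1, nb)), [[0.0]]]), b_eq=[1.0],
                    bounds=[(0, None)] * nb + [(None, None)])
    p = res_b.x[:nb]
    return p, q, v
```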
The basic proof idea is simple: it has to be shown that C's utility when playing a best response against an optimal multi-player minmax strategy is both less than or equal to and greater than or equal to his utility when A commits to the optimal commitment strategy and B and C respond by playing a correlated equilibrium. The following lemmas will be useful:
Lemma 1: In the simultaneous move game, $s^{*S}_c$ is a best response for player C when players A and B play $s^{*S}_a$ and $s^{*S}_b$.
Proof: Suppose that there exists some strategy $s_c$ for C that is better than $s^{*S}_c$ when A and B play $s^{*S}_a$ and $s^{*S}_b$. Then in the corresponding Stackelberg game, if A commits to $s^{*S}_a$ and B plays $s^{*S}_b$ in the resultant two-player zero-sum game, then C could increase his utility by switching from $s^{*S}_c$ to $s_c$. This would either contradict the fact that $s^{*S}_c$ is a maximin strategy for C or the fact that $s^{*S}_b$ is a minimax strategy for B (for the two-player zero-sum game that occurs after A commits). ◻
Therefore, in the simultaneous move game, if A and B play $s^{*S}_a$ and $s^{*S}_b$ and C plays a best response, the resulting utilities for the players will be $u^{*S}_a$, $u^{*S}_b$, and $u^{*S}_c$.
Lemma 2: Suppose A's strategy is fixed to be $s^{*P}_a$ in the simultaneous move game. Then, the result of this fixing is that players B and C are left playing a two-player zero-sum game with some minimax value v (with B as the minimizer). If B and C play $s^{*P}_b$ and $s^{*P}_c$ in this new game, it must be that the resulting utility for C is v and the resulting utility for B is −v.
Proof: Suppose that C's utility is greater than v in this outcome. Then player B can switch from playing $s^{*P}_b$ to playing a minimax strategy to guarantee that C cannot get more than v. This contradicts the fact that $s^{*P}_a$ and $s^{*P}_b$ are optimal multi-player minmax strategies. Now suppose that C's utility is less than v in this outcome. Then C can switch from playing $s^{*P}_c$ to playing a maximin strategy to guarantee himself a utility of at least v. This would contradict the fact that $s^{*P}_c$ is a best response to $s^{*P}_a$ and $s^{*P}_b$. ◻
Lemma 3: $u^{*S}_a \geq u^{*P}_a$
Proof: Suppose A commits to $s^{*P}_a$ in the Stackelberg game. If players B and C respond by playing $s^{*P}_b$ and $s^{*P}_c$, then by Lemma 2, the resulting utility for A must be the same as if B and C had responded with a correlated equilibrium. Therefore if A commits to $s^{*P}_a$ and B and C respond with a correlated equilibrium, player A's utility will be $u^{*P}_a$. That means that $u^{*S}_a \geq u^{*P}_a$ because A can guarantee himself a utility of at least $u^{*P}_a$ by committing to $s^{*P}_a$. ◻
This also implies that $u^{*S}_c \leq u^{*P}_c$ because C's utility will be equal to the negative utility of player A in every outcome.
Theorem: $s^{*S}_a$ and $s^{*S}_b$ are optimal multi-player minmax strategies.
Proof: By Lemma 1, if A and B play $s^{*S}_a$ and $s^{*S}_b$ in the simultaneous move game and C plays a best response to these, player C's utility will be $u^{*S}_c$. Also, by Lemma 3, $u^{*S}_c \leq u^{*P}_c$. Since $u^{*P}_c$ is by definition the lowest utility that C can be held to by any pair of uncorrelated punishment strategies when C best responds, it must also be that $u^{*S}_c \geq u^{*P}_c$. Therefore $u^{*S}_c = u^{*P}_c$, so $s^{*S}_a$ and $s^{*S}_b$ are optimal multi-player minmax strategies. ◻
I have shown how to compute optimal multi-player minmax strategies
for each punishing player given an algorithm to find the optimal strategy for
the leader to commit to in the three-player Stackelberg model I introduced.
Therefore, because computing optimal multi-player minmax strategies is an
NP-hard problem, finding the optimal strategy to commit to must also be
NP-hard.
4 Other Possible Implementations
Although the fact that this problem is NP-hard implies that it cannot be written as a linear program, it may still be possible to describe it as a Mixed-Integer Program and solve it using the variety of techniques available for those problems. Unfortunately, I was unable to come up with a formulation even in this relaxed setting, and the problem seems inherently highly non-linear.
One thing I did try was to solve this problem using solvers which accept
non-linear objectives and/or non-linear constraints. I tried two different
implementations of the problem, one with a non-linear objective and linear
constraints, and one with a linear objective and non-linear constraints.
The implementation with the non-linear objective is:
maximize $u_L(p_1, p_2, \ldots, p_n)$
such that:
$\sum_r p_r = 1$
$p_r \geq 0 \quad \forall r$
where $u_L(p_1, p_2, \ldots, p_n)$ is equal to the leader's utility when he plays the mixed strategy $(p_1, p_2, \ldots, p_n)$ and the followers respond by playing the correlated equilibrium between themselves that the leader prefers the least. This can be considered the naive implementation.
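For concreteness, here is a rough sketch (mine, not the author's Matlab code) of this naive implementation: a generic nonlinear optimizer searches over the commitment distribution, and each objective evaluation solves an inner linear program for the followers' correlated equilibrium that is worst for the leader. The payoff tensors UL, U1, U2 (indexed by $(r, c, h)$) and all function names are my own notation.

```python
# Sketch of the naive formulation: outer nonlinear search over the leader's
# mixed strategy p, inner LP for the leader's worst-case correlated equilibrium.
import numpy as np
from scipy.optimize import linprog, minimize

def worst_case_leader_utility(p, UL, U1, U2):
    nr, nc, nh = UL.shape
    W  = np.tensordot(p, UL, axes=1)           # expected leader payoff per (c, h)
    V1 = np.tensordot(p, U1, axes=1)           # follower 1's induced game
    V2 = np.tensordot(p, U2, axes=1)           # follower 2's induced game
    rows, n = [], nc * nh
    for c in range(nc):                        # follower 1: c -> c' deviations
        for cp in range(nc):
            row = np.zeros(n)
            row[c * nh:(c + 1) * nh] = V1[cp, :] - V1[c, :]
            rows.append(row)
    for h in range(nh):                        # follower 2: h -> h' deviations
        for hp in range(nh):
            row = np.zeros(n)
            row[h::nh] = V2[:, hp] - V2[:, h]
            rows.append(row)
    res = linprog(W.flatten(), A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=np.ones((1, n)), b_eq=[1.0], bounds=[(0, None)] * n)
    return res.fun                             # leader's worst-case CE utility

def naive_commitment_search(UL, U1, U2, restarts=20, seed=0):
    rng = np.random.default_rng(seed)
    nr = UL.shape[0]
    best_val, best_p = -np.inf, None
    for _ in range(restarts):                  # random restarts: local optima abound
        x0 = rng.dirichlet(np.ones(nr))
        res = minimize(lambda p: -worst_case_leader_utility(p, UL, U1, U2), x0,
                       method='SLSQP', bounds=[(0, 1)] * nr,
                       constraints=[{'type': 'eq', 'fun': lambda p: p.sum() - 1}])
        if -res.fun > best_val:
            best_val, best_p = -res.fun, res.x
    return best_p, best_val
```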
Another possible implementation, with a linear objective but non-linear constraints, is:
maximize $v$
such that:
$\forall (c, h): \quad v + \sum_{c'} y_{c,c'} \left( \sum_r p_r \, \big( u_{f1}(r, c, h) - u_{f1}(r, c', h) \big) \right) + \sum_{h'} z_{h,h'} \left( \sum_r p_r \, \big( u_{f2}(r, c, h) - u_{f2}(r, c, h') \big) \right) \leq \sum_r p_r \, u_L(r, c, h)$
$\sum_r p_r = 1$
$p_r \geq 0 \quad \forall r$
$y_{c,c'} \geq 0 \quad \forall (c, c')$
$z_{h,h'} \geq 0 \quad \forall (h, h')$
Here the r's, c's, and h's index the pure strategies of the leader, follower 1, and follower 2, respectively. This formulation is based on the dual of the linear program which solves for the three-player correlated equilibrium that minimizes the leader's utility, without individual rationality constraints for the leader, shown below.
maximize $v$
such that:
$\forall (r, c, h): \quad v + \sum_{c'} y_{c,c'} \, \big( u_{f1}(r, c, h) - u_{f1}(r, c', h) \big) + \sum_{h'} z_{h,h'} \, \big( u_{f2}(r, c, h) - u_{f2}(r, c, h') \big) \leq u_L(r, c, h)$
$y_{c,c'} \geq 0 \quad \forall (c, c')$
$z_{h,h'} \geq 0 \quad \forall (h, h')$
This program finds the greatest lower bound on the leader's utility over all three-player correlated equilibria without rationality constraints for the leader. If the program is modified by replacing each $u(r, c, h)$ with $\sum_r p_r \, u(r, c, h)$, it gives the greatest lower bound in the case where the leader commits to playing the distribution given by the $p_r$'s and the leader's least preferred correlated equilibrium is then played (exactly the outcome being solved for). See [2] for more information.
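As one possible rendering of this second implementation, the sketch below (my own; it uses scipy's SLSQP solver in place of the Matlab solvers discussed next) keeps the linear objective v and imposes the bilinear constraints obtained by substituting the commitment distribution into the dual program above. The tensor names UL, U1, U2 and the solver choice are assumptions.

```python
# Sketch of the second formulation: variables (p, v, y, z), linear objective v,
# bilinear inequality constraints handled by a generic NLP solver.
import numpy as np
from scipy.optimize import minimize

def second_formulation(UL, U1, U2, seed=0):
    nr, nc, nh = UL.shape

    def unpack(x):
        p = x[:nr]
        v = x[nr]
        y = x[nr + 1: nr + 1 + nc * nc].reshape(nc, nc)
        z = x[nr + 1 + nc * nc:].reshape(nh, nh)
        return p, v, y, z

    def ineq(x):                                   # must be >= 0 for every (c, h)
        p, v, y, z = unpack(x)
        W  = np.tensordot(p, UL, axes=1)           # expected payoffs under p
        V1 = np.tensordot(p, U1, axes=1)
        V2 = np.tensordot(p, U2, axes=1)
        t1 = V1 * y.sum(axis=1)[:, None] - y @ V1          # follower 1 terms
        t2 = V2 * z.sum(axis=1)[None, :] - V2 @ z.T        # follower 2 terms
        return (W - v - t1 - t2).flatten()

    rng = np.random.default_rng(seed)
    x0 = np.concatenate([rng.dirichlet(np.ones(nr)), [0.0],
                         np.zeros(nc * nc), np.zeros(nh * nh)])
    bounds = ([(0, 1)] * nr + [(None, None)]
              + [(0, None)] * (nc * nc + nh * nh))
    res = minimize(lambda x: -x[nr], x0, method='SLSQP', bounds=bounds,
                   constraints=[{'type': 'eq', 'fun': lambda x: x[:nr].sum() - 1},
                                {'type': 'ineq', 'fun': ineq}])
    p, v, _, _ = unpack(res.x)
    return p, v
```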
I tested these two implementations on three NLP solvers provided by the Matlab(R) Optimization Toolbox: fmincon, GeneticAlgorithm, and PatternSearch. There are significantly more advanced commercial solvers
available for solving NLPs; however, I was unable to obtain a license for them.
Nevertheless, the results for these three solvers were not very promising.
Repeated execution of a solver on the same parameters gave significantly
different answers each time, indicating that the solvers were getting stuck on
local optima. This was true for both implementations and all three solvers.
One interesting observation, though, was that the first implementation did not vary nearly as much and gave consistently better solutions than the second. I
attribute this to the fact that having more variables is highly undesirable
for non-linear programs because of how the solvers have to search through
the variable space. It is possible that the commercial solvers are much more
viable, but I am not sure I will get the chance to try them out.
References
[1] Christian Borgs, Jennifer Chayes, Nicole Immorlica, Adam Tauman Kalai, Vahab Mirrokni, and Christos Papadimitriou. The myth of the folk theorem. Games and Economic Behavior, 70(1):34–43, 2010.
[2] Vincent Conitzer and Dmytro Korzhyk. Commitment to correlated strategies. In AAAI, 2011.
[3] Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM conference on Electronic commerce, pages 82–90. ACM, 2006.
[4] Francoise Forges. Correlated equilibrium in two-person zero-sum games.
Econometrica, 58(2):515, March 1990.
[5] Kristoffer Arnsfelt Hansen, Thomas Dueholm Hansen, Peter Bro Miltersen, and Troels Bjerre Sørensen. Approximability and parameterized
complexity of minmax values. In Internet and Network Economics, pages
684–695. Springer, 2008.
[6] Christos H Papadimitriou and Tim Roughgarden. Computing correlated
equilibria in multi-player games. Journal of the ACM (JACM), 55(3):14,
2008.
[7] James Pita, Manish Jain, Janusz Marecki, Fernando Ordóñez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. Deployed ARMOR protection: the application of a game theoretic model for security at the Los Angeles International Airport. In Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems: industrial track, pages 125–132. International Foundation for Autonomous Agents and Multiagent Systems, 2008.