A Generalization of Two-Player Stackelberg Games to Three Players
Garrett Andersen
1 Introduction
Two-player Stackelberg games and their applications to security are currently a very active topic in Algorithmic Game Theory [7]. In a two-player Stackelberg game, instead of having the players move simultaneously, one player is designated as the leader and the other as the follower. The game then works as follows: the leader chooses a (possibly mixed) strategy to commit to, which the follower observes before choosing his response. A large part of the appeal of two-player Stackelberg games is that they can be solved in polynomial time in the size of the game using linear programming [3].
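For reference, the following sketch (my own, not code from [3]) illustrates the standard multiple-LPs approach for solving a two-player Stackelberg game: for each follower pure strategy t, one linear program maximizes the leader's utility subject to t being a best response for the follower, and the best of these solutions is the optimal commitment. The function names and the use of scipy are assumptions of this sketch.

```python
# Sketch of the multiple-LPs approach for two-player Stackelberg games.
# Names and the scipy dependency are my own choices, not the paper's.
import numpy as np
from scipy.optimize import linprog

def optimal_commitment(U_leader, U_follower):
    """U_leader, U_follower: (m x n) payoff matrices (leader rows, follower columns)."""
    m, n = U_leader.shape
    best_value, best_p = -np.inf, None
    for t in range(n):                                   # candidate follower best response
        # Incentive constraints: for every t', p @ (U_F[:, t'] - U_F[:, t]) <= 0
        A_ub = (U_follower - U_follower[:, [t]]).T       # shape (n, m); row t is all zeros
        b_ub = np.zeros(n)
        A_eq = np.ones((1, m))                           # probabilities sum to one
        res = linprog(-U_leader[:, t],                   # maximize leader utility against t
                      A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * m)
        if res.success and -res.fun > best_value:
            best_value, best_p = -res.fun, res.x
    return best_p, best_value
```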
One obvious question, then, is whether this tractability can be extended to Stackelberg games with more players. This question is addressed to some extent in [2]; however, the model presented there relies on the followers obeying a signal from the leader, which is a very strong assumption. In this paper, I propose an alternative model with weaker assumptions and then analyze its complexity.
2 Stackelberg Games With Three Players
In Stackelberg games with more than two players, one player is designated as
the leader and the rest are designated as followers. After the leader chooses
a strategy to commit to, the followers observe this strategy and then respond
simultaneously. Because the followers respond simultaneously, it would be
natural to require the followers’ responses to constitute a Nash Equilibrium.
However, finding a Nash Equilibrium is already known to be NP-hard in the vast majority of scenarios, so in the interest of tractability this requirement is relaxed slightly: the followers are only required to play a correlated equilibrium between themselves. This relaxation is made because a correlated equilibrium of a simultaneous-move game can be found in polynomial time in the size of the game using linear programming [6].
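As an illustration, the sketch below (my own construction, not taken from [6]) writes the correlated equilibrium constraints for a two-player simultaneous-move game as a linear program. The objective W is an arbitrary linear function of the outcome distribution; taking W to be the leader's payoffs yields exactly the adversarial equilibrium selection used in the model described next. All names are assumptions of the sketch.

```python
# Sketch: correlated equilibrium of a two-player game via linear programming,
# minimizing a given linear objective W over outcomes (names are mine).
import numpy as np
from scipy.optimize import linprog

def worst_ce_for_objective(U1, U2, W):
    """U1, U2, W: (n1 x n2) arrays; returns a CE distribution minimizing E[W]."""
    n1, n2 = U1.shape
    idx = lambda c, h: c * n2 + h                         # flatten joint action (c, h)
    A_ub, b_ub = [], []
    # Player 1 incentive constraints: deviating from c to c' must not help.
    for c in range(n1):
        for cp in range(n1):
            row = np.zeros(n1 * n2)
            for h in range(n2):
                row[idx(c, h)] = U1[cp, h] - U1[c, h]     # gain from deviating
            A_ub.append(row); b_ub.append(0.0)
    # Player 2 incentive constraints: deviating from h to h' must not help.
    for h in range(n2):
        for hp in range(n2):
            row = np.zeros(n1 * n2)
            for c in range(n1):
                row[idx(c, h)] = U2[c, hp] - U2[c, h]
            A_ub.append(row); b_ub.append(0.0)
    A_eq = np.ones((1, n1 * n2))                          # probabilities sum to one
    res = linprog(W.flatten(), A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (n1 * n2))
    return res.x.reshape(n1, n2), res.fun
```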
In general, though, there is still the question of which correlated equilibrium the followers will play. From the leader's perspective, a natural choice is to optimize his commitment strategy for the worst case: after observing his commitment, the followers always respond by playing the correlated equilibrium that he prefers the least. This is the model used for the rest of the paper, although only the case with one leader and two followers is considered.
Initially, my hope was that this model could be solved using linear programming; however, in the next section I give an NP-hardness reduction from multi-player minmax games, which implies that the problem cannot be formulated as a linear program.
3 Hardness Reduction
In multi-player minmax games, two players simultaneously attempt to minimize a third player’s utility (without regard for their own utilities). If
correlation between the two punishing players is not allowed, then finding
the optimal multi-player minmax strategy is known to be NP-hard [1, 5].
So, given a three-player normal-form game with players A, B, and C,
suppose that A and B are simultaneously trying to punish C as much as
possible, without being able to correlate their strategies. Note that without
loss of generality, we can entry-wise replace the utilities of A and B with the negative utilities of C, because changing their utilities won't have any effect on the punishment of C. Let $s^{*P}_a$ and $s^{*P}_b$ be optimal multi-player minmax strategies for A and B, and let $s^{*P}_c$ be a best response of C to these strategies. Then, let $u^{*P}_a$, $u^{*P}_b$, and $u^{*P}_c$ be the resultant utilities for each player in this outcome. Note that $u^{*P}_a = u^{*P}_b = -u^{*P}_c$ because of the way the utilities of players A and B were changed.
Now suppose that this same (modified) game is treated as a three-player
Stackelberg game with A as the leader and B and C as the followers. As
in the previous section, after A chooses a strategy to commit to, B and
C observe this strategy and respond by playing the correlated equilibrium
that A would prefer the least. Note that after A commits, B and C are left
playing a two-player zero-sum game with some minimax value v (with B as
the minimizer). And, for any correlated equilibrium of a two-player zero-sum game with minimax value v, the minimizer's utility must be −v and the
maximizer’s utility must be v [4]. So after A commits, B is guaranteed to
get utility −v in any correlated equilibrium of the resultant game, which also
means that player A will get utility −v because they have the same utilities.
Therefore, once player A chooses his commitment strategy, he will always
be indifferent over all correlated equilibrium responses of B and C.
The following algorithm based on this Stackelberg game can be used
to calculate optimal multi-player minmax strategies for A and B. First,
for every outcome, replace A’s and B’s utilities by the negative utility of
C. Then, in the corresponding three-player Stackelberg game with A as
the leader, compute A’s optimal strategy to commit to. Call this strategy
$s^{*S}_a$, and let $u^{*S}_a$, $u^{*S}_b$, and $u^{*S}_c$ be the players' utilities when A commits to this strategy and B and C respond by playing a correlated equilibrium (remember each player's utility is constant over all correlated equilibria that B and C could play). Then, fix $s^{*S}_a$ and run the minimax algorithm on the resulting two-player zero-sum game to find a minimax strategy $s^{*S}_b$ for player B and a maximin strategy $s^{*S}_c$ for player C. If B and C play these strategies, B will get utility −v and C will get utility v, where v is the minimax value of the game, so their utilities must be the same as in any correlated equilibrium of the game. Therefore, if the players play $s^{*S}_a$, $s^{*S}_b$, and $s^{*S}_c$ as their strategies, the resultant utilities for each player must be $u^{*S}_a$, $u^{*S}_b$, and $u^{*S}_c$ (i.e., the same as if B and C responded with a correlated equilibrium). The rest of this section will be dedicated to showing that $s^{*S}_a$ and $s^{*S}_b$ are optimal multi-player minmax strategies for A and B.
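The "run the minimax algorithm" step above is itself a pair of linear programs. The following sketch (my own; the matrix M and the function name are my notation) computes B's minimax strategy, C's maximin strategy, and the value v of the zero-sum game that B and C face once the leader's commitment is fixed.

```python
# Sketch: solving the two-player zero-sum game between B and C by LP.
# M[b, c] is C's (the maximizer's) expected utility given the leader's commitment.
import numpy as np
from scipy.optimize import linprog

def zero_sum_solve(M):
    """Returns (B's minimax strategy, C's maximin strategy, game value v)."""
    nb, nc = M.shape
    # C maximizes v with variables (q_1..q_nc, v): for every b, q @ M[b, :] >= v.
    c_obj = np.zeros(nc + 1); c_obj[-1] = -1.0                 # minimize -v
    A_ub_c = np.hstack([-M, np.ones((nb, 1))])                 # v - q @ M[b, :] <= 0
    res_c = linprog(c_obj, A_ub=A_ub_c, b_ub=np.zeros(nb),
                    A_eq=np.hstack([np.ones((1, nc)), [[0.0]]]), b_eq=[1.0],
                    bounds=[(0, None)] * nc + [(None, None)])
    q, v = res_c.x[:nc], res_c.x[-1]
    # B minimizes v with variables (p_1..p_nb, v): for every c, p @ M[:, c] <= v.
    b_obj = np.zeros(nb + 1); b_obj[-1] = 1.0                  # minimize v
    A_ub_b = np.hstack([M.T, -np.ones((nc, 1))])               # p @ M[:, c] - v <= 0
    res_b = linprog(b_obj, A_ub=A_ub_b, b_ub=np.zeros(nc),
                    A_eq=np.hstack([np.ones((1, nb)), [[0.0]]]), b_eq=[1.0],
                    bounds=[(0, None)] * nb + [(None, None)])
    p = res_b.x[:nb]
    return p, q, v
```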
The basic proof idea is simple: it has to be shown that C's utility when playing a best response against an optimal multi-player minmax strategy is both less than or equal to and greater than or equal to his utility when A commits to the optimal commitment strategy and B and C respond by playing a correlated equilibrium. The following lemmas will be useful:
Lemma 1: In the simultaneous move game, $s^{*S}_c$ is a best response for player C when players A and B play $s^{*S}_a$ and $s^{*S}_b$.
Proof: Suppose that there exists some strategy $s_c$ for C that is better than $s^{*S}_c$ when A and B play $s^{*S}_a$ and $s^{*S}_b$. Then in the corresponding Stackelberg game, if A commits to $s^{*S}_a$ and B plays $s^{*S}_b$ in the resultant two-player zero-sum game, then C could increase his utility by switching from $s^{*S}_c$ to $s_c$. This would either contradict the fact that $s^{*S}_c$ is a maximin strategy for C or the fact that $s^{*S}_b$ is a minimax strategy for B (for the two-player zero-sum game that occurs after A commits). ◻
Therefore, in the simultaneous move game, if A and B play $s^{*S}_a$ and $s^{*S}_b$ and C plays a best response, the resulting utilities for the players will be $u^{*S}_a$, $u^{*S}_b$, and $u^{*S}_c$.
Lemma 2: Suppose A's strategy is fixed to be $s^{*P}_a$ in the simultaneous move game. Then, the result of this fixing is that players B and C are left playing a two-player zero-sum game with some minimax value v (with B as the minimizer). If B and C play $s^{*P}_b$ and $s^{*P}_c$ in this new game, it must be that the resulting utility for C is v and the resulting utility for B is −v.
Proof: Suppose that C's utility is greater than v in this outcome. Then player B can switch from playing $s^{*P}_b$ to playing a minimax strategy to guarantee that C cannot get more than v. This contradicts the fact that $s^{*P}_a$ and $s^{*P}_b$ are optimal multi-player minmax strategies. Now suppose that C's utility is less than v in this outcome. Then C can switch from playing $s^{*P}_c$ to playing a maximin strategy to guarantee himself a utility of at least v. This would contradict the fact that $s^{*P}_c$ is a best response to $s^{*P}_a$ and $s^{*P}_b$. ◻
Lemma 3: $u^{*S}_a \geq u^{*P}_a$
Proof: Suppose A commits to $s^{*P}_a$ in the Stackelberg game. If players B and C respond by playing $s^{*P}_b$ and $s^{*P}_c$, then by Lemma 2, the resulting utility for A must be the same as if B and C had responded with a correlated equilibrium. Therefore if A commits to $s^{*P}_a$ and B and C respond with a correlated equilibrium, player A's utility will be $u^{*P}_a$. That means that $u^{*S}_a \geq u^{*P}_a$ because A can guarantee himself a utility of at least $u^{*P}_a$ by committing to $s^{*P}_a$. ◻
This also implies that $u^{*S}_c \leq u^{*P}_c$ because C's utility will be equal to the negative utility of player A in every outcome.
Theorem: $s^{*S}_a$ and $s^{*S}_b$ are optimal multi-player minmax strategies.
Proof: By Lemma 1, if A and B play $s^{*S}_a$ and $s^{*S}_b$ in the simultaneous move game and C plays a best response to these, player C's utility will be $u^{*S}_c$. Also, by Lemma 3, $u^{*S}_c \leq u^{*P}_c$. Since $u^{*P}_c$ is by definition the lowest utility that C can be held to by any pair of uncorrelated punishment strategies when C best responds, it must also be that $u^{*S}_c \geq u^{*P}_c$. Therefore $u^{*S}_c = u^{*P}_c$, so $s^{*S}_a$ and $s^{*S}_b$ are optimal multi-player minmax strategies. ◻
I have shown how to compute optimal multi-player minmax strategies
for each punishing player given an algorithm to find the optimal strategy for
the leader to commit to in the three-player Stackelberg model I introduced.
Therefore, because computing optimal multi-player minmax strategies is an
NP-hard problem, finding the optimal strategy to commit to must also be
NP-hard.
4 Other Possible Implementations
Although the fact that this problem is NP-hard implies that it cannot be written as a linear program, it may still be possible to describe it as a Mixed-Integer Program and solve it using the variety of techniques available for those problems. Unfortunately, I was unable to come up with a formulation even in this relaxed setting, and the problem seems inherently highly non-linear.
One thing I did try was to solve this problem using solvers which accept
non-linear objectives and/or non-linear constraints. I tried two different
implementations of the problem, one with a non-linear objective and linear
constraints, and one with a linear objective and non-linear constraints.
The implementation with the non-linear objective is:
maximize $u_L(p_1, p_2, \ldots, p_n)$
such that:
$\sum_r p_r = 1$
$p_r \geq 0 \quad \forall r$
where $u_L(p_1, p_2, \ldots, p_n)$ is equal to the leader's utility when he plays the mixed strategy $(p_1, p_2, \ldots, p_n)$ and the followers respond by playing the correlated equilibrium between themselves that the leader prefers the least. This can be considered the naive implementation.
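For concreteness, here is a rough sketch (mine, not the author's Matlab code) of this naive implementation: a generic nonlinear optimizer searches over the commitment distribution, and each objective evaluation solves an inner linear program for the followers' correlated equilibrium that is worst for the leader. The payoff tensors UL, U1, U2 (indexed by $(r, c, h)$) and all function names are my own notation.

```python
# Sketch of the naive formulation: outer nonlinear search over the leader's
# mixed strategy p, inner LP for the leader's worst-case correlated equilibrium.
import numpy as np
from scipy.optimize import linprog, minimize

def worst_case_leader_utility(p, UL, U1, U2):
    nr, nc, nh = UL.shape
    W  = np.tensordot(p, UL, axes=1)           # expected leader payoff per (c, h)
    V1 = np.tensordot(p, U1, axes=1)           # follower 1's induced game
    V2 = np.tensordot(p, U2, axes=1)           # follower 2's induced game
    rows, n = [], nc * nh
    for c in range(nc):                        # follower 1: c -> c' deviations
        for cp in range(nc):
            row = np.zeros(n)
            row[c * nh:(c + 1) * nh] = V1[cp, :] - V1[c, :]
            rows.append(row)
    for h in range(nh):                        # follower 2: h -> h' deviations
        for hp in range(nh):
            row = np.zeros(n)
            row[h::nh] = V2[:, hp] - V2[:, h]
            rows.append(row)
    res = linprog(W.flatten(), A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=np.ones((1, n)), b_eq=[1.0], bounds=[(0, None)] * n)
    return res.fun                             # leader's worst-case CE utility

def naive_commitment_search(UL, U1, U2, restarts=20, seed=0):
    rng = np.random.default_rng(seed)
    nr = UL.shape[0]
    best_val, best_p = -np.inf, None
    for _ in range(restarts):                  # random restarts: local optima abound
        x0 = rng.dirichlet(np.ones(nr))
        res = minimize(lambda p: -worst_case_leader_utility(p, UL, U1, U2), x0,
                       method='SLSQP', bounds=[(0, 1)] * nr,
                       constraints=[{'type': 'eq', 'fun': lambda p: p.sum() - 1}])
        if -res.fun > best_val:
            best_val, best_p = -res.fun, res.x
    return best_p, best_val
```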
Another possible implementation, with a linear objective but non-linear constraints, is:
maximize $v$
such that:
$\forall (c, h): \quad v + \sum_{c'} y_{c,c'} \left( \sum_r p_r \, \big( u_{f1}(r, c, h) - u_{f1}(r, c', h) \big) \right) + \sum_{h'} z_{h,h'} \left( \sum_r p_r \, \big( u_{f2}(r, c, h) - u_{f2}(r, c, h') \big) \right) \leq \sum_r p_r \, u_L(r, c, h)$
$\sum_r p_r = 1$
$p_r \geq 0 \quad \forall r$
$y_{c,c'} \geq 0 \quad \forall (c, c')$
$z_{h,h'} \geq 0 \quad \forall (h, h')$
Here the r's, c's, and h's index the pure strategies of the leader, follower 1, and follower 2, respectively. This formulation is based on the dual of the linear program which solves for the three-player correlated equilibrium that minimizes the leader's utility, without individual rationality constraints for the leader, shown below.
maximize $v$
such that:
$\forall (r, c, h): \quad v + \sum_{c'} y_{c,c'} \, \big( u_{f1}(r, c, h) - u_{f1}(r, c', h) \big) + \sum_{h'} z_{h,h'} \, \big( u_{f2}(r, c, h) - u_{f2}(r, c, h') \big) \leq u_L(r, c, h)$
$y_{c,c'} \geq 0 \quad \forall (c, c')$
$z_{h,h'} \geq 0 \quad \forall (h, h')$
This program finds the greatest lower bound on the leader's utility over all three-player correlated equilibria without rationality constraints for the leader. If the program is modified by replacing each $u(r, c, h)$ with $\sum_r p_r \, u(r, c, h)$, it gives the greatest lower bound in the case where the leader commits to playing the distribution given by the $p_r$'s and the leader's least preferred correlated equilibrium is then played (exactly the outcome being solved for). See [2] for more information.
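As one possible rendering of this second implementation, the sketch below (my own; it uses scipy's SLSQP solver in place of the Matlab solvers discussed next) keeps the linear objective v and imposes the bilinear constraints obtained by substituting the commitment distribution into the dual program above. The tensor names UL, U1, U2 and the solver choice are assumptions.

```python
# Sketch of the second formulation: variables (p, v, y, z), linear objective v,
# bilinear inequality constraints handled by a generic NLP solver.
import numpy as np
from scipy.optimize import minimize

def second_formulation(UL, U1, U2, seed=0):
    nr, nc, nh = UL.shape

    def unpack(x):
        p = x[:nr]
        v = x[nr]
        y = x[nr + 1: nr + 1 + nc * nc].reshape(nc, nc)
        z = x[nr + 1 + nc * nc:].reshape(nh, nh)
        return p, v, y, z

    def ineq(x):                                   # must be >= 0 for every (c, h)
        p, v, y, z = unpack(x)
        W  = np.tensordot(p, UL, axes=1)           # expected payoffs under p
        V1 = np.tensordot(p, U1, axes=1)
        V2 = np.tensordot(p, U2, axes=1)
        t1 = V1 * y.sum(axis=1)[:, None] - y @ V1          # follower 1 terms
        t2 = V2 * z.sum(axis=1)[None, :] - V2 @ z.T        # follower 2 terms
        return (W - v - t1 - t2).flatten()

    rng = np.random.default_rng(seed)
    x0 = np.concatenate([rng.dirichlet(np.ones(nr)), [0.0],
                         np.zeros(nc * nc), np.zeros(nh * nh)])
    bounds = ([(0, 1)] * nr + [(None, None)]
              + [(0, None)] * (nc * nc + nh * nh))
    res = minimize(lambda x: -x[nr], x0, method='SLSQP', bounds=bounds,
                   constraints=[{'type': 'eq', 'fun': lambda x: x[:nr].sum() - 1},
                                {'type': 'ineq', 'fun': ineq}])
    p, v, _, _ = unpack(res.x)
    return p, v
```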
I tested these two implementations on three NLP solvers provided by the Matlab(R) Optimization Toolbox: fmincon, GeneticAlgorithm, and PatternSearch. There are significantly more advanced commercial solvers
available for solving NLPs; however, I was unable to obtain a license for them.
Nevertheless, the results for these three solvers were not very promising.
Repeated execution of a solver on the same parameters gave significantly
different answers each time, indicating that the solvers were getting stuck on
local optima. This was true for both implementations and all three solvers.
One interesting observation, though, was that the first implementation did not vary nearly as much and gave consistently better solutions than the second. I
attribute this to the fact that having more variables is highly undesirable
for non-linear programs because of how the solvers have to search through
the variable space. It is possible that the commercial solvers are much more
viable, but I am not sure I will get the chance to try them out.
References
[1] Christian Borgs, Jennifer Chayes, Nicole Immorlica, Adam Tauman Kalai, Vahab Mirrokni, and Christos Papadimitriou. The myth of the folk theorem. Games and Economic Behavior, 70(1):34–43, 2010.
[2] Vincent Conitzer and Dmytro Korzhyk. Commitment to correlated strategies. In AAAI, 2011.
[3] Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM conference on Electronic commerce, pages 82–90. ACM, 2006.
[4] Francoise Forges. Correlated equilibrium in two-person zero-sum games.
Econometrica, 58(2):515, March 1990.
[5] Kristoffer Arnsfelt Hansen, Thomas Dueholm Hansen, Peter Bro Miltersen, and Troels Bjerre Sørensen. Approximability and parameterized
complexity of minmax values. In Internet and Network Economics, pages
684–695. Springer, 2008.
[6] Christos H Papadimitriou and Tim Roughgarden. Computing correlated
equilibria in multi-player games. Journal of the ACM (JACM), 55(3):14,
2008.
[7] James Pita, Manish Jain, Janusz Marecki, Fernando Ordóñez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. Deployed ARMOR protection: the application of a game theoretic model for security at the Los Angeles International Airport. In Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems: industrial track, pages 125–132. International Foundation for Autonomous Agents and Multiagent Systems, 2008.