Computing the Nondominated Nash Points of a Normal Form Game with Two Players

Hadi Charkhgard∗ (a), Martin Savelsbergh (b), and Masoud Talebian (a)

(a) School of Mathematical and Physical Sciences, The University of Newcastle, Australia
(b) School of Industrial and Systems Engineering, Georgia Institute of Technology, USA

January 20, 2016

Abstract

We investigate computing Nash equilibria of normal form games with two players using mixed integer programming. We slightly modify the mixed integer programming formulation of Sandholm et al. [24] and show that this modified formulation has superior performance when computing a Nash equilibrium. We then define the concept of efficient (Pareto optimal) Nash equilibria. This concept is an equilibrium refinement, but different from the traditional concept of Pareto optimality. A Nash equilibrium is “efficient” if it is Pareto optimal with respect to all Nash equilibria (but not necessarily with respect to all possible mixed strategy profiles) of a game. This definition ensures that any game with at least one Nash equilibrium (in mixed strategies), such as normal form games with two players, must have at least one efficient Nash equilibrium. We prove that the set of all points in the payoff space of a normal form game with two players corresponding to the utilities of players in an efficient Nash equilibrium, the so-called nondominated Nash points, is finite. We demonstrate that biobjective mixed integer programming can be used to efficiently compute the set of nondominated Nash points. Finally, we illustrate how the nondominated Nash points can be used to determine the disagreement point of a bargaining problem.

Keywords: biobjective mixed integer programming, bimatrix game, equilibrium refinement, efficient Nash equilibria, disagreement point

1 Introduction

Over sixty years ago, John Nash introduced the most influential concept of game theory to this date, now known as the Nash equilibrium.
In his groundbreaking paper in 1951 [19], Nash showed that every game with a finite number of players and action profiles has at least one Nash equilibrium (in mixed strategies). Unfortunately, the proof of this fundamental result was non-constructive. In fact, computing a Nash equilibrium, even for games with two players, appears to be hard; it is PPAD-complete [7]. As a consequence, how to efficiently compute a single Nash equilibrium and how to efficiently compute all Nash equilibria have become central questions in algorithmic game theory.

∗ Corresponding author. Tel.: +61 424 607 237. E-mail address: [email protected]

We focus on normal form games with two players, sometimes referred to as bimatrix games. The most popular algorithm for efficiently computing a single Nash equilibrium is the Lemke-Howson algorithm [14]. The Lemke-Howson algorithm is a path-following method designed specifically to compute a Nash equilibrium of a nondegenerate normal form game. A slight variation of the algorithm can be used to solve degenerate games. It is worth mentioning that the Lemke-Howson algorithm does not handle negative payoffs. The Porter, Nudelman, and Shoham (PNS) algorithm [23] is an alternative approach that repeatedly guesses the support of a mixed strategy for each player and then checks whether these strategies result in a Nash equilibrium. Another alternative approach, especially relevant to our research, was proposed by Audet et al. [2] and later explored further by Sandholm et al. [24]. Audet et al. [2] observe that mixed integer programming techniques can be used effectively to find a Nash equilibrium. Rather than developing a customized algorithm, they formulate the problem of finding a Nash equilibrium as a mixed integer program (MIP) and use commercial MIP solvers (in particular CPLEX) to find a Nash equilibrium. Sandholm et al. [24] computationally show that this approach is competitive (in terms of efficiency) with the Lemke-Howson algorithm.
It is a well-known fact that a normal form game can have a single Nash equilibrium, multiple but a finite number of Nash equilibria, or even an infinite number of Nash equilibria. When a normal form game has an infinite number of Nash equilibria, the set of all Nash equilibria can still be completely described by finitely many Nash equilibria, namely by all extreme Nash equilibria. Not surprisingly, many researchers have argued that computing a single Nash equilibrium in a game that has multiple Nash equilibria does not provide sufficient information for analyzing the game (see, for example, [3, 4, 9, 12, 15, 16, 17]). In such situations, the preferred approach is to find a complete description of all the Nash equilibria and then use one or more criteria to identify or choose the most appropriate equilibrium. Such an approach requires an algorithm to find a complete description of all the Nash equilibria. Most such algorithms are based on cleverly and efficiently enumerating the vertices of the best-response polytope. Unfortunately, computing all the Nash equilibria (i.e., finding a complete description of the set of all Nash equilibria) is only practical for games with a small set of actions for each player. For instance, Avis et al. [4] observe that for games in which each player has more than 25 actions, their algorithm, lrsNash, does not terminate within 24 hours for nondegenerate normal form games. Because a single Nash equilibrium provides insufficient information to analyze a game and it is prohibitive to compute all Nash equilibria for all but games with small action spaces, it is natural to consider finding a restricted set of Nash equilibria, i.e., all Nash equilibria with certain (desirable) characteristics. This concept is known as equilibrium refinement. 
Strong Nash equilibria, sub-game perfect Nash equilibria, proper Nash equilibria, and coalition-proof Nash equilibria are just some of the well-known restricted sets that have been considered in the literature. In this study, we define a different restricted set of Nash equilibria with two desirable characteristics: (1) the set is nonempty if the game has at least one Nash equilibrium; (2) the set is computable with multi-objective optimization techniques (at least for bimatrix games). More precisely, we focus on finding (a minimal set of) efficient Nash equilibria. A Nash equilibrium is said to be “efficient” if it is Pareto optimal with respect to all Nash equilibria (but not necessarily with respect to all possible mixed strategy profiles) of a game. This definition of efficient ensures that any game with at least one Nash equilibrium, such as bimatrix games, will have at least one efficient Nash equilibrium. To the best of our knowledge, this concept of efficient Nash equilibria has, so far, received little attention in the literature. We were able to find only one reference that uses this notion [13]. LaValle [13] argues that efficient Nash equilibria will be preferred by intelligent players. We prove the critical result that even though a bimatrix game can have an infinite number of efficient Nash equilibria, the set of all points in the payoff space corresponding to the utilities of players in an efficient Nash equilibrium, the so-called nondominated Nash points, is finite. This implies that the set of efficient Nash equilibria can be partitioned into a finite number of subsets of efficient Nash equilibria with the same player payoffs. We demonstrate that computing a single efficient Nash equilibrium for each of the subsets of the partition can be done efficiently by formulating the problem of finding an efficient Nash equilibrium as a biobjective mixed integer program (BOMIP) and computing the nondominated frontier of that biobjective mixed integer program.
During our exploration of biobjective mixed integer programming approaches for the computation of nondominated Nash points, we were able to enhance the mixed integer programming formulation proposed by Sandholm et al. [24] for finding a single Nash equilibrium of a normal form game with two players. The modification, even though relatively minor, resulted in a significant reduction in computing time across a wide variety of instances (generated using GAMUT, see http://gamut.stanford.edu). On average, computing times were reduced by a factor of four.

Finally, we show how nondominated Nash points can be used in determining the disagreement point in a bargaining problem. Nash introduced his approach to solve the bargaining problem in 1950 and showed that the bargaining problem has a unique solution, called the Nash bargaining solution, if certain axioms hold [18]. However, the Nash bargaining solution depends on the choice of the status quo or the disagreement point for the bargaining problem. The disagreement point represents the payoff for each of the players if no grand coalition is created, i.e., if negotiations break down. Consequently, in games with a single Nash equilibrium, the obvious disagreement point is given by the equilibrium payoffs. However, in games with multiple equilibria, the situation is more complicated and the choice of the disagreement point is controversial. We believe that in such games, the natural candidates for disagreement points are the nondominated Nash points, because these points will be preferred by intelligent players under competition [13] (and when negotiations break down, competition starts). Supposing that each of the nondominated Nash points is equally likely to occur (under competition), the expected payoff for the players is the average of the utility values of the nondominated Nash points. We argue and demonstrate that an appropriate disagreement point is given by these expected payoffs.
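The averaging rule described above can be illustrated with a short Python sketch. The function name `disagreement_point` and the two payoff points are assumptions introduced purely for illustration; they are not results from the paper, and the sketch presumes that the nondominated Nash points have already been computed by some other means.

```python
def disagreement_point(nash_points):
    """Average of the nondominated Nash points, each assumed equally likely.

    nash_points -- list of (u1, u2) payoff pairs for a two-player game
    """
    if not nash_points:
        raise ValueError("at least one nondominated Nash point is required")
    count = len(nash_points)
    d1 = sum(y[0] for y in nash_points) / count
    d2 = sum(y[1] for y in nash_points) / count
    return (d1, d2)


# Hypothetical nondominated Nash points, used only for illustration.
points = [(2.0, 1.0), (1.0, 2.0)]
print(disagreement_point(points))
```

With a single nondominated Nash point the function returns that point itself, which matches the remark that the equilibrium payoffs are the obvious disagreement point when the game has a single Nash equilibrium.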
To summarize, the contributions of our research are that we
1. develop an efficient approach for computing a single Nash equilibrium;
2. demonstrate that a set of efficient Nash equilibria, one for each nondominated Nash point, can be computed effectively (for bimatrix games); and
3. propose the use of nondominated Nash points to define the disagreement point in a bargaining problem.

The rest of the paper is organized as follows. In Section 2, we introduce important concepts and notation related to normal form games. In Section 3, we detail the logic of our new MIP for computing a single Nash equilibrium and report the results of a set of computational experiments carried out to analyze the performance of the new MIP. In Section 4, we formally introduce the notion of nondominated Nash points and we conduct a comprehensive computational study in which we investigate the nondominated Nash frontier of a large number of games. In Section 5, we demonstrate how the nondominated Nash frontier can be exploited in Nash bargaining problems. Finally, in Section 6, we give some concluding remarks.

2 Normal Form Games

We will now describe the basic concepts of normal form games. We refer interested readers to [26] for more information. A normal form game is a tuple $(N, A, u)$ where
• $N$ is a finite set of $n$ players, indexed by $i$;
• $A = (A_1, \dots, A_n)$ where $A_i$ is a finite set of actions (pure strategies) available to player $i$. Each vector $(a_1, \dots, a_n) \in A$ is called an action profile;
• $u = (u_1, \dots, u_n)$ where $u_i : A \to \mathbb{R}$ is a real-valued utility (or payoff) function for player $i$.

There are two types of strategies for playing a game: pure strategies and mixed strategies. In a pure strategy, a player always chooses a unique action from among the set of possible actions. However, in a mixed strategy, a player chooses an action from among the set of possible actions according to some probability distribution.

Definition 1.
Let $(N, A, u)$ be a normal form game, and for any finite set $S$, let $\Pi(S)$ be the set of all discrete probability distributions over $S$. Then the set of mixed strategies for player $i$ is $P_i = \Pi(A_i)$.

Definition 2. The Cartesian product of the individual mixed strategy sets $P_1 \times \cdots \times P_n$ is called the set of mixed strategy profiles.

Definition 3. The support of a mixed strategy $p_i \in P_i$ for a player $i$ is the set of actions with positive probability, i.e., $\{a_i \in A_i : p_i(a_i) > 0\}$.

Definition 4. Let $(N, A, u)$ be a normal form game. The expected utility $u_i^E$ for player $i$ of a mixed strategy profile $p = (p_1, \dots, p_n)$ is defined as
\[
u_i^E(p) = \sum_{a \in A} u_i(a) \prod_{j=1}^{n} p_j(a_j).
\]

Definition 5. Let $p_{-i} := (p_1, \dots, p_{i-1}, p_{i+1}, \dots, p_n)$. Player $i$'s best response to the strategy profile $p_{-i}$ is a mixed strategy $p_i^* \in P_i$ such that $u_i^E(p_i^*, p_{-i}) \ge u_i^E(p_i, p_{-i})$ for all strategies $p_i \in P_i$.

Definition 6. A strategy profile $p = (p_1, \dots, p_n)$ is a Nash equilibrium if for all agents $i$, $p_i$ is the best response to $p_{-i}$.

Theorem 7. Every game with a finite number of players and action profiles has at least one Nash equilibrium [19].

For the remainder of the paper, we restrict ourselves to normal form games with two players (we sometimes refer to them as bimatrix games). Next, we show how a Nash equilibrium can be computed. Definition 5 implies that if the mixed strategy $p_2$ of the second player is known, the best response of the first player can be obtained by solving the following Linear Program (LP),
\[
\begin{aligned}
\max \quad & \sum_{a_1 \in A_1} \sum_{a_2 \in A_2} p_1(a_1)\, p_2(a_2)\, u_1(a_1, a_2) \\
\text{subject to} \quad & \sum_{a_1 \in A_1} p_1(a_1) = 1 \\
& p_1(a_1) \ge 0 \quad \forall\, a_1 \in A_1.
\end{aligned}
\]
Similarly, if the mixed strategy $p_1$ of the first player is known, the best response by player two can be obtained by solving the following LP,
\[
\begin{aligned}
\max \quad & \sum_{a_1 \in A_1} \sum_{a_2 \in A_2} p_1(a_1)\, p_2(a_2)\, u_2(a_1, a_2) \\
\text{subject to} \quad & \sum_{a_2 \in A_2} p_2(a_2) = 1 \\
& p_2(a_2) \ge 0 \quad \forall\, a_2 \in A_2.
\end{aligned}
\]
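The expected utility of Definition 4 and player one's best-response LP can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary-based payoff representation, and the 2x2 payoff values are all assumptions made for the example. Because the LP objective is linear in $p_1$ and the feasible set is the probability simplex, an optimum is attained at a pure action, so the sketch enumerates pure actions instead of calling an LP solver.

```python
from itertools import product


def expected_utility(u_i, profile):
    """Expected utility of one player under a mixed strategy profile.

    u_i     -- dict mapping an action profile (tuple of action indices) to a payoff
    profile -- list of mixed strategies, one probability list per player
    """
    total = 0.0
    for a in product(*(range(len(p_j)) for p_j in profile)):
        prob = 1.0
        for j, a_j in enumerate(a):
            prob *= profile[j][a_j]  # product of p_j(a_j) over all players
        total += prob * u_i[a]
    return total


def best_response_of_player_one(u_1, p_2):
    """Optimal value and optimal pure actions of player one against a fixed p_2."""
    n_1 = 1 + max(a[0] for a in u_1)  # number of actions of player one
    values = [sum(p_2[a_2] * u_1[(a_1, a_2)] for a_2 in range(len(p_2)))
              for a_1 in range(n_1)]
    best = max(values)
    return best, [a_1 for a_1, v in enumerate(values) if v >= best - 1e-12]


# Hypothetical 2x2 coordination game, used only for illustration.
u_1 = {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1}
mixed_profile = [[2 / 3, 1 / 3], [1 / 3, 2 / 3]]
print(expected_utility(u_1, mixed_profile))
print(best_response_of_player_one(u_1, [1.0, 0.0]))
```

Any mixture over the returned pure actions is also a best response, which is why a best response need not be unique.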
Definition 6 implies that to obtain a Nash equilibrium, the optimality conditions of these two LPs must be achieved at the same time. The KKT optimality conditions imply the following feasibility problem, which is known as the Linear Complementarity Problem (LCP):
\[
\begin{aligned}
& \sum_{a_2 \in A_2} p_2(a_2)\, u_1(a_1, a_2) + r_1(a_1) = u_1^E && \forall\, a_1 \in A_1 \\
& \sum_{a_1 \in A_1} p_1(a_1)\, u_2(a_1, a_2) + r_2(a_2) = u_2^E && \forall\, a_2 \in A_2 \\
& \sum_{a_1 \in A_1} p_1(a_1) = 1 \\
& \sum_{a_2 \in A_2} p_2(a_2) = 1 \\
& p_1(a_1)\, r_1(a_1) = 0 && \forall\, a_1 \in A_1 \\
& p_2(a_2)\, r_2(a_2) = 0 && \forall\, a_2 \in A_2 \\
& r_1(a_1) \ge 0 && \forall\, a_1 \in A_1 \\
& r_2(a_2) \ge 0 && \forall\, a_2 \in A_2 \\
& p_1(a_1) \ge 0 && \forall\, a_1 \in A_1 \\
& p_2(a_2) \ge 0 && \forall\, a_2 \in A_2,
\end{aligned}
\]
where $u_1^E$ and $u_2^E$ are free decision variables corresponding to the expected utility value of each player in a Nash equilibrium [14]. Moreover, $r_1(a_1)$ and $r_2(a_2)$ for $a_1 \in A_1$ and $a_2 \in A_2$ are the slack variables for the first two constraints. They can be interpreted as the regret of action $a_1$ and $a_2$ for $a_1 \in A_1$ and $a_2 \in A_2$ when they are not in the support of a strategy of player one and two, respectively.

The LCP has an intuitive interpretation. It implies that in a Nash equilibrium, each player must make the other player indifferent (the expected utility is exactly the same) between the choice of the actions (pure strategies) in his support. This means that, in a Nash equilibrium, any action in the support of a player must have zero regret.

Obviously, to obtain a Nash equilibrium the LCP has to be solved, which can be done exactly using the Lemke-Howson algorithm [14] or mixed integer programming [24], or heuristically using the PNS algorithm [23]. The Lemke-Howson algorithm is one of the most efficient algorithms for solving the LCP [26]. The algorithm is a path-following method which pivots using complementary feasible bases and terminates when it finds a Nash equilibrium. Sandholm et al. [24] transform the LCP into a mixed integer program (MIP) and solve it using a powerful commercial MIP solver. They show that the MIP approach is competitive with the Lemke-Howson algorithm.
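The LCP conditions can be checked directly for a candidate strategy profile. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the inputs are assumed to be valid probability vectors, and the payoff matrices are those of the prisoner's dilemma instance given later in the paper (Table 2), with rows indexed by player one's action and columns by player two's action, in the order N, C. The free variables $u_1^E$ and $u_2^E$ are set to the expected utilities of the candidate profile, so the regrets follow from the first two LCP constraints.

```python
def lcp_terms(U1, U2, p1, p2):
    """Expected utilities and regrets implied by the first two LCP constraints."""
    n1, n2 = len(p1), len(p2)
    # Expected payoff of each pure action against the opponent's mixed strategy.
    v1 = [sum(p2[j] * U1[i][j] for j in range(n2)) for i in range(n1)]
    v2 = [sum(p1[i] * U2[i][j] for i in range(n1)) for j in range(n2)]
    # Expected utilities of the candidate profile itself.
    uE1 = sum(p1[i] * v1[i] for i in range(n1))
    uE2 = sum(p2[j] * v2[j] for j in range(n2))
    # Slack (regret) variables.
    r1 = [uE1 - v for v in v1]
    r2 = [uE2 - v for v in v2]
    return uE1, uE2, r1, r2


def satisfies_lcp(U1, U2, p1, p2, tol=1e-9):
    """True if regrets are nonnegative and complementary to the probabilities."""
    _, _, r1, r2 = lcp_terms(U1, U2, p1, p2)
    nonnegative = all(r >= -tol for r in r1 + r2)
    complementary = all(tol >= abs(p * r) for p, r in zip(p1 + p2, r1 + r2))
    return nonnegative and complementary


# Prisoner's dilemma payoffs (action order: N, C).
U1 = [[-1, -8], [0, -4]]
U2 = [[-1, 0], [-8, -4]]
print(satisfies_lcp(U1, U2, [0.0, 1.0], [0.0, 1.0]))  # both confess
print(satisfies_lcp(U1, U2, [1.0, 0.0], [1.0, 0.0]))  # neither confesses
```

When neither player confesses, the unused action C has a negative regret for each player, so the nonnegativity condition fails; this is the formal counterpart of saying that each player would gain by deviating.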
The PNS algorithm proceeds differently. The algorithm repeatedly guesses the support of each player's strategy so as to reduce the LCP to a linear program, which can then be checked for feasibility. Sandholm et al. [24] observed that there exist classes of normal form games where the PNS algorithm tends to struggle to obtain a Nash equilibrium because it must guess and evaluate a large number of supports.

3 Computing a Nash Equilibrium

In this section, we present a new MIP to obtain a Nash equilibrium. Our formulation is a simplified version of the best formulation of Sandholm et al. [24] with, importantly, two additional payoff-based valid inequalities:
\[
\begin{aligned}
\max \quad & u_1^E + u_2^E \\
\text{subject to} \quad & u_1^E \le u_1^{\max} \\
& u_2^E \le u_2^{\max} \\
& \sum_{a_2 \in A_2} p_2(a_2)\, u_1(a_1, a_2) + r_1(a_1) = u_1^E && \forall\, a_1 \in A_1 \\
& \sum_{a_1 \in A_1} p_1(a_1)\, u_2(a_1, a_2) + r_2(a_2) = u_2^E && \forall\, a_2 \in A_2 \\
& \sum_{a_1 \in A_1} p_1(a_1) = 1 \\
& \sum_{a_2 \in A_2} p_2(a_2) = 1 \\
& p_1(a_1) \le b_1(a_1) && \forall\, a_1 \in A_1 \\
& p_2(a_2) \le b_2(a_2) && \forall\, a_2 \in A_2 \\
& r_1(a_1) \le M_1 (1 - b_1(a_1)) && \forall\, a_1 \in A_1 \\
& r_2(a_2) \le M_2 (1 - b_2(a_2)) && \forall\, a_2 \in A_2 \\
& r_1(a_1) \ge 0 && \forall\, a_1 \in A_1 \\
& r_2(a_2) \ge 0 && \forall\, a_2 \in A_2 \\
& p_1(a_1) \ge 0 && \forall\, a_1 \in A_1 \\
& p_2(a_2) \ge 0 && \forall\, a_2 \in A_2 \\
& b_1(a_1) \in \{0, 1\} && \forall\, a_1 \in A_1 \\
& b_2(a_2) \in \{0, 1\} && \forall\, a_2 \in A_2,
\end{aligned}
\]
where $u_1^{\max} = \max_{a \in A} u_1(a)$, $u_2^{\max} = \max_{a \in A} u_2(a)$, $M_1 = \max_{a^h, a^l \in A} \{u_1(a^h) - u_1(a^l)\}$ and $M_2 = \max_{a^h, a^l \in A} \{u_2(a^h) - u_2(a^l)\}$.

The MIP is a linearization of the LCP, in which the constraints
\[
\begin{aligned}
& p_1(a_1)\, r_1(a_1) = 0 && \forall\, a_1 \in A_1 \\
& p_2(a_2)\, r_2(a_2) = 0 && \forall\, a_2 \in A_2
\end{aligned}
\]
are replaced by
\[
\begin{aligned}
& p_1(a_1) \le b_1(a_1) && \forall\, a_1 \in A_1 \\
& p_2(a_2) \le b_2(a_2) && \forall\, a_2 \in A_2 \\
& r_1(a_1) \le M_1 (1 - b_1(a_1)) && \forall\, a_1 \in A_1 \\
& r_2(a_2) \le M_2 (1 - b_2(a_2)) && \forall\, a_2 \in A_2.
\end{aligned}
\]
That is, the nonlinear constraints are replaced by linear constraints at the price of the introduction of binary variables $b_1(a_1)$ for $a_1 \in A_1$ and $b_2(a_2)$ for $a_2 \in A_2$ and disjunctive constraints $r_1(a_1) \le M_1 (1 - b_1(a_1))$ for $a_1 \in A_1$ and $r_2(a_2) \le M_2 (1 - b_2(a_2))$ for $a_2 \in A_2$. Moreover, two valid inequalities defining upper bounds on the expected utility values of the players are added, i.e., $u_1^E \le u_1^{\max}$ and $u_2^E \le u_2^{\max}$. Note that any feasible solution to the MIP represents a Nash equilibrium. The objective function, which maximizes the social welfare, is added because computational experiments revealed that CPLEX solves the MIP faster when this objective function is added [24].

To demonstrate the efficiency of the new MIP, a computational study was conducted. The instances in the study are generated using GAMUT. We generated 5 classes of instances: C20, C40, C60, C80 and C100, where the class identifier Cm embeds the size of the game, i.e., $m = |A_1| = |A_2|$. Each class has 20 subclasses, each having 5 instances. Each subclass is a different game distribution in GAMUT. The 20 distributions can be found in Table 1. These distributions have been used frequently in previous studies, see for instance Sandholm et al. [24] or Porter et al. [23]. (It is worth mentioning that distributions related to polymatrix and graphical games can be used to generate instances with more than two players. However, the focus of our study is on instances with only two players.) CPLEX 12.5.1 is used to solve the MIPs. All instances were run on a Dell PowerEdge R710 with dual hex core 3.06GHz Intel Xeon X5675 processors and 96GB RAM running Red Hat Enterprise Linux 6, and only a single thread was used. In all experiments, a runtime limit of 1800 seconds was imposed.

Table 1: Distributions of GAMUT.
D1    BertrandOligopoly        D11   MinimumEffortGame
D2    BidirectionalLEG-CG      D12   PolymatrixGame-CG
D3    BidirectionalLEG-RG      D13   PolymatrixGame-RG
D4    CovariantGame-Pos        D14   PolymatrixGame-Road
D5    CovariantGame            D15   PolymatrixGame-SW
D6    CovariantGame-Zero       D16   RandomGame
D7    GraphicalGame-RG         D17   UniformLEG-CG
D8    GraphicalGame-Road       D18   UniformLEG-RG
D9    GraphicalGame-SG         D19   UniformLEG-SG
D10   GraphicalGame-SW         D20   LocationGame

Figure 1 shows the performance profile of the runtime of CPLEX for three formulations: the (best) formulation proposed by Sandholm et al.
[24] (S), the new formulation (N), and the new formulation without the two valid inequalities defining upper bounds on the expected utility values of the players (W). A performance profile [10] presents cumulative distribution functions for a set of solution approaches being compared with respect to a specific performance metric. The runtime performance profile for a set of solution approaches is constructed by computing, for each solution approach and for each instance, the ratio of the runtime of the solution approach on the instance to the minimum of the runtimes of all solution approaches on the instance. The runtime performance profile then shows the ratios on the horizontal axis and, on the vertical axis, for each solution approach, the fraction of instances with a ratio that is less than or equal to the ratio on the horizontal axis. This implies that values in the upper left-hand corner of the graph indicate the best performance. Clearly, the new formulation performs much better than the other formulations; in most cases, it is about 4 times faster. The reason for this improvement is the introduction of the two valid inequalities defining upper bounds on the expected utility values of the players. These inequalities are exploited by the solution techniques embedded in CPLEX to find feasible solutions, which is clearly beneficial.

The payoff-based valid inequalities can also remove fractional solutions in situations where both negative and positive payoffs exist for at least one of the players. In these situations, either $M_1 - u_1^{\max} > 0$ or $M_2 - u_2^{\max} > 0$ or both. Consequently, when we add the payoff-based valid inequalities, we implicitly define a tighter bound for variables of the form $r_1(a_1)$ or $r_2(a_2)$. To see this, suppose that $M_1 - u_1^{\max} > 0$ and consider the constraints of the form $\sum_{a_2 \in A_2} p_2(a_2)\, u_1(a_1, a_2) + r_1(a_1) = u_1^E$. It is clear that if $\sum_{a_2 \in A_2} p_2(a_2)\, u_1(a_1, a_2) > 0$ and $u_1^E \le u_1^{\max}$ is added, then $r_1(a_1)$ cannot be freely set to any value less than or equal to $M_1$.

CPLEX allows users to tune the performance by adjusting parameters. One of these parameters is the CPLEX MIP emphasis parameter, which “controls trade-offs between speed, feasibility, optimality, and moving bounds in MIP”. Its default setting balances between optimality and feasibility. However, because our interest is finding a feasible solution, i.e., a single Nash equilibrium, as quickly as possible, we experimented with setting the parameter to emphasize feasibility over optimality. We found that the resulting performance profile is similar, i.e., the relative performance of the formulations does not change, but the runtime increased significantly for all formulations (almost doubled).

Figure 1: Performance profile of the runtime of three formulations (S, N and W). Horizontal axis: ratio of runtime to the minimum runtime; vertical axis: ratio of solved instances.

4 Biobjective Mixed Integer Programming and Bimatrix Games

As mentioned in the introduction, in addition to developing faster mixed integer programming approaches for finding a single Nash equilibrium, we are interested in developing biobjective integer programming approaches for computing a desirable subset of Nash equilibria. The latter is the focus of this section. We start by introducing the concept of efficient (or Pareto optimal) Nash equilibria. Again, we note that the traditional concept of “Pareto optimality”, which is widely used in the literature (see for instance [26]), is different from the concept that we will formally define in this section. The existing concept defines Pareto optimality over the set of all possible mixed strategy profiles.
In other words, a mixed strategy profile is Pareto optimal if it is impossible to improve the utility value of at least one of the players without a deterioration in the utility value of any of the other players. Based on this definition, there is no relationship between Pareto optimal mixed strategy profiles and Nash equilibria. In other words, a Pareto optimal mixed strategy profile is not necessarily a Nash equilibrium. For example, consider an instance of the prisoner's dilemma game (see Table 2). There are two prisoners (P1 and P2) and two actions are available to each player: confessing (C) and not confessing (N). If both prisoners confess, they go to jail for 4 years. If only one of them confesses, the one who confessed will be released and the other will go to jail for 8 years. If neither of them confesses, both of them will go to jail for only 1 year. The only Nash equilibrium of this game is the case where both players confess. However, it is not Pareto optimal, because, for example, when neither player confesses, both players obtain a higher payoff.

Table 2: Prisoner's dilemma (each entry lists the payoff of P1, then the payoff of P2)
              P2: N      P2: C
P1: N        -1, -1     -8, 0
P1: C         0, -8     -4, -4

Note that because our goal is to compute a (desirable) subset of all Nash equilibria, using the traditional definition of Pareto optimality is not appropriate, because a Pareto optimal mixed strategy profile is not necessarily a Nash equilibrium. Therefore, we define the concept of Pareto optimality or efficiency over the set of all Nash equilibria rather than over the set of all possible mixed strategy profiles.

Let $E$ denote the set of all Nash equilibria. Moreover, let the feasible set in the payoff space, $U$, be the image of $E$ under the vector-valued function $u = \{u_1, \dots, u_n\}$, i.e., $U := u(E) := \{y \in \mathbb{R}^n : y = u(p) \text{ for some } p \in E\}$.

Definition 8. A Nash equilibrium $p' \in E$ is called weakly efficient, if there is no other Nash equilibrium $p \in E$ such that $u_i(p) > u_i(p')$ for $i = 1, \dots, n$.
If $p'$ is weakly efficient, then $u(p')$ is called a weakly nondominated Nash point.

Definition 9. A Nash equilibrium $p' \in E$ is called efficient or Pareto optimal, if there is no other Nash equilibrium $p \in E$ such that $u_i(p) \ge u_i(p')$ for $i = 1, \dots, n$ and $u(p) \ne u(p')$. If $p'$ is efficient, then $u(p')$ is called a nondominated Nash point. The set of all efficient Nash equilibria $p' \in E$ is denoted by $E_E$. The set of all nondominated Nash points $u(p') \in U$ for some $p' \in E_E$ is denoted by $U_N$ and referred to as the nondominated Nash frontier or efficient Nash frontier.

Definition 10. A Nash equilibrium $p' \in E$ is called ideal if it simultaneously maximizes the expected utility of all players. If $p'$ is ideal, then $u^I = u(p')$ is called the ideal Nash point.

To compute the set of all efficient Nash equilibria or nondominated Nash points of a normal form game with two players, the following BOMIP must be solved,
\[
\max_{p \in E} \; u^E(p) := \{u_1^E(p),\, u_2^E(p)\}.
\]
We sometimes refer to this formulation as BOMIP-NFG2. Note that $E$ is the set of all Nash equilibria and can be expressed by the constraints of the MIP introduced in Section 3.

Theorem 11. The number of nondominated Nash points of a normal form game with two players $(\{1, 2\}, A, u)$ is finite.

Proof. Let $\mathrm{supp}(p_i) := \{a_i \in A_i : p_i(a_i) > 0\}$ be the support of a mixed strategy $p_i \in P_i$ of player $i \in \{1, 2\}$. We observe that $\mathrm{supp}(p_i) \subseteq A_i$ and $\mathrm{supp}(p_i) \ne \emptyset$. We denote by $S_i$ the set of all possible $\mathrm{supp}(p_i)$ where $p_i \in P_i$ and $i \in \{1, 2\}$. It is evident that since the game is finite (in terms of the number of players and the number of action profiles), $S_i$ is also finite,
\[
|S_i| = \sum_{l=1}^{|A_i|} \binom{|A_i|}{l} = 2^{|A_i|} - 1.
\]
Next, consider a support vector $(s_1, s_2) \in S_1 \times S_2$. We now try to simplify the BOMIP-NFG2 based on $(s_1, s_2)$.
A specific support vector corresponds to a specific set of values of the binary variables in the BOMIP-NFG2, i.e., $b_1(a_1) = 1$ for all $a_1 \in s_1$, $b_1(a_1) = 0$ for all $a_1 \in A_1 \setminus s_1$, $b_2(a_2) = 1$ for all $a_2 \in s_2$, and $b_2(a_2) = 0$ for all $a_2 \in A_2 \setminus s_2$. Note that this implies that we also have $p_1(a_1) = 0$ for all $a_1 \in A_1 \setminus s_1$, $r_1(a_1) = 0$ for all $a_1 \in s_1$, $p_2(a_2) = 0$ for all $a_2 \in A_2 \setminus s_2$, and $r_2(a_2) = 0$ for all $a_2 \in s_2$. As a consequence, the BOMIP-NFG2 reduces to a biobjective linear program (BOLP) since all the binary variables are removed. It is easy to see that the constraints of the BOLP can be decomposed into two independent sets, i.e.,

Set 1:
\[
\begin{aligned}
& \sum_{a_2 \in A_2} p_2(a_2)\, u_1(a_1, a_2) + r_1(a_1) = u_1^E && \forall\, a_1 \in A_1 \\
& \sum_{a_2 \in A_2} p_2(a_2) = 1 \\
& r_1(a_1) \ge 0 && \forall\, a_1 \in A_1 \setminus s_1 \\
& r_1(a_1) = 0 && \forall\, a_1 \in s_1 \\
& p_2(a_2) > 0 && \forall\, a_2 \in s_2 \\
& p_2(a_2) = 0 && \forall\, a_2 \in A_2 \setminus s_2
\end{aligned}
\]

Set 2:
\[
\begin{aligned}
& \sum_{a_1 \in A_1} p_1(a_1)\, u_2(a_1, a_2) + r_2(a_2) = u_2^E && \forall\, a_2 \in A_2 \\
& \sum_{a_1 \in A_1} p_1(a_1) = 1 \\
& r_2(a_2) \ge 0 && \forall\, a_2 \in A_2 \setminus s_2 \\
& r_2(a_2) = 0 && \forall\, a_2 \in s_2 \\
& p_1(a_1) > 0 && \forall\, a_1 \in s_1 \\
& p_1(a_1) = 0 && \forall\, a_1 \in A_1 \setminus s_1.
\end{aligned}
\]

If the support vector $(s_1, s_2)$ does not result in a Nash equilibrium, then the BOLP is infeasible. If the support vector does result in a Nash equilibrium, then the BOLP must be feasible. However, we know that the constraints of the BOLP are decomposed into two independent sets. This implies that $u_1^E$ and $u_2^E$ can be maximized simultaneously, and thus there exists a single, ideal Nash point for the BOLP. So, since there is a finite number of support vectors, i.e., $S_1 \times S_2$ is finite, there is only a finite number of nondominated Nash points.

Theorem 11 is a critical result that shows that even though a bimatrix game may have an infinite number of efficient Nash equilibria, the set of nondominated Nash points is always finite. So, we can compute one efficient Nash equilibrium corresponding to each nondominated Nash point to construct a desirable subset of Nash equilibria.
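Two ingredients of this argument are easy to make concrete. The Python sketch below, written under stated assumptions, first enumerates the nonempty supports of a player (whose number is $2^{|A_i|} - 1$) and then filters a finite list of Nash payoff points down to the nondominated ones in the sense of Definition 9. The function names and the three payoff points are hypothetical and serve only as an illustration; the sketch does not compute equilibria and is not the balanced box method used in the paper.

```python
from itertools import combinations


def nonempty_supports(actions):
    """All nonempty subsets of a player's action set."""
    actions = list(actions)
    return [frozenset(c)
            for size in range(1, len(actions) + 1)
            for c in combinations(actions, size)]


def nondominated_points(points, tol=1e-9):
    """Keep the payoff points that no other point dominates (maximization)."""
    kept = []
    for y in points:
        dominated = any(
            all(z_k >= y_k - tol for z_k, y_k in zip(z, y))
            and any(z_k > y_k + tol for z_k, y_k in zip(z, y))
            for z in points
        )
        if not dominated and y not in kept:
            kept.append(y)
    return kept


# Three actions give 2**3 - 1 = 7 nonempty supports.
print(len(nonempty_supports(range(3))))

# Hypothetical Nash payoff points of some bimatrix game, for illustration only.
nash_points = [(2.0, 1.0), (1.0, 2.0), (2 / 3, 2 / 3)]
print(nondominated_points(nash_points))
```

A pairwise filter of this kind is quadratic in the number of points, which is acceptable here because the set of candidate points is finite and, in the reported experiments, small.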
It is worth mentioning that determining the nondominated frontier of a BOMIP is not easy in general. The primary reason for this is that the nondominated frontier may have continuous segments, i.e., parts in which all points of a line segment are nondominated. However, when a BOMIP has a finite number of nondominated points (and the nondominated frontier does not have continuous segments), then several effective algorithms for computing the nondominated frontier exist. One of these algorithms is the balanced box method (see Boland et al. [5]), which we will use in our computational study. As observed earlier, multiple Nash equilibria may exist for a single nondominated Nash point. The balanced box method returns only one of them. Thus, when the algorithm terminates, it reports the set of nondominated Nash points and one efficient Nash equilibrium for each of them. It is worth noting, too, that the balanced box method can be customized to exploit the structure of the payoff-based valid inequalities (see Section 3), which in many cases decreases the runtime of the algorithm substantially. Detailed performance statistics of the balanced box method can be found in Tables 3 to 7, i.e., the number of nondominated Nash points (#NDP ), the time required to determine the efficient Nash frontier (Time (sec.)) and the number of instances for which the complete efficient Nash frontier was not determined within the time limit of 1800 seconds (#NS ). Observe that the number of points in the efficient Nash frontier is very small and that in more than half of the instances the efficient Nash frontier consists of a single point, i.e., there is an ideal point. Moreover, the time required to determine the complete nondominated frontier is quite small (by using CPLEX). For example, for the largest classes of instances, the complete nondominated frontier is determined in about 5 minutes. 
For only 22 out of 500 instances the complete nondominated frontier could not be determined in 30 minutes of computing; most of them belong to subclasses D1 and D5. The payoff values of instances in subclasses D1 and D5 are symmetric, which causes the performance of the single objective commercial MIP solvers to deteriorate.

Table 3: Overall results of the balanced box method on class C20.
Subclass  #NDP              Time (sec.)                   #NS
          Max   Avg   Min   Max       Avg       Min
D1        3     1.4   1     0.87      0.40      0.06      0
D2        3     1.4   1     0.18      0.06      0.01      0
D3        1     1     1     0.05      0.02      0.00      0
D4        2     1.2   1     0.12      0.08      0.06      0
D5        3     1.6   1     25.92     9.46      2.51      0
D6        2     1.4   1     1.08      0.61      0.14      0
D7        3     2     1     1.21      0.73      0.37      0
D8        3     2     1     1.04      0.63      0.36      0
D9        4     2     1     1.63      0.54      0.08      0
D10       3     2     1     1.01      0.80      0.19      0
D11       1     1     1     0.16      0.14      0.11      0
D12       4     2.4   1     1.80      0.75      0.16      0
D13       4     2.6   1     1.14      0.64      0.24      0
D14       3     2.2   1     0.69      0.48      0.36      0
D15       5     1.8   1     2.30      0.62      0.08      0
D16       4     2.2   1     1.14      0.62      0.20      0
D17       1     1     1     0.07      0.05      0.02      0
D18       1     1     1     0.10      0.06      0.04      0
D19       1     1     1     0.02      0.01      0.00      0
D20       1     1     1     0.05      0.04      0.02      0
Avg       2.60  1.61  1.00  2.03      0.84      0.25      0.00

Table 4: Overall results of the balanced box method on class C40.
Subclass  #NDP              Time (sec.)                   #NS
          Max   Avg   Min   Max       Avg       Min
D1        1     1     1     2.12      1.50      0.96      0
D2        1     1     1     0.20      0.14      0.05      0
D3        1     1     1     0.09      0.05      0.03      0
D4        2     1.4   1     0.56      0.40      0.23      0
D5        0     0     0     1801.00   1800.99   1800.98   5
D6        3     1.8   1     3.62      2.41      0.96      0
D7        4     2.4   1     39.15     12.31     1.10      0
D8        6     2.6   1     17.08     7.53      1.45      0
D9        5     3     1     24.37     11.77     3.20      0
D10       4     2.4   1     33.49     9.97      1.80      0
D11       1     1     1     0.18      0.13      0.07      0
D12       4     2     1     39.43     10.57     1.04      0
D13       6     2.6   1     20.35     8.72      2.22      0
D14       4     2.4   1     62.03     17.07     0.74      0
D15       4     2.2   1     55.34     21.55     3.09      0
D16       4     2.2   1     25.07     8.36      2.45      0
D17       1     1     1     0.15      0.12      0.07      0
D18       1     1     1     0.20      0.09      0.04      0
D19       1     1     1     0.12      0.05      0.01      0
D20       1     1     1     0.18      0.14      0.12      0
Avg       2.70  1.65  0.95  106.24    95.69     91.03     0.25

Table 5: Overall results of the balanced box method on class C60.
Subclass D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15 D16 D17 D18 D19 D20 Avg Max 1 2 1 2 2 4 5 6 3 2 1 3 3 5 8 5 1 3 1 1 2.95 #NDP Avg Min 1 1 1.2 1 1 1 1.6 1 0.6 0 2.4 1 2 1 3.2 1 1.8 1 1.6 1 1 1 1.8 1 1.8 1 3.2 2 4 2 3 2 1 1 1.6 1 1 1 1 1 1.79 1.10 Time (sec.) Max Avg Min 4.93 1.83 0.01 1.03 0.73 0.41 0.52 0.22 0.09 1.24 0.99 0.70 1801.00 1184.38 128.06 86.82 46.99 7.52 112.76 27.26 2.37 118.94 43.75 2.68 145.10 54.20 11.88 10.89 5.77 1.99 0.54 0.45 0.35 45.63 18.12 3.99 33.75 13.34 3.29 98.30 34.12 4.29 101.15 44.91 3.92 78.50 37.36 4.66 0.33 0.24 0.22 0.75 0.31 0.06 0.39 0.16 0.02 0.35 0.27 0.21 132.15 75.77 8.84 #NS 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.15 Table 6: Overall results of the balanced box method on class C80. Subclass D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15 D16 D17 D18 D19 D20 Avg Max 3 1 2 3 4 3 8 5 4 3 1 4 3 3 5 4 2 1 1 1 3.05 #NDP Avg Min 1.2 0 1 1 1.2 1 1.6 1 1.8 0 1.8 1 3.8 1 2.6 1 2.8 1 1.6 1 1 1 2.4 1 2 1 2 0 3 1 2.6 1 1.2 1 1 1 1 1 1 1 1.83 0.85 Time (sec.) Max Avg 1800.99 1188.34 3.39 1.25 2.00 1.07 3.93 2.44 1800.98 772.01 99.20 31.63 788.14 250.41 131.31 60.05 217.55 84.38 113.52 56.85 0.86 0.55 1203.91 298.55 157.04 54.58 1800.95 405.83 204.61 91.04 281.55 90.20 0.78 0.52 0.37 0.26 1.46 0.40 0.53 0.44 430.65 169.54 12 Min 26.55 0.23 0.12 1.38 16.00 3.53 12.45 6.11 4.24 16.41 0.37 12.70 8.24 15.68 29.93 14.38 0.40 0.07 0.04 0.27 8.46 #NS 3 0 0 0 2 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0.30 Table 7: Overall results of the balanced box method on class C100. Subclass D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15 D16 D17 D18 D19 D20 Avg 5 Max 1 1 1 4 3 3 6 3 4 6 2 5 6 8 5 3 2 1 1 1 3.30 #NDP Avg Min 0.2 0 1 1 1 1 2.6 1 2.4 2 1.6 1 2.6 0 1.8 1 2.6 1 3.8 2 1.2 1 3.2 1 2.6 0 3.4 1 3 2 1.4 0 1.2 1 1 1 1 1 1 1 1.93 0.95 Time (sec.) 
Max Avg Min 1800.99 1442.38 8.17 3.69 1.44 0.28 2.95 1.06 0.12 10.58 6.55 2.56 15.18 10.00 6.74 714.78 224.47 27.97 1800.96 673.02 32.91 943.14 216.09 15.84 530.01 234.17 7.43 909.26 332.86 22.67 1.06 0.74 0.46 1055.44 350.64 74.62 1800.97 717.26 79.09 1610.19 370.83 16.57 1707.69 581.36 34.14 1800.98 953.23 108.79 1.30 0.74 0.44 0.99 0.61 0.17 0.37 0.25 0.09 1.00 0.86 0.73 735.58 305.93 21.99 #NS 4 0 0 0 0 0 1 0 0 0 0 0 1 0 0 2 0 0 0 0 0.40 Bargaining problem and the Nash Efficient Frontier In this section, we discuss how knowledge of the efficient Nash frontier can be used in a bargaining problem (or bargaining game) to define the disagreement point. A bargaining problem is a cooperative game in which players agree to create a grand coalition instead of competing with each other to get a higher payoff [25]. A fundamental question in this context is what the payoff of each player should be in a grand coalition. One solution to a bargaining problem was proposed by Nash and has become known as the Nash bargaining solution [18, 20]. Next, we explain the Nash bargaining solution when there are only two players. Let Y be the set of feasible points in the (two-dimensional) payoff space. In fact, set Y contains all possible expected utility values of players. We assume that Y is compact and convex. Let YN be the set of nondominated points of Y, i.e., if y ∈ YN , then there exists no point y 0 ∈ Y such that y10 ≥ y1 , y20 ≥ y2 , and y 0 6= y, and let d = (d1 , d2 ) ∈ Y be the disagreement point, representing the payoff for each of the players if no grand coalition is created. We note that determining the disagreement point of a bargaining problem is, in general, non-trivial [8]. One reason is that when negotiations break down, players may decide to forgo (some of their) own rewards and try to punish the other player, e.g., by minimizing the maximum expected payoff of their opponent. 
But even if the players do not sacrifice their own rewards, determining the disagreement point is not straightforward when there are multiple Nash equilibria.

Two classical axioms imposing restrictions on the solution to a Nash bargaining problem are
• Individual Rationality: None of the players accepts a payoff lower than the one which is guaranteed to him under disagreement, i.e., $u^E_1 \ge d_1$ and $u^E_2 \ge d_2$.
• Pareto Optimality: The solution must be such that the payoff for a single player cannot be increased without decreasing the payoff of the other player.

Let $Y^* = \{u^E \in Y : u^E_1 \ge d_1,\ u^E_2 \ge d_2\}$ and $Y^*_N = \{u^E \in Y_N : u^E_1 \ge d_1,\ u^E_2 \ge d_2\}$. To satisfy the classical axioms, a bargaining solution $u^*$ must be in $Y^*_N$. However, in general, $Y^*_N$ still contains an infinite number of points. Nash introduced three additional axioms:
• Symmetry: If $Y^*$ is symmetric, i.e., for any vector $(y, y') \in Y^*$, the vector $(y', y)$ is also in $Y^*$, then in a bargaining solution we must have $u^*_1 = u^*_2$.
• Linear Invariance: Let $u^*$ be a solution of bargaining game $G$. Moreover, let $\hat{G}$ be a bargaining game obtained from $G$ by an order-preserving linear transformation $T$ of one player's utility function. The solution $\hat{u}^*$ to the bargaining game $\hat{G}$ has to be the image of $u^*$ under $T$, i.e., $\hat{u}^* = T u^*$.
• Independence of Irrelevant Alternatives: Let $d$ be the disagreement point and $u^*$ be a solution of bargaining game $G$. Moreover, let $\hat{G}$ be a bargaining game that is obtained from $G$ by restricting $Y$ to $\hat{Y}$, i.e., $\hat{Y} \subset Y$. If $d \in \hat{Y}$ and $u^* \in \hat{Y}$, then $u^*$ must be the solution of $\hat{G}$.

Nash [18] proved that there exists a unique solution $u^* = (u^*_1, u^*_2)$ to the bargaining problem that satisfies all five axioms, namely

\[ u^* = \arg\max \{ (u^E_1 - d_1)(u^E_2 - d_2) : u^E \in Y,\ u^E_1 \ge d_1,\ u^E_2 \ge d_2 \}. \tag{1} \]

Note that the solution depends on the choice of the disagreement point $d$. Next, we discuss how $Y$ and $d$ can be defined for a bimatrix game.
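Optimization problem (1) can be made concrete with a small sketch. The code below is an illustration only, not the procedure used in the paper: it assumes that Y is given as the convex hull of a finite list of payoff points (which is how Y is defined for a bimatrix game in what follows), and the names `nash_bargaining`, `consider`, and `EPS` are hypothetical. On the feasible region the product in (1) is nondecreasing in both coordinates, so a maximizer lies on an edge of the hull, and every edge joins two of the given points. The sketch therefore scans all pairs of points and maximizes the quadratic objective along each segment in closed form.

```python
from itertools import combinations

EPS = 1e-9  # numerical tolerance (an arbitrary choice for this sketch)


def nash_bargaining(points, d):
    """Maximise (u1 - d1) * (u2 - d2) over the convex hull of `points`,
    subject to u1 >= d1 and u2 >= d2. Returns None if no point qualifies."""
    d1, d2 = d
    best, best_val = None, float("-inf")

    def consider(u1, u2):
        # Keep the candidate if it is individually rational and improves the product.
        nonlocal best, best_val
        if u1 >= d1 - EPS and u2 >= d2 - EPS:
            val = (u1 - d1) * (u2 - d2)
            if val > best_val:
                best, best_val = (u1, u2), val

    for p in points:
        consider(*p)

    for p, q in combinations(points, 2):
        # Points on the segment are p + t * (q - p) with 0 <= t <= 1.
        a1, a2 = p[0] - d1, p[1] - d2
        b1, b2 = q[0] - p[0], q[1] - p[1]
        lo, hi = 0.0, 1.0
        feasible = True
        for a, b in ((a1, b1), (a2, b2)):
            # Enforce a + t * b >= 0 for each coordinate.
            if abs(b) < EPS:
                feasible = feasible and a >= -EPS
            elif b > 0:
                lo = max(lo, -a / b)
            else:
                hi = min(hi, -a / b)
        if not feasible or lo > hi:
            continue
        candidates = [lo, hi]
        if abs(b1 * b2) > EPS:
            # Stationary point of the quadratic (a1 + t*b1) * (a2 + t*b2).
            t = -(a1 * b2 + a2 * b1) / (2 * b1 * b2)
            if lo < t < hi:
                candidates.append(t)
        for t in candidates:
            consider(p[0] + t * b1, p[1] + t * b2)
    return best
```

Scanning all pairs costs quadratic time in the number of points, which is acceptable for small games; a convex hull routine would reduce the work but is omitted here to keep the sketch self-contained.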
Since $Y$ must contain the expected utility values of players, $Y$ can be defined as the set of convex combinations of outcomes in the bimatrix game, i.e.,

\[ Y = \Big\{ (u^E_1, u^E_2) \in \mathbb{R}^2 : u^E_1 = \sum_{a \in A} \lambda(a) u_1(a),\ u^E_2 = \sum_{a \in A} \lambda(a) u_2(a),\ \sum_{a \in A} \lambda(a) = 1,\ \lambda(a) \ge 0\ \forall a \in A \Big\}. \]

When a game has a unique Nash equilibrium, that equilibrium can be chosen as the disagreement point. Unfortunately, when there are multiple Nash equilibria, then there is no obvious way for choosing $d$. However, since the points on the efficient Nash frontier represent points with maximal payoffs for each player under competition, they are the natural candidates for the disagreement point. Thus, if the efficient Nash frontier consists of a single point, then that point can be chosen as the disagreement point. However, as the results of our computational experiments in the previous section show, in many games the efficient Nash frontier consists of more than one point (implying that it is not possible to maximize both players' payoffs simultaneously under competition). Because all points on the Nash efficient frontier have the same chance of being the final outcome of the game under competition, the expected best payoff of the game under competition is the average of the utility values of the players over the points on the Nash efficient frontier. Therefore, one option is to set the disagreement point to be equal to

\[ (d_1, d_2) = \left( \frac{\sum_{u^E \in U_N} u^E_1}{|U_N|},\ \frac{\sum_{u^E \in U_N} u^E_2}{|U_N|} \right). \]

Because $d$ is the expected best payoff of the game under competition, if creating a coalition results in a greater payoff for the players, they have an incentive to collaborate. (Note that with this choice of disagreement point, the payoff for each player in the Nash bargaining solution will be at least the expected best payoff under competition.) Next, we illustrate the Nash bargaining solution that results from this choice of disagreement point by means of a small example.
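The proposed disagreement point is simple to compute once the payoff points of the Nash equilibria are known. The sketch below is a hedged illustration under stated assumptions: the function names `nondominated` and `average_disagreement_point` are hypothetical, the input is assumed to be a finite list of payoff points of Nash equilibria, and the filtering step is a plain pairwise dominance check rather than the balanced box method. The first three points are the nondominated Nash points reported in Table 9 of the example; the fourth point is invented here purely to show a dominated point being discarded.

```python
def nondominated(points):
    """Return the points that no other point weakly dominates (both coordinates maximised)."""
    result = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1] for q in points
        )
        if not dominated and p not in result:
            result.append(p)
    return result


def average_disagreement_point(nash_points):
    """Average the nondominated Nash points coordinate-wise to obtain (d1, d2)."""
    frontier = nondominated(nash_points)
    n = len(frontier)
    return (
        sum(u[0] for u in frontier) / n,
        sum(u[1] for u in frontier) / n,
    )


nash_points = [
    (-23.626, 146.228),  # NDP 1 from Table 9
    (7.157, 72.371),     # NDP 2 from Table 9
    (77.995, 67.114),    # NDP 3 from Table 9
    (0.0, 60.0),         # hypothetical dominated point, added for illustration only
]
d = average_disagreement_point(nash_points)
print(d)
```

Up to rounding of the tabulated values, the result agrees with the disagreement point d = (20.508, 95.238) used in the example.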
We generated a game with $|A_1| = |A_2| = 6$ using the RandomGame distribution of GAMUT with the utility values of both players restricted to be in the interval $[-150, 150]$. The payoff matrix of this game is shown in Table 8. This game has three nondominated Nash points. These points together with an associated Nash equilibrium are shown in Table 9. Now suppose that the players agree to create a grand coalition. The proposed disagreement point is $d = (20.508, 95.238)$. As a result, the Nash bargaining solution for this game is $u^* = (63.984, 145.307)$, which implies both players should choose action $a_4$. We see that, in this example, with cooperation each player can obtain a payoff that is close to its maximum possible payoff under competition (and significantly larger than the expected best payoff under competition).

Table 8: Utility values of the example.

Player 1   Player 2 actions (columns)
actions    a1                    a2                    a3                    a4                    a5                    a6
a1         (20.846,-33.357)      (58.954,-50.672)      (54.404,139.851)      (-124.681,147.785)    (-23.458,-42.076)     (110.561,-60.093)
a2         (-115.218,123.302)    (-128.268,-126.595)   (-6.944,-27.046)      (-125.045,12.209)     (120.311,-31.287)     (105.222,-122.293)
a3         (-53.927,28.03)       (124.872,-115.63)     (-91.372,-121.989)    (-93.674,-46.08)      (-2.735,-17.662)      (-100.213,-147.231)
a4         (94.957,1.705)        (54.368,-97.699)      (-91.275,150)         (63.984,145.307)      (-82.081,-77.016)     (100.871,61.205)
a5         (12.604,-45.033)      (-150,40.32)          (-54.68,131.402)      (-100.583,-70.608)    (41.558,-84.65)       (17.320,59.588)
a6         (-7.921,147.105)      (88.603,-63.644)      (-121.96,-137.767)    (81.699,-125.064)     (-121.368,-96.295)    (71.947,81.638)

Table 9: Nondominated Nash utility points of the example.

Action     NDP 1                 NDP 2                 NDP 3
           p1         p2         p1         p2         p1         p2
a1         0.372      0.000      0.000      0.000      0.000      0.000
a2         0.000      0.000      0.000      0.000      0.000      0.000
a3         0.000      0.564      0.000      0.366      0.000      0.000
a4         0.628      0.436      0.730      0.634      0.711      0.620
a5         0.000      0.000      0.000      0.000      0.000      0.000
a6         0.000      0.000      0.270      0.000      0.289      0.380
Utility    -23.626    146.228    7.157      72.371     77.995     67.114

[Figure 2: Representation of set $Y$ and the Nash bargaining solution of the example. Axes: $u^E_1$ and $u^E_2$. Legend: Convex Hull of $Y$, Status Quo, Nash Solution ($u^*$), Nondominated Points (NDP 1, NDP 2, NDP 3).]

An illustration of set $Y$, of the Nash bargaining solution, and the three nondominated Nash points of the example can be found in Figure 2. An examination of the figure reveals the following:
• The Nash bargaining solution does not change if we choose NDP 2 as disagreement point, i.e., $u^* = (63.984, 145.307)$. This is easily seen after recognizing that the area of the rectangle defined by $d$ and $u^*$ has to be maximal in a solution to optimization problem (1);
• If we choose NDP 1 as the disagreement point, then the Nash bargaining solution is $(4.944, 147.092)$. That is, the expected utility value of the second player will be slightly larger than 145.307, but the expected utility value of the first player will be significantly smaller than 63.984. To achieve $u^* = (4.944, 147.092)$, players should choose actions $(a_4, a_4)$ with probability 0.62 and actions $(a_4, a_3)$ with probability 0.38; and
• If we choose NDP 3 as the disagreement point, then the Nash bargaining solution is $(88.137, 90.238)$. That is, the expected utility value of the first player will be larger than 63.984, but the expected utility value of the second player will be significantly smaller than 145.307.
To achieve $u^* = (88.137, 90.238)$, players should choose actions $(a_4, a_4)$ with probability 0.345 and actions $(a_4, a_6)$ with probability 0.655.

We see that the combined payoff of both players in the Nash bargaining solution obtained when using the proposed disagreement point is the largest (and it also best reflects the relative proportions of the expected best payoffs).

6 Conclusion

We investigated the use of integer programming techniques for computing Nash equilibria of bimatrix games. Our main contributions are: (1) By slightly modifying the mixed integer programming formulation of Sandholm et al. [24], we are able to improve its runtime efficiency by a factor of four (on average), and, as a result, are able to more efficiently find a single Nash equilibrium, (2) By focusing on the concept of efficient Nash equilibria, we are able to exploit biobjective integer programming techniques to construct a subset of Nash equilibria with certain desirable characteristics, and (3) By computing and exploiting properties of nondominated Nash points, we are able to determine a disagreement point of a bargaining problem.

We note that we have studied games in which each player wants to maximize a single objective. In the literature, we also find multiobjective (multi-criteria) games in which each player wants to (simultaneously) maximize one or more objectives. The definitions of Nash equilibria and efficiency in this context [1, 6, 11, 21, 22, 27, 28] differ significantly from the ones used in this paper. A possible topic for further research is investigating whether nondominated Nash points for such games can be found using mixed integer programming techniques.

References

[1] Altman, E., Boulogne, T., El-Azouzi, R., Jiménez, T., and Wynter, L. (2006). A survey on networking games in telecommunications. Computers & Operations Research, 33(2):286–311.
[2] Audet, C., Belhaiza, S., and Hansen, P. (2006). Enumeration of all the extreme equilibria in game theory: Bimatrix and polymatrix games. Journal of Optimization Theory and Applications, 129(3):349–372.
[3] Audet, C., Hansen, P., Jaumard, B., and Savard, G. (2001). Enumeration of all extreme equilibria of bimatrix games. SIAM Journal on Scientific Computing, 23(1):323–328.
[4] Avis, D., Rosenberg, G. D., Savani, R., and von Stengel, B. (2010). Enumeration of Nash equilibria for two-player games. Economic Theory, 42(1):9–37.
[5] Boland, N., Charkhgard, H., and Savelsbergh, M. (2015). A criterion space search algorithm for biobjective integer programming: The balanced box method. INFORMS Journal on Computing, 27(4):735–754.
[6] Borm, P., Vermeulen, D., and Voorneveld, M. (2003). The structure of the set of equilibria for two person multicriteria games. European Journal of Operational Research, 148(3):480–493.
[7] Chen, X. and Deng, X. (2006). Settling the complexity of two-player Nash equilibrium. 47th Annual IEEE Symposium on Foundations of Computer Science, pages 261–272.
[8] Chun, Y. and Thomson, W. (1990). Nash solution and uncertain disagreement points. Games and Economic Behavior, 2(3):213–223.
[9] Dickhaut, J. and Kaplan, T. (1991). A program for finding Nash equilibria. The Mathematica Journal, 1:87–93.
[10] Dolan, E. D. and Moré, J. J. (2002). Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201–213.
[11] Ghose, D. (1991). A necessary and sufficient condition for Pareto-optimal security strategies in multicriteria matrix games. Journal of Optimization Theory and Applications, 68(3):463–481.
[12] Jansen, M. J. M. (1981). Maximal Nash subsets for bimatrix games. Naval Research Logistics Quarterly, 28(1):147–152.
[13] LaValle, S. M. (2006). Planning Algorithms. Cambridge University Press.
[14] Lemke, C. E. and Howson, J. T. (1964). Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics, 12(2):413–423.
[15] Mangasarian, O. L. (1964). Equilibrium points of bimatrix games. Journal of the Society for Industrial and Applied Mathematics, 12(4):778–780.
[16] McKelvey, R. D. and McLennan, A. (1996). Computation of equilibria in finite games. In Handbook of Computational Economics, pages 87–142. Elsevier.
[17] Mills, H. (1960). Equilibrium points in finite games. Journal of the Society for Industrial and Applied Mathematics, 8(2):397–402.
[18] Nash, J. F. (1950). The bargaining problem. Econometrica, 18:155–162.
[19] Nash, J. F. (1951). Non-cooperative games. Annals of Mathematics, 54:286–295.
[20] Nash, J. F. (1953). Two person cooperative games. Econometrica, 21:128–140.
[21] Nishizaki, I. and Notsu, T. (2007). Nondominated equilibrium solutions of a multiobjective two-person nonzero-sum game and corresponding mathematical programming problem. Journal of Optimization Theory and Applications, 135(2):217–239.
[22] Nishizaki, I. and Notsu, T. (2008). Nondominated equilibrium solutions of a multiobjective two-person nonzero-sum game in extensive form and corresponding mathematical programming problem. Journal of Global Optimization, 42(2):201–220.
[23] Porter, R., Nudelman, E., and Shoham, Y. (2008). Simple search methods for finding a Nash equilibrium. Games and Economic Behavior, 63(2):642–662.
[24] Sandholm, T., Gilpin, A., and Conitzer, V. (2005). Mixed-integer programming methods for finding Nash equilibria. In Proceedings of the 20th National Conference on Artificial Intelligence, volume 2 of AAAI'05, pages 495–501, Pittsburgh, Pennsylvania. AAAI Press.
[25] Serrano, R. (2005). Fifty years of the Nash program 1953-2003. Investigaciones Economicas, pages 219–258.
[26] Shoham, Y. and Leyton-Brown, K. (2009). Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press.
[27] Voorneveld, M., Grahn, S., and Dufwenberg, M. (2000). Ideal equilibria in noncooperative multicriteria games. Mathematical Methods of Operations Research, 52(1):65–77.
[28] Zeleny, M. (1975). Games with multiple payoffs. International Journal of Game Theory, 4(4):179–191.