Introduction to Game Theory
2. Normal-Form Games
Dana Nau, University of Maryland
Nau: Game Theory 1

How to reason about games?
In single-agent decision theory, we look for an optimal strategy: maximize the agent’s expected payoff in its environment
With multiple agents, the best strategy depends on the others’ choices
We deal with this by identifying certain subsets of outcomes called solution concepts
Some solution concepts:
• Pareto optimality
• Nash equilibrium
• Maximin and minimax
• Dominant strategies
• Correlated equilibrium

Pareto Optimality
Strategy profile S Pareto dominates a strategy profile S′ if
• no agent gets a worse payoff with S than with S′, i.e., ui(S) ≥ ui(S′) for all i, and
• at least one agent gets a better payoff with S than with S′, i.e., ui(S) > ui(S′) for at least one i
Strategy profile S is Pareto optimal, or strictly Pareto efficient, if there’s no strategy profile S′ that Pareto dominates S
Every game has at least one Pareto optimal profile
• There’s always at least one Pareto optimal profile in which the strategies are pure

Examples
The Prisoner’s Dilemma:
        C       D
C     3, 3    0, 5
D     5, 0    1, 1
• (C,C) is Pareto optimal: no profile gives both players a higher payoff
• (D,C) is Pareto optimal: no profile gives player 1 a higher payoff
• (C,D) is Pareto optimal: same argument, for player 2
• (D,D) is Pareto dominated by (C,C)
Which Side of the Road: (Left,Left) and (Right,Right) are Pareto optimal
In common-payoff games, all Pareto optimal strategy profiles have the same payoffs
• If (Left,Left) had payoffs (2,2), then (Right,Right) wouldn’t be Pareto optimal

Best Response
Suppose agent i knows how the others are going to play
• Then agent i has the single-agent problem of choosing a utility-maximizing action
Some notation:
• S−i = (s1, …, si−1, si+1, …, sn), i.e., S−i is strategy profile S without agent i’s strategy
• If si′ is any strategy for agent i, then (si′, S−i) = (s1, …, si−1, si′, si+1, …, sn); hence (si, S−i) = S
For agent i, a best response to S−i is a mixed strategy si* such that ui(si*, S−i) ≥ ui(si, S−i) for every strategy si available to agent i
A best response si* is unique if ui(si*, S−i) > ui(si, S−i) for every si ≠ si*
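The best-response definition above is directly computable. A minimal sketch (not from the slides, and only for pure-strategy responses in a 2-player game), using the Prisoner’s Dilemma payoffs:

```python
# Sketch: computing the set of pure best responses to a fixed
# opponent mixed strategy, using the Prisoner's Dilemma from the slides.

# u1[(a1, a2)] = payoff to agent 1 when agent 1 plays a1, agent 2 plays a2
u1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected_utility(action, opponent_mix, payoffs):
    """Expected payoff of a pure action against a mixed opponent strategy."""
    return sum(p * payoffs[(action, a2)] for a2, p in opponent_mix.items())

def best_responses(actions, opponent_mix, payoffs):
    """All pure actions that maximize expected payoff against opponent_mix."""
    utils = {a: expected_utility(a, opponent_mix, payoffs) for a in actions}
    best = max(utils.values())
    return {a for a, u in utils.items() if abs(u - best) < 1e-9}

# D is agent 1's unique best response against any opponent mix:
print(best_responses(["C", "D"], {"C": 0.5, "D": 0.5}, u1))  # {'D'}
```

Since the best response here is unique and pure, no mixture is needed; the next slide covers the case where the best response mixes over two or more actions.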
Best Response
Unless there’s a unique best response to S−i that’s a pure strategy, the number of best responses is infinite
Suppose a best response to S−i is a mixed strategy s* whose support includes ≥ 2 actions
• Then every action a in support(s*) must have the same expected utility ui(a, S−i)
  › If one of them had a higher expected utility than the others, then it would be a better response than s*
• Thus any mixture of the actions in support(s*) is a best response
Similarly, if there are ≥ 2 pure strategies that are best responses to S−i, then any mixture of them is also a best response

Nash Equilibrium
A strategy profile s = (s1, …, sn) is a Nash equilibrium if for every i, si is a best response to S−i, i.e., no agent can do better by unilaterally changing his/her strategy
A Nash equilibrium is strict if for every i, si is the only best response to S−i, i.e., any agent who unilaterally changes strategy will do worse; otherwise the Nash equilibrium is weak
The Prisoner’s Dilemma:
        C       D
C     3, 3    0, 5
D     5, 0    1, 1
In the Prisoner’s Dilemma, (D,D) is a strict Nash equilibrium
• If either agent unilaterally switches to a different strategy, his/her expected utility goes below 1
In Which Side of the Road, (Left,Left) and (Right,Right) are both strict Nash equilibria
• If either agent unilaterally switches to a different strategy, his/her expected utility goes down

Properties of Nash Equilibria
Theorem (Nash, 1951): Every game with a finite number of agents and action profiles has at least one Nash equilibrium
Weak Nash equilibria are less stable than strict Nash equilibria
• In the weak case, at least one agent has more than one best response, and only one of them is in the Nash equilibrium
Pure-strategy Nash equilibria can be either strict or weak
Mixed-strategy Nash equilibria are always weak
• Reason: if there are ≥ 2 pure strategies that are best responses to S−i, then any mixture of them is also a best response
Thus in a strict Nash equilibrium, all of the strategies are pure
• e.g., the Prisoner’s Dilemma, Which Side of the Road

Weak and Strict Nash Equilibria
Weak Nash equilibria are less stable than strict Nash equilibria
• In the weak case, at least one agent has more than one best response, and only one of them is in the Nash equilibrium
Pure-strategy Nash equilibria can be either strict (e.g., the Prisoner’s Dilemma) or weak (e.g., Which Side of the Road)
From the corollary on the previous slide, we know that mixed-strategy Nash equilibria are always weak
• Thus in a strict Nash equilibrium, all of the strategies are pure
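The unilateral-deviation definition above can be checked by brute force over all pure profiles. A minimal sketch (not from the slides), again using the Prisoner’s Dilemma:

```python
# Sketch: brute-force search for pure-strategy Nash equilibria of a
# 2-player game, using the Prisoner's Dilemma payoffs from the slides.
from itertools import product

# payoffs[(a1, a2)] = (payoff to agent 1, payoff to agent 2)
pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def pure_nash(actions1, actions2, payoffs):
    equilibria = []
    for a1, a2 in product(actions1, actions2):
        u1, u2 = payoffs[(a1, a2)]
        # a profile is a Nash equilibrium iff neither agent can gain
        # by a unilateral deviation
        dev1 = any(payoffs[(b1, a2)][0] > u1 for b1 in actions1)
        dev2 = any(payoffs[(a1, b2)][1] > u2 for b2 in actions2)
        if not dev1 and not dev2:
            equilibria.append((a1, a2))
    return equilibria

print(pure_nash("CD", "CD", pd))  # [('D', 'D')]
```

This finds only pure equilibria; the next slides show how to find mixed-strategy equilibria by hand.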
Finding Nash Equilibria
The Battle of the Sexes:
                Husband
             Opera   Football
Wife  Opera   2, 1     0, 0
   Football   0, 0     1, 2
(the wife’s payoff is listed first)
The Battle of the Sexes has two pure-strategy Nash equilibria: (Opera,Opera) and (Football,Football)
Generally it’s tricky to compute mixed-strategy equilibria, but easy if we can identify the support of the equilibrium strategies
Assume both agents randomize, and that the husband’s mixed strategy sh is sh(Opera) = p, sh(Football) = 1 − p
Expected utilities of the wife’s actions:
• uw(Football, sh) = 0p + 1(1 − p)
• uw(Opera, sh) = 2p
If the wife mixes between her two actions, they must have the same expected utility
• If one of the actions had a better expected utility, she’d do better with a pure strategy that always used that action
Thus 0p + 1(1 − p) = 2p, so p = 1/3
So the husband’s mixed strategy is sh(Opera) = 1/3, sh(Football) = 2/3

Finding Nash Equilibria
A similar calculation shows that the wife’s mixed strategy sw is sw(Opera) = 2/3, sw(Football) = 1/3
Like all mixed-strategy equilibria, this is a weak Nash equilibrium
• Each agent has infinitely many other best-response strategies
In this equilibrium,
• Probability 5/9 that the payoff is 0 for both agents
• Probability 2/9 that the wife gets 2 and the husband gets 1
• Probability 2/9 that the husband gets 2 and the wife gets 1
• Expected utility for each agent is 2/3
This equilibrium is Pareto-dominated by both of the pure-strategy equilibria
• In each of them, one agent gets 1 and the other gets 2

Finding Nash Equilibria
Matching Pennies:
          Heads    Tails
Heads     1, −1    −1, 1
Tails     −1, 1    1, −1
It’s easy to see that no pure strategy could be part of a Nash equilibrium in this game
• For each combination of pure strategies, one of the agents can do better by changing his/her strategy
• e.g., for (Heads,Heads), agent 2 can do better by switching to Tails
Thus there’s no pure-strategy (and hence no strict) Nash equilibrium
But again there’s a mixed-strategy equilibrium: (s,s), where s(Heads) = s(Tails) = ½
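The indifference calculation above can be reproduced with exact arithmetic. A minimal sketch (not from the slides):

```python
# Sketch: the Battle of the Sexes indifference conditions, with exact
# fractions matching the 1/3 and 2/3 on the slides.
from fractions import Fraction

# Wife's payoffs against husband's mix sh(Opera) = p:
#   uw(Opera) = 2p,  uw(Football) = 0p + 1(1 - p)
# Indifference: 2p = 1 - p  =>  p = 1/3
p = Fraction(1, 3)
assert 2 * p == 1 - p          # wife is indifferent

# Husband's payoffs against wife's mix sw(Opera) = q:
#   uh(Opera) = 1q,  uh(Football) = 2(1 - q)
# Indifference: q = 2(1 - q)  =>  q = 2/3
q = Fraction(2, 3)
assert q == 2 * (1 - q)        # husband is indifferent

# Outcome distribution and expected utility in this equilibrium:
p_same_opera = q * p                      # 2/9
p_same_football = (1 - q) * (1 - p)      # 2/9
p_mismatch = 1 - p_same_opera - p_same_football  # 5/9
wife_eu = 2 * p_same_opera + 1 * p_same_football  # 2/3
print(p_mismatch, wife_eu)  # 5/9 2/3
```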
A Real-World Example
Penalty kicks in soccer: a kicker and a goalie in a penalty kick
• The kicker can kick left or right; the goalie can jump to the left or right
• The kicker scores iff he/she kicks to one side and the goalie jumps to the other
Analogy to Matching Pennies:
• If you use a pure strategy and the other agent uses his/her best response, the other agent will win
• If you kick or jump in either direction with equal probability, the opponent can’t exploit your strategy

Another Interpretation of Mixed Strategies
Another interpretation of mixed strategies is that each agent’s strategy is deterministic, but each agent has uncertainty regarding the other’s strategy
• Agent i’s mixed strategy is everyone else’s assessment of how likely i is to play each pure strategy
Example: in a series of soccer penalty kicks, the kicker could kick left or right in a deterministic pattern that the goalie thinks is random

Another Real-World Example
Road networks. Suppose that:
• 1,000 drivers wish to travel from S (start) to D (destination)
• There are two possible paths: S→A→D and S→B→D
• The roads S→A and B→D are long (t = 50 minutes) but very wide: t = 50 no matter how many drivers use them
• The roads S→B and A→D are shorter but narrow: t = (number of cars)/25 minutes
Nash equilibrium:
• 500 cars go through A, 500 cars through B
• Everyone’s time is 50 + 500/25 = 70 minutes
• If a single driver changes to the other route, there now are 501 cars on that route, so his/her time goes up

Braess’s Paradox
Suppose we add a new road from B to A that is so wide and so short that it takes 0 minutes to traverse
A Nash equilibrium:
• All 1,000 cars go S→B→A→D
• Time for S→B is 1000/25 = 40 minutes, and likewise for A→D, so the total time is 80 minutes
To see that this is an equilibrium:
• If a driver goes S→A→D, his/her cost is 50 + 40 = 90 minutes
• If a driver goes S→B→D, his/her cost is 40 + 50 = 90 minutes
• Both are dominated by S→B→A→D
To see that it’s the only Nash equilibrium:
• For every traffic pattern, S→B→A→D dominates S→A→D and S→B→D
• Choose any traffic pattern, and compute the times a driver would get on all three routes
Carelessly adding capacity can actually be hurtful!
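The travel-time arithmetic above is easy to check mechanically. A minimal sketch (not from the slides):

```python
# Sketch: the road-network times from the slides.  Narrow roads cost
# cars/25 minutes; wide roads cost a flat 50 minutes.

def time_SAD(cars_on_AD):
    return 50 + cars_on_AD / 25          # wide S->A, then narrow A->D

def time_SBD(cars_on_SB):
    return cars_on_SB / 25 + 50          # narrow S->B, then wide B->D

# Original network, equilibrium split 500/500:
print(time_SAD(500), time_SBD(500))      # 70.0 70.0

# After adding the zero-cost B->A link, all 1,000 cars play S->B->A->D:
def time_SBAD(cars_on_SB, cars_on_AD):
    return cars_on_SB / 25 + 0 + cars_on_AD / 25

print(time_SBAD(1000, 1000))             # 80.0
# Unilateral deviations to the other two routes are worse:
print(time_SAD(1000), time_SBD(1000))    # 90.0 90.0
```

Adding the link raises everyone’s equilibrium time from 70 to 80 minutes, which is the paradox.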
Braess’s Paradox in Practice
From an article about Seoul, South Korea:
“The idea was sown in 1999,” Hwang says. “We had experienced a strange thing. We had three tunnels in the city and one needed to be shut down. Bizarrely, we found that car volumes dropped. I thought this was odd. We discovered it was a case of ‘Braess paradox’, which says that by taking away space in an urban area you can actually increase the flow of traffic, and, by implication, by adding extra capacity to a road network you can reduce overall performance.”
• John Vidal, “Heart and soul of the city”, The Guardian, Nov. 1, 2006
• http://www.guardian.co.uk/environment/2006/nov/01/society.travelsenvironmentalimpact

Maximin and Minimax Strategies
Let s be a strategy for agent i
• Then s’s worst-case payoff is agent i’s payoff if the other agents play the combination of strategies that is worst for agent i
Agent i’s maximin value (or security level) is the best worst-case payoff against any combination of the other agents’ strategies, i.e.,
• maxsi minS−i ui(si, S−i)
A maximin strategy for agent i is any strategy whose worst-case payoff is i’s maximin value, i.e.,
• argmaxsi minS−i ui(si, S−i)
The maximin strategy is not necessarily unique, and it often is mixed

Example
Battle of the Sexes:
• Opera’s worst-case payoff is 0, and Football’s worst-case payoff is 0, so the wife must mix to guarantee more than 0
The wife’s maximin strategy is sw(Opera) = 1/3, sw(Football) = 2/3
• Its expected payoff is 2/3 regardless of what the husband does: 2(1/3) = 2/3 if he plays Opera, 1(2/3) = 2/3 if he plays Football
The husband’s maximin strategy is sh(Opera) = 2/3, sh(Football) = 1/3
• Its expected payoff is likewise 2/3 regardless of what the wife does
So each agent’s maximin value is 2/3, the same as his/her expected payoff in the mixed-strategy Nash equilibrium we discussed earlier
• Note that each agent’s maximin strategy assigns the same probabilities as the other agent’s equilibrium strategy, not his/her own
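The wife’s maximin strategy above can also be found numerically by scanning over her possible mixes. A minimal sketch (not from the slides):

```python
# Sketch: finding the wife's maximin strategy in the Battle of the
# Sexes by grid search over q = P(Opera).  Against a fixed mix, the
# husband's worst reply for her is one of his two pure actions.

def wife_worst_case(q):
    vs_opera = 2 * q            # wife's payoff if the husband plays Opera
    vs_football = 1 * (1 - q)   # wife's payoff if the husband plays Football
    return min(vs_opera, vs_football)

grid = [i / 1000 for i in range(1001)]
best_q = max(grid, key=wife_worst_case)
print(best_q, wife_worst_case(best_q))   # ~1/3 and ~2/3
```

The grid search recovers q ≈ 1/3 with security level ≈ 2/3, matching the slide’s calculation.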
Comments
If agent i plays a maximin strategy and the other agents play arbitrarily, i still receives an expected payoff of at least his/her maximin value
So the maximin strategy is sensible for a conservative agent
• It maximizes his/her expected utility without any assumptions about the others
• E.g., don’t assume that they’re rational, or that they draw their action choices from known distributions

Minimax Strategies
The minimax strategy and minimax value are duals of their maximin counterparts
In a 2-player game, a minimax strategy for agent 1 against agent 2 is a strategy s1* that minimizes agent 2’s maximum payoff:
• s1* = argmins1 maxs2 u2(s1, s2)
i.e., suppose player 1 first chooses a strategy s1, and then player 2 chooses a best response; then s1* is a strategy that minimizes the payoff of 2’s best response
Useful if you want to punish an opponent without regard for your own payoff (e.g., see repeated games)
Agent 2’s minimax value is agent 2’s maximum payoff against s1*:
• maxs2 u2(s1*, s2) = mins1 maxs2 u2(s1, s2)

Minimax Strategies in n-Agent Games
In n-agent games (n > 2), agent i usually can’t minimize agent j’s payoff by acting unilaterally
But suppose all the agents “gang up” on agent j
Let S−j* be a mixed-strategy profile that minimizes j’s maximum payoff, i.e.,
• S−j* = argminS−j maxsj uj(sj, S−j)
Then a minimax strategy for agent i ≠ j is i’s component of S−j*
Agent j’s minimax value is j’s maximum payoff against S−j*:
• maxsj uj(sj, S−j*) = minS−j maxsj uj(sj, S−j)
In two-player games, an agent’s minimax value = his/her maximin value
In n-player games, an agent’s maximin value ≤ his/her minimax value
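The two-player equality of maximin and minimax values can be checked numerically. A minimal sketch (not from the slides), using Matching Pennies, where both values are 0; it relies on the fact that against a fixed mixed strategy, an optimal reply can always be found among the pure strategies:

```python
# Sketch: maximin value = minimax value for agent 1 in Matching Pennies.

# u1[i][j] = agent 1's payoff; rows = agent 1's action, cols = agent 2's
u1 = [[1, -1], [-1, 1]]

def eu1(p, q):
    """Agent 1's expected payoff when P(row 0) = p and P(col 0) = q."""
    return (p * q * u1[0][0] + p * (1 - q) * u1[0][1]
            + (1 - p) * q * u1[1][0] + (1 - p) * (1 - q) * u1[1][1])

grid = [i / 100 for i in range(101)]
# Inner optimum is attained at a pure strategy, so q, p range over {0, 1}:
maximin = max(min(eu1(p, q) for q in (0, 1)) for p in grid)
minimax = min(max(eu1(p, q) for p in (0, 1)) for q in grid)
print(maximin, minimax)  # 0.0 0.0
```

Both optima are attained at the 50/50 mix, consistent with the equilibrium on the earlier Matching Pennies slide.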
Minimax Theorem (von Neumann, 1928)
Theorem. Let G be any two-person finite zero-sum game. Then there are a number VG, called the value of G, and strategies s1* and s2* for agents 1 and 2, such that
• If agent 2 uses s2*, agent 1’s expected utility is ≤ VG, i.e., maxs1 u1(s1, s2*) = VG
• If agent 1 uses s1*, agent 1’s expected utility is ≥ VG, i.e., mins2 u1(s1*, s2) = VG
How this relates to the last few slides:
• agent 1’s maximin value = agent 1’s minimax value = VG
• agent 2’s maximin value = agent 2’s minimax value = −VG
• si* = any minimax strategy for agent i = any maximin strategy for agent i
• (s1*, s2*) = a Nash equilibrium

Example: Matching Pennies
Agent i’s mixed strategy is to display heads with some probability pi
[Graph omitted: u1(p1, p2) plotted for every possible p1, p2; u2(p1, p2) = −u1(p1, p2)]
The Nash equilibrium is the saddle point p1 = p2 = 0.5, and the value of the game is 0
If agent i uses a strategy with pi ≠ 0.5, the other agent has a “best response” strategy that will lower i’s expected utility below 0

Two-Finger Morra
There are several versions of this game; here’s the one I’ll use:
• Each agent holds up 1 or 2 fingers
• If the total number of fingers is odd, agent 1 gets that many points
• If the total number of fingers is even, agent 2 gets that many points
          1          2
1      −2, 2      3, −3
2      3, −3     −4, 4
Agent 1 has no dominant strategy:
• Agent 2 plays 1 => agent 1’s best response is 2
• Agent 2 plays 2 => agent 1’s best response is 1
Similarly, agent 2 has no dominant strategy; the best responses cycle, so there’s no pure-strategy Nash equilibrium
Thus we look for a mixed-strategy equilibrium

Two-Finger Morra
Suppose agent 2 plays 1 with probability p2
If agent 1 mixes between 1 and 2, both actions must have the same expected utility:
• Agent 1 plays 1 => expected utility is −2p2 + 3(1 − p2) = 3 − 5p2
• Agent 1 plays 2 => expected utility is 3p2 − 4(1 − p2) = 7p2 − 4
Thus 3 − 5p2 = 7p2 − 4, so p2 = 7/12
Agent 1’s expected utility is 3 − 5(7/12) = 1/12
Two-Finger Morra
Suppose agent 1 plays 1 with probability p1
If agent 2 mixes between 1 and 2, both actions must have the same expected utility:
• Agent 2 plays 1 => agent 2’s expected utility is 2p1 − 3(1 − p1) = 5p1 − 3
• Agent 2 plays 2 => agent 2’s expected utility is −3p1 + 4(1 − p1) = 4 − 7p1
Thus 5p1 − 3 = 4 − 7p1, so p1 = 7/12
Agent 2’s expected utility is 5(7/12) − 3 = −1/12
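The Two-Finger Morra equilibrium above can be verified with exact arithmetic. A minimal sketch (not from the slides):

```python
# Sketch: verifying the Two-Finger Morra equilibrium p1 = p2 = 7/12.
from fractions import Fraction

# u1[(a1, a2)] = agent 1's payoff; the game is zero-sum
u1 = {(1, 1): -2, (1, 2): 3, (2, 1): 3, (2, 2): -4}

def eu1(p1, p2):
    """Agent 1's expected payoff when each agent shows 1 finger w.p. p1, p2."""
    return sum(u1[(a1, a2)]
               * (p1 if a1 == 1 else 1 - p1)
               * (p2 if a2 == 1 else 1 - p2)
               for a1 in (1, 2) for a2 in (1, 2))

p = Fraction(7, 12)
print(eu1(p, p))              # 1/12  (the value of the game to agent 1)
# At equilibrium, agent 1 is indifferent between his/her two actions:
print(eu1(1, p), eu1(0, p))   # 1/12 1/12
```

Agent 1’s value 1/12 and agent 2’s value −1/12 match the slides’ calculations.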
Dominant Strategies
Let si and si′ be two strategies for agent i
Intuitively, si dominates si′ if si gives agent i a greater payoff than si′ for every strategy profile s−i of the remaining agents
Mathematically, there are three gradations of dominance:
1. si strictly dominates si′ if ui(si, s−i) > ui(si′, s−i) for all s−i ∈ S−i
2. si weakly dominates si′ if ui(si, s−i) ≥ ui(si′, s−i) for all s−i ∈ S−i, and ui(si, s−i) > ui(si′, s−i) for at least one s−i ∈ S−i
3. si very weakly dominates si′ if ui(si, s−i) ≥ ui(si′, s−i) for all s−i ∈ S−i

Dominant Strategy Equilibria
A strategy is strictly (resp., weakly, very weakly) dominant for an agent if it strictly (weakly, very weakly) dominates every other strategy for that agent
It’s not hard to show that a strictly dominant strategy must be pure
Obviously, a strategy profile (s1, …, sn) where every si is dominant for agent i (strictly, weakly, or very weakly) is a Nash equilibrium
• Such a strategy profile forms an equilibrium in strictly (weakly, very weakly) dominant strategies
• An equilibrium in strictly dominant strategies is necessarily the unique Nash equilibrium

Examples
The Prisoner’s Dilemma:
        C       D
C     3, 3    0, 5
D     5, 0    1, 1
D is strictly dominant for agent 1:
• it strictly dominates C if agent 2 uses C
• it strictly dominates C if agent 2 uses D
Similarly, D is strictly dominant for agent 2
So (D,D) is a Nash equilibrium in dominant strategies
• Ironically, of the pure strategy profiles, (D,D) is the only one that’s not Pareto optimal

Example: Matching Pennies
Heads isn’t strictly dominant for agent 1, because u1(Heads, Tails) < u1(Tails, Tails)
Tails isn’t strictly dominant for agent 1, because u1(Heads, Heads) > u1(Tails, Heads)
Agent 1 doesn’t have a dominant strategy, so there’s no Nash equilibrium in dominant strategies
Which Side of the Road: no Nash equilibrium in dominant strategies, by the same kind of argument

Elimination of Dominated Strategies
A strategy si is strictly (weakly, very weakly) dominated for an agent i if some other strategy si′ strictly (weakly, very weakly) dominates si
A strictly dominated strategy can’t be a best response to any move
• So we can eliminate it (remove it from the payoff matrix)
Once a pure strategy is eliminated, another strategy may become dominated
• This elimination can be repeated

Elimination of Dominated Strategies
A pure strategy s may be dominated by a mixed strategy t without being dominated by any of the strategies in t’s support
Example: the three games shown below
[Figures omitted: a 3 × 3 game and its two successive reductions]
In the first game, agent 2’s strategy R is dominated by L (and also by C)
• Eliminate it, giving the second game
In the second game, M is dominated by neither U nor D
• But it’s dominated by the mixed strategy P(U) = 0.5, P(D) = 0.5
  › (It wasn’t dominated before we removed R)
• Removing it gives the third game (maximally reduced)
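The repeated-elimination procedure above can be sketched in code. A minimal sketch (not from the slides); for simplicity it only tests domination by pure strategies, so it would miss the mixed-strategy domination the slide describes:

```python
# Sketch: iterated elimination of strictly dominated pure strategies
# (pure-by-pure domination only) in a 2-player game.

def eliminate(actions1, actions2, payoffs):
    """payoffs[(a1, a2)] = (u1, u2).  Returns the reduced action sets."""
    a1s, a2s = list(actions1), list(actions2)
    changed = True
    while changed:
        changed = False
        for acts, other, idx in ((a1s, a2s, 0), (a2s, a1s, 1)):
            for s in list(acts):
                def u(mine, theirs):
                    pair = (mine, theirs) if idx == 0 else (theirs, mine)
                    return payoffs[pair][idx]
                # s is strictly dominated if some t beats it for every
                # remaining choice of the other agent
                if any(all(u(t, o) > u(s, o) for o in other)
                       for t in acts if t != s):
                    acts.remove(s)
                    changed = True
    return a1s, a2s

pd = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
print(eliminate(["C", "D"], ["C", "D"], pd))  # (['D'], ['D'])
```

On the Prisoner’s Dilemma the game reduces to the single cell (D,D), which (as the next slide notes) must then be a Nash equilibrium.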
Elimination of Dominated Strategies
This gives a solution concept: the set of all strategy profiles that assign 0 probability to playing any action that would be removed through iterated removal of strictly dominated strategies
This is a much weaker solution concept than Nash equilibrium
• The set of strategy profiles includes all the Nash equilibria
• But it includes many other mixed strategies; in some games, it’s equal to S (the set of all mixed-strategy profiles)
We can use this technique to simplify finding Nash equilibria
• In the current example, it goes from a 3 × 3 game to a 2 × 2 game
• Sometimes (e.g., the Prisoner’s Dilemma), we end with a single cell; in this case, the single cell must be a Nash equilibrium

Elimination of Dominated Strategies
Each flavor of domination has advantages and disadvantages
Strict domination yields a unique reduced game
• We get the same game, independent of the elimination order
Weak domination can yield a smaller reduced game
• But which reduced game we get may depend on the elimination order
Very weak domination can yield even smaller reduced games
• But again, which one we get might depend on the elimination order
• And it doesn’t impose a strict order on strategies: two strategies can each very weakly dominate the other
• For this reason, very weak domination usually is considered the least important

Correlated Equilibrium
Motivating example: the mixed-strategy equilibrium for the Battle of the Sexes
• P(both agents choose Opera) = (2/3)(1/3) = 2/9; the wife’s payoff is 2, the husband’s is 1
• P(both agents choose Football) = (1/3)(2/3) = 2/9; the wife’s payoff is 1, the husband’s is 2
• P(the agents choose different activities) = (2/3)(2/3) + (1/3)(1/3) = 5/9; in this case, the payoff is 0 for both agents
Each agent’s expected utility is 2(2/9) + 1(2/9) + 0(5/9) = 2/3
If the agents could coordinate their choices, they could eliminate the case where they choose different activities
• Each agent’s payoff would always be 1 or 2, so both agents would get higher expected utility
This leads to the notion of a correlated equilibrium, a generalization of a Nash equilibrium

General Setting
Start with an n-agent game G
Let v1, …, vn be random variables called signals, one for each agent
• For each i, let Di be the domain (the set of possible values) of vi
Let π be a joint distribution over v1, …, vn
• π(d1, …, dn) = P(v1 = d1, …, vn = dn)
Nature uses π to choose values d1, …, dn for v1, …, vn, and tells each agent i the value of vi
An agent can condition his/her action on the value of vi
• An agent’s strategy is a mapping σi : Di → Ai
• σi is deterministic (i.e., a pure strategy); mixed strategies wouldn’t give any greater generality
A strategy profile is (σ1, …, σn)
The games we’ve been considering before now are a degenerate case in which the random variables v1, …, vn are independent

Correlated Equilibrium
Given an n-player game G, let v, π, and σ be as defined on the previous slide
Given a strategy profile (σ1, …, σn), the expected utility for agent i is
• ui(σ1, …, σn) = Σd1,…,dn π(d1, …, dn) ui(σ1(d1), …, σn(dn))
(v, π, σ) is a correlated equilibrium if for every agent i and every mapping σi′ from Di to Ai,
• ui(σ1, …, σi−1, σi, σi+1, …, σn) ≥ ui(σ1, …, σi−1, σi′, σi+1, …, σn)
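The definition above can be checked exhaustively for a small signal space. A minimal sketch (not from the slides): in the Battle of the Sexes, a single public coin flip names an activity, and each agent’s strategy is to obey it:

```python
# Sketch: checking a correlated equilibrium of the Battle of the Sexes
# driven by one public signal, "O" or "F", each with probability 1/2.
from fractions import Fraction
from itertools import product

# (wife's payoff, husband's payoff)
u = {("O", "O"): (2, 1), ("O", "F"): (0, 0),
     ("F", "O"): (0, 0), ("F", "F"): (1, 2)}
pi = {"O": Fraction(1, 2), "F": Fraction(1, 2)}   # joint (shared) signal
obey = {"O": "O", "F": "F"}                        # do what the signal says

def eu(sigma_w, sigma_h, agent):
    return sum(p * u[(sigma_w[d], sigma_h[d])][agent] for d, p in pi.items())

# Expected utility 3/2 for each agent -- better than the 2/3 each gets
# in the mixed-strategy Nash equilibrium:
print(eu(obey, obey, 0), eu(obey, obey, 1))  # 3/2 3/2

# No unilateral deviation (any other mapping from signals to actions) helps:
for agent in (0, 1):
    for dev in ({"O": a, "F": b} for a, b in product("OF", repeat=2)):
        profile = (dev, obey) if agent == 0 else (obey, dev)
        assert eu(*profile, agent) <= eu(obey, obey, agent)
```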
Correlated Equilibrium
Theorem. For every Nash equilibrium (s1, …, sn), there’s a corresponding correlated equilibrium (σ1, …, σn)
• “Corresponding” means they produce the same distribution on outcomes
Sketch of proof:
• Let v1, …, vn be independently distributed random variables, where each vi has domain Ai and probability distribution P(vi = ai) = si(ai) for each ai in Ai
• Hence the joint distribution is π(a1, …, an) = Πi si(ai)
• Let each σi be the identity function
• When the agents play the strategy profile (σ1, …, σn), the distribution over outcomes is identical to that under (s1, …, sn)
• The vi’s are uncorrelated and no agent i can benefit by deviating from σi, so (σ1, …, σn) is a correlated equilibrium
But not every correlated equilibrium is equivalent to a Nash equilibrium
• e.g., the correlated equilibrium for the Battle of the Sexes

Convex Combinations
Correlated equilibria can be combined to form new correlated equilibria
First we need a definition:
• Let x1, x2, …, xn be a finite number of points (vectors, scalars, …)
• A convex combination of x1, x2, …, xn is a linear combination a1x1 + a2x2 + … + anxn in which all coefficients are non-negative and sum to 1, i.e., every ai ≥ 0 and a1 + a2 + … + an = 1
Any convex combination of two points x1 and x2 lies on the straight line segment between x1 and x2
Any convex combination of x1, x2, …, xn is within the convex hull of x1, x2, …, xn
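The convex-combination definition above can be illustrated with the Battle of the Sexes payoffs. A minimal sketch (not from the slides): mixing the two pure-equilibrium payoff profiles (2, 1) and (1, 2) with equal weights gives (1.5, 1.5), the payoff profile of the public-coin correlated equilibrium from the earlier example:

```python
# Sketch: a convex combination of payoff vectors.

def convex_combination(points, weights):
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1) < 1e-9
    dim = len(points[0])
    return tuple(sum(w * pt[k] for w, pt in zip(weights, points))
                 for k in range(dim))

print(convex_combination([(2, 1), (1, 2)], [0.5, 0.5]))  # (1.5, 1.5)
```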
Convex Combinations
Theorem. Any convex combination of correlated equilibrium payoffs can be realized as the payoff profile of some correlated equilibrium
To understand this, imagine a public random device that randomly selects which correlated equilibrium to play
• Each agent’s overall expected payoff is the weighted sum of the payoffs from the correlated equilibria that were combined
• No agent has an incentive to deviate, regardless of the probabilities governing the public random device

Summary
I’ve discussed several solution concepts, and ways of finding them:
• Pareto optimality: the Prisoner’s Dilemma, Which Side of the Road
• best responses and Nash equilibria: the Battle of the Sexes, Matching Pennies
• real-world examples: soccer penalty kicks, road networks (Braess’s Paradox)
• maximin and minimax strategies, and the Minimax Theorem: Matching Pennies, Two-Finger Morra
• dominant strategies: the Prisoner’s Dilemma, Which Side of the Road, Matching Pennies
• correlated equilibrium: the Battle of the Sexes