Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Lecture 7: Game theory David Aldous February 24, 2016 STAT 155 is an entire course on Game Theory. In this lecture we illustrate Game Theory by first focusing on one particular game for which we can get data. The game is relevant to one of the central ideas of game theory, Does the data – how people actually play the game – correspond roughly to what theory says? We will do some math calculations “because we can” – more details in write-up. Continuing to analyze the data, or doing a simulation study of more complex strategies, would be a nice course project. Also, finding and studying some other observable online game-theoretic game wold be a good project. There are many introductory textbooks and less technical accounts of game theory – see write-up. Here is a 1-slide overview. 1 2 Setting: players each separately choose from a menu of actions, and get a payoff depending (in a known way) on all players’ actions. Rock-paper-scissors illustrates that one should use a randomized strategy, and so we assume a player’s goal is to maximize their expected payoff. There is a complete theory of such two-person zero-sum games. 3 For other games, a fundamental concept is Nash equilibrium strategy: one such that, if all other players play that strategy, then you cannot do better by choosing some other strategy. This concept is motivated by the idea that, if players adjust their strategies in a selfish way, then strategies will typically converge to some Nash equilibrium. 4 More advanced theory is often devoted to settings where Nash equilibria are undesirable in some sense, as with Prisoners’ Dilemma, and to understanding why human behavior is not always selfish. A (slightly simplified) math description of the actual game we shall study. There are 5 items of somewhat different known values, say {8, 7, 6, 5, 4} dollars. There are 10 players. A player can make a sealed bid for (only) one item, during a window of time. During the time window, players see how many bids have already been placed on each item, but do not see the bid amounts. When time expires each item is awarded to the highest bidder on that item. We assume players are seeking to maximize their expected gain. So a player has to decide three things; when to bid, which item to bid on, and how much to bid. The game is called Dice City Roller on pogo.com. [show DCR in progress] We get data by screenshots at 14, 5 and 0 seconds before deadline, and then after winning bids are shown. Who are the actual players? [show profiles] We study math for simplified version without the time window – each player just places a sealed bid without knowledge of other players actions. In this case we can calculate the Nash equilibrium strategy explicitly – for any number of players, and any number and values of items. Start with the simplest setting: 2 players, 2 items of values 1 and b, where 0 < b ≤ 1. We have a little data from playing this in Lecture 1. [show data] [show bidding-analysis] and the winner of the Lecture 1 game is . . . . . . [2014] 24/35 students bid on the $1, 11/35 bid on the 50c bids on $1 bids on 50c 0 25 41 42 45 48 49 49 49 49 49 49 50 50 50 50 51 55 67 75 75 80 81 100 0 0 1 10 25 26 32 37 40 45 49 Consider a person P who bid 49. When we match with a random other person: chance 12/34 P gains 0 (other person bid more on the $1) chance 17/34 P gains 51 (= 100 - 49) (P won the bid) chance 5/34 we do coin-toss to decide winner: P gains 51/2. So P's expected gain = 29.25 ______________________________________________________________________ [2016] 18/35 students bid on the $1, 17/35 bid on the 50c bids on $1 bids on 50c 20 50 50 50 50 50 62 65 70 70 70 74 75 75 76 83 95 99 1 1 2 5 10 30 30 38 40 40 45 45 48 48 48 50 50 I will develop some math theory, first in this “two players, two items” setting. My point is to show that it’s not terribly complicated. See write-up for more theory. A player’s strategy is a pair of functions (F1 , Fb ): F1 (x) = P( bid an amount ≤ x on the first item), 0≤x ≤1 (1) Fb (y ) = P( bid an amount ≤ y on the second item), 0≤y ≤b (2) where F1 (1) + Fb (b) = 1. We can equivalently work with the associated densities f1 (x) = F10 (x), fb (y ) = Fb0 (y ). Suppose your opponent’s strategy is some function (f1 , fb ) and your strategy is some function (g1 , gb ). What is the formula for your expected gain? [do on board] (3) opponent’s strategy is (f1 , fb ), your strategy is (g1 , gb ). Your expected gain is Z 1 Z (1−x)g1 (x)[F1 (x)+Fb (b)] dx+ 0 b (b−y )gb (y )[Fb (y )+F1 (1)] dy . (4) 0 We need an obvious fact [picture on board]. Given a payoff function h(x) ≥ 0 with h∗ = maxx h(x), consider the R expected payoff h(x)g (x)dx when we choose x according to a probability density g . Then we get the maximum expected payoff if and only if h(x) = c for all x ∈ support(g ) h(x) ≤ c for all x 6∈ support(g ) for some c (which is in fact h∗ ). Now our expected gain (4) is of this form, thinking of (g1 , gb ) as a single probability density function. Apply the “obvious fact”: So given your opponent’s strategy (f1 , fb ), your expected gain is maximized by choosing a strategy (g1 , gb ) satisfying, for some constant c (1 − x)[F1 (x) + Fb (b)] = c on support(g1 ) ≤ c off support(g1 ) (b − y )[Fb (y ) + F1 (1)] = c on support(gb ) ≤ c off support(gb ) Now the definition of (f1 , fb ) being a Nash equilibrium strategy is precisely the assertion that (5 - 8) hold for (g1 , gb ) = (f1 , fb ). So now we have a set of equations for the NE strategy. (1 − x)[F1 (x) + Fb (b)] (b − y )[Fb (y ) + F1 (1)] = c on support(f1 ) (5) ≤ c off support(f1 ) (6) = c on support(fb ) (7) ≤ c off support(fb ) (8) with “boundary conditions” F1 (0) = Fb (0) = 0; F1 (1) + Fb (b) = 1. Note that in any game we can do some similar argument to get equations that a NE must satisfy. STAT 155, like most game theory, focusses on a discrete menu of actions – our example is continuous. Theory talks about existence and uniqueness of solutions, for general games. We can just go ahead and solve these particular equations. The write-up shows how to solve “as math” without thinking about the game interpretation. The answer appears as F1 (x) Fb (y ) = = b 1 1+b ( 1−x 1 b 1+b ( b−y − 1) on 0 ≤ x ≤ − 1) on 0 ≤ y ≤ 1 1+b 2 b 1+b . (9) (10) The corresponding densities are f1 (x) fb (y ) = b 1+b (1 = b 1+b (b − x)−2 on 0 ≤ x ≤ − y) −2 on 0 ≤ y ≤ [show figure] The expected gain for each player works out as E[gain] = b . 1+b 1 1+b (11) b2 1+b . (12) An important general principle If opponents play the NE strategy then any non-random choice of action you make in the support of the NE strategy will give you the same expected gain (which equals the expected gain if you play the random NE strategy), and any other choice will give you smaller expected gain. This “constant expected gain” principle is true because the NE expected gain is an average gain over the different choices in its support; if these gains were not constant then one would be larger than the NE gain, contradicting the definition of NE. In our game, if you bid x on item 1, where x is in the support 1 , then your chance of winning is (by calculation) 0 ≤ x ≤ 1+b b b −1 −1 = 1+b (1 − x) , so your expected gain is (1 − x) × 1+b (1 − x) the general principle says. b 1+b Later we will use the general principle to calculate the NE for arbitrary numbers of players and prizes. as Note that the gap between your maximum bid and the item’s value is the same for both items; 1 − 1/(1 + b) = b − b 2 /(1 + b) = b/(1 + b). This follows from the “constant expected gain” principle above; if you bid the maximum value in the support the you are certain to win the item, so your gain must be the same for both items. The same “equal gap principle” works by the same argument for general numbers of players and items (but is special to our particular game). [show figure – how close is data to theory?] Consider the general case of N ≥ 2 players and M ≥ 2 items of values b1 ≥ b2 ≥ . . . ≥ bM > 0. The bottom line (with a side condition I’ll explain) is the formula E ( gain to a player at NE) = c = !N−1 M −1 P i −1/(N−1) bi and the NE strategy is defined by the density functions fi (x) = 1 M −1 −N/(N−1) , P −1/(N−1) (bi − x) N −1 b j j 0 ≤ x ≤ bi − c for bids on prize i. The next slide shows the main steps in the calculation. (13) Writing out the expression for the expected gain when you bid xi on the i’th item, the “constant expected gain” property says (bi − x) (1 − (Fi (xi∗ ) − Fi (x)))N−1 = c, 0 ≤ x ≤ xi∗ := bi − c (14) where c = expected gain to a player P at NE. Because a strategy is a probability distribution we have i Fi (xi∗ ) = 1 and so X (1 − Fi (xi∗ )) = M − 1. i Now using (14) with x = 0 we have 1 − Fi (xi∗ ) = (c/bi )1/(N−1) and so X i identifying c. (c/bi )1/(N−1) = M − 1 (15) The data from playing the game once doesn’t fit the NE theory very well, but there’s no reason it should fit. Recall our comment For other games, a fundamental concept is Nash equilibrium strategy: one such that, if all other players play that strategy, then you cannot do better by choosing some other strategy. This concept is motivated by the idea that, if players adjust their strategies in a selfish way, then strategies will typically converge to some Nash equilibrium. The idea is that people “learn” after playing many times. In principle one could learn by recording your action and other players’ actions in past games, and then use whatever strategy (probability distribution over actions) would have worked best (for you) in the past. Eventually you will reach an “equilibrium” in the sense of not changing strategy any longer. [show class data and graph again] In the actual Dice City Roller game people do play many times. A little data analysis for the actual DCR game There’s a complication: win amounts are actually random. Theory says only EVs should matter, and one expects players would “learn by experience” the EVs. In fact one can calculate them – thanks YuFan Hu. [show screenshot] [explain 150 match on board] [show Picture-150-match] [show Picture-250-match] [explain 180 scratch on board] [show Picture-180-scratch] [show Picture-300-scratch] project; repeat math analysis taking into account minimum allowed bid. I don’t want to spend all this lecture on the specifics of this game, but the “time window” aspect makes it more complicated and more (mathematically) challenging. [say in words] project: can you find another “observable” online game with a clear “game theory” component? Conceptual digression. Prisoner’s dilemmas reminds us that a NE is usually not a social optimum strategy. Here’s another reminder, from Scott Aaronson’s blog. Why are even some affluent parts of the world running out of fresh water? Because if they weren’t, they’d keep watering their lawns until they were. Why does it cost so much to buy something to wear to a wedding? Because if it didn’t, the fashion industry would invent more extravagant “requirements” until it reached the limit of what people could afford. Again and again, I’ve undergone the humbling experience of first lamenting how badly something sucks, then only much later having the crucial insight that its not sucking wouldn’t have been a Nash equilibrium. Our other example is the Least Unique Positive Integer game. Each of N players chooses a number from 1, 2, 3, . . .. The winner is the person who chooses the smallest number that no-one else chooses. Quick to play with 5 - 40 people – need only pen and paper – organizer calls out to find winner. There might be no winner, but unlikely for large N. Consider random strategy p = (p1 , p2 , . . .). This is another game where we expect a unique NE and we could study data to see if real-world players adopt roughly the NE. I outline [next slide] an easy approximate analysis of the NE, for reasonably large N. We expect the support of the NE to be 1 ≤ i ≤ K for some K depending on N. If other players use p then Xi = number others choosing i ≈ Poisson(λi = (N − 1)pi ) The “constant expected gain” principle says that, whatever your choice of i in the support [1, K ], your chance of winning is c ≈ N −1 . Choosing i, you win if no-one else chooses i and there is no unique chooser of any j < i, giving approximate equations P(Xi = 0, Xj 6= 1 ∀j < i) = 1/N, 1 ≤ i ≤ K . For the left side there is a (complicated) formula in terms of p. Can solve numerically. In particular, for i = 1 we see exp(−(N − 1)p1 ) ≈ 1/N and so p1 ≈ Then pi decreases, slowly at first. log N N−1 . This game was played on a large scale, for large prizes, in Sweden, 7 times in 2007. Around 50,000 players. Analysis in paper Testing game theory in the field: Swedish LUPI lottery games. [show ostling-figure, next slide] Stopped after 7 weeks because it’s possible to “cheat” with a coalition of players. [explain how]