Introduction to Natural Computation
Lecture 10: Games
Peter Lewis

Overview of the Lecture
  - What is a game? What is the theory of games?
  - What are strategies for playing a game, and which strategies should we play?
  - Equilibria, dynamics and what happens when people play their best strategies.
  - Are the players really rational?
  - Repeated games.
  - Some applications of game theory.

All about games
What is a game?
  - Two people playing chess are playing a game.
  - Criminals and the police are playing a game.
  - Labour and the Conservatives are playing a game.
  - Stock market traders are playing a game.
  - A shopkeeper and his customers are playing a game.
  - Khrushchev and Kennedy played a very dangerous game.
A game is being played whenever two or more individuals interact.
Game theory is concerned with how the outcomes of these interactions relate to the individuals’ preferences and the structure of the game.

What is a game?
The origins of game theory
People have always tried to predict what the outcome of various game-like situations might be, and what they should do in order to achieve the outcomes they want.
But formal game theory was developed by John von Neumann and Oskar Morgenstern and described in their book Theory of Games and Economic Behavior.
Yes, that John von Neumann...

A formal game
So, what is a game? According to von Neumann and Morgenstern, a game can be completely described by:
  - the players of the game,
  - for every player, every opportunity they have to move,
  - what each player can do at each of their moves,
  - what each player knows for every move, and
  - the payoffs received by every player for every possible combination of moves.

Extensive form: usually used to denote sequential play games.
Normal (strategic) form: usually used to denote simultaneous play games.

Example of a game in normal form (each cell lists Player 1's payoff, then Player 2's payoff):

                      Player 2
                   Red        Blue
  Player 1  Red    1, 1       1, 2
            Blue   2, 1       0.5, 0.5

Strategies for playing a game
So how does an individual play the game? According to their strategy.
A player’s strategy is “a predetermined programme of play that tells her what actions to take in response to every possible strategy other players might use”.
It defines a player’s entire behaviour.

Example: Noughts and Crosses / Tic Tac Toe
As a player, in 3x3 Noughts and Crosses we have $9! = 362,880$ possible games (leaf nodes) to consider.
Discounting unreachable subtrees, we have 255,168 possible games remaining.
If our player is clever enough to understand rotations and reflections, we can reduce this to only 26,830!

Example: Chess
So in case you think we’re about to use game theory to solve chess...
There are in the order of $10^{123}$ possible unique games of chess for a complete strategy to consider.

Simpler games
Let’s think of some simpler games...

Prisoner’s Dilemma
Two people have been arrested for robbing a bank and are in separate isolation cells.
Both care much more about their own freedom than about the welfare of their accomplice.
They are each presented with a choice by the prosecutor:
“You may either confess or remain silent. If you confess and your accomplice remains silent I will drop all charges against you and use your testimony to ensure that your accomplice gets 10 years in prison. If your accomplice confesses and you remain silent, they will go free while you get the 10 years. If you both confess you both go to prison, but I’ll make sure that you both get out in 5. If you both remain silent, you’ll both get 1 year sentences on firearms possession charges.”
What should they do?

Payoffs in years of prison, written as negative numbers (each cell lists Person 1's payoff, then Person 2's payoff):

                             Person 2
                        Stay silent   Confess
  Person 1  Stay silent   -1, -1      -10, 0
            Confess        0, -10      -5, -5

The Prisoner’s Dilemma
More generally...
The payoff matrix for the two-player Prisoner’s Dilemma game (each cell lists Player 1's payoff, then Player 2's payoff):

                           Player 2
                      Cooperate   Defect
  Player 1  Cooperate   R, R       S, T
            Defect      T, S       P, P

The values S, P, R, T must satisfy $T > R > P > S$ and $R > (S + T)/2$.
From each prisoner’s perspective: If he were to cooperate, then I should defect and get away free.
And if he were to defect, then I should also defect, as I’d halve my sentence!
So for both prisoners, it is always rational to defect.
Defection is a dominant strategy. It is always the best strategy to play and neither player will deviate from it.
Mutual defection is a Nash equilibrium.

Nash equilibria
A Nash equilibrium, named after John Nash who proposed it, is:
a solution concept of a game involving two or more players, in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only his or her own strategy unilaterally.
“We cannot predict the result of the choices of multiple decision makers if we analyze those decisions in isolation. Instead, we must ask what each player would do, taking into account the decision-making of the others.”

Verifying whether a strategy profile is a Nash equilibrium
Each player asks themselves: “Knowing the strategies of the other players, and treating those strategies as set in stone, can I benefit by changing my strategy?”
If everyone answers no, it is a Nash equilibrium.
If anyone answers yes, it is not a Nash equilibrium.

The Final Problem
In The Final Problem, Professor Moriarty is pursuing Sherlock Holmes, who leaves from London on the train.
Holmes must choose whether to alight at Newhaven or Canterbury.
Moriarty must also choose a station at which to lie in wait, in order to catch Holmes.
[Figure: map showing London Victoria, Canterbury and Newhaven]
If they both select the same station, then Moriarty will kill Holmes and get away with his crimes.
If they choose different stations, then Holmes will escape, and provide evidence that will convict Moriarty.

The Final Problem
What might the payoff matrix look like?
Each cell lists Holmes's payoff, then Moriarty's payoff:

                           Moriarty
                     Canterbury   Newhaven
  Holmes  Canterbury   -1, 1        1, -1
          Newhaven      1, -1      -1, 1

It is clear that for both players neither choice dominates the other, and each player may as well flip a coin, accepting a 50/50 chance of winning.

Mixed strategies
Surely flipping a coin is not a useful strategy to follow? Well, it turns out that it is!
In fact, both players choosing either station with probability 0.5 is a Nash equilibrium.
This is an example of a mixed strategy, as opposed to pure strategies, which make a single choice with probability 1.
The equilibrium is known as a mixed strategy Nash equilibrium. Pure strategy Nash equilibria are a special case of these.
This particular game is known as Matching Pennies.

Mixed strategies
A mixed strategy defines a probability distribution over a set of possible actions. When the player plays, he chooses an action according to this distribution.

Mixed strategy Nash equilibria
A mixed strategy Nash equilibrium occurs when no player can improve his payoff by unilaterally changing his probability distribution.
Nash showed that for any game with a finite number of players and a finite set of actions, at least one mixed strategy Nash equilibrium must exist.

Another common example
The Driving Game
In the Driving Game two players are driving in opposite directions along the same road.
They have to choose on which side of the road to drive.
If they both choose the same side, they both get to their destinations.
If they choose opposite sides, they crash.
We assume that they don’t have enough notice to switch sides upon seeing the other car.

Possible payoff matrix (each cell is the payoff received by both players):

                       Player 1
                   Left          Right
  Player 2  Left   10            -1,000,000
            Right  -1,000,000    10

There are two obvious Nash equilibria in this game, which we can express as probability distributions over the strategy space (Left, Right):
  (1.0, 0.0) and (1.0, 0.0)
  (0.0, 1.0) and (0.0, 1.0)
But there’s also one more:
  (0.5, 0.5) and (0.5, 0.5)

Stupid questions?
Possible payoff matrix for the Driving Game (each cell is the payoff received by both players):

                       Player 1
                   Left          Right
  Player 2  Left   10            -1,000,000
            Right  -1,000,000    10

Three Nash equilibria:
  (1.0, 0.0) and (1.0, 0.0)
  (0.0, 1.0) and (0.0, 1.0)
  (0.5, 0.5) and (0.5, 0.5)
So why don’t we see the third one in real life?
Well, at the third Nash equilibrium, the expected payoffs are not good: each player crashes with probability 0.5.
Not all Nash equilibria are the same; some may appear non-rational from an external perspective.
A Nash equilibrium is not guaranteed to be Pareto optimal.
Cooperative or multilateral decision making can allow players to move from one Nash equilibrium to another one.
Other forces, capable of changing multiple players’ strategies at the same time, can achieve the same thing. This might be a move to a worse equilibrium.

On rationality
Rock Paper Scissors
Rock Paper Scissors can be thought of as a three choice extension of Matching Pennies.
There is no dominant pure strategy.
The Nash equilibrium is when both players’ strategies are $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$.
So why all the fuss? Tim Conrad won $7,000 for winning the Rock Paper Scissors World Championships!
Humans are not particularly good at keeping to the equilibrium strategy.
Any deviation on the part of your opponent can be exploited.
Psychology plays a large part.
In many games, humans are not rational anyway.

On Rationality
The Ultimatum Game
Two players interact to decide how to divide a sum of money that is given to them by a third party.
First, player 1 proposes how to divide the sum between the two players.
Subsequently, player 2 can either accept or reject this proposal.
If the second player accepts, the money is split according to the proposal.
If the second player rejects, neither player receives anything.
The game is played only once so that reciprocation is not an issue.

What should the players do?
Rationally, the second player should accept any proposal which offers her a positive amount of money.
Even if the proposal is to offer nothing, at this point the second player is only indifferent.
So, the first player has nothing to lose by offering the smallest amount which convinces the second player to accept.
An offer of the smallest possible division of the money is the Nash equilibrium.

On Rationality
In reality
Would you really dare to offer 1p out of £100 to the second player?
In one set of experiments, 43% of those playing first offered an even split.
Those playing second rejected, on average, offers falling below $\frac{1}{4}$ of the amount.
Over half of the offers below $\frac{1}{3}$ of the total were rejected!

Why?
  - Is Homo economicus, the model of humans as rational economic actors, false?
  - Does the payoff not take into account a positive psychological reaction to offering more money?
  - Are the second players attempting to punish those going first?
  - Does it illustrate a human unwillingness to accept injustice and social inequality?
  - Are empathy and perspective driving the generosity?
  - Is this kin selection in action?

Experimentally...
Externally administered oxytocin, used to increase levels of emotion in the subject, increased generous offers by 80% relative to a placebo, though it did not affect the minimum acceptance threshold.

Repeated Games
One shot and repeated games
So far, we have only considered games which are played once. These are known as one shot games.
In many situations, of course, we interact multiple times with the same individual.
When a game is played multiple times by the same players, it is called a repeated game.
The payoff in a repeated game is just the sum of the payoffs from each round.

Iterated Prisoner’s Dilemma
A very commonly studied repeated game is the Iterated Prisoner’s Dilemma.
Recall the Prisoner’s Dilemma payoff matrix (each cell lists Player 1's payoff, then Player 2's payoff):

                           Player 2
                      Cooperate   Defect
  Player 1  Cooperate   R, R       S, T
            Defect      T, S       P, P

The values S, P, R, T must satisfy $T > R > P > S$ and $R > (S + T)/2$.
The Nash equilibrium in a one shot game was (defect, defect).
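The verification procedure described earlier (each player asks whether a unilateral change of strategy would help) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `is_nash` and `pure_nash_equilibria` and the encoding of actions as 0 and 1 are assumptions made here, not part of the lecture. The payoffs are the prison sentences from the prosecutor's offer and the Driving Game matrix.

```python
from itertools import product

# Prisoner's Dilemma payoffs from the prosecutor's offer (years in prison, negated).
# Assumed encoding: action 0 = cooperate (stay silent), action 1 = defect (confess).
R, T, S, P = -1, 0, -10, -5

# The lecture's conditions on the payoff values.
assert T > R > P > S and R > (S + T) / 2

# Maps (player 1 action, player 2 action) -> (player 1 payoff, player 2 payoff).
prisoners = {(0, 0): (R, R), (0, 1): (S, T), (1, 0): (T, S), (1, 1): (P, P)}

# Driving Game: action 0 = Left, action 1 = Right.
CRASH = -1_000_000
driving = {(0, 0): (10, 10), (0, 1): (CRASH, CRASH),
           (1, 0): (CRASH, CRASH), (1, 1): (10, 10)}


def is_nash(profile, payoffs, n_actions=2):
    """True if neither player can gain by changing only their own action."""
    for player in (0, 1):
        current = payoffs[profile][player]
        for alternative in range(n_actions):
            deviated = list(profile)
            deviated[player] = alternative
            if payoffs[tuple(deviated)][player] > current:
                return False
    return True


def pure_nash_equilibria(payoffs, n_actions=2):
    """Enumerate all pure strategy profiles and keep the Nash equilibria."""
    profiles = product(range(n_actions), repeat=2)
    return [p for p in profiles if is_nash(p, payoffs, n_actions)]


print(pure_nash_equilibria(prisoners))  # [(1, 1)]: mutual defection
print(pure_nash_equilibria(driving))    # [(0, 0), (1, 1)]: both Left or both Right
```

Note that this enumeration only finds pure strategy equilibria, so it does not report the (0.5, 0.5) mixed equilibrium of the Driving Game.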
What happens if we play n games with the same opponent?

Iterated Prisoner’s Dilemma
Each cell lists Player 1's payoff, then Player 2's payoff:

                           Player 2
                      Cooperate   Defect
  Player 1  Cooperate   R, R       S, T
            Defect      T, S       P, P

The values S, P, R, T must satisfy $T > R > P > S$ and $R > (S + T)/2$.
If we both defect for n rounds, then we each get a payoff of $nP$.
But if we were to both cooperate, we’d get $nR$ each!
Except I think I might defect in the nth round, since then I’d get $(n - 1)R + T$.
But my opponent will think the same thing, so we’ll both be left with $(n - 1)R + P$.
But I can’t cooperate in the last round, since if he defects I’ll only get $(n - 1)R + S$, so I must defect in the nth round.
The problem is that this logic can now be applied to the (n − 1)th round, and the (n − 2)th round, and so on...
Until we’re left defecting in round 1 and throughout the whole game.

Learning to play games
Okay, so the Nash equilibrium is for both players to always defect.
But the IPD models real world scenarios where people do cooperate! And their payoffs are higher as a result.
Is this human irrationality again?
Mutual defection throughout the game is the Nash equilibrium, but it is not a dominant strategy, i.e. it is not the best response to every other strategy that your opponent could play.
Furthermore, for the Iterated Prisoner’s Dilemma there is no single best strategy against all possible opponents.
Can we develop strategies which can perform well against a good range of other strategies?
We could search the strategy space... but it’s very big! For an n round game there are $2^{2^n - 1}$ possible strategies!

Learning game playing strategies
Strategies can be encoded in many ways: neural networks, bitstrings, finite state machines etc.
Learning is typically done through the (co)evolution of a population of strategies.
Surprisingly, strategies can emerge which bring about cooperation through the threat of retaliation if the opponent defects (e.g. tit for tat).

So why is all this interesting?
Game theory has been used to try to understand:
  - pricing and the formation of cartels in business,
  - why people vote in certain ways,
  - evolutionary dynamics in populations of animals,
  - how to maintain biodiversity,
  - why humans appear to behave altruistically,
  - how to win competitions of Rock Paper Scissors,
  - bacterial strain diversity,
  - how to allocate resources in computer networks,
  - why countries spend billions on nuclear weapons and (almost) never use them.
And it’s also the basis of a lot of economics-inspired computation.

Summary
We have learnt:
  - What formal games are and how they can be described. We looked at several examples, including particularly the Prisoner’s Dilemma.
  - That pure and mixed strategies define how a player plays a game. Some games have a dominant strategy, i.e. one which is always best.
  - How Nash equilibria describe a certain type of “solution” for a game, where no player can unilaterally improve his payoff.
  - That there are pure strategy Nash equilibria and mixed strategy Nash equilibria, and that all games have at least one (mixed strategy) Nash equilibrium.
  - That Nash equilibria can lead to either good or bad outcomes for the players!
  - That in many (especially repeated) games, such as the Iterated Prisoner’s Dilemma, there is often no dominant strategy.
  - That in these cases, we can learn high performing strategies.

Further reading
  - Stanford Encyclopedia of Philosophy. Game Theory; 2010. http://plato.stanford.edu/entries/game-theory/
  - Binmore K. Game Theory: A Very Short Introduction. Oxford University Press; 2007.
  - Kendall G, Yao X, Chong SY. The Iterated Prisoners’ Dilemma: 20 Years On. World Scientific; 2007.
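As a closing illustration of the Iterated Prisoner’s Dilemma discussed in this lecture, the following minimal sketch plays a few hand-written strategies against each other and sums the payoffs over the rounds. The payoff values reuse the prison sentences from the prosecutor's offer; the function names and the representation of a strategy as a function of the two move histories are assumptions made for this example, and no learning or evolution is involved.

```python
# Illustrative payoffs satisfying T > R > P > S and R > (S + T) / 2.
R, T, S, P = -1, 0, -10, -5
C, D = "C", "D"

# Maps (my move, opponent's move) -> (my payoff, opponent's payoff).
PAYOFF = {(C, C): (R, R), (C, D): (S, T), (D, C): (T, S), (D, D): (P, P)}


def always_defect(own_history, opponent_history):
    """Defect in every round, regardless of history."""
    return D


def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else C


def play(strategy_a, strategy_b, n_rounds):
    """Play a repeated game and return the summed payoffs of both players."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(n_rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        total_a += payoff_a
        total_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a, total_b


n = 10
print(play(always_defect, always_defect, n))  # (-50, -50), i.e. nP each
print(play(tit_for_tat, tit_for_tat, n))      # (-10, -10), i.e. nR each
print(play(tit_for_tat, always_defect, n))    # (-55, -45)
```

In this sketch, always defecting against tit for tat earns T + (n − 1)P = −45, which is worse than the nR = −10 that two tit for tat players each receive, illustrating how the threat of retaliation can make cooperation pay.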