Introduction to Natural Computation
Lecture 10
Games
Peter Lewis
Overview of the Lecture
What is a game?
What is the theory of games?
What are strategies for playing a game and which strategies should we play?
Equilibria, dynamics and what happens when people play their best strategies.
Are the players really rational?
Repeated games.
Some applications of game theory.
All about games
What is a game?
Two people playing chess are playing a game.
Criminals and the police are playing a game.
Labour and the Conservatives are playing a game.
Stock market traders are playing a game.
A shopkeeper and his customers are playing a game.
Khrushchev and Kennedy played a very dangerous game.
A game is being played whenever two or more individuals interact. Game theory is
concerned with how the outcomes of these interactions relate to the individuals’
preferences and the structure of the game.
What is a game?
The origins of game theory
People have always tried to predict what the outcome of various game-like situations
might be, and what they should do in order to achieve the outcomes they prefer.
But formal game theory was developed by John von Neumann and Oskar Morgenstern and described in their book Theory of Games and Economic Behavior.
Yes, that John von Neumann...
A formal game
So, what is a game?
According to von Neumann and Morgenstern, a game can be completely described by:
The players of a game,
For every player, every opportunity they have to move,
What each player can do at each of their moves,
What each player knows for every move, and
The payoffs received by every player for every possible combination of moves.
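Extensive form
Usually used to denote sequential play games.
Normal (strategic) form
Usually used to denote simultaneous play games.
Example payoffs for Player 1 and Player 2, who each choose Red or Blue:
Both choose Red: 1, 1
Both choose Blue: 0.5, 0.5
One chooses Red and the other Blue: 1, 2 or 2, 1, depending on who chooses which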
Strategies for playing a game
So how does an individual play the game?
According to their strategy.
A player’s strategy is “a predetermined programme of play that tells her what
actions to take in response to every possible strategy other players might use”.
Defines a player’s entire behaviour.
Example: Noughts and Crosses / Tic Tac Toe
As a player, in 3x3 Noughts and Crosses we have 9! = 362,880 possible games (leaf nodes) to consider.
Discounting unreachable subtrees, we have 255,168 possible games remaining.
If our player is clever enough to understand rotations and reflections, we can reduce this to only 26,830!
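The 255,168 figure can be checked by brute force. The sketch below is not part of the lecture, and the function names are illustrative. It enumerates every legal game, ending a game as soon as a player completes a line or the board is full.

```python
# All eight winning lines on a 3x3 board, squares numbered 0..8 row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def has_won(board, player):
    """True if `player` occupies every square of some winning line."""
    return any(all(board[i] == player for i in line) for line in WIN_LINES)

def count_games(board=None, player="X"):
    """Count complete games reachable from `board` with `player` to move."""
    if board is None:
        board = [None] * 9
    total = 0
    for square in range(9):
        if board[square] is not None:
            continue
        board[square] = player
        if has_won(board, player) or all(c is not None for c in board):
            total += 1  # the game ends here: a win or a full board
        else:
            total += count_games(board, "O" if player == "X" else "X")
        board[square] = None  # undo the move before trying the next square
    return total

if __name__ == "__main__":
    print(count_games())  # 255168
```

Even for this tiny game the search visits several hundred thousand positions, which hints at why the same approach is hopeless for chess.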
Example: Chess
So in case you think we’re about to use game theory to solve chess... There are on the
order of 10^123 possible unique games of chess for a complete strategy to consider.
Simpler games
Let’s think of some simpler games...
Prisoner’s Dilemma
Two people have been arrested for robbing a bank and are in separate isolation cells. Both care much more about their own freedom than about the welfare of their accomplice. They are each presented with a choice by the prosecutor:
“You may either confess or remain silent. If you confess and your accomplice remains silent I will drop all charges against you and use your testimony to ensure that your accomplice gets 10 years in prison. If your accomplice confesses and you remain silent, they will go free while you get the 10 years. If you both confess you both go to prison, but I’ll make sure that you both get out in 5. If you both remain silent, you’ll both get 1 year sentences on firearms possession charges.”
What should they do?
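Payoffs are years in prison, negated, listed as (Person 1, Person 2):
                        Person 2: Stay silent   Person 2: Confess
Person 1: Stay silent   -1, -1                  -10, 0
Person 1: Confess       0, -10                  -5, -5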
The Prisoner’s Dilemma
More generally...
The payoff matrix for the two-player prisoner’s dilemma game, with payoffs listed as (Player 1, Player 2):
                      Player 2: Cooperate   Player 2: Defect
Player 1: Cooperate   R, R                  S, T
Player 1: Defect      T, S                  P, P
The values S, P, R, T must satisfy T > R > P > S and R > (S + T)/2.
From each prisoner’s perspective:
If he were to cooperate, then I should defect and get away free.
And if he were to defect, then I should also defect, as I’d halve my sentence!
So for both prisoners, it is always rational to defect.
Defection is a dominant strategy. It is always the best strategy to play and neither
player will deviate from it. Mutual defection is a Nash equilibrium.
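As a quick check of the dominance argument, here is a minimal sketch using the prosecutor's numbers, with staying silent read as "cooperate" and confessing as "defect". The names are illustrative and not from the lecture.

```python
# Payoffs to "me", keyed by (my action, opponent's action).
# Values are years in prison, negated so that larger is better.
T, R, P, S = 0, -1, -5, -10
ACTIONS = ("cooperate", "defect")
PAYOFF = {
    ("cooperate", "cooperate"): R,
    ("cooperate", "defect"): S,
    ("defect", "cooperate"): T,
    ("defect", "defect"): P,
}

def is_prisoners_dilemma(t, r, p, s):
    """Check the two conditions on the payoff values."""
    return t > r > p > s and r > (s + t) / 2

def strictly_dominates(a, b):
    """True if action a pays strictly more than b whatever the opponent does."""
    return all(PAYOFF[(a, other)] > PAYOFF[(b, other)] for other in ACTIONS)

if __name__ == "__main__":
    print(is_prisoners_dilemma(T, R, P, S))           # True
    print(strictly_dominates("defect", "cooperate"))  # True
    print(strictly_dominates("cooperate", "defect"))  # False
```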
Nash equilibria
A Nash equilibrium, named after John Nash who
proposed it, is:
a solution concept of a game involving two or
more players,
in which each player is assumed to know the
equilibrium strategies of the other players,
and no player has anything to gain by
changing only his or her own strategy
unilaterally.
“We cannot predict the result of the choices of
multiple decision makers if we analyze those
decisions in isolation. Instead, we must ask what
each player would do, taking into account the
decision-making of the others.”
Verifying whether a strategy profile is a Nash equilibrium
Each player asks themselves “knowing the strategies of the other players, and treating
those strategies as set in stone, can I benefit by changing my strategy?”
If everyone answers no, it is a Nash equilibrium.
If anyone answers yes, it is not a Nash equilibrium.
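For pure strategies in a two-player game, this check can be written directly. The sketch below is one possible way to do it, and the function name and dictionary layout are assumptions made for illustration. Every profile is tested by asking whether either player could gain from a unilateral change.

```python
from itertools import product

def pure_nash_equilibria(payoffs, actions):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs[(a1, a2)] is a pair (payoff to player 1, payoff to player 2).
    """
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        # Hold the other player's action fixed and look for a better reply.
        p1_can_gain = any(payoffs[(d, a2)][0] > u1 for d in actions)
        p2_can_gain = any(payoffs[(a1, d)][1] > u2 for d in actions)
        if not p1_can_gain and not p2_can_gain:
            equilibria.append((a1, a2))
    return equilibria

# The prosecutor's offer, as (years for person 1, years for person 2), negated.
prisoners_dilemma = {
    ("silent", "silent"): (-1, -1),
    ("silent", "confess"): (-10, 0),
    ("confess", "silent"): (0, -10),
    ("confess", "confess"): (-5, -5),
}

if __name__ == "__main__":
    print(pure_nash_equilibria(prisoners_dilemma, ("silent", "confess")))
    # [('confess', 'confess')]
```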
The Final Problem
In The Final Problem, Professor Moriarty is pursuing Sherlock Holmes, who leaves London on the train.
Holmes must choose whether to alight at Newhaven or Canterbury.
Moriarty must also choose a station at which to lie in wait in order to catch Holmes.
[Figure: map showing London Victoria, Canterbury and Newhaven]
If they both select the same station, then Moriarty will kill Holmes and get away with
his crimes. If they choose different stations, then Holmes will escape, and provide
evidence that will convict Moriarty.
What might the payoff matrix look like?
Payoffs listed as (Holmes, Moriarty):
                     Moriarty: Canterbury   Moriarty: Newhaven
Holmes: Canterbury   -1, 1                  1, -1
Holmes: Newhaven     1, -1                  -1, 1
It is clear that for both players neither choice dominates the other, and each player
may as well flip a coin, accepting a 50 / 50 chance of winning.
Mixed strategies
Surely flipping a coin is not a useful strategy to follow?
Well, it turns out that it is!
In fact, both players choosing either station with probability 0.5
is a Nash equilibrium.
This is an example of a mixed strategy, as opposed to pure
strategies which make a single choice with probability 1.
The equilibrium is known as a mixed strategy Nash equilibrium.
Pure strategy Nash equilibria are a special case of these.
This particular game is known as Matching Pennies.
Mixed strategies
A mixed strategy defines a probability distribution over a set of possible actions.
When the player plays, he chooses an action according to this distribution.
Mixed strategy Nash equilibria
When no player can improve his payoff by unilaterally changing his probability
distribution.
Nash showed that for any game with a finite number of players and a finite set of actions,
at least one mixed strategy Nash equilibrium must exist.
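To see why the coin flip is an equilibrium in the Holmes and Moriarty game, it helps to compute expected payoffs. The sketch below uses illustrative names, and each mixed strategy is represented simply by the probability of choosing Canterbury. It shows that against a 50/50 opponent every choice does equally well, while any other mix can be exploited.

```python
STATIONS = ("Canterbury", "Newhaven")

def holmes_payoff(holmes, moriarty):
    """Holmes loses (-1) if caught at the same station, otherwise wins (+1)."""
    return -1 if holmes == moriarty else 1

def expected_holmes_payoff(p_holmes, p_moriarty):
    """Expected payoff to Holmes; each argument is P(choose Canterbury).

    The game is zero-sum, so Moriarty's expected payoff is the negative.
    """
    total = 0.0
    for h, ph in zip(STATIONS, (p_holmes, 1 - p_holmes)):
        for m, pm in zip(STATIONS, (p_moriarty, 1 - p_moriarty)):
            total += ph * pm * holmes_payoff(h, m)
    return total

if __name__ == "__main__":
    # Against a coin-flipping Moriarty, Holmes cannot do better than 0.
    for p in (0.0, 0.5, 1.0):
        print(p, expected_holmes_payoff(p, 0.5))
    # If Holmes favours Canterbury (0.7), Moriarty waits there and gains:
    # Holmes's expected payoff is about -0.4.
    print(expected_holmes_payoff(0.7, 1.0))
```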
Another common example
The Driving Game
In the Driving Game two players are driving in opposite
directions along the same road.
They have to choose on which side of the road to drive.
If they both choose the same side, they both get to
their destinations.
If they choose opposite sides, they crash.
We assume that they don’t have enough notice to
switch sides upon seeing the other car.
Possible payoff matrix (both players receive the same payoff in each cell)
                  Player 1: Left   Player 1: Right
Player 2: Left    10               -1,000,000
Player 2: Right   -1,000,000       10
There are two obvious Nash equilibria in this game, which we can express as probability distributions over the strategy space (Left, Right):
(1.0, 0.0) and (1.0, 0.0)
(0.0, 1.0) and (0.0, 1.0)
But there’s also one more: (0.5, 0.5) and (0.5, 0.5).
Stupid questions?
So why don’t we see the third one in real life?
Well, at the third Nash equilibrium, the expected payoffs are not good!
Not all Nash equilibria are the same; some may appear non-rational from an
external perspective.
A Nash equilibrium is not guaranteed to be Pareto optimal.
Cooperative or multilateral decision making can allow players to move from one
Nash equilibrium to another one.
Other forces, capable of changing multiple players’ strategies at the same time,
can achieve the same thing.
This might be a move to a worse equilibrium.
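A short calculation shows how bad the third equilibrium of the Driving Game is. This is a sketch with illustrative names, using the payoffs 10 and -1,000,000 from the matrix and representing each mixed strategy by the probability of driving on the left.

```python
SAFE, CRASH = 10, -1_000_000

def expected_payoff(p_me_left, p_other_left):
    """Expected payoff to either driver; arguments are P(drive on the left)."""
    same_side = (p_me_left * p_other_left
                 + (1 - p_me_left) * (1 - p_other_left))
    return same_side * SAFE + (1 - same_side) * CRASH

if __name__ == "__main__":
    # Against a coin-flipping driver, Left, Right and any mix all pay the same,
    # so nobody can gain by deviating: (0.5, 0.5) is an equilibrium.
    print(expected_payoff(1.0, 0.5))  # -499995.0
    print(expected_payoff(0.0, 0.5))  # -499995.0
    print(expected_payoff(0.5, 0.5))  # -499995.0
    # Compare with either pure equilibrium.
    print(expected_payoff(1.0, 1.0))  # 10.0
```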
On rationality
Rock Paper Scissors
Rock Paper Scissors can be thought of as a
three choice extension of Matching Pennies.
There is no dominant pure strategy.
The Nash equilibrium is when both players’
strategies are (1/3, 1/3, 1/3).
So why all the fuss?
Tim Conrad won $7,000 for winning the Rock
Paper Scissors World Championships!
Humans are not particularly good at keeping
to the equilibrium strategy.
Any deviation on the part of your opponent
can be exploited.
Psychology plays a large part.
In many games, humans are not rational anyway.
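The claim that any deviation can be exploited can be made concrete. The sketch below uses illustrative names, with payoffs of +1 for a win, 0 for a draw and -1 for a loss assumed for the example. It computes the expected payoff of each move against an opponent's mix over (rock, paper, scissors).

```python
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(mine, theirs):
    """+1 for a win, 0 for a draw, -1 for a loss (assumed values)."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def expected_payoffs(opponent_mix):
    """Expected payoff of each pure move against the opponent's distribution."""
    return {m: sum(p * payoff(m, o) for o, p in zip(MOVES, opponent_mix))
            for m in MOVES}

def best_response(opponent_mix):
    """A pure move with the highest expected payoff."""
    values = expected_payoffs(opponent_mix)
    return max(values, key=values.get)

if __name__ == "__main__":
    # At the equilibrium mix, every move has expected payoff 0.
    print(expected_payoffs((1 / 3, 1 / 3, 1 / 3)))
    # An opponent who plays rock too often is beaten by always playing paper.
    print(best_response((0.5, 0.25, 0.25)))             # paper
    print(expected_payoffs((0.5, 0.25, 0.25))["paper"]) # 0.25
```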
On Rationality
The Ultimatum Game
Two players interact to decide how to divide a sum of money that is given to
them by a third party.
First, player 1 proposes how to divide the sum between the two players.
Subsequently, player 2 can either accept or reject this proposal.
If the second player accepts, the money is split according to the proposal.
If the second player rejects, neither player receives anything.
The game is played only once so that reciprocation is not an issue.
What should the players do?
Rationally, the second player should accept any proposal which offers her a
positive amount of money.
Even if the proposal is to offer nothing, at this point the second player is only
indifferent.
So, the first player has nothing to lose by offering the smallest amount which
convinces the second player to accept.
An offer of the smallest possible division of the money is the Nash equilibrium.
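The reasoning above is backward induction: work out what the second player will do, then choose the first player's best proposal. A minimal sketch follows, assuming the sum is divided in whole pence and that an indifferent responder (offered nothing) rejects. Both are assumptions made for the example, and the names are illustrative.

```python
def responder_accepts(offer):
    """A purely payoff-maximising responder accepts any positive amount.

    Assumption: when offered nothing she is indifferent, and we let her reject.
    """
    return offer > 0

def best_proposal(total):
    """Return (offer, proposer's payoff) maximising the proposer's payoff."""
    best_offer, best_payoff = 0, 0
    for offer in range(total + 1):
        proposer_payoff = total - offer if responder_accepts(offer) else 0
        if proposer_payoff > best_payoff:
            best_offer, best_payoff = offer, proposer_payoff
    return best_offer, best_payoff

if __name__ == "__main__":
    # Dividing 100 pounds, counted in pence: offer 1p and keep the rest.
    print(best_proposal(10_000))  # (1, 9999)
```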
On Rationality
In reality
Would you really dare to offer 1p out of £100 to the second player?
In one set of experiments, 43% of those playing first offered an even split.
Those playing second rejected on average offers falling below 1/4 of the total amount.
Over half of the offers below 1/3 were rejected!
Why?
Is Homo economicus, the model of humans as rational economic actors, false?
Does the payoff not take into account a positive psychological reaction to offering
more money?
Are the second players attempting to punish those going first?
Does it illustrate a human unwillingness to accept injustice and social inequality?
Are empathy and perspective driving the generosity?
Is this kin selection in action?
Experimentally...
Externally administered oxytocin, used to increase levels of emotion in the subject,
increased generous offers by 80% relative to a placebo, though it did not affect the
minimum acceptance threshold.
Repeated Games
One shot and repeated games
So far, we have just considered games which are played only once. These are
known as one shot games.
In many situations of course, we interact multiple times with the same individual.
When a game is played multiple times by the same players, it is called a repeated
game.
The payoff in a repeated game is just the sum of the payoffs from each round.
Iterated Prisoner’s Dilemma
A very commonly studied repeated game is the Iterated Prisoner’s Dilemma.
Recall the Prisoner’s Dilemma payoff matrix:
Payoffs listed as (Player 1, Player 2):
                      Player 2: Cooperate   Player 2: Defect
Player 1: Cooperate   R, R                  S, T
Player 1: Defect      T, S                  P, P
The values S, P, R, T must satisfy T > R > P > S and R > (S + T)/2.
The Nash equilibrium in a one shot game was defect, defect.
What happens if we play n games with the same opponent?
If we both defect for n rounds, then we each get a payoff of nP.
But if we were to both cooperate, we’d get nR each!
Except I think I might defect in the nth round, since then I’d get (n − 1)R + T.
But my opponent will think the same thing, so we’ll both be left with (n − 1)R + P.
But I can’t cooperate in the last round, since if he defects I’ll only get (n − 1)R + S, so I must defect in the nth round.
The problem is that this logic can now be applied to the (n − 1)th round, and the (n − 2)th round, and so on...
Until we’re left defecting in round 1 and throughout the whole game.
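The totals in this argument are easy to reproduce with concrete numbers. The sketch below uses T, R, P, S = 5, 3, 1, 0 and n = 10, which are illustrative values chosen only because they satisfy the two constraints. They do not come from the lecture.

```python
# Illustrative payoff values (an assumption), satisfying both constraints.
T, R, P, S = 5, 3, 1, 0
assert T > R > P > S and R > (S + T) / 2

# Payoff to me, keyed by (my move, opponent's move); C = cooperate, D = defect.
PAYOFF_TO_ME = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def total_payoff(my_moves, their_moves):
    """Sum my per-round payoffs over a repeated game."""
    return sum(PAYOFF_TO_ME[pair] for pair in zip(my_moves, their_moves))

n = 10
always_cooperate = "C" * n
always_defect = "D" * n
defect_last = "C" * (n - 1) + "D"   # cooperate, then defect in round n

if __name__ == "__main__":
    print(total_payoff(always_defect, always_defect))        # nP = 10
    print(total_payoff(always_cooperate, always_cooperate))  # nR = 30
    print(total_payoff(defect_last, always_cooperate))       # (n-1)R + T = 32
    print(total_payoff(defect_last, defect_last))            # (n-1)R + P = 28
    print(total_payoff(always_cooperate, defect_last))       # (n-1)R + S = 27
```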
Learning to play games
Okay, so the Nash equilibrium is for both players to always defect.
But the IPD models real world scenarios where people do cooperate!
And their payoffs are higher as a result.
Is this human irrationality again?
Mutual defection throughout the game is the Nash equilibrium, but it is not a
dominant strategy. I.e. it is not the best response to every other strategy that
your opponent could play.
Furthermore, for the Iterated Prisoner’s Dilemma there is no single best strategy
against all possible opponents.
Can we develop strategies which can perform well against a good range of other
strategies?
We could search the strategy space...but it’s very big!
For an n round game there are 2^(2^n − 1) possible strategies!
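One way to see how quickly the strategy space grows is sketched below. It assumes deterministic strategies that choose each move as a function of the opponent's previous moves only. This is an assumption about how strategies are counted, and the names are illustrative. Under it, round k has 2^(k − 1) possible opponent histories, giving 2^n − 1 decision points in total and 2^(2^n − 1) strategies.

```python
from itertools import product

def decision_points(n):
    """Number of distinct opponent histories seen over an n round game."""
    # Before round k + 1 the opponent has made k moves, each C or D.
    return sum(len(list(product("CD", repeat=k))) for k in range(n))

def strategy_count(n):
    """Each decision point is answered with C or D, so 2 ** decision_points."""
    return 2 ** decision_points(n)

if __name__ == "__main__":
    for n in range(1, 6):
        assert decision_points(n) == 2 ** n - 1
        print(n, strategy_count(n))
    # 1 2
    # 2 8
    # 3 128
    # 4 32768
    # 5 2147483648
```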
Learning game playing strategies
Strategies can be encoded in many ways: neural networks, bitstrings, finite state
machines etc.
Learning is typically done through the (co)evolution of a population of strategies.
Surprisingly, strategies can emerge which bring about cooperation through the
threat of retaliation if the opponent defects (e.g. tit for tat).
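Tit for tat itself is simple to write down: cooperate first, then copy the opponent's last move. The sketch below plays it against itself and against a player who always defects, using the illustrative payoffs T, R, P, S = 5, 3, 1, 0 (an assumption, not from the lecture). It does not show the evolutionary learning process, only the behaviour of the resulting strategy.

```python
# Illustrative payoff values (an assumption), satisfying both constraints.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def tit_for_tat(opponent_history):
    """Cooperate in round 1, then repeat the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds):
    """Play a repeated game and return the two total payoffs."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)   # each strategy sees the other's past moves
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat, 10))      # (30, 30)
    print(play(tit_for_tat, always_defect, 10))    # (9, 14)
    print(play(always_defect, always_defect, 10))  # (10, 10)
```

Tit for tat loses narrowly to the defector in a head-to-head match, but two tit for tat players earn far more together than two defectors do.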
So why is all this interesting?
Game theory has been used to try to understand:
Pricing and the formation of cartels in business,
Why people vote in certain ways,
Evolutionary dynamics in populations of animals,
How to maintain biodiversity,
Why humans appear to behave altruistically,
How to win competitions of Rock Paper Scissors,
Bacterial strain diversity,
How to allocate resources in computer networks,
Why countries spend billions on nuclear weapons and (almost) never use them.
And it’s also the basis of a lot of economics-inspired computation.
Summary
We have learnt:
What formal games are and how they can be described. We looked at several
examples, including particularly the Prisoner’s Dilemma.
That pure and mixed strategies define how a player plays a game. Some games
have a dominant strategy, i.e. one which is always best.
How Nash equilibria describe a certain type of “solution” for a game, where no
player can unilaterally improve his payoff.
That there are pure strategy Nash equilibria and mixed strategy Nash equilibria,
and that all finite games have at least one (mixed strategy) Nash equilibrium.
That Nash equilibria can lead to either good or bad outcomes for the players!
That in many (especially repeated) games, such as the Iterated Prisoner’s
Dilemma, there is often no dominant strategy.
That in these cases, we can learn high performing strategies.
Further reading
Stanford Encyclopedia of Philosophy. Game Theory; 2010. http://plato.stanford.edu/entries/game-theory/.
Binmore K. Game Theory: A Very Short Introduction. Oxford University Press; 2007.
Kendall G, Yao X, Chong SY. The Iterated Prisoners’ Dilemma: 20 Years On. World Scientific; 2007.