Download PPT - UNC Computer Science

Review: Game theory
Payoffs are listed as (Alice, Bob).

                  Alice: Testify    Alice: Refuse
Bob: Testify          -5, -5           -10, 0
Bob: Refuse           0, -10           -1, -1

• Dominant strategy
• Nash equilibrium
• Pareto optimal outcome
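
A minimal Python sketch of two of these concepts on the Prisoner's Dilemma payoffs above. The dictionary layout and the function names are illustrative assumptions, not part of the slides; payoffs are read as (Alice, Bob).

```python
# Payoffs (Alice, Bob), indexed by (Alice's move, Bob's move).
PD = {
    ("testify", "testify"): (-5, -5),
    ("refuse", "testify"): (-10, 0),
    ("testify", "refuse"): (0, -10),
    ("refuse", "refuse"): (-1, -1),
}
MOVES = ("testify", "refuse")


def alice_dominant_moves(game):
    """Moves strictly better for Alice than every alternative, whatever Bob does."""
    return [
        m for m in MOVES
        if all(game[(m, b)][0] > game[(d, b)][0]
               for b in MOVES for d in MOVES if d != m)
    ]


def pareto_optimal(game):
    """Outcomes that no other outcome weakly improves for both players."""
    def dominated(u):
        return any(v[0] >= u[0] and v[1] >= u[1] and v != u
                   for v in game.values())
    return [outcome for outcome, u in game.items() if not dominated(u)]


print(alice_dominant_moves(PD))
print(pareto_optimal(PD))
```

Under these definitions, testifying is strictly dominant for Alice, and mutual testifying is the only outcome that is not Pareto optimal.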
Game of Chicken
Payoffs are listed as (Player 1, Player 2).

                        Player 1: Straight    Player 1: Chicken
Player 2: Straight          -10, -10               -1, 1
Player 2: Chicken            1, -1                  0, 0

• Is there a dominant strategy for either player?
• Is there a Nash equilibrium?
(Straight, chicken) or (chicken, straight)
• Anti-coordination game: it is mutually beneficial for the
two players to choose different strategies
– Model of escalated conflict in humans and animals
(hawk-dove game)
http://en.wikipedia.org/wiki/Game_of_chicken
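
The equilibrium question can be checked by brute force. The sketch below enumerates pure-strategy Nash equilibria of the Chicken matrix; the dictionary layout and the function name are assumptions made for illustration, with payoffs read as (Player 1, Player 2).

```python
from itertools import product

# Payoffs (Player 1, Player 2), indexed by (P1 move, P2 move).
# "S" = straight, "C" = chicken.
CHICKEN = {
    ("S", "S"): (-10, -10),
    ("S", "C"): (1, -1),
    ("C", "S"): (-1, 1),
    ("C", "C"): (0, 0),
}


def pure_nash_equilibria(game, moves=("S", "C")):
    """Return move pairs where neither player gains by deviating alone."""
    equilibria = []
    for m1, m2 in product(moves, repeat=2):
        u1, u2 = game[(m1, m2)]
        p1_cannot_improve = all(game[(d, m2)][0] <= u1 for d in moves)
        p2_cannot_improve = all(game[(m1, d)][1] <= u2 for d in moves)
        if p1_cannot_improve and p2_cannot_improve:
            equilibria.append((m1, m2))
    return equilibria


print(pure_nash_equilibria(CHICKEN))
```

The two pairs it finds are the anti-coordinated outcomes, matching the answer on the slide.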
Mixed strategy equilibria
Payoffs are listed as (Player 1, Player 2).

                        Player 1: Straight    Player 1: Chicken
Player 2: Straight          -10, -10               -1, 1
Player 2: Chicken            1, -1                  0, 0

• Mixed strategy: a player chooses between the moves
according to a probability distribution
• Suppose each player chooses S with probability 1/10.
Is that a Nash equilibrium?
• Consider payoffs to P1 while keeping P2’s strategy fixed
– The payoff of P1 choosing S is (1/10)(–10) + (9/10)(1) = –1/10
– The payoff of P1 choosing C is (1/10)(–1) + (9/10)(0) = –1/10
– Is there a different strategy that can improve P1’s payoff?
– Similar reasoning applies to P2
Finding mixed strategy equilibria
Payoffs are listed as (P1, P2).

                                 P1: Choose S     P1: Choose C
                                 with prob. p     with prob. 1-p
P2: Choose S with prob. q          -10, -10          -1, 1
P2: Choose C with prob. 1-q         1, -1             0, 0

• Expected payoffs for P1 given P2’s strategy:
P1 chooses S: q(–10) + (1–q)(1) = –11q + 1
P1 chooses C: q(–1) + (1–q)(0) = –q
• In order for P2’s strategy to be part of a Nash equilibrium, P1 has to be indifferent between its two actions:
–11q + 1 = –q, so q = 1/10
Similarly, p = 1/10
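
The indifference condition can be solved in general for a two-action game. This is a small sketch using exact fractions; the function name and argument order are assumptions made for illustration.

```python
from fractions import Fraction


def indifference_prob(a_ss, a_sc, a_cs, a_cc):
    """Probability the opponent must put on S to make a player indifferent.

    a_xy is the player's own payoff when they play x and the opponent plays y.
    Solves q*a_ss + (1-q)*a_sc == q*a_cs + (1-q)*a_cc for q.
    """
    denom = (a_ss - a_sc) - (a_cs - a_cc)
    if denom == 0:
        raise ValueError("no unique mixing probability for these payoffs")
    return Fraction(a_cc - a_sc, denom)


# P1's payoffs in Chicken: S vs S, S vs C, C vs S, C vs C.
q = indifference_prob(-10, 1, -1, 0)
payoff_s = q * (-10) + (1 - q) * 1
payoff_c = q * (-1) + (1 - q) * 0
print(q, payoff_s, payoff_c)
```

Because Chicken is symmetric, the same call with P2's payoffs gives p = 1/10.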
Mixed strategy equilibria: Another example

Payoffs are listed as (wife, husband).

                       Wife: Ballet    Wife: Football
Husband: Ballet            3, 2             0, 0
Husband: Football          0, 0             2, 3

• Pure strategy equilibria:
– (Ballet, ballet) or (football, football)
Mixed strategy equilibria: Another example

                                   Wife: Ballet      Wife: Football
                                   w/ prob. p        w/ prob. 1-p
Husband: Ballet w/ prob. q            3, 2               0, 0
Husband: Football w/ prob. 1-q        0, 0               2, 3

• Payoff to wife assuming she chooses ballet: 3q
• Payoff to wife assuming she chooses football: 2(1–q)
• 3q = 2(1–q), so q = 2/5
• Payoff to husband assuming he chooses ballet: 2p
• Payoff to husband assuming he chooses football: 3(1–p)
• 2p = 3(1–p), so p = 3/5
• Mixed strategy equilibrium: wife picks ballet w/ probability 3/5 and husband picks football with probability 3/5
• Mixed strategy equilibrium: wife picks ballet w/ probability 3/5 and
husband picks football with probability 3/5
• How often do they end up in different places?
P(wife = ballet, husband = football or wife = football, husband = ballet)
= 3/5 * 3/5 + 2/5 * 2/5 = 13/25
• What is the expected payoff for each?
3/5 * 2/5 * 3 + 3/5 * 2/5 * 2 = 6/5
• What is the payoff for always going to the other’s preferred event?
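
The arithmetic above can be reproduced from the joint distribution over outcomes. A short sketch, assuming the two players randomize independently; the variable names are illustrative.

```python
from fractions import Fraction

p = Fraction(3, 5)  # probability the wife picks ballet
q = Fraction(2, 5)  # probability the husband picks ballet

# (wife payoff, husband payoff), indexed by (wife's choice, husband's choice).
PAYOFFS = {
    ("ballet", "ballet"): (3, 2),
    ("ballet", "football"): (0, 0),
    ("football", "ballet"): (0, 0),
    ("football", "football"): (2, 3),
}

wife_probs = {"ballet": p, "football": 1 - p}
husband_probs = {"ballet": q, "football": 1 - q}

# Probability of each joint outcome under independent randomization.
joint = {(w, h): wife_probs[w] * husband_probs[h] for (w, h) in PAYOFFS}

p_apart = sum(pr for (w, h), pr in joint.items() if w != h)
wife_expected = sum(pr * PAYOFFS[k][0] for k, pr in joint.items())
husband_expected = sum(pr * PAYOFFS[k][1] for k, pr in joint.items())
print(p_apart, wife_expected, husband_expected)
```

Both expected payoffs come out to 6/5, which is lower than what either player gets in either pure-strategy equilibrium.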
Back to rock-paper-scissors
• Zero-sum game: want to find minimax solution for P1

Payoffs are listed as (P1, P2).

                 P1: Rock    P1: Paper    P1: Scissors
P2: Rock           0, 0        1, -1         -1, 1
P2: Paper         -1, 1        0, 0          1, -1
P2: Scissors       1, -1      -1, 1          0, 0

Back to rock-paper-scissors
• Zero-sum game: want to find minimax solution for P1
• Let r, p, s = probability of P1 playing rock, paper, scissors
• Let’s find the expected payoffs for P1 given different actions of P2:

Payoffs to P1:

                 P1: Rock    P1: Paper    P1: Scissors
P2: Rock            0           1            -1
P2: Paper          -1           0             1
P2: Scissors        1          -1             0

P2 plays rock:     u = r(0) + p(1) + s(–1) = p – s
P2 plays paper:    u = r(–1) + p(0) + s(1) = s – r
P2 plays scissors: u = r(1) + p(–1) + s(0) = r – p
• P2 is trying to minimize P1’s utility, so we have
u ≤ p – s; u ≤ s – r; u ≤ r – p
• To find r, p, s, maximize u subject to the above constraints and r + p + s = 1
– Linear programming problem
– Solution is (1/3, 1/3, 1/3)
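
The sketch below is a brute-force stand-in for the linear program, not an LP solver: it searches a grid of mixed strategies for the one with the best worst-case payoff. The grid resolution and the function names are assumptions made for illustration.

```python
from fractions import Fraction

# P1's payoff, indexed [P2's move][P1's move]; moves ordered rock, paper, scissors.
P1_PAYOFF = [
    [0, 1, -1],   # P2 plays rock
    [-1, 0, 1],   # P2 plays paper
    [1, -1, 0],   # P2 plays scissors
]


def worst_case(mix):
    """Payoff P1 is guaranteed with mix (r, p, s) against a minimizing P2."""
    return min(sum(prob * row[i] for i, prob in enumerate(mix))
               for row in P1_PAYOFF)


def best_mix_on_grid(steps=30):
    """Search all mixes whose probabilities are multiples of 1/steps."""
    best = None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            mix = (Fraction(a, steps), Fraction(b, steps),
                   Fraction(steps - a - b, steps))
            value = worst_case(mix)
            if best is None or value > best[0]:
                best = (value, mix)
    return best


value, mix = best_mix_on_grid()
print(value, mix)
```

The three constraints sum to zero, so the guaranteed payoff can never exceed 0, and it equals 0 only when r = p = s. A real LP solver would find the same point without enumerating the grid.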
Computing Nash equilibria
• Any game with a finite number of players and a finite set of actions has at least one Nash equilibrium (possibly in mixed strategies)
• If a player has a dominant strategy, there exists a
Nash equilibrium in which the player plays that
strategy and the other player plays the best
response to that strategy
• If both players have strictly dominant strategies,
there exists a Nash equilibrium in which they play
those strategies
Computing Nash equilibria
• For a two-player zero-sum game, simple linear
programming problem
• For non-zero-sum games, known algorithms have worst-case running time that is exponential in the number of actions
• For more than two players, and for sequential
games, things get pretty hairy
Nash equilibria and rational decisions
• If a game has a unique Nash equilibrium, it will be adopted if each player
– is rational and the payoff matrix is accurate
– doesn’t make mistakes in execution
– is capable of computing the Nash equilibrium
– believes that a deviation in strategy on their part will not cause the other players to deviate
– and there is common knowledge that all players meet these conditions
http://en.wikipedia.org/wiki/Nash_equilibrium
Nash equilibria and rational decisions
Do you have a dominant strategy?
– yes: Play dominant strategy
– no: Do you know what the opponent will do?
    – yes: Maximize utility
    – no: Is opponent rational?
        – yes: Can agree on a Nash equilibrium?
            – yes: Play the equilibrium strategy
            – no: Maximize worst-case outcome
        – no: Maximize worst-case outcome
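
The decision chart above reads naturally as a chain of conditionals. A minimal sketch; the function name and boolean parameters are hypothetical, and the returned strings simply echo the chart's leaves.

```python
def choose_policy(has_dominant, knows_opponent_move,
                  opponent_rational, can_agree_on_equilibrium):
    """Walk the decision chart from top to bottom and return the chosen leaf."""
    if has_dominant:
        return "play dominant strategy"
    if knows_opponent_move:
        return "maximize utility"
    if opponent_rational and can_agree_on_equilibrium:
        return "play the equilibrium strategy"
    # Either the opponent is not rational or no equilibrium can be agreed on.
    return "maximize worst-case outcome"


print(choose_policy(False, False, True, True))
```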
More fun stuff: Ultimatum game
• Alice and Bob are given a sum of money S to divide
– Alice picks A, the amount she wants to keep for herself
– Bob picks B, the smallest amount of money he is willing to accept
– If S – A ≥ B, Alice gets A and Bob gets S – A
– If S – A < B, both players get nothing
• What is the Nash equilibrium?
– Alice offers Bob the smallest amount of money he will accept: S – A = B
– Alice offers Bob nothing and Bob is not willing to accept anything less than the full amount: A = S, B = S (both players get nothing)
• How would humans behave in this game?
– If Bob perceives Alice’s offer as unfair, Bob will be likely to refuse
– Is this rational?
• Maybe Bob gets some positive utility for “punishing” Alice?
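
The two families of equilibria listed above can be confirmed by brute force on a discretized version of the game. This sketch assumes whole-number amounts and a hypothetical sum S = 10; the function names are illustrative.

```python
def payoffs(S, A, B):
    """Alice keeps A; Bob accepts only if his share S - A is at least B."""
    if S - A >= B:
        return A, S - A
    return 0, 0


def is_nash(S, A, B):
    """True if neither player can gain by changing only their own choice."""
    alice, bob = payoffs(S, A, B)
    alice_best = max(payoffs(S, a, B)[0] for a in range(S + 1))
    bob_best = max(payoffs(S, A, b)[1] for b in range(S + 1))
    return alice == alice_best and bob == bob_best


S = 10  # hypothetical sum, in whole units
equilibria = [(A, B) for A in range(S + 1) for B in range(S + 1)
              if is_nash(S, A, B)]
print(equilibria)
```

Every pair with S – A = B appears, plus the extra pair A = S, B = S in which both players get nothing.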
Repeated games

Payoffs are listed as (row player, column player).

               Cooperate    Defect
Cooperate       -1, -1      -10, 0
Defect          0, -10      -5, -5

• What if the Prisoner’s Dilemma is played for many rounds and the players remember what happened in the previous rounds?
• What if the number of games is fixed and known in advance to both players?
– Then the equilibrium is still to defect
• If the number of games is random or unknown, cooperation may become an equilibrium strategy
– Perpetual punishment: cooperate unless the other player has ever defected
– Tit for tat: start by cooperating, repeat the other player’s previous move for all subsequent rounds
• In order for these strategies to work, the players must know that they have both adopted them
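
A small simulation makes the two strategies concrete. This is a sketch only: the strategy signatures, the always_defect baseline, and the 10-round horizon are assumptions made for illustration.

```python
# Per-round payoffs (own, other); "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-10, 0),
    ("D", "C"): (0, -10),
    ("D", "D"): (-5, -5),
}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the other player's previous move."""
    return "C" if not other_history else other_history[-1]


def perpetual_punishment(own_history, other_history):
    """Cooperate unless the other player has ever defected."""
    return "D" if "D" in other_history else "C"


def always_defect(own_history, other_history):
    return "D"


def play(strategy1, strategy2, rounds):
    """Run the repeated game and return both players' total payoffs."""
    h1, h2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        m1 = strategy1(h1, h2)
        m2 = strategy2(h2, h1)
        u1, u2 = PAYOFF[(m1, m2)]
        h1.append(m1)
        h2.append(m2)
        total1 += u1
        total2 += u2
    return total1, total2


print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_defect, 10))
```

Two tit-for-tat players cooperate throughout, while against a constant defector tit for tat loses only the first round before switching to defection.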
Some multi-player games
• The diner’s dilemma
– A group of people go out to eat and agree to split the bill equally.
Each has a choice of ordering a cheap dish or an expensive dish
(the utility of the expensive dish is higher than that of the cheap
dish, but not enough for you to want to pay the difference)
– Nash equilibrium is for everybody to get the expensive dish
• El Farol bar problem (W. Brian Arthur)
– If less than 60% of the town’s population go to the bar, they will
have a better time than if they stayed at home. If more than 60%
of the people go to the bar, the bar will be too crowded and they
will have a worse time than if they stayed at home
– If everyone uses the same strategy, the Nash equilibrium must be a mixed strategy
Mechanism design
(inverse game theory)
• Assuming that agents pick rational strategies, how
should we design the game to achieve a socially
desirable outcome?
• We have multiple agents and a center that
collects their choices and determines the outcome
Auctions
• Goals
– Maximize revenue to the seller
– Efficiency: make sure the buyer who values the goods
the most gets them
– Minimize transaction costs for buyers and sellers
Ascending-bid auction
• What’s the optimal strategy for a buyer?
– Bid until the current bid value exceeds your private value
• Usually revenue-maximizing and efficient, unless
the reserve price is set too low or too high
• Disadvantages
– Collusion
– Lack of competition
– Has high communication costs
Sealed-bid auction
• Each buyer makes a single bid and communicates it to the
auctioneer, but not to the other bidders
– Simpler communication
– More complicated decision-making: the strategy of a buyer depends on
what they believe about the other buyers
– Not necessarily efficient
• Sealed-bid second-price auction: the winner pays the price of the second-highest bid
– Let V be your private value and B be the highest bid by any other buyer
– If V > B, your optimal strategy is to bid above B; in particular, bid V
– If V < B, your optimal strategy is to bid below B; in particular, bid V
– Therefore, your dominant strategy is to bid V
– This is a truth-revealing mechanism
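
The dominance argument can be checked exhaustively over a small range of integer bids. In this sketch the tie-breaking rule (ties go to the rival) and the bid range are assumptions, since the slides do not specify them.

```python
def utility(value, bid, other_high):
    """Second-price payoff: win if bid beats the best rival bid, pay that bid.

    Ties are assumed to go to the rival.
    """
    return value - other_high if bid > other_high else 0


def truthful_is_weakly_dominant(value, max_bid=20):
    """Check that bidding `value` is never worse than any other integer bid."""
    return all(
        utility(value, value, b) >= utility(value, bid, b)
        for b in range(max_bid + 1)
        for bid in range(max_bid + 1)
    )


print(all(truthful_is_weakly_dominant(v) for v in range(21)))
# Overbidding can hurt: value 10, bid 15, best rival bid 12.
print(utility(10, 15, 12), utility(10, 10, 12))
```

Overbidding only changes the outcome when it wins an item at a price above its value, and underbidding only changes the outcome when it loses an item that would have been profitable.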
Dollar auction
• A dollar bill is being auctioned off. It goes to the highest bidder, but the second-highest bidder also has to pay
– Player 1 bids 1 cent
– Player 2 bids 2 cents
– …
– Player 2 bids 98 cents
– Player 1 bids 99 cents
• If Player 2 passes, he loses 98 cents; if he bids $1, he might still come out even
– So Player 2 bids $1
• Now, if Player 1 passes, he loses 99 cents; if he bids $1.01, he only loses 1 cent
– …
• What went wrong?
– When figuring out the expected utility of a bid, a rational player should take into account the future course of the game
• How about Player 1 starts by bidding 99 cents?
Tragedy of the commons
• States want to set their policies for controlling emissions
– Each state can reduce their emissions at a cost of -10
or continue to pollute at a cost of -5
– If a state decides to pollute, -1 is added to the utility of every
other state
• What is the dominant strategy for each state?
– Continue to pollute
– With 50 states all polluting, each state incurs a cost of -5 - 49 = -54
– If they all decided to deal with emissions, they would incur a cost
of only -10 each
• Mechanism for fixing the problem:
– Tax each state by the total amount by which they reduce the
global utility (externality cost)
– This way, continuing to pollute would now cost -54
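
The numbers above can be reproduced with a one-line utility function. The sketch assumes 50 states, which is inferred from the "-5-49" arithmetic rather than stated on the slide; the function and parameter names are illustrative.

```python
N_STATES = 50  # assumption inferred from the "-5-49" arithmetic


def cost(pollutes, other_polluters, taxed=False):
    """Utility (a negative cost) of one state.

    other_polluters is how many of the other states choose to pollute.
    With the tax, a polluter also pays the -1 it imposes on every other state.
    """
    own = -5 if pollutes else -10
    externality_received = -1 * other_polluters
    tax = -(N_STATES - 1) if (taxed and pollutes) else 0
    return own + externality_received + tax


print(cost(True, 49))              # everyone pollutes, no tax
print(cost(False, 0))              # everyone reduces emissions
print(cost(True, 0, taxed=True))   # lone polluter under the tax
```

Without the tax, polluting is better for a state by 5 no matter what the others do. With the tax, reducing emissions is better by 44 no matter what the others do.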
Game theory issues
• Is it applicable to real life?
– Humans are not always rational
– Utilities may not always be known
– Other assumptions made by the game-theoretic model
may not hold
– Political difficulties may prevent theoretically optimal
mechanisms from being implemented
• Could it be more applicable to AI than to real life?
– Computing equilibria in complicated games is difficult
– Relationship between Nash equilibrium and rational
decision making is subtle