Poker: Opponent Modelling
Early AI work on poker used simplified variants of poker.
More recently attention has focused mainly on “Limit Texas Hold’em”, in both
its “heads-up” form (only two players) and its many-player form (often 10).
Texas Hold’em is a popular form of poker in the USA.
As in all forms of poker, betting is an essential element.
Texas Hold’em offers four opportunities per hand for a round of betting.
In Limit Texas Hold’em there are two sizes of bet increment:
• the small bet - say $2
• the large bet - say $4
In No-Limit Texas Hold’em, players may bet any amount up to all of the chips they
have at the table. (Capping bets at the current size of the pot is the Pot-Limit variant.)
http://csiweb.ucd.ie/Staff/acater/comp4031.html
Artificial Intelligence for Games and Puzzles
The structure of a hand of Texas Hold’em
A hand (if played to the bitter end) proceeds through nine stages:
1. “Preflop”: dealer gives each player two cards - “hole cards” - face-down; a player may see only his own cards
2. Round of (small) betting, started by the “blind”
3. “Flop”: dealer lays three cards face-up
4. Round of (small) betting
5. “Turn”: dealer lays a fourth card face-up
6. Round of (big) betting
7. “River”: dealer lays a fifth card face-up
8. Round of (big) betting
9. “Showdown”: players still in the game show their cards to determine the winner
The winner is the player who makes the best 5-card poker hand using a
combination of his hole cards and the community cards (the board).
Decisions to be made
In a round of betting, players have to choose one of five actions repeatedly,
starting off with the player to dealer’s left and proceeding clockwise:
Bet
If nobody has yet bet in the current round, a player may add the appropriate-size (small or large) bet to the pot.
Check
If nobody has yet bet in the current round, a player may do nothing.
Call
If someone has put more into the pot in the current round, a player may add
just enough to make their own contribution equal.
Raise
If someone has put more into the pot in the current round, a player may add
enough to make their own contribution equal and then add the appropriate
(small or large) bet on top. In the Limit version, at most 3 raises are allowed per round.
Fold
A player may withdraw from the hand, forfeiting any bets and raises already
put in the pot, and excluding themselves from further betting.
Limit games require no decision about the amount of bets and raises.
No-limit games require more complex reasoning because bets may vary in size.
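The conditions attached to the five actions can be sketched as a small rule function. This is an illustrative sketch only, not part of the original course material: `RoundState`, `legal_actions` and the field names are hypothetical, blinds are ignored, and the bet sizes are the $2/$4 example values.

```python
from dataclasses import dataclass

MAX_RAISES = 3  # per-round cap in the Limit game


@dataclass
class RoundState:
    bet_size: int      # increment for this round, e.g. 2 (small) or 4 (large)
    to_match: int = 0  # largest contribution any player has made this round
    raises: int = 0    # number of raises so far this round


def legal_actions(state, my_contribution):
    """Return {action: chips the acting player must add} (hypothetical helper)."""
    actions = {"fold": 0}
    owed = state.to_match - my_contribution
    if state.to_match == 0:
        # Nobody has bet yet: the player may check or open the betting.
        actions["check"] = 0
        actions["bet"] = state.bet_size
    elif owed > 0:
        # Someone has put in more: match it, or match it and add one increment.
        actions["call"] = owed
        if state.raises < MAX_RAISES:
            actions["raise"] = owed + state.bet_size
    else:
        # Already level with the largest contribution: nothing to add.
        actions["check"] = 0
    return actions


print(legal_actions(RoundState(bet_size=2), 0))
print(legal_actions(RoundState(bet_size=4, to_match=8, raises=1), 4))
```

Because the increment is fixed by the round, the function never has to choose an amount, which is the simplification that Limit play offers over No-limit play.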
Betting based on probabilities
One way to play poker is to use probabilities:
• Given your own known hole cards,
• and the community cards that are on show,
• for each possible combination of community cards yet to appear,
• how likely is your hand to be better than any other player’s hand?
Compare this to the pot odds - the ratio
   (the cost of making a bet/call) / (the size of the pot)
• If the comparison is very favourable, bet or raise;
• if merely favourable, check or call;
• if not, check if possible otherwise fold
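The comparison above can be sketched in a few lines. The slide does not say how the win probability is set against the pot odds, so this sketch assumes the break-even reading: a call is favourable when the odds of winning, p / (1 - p), exceed cost / pot. The threshold for "very favourable" (`strong_margin`) and the function names are assumptions made for illustration.

```python
def pot_odds(cost, pot):
    """Ratio of the cost of a bet/call to the current size of the pot."""
    return cost / pot


def choose_action(p_win, cost, pot, can_check, strong_margin=2.0):
    """Pick an action class from an estimated win probability (sketch)."""
    # Convert the win probability into odds so it is comparable with cost/pot.
    win_odds = p_win / (1.0 - p_win) if p_win < 1.0 else float("inf")
    ratio = pot_odds(cost, pot)
    if win_odds > strong_margin * ratio:
        return "bet/raise"      # very favourable
    if win_odds > ratio:
        return "check/call"     # merely favourable
    return "check" if can_check else "fold"


for p in (0.5, 0.2, 0.1):
    print(p, choose_action(p, cost=2, pot=10, can_check=False))
```

With a $2 call into a $10 pot the ratio is 0.2, so a hand needs better than roughly a one-in-six chance of winning before calling is worthwhile under this reading.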
Predictability is bad
Basing your behaviour on the probabilities like this is a poor strategy.
Other players will
• observe the cards you reveal at the showdowns,
• learn about your conservative style of play,
• learn about your assessment of winning chances,
• interpret your betting behaviour as indicative of the strength of your hand,
• and use this to beat you over the course of many hands.
Good poker players
1. observe the decisions of their opponents and gather what evidence they can
2. base their decisions on models of their opponents, exploiting any weaknesses
they detect
3. strive to frustrate the formation of accurate models of themselves, by
bluffing, and by consciously, deliberately changing their own style
Poker as AI Testbed domain
Poker, like several other games, features
• Competing agents
• Chance
• Finite set of choices
• Large game tree
In addition, Poker has
• Risk assessment
• Deception
In many other games, there is little to be gained by modelling opponents.
Rudimentary models, like “contempt factor”, or no model at all, are common.
In poker, modelling opponents - and awareness of their trying to model you - is
essential to good play.
Bayesian Network approach to modelling
By training over many self-played hands, a CPT (conditional
probability table) can be built up.
Then in real play, knowing all
influences upon “opponent
action” except “opponent current
hand”, one can draw conclusions
about “opponent current hand”.
But a CPT of ~200k entries cannot
reasonably be modified over a
game of ~100 hands.
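[Figure: Bayesian network diagram. Surviving labels: 25, 4, 25, 10, 8, whose product (200,000) matches the ~200k table entries.]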
(Boulton)
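Reading the table "backwards" to infer the hidden hand is an application of Bayes' rule, which the following toy sketch illustrates. It is not Boulton's network: the three hand classes, the prior and every probability are invented for illustration, and the table covers a single fixed context. The interpretation of 25, 4, 25, 10 and 8 as per-node state counts is likewise an assumption.

```python
import math

# Assumption: these are the state counts of the network's nodes.
NODE_STATES = [25, 4, 25, 10, 8]
print(math.prod(NODE_STATES))  # size of the full table under that assumption

# Toy table P(action | hand class) for one fixed context (invented numbers).
CPT = {
    "strong": {"fold": 0.05, "call": 0.35, "raise": 0.60},
    "medium": {"fold": 0.20, "call": 0.60, "raise": 0.20},
    "weak":   {"fold": 0.60, "call": 0.30, "raise": 0.10},
}
# Prior belief about the opponent's hand class (invented numbers).
PRIOR = {"strong": 0.2, "medium": 0.3, "weak": 0.5}


def posterior(action):
    """P(hand class | observed action) via Bayes' rule."""
    joint = {hand: PRIOR[hand] * CPT[hand][action] for hand in PRIOR}
    total = sum(joint.values())
    return {hand: p / total for hand, p in joint.items()}


print(posterior("raise"))
```

A table of this toy size could be re-estimated from a handful of observations; the point on the slide is that the full table has far too many entries to adjust within about 100 hands.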
Classification of hands
At the outset, the two hole cards of an opponent player may be any two of the 50
cards you don’t have. 50x49/2=1225 combinations if you enumerate them.
Sufficient to distinguish 169 qualitatively different hands:
13 possible pairs - AA KK QQ … 22
78 pairings of cards of the same suit - AKs, AQs, AJs, A10s, A9s, … 42s, 32s
78 pairings of cards of different suits - AK, AQ, AJ, … 32
Collapsing still further, to 25 or so classes, loses some information but facilitates
learning of statistics.
Classification of boards and of pot sizes can proceed similarly.
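The counts above can be checked by enumeration. In this sketch the card encoding (rank character plus suit character, with `T` for ten) and the `o` suffix for different-suit hands are notational assumptions; the slide writes `A10s` and leaves off-suit hands unmarked.

```python
from itertools import combinations

RANKS = "AKQJT98765432"
SUITS = "shdc"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def hand_class(card1, card2):
    """Collapse two hole cards into one of the 169 classes, e.g. 'AKs'."""
    high, low = sorted((card1[0], card2[0]), key=RANKS.index)
    if high == low:
        return high + low  # a pair
    return high + low + ("s" if card1[1] == card2[1] else "o")


def opponent_holdings(my_cards):
    """All two-card holdings an opponent could have, given my hole cards."""
    remaining = [card for card in DECK if card not in my_cards]
    return list(combinations(remaining, 2))


classes = {hand_class(a, b) for a, b in combinations(DECK, 2)}
pairs = [c for c in classes if len(c) == 2]
suited = [c for c in classes if c.endswith("s")]
offsuit = [c for c in classes if c.endswith("o")]

print(len(opponent_holdings(("As", "Kd"))))
print(len(classes), len(pairs), len(suited), len(offsuit))
```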
The Loki program
Loki, from the University of Alberta, used a probabilistic approach, with one initial model
(set of weights) for all players, then updating weights for individual players on
the basis of their observed actions.
• Assess prob. of holding each class of hand, given own cards & board;
• Modify prob. estimates in light of each action
  e.g. “raise” → increase strong hand probs. & decrease weak hand probs.
• Adjust weights from estimated hands to better predict observed action
This showed improved performance compared to (i) programs with no modelling
and (ii) programs with only static modelling.
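The idea of one shared starting model that then diverges per player can be sketched as follows. This is not Loki's code: the three hand classes, the `REWEIGHT` factors and the `PlayerModel` class are invented for illustration, and only the re-weighting step is shown, not the later adjustment of the factors themselves.

```python
# Shared initial model: every player starts with the same weights.
DEFAULT_WEIGHTS = {"strong": 1.0, "medium": 1.0, "weak": 1.0}

# Invented re-weighting factors: how much each action favours each class.
REWEIGHT = {
    "raise": {"strong": 1.5, "medium": 1.0, "weak": 0.5},
    "call":  {"strong": 1.0, "medium": 1.2, "weak": 0.8},
    "check": {"strong": 0.7, "medium": 1.0, "weak": 1.3},
}


class PlayerModel:
    """Per-player hand-class weights, updated after each observed action."""

    def __init__(self):
        self.weights = dict(DEFAULT_WEIGHTS)  # private copy of the shared model

    def observe(self, action):
        for hand in self.weights:
            self.weights[hand] *= REWEIGHT[action][hand]

    def probabilities(self):
        total = sum(self.weights.values())
        return {hand: w / total for hand, w in self.weights.items()}


alice, bob = PlayerModel(), PlayerModel()
for action in ("raise", "raise"):
    alice.observe(action)
bob.observe("check")
print(alice.probabilities())
print(bob.probabilities())
```

After two raises the estimate for the first player leans towards strong hands, while a single check shifts the second player's estimate slightly towards weak ones.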
Bluffing behaviour
Being able to model others is only part of the solution. Good players find it easy
to model opponents who never bluff.
Bluffing purely at random (say 5% of hands) has a problem: in some cases
opponents can know for certain that you cannot win, so avoid bluffing at such times.
Raising repeatedly while bluffing is not typical of how you behave when you truly do
have a good hand - good opponents will detect the difference.
Follow a plan: proceed as if your chance of losing were, say, 50% of your true
estimate of that chance - this will lead to consistent and realistic behaviour that
cannot be easily diagnosed as bluffing.
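As a worked example of the plan, the adjustment is a one-line rescaling of the estimated chance of losing. The function name and the default factor of 0.5 (the slide's "say 50%") are illustrative assumptions.

```python
def bluff_adjusted_win_prob(p_win, factor=0.5):
    """Win probability to act on when bluffing to a plan.

    The true chance of losing (1 - p_win) is scaled by `factor`, and the
    player then behaves as if the result were the real estimate.
    """
    p_lose = 1.0 - p_win
    return 1.0 - factor * p_lose


for p in (0.3, 0.6, 0.9):
    print(p, bluff_adjusted_win_prob(p))
```

A hand with a true 30% chance of winning (70% chance of losing) is then played throughout as if it had a 65% chance of winning, so every betting decision in the hand stays consistent with that pretended strength.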
The Poki program
Poki is a rewrite & enhancement of Loki.
It features a neural-network opponent modelling mechanism, whose inputs include
• estimated hand strength
• estimated hand potential
• previous action of opponent
• position of player clockwise from dealer (first, last, neither)
• predictions from “expert predictors”
Opponent modelling is viewed as machine learning: predict opponent’s action
Backpropagation within the neural network
Plug-in “Expert Predictor(s)” (ensemble) may be machine-learning systems too
Poki also features game-tree search, to 5 ply, using “miximax” to handle the
problem of imperfect knowledge.
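A minimal sketch of the search idea follows, under the assumption that “miximax” means taking the maximum at the program's own decision nodes and a probability-weighted average (a mixed strategy supplied by the opponent model) at opponent nodes. The tree encoding, the node kinds and all numbers are invented for illustration; this is not Poki's implementation.

```python
def miximax(node):
    """Expected value of a node for the searching player (sketch)."""
    kind = node["kind"]
    if kind == "leaf":
        return node["value"]
    values = [miximax(child) for child in node["children"]]
    if kind == "me":
        # Own decision: choose the action with the best expected value.
        return max(values)
    # Opponent node: average over the actions the model predicts.
    return sum(p * v for p, v in zip(node["probs"], values))


# Toy tree: I either bet or check; if I bet, the model says the opponent
# folds half the time (I win 10) and calls half the time (I expect -2).
tree = {
    "kind": "me",
    "children": [
        {
            "kind": "opp",
            "probs": [0.5, 0.5],
            "children": [
                {"kind": "leaf", "value": 10.0},   # opponent folds
                {"kind": "leaf", "value": -2.0},   # opponent calls
            ],
        },
        {"kind": "leaf", "value": 1.0},            # I check
    ],
}

print(miximax(tree))
```

The opponent node is where the learned model enters the search: better action predictions give better expected values, which is why opponent modelling and search are combined.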
References
www.csse.monash.edu.au/hons/projects/2003/Darren.Boulton/website/
www.cs.ualberta.ca/~games/poker/
Quoted in Aaron Davidson’s 2002 MSc thesis at the U.Alberta site: