Opponent Modelling in Heads-Up Poker
Timm Meyer¹ and Dr. Jonathan L. Shapiro²
¹ Philipps-University of Marburg, Department of Mathematics and Computer Science [email protected]
² University of Manchester, School of Computer Science [email protected]
Abstract. Poker is a challenging game for artificial intelligence research, and it resembles many real-world problems. Opponent modelling is an important part of the game and is essential for achieving high performance, for machines as well as humans. This paper describes a new type of hierarchical representation usable for opponent modelling, with the potential to abstract dynamically.
1 Introduction
Learning in games of incomplete information presents many challenges to artificial intelligence research, including how to learn appropriate probabilistic strategies and how to represent the state space ([1], [2]).
Poker is a non-deterministic game with incomplete information. Players see only their own cards, and in a game that ends without a showdown the cards are never revealed, so no player has complete information. Poker also has several other aspects which arise in real-world problems.
Studying the game of poker therefore holds a lot of promise for the development of methods which can be transferred to and used in the real world. Since the objective in playing poker is to maximize the money won by exploiting the opponent’s weaknesses, a main component of poker is opponent modelling. Some research has already been done in this area, with the University of Alberta Computer Poker Research Group leading the way ([3], [4], [5], [6]).
2 A Hierarchical State Space Representation
Poker is a very situation-dependent game, and due to its complexity there is a vast variety of situations, far too many for human players to comprehend. Therefore, even human players have to abstract when modelling an opponent during actual play. The less of a player’s behaviour has been observed, the more needs to be abstracted. Even for a computer player an efficient representation needs to be found, due to the large scale of the state space. In this paper we consider Heads-Up Fixed Limit Texas Hold’Em, which restricts the game to two players and fixed bet sizes.
A new kind of model is introduced which makes it possible to abstract dynamically depending on how much data is available. To accomplish this, a tree structure is used to model post-flop play³, in which every node represents certain situations.
Figure 1 shows an excerpt of the first version of the tree. Compared to the final tree, which consists of 1,958 nodes with a maximum of 9 levels, this excerpt covers only a small portion. All trees share the same first level, consisting of nodes that separate the situations into “Made”⁴, “Drawing”⁵, and “Trash”⁶ hands.
Fig. 1. Tree Version One - Made Hands
Situations represented by child nodes are a subset of the situations represented by the parent node. If an upper node represents a made hand, nodes further down the tree refine this knowledge: a pair, two pair, etc. Each node stores the behaviour of the modelled player in its specific situations. The root node embodies the set of all situations and therefore contains the player’s overall behaviour.
Not one tree but a set of trees is created for a modelled player, depending on different information, e.g. who was the last aggressor⁷.
A player’s behaviour is described by the frequencies of the actions the player has made. These include folding, checking/calling, and betting/raising, distinguished by the betting level on which they take place.
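The node structure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `Node`, the action labels, and the choice to propagate each observation to all ancestors (so that a parent covers the superset of its children's situations) are assumptions made for the example.

```python
from collections import Counter

# Hypothetical action labels; the paper groups check/call and bet/raise.
ACTIONS = ("fold", "check_call", "bet_raise")


class Node:
    """One class of situations; child nodes refine the parent's situations."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        # Observation counts keyed by (betting_level, action).
        self.counts = Counter()

    def child(self, name):
        """Return the child with this name, creating it if necessary."""
        if name not in self.children:
            self.children[name] = Node(name, parent=self)
        return self.children[name]

    def observe(self, betting_level, action, weight=1.0):
        """Record an action here and in every ancestor (assumed design)."""
        node = self
        while node is not None:
            node.counts[(betting_level, action)] += weight
            node = node.parent

    def frequency(self, betting_level, action):
        """Relative frequency of an action at the given betting level."""
        total = sum(self.counts[(betting_level, a)] for a in ACTIONS)
        return self.counts[(betting_level, action)] / total if total else 0.0


root = Node("root")
pair = root.child("made").child("pair")
pair.observe(1, "bet_raise")
pair.observe(1, "fold")
root.child("trash").observe(1, "fold")
print(pair.frequency(1, "bet_raise"), root.frequency(1, "fold"))
```

With this layout the root automatically accumulates the player's overall behaviour, while deeper nodes hold progressively more specific frequencies.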
The final trees were developed in close collaboration with a German professional poker player⁸, with the objective of reflecting how a human poker player analyses situations during play.
With the aid of such a model, the probabilities of specific hole cards in a new game can be predicted. This is done by applying Bayes’ theorem to the specific nodes the game has passed through so far. Because of the hierarchical design, it is possible to abstract dynamically while calculating these probabilities: if hole cards are encountered which represent seldom-seen situations, it is possible to move up in the tree until the situation the node represents contains enough examples.
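One way to picture the combination of Bayes’ theorem and dynamic abstraction is the sketch below. It is illustrative only: the dictionary-based nodes, the function names, and the `min_samples` threshold are assumptions, and the paper does not state how "enough examples" is decided.

```python
def backed_off_likelihood(node, action, min_samples=20):
    """Estimate P(action | situation), moving up the tree while data is sparse.

    `node` is a dict with keys "counts" (action -> count) and "parent".
    `min_samples` is a hypothetical threshold for "enough examples".
    """
    while node["parent"] is not None and sum(node["counts"].values()) < min_samples:
        node = node["parent"]
    total = sum(node["counts"].values())
    return node["counts"].get(action, 0) / total if total else 0.0


def posterior(prior, node_of_hand, action, min_samples=20):
    """Bayes' theorem: P(hand | action) is proportional to P(action | hand) * P(hand)."""
    unnormalised = {
        hand: prior[hand] * backed_off_likelihood(node_of_hand[hand], action, min_samples)
        for hand in prior
    }
    z = sum(unnormalised.values())
    # If no hand explains the action, fall back to the prior.
    return {hand: (p / z if z else prior[hand]) for hand, p in unnormalised.items()}


# Toy tree: the "rare" node has a single observation, so its parent is used.
root = {"counts": {"bet": 50, "fold": 50}, "parent": None}
made = {"counts": {"bet": 40, "fold": 10}, "parent": root}
rare = {"counts": {"bet": 1}, "parent": made}
trash = {"counts": {"bet": 5, "fold": 35}, "parent": root}

prior = {"AhKh": 0.5, "7c2d": 0.5}
node_of_hand = {"AhKh": rare, "7c2d": trash}
print(posterior(prior, node_of_hand, "bet"))
```

For a sequence of observed actions, the same update could be applied repeatedly, using each posterior as the prior for the next action.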
³ Pre-flop play is handled with fourteen tables, which cover all possible situations.
⁴ A Made Hand is a hand with at least the value of a pair.
⁵ A Drawing Hand is a hand which has no value yet, but high potential to improve to a strong hand.
⁶ A Trash Hand is a hand with neither value nor potential.
⁷ The person who made the last bet/raise is called the aggressor.
⁸ Who wishes to remain anonymous.
Whenever a game is observed, the appropriate nodes, which reflect the situations occurring during that game, are updated. If there is a showdown, this is simple, since the hole cards are revealed. To be able to learn from games in which the modelled player folds, a method similar to the EM algorithm was developed. In that case, the model itself is used to predict the hole cards the player most likely folded. The tree is then updated using a weighting according to the prediction.
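The weighted update for folded hands might look like the sketch below. The paper only says the method is "similar to" the EM algorithm, so this shows just one plausible reading: each candidate hand contributes a fractional count equal to its predicted probability. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def update_from_fold(counts, predicted, node_of_hand, action="fold"):
    """Add fractional observations for a game that ended without a showdown.

    counts:       node name -> {action -> (possibly fractional) count}
    predicted:    hand -> probability that this hand was folded, as given
                  by the current model (the expectation-like step)
    node_of_hand: hand -> name of the node whose situation the hand falls into
    """
    for hand, prob in predicted.items():
        # Assumed weighting: the count added equals the predicted probability.
        counts[node_of_hand[hand]][action] += prob
    return counts


counts = defaultdict(lambda: defaultdict(float))
predicted = {"7c2d": 0.7, "9h8h": 0.2, "AsAd": 0.1}
node_of_hand = {"7c2d": "trash", "9h8h": "drawing", "AsAd": "made"}
update_from_fold(counts, predicted, node_of_hand)
print({name: dict(actions) for name, actions in counts.items()})
```

The fractional counts sum to one observation per folded game, so a fold carries the same total weight as a game with revealed cards.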
3 Evaluation
The data used for evaluation is static historical data, and the model has not yet been used in real situations. Therefore, the usefulness of the developed model is not evaluated; instead, the quality of the model’s predictions is evaluated. For evaluation purposes, data about four players was available. The amount of data ranged between 33K and 106K hand histories per player, and in one case the complete database was available, providing complete information.
When measuring the quality of a prediction of single hole cards, the similarity between the real situation and the situation the predicted hole cards represent has to be taken into account. To measure this similarity, the paths from the root node to the two nodes to which the hole cards lead are examined. The more nodes these two paths share, the more similar the situations are. Hence the number of shared nodes is counted, omitting the root node and every node which is independent of the hole cards, e.g. a node considering the board texture. This count is referred to below as the share count.
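The share count can be sketched as a comparison of two root-to-node paths. In this illustration the paths are lists of node names starting at the root, and `card_independent` is a hypothetical set naming the nodes that do not depend on the hole cards; both representations are assumptions.

```python
def share_count(path_real, path_predicted, card_independent=frozenset()):
    """Count the nodes shared by two root-to-node paths.

    The root (index 0) is skipped, as are nodes that do not depend on the
    hole cards, such as board-texture nodes. Counting stops where the two
    paths diverge.
    """
    count = 0
    for real, predicted in zip(path_real[1:], path_predicted[1:]):
        if real != predicted:
            break
        if real not in card_independent:
            count += 1
    return count


real = ["root", "board_paired", "made", "pair", "top_pair"]
guess = ["root", "board_paired", "made", "pair", "low_pair"]
print(share_count(real, guess, card_independent={"board_paired"}))
```

Here the two paths share "made" and "pair" once the root and the board-texture node are excluded, so the hands agree on the hand class and the hand type but differ in the final refinement.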
Figure 2 shows an evaluation using the share count of one modelled player in all river situations. The figure contains four series created by four different models: the baseline⁹, a model which used only showdowns, a model which used showdowns and the previously described EM-Approach, and a model which additionally used dynamic abstractions for calculating the probabilities of the hole cards.
As seen in the figure, the share count for the model using the EM-Approach and dynamic abstractions has a value of 1.55 for the 100 most probable hole cards. This means that within the 100 most probable hands the predicted hole cards share on average 1.55 nodes with the real hole cards. A value of 1.55 implies that roughly half of the time a second node is shared, which in Fig. 1 corresponds to the correct type of hand, e.g. a pair, a straight, et cetera. Compared to the baseline this is very good, because the form of poker evaluated is Heads-Up, and in Heads-Up the variety of situations is higher¹⁰. Therefore, one can assume that if this model is extended to games with multiple opponents, the prediction will be better, because fewer hands are played in those games and thus the variety of situations is lower.
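An average share count over the most probable hands could be computed along the following lines. This is a sketch under assumptions: the paper does not say whether the average is weighted by probability, so an unweighted mean over the top k hands is used here, and all names are hypothetical.

```python
def mean_share_count_top_k(predicted, path_of_hand, real_hand, k=100):
    """Unweighted mean share count of the k most probable predicted hands.

    predicted:    hand -> predicted probability
    path_of_hand: hand -> list of node names from the root to the hand's node
    """
    top = sorted(predicted, key=predicted.get, reverse=True)[:k]
    real_path = path_of_hand[real_hand]
    shared = []
    for hand in top:
        n = 0
        # Skip the root (index 0) and count nodes until the paths diverge.
        for real, guess in zip(real_path[1:], path_of_hand[hand][1:]):
            if real != guess:
                break
            n += 1
        shared.append(n)
    return sum(shared) / len(shared) if shared else 0.0


predicted = {"A": 0.5, "B": 0.3, "C": 0.2}
path_of_hand = {
    "A": ["root", "made", "pair"],
    "B": ["root", "made", "straight"],
    "C": ["root", "trash", "none"],
}
print(mean_share_count_top_k(predicted, path_of_hand, real_hand="A", k=2))
```

In this toy case the two most probable hands share 2 and 1 nodes with the real hand, giving a mean of 1.5, which reads the same way as the 1.55 reported above: the hand class is matched and the hand type is matched about half of the time.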
⁹ An “empty” model, i.e. the model has no information about the player.
¹⁰ This is due to the fact that more hands are played than in games with multiple opponents.
Fig. 2. Evaluation Using the Share Count in River Situations
In general, further evaluation, including the evaluation of many individual situations, showed positive and promising results; hence further research will be done.
4 Conclusion
This paper introduced a new representation for opponent modelling in poker, which was originally developed and evaluated in a diploma thesis. This new representation uses a hierarchical approach to partition the game state space in order to make dynamic abstractions possible. The evaluation showed promising results, and there are many more results and potential applications than could be presented within the scope of this paper.
Furthermore, work is still in progress, for example on building general player models, which can be used as a substitute until enough data about a player has been gathered, and on the extension to games with multiple opponents.
References
1. Singh, S., Kearns, M., Mansour, Y.: Nash convergence of gradient dynamics in
general-sum games. Proceedings of the Sixteenth Conference on Uncertainty in
Artificial Intelligence (2000) 541–548
2. Littman, M.: Markov games as a framework for multi-agent reinforcement learning.
Proceedings of the Eleventh International Conference on Machine Learning (1994)
157–163
3. Davidson, A., Billings, D., Schaeffer, J., Szafron, D.: Improved opponent modeling
in poker. Proceedings of the 2000 International Conference on Artificial Intelligence
(2000) 1467–1473
4. Billings, D., Papp, D., Schaeffer, J., Szafron, D.: Opponent modeling in poker.
Proceedings of the Fifteenth National Conference on Artificial Intelligence (1998)
493–499
5. Billings, D., Davidson, A., Schaeffer, J., Szafron, D.: The challenge of poker. Artificial Intelligence 134(1-2) (2002) 201–240
6. Billings, D., Papp, D., Schaeffer, J., Szafron, D.: Poker as a testbed for machine
intelligence research. Advances in Artificial Intelligence (1997) 1–15