Opponent Modelling in Heads-Up Poker

Timm Meyer¹ and Dr. Jonathan L. Shapiro²

¹ Philipps-University of Marburg, Department of Mathematics and Computer Science
² University of Manchester, School of Computer Science

Abstract. Poker is a challenging game for artificial intelligence research, and it resembles many real-world problems. Opponent modelling is an important part of the game and is essential for achieving high performance, for machines as well as for humans. This paper describes a new type of hierarchical representation for opponent modelling that is able to abstract dynamically.

1 Introduction

Learning in games of incomplete information presents many challenges to artificial intelligence research, including how to learn appropriate probabilistic strategies and how to represent the state space ([1], [2]). Poker is a non-deterministic game with incomplete information: each player sees only their own cards, and in a game that ends without a showdown the cards are never revealed, so no player has complete information. Poker also has several other aspects that arise in real-world problems. Studying poker therefore holds a lot of promise for developing methods that can be transferred to and used in the real world.

Since the objective in playing poker is to maximize the money won by exploiting the opponent’s weaknesses, opponent modelling is a main component of the game. Some research has already been done in this area, with the University of Alberta Computer Poker Research Group leading the way ([3], [4], [5], [6]).

2 A Hierarchical State Space Representation

Poker is a very situation-dependent game, and because of its complexity there is an enormous variety of situations, far more than human players can even remotely keep track of. Therefore, even human players have to abstract when modelling an opponent during actual play. The less of a player’s behaviour has been observed, the more needs to be abstracted.
Even for a computer player, an efficient representation is needed because of the large state space. In this paper we consider Heads-Up Fixed Limit Texas Hold’em, which restricts the game to two players and fixed bet sizes. We introduce a new kind of model that makes it possible to abstract dynamically, depending on how much data is available. To accomplish this, a tree structure is used to model post-flop play³, in which every node represents a set of situations. Figure 1 shows an excerpt of the first version of the tree. Compared to the final tree, which consists of 1,958 nodes with at most 9 levels, this excerpt covers only a small portion. All trees share the same first level, whose nodes separate the situations into “Made”⁴, “Drawing”⁵, and “Trash”⁶ hands.

Fig. 1. Tree Version One - Made Hands

The situations represented by a child node are a subset of the situations represented by its parent. If an upper node represents a made hand, the nodes further down the tree refine this knowledge: a pair, two pair, etc. Each node stores the behaviour of the modelled player in its specific situations. The root node represents the set of all situations and therefore contains the player’s overall behaviour. Not one tree but a set of trees is created for each modelled player, distinguished by additional context, e.g. who was the last aggressor⁷. A player’s behaviour is described by the frequencies of the actions the player has taken: folding, checking/calling, and betting/raising, distinguished by the betting level at which they take place. The final trees were developed in close collaboration with a German professional poker player⁸, with the objective of reflecting how a human poker player analyses situations during play.

With the aid of such a model, the probabilities of specific hole cards in a new game can be predicted. This is done by applying Bayes’ theorem to the specific nodes the game has gone through so far.
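The node structure and the Bayesian hole-card estimate can be illustrated with a minimal Python sketch. It is not the authors’ implementation: the node labels, the three-action set, the uniform prior over candidate hands, the additive smoothing (`alpha`) and the back-off threshold (`min_count`) are all assumptions made for illustration, and every function name is hypothetical. The back-off loop mirrors the dynamic abstraction discussed in the next paragraph, and the `weight` argument allows the prediction-weighted updates used for hands that are folded before a showdown.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical simplified action set; the model described here also
# distinguishes actions by the betting level on which they occur.
ACTIONS = ("fold", "check_call", "bet_raise")


@dataclass(eq=False)
class Node:
    """A set of situations; every child represents a subset of its parent's."""

    name: str
    parent: Node | None = field(default=None, repr=False)
    children: dict[str, Node] = field(default_factory=dict, repr=False)
    counts: dict[str, float] = field(default_factory=lambda: {a: 0.0 for a in ACTIONS})

    def child(self, name: str) -> Node:
        """Return the named child, creating it on first use."""
        if name not in self.children:
            self.children[name] = Node(name, parent=self)
        return self.children[name]

    def path(self) -> list[Node]:
        """Nodes from the root down to this node."""
        nodes: list[Node] = []
        node: Node | None = self
        while node is not None:
            nodes.append(node)
            node = node.parent
        return nodes[::-1]

    def total(self) -> float:
        return sum(self.counts.values())


def observe(leaf: Node, action: str, weight: float = 1.0) -> None:
    """Record an action in a node and all its ancestors (each is a superset of situations)."""
    for node in leaf.path():
        node.counts[action] += weight


def action_prob(leaf: Node, action: str, min_count: float = 10.0, alpha: float = 1.0) -> float:
    """P(action | situation), moving up the tree while a node has too few examples."""
    node = leaf
    while node.parent is not None and node.total() < min_count:
        node = node.parent
    # Additive smoothing (an assumption) keeps an unseen action from ruling a hand out.
    return (node.counts[action] + alpha) / (node.total() + alpha * len(ACTIONS))


def posterior(decisions: list[tuple[dict[str, Node], str]]) -> dict[str, float]:
    """Bayes' theorem over candidate hole cards with a uniform prior.

    Each decision maps every candidate hand to the node its situation falls into,
    paired with the action the modelled player actually took at that point.
    """
    if not decisions:
        raise ValueError("need at least one observed decision")
    scores = {hand: 1.0 for hand in decisions[0][0]}
    for leaf_by_hand, action in decisions:
        for hand in scores:
            scores[hand] *= action_prob(leaf_by_hand[hand], action)
    z = sum(scores.values())
    return {hand: s / z for hand, s in scores.items()}


def update_without_showdown(decisions: list[tuple[dict[str, Node], str]],
                            probs: dict[str, float]) -> None:
    """EM-like update: the cards stay hidden, so each candidate is weighted by its probability."""
    for leaf_by_hand, action in decisions:
        for hand, p in probs.items():
            observe(leaf_by_hand[hand], action, weight=p)
```

Because every observation is also counted in all ancestors, each parent always summarizes its children, so backing off to a coarser situation needs no extra bookkeeping. After a bet, for instance, a hand whose node has mostly recorded bets receives a higher posterior than a hand whose node has mostly recorded folds.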
Because of the hierarchical design, it is possible to abstract dynamically while calculating these probabilities: if the hole cards under consideration correspond to situations that have rarely been seen, the model can move up the tree until it reaches a node whose situations contain enough examples.

Whenever a game is observed, the nodes that reflect the situations occurring during that game are updated. If there is a showdown, this is simple, since the hole cards are revealed. To be able to learn from games in which the modelled player folds, a method similar to the EM algorithm was developed: the model itself is used to predict the hole cards the player most likely folded, and the tree is then updated with weights according to this prediction.

³ Pre-flop play is handled with fourteen tables, which cover all possible situations.
⁴ A Made Hand is a hand with at least the value of a pair.
⁵ A Drawing Hand is a hand which has no value yet, but high potential to improve to a strong hand.
⁶ A Trash Hand is a hand with neither value nor potential.
⁷ The person who made the last bet/raise is called the aggressor.
⁸ The player wishes to remain anonymous.

3 Evaluation

The data used for evaluation is static historic data, and the model has not yet been used in real play. Therefore, the usefulness of the developed model in play is not evaluated; instead, the quality of the model’s predictions is evaluated. Data about four players was available for evaluation, ranging between 33K and 106K hand histories per player; in one case the complete database was available, giving complete information.

When measuring the quality of a prediction of single hole cards, the similarity between the real situation and the situation represented by the predicted hole cards has to be taken into account. To measure this similarity, the paths from the root node to the two nodes to which the real and the predicted hole cards lead are examined.
The more nodes these two paths share, the more similar the situations are; hence the number of shared nodes is counted, omitting the root node and every node that is independent of the hole cards, e.g. a node considering the board texture. This count is referred to below as the share count.

Figure 2 shows an evaluation using the share count for one modelled player in all river situations. The figure contains four series created by four different models: the baseline⁹, a model using only showdowns, a model using showdowns and the EM-like approach described above, and a model that additionally used dynamic abstraction when calculating the hole-card probabilities. As the figure shows, the model using the EM-like approach and dynamic abstraction reaches a share count of 1.55 for the 100 most probable hole cards. This means that, among the 100 most probable hands, the predicted hole cards share on average 1.55 nodes with the real hole cards. A value of 1.55 implies that about half of the time a second node is shared, which in Fig. 1 corresponds to the correct type of hand, e.g. a pair, a straight, et cetera. Compared to the baseline this is a very good result, because the form of poker evaluated is Heads-Up, where the variety of situations is higher¹⁰. One can therefore assume that if this model is extended to games with multiple opponents, the predictions will be better, because fewer hands are played in those games and thus the variety of situations is lower.

⁹ An “empty” model, i.e. a model that has no information about the player.
¹⁰ This is because more hands are played than in games with multiple opponents.

Fig. 2. Evaluation Using the Share Count in River Situations

In general, further evaluation, including the evaluation of many individual situations, showed positive and promising results, so further research will be done.
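The share count used in this evaluation can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors’ code: it assumes each situation is identified by the list of node labels on its root-to-node path, that the average over the k most probable predictions is unweighted, and that hole-card-independent nodes (such as a board-texture node) are passed in by label. The function names and labels are hypothetical.

```python
from __future__ import annotations


def share_count(real: list[str], predicted: list[str],
                card_independent: frozenset[str] = frozenset()) -> int:
    """Number of nodes shared by two root-to-node paths.

    In a tree, two paths from the root agree on a common prefix and never meet
    again, so the shared nodes are exactly that prefix. The root (index 0) and
    nodes that do not depend on the hole cards (e.g. board texture) are skipped.
    """
    count = 0
    for a, b in zip(real[1:], predicted[1:]):
        if a != b:
            break
        if a not in card_independent:
            count += 1
    return count


def mean_share_count(real: list[str], ranked: list[list[str]], k: int = 100,
                     card_independent: frozenset[str] = frozenset()) -> float:
    """Unweighted average share count over the k most probable predictions (best first)."""
    top = ranked[:k]
    if not top:
        raise ValueError("no predictions to evaluate")
    return sum(share_count(real, p, card_independent) for p in top) / len(top)


real = ["root", "made", "pair", "top_pair"]
ranked = [
    ["root", "made", "pair", "overpair"],  # shares "made" and "pair"
    ["root", "made", "two_pair"],          # shares "made" only
    ["root", "drawing", "flush_draw"],     # shares nothing below the root
]
print(mean_share_count(real, ranked))  # prints 1.0
```

Comparing paths position by position is enough because a node is identified by its whole path from the root; averaging this per-hand value again over many observed hands would give a per-player figure of the kind plotted in Fig. 2.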
4 Conclusion

This paper introduced a new representation for opponent modelling in poker, which was originally developed and evaluated in a diploma thesis. The representation uses a hierarchical approach to partition the game state space, which makes dynamic abstraction possible. The evaluation showed promising results and produced many more results and possible applications than could be presented within the scope of this paper. Work is still in progress, for example on building general player models, which can be used as a substitute until enough data about a player has been gathered, and on the extension to games with multiple opponents.

References

1. Singh, S., Kearns, M., Mansour, Y.: Nash convergence of gradient dynamics in general-sum games. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (2000) 541–548
2. Littman, M.: Markov games as a framework for multi-agent reinforcement learning. Proceedings of the Eleventh International Conference on Machine Learning (1994) 157–163
3. Davidson, A., Billings, D., Schaeffer, J., Szafron, D.: Improved opponent modeling in poker. Proceedings of the 2000 International Conference on Artificial Intelligence (2000) 1467–1473
4. Billings, D., Papp, D., Schaeffer, J., Szafron, D.: Opponent modeling in poker. Proceedings of the Fifteenth National Conference on Artificial Intelligence (1998) 493–499
5. Billings, D., Davidson, A., Schaeffer, J., Szafron, D.: The challenge of poker. Artificial Intelligence 134(1-2) (2002) 201–240
6. Billings, D., Papp, D., Schaeffer, J., Szafron, D.: Poker as a testbed for machine intelligence research. Advances in Artificial Intelligence (1997) 1–15