Modelling the Enemy: Recursive Cognitive Models in Dynamic Environments

W. Joseph MacInnes, PhD ([email protected])
Division of Life Sciences, University of Toronto (Scarborough), Scarborough, Ontario, Canada

Abstract

Much research has been conducted on cognitive models for theoretical (memory) and applied (user modelling) research. This paper explores the efficacy of using such models when the subject of the model is actively trying to defy such attempts. Machine learning algorithms were implemented to model human performance in a dynamic computer game environment. The algorithms were chosen to vary in the degree to which they were based on human cognitive performance, ranging from a simple state machine, to a neural network, and finally a mixture of experts fed by a clustering algorithm. Recursive modelling, also drawn from cognitive theory, was included in all three algorithms to assess any benefit. Models based on human cognitive theory outperformed the others in most tasks, although recursion failed to offer the advantage reported in other studies.

Introduction

While many collaborative modelling projects make extensive use of user questionnaires as a base for the user model, competitive environments pose an additional challenge: no assistance can be expected from the user being modelled. In fact, the user may hide or disguise their behaviour. And while cooperative modellers may suggest options to the user, which may or may not be taken, the competitive modeller has only its own recommendations on which to base its decisions. The modeller, in effect, needs to achieve the same (or higher) accuracy with less information.

Experiments in human-computer interaction offer a unique arena for machine learning algorithms.
In these studies the benefit is often symbiotic: not only do the users provide information on the effectiveness of the algorithms, but the algorithms can also provide insight into the abilities and individual differences of the users. Both can be seen as cognitive entities within these environments, each providing a standard by which to measure the performance of the other. This research builds on past studies from machine learning, user modelling and cognitive science, each of which is discussed briefly below.

Opponent modelling

Machine learning for user modelling has been the focus of much research and several review papers (Webb et al., 2001; Fischer, 2001). Opponent modelling, however, although the focus of debate in cognitive science, has yet to gain equal attention in the machine learning literature. Models from game theory have been applied to fields as broad as psychology and economics; computer games have become a giant growth industry; and military training simulations strive for ever more realism. All of these applications could benefit from learned cognitive models of the human opponents within the tasks.

Artificial intelligence and, more recently, machine learning algorithms have been applied to user modelling problems. Webb et al. (2001) outline the difficulties in applying machine learning solutions to these problems: the need for large data sets, the need for labelled data, concept drift, and computational complexity. Most of these are shared with other problem domains, but because of its special relation to user modelling, and opponent modelling in particular, the problem of concept drift (Widmer and Kubat, 1996) deserves further discussion. Concept drift describes a target concept which may shift or change over time. This presents difficulties for many machine learning algorithms, since they assume, or at least perform better on, static patterns.
User modelling is notorious for this problem, since users' preferences change frequently both between tasks and even within a single task. Opponent modelling shares all of these difficulties, but adds the possibility that the human may be intentionally shifting strategies to confuse the machine learning algorithm. This 'intentional concept drift', or deception, must also be accounted for if an algorithm is to succeed.

Recursive modelling

Thagard (1992) set out to demonstrate the cognitive processes necessary for successful opponent modelling, exploring zero to three levels of modelling an adversary. At level 0, a person has only self-insight: "What is my strategy?" Level 1 modelling involves devising a model of the adversary: "What does my opponent intend to do?" Level 2 modelling is a meta-perspective: "What does my adversary think I intend to do?" Level 3 modelling is a meta-meta-perspective: "What do I think my opponent expects that I think of him or her?" Thagard's most important contribution from the current perspective is his discussion of recursion in deception: he suggested that second-order modelling is critical for successful deception, since it is necessary to understand how the opponent will interpret one's actions in order to cause an erroneous interpretation. Thagard also raised a very important point when he suggested that this recursive model requires very little extra representation (space/memory), owing to the necessary assumption that the opponent has roughly the same cognitive abilities as oneself. In competitive-agent environments where human and software agents are designed to be interchangeable, this assumption often does not hold. Thagard's research does not, however, suggest the likely outcome of the modelling when the opponent's cognitive ability is unknown, as when modelling between human and computer entities.
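These modelling depths can be illustrated in code. The following is a minimal, hypothetical sketch (not the paper's implementation; all names and moves are invented for illustration), in which each depth builds one more level of opponent model, and depth 2 assumes, as the paper's algorithms do, that the opponent itself does not model recursively:

```python
# Hypothetical sketch of depth-0/1/2 opponent modelling (illustrative only).

def my_policy(state):
    # Depth 0: choose purely from the environment state.
    return "fire" if state.get("aligned") else "search"

def predict_opponent(state):
    # Depth 1 model of the adversary: here we simply assume the
    # opponent runs a policy like our own.
    return my_policy(state)

def choose_move(state, depth):
    if depth == 0:
        return my_policy(state)
    if depth == 1:
        # React to the opponent's predicted move.
        opp = predict_opponent(state)
        return "evade" if opp == "fire" else my_policy(state)
    # Depth 2: model what the opponent thinks *we* will do, assuming
    # (to avoid infinite regress) the opponent models non-recursively.
    opp_belief_about_me = my_policy(state)
    opp = "evade" if opp_belief_about_me == "fire" else predict_opponent(state)
    return "feint" if opp == "evade" else my_policy(state)
```

Each added level changes the chosen move only when the deeper model predicts something worth reacting to, which is why the representational cost of recursion stays low under Thagard's same-abilities assumption.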
Other researchers have confirmed the importance of depth-two recursion in competition between two humans (Burns & Vollmeyer, 1998; MacInnes & Gilin, 2005) and between two similar computer agents (MacInnes, Banyasad & Upal, 2001), but its usefulness in mixed human/computer environments has not been tested.

Stereotypes (and AI)

One of the pioneering works in the user modelling literature, and certainly one of the most cited, is Rich's paper on using stereotypes for user modelling (Rich, 1979). Although stereotypes had been used prior to 1979, that earlier work focussed on stereotypes as they apply to human cognitive understanding. Rich, however, suggested that the usefulness of stereotypes could be extended to models of computer users. Since the rationale behind stereotypes is that the world is far too complex to understand (and remember) without simplification and categorization of detail, the same benefits could be achieved by computers. The applied cognitive models were implemented as a series of 'facets' representing individual facts about the user. Each facet contains a rating of the bias towards or against that attribute, a rating reflecting the certainty (probability) of the aspect, and the justification for the belief in it.

Machine learning algorithms

A fairly recent area of machine learning research that may help in creating more dynamic user (or opponent) stereotypes is Jacobs and Nowlan's Mixture of Experts (MOE) (Jacobs et al., 1991). In this algorithm, the authors use a divide-and-conquer approach in which several neural networks are trained on sub-portions of a problem, and a gating mechanism (possibly another neural net) is used to choose among the various experts. While this approach has had some success, its advantage usually lies in reducing training time relative to a standard back-propagation neural network. In this situation, however, it is hypothesized that the overlap between the MOE implementation and stereotype theory will produce an advantage in opponent modelling. If opponent behaviour can be clustered (into, say, similar strategies), these clusters can be used to identify play styles in real time. Although it is tempting to label these clusters as 'strategies', it should be emphasized that formally labelling such groups is a very difficult process: clustering algorithms are in general excellent at grouping similar behaviour, but it is not always obvious what the groups actually mean. In addition to the MOE, a standard back-propagation neural network and a finite state machine (from MacInnes, 2001) were used for comparison.

Believability and the Turing test

The Turing test (Turing, 1950) has become a famous and much debated proposal, originally conceived by Alan Turing to determine whether a given machine had achieved human-level intelligence. Turing proposed that a machine would be deemed intelligent if it could believably mimic human written language. The test was based on a guessing game involving a judge, whose job it was to make the determination; a confederate, who played the part of the human speaker; and the computer program itself. If, after a short conversation with both the program and the confederate, the judge could not determine which was the computer, the program passed the test. Although the classic Turing test is no longer seen as an acceptable measure of human intelligence, it remains an interesting and incredibly difficult problem, as evidenced by the still-unclaimed Loebner prize, and an excellent test of language mastery. It can also serve as a valid test of agent believability, where the standard may only be to mimic believable human behaviour (MacInnes, 2004).

Methods

Algorithms

Training data for all machine learning algorithms were recorded from individuals playing against other human opponents in the computer game environment. A hybrid k-means / Kohonen Self-Organizing Map (SOM) clustering algorithm was used to partition the data into experts: the same data used for the neural network were fed through this clustering algorithm, producing a series of smaller yet more specialized sub-groups. The theory was that these clusters would represent different styles of play, or opponent stereotypes. A single-dimensional, ten-unit k-means clusterer was implemented to provide distinct edges for the ten potential clusters, then modified to allow the smoothing process found in typical SOM algorithms (Kohonen, 1982), giving training samples higher mobility early in the learning process. The expectation was that these experts would provide more precise modelling of user behaviour; at the very least, they should provide the same accuracy with less training, owing to the reduced variance within the training samples (Jacobs et al., 1991). The problem of concept drift was handled by modelling only the opponent's recent history. Once the clustering had produced workable stereotypes in the off-line phase, it was simple to compare an opponent's current behaviour to these stereotypes on-line.

The competing algorithms tested were the simple state machine used in MacInnes (2001) and a standard back-propagation neural network trained on the same data as the MOE (although without clustering). The state machine used the main states of 'fight' and 'search', with sub-states including 'avoid obstacle', 'search for opponent', 'align to target' and 'fire'.
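The off-line clustering and on-line stereotype matching described above might be sketched roughly as follows. This is an illustrative simplification (plain one-dimensional k-means over hypothetical behaviour features, without the SOM-style neighbourhood smoothing the paper adds, and with a sliding window standing in for "recent history"):

```python
# Illustrative sketch only -- not the authors' implementation.

def kmeans_1d(samples, k, iters=20):
    """Off-line phase: partition 1-D behaviour features into k clusters."""
    centroids = sorted(samples[:k])               # naive seeding
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for x in samples:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            buckets[nearest].append(x)
        # Recompute each centroid; keep the old one if its bucket is empty.
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids

def gate(recent_moves, centroids, window=5):
    """On-line phase: pick the stereotype that best fits the opponent's
    recent history (a crude defence against concept drift)."""
    recent = recent_moves[-window:]
    errors = [sum(abs(x - c) for x in recent) for c in centroids]
    return errors.index(min(errors))
```

In the full system each cluster would seed its own expert network, with the gate selecting which expert's prediction to trust for the current opponent.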
The neural net and MOE also used these states as a base, but transitions and decisions were based on output from the respective algorithm. For example, while the DFA had only the state of the environment to guide its search when the opponent went out of sight, the net and MOE could predict based on observations. Similarly, while firing during a fight, the DFA could only base its decisions on what it would do itself, whereas the other algorithms could choose based on what other humans had done in similar situations. The neural net would assume a path based on all human observations, while the MOE gating mechanism would predict based on the expert which had most closely predicted that opponent in the recent past.

The neural net solution, and the experts within the MOE, were implemented as a series of feed-forward/back-propagation neural networks. Since the environment was dynamic and an opponent's patterns may only become clear over time, a time-sensitive neural net was used for both; the nature of the domain (real-time games) and the possibility of concept drift both suggest such a network, and it was also an ideal way to incorporate planning implicitly in the model. Classification was performed in two stages: off-line clustering determined which samples would be used to train the different net/stereotypes, and on-line classification determined to which net/stereotype the current user belongs.

Environment

To establish a connection with current literature (Laird, 2000) and current applications, the theme and feel (but not the graphics) of the arena were designed to resemble 'first-person shooter' multi-player computer games. While previous research (MacInnes, 2001) looked at competition among multiple software agents, these experiments always contained at least one human agent.
Human versus software agent matches were implemented on a single computer, while human versus human matches were implemented on two computers over a small network, allowing players to compete from different physical locations. In all matches, however, the computers were networked, located in adjacent rooms, and synchronized to begin at the same time. Participants received no hint as to whether an opponent was human or computer (other than the opponent's performance in that match).

Every agent solution was tested in a series of matches against human opponents according to the following experimental design. Volunteer participants were brought in two at a time and asked to compete against a series of agents (one opponent at a time) in a multi-agent competitive arena. Since a potential human opponent was present in an adjoining room for all matches, participants could not determine whether they were competing against a human or a software agent. The arena was based on the virtual arena used in MacInnes (1999, 2001), but modified to allow for new software agents as well as human/human matches across a TCP network. As mentioned, three separate agent algorithms were tested in this research: the Mixture of Experts, a neural network and a simple state machine (DFA).

Figure 1: Algorithm results for effectiveness (a) and believability (b). Effectiveness is measured as the number of points scored against, and believability as error in the modified Turing test, so lower is better for both.

The recursive level of opponent modelling was also varied. As with Burns and Vollmeyer (1998), zero-, first- and second-order recursion were added to the algorithms to determine whether machine learning algorithms could benefit from recursion when modelling people. It has already been shown that such recursion offers a significant benefit against other software agents (MacInnes, 1999), but humans are much more difficult to model.
Recursion was added to the fight state for all algorithms, since this was the only state which could benefit from it in all three (the net and MOE could also have benefited from recursion in the search state, but this would have differentiated the algorithms on more than their inherent qualities). At depth 0, each algorithm used only environmental information when choosing its moves and strategy while fighting the opponent; opponent information was not treated separately from other environment variables (e.g. location was included, but intent was not). At depth 1, the algorithm created its best model of what it believed its opponent would try to do in combat, and made decisions based on that model. For example, the DFA would assume its opponent was a similar DFA and act as if its opponent would do the same as itself; the neural net would predict the opponent's future moves from its previous moves; and the MOE would predict from the previous moves using the expert which most closely modelled the opponent in the recent past. Depth 2 repeated this process, but included each algorithm's best guess of what the opponent thought the algorithm was going to do. It is worth mentioning that all algorithms had to assume their opponent was not modelling recursively, in order to avoid infinite recursion.

Each algorithm was scored on two dependent variables: effectiveness and believability. Effectiveness measured the objective performance of an agent as a whole in this environment, that is, how well an agent was able to compete with human opponents, and was determined by the agent's score (number of kills) out of five in each of the matches. The more subjective variable, believability, measured how well the software agent could impersonate a human agent under similar conditions.
At the end of every match, participants were asked to rate their opponent as human or computer, with a confidence from one to five. This produced a believability rating (-5 to +5), which was subtracted from that same subject's rating for the one human opponent (to normalize individual bias). The final rating was thus a single score reflecting the degree to which that opponent was more or less believable than the human opponent the subject faced. Although a number of between-subjects variables were recorded, they are beyond the scope of this paper; see MacInnes (2004) for a discussion of those results.

Results

The initial analysis was a simple test of correlation between effectiveness (the ability to defeat the human opponent) and believability (the ability to mimic the human opponent). Although R² was small (0.02), the correlation was significant (F = 4.875, p < 0.03). There was far more to the believability judgements than just how good an opponent was, although that was part of subjects' determination. Subjects' reports after the experiment suggest that experienced players rated very effective opponents as less human, while the opposite was true for less experienced players. This experience trend was not, however, significant in the data.

Effectiveness and believability were then subjected to a three-by-three within-subjects (algorithm × recursion) and two-factor between-subjects (sex and experience) Multivariate Analysis of Variance (MANOVA). Effects were limited to two-way interactions due to the inherent difficulty of finding subjects in all categories of sex and experience. The significance level was set at 0.05 for all analyses (although some marginal [p < .07] effects are reported as potential subjects for further analysis). A total of ten dyads (ten male and ten female participants) took part in the experiment.

The MOE was the best algorithm for both effectiveness and believability (Figure 1a and 1b), although the advantage was not always significant.
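For concreteness, the believability normalization described above can be written out. This is an illustrative reconstruction with hypothetical function names, not the original analysis code:

```python
# Illustrative reconstruction of the believability score. A judgement is
# a guess plus a 1-5 confidence, signed so that +5 means "certain it was
# human" and -5 means "certain it was computer".

def signed_rating(guess, confidence):
    assert guess in ("human", "computer") and 1 <= confidence <= 5
    return confidence if guess == "human" else -confidence

def believability_error(agent_guess, agent_conf, human_guess, human_conf):
    # Subtract the agent's rating from the same subject's rating of the
    # real human opponent, normalizing individual bias; lower error
    # means the agent was closer to the human baseline.
    return signed_rating(human_guess, human_conf) - signed_rating(agent_guess, agent_conf)
```

A perfectly believable agent scores zero; an agent confidently spotted as a computer by a subject who confidently identified the human scores the maximum error of ten.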
The MOE had the lowest number of losses against human opponents, but not significantly so (F < 1.0). It did, however, interact with the sex of the opponent (F(2,190) = 3.12, p < 0.05), implying that the algorithms varied in their success against men and women. A large part of the interaction lies with the neural network, which was more successful against the female opponents than the male opponents. These data speak to both the human differences and the algorithm differences. First, they show the different strategies employed by men and women; since there was no interaction with experience, this strategy difference goes beyond pure skill level. Second, they show the flexibility of the MOE over the neural network: although the net is capable of performing better against a particular type of player, the MOE does better on average against a variety of play styles. It is possible that the MOE is learning individual-difference data, and clustering on those data even though they were not explicitly included in training (insofar as such variables produce behavioural differences).

There was a marginal, non-significant effect of opponent on the believability rating (F(2,190), p < 0.07; Figure 1b). Although there was a significant difference between the MOE and the neural net (p < 0.03), there was no such difference with the DFA. Once again the MOE was the best performer of the software algorithms, although most experienced participants were able to spot the software opponents with some consistency. Even the MOE was not able to fully mimic human behaviour, even in this simplified environment.

Surprisingly, recursion had no significant effect on either outcome; in fact, recursion level 0 (no recursion) showed a non-significant advantage in both analyses. Three theories are proposed for this lack of effect. First, as suggested in MacInnes (2001), the compounding error of recursion limits the maximum effective recursion level of any modelling algorithm.
In this case, the modelling error is so high, due to the variability of human action, that it hinders the algorithm even at shallow recursion. The second theory is that the neural net and MOE algorithms fail to show an effect because recursive modelling is already included in their training: since it is hypothesized that humans model recursively in these situations (Thagard, 1992), a machine learning algorithm trained on this data will already have implicit recursion included, and therefore will not benefit from further explicit recursion. A final possibility is that there are different types of opponent modelling, namely strategic and personal, and, as current research suggests, it may only be personality modelling that benefits from recursion (MacInnes & Gilin, 2005); these will be explored more directly in future work.

Conclusion

This paper demonstrates techniques for using machine learning to model a human in an adversarial environment. Adversarial, multi-agent environments offer a rich and challenging scenario for many machine learning algorithms, and applications using such environments are increasing. In these environments, a Mixture of Experts (MOE), which is based on theories of human modelling practices, was shown to have some advantages over finite state machines as well as single neural-net learning. Although theoretical benefits of incorporating recursive modelling were discussed, no such advantages were measured in the listed experiments. Two theories were raised as explanation: (a) human recursion may be more complex than previously thought (explored in MacInnes & Gilin, 2006); (b) there may be implicit learning of user recursion in the connectionist models (net and MOE). Although there has been significant (though controversial) research on implicit learning in humans, less work has been done on the same in a machine learning context.
Although all machine learning can be thought of as implicit learning of patterns, implicit learning of human behaviour has interesting implications for user and cognitive modelling. The question of how these models may be incorporating recursion, however, will have to be left to future studies, since the original training data for the algorithms did not record the recursive modelling abilities of the training subjects or the degree to which they used them. Current research, which measures the degree to which the recursive ability of the training subjects is used in the clustering stage of the mixture of experts, should shed light on this question. In spite of the null results, the potential benefits of AI recursion (in improving the AI agent and in understanding the human agent) suggest that further research is warranted. It has clearly been shown that recursive modelling plays an important part in human cognition and interaction, and it stands to reason that it should be included in human cognitive models.

Acknowledgments

Thanks to Ray Klein for advice and support. Funding was provided in part by Dalhousie University, Saint Mary's University, UTSC, the Centre for Computational and Cognitive Neuroscience, the Natural Sciences and Engineering Research Council of Canada and the Canadian Space Agency.

References

Burns, B. & Vollmeyer, R. (1998). Modelling the Adversary and Success in Competition. Journal of Personality and Social Psychology, 75(3), 711-718.
Fischer, G. (2001). User Modeling in Human-Computer Interaction. User Modeling and User-Adapted Interaction, 11, 65-86.
Hebb, D. O. & Williams, K. A. (1946). A method of rating animal intelligence. Journal of General Psychology, 34, 59-65.
Jacobs, R. & Nowlan, S. (1991). Adaptive Mixtures of Local Experts. Neural Computation, 3, 79-87.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59-69.
Laird, J. (2000). It Knows What You're Going To Do: Adding Anticipation to a Quakebot. AAAI Spring Symposium on Artificial Intelligence and Interactive Entertainment, March.
MacInnes, W.J. (2004). Believability in Multi-Agent Computer Games: Revisiting the Turing Test. Proceedings of CHI, 1537.
MacInnes, W.J. & Gilin, D. (2005). What I think you think I am going to do next: Perspective-Taking and Recursive Modeling in Computer Mediated Conflict. Proceedings of IACM.
MacInnes, W.J. & Gilin, D. (2006). Recursion for Adversarial Modelling: New Evidence. Member Abstract, CogSci 2006.
MacInnes, J., Banyasad, O. & Upal, A. (2001). Watching Me, Watching You: Recursive modeling of autonomous agents. Abstracts of the Canadian Conference on AI, Ottawa, Ontario, 361-364.
Rich, E. (1979). User Modeling via Stereotypes. Cognitive Science, 3, 329-354.
Shore, D., Stanford, L., MacInnes, J., Klein, R. & Brown, R. (2001). Of Mice and Men: Using Virtual Hebb-Williams mazes to compare learning across gender and species. Cognitive, Affective and Behavioral Neuroscience, 1(1), 83-89.
Thagard, P. (1992). Adversarial Problem Solving: Modeling an Opponent Using Explanatory Coherence. Cognitive Science, 16, 123-149.
Turing, A. (1950). Computing Machinery and Intelligence. Mind, 59(236), 433-460.
Webb, G., Pazzani, M. & Billsus, D. (2001). Machine Learning for User Modeling. User Modeling and User-Adapted Interaction, 11, 19-29.
Widmer, G. & Kubat, M. (1996). Learning in the Presence of Concept Drift and Hidden Contexts. Machine Learning, 23(1), 69-101.