Modelling the Enemy:
Recursive Cognitive Models in Dynamic Environments
W. Joseph MacInnes, PhD ([email protected])
Division of Life Sciences University of Toronto (Scarborough)
Scarborough, Ontario, Canada
Abstract
Much research has been conducted on cognitive models for theoretical (memory) and applied (user modelling) research. This paper explores the efficacy of using such models when the subject of the model is actively trying to defy such attempts. Machine learning algorithms were implemented to model human performance in a dynamic computer game environment. Algorithms were chosen so as to vary the degree to which they were based on human cognitive performance, ranging from a simple state machine, to a neural network, and finally a mixture of experts fed by a clustering algorithm. Recursive modelling, also drawn from cognitive theory, was included in all three algorithms to ascertain any benefit. It was found that models based on human cognitive theory outperformed the others in most tasks, although recursion failed to offer the advantage reported in other studies.

Introduction
Experiments in human-computer interaction offer a unique arena for machine learning algorithms. In these studies, the benefit is often symbiotic: not only do the users provide information on the effectiveness of the algorithms, but the algorithms can also provide insight into the abilities and individual differences of the users. Both can be seen as cognitive entities within these environments, and each can provide a standard by which to measure the performance of the other.

This research builds on past studies from machine learning, user modelling and cognitive science, each of which is discussed briefly below.

Opponent modelling
Machine learning for user modelling has been the focus of much research and several review papers (Webb et al., 2001; Fischer, 2001). Opponent modelling, however, although the focus of debate in cognitive science, has yet to gain equal press in the machine learning literature. Models from game theory have been applied to fields as broad as psychology and economics; computer games have become a giant growth industry; and military training simulations strive for more realism. All of these applications could benefit from learned cognitive models of the human opponents within their tasks.

While many collaborative modelling projects make extensive use of user questionnaires as a base for the user model, competitive environments pose an additional challenge: no assistance can be expected from the user being modelled. In fact, the user may hide or masquerade their behaviour. Also, while cooperative modellers may suggest options to the user that may or may not be taken, the competitive modeller has only its own recommendations with which to make its decision. The modeller, in effect, needs to achieve the same (or higher) accuracy with less information.
Artificial intelligence and, more recently, machine learning algorithms have been applied to user modelling problems. Webb et al. (2001) outline the difficulties in applying machine learning solutions to these problems, which include the need for large data sets, the need for labelled data, concept drift, and computational complexity. Most of these are shared with other problem domains, but due to its special relation with user modelling, and opponent modelling in particular, the problem of concept drift (Widmer and Kubat, 1996) will be discussed further.

Concept drift describes a property which may shift or change over time. This presents difficulties for many machine learning algorithms, since they often assume, or perform better on, static patterns. User modelling is notorious for this problem since users' preferences change frequently, both between tasks and even within a single task. Opponent modelling shares all of these difficulties, but adds the possibility that the human may be intentionally shifting strategies to try to confuse the machine learning algorithm. This 'intentional concept drift', or deception, must also be accounted for if an algorithm is to succeed.
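One common way to cope with such drift, and the one adopted later in this paper, is to model only the opponent's recent history, so that an abandoned strategy stops influencing predictions. A minimal sketch (all names here are illustrative, not from the paper):

```python
from collections import deque

class WindowedModel:
    """Toy learner that counters concept drift by keeping only a
    sliding window of the opponent's most recent actions."""

    def __init__(self, window_size=50):
        self.history = deque(maxlen=window_size)  # old samples fall out

    def observe(self, state, action):
        self.history.append((state, action))

    def predict(self, state):
        # Majority vote over recent actions seen in this state; a
        # drifted (older) strategy no longer influences the vote.
        votes = {}
        for s, a in self.history:
            if s == state:
                votes[a] = votes.get(a, 0) + 1
        return max(votes, key=votes.get) if votes else None

model = WindowedModel(window_size=3)
for obs in [("fight", "strafe"), ("fight", "strafe"), ("fight", "rush"),
            ("fight", "rush"), ("fight", "rush")]:
    model.observe(*obs)
print(model.predict("fight"))  # prints "rush": the recent window is all "rush"
```

The window size trades responsiveness to a strategy shift against robustness to noise; the paper does not report the window length used.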
Recursive modelling
Thagard (1992) set out to demonstrate the cognitive processes necessary for successful opponent modelling. The work explored zero to three levels of modelling an adversary. At level 0, a person has only self-insight: "What is my strategy?" Level 1 modelling involves devising a model of the adversary: "What does my opponent intend to do?" Level 2 modelling is a meta-perspective: "What does my adversary think I intend to do?" Level 3 modelling is a meta-meta-perspective: "What do I think my opponent expects that I think of him or her?"
Thagard's most important contribution from the current perspective is his discussion of recursion in deception. It was in this article that the author suggested second-order modelling is critical for successful deception: it is necessary to understand how the opponent will interpret one's actions in order to cause an erroneous interpretation.
Thagard raised a very important point when he suggested that this recursive model requires very little extra representation (space/memory), due to the necessary assumption that the opponent has roughly the same cognitive abilities as oneself. In competitive-agent environments where human and software agents were designed to be interchangeable, however, this assumption often did not hold. Thagard's research does not suggest the likely outcome of the modelling when the opponent's cognitive ability is not known, as in modelling between human and computer entities. Other researchers have confirmed the importance of depth-two recursion in competition between two humans (Burns & Vollmeyer, 1998; MacInnes and Gilin, 2005) and between two similar computer agents (MacInnes, Banyasad and Upal, 2001), but the usefulness in mixed human/computer environments has not been tested.
Stereotypes (and AI)
One of the pioneering works in the user modelling literature, and certainly one of the most cited, is Rich's paper on using stereotypes for user modelling (Rich, 1979). Although stereotypes had been used prior to 1979, this prior work focussed on stereotypes as they applied to human cognitive understanding. Rich, however, suggested that the usefulness of stereotypes could be extended to models of computer users. Since the rationale behind stereotypes is that the world is far too complex to understand (and remember) without simplification and categorization of the details, the same benefits could be achieved by computers. Applied cognitive models were implemented as a series of 'facets' that represented individual facts about the user. Contained within each facet is a rating that shows the bias towards or against this attribute, a rating that reflects the certainty (probability) of this aspect, as well as the justification for the belief in this aspect.

Machine learning algorithms
A fairly recent area of machine learning research that may help in creating more dynamic user (or opponent) stereotypes is Jacobs and Nowlan's Mixture of Experts (MOE) (Jacobs et al., 1991). In this algorithm, the authors use a divide-and-conquer approach in which several neural networks are trained on sub-portions of a problem, and a gating mechanism (possibly another neural net) is used to choose among the various experts. While this approach has had some success, its advantage usually lies in reducing the training time over a standard back-propagation neural network. In this situation, however, it is hypothesized that the overlap between the MOE implementation and stereotype theory will produce an advantage in opponent modelling. If opponent behaviour can be clustered (into, say, similar strategies), then these clusters can be used to identify play styles in real time. Although it is tempting to label these clusters as 'strategies', it should be emphasized that formal labelling of these groups is a very difficult process. Although clustering algorithms in general are excellent at grouping similar behaviour, it is not always obvious what these groups actually mean.

In addition to the MOE, a standard back-propagation neural network and a finite state machine (from MacInnes, 2001) were used for comparison.

Believability and the Turing test
The Turing test has become a famous and much-debated proposal, originally conceived by Alan Turing in 1950 to determine if a given machine had achieved human-level intelligence. Although it is no longer taken as an accurate measure of human intelligence, it remains an interesting, and monumental, problem, as evidenced by the still unclaimed Loebner prize.

The classic Turing test (Turing, 1950) was originally proposed as a test for human intelligence. Alan Turing proposed that a machine would be deemed intelligent if a computer could believably mimic human written language. The test he proposed was based on a guessing game, and would include a judge whose job it was to make the determination, a confederate who played the part of the human speaker, and of course the computer program. If, after a short conversation with both the program and the confederate, the judge could not determine which was the computer, then that program would pass the test of intelligence.

Although the classic Turing test is no longer seen as an acceptable measure of human intelligence, it remains an excellent (and incredibly difficult) test of language mastery. It can also serve as a valid test of agent believability, where the standard may only be to mimic believable human behaviour (MacInnes, 2004).

Methods
Algorithms
Training data for all machine learning algorithms was
recorded from individuals playing against other human
opponents in the computer game environment.
The neural net solution and the experts within the MOE were implemented as a series of feed-forward/back-propagation neural networks. Since the environment was dynamic and an opponent's patterns may only become clear over time, a time-sensitive neural net was used, as both the nature of the domain (real-time games) and the possibility of concept drift suggest. This was also an ideal way to implicitly incorporate planning in the model. Classification was performed in two stages: off-line clustering was used to determine which samples would be used to train the different Net/Stereotypes, and on-line classification was used to determine to which Net/Stereotype the current user belongs.

The same data that were used for the neural network were fed through this clustering algorithm, producing a series of smaller, yet more specialized, sub-groups. The theory was that these clusters would represent different styles of play, or opponent stereotypes. A hybrid k-means clustering/Kohonen Self-Organizing Map (SOM) was used to partition the data into experts. A single-dimensional, ten-unit k-means clusterer was implemented to provide distinct edges for the ten potential clusters. This algorithm was then modified to allow for the smoothing process found in typical SOM algorithms (Kohonen, 1982), permitting higher mobility of training samples early in the learning process.

The motivation and expectation for using this algorithm was that these experts would provide for more precise modelling of user behaviour. At the very least, it should provide the same accuracy with less training required, due to the reduction of variance in the training samples (Jacobs, 1991). The problem of concept drift was also handled by only modelling the opponent's recent history. Once the clustering produced workable stereotypes in the off-line phase, it was simple to compare an opponent's current behaviour to these stereotypes on-line.

The competing algorithms tested were a simple state machine used in MacInnes (2001), and a standard back-propagation neural network trained on the same data used for the MOE (although without clustering). The state machine used the main states of 'fight' and 'search', with sub-states including 'avoid obstacle', 'search for opponent', 'align to target' and 'fire'. The neural net and MOE also used these states as a base, but transitions and decisions were based on output from the respective algorithm. For example, while the DFA had only the state of the environment to guide its search when the opponent went out of sight, the net and MOE could predict based on observations. In addition, while firing during a fight, the DFA could only base its decisions on what it would do itself, while the other algorithms could choose based on other humans in similar situations. The neural net would assume a path based on all human observations, while the MOE gating mechanism would predict based on the expert which had most closely predicted that opponent in the recent past.

Environment
To establish a connection with current literature (Laird, 2000) and current applications, the theme and feel (but not the graphics) of the arena were designed to resemble 'first-person shooter' multi-player computer games. While previous research (MacInnes, 2001) looked at competition between multiple software agents, these experiments always contained at least one human agent. Human versus software agent matches were implemented on a single computer, while human versus human matches were implemented on two computers over a small network. This allowed players to compete while being in different physical locations. With all matches, however, computers were connected via network, in adjacent rooms, and synchronized to begin at the same time. There were no hints to a participant in a match whether an opponent was human or computer (other than the performance of the opponent in that match).

Every agent solution was tested in a series of matches against human opponents according to the following experimental design. Volunteer participants were brought in two at a time and asked to compete against a series of agents (one opponent at a time) in a multi-agent competitive arena. Since a potential human opponent was present in an adjoining room for all matches, participants could not determine whether they were competing against a human or a software agent. This arena was based on the virtual arena used for MacInnes (1999, 2001), but modified to allow for new software agents as well as human/human agent matches across a TCP network.
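The cluster-then-gate pipeline described above (off-line clustering into stereotypes, one expert per cluster, on-line gating on recent behaviour) can be sketched as follows, under the assumption that behaviour is summarized as numeric features. Plain k-means stands in for the paper's k-means/SOM hybrid, and per-cluster means stand in for the per-stereotype neural networks; all feature names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative behaviour features per observed move: (aggression, evasion).
aggressive = rng.normal([1.0, 0.0], 0.1, size=(50, 2))
evasive = rng.normal([0.0, 1.0], 0.1, size=(50, 2))
samples = np.vstack([aggressive, evasive])

def kmeans(samples, k, iters=20):
    """Plain k-means; a stand-in for the paper's k-means/SOM hybrid."""
    centroids = samples[[0, -1]].copy()  # deterministic, well-separated seeds
    for _ in range(iters):
        dists = ((samples[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = samples[labels == j].mean(axis=0)
    return centroids, labels

# Off-line stage: partition training samples into "stereotypes".
centroids, labels = kmeans(samples, k=2)

# One "expert" per cluster. Real experts would be neural nets trained
# on their cluster's samples; here each just replays the cluster mean.
experts = [samples[labels == j].mean(axis=0) for j in range(2)]

def gate(recent_moves):
    """On-line gating: pick the expert whose stereotype (centroid) is
    closest to the opponent's recent behaviour."""
    mean = np.asarray(recent_moves).mean(axis=0)
    j = ((centroids - mean) ** 2).sum(axis=1).argmin()
    return j, experts[j]

# An opponent who has recently played aggressively is routed to the
# expert trained on the aggressive stereotype.
j, prediction = gate([[0.9, 0.1], [1.1, -0.1]])
```

Restricting the gate to recent moves is what lets the model track an opponent who switches stereotype mid-match, echoing the recent-history treatment of concept drift above.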
As mentioned, three separate agent algorithms were tested in this research: the Mixture of Experts, a neural network, and a simple state machine (DFA).
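The state-machine baseline, using the main states and sub-states named above, might be sketched as a single decision function; the transition conditions are illustrative, not taken from the paper:

```python
def dfa_step(sees_opponent, aligned, obstacle_ahead):
    """One decision of a toy agent with main states 'search' and
    'fight' and the sub-states listed above. Returns (state, sub-state);
    the conditions are illustrative only."""
    if not sees_opponent:
        # Search behaviour: navigate around obstacles, else keep looking.
        return ("search",
                "avoid obstacle" if obstacle_ahead else "search for opponent")
    # Fight behaviour: line up the target, then shoot.
    return ("fight", "fire" if aligned else "align to target")

# An opponent in view but not yet lined up:
dfa_step(sees_opponent=True, aligned=False, obstacle_ahead=False)
# → ('fight', 'align to target')
```

Note that this baseline consults only the current environment state, which is exactly the limitation the net and MOE address by predicting from observed opponent behaviour.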
Figure 1: Algorithm results for effectiveness (a) and believability (b). Effectiveness is measured as the number of points scored against, and believability as error in the modified Turing test, so lower is better for both.
The recursive level of opponent modelling was also varied. As with Burns and Vollmeyer (1998), zero-, first- and second-order recursion were added to the algorithms to determine if machine learning algorithms could benefit from recursion when modelling people. It has already been shown that such recursion has a significant benefit against other software agents (MacInnes, 1999), but humans are much more difficult to model.
Recursion was added to the fight state for all algorithms, since this was the only state which could benefit from it in all three algorithms (the net and MOE could also have benefited from recursion in the search state, but this would have differentiated the algorithms on more than their inherent qualities). At depth 0, each algorithm used only environmental information when choosing its moves and strategy while fighting the opponent. Opponent information was not included as separate from other environment variables (e.g. location was included, but intent was not). At depth 1, the algorithm created its best model of what it believed its opponent would try to do in combat, and made decisions based on that. For example, the DFA would assume its opponent was a similar DFA and act as if its opponent would do the same as itself; the neural net would predict the opponent's future moves based on its previous moves; and the MOE would predict based on the previous moves using the expert which most closely modelled the opponent in the recent past. Depth 2 repeated this process but included each algorithm's best guess of what the opponent thought the algorithm was going to do. It is worth mentioning that all algorithms had to assume that their opponent was not modelling recursively, to avoid infinite recursion.
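The three depths can be illustrated with a deliberately simplified fight exchange: each side picks left or right, and the shooter hits if its aim matches the opponent's dodge. Everything below is a toy stand-in; the paper's agents made these predictions with a DFA, a neural net, or the MOE rather than frequency counts:

```python
def other(side):
    return "right" if side == "left" else "left"

def mode(history):
    """Most frequent past move; a toy stand-in for a learned predictor."""
    return max(set(history), key=history.count)

def choose_aim(opp_history, my_history, depth):
    """Pick where to aim at recursion depth 0, 1 or 2 in a toy fight
    exchange where aiming where the opponent dodges scores a hit."""
    if depth == 0:
        # Environment-only: no opponent model, fall back on a fixed policy.
        return "left"
    if depth == 1:
        # Model the opponent directly: aim at their habitual dodge.
        return mode(opp_history)
    # Depth 2: assume the opponent (non-recursively) predicts my aim
    # from my own history and dodges the other way; aim where they
    # will actually be.
    predicted_my_aim = mode(my_history)
    return other(predicted_my_aim)

# I usually aim right, so a depth-2 model expects the opponent to
# dodge left, and aims left:
choose_aim(["left"], ["right", "right", "left"], depth=2)  # → "left"
```

Capping the opponent model at a non-recursive level in the depth-2 branch mirrors the assumption, noted above, that is needed to avoid infinite recursion.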
Each algorithm was scored on two different dependent variables: effectiveness and believability. The effectiveness of an algorithm was used to determine the objective performance of an agent as a whole in this environment. This measure looks at how well an agent is able to compete with human opponents, and is determined by the agent's score (number of kills) out of five in each of the matches. The more subjective variable, believability, looked at how well the software agent could impersonate a human agent under similar conditions. At the end of every match, participants were asked to rate their opponent as to whether they thought it was human or computer, and to rate their confidence from one to five. This produced a believability rating (-5..+5), which was subtracted from that same subject's rating for the one human opponent (to normalize individual bias). The final rating was a single score which reflected the degree to which that opponent was more or less believable than the human opponent which was faced.
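The rating arithmetic just described can be made concrete. The mapping of a judgement plus confidence onto the -5..+5 scale is an assumption about the paper's scoring (the text gives the range but not the exact encoding):

```python
def raw_rating(judged_human, confidence):
    """Fold a human/computer judgement and a 1-5 confidence into a
    single -5..+5 scale (+5 = certainly human). This encoding is an
    assumption consistent with, but not stated in, the text."""
    return confidence if judged_human else -confidence

def believability_error(rating_of_human_opponent, rating_of_agent):
    """Subtract the agent's rating from the same participant's rating
    of the one human opponent, cancelling individual bias. Lower
    means the agent was closer to the human standard."""
    return rating_of_human_opponent - rating_of_agent

# A participant sure the human opponent was human (+4) who judged an
# agent 'computer' with confidence 2 (-2):
believability_error(raw_rating(True, 4), raw_rating(False, 2))  # → 6
```

A score of zero means the agent was rated exactly as believable as the human opponent, matching the "lower is better" reading of Figure 1.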
Although a number of between subject variables were
recorded, they are beyond the scope of this paper. See
MacInnes (2004) for a discussion of these results.
Results
The initial analysis was a simple test of correlation between effectiveness (the ability to defeat the human opponent) and believability (the ability to mimic the human opponent). Although R² was small (0.02), it was still significant (F=4.875, P<0.03). There was far more to the believability calculations than just how good an opponent was, although it was part of subjects' determination. Reports by subjects after the experiment suggest that experienced players rated very effective opponents as less human, while the opposite was true for less experienced players. This trend of experience was not significant in the data, however.
Effectiveness and believability were then subjected to a three-by-three within (algorithm x recursion) and two between (sex and experience) Multivariate Analysis of Variance (MANOVA). Effects were limited to a depth of two (interactions) due to the inherent difficulties in finding subjects in all categories of sex and experience. The significance level was set at 0.05 for all analyses (although some marginal [<.07] effects will be reported as potential effects for further analysis). A total of ten dyads (ten male and ten female participants) participated in the experiment.
The MOE was the best algorithm for both effectiveness and believability (Figure 1, a and b), although the advantage wasn't always significant. The MOE had the lowest number of losses against human opponents, but not significantly so (F<1.0). It did interact with the sex of the opponent, however (F(2,190)=3.12, P<0.05), implying that the algorithms varied in their success against men and women. A large part of the interaction lies with the neural network, which was more successful against the female opponents than the male opponents. These data speak to both the human differences and the algorithm differences. First, they show the different strategies employed by men and women. Since there was no interaction with experience, this strategy difference goes beyond pure skill level. Second, they show the flexibility of the MOE over the neural network. Although the net is capable of performing better against a particular type of player, the MOE does better on average against a variety of play styles. It is possible that the MOE is learning individual difference data, and clustering on that data even though it was not explicitly included in training (insofar as that variable produces behavioural differences).

There was a non-significant (marginal) effect of opponent in the believability rating (F(2,190), P<0.07) (Figure 1b). Although there was a significant difference between the MOE and the neural net (P<0.03), there was no such difference with the DFA. Once again the MOE was the best performer of the software algorithms, although most experienced participants were able to spot the software opponents with some consistency. Even the MOE was not completely able to mimic human behaviour, even in this simplified environment.

Surprisingly, recursion had no significant effect on either outcome. In fact, recursion level 0 (no recursion) showed a non-significant advantage in both analyses. Three theories are proposed for this lack of effect. First, as suggested in MacInnes (2001), the compounding error of recursion limits the maximum effective recursion level of any modelling algorithm. In this case, the modelling error is so high, due to the variability of human action, that it hinders the algorithm at very early recursion. The second theory is that the neural net and MOE algorithms fail to show an effect because recursive modelling is already included in their training. Since it is hypothesized that humans model recursively in these situations (Thagard, 1992), a machine learning algorithm based on this data will already have implicit recursion included, and therefore will not benefit from further explicit recursion. A final possibility is that there are different types of opponent modelling, namely strategic and personal, and, as current research suggests, it may only be personality modelling that benefits from recursion (MacInnes & Gilin, 2005). These will be explored more directly in future work.

Conclusion
This paper demonstrates techniques for using machine learning to model a human in an adversarial environment. Adversarial, multi-agent environments offer a rich and challenging scenario for many machine learning algorithms, and applications using such environments are increasing. In these environments, a Mixture of Experts (MOE), which is based on theories of human modelling practices, was shown to have some advantages over finite state machines as well as single neural net learning.
Although theoretical benefits of incorporating recursive modelling were discussed, no such advantages were measured in the experiments listed. Two theories were raised in explanation: a) human recursion may be more complex than previously thought (explored in MacInnes, 2006); b) there may be implicit learning of user recursion in the connectionist models (net and MOE). Although there has been significant (though controversial) research on implicit learning in humans, less work has been done on the same in a machine learning context. Although all machine learning can be thought of as implicit learning of patterns, implicit learning of human behaviour has interesting implications in user and cognitive modelling. The question of how these models may be incorporating recursion, however, will have to be left to future studies, since the original training data for the algorithms did not include the recursive modelling abilities of the training subjects or the degree to which they used them. Current research, which measures the degree to which the recursive ability of the training subjects is used in the clustering stage of the mixture of experts, should shed light on this question.
In spite of the null results, the potential benefits of AI
recursion (in improving the AI agent and understanding the
human agent) suggest that further research may be
warranted. It has clearly been shown that recursive
modelling plays an important part in human cognition and
interaction, and it only stands to reason that it should be
included in human cognitive models.
Acknowledgments
Thanks to Ray Klein for advice and support. Funding provided in part by Dalhousie University, Saint Mary's University, UTSC, the Centre for Computational and Cognitive Neuroscience, the Natural Sciences and Engineering Research Council of Canada and the Canadian Space Agency.
References
Burns, B., & Vollmeyer, R. (1998). Modelling the Adversary and Success in Competition. Journal of Personality and Social Psychology, 75(3), 711-718.
Fischer, G. (2001). User Modeling in Human-Computer Interaction. User Modeling and User-Adapted Interaction, 11, 65-86.
Hebb, D. O., & Williams, K. A. (1946). A method of rating
animal intelligence. Journal of General Psychology, 34,
59-65.
Jacobs, R., & Nowlan, S. (1991). Adaptive Mixtures of Local Experts. Neural Computation, 3, 79-87.
Kohonen, T. (1982). Self-organizing formation of
topologically correct feature maps, Biological Cybernetics
43 (1), 59-69.
Laird, J. (2000). It Knows What You're Going To Do:
Adding Anticipation to a Quakebot. Presented at the
AAAI Spring Symposium on Artificial Intelligence and
Interactive Entertainment, March.
MacInnes, W.J. (2004) Believability in Multi-Agent
Computer Games: Revisiting the Turing Test.
Proceedings of CHI, 1537.
MacInnes, W.J., & Gilin, D. (2006). Recursion for Adversarial Modelling: New Evidence. Member Abstract, CogSci, 2006.
MacInnes, J., Banyasad, O., & Upal, A. (2001). Watching Me, Watching You: Recursive modeling of autonomous agents. Abstracts of the Canadian Conference on AI, Ottawa, Ontario, 361-364.
MacInnes, W.J. and Gilin, D. (2005) What I think you think
I am going to do next: Perspective-Taking and Recursive
Modeling in Computer Mediated Conflict. Proceedings of
IACM.
Rich, E. (1979). User Modeling via Stereotypes. Cognitive
Science, 3, 329-354.
Shore D., Stanford L. , MacInnes J. , Klein R. & Brown R.
(2001). Of Mice and Men: Using Virtual Hebb-Williams
mazes to compare learning across gender and species.
Cognitive, Affective and Behavioral Neuroscience, 1(1),
83-89.
Thagard, P. (1992). Adversarial Problem Solving: Modeling an Opponent Using Explanatory Coherence. Cognitive Science, 16, 123-149.
Turing, A. (1950). Computing Machinery and Intelligence. Mind, 59(236), 433-460.
Webb, G., Pazzani, M., & Billsus, D. (2001). Machine Learning for User Modeling. User Modeling and User-Adapted Interaction, 11, 19-29.