International Journal of Automation and Computing
05(1), January 2008, 45-57
DOI: 10.1007/s11633-008-0045-8
Computational Intelligence and Games:
Challenges and Opportunities
Simon M. Lucas
Centre for Computational Intelligence, Department of Computing and Electronic Systems, University of Essex, Colchester CO4 3SQ, UK
Abstract: The last few decades have seen a phenomenal increase in the quality, diversity and pervasiveness of computer games. The
worldwide computer games market is estimated to be worth around USD 21bn annually, and is predicted to continue to grow rapidly.
This paper reviews some of the recent developments in applying computational intelligence (CI) methods to games, points out some of
the potential pitfalls, and suggests some fruitful directions for future research.
Keywords: Games, machine learning, evolution, temporal difference learning (TDL), neural networks, n-tuple systems.

1 Introduction
Games have long been seen as an ideal test-bed for the
study of artificial intelligence (AI). Early pioneers of computing and information theory such as Alan Turing and
Claude Shannon were intrigued by the idea that computers
might one day play grand-master level chess, and sketched
out ideas of how this might be achieved. Much of the academic work on AI and games has focused on traditional
board games until recently, with the challenge being to beat
humans at a variety of challenging games. This strand of
work has been successful in achieving this goal for many
complex games, such as checkers and chess. Checkers is now
a solved game[1] , the result being a draw if both players play
optimally. Chess is far from being solved, but computers
play at world-leading level. Checkers and chess playing algorithms depend on efficient game-tree search coupled with
a heuristic board evaluation function, which may be hand-programmed or learned. In AI research, the emphasis is on
producing apparently intelligent behaviour using whatever
techniques are appropriate for a given problem. In computational intelligence (CI) research, the emphasis is placed
on intelligence being an emergent property. Examples of
CI methods include evolutionary computation, where intelligence may arise due to principles of natural selection,
and neural networks, where intelligence may arise from the
collective connected behaviour of many simple neurons.
Both checkers and chess have seen the successful application of CI methods, with Fogel and Chellapilla's famous work on Blondie 24[2] (checkers) and Fogel et al.[3] on
Blondie 25 (chess). Both used a fairly standard minimax engine, but instead of using a hand-coded heuristic evaluation
function, used a multi-layer perceptron (MLP). The MLP
had specially designed input layers with spatially arranged
inputs covering all possible sub-squares of the 8 × 8 board.
While the architecture of the MLP was fixed, the weights
were evolved using co-evolution. Essentially, a (15 + 15)
evolution strategy (ES) was used, but the fitness function
was based on how well each network played against a set
of opponent networks drawn from the current population.
The Blondie systems achieved high levels of play both at
checkers and at chess, and have acted as a great inspiration
to many researchers. However, due to the way the MLP
was embedded in a minimax engine, it is unclear how much
of the playing ability comes from the neural network, and
how much from the minimax search.
Other games such as Go have been less susceptible to
this combination of game tree search and heuristic position
evaluation, due to the large branching factor and the difficulty of designing a useful position evaluation function. Go
remains a grand challenge for computer science research,
though very recently Monte Carlo methods have shown
great promise, especially for smaller versions of the game
such as 9 × 9 (standard Go is played on a 19 × 19 board).
While traditional game-tree methods search broad but shallow, Monte Carlo methods are relatively narrow but deep.
A position is evaluated by playing it many thousands of
times using random moves until the end of each game is
reached (at which point the true value of that particular
line of play is known for sure). Each position is then characterized by the probability of winning from that position,
based on those random play outs. Successful Go programs
such as MoGo[4, 5] use a more sophisticated version of this
called UCT Monte Carlo (UCT stands for upper confidence
bounds applied to trees). As each game is played out, common initial moves are shared in a tree structure so that each line of play updates the statistics for all the tree nodes that
led to it. More attention is given to playing out promising
nodes in the tree, but this is also balanced by the need for
exploration. UCT Monte Carlo has achieved remarkable results in Go, and the method is a testament to the power of
statistical techniques in general. It is reminiscent of the success of statistical language processing methods compared to
their more grammar-based alternatives[6] . It is likely that
UCT Monte Carlo also has significant potential to be applied to other games. While the branching factor of Go is
much larger than that of chess, it is still of a different nature to that
encountered in video games, with their virtually continuous
state-spaces.
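To make the balance between exploitation and exploration concrete, the selection rule at the heart of UCT is typically a UCB1-style formula. The following minimal Java sketch shows the idea; the field names are illustrative, the exploration constant c is left as a parameter, and real programs such as MoGo add many further refinements.

    // Minimal sketch of UCB1-style child selection as used in UCT.
    // Field names are illustrative; real Go programs add many refinements.
    class UctNode {
        int visits;          // number of play-outs through this node
        double wins;         // how many of those play-outs were won
        UctNode[] children;

        // Pick the child maximising win rate plus an exploration bonus;
        // assumes this node has been visited at least once.
        UctNode selectChild(double c) {
            UctNode best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (UctNode child : children) {
                double score = (child.visits == 0)
                        ? Double.POSITIVE_INFINITY   // always try unvisited moves first
                        : child.wins / child.visits
                          + c * Math.sqrt(Math.log(visits) / child.visits);
                if (score > bestScore) {
                    bestScore = score;
                    best = child;
                }
            }
            return best;
        }
    }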
While the traditional focus of academic research has been
on board games, and this research continues to flourish,
there is also a growing body of work on applying AI and
CI to video games, and many techniques are applicable to
any game regardless of genre, though they may require careful application to get the best results. For example, it seems likely
that UCT Monte Carlo may have important application to
control non-player characters (NPCs) in video games, but
although the concept is quite simple, getting the best out of
the method has been the subject of intensive research over
the last few years. In the UK, the Engineering and Physical
Sciences Research Council (EPSRC) has just funded a network grant to foster closer collaboration between industry
and academia in the area of AI and games[7] .
Commercial games are based on increasingly sophisticated simulated environments, and present human players
with world views approaching photo-realism. As virtual
worlds become more complex, it becomes harder to provide NPCs with sufficiently interesting and believable behaviour. Current NPC control techniques mostly involve methods such
as finite state machines, scripting, and search. Most of the
behaviour is hand-programmed, involving much human effort to code and to test. Although games companies have
occasionally used AI techniques such as genetic algorithms
and neural networks[8, 9] , the methods used in games often lag behind the state of the art in machine learning.
Similarly, machine learning and AI stand to benefit much
from games by applying and extending algorithms to more
challenging environments than are usually encountered[4] .
Furthermore, as games utilise more realistic physics engines, there is scope for greater synergy between games and
robotics research: the control of a humanoid robot, or of
an NPC may eventually share much in common. One challenge for research in this area is to find sufficient common
ground to make collaboration attractive for both parties.
This is because while academics may be most interested in
how smart an NPC is, or how well it learns from its environment, a games company is in the business of selling games,
and is therefore likely to be most interested in how intelligent NPC behavior can be channeled into making games
more enjoyable. Many academics are now eager to test their
methods in the complex virtual worlds of modern console
games, guided by industrial collaboration. This will
bring great benefits both to the research community and to
the games industry.
Peter Molyneux OBE, an internationally renowned game
designer, recently commented: “AI is certainly the undiscovered country of games design. Any game genre – from
hardcore shooters to the most story-driven adventure game
– would be truly revolutionized by AI driving plot, characters and scenarios.”
On the other hand, there is still some resistance to the
more open-ended game play that advanced AI and CI techniques could facilitate. This comment on a blog site1 expresses a view held by many games industry insiders: “I
think the most drastic different in the mind set of an AI
programmer/designer in big budget game and an academic
is that in a real game project you really need to think about
the production issues (anything unpredictable is no-no) . . .”
By their very nature, however, next generation video
games will be more open-ended, and therefore unpredictable. This is in the same way that multi-player on-line games are more open-ended and less predictable due to the intelligence and variable behaviour of the other human players.

1 Memoni, http://realtimecollisiondetection.net/blog/?p=36, October 13, 2007
The views of game-industry insiders such as these cannot
be simply dismissed. However, it is likely that their objection to more open-ended AI is based on what seems to be
currently achievable, rather than what might one day be
achieved. Hence it is not unpredictable behaviour that is
unacceptable, but annoying or uninteresting behaviour.
As NPC behaviour becomes more intelligent, and NPCs
are better able to learn, this opens the way for new genres
of game. For instance, the NERO game[10] involves players
pre-training characters to act according to a desired tactical
doctrine. This is achieved by NERO using a neuro-evolution
approach internally to evolve soldier behaviours that satisfy
the given training tasks. Fig. 1 shows a screenshot from the
game.
Fig. 1 A screenshot from the NERO game
NERO presents an example of the open-ended game-play
that CI methods enable, and one can expect to see many
new games that exploit CI in various ways.
Games have also been studied from a social, economic, and financial perspective. A classic social game is the iterated prisoner's dilemma (IPD)[11], which has led to many CI
studies of evolving and co-evolving strategies for this game.
CI methods have also been applied to many aspects of finance, such as supply-chain management[9] and sequential
bargaining[12] .
CI methods have been used not only to optimize agent
strategy, but also to design marketplace rules. A good example of this can be found in the work of Dave Cliff. He
noted that Gode and Sunder[14] had found that stochastic agents with zero intelligence (ZI) could approximate the equilibrium price when trading in a suitably structured market. Cliff[13] showed that the ZI agents could deviate significantly from this performance when the market structure was varied from the Gode and Sunder setup. To overcome this, Cliff added a small amount of intelligence to the
agents (hence ZI plus (ZIP)), endowing each one with a
simple learning rule that adjusted their behaviour based on
observations of the last transaction. The ZIP agents were
shown to better approximate human behaviour than the ZI
ones.
Cliff then reasoned that if such simple agents could perform well, they could be used within a fitness function for
an evolutionary algorithm where the aim was to optimize
the design of the marketplace[15] . This is a great example of
the creative application of CI. In game terms, this is using
evolution to design the game rather than agents to play the
game.
2 Playing games with CI
This section describes the two main ways in which a neural network or fuzzy system can be interfaced to a game.
These are as an action selector, or as a value function.
As an example, we consider the Mountain Car task introduced by Moore[16] , and also described in [17, pp. 214].
The problem is illustrated in Fig. 2. The task is to drive a
car to reach a goal at the top of the hill on the right (the
rightmost vertical line), but the engine force is weaker than
gravity. There is a barrier to the left (the leftmost vertical line) at which the car stops dead. At each time step, there are three possible actions: accelerate left (a_t = −1), accelerate right (a_t = 1), or apply no force (a_t = 0). The
car engine has insufficient power to overcome gravity, so the
problem is a bit deceptive: to reach the goal as quickly as
possible, it may be necessary to accelerate (apply a force)
away from the goal in some circumstances. The following
equations specify the update rules for position (s) and velocity (v).
s_{t+1} = bound_s(s_t + v_{t+1})                         (1)
v_{t+1} = bound_v(v_t + 0.001 a_t − 0.0025 cos(3 s_t)).  (2)

The bounds for s and v are as follows:

−1.2 ≤ s_{t+1} ≤ 0.5                                     (3)
−0.07 ≤ v_{t+1} ≤ 0.07.                                  (4)

Fig. 2 The Mountain Car problem

Each epoch (trial) starts with the car having a random position and velocity drawn uniformly within the above bounds. Each trial terminates when the car reaches the goal, or after 2 500 steps. The best solutions for this particular configuration of the problem solve the task in approximately 52 steps on average (each step has a cost of 1, so this is referred to as a score of −52).

The value function method has been a popular way to solve this, and is depicted in Fig. 3. All possible actions are applied to a model of the current state of the car to generate a set of future states. Each future state is then evaluated by the value function. The action is then chosen that would lead to the state with the highest value. Fig. 3 shows that for this example, the state reached by taking the accelerate right action has the highest value, and so that would be the chosen action. Value function methods are very flexible because more computation can be applied to look further into the future, and hence get a truer picture of which actions lead to the best states in the long run.

Fig. 3 Illustration of the value function applied to the Mountain Car problem

To interface an action selection network to this, the inputs to the network are the current state of the system (defined by the car's current position and velocity), and the output must be interpretable as one of the possible actions. For complex action sets, one might use one output per possible action, and take the action associated with the highest output value, as in Fig. 4.

Fig. 4 Illustration of the action selection method applied to the Mountain Car problem

For this case, there are three possible outputs that have a naturally ordered relationship, so a natural way to code is to have a single output unit, and define the output intervals to represent left, neutral, and right.
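As a concrete illustration of the dynamics defined by (1)-(4), the following minimal Java sketch simulates the task; the class and field names are illustrative, and the goal is assumed to coincide with the upper position bound.

    // Minimal sketch of the Mountain Car dynamics of (1)-(4). Names are
    // illustrative, and the goal is assumed to coincide with the upper
    // position bound.
    class MountainCar {
        static final double MIN_POS = -1.2, MAX_POS = 0.5;
        static final double MIN_VEL = -0.07, MAX_VEL = 0.07;

        double s, v;   // position and velocity

        static double clamp(double x, double lo, double hi) {
            return Math.max(lo, Math.min(hi, x));
        }

        // Apply action a in {-1, 0, +1} for one time step, as in (1) and (2).
        void step(int a) {
            v = clamp(v + 0.001 * a - 0.0025 * Math.cos(3 * s), MIN_VEL, MAX_VEL);
            s = clamp(s + v, MIN_POS, MAX_POS);
        }

        boolean atGoal() {
            return s >= MAX_POS;
        }
    }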
The Mountain Car problem was considered to be reasonably
hard for neural networks to solve, and was used as a benchmark by Whiteson and Stone[18] as an initial test for their
NEAT+Q evolutionary/temporal difference hybrid learning algorithm, where NEAT is an abbreviation for neuro-evolution of augmenting topologies, and +Q denotes the
addition of Q-learning, a particular type of temporal difference learning (TDL).
Running TDL with an MLP (or a CMAC architecture) may produce value functions that have a meaningful interpretation: in this case, the value of each state is the negative of the number of steps that the agent should take
to reach the goal from that state. Whiteson and Stone[18]
solved the problem with a combination of neuro-evolution
and Q-learning (a kind of TDL). They noted that the value
functions produced by NEAT+Q were very different from
the TDL ones, yet the performance of these functions was
very similar.
Either TDL or evolution can train a network to perform well at this task, typically taking hundreds (or even thousands) of
epochs or fitness evaluations. The value function learned by
TDL using a grid-based decomposition of the input space
after 3000 epochs is shown in Fig. 5. This value function achieved an average score of −61. The performance
is similar but inferior to the results reported for CMAC
architectures which use perturbed overlapping grids, but
illustrates how TDL attempts to learn a meaningful value
function.
Fig. 5 A value function learned with TDL after 3000 epochs

However, as we can see, the problem can be made trivial depending on how it is presented to the learner. While experimenting on this problem, it was observed that the value function could look very different from run to run, while generally converging to networks with high performance. Furthermore, it was observed that when the inputs used were the square of the velocity and the position, the problem was easily solved by a single layer perceptron network, with all the best solutions tending to ignore the position and strongly weight the velocity. This led to the hypothesis that only the absolute value of the velocity matters for this problem, and this was confirmed empirically, with abs(v) having an average score of −51.97 (standard error = 0.09) based on 100 000 trials. This compares favourably with the averages of 52.75 (NEAT+Q) and 52.02 (Sarsa + CMAC) quoted by Whiteson and Stone[18], though NEAT+Q took hundreds of thousands of episodes to reach that performance.

This leads to two very simple near-optimal solutions, depending on whether a state value or direct action method is used. For an action selection network with the ordered interpretation of the output explained above, optimal solutions could be tanh(wv), where w is a single weight with a high value such as 10, enough to make the neuron behave like a sign function sgn(v). For the value function, abs(v) is near-optimal, as shown in Fig. 6; it assigns as the value of a state the absolute value of the velocity in that state. It is also worth noting that genetic programming would be expected to produce solutions for this problem with relative ease, especially if an abs(·) function was included in the terminal set.

Fig. 6 A near-optimal solution

There are two important points raised here. First, the problem setup is critical: the problem can be made easy or hard depending on the details of the setup (input coding, action versus state value function). The second point, which follows from this, is that one should be wary of drawing any conclusions from tests on problems that have such simple solutions.
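For concreteness, the two near-optimal solutions discussed above can be written down directly. The following minimal sketch reuses the hypothetical MountainCar class from the earlier sketch; the output interval boundaries for the action selector are an assumption.

    // Sketches of the two near-optimal controllers discussed above, reusing
    // the hypothetical MountainCar class from the earlier sketch. The output
    // interval boundaries for the action selector are an assumption.
    class NearOptimalControllers {
        // Direct action selection: tanh(w * v) with a large weight w behaves
        // like sgn(v); the output intervals are read as left / neutral / right.
        static int actionSelector(double v) {
            double out = Math.tanh(10.0 * v);
            if (out > 0.33) return 1;     // accelerate right
            if (out < -0.33) return -1;   // accelerate left
            return 0;                      // no force
        }

        // State value abs(v), used with the 1-ply lookahead of Fig. 3: try each
        // action on a model of the car and pick the successor of highest value.
        static int valueSelector(MountainCar model, double s, double v) {
            int best = 0;
            double bestVal = Double.NEGATIVE_INFINITY;
            for (int a = -1; a <= 1; a++) {
                model.s = s;
                model.v = v;               // reset the model to the current state
                model.step(a);
                double val = Math.abs(model.v);
                if (val > bestVal) { bestVal = val; best = a; }
            }
            return best;
        }
    }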
3 Learning game strategy
For many real-world game applications, it is vital that
learning can happen quickly. The main methods for game
strategy learning are TDL and evolution (including co-evolution). From previous work by the author and others, it seems that these methods have significantly different
strengths and weaknesses. When it works, TDL often learns
much more quickly than evolution. This is not surprising
in one sense, since TDL learns a value function that is constantly updated during game-play. However, note that for
the most part this is done in an unsupervised way; it simply learns that game states that are close in time should
have similar values. At certain points in the game (which
points depends on the game; typically the end of the game
in the case of board games), the true value of the game state is known,
and over the course of training, this ties the value of states
close to those states, and so on. The problem is that for
non-trivial games, and for complex function approximators
such as MLPs, there is no guarantee that the value function
will converge to a reasonable value (let alone optimal).
Until recently, there had been very little work comparing
TDL with evolution. With recognition of the importance
of such comparisons, the body of work is growing steadily.
Previous comparisons deal mostly with learning evaluation
functions for board games.
An influential early experiment on using TDL to learn
board game evaluation functions is due to Tesauro[19] ,
who achieved world-class performance when training neural
network-based Backgammon evaluation functions with self-
play. Pollack and Blair[20] tried training evaluation functions using the same game and function representation, but
instead of TDL they used the simplest possible form of evolutionary algorithm (EA), a hill-climber where a single individual was repeatedly mutated, and the mutation was kept
only if it won a number of games over the non-mutated individual. The algorithm worked, but its end results were
inferior to those of Tesauro[21] .
Darwen[22] did a set of similar comparisons for backgammon, and found that a population-based EA eventually outperformed TDL when training a linear board evaluation
function, even though the EA was much slower. However,
when training a nonlinear evaluation function, board evaluators trained by the EA never reached the performance of
those trained by TDL for the simple reason that the EA
took too long; to be effective, the EA needed to evaluate
the same pair of individuals many times.
Runarsson and Lucas[23] investigated TDL versus coevolutionary learning (CEL) for small-board Go strategies.
There it was found that TDL learned faster, but that with
careful tuning, CEL eventually learned better strategies.
In particular, with CEL it was necessary to use parent-offspring weighted averaging in order to cope with the effects of noise. This effect was found to be even more pronounced in a follow-up paper by Lucas and Runarsson[24],
comparing the two methods for learning an Othello position
value function.
Kotnik and Kalita[25] found that evolution outperformed
TDL at learning to play the card-game rummy, which unlike the board games in the above studies is not a game of
perfect information.
There have also been some comparisons using control-based problems. Taylor et al.[26] compared TDL and the
NEAT neuro-evolution algorithm for ‘keep-away’ robocup
soccer, and found that evolution could learn better policies, though it took more evaluations to do so. Results also
showed that TDL learned better policies when the task was
fully observable and NEAT learned faster when the task
was deterministic.
Gomez et al.[27] investigated an impressive range of reinforcement techniques, including several versions of neuroevolution and TDL, on four increasingly difficult versions of
the benchmark pole-balancing problem, ranging from simple to very hard. In striking contrast to some other studies,
the best evolutionary methods universally outperformed the
best temporal difference-based (TD-based) methods, both
in terms of learning speed and in terms of which methods could solve the harder versions of the problem at all.
Further, there were significant differences between different
neuro-evolutionary and TD-based methods, with the best
TD-based techniques sometimes outperforming some evolutionary techniques. The relative ordering of the algorithms
was similar across the different versions of the problem. One
feature of all their pole balancing problems was that they
could be solved by relatively small neural networks.
Some general results are beginning to emerge, but there is
still much to be learned. One tendency that can be observed
in many (though not all) of these studies is that TDL
learns faster than evolution, but evolution eventually learns
better strategies. Under which conditions this is true is an
important question.
4 TDL versus evolution for simulated car racing
The choice of game to use for benchmarking CI methods
is important. On one hand, it is preferable for it to be
simple and fast to compute, since some learning algorithms
may require millions of simulated game steps to converge.
If it is too simple, however (as could be said of Mountain
Car), then it is questionable what can be learned from it.
Lucas and Togelius[28] compared TDL and evolution for
a simplified car racing game. In that study, they used a
point-to-point car racing game, in which the objective is to drive a car to visit as many way-points as possible within a given number of time steps (they used 500). The way-points
were randomly generated, and had to be visited in order. At
any time the agent could only see the next three way-points.
The cars also had simple underlying models. This made the
simulation very fast to compute, allowing millions of game-steps per second to be calculated (though the actual speed
would then depend on the complexity of the controller).
Fig. 7 shows a sample run for a hand-designed heuristic state value function, illustrating that the controller got
stuck orbiting a way-point. The heuristic was to value proximity to the next way-point, while punishing excess speed,
which can lead to oscillations, or the orbiting shown here
(for more details see [28]).
Fig. 7 A naive controller that became stuck after orbiting 5 waypoints
Because each trial or episode (i.e., a run of 500 time
steps) is randomly generated, there is a natural random
variation in the scores obtained by a particular agent. For
this task, average scores of about 20 are reasonable, above
30 are good, and above 40 are very good. Two types of car
were used, normal and holonomic. The normal car was a
simplified model of a bang-bang type radio-controlled toy car, which at each time step had a discrete choice of actions: accelerate forward, accelerate backward, steer left, or steer right. The
holonomic car was modelled as a point mass. Again the
controller had 5 choices at each time step: apply no force,
or apply a force in one of the four compass point directions.
Each network had three inputs: the square of the velocity,
the distance to the next waypoint, and the angular difference between the current heading of the car and a straight line connecting the car with the next waypoint.
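A minimal sketch of how these three inputs might be computed from the car state and the next way-point follows; the field names, units and angle wrapping are assumptions for illustration, not taken from the original implementation.

    // Sketch: computing the three controller inputs described above from the
    // car state and the next way-point. Names, units and the angle wrapping
    // are assumptions for illustration only.
    class ControllerInputs {
        static double[] compute(double carX, double carY, double velX, double velY,
                                double heading, double wpX, double wpY) {
            double speedSquared = velX * velX + velY * velY;       // square of the velocity
            double dx = wpX - carX, dy = wpY - carY;
            double distance = Math.sqrt(dx * dx + dy * dy);        // distance to the next way-point
            double angleDiff = Math.atan2(dy, dx) - heading;       // angular difference to the way-point
            while (angleDiff > Math.PI) angleDiff -= 2 * Math.PI;  // wrap into [-pi, pi]
            while (angleDiff < -Math.PI) angleDiff += 2 * Math.PI;
            return new double[] { speedSquared, distance, angleDiff };
        }
    }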
Overall, the best results were obtained by evolving state
value MLPs. However, when TDL worked, it could learn
these state values much more quickly.
For the normal car, TDL was competitive with evolution,
with the very best learned controllers having very similar
performance. While evolution also worked well for the holonomic car, TDL failed badly for this case. For the normal
car, when TDL did learn it often achieved high performance
within the first 10 epochs, and in these cases offered much
faster learning than evolution. A sample successful run is
shown in Fig. 8. Note that the score is plotted against
epoch for TDL rather than against generation (used for evolution). An epoch is approximately equivalent to a single
fitness evaluation in terms of the computation required.
One of the most surprising conclusions of this work was
just how sensitive the results were to details of the setup
that might have hitherto seemed unimportant. It was observed that the MLP failed to learn a good controller for
the holonomic car when all three inputs were used. The
holonomic car does not have a heading in the sense of a
normal car, since it can be accelerated in any of the four
directions independently of its current direction of travel.
Hence, the heading information seems to act as a spurious
input. Evolution failed to learn to ignore this, but was much
more successful when using only the two-input version, as
can be seen in Tables 1–2.
Regarding the learning speed of each technique, some
sample runs demonstrate this. Fig. 9 shows an average of
20 runs of the evolutionary algorithm, evolving an MLP for the normal car.
Table 1  Statistics for 20 learning runs of the normal car

Method            Mean   Standard error
EVO-MLP           35.0   0.24
EVO-perceptron    30.2   0.33
TDL-perceptron    26.2   1.4
Heuristic         18.8   1.0
Table 2  Statistics for 20 runs of the holonomic car

Method             Mean   Standard error
EVO-MLP-2          32.2   0.06
EVO-perceptron-2   26.7   0.01
EVO-perceptron-3   18.5   2.1
EVO-MLP-3          11.3   0.7
Heuristic          26.8   0.21
The extent to which results from one domain (e.g.
rummy) can be used to make predictions for another domain, such as car racing or Othello, is unclear. One of the
significant findings of the current work is that even when
tasks are very similar (and all to do with a simple car racing problem), minor changes in the problem setup can introduce unpredictable biases resulting in significant changes in
the relative performance of each method. There currently
seems to be very little theoretical research that can help us
here, and a great need for empirical investigation.
Fig. 8 Learning a perceptron-based state-controller with TDL
(a successful run)
Fig. 9 Evolving a state-value MLP for controlling a normal car

5 Othello, and the choice of architecture
The first strong learning Othello program developed was Bill[29, 30]. Later, the first program to beat a human champion was Buro's Logistello[31], the best Othello program
from 1993–1997. Logistello also uses a linear weighted evaluation function but with more complex features than just
the plain board. Nevertheless, the weights are tuned automatically using self-play. Logistello also uses an opening
book based on over 23 000 tournament games and fast game
tree search[8] .
More recently, Chong et al.[32] co-evolved a spatially
aware MLP for playing Othello. Their MLP was similar to
the one used by Fogel and Chellapilla[2] for playing checkers, and had a dedicated input unit for every possible sub-square of the board. Together with the hidden layers, this
led to a network with 5 900 weights, which they evolved
with around 100 000 games. The weighted piece counter
(WPC) used in the current paper has only 64 weights. The
results below show that optimal tuning of such WPCs can
take hundreds of thousands of games, and relies heavily on
parent-child averaging. These considerations suggest that
further improvement in the performance of evolved spatial
MLPs should be possible.
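For reference, the WPC is about the simplest possible position value function: the board value is just a weighted sum over the 64 squares. A minimal sketch follows; the piece coding is an assumption.

    // Minimal sketch of a weighted piece counter (WPC). The piece coding
    // (+1 for the player to move, -1 for the opponent, 0 for empty) is an
    // assumption; weights holds one value per board square.
    class WeightedPieceCounter {
        static double value(int[] board, double[] weights) {
            double v = 0;
            for (int i = 0; i < 64; i++) {
                v += weights[i] * board[i];
            }
            return v;
        }
    }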
In this section we review recent results on learning a position value function for Othello.
Othello is a challenging unsolved game, where the best
computer players already exceed human ability. It is an
interesting benchmark for CI methods, and the author has
been running a series of competitions to find the best neural
network (or other function approximation architecture) for
this game. First, we give a brief explanation of the game.
Othello is played on an 8 × 8 board between two players,
black and white (black moves first). At each turn, a counter must be placed on the board if there are any legal places to play; otherwise the player must pass. A legal move places a counter on an empty board square so as to “pincer” one or more opponent counters on a line between the new counter and an existing counter of the player's own color. All opponent counters that are
pincered in this way are flipped over to the color of the current player. The initial board has four counters (two of each
color) with black to play first. This is shown in Fig. 10, with
the open circles representing the possible places that black
can play (under symmetry, all opening moves for black are
identical). The game terminates when there are no legal
moves available for either player, which happens when the
board is full (after 60 moves, since the opening board already has four counters on it), or when neither player can
play. The winner is the player with the most pieces of their
color at the end of the game.
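As an illustration of the pincer rule, a move is legal if, in at least one of the eight directions, the new counter is followed by an unbroken run of opponent counters terminated by one of the mover's own counters. A minimal sketch of that directional test follows; the board representation is an assumption.

    // Sketch of the "pincer" test along a single direction (dx, dy).
    // The board representation (8x8 array, 0 = empty, 1 = black, 2 = white)
    // is an assumption; a move is legal if this test succeeds for some direction.
    class PincerRule {
        static boolean pincers(int[][] board, int x, int y, int dx, int dy, int player) {
            int opponent = 3 - player;
            int i = x + dx, j = y + dy;
            int opponentCount = 0;
            // walk over a (possibly empty) run of opponent counters...
            while (i >= 0 && i < 8 && j >= 0 && j < 8 && board[i][j] == opponent) {
                opponentCount++;
                i += dx;
                j += dy;
            }
            // ...which must be non-empty and closed off by one of the mover's own counters
            return opponentCount > 0 && i >= 0 && i < 8 && j >= 0 && j < 8 && board[i][j] == player;
        }
    }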
Fig. 10 The opening board for Othello

As play proceeds, the piece difference tends to oscillate wildly, and some strategies aim to have few counters during the middle stages of the game in order to limit possible opponent moves. Fig. 11 shows how the piece difference can change during the course of a game.

Fig. 11 Typical volatile trajectory of the piece difference during a game of Othello

The author has been running an Othello neural network web server for the past two years. During that time, well over one thousand neural networks have been uploaded to the site. When a network is uploaded, it is played against the standard heuristic WPC for many games (initially 1 000, but this has been reduced to 100 to reduce load), and this gives it a ranking in the trial league. Then, for particular competition events, entrants are allowed to nominate two of their best networks to participate in a round-robin league.

For the competitions, and for the results in this paper, all play is based on value functions evaluated at 1-ply. All games are played with a 10% chance of a forced random move. Hence, this is no longer strictly speaking Othello, but it is a better benchmark for our purpose: when evaluating two feed-forward neural networks on the true game of Othello, there are only two possible outcomes, depending only on who plays first.

The best network found in this way so far was an MLP. Co-evolution finds it hard to learn MLPs for this task, and for a long time the best network was a TDL-trained MLP. For the 2006 IEEE Congress on Evolutionary Computation (CEC) Othello competition, however, a new champion was developed by Kyung-Joon Kim and Sung-Bae Cho. They seeded a population with small random variations of the previous best MLP, and then ran co-evolution for 100 generations. This produced a champion that performed significantly better in the round-robin league than the other players, including the TDL-trained MLP from which it was developed. This points toward the value of TDL/evolution hybrids.

5.1 Parent-child averaging
A surprising result obtained by Runarsson and Lucas[23]
for small-board Go, and validated even more dramatically
for Othello[24] , is the way that standard co-evolution can
fail to learn very well. In both these studies, forced random
moves were used to get a more robust evaluation of playing ability. To get good performance from co-evolution,
they found it essential to use parent-child averaging. This
is a technique that had been previously used in evolution
strategies, and was also used by Pollack and Blair[20] with their (1 + 1) ES for backgammon. Runarsson and Lucas[23] experimented with a wide variety of settings for the ES, but unless parent-child weighted averaging was used, all performed poorly. The following equation explains the process. The weights of the parent neural network are in vector w0 and the weights of the best child network are in vector wi. The β factor controls the weighting between parent and child. With β = 1.0 we get a standard ES. With β = 0, no information from the child is used at all, so no evolution occurs. Remarkably, they found that small values of
β worked best. The results plotted in Fig. 12 (from [24])
show how standard co-evolution (i.e. a standard ES with
relative fitness function) fails to learn much, but learning
does occur with β = 0.05. Overall, they found that co-evolution with weighted parent-child averaging performed
better than TDL, and was able to learn a WPC that can
outperform the standard heuristic weights. While TDL learned more quickly, it was
unable to match the standard heuristic weights.
w0 ← w0 + β(wi − w0).    (5)
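In code, the weighted averaging of (5) is a one-line update applied to each weight; a minimal sketch, with illustrative names, is given below.

    // Sketch of the parent-child weighted averaging update of (5).
    // w0 holds the parent weights and wi the weights of the best child.
    class ParentChildAveraging {
        static void update(double[] w0, double[] wi, double beta) {
            for (int j = 0; j < w0.length; j++) {
                // beta = 1 recovers a standard ES (parent replaced by the child);
                // beta = 0 keeps the parent unchanged, so no evolution occurs
                w0[j] += beta * (wi[j] - w0[j]);
            }
        }
    }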
Fig. 12 Co-evolutionary learning average performance (probability of a win) versus the heuristic player, plotted against generation (The 1/45000 term indicates the scaling on this axis; multiply each value by 45000 to find the number of games played.
The grey lines indicate one standard deviation from the mean,
and this run used a (1, 10) ES[24] .)
5.2 n-tuple architectures
Given that background, we now turn to the subject of architecture. In the field of CI and games, by far
the most popular choice of architecture is the neural network. This comes in various forms, but perceptrons and
multi-layer perceptrons are the most common. Also popular are more flexibly structured networks such as those
created with NEAT, where there is no distinct layering and
connections may evolve between any neurons, and the number of neurons is also not fixed.
Very recently, the author has experimented with using n-tuple networks for this task. n-tuple networks date back to
the late 1950s with the optical character recognition work
of Bledsoe and Browning[33] . More detailed treatments of
standard n-tuple systems can be found in [34, 35]. They
work by randomly sampling input space with a set of n
points. If each sample point has m possible values, then the sample can be interpreted as an n-digit number in base m, and used as an index into an array of weights. The value
function for a board is then calculated by summing over all
table values indexed by all the n-tuples. The n-tuple works
in a similar way to the kernel trick used in support vector
machines (SVMs), and is also related to Kanerva's sparse distributed memory model. The low-dimensional board is projected into a high-dimensional sample space by the n-tuple indexing process.
This work is still in its initial stages, but has already
proved to be remarkably successful. An n-tuple network
trained with a few hundred self-play games was able to
significantly outperform the 2006 CEC champion.
Fig. 13 illustrates the system architecture but shows only
a single n-tuple. Each n-tuple specifies a set of n board
locations, but samples them under all equivalent reflections
and rotations. Fig. 13 shows a single 3-tuple, sampling 3
squares along an edge into the corner.
Fig. 13 The system architecture of the n-tuple-based value function, showing a single 3-tuple sampling at its eight equivalent
positions (equivalent under reflection and rotation)
5.3 How it works
Fig. 13 illustrates the operation of a single n-tuple. The
value function for a board is simply the sum of the values
for each n-tuple. For convenient training with error backpropagation, the total output is put through a tanh function. Each n-tuple has an associated look-up table (LUT).
The output for each n-tuple is calculated by summing the
LUT values indexed by each of its equivalent sample positions (eight in the example). Each sample position is simply interpreted as an n-digit ternary (base three) number,
since each square has three possible values (white, vacant,
or black). The board digit values were chosen as (white=0,
vacant=1, black=2). By inspecting the board in Fig. 13,
it can be seen that each n-tuple sample point indexes the
look-up table value pointed to by the arrow. These table
values are shown after several hundred self-play games of
training using TDL. The larger the black bar for a LUT
entry, the more positive the value (the actual range for this
figure was between about ±0.04). Some of these table entries have obvious interpretations. Good for black
means more positive, good for white means more negative.
The LUT entry for zero corresponds to all sampled squares
being white: this is the most negative value in the table.
The LUT entry for twenty six corresponds to all sampled
squares being black: this is the most positive value in the
table. This can be expressed in the following equation.
v(b) = Σ_{d ∈ D(b)} l(d)    (6)

where b is the board, v(b) is the calculated value of the board, d is a sampled n-digit number in the set D(b) of samples given the n-tuple, and l is the vector of values in the LUT.
Given the explanation above for how the value function is
calculated, the entries l of the LUT can be seen as the weights of
a single layer perceptron. The indexing operation performs
a non-linear mapping to high-dimensional feature space, but
that mapping is fixed for any particular choice of n-tuples.
Since a linear function is being learned, there are no local
optima to contend with.
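As a concrete illustration of the indexing scheme, a minimal sketch of an n-tuple value function is given below. The symmetric sample points are assumed to be pre-computed for each n-tuple, the board coding follows the convention above (white = 0, vacant = 1, black = 2), and all names are illustrative.

    // Sketch of an n-tuple value function for Othello. Names are illustrative.
    class NTuple {
        int[][] symmetricPoints;   // [8][n] board indices, one row per reflection/rotation
        double[] lut;              // look-up table with 3^n entries

        // Sum the LUT entries indexed by each symmetric placement of this n-tuple.
        double value(int[] board) {            // board[i] in {0 = white, 1 = vacant, 2 = black}
            double sum = 0;
            for (int[] points : symmetricPoints) {
                int index = 0;
                for (int p : points) {
                    index = index * 3 + board[p];   // read the samples as an n-digit ternary number
                }
                sum += lut[index];                  // Equation (6): add the indexed table value
            }
            return sum;
        }

        // The board value is the sum over all n-tuples; for TDL training the
        // total is then passed through tanh.
        static double boardValue(NTuple[] tuples, int[] board) {
            double total = 0;
            for (NTuple t : tuples) {
                total += t.value(board);
            }
            return total;
        }
    }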
The TDL training process, shown in the Algorithm below, is very
simple, and can be explained in two parts.
The first is how it is interfaced to the Othello game. The
game engine calls a TDL update method for any TDL player
after each move has been made. It calls “inGameUpdate”
during a game, or “terminalUpdate” at the end of a game.
To show just how simple the process is, the code for these two methods is shown below.
Algorithm. The main two methods for TDL learning in Othello.
public void inGameUpdate(double[] prev, double[] next) {
    // value estimates for the previous and the current board state
    double op = Math.tanh(net.forward(prev));
    double tg = Math.tanh(net.forward(next));
    // TD(0) error term, including the derivative of tanh at the output
    double delta = alpha * (tg - op) * (1 - op * op);
    net.updateWeights(prev, delta);
}

public void terminalUpdate(double[] prev, double tg) {
    // at the end of a game the target tg is the true game outcome
    double op = Math.tanh(net.forward(prev));
    double delta = alpha * (tg - op) * (1 - op * op);
    net.updateWeights(prev, delta);
}
The variables are as follows: op is the output of the network; tg is the target value; alpha is the learning rate (set
to 0.001); delta is the back error term; prev is the previous
state of the board; next is the current state of the board;
“net” is an instance variable bound to some neural network
type of architecture (an n-tuple system in this case).
The n-tuple system implements the “net” interface, and
an instance of one is bound to the “net” in the code. The
forward method calculates the output of the network given
a board as input. The “updateWeights” method propagates
an error term, and makes updates based on this in conjunction with the board input. For the n-tuple system the
update method is very simple. While the value function was
calculated by summing over all LUT entries indexed by the
current board state, the update rule simply adds the error
term δ to all LUT entries indexed by the current board:
l(d) = l(d) + δ,   ∀ d ∈ D(b).    (7)
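A minimal sketch of how an updateWeights method might implement (7), reusing the hypothetical NTuple class from the earlier sketch, is given below.

    // Sketch of the n-tuple weight update of (7): the error term delta is
    // added to every LUT entry indexed by the current board (all tuples,
    // all symmetric placements).
    class NTupleUpdate {
        static void updateWeights(NTuple[] tuples, int[] board, double delta) {
            for (NTuple t : tuples) {
                for (int[] points : t.symmetricPoints) {
                    int index = 0;
                    for (int p : points) {
                        index = index * 3 + board[p];
                    }
                    t.lut[index] += delta;
                }
            }
        }
    }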
One of the best features of an n-tuple system is how it
scales with size. Because of the indexing operation, the cost of evaluation is independent of the size of the LUT. Therefore, although
the LUT size grows exponentially with respect to n, the
speed remains almost constant, and linear in the number of
n-tuples. Hence, n-tuple value functions with hundreds of
thousands of weights can be calculated extremely quickly.
5.4 Choosing the sample points
The n positions can be arranged in a straight line, in a rectangle, or as random points scattered over the board. The results in this paper are based on random snakes: shapes constructed from random walks. Each n-tuple is constructed by choosing a random square on the board, and taking a random walk from that point. At each step of the walk, the next square is chosen as one of the eight immediate neighbours of the current square. Each walk was for six steps, but only distinct squares are retained, so each randomly constructed n-tuple had between 2 and 6 sample points. The results in this paper are based on 30 such n-tuples. One would expect some n-tuples to be more useful than others, and there should be scope for evolving the n-tuple sample points while training the look-up table values using TDL (a code sketch of this construction is given after Fig. 14 below).

Fig. 14 shows how performance improves with the number of self-play games. After every 25 self-play games, performance was measured by playing 100 games against the standard heuristic player (50 each as black and white).
Fig. 14 Variation in win ratio against a heuristic player (each
sample point based on 100 games, 50 each as black and white)
(The x-axis, nGames/25, indicates the scaling; multiply each
value by 25 to find the number of games played. The y-axis
shows the ratio of the number of wins to the total number of
games played)
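A minimal sketch of the random-walk (snake) construction described above is given below; the data structures are illustrative, and it is assumed that the six steps include the starting square, so that between 2 and 6 distinct sample points are retained.

    import java.util.*;

    // Sketch of constructing one "random snake" n-tuple: a random start
    // square followed by a walk over the eight-connected neighbourhood,
    // keeping only the distinct squares visited.
    class RandomSnake {
        static List<Integer> randomSnake(Random rnd) {
            Set<Integer> squares = new LinkedHashSet<>();
            int x = rnd.nextInt(8), y = rnd.nextInt(8);   // random starting square
            squares.add(y * 8 + x);
            for (int step = 1; step < 6; step++) {
                int nx, ny;
                do {                                       // one of the eight neighbours, staying on the board
                    nx = x + rnd.nextInt(3) - 1;
                    ny = y + rnd.nextInt(3) - 1;
                } while ((nx == x && ny == y) || nx < 0 || nx > 7 || ny < 0 || ny > 7);
                x = nx;
                y = ny;
                squares.add(y * 8 + x);                    // duplicates are discarded by the set
            }
            return new ArrayList<>(squares);
        }
    }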
Table 3 shows how performance against the 2006 CEC champion varies with the number of self-play games, in this case playing 200 games against the champion (100 each as black and as white), where nsp denotes the number of self-play games. After the first 500 self-play games have been played, the champion is defeated in nearly 70% of games.
Table 3 Performance of TDL n-tuple player versus 2006 CEC
champion over 200 games, sampled after varying nsp
nsp    Won   Drawn   Lost
250     89     5     106
500    135     6      59
750    142     5      53
1000   136     2      62
1250   142     5      53
Not only has the n-tuple based player reached a higher
level of performance than any player to date, it has also
done so much more quickly. While n-tuple systems are well
known to a small base of appreciative users for their high
speed and reasonable accuracy, their accuracy is usually
not quite as high as the best neural networks or SVMs for
many pattern recognition tasks. Game strategy learning
is different to pattern recognition in important ways: the
data is acquired through active exploration, and the data is
typically much noisier. This might be the reason why they
perform so well on this task.
6 Future directions

6.1 Robotics and games
The simulation middleware that underlies many video games enables more lifelike games to be developed with greater ease. With high quality middleware
to support game development, developers can concentrate
their creativity on creating fun environments with appropriate interest and challenge, with the game-play arising
naturally from the situation rather than having to be pre-programmed. For example, until recently, the graphics for explosions would have to be designed with a great deal of manual effort. With modern physics engines such as Ageia's PhysX, an explosion can be simulated as a particle system; the graphics then arise directly from the physical model. The explosion can then depend very naturally on the amount of fuel
in the tank, for example.
A major interest of our research group has been in car
racing challenges. Competitive car driving is a problem of
great practical importance, and has received some attention
from the CI community. Most often researchers have used
various learning methods for developing controllers for car
racing simulations or games[36, 37] . However, CI techniques
have also been applied to physical car racing, famously by
Thrun in the DARPA Grand Challenge[38] (where DARPA is the Defense Advanced Research Projects Agency), but also by, e.g., Tanev et al.[39], who evolved controllers for radio-controlled toy cars.
The author has recently been involved with work developing a robotic car racing platform2 . Robotic car racing
offers the same type of challenge whether done on full size
cars or model cars, but model cars are much cheaper. If
the research can be done in simulation, then of course it
is cheaper still, though ultimately less convincing, at least
until transferred back to the real world.
As a starting point, take the challenge of driving quickly
along a path or road using a computer vision system as
the main source of input. The vision problem is of similar
complexity whether doing it on a real-world car, or using
the video output of a modern console or personal computer
(PC) racing game. This can be seen from the examples.
Fig. 15 shows an image captured by the web-cam on our
robotic model car (while in motion at about 10 miles per
hour). Fig. 16 shows a screen shot taken from Sega Rally
running on a PC.
While the vision problems are of similar complexity (depending on the details of the environment or track), the
game version typically offers much more forgiving physics
than the real world, especially when it comes to car damage!
We are currently investigating running a car racing competition using an on-line commercial gaming lobby such as
XBox live, where all the competitors are software agents
processing the real-time video from the console output to
drive the tracks as competitively as possible.
2 http://dces.essex.ac.uk/staff/lucas/roborace/roborace.html
Fig. 15 An image captured on-board our autonomous model car

Fig. 16 A screen shot from Sega Rally

6.2 Direct video input mode
Direct video input mode has much wider application than
car racing. The idea is that the input to the software agent
is simply the real-time video output from the game (audio
could also be included).
One of the main difficulties for academic research on CI
methods for video games is the time taken to learn the application programming interface (API) to interact with a
complex game. While we may look forward to more standardization of NPC APIs in the future, the possibility of
circumventing API issues by giving the software agent exactly the same view as a human player offers tremendous
challenges, but for some types of game (especially car racing
games) it may be possible to make rapid progress. Indeed,
results have already been obtained for a relatively simple
3D car racing game using CI methods[37] .
More recently, this screen capture mode has been used
to enable software agents to play Ms. Pac-Man without requiring any access to details of the software.
Fig. 17 shows an agent under test controlling the Ms.
Pac-Man character using the direct screen capture mode of
operation. The Ms. Pac-Man game is shown to the right,
the extracted game objects are shown to the left, and the
top window shows the direction key currently selected by
the agent. The screen is captured approximately 15 times
per second. The exact number depends on the speed of
the computer and the amount of processing performed by
the agent control algorithm. After each screen is captured,
some basic image processing routines (e.g. connected component analysis) are then run to extract the main game
objects, such as the Ms. Pac-Man agent, the ghosts, the
power pills, and the food pills.
Fig. 17 An agent under test controlling the Ms. Pac-Man character via direct screen capture
These extracted objects are then given as input to the
agent under test. In response to this, the agent generates a
key event (one of the cursor keys) to control the movement
of the Ms. Pac-Man agent.
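A minimal sketch of the capture-and-control loop is given below. It uses the standard java.awt.Robot class for screen capture and key events; the capture region, frame rate, and the image-processing and decision steps are placeholders, not details of the actual competition software.

    import java.awt.*;
    import java.awt.event.KeyEvent;
    import java.awt.image.BufferedImage;

    // Sketch of a screen-capture control loop for Ms. Pac-Man. Only
    // java.awt.Robot is a real API here; the rest is illustrative.
    public class ScreenCaptureAgent {
        public static void main(String[] args) throws Exception {
            Robot robot = new Robot();
            Rectangle gameWindow = new Rectangle(0, 0, 448, 576);  // assumed game region
            while (true) {
                BufferedImage frame = robot.createScreenCapture(gameWindow);
                // 1) image processing (e.g. connected component analysis) extracts
                //    the game objects: Ms. Pac-Man, ghosts, power pills, food pills
                // 2) the agent under test maps those objects to a cursor key
                int key = KeyEvent.VK_LEFT;   // placeholder decision
                robot.keyPress(key);
                robot.keyRelease(key);
                Thread.sleep(66);             // roughly 15 captures per second
            }
        }
    }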
Because of the inherent delays in the screen capture process and the image processing operations, the state of the
game that the agent sees will often be out of date at the
point of delivery, and any further processing done by the
agent only adds to the delay in sending the key event to
the operating system to control the game. This delay is
typically variable, but is often between 50 ms and 100 ms.
Naive agent controllers that ignore this effect often spend
much of their time oscillating around a junction while missing the turning! From a CI viewpoint this merely adds to
the challenge of developing high quality controllers. This
has been run as a competition for the 2007 IEEE CEC, and
will also be run for the 2008 IEEE World Congress on Computational Intelligence (WCCI). So far, the leading entry
has been one called Pacool, supplied by Abbas Mehrabian
and Arian Khosravi, which has scored over 17 000 points
when run in this mode. The current human world champion record stands at 933 580 (by Abdner Ashman, who
cleared 133 screens in the process), so there is some way
to go yet until software agents can compete with the best
humans at this game.
6.3 Competitions
There are many game-related competitions, and this is a
great way to drive research forward. One of the most active
fields of game research is Go, and the dedicated Go community is also one of the best at running regular competitions.
Without this, radical new approaches such as MoGo,
which really was a major departure from conventional computer Go wisdom, would have most likely taken much longer
to become established.
Well-designed competitions are simply the best way to
establish which techniques work best. Some games require
interesting AI rather than competitive AI. Interesting behaviour is in the eye of the beholder: it is subjective and
naturally hard to measure, though statistically there may
be much agreement about what is interesting or not. Other
games (especially real-time strategy) do require smarter AI,
and any games company wishing to discover the best AI for
their game might find that publishing the appropriate API
and running open competitions offered a very cost-effective
way to do this.
This can be done for the entire behaviour of an agent,
or to measure performance over certain tasks. This also
enables a rapid transfer of technology between academia
and the games industry. For standard tasks, it will be relatively easy to test the various performance aspects
of a given algorithm or component. The use of standard
benchmarks also makes entry into this research area attractive for an academic. This has already been seen for simpler
game environments such as the Stanford general game playing competition3 , the Essex Othello neural network server4 ,
project Hoshimi5 and simulated car racing6 .

3 http://games.stanford.edu
4 http://algoval.essex.ac.uk:8080/othello/html/Othello.html
5 http://www.project-hoshimi.com
6 http://julian.togelius.com/cig2007competition
There are many interesting challenges and opportunities
involved in making these competitions work for more complex games, including the standardization of sensory inputs,
body parts, control systems and actuators.
There are also great opportunities for companies wishing to outsource advanced software agent development work
where it is unclear who may be the best providers for the
required system. In such cases, a competition can be run
where the reward goes to the entrants who provide the best
solutions. This is essentially what DARPA did by organizing the grand challenge, but it can also be done using the
web for smaller scale challenges.
7 Conclusions
There exists a diverse range of game genres on which to
test and apply CI methods, and this paper has mentioned
only a few examples. CI methods can be used to successfully develop competitive agents and to control
non-player characters. However, the study of the Mountain
Car problem, point-to-point car racing, and Othello led to
some remarkable observations. In the case of the Mountain
Car problem, we showed how certain problem setups could
render the problem trivial, so that running an evolutionary algorithm often revealed near-optimal solutions in the
initial randomly constructed population.
In point-to-point car racing we observed that learning
performance could be severely impaired by the presence of
an unnecessary input.
In the case of Othello, there were two main findings.
First, standard co-evolution failed to learn good quality
WPCs. However, by using parent-child weighted averaging
we were able to learn high quality WPCs, the best of which
even outperformed the standard heuristic weights. Second,
by adopting a radically different type of neural network, an
n-tuple system, we were able to easily outperform the best
performing MLPs for this problem.
In summary, much experimentation and perhaps even
rather novel techniques may be necessary to get the best performance from CI methods. This makes the area very interesting and challenging to research.
References
[1] J. Schaeffer, N. Burch, Y. Björnsson, A. Kishimoto, M.
Müller, R. Lake, P. Lu, S. Sutphen. Checkers Is Solved.
Science, vol. 317, no. 5844, pp. 1518–1522, 2007.
[2] K. Chellapilla, D. B. Fogel. Evolving an Expert Checkers
Playing Program without Using Human Expertise. IEEE
Transactions on Evolutionary Computation, vol. 5, no. 4,
pp. 422–428, 2001.
[3] D. B. Fogel, T. J. Hays, S. L. Hahn, J. Quon. An Evolutionary Self-learning Chess Program. Proceedings of the IEEE,
vol. 92, no. 12, pp. 1947–1954, 2004.
[4] S. Gelly, Y. Wang, R. Munos, O. Teytaud. Modification of
UCT with Patterns in Monte-Carlo Go, Technical Report
6062, INRIA, France, 2006.
[5] Y. Wang, S. Gelly. Modifications of UCT and Sequence-like
Simulations for Monte-Carlo Go. In Proceedings of IEEE
Symposium on Computational Intelligence and Games,
IEEE Press, pp. 175–182, 2007.
[6] E. Charniak. Statistical Language Learning, MIT Press,
Cambridge, Massachusetts, USA, 1996.
[7] S. Colton, P. Cowling, S. M. Lucas. An Industry/Academia
Research Network on Artificial Intelligence and Games
Technologies, Technical Report EP/F033834, EPSRC,
Swindon, 2007.
[8] M. Buro. ProbCut: An Effective Selective Extension of the
Alpha-Beta Algorithm. ICCA Journal, vol. 18, no. 2, pp.
71–76, 1995.
[9] T. Gosling, N. Jin, E. Tsang. Games, Supply Chains and Automatic Strategy Discovery Using Evolutionary Computation. Handbook of Research on Nature-inspired Computing for Economics and Management, J. P. Rennard (ed.), vol. 2, pp. 572–588, 2007.
[10] K. O. Stanley, B. D. Bryant, R. Miikkulainen. Real-time Neuroevolution in the NERO Video Game. IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, pp. 653–668, 2005.
[11] R. M. Axelrod. The Evolution of Cooperation, Basic Books Inc., New York, USA, 1984.
[12] N. Jin, E. Tsang. Co-adaptive Strategies for Sequential Bargaining Problems with Discount Factors and Outside Options. In Proceedings of Congress on Evolutionary Computation, Vancouver, BC, Canada, pp. 2149–2156, 2006.
[13] D. Cliff. Minimal-intelligence Agents for Bargaining Behaviors in Market-based Environments, Technical Report HPL-97-91 970811, Hewlett Packard Laboratories, USA, 1997.
[14] D. K. Gode, S. Sunder. Allocative Efficiency of Markets with Zero-intelligence Traders: Market as a Partial Substitute for Individual Rationality. The Journal of Political Economy, vol. 101, no. 1, pp. 119–137, 1993.
[15] D. Cliff. Explorations in Evolutionary Design of Online Auction Market Mechanisms. Journal of Electronic Commerce Research and Applications, vol. 2, no. 2, pp. 162–175, 2003.
[16] A. Moore. Efficient Memory-based Learning for Robot Control, Ph. D. dissertation, University of Cambridge, UK, 1990.
[17] R. Sutton, A. Barto. Introduction to Reinforcement Learning, MIT Press, Cambridge, MA, USA, 1998.
[18] S. Whiteson, P. Stone. Evolutionary Function Approximation for Reinforcement Learning. Journal of Machine Learning Research, vol. 7, pp. 877–917, 2006.
[19] G. Tesauro. Temporal Difference Learning and TD-Gammon. Communications of the ACM, vol. 38, no. 3, pp. 58–68, 1995.
[20] J. B. Pollack, A. D. Blair. Co-evolution in the Successful Learning of Backgammon Strategy. Machine Learning, vol. 32, no. 3, pp. 225–240, 1998.
[21] G. Tesauro. Comments on “Co-Evolution in the Successful Learning of Backgammon Strategy”. Machine Learning, vol. 32, no. 3, pp. 241–243, 1998.
[22] P. J. Darwen. Why Co-evolution Beats Temporal Difference Learning at Backgammon for a Linear Architecture, but Not a Non-linear Architecture. In Proceedings of Congress on Evolutionary Computation, IEEE Press, vol. 2, pp. 1003–1010, 2001.
[23] T. P. Runarsson, S. M. Lucas. Co-evolution versus Self-play Temporal Difference Learning for Acquiring Position Evaluation in Small-board Go. IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, pp. 628–640, 2005.
[24] S. M. Lucas, T. P. Runarsson. Temporal Difference Learning versus Co-evolution for Acquiring Othello Position Evaluation. In Proceedings of IEEE Symposium on Computational Intelligence and Games, Reno/Lake Tahoe, USA, pp. 53–59, 2006.
[25] C. Kotnik, J. Kalita. The Significance of Temporal-difference Learning in Self-play Training: TD-rummy versus EVO-rummy. In Proceedings of the International Conference on Machine Learning, Washington D.C., USA, pp. 369–375, 2003.
[26] M. E. Taylor, S. Whiteson, P. Stone. Comparing Evolutionary and Temporal Difference Methods in a Reinforcement Learning Domain. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, Seattle, Washington, USA, pp. 1321–1328, 2006.
[27] F. Gomez, J. Schmidhuber, R. Miikkulainen. Efficient Non-linear Control through Neuroevolution. In Proceedings of the European Conference on Machine Learning, Lecture Notes in Computer Science, vol. 4212, pp. 654–662, 2006.
[28] S. M. Lucas, J. Togelius. Point-to-point Car Racing: An Initial Study of Evolution versus Temporal Difference Learning. In Proceedings of IEEE Symposium on Computational Intelligence and Games, Westin Harbour Castle, Toronto, Ontario, Canada, pp. 260–267, 2007.
[29] K.-F. Lee, S. Mahajan. A Pattern Classification Approach
to Evaluation Function Learning. Artificial Intelligence, vol.
36, no. 1, pp. 1–25, 1988.
[30] K.-F. Lee, S. Mahajan. The Development of a World Class
Othello Program. Artificial Intelligence, vol. 43, no. 1, pp.
21–36, 1990.
[31] M. Buro. LOGISTELLO – A Strong Learning Othello Program. NEC Research Institute, Princeton, NJ, 1997, [Online], Available: http://www.cs.ualberta.ca/~mburo/ps/log-overview.ps.gz.
[32] S. Y. Chong, M. K. Tan, J. D. White. Observing the Evolution of Neural Networks Learning to Play the Game of
Othello. IEEE Transactions on Evolutionary Computation,
vol. 9, no. 3, pp. 240–251, 2005.
[33] W. W. Bledsoe, I. Browning. Pattern Recognition and
Reading by Machine. In Proceedings of the Eastern Joint
Computer Conference, pp. 225–232. 1959.
[34] J. Ullman. Experiments with the n-tuple Method of Pattern
Recognition. IEEE Transactions on Computers, vol. 18, no.
12, pp. 1135–1137, 1969.
[35] R. Rohwer, M. Morciniec. A Theoretical and Experimental
Account of n-tuple Classifier Performance. Neural Computation, vol. 8, no. 3, pp. 629–642, 1996.
[36] B. Chaperot, C. Fyfe. Improving Artificial Intelligence in a
Motocross Game. In Proceedings of IEEE Symposium on
Computational Intelligence and Games, Reno/Lake Tahoe,
USA, pp. 181–186, 2006.
[37] D. Floreano, T. Kato, D. Marocco, E. Sauser. Coevolution
of Active Vision and Feature Selection. Biological Cybernetics, vol. 90, no. 3, pp. 218–228, 2004.
[38] S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A.
Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann, K. Lau, C. Oakley, M. Palatucci, V. Pratt, P. Stang,
S. Strohband, C. Dupont, L.-E. Jendrossek, C. Koelen, C.
Markey, C. Rummel, J. van Niekerk, E. Jensen, P. Alessandrini, G. Bradski, B. Davies, S. Ettinger, A. Kaehler, A.
Nefian, and P. Mahoney. The Robot that Won the DARPA
Grand Challenge. Journal of Field Robotics, vol. 23, no. 9,
pp. 661–692, 2006.
[39] I. Tanev, M. Joachimczak, H. Hemmi, K. Shimohara. Evolution of the Driving Styles of Anticipatory Agent Remotely
Operating a Scaled Model of Racing Car. In Proceedings of
IEEE Congress on Evolutionary Computation, vol. 2, pp.
1891–1898, 2005.
Simon M. Lucas received the B. Sc. degree in computer systems engineering from
the University of Kent, UK, in 1986, and
the Ph. D. degree from the University of
Southampton, UK, in 1991. He worked for a year as a research engineer for GEC Avionics. After a one-year postdoctoral research fellowship (funded by British Telecom), he was appointed to a lectureship at the University of Essex in 1992 and is currently a reader in computer science.
He was chair of IAPR Technical Committee 5 on Benchmarking and Software. He is the inventor of the scanning n-tuple
classifier, a fast and accurate OCR method. He was appointed
inaugural chair of the IEEE CIS Games Technical Committee in July 2006, has been competitions chair for many international conferences, and co-chaired the first IEEE Symposium on Computational Intelligence and Games in 2005. He was program chair for the IEEE Congress on Evolutionary Computation (CEC) in 2006, program co-chair for the IEEE Symposium on Computational Intelligence and Games (CIG) in 2007, and will be program co-chair for the International Conference on Parallel Problem Solving from Nature in 2008. He is an associate editor of IEEE Transactions on Evolutionary Computation and the Journal of Memetic Computing. He was an invited keynote speaker at the IEEE Congress on Evolutionary Computation in 2007.
His research interests include evolutionary computation, games, and pattern recognition, and he has published over 120 refereed papers in these fields, mostly in leading international conferences and journals.