AI & Parallelism
By: Bryan Griffiths
Parallel AI in Academics
 Parallel AI in the Gaming Industry
Areas of use
 Academic Stigma
 Extending Specific Algorithms (GA)
 Benefits for other Algorithms
There have been significant advances in parallel
and distributed computing.
But what are the implications of these advances
for AI?
Four areas:
Psychological modeling
Improving efficiency
Helping to organize systems in modular fashion
New methods and mechanisms
Psychological Modeling
The production system was originally proposed as a model of
human information processing and storage.
The ideas of short term and long-term memory,
independently operating productions, matching, and
other operations came from psychological literature.
The human brain contains individual neurons which are
slow compared to digital computer circuits, but there are
vast numbers of these and they are richly connected
components that operate concurrently.
SOAR is a production system with a dual mission:
Architecture for building AI systems
Model human intelligence
SOAR incorporates both sequential and parallel aspects.
Improving Efficiency
AI programs consume significant space and time
resources when working with complex problems.
It is therefore important that AI algorithms make use of
advances in parallel computation to speed-up research.
There are several sources of parallelism for speedup in
production systems:
Production level parallelism, in which all the productions match
themselves against working memory in parallel.
Condition level parallelism, in which all of the conditions of a single
production are matched in parallel.
Action level parallelism, in which all of the actions of a single
production are executed in parallel.
Task level parallelism, in which several cycles are executed
simultaneously.
Parallelizing AI Algorithms
The amount of task level parallelism available is
completely dependent on the nature of the task.
In medical diagnosis, each production firing
might be dependent on the previous production
firing, thus enabling a long, sequential chain of
reasoning to occur.
If the system is diagnosing five patients
simultaneously, productions involving different
patients would not interact with one another and
could be executed in parallel.
Embarrassingly parallel, but extremely useful.
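The five-patient case can be sketched as follows. This is a minimal illustration, not a real diagnostic system: the rules in `RULES` and the function names are hypothetical, and a thread pool stands in for separate processors. Each patient's chain of firings stays sequential, while the patients run side by side because they share no working memory.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy rules: (facts required, fact concluded).
RULES = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "short_breath"}, "pneumonia_suspected"),
    ({"pneumonia_suspected"}, "order_chest_xray"),
]

def diagnose(facts):
    """Forward-chain sequentially: each firing may enable the next rule."""
    memory = set(facts)
    fired = True
    while fired:
        fired = False
        for conditions, conclusion in RULES:
            if conditions <= memory and conclusion not in memory:
                memory.add(conclusion)
                fired = True
    return memory

def diagnose_all(patients):
    """Patients do not interact, so their reasoning chains run in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(diagnose, patients))
```

Calling `diagnose_all([{"fever", "cough", "short_breath"}, {"cough"}])` returns one working memory per patient, in the same order as the input.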
Parallelizing AI Algorithms
Ten authors can write a book much faster than one author
Ten women cannot bear a child any faster than one can!
Likewise throwing more processors at an AI problem may not bring
desired benefits.
Many problems can be solved efficiently by parallel methods.
It is not always easy to convert a sequential algorithm into an
efficient parallel one.
Some AI algorithms whose parallel aspects have been studied are:
Genetic Algorithms
Most types of searches (BFS, DFS, IDA*)
Alpha-beta pruning
Theorem proving
Neural Nets
Academic Stigma
Currently most AI research does not use
parallel implementations of the AI
algorithms. Why is that?
Parallel algorithms need parallel machines,
which of course means more money is
needed to do the research.
 Also the way other academics view the
effectiveness of an algorithm:
 If it's parallel it must be slow.
 If it's parallel it must be a weaker algorithm.
Academic Stigma
If it's parallel it must be slow?
You obviously parallelized it because your
implementation was slow.
If it's parallel it must be a weaker algorithm?
You must have parallelized it because you
were not getting “good” solutions and needed
more CPUs traversing the search space to
achieve your answers.
Extending Specific Algorithms
Genetic Algorithms:
 A search technique used to compute
solutions to optimization and search problems.
 Categorized as global search heuristics.
 A class of evolutionary algorithms that use
techniques inspired by evolutionary
biology such as inheritance, mutation,
selection, and recombination.
Genetic Algorithms
Implemented as a computer simulation in
which a population of abstract
representations of candidate solutions to
an optimization problem evolves toward
better solutions.
 Applications in biogenetics, computer
science, engineering, economics,
chemistry, manufacturing, mathematics,
physics and other fields.
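A minimal sequential GA, sketched under assumptions: the OneMax fitness (count the 1 bits), tournament selection, one-point crossover and the parameter values are all illustrative choices, not taken from the slides. It shows the population of candidate solutions evolving through selection, recombination and mutation.

```python
import random

def one_max(chromosome):
    """Toy fitness (an assumption): the number of 1 bits."""
    return sum(chromosome)

def evolve(pop_size=30, length=20, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]

    def select():
        # Tournament selection: the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if one_max(a) >= one_max(b) else b

    for _ in range(generations):
        best = max(population, key=one_max)
        children = [best[:]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            mother, father = select(), select()
            cut = rng.randrange(1, length)          # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                # bit-flip mutation
            children.append(child)
        population = children
    return max(population, key=one_max)
```

Because of elitism the best fitness never decreases from one generation to the next.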
Parallel Genetic Algorithms
Parallel implementations of genetic
algorithms come in various flavours:
Coarse grained parallel genetic algorithms
assume a population on each of the computer
nodes and migration of individuals among the nodes.
 Types:
Island Model, Migration Models.
Fine grained parallel genetic algorithms
assume an individual on each processor node
which acts with neighboring individuals for
selection and reproduction. A.k.a. Cellular GAs.
Parallel Genetic Algorithms
Single-population master-slave GAs
distribute the evaluation of individuals by
scheduling fractions of the population among
the processing slave nodes. Such a model
has the advantage of ease of implementation
and does not alter the search behavior of a
sequential GA.
 Hierarchical parallel GAs (HPGAs) are any
combination of two or more of the three basic
forms of PGA.
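The master-slave model can be sketched as below. Assumptions: `fitness` is a hypothetical stand-in for an expensive evaluation, and a thread pool stands in for slave nodes (a process pool or a cluster scheduler would take its place for CPU-bound work). The master splits the population into fractions, the workers score them, and the scores come back in population order, so the rest of the GA is unchanged.

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(chromosome):
    """Stand-in for an expensive evaluation (hypothetical)."""
    return sum(gene * gene for gene in chromosome)

def evaluate_chunk(chunk):
    """Work done by one slave: score its fraction of the population."""
    return [fitness(chromosome) for chromosome in chunk]

def evaluate_population(population, workers=4):
    """Master: split the population, farm out chunks, collect scores in order."""
    size = max(1, -(-len(population) // workers))  # ceiling division
    chunks = [population[i:i + size] for i in range(0, len(population), size)]
    scores = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk_scores in pool.map(evaluate_chunk, chunks):
            scores.extend(chunk_scores)
    return scores
```

Selection, crossover and mutation still run on the master, which is why the search behaves exactly like the sequential GA.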
PGA - Migration
Migration can be done in various ways and
can be a factor or function of your cluster
 Ring Method
 Grid Method
PGA - Migration
There are also various ways of choosing
which chromosome will migrate from one
subpopulation to the next.
 Random
 Best/Bad
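One migration step on a ring, sketched under assumptions: islands are plain lists of chromosomes held in one process, the policy is "best replaces worst", and `n_migrants` is smaller than every island. The function name and signature are illustrative only.

```python
def migrate_ring(islands, fitness, n_migrants=1):
    """Each island sends copies of its best n to the next island on the ring,
    where they replace that island's worst n.
    Assumes n_migrants is smaller than every island's size."""
    emigrants = [sorted(island, key=fitness, reverse=True)[:n_migrants]
                 for island in islands]
    new_islands = []
    for i, island in enumerate(islands):
        incoming = emigrants[(i - 1) % len(islands)]  # from the previous island
        keep = len(island) - len(incoming)
        survivors = sorted(island, key=fitness, reverse=True)[:keep]
        new_islands.append(survivors + [m[:] for m in incoming])
    return new_islands
```

A grid layout would only change which neighbours each island receives from, and a random policy would only change how `emigrants` is chosen.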
PGA - Migration
Another approach could be to implement a
central database that subpopulations would
submit their best to and then draw the best
chromosome from that database to replace their
worst. (King of the Hill Migration)
This provides a faster migration of your top
chromosome across your entire set of
subpopulations, thereby exposing your elite
to new genetic material in hopes of advancing it
even further.
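King of the Hill migration could look roughly like this. It is a sketch under assumptions: `HillDatabase` is a hypothetical in-memory stand-in for the central database, guarded by a lock so several subpopulation threads can share it, and higher fitness is taken to be better.

```python
import threading

class HillDatabase:
    """Central store holding the best chromosome submitted so far."""

    def __init__(self, fitness):
        self._fitness = fitness
        self._lock = threading.Lock()
        self._king = None

    def submit(self, chromosome):
        with self._lock:
            if (self._king is None
                    or self._fitness(chromosome) > self._fitness(self._king)):
                self._king = list(chromosome)

    def king(self):
        with self._lock:
            return None if self._king is None else list(self._king)

def king_of_the_hill_step(subpopulation, database, fitness):
    """Submit this subpopulation's best, then replace its worst with the king."""
    database.submit(max(subpopulation, key=fitness))
    worst = min(range(len(subpopulation)),
                key=lambda i: fitness(subpopulation[i]))
    subpopulation[worst] = database.king()
```

Each subpopulation would call `king_of_the_hill_step` every few generations, so the current elite reaches every subpopulation after a single round instead of travelling hop by hop.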
PGA – Migration
Easy to implement, even in existing GA code.
 Low network traffic.
 Customizable to your architecture.
 Faster than sequential GA and generally
achieves a better result. (Unless optimal
solution is easy to obtain.)
PGA – Migration
Supposedly harder to work with because you
now have three new variables to fine tune.
 Migration
 Number of chromosomes to migrate
 Migration direction and layout
Grid Enabled - HPGA
Benefits for other Algorithms
(Greedy Algorithms)
Large search space.
 A heuristic that tells which current
alternative looks best.
Take that alternative first.
 Potentially neglect other paths.
In parallel:
Take first N best looking alternatives.
 Reducing the neglect.
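A small sketch of the idea, assuming a hypothetical cost tree stored as `{node: [(child, edge_cost), ...]}` and a heuristic of "cheapest edge looks best". The sequential version commits to one first move; the parallel version commits to the N best-looking first moves, one greedy descent per worker, and keeps the cheapest result.

```python
from concurrent.futures import ThreadPoolExecutor

def greedy_descent(tree, node):
    """Sequential greedy: always take the cheapest edge until a leaf."""
    cost, path = 0, [node]
    while tree.get(node):
        node, edge = min(tree[node], key=lambda pair: pair[1])
        cost += edge
        path.append(node)
    return cost, path

def parallel_greedy(tree, root, n_best=2):
    """Try the n_best cheapest first moves in parallel, keep the best outcome."""
    first_moves = sorted(tree.get(root, []), key=lambda pair: pair[1])[:n_best]
    if not first_moves:
        return 0, [root]

    def explore(move):
        child, edge = move
        cost, path = greedy_descent(tree, child)
        return edge + cost, [root] + path

    with ThreadPoolExecutor(max_workers=len(first_moves)) as pool:
        return min(pool.map(explore, first_moves), key=lambda result: result[0])

# Hypothetical example where the best-looking first move is a trap.
TREE = {
    "root": [("a", 1), ("b", 2)],
    "a": [("a1", 9), ("a2", 8)],
    "b": [("b1", 1), ("b2", 3)],
}
```

On `TREE`, `greedy_descent` follows the cheap first edge to `a` and pays 9 in total, while `parallel_greedy` with two workers also tries `b` and finds the path costing 3.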
Benefits for other Algorithms
(Parameter sweeping)
Find the best solution for a given N-dimensional
function by trying all input combinations.
 Given a function f(x,y,z,…)
Find its max, min, highest differential, etc.
Tries all x, y, z,…
Mostly embarrassingly parallel.
 Algorithms are extremely stupid.
Use other AI/search methods!
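A parameter sweep is short to write down. In this sketch the objective `f`, the integer grid and the helper `sweep_max` are assumptions made for illustration; a thread pool stands in for whatever pool of processors is available. Every grid point is independent, so the evaluations can be handed out in any order.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def f(x, y, z):
    """Hypothetical objective with its maximum at (1, -2, 3)."""
    return -((x - 1) ** 2 + (y + 2) ** 2 + (z - 3) ** 2)

def sweep_max(func, *axes, workers=4):
    """Evaluate func at every combination of axis values; return the best point."""
    points = list(product(*axes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(lambda point: func(*point), points))
    best = max(range(len(points)), key=values.__getitem__)
    return points[best], values[best]
```

The cost grows with the product of the axis sizes, which is the reason the slide recommends smarter search methods once the grid gets large.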
Benefits for other Algorithms
(Simulated Annealing / Tabu Search)
Every value (or group of values) can be handled
by a CPU and searched along its own path.
 When a CPU becomes free it could request
a new value (or group) from another CPU
and continue on from that point.
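A simplified sketch of this scheme for simulated annealing. Assumptions: free workers pull their next starting value from one shared queue rather than from another CPU, threads stand in for CPUs, and the annealing schedule and step size are arbitrary illustrative values.

```python
import math
import queue
import random
import threading

def anneal(f, x, rng, steps=2000, temp=1.0, cooling=0.995):
    """Minimize f from start x with a basic simulated annealing loop."""
    best = x
    for _ in range(steps):
        candidate = x + rng.uniform(-0.5, 0.5)
        delta = f(candidate) - f(x)
        # Always accept improvements; accept worse moves with falling probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
            if f(x) < f(best):
                best = x
        temp *= cooling
    return best

def parallel_anneal(f, starts, workers=4):
    """Each worker takes a start value, searches its own path, then asks for more."""
    tasks = queue.Queue()
    for index, start in enumerate(starts):
        tasks.put((index, start))
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                index, start = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            found = anneal(f, start, random.Random(index))
            with lock:
                results.append(found)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return min(results, key=f)
```

A Tabu Search variant would keep the same queue and workers and swap `anneal` for a tabu-list local search.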
Benefits for other Algorithms
(Neural Networks)
Large neural networks can have huge
computational needs.
 Training a neural network with many input
data sets can be time consuming.
 Some problems require real time
Benefits for other Algorithms
(Neural Networks)
Perform learning in parallel
Embarrassingly parallel approaches:
Simply start N neural networks in parallel with
different (pseudo) random initial weights and then
select one from the final trained networks.
Evaluate input using multiple different networks in
parallel and choose answer from network with highest
level of confidence
In this approach messages only need to be sent for:
Sending input data to processors
Retrieving output from all processors
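The first embarrassingly parallel approach, sketched with a deliberately tiny stand-in: a single perceptron learning logical AND plays the role of the neural network, and threads play the role of processors. The training set, learning rate and function names are assumptions. N networks start from different pseudo-random weights and the one with the fewest errors is kept.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Toy training set (an assumption): the logical AND function.
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(weights, inputs):
    """weights[0] is the bias; the rest pair up with the inputs."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0

def errors(weights):
    """Number of training examples the network gets wrong."""
    return sum(predict(weights, inputs) != target for inputs, target in DATA)

def train(seed, epochs=100, rate=1.0):
    """Train one perceptron from seed-dependent random initial weights."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(3)]
    for _ in range(epochs):
        for inputs, target in DATA:
            delta = rate * (target - predict(weights, inputs))
            weights[0] += delta
            for i, x in enumerate(inputs):
                weights[i + 1] += delta * x
    return weights

def train_ensemble(n_networks=4):
    """Start N networks in parallel, then select the best trained one."""
    with ThreadPoolExecutor(max_workers=n_networks) as pool:
        candidates = list(pool.map(train, range(n_networks)))
    return min(candidates, key=errors)
```

The only communication is sending each worker its seed (and, in a real system, the training data) and collecting the trained weights at the end.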
Benefits for other Algorithms
(Neural Networks)
Split forward propagating network into
(Neural Networks)
Split network into strongly connected
Benefits for other Algorithms
(Breadth First Search)
Parallel Implementation:
Maintain shared queues for all processors
 One queue per depth
Each processor gets/adds vertices from
the shared queues
 Barrier until all vertices from one depth visited
Note: search order is potentially different
than sequential search order
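A level-synchronous sketch of the parallel BFS. Assumptions: the graph is an adjacency dictionary, threads stand in for processors, and the master merges the workers' results into the next depth's queue instead of the workers writing to it directly. Collecting all results of one depth before starting the next plays the role of the barrier.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_bfs(graph, root, workers=4):
    """Return {vertex: depth}, using one frontier (queue) per depth."""
    depth = {root: 0}
    frontier = [root]
    level = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            level += 1
            # Workers expand the current frontier; list() waits for all of
            # them, which acts as the barrier between depths.
            neighbour_lists = list(
                pool.map(lambda vertex: graph.get(vertex, []), frontier))
            next_frontier = []
            for neighbours in neighbour_lists:
                for w in neighbours:
                    if w not in depth:
                        depth[w] = level
                        next_frontier.append(w)
            frontier = next_frontier
    return depth
```

With workers appending to a truly shared queue, the order inside one depth would vary from run to run, but the depth assigned to each vertex would stay the same.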
Benefits for other Algorithms
(Depth First Search)
Sequential: DFS(root)
Visited(root) = true;
For all neighbour vertices w of root
 If not visited(w)
  Dfs_parent(w) = root
  DFS(w)
Parallel: DFS(root)
Visited(root) = true;
ParFor all neighbour vertices w of root
 If not visited(w)
  Dfs_parent(w) = root
  DFS(w)
Note: search order is potentially different than sequential
search order
Benefits for other Algorithms
(IDA *)
Sequential:
For I = 0 to max_depth
 DFS_search_to_depth(I)
Parallel:
For I = 0 to max_depth
 Parallel DFS_search_to_depth(I)
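The loop above can be sketched as follows. Assumptions: the sketch deepens on plain depth, as the slide's loop does, rather than on the f-cost threshold that full IDA* uses; the graph is an adjacency dictionary; and the parallel step simply gives each child of the root its own depth-limited DFS in a thread.

```python
from concurrent.futures import ThreadPoolExecutor

def depth_limited(graph, node, goal, limit):
    """Sequential DFS that stops at `limit` edges; returns a path or None."""
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in graph.get(node, []):
        path = depth_limited(graph, child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

def parallel_iterative_deepening(graph, root, goal, max_depth):
    """For each depth limit, search the root's subtrees in parallel."""
    if root == goal:
        return [root]
    children = graph.get(root, [])
    if not children:
        return None
    with ThreadPoolExecutor(max_workers=len(children)) as pool:
        for limit in range(1, max_depth + 1):
            searches = pool.map(
                lambda child: depth_limited(graph, child, goal, limit - 1),
                children)
            for path in searches:  # all subtrees finish before deepening
                if path is not None:
                    return [root] + path
    return None
```

Splitting only at the root is the simplest division of work; uneven subtrees would leave some workers idle, which is where load balancing comes in.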
Parallel AI in the Gaming Industry
Industry Stigmas
 Areas of Use
 A Look at Past and Current Generations of Hardware
 Some Examples Of AI Engines
 Some Ideas of Mine
 What the Future Holds
Industry Stigma
True AI uses too many resources that we need instead for
our new “mega-super cool flashy graphics that are uber-realistic” engine and that other stuff…
Generally true, especially when you understand that the “stuff” they
are talking about outside of the graphics engine includes physics
engines, audio engines, voice and/or video chat, and
everything else that goes into the game.
In most companies AI gets a choke-hold put on it very early in
development, reducing it to simple finite state machines and
scripting/trigger-based events because “well it works and that’s
all we need”.
Games that feature “revolutionary” AI are sometimes as simple
as a system that randomly plays some kind of battle chatter
or screams of agony when characters die.
Areas of Use
for Parallel AI
Everywhere in this industry:
Graphics, animation, face effects, etc.
 Pathfinding and path smoothing.
 Individual and group NPC behaviours.
 Dynamic music and sound systems.
 Competitive and co-operative strategic play.
 Arcade games that have large search spaces.
 Chess, Go, etc.
Limitations of Past Generations
Single Processor
 Single Thread
 Limited Resources
Small amount of RAM
 Small amount of ROM
Current Generation of Hardware
Cell Processor:
Power PE:
A 64-bit general purpose register set.
A 64-bit floating point register set.
A 128-bit Altivec register set.
Synergistic PEs:
Able to do SIMD computations.
Can also operate on scalar data types ranging from 8 to 128 bits in size.
Current Generation of Hardware
Xbox 360:
triple-core PowerPC-based design.
 Each of the cores has two symmetric hardware threads.
 Multiple FPU and SIMD vector processing units in
each core.
VMX128 (similar to Altivec, just with more registers)
Current Generation of Hardware
Intel CPUs – Dual, Quad and Eight-core
versions are in production or in the pipeline.
 GPU Technologies becoming even more
parallel in nature:
 Multi-processor
 Crossfire
So how did they use this power?
UE3… multi-threaded, barely.
 But that’s not all bad, it means we could
do more advanced AI elements in a game
if we separate it from the engine and
parallelize the AI middleware engine.
AI Middleware Engines
AI.implant, from BioGraphic Technologies in
Montreal, Canada.
Focuses on animation control, offering AI
solutions for the complex animations a developer
might need.
Hierarchical pathfinding
Rule-based decisions
Group behaviors
Flocking behaviors
We could further parallelize this by computing
behaviours for various groups or characters that
are currently running around.
AI Middleware Engines
Kynapse Engine, Kynogon.
A.I. code reusable independently of engine code
Advanced 3D topology dynamic analysis
Runtime identification of key topological places for hiding,
surrounding, organizing opposite flank assault, etc.
Path planning
Path smoothing
3D pathfinding in a destructible world:
What They Could be Doing…
Take the internet, a simple hosted server,
storage drives, a learning algorithm, an AI
implementation of algorithm “X” and a little
bit of big brother, now blend…
 If we can’t do the parallel heavy weight AI
on the user’s system then outsource the
What They Could be Doing…
NPC AI for FPS that could be updated over time keeping
the game fresh and challenging.
Just use a GA offsite on your company server that analyses the
fitness of the AI when playing against different players, then
recombines the chromosomes and sends out the new AI to be
tested and have its fitness evaluated.
Planning AI for RTS games that constantly evolves along
with players as they both discover new strategies.
Even after 10 different “balancing patches” have been applied to
a game like Starcraft, the AI would not be outdated if you played
it in a single player game or included an AI player in a
multiplayer game.
What They Could be Doing…
Do you enjoy different kinds of music
besides the rock, techno and orchestral
music used in most games?
No problem, submit your favorite playlist of
music and have the game adapt it to the
gameplay using a Neural Net that replaces
battle music with your favorite Kung-fu Fighter
remix or cranky truck-driver-gone-mad country
track, etc.
Where this Could Lead in the Future
LucasArts' next Indiana Jones and Star Wars
video games contain some
interesting AI improvements for character
animations and physics reactions.
Where this Could Lead in the Future
CryEngine 2:
Multi-threaded Engine
 Which improves many aspects of the game
such as AI and physics by speeding up CPU
computations. One huge advantage of
CryEngine 2 is that it will detect the number of
threads the CPU(s) have and then distribute
work evenly across all of the threads.
Where this Could Lead in the Future
At a glance, many AI problems can be
solved with embarrassingly parallel methods:
 Behaviours
 Learning
 etc…
Where this Could Lead in the Future
But when you look at the problems in more depth, you begin to realize that they
can be solved in their simple
forms this way, but in their more complex
forms communication, scheduling and
prediction issues become apparent.
Group pathing
 Complex strategies (Flanking, bait and trap,
surrounding, etc.)
Where this Could Lead in the Future
This brings us back to many of the harder
parallel problems to be solved in order to
implement these more complex forms of
game AI in parallel.
Comments, Questions or Ideas?
