Challenges of Massive Parallelism

Hiroaki Kitano

Center for Machine Translation
Carnegie Mellon University
5000 Forbes
Pittsburgh, PA 15213 U.S.A.
[email protected]

Software Engineering Laboratory
NEC Corporation
2-11-5 Shibaura, Minato
Tokyo 108 Japan
[email protected]
Abstract

Artificial Intelligence has been the field of study for exploring the principles underlying thought, and utilizing their discovery to develop useful computers. Traditional AI models have been, consciously or subconsciously, optimized for available computing resources, which has led AI in certain directions. The emergence of massively parallel computers liberates the way intelligence may be modeled. Although the AI community has yet to make a quantum leap, there are attempts to make use of the opportunities offered by massively parallel computers, such as memory-based reasoning, genetic algorithms, and other novel models. Even within the traditional AI approach, researchers have begun to realize that the need for high performance computing and very large knowledge bases to develop intelligent systems requires massively parallel AI techniques. In this Computers and Thought Award lecture, I will argue that massively parallel artificial intelligence will add new dimensions to the ways that the AI goals are pursued, and demonstrate that massively parallel artificial intelligence is where AI meets the real world.

1. Introduction

Massively Parallel Artificial Intelligence is a field of study exploring methodologies for building intelligent systems and investigating computational models of thought and behavior that use massively parallel computing models [Waltz, 1990, Kitano et al., 1991a]. Traditionally, artificial intelligence research has two goals: as an engineering discipline, artificial intelligence pursues methods to build useful and intelligent systems. As a scientific discipline, artificial intelligence aims at understanding the computational mechanisms of thought and behavior. I believe that massively parallel artificial intelligence will be the central pillar in the ultimate success of artificial intelligence in both engineering and scientific goals.

Massively parallel artificial intelligence does not merely speed up traditional AI models. In fact, many traditional models are not appropriate for massively parallel implementation. The vast majority of the AI models proposed so far are strongly influenced by the performance of existing von Neumann architectures. In a narrow sense, a von Neumann architecture can be represented by a single processor architecture with a memory to store instructions and data. Even in the early 80s, the computing power available to most AI researchers was far less than that of personal computers in 1993, and hopelessly slower than advanced RISC-based workstations. Additionally, memory space has increased drastically in 10 years.

Within the hardware constraints to date, sequential rule application, for example, has been an optimal implementation strategy. While I agree that this idea is not merely an implementation strategy (in fact, there are a number of cognitive bases for sequential rule application), hardware constraints have prevented AI researchers from seriously investigating and experimenting with other, more computationally demanding approaches. For example, the memory-based reasoning model could not have been experimentally supported using only the computing power and memory space available in the early 1980s. Genetic algorithms and neural networks could not be seriously investigated without a major leap in computing power.
One might argue that if a program were run for, say, a month, experimental results could have been obtained even in the early 1980s. However, in practice, dominating computational resources for such a substantial period of time would be impossible for the vast majority of researchers, and the painfully long turnaround time for experiments can be a major discouraging factor in promoting research activities based on computationally demanding paradigms.
This hardware factor inevitably guided, consciously or subconsciously, AI in a particular direction. Although the issues and approaches may vary, several researchers have been expressing concern over the influence of existing computer architectures on our models of thought. David Waltz stated:

the methods and perspective of AI have been dramatically skewed by the existence of the common digital computer, sometimes called the von Neumann machine, and ultimately, AI will have to be based on ideas and hardware quite different from what is currently central to it. [Waltz, 1988]

and proposed memory-based reasoning. Rodney Brooks noted in his 1991 Computers and Thought lecture:

the state of computer architecture has been a strong influence on our models of thought. The Von Neumann model of computation has led Artificial Intelligence in particular directions. [Brooks, 1991]

and proposed the subsumption architecture.
Figure 1: Memory Capacity for Mass Production Memory Chips
Therefore, it is critically important to understand the nature of the computational resources which are available today and will be available in the near future. I argue that massively parallel artificial intelligence will add a new dimension to our models of thought and to the approaches used in building intelligent systems. In the next section, the state of the art in computing technology will be reviewed and the inevitability of massive parallelism will be discussed.
2. Massively Parallel Computers
Advancements in hardware technologies have always been a powerful driving force in promoting new challenges. At the same time, available hardware performance and architectures have been constraints on models of intelligence. This section reviews progress in device technologies and architecture, in order to give an idea of the computing power and memory space which could be available to AI researchers today.
2.1. Devices

Device technology progress is fast and steady. DRAM memory chip capacity increases at a rate of 40% a year. 64M DRAM will be available shortly, and successful experimental results have been reported for a design rule. Figure 1 shows how this trend would continue into the future.
Microprocessor performance increases at the rate of 20 to 30% a year. In 1990, engineers at Intel Corporation predicted that processors in the year 2000 (Micro2000) would contain over 50 million transistors per square inch and run with a 250 MHz clock [Gelsinger et al., 1989]. The Micro2000 would contain four 750 MIPS CPUs running in parallel and achieve 2000 MIPS. In 1992, they commented that their prediction was too modest! This implies the possibility of achieving low cost, high performance computing devices. Thus, some of the research on current massively parallel computers may be conducted on workstations in the future.
On the other hand, uniprocessor supercomputer performance is improving at a substantially lower rate than the microprocessor improvement rate. All newly announced vector supercomputers use multiprocessing technologies, so that the slow processor performance improvement rate can be compensated for (Figure 2).

Figure 2: Peak Performance for Supercomputers
In addition, massively parallel machines are already competitive with current supercomputers in some areas [Myczkowski and Steele, 1991, Sabot et al., 1991, Hord, 1990]. It is expected that, in the near future, the majority of high-end supercomputers will be massively parallel machines.
This implies that the progress in microprocessors will face a slowdown stage in the future, and that architectures with higher levels of parallelism will be introduced. This increasing availability of high performance computers, from personal computers to massively parallel supercomputers, will allow experiments on simulated massive parallelism to be carried out at numerous sites. Thus, computational constraints should no longer impose restrictions on how thought is modeled.
Special purpose devices provide dramatic speedups for certain types of processing. For example, associative memory is a powerful device for many AI applications. It provides a compact and highly parallel pattern matcher, which can be used for various applications such as memory-based reasoning, knowledge base search, and genetic algorithms [Stormon et al., 1992, Twardowski, 1990, Kitano et al., 1991c]. The benefit of associative memory is that it can attain very high parallelism in a single chip. Although only a few simple operations, such as bit pattern matching, can be accomplished, these are central operations for AI.
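The pattern matching an associative (content-addressable) memory performs can be sketched in software. The following minimal sequential simulation is illustrative only; the function name and data are mine, not any chip's actual interface. In hardware, every stored word is compared against the query simultaneously; here we loop.

```python
# Sketch of content-addressable (associative) memory matching, as used
# for knowledge-base search. Hardware compares the query against all
# stored words in parallel; this simulation loops over them instead.
# Names and data are illustrative, not from any specific chip.

def cam_match(words, query, care_mask):
    """Return indices of stored words that match `query` on the bits
    selected by `care_mask` (bit = 1 means compare, 0 means don't care)."""
    return [i for i, w in enumerate(words)
            if (w ^ query) & care_mask == 0]

# Four 8-bit stored words.
words = [0b1010_0001, 0b1010_1111, 0b0110_0001, 0b1010_0001]

# Query: high nibble must be 1010; low nibble is a don't-care.
hits = cam_match(words, query=0b1010_0000, care_mask=0b1111_0000)
print(hits)  # -> [0, 1, 3]
```

The care mask is what lets one associative lookup serve tasks like rule matching or case retrieval, where only some fields of a stored pattern are constrained.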
There are also efforts to build high performance and large scale neural network chips [Graf et al., 1988, Ae and Aibara, 1989, Mead, 1989]. Although current devices have not attained implementation of sufficiently large networks that can cope with many real applications, there are some promising research results. For example, the use of wafer-scale integration for neural networks may provide for scaling up to a reasonable size [Yasunaga et al., 1991].
2.2. Architecture
The history of massively parallel computers can be traced back to Illiac-IV [Bouknight et al., 1972], DAP [Bowler and Pawley, 1984], and MPP [Batcher, 1980]. Illiac-IV was the first SIMD supercomputer, with 64 processors operating at a rate of 300 MIPS. MPP consists of 16,384 1-bit processors. DAP ranges from 1,024 to 4,096 processors. These early machines were built particularly for scientific supercomputing.
However, the real turning point came with the development of the Connection Machine [Hillis, 1985]. The Connection Machine was motivated by NETL [Fahlman, 1979] and other work on semantic networks, and was originally designed with AI applications in mind. Hillis noted:

This particular application, retrieving common-sense knowledge from a semantic network, was one of the primary motivations for the design of the Connection Machine. [Hillis, 1985]
The CM-2 Connection Machine, the second commercial version of the Connection Machine, consists of large numbers of 1-bit processors and floating point processing units [TMC, 1989]. Various systems have been implemented on it, such as rule-based inference systems [Blelloch, 1986], a massively parallel assumption-based truth maintenance system [Dixon and de Kleer, 1988], a text retrieval system [Stanfill et al., 1989], stereo vision programs [Drumheller, 1986], a frame-based knowledge representation system [Evett et al., 1990a], heuristic search programs [Evett et al., 1990b], parallel search techniques [Geller, 1991], a classifier system [Robertson, 1987], case-based retrievers [Kolodner, 1988, Cook, 1991, Kettler et al., 1993], a motion control system [Atkeson and Reinkensmeyer, 1990], and others (see [Kitano et al., 1991a] for a partial list of systems developed). The newest version, CM-5 [TMC, 1991], uses 32-bit SPARC chips interconnected via a tree-structured network. Companies such as MasPar Computer [Blank, 1990, Nickolls, 1990], Intel, Cray, IBM, NCR, Fujitsu, NEC, and others have started development projects for massively parallel computers, and some of these are already on the market.
Figure 3: The block diagram of the IXM-2 associative memory processor
Although most massively parallel machines developed so far employ distributed memory architectures, there have been some attempts to build a shared memory machine. KSR-1 [KSR, 1992] by Kendall Square Research is an attempt to overcome the difficulties in software development which often arise on distributed memory machines. KSR-1 employs the ALLCACHE architecture, which is a virtual shared memory (VSM) model. In a VSM model, the physical architecture resembles that of a distributed memory architecture, but local memories are virtually shared by other processors, so that programmers do not have to worry about the physical location of data.
Independently from these commercial machines, several research machines have been developed at universities and research institutions. These projects include the J-Machine [Dally et al., 1989] at MIT, SNAP [Moldovan et al., 1990] at the University of Southern California (USC), and IXM-2 [Higuchi et al., 1991] at the Electrotechnical Laboratory (ETL). I have been involved in some of these projects, as will be described below.
The IXM-2 massively parallel associative memory processor was developed at ETL by Higuchi and his colleagues [Higuchi et al., 1991]. IXM-2 consists of 64 associative processors and 9 communication processors (Figure 3). Each associative processor is organized using a T800 transputer [inmos, 1987], a 4K-word associative memory, SRAM, and other circuitry. Due to its intensive use of associative memory, IXM-2 exhibits 256K parallelism for bit pattern matching. IXM-2 is now being used for various research projects at ETL, Carnegie Mellon University, and the ATR Interpreting Telephony Research Laboratory. The application domains include genetic algorithms, knowledge-base search, and example-based machine translation.
Figure 4: SNAP-1
The Semantic Network Array Processor (SNAP) is being developed by Professor Dan Moldovan's team at the University of Southern California. A SNAP-1 prototype has been developed and has been operational since the summer of 1991. SNAP-1 consists of 32 clusters interconnected via a modified hypercube (Figure 4). Five TMS320C30 processors form a cluster, which stores up to 1,024 semantic network nodes. Processors within a cluster communicate through a multi-port memory. Several systems have been implemented on SNAP, such as the DmSNAP machine translation system [Kitano et al., 1991b], the PASS parallel speech processing system [Chung et al., 1992], and some classification algorithms [Kim and Moldovan, 1990].
These trends in hardware and architectures will enable massive computing power and memory space to be available for AI research.
2.3. Wafer-Scale Integration for Memory-Based Reasoning
WSI-MBR is a wafer-scale integration (WSI) project designed for memory-based reasoning. WSI is the state-of-the-art in VLSI fabrication technology, and has been applied to various domains such as neural networks [Yasunaga et al., 1991]. WSI fabricates one large VLSI-based system on a wafer, as opposed to conventional VLSI production, which fabricates over 100 chips from one wafer. The advantages of WSI are its size (high integration level), performance, cost, and reliability:

Size: WSI is compact because nearly all circuits necessary for the system are fabricated on a single wafer.

Performance: WSI has a substantial performance advantage because it minimizes wiring length.

Cost: WSI is cost effective because it minimizes the requirement for using expensive assembly lines.

Reliability: WSI is reliable because it eliminates the bonding process, which is the major cause of circuit malfunctions.
However, there is one big problem in WSI fabrication: defects. In conventional VLSI fabrication, one wafer consists of over 100 chips. Typically, a certain percentage of the chips are defective. Traditionally, chips with defects have been simply discarded and the chips without defects have been used. To estimate the faults in a WSI chip, we can use the Seeds model [Seeds, 1967]: Y = exp(-sqrt(DA)), where Y is the yield of the wafer, D is the fault density, which is about 1 fault per cm² in the fabrication process being used, and A is the chip area. This is a reasonable rate for the current fabrication process. However, even this level of faults would cause fatal problems for an attempt to build, for example, an entire IBM 370 on one wafer. Unless sophisticated defect-control mechanisms and robust circuits are used, a single defect could collapse the entire operation. But redundant circuits diminish the benefits of WSI. This trade-off has not been solved.
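Assuming the standard form of the Seeds model, Y = exp(-sqrt(D * A)), the contrast between a chip-sized die and a whole wafer can be sketched numerically. The 324 cm² wafer area below is my own rough figure for an 8-inch wafer; the code is illustrative.

```python
import math

def seeds_yield(area_cm2, defect_density=1.0):
    """Seeds yield model: Y = exp(-sqrt(D * A)).

    area_cm2: chip (or wafer) area A in cm^2.
    defect_density: D in faults per cm^2 (about 1 in the text).
    """
    return math.exp(-math.sqrt(defect_density * area_cm2))

# A conventional 1 cm^2 chip: about a third of the chips survive.
print(round(seeds_yield(1.0), 3))       # -> 0.368

# One system on a full 8-inch wafer (~324 cm^2, my approximation):
# the expected yield is effectively zero, which is the WSI defect
# problem the text describes.
print(f"{seeds_yield(324.0):.1e}")      # -> 1.5e-08
```

This is why WSI needs either defect-tolerant circuits or, as the next paragraph argues, a computation like MBR that does not depend on any single data unit.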
MBR is ideal for WSI, and avoids the defect problem because it does not rely upon any single data unit. WSI-MBR is a digital/analog hybrid WSI specialized for memory-based reasoning. The digital/analog hybrid approach has been used in order to increase parallelism and improve performance. In a digital computing circuit, the floating point processor takes up most of the chip area. The analog circuit, on the other hand, requires only a fraction of the area to implement equivalent floating point operation circuits, and a drastic speedup can be attained. For detailed arguments on the relative advantages of analog and digital circuits, see [Kitano and Yasunaga, 1992]. Use of the less area-demanding analog approach provides two major advantages over the digital approach: (1) increased parallelism, and (2) speedup due to relaxed wiring constraints (critical paths and wire width). Expected performance with the current design is 70 Tflops on a single 8-inch wafer. Using wafer stacking and micro-pin technologies, peta-flops on a system of a few hundred million records would be technically feasible!
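As a rough illustration of why MBR tolerates lost data units, here is a minimal nearest-neighbor sketch. The data and function names are mine, not from WSI-MBR: the answer is a vote over many stored cases, so dropping a record (as a defective memory cell would) rarely changes it.

```python
# Minimal memory-based reasoning sketch: classify a query by majority
# vote among its k nearest stored cases. Because the answer rests on
# many neighbors rather than any single record, losing one record
# degrades the result gracefully. Data and names are illustrative.

def mbr_classify(cases, query, k=3):
    """cases: list of (feature_vector, label); query: feature vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(cases, key=lambda c: dist(c[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

cases = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]

print(mbr_classify(cases, (0.5, 0.5)))      # -> A
print(mbr_classify(cases[1:], (0.5, 0.5)))  # -> A  (still A with a record lost)
```

A rule-based system that lost the one rule covering a case has nothing to fall back on; here the remaining neighbors still carry the answer.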
3. Grand Challenge AI Applications
From the engineering point of view, the ultimate success of artificial intelligence can be measured by the economic and social impact of the systems deployed in the real world. Applications with significant economic and social impacts provide us with some grand challenge AI applications.
Grand challenge AI applications are applications with significant social, economic, engineering, and scientific impacts. The term Grand Challenge was defined in the U.S. High Performance Computing Act of 1991 as

a fundamental problem in science or engineering, with broad economic and scientific impact, whose solution will require the application of high-performance computing resources.

Thus grand challenge AI applications are AI applications with significant social, scientific, and engineering impacts. Typical examples are the Human Genome project [NRC, 1988] and speech-to-speech translation system projects. Two workshops have been organized to discuss grand challenge AI applications.

In February 1992, NSF sponsored The Workshop on High Performance Computing and Communications for Grand Challenge Applications: Computer Vision, Natural Language and Speech Processing, and Artificial Intelligence. In October 1992, I organized The Workshop on Grand Challenge AI Applications in Tokyo, participated in by a group of leading Japanese researchers [Kitano, 1992b]. Details on these workshops and some of the ongoing grand challenge AI applications can be found in [Kitano et al., 1993].
Typical applications discussed include: a speech-to-speech translation system, human genome analysis, Global Architecture for Information Access (GAIA), a highly intelligent information access system, Shogi and Go systems which beat Meijin (Grand Master), intelligent robots and programs which undertake daily tasks, and intelligent vehicles. There have also been discussions as to what technologies are needed to support grand challenge AI applications. The conclusions from both workshops are surprisingly similar. The common thread was the need for massive computing power, massive memory storage, massive data resources, and ultra high bandwidth communication networks.
For a grand challenge to succeed, our computer systems must be able to be scaled up. For example, natural language systems must be scaled up to the point where the system's dictionary contains all common words and many domain-specific terms, and where grammar rules cover most syntactic structures. For any serious machine translation system, this means that the dictionary contains from half a million to a few million word entries, and the grammar rules number over 10,000.

In addition, for systems to be scaled up and deployed in the real world, they must be able to cope with noisy environments and incomplete data resources.
4. The Bottlenecks
This section discusses limitations of "the traditional AI approach." The traditional AI approach can be characterized by several salient features:

Formal Representations: Most traditional models use rigid and formal knowledge representation schemes. Thus, all knowledge must be explicitly represented in order for the system to use that knowledge. There is no implicit knowledge in the system.

Rule-Driven Inferencing: Reasoning is generally driven by rules or principles, which are abstract and generalized knowledge on how to manipulate specific knowledge.

Strong Methods: Since the system depends on explicit knowledge and rules, the domain theory must be understood in order to build any system based on the traditional approach.

Hand-Crafted Knowledge Bases: Knowledge and rules have been hand-coded at extensive labor cost. In many cases, coding must be carried out by experts in the field.
Figure 5: Approaches for Building Intelligent Systems
In essence, these are derived from the physical symbol system hypothesis and heuristic search. As Dave Waltz stated:

For thirty years, virtually all AI paradigms were based on variants of what Herbert Simon and Allen Newell have presented as the physical symbol system and heuristic search hypotheses.

The fundamental assumption in the traditional approach is that experts know the necessary knowledge regarding the problem domain, and that expert knowledge can be explicitly written using formal representations. Toy problems, such as the blocks world and the Tower of Hanoi, meet this condition. Thus, historically, many AI research efforts have been carried out on domains such as the blocks world and the Tower of Hanoi. The intention of such work is that, by proving the effectiveness of a method on small and tractable tasks, the method can then be applied to real-world problems. To rephrase, most research has been carried out in such a way that researchers develop a highly intelligent system in a very restricted domain. Researchers hoped that such systems could be scaled up with larger funding and increased effort (Figure 5; independently, Brooks and Kanade use similar figures).
However, experience with expert systems, machine translation systems, and other knowledge-based systems indicates that scaling up is extremely difficult for many of the prototypes. For example, it is relatively easy to build a machine translation system which translates a few very complex sentences. But it would be far more difficult to build a machine translation system which correctly translates thousands of sentences of medium complexity. Japanese companies have invested a decade of time and a serious amount of labor in developing commercial machine translation systems based on traditional models of language and intelligence. So far, a report on a high quality machine translation system has not been published.
There are several reasons why the traditional AI approach fails in the real world. The basic assumption is that knowledge exists somewhere in the minds of experts, so that, if it can be written down in operational form, the expertise can be transferred to the system. However, this assumption does not stand in the real world. The following three factors prevent the traditional AI approach from real-world deployment.

Incompleteness: It is almost impossible to obtain a complete set of knowledge for a given problem domain. Experts themselves may not have the knowledge, or the knowledge may be tacit, so that it cannot be expressed in a formal manner. Thus, a certain portion of the knowledge is always absent.

Incorrectness: There is no guarantee that expert knowledge is always correct and that encoding is perfect. A large knowledge base, which contains over 10,000 frames, must include a few errors. If there were a 0.5% error rate, there would be over 5,000 incorrect frames in a million-frame knowledge base!

Inconsistency: The set of knowledge represented may be inconsistent, because (1) contextual factors needed to maintain consistency were ignored, or (2) the expert knowledge itself is inconsistent.
These realities in regard to real-world data are devastating to the traditional AI approach. In addition to these problems, there are others, such as:

Human Bias: The way knowledge is expressed is inevitably influenced by the available domain theory. When the domain theory is false, the whole effort can fail.

Tractability: The system becomes increasingly intractable as it scales up, due to complex interactions between piecewise rules.

Economics: When rules are extracted from experts, system development is a labor intensive task. For example, even if MCC's CYC [Lenat and Guha, 1989] system provides a certain level of robustness for knowledge-based systems, proliferation of such systems would be limited due to high development cost.
Empirically, efforts to eradicate these problems have not been successful. In essence, AI theories for the real world must assume data resources to be inconsistent, incomplete, and inaccurate.

Lenat and Feigenbaum pointed out the scaling-up problem for expert systems, and proposed the Breadth Hypothesis (BH) [Lenat and Feigenbaum, 1991] and the CYC project [Lenat and Guha, 1989]. While there is some truth in the BH, whether or not robustness can be attained by a pure symbolic approach is open to question. A series of very large knowledge-based systems projects, such as CYC [Lenat and Guha, 1989], knowledge-based machine translation (KBMT) [Goodman and Nirenburg, 1991], a corporate-wide CBR system [Kitano et al., 1992], and large-scale CBR systems [Kettler et al., 1993], will be important test-beds for this approach.

For these systems to succeed, I believe that incorporation of mechanisms to handle messy real-world data resources will be necessary.
5. Computing, Memory, and Modeling
One promising approach for building real-world AI applications is to make maximum use of massively parallel computing power and data resources. In essence, I argue that an approach emphasizing massive computing power, massive data resources, and sophisticated modeling will play a central role in building grand challenge AI applications.
Computing: The importance of computing power can be represented by the Deep Thought chess machine [Hsu, 1991, Hsu et al., 1990, Hsu, 1990]. Deep Thought demonstrates the power of computing for playing chess. It was once believed that a strong heuristic approach was the way to build a grand master level chess machine. However, the history of chess machines indicates that computing power and chess machine strength have an almost direct correlation (Figure 6; reproduced based on [Hsu et al., 1990]). Deep Thought II consists of 1,000 processors computing over a billion moves per second. It is expected to beat a grand master. Deep Thought exemplifies the significance of massive computing. Similarly, real-time processing using a very large knowledge source requires massive computing power [Evett et al., 1990a, Evett et al., 1990b, Geller, 1991].
Memory: The need for memory can be argued from a series of successes achieved in memory-based reasoning. Starting from the initial success of MBRtalk [Stanfill and Waltz, 1988], memory-based reasoning has been applied to various domains, such as protein structure prediction [Zhang et al., 1988], machine translation [Sato and Nagao, 1990, Sumita and Iida, 1991, Furuse and Iida, 1992, Kitano and Higuchi, 1991a, Kitano, 1991], census classification [Creecy et al., 1992], parsing [Kitano and Higuchi, 1991b, Kitano et al., 1991b], and weather forecasting. PACE, a census classification system, attained 57% classification accuracy for occupation codes and 63% classification accuracy for industry codes [Creecy et al., 1992]. The AIOCS expert system attained only 37% for the occupation codes and 57% for industry codes. These successes can be attributed to the superiority of an approach which places memory as the basis for intelligence, rather than fragile hand-crafted rules. In a memory-based reasoning system, the quality of the solution
depends upon the amount of data collected. Fig­
ure 7 shows the general relationship between the
Figure 6: Progress of Chess Machines
Figure 8: Modeling and Accuracy
not mature enough to establish statistics systems for var­
ious phenomena. However, using statistical ideas, even
in a primitive manner, can greatly help to understand
the nature of many phenomena.
Figure 7: Memory and Accuracy
amount of data and solution quality. The success of
memory-based reasoning demonstrates the signifi­
cance of a massive memory or data-stream.
M o d e l i n g : The importance of modeling can be discussed using the SPHINX speech recognition
system[Lee, 1988]. Using massive computing and
a massive data-stream is not sufficient to build such
artificial intelligence systems. The critical issue is
how the application domain is modeled. Figure 8
shows the improvement in recognition rate, with
modeling sophistication. Even if massively parallel
machines and large data resources are used, if the
modeling is not appropriate, only a poor result can
be expected. SPHINX exemplifies the significance
of modeling.
There are several reasons that these factors are impor­
tant, the following analysis may help in understanding
the effectiveness of the approach.
D i s t r i b u t i o n Many of the phenomena that AI tries to
deal with are artifacts determined by human beings, such
as natural languages, society, and engineering systems.
However, it is acknowledged that even these phenomena
follow basic statistical principles, as applied in nature.
Normal distribution (also called Gaussian distribution)
and Poisson distribution are important distributions. In
physics, quantum statistics (such as Bose-Einstein statis­
tics and Fermi-Dirac statistics) and a classical statistics
(Maxwell-Boltzmann statistics) are used. A I , however, is
Zipf's law, for example, describes the distribution of types of events in various activities. In Zipf's law, the product of the rank of an event (r) and the frequency of the event (f) is kept constant (C = rf). For example, when C = 0.1, an event of rank 1 should have a 10.0% share of all events, a rank-2 event should occupy 5.0%, and so on (Figure 9). The sum of the top 10 events occupies only 29.3%. Despite the fact that the occurrence probability of any individual event beyond the 10th rank is only a fraction of a percent (e.g., about 0.9% for the 11th rank), the sum of these events amounts to about 70%. This law can be observed in various aspects of the world. AI models using heuristic rules can cover high-frequency events relatively easily, but irregular and less frequent events are a major obstacle to extending the coverage. However, because these less frequent events collectively occupy a significant proportion of all events, the system must handle them. In fact, about 60% to 70% of the grammar rules in typical commercial machine translation systems are written for specific linguistic phenomena whose frequencies are very low. This empirical data confirms Zipf's law in natural language.
Memory-based reasoning is a straightforward method to
cover both high frequency and low frequency events. In
fact, the relationship between data size and accuracy of
an MBR system and the accumulated frequency in Zipf's
law is surprisingly similar (Figure 10).
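The arithmetic behind these proportions can be checked directly. The following sketch (my illustration, using the paper's example constant C = 0.1, not code from the paper) computes the shares implied by C = rf:

```python
# Shares implied by Zipf's law (C = r * f), using the example constant C = 0.1.

def zipf_share(rank, c=0.1):
    """Frequency share of the event at a given rank, assuming C = r * f."""
    return c / rank

top10 = sum(zipf_share(r) for r in range(1, 11))
print(f"rank 1 share: {zipf_share(1):.1%}")   # 10.0%
print(f"rank 2 share: {zipf_share(2):.1%}")   # 5.0%
print(f"top 10 total: {top10:.1%}")           # 29.3%
print(f"long tail:    {1 - top10:.1%}")       # 70.7%
```

The long tail of individually rare events indeed accounts for roughly 70% of all occurrences, which is the point of the argument above.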
The Central Limit Theorem: Inaccuracy, or noise, inherent in large data resources can be mitigated using the very nature of these large data resources. Assuming that the inaccuracy in the data follows a certain distribution (not necessarily Gaussian), the central limit theorem indicates that the distribution of its mean will be Gaussian over large data sets. A simple analysis illustrates the central limit theorem. Assuming the linear model y = x + ε, where y is an observed value, x is a true value, and ε is noise, the central limit theorem indicates that the mean of ε will follow a normal distribution, regardless of the original distribution which produces ε. When the expected value of ε is 0, the observer will get the true value.

Kitano 819

Figure 12: Overlapping Noisy Data

Figure 9: Zipf's Law

Figure 13: Accuracy Degradation and Precision

Figure 10: Zipf's Law and MBR
In fact, adding various types of noise to memory-based reasoning systems does not cause major accuracy degradation. Figure 11 shows accuracy degradation for MBRtalk (an MBR version of NETtalk [Sejnowski and Rosenberg, 1987]) when noise is added to the weights. The noise follows a uniform distribution: N% noise means that a weight can be randomly changed in the range W ± (N/100)W, where W is a weight value. The degradation rate is much smaller when larger data sets are used.

Figure 11: Accuracy Degradation by Noisy Data
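The noise-averaging argument can be illustrated with a small simulation (my sketch, not the MBRtalk experiment; the true value and the noise range are arbitrary choices). Observations carry non-Gaussian (uniform) noise, yet the mean converges to the true value as the data set grows:

```python
# Averaging noisy observations y = x + e, where the noise e is uniform
# (non-Gaussian) with mean 0; larger samples yield better estimates of x.
import random

random.seed(0)
TRUE_VALUE = 5.0

def estimate(n):
    """Mean of n noisy observations; noise is uniform on [-1, 1]."""
    return sum(TRUE_VALUE + random.uniform(-1, 1) for _ in range(n)) / n

for n in (10, 1000, 100000):
    print(n, abs(estimate(n) - TRUE_VALUE))
```

The absolute error of the sample mean shrinks roughly as 1/sqrt(n), which is what makes large, individually noisy data resources usable.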
820
Awards
The Law of Large Numbers: According to the law of large numbers, the peak in a data distribution will become narrower as the number of data points grows. Assume that there are two close true values. Their data distributions may overlap due to noise. However, the law of large numbers indicates that, by collecting larger data sets, the overlap can be mitigated (Figure 12). (I have not yet confirmed whether these effects can be observed in memory-based reasoning. However, the relationship between accuracy and data size implies that the law is in effect (Figure 13).)
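A small simulation (my illustration; the two true values and the noise level are arbitrary choices) shows the sample means of two close, noisy sources concentrating near their respective true values as the data set grows:

```python
# Two close true values (1.0 and 1.2) observed with Gaussian noise.
# With more samples, each sample mean concentrates near its own true
# value, so the overlap between the two estimates is mitigated.
import random

random.seed(1)

def sample_mean(true_value, n):
    """Mean of n observations of true_value with noise of std 0.5."""
    return sum(true_value + random.gauss(0, 0.5) for _ in range(n)) / n

for n in (5, 50, 5000):
    a, b = sample_mean(1.0, n), sample_mean(1.2, n)
    print(n, round(a, 3), round(b, 3))
```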
Nonlinear Boundary: A vast majority of real-world phenomena are non-linear. Any attempt to approximate non-linear boundaries using linear or discrete methods will create certain levels of inaccuracy. A simple illustration is shown in Figure 14. The real boundary is shown by the (non-linear) solid line. Available data points are shown by O and X. Assuming that a categorization has been made on both the Y and X axes as A1, A2, A3 and B1, B2, B3, the region can be expressed as (A2 and (B2 or B3)). However, there are areas which do not fit the rectangular area described by this symbolic representation. The proponents of the symbolic approach may argue that the problem can be circumvented by using fine-grained symbol systems. However, there is always an area where symbol systems and the real boundary do not fit. Using an infinite number of symbols would eliminate the error, but this would no longer be a symbol system! Therefore, for non-linear boundary problems, symbol systems are inherently incomplete.
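The contrast between a coarse symbolic region and a memory-based approximation can be sketched as follows (my construction: the boundary y > x² and the rectangle stand in for the figure's regions, and 1-nearest-neighbour recall stands in for memory-based reasoning):

```python
# A curved boundary (y > x**2) classified by (a) a crude rectangular rule,
# analogous to the symbolic region (A2 and (B2 or B3)), and (b) 1-nearest-
# neighbour lookup over stored examples, analogous to memory-based recall.
import random

random.seed(2)

def label(x, y):
    return y > x * x                              # the "real" non-linear boundary

def rect_rule(x, y):
    return 0.0 <= x <= 1.0 and 0.5 <= y <= 1.0    # coarse rectangular region

# memory: stored examples with their true labels
examples = [((random.random(), random.random()),) for _ in range(2000)]
examples = [(p[0], label(*p[0])) for p in examples]

def nn_rule(x, y):
    """Recall the most similar stored example and reuse its label."""
    _, lab = min(examples, key=lambda e: (e[0][0] - x) ** 2 + (e[0][1] - y) ** 2)
    return lab

test = [(random.random(), random.random()) for _ in range(1000)]
rect_acc = sum(rect_rule(*p) == label(*p) for p in test) / len(test)
nn_acc = sum(nn_rule(*p) == label(*p) for p in test) / len(test)
print(rect_acc, nn_acc)
```

With enough stored examples, the nearest-neighbour rule hugs the curved boundary far more closely than any fixed rectangular partition can.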
The use of a large number of data points and statistical computing methods, however, can better mitigate the error with significantly lower human effort. This observation has two major implications. First, it implies that a massively parallel memory-based approach may be able to attain a certain level of competence. This will be discussed in the next section. Second, it has major implications as to how to build very large knowledge-based systems. My recommendation is to use redundancy and population encoding even when the system is based on the traditional AI approach.

Figure 14: Nonlinear boundary of the problems

6. Speech-to-Speech Translation

The ideas discussed above are also effective in speech-to-speech translation systems. Speech-to-speech translation is one of the major topics among the grand challenge AI applications, and its success will have major social and scientific impacts. Speech-to-speech translation systems are generally composed of a speech recognition module, a translation module, and a speech synthesis module. The first group of speech-to-speech translation systems appeared in the late 1980s. These include SpeechTrans [Saito and Tomita, 1988] and Dm-Dialog [Kitano, 1990a] developed at Carnegie Mellon University, and SL-Trans [Morimoto et al., 1989] developed at the ATR Interpreting Telephony Research Laboratories. These systems were immediately followed by a second group of systems, which included JANUS [Waibel et al., 1991] at CMU, and ASURA [Kikui et al., 1993] at ATR. This section discusses limitations of the traditional AI approach to natural language processing and how massively parallel artificial intelligence can mitigate these problems.

6.1. Traditional View of Natural Language

The traditional approach to natural language processing has been to rely on extensive rule application. The basic direction of the approach is to build up an internal representation, such as a parse tree or case frame, using a set of rules and principles. In essence, it follows the traditional approach to artificial intelligence.

In the early 1970s, there were efforts to develop natural language systems for small and closed domains. Woods developed LUNAR, which could answer questions about moon rocks [Woods et al., 1972]. In the LUNAR system, an ATN represented possible sequences of syntactic categories in the form of a transition network [Woods, 1970]. Winograd developed the famous SHRDLU program, which involved simple questions about the blocks world [Winograd, 1972]. These efforts heavily involved manipulation of the world model and a procedural model for sentence analysis. The world models, grammar, and inputs were assumed to be complete. These systems are typical examples of the traditional AI approach.
In the mid 70s, Schank proposed Conceptual Depen­
dency theory [Schank, 1975] and the conceptual information processing paradigm. His claim was that there are
sets of primitives (such as ATRANS, PTRANS, and PROPEL)
which represent individual actions, and that language
understanding is a heavily semantic-driven process,
where syntax plays a very limited role. Schank called
his approach a cognitive simulation. From the late 1970s
to the early 1980s, there was a burst of ideas regarding
conceptual information processing, mostly advocated by
Schank's research group at Yale. These ideas are well
documented in Inside Computer Understanding [Schank,
1981] and in Dynamic Memory [Schank, 1982]. Although
the emphasis changed from syntax to semantics, the ba­
sic framework followed the traditional AI approach —
the systems represented knowledge using primitives and
used heuristics to derive a CD representation from an in­
put sentence. Again, this requires that the world model
and the knowledge must be complete and hand-crafted.
In the mid 80s, the syntactic approach gained renewed
interest. It was motivated by unification-based gram­
mar formalisms, such as Lexical-Functional Grammar
(LFG: [Kaplan and Bresnan, 1982]), Generalized Phrase
Structure Grammar (GPSG: [Gazdar et al., 1985]), and
Head-driven Phrase Structure Grammar (HPSG: [Pollard and Sag, 1987]). The introduction of powerful information manipulation operations, such as unification, together with well-formalized theories, resulted in some success
in developing experimental systems. Once again, how­
ever, these approaches were within the traditional AI
approach. The frustrating fact is that the major empha­
sis of these theories was to determine whether or not a
given sentence was grammatical. This is a bit of an exaggeration, but the point is that modern linguistic theories
ignore many phenomena important for building natural
language systems. For example, linguistic theories typ­
ically do not explain how people understand ungrammatical sentences. Most linguistic theories merely put
* in front of the ungrammatical sentence. Many lin­
guistically interesting sentences, such as extremely deep
center-embedded sentences, almost never appear in real­
ity. In addition, these theories do not entail a theory on
how to disambiguate interpretations, which is a major
problem in natural language processing.
Therefore, although there has been progress and shifts in emphasis, these approaches all remain within the traditional approach to artificial intelligence.
As I have argued in the previous section, systems based
on the traditional approach inevitably face a number of
problems. For example, developers of commercial ma­
chine translation consider that the following problems
are inherent in their approach, which is grounded in tra­
ditional AI and NLP:
Performance: Performance of most existing machine translation systems is not sufficient for real-time tasks such as speech-to-speech translation. It takes a few seconds to a few minutes to translate one sentence.

Scalability: Current machine translation systems are difficult to scale up because their processing complexity makes the systems' behavior almost intractable.

Quality: Intractability of a system's behavior, combined with other factors, lowers the quality of translations.

Grammar Writing: By the same token, grammar writing is very difficult, since a complex sentence has to be described by piecewise rules. It is a hard and time-consuming task, partly due to the intractability of the system's behavior when these rules are added into the whole system.
Notice that the capability to cope with ungrammatical sentences is not even listed, because such a feature is
not listed in the initial specification of the system. Ob­
viously, I do not intend to claim that massively parallel
artificial intelligence will immediately open an avenue
for high performance and robust natural language sys­
tems. The accomplishment of such specifications seems
to be far in the future. However, I do argue that reliance
on models which assume complete knowledge will never
accomplish the goal!
6.2. Memory-Based Model of Natural Language
The alternative to the traditional approach to natural
language processing is a memory-based model. The
memory-based approach to intelligence has been ex­
plicitly discussed since the early 80s. In 1981, Nagao proposed a model of translation based on anal­
ogy at a NATO conference (later published in [Nagao,
1984]). This model is the precursor for recent research
on memory-based and example-based models of trans­
lation. Nagao argued that humans translate sentences,
by using similar past examples of translation. In 1985,
Stanfill and Waltz proposed a memory-based reasoning
paradigm [Stanfill and Waltz, 1988, Stanfill and Waltz,
1986]. The basic idea of memory-based reasoning places
memory at the foundation of intelligence. It assumes
that large numbers of specific events are stored in mem­
ory, and response to new events is handled by first recall­
ing past events which are similar to the new input, and
invoking actions associated with these retrieved events to handle the new input. In the same year, Riesbeck and Martin proposed the direct memory access parsing model [Riesbeck and Martin, 1986]. They argued that
the Build-and-Store approach toward parsing is incor­
rect, and proposed the Recognize-and-Record approach.
The main thrust of DMAP was to view language understanding as retrieval of episodic memory. DMAP can
be viewed as an application of case-based reasoning to
parsing.
Despite certain differences among these models, the
common thread is to view memory as the foundation
of intelligence. This idea runs counter to most AI approaches which place rules or heuristics as the central
thrust of reasoning. At the same time, differences between models later became very important. For example, DMAP, in essence, uses complex indexing and heuristics which follow the tradition of AI. Thus, the weaknesses of traditional AI were also exposed when scaling up the DMAP system.
Massively parallel memory-based natural language
processing directly inherits these ideas. For example,
in the memory-based parsing model, parsing is viewed
as a memory-search process which locates the past oc­
currence of similar sentences. The interpretation is built
by activating past occurrences. Rationales for memory-based natural language processing include:
Very Large Finite Space: The Very Large Finite Space (VLFS) concept is critical to the memory-based approach. The concern most frequently raised about memory-based approaches to natural language processing is how these models account for the productivity of language, that is, the ability to generate an infinite set of sentences from a finite grammar. Opponents of the memory-based approach would claim that, due to the productivity of language, this approach cannot cover the space of all possible sentences. However, this argument is incorrect, because it ignores resource boundedness. It should be noted that only a finite number of sentences can be produced when the following conditions stand:

Condition 1: Finite Vocabulary
Condition 2: Finite Grammar Rules
Condition 3: Finite Sentence Length
Conditions 1 and 2 stand, since people have only a finite vocabulary and grammar at a given time. Obviously, Condition 3 stands, as there is no infinitely long sentence. For example, the longest sentence in one recent CNN Prime News report was 48 words long, and 99.7% of sentences were within 25 words in length. Logically, natural language is a set of sentences within a very large finite space (VLFS).
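Under the three conditions, a crude upper bound on the number of possible sentences can be computed (a sketch with assumed numbers; the vocabulary size V is hypothetical, and L = 25 echoes the sentence-length figure cited above):

```python
# Upper bound on the number of word strings with vocabulary size V and
# maximum sentence length L: sum of V**l for l = 1..L. Astronomically
# large, but finite -- the Very Large Finite Space (VLFS) argument.
V = 10_000   # hypothetical vocabulary size
L = 25       # maximum sentence length considered

upper_bound = sum(V ** l for l in range(1, L + 1))
print(f"|S| <= {upper_bound:.3e}")   # finite, though on the order of 1e100
```

The bound deliberately ignores grammar (Condition 2), which can only shrink the space further; the point is merely that the space is finite.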
Similarity: In memory-based natural language processing, input sentences are matched against all relevant data to find similar sentences in the database. The assumption is that similar problems have similar solutions. In fact, a series of experiments on example-based machine translation indicates that reasonable accuracy can be maintained over a relatively broad space. Figure 15 illustrates the relationship between accuracy and normalized distance in EBMT. (This figure is reproduced based on [Sumita and Iida, 1992].) Therefore, a memory-based approach may be able to cover a real solution space with an implementable number of examples.
Collective Decision: The memory-based approach can take advantage of the redundancy of information implicit in a large data resource. An advantage of this collective decision making is increased robustness against the inaccuracy of individual data. As has been argued previously, the central limit theorem and the law of large numbers ensure the robustness of the system.
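The robustness gained from collective decisions can be illustrated with a majority-vote simulation (my sketch; the 20% corruption rate is an arbitrary noise level, not a figure from the paper):

```python
# Collective decision over redundant noisy data: each stored copy of a
# fact has its label flipped with probability 0.2. A majority vote over
# k copies recovers the true label far more reliably than a single copy.
import random

random.seed(3)
FLIP = 0.2

def noisy_copy(true_label):
    """One stored copy of the label, corrupted with probability FLIP."""
    return true_label if random.random() > FLIP else not true_label

def majority(true_label, k):
    votes = sum(noisy_copy(true_label) for _ in range(k))
    return votes * 2 > k

def accuracy(k, trials=10000):
    return sum(majority(True, k) for _ in range(trials)) / trials

print(accuracy(1), accuracy(9), accuracy(51))
```

A single copy is right about 80% of the time; a vote over 51 copies is virtually always right, which is the redundancy argument in miniature.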
Figure 15: Normalized Distance and Accuracy

Figure 16: Solution Spaces

Real Space Covering: Assuming that there is a set of grammar rules which covers a certain portion of natural language sentences, the grammar not only covers sentences which actually appear in the real world, but also sentences which are grammatical but never produced in reality. An empirical study revealed that only 0.53% of possible sentences considered to be grammatical are actually produced. Figure 16 shows how solution spaces overlap. The memory-based approach never overgenerates, because the memory contains only examples of actual sentences used in the past.

In addition to these rationales, it should be noted that the memory-based approach keeps examples, whereas neural networks and statistical approaches do not. This difference is important because the memory-based approach will be able to capture singular events and higher-order nonlinearities, while neural networks and statistical approaches often fail to capture these. For neural networks and statistical methods, the ability to capture singular and nonlinear curvature is determined by their network structure or the model. In memory-based reasoning, there are chances that dense lower-order interpolation may approximate higher-order nonlinear curvature.

6.3. Systems

Since 1988, I have been building several massively parallel natural language systems. ΦDMDIALOG, or DMDIALOG for short, is a representative system resulting from the early work. DMDIALOG is a speech-to-speech dialog translation system between Japanese and English which accepts speaker-independent, continuous speech. It has been publicly demonstrated since March 1989. The first version of ΦDMDIALOG had a vocabulary of only 70 words. It was extended to 300 words in the second version. Major characteristics of ΦDMDIALOG are:

Massively Parallel Computing Model: Knowledge representation and algorithms are designed to exploit the maximum level of parallelism.
Memory-Based Approach: The knowledge-base consists of a large collection of translation examples and templates. Parsing is viewed as a recall of past sentences in the memory network.

Parallel Constraint Marker-Passing: The basic computational mechanism is marker-passing, in which markers carry various information among nodes in the memory network [Charniak, 1983, Charniak, 1986]. This is a useful mechanism which has been studied in various applications [Hendler, 1989a, Hendler, 1989b, Hendler, 1988, Hirst, 1986, Norvig, 1989]. Markers are not bit markers, but structured markers which carry data structures. In addition, propagation paths for each type of marker are restricted by the types and orders of links to be traversed.

Integration of Speech and NLP: The architecture enables interaction between speech processing and natural language processing. The bottom-up process provides the likelihood that a certain interpretation could be correct. The top-down processing imposes constraints and a priori probability of phoneme and word hypotheses.
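The marker-passing idea above can be sketched in a few lines (a minimal illustration; the network, link types, and payload here are hypothetical examples, not DMDIALOG's actual knowledge base). Structured markers carry data and propagate only along permitted link types:

```python
# Minimal marker-passing sketch: structured markers (payload dicts, not
# bare bits) spread from a start node, restricted to allowed link types.
from collections import deque

# memory network: node -> list of (link_type, target_node)
network = {
    "phoneme:/k/":   [("part-of", "word:coffee")],
    "word:coffee":   [("instance-of", "concept:drink")],
    "concept:drink": [("role-in", "case:order-drink")],
}

def propagate(start, payload, allowed_links):
    """Pass markers from `start`, traversing only links of allowed types."""
    reached, queue = {}, deque([start])
    while queue:
        node = queue.popleft()
        if node in reached:
            continue
        reached[node] = dict(payload)          # each node receives a marker copy
        for link_type, target in network.get(node, []):
            if link_type in allowed_links:
                queue.append(target)
    return reached

hit = propagate("phoneme:/k/", {"prob": 0.7},
                allowed_links={"part-of", "instance-of", "role-in"})
print(sorted(hit))
```

Restricting `allowed_links` is what keeps propagation constrained, in the spirit of the constraint marker-passing described above.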
The first version of ΦDMDIALOG was strongly influenced by the idea of DMAP, but substantial extensions have been made. It was a memory-based approach to natural language processing, since the parsing process is to recall past sentences in the memory network. However, the focus of the system was on accomplishing speech-to-speech translation. Thus, the system design has been rather conservative by today's standards in the memory-based translation community. In fact, ΦDMDIALOG used a complicated indexing mechanism for the memory network, and markers carried feature structures when propagating to the part of the network used for abstract memory. It used three abstraction levels: specific cases, generalized cases, and unification grammar (Figure 17). Therefore, ΦDMDIALOG can be considered a mixture of the traditional approach and a memory-based approach.

Despite the fact that the system was not a full implementation of the memory-based natural language processing paradigm, ΦDMDIALOG demonstrated various interesting and promising characteristics, such as a high level of parallelism, simultaneous interpretation capability, and robustness against inconsistent knowledge. One of the salient features of the model is the integration of speech processing and linguistic processing. In ΦDMDIALOG, activation of the network starts bottom-up, triggered by external inputs. The activation propagates upward in the network to the word, sentence, and discourse levels.
At each level, predictions are made, which are represented by downward propagation of prediction markers with a priori probability measures. There are multiple levels of loops in the system (Figure 18).

Figure 17: Translation Paths

Figure 18: Bottom-Up Activation and Top-Down Prediction

However, the performance of the system on a serial machine was far from satisfactory. The parallelism was simulated in software in the first version. In addition, increasing emphasis on robustness against ill-formed inputs and the inconsistency of the knowledge-base further degraded the total performance. Fortunately, the high level of parallelism in DMDIALOG enabled re-implementing the system on various massively parallel computers.

Massively parallel implementations of versions of DMDIALOG started in the fall of 1989. Two different approaches have been examined. One approach was to emphasize the DMAP approach using complex indexing; DMSNAP was implemented on the Semantic Network Array Processor (SNAP) to examine this approach. The other approach was to focus on memory-based parsing using simpler indexing and a large number of examples; ASTRAL was implemented on IXM-2 based on this approach.

Implementations of ΦDMDIALOG on SNAP began in October 1989, when I started a joint research project with Dan Moldovan's group at the University of Southern California. At that time, Moldovan's group had been designing SNAP-1. I worked together with Moldovan's group to implement DMSNAP, a SNAP version of ΦDMDIALOG. The first version was completed by the end of 1990. DMSNAP emphasized complex indexing and dynamic memory network modification to create discourse entities. This follows an approach taken in the case-based reasoning community [Hammond, 1986, Kolodner, 1984, Riesbeck and Schank, 1989]. In a sense, DMSNAP is much closer to the original DMAP idea than to the memory-based reasoning approach.
An independent research program to implement a version of ΦDMDIALOG on the IXM2 massively parallel associative memory processor began in March 1990 with Tetsuya Higuchi at the ElectroTechnical Laboratory (ETL). IXM2 is an associative memory processor designed and developed by Higuchi and his colleagues at ETL. It is a faithful hardware implementation of NETL [Fahlman, 1979], but using associative memory chips. ASTRAL, the IXM2 version of DMDIALOG, was completed in the summer of 1990. In ASTRAL, complex indexing was eliminated, and a large set of sentence templates was used as a central source of knowledge [Kitano and Higuchi, 1991b].
These differences in emphasis have been employed to make maximum use of the architectural strengths of each kind of massively parallel computer. Comparisons between different implementations of ΦDMDIALOG revealed contributing factors and bottlenecks for robust speech-to-speech translation systems. DMSNAP faced a scaling problem due to its reliance on complex indexing, and a performance bottleneck due to node instantiation, which inevitably involves an array controller, a serial process. In addition, it is difficult to maintain multiple contexts before an interpretation is uniquely determined. On the other hand, ASTRAL exhibited desirable scaling properties and single-millisecond-order parsing performance. The DMSNAP project has been re-directed to place more emphasis on the memory-based approach without complex indexing and node instantiation.
While I have been working on massively parallel implementations and performance aspects of the memory-based approach to natural language, a series of reports has been made on the quality of the translation attained by memory-based models. Sato developed MBT-I [Sato, 1992] for word selection in Japanese-English translation, and attained an accuracy of 85.9%. He extended the idea to the transfer phase of translation in his MBT-II [Sato and Nagao, 1990]. A group at the ATR Interpreting Telephony Research Laboratory developed Example-Based Machine Translation (EBMT: [Sumita and Iida, 1991]) and Transfer-Driven Machine Translation (TDMT: [Furuse and Iida, 1992]). EBMT translates Japanese phrases of the type "A no B" with an accuracy of 89%. TDMT applied the memory-based translation model to translate whole sentences, and attained an accuracy of 88.9%. Since the architecture of TDMT is almost identical to the baseline architecture of DMDIALOG, the high translation accuracy of TDMT and the high performance of the massively parallel versions of ΦDMDIALOG indicate a promising future for the approach.
Since 1992, a new joint research project has begun to implement EBMT and TDMT on various massively parallel computers. EBMT has already been implemented on the IXM2, CM-2, and SNAP-1 machines [Sumita et al., 1993]. Early results show that the EBMT approach fits well with massively parallel computers and scales well. Implementation of TDMT is in progress. Independently, Sato has implemented MBT-III [Sato, 1993] on CM-5 and nCUBE machines.

6.4. Limitations and Solutions

Although I strongly believe that the memory-based approach will be a powerful approach in many application domains, there are limitations to this approach.

First, there is the problem of data collection. Here, Zipf's law, which was a strong supporting rationale, plays a killer role. Although most problems can be defined as having a VLFS as their solution space, it is economically and practically infeasible to collect all solutions in a memory-base. Thus, a certain part of the solution space must be left out. If solutions which are not covered can be derived from similar examples in the memory-base, the entire solution space can still be covered. However, it is most likely that rare examples are truly irregular, so that no similar example can derive the correct solution. For this type of problem, the only solution at this moment is to keep adding rare events to the memory-base. It should be noted, however, that this problem is not unique to the memory-based approach. In rule-based systems, rules which are specific to rare events must likewise keep being added to a rule-base.

Second, the memory-based approach is not free from the human bias problem. Just like the traditional AI approach, the representations of examples and the modeling are created by a system designer. For example, current memory-based translation systems use a hand-crafted thesaurus to determine the distance between words. A series of experiments indicated that inaccuracy of thesaurus-based distance calculations is the major source of translation errors. Methods to refine domain theories need to be incorporated [Shavlik and Towell, 1989, Towell et al., 1990]. If an inappropriate representation and modeling have been adopted, the system will exhibit poor performance. Even if an appropriate representation and modeling are used, it is implausible that the representation and model would be complete enough to attain 100% accuracy. Some bias in representation and modeling is inevitable. When there is some bias, ε in Figure 19 will not be zero. This means that even if the memory-base contains data to cover the entire solution space, certain errors (ε) still remain. A hybrid approach has been proposed and tested in [Zhang et al., 1992] to overcome the problem of biased internal representations. Their experiment shows that certain improvements can be gained by using multiple strategies.

Figure 19: Asymptotic Convergence
Third, pure memory-based models form only a part of the cognitive process. Although the memory-based process may play an important role in intelligent activities, the use of rules in the human cognitive process cannot be ignored. I believe that there is a place for rules. However, it is likely to be a very different role from what has been proposed in the past. A rule should be created as the result of autonomous generalization over a large set of examples; it is not given a priori. Resource bounds are the major reason for using rules in this context. There must be a cost for storing in memory. Thus, a trade-off exists between memory and rules.
Pinker [Pinker, 1991] made an interesting observation on the relationship between memory and rules for English verbs. In English, the 13 most frequent verbs are irregular: be, have, do, say, make, go, take, come, see, get, know, give, find. Most other verbs are regular. Low-frequency irregular verbs are reshaped and become regular over the centuries. Pinker proposed the rule-associative-memory theory, which holds that irregular verbs are stored in memory and that regular verbs are handled by rules. In this model, generalization takes place over the set of regular verbs.
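Pinker's division of labor can be rendered as a toy sketch (my illustration, not Pinker's implementation): an associative store holds the memorized irregulars, and any verb not found in memory falls through to the productive rule.

```python
# Toy sketch of the rule-associative-memory idea: irregular past tenses
# are retrieved from memory; regular verbs fall through to a "+ed" rule.
IRREGULAR_PAST = {
    "be": "was", "have": "had", "do": "did", "say": "said",
    "make": "made", "go": "went", "take": "took", "come": "came",
    "see": "saw", "get": "got", "know": "knew", "give": "gave",
    "find": "found",
}

def past_tense(verb: str) -> str:
    """Memory lookup first; generalize with the regular rule otherwise."""
    if verb in IRREGULAR_PAST:      # stored exemplar wins
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):          # minor spelling adjustment
        return verb + "d"
    return verb + "ed"

print(past_tense("go"), past_tense("walk"), past_tense("love"))
# go -> went (memory); walk -> walked, love -> loved (rule)
```

Here the memory side is just a dictionary; in a massively parallel memory-based system the lookup would instead be an associative match over a large set of stored exemplars.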
Finally, from the engineering viewpoint, explicit rules given a priori work as a monitor. For example, human translators use rules which define how specific terms should be translated. Even for human experts, such rules are given in an explicit form. A memory-based model should be able to incorporate this reality. Thus, I recently proposed a model of memory-based machine translation which combines a memory-based approach and a rule-based approach in a novel fashion [Kitano, 1993].
7. Grand Breakthrough
Massively parallel artificial intelligence can be an ideal platform for scientific and engineering research on next-generation computing systems and models of thought. Although the discussion has focused on the memory-based approach, even this approach carries several trails of traditional AI which are potentially undesirable in a model of thought. Massively parallel memory-based reasoning and massively parallel VLKB search use explicit symbols and do not involve strong learning and adaptation mechanisms by themselves. Thus, they are inherently biased by the features and representation schemes defined by the system designer. Although the problems of representation, domain knowledge, and knowledge-base building can be greatly mitigated by the memory-based paradigm, they are not totally free from these problems. Therefore, while there are many places where the present form of massively parallel AI can be useful, we must go beyond it for a next-generation paradigm to come into play.

Kitano 825

At the least, this will be a significant scientific challenge. I argue that massively parallel artificial intelligence itself will provide an ideal experimental basis for the genesis of new-generation technologies. In this section, therefore, I focus on scientific aspects rather than engineering. I believe that the following properties will characterize a realistic theory of intelligence:
Emergence: Intelligence is an emergent property. It emerges as a result of interactions between subcomponents. Emergence can take place at various levels, such as the emergence of intelligence, the emergence of structure, and the emergence of rules for structure development. In this section, the term emergence is used to indicate the emergence of intelligence.
Evolution: Intelligence is one of the by-products of evolution. Intelligence is the memory of matter since the birth of this universe. Thus, a true theory of intelligence must be justified in an evolutionary context.
Symbiosis: Intelligence emerges as a result of symbiotic computing. The essence of intelligence is diversity and heterogeneity. Symbiosis of diverse and heterogeneous components is the key to intelligence.
Diversity: Diversity is the essence of intelligence. No meaningful artifact can be created without substantial diversity; the reality of our brain, body, and the ecosystem demonstrates the significance of diversity.
Motivation: Intelligence is driven by motivation. Obviously, learning is a key factor in intelligence. For learning theories to be legitimate in evolutionary and psychological contexts, they must involve theories of the motivation for learning. Essentially, learning and other intelligent behaviors are driven in the direction which maximizes the survivability of the gene.
Physics: Intelligence is governed by physics. Ultimately, an intelligent system must be grounded in physics. Nano-technology and device technology provide direct grounding, and digital physics may be able to ground intelligence in a hypothetical physics.
7.1. Emergence
Intelligence is an emergent property. For example, natural language emerged from ethological and biological interaction under a given environment. Our linguistic capability is an emergent property based on our brain structure, and it evolved under selectional pressure in which individuals with better communication capacity survived better.
Brooks proposed the subsumption architecture as an alternative to the Sense-Model-Plan-Action (SMPA) architecture [Brooks, 1986]. The subsumption architecture is a layered architecture in which each layer is defined in a behavior-based manner. The network consists of simple Augmented Finite State Machines (AFSMs) and has no central control mechanism. A series of mobile robots based on the subsumption architecture demonstrated that a certain level of behavior which would appear to be intelligent can emerge.
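The layered, decentralized control can be caricatured in a few lines (a deliberate simplification of my own; Brooks's AFSMs run asynchronously with suppression and inhibition wiring, not a priority loop):

```python
# Each layer maps sensor readings to a candidate action or None.
# A fixed priority order stands in for subsumption's suppression wiring;
# there is no central model or planner, only layered reflexes.
def avoid(sensors):
    # Lowest-level competence: retreat from nearby obstacles.
    return "retreat" if sensors["obstacle_dist"] < 0.5 else None

def seek_goal(sensors):
    # Higher-level competence: head for a visible goal.
    return "approach_goal" if sensors["goal_visible"] else None

def wander(sensors):
    # Default behavior when nothing else fires.
    return "wander"

LAYERS = [avoid, seek_goal, wander]   # survival reflex wins

def act(sensors):
    for layer in LAYERS:
        action = layer(sensors)
        if action is not None:
            return action

print(act({"obstacle_dist": 0.2, "goal_visible": True}))   # retreat
print(act({"obstacle_dist": 2.0, "goal_visible": True}))   # approach_goal
```

The point of the sketch is that no layer consults a world model or a central planner; simple arbitration among reflexes is enough to produce seemingly purposeful behavior.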
Currently, the structure of the system is given a priori in the subsumption architecture. Human bias and the problems of physical symbol systems arise again, because the definitions of the units of behavior and the internal circuitry of each layer must be given by a designer. In addition, manually designing a system which contains large numbers of layers for higher levels of intelligence would be extremely difficult.
The research front now has to move to the next level of emergence: the emergence of structure. Structure could emerge using the principles of self-organization and evolution [Jantsch, 1980, Nicolis and Prigogine, 1977]. Self-organization is a vague notion which can be interpreted in many ways. However, since the a priori definition of a structure is infeasible, self-organization of a system's structure through dynamic interaction with an environment needs to take place. DARWIN-III, developed by Edelman's group [Edelman, 1987, Edelman, 1989], shows some interesting properties of a self-organizing network without explicit teaching signals. However, DARWIN-III is limited to synaptic weight modifications and has no structure-emergence property.
Independently, recent studies by Hasida and his colleagues propose the emergence of information flow as a result of dynamic interaction between the environment and constraints imposed orthogonally [Hasida et al., 1993]. Although this work is still at a preliminary stage, it embodies the interesting idea that the system has no predefined functional unit. Functionality emerges through dynamic interaction with the environment.
The concept of a holonic computer was proposed by Shimizu [Shimizu et al., 1988]. A holonic computer consists of devices which have non-linear oscillation capability (van der Pol oscillators), and which are claimed to be self-organizing. Shimizu emphasizes the importance of the Holonic Loop, a dynamic interaction of elements and top-down constraints emerging from bottom-up interactions. Figure 20 is a conceptual image of next-generation emergent computers (the need for the limbic system and the endogenous system will be discussed later).
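The non-linear oscillation attributed to holonic elements can be illustrated with a minimal forward-Euler simulation of a van der Pol unit (the parameter values and step size are my own illustrative choices, not Shimizu's):

```python
def van_der_pol_amplitude(mu: float = 1.0, dt: float = 0.01,
                          steps: int = 5000) -> float:
    # Forward-Euler integration of  x'' - mu*(1 - x^2)*x' + x = 0.
    # Starting from a tiny perturbation, the oscillation grows onto a
    # self-sustained limit cycle with amplitude of roughly 2.
    x, v = 0.1, 0.0
    peak = 0.0
    for _ in range(steps):
        a = mu * (1.0 - x * x) * v - x
        x, v = x + dt * v, v + dt * a
        peak = max(peak, abs(x))
    return peak

print(van_der_pol_amplitude())  # close to 2, the self-sustained amplitude
```

Self-sustained oscillation of this kind, rather than a fixed equilibrium, is what the holonic proposal takes as its computational primitive.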
Although these models have not captured the emergence of structure by themselves, combining these ideas with evolutionary computing may enable the emergence of structures and the emergence of structure-generating mechanisms.
7.2. Evolution
Intelligence is a by-product of evolution. Species with various forms of intelligence find their niches for survival. Different species could have different forms of intelligence, as they have evolved under different environments and hence different selectional pressures. If the environment imposed as the selectional pressure on mankind had been different, the form of human intelligence could have been very different. Dolphins, for example, are considered to have a comparable number of neurons; however, human beings and dolphins have evolved to have different kinds of brains. Physiological differences of the brain inevitably affect the form of intelligence.

Figure 20: An Architecture for Next Generation Computers?

The structure of our brain is also affected by our evolutionary path. Obviously, the brain is not a tabula rasa: it has not only a global structure, such as the existence of a brain stem, neo-cortex, and other brain components, but also numbers of local circuits which are genetically defined, such as the hypercolumn, Papez circuits, and hippocampal loops. Investigation of these existing structures of the brain, and of how the brain has evolved, may provide substantial insights.

Creating structural systems using evolutionary computing, such as genetic algorithms, generally requires a developmental phase. While the biological theory of development has not been established, there are some useful mathematical tools to describe development: the Lindenmayer system [Lindenmayer, 1968, Lindenmayer, 1971] and the cellular automaton. For example, the L-system can be augmented to handle graph rewriting so that a wide range of developmental processes can be described (Figure 21). Wilson proposed using the L-system in the context of artificial life [Wilson, 1987]. However, I believe I was the first to actually implement and experiment with this evolution of development [Kitano, 1990b]. Since the L-system is a descriptive model, and not a model which describes mechanisms of development, new and more biologically grounded models should be introduced. Nevertheless, the encouraging results achieved in a series of experiments [Kitano, 1990b, Kitano, 1992a, Gruau, 1992] demonstrate that the evolution of development is an important issue.

Figure 21: Development using the L-system

In addition, the evolutionary perspective leads us to investigate the mechanisms of the immune system [Tonegawa, 1983], projections from the limbic system, and the endogenous system [McGaugh, 1989]. The central nervous system, immune system, and endogenous system are known to have the same origin, and to have differentiated in the course of evolution. Projections from the limbic system, which is an old part of the brain, control awake/sleep rhythms and other basic moods of the brain. These systems play a central role in building a motivation for the system.
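The parallel rewriting at the core of the L-system can be sketched in a few lines (this is Lindenmayer's original two-symbol example, not the graph-rewriting extension used in the developmental experiments):

```python
# An L-system rewrites every symbol in parallel at each developmental step.
RULES = {"A": "AB", "B": "A"}

def develop(axiom: str, steps: int) -> str:
    state = axiom
    for _ in range(steps):
        state = "".join(RULES.get(s, s) for s in state)
    return state

# A -> AB -> ABA -> ABAAB -> ABAABABA; lengths follow the Fibonacci numbers.
print(develop("A", 4))  # ABAABABA
```

Because every symbol is rewritten simultaneously at each step, the formalism is a natural fit for massively parallel models of development.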
7.3. Symbiosis
Symbiotic computing is an idea whereby we view an intelligent system as a symbiosis between various different kinds of subcomponents, each of which is differentiated through evolution. Symbiosis has three major types: host-parasite, fusion, and network. The idea of symbiosis itself originated in the field of biology. One of the strong proponents of symbiosis is Lynn Margulis, who considered symbiosis an essential principle for the creation of eukaryotic cells. She claims that

    some parts of eukaryotic cells resulted directly from the formation of permanent associations between organisms of different species, the evolution of symbiosis [Margulis, 1981].

While the serial endosymbiosis theory (SET) [Taylor, 1974] is a symbiosis theory for eukaryotic cells, which is a fusion, there are symbioses at various levels.
I claim that symbiosis is one of the central principles for models of thought and life. In particular, symbiosis is the key to the evolution of intelligence. Although genetic algorithms and other genetically-inspired models provide powerful adaptation capabilities, it would be extremely difficult and time-consuming to create highly complex structures from scratch. It may be difficult for genetic algorithms alone to create complex structures with high multimodality. Symbiosis can take place at various levels, from cells to global multi-agent systems. At the neural level, neural symbiosis may be a useful idea to consider.
In The Society of Mind, Marvin Minsky postulates that intelligence emerges from a society of large numbers of agents [Minsky, 1986]. The society-of-mind idea is one instance of symbiosis. It is a weak form of symbiosis, which can be categorized as a network, because individual agents retain their independence. Symbiotic computing offers a more dynamic picture. In symbiotic computing, subcomponents can be merged to form a tight symbiosis or loosely coupled to form a weak symbiosis. A component can be differentiated to form several kinds of components, but could be merged again later. In the light of symbiotic computing, the society of mind and the subsumption architecture are instances, or snapshots, of symbiotic computing.
7.4. Diversity
Diversity is an essential factor for intelligence. For example, in order for symbiosis to be effective, or even for symbiosis to take place, the components involved in symbiosis must have different functions. Thus, the assumption behind symbiosis is diversity. Evolution often drives species into niches so that differentiation between species takes place.
Intelligence is a manifestation of a highly complex and dynamic system whose components maintain a high level of diversity. It is analogous to big artifacts such as big buildings and bridges. In physics, particle physicists try to discover the unified theory. Numbers of theories, such as superstring theory and supersymmetry theory, have been proposed. Although these theories come very close to the unified theory, or even if one of them is the unified theory, they never explain how a building can be built.
Buildings are built using various different components interrelated to each other, just like symbiosis. It is impossible to build any meaningful architecture using a single type of component. Theories which claim that one particular mechanism can create intelligence fall into this fallacy. Thus, it is wrong to claim that rules alone can be the mechanism behind intelligence. It is wrong to claim that memory alone can be the mechanism behind intelligence.
Many models of neural networks make the same mistake. Most models are too simplified, as they usually assume a single neuron type. The real brain consists of numbers of different types of neurons, interconnected in a specific and perhaps genetically specified manner. In addition, most neural networks assume electric impulses to be the sole medium of inter-neuron communication. Recent biological studies have revealed that hormonal effects also affect the state of neurons.
Even if mechanisms at the microscopic level can be described by a relatively small number of principles, this is analogous to a theory of particle physics. In order to formulate a model of thought, we must find how these components create diverse substructures and how they interact.
7.5. Motivation — Genetic Supervision
No learning takes place without motivation. This is the factor which is critically lacking in much current AI research. The term motivation is used here to mean a drive for learning. Although both conscious and subconscious drives are important, the following discussion focuses on the subconscious drive.
One example illustrates the point of my discussion. Infants learn to avoid dangerous things without any teaching signal from the environment. When an infant touches a very hot object, the infant reflexively removes its hand and screams. After several such experiences, the infant learns to avoid the hot object. The question is why this infant learns to avoid hot objects rather than learning to prefer them. There must be some innate drive which guides learning in a particular direction. Learning is genetically supervised. We learn to avoid dangerous things because individuals whose learning was driven in that direction survived better than individuals whose learning was driven to prefer dangerous things. Therefore, learning is genetically supervised in a particular direction which improves the reproduction rate. In addition, studies on child language acquisition seem to support the existence of an innate mechanism for selective learning. Not just the direction and focus, but also the learning mechanisms, are determined genetically.
Some species of birds demonstrate a form of learning called imprinting. For these birds, the first moving object which they see after hatching will be imprinted on their brain as their mother. Imprinting is an innate form of learning which allows birds to quickly recognize their mother, so that they can be fed and protected well. While there is a certain risk that an individual might imprint on something different, an evolutionary choice has been made to use this particular mechanism under the given environment.
How knowledge should be organized and represented largely depends upon the purpose of learning. If the learning is to avoid dangerous things, features of objects such as temperature, speed, distance, and other factors could be important. In addition, reasoning mechanisms that provide quick real-time responses will be used rather than accurate but slow mechanisms. The subsumption architecture captured a mechanism which is largely primitive but important for survival. Thus, these mechanisms must be composed of relatively simple circuitry and provide real-time, autonomous reactions without central control.
The idea of reinforcement learning [Watkins, 1989] provides a more realistic framework for learning than most traditional models of learning, which assume explicit teaching signals. Reinforcement learning assumes that the action module receives a scalar-valued feedback called a reward, according to its action under a certain environment. I feel this is closer to reality. The shortcoming of reinforcement learning is that the evaluation function which provides the reward signal to the action module is given a priori. Ackley and Littman proposed Evolutionary Reinforcement Learning (ERL) [Ackley and Littman, 1992] to remedy this problem. In ERL, the evaluation function co-evolves with the action module, so that an individual with a better evaluation function learns more effectively. Independently, Edelman uses a value system to drive the learning of DARWIN-III in a desired direction. I call this class of models Genetic Supervision. A general architecture for genetic supervision should look like Figure 22.
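The infant example above can be caricatured as follows (purely illustrative; the temperature threshold, reward values, and learning rate are invented): an innate, genetically given evaluation function supplies the reward signal that steers an otherwise unsupervised learner.

```python
import random

random.seed(0)  # deterministic toy run

def innate_reward(temperature: float, touched: bool) -> float:
    # Genetically given drive: touching hot objects is punished,
    # harmless exploration is mildly rewarded.
    return -1.0 if (touched and temperature > 50.0) else 0.1

def learn_avoidance(episodes: int = 200) -> float:
    touch_tendency = 0.9   # initial inclination to touch objects
    lr = 0.05
    for _ in range(episodes):
        temperature = random.uniform(0.0, 100.0)
        touched = random.random() < touch_tendency
        if touched:
            # The innate reward reinforces or suppresses the tendency.
            delta = lr * innate_reward(temperature, touched)
            touch_tendency = min(1.0, max(0.0, touch_tendency + delta))
    return touch_tendency

print(learn_avoidance())  # tendency drops well below its initial 0.9
```

The learner never receives an explicit teaching signal from the environment; the direction of learning is fixed entirely by the innate reward function, which is the sense in which learning is genetically supervised.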
For genetic supervision to be grounded in the biological and evolutionary context, several new factors must be considered, such as the influence of hormones on memory modulation [McGaugh, 1989] and the A6 and A10 neural systems. These systems have an older origin than the neocortex, and they control deep emotional and motivational aspects of our behavior. Because these systems were established at an early stage of evolution, their signal processing and transmission capabilities are limited. Generally, signals from these systems act as analog global modulators. Nevertheless, I believe that these factors will play critically important roles in the next generation of intelligence theories.

Figure 22: Learning and Evolution

7.6. Physics

Since intelligence is a result of evolution and emergence, it is governed by physics. As long as models of intelligence deal with stationary models, the significance of physics does not come into play. However, once research is directed at more basic properties such as emergence, development, and evolution, physics cannot be ignored. For example, morphogenesis involves cell movements and cell divisions which are largely influenced by physics. Morphogenesis would not be necessary, or could be significantly simplified, if no physics were involved. At the same time, however, theories without physics inevitably assume a priori external constraints. The basic configurations of genes and reproduction mechanisms could have been different if a different physics had applied. Recent discoveries from the U.S. Space Shuttle missions, particularly the Japanese experiments on Endeavour, uncovered the importance of gravitational influence on cell division and gene repair. The semantics of the world could have been very different if a spontaneous breakdown of symmetry had taken place in a very different way.

This argument largely assumes artificial life research to be the basis for artificial intelligence. I agree with Belew that artificial life is "a constructive lower bound for artificial intelligence" [Belew, 1991]. Brooks made a good argument for the embodiment of robots in the real world at a macroscopic level. The argument in this section is at a more microscopic level. My argument is that physics will be necessary for emergence, development, and evolution to take place. It should be noted that this is one extreme, and much AI research will be carried out without physics. But as research delves into basic mechanisms, physics will have to be considered. There are two promising directions, both of which should be pursued for the physical grounding of AI and ALife: nano-technology and digital physics.

Nano-technology can be a driving force grounding AI and ALife research in actual physics. The same physics which has been applied to ourselves for billions of years can be applied to AI. This approach will have an enormous impact on society, because the very definitions of natural life and intelligence will be contested in the light of state-of-the-art technologies.

Digital physics, on the contrary, will be able to create a new physics. Massively parallel machines in the near future may be able to produce physics in themselves. This was one of the motivations for building the Connection Machine [Hillis, 1985]. Just as Langton termed Artificial Life life-as-it-could-be [Langton, 1989], digital physics may allow us to pursue the universe-as-it-could-be.
8. Conclusion
The arguments developed in this paper can be expressed in a single statement: Massively parallel artificial intelligence is where AI meets the real world. The phrase AI meets the real world has been used in various contexts, such as Robotics is where AI meets the real world. Nevertheless, the phrase has its own significance, and that is why it has been used repeatedly.
Artificial intelligence is the field that studies models of intelligence and engineering methods for building intelligent systems. The model of intelligence must face the biological reality of the brain, and intelligent systems must be deployed in the real world. The real world is where we live, and it embodies vastness, complexity, noise, and other factors which are not easy to overcome. In order for AI to succeed, powerful tools to tackle the complexity and vastness of real-world phenomena must be prepared. In physics, there are powerful tools such as mathematics, particle accelerators, the Hubble telescope, and other experimental and conceptual devices. Biology made a leap when DNA sequencing and gene recombination methods were discovered.
Artificial intelligence, however, has been constrained by available computing resources for the last thirty years. Its conceptual devices have been influenced by the hardware architectures available to date. This is represented by the traditional AI approach. Fortunately, however, the progress of VLSI technology liberates us from these computational constraints.
Computationally demanding and data-driven approaches, such as memory-based reasoning and genetic algorithms, emerge as realistic alternatives to the traditional approach. This does not mean that the traditional approach will be eliminated. There are suitable applications for traditional models, and there are places where the massively parallel approach will be suitable. Thus, there will be co-habitation of different approaches. However, the emergent new approach will be a powerful method for challenging the goals of AI which have not been accomplished so far.
Massive computing power and memory space, along with new ideas for AI, will allow us to challenge the realities of our engineering and scientific discipline. For AI to be a hard-core science and engineering discipline, real-world AI must be pursued and our arsenal for attacking the problems must be enriched. I hope that the discussions developed in this paper contribute to the community making a quantum leap into the future.
Acknowledgements
There are just too many people to thank: Makoto Nagao, Jaime Carbonell, Masaru Tomita, Jay McClelland, David Waltz, Terry Sejnowski, Alex Waibel, Kai-Fu Lee, Jim Geller, David Goldberg, Dan Moldovan, Tetsuya Higuchi, Satoshi Sato, Hideto Tomabechi, Moritoshi Yasunaga, Ron DeMara, Toyoaki Nishida, Hitoshi Iida, Akira Kurematsu, Hiroshi Okuno, Koiti Hasida, Katashi Nagao, Toshio Yokoi, Mario Tokoro, Lori Levin, Rodney Brooks, Chris Langton, Katsu Shimohara, Scott Fahlman, David Touretzky, Hideo Shimazu, Akihiro Shibata, Keiji Kojima, Hitoshi Matsubara, Eiichi Osawa, Eiichirou Sumita, Kozo Oi... I cannot list them all. Sorry if you are not on the list. Jim Hendler gave me various pieces of advice on the final manuscript of this paper. I owe him a lot.
This research is supported, in part, by National Science Foundation grant MIP-9009111 and by Pittsburgh Supercomputing Center IRI-910002P.
References
[Ackley and Littman, 1992] Ackley, D. and Littman,
M., "Interactions Between Learning and Evolu­
tion," Artificial Life II, Addison Wesley, 1992.
[Ae and Aibara, 1989] Ae, T. and Aibara, R., "A neural network for 3-D VLSI accelerator," VLSI for Artificial Intelligence, Kluwer Academic Pub., 1989.
[Atkeson and Reinkensmeyer, 1990] Atkeson, C. and Reinkensmeyer, D., "Using Associative Content-Addressable Memories to Control Robots," Artificial Intelligence at MIT, 1990.
[Batcher, 1980] Batcher, K., "Design of a Massively Parallel Processor," IEEE Transactions on Computers, 29, No. 9, September 1980.
[Belew, 1991] Belew, R., "Artificial Life: A Construc­
tive Lower Bound for Artificial Intelligence," IEEE
Expert, Vol. 6, No. 1, 8-15, 1991.
[Blank, 1990] Blank, T., "The MasPar MP-1 Architec­
ture," Proc. of COMPCON Spring 90, 1990.
[Blelloch, 1986] Blelloch, G. E., AFS-1: A programming
language for massively concurrent computers, Tech
Report 918, M I T AI Lab., 1986.
[Blelloch, 1986] Blelloch, G. E., "CIS: A Massively Concurrent Rule-Based System," Proceedings of the National Conference on Artificial Intelligence (AAAI-86), 1986.
[Bouknight et al., 1972] Bouknight, W. J., et al., "The ILLIAC IV System," Proceedings of the IEEE, 369-388, April 1972.
[Bowler and Pawley, 1984] Bowler, K. C. and Pawley,
G. S., "Molecular Dynamics and Monte Carlo
Simulation in Solid-State and Elementary Parti­
cle Physics," Proceedings of the IEEE, 74, January
1984.
[Brooks, 1991] Brooks, R., "Intelligence Without Rea­
son," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), Sydney,
1991.
[Brooks, 1986] Brooks, R., "A robust layered control
system for a mobile robot," IEEE J. Robotics and
Automation, RA-2, 14-23, April, 1986.
[Charniak, 1983] Charniak, E., "Passing Markers: A
Theory of Contextual Influence in Language Com­
prehension," Cognitive Science, 7, 171-190, 1983.
[Charniak, 1986] Charniak, E., "A Neat Theory of
Marker Passing," Proc. of AAAI-86, 1986.
[Chung et al., 1992] Chung, S., Moldovan, D., and De­
Mara, R., A Parallel Computational Model for Integrated Speech and Natural Language Understanding,
University of Southern California, TR PKPL-92-2,
also as IMPACT-92-008-USC, 1992.
[Cook, 1991] Cook, J., "The Base Selection Task in Ana­
logical Planning," Proc. of IJCAI-91, 1991.
[Creecy et al., 1992] Creecy, R., Masand, B., Smith, S.,
and Waltz, D., "Trading MIPS and Memory for
Knowledge Engineering: Automatic Classification
of Census Returns on a Massively Parallel Super­
computer," Communications of the ACM, 1992.
[Dally et al., 1989] Dally, W. J., et al., "The J-Machine:
a fine-grain concurrent computer," Information
Processing 89, Elsevier North Holland, 1989.
[Dixon and de Kleer, 1988] Dixon, M. and de Kleer, J., "Massively Parallel Assumption-based Truth Maintenance," Proceedings of the National Conference on Artificial Intelligence (AAAI-88), 1988.
[Drumheller, 1986] Drumheller, M., "Connection Ma­
chine stereomatching," Proceedings of the National
Conference on Artificial Intelligence (AAAI-86),
1986.
[EDR, 1988] Japan Electronic Dictionary Research Institute, "EDR Electronic Dictionaries," Technical Report, Japan Electronic Dictionary Research Institute, 1988.
[Edelman, 1989] Edelman, G., The Remembered Present: A Biological Theory of Consciousness, Basic Books, 1989.
[Edelman, 1987] Edelman, G., Neural Darwinism: The
Theory of Neuronal Group Selection, Basic Books,
1987.
[Evett et al., 1990a] Evett, M., Hendler, J., and Spector, L., PARKA: Parallel Knowledge Representation on the Connection Machine, Tech Report CS-TR-2409, University of Maryland, 1990.
[Evett et al., 1990b] Evett, M., Hendler, J., Mahanti, A., and Nau, D., "PRA*: A memory-limited heuristic search procedure for the Connection Machine," Proceedings, Frontiers of Massively Parallel Computing, 1990.
[Fahlman, 1979] Fahlman, S. E., NETL: A System for Representing and Using Real-World Knowledge, MIT Press, 1979.
[Furuse and Iida, 1992] Furuse, O. and Iida, H., "Coop­
eration between Transfer and Analysis in ExampleBased Framework," Proc. of COLING-92, Nantes,
1992.
[Gazdar et al., 1985] Gazdar, G., Klein, E., Pullum, G., and Sag, I., Generalized Phrase Structure Grammar, Harvard University Press, 1985.
[Geller, 1991] Geller, J., Advanced Update Operations in Massively Parallel Knowledge Representation, New Jersey Institute of Technology, CIS-91-28, 1991.
[Gelsinger et al., 1989] Gelsinger, P., Gargini, P., Parker, G., and Yu, A., "Microprocessors circa 2000," IEEE Spectrum, Oct. 1989.
[Goodman and Nirenberg, 1991] Goodman, K. and Nirenberg, S. (Eds.), The KBMT Project: A Case Study in Knowledge-Based Machine Translation, Morgan Kaufmann, 1991.
[Graf et al., 1988] Graf, H., et al., "VLSI implementation of a neural network model," IEEE Computer, Vol. 21, No. 3, 1988.
[Gruau, 1992] Gruau, F., Cellular Encoding of Genetic Neural Networks, Ecole Normale Supérieure de Lyon, Lyon, 1992.
[Hammond, 1986] Hammond, K., Case-Based Planning: An Integrated Theory of Planning, Learning, and Memory, Ph.D. Thesis, Yale University, 1986.
[Hasida et al., 1993] Hasida, K., Nagao, K., and Miyata, T., "Joint Utterance: Intrasentential Speaker/Hearer Switch as an Emergent Phenomenon," Proc. of IJCAI-93, 1993.
[Hendler, 1989a] Hendler, J., "Marker-passing over Microfeatures: Towards a Hybrid Symbolic/Connectionist Model," Cognitive Science, 13, 79-106, 1989.
[Hendler, 1989b] Hendler, J., "The Design and Implementation of Marker-Passing Systems," Connection
Science, Vol. 1, No. 1, 1989.
[Hendler, 1988] Hendler, J., Integrating Marker-Passing
and Problem-Solving, Lawrence Erlbaum Asso­
ciates, 1988.
[Higuchi et al., 1991] Higuchi, T., Kitano, H., Furuya, T., Kusumoto, H., Handa, K., and Kokubu, A., "IXM2: A Parallel Associative Processor for Knowledge Processing," Proceedings of the National Conference on Artificial Intelligence (AAAI-91), 1991.
[Hillis, 1985] Hillis, D., The Connection Machine, The
M.I.T. Press, 1985.
[Hirst, 1986] Hirst, G., Semantic Interpretation and
the Resolution of Ambiguity, Cambridge University
Press, 1986.
[Holland, 1975] Holland, J., Adaptation in Natural and
Artificial Systems, University of Michigan, 1975.
[Hord, 1990] Hord, M., Parallel Supercomputing in
SIMD Architectures, CRC Press, 1990.
[Hsu, 1991] Hsu, F. H., "The Role of Chess in AI Research," Proc. of IJCAI-91, 1991.
[Hsu, 1990] Hsu, F. H., Large Scale Parallelization of
Alpha-Beta Search: An Algorithmic and Architec­
tural Study with Computer Chess, CMU-CS-90-108,
Ph.D. Thesis, 1990.
[Hsu et al., 1990] Hsu, F. H., Anantharaman, T., Campbell, M. and Nowatzyk, A., "A Grandmaster Chess Machine," Scientific American, Vol. 263, No. 4, Oct., 1990.
[IEDM, 1991] IEEE, Proc. of the International Electron Devices Meeting (IEDM-91), 1991.
[Inmos, 1987] Inmos, IMS T800 Transputer, 1987.
[Jantsch, 1980] Jantsch, E., The Self-Organizing Uni­
verse, Pergamon Press, 1980.
[Kaplan and Bresnan, 1982] Kaplan, R. and Bresnan, J., "Lexical-Functional Grammar: A Formal System for Grammatical Representation," The Mental Representation of Grammatical Relations, The MIT Press, 1982.
[KSR, 1992] Kendall Square Research, KSR-1 Technical
Summary, Kendall Square Research, 1992.
[Kettler et al., 1993] Kettler, B., Andersen, W., Hendler, J. and Evett, M., "Massively Parallel Support for a Case-based Planning System," Proceedings of the Ninth IEEE Conference on AI Applications, Orlando, Florida, March 1993.
[Kikui et al., 1993] Kikui, K., Seligman, M., Takezawa, T., "Spoken Language Translation System: ASURA," Video Proceedings of IJCAI-93, 1993.
[Kitano, 1993] Kitano, H., "A Comprehensive and Prac­
tical Model of Memory-Based Machine Transla­
tion," Proceedings of the International Joint Con­
ference on Artificial Intelligence (IJCAI-93), 1993.
[Kitano et al., 1993] Kitano, H., Wah, B., Hunter, L., Oka, R., Yokoi, T., and Hahn, W., "Grand Challenge AI Applications," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-93), 1993.
[Kitano, 1992a] Kitano, H., Neurogenetic Learning, Carnegie Mellon University, CMT Technical Report, 1992.
[Kitano, 1992b] Kitano, H. (Ed.), Proceedings of the
Workshop on Grand Challenge AI Applications,
Tokyo, 1992. (in Japanese)
[Kitano and Yasunaga, 1992] Kitano, H. and Yasunaga, M., "Wafer Scale Integration for Massively Parallel Memory-Based Reasoning," Proceedings of the National Conference on Artificial Intelligence (AAAI-92), 1992.
[Kitano et al., 1992] Kitano, H., Shibata, A., Shimazu, H., Kajihara, J., and Sato, A., "Building a Large-Scale and Corporate-Wide Case-Based System," Proc. of AAAI-92, 1992.
Kitano
831
[Kitano et al., 1991a] Kitano, H., Hendler, J., Higuchi, T., Moldovan, D. and Waltz, D., "Massively Parallel Artificial Intelligence," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), 1991.
[Kitano et al., 1991b] Kitano, H., Moldovan, D., Cha, S., "High Performance Natural Language Processing on Semantic Network Array Processor," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), 1991.
[Kitano et al., 1991c] Kitano, H., Smith, S. F., and Higuchi, T., "GA-1: A Parallel Associative Memory Processor for Rule Learning with Genetic Algorithms," Proceedings of the International Conference on Genetic Algorithms (ICGA-91), 1991.
[Kitano and Higuchi, 1991a] Kitano, H. and Higuchi, T., "High Performance Memory-Based Translation on IXM2 Massively Parallel Associative Memory Processor," Proceedings of the National Conference on Artificial Intelligence (AAAI-91), 1991.
[Kitano and Higuchi, 1991b] Kitano, H. and Higuchi, T., "Massively Parallel Memory-Based Parsing," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), 1991.
[Kitano and Moldovan, 1991] Kitano, H. and Moldovan, D., "Semantic Network Array Processor as a Massively Parallel Platform for Real-time and Large-Scale Natural Language Processing," Proc. of COLING-92, 1992.
[Kitano, 1991] Kitano, H., "ΦDMDIALOG: An Experimental Speech-to-Speech Dialogue Translation System," IEEE Computer, June, 1991.
[Kitano, 1990a] Kitano, H., "ΦDMDIALOG: A Speech-to-Speech Dialogue Translation," Machine Translation, 5, 301-338, 1990.
[Kitano, 1990b] Kitano, H., "Designing Neural Networks using Genetic Algorithms with Graph Generation System," Complex Systems, Vol. 4, Num. 4, 1990.
[Kim and Moldovan, 1990] Kim, J. and Moldovan, D., "Parallel Classification for Knowledge Representation on SNAP," Proceedings of the 1990 International Conference on Parallel Processing, 1990.
[Kolodner, 1988] Kolodner, J., "Retrieving Events from a Case Memory: A Parallel Implementation," Proceedings of Case-Based Reasoning Workshop, 1988.
[Kolodner, 1984] Kolodner, J., Retrieval and Organizational Strategies in Conceptual Memory: A Computer Model, Lawrence Erlbaum Assoc., 1984.
[Langton, 1991] Langton, C., Taylor, C., Farmer, J., and Rasmussen, S., (Eds.) Artificial Life II, Addison Wesley, 1991.
[Langton, 1989] Langton, C., (Ed.) Artificial Life, Addison Wesley, 1989.
[Lee, 1988] Lee, K. F., Large-Vocabulary Speaker-Independent Continuous Speech Recognition: The SPHINX System, CMU-CS-88-148, Ph.D. Thesis, 1988.
[Lenat and Feigenbaum, 1991] Lenat, D. B. and Feigenbaum, E., "On the thresholds of knowledge," Artificial Intelligence, 47, 185-250, 1991.
832
Awards
[Lenat and Guha, 1989] Lenat, D. B. and Guha, R. V., Building Large Knowledge-Based Systems, Addison Wesley, 1989.
[Lindenmayer, 1971] Lindenmayer, A., "Developmental Systems without Cellular Interactions, their Languages and Grammars," J. Theor. Biol., 30, 455-484, 1971.
[Lindenmayer, 1968] Lindenmayer, A., "Mathematical Models for Cellular Interactions in Development," J. Theor. Biol., 18, 280-299, 1968.
[Margulis, 1981] Margulis, L., Symbiosis in Cell Evolution, Freeman, 1981.
[McGaugh, 1989] McGaugh, J., "Involvement of Hormonal and Neuromodulator Systems in the Regulation of Memory Storage," Ann. Rev. Neurosci., 12:255-87, 1989.
[Mead, 1989] Mead, C., Analog VLSI and Neural Systems, Addison Wesley, 1989.
[Minsky, 1986] Minsky, M., The Society of Mind, Simon and Schuster, New York, NY, 1986.
[Moldovan et al., 1990] Moldovan, D., Lee, W., and Lin, C., SNAP: A Marker-Passing Architecture for Knowledge Processing, Technical Report PKPL 90-4, Department of Electrical Engineering Systems, University of Southern California, 1990.
[Morimoto et al., 1989] Morimoto, T., Iida, H., Kurematsu, A., Shikano, K., and Aizawa, T., "Spoken Language Translation," Proc. of InfoJapan-90, Tokyo, 1990.
[Myczkowski and Steele, 1991] Myczkowski, J. and Steele, G., "Seismic Modeling at 14 Gigaflops on the Connection Machine," Supercomputing-91, 1991.
[Nagao, 1984] Nagao, M., "A Framework of a Mechanical Translation between Japanese and English by Analogy Principle," Artificial and Human Intelligence, Elithorn, A., and Banerji, R. (Eds.), Elsevier Science Publishers, B.V., 1984.
[NRC, 1988] National Research Council, Mapping and Sequencing the Human Genome, National Academy Press, Washington, D.C., 1988.
[Nickolls, 1990] Nickolls, J., "The Design of the MasPar MP-1: A Cost Effective Massively Parallel Computer," Proc. of COMPCON Spring 90, 1990.
[Nicolis and Prigogine, 1977] Nicolis, G. and Prigogine, I., Self-Organization in Nonequilibrium Systems, Wiley, New York, NY, 1977.
[Norvig, 1989] Norvig, P., "Marker Passing as a Weak Method for Text Inferencing," Cognitive Science, 13, 569-620, 1989.
[Pinker, 1991] Pinker, S., "Rules of Language," Science, Vol. 253, 530-535, 1991.
[Pollard and Sag, 1987] Pollard, C. and Sag, I., Information-Based Syntax and Semantics, Volume 1, CSLI Lecture Notes, 13, 1987.
[Quillian, 1968] Quillian, M. R., "Semantic Memory," Semantic Information Processing, Minsky, M. (Ed.), 216-270, The MIT Press, 1968.
[Riesbeck and Martin, 1986] Riesbeck, C. and Martin, C., "Direct Memory Access Parsing," Experience, Memory, and Reasoning, Lawrence Erlbaum Associates, 1986.
[Riesbeck and Schank, 1989] Riesbeck, C. and Schank, R., Inside Case-Based Reasoning, Lawrence Erlbaum Associates, 1989.
[Robertson, 1987] Robertson, G., "Parallel Implementation of Genetic Algorithms in a Classifier System," Davis, L., (Ed.) Genetic Algorithms and Simulated Annealing, Morgan Kaufmann Publishers, 1987.
[Rumelhart and McClelland, 1986] Rumelhart, D. and McClelland, J., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, The MIT Press, 1986.
[Sabot et al., 1991] Sabot, G., Tennies, L., Vasilevsky, A. and Shapiro, R., "Compiler Parallelization of an Elliptic Grid Generator for 1990 Gordon Bell Prize," Supercomputing-91, 1991.
[Saito and Tomita, 1988] Saito, H. and Tomita, M., "Parsing Noisy Sentences," Proc. of COLING-88, 1988.
[Sato and Nagao, 1990] Sato, S. and Nagao, M., "Toward Memory-based Translation," Proceedings of COLING-90, 1990.
[Sato, 1992] Sato, S., Example-Based Machine Translation, Ph.D. Thesis, Kyoto University, 1992.
[Sato, 1993] Sato, S., Example-Based Translation of Technical Terms, IS-RR-93-41, Japan Advanced Institute of Science and Technology (JAIST), 1993.
[Schank, 1982] Schank, R., Dynamic Memory, MIT Press, 1982.
[Schank, 1981] Schank, R. and Riesbeck, C., Inside Computer Understanding, Lawrence Erlbaum Associates, 1981.
[Schank, 1975] Schank, R., Conceptual Information Processing, North Holland, 1975.
[Seeds, 1967] Seeds, R., "Yield and Cost Analysis of Bipolar LSI," Proc. of IEEE International Electron Devices Meeting, 1967.
[Sejnowski and Rosenberg, 1987] Sejnowski, T. and Rosenberg, C., "Parallel networks that learn to pronounce English text," Complex Systems, 1: 145-168, 1987.
[Shavlik and Towell, 1989] Shavlik, J. and Towell, G., "An approach to combining explanation-based and neural learning algorithms," Connection Science, 1:233-255, 1989.
[Shimizu et al., 1988] Shimizu, H., Yamaguchi, Y., and Satoh, K., "Holovision: a Semantic Information Processor for Visual Perception," Dynamic Patterns in Complex Systems, (Eds.) Kelso, S., et al., World Scientific, 42-76, 1988.
[Spiessens and Bernard, 1991] Spiessens, P., and Bernard, M., "A Massively Parallel Genetic Algorithm: Implementation and First Analysis," Proc. of ICGA-91, 1991.
[Stanfill and Waltz, 1988] Stanfill, C. and Waltz, D., "The Memory-Based Reasoning Paradigm," Proceedings of the Case-Based Reasoning Workshop, DARPA, 1988.
[Stanfill and Waltz, 1986] Stanfill, C. and Waltz, D., "Toward Memory-Based Reasoning," Communications of the ACM, 1986.
[Stanfill et al., 1989] Stanfill, C., Thau, R., and Waltz, D., "A Parallel Indexed Algorithm for Information Retrieval," Proceedings of the SIG-IR-89, 1989.
[Stormon et al., 1992] Stormon, C., Troullinos, N., Saleh, E., Chavan, A., Brule, M., and Oldfield, J., "A General-Purpose CMOS Associative Processor IC and System," IEEE Micro, December, 1992.
[Sumita et al., 1993] Sumita, E., Oi, K., Iida, H., Higuchi, T., Kitano, H., "Example-Based Machine Translation on Massively Parallel Processors," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-93), 1993.
[Sumita and Iida, 1992] Sumita, E., and Iida, H., "Example-Based Transfer of Japanese Adnominal Particles into English," IEICE Transactions on Information and Systems, July, 1992.
[Sumita and Iida, 1991] Sumita, E., and Iida, H., "Experiments and Prospects of Example-Based Machine Translation," Proceedings of the Annual Meeting of the ACL (ACL-91), 1991.
[NOAH, 1992] Systems Research & Development Institute of Japan, A Plan for the Knowledge Archives Project, 1992.
[Towell et al., 1990] Towell, G., Shavlik, J. and Noordewier, M., "Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks," Proc. of AAAI-90, 1990.
[Taylor, 1974] Taylor, F., "Implications and extensions of the serial endosymbiosis theory of the origin of eukaryotes," Taxon, 23:229-258, 1974.
[TMC, 1991] Thinking Machines Corporation, The Connection Machine CM-5 Technical Summary, 1991.
[TMC, 1989] Thinking Machines Corporation, Model CM-2 Technical Summary, Technical Report TR89-1, 1989.
[Tonegawa, 1983] Tonegawa, S., "Somatic Generation of Antibody Diversity," Nature, Vol. 302, No. 5909, 575-581, April 14, 1983.
[Twardowski, 1990] Twardowski, K., "Implementations of a Genetic Algorithm based Associative Classifier System (ACS)," Proc. of International Conference on Tools for Artificial Intelligence, 1990.
[Waibel et al., 1991] Waibel, A., Jain, A., McNair, A., Saito, H., Hauptmann, A., and Tebelskis, J., "JANUS: A Speech-to-speech translation system using connectionist and symbolic processing strategies," IEEE Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing, April, 1991.
[Waltz, 1991] Waltz, D., "Eight Principles for Building an Intelligent Robot," Proceedings of the First International Conference on Simulation of Adaptive Behavior: From animals to animats, MIT Press, 1991.
[Waltz, 1990] Waltz, D., "Massively Parallel AI," Proceedings of the National Conference on Artificial Intelligence (AAAI-90), 1990.
[Waltz, 1988] Waltz, D., "The Prospects for Building Truly Intelligent Machines," Graubard, S. (Ed.), The Artificial Intelligence Debate, The MIT Press, 1988.
[Waltz and Pollack, 1985] Waltz, D. and Pollack, J.,
"Massively Parallel Parsing: A Strongly Interactive
Model of Natural Language Interpretation," Cogni­
tive Science, 9(1): 51-74, 1985.
[Watkins, 1989] Watkins, C., Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989.
[Werner and Dyer, 1991] Werner, G. and Dyer, M., "Evolution of Communication in Artificial Organisms," Artificial Life II, Addison Wesley, 1991.
[Wilson, 1987] Wilson, S., "The genetic algorithm and
biological development," Proc. of International
Conference on Genetic Algorithms, 1987.
[Winograd, 1972] Winograd, T., Understanding Natural
Language, Academic Press, New York, 1972.
[Woods, 1970] Woods, W., "Transition network gram­
mars for natural language analysis," Communica­
tions of the A C M , 13:10, 591-606, 1970.
[Woods et al., 1972] Woods, W., Kaplan, R., Nash-Webber, B., The lunar sciences natural language information system: Final report, Bolt, Beranek and Newman Report No. 2378, Cambridge, Mass., 1972.
[Yasunaga et al., 1991] Yasunaga, et al., "A Self-Learning Neural Network Composed of 1152 Digital Neurons in Wafer-Scale LSIs," Proc. of the International Joint Conference on Neural Networks at Singapore (IJCNN-91), 1991.
[Zhang et al., 1988] Zhang, X., Waltz, D., and Mesirov, J., Protein Structure Prediction by Memory-based Reasoning, Thinking Machines Corporation, 1988.
[Zhang et al., 1992] Zhang, X., Mesirov, J., and Waltz, D., "Hybrid System for Protein Secondary Structure Prediction," J. Mol. Biol., 225, 1049-1063, 1992.