Challenges of Massive Parallelism

Hiroaki Kitano
Center for Machine Translation, Carnegie Mellon University, 5000 Forbes, Pittsburgh, PA 15213, U.S.A. ([email protected])
Software Engineering Laboratory, NEC Corporation, 2-11-5 Shibaura, Minato, Tokyo 108, Japan ([email protected])

Abstract

Artificial Intelligence has been the field of study for exploring the principles underlying thought, and utilizing their discovery to develop useful computers. Traditional AI models have been, consciously or subconsciously, optimized for available computing resources, which has led AI in certain directions. The emergence of massively parallel computers liberates the way intelligence may be modeled. Although the AI community has yet to make a quantum leap, there are attempts to make use of the opportunities offered by massively parallel computers, such as memory-based reasoning, genetic algorithms, and other novel models. Even within the traditional AI approach, researchers have begun to realize that the need for high performance computing and very large knowledge bases to develop intelligent systems requires massively parallel AI techniques. In this Computers and Thought Award lecture, I will argue that massively parallel artificial intelligence will add new dimensions to the ways that the AI goals are pursued, and demonstrate that massively parallel artificial intelligence is where AI meets the real world.

1. Introduction

Massively Parallel Artificial Intelligence is a field of study exploring methodologies for building intelligent systems and investigating computational models of thought and behavior that use massively parallel computing models [Waltz, 1990, Kitano et al., 1991a]. Traditionally, artificial intelligence research has had two goals: as an engineering discipline, artificial intelligence pursues methods to build useful and intelligent systems; as a scientific discipline, artificial intelligence aims at understanding the computational mechanisms of thought and behavior. I believe that massively parallel artificial intelligence will be the central pillar in the ultimate success of artificial intelligence in both its engineering and scientific goals.

Massively parallel artificial intelligence does not merely speed up traditional AI models. In fact, many traditional models are not appropriate for massively parallel implementation. The vast majority of the AI models proposed so far are strongly influenced by the performance of existing von Neumann architectures. In a narrow sense, a von Neumann architecture can be represented by a single processor architecture with a memory to store instructions and data. Even in the early 80s, available computing power for most AI researchers was far less than that of personal computers in 1993, and hopelessly slower than advanced RISC-based workstations. Additionally, memory space has increased drastically in 10 years.

Within the hardware constraints to date, sequential rule application, for example, has been an optimal implementation strategy. While I agree that this idea is not merely an implementation strategy (in fact there are a number of cognitive bases for sequential rule application), hardware constraints have prevented AI researchers from seriously investigating and experimenting with other, more computationally demanding approaches. For example, the memory-based reasoning model could not have been experimentally supported using only the computing power and memory space available in the early 1980s. Genetic algorithms and neural networks could not be seriously investigated without a major leap in computing power.

One might argue that if a program were run for, say, a month, experimental results could have been obtained even in the early 1980s. However, in practice, dominating computational resources for such a substantial period of time would be impossible for the vast majority of researchers, and painfully long turnaround time for experiments can be a major discouraging factor in promoting research activities based on computationally demanding paradigms.

This hardware factor inevitably guided, consciously or subconsciously, AI in a particular direction. Although the issues and approach may vary, several researchers have expressed concern over the influence of existing computer architectures on our models of thought. David Waltz stated:

"the methods and perspective of AI have been dramatically skewed by the existence of the common digital computer, sometimes called the von Neumann machine, and ultimately, AI will have to be based on ideas and hardware quite different from what is currently central to it." [Waltz, 1988]

and proposed memory-based reasoning.
Rodney Brooks noted in his 1991 Computers and Thought lecture:

"the state of computer architecture has been a strong influence on our models of thought. The Von Neumann model of computation has led Artificial Intelligence in particular directions" [Brooks, 1991]

and proposed the subsumption architecture.

Figure 1: Memory Capacity for Mass Production Memory Chips

Therefore, it is critically important to understand the nature of the computational resources which are available today and will be available in the near future. I argue that massively parallel artificial intelligence will add a new dimension to our models of thought and to the approaches used in building intelligent systems. In the next section, the state of the art in computing technology will be reviewed and the inevitability of massive parallelism will be discussed.

2. Massively Parallel Computers

Advancements in hardware technologies have always been a powerful driving force in promoting new challenges. At the same time, available hardware performance and architectures have been constraints for models of intelligence. This section reviews progress in device technologies and architecture, in order to give an idea of the computing power and memory space which could be available to AI researchers today.

2.1. Devices

Device technology progress is fast and steady. Memory chip (DRAM) capacity increases at a rate of 40% a year. 64M DRAM will be available shortly, and successful experimental results have been reported for a design rule. Figure 1 shows how this trend would continue into the future. Microprocessor performance increases at a rate of 20 to 30% a year. In 1990, engineers at Intel Corporation predicted that processors in the year 2000 (Micro2000) would contain over 50 million transistors per square inch and run with a 250MHz clock [Gelsinger et al., 1989]. The Micro2000 would contain four 750 MIPS CPUs running in parallel, achieving 2,000 MIPS.
In 1992, they commented that their prediction was too modest! This implies the possibility of achieving low cost, high performance computing devices. Thus, some of the research on current massively parallel computers may be conducted on workstations in the future.

On the other hand, uniprocessor supercomputer performance is improving at a substantially lower rate than the microprocessor improvement rate. All newly announced vector supercomputers use multiprocessing technologies, so that the slow processor performance improvement rate can be compensated for (Figure 2).

Figure 2: Peak Performance for Supercomputers

In addition, massively parallel machines are already competitive with current supercomputers in some areas [Myczkowski and Steele, 1991, Sabot et al., 1991, Hord, 1990]. It is expected that, in the near future, the majority of high-end supercomputers will be massively parallel machines. This implies that the progress in microprocessors will face a slow-down stage in the future, and that an architecture for a higher level of parallelism will be introduced. This increasing availability of high performance computers, from personal computers to massively parallel supercomputers, will allow experiments on simulated massive parallelism to be carried out at numerous places. Thus, computational constraints should no longer impose restrictions on how thought is modeled.

Special purpose devices provide dramatic speed-ups for certain types of processing. For example, associative memory is a powerful device for many AI applications. It provides a compact and highly parallel pattern matcher, which can be used for various applications such as memory-based reasoning, knowledge base search, and genetic algorithms [Stormon et al., 1992, Twardowski, 1990, Kitano et al., 1991c]. The benefit of associative memory is that it can attain very high parallelism in a single chip.
Although only a few simple operations, such as bit pattern matching, can be accomplished, these are central operations for AI. There are also efforts to build high performance and large scale neural network chips [Graf et al., 1988, Ae and Aibara, 1989, Mead, 1989]. Although current devices have not attained implementation of sufficiently large networks to cope with many real applications, there are some promising research results. For example, the use of wafer-scale integration for neural networks may provide for scaling up to a reasonable size [Yasunaga et al., 1991].

2.2. Architecture

The history of massively parallel computers can be traced back to Illiac-IV [Bouknight et al., 1972], DAP [Bowler and Pawley, 1984], and MPP [Batcher, 1980]. Illiac-IV was the first SIMD supercomputer, with 64 processors operating at a rate of 300 MIPS. MPP consists of 16,384 1-bit processors. DAP ranges from 1,024 to 4,096 processors. These early machines were built particularly for scientific supercomputing.

However, the real turning point came with the development of the Connection Machine [Hillis, 1985]. The Connection Machine was motivated by NETL [Fahlman, 1979] and other work on semantic networks, and was originally designed with AI applications in mind. Hillis noted:

"This particular application, retrieving commonsense knowledge from a semantic network, was one of the primary motivations for the design of the Connection Machine" [Hillis, 1985].

The CM-2 Connection Machine, the second commercial version of the Connection Machine, consists of large numbers of 1-bit processors and floating point processing units [TMC, 1989].
Various systems have been implemented on it, such as rule-based inference systems [Blelloch, 1986], a massively parallel assumption-based truth maintenance system [Dixon and de Kleer, 1988], a text retrieval system [Stanfill et al., 1989], stereo vision programs [Drumheller, 1986], a frame-based knowledge representation system [Evett et al., 1990a], heuristic search programs [Evett et al., 1990b], parallel search techniques [Geller, 1991], a classifier system [Robertson, 1987], case-based retrievers [Kolodner, 1988, Cook, 1991, Kettler et al., 1993], a motion control system [Atkeson and Reinkensymeyer, 1990], and others (see [Kitano et al., 1991a] for a partial list of systems developed). The newest version, the CM-5 [TMC, 1991], uses 32-bit SPARC chips interconnected via a tree-structured network. Companies such as MasPar Computer [Blank, 1990, Nickolls, 1990], Intel, Cray, IBM, NCR, Fujitsu, NEC and others have started development projects for massively parallel computers, and some of these are already on the market.

Figure 3: Block diagram of the IXM-2 associative memory processor

Although most massively parallel machines developed so far employ distributed memory architectures, there have been some attempts to build a shared memory machine. KSR-1 [KSR, 1992] by Kendall Square Research is an attempt to overcome the difficulties in software development which often arise on distributed memory machines. KSR-1 employs the ALLCACHE architecture, which is a virtual shared memory (VSM) model. In a VSM model, the physical architecture resembles that of a distributed memory architecture, but local memories are virtually shared by other processors, so that programmers do not have to worry about the physical location of data.

Independently of these commercial machines, several research machines have been developed at universities and research institutions.
These projects include the J-Machine [Dally et al., 1989] at MIT, SNAP [Moldovan et al., 1990] at the University of Southern California (USC), and IXM-2 [Higuchi et al., 1991] at the Electrotechnical Laboratory (ETL). I have been involved in some of these projects, as will be described below.

The IXM-2 massively parallel associative memory processor was developed at ETL by Higuchi and his colleagues [Higuchi et al., 1991]. IXM-2 consists of 64 associative processors and 9 communication processors (Figure 3). Each associative processor is organized using a T800 transputer [inmos, 1987], 4K words of associative memory, SRAM, and other circuitry. Due to its intensive use of associative memory, IXM-2 exhibits 256K parallelism for bit pattern matching. IXM-2 is now being used for various research projects at ETL, Carnegie Mellon University, and the ATR Interpreting Telephony Research Laboratory. The application domains range through genetic algorithms, knowledge-base search, and example-based machine translation.

Figure 4: SNAP-1

The Semantic Network Array Processor (SNAP) is now being developed by Professor Dan Moldovan's team at the University of Southern California. A SNAP-1 prototype has been developed and has been operational since the summer of 1991. SNAP-1 consists of 32 clusters interconnected via a modified hypercube (Figure 4). Five TMS320C30 processors form a cluster, which stores up to 1,024 semantic network nodes. Processors within a cluster communicate through a multi-port memory. Several systems have been implemented on SNAP, such as the DmSNAP machine translation system [Kitano et al., 1991b], the PASS parallel speech processing system [Chung et al., 1992], and some classification algorithms [Kim and Moldovan, 1990].

These trends in hardware and architectures will enable massive computing power and memory space to be available for AI research.

2.3. Wafer-Scale Integration for Memory-Based Reasoning

WSI-MBR is a wafer-scale integration (WSI) project designed for memory-based reasoning. WSI is the state of the art in VLSI fabrication technology, and has been applied to various domains such as neural networks [Yasunaga et al., 1991]. WSI fabricates one large VLSI-based system on a wafer, as opposed to conventional VLSI production, which fabricates over 100 chips from one wafer. The advantages of WSI are its size (high integration level), performance, cost, and reliability:

Size: WSI is compact because nearly all circuits necessary for the system are fabricated on a single wafer.

Performance: WSI has a substantial performance advantage because it minimizes wiring length.

Cost: WSI is cost effective because it minimizes the requirement for using expensive assembly lines.

Reliability: WSI is reliable because it eliminates the bonding process, which is the major cause of circuit malfunctions.

However, there is one big problem in WSI fabrication: defects. In conventional VLSI fabrication, one wafer consists of over 100 chips, a certain percentage of which are defective. Traditionally, the chips with defects have been simply discarded and the chips without defects have been used. To estimate the faults in a WSI chip, we can use the Seeds model [Seeds, 1967]:

Y = exp(-sqrt(D * A))

where Y is the yield of the wafer, D is the fault density, which is about 1 fault per cm2 in the fabrication process being used (a reasonable rate for the current fabrication process), and A is the chip area. However, even this level of faults would cause fatal problems for an attempt to build, for example, an entire IBM 370 on one wafer. Unless sophisticated defect-control mechanisms and robust circuits are used, a single defect could collapse an entire operation. But redundant circuits diminish the benefits of WSI. This trade-off has not been solved.
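The severity of the defect problem falls directly out of the yield formula above. The following sketch plugs in illustrative numbers; the 0.5 cm2 conventional chip area and the roughly 314 cm2 area of an 8-inch wafer are my assumptions, not figures from the text:

```python
import math

def seeds_yield(defect_density, area_cm2):
    # Seeds-style yield model: Y = exp(-sqrt(D * A))
    return math.exp(-math.sqrt(defect_density * area_cm2))

D = 1.0  # faults per cm^2, the rate quoted for the process above

chip_yield = seeds_yield(D, 0.5)     # a conventional chip (~0.5 cm^2, assumed)
wafer_yield = seeds_yield(D, 314.0)  # a full 8-inch wafer (~314 cm^2, assumed)
```

With these numbers, roughly half of conventional chips survive fabrication, while the probability of a defect-free full wafer is essentially zero. This is why WSI must tolerate defects rather than hope to avoid them.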
MBR is ideal for WSI, and avoids the defect problem, because it does not rely upon any single data unit. WSI-MBR is a digital/analog hybrid WSI specialized for memory-based reasoning. The digital/analog hybrid approach has been used in order to increase parallelism and improve performance. In a digital computing circuit, the floating point processor takes up most of the chip area. An analog circuit, on the other hand, requires only a fraction of the area to implement equivalent floating point operation circuits, and a drastic speed-up can be attained. For detailed arguments on the relative advantages of analog and digital circuits, see [Kitano and Yasunaga, 1992]. Use of the less area-demanding analog approach provides two major advantages over the digital approach: (1) increased parallelism, and (2) speed-up due to relaxed wiring constraints (critical paths and wire width). Expected performance with the current design is 70 Tflops on a single 8-inch wafer. Using wafer stack and micro-pin technologies, peta-flops on a system of a few hundred million records would be technically feasible!

3. Grand Challenge AI Applications

From the engineering point of view, the ultimate success of artificial intelligence can be measured by the economic and social impact of the systems deployed in the real world. Applications with significant economic and social impacts provide us with some grand challenge AI applications. The term Grand Challenge was defined in the U.S. High Performance Computing Act of 1991 as "a fundamental problem in science or engineering, with broad economic and scientific impact, whose solution will require the application of high-performance computing resources." Thus, grand challenge AI applications are AI applications with significant social, economic, scientific, and engineering impacts.
Typical examples are the Human Genome project [NRC, 1988] and speech-to-speech translation system projects. Two workshops have been organized to discuss grand challenge AI applications. In February 1992, NSF sponsored The Workshop on High Performance Computing and Communications for Grand Challenge Applications: Computer Natural Language and Speech Processing, and Artificial Intelligence. In October 1992, I organized The Workshop on Grand Challenge AI Applications in Tokyo, participated in by a group of leading Japanese researchers [Kitano, 1992b]. Details on these workshops and some of the ongoing grand challenge AI applications can be found in [Kitano et al., 1993]. Typical applications discussed include: a speech-to-speech translation system, human genome analysis, the Global Architecture for Information Access (GAIA), a highly intelligent information access system, Shogi and Go systems which beat a Meijin (Grand Master), intelligent robots and programs which undertake daily tasks, and intelligent vehicles. There were also discussions as to what technologies are needed to support grand challenge AI applications. The conclusions from both workshops are surprisingly similar. The common thread was the need for massive computing power, massive memory storage, massive data resources, and ultra-high-bandwidth communication networks.

For a grand challenge to succeed, our computer systems must be able to be scaled up. For example, natural language systems must be scaled up to the point where the system's dictionary contains all common words and many domain-specific terms, and where the grammar rules cover most syntactic structures. For any serious machine translation system, this means that the dictionary contains from half a million to a few million word entries, and the grammar rules number over 10,000.
In addition, for systems to be scaled up and deployed in the real world, they must be able to cope with noisy environments and incomplete data resources.

4. The Bottlenecks

This section discusses limitations of "the traditional AI approach," which can be characterized by several salient features:

Formal Representations: Most traditional models use rigid and formal knowledge representation schemes. Thus, all knowledge must be explicitly represented in order for the system to use that knowledge. There is no implicit knowledge in the system.

Rule-Driven Inferencing: Reasoning is generally driven by rules or principles, which abstract and generalize knowledge about how to manipulate specific knowledge.

Strong Methods: Since the system depends on explicit knowledge and rules, the domain theory must be understood in order to build any system based on the traditional approach.

Hand-Crafted Knowledge Bases: Knowledge and rules have been hand-coded at extensive labor cost. In many cases, coding must be carried out by experts in the field.

Figure 5: Approaches for Building Intelligent Systems

In essence, these features derive from the physical symbol system hypothesis and heuristic search. As Dave Waltz stated:

"For thirty years, virtually all AI paradigms were based on variants of what Herbert Simon and Allen Newell have presented as physical symbol systems and heuristic search hypotheses."

The fundamental assumption in the traditional approach is that experts know the necessary knowledge regarding the problem domain, and that this expert knowledge can be explicitly written using formal representations. Toy problems, such as the blocks world and the Tower of Hanoi, meet this condition. Thus, historically, many AI research efforts have been carried out on domains such as the blocks world and the Tower of Hanoi.
The intention of such work is that, by proving the effectiveness of a method using small and tractable tasks, the method can then be applied to real-world problems. To rephrase, most research has been carried out in such a way that researchers develop a highly intelligent system in a very restricted domain. Researchers hoped that such systems could be scaled up with larger funding and increased effort (Figure 5; independently, Brooks and Kanade use similar figures). However, experience with expert systems, machine translation systems, and other knowledge-based systems indicates that scaling up is extremely difficult for many of the prototypes. For example, it is relatively easy to build a machine translation system which translates a few very complex sentences. But it would be far more difficult to build a machine translation system which correctly translates thousands of sentences of medium complexity. Japanese companies have invested a decade of time and a serious amount of labor in developing commercial machine translation systems based on traditional models of language and intelligence. So far, a report on a high quality machine translation system has not been published.

There are several reasons why the traditional AI approach fails in the real world. The basic assumption is that knowledge exists somewhere in the mind of experts, so that, if it can be written down in operational form, the expertise can be transferred to the system. However, this assumption does not hold in the real world. The following three factors prevent the traditional AI approach from real-world deployment.

Incompleteness: It is almost impossible to obtain a complete set of knowledge for a given problem domain. Experts themselves may not have the knowledge, or the knowledge may be tacit, so that it cannot be expressed in a formal manner. Thus, a certain portion of the knowledge is always absent.
Incorrectness: There is no guarantee that expert knowledge is always correct and that encoding is perfect. A large knowledge base, containing over 10,000 frames, is bound to include a few errors. If there were a 0.5% error rate, there would be some 5,000 incorrect frames in a million-frame knowledge base!

Inconsistency: The set of knowledge represented may be inconsistent, because (1) contextual factors needed to maintain consistency were ignored, or (2) the expert knowledge itself is inconsistent.

These realities of real-world data are devastating to the traditional AI approach. Moreover, in addition to these problems, there are others, such as:

Human Bias: The way knowledge is expressed is inevitably influenced by the available domain theory. When the domain theory is false, the whole effort can fail.

Tractability: The system becomes increasingly intractable as it scales up, due to complex interactions between piecewise rules.

Economics: When rules are extracted from experts, system development is a labor-intensive task. For example, even if MCC's CYC [Lenat and Guha, 1989] system provides a certain level of robustness for knowledge-based systems, proliferation of such systems would be limited due to high development cost.

Empirically, efforts to eradicate these problems have not been successful. In essence, AI theories for the real world must assume data resources to be inconsistent, incomplete, and inaccurate. Lenat and Feigenbaum pointed out the scaling-up problem for expert systems, and proposed the Breadth Hypothesis (BH) [Lenat and Feigenbaum, 1991] and the CYC project [Lenat and Guha, 1989]. While there is some truth in the BH, whether or not robustness can be attained by a pure symbolic approach is an open question.
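The arithmetic behind the Incorrectness point above is a simple expectation, and it extends naturally to a second observation: even a modest knowledge base is almost surely not error-free. A sketch, using the 0.5% rate and knowledge-base sizes quoted above (the independence assumption between frames is mine):

```python
error_rate = 0.005   # 0.5% chance that any one frame is encoded incorrectly
frames = 1_000_000

# Expected number of incorrect frames in a million-frame knowledge base
expected_bad = frames * error_rate   # 5,000 frames

# Probability that a "small" 10,000-frame knowledge base is entirely clean,
# assuming frame errors are independent
p_all_correct = (1 - error_rate) ** 10_000
```

The second quantity is astronomically small, which supports the claim that a knowledge base of over 10,000 frames must include a few errors.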
A series of very large knowledge-based systems projects, such as CYC [Lenat and Guha, 1989], knowledge-based machine translation (KBMT) [Goodman and Nirenberg, 1991], a corporate-wide CBR system [Kitano et al., 1992], and large-scale CBR systems [Kettler et al., 1993], will be important test-beds for this approach. For these systems to succeed, I believe that the incorporation of mechanisms to handle messy real-world data resources will be necessary.

5. Computing, Memory, and Modeling

One promising approach for building real-world AI applications is to exploit massively parallel computing power and data resources to the maximum. In essence, I argue that an approach emphasizing massive computing power, massive data resources, and sophisticated modeling will play a central role in building grand challenge AI applications.

Computing: The importance of computing power is exemplified by the Deep Thought chess machine [Hsu, 1991, Hsu et al., 1990, Hsu, 1990]. Deep Thought demonstrates the power of computing for playing chess. It was once believed that a strong heuristic approach was the way to build a grand-master-level chess machine. However, the history of chess machines indicates that computing power and chess machine strength have an almost direct correlation (Figure 6; reproduced based on [Hsu et al., 1990]). Deep Thought II consists of 1,000 processors computing over a billion moves per second, and is expected to beat a grand master. Deep Thought exemplifies the significance of massive computing. Similarly, real-time processing using a very large knowledge source requires massive computing power [Evett et al., 1990a, Evett et al., 1990b, Geller, 1991].

Memory: The need for memory can be argued from a series of successes achieved in memory-based reasoning.
Starting from the initial success of MBRtalk [Stanfill and Waltz, 1988], memory-based reasoning has been applied to various domains, such as protein structure prediction [Zhang et al., 1988], machine translation [Sato and Nagao, 1990, Sumita and Iida, 1991, Furuse and Iida, 1992, Kitano and Higuchi, 1991a, Kitano, 1991], census classification [Creecy et al., 1992], parsing [Kitano and Higuchi, 1991b, Kitano et al., 1991b], and weather forecasting. PACE, a census classification system, attained 57% classification accuracy for occupation codes and 63% for industry codes [Creecy et al., 1992]; the AICOS expert system attained only 37% for occupation codes and 57% for industry codes. These successes can be attributed to the superiority of an approach which places memory, rather than fragile hand-crafted rules, as the basis for intelligence. In a memory-based reasoning system, the quality of the solution depends upon the amount of data collected. Figure 7 shows the general relationship between the amount of data and solution quality. The success of memory-based reasoning demonstrates the significance of a massive memory or data stream.

Figure 6: Progress of Chess Machines
Figure 7: Memory and Accuracy

Modeling: The importance of modeling can be discussed using the SPHINX speech recognition system [Lee, 1988]. Using massive computing and a massive data stream is not sufficient to build such artificial intelligence systems. The critical issue is how the application domain is modeled. Figure 8 shows the improvement in recognition rate with modeling sophistication. Even if massively parallel machines and large data resources are used, if the modeling is not appropriate, only a poor result can be expected. SPHINX exemplifies the significance of modeling.

Figure 8: Modeling and Accuracy

There are several reasons why these factors are important; the following analysis may help in understanding the effectiveness of the approach.

Distribution: Many of the phenomena that AI tries to deal with are artifacts determined by human beings, such as natural languages, society, and engineering systems. However, it is acknowledged that even these phenomena follow basic statistical principles, as applied in nature. The normal (also called Gaussian) distribution and the Poisson distribution are important distributions. In physics, quantum statistics (such as Bose-Einstein statistics and Fermi-Dirac statistics) and classical statistics (Maxwell-Boltzmann statistics) are used. AI, however, is not mature enough to establish statistical models for its various phenomena. Nonetheless, using statistical ideas, even in a primitive manner, can greatly help in understanding the nature of many phenomena.

Zipf's law, for example, describes the distribution of types of events in various activities. In Zipf's law, the product of the rank of an event (r) and the frequency of the event (f) is kept constant (C = rf). For example, when C = 0.1, the event of rank 1 should have a 10.0% share of all events, the rank 2 event should occupy 5.0%, and so on (Figure 9). The sum of the top 10 events occupies only 29.3%. Despite the fact that the occurrence probability for any individual event beyond the 10th rank is below one percent (about 0.9% for the 11th rank), the sum of these events amounts to about 70%. This law can be observed in various aspects of the world. AI models using heuristic rules will be able to cover high-frequency events relatively easily, but extending the coverage to capture irregular and less frequent events will be a major disturbing factor. However, the sum of these less frequent events occupies a significant proportion of the total, so the system must handle them. In fact, about 60% to 70% of the grammar rules in typical commercial machine translation systems are written for specific linguistic phenomena whose frequencies are very low.
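The Zipf shares above are easy to reproduce. The sketch below uses the C = 0.1 example; the search for the rank at which the shares sum to 1 is my illustration of why the tail carries about 70%, not a computation from the text:

```python
C = 0.1

def share(rank):
    # Zipf's law: rank * frequency = C, so frequency (share) = C / rank
    return C / rank

# The ten most frequent events together cover only ~29.3% of occurrences
top10 = sum(share(r) for r in range(1, 11))

# Shares can only sum to 1 over a finite inventory of events; with C = 0.1
# that happens at roughly rank 12,000+, so the long tail beyond rank 10
# carries the remaining ~70% of all occurrences.
total, n = 0.0, 0
while total < 1.0:
    n += 1
    total += share(n)
```

This makes the engineering point concrete: a rule set covering only the frequent events leaves most of the actual traffic unhandled.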
This empirical data confirms Zipf's law in natural language. Memory-based reasoning is a straightforward method to cover both high-frequency and low-frequency events. In fact, the relationship between data size and accuracy in an MBR system and the accumulated frequency in Zipf's law are surprisingly similar (Figure 10).

Figure 9: Zipf's Law
Figure 10: Zipf's Law and MBR

The Central Limit Theorem: Inaccuracy, or noise, inherent in large data resources can be mitigated using the very nature of these large data resources. Assuming that inaccuracy in the data follows a certain distribution (not necessarily Gaussian), the central limit theorem indicates that the distribution of the aggregate will be Gaussian over large data sets. A simple analysis illustrates the central limit theorem. Assume the linear model y = x + ε, where y is an observed value, x is the true value, and ε is noise. The central limit theorem indicates that the mean of ε will follow a normal distribution, regardless of the original distribution which produces ε. When the expected value of ε is 0, the observer will get the true value. In fact, adding various types of noise to memory-based reasoning systems does not cause major accuracy degradation. Figure 11 shows accuracy degradation for MBRtalk (an MBR version of NETtalk [Sejnowski and Rosenberg, 1987]) when noise is added to the weights. The noise follows a uniform distribution: N% noise means that a weight W can be randomly changed within the range [W - (N/100)W, W + (N/100)W]. The degradation rate is much smaller when larger data sets are used.

Figure 12: Overlapping Noisy Data
Figure 13: Accuracy Degradation and Precision
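The central-limit argument above can be seen in a small simulation: zero-mean noise added to observations averages out as the sample grows, so the estimate of the true value improves with data size. The numbers below are illustrative only, not from the MBRtalk experiments.

```python
# Toy simulation: the mean of noisy observations converges to the true value.
import random

random.seed(0)
true_value = 5.0

for n in (10, 1000, 100000):
    # Each observation is y = x + eps, with eps zero-mean uniform noise.
    obs = [true_value + random.uniform(-1.0, 1.0) for _ in range(n)]
    estimate = sum(obs) / n
    print(n, round(abs(estimate - true_value), 4))
```

The shrinking error mirrors why larger memory bases in Figure 11 degrade less under the same noise level.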
Figure 11: Accuracy Degradation by Noisy Data

The Law of Large Numbers: According to the law of large numbers, the peak in a data distribution becomes narrower as the number of data points grows. Assume that there are two close true values. The data distributions may overlap due to noise. However, the law of large numbers indicates that, by collecting larger data sets, the overlap can be mitigated (Figure 12). (I have not yet confirmed whether these effects can be observed in memory-based reasoning. However, the relationship between accuracy and data size implies that the law is in effect (Figure 13).)

Nonlinear Boundary: The vast majority of real-world phenomena are non-linear. Any attempt to approximate non-linear boundaries using linear or discrete methods will create certain levels of inaccuracy. A simple illustration is shown in Figure 14. The real boundary is shown by the (non-linear) solid line. Available data points are shown by O and X. Assuming that a categorization has been made on both the Y and X axes as A1, A2, A3 and B1, B2, B3, the region can be expressed as (A2 and (B2 or B3)). However, there are areas which do not fit the rectangular area described by this symbolic representation. The proponents of the symbolic approach may argue that the problem can be circumvented by using fine-grained symbol systems. However, there is always an area where symbol systems and the real boundary do not fit. Using an infinite number of symbols would eliminate the error — but this would no longer be a symbol system!
Therefore, for the non-linear boundary problem, symbol systems are inherently incomplete. The use of a large number of data points and statistical computing methods, however, can better mitigate the error with significantly lower human effort.

Figure 14: Nonlinear Boundary

This observation has two major implications. First, it implies that, for these problems, a massively parallel memory-based approach may be able to attain a certain level of competence. This will be discussed in the next section. Second, it has major implications as to how to build very large knowledge-based systems. My recommendation is to use redundancy and population encoding even when the system is based on the traditional AI approach.

6. Speech-to-Speech Translation

The ideas discussed above are also effective in speech-to-speech translation systems. Speech-to-speech translation is one of the major grand challenge AI applications, and its success will have major social and scientific impacts. Speech-to-speech translation systems are generally composed of a speech recognition module, a translation module, and a speech synthesis module. The first group of speech-to-speech translation systems appeared in the late 1980s. These include SpeechTrans [Saito and Tomita, 1988] and DmDialog [Kitano, 1990a], developed at Carnegie Mellon University, and SL-Trans [Morimoto et al., 1989], developed at the ATR Interpreting Telephony Research Laboratories. These systems were immediately followed by a second group of systems, which included JANUS [Waibel et al., 1991] at CMU and ASURA [Kikui et al., 1993] at ATR.
This section discusses limitations of the traditional AI approach to natural language processing and how massively parallel artificial intelligence can mitigate these problems.

6.1. Traditional View of Natural Language

The traditional approach to natural language processing has been to rely on extensive rule application. The basic direction of the approach is to build up an internal representation, such as a parse tree or case frame, using a set of rules and principles. In essence, it follows the traditional approach to artificial intelligence. In the early 1970s, there were efforts to develop natural language systems for small and closed domains. Woods developed LUNAR, which could answer questions about moon rocks [Woods et al., 1972]. In the LUNAR system, an ATN represented possible sequences of syntactic categories in the form of a transition network [Woods, 1970]. Winograd developed the famous SHRDLU program, which handled simple questions about the blocks world [Winograd, 1972]. These efforts heavily involved manipulation of the world model and a procedural model for sentence analysis. The world models, grammar, and inputs were assumed to be complete. These systems are typical examples of the traditional AI approach.

In the mid 70s, Schank proposed Conceptual Dependency theory [Schank, 1975] and the conceptual information processing paradigm. His claim was that there are sets of primitives (such as ATRANS, PTRANS, and PROPEL) which represent individual actions, and that language understanding is a heavily semantics-driven process in which syntax plays a very limited role. Schank called his approach cognitive simulation. From the late 1970s to the early 1980s, there was a burst of ideas regarding conceptual information processing, mostly advocated by Schank's research group at Yale. These ideas are well documented in Inside Computer Understanding [Schank, 1981] and in Dynamic Memory [Schank, 1982].
Although the emphasis changed from syntax to semantics, the basic framework followed the traditional AI approach — the systems represented knowledge using primitives and used heuristics to derive a CD representation from an input sentence. Again, this requires that the world model and the knowledge be complete and hand-crafted.

In the mid 80s, the syntactic approach gained renewed interest. It was motivated by unification-based grammar formalisms, such as Lexical-Functional Grammar (LFG: [Kaplan and Bresnan, 1982]), Generalized Phrase Structure Grammar (GPSG: [Gazdar et al., 1985]), and Head-driven Phrase Structure Grammar (HPSG: [Pollard and Sag, 1987]). The introduction of powerful information manipulation operations, such as unification, together with well-formalized theories, resulted in some success in developing experimental systems. Once again, however, these approaches remained within the traditional AI approach. The frustrating fact is that the major emphasis of these theories was to determine whether or not a given sentence is grammatical. This is a slight exaggeration, but the point is that modern linguistic theories ignore many phenomena important for building natural language systems. For example, linguistic theories typically do not explain how people understand ungrammatical sentences; most merely put * in front of an ungrammatical sentence. Many linguistically interesting sentences, such as extremely deep center-embedded sentences, almost never appear in reality. In addition, these theories do not entail a theory of how to disambiguate interpretations, which is a major problem in natural language processing.

Therefore, although there has been progress and a change of emphasis, these approaches are all within the traditional approach to artificial intelligence. As I argued in the previous section, systems based on the traditional approach inevitably face a number of problems.
For example, developers of commercial machine translation systems consider the following problems to be inherent in their approach, which is grounded in traditional AI and NLP:

Performance: The performance of most existing machine translation systems is not sufficient for real-time tasks such as speech-to-speech translation. It takes a few seconds to a few minutes to translate one sentence.

Scalability: Current machine translation systems are difficult to scale up because their processing complexity makes the systems' behavior almost intractable.

Quality: The intractability of a system's behavior, combined with other factors, lowers the quality of translations.

Grammar Writing: By the same token, grammar writing is very difficult, since a complex sentence has to be described by piecewise rules. It is a hard and time-consuming task, partly due to the intractability of the system's behavior when these rules are added to the whole system.

Notice that the capability to cope with ungrammatical sentences is not even listed, because such a feature is not in the initial specification of the system. Obviously, I do not intend to claim that massively parallel artificial intelligence will immediately open an avenue for high-performance and robust natural language systems. The accomplishment of such specifications seems to be far in the future. However, I do argue that reliance on models which assume complete knowledge will never accomplish the goal!

6.2. Memory-Based Model of Natural Language

The alternative to the traditional approach to natural language processing is a memory-based model. The memory-based approach to intelligence has been explicitly discussed since the early 80s. In 1981, Nagao proposed a model of translation based on analogy at a NATO conference (later published as [Nagao, 1984]). This model is the precursor of recent research on memory-based and example-based models of translation.
Nagao argued that humans translate sentences by using similar past examples of translation. In 1985, Stanfill and Waltz proposed the memory-based reasoning paradigm [Stanfill and Waltz, 1988, Stanfill and Waltz, 1986]. The basic idea of memory-based reasoning is to place memory at the foundation of intelligence. It assumes that large numbers of specific events are stored in memory; a new event is handled by first recalling past events which are similar to the new input, and then invoking the actions associated with these retrieved events. In the same year, Riesbeck and Martin proposed the direct memory access parsing model [Riesbeck and Martin, 1986]. They argued that the Build-and-Store approach to parsing is incorrect, and proposed the Recognize-and-Record approach. The main thrust of DMAP was to view language understanding as retrieval of episodic memory. DMAP can be viewed as an application of case-based reasoning to parsing.

Despite certain differences among these models, the common thread is to view memory as the foundation of intelligence. This idea runs counter to most AI approaches, which place rules or heuristics at the center of reasoning. At the same time, the differences between the models later became very important. For example, DMAP, in essence, uses complex indexing and heuristics, which follows the tradition of AI. Thus, the weaknesses of traditional AI were also exposed when scaling up the DMAP system.

Massively parallel memory-based natural language processing directly inherits these ideas. For example, in the memory-based parsing model, parsing is viewed as a memory-search process which locates past occurrences of similar sentences. The interpretation is built by activating past occurrences. Rationales for memory-based natural language processing include:

Very Large Finite Space: The Very Large Finite Space (VLFS) concept is critical to the memory-based approach.
The most frequently raised concern with memory-based approaches to natural language processing is how these models account for the productivity of language — the ability to generate an infinite set of sentences from a finite grammar. Opponents of the memory-based approach would claim that, due to the productivity of language, this approach cannot cover the space of all possible sentences. However, the productivity argument ignores resource boundedness. It should be noted that only a finite number of sentences can be produced when the following conditions hold:

Condition 1: Finite Vocabulary
Condition 2: Finite Grammar Rules
Condition 3: Finite Sentence Length

Conditions 1 and 2 hold, since people have only a finite vocabulary and grammar at any given time. Condition 3 also holds, as there are no infinitely long sentences. For example, the longest sentence in one recent CNN Prime News report was 48 words long, and 99.7% of sentences were within 25 words in length. Logically, then, natural language is a set of sentences within a very large finite space (VLFS).

Similarity: In memory-based natural language processing, input sentences are matched against all relevant data to find similar sentences in the database. The assumption is that similar problems have similar solutions. In fact, a series of experiments on example-based machine translation indicates that reasonable accuracy can be maintained over a relatively broad space. Figure 15 illustrates the relationship between accuracy and normalized distance in EBMT. (This figure is reproduced from [Sumita and Iida, 1992].) Therefore, a memory-based approach may be able to cover a real solution space with an implementable number of examples.

Collective Decision: The memory-based approach can take advantage of the redundancy of information implicit in a large data resource.
An advantage of this collective decision making is increased robustness against the inaccuracy of individual data. As argued previously, the central limit theorem and the law of large numbers ensure the robustness of the system.

Figure 15: Normalized Distance and Accuracy
Figure 16: Solution Spaces

Real Space Covering: Assuming that there is a set of grammar rules which covers a certain portion of natural language sentences, the grammar covers not only sentences which actually appear in the real world, but also sentences which are grammatical but never produced in reality. An empirical study revealed that only 0.53% of possible sentences considered to be grammatical are actually produced. Figure 16 shows how the solution spaces overlap. The memory-based approach never overgenerates, because the memory contains only examples of actual sentences used in the past.

In addition to these rationales, it should be noted that the memory-based approach keeps examples, whereas neural networks and statistical approaches do not. This difference is important because the memory-based approach can capture singular events and higher-order nonlinearities, while neural networks and statistical approaches often fail to capture them. For neural networks and statistical methods, the ability to capture singular and nonlinear curvature is determined by the network structure or the model. In memory-based reasoning, there is a chance that dense lower-order interpolation may approximate higher-order nonlinear curvature.

6.3. Systems

Since 1988, I have been building several massively parallel natural language systems. ΦDMDIALOG, or DMDIALOG for short, is a representative system resulting from the early work. DMDIALOG is a speech-to-speech dialog translation system between Japanese and English which accepts speaker-independent, continuous speech. It has been publicly demonstrated since March 1989. The first version of DMDIALOG had a vocabulary of only 70 words; it was extended to 300 words in the second version. The major characteristics of DMDIALOG are:

Massively Parallel Computing Model: Knowledge representation and algorithms are designed to exploit the maximum level of parallelism.

Memory-Based Approach: The knowledge base consists of a large collection of translation examples and templates. Parsing is viewed as a recall of past sentences in the memory network.

Parallel Constraint Marker-Passing: The basic computational mechanism is marker-passing, in which markers carry various information among nodes in the memory network [Charniak, 1983, Charniak, 1986]. This is a useful mechanism which has been studied in various applications [Hendler, 1988, Hendler, 1989a, Hendler, 1989b, Hirst, 1986, Norvig, 1989]. Markers are not bit markers, but structured markers which carry data structures. In addition, the propagation paths for each type of marker are restricted by the types and orders of links to be traversed.

Integration of Speech and NLP: The architecture enables interaction between speech processing and natural language processing. The bottom-up process provides the likelihood that a certain interpretation is correct. The top-down process imposes constraints and a priori probabilities on phoneme and word hypotheses.

The first version of DMDIALOG was strongly influenced by the idea of DMAP, but substantial extensions were made. It was a memory-based approach to natural language processing, since parsing recalls past sentences in the memory network. However, the focus of the system was on accomplishing speech-to-speech translation, so the system design was rather conservative by today's standards in the memory-based translation community. In fact, DMDIALOG used a complicated indexing mechanism for the memory network, and markers carried feature structures when propagating to the part of the network used for abstract memory. It used three abstraction levels — specific cases, generalized cases, and unification grammar (Figure 17). Therefore, DMDIALOG can be considered a mixture of the traditional approach and a memory-based approach.

Despite the fact that the system was not a full implementation of the memory-based natural language processing paradigm, DMDIALOG demonstrated various interesting and promising characteristics, such as a high level of parallelism, simultaneous interpretation capability, and robustness against inconsistent knowledge. One of the salient features of the model is the integration of speech processing and linguistic processing. In DMDIALOG, activation of the network starts bottom-up, triggered by external inputs.
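The structured marker-passing mechanism underlying this activation flow can be sketched as a traversal over a labeled memory network, where a marker carries a data structure (not a single bit) and may only follow permitted link types. The node names, link types, and propagation rule below are illustrative assumptions, not the DMDIALOG implementation.

```python
# A minimal sketch of structured marker-passing over a memory network.
# The network maps each node to a list of (link_type, neighbor) pairs.
network = {
    "word:ate":       [("instance-of", "concept:ingest")],
    "word:lunch":     [("instance-of", "concept:food")],
    "concept:ingest": [("role:object", "concept:food")],
    "concept:food":   [],
}

def propagate(network, start, allowed_links, payload):
    """Pass a structured marker from `start` along allowed link types,
    recording every node the marker reaches."""
    reached, frontier = {start: payload}, [start]
    while frontier:
        node = frontier.pop()
        for link, nbr in network.get(node, ()):
            if link in allowed_links and nbr not in reached:
                # The marker carries its payload (a data structure),
                # annotated with the link it traversed.
                reached[nbr] = dict(payload, via=link)
                frontier.append(nbr)
    return reached

m = propagate(network, "word:ate", {"instance-of", "role:object"},
              {"source": "word:ate"})
print(sorted(m))  # the marker reaches the concept nodes linked to "ate"
```

Restricting `allowed_links` is what keeps propagation constrained, in the spirit of limiting which types and orders of links each marker type may traverse.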
The activation propagates upward in the network to the word, sentence, and discourse levels. At each level, predictions are made, represented by downward propagation of prediction markers with a priori probability measures. There are multiple levels of loops in the system (Figure 18).

Figure 17: Translation Paths
Figure 18: Bottom-Up Activation and Top-Down Prediction

However, the performance of the system on a serial machine was far from satisfactory. The parallelism was simulated in software in the first version. In addition, an increasing emphasis on robustness against ill-formed inputs and inconsistency of the knowledge base further degraded the total performance. Fortunately, the high level of parallelism in DMDIALOG enabled reimplementing the system on various massively parallel computers.

Massively parallel implementations of versions of DMDIALOG started in the fall of 1989. Two different approaches have been examined. One approach was to emphasize the DMAP approach using complex indexing; DMSNAP was implemented on the Semantic Network Array Processor (SNAP) to examine this approach. The other approach was to focus on memory-based parsing using simpler indexing and a large number of examples; ASTRAL was implemented on IXM2 based on this approach.

Implementation of ΦDMDIALOG on SNAP began in October 1989, when I started a joint research project with Dan Moldovan's group at the University of Southern California. At that time, Moldovan's group had been designing SNAP-1. I worked together with Moldovan's group to implement DMSNAP, a SNAP version of DMDIALOG.
The first version was completed by the end of 1990. DMSNAP emphasized complex indexing and dynamic memory network modification to create discourse entities. This follows an approach taken in the case-based reasoning community [Hammond, 1986, Kolodner, 1984, Riesbeck and Schank, 1989]. In a sense, DMSNAP is much closer to the original DMAP idea than to the memory-based reasoning approach.

An independent research program to implement a version of ΦDMDIALOG on the IXM2 massively parallel associative memory processor began in March 1990 with Tetsuya Higuchi at the ElectroTechnical Laboratory (ETL). IXM2 is an associative memory processor designed and developed by Higuchi and his colleagues at ETL. It is a faithful hardware implementation of NETL [Fahlman, 1979], but using associative memory chips. ASTRAL, the IXM2 version of DMDIALOG, was completed in the summer of 1990. In ASTRAL, complex indexing was eliminated, and a large set of sentence templates was used as the central source of knowledge [Kitano and Higuchi, 1991b]. These differences in emphasis were chosen to make maximum use of the architectural strengths of each kind of massively parallel computer.

Comparisons between the different implementations of DMDIALOG revealed contributing factors and bottlenecks for robust speech-to-speech translation systems. DMSNAP faced a scaling problem due to its reliance on complex indexing, and a performance bottleneck due to node instantiation, which inevitably involves an array controller — a serial process. In addition, it is difficult to maintain multiple contexts before an interpretation is uniquely determined. On the other hand, ASTRAL exhibited desirable scaling properties and millisecond-order parsing performance. The DMSNAP project has been redirected to place more emphasis on the memory-based approach, without complex indexing and node instantiation.

While I have been working on massively parallel implementations and the performance aspects of the memory-based approach to natural language, a series of reports has been made on the quality of the translation attained by memory-based models.
Sato developed MBT-I [Sato, 1992] for word selection in Japanese-English translation, and attained an accuracy of 85.9%. He extended the idea to the transfer phase of translation in his MBT-II [Sato and Nagao, 1990]. A group at the ATR Interpreting Telephony Research Laboratory developed Example-Based Machine Translation (EBMT: [Sumita and Iida, 1991]) and Transfer-Driven Machine Translation (TDMT: [Furuse and Iida, 1992]). EBMT translates Japanese phrases of the type "A no B" with an accuracy of 89%. TDMT applies the memory-based translation model to translate whole sentences, and attained an accuracy of 88.9%. Since the architecture of TDMT is almost identical to the baseline architecture of DMDIALOG, the high translation accuracy of TDMT and the high performance of the massively parallel versions of ΦDMDIALOG indicate a promising future for the approach. Since 1992, a new joint research project has begun to implement EBMT and TDMT on various massively parallel computers. EBMT has already been implemented on the IXM2, CM-2, and SNAP-1 machines [Sumita et al., 1993]. Early results show that the EBMT approach fits well with massively parallel computers and scales well. Implementation of TDMT is in progress. Independently, Sato has implemented MBT-III [Sato, 1993] on CM-5 and nCUBE machines.

Figure 19: Asymptotic Convergence

6.4.
Limitations and Solutions

Although I strongly believe that the memory-based approach will be powerful in many application domains, there are limitations to this approach.

First, there is the problem of data collection. Here, Zipf's law, which was a strong supporting rationale, plays a killer role. Although most problems can be defined as having a VLFS as their solution space, it is economically and practically infeasible to collect all solutions in a memory base. Thus, a certain part of the solution space must be left out. If solutions which are not covered can be derived from similar examples in the memory base, the entire solution space can still be covered. However, it is most likely that rare examples are truly irregular, so that no similar example can derive the correct solution. For this type of problem, the only solution at this moment is to keep adding rare events to the memory base. It should be noted, however, that this problem is not unique to the memory-based approach: in rule-based systems, rules specific to rare events must likewise keep being added to the rule base.

Second, the memory-based approach is not free from the human bias problem. Just like the traditional AI approach, the representations of examples and the modeling are created by a system designer. For example, current memory-based translation systems use a hand-crafted thesaurus to determine the distance between words. A series of experiments indicated that the inaccuracy of thesaurus-based distance calculations is the major source of translation errors.
Methods to refine domain theories need to be incorporated [Shavlik and Towell, 1989, Towell et al., 1990]. If an inappropriate representation and modeling have been adopted, the system will exhibit poor performance. Even if an appropriate representation and modeling are used, it is implausible that they would be complete enough for 100% accuracy to be attained. Some bias in representation and modeling is inevitable. When there is some bias, the error ε in Figure 19 will not be zero. This means that even if the memory base contains data to cover the entire solution space, certain errors (ε) still remain. A hybrid approach has been proposed and tested in [Zhang et al., 1992] to overcome the problem of biased internal representations; their experiment shows that certain improvements can be gained by using multiple strategies.

Third, a pure memory-based approach models only a part of the cognitive process. Although the memory-based process may play an important role in intelligent activities, the use of rules in the human cognitive process cannot be ignored. I believe that there is a place for rules. However, it is likely to be a very different role from what has been proposed in the past. A rule should be created as the result of autonomous generalization over a large set of examples — it is not given a priori. Resource bounds are the major reason for using rules in this context. There must be a cost for storing in memory; thus, a trade-off exists between memory and rules. Pinker [Pinker, 1991] made an interesting observation on the relationship between memory and rules for English verbs. In English, the 13 most frequent verbs are irregular: be, have, do, say, make, go, take, come, see, get, know, give, find. Most other verbs are regular. Low-frequency irregular verbs are reshaped, and become regular over the centuries. Pinker proposed the rule-associative-memory theory, which holds that irregular verbs are stored in memory and that regular verbs are handled by rules.
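Pinker's rule-associative-memory account can be rendered as a lookup-with-fallback: high-frequency irregular forms are recalled from memory, and everything else falls through to a default rule. The tiny lexicon and simplified inflections below are illustrative assumptions, not data from Pinker's study.

```python
# Toy rendering of the rule-associative-memory account of English past tense.
# Irregulars come from memory; regulars are produced by the "-ed" rule.
IRREGULAR = {"be": "was", "have": "had", "go": "went", "take": "took"}

def past_tense(verb):
    # Memory lookup first; the "-ed" rule is the fallback generalization.
    if verb in IRREGULAR:
        return IRREGULAR[verb]
    return verb + "ed"

print(past_tense("go"), past_tense("walk"))  # prints: went walked
```

The memory/rule trade-off is explicit here: each irregular form costs one stored entry, while the unbounded set of regular verbs costs only the single rule.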
In this model, generalization takes place over the set of regular verbs.

Finally, from the engineering viewpoint, explicit rules given a priori work as a monitor. For example, human translators use rules which define how specific terms should be translated; even for human experts, such rules are given in an explicit form. A memory-based model should be able to incorporate such reality. Thus, I recently proposed a model of memory-based machine translation which combines a memory-based approach and a rule-based approach in a novel fashion [Kitano, 1993].

7. Grand Breakthrough

Massively parallel artificial intelligence can be an ideal platform for scientific and engineering research on next-generation computing systems and models of thought. Although the discussion has focused on the memory-based approach, even this approach carries several traits of traditional AI which are potentially undesirable in a model of thought. Massively parallel memory-based reasoning and massively parallel VLKB search use explicit symbols and do not by themselves involve strong learning and adaptation mechanisms. Thus, they are inherently biased by the features and representation schemes defined by the system designer. Although the problems of representation, domain knowledge, and knowledge-base building can be greatly mitigated by the memory-based paradigm, they are not totally eliminated. Therefore, while there are many places where the present form of massively parallel AI can be useful, we must go beyond it for a next-generation paradigm to come into play. At the least, this will be a significant scientific challenge.
I argue that massively parallel artificial intelligence itself will provide an ideal experimental basis for the genesis of new-generation technologies. In this section, therefore, I focus on scientific aspects rather than engineering. I believe that the following properties will characterize a realistic theory of intelligence:

Emergence: Intelligence is an emergent property. It emerges as a result of interactions between subcomponents. Emergence can take place at various levels, such as the emergence of intelligence, the emergence of structure, and the emergence of rules for structure development. In this section, the term emergence is used to indicate the emergence of intelligence.

Evolution: Intelligence is one of the by-products of evolution. Intelligence is the memory of matters since the birth of this universe. Thus, a true theory of intelligence must be justified from an evolutionary context.

Symbiosis: Intelligence emerges as a result of symbiotic computing. The essence of intelligence is diversity and heterogeneity. Symbiosis of diverse and heterogeneous components is the key to intelligence.

Diversity: Diversity is the essence of intelligence. No meaningful artifacts can be created without substantial diversity; the reality of our brain, body, and the eco-system demonstrates the significance of diversity.

Motivation: Intelligence is driven by motivation. Obviously, learning is a key factor in intelligence. For learning theories to be legitimate from evolutionary and psychological contexts, they must involve theories of the motivation for learning. Essentially, learning and other intelligent behavior are driven in the direction which maximizes the survivability of the gene.

Physics: Intelligence is governed by physics. Ultimately, an intelligent system must be grounded on physics.
Nano-technology and device technology provide direct grounding, and digital physics may be able to ground intelligence on a hypothetical physics.

7.1. Emergence

Intelligence is an emergent property. For example, natural language emerged from ethological and biological interaction under a given environment. Our linguistic capability is an emergent property based on our brain structure, and it evolved under selectional pressure whereby individuals with better communication capacity survived better.

Brooks proposed the subsumption architecture as an alternative to the Sense-Model-Plan-Action (SMPA) architecture [Brooks, 1986]. The subsumption architecture is a layered architecture in which each layer is defined in a behavior-based manner. The network consists of simple Augmented Finite State Machines (AFSMs) and has no central control mechanism. A series of mobile robots based on the subsumption architecture demonstrated that a certain level of behaviors which would appear to be intelligent can emerge.

Currently, the structure of the system is given a priori in the subsumption architecture. Human bias and the problems of physical symbol systems arise again because the definitions for the units of behavior and the internal circuitry for each layer must be provided by a designer. In addition, manually designing a system which contains large numbers of layers for higher levels of intelligence is extremely difficult. The research front now has to move to the next level of emergence: the emergence of structure. Structure could emerge using the principles of self-organization and evolution [Jantsch, 1980, Nicolis and Prigogine, 1977]. Self-organization is a vague notion which can be interpreted in many ways. However, since an a priori definition of structure is infeasible, self-organization of a system's structure through dynamic interaction with an environment needs to take place.
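The layered, centrally uncontrolled organization described above can be caricatured in a short sketch (a toy of my own, not Brooks's actual AFSM networks; the layer names and the simple priority scheme are illustrative assumptions):

```python
# Each "layer" maps raw sensor readings directly to an action, or to None
# when it has nothing to say. Higher-priority layers subsume (override)
# lower ones; there is no central world model or planner.

def avoid(sensors):
    # Reflexive layer: obstacle avoidance always wins.
    return "turn" if sensors.get("obstacle") else None

def seek_goal(sensors):
    # Higher competence: approach a visible goal.
    return "approach" if sensors.get("goal_visible") else None

def wander(sensors):
    # Default behavior when nothing else fires.
    return "forward"

# Layers ordered from highest priority to lowest.
LAYERS = [avoid, seek_goal, wander]

def act(sensors):
    for layer in LAYERS:
        action = layer(sensors)
        if action is not None:
            return action

print(act({"obstacle": True}))      # turn
print(act({"goal_visible": True}))  # approach
print(act({}))                      # forward
```

The point of the sketch is structural: behavior arises from the fixed wiring of simple units, which is exactly why the structure itself must be given by the designer, the limitation discussed above.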
DARWIN-III, developed by Edelman's group [Edelman, 1987, Edelman, 1989], shows some interesting properties of a self-organizing network without explicit teaching signals. However, DARWIN-III is limited to synaptic weight modification, and has no structure-emergence property.

Independently, recent studies by Hasida and his colleagues propose the emergence of information flow as a result of dynamic interaction between the environment and constraints imposed orthogonally [Hasida et al., 1993]. Although this work is still at a preliminary stage, it contains the interesting idea that the system has no predefined functional unit. Functionality emerges through dynamic interaction with the environment.

The concept of a holonic computer was proposed by Shimizu [Shimizu et al., 1988]. A holonic computer consists of devices with non-linear oscillation capability (van der Pol oscillators), which are claimed to be self-organizing. Shimizu emphasizes the importance of the Holonic Loop, a dynamic interaction of elements and top-down constraints which emerge from bottom-up interactions. Figure 20 is a conceptual image of next-generation emergent computers (the need for the limbic system and the endogenous system will be discussed later).

Although these models have not captured the emergence of structure by themselves, combining these ideas with evolutionary computing may enable the emergence of structures and the emergence of structure-generating mechanisms.

7.2. Evolution

Intelligence is a by-product of evolution. Species with various forms of intelligence find their niches for survival. Different species could have different forms of intelligence, as they have evolved under different environments, and hence different selectional pressures. If the environment for human beings had been different from what has

Figure 21: Development using an L-system
been imposed as the selectional pressure for mankind, the form of human intelligence could have been very different. Dolphins, for example, are considered to have a comparable number of neurons; however, human beings and dolphins have evolved to have different kinds of brains. Physiological differences of the brain inevitably affect the form of intelligence.

The structure of our brain is also affected by our evolutionary path. Obviously, the brain is not a tabula rasa: it has not only a global structure, such as the existence of a brain stem, neo-cortex, and other brain components, but also numbers of local circuits which are genetically defined, such as the hypercolumn, Papez circuits, and hippocampal loops. Investigation of these existing structures of the brain, and of how the brain has evolved, may provide substantial insights.

Projections from the limbic system, which is an old part of the brain, control wake/sleep rhythms and other basic moods of the brain. These systems play a central role in building a motivation for the system.

Figure 20: An Architecture for Next-Generation Computers?

Creating structural systems using evolutionary computing, such as genetic algorithms, generally requires the use of a developmental phase. While the biological theory of development has not been established, there are some useful mathematical tools to describe development: the Lindenmayer system (L-system) [Lindenmayer, 1968, Lindenmayer, 1971] and the cellular automaton.
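As a concrete illustration of such rewriting tools, here is a minimal L-system expander (a generic textbook sketch using Lindenmayer's original algae grammar, not the graph-rewriting extension): every symbol of the string is rewritten in parallel at each generation according to the production rules.

```python
def lsystem(axiom, rules, generations):
    """Expand an L-system: rewrite all symbols in parallel each step.

    Symbols with no production rule are copied unchanged."""
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(c, c) for c in s)
    return s

# Lindenmayer's algae model: a -> ab, b -> a.
rules = {"a": "ab", "b": "a"}
print(lsystem("a", rules, 4))  # abaababa
```

Even this tiny grammar exhibits growth regularities (here the string lengths follow the Fibonacci sequence), which is why the formalism is attractive for describing development, and also why, as noted below, it remains descriptive rather than mechanistic.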
For example, the L-system can be augmented to handle graph rewriting so that a wide range of developmental processes can be described (Figure 21). Wilson proposed using the L-system in the context of artificial life [Wilson, 1987]. However, I believe I was the first to actually implement and experiment with this evolution of development [Kitano, 1990b]. Since the L-system is a descriptive model, and not a model which describes the mechanisms of development, new and more biologically grounded models should be introduced. Nevertheless, the encouraging results achieved in a series of experiments [Kitano, 1990b, Kitano, 1992a, Gruau, 1992] demonstrate that the evolution of development is an important issue.

In addition, the evolutionary perspective leads us to investigate the mechanisms of the immune system [Tonegawa, 1983], projections from the limbic system, and the endogenous system [McGaugh, 1989]. The central nervous system, immune system, and endogenous system are known to have the same origin, and to have differentiated in the course of evolution.

7.3. Symbiosis

Symbiotic computing is an idea whereby we view an intelligent system as a symbiosis between various different kinds of subcomponents, each of which is differentiated through evolution. Symbiosis has three major types: host-parasite, fusion, and network. The idea of symbiosis itself originated in the field of biology. One of the strong proponents of symbiosis is Lynn Margulis. Margulis considered symbiosis an essential principle for the creation of eukaryotic cells. She claims that some parts of eukaryotic cells resulted directly from the formation of permanent associations between organisms of different species, the evolution of symbiosis [Margulis, 1981].
While the serial endosymbiosis theory (SET) [Taylor, 1974] is a symbiosis theory for eukaryotic cells, which is a fusion, there are symbioses at various levels. I claim that symbiosis is one of the central principles for models of thought and life. In particular, symbiosis is the key to the evolution of intelligence. Although genetic algorithms and other genetically inspired models provide powerful adaptation capabilities, it would be extremely difficult and time-consuming to create highly complex structures from scratch. It may be difficult for genetic algorithms alone to create complex structures with high multimodality. Symbiosis can take place at various levels, from cells to global multi-agent systems. At the neural level, neural symbiosis may be a useful idea to consider.

In The Society of Mind, Marvin Minsky postulates that intelligence emerges from a society of large numbers of agents [Minsky, 1986]. The society-of-mind idea is one instance of symbiosis. It is a weak form of symbiosis, which can be categorized as a network, because individual agents retain their independence. Symbiotic computing offers a more dynamic picture. In symbiotic computing, subcomponents can be merged to form a tight symbiosis or loosely coupled to form a weak symbiosis. A component can be differentiated to form several kinds of components, but could be merged again later. In the light of symbiotic computing, the society of mind and the subsumption architecture are instances, or snapshots, of symbiotic computing.

7.4. Diversity

Diversity is an essential factor for intelligence. For example, in order for symbiosis to be effective, or even for symbiosis to take place, the components involved in symbiosis must have different functions.
Thus, the assumption of symbiosis is diversity. Evolution often drives species into niches so that differentiation between species takes place. Intelligence is a manifestation of a highly complex and dynamic system whose components maintain a high level of diversity. It is analogous to big artifacts such as large buildings and bridges. In physics, particle physicists try to discover the unified theory. Numbers of theories, such as superstring theory and supersymmetry theory, have been proposed. Although these theories come very close to the unified theory, or even if one of them is the unified theory, they never explain how a building can be built. Buildings are built using various different components interrelated with each other, just like symbiosis. It is impossible to build any meaningful architecture using a single type of component. Theories which claim that one particular mechanism can create intelligence fall into this fallacy. Thus, it is wrong to claim that rules alone can be the mechanism behind intelligence. It is wrong to claim that memory alone can be the mechanism behind intelligence.

Many models of neural networks make the same mistake. Most models are too simplified, as they usually assume a single neuron type. The real brain consists of numbers of different types of neurons interconnected in a specific and perhaps genetically specified manner. In addition, most neural networks assume electric impulses to be the sole source of inter-neuron communication.
Recent biological studies have revealed that hormonal effects also affect the state of neurons. Even if mechanisms at microscopic levels can be described by relatively small numbers of principles, that is analogous to a theory of particle physics. In order for us to formulate a model of thought, we must find how these components create diverse substructures and how they interact.

7.5. Motivation: Genetic Supervision

No learning takes place without motivation. This is the factor which is critically lacking in much current AI research. The term motivation is used here as a drive for learning. Although both conscious and subconscious drives are important, the following discussion focuses on a subconscious drive.

One example illustrates the point of my discussion. Infants learn to avoid dangerous things without any teaching signal from the environment. When an infant touches a very hot object, the infant reflexively removes its hand and screams. After several such experiences, the infant learns to avoid the hot object. The question is why this infant learns to avoid hot objects rather than learning to prefer them. There must be some innate drive which guides learning in a particular direction. Learning is genetically supervised. We learn to avoid dangerous things because individuals whose learning is driven in that direction have survived better than individuals whose learning is driven to prefer dangerous things.
Therefore, learning is genetically supervised in a particular direction which improves the reproduction rate. In addition, studies on child language acquisition seem to support the existence of an innate mechanism for selective learning.

Not just the direction and focus, but also the learning mechanisms, are determined genetically. Some species of bird demonstrate a form of learning called imprinting. For these birds, the first moving object which they see after hatching will be imprinted on their brain as being their mother. Imprinting is an innate form of learning which allows birds to quickly recognize their mother, so that they can be fed and protected well. While there is a certain risk that an individual might imprint something different, the evolutionary choice has been made to use this particular mechanism under the given environment.

How knowledge should be organized and represented largely depends upon the purpose of learning. If the learning is to avoid dangerous things, features of objects such as temperature, speed, distance, and other factors could be important. In addition, reasoning mechanisms which provide quick real-time responses will be used rather than accurate but slow mechanisms. The subsumption architecture captured a mechanism which is largely primitive but important for survival. Thus, these mechanisms must be composed of relatively simple circuitry and provide real-time and autonomous reactions without central control.
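One way to make the innate-drive argument concrete is a toy sketch in which the reward function is fixed ("innate", supplied by evolution) while action values are learned from experience. This is my own illustrative construction, not a model from the paper; the hot-object scenario and all names are illustrative assumptions.

```python
import random

def innate_reward(action, obj):
    # Evolution-supplied evaluation: touching hot things hurts.
    if action == "touch" and obj == "hot":
        return -1.0
    return 0.1  # mild positive signal otherwise

def learn(episodes=200, alpha=0.5, seed=0):
    """Update action values from the innate scalar reward alone."""
    rng = random.Random(seed)
    value = {("touch", "hot"): 0.0, ("withdraw", "hot"): 0.0}
    for _ in range(episodes):
        action = rng.choice(["touch", "withdraw"])
        r = innate_reward(action, "hot")
        # Simple exponential-average value update toward the reward.
        value[(action, "hot")] += alpha * (r - value[(action, "hot")])
    return value

v = learn()
# Without any explicit teaching signal, withdrawing ends up valued
# higher than touching, purely because the innate reward says so.
print(v[("withdraw", "hot")] > v[("touch", "hot")])  # True
```

Changing the sign of the innate reward would make the same learner prefer hot objects: the learning mechanism is neutral, and the direction is fixed by the evaluation function, which is the sense in which learning is "genetically supervised".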
The idea of reinforcement learning [Watkins, 1989] provides a more realistic framework of learning than most traditional models of learning, which assume explicit teaching signals. Reinforcement learning assumes that the action module receives a scalar feedback value, called the reward, according to its action under a certain environment. I feel this is closer to reality. The shortcoming of reinforcement learning is that the evaluation function which provides the reward signal to the action module is given a priori. Ackley and Littman proposed Evolutionary Reinforcement Learning (ERL) [Ackley and Littman, 1992] to remedy this problem. In ERL, the evaluation function co-evolves with the action module, so that the individual with a better evaluation function learns more effectively. Independently, Edelman uses the value system to drive the learning of DARWIN-III in a desired direction. I call this class of models Genetic Supervision. A general architecture for genetic supervision should look like Figure 22.

Figure 22: Learning and Evolution

For genetic supervision to be grounded in the biological and evolutionary context, several new factors must be considered, such as the influence of hormones over memory modulation [McGaugh, 1989] and the A6 and A10 neural systems. These systems have an older origin than the neo-cortex, and control deep emotional and motivational aspects of our behavior. Because these systems were established at an early stage of evolution, their signal processing and transmission capabilities are limited. Generally, signals from these systems act as an analog global modulator. Nevertheless, I believe that these factors play critically important roles in the next generation of intelligence theories.

7.6. Physics

Since intelligence is a result of evolution and emergence, it is governed by physics. As long as models of intelligence deal with stationary models, the significance of physics does not come into play. However, once the research is directed to more basic properties such as emergence, development, and evolution, physics cannot be ignored. For example, morphogenesis involves cell movements and cell divisions which are largely influenced by physics. Morphogenesis is not necessary, or it can be significantly simplified, if there is no physics involved. However, at the same time, theories without physics inevitably assume a priori external constraints. Basic configurations of genes and reproduction mechanisms could have been different if a different physics had applied. Recent discoveries from the U.S. Space Shuttle missions, particularly the Japanese experiments on Endeavour, uncovered the importance of gravitational influence on cell division and gene repair. The semantics of the world could have been very different if a spontaneous breakdown of symmetry had taken place in a very different way.

This argument largely assumes artificial life research to be the basis for artificial intelligence. I agree with Belew that artificial life is "a constructive lower bound for artificial intelligence" [Belew, 1991]. Brooks made a good argument for the embodiment of robots in the real world at a macroscopic level. The argument in this section is at a more microscopic level. My argument is that physics will be necessary for emergence, development, and evolution to take place. It should be noted that this is one extreme, and much AI research will be carried out without physics. But as research delves into basic mechanisms, physics will have to be considered. There are two promising directions, both of which should be pursued for the physical grounding of AI and ALife: nano-technology and digital physics.

Nano-technology can be a driving force grounding AI and ALife research in actual physics. The same physics which has been applied to ourselves for billions of years can be applied to AI. This approach will have an enormous impact on society, because the very definition of natural life and intelligence will be contested in the light of state-of-the-art technologies.

To the contrary, digital physics may be able to create a new physics. Massively parallel machines in the near future may be able to produce physics in themselves. This was one of the motivations for building the Connection Machine [Hillis, 1985]. Just as Langton termed Artificial Life life-as-it-could-be [Langton, 1989], digital physics may allow us to pursue the universe-as-it-could-be.

8. Conclusion

The arguments developed in this paper can be expressed in a single statement: Massively parallel artificial intelligence is where AI meets the real world. The phrase AI meets the real world has been used in various contexts, such as Robotics is where AI meets the real world. Nevertheless, the phrase has its own significance, and that is why it has been repeatedly used.

Artificial intelligence is the field which studies models of intelligence and engineering methods for building intelligent systems. The model of intelligence must face the biological reality of the brain, and intelligent systems must be deployed in the real world. The real world is where we live, and it embodies vastness, complexity, noise, and other factors which are not easy to overcome. In order for AI to succeed, powerful tools to tackle the complexity and vastness of real-world phenomena must be prepared.
In physics, there are powerful tools such as mathematics, particle accelerators, the Hubble telescope, and other experimental and conceptual devices. Biology made a leap when DNA sequencing and gene recombination methods were discovered. Artificial intelligence, however, has been constrained by available computing resources for the last thirty years. The conceptual devices have been influenced by hardware architectures to date. This is represented by the traditional AI approach. Fortunately, however, the progress of VLSI technology liberates us from these computational constraints.

Computationally demanding and data-driven approaches, such as memory-based reasoning and genetic algorithms, emerge as realistic alternatives to the traditional approach. This does not mean that the traditional approach will be eliminated. There are suitable applications for traditional models, and there are places where the massively parallel approach will be suitable. Thus, there will be co-habitation of different approaches. However, the emerging new approach will be a powerful method with which to pursue the goals of AI which have not been accomplished so far.

Massive computing power and memory space, along with new ideas for AI, will allow us to challenge the realities in our engineering and scientific discipline. For AI to be a hard-core science and engineering discipline, real-world AI must be pursued and our arsenal for attacking the problems must be enriched. I hope that the discussions developed in this paper contribute to the community making a quantum leap into the future.
Acknowledgements

There are just too many people to thank: Makoto Nagao, Jaime Carbonell, Masaru Tomita, Jay McClelland, David Waltz, Terry Sejnowski, Alex Waibel, Kai-Fu Lee, Jim Geller, David Goldberg, Dan Moldovan, Tetsuya Higuchi, Satoshi Sato, Hideto Tomabechi, Moritoshi Yasunaga, Ron DeMara, Toyoaki Nishida, Hitoshi Iida, Akira Kurematsu, Hiroshi Okuno, Koiti Hasida, Katashi Nagao, Toshio Yokoi, Mario Tokoro, Lori Levin, Rodney Brooks, Chris Langton, Katsu Shimohara, Scott Fahlman, David Touretzky, Hideo Shimazu, Akihiro Shibata, Keiji Kojima, Hitoshi Matsubara, Eiichi Osawa, Eiichirou Sumita, Kozo Oi... I cannot list them all. Sorry if you are not in the list. Jim Hendler gave me various advice on the final manuscript of this paper. I owe him a lot.

This research is supported, in part, by National Science Foundation grant MIP-9009111 and by Pittsburgh Supercomputing Center grant IRI-910002P.

References

[Ackley and Littman, 1992] Ackley, D. and Littman, M., "Interactions Between Learning and Evolution," Artificial Life II, Addison-Wesley, 1992.

[Ae and Aibara, 1989] Ae, T. and Aibara, R., "A neural network for 3-D VLSI accelerator," VLSI for Artificial Intelligence, Kluwer Academic Publishers, 1989.

[Atkeson and Reinkensmeyer, 1990] Atkeson, C. and Reinkensmeyer, D., "Using Associative Content-Addressable Memories to Control Robots," Artificial Intelligence at MIT, 1990.

[Batcher, 1980] Batcher, K., "Design of a Massively Parallel Processor," IEEE Transactions on Computers, 29, no. 9, September 1980.

[Belew, 1991] Belew, R., "Artificial Life: A Constructive Lower Bound for Artificial Intelligence," IEEE Expert, Vol. 6, No. 1, 8-15, 1991.

[Blank, 1990] Blank, T., "The MasPar MP-1 Architecture," Proc. of COMPCON Spring 90, 1990.

[Blelloch, 1986] Blelloch, G. E., AFS-1: A programming language for massively concurrent computers, Tech Report 918, MIT AI Lab., 1986.

[Blelloch, 1986] Blelloch, G.
E., "CIS: A Massively Concurrent Rule-Based System," Proceedings of the National Conference on Artificial Intelligence (AAAI-86), 1986.

[Bouknight et al., 1972] Bouknight, W. J., et al., "The ILLIAC IV System," Proceedings of the IEEE, 369-388, April 1972.

[Bowler and Pawley, 1984] Bowler, K. C. and Pawley, G. S., "Molecular Dynamics and Monte Carlo Simulation in Solid-State and Elementary Particle Physics," Proceedings of the IEEE, 74, January 1984.

[Brooks, 1991] Brooks, R., "Intelligence Without Reason," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), Sydney, 1991.

[Brooks, 1986] Brooks, R., "A robust layered control system for a mobile robot," IEEE J. Robotics and Automation, RA-2, 14-23, April 1986.

[Charniak, 1983] Charniak, E., "Passing Markers: A Theory of Contextual Influence in Language Comprehension," Cognitive Science, 7, 171-190, 1983.

[Charniak, 1986] Charniak, E., "A Neat Theory of Marker Passing," Proc. of AAAI-86, 1986.

[Chung et al., 1992] Chung, S., Moldovan, D., and DeMara, R., A Parallel Computational Model for Integrated Speech and Natural Language Understanding, University of Southern California, TR PKPL-92-2, also as IMPACT-92-008-USC, 1992.

[Cook, 1991] Cook, J., "The Base Selection Task in Analogical Planning," Proc. of IJCAI-91, 1991.

[Creecy et al., 1992] Creecy, R., Masand, B., Smith, S., and Waltz, D., "Trading MIPS and Memory for Knowledge Engineering: Automatic Classification of Census Returns on a Massively Parallel Supercomputer," Communications of the ACM, 1992.

[Dally et al., 1989] Dally, W. J., et al., "The J-Machine: a fine-grain concurrent computer," Information Processing 89, Elsevier North-Holland, 1989.

[Dixon and de Kleer, 1988] Dixon, M. and de Kleer, J., "Massively Parallel Assumption-based Truth Maintenance," Proceedings of the National Conference on Artificial Intelligence (AAAI-88), 1988.
[Drumheller, 1986] Drumheller, M., "Connection Machine stereomatching," Proceedings of the National Conference on Artificial Intelligence (AAAI-86), 1986.

[EDR, 1988] Japan Electronic Dictionary Research Institute, "EDR Electronic Dictionaries," Technical Report, Japan Electronic Dictionary Research Institute, 1988.

[Edelman, 1989] Edelman, G., The Remembered Present: A Biological Theory of Consciousness, Basic Books, 1989.

[Edelman, 1987] Edelman, G., Neural Darwinism: The Theory of Neuronal Group Selection, Basic Books, 1987.

[Evett et al., 1990a] Evett, M., Hendler, J., and Spector, L., PARKA: Parallel knowledge representation on the Connection Machine, Tech Report CS-TR-2409, University of Maryland, 1990.

[Evett et al., 1990b] Evett, M., Hendler, J., Mahanti, A., and Nau, D., "PRA*: A memory-limited heuristic search procedure for the Connection Machine," Proceedings - Frontiers of Massively Parallel Computing, 1990.

[Fahlman, 1979] Fahlman, S. E., NETL: A System for Representing and Using Real-World Knowledge, MIT Press, 1979.

[Furuse and Iida, 1992] Furuse, O. and Iida, H., "Cooperation between Transfer and Analysis in Example-Based Framework," Proc. of COLING-92, Nantes, 1992.

[Gazdar et al., 1985] Gazdar, G., Klein, E., Pullum, G., and Sag, I., Generalized Phrase Structure Grammar, Harvard University Press, 1985.

[Geller, 1991] Geller, J., Advanced Update Operations in Massively Parallel Knowledge Representation, New Jersey Institute of Technology, CIS-91-28, 1991.

[Gelsinger et al., 1989] Gelsinger, P., Gargini, P., Parker, G., and Yu, A., "Microprocessors circa 2000," IEEE Spectrum, Oct. 1989.

[Goodman and Nirenberg, 1991] Goodman, K. and Nirenberg, S. (Eds.), The KBMT Project: A case study in Knowledge-Based Machine Translation, Morgan Kaufmann, 1991.

[Graf et al., 1988] Graf, H., et al., "VLSI implementation of a neural network model," IEEE Computer, Vol. 21, No. 3, 1988.
[Gruau, 1992] Gruau, F., Cellular Encoding of Genetic Neural Networks, Ecole Normale Supérieure de Lyon, Lyon, 1992.

[Hammond, 1986] Hammond, K., Case-Based Planning: An Integrated Theory of Planning, Learning, and Memory, Ph.D. Thesis, Yale University, 1986.

[Hasida et al., 1993] Hasida, K., Nagao, K., and Miyata, T., "Joint Utterance: Intrasentential Speaker/Hearer Switch as an Emergent Phenomenon," Proc. of IJCAI-93, 1993.

[Hendler, 1989a] Hendler, J., "Marker-passing over Microfeatures: Towards a Hybrid Symbolic/Connectionist Model," Cognitive Science, 13, 79-106, 1989.

[Hendler, 1989b] Hendler, J., "The Design and Implementation of Marker-Passing Systems," Connection Science, Vol. 1, No. 1, 1989.

[Hendler, 1988] Hendler, J., Integrating Marker-Passing and Problem-Solving, Lawrence Erlbaum Associates, 1988.

[Higuchi et al., 1991] Higuchi, T., Kitano, H., Furuya, T., Kusumoto, H., Handa, K., and Kokubu, A., "IXM2: A Parallel Associative Processor for Knowledge Processing," Proceedings of the National Conference on Artificial Intelligence, 1991.

[Hillis, 1985] Hillis, D., The Connection Machine, The MIT Press, 1985.

[Hirst, 1986] Hirst, G., Semantic Interpretation and the Resolution of Ambiguity, Cambridge University Press, 1986.

[Holland, 1975] Holland, J., Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.

[Hord, 1990] Hord, M., Parallel Supercomputing in SIMD Architectures, CRC Press, 1990.

[Hsu, 1991] Hsu, F. H., "The Role of Chess in AI Research," Proc. of IJCAI-91, 1991.

[Hsu, 1990] Hsu, F. H., Large Scale Parallelization of Alpha-Beta Search: An Algorithmic and Architectural Study with Computer Chess, CMU-CS-90-108, Ph.D. Thesis, 1990.

[Hsu et al., 1990] Hsu, F. H., Anantharaman, T., Campbell, M., and Nowatzyk, A., "A Grandmaster Chess Machine," Scientific American, Vol. 263, No. 4, Oct. 1990.

[IEDM, 1991] IEEE, Proc. of the International Electron Device Meeting (IEDM-91), 1991.
[Inmos, 1987] Inmos, IMS T800 Transputer, 1987.
[Jantsch, 1980] Jantsch, E., The Self-Organizing Universe, Pergamon Press, 1980.
[Kaplan and Bresnan, 1982] Kaplan, R. and Bresnan, J., "Lexical-Functional Grammar: A Formal System for Grammatical Representation," The Mental Representation of Grammatical Relations, The MIT Press, 1982.
[KSR, 1992] Kendall Square Research, KSR-1 Technical Summary, Kendall Square Research, 1992.
[Kettler et al., 1993] Kettler, B., Andersen, W., Hendler, J., and Evett, M., "Massively Parallel Support for a Case-based Planning System," Proceedings of the Ninth IEEE Conference on AI Applications, Orlando, Florida, March 1993.
[Kikui et al., 1993] Kikui, K., Seligman, M., and Takezawa, T., "Spoken Language Translation System: ASURA," Video Proceedings of IJCAI-93, 1993.
[Kitano, 1993] Kitano, H., "A Comprehensive and Practical Model of Memory-Based Machine Translation," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-93), 1993.
[Kitano et al., 1993] Kitano, H., Wah, B., Hunter, L., Oka, R., Yokoi, T., and Hahn, W., "Grand Challenge AI Applications," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-93), 1993.
[Kitano, 1992a] Kitano, H., Neurogenetic Learning, Carnegie Mellon University, CMT Technical Report, 1992.
[Kitano, 1992b] Kitano, H. (Ed.), Proceedings of the Workshop on Grand Challenge AI Applications, Tokyo, 1992. (in Japanese)
[Kitano and Yasunaga, 1992] Kitano, H. and Yasunaga, M., "Wafer Scale Integration for Massively Parallel Memory-Based Reasoning," Proceedings of the National Conference on Artificial Intelligence (AAAI-92), 1992.
[Kitano et al., 1992] Kitano, H., Shibata, A., Shimazu, H., Kajihara, J., and Sato, A., "Building a Large-Scale and Corporate-Wide Case-Based System," Proc. of AAAI-92, 1992.
[Kitano et al., 1991a] Kitano, H., Hendler, J., Higuchi, T., Moldovan, D.
and Waltz, D., "Massively Parallel Artificial Intelligence," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), 1991.
[Kitano et al., 1991b] Kitano, H., Moldovan, D., and Cha, S., "High Performance Natural Language Processing on Semantic Network Array Processor," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), 1991.
[Kitano et al., 1991c] Kitano, H., Smith, S. F., and Higuchi, T., "GA-1: A Parallel Associative Memory Processor for Rule Learning with Genetic Algorithms," Proceedings of the International Conference on Genetic Algorithms (ICGA-91), 1991.
[Kitano and Higuchi, 1991a] Kitano, H. and Higuchi, T., "High Performance Memory-Based Translation on IXM2 Massively Parallel Associative Memory Processor," Proceedings of the National Conference on Artificial Intelligence (AAAI-91), 1991.
[Kitano and Higuchi, 1991b] Kitano, H. and Higuchi, T., "Massively Parallel Memory-Based Parsing," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), 1991.
[Kitano and Moldovan, 1991] Kitano, H. and Moldovan, D., "Semantic Network Array Processor as a Massively Parallel Platform for Real-time and Large-Scale Natural Language Processing," Proc. of COLING-92, 1992.
[Kitano, 1991] Kitano, H.
, "DMDIALOG: An Experimental Speech-to-Speech Dialogue Translation System," IEEE Computer, June, 1991.
[Kitano, 1990a] Kitano, H., "DMDIALOG: A Speech-to-Speech Dialogue Translation," Machine Translation, 5, 301-338, 1990.
[Kitano, 1990b] Kitano, H., "Designing Neural Networks using Genetic Algorithms with Graph Generation System," Complex Systems, Vol. 4, Num. 4, 1990.
[Kim and Moldovan, 1990] Kim, J. and Moldovan, D., "Parallel Classification for Knowledge Representation on SNAP," Proceedings of the 1990 International Conference on Parallel Processing, 1990.
[Kolodner, 1988] Kolodner, J., "Retrieving Events from a Case Memory: A Parallel Implementation," Proceedings of Case-Based Reasoning Workshop, 1988.
[Kolodner, 1984] Kolodner, J., Retrieval and Organizational Strategies in Conceptual Memory: A Computer Model, Lawrence Erlbaum Associates, 1984.
[Langton, 1991] Langton, C., Taylor, C., Farmer, J., and Rasmussen, S. (Eds.), Artificial Life II, Addison Wesley, 1991.
[Langton, 1989] Langton, C. (Ed.), Artificial Life, Addison Wesley, 1989.
[Lee, 1988] Lee, K. F., Large-Vocabulary Speaker-Independent Continuous Speech Recognition: The SPHINX System, CMU-CS-88-148, Ph.D. Thesis, 1988.
[Lenat and Feigenbaum, 1991] Lenat, D. B. and Feigenbaum, E., "On the thresholds of knowledge," Artificial Intelligence, 47, 185-250, 1991.
[Lenat and Guha, 1989] Lenat, D. B. and Guha, R. V., Building Large Knowledge-Based Systems, Addison Wesley, 1989.
[Lindenmayer, 1971] Lindenmayer, A., "Developmental Systems without Cellular Interactions, their Languages and Grammars," J. Theor. Biol., 30, 455-484, 1971.
[Lindenmayer, 1968] Lindenmayer, A., "Mathematical Models for Cellular Interactions in Development," J. Theor. Biol., 18, 280-299, 1968.
[Margulis, 1981] Margulis, L., Symbiosis in Cell Evolution, Freeman, 1981.
[McGaugh, 1989] McGaugh, J., "Involvement of Hormonal and Neuromodulator Systems in the Regulation of Memory Storage," Ann. Rev. Neurosci., 12:255-87, 1989.
[Mead, 1989] Mead, C., Analog VLSI and Neural Systems, Addison Wesley, 1989.
[Minsky, 1986] Minsky, M., The Society of Mind, Simon and Schuster, New York, NY, 1986.
[Moldovan et al., 1990] Moldovan, D., Lee, W., and Lin, C., SNAP: A Marker-Passing Architecture for Knowledge Processing, Technical Report PKPL 90-4, Department of Electrical Engineering Systems, University of Southern California, 1990.
[Morimoto et al., 1989] Morimoto, T., Iida, H., Kurematsu, A., Shikano, K., and Aizawa, T., "Spoken Language Translation," Proc. of InfoJapan-90, Tokyo, 1990.
[Myczkowski and Steele, 1991] Myczkowski, J. and Steele, G., "Seismic Modeling at 14 Gigaflops on the Connection Machine," Supercomputing-91, 1991.
[Nagao, 1984] Nagao, M., "A Framework of a Mechanical Translation between Japanese and English by Analogy Principle," Artificial and Human Intelligence, Elithorn, A.
and Banerji, R. (Eds.), Elsevier Science Publishers, B.V., 1984.
[NRC, 1988] National Research Council, Mapping and Sequencing the Human Genome, National Academy Press, Washington, D.C., 1988.
[Nickolls, 1990] Nickolls, J., "The Design of the MasPar MP-1: A Cost Effective Massively Parallel Computer," Proc. of COMPCON Spring 90, 1990.
[Nicolis and Prigogine, 1977] Nicolis, G. and Prigogine, I., Self-Organization in Nonequilibrium Systems, Wiley, New York, NY, 1977.
[Norvig, 1989] Norvig, P., "Marker Passing as a Weak Method for Text Inferencing," Cognitive Science, 13, 569-620, 1989.
[Pinker, 1991] Pinker, S., "Rules of Language," Science, Vol. 253, 530-535, 1991.
[Pollard and Sag, 1987] Pollard, C. and Sag, I., Information-Based Syntax and Semantics, Volume 1, CSLI Lecture Notes, 13, 1987.
[Quillian, 1968] Quillian, M. R., "Semantic Memory," Semantic Information Processing, Minsky, M. (Ed.), 216-270, The MIT Press, 1968.
[Riesbeck and Martin, 1986] Riesbeck, C. and Martin, C., "Direct Memory Access Parsing," Experience, Memory, and Reasoning, Lawrence Erlbaum Associates, 1986.
[Riesbeck and Schank, 1989] Riesbeck, C. and Schank, R., Inside Case-Based Reasoning, Lawrence Erlbaum Associates, 1989.
[Robertson, 1987] Robertson, G., "Parallel Implementation of Genetic Algorithms in a Classifier System," Davis, L. (Ed.), Genetic Algorithms and Simulated Annealing, Morgan Kaufmann Publishers, 1987.
[Rumelhart and McClelland, 1986] Rumelhart, D.
and McClelland, J., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, The MIT Press, 1986.
[Sabot et al., 1991] Sabot, G., Tennies, L., Vasilevsky, A., and Shapiro, R., "Compiler Parallelization of an Elliptic Grid Generator for 1990 Gordon Bell Prize," Supercomputing-91, 1991.
[Saito and Tomita, 1988] Saito, H. and Tomita, M., "Parsing Noisy Sentences," Proc. of COLING-88, 1988.
[Sato and Nagao, 1990] Sato, S. and Nagao, M., "Toward Memory-based Translation," Proceedings of COLING-90, 1990.
[Sato, 1992] Sato, S., Example-Based Machine Translation, Ph.D. Thesis, Kyoto University, 1992.
[Sato, 1993] Sato, S., Example-Based Translation of Technical Terms, IS-RR-93-41, Japan Advanced Institute of Science and Technology (JAIST), 1993.
[Schank, 1982] Schank, R., Dynamic Memory, MIT Press, 1982.
[Schank, 1981] Schank, R. and Riesbeck, C., Inside Computer Understanding, Lawrence Erlbaum Associates, 1981.
[Schank, 1975] Schank, R., Conceptual Information Processing, North Holland, 1975.
[Seeds, 1967] Seeds, R., "Yield and Cost Analysis of Bipolar LSI," Proc. of IEEE International Electron Devices Meeting, 1967.
[Sejnowski and Rosenberg, 1987] Sejnowski, T. and Rosenberg, C., "Parallel networks that learn to pronounce English text," Complex Systems, 1: 145-168, 1987.
[Shavlik and Towell, 1989] Shavlik, J. and Towell, G., "An approach to combining explanation-based and neural learning algorithms," Connection Science, 1:233-255, 1989.
[Shimizu et al., 1988] Shimizu, H., Yamaguchi, Y.
and Satoh, K., "Holovision: a Semantic Information Processor for Visual Perception," Dynamic Patterns in Complex Systems, Kelso, S., et al. (Eds.), World Scientific, 42-76, 1988.
[Spiessens and Bernard, 1991] Spiessens, P. and Bernard, M., "A Massively Parallel Genetic Algorithm: Implementation and First Analysis," Proc. of ICGA-91, 1991.
[Stanfill and Waltz, 1988] Stanfill, C. and Waltz, D., "The Memory-Based Reasoning Paradigm," Proceedings of the Case-Based Reasoning Workshop, DARPA, 1988.
[Stanfill and Waltz, 1986] Stanfill, C. and Waltz, D., "Toward Memory-Based Reasoning," Communications of the ACM, 1986.
[Stanfill et al., 1989] Stanfill, C., Thau, R., and Waltz, D., "A Parallel Indexed Algorithm for Information Retrieval," Proceedings of the SIG-IR-89, 1989.
[Stormon et al., 1992] Stormon, C., Troullinos, N., Saleh, E., Chavan, A., Brule, M., and Oldfield, J., "A General-Purpose CMOS Associative Processor IC and System," IEEE Micro, December, 1992.
[Sumita et al., 1993] Sumita, E., Oi, K., Iida, H., Higuchi, T., and Kitano, H., "Example-Based Machine Translation on Massively Parallel Processors," Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-93), 1993.
[Sumita and Iida, 1992] Sumita, E. and Iida, H., "Example-Based Transfer of Japanese Adnominal Particles into English," IEICE Transactions on Information and Systems, July, 1992.
[Sumita and Iida, 1991] Sumita, E. and Iida, H., "Experiments and Prospects of Example-Based Machine Translation," Proceedings of the Annual Meeting of the ACL (ACL-91), 1991.
[NOAH, 1992] Systems Research & Development Institute of Japan, A Plan for the Knowledge Archives Project, 1992.
[Towell et al., 1990] Towell, G., Shavlik, J., and Noordewier, M., "Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks," Proc. of AAAI-90, 1990.
[Taylor, 1974] Taylor, F., "Implications and extensions of the serial endosymbiosis theory of the origin of eukaryotes," Taxon, 23:229-258, 1974.
[TMC, 1991] Thinking Machines Corporation, The Connection Machine CM-5 Technical Summary, 1991.
[TMC, 1989] Thinking Machines Corporation, Model CM-2 Technical Summary, Technical Report TR89-1, 1989.
[Tonegawa, 1983] Tonegawa, S., "Somatic Generation of Antibody Diversity," Nature, Vol. 302, No. 5909, 575-581, April 14, 1983.
[Twardowski, 1990] Twardowski, K., "Implementations of a Genetic Algorithm based Associative Classifier System (ACS)," Proc. of International Conference on Tools for Artificial Intelligence, 1990.
[Waibel et al., 1991] Waibel, A., Jain, A., McNair, A., Saito, H., Hauptmann, A., and Tebelskis, J., "JANUS: A Speech-to-speech translation system using connectionist and symbolic processing strategies," IEEE Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing, April, 1991.
[Waltz, 1991] Waltz, D., "Eight Principles for Building an Intelligent Robot," Proceedings of the First International Conference on Simulation of Adaptive Behavior: From animals to animats, MIT Press, 1991.
[Waltz, 1990] Waltz, D., "Massively Parallel AI," Proceedings of the National Conference on Artificial Intelligence (AAAI-90), 1990.
[Waltz, 1988] Waltz, D., "The Prospects for Building Truly Intelligent Machines," Graubard, S. (Ed.), The Artificial Intelligence Debate, The MIT Press, 1988.
[Waltz and Pollack, 1985] Waltz, D. and Pollack, J., "Massively Parallel Parsing: A Strongly Interactive Model of Natural Language Interpretation," Cognitive Science, 9(1): 51-74, 1985.
[Watkins, 1989] Watkins, C., Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989.
[Werner and Dyer, 1991] Werner, G. and Dyer, M., "Evolution of Communication in Artificial Organisms," Artificial Life II, Addison Wesley, 1991.
[Wilson, 1987] Wilson, S., "The genetic algorithm and biological development," Proc. of International Conference on Genetic Algorithms, 1987.
[Winograd, 1972] Winograd, T., Understanding Natural Language, Academic Press, New York, 1972.
[Woods, 1970] Woods, W., "Transition network grammars for natural language analysis," Communications of the ACM, 13:10, 591-606, 1970.
[Woods et al., 1972] Woods, W., Kaplan, R., and Nash-Webber, B., The lunar sciences natural language information system: Final report, Bolt, Beranek and Newman Report No. 2378, Cambridge, Mass., 1972.
[Yasunaga et al., 1991] Yasunaga, et al., "A Self-Learning Neural Network Composed of 1152 Digital Neurons in Wafer-Scale LSIs," Proc. of the International Joint Conference on Neural Networks at Singapore (IJCNN-91), 1991.
[Zhang et al., 1988] Zhang, X., Waltz, D., and Mesirov, J., Protein Structure Prediction by Memory-based Reasoning, Thinking Machines Corporation, 1988.
[Zhang et al., 1992] Zhang, X., Mesirov, J., and Waltz, D., "Hybrid System for Protein Secondary Structure Prediction," J. Mol. Biol., 225, 1049-1063, 1992.