IEEE TRANSACTIONS ON COMPUTERS, VOL. c-27, NO. 11, NOVEMBER 1978
Processing by Data and Program Blocks

MARIO R. SCHAFFNER, MEMBER, IEEE
Abstract-A processing system is presented that implements
simultaneously the efficiency of the special-purpose processor and the
total applicability of the general-purpose computer-characteristics
commonly thought of as being mutually exclusive. This is achieved
through specializing the machine by programming the hardware
structure, rather than by adding software systems to it. Data are
organized in circulating pages which form a multiplicity of local
dynamic memories for each process. Programs are made up of
modules, each describing a transient special-purpose machine. A
characteristic of this approach is that the processes are data-driven,
rather than program-driven. The programming language presents
significant flexibility and efficiency in modeling certain classes of
problems, and it may be of interest as an implementation model in a
broader context. Applications to real-time processing of radar
signals are reported. The relevance ofcharacteristics ofthis system to
problems in multiprogramming and multiprocessing systems is
discussed.
Index Terms-Computer architecture, data-driven processing,
implementation models, microprogramming, multiprocessors,
multiprogramming, paging systems, radar signal processing, real-time signal processing, structural programming language.
I. INTRODUCTION
THE CONTEXT of this paper is real-time digital processing of large quantities of data, for both research and
operational applications.
In this context, the typical approach is to use special-purpose
digital processors that are designed to optimize the needed
performance in the given environment. The drawback of this
approach, however, is that new equipment has to be
designed and procured every time that there is a change in
processing.
Today, computer availability indicates that a more appropriate solution is to use general-purpose computers, with
specialization obtained through software. However, many
interesting processes cannot be achieved in real time by
means of affordable general-purpose computers, a fact that
makes the expectation of generality deceptive. Moreover,
the development of software systems sometimes constitutes
a large task in itself. Also, processing is frequently in the class
of pattern recognition, which typically requires a large
degree of parallelism and which often necessitates a variety
of strategies that are difficult to formulate in a single
programming language. Often, coding in machine language
becomes necessary, with consequent loss of the benefits of
the high-level user languages. One direction of investigation
Manuscript received February 20, 1977; revised March 27, 1978. This
work was supported by the National Aeronautics and Space Administration under Contracts NASr-158, NSR-09-015-033, and NASW-2276.
M. R. Schaffner was with the Department of Meteorology, Massachusetts Institute of Technology, Cambridge, MA. He is now with the
National Center for Atmospheric Research, Boulder, CO 80307.
is to find computer architectures for general-purpose machines by which the tasks of digital signal processing can be
handled efficiently [1].
We have investigated the question of whether the
efficiency of the special-purpose processor and the generality of the general-purpose computer are necessarily mutually exclusive. In the context specified above, we present a
solution to this problem of conflicting goals: specializing the
machine by programming the hardware structure, rather
than by adding software systems to it. In this solution, it
turns out that in the context of the classes of problems
referred to in this paper, the effort of programming becomes
significantly simplified.
The key to implementing such an approach is to define a
language that permits both 1) an effective and efficient
representation of the processes in terms of computational
structures and data structures, and 2) a direct implementation of these structures by means of suitable hardware.
Section II describes the frame common to the language and
to the hardware. Section III gives more details on the
hardware, and Section IV gives more details on the language, with an example of actual programming. Section V
reports about applications and about experiments which
have been conducted.
This processing solution belongs to the class of data-driven machines [2]; a brief discussion on the subject is in
Section VI. Some aspects of the solution described in Section
II are also relevant to the problems of multiprogramming
and multiprocessing systems. Most modern computers
operate with some degree of multiprogramming, ranging
from simultaneously handling different parts of a large
program, to simultaneously serving numerous users (time
sharing). A design criterion in multiprogramming is the
balance between the advantages of running other programs
when a program must wait for data or resources and the
overhead introduced by switching these programs. An overview of these topics can be found in [3]. Varying degrees of
multiprocessing are also present in most modern computers,
ranging from the simple simultaneous execution of the
different activities of a process (e.g., computation, I/O, data
handling) to the actual simultaneous computations performed by the large parallel or array computers. A recent
brief survey of multiprocessing systems can be found in [4].
The main problem in multiprocessing is programming for
an efficient use of the resources available. The scheduling of
the different processors adds complexity to the operating
system and the application software, and even necessitates
special features in the programming languages, e.g., [5].
Aspects of the solution presented that are relevant to these
topics are also discussed in Section VI.
0018-9340/78/1100-1015$00.75 (© 1978 IEEE
Fig. 1. Frame for the modeling of processes.
II. THE IMPLEMENTATION MODEL
A. The Page Discipline
A particular discipline, here referred to as page discipline,
is assumed for the data. Basically, this discipline consists of
keeping the working data sets, i.e., the data presently in use
by each process or portion of a large process, grouped
together in (small) pages which move as units in the system.
Thus, the pages constitute a number of local, dynamic
memories, one for each process or portion of a process.
No formal definition of these pages is given, but a
constructive procedure is described in the following. To help
visualize these pages, the reader can refer to the frame in Fig.
1, in which pages move as data blocks from one register
array to another. In one station, the assembler, the pages can
acquire new data; in the programmable network, the pages
transform their data; in another station, the packer, they can
route data; in the memory, the pages rest.
A random-access discipline, which obviously is needed in
certain cases, is also employed in the system, as described in
Section II-E. But it has, so to speak, an auxiliary role; it is
used only when random access is an inherent characteristic
of the process itself or when it constitutes the simplest way to
handle particular data.
B. Description of Structure
If the page discipline is used, it is no longer necessary to
decompose a process into sequences of instructions
(instruction = opcode + addresses). It is more convenient
to view a process as a (minimal) set of global transformations for the page; global transformation here means what
can be done at the present time with the data presently in the
page. For instance, if a page contains variables A, B, and C, a
global transformation at a certain point of the process may
consist of the following operations:
A ← (A + C)/2
B ← (B + C)/2
C ← C + 1
D ← 2A + B
In this data transformation we also see an example of
creating data variables in the page. Variable D did not exist
previously in the page; it has been introduced by the global
transformation. In this particular case, one of two choices
may be specified: either to execute all operations simultaneously or to perform the last operation after the others,
with a different value resulting for D.
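The difference between the two choices can be shown in a short sketch. The paper itself gives no code; the function names and the numeric page values below are inventions for illustration.

```python
# Hypothetical illustration of the two execution choices for the global
# transformation A <- (A+C)/2, B <- (B+C)/2, C <- C+1, D <- 2A+B.

def transform_simultaneous(page):
    """Every operation reads the values present at the start of the cycle."""
    A, B, C = page["A"], page["B"], page["C"]
    return {"A": (A + C) / 2, "B": (B + C) / 2, "C": C + 1, "D": 2 * A + B}

def transform_last_after(page):
    """The operation creating D runs after the others, so it sees the
    already-updated A and B, giving D a different value."""
    new = transform_simultaneous(page)
    new["D"] = 2 * new["A"] + new["B"]
    return new

page = {"A": 4, "B": 2, "C": 6}
d_sim = transform_simultaneous(page)["D"]  # 2*4 + 2 = 10   (old A, B)
d_seq = transform_last_after(page)["D"]    # 2*5.0 + 4.0 = 14.0 (new A, B)
```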
If the approach of global transformations is assumed,
since the data of the page are all available simultaneously, it
is convenient to execute these data transformations by
means of special operational structures which include the
data page, that is, a special-purpose processor for the specific
data transformation. Because there are several variables in
the page, this processor will be composed of a network of
smaller processors. Because the data transformation to be
performed changes continuously, this network needs to be
programmable. For this reason, the station where the data
transformations are performed is indicated in Fig. 1 as a
programmable network, or PN; actual implementations of
PN's are discussed in Section III. We indicate the description of such a data transformation implemented by such a
programmable network globally with the symbol F.
Before each data transformation F, we may want to add
new data to the page. A station preceding the programmable
network, indicated in Fig. 1 as the assembler, is in the best
position to do so. The sources of input data are therefore
connected to the assembler. We indicate globally with the
symbol I the prescription of new input data to be added to
the page in the assembler at the present time.
After each transformation F, we may wish to route some results elsewhere. A station subsequent to the programmable network, indicated in Fig. 1 as the packer, is devoted to this task. The output devices are therefore connected to the packer. We indicate globally with the symbol R the prescriptions of the routings to be executed at the present time in the packer.

When we delineate a page transformation as a part of a process, it is also necessary to indicate which other transformation should be next. Given the complex activity that can be included in these F's, a linear succession of F's (like a succession of instructions) is not the most frequent occurrence, at least in significant classes of problems. It is convenient, instead, to use, after each F, a transition function for establishing the transfer to one of several other data transformations F, in response either to results obtained in the page or to outside signals. Whatever the complexity of such a transition function, it will be implemented best by using the facility of the programmable network, which already exists for the data transformations F; in this case, PN is used for testing the conditions that at the present time lead to different transitions. We indicate globally with the symbol T such a transition function so implemented.

From the above we have arrived at the description of a complete processing structure symbolized in the form of a quadruplet [IFTR]. It includes input connections in the assembler (component I), a data transformation and transition function in the programmable network (components F and T), and output connections in the packer (component R). We call such a quadruplet a description of structure, or DS. The structure described by a DS is an instant structure, and to implement a process we generally need several such DS's. For instance, a process may be implemented by the sequence DS1, DS2, DS2, DS3, DS2, DS2, DS4; for such a process, the program consists of the set DS1, DS2, DS3, DS4. The mechanics of executing the process is to make a page (a data working set) circulate through the assembler, PN, packer, and memory of Fig. 1 and have a DS associated with the page at each circulation. The selection of the DS for each circulation is given by the transition functions in the DS's themselves.

C. The Control

As was said above, for producing processing activities we need to associate pages with DS's. We can think of three approaches to that. 1) An additional device calls for the program modules (DS's) and for the data blocks (pages), as needed, providing for the appropriate pairings. This is a completely general solution, but also the most expensive. 2) The program modules are given references for the pages to be called for. This approach is similar to the one used in conventional computers (data addresses). 3) The pages are given a reference for the program modules to be called for.

Data are dynamic in that their structure, location, and scope vary in time. Therefore, a reference system for data (addresses) inevitably grows in complexity quickly as soon as the activity becomes even a little articulated, a well-known problem in present computers. On the other hand, a program typically is static; outside of self-developing programs, a program module does not change shape, identity, or place. It therefore appears simpler to make reference to program modules rather than to data. In the case of program sharing, this approach does not require additional control. For all these reasons, approach 3) is taken here. Accordingly, each page is provided with a key (to be more fully described later) which refers to the DS that the page should be paired with.

Now we can describe in more detail the activity within the frame of Fig. 1. When a page arrives at the assembler, it calls for a DS, that is, a quadruplet, in response to its key. Then, the first component of the acquired quadruplet, the input prescription I, calls for new input data to be added to the page. Subsequently, the page is transferred into the programmable network, carrying components F, T, and R of the quadruplet. In the programmable network PN, the data transformation F is implemented, and then the transition function T is executed. The outcome of the transition function is precisely to determine the value of the key in the page, so that the page may possibly call for a different DS at the next circulation. When the page is transferred into the packer, the routings prescribed in R are implemented.

As will be described in more detail later, different routings can be associated with different outcomes of the transition function. Among the possible routings for an already existing page there is the insertion of a new key into circulation. When such a key arrives at the assembler, it will call for a quadruplet and thus start a new process. In this way, a page can generate another or several other pages. One of the possible outcomes of the transition functions is a key equal to zero, which means that the page disappears after the completion of the routing in the packer.

If no specific prescriptions are given, the page memory has a FIFO discipline; that is, the assembler acquires one page at a time from the page memory, in the same order in which the packer fed them into the memory. But among the possible routings by the pages there are also prescriptions for structuring the page memory, for instance, in order to switch the circulation to different page segments.

From the described behavior it can be noted that through their existence and movement the pages (data) strongly participate in the control of the activity of the entire system. Although everything is predetermined in the DS's, we can say that the processing activity here is data-driven; more precisely, it is a page-driven processing.

D. Structure-and-Data Machines

From the previous description of the circulation of the pages, it is apparent that different processing activities can occur as a consequence of different pairings of pages and DS's. These pairings in general result from the outcome of the transition functions. These outcomes depend on the results, which in turn depend also on the data that are acquired; therefore, the variety of changing activities that can be implemented with simple program means can easily be realized. In order to have a criterion for distinguishing these different activities, we introduce the notion of structure-and-data machines, or SDM's; each SDM is implemented by a page (the data) through a pattern of pairings with DS's (the structure). Denoting specific patterns of DS's by means of indexed brackets, we can write symbolically

page_n + [DS]_m → SDM_k.

Different pages which follow the same pattern m of DS's constitute different SDM's, because they use different data. Moreover, a page which previously performed an SDM can remain alive (rather than disappear) and enter a different [DS]_m for performing a new SDM, thus implementing a simple parameter passing between two SDM's. The partition into SDM's is quite arbitrary, but it is useful in that the SDM's can be associated with meaningfully distinct computations.
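The circulation discipline described above can be sketched in modern terms. The quadruplet [IFTR], the page's key, and the example program here are hypothetical constructions for illustration, not the machine's actual coding.

```python
# A minimal sketch of the circulation discipline: each DS is a
# quadruplet (I, F, T, R), and a page carries a key that selects
# the next DS at each circulation. All names are hypothetical.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class DS:
    I: Callable  # assembler: add new input data to the page
    F: Callable  # PN: transform the page's data
    T: Callable  # PN: transition function, returns the new key
    R: Callable  # packer: route results out of the page

@dataclass
class Page:
    key: int
    data: dict = field(default_factory=dict)

def circulate(page, program, outputs):
    """Run one page through the assembler/PN/packer loop until key 0."""
    while page.key != 0:
        ds = program[page.key]   # assembler: key calls for a quadruplet
        ds.I(page)               # acquire new input data
        ds.F(page)               # data transformation
        page.key = ds.T(page)    # transition determines the next DS
        ds.R(page, outputs)      # routings
    # a key of zero means the page disappears

# Example: a single DS accumulates inputs, keeps calling itself until
# the sum reaches 10, then routes the result out and ends the page.
inputs = iter([3, 4, 5])
program = {
    1: DS(I=lambda p: p.data.update(a=next(inputs)),
          F=lambda p: p.data.update(s=p.data.get("s", 0) + p.data["a"]),
          T=lambda p: 1 if p.data["s"] < 10 else 0,
          R=lambda p, out: out.append(p.data["s"]) if p.key == 0 else None),
}
out = []
circulate(Page(key=1), program, out)
print(out)  # [12]
```

Two pages run through the same `program` would constitute two distinct SDM's, since each carries its own data.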
E. Facilities
The SDM's described above are transient machines which
are created for performing a specific task. To make the work
efficient, several facilities are provided in the system. The
input and output facilities (the assembler and the packer, respectively) have already been introduced, and contribute to the pipeline structure. Two other essential facilities, the auxiliary page storage and the functional memory, are described in the following.
Exchange of data among different pages is a frequent
need. Because each page that does an activity passes through
the programmable network PN, a simple way to implement
data exchange is to provide PN with an auxiliary page storage Q'N (Fig. 1), that is, a one-to-one replica of the registers QN that hold the page in PN. When a page is in PN, it can transfer some of its data into Q'N; then some other pages can acquire or exchange those data from Q'N. The data in Q'N are available to the data transformations F, so that common values can be used by several pages; the data in Q'N are not removed during the flow of pages in PN.
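The role of the auxiliary storage can be sketched as follows; the names and the trivial transformations are hypothetical, chosen only to show one page leaving a value that a later page reads.

```python
# Hypothetical sketch of the auxiliary page storage Q'N: a replica of
# the PN registers that persists while individual pages come and go,
# so one page can leave values that later pages acquire.

Q_prime = {}   # persists across pages, unlike the page registers

def pn_pass(page, transform):
    """One pass of a page through PN; Q'N is visible to the transform."""
    transform(page, Q_prime)

# page 1 computes a common value and transfers it into Q'N
pn_pass({"A": 21}, lambda p, q: q.update(A_prime=p["A"] * 2))

# page 2, arriving later, uses the shared value in its transformation
result = {}
pn_pass(result, lambda p, q: p.update(B=q["A_prime"] + 1))
print(result["B"])  # 43
```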
Also, control information can be stored in Q'N. A very useful feature is the transmission of a key through Q'N on the part of one page to specific other pages; in these pages the new key overrides the outcome of their transition functions. In this way, it is possible for one page to direct the transition of other pages; in the programming language this is called a driven transition.

As mentioned previously, there are cases in which data have to be organized in an addressable form. For this purpose, an addressable memory is provided (the functional memory in Fig. 1), and the routings R and the input prescriptions I provide for the communication between pages and this memory. However, in most cases in which data are stored outside the page, some simple operation is actually requested, such as accumulating a sequence of data, storing the maximum (or minimum) value in a sequence of data, or counting the occurrences of a set of values (e.g., to produce a distribution). The routing prescription indicates the requested function, and the memory provides for its execution by means of a controllable, simple processor at its input; for this reason this memory is called functional. This facility frees the SDM's from many clerical tasks.

At this point, the role of the pages can be described in more detail. A page is created when a specific task is needed; with circulations through the structure shown in Fig. 1, the page keeps its working data set updated, typically changing in size. Data can be acquired from input sources, from the functional memory, and from other pages; data can be routed to output devices, to the functional memory, and to other pages. The pages provide for the computations, and any page can take control of other pages. When its task is accomplished, a page normally disappears. All such activities are accomplished by each page in response to the DS's that the page acquires in its life, following the transition functions in the DS's; therefore, no central control is needed, regardless of the number of simultaneous SDM's. Because the pages are created by other pages when and where (in the sequence of pages) they are needed, no central scheduling is necessary. Because the pages flow in a known sequence through the pipeline structure depicted in Fig. 1, synchronization results from the modeling of the process as an activity of concurrent SDM's.

At first, one might wonder whether this modeling in terms of data that circulate continuously might lead to implementations with an exceedingly high rate of data transfers and thus be inefficient. However, a more detailed analysis of the work in the frame of Fig. 1 should clarify the question: the pages are pure working sets; new data are introduced into the pages precisely in the cycle in which they are needed; results are normally routed out of the page as soon as they are produced; within PN, all data transformations are executed without references to memory. When data are not suitable for a page discipline, the addressable functional memory is available (see Section II-E). The example described in Section IV shows a simultaneous use of data managed in self-updating pages and in addressable form in the functional memory.

The implementation model described is in the same framework as Wegner's information structure models [6]. Here, the modeling of the processes is achieved by mechanizing an appropriate pairing of data structures (the pages) with processing structures (the DS's). Data and programs not only are separate structures, but also are treated independently and reside in different storage structures. Nonetheless, the union of program and data is very deep; DS's and pages are indistinguishable in the PN. The gist of this approach can be summarized as follows: the conventional computer model is based on a processor which scans a passive memory; in this model, a number of dynamic memories flow through an instantly specializable processor.

III. PHYSICAL IMPLEMENTATION

Fig. 2. Typical connectivity configuration in a computer.
Fig. 3. The programmable network of the CPL 1 machine.

The implementation model described in the previous section developed from the need to process radar signals in real time in ways that could easily be changed to follow developing research. After several special processors based on circulating words related to independent processes at the
Radio Meteor Project of the Smithsonian Astrophysical
Observatory, Cambridge, MA, and at the Weather Radar
Project of the Massachusetts Institute of Technology, the
first programmable machine of this type, called CPL 1 [7],
was constructed in about 1969 and put into operation at
the Smithsonian meteor radar station in Havana, IL, in
1970. Then the machine was brought to MIT, where it has
since been used for experiments on real-time characterizations of weather radar echoes. On the basis of these experiences, an analysis of the approach was undertaken [8], and
the construction of a more generalized machine of this type
has started. Presently the machine is at the National Center
for Atmospheric Research, Boulder, CO, for experiments in
real-time processing of radar signals.
Because the main characteristics of this type of processing machine are the page discipline, the automatic circulation of the pages, and the instant structures of the processor, we will call them Circulating Page and Structure (CPS) machines.
The architecture of a CPS machine directly follows that in
Fig. 1. The organization of the work is as described in
Section II. Most characteristic is the programmable
network, to which most of the following discussion will be
related. In conventional general-purpose machines, data are
moved individually between storage and resources by means
of addresses; Fig. 2 shows a typical configuration for this
traffic. In a CPS machine, there is the basic flow of data blocks (the pages); within the page, it is the hardware configuration that changes, rather than data that move. The PN
structure of the CPL 1 machine, shown in Fig. 3, will be used
as a concrete example for the discussion.
A page is transferred in parallel from the assembler register array Qa into the PN register array QN. (In the assembler of the CPL 1 machine, there are three replicas of array Qa, not indicated in Fig. 3, cascaded in a FIFO discipline, to mitigate the effect of the differences in the times for page, quadruplet, and input acquisition and for the execution in PN.) After the operations in PN, a page is transferred in parallel into the register array Qp of the packer. The array QN comprises registers A, B, C, and D for four variables; registers a, b, c, and d for four input data; register K for the key; and registers W for holding words w of the components F and T of the quadruplet. In the figure, registers A', B', C', D', and K' of the auxiliary array Q'N are
also shown. Each register for the variables has its own ALU
(lined boxes in Fig. 3), with one of its inputs multiplexed to
several other registers. The ALU of variable A has connections with all the data in the page; the other variables have a
direct connection with one input register, one auxiliary
register, and one neighbor variable. The connections of this
PN can be considered as constituting the minimum connectivity needed to make up a CPS machine.

Fig. 4. Types of connectivity for a programmable network.
Fig. 5. A configuration of program words in the CPL 1 machine.

With this PN, the
data available for the work of each page are the four
variables A, B, C, and D; four new input data a, b, c, and d;
the four auxiliary variables A', B', C', and D'; variables of the
previous page in the packer; and variables of the following
page in the assembler.
The selection of the operation in the ALU's and the
activation of the connections to the ALU's (the controllable
connections are indicated by circles in Fig. 3) are determined
by outputs of a unit indicated in Fig. 3 as a logic array. This
unit (a combinational circuit or a ROM) has as many
outputs as there are controllable elements in PN and has as
many inputs as there are bits in the words w, plus some
outputs from a timer. This timer is controlled by the content
of words w and provides the clock pulses to the registers; for
certain operations, the timer also produces sequential
changes of the connections and functions in the ALU's
through the logic array. In short, each word in the registers
W provides an extended horizontal microprogramming,
and the sequence of words in W provides a high-level
vertical microprogramming of PN.
The data transformation that was indicated globally with
the symbol F in Section II here has the form of a sequence of
words in the registers W, which are presented one at a time
to the input of the logic array and timer. The transition function that has been indicated with the symbol T consists of other words in W which make use of the ALU's for testing
relations among the values in the registers, and consequently
transfer new values into register K. Register K' is used for
transmitting a driven transition to other pages.
The choice of the connectivity pattern within PN is a
major design factor of a programmable network. A crossbar
solution, as sketched in Fig. 4(a), enables all the variables to
have complete independence of operation; however, it is
highly redundant, and it is not suitable for standardization
of hardware applicable to PN's of different sizes. The
solution sketched in Fig. 4(b) has a restricted connectivity,
but it is appealing for implementing standard hardware
suitable for PN's with any number of variables. The maximum number of variables that can be stored simultaneously
in PN is another design factor of a CPS machine.
Another interesting point is the efficiency and readability
that can be obtained in the microprogram words for a
programmable network. In the CPL 1 machine, the program
words are 12 bits long, corresponding to the columns of
punch cards, and are structured in different ways, depending
on the classes of operations to which they refer. As an
example, Fig. 5 shows one format for parallel operation. The
field labeled "root" specifies one of six functions; the field
labeled "specifier" indicates the argument for the prescribed
function (that is, the connections among registers to be
activated in PN); the field labeled "prefix" selects details
specific to each function (e.g., in the accumulation, whether
the result should overflow or remain at the maximum
value); and the field labeled "suffix" specifies the variables
(A, B, C, D) that perform the operation. Two unused values
of the root indicate other word structures: one for unary
operations and the other for special functions that involve
several variables. Using this technique, the majority of the
4096 bit combinations of these 12-bit words can be used for
meaningful configurations of the programmable network.
What is more important, the user does not need to look in a
dictionary of 4096 entries; instead he can compose these
words in terms of meaningful small tables that, with a little
practice, can even be memorized. The logical next step is to
have automatic generation of these words from conventional mnemonic expressions. The experience with the
CPL 1 machine suggests that microprogramming can be
made fully available to the user.
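The composition of such program words from small field tables can be sketched as follows. The field widths here are assumptions for illustration: the paper specifies only the 12-bit total, so a 3-bit root is assumed (accommodating the six functions plus the two escape values) and a 4-bit suffix (one flag per variable A, B, C, D), leaving the prefix and specifier widths invented.

```python
# Sketch of packing a CPL 1-style 12-bit program word from its
# prefix / root / specifier / suffix fields (hypothetical widths).

PREFIX_BITS, ROOT_BITS, SPEC_BITS, SUFFIX_BITS = 2, 3, 3, 4  # total: 12

def encode_word(prefix, root, specifier, suffix):
    """Compose one 12-bit word (one punch-card column) from its fields."""
    assert prefix < 2**PREFIX_BITS and root < 2**ROOT_BITS
    assert specifier < 2**SPEC_BITS and suffix < 2**SUFFIX_BITS
    word = prefix
    word = (word << ROOT_BITS) | root
    word = (word << SPEC_BITS) | specifier
    word = (word << SUFFIX_BITS) | suffix
    return word

def decode_word(word):
    """Recover (prefix, root, specifier, suffix) from a 12-bit word."""
    suffix = word & (2**SUFFIX_BITS - 1); word >>= SUFFIX_BITS
    specifier = word & (2**SPEC_BITS - 1); word >>= SPEC_BITS
    root = word & (2**ROOT_BITS - 1); word >>= ROOT_BITS
    return word, root, specifier, suffix

# suffix 0b1010 selects variables A and C (one bit per variable)
w = encode_word(prefix=1, root=4, specifier=2, suffix=0b1010)
assert w < 4096 and decode_word(w) == (1, 4, 2, 0b1010)
```

The point of the field structure survives the invented widths: a user composes a word from a few small tables rather than looking up one of 4096 entries.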
In the CPL 1 machine, the page memory has a FIFO
discipline in terms of pages. The circulation of the pages and
the allocation of their words in the registers of the page
arrays Q proceed automatically. A simple supervisor program establishes various modes for the page circulation,
such as asynchronous, synchronized with a radar, or controlled by time signals from a digital clock.
IV. THE PROGRAMMING LANGUAGE
By programming language we mean here the machine
language of the CPS machine. However, because of the
organization of this machine and its working level, this
machine language also constitutes a suitable user language
for certain classes of processes, such as those described in
Section V. There is also a more general point: this machine
language was developed for expressing varieties of recognition strategies which could be executed in real time, and it
can be developed directly by those who are conceiving the
experiments. Undoubtedly, these circumstances have
resulted in effectiveness and conciseness of expression in this
context. In this sense, the language might constitute an
interesting implementation model for features of programming languages in a broader context.
This machine language is modular in terms of DS's; every
program is a set of DS's related to each other by transition
functions or by data transfers. The external structure of a set
of DS's related by transition functions is operatively similar
to that of the finite-state machines defined in automata
theory; for this reason, we will refer to the DS's also as states
and we will view the programs as state diagrams whenever it
is useful to do so. Because of this functional similarity, some
of these external structures can have uses similar to those of
the finite-state machines, such as for recognition of patterns
or memorization of past events. Abundant use of this
property has been made in the applications reported in
Section V.
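The finite-state use of a set of DS's can be suggested with a small sketch: each state's transition function maps results (here, input symbols) to the key of the next state, with key 0 ending the page's life. The pattern recognized and the coding are hypothetical.

```python
# Hypothetical sketch of DS's used as states of a finite-state machine,
# here recognizing the occurrence of the pattern "ab" in a symbol stream.
# Each entry maps a state (key) to its transition function.

transitions = {
    1: lambda sym: 2 if sym == "a" else 1,   # state 1: waiting for 'a'
    2: lambda sym: 0 if sym == "b" else      # state 2: saw 'a'; 'b' ends it
       (2 if sym == "a" else 1),
}

def recognizes(sequence):
    state = 1
    for sym in sequence:
        state = transitions[state](sym)
        if state == 0:        # key 0: pattern found, the page's task is done
            return True
    return False

print(recognizes("xaab"))  # True
print(recognizes("ba"))    # False
```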
In contrast to the hierarchical abstract machines defined
in automata theory, these state structures have a text within
the states-the quadruplet-which permits us to change the
characteristics of the machine at each state; for instance, at
each state, I and R select subsets of the input and output
alphabets. Moreover, a set of data (the page), possibly
renewed at each discrete time, is associated with the states;
that is, these machines have memory. The text in each state
may have a description of data transformation; thus, computation can be provided when needed. As a matter of fact,
when only straight computation is required, a program may
consist of a single state with an F made up of a long sequence
of subexpressions similar to a conventional listing. In effect,
if primarily the functional memory is used, a CPS machine
can simulate the activity of a conventional computer. On the
other hand, if the page memory is considered as a closed-loop tape system, a CPS machine can be reduced to a
multitape or multicell (and consequently, a linear tape)
Turing machine, making the usual assumption that additional memory is provided whenever needed, e.g., [9].
From the above we see that in this programming language
characteristics of abstract automata and of conventional
programming languages are equally available and embedded in each other. Probably, the possibility of using these
different frames simultaneously for modeling the processes
is an important reason for the peculiar effectiveness of this
language that has been experienced in several classes of
problems.
The state structure mentioned above, the modularity
given by the quadruplets, and the several years of experiments with the CPL 1 machine have suggested the representation of these programs in a particular form of state
diagram. In the present context, the state diagrams are the
working representation for the user, and they also assume
the role of reference representation. The actual coding for
the machine is done as the last operation, and because of the
typical one-to-one correspondence between elements of the
state diagrams and machine codes, it does not constitute a
significant phase. Given this context, there is much less
lexical constraint in the state diagrams than in conventional
programs; indeed, the modeling of a process should be
developed with notation-independent thinking. The chosen
graphical structure of the state diagram provides much of
the syntax.
In order to interpret the example given below, some
notations used to make up these state diagrams are
described (a full description can be found in [8]). States are
represented as encircled domains (see Fig. 8). The data
transformations F and the transition functions T are indicated inside these domains, above and below a horizontal
line, respectively. The outcomes of the transition functions
are indicated by arrows leading to the states.
[Fig. 6. Notations used for the state diagrams. The graphical symbols cannot be reproduced here; their meanings are: Pg (k): generation of a new page in state k; ST k: driven transition to state k for the following pages; stopover transition: the path followed by the page is predetermined (regardless of the transition functions in the states visited); the page circulates n times idle, before entering state k (the dot indicates the path chosen when the test is true); the page stays in state k for n times, before following the transition function of state k; the succession of states (or cycles in the same state) occurs without the page leaving PN; the page disappears.]

For the data transformations F, self-explanatory notations are used: capital letters A, B, C ... indicate variables in
the page; the same letters with a prime indicate the corresponding variables in the auxiliary storage K'N; capital
letters M, N, 0 ... indicate variables in the functional
memory. Operations are executed in the sequence indicated
by the succession of the expressions. Parallel execution is
indicated with a single infix expression comprising several
variables and several arguments in corresponding order
(e.g., if indexes i and i + 1 denote the values before and after
the execution of an operation, respectively, the expression
ABC ← sAB means A_{i+1} ← s, B_{i+1} ← A_i, C_{i+1} ← B_i).
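This parallel-execution convention corresponds to simultaneous (tuple) assignment: all right-hand values are read before any left-hand variable is written. As a sketch in Python (the function name is invented), the expression ABC ← sAB behaves like:

```python
# Simultaneous assignment: every right-hand value is evaluated before
# any left-hand variable is updated, matching ABC <- sAB, i.e.
# A(i+1) <- s, B(i+1) <- A(i), C(i+1) <- B(i).
def step(s, A, B, C):
    A, B, C = s, A, B
    return A, B, C
```

A sequential rendering (A = s; B = A; C = B) would propagate the new values instead of the old ones, which is precisely what the parallel notation avoids.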
The input prescriptions I comprise numerical constants
and identifiers; the actual values of the identified quantities
are those that were available from the source at the time in
which the page was in the assembler (e.g., the present radar
sample s in the above example of parallel expression). But
there is no need to spell out input prescriptions I in the state
diagrams; the appearance of input data in the expressions of
F and T is sufficient for documenting a program.
Routings are indicated beside the encircled domains,
when they are state-dependent (i.e., the routing is executed
every time a page is in that state). Routings are indicated
beside an arrow, when they are transition-dependent (i.e.,
they are executed only if the transition function selects the
path indicated by that arrow). These output features correspond to those which in automata theory are assigned to the
Moore and Mealy machines, respectively. When a routing
has a name of a variable without further specification,
the present value of that variable is routed to an output
buffer which performs with a FIFO discipline. When routing
is to the functional memory, the name of the variable in
the page is followed by an arrow and by the name of
the variable in the functional memory, with the function
indicated in parentheses. Some specific notations and graphical means used in the example are described in Fig. 6.
[Fig. 7. Area selected for a real-time processing of radar signals (axes: range r, azimuth z).]
The example to be described is a real-time processing of
radar signals, from the applications to be discussed in
Section V. This example is chosen because, while simple to
follow, it gives an opportunity for seeing how the efficiency
of the special-purpose processor and the generality of the
general-purpose computer are achieved simultaneously.
The environment is depicted in Fig. 7; range r and azimuth z
are considered at discrete, equidistant values, i and j,
respectively, to form a grid of i, j points. For each of these
points, an estimate of the echo intensity xi,j is computed as
the mean of consecutive digital samples s (weather echoes
are highly fluctuating) obtained from the radar output
during the antenna rotation of one j unit (say 32 consecutive
pulses per one j increment):
x_{i,j} = (1/32) Σ_{t=(j-1)·32}^{j·32} s_i     (1)
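This block averaging can be sketched offline in a few lines; the function name and the flat sample list abstracting the radar interface are assumptions of this sketch, not part of the CPS mechanization:

```python
def mean_echo(samples, pulses_per_step=32):
    """Average consecutive blocks of samples at one range i, giving one
    intensity estimate per azimuth increment j (cf. expression (1))."""
    return [sum(samples[j:j + pulses_per_step]) / pulses_per_step
            for j in range(0, len(samples) - pulses_per_step + 1,
                           pulses_per_step)]
```

For instance, 32 pulses of intensity 1 followed by 32 pulses of intensity 3 yield the two estimates 1.0 and 3.0.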
We want to characterize in a single antenna passage the precipitation in a specific region, such as the sector a in Fig. 7, delimited by ranges R1 and R2 and azimuths Z1 and Z2. The characterization considered here consists of three
[Fig. 8. State diagram of a process for real-time characterization of weather echoes, comprising a control page (states 1-7) and computation pages (states 8-11).]
global parameters: the area A covered by echoes of intensity above a given value h, the mean echo intensity M in that area, and the mean gradient G in the echo pattern of that area. These three parameters are approximated with the following algorithms:

n_{i,j} = 1 if x_{i,j} > h;  n_{i,j} = 0 otherwise

A = Σ_j Σ_i n_{i,j}

M = (1/A) Σ_j Σ_i n_{i,j} x_{i,j}

G = (1/A) Σ_j Σ_i [ |x_{i,j} - x_{i-1,j}| + |x_{i,j} - x_{i,j-1}| ]     (2)

with the differences computed only if either of the two x is > h.

There is no preliminary storage of data in the memory of the computer. Rather, the memory is structured in the form of pages, according to the arriving data, so that the processing can be accomplished on the spot, with a minimum of resources and time. In this program, two types of SDM's can be delineated, as in Fig. 8: one, comprising states 1-7, implemented by one page, which has the role of general control, and the other, comprising states 8-11, implemented by a number of pages performing the computation at each point. The functional memory is also used. The input data used by this process are digital echo samples s, current digital values of the azimuth z and range r, and constants such as Z1, Z2, R1, R2, and h. For the sake of description, we divide the process into the three following phases.

Initialization Phase

Before the radar beam crosses the region of interest (the lined area in Fig. 7), the operator activates a page in state 1, locked to range R1; that is, the page transfers from the assembler to PN when r has the value R1, so that this page will always correspond to range R1. In this way, the processing machine is synchronized with the radar; the pages make one circulation per radar period, with the first page working precisely when the echo from range R1 has arrived. In state 1, this page simply acquires the present value z of the radar azimuth and checks whether it is equal to Z1. When this is the case (which means we are at point P0 in Fig. 7: the page is at azimuth Z1 and range R1), the page transfers to state 3, and routes a key for a page in state 2. This page produces a sequence of pages in state 8, filling the range interval R1 - R2, and then disappears.

At each circulation, the consecutive pages in state 8, each at a specific range, accumulate the samples s of the radar echo in variable A. After 32 idle circulations, the control page is in state 3 and produces a double driven transition, so that the pages in state 8 stop over to state 9 and then return to state 8. In this way, a first mean value is computed and stored in variable B of these pages, which will have the role of the x_{i,j-1} in expression (2), and variable A starts a new accumulation of samples s. At the same time, the control page starts an accumulation of samples s in variable A and clears variables M, N, and Q in the functional memory.

Continuous Computation Phase

This phase lasts from azimuth Z1 to azimuth Z2, and the pages continuously repeat a cycle of computation 32 circulations long. The control page stays in state 4 for 31 circulations and in state 5 for one circulation, while the
computation pages follow a pattern through states 8, 9, 10,
and 11. The computation simultaneously comprises the
preparation of the next values of xi,j in variables A, the
holding of the most recent values xi,j in variables B,
the holding of the xi,j-1 in variables C for as long as is
necessary, and the computation of the differences among x
that appear in expression (2). The following description of
the states involved explains the mechanization adopted.
State 4: Variable A of the control page accumulates the
successive samples s from the range associated with the page.
The purpose of this computation in the control page is to
provide the xi-1,j for the first of the following computation
pages.
State 5: The accumulation performed in state 4 is divided
by 32, thus producing a mean-echo value x. Then there is a
triple transfer of data: the computed x is transferred from A
to C' (in the auxiliary storage); A and B are given the present
sample s and the current azimuth z, respectively. In this way,
variable A can start a new accumulation, and the test on B
performed by the transition function of this state can
determine when the region of interest (the lined area in Fig.
7) has passed. Moreover, a driven transition to state 9 is
produced for the following pages, in order to synchronize
each cycle of computation for the entire page set.
State 8: When in state 8, all computation pages simply
accumulate the current echo sample s into variable A.
State 9: The content of A is divided by 32 to obtain the x
values. A triple transfer occurs: the previous xi,j in B, which
has now become xi,j-1, is transferred into C; the x just
computed in A, which now becomes xi,j, is transferred into
B; and A is initialized with the present sample, so as to be
ready for a new accumulation. At this point, the transition
function acts: if B (value xi,j) is larger than h, the page
transfers to state 11; if not, but C (value xi,j-1) is larger than
h, the page stops over in state 11 and then goes to state 10;
otherwise, the page stops over in state 8 (for timing reasons)
and goes to state 10. Only in the case of the first of the above
transitions (case of xi,j > h) is a routing made: the value of
B (xi,j) is accumulated into variable M of the functional
memory, and variable N of the functional memory is
incremented by one. In this way, M is accumulating the x for
the computation of M, and N is computing A.
State 10: First, the present sample is added to variable A,
in order not to interrupt the accumulation for the new x.
Second, variable C copies the value in variable B (which is
xi,j). Third, C interchanges values with C' in the auxiliary
storage; because all pages make this interchange in the same
circulation, a shift of content among the consecutive pages
occurs in C. That is, in each page, C comes to contain xi-1,j.
Finally, if C is > h, the page stops over in state 11 and then
goes to state 8; otherwise it goes directly to state 8.
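The net effect of this interchange, performed by every page in the same circulation, is a one-position shift of x values along the range axis. A toy rendering (a sequential Python sketch; the dictionaries, names, and the single auxiliary cell standing in for the auxiliary storage are all assumptions):

```python
def circulate_shift(pages):
    """Each page copies B into C, then swaps C with the single auxiliary
    cell c_aux; the net effect is that page i ends up holding the value
    deposited by page i-1 (cf. the description of State 10)."""
    c_aux = None                               # auxiliary storage C'
    for page in pages:                         # pages pass through in range order
        page["C"] = page["B"]                  # C <- x(i,j)
        page["C"], c_aux = c_aux, page["C"]    # C <-> C'
    return pages
```

The first page receives the initial content of the auxiliary cell; every later page receives its predecessor's x, which is exactly the xi-1,j needed for the gradient differences.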
State 11: This state provides for the computation of the
absolute value of the differences between two x; and it is
reached only when either of the two x involved is greater
than h, as can be determined from the transition functions of
states 9 and 10. First the present sample s is added to A,
again so as not to interrupt the accumulation; then the
difference is computed; and finally, by routing, the value of
this difference is accumulated into variable Q -of the functional memory. The transition function of this state (which
acts only when the state is reached not in stopover) is an
unconditional first stopover to state 10, a second stopover to
state 11, and a final transfer to state 8. It can easily be
checked that when state 11 is reached from state 9, the
computed difference is |xi,j - xi,j-1|; when it is reached
from state 10, the difference is |xi,j - xi-1,j|. The simple
feature of the stopover transition allows this real-time
computation to be executed quickly and with few resources.
Output Phase
When the current azimuth z has surpassed the value Z2,
the control page transfers from state 5 to state 6 instead of
state 4. In state 6, the variables of the page acquire the four
delimiting coordinates and route them to the output. Then
the control page remains idle for four circulations (to allow
the other pages to finish their computations) and goes to
state 7. In state 7, the present content of variables M, N, and
Q in the functional memory is acquired by variables A, B,
and C, respectively, of the page. The content of B and C is
divided by the content in A, and then routed to the output.
These routed values are the covered area A, the mean
intensity M, and the mean gradient G of expressions (2).
Finally, a record of the data routed to the output is commanded (to a tape recorder or printer), a driven transition to
state 0 (disappearing) is produced for the other pages, and
the control page itself disappears. Each record automatically contains, as heading data, the elapsed time, the date,
and program numbers. A variant of states 6 and 7 transfers
the control page again to state 1, for a preset number of radar
antenna rotations or for a given time interval, in order to
produce a longer record containing the time evolution of
parameters A, M, and G, computed in the lined area of Fig. 7,
at every antenna rotation.
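For reference, the three parameters of expressions (2) can be recomputed offline from a stored grid of mean intensities x; the following plain-Python cross-check is a sketch of the arithmetic only (names invented), not of the CPS mechanization, which never stores the grid:

```python
def characterize(x, h):
    """Area A, mean intensity M, and mean gradient G of the echo
    pattern x[i][j], counting only points above threshold h and
    differences where either neighbor exceeds h (cf. expressions (2))."""
    A = sum(1 for row in x for v in row if v > h)
    M = sum(v for row in x for v in row if v > h) / A
    G = 0.0
    for i in range(len(x)):
        for j in range(len(x[0])):
            if i > 0 and (x[i][j] > h or x[i - 1][j] > h):
                G += abs(x[i][j] - x[i - 1][j])   # range difference
            if j > 0 and (x[i][j] > h or x[i][j - 1] > h):
                G += abs(x[i][j] - x[i][j - 1])   # azimuth difference
    return A, M, G / A
```

On the tiny grid [[0, 5], [5, 5]] with h = 1, three points exceed the threshold, so A = 3, M = 5.0, and G = 10/3.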
This simple program also shows how real-time processing
can be set directly by the operator for a given environment.
After it has been set, the activity proceeds automatically,
depositing the results in the output device and causing the
memory structure created for the process to disappear, thus
leaving the machine clean for other processing, without the
need for garbage collection.
V. APPLICATIONS
As indicated previously, this approach developed from
the need for real-time processing of radar signals, and it is in
this context that it has been applied. In [10] there is a
description of a program for recognizing the echoes of faint
meteors (which have an intensity in the range of the noise)
and recording their rates, by echo energy and duration. In
[10] there is also a first comparison with corresponding
programs written in different languages. In [11] and [12] an
account is given about real-time characterizations made of
weather echo patterns.
When conventional general-purpose computers are used
for real-time applications of this type, programs have to be
prepared in machine language, thus losing the facilities
offered by the high-level languages. Another difficulty is that
standard computers cannot manage the large quantity of
data produced by the radar in real time, so that special
preprocessing units need to be added. The result is that the
generality inherent in the programming language and the
computational capability inherent in the computer cannot,
in practice, be put to full use. It is precisely in this context
that the approach described gives the greatest benefit. In
regard to the first point, the machine language of the CPS
system appears, for this type of application, to be at a
comparable level to that of high-level languages, and it
certainly has more flexibility. In regard to the second point,
the dynamic evolution of the pages and the level of activity
performed by a page in each circulation solve, in a natural
way, the problems of high-speed data handling.
Because of these facilities, the feasibility of real-time
processing is extended. In the case of processing radar
signals, computations can be made on the entire set of raw
data, thus using the full resolution and information given by
the radar. As an example, a family of processes, one program
of which is described in Section IV, has made it possible to
ascertain the strong correlation that typically exists between
the mean gradient in a weather echo pattern and the type of
precipitation (such as snow, stratiform rain, or convective
showers) associated with the echo pattern [12]. Also, it has
been possible to experiment with a variety of algorithms for
real-time discrimination of ground echoes, based either on
the morphological characteristics of the echo pattern or on
the characteristics of the pulse-to-pulse fluctuation [11],
[12].
One of the most helpful features of this system in signal
processing experiments is the possibility of observing the
several variables during the execution of the processing
itself. The packer, as indicated in Fig. 1, has an output
toward the memory to which all the words of all pages are
transmitted sequentially; the packer also has synchronization signals suitable for selecting specific words in a specific
page or over an entire sequence of pages. Therefore, a simple
display permits one to observe not only the evolution of the
pages, but also the values (either in digital or analog form) of
any quantity involved, during the actual execution of an
experiment. Such a facility is also of great value in debugging
complex processes, a fact that in turn makes it possible to
experiment with a large variety of processing techniques,
within a given amount of effort.
Exploratory experiments have also been conducted with
various other classes of processes. The structure of a CPS
machine has similarities with that of digital differential
analyzers [13], and therefore the programming and execution are efficient in those classes of problems in which
differential analyzers have been used. Taking advantage of
the existence of several variables in the page and of the direct
communication between adjacent pages, experiments have
been performed on numerical models, by associating a page
with each grid point. The flexibility of the page structure has
also suggested experiments in sorting algorithms. Perhaps
the most interesting exercises have been in implementing
recursive functions taking advantage of the facts that pages
can be created locally and can exchange data and that the
key gives each page an independent status.
VI. DISCUSSION
As pointed out in Section II-C, in a CPS machine,
processing is page-driven. This associates the CPS system
with data-flow computers and languages. A bibliography on
the works in this field can be found in [2]. A most active
group has been the one directed by Jack Dennis at the
Laboratory for Computer Science of MIT. The differences
and similarities between the system described in this paper
and Dennis' work can be traced from the different starting
objectives.
Dennis' group studies a general procedure for automatically achieving highly parallel execution of programs.
Machines are being designed that execute instructions at the
arrival of their operands, so that parallelism could be
exploited on a global basis, regardless of the original
structure of the programs. The needed machine representation of programs, basically a directed graph at the operand
level, would be generated by translation programs from the
application programs written in conventional textual languages [14]. In this approach, some overhead is expected, as
well as some constraints in consequence of the automatic
translation.
In the system described here, as mentioned in Section III,
the objective has been to make the machine execute the
processes in a form as close as possible to the process
representation conceived by the user, in order to facilitate
the development of new processes and the real-time interaction between user and machine. In the context in question
there is a facility for thinking in terms of paths followed by
the data of each task [8]; correspondingly, it turned out that
this machine is data-driven or, more exactly, page-driven. In
this approach, the machine needs a high degree of flexibility
(Fig. 1 shows the solution presented), and a clear visualization of the processes is required on the part of the
programmer.
In both approaches the machine language programs are a
direct encoding of programs expressed in an intermediate
language; these languages have differences. In the data-flow
machines, data are funneled into the processing units
asynchronously and relatively independently of the
processes' structure; there is no notion of sequential
control. In the CPS machine, data transformations are
executed on working data sets (the pages), in the sequence
established during the modeling of the processes. The
configuration of the network of processors (PN) and the
organization of the pages adapt themselves to the processes.
The management of the working data sets (the pages) is
automatically established when the state diagrams are
delineated. This paper does not address the subject of
generating these state diagrams from programs written in
textual languages.
In the following, some preliminary comments and elaborations are made on certain aspects of the CPS machine
which have relevance to multiprogramming and multiprocessing systems in general. The issues are only mentioned
here, and a more detailed discussion is postponed to the time
when results from the new machine will become available.
The characteristic that is most readily apparent in a CPS
system is its suitability for multiprogramming. In effect, the
work of the machine itself is a continuous multiprogramming through the implementation of numbers of
SDM's. Each change of page in PN is equivalent to an
interrupt with automatic preservation of the status of the
processes involved. Because programs are composed of
self-sufficient modules (the quadruplets), the complete
status of a process is rendered by a single word, the key. It
appears that a key with an extensible format would be
sufficient for carrying all auxiliary information that may be
attached to the key in a complex system. This is considerably
less than the 25-100 words used by present computers to
hold the process status [15]. The basic reason for such an
efficient representation of the process status is that the page
discipline eliminates specific memory references within the
computation described by a quadruplet. Further contribution to this efficiency is due to the pipeline structure of the
assembler and the packer, which provide independently for
input and output under information within the quadruplet.
Moreover, the key is part of the page. Therefore, increasing the number of simultaneous programs does not increase
the complexity of the machine; the number of programs is
limited only by the capacities of the page and program
storage. A natural extension of the machine described in this
paper is the partition of the page memory into several
segments which circulate at different times, under controls
routed by pages working with a supervisory role. However,
again because of the modularity of the programs, this
extension does not increase the complexity of the process
status. In summary, given the almost nonexistent overhead
of program switching, it appears that aspects of the CPS
approach are of interest for time-sharing systems.
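As a toy illustration of how compact such a status word can be, a state number and a circulation counter can be packed into a single integer key. The field layout below is invented for the sketch, not the CPS key format:

```python
STATE_BITS = 8  # illustrative field width, not the CPS layout

def make_key(state, count):
    """Pack the complete process status into one word:
    the current state number plus an auxiliary counter."""
    return (count << STATE_BITS) | state

def split_key(key):
    """Recover state and counter from the packed key."""
    return key & ((1 << STATE_BITS) - 1), key >> STATE_BITS
```

Switching processes then amounts to reading a different key, with no block of status words to save and restore.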
Two forms of parallelism can be seen in the CPS system.
The first is the straight parallel execution of simultaneous
and independent operations on different variables in PN.
The number of these variables is small, and this form of
parallelism matches quite well the small parallelism that is
present in almost all computations. Of interest is the fact
that the same hardware can be used in different
configurations for specialized functions when there is no
parallelism at all in the computation. No less interesting is
the fact that no specific detection of parallelism is needed;
when a computation is being modeled in the form of an
SDM, the small parallelism of this kind becomes evident in a
very natural way.
The second form of parallelism is given by the multiplicity
of pages. In effect, the pages can be viewed as virtual replicas
of PN. A sequence of pages can perform the same or different
processes, depending on the keys they contain; a page
sequence can be any size within the capacity of the memory;
and the number of pages can change dynamically without
introducing overhead. From an external viewpoint, this
form of processing is equivalent either to sequential proces-
IEEE TRANSACTIONS ON COMPUTERS, VOL.
c-27, NO. 1l,
NOVEMBER
1978
sing or to parallel processing, depending on the speed of
execution in respect to the rate of data input and output
needed by the environment. Because of the fast execution
permitted by the PN and the direct acquisition of new inputs
and production of output at each cycle, we expect that the
processing will be equivalent to parallel processing in very
demanding situations also. For instance, the set of pages in
the right part of Fig. 8 perform their operations virtually
simultaneously in regard to the radar, the operator, and the
program.
It has been noted that special-algorithm processors can
outperform a stored program by a factor of 10-100 [15]. This
is because there are fewer memory references and fewer
separate instructions to be executed. However, although
present technology makes this approach practical, no general
applications have yet materialized. The programmable
network of the CPS machine is suitable for various degrees
of hardware specialization, especially with the inclusion of
look-up tables made convenient by present ROM's; and,
more important, the page discipline (making a number of
operands available simultaneously) permits an efficient
insertion of special-algorithm executions at any point in the
programs. These facts suggest the following consideration.
The time T needed for the execution of a program can be
expressed, in a first approximation, as the product of the
number N of instructions to be executed and the mean
execution time tm of these instructions:
T = N × tm.
Great attention has always been given to reducing tm. It
appears that equal attention has not been given to reducing
N. The programmable network and the page discipline of
the CPS system are effective in reducing both N and tm.
Comparing certain classes of programs run in the CPL 1
machine with equivalent programs for conventional computers, the execution time is found to be reduced by from one
to two orders of magnitude. A similar reduction is found in
the number of machine-program bytes, and this reduction is
also due to the concise modeling of a process in the SDM
form.
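The combined effect of reducing N and tm is multiplicative, which is where the order-of-magnitude gains come from. A worked example with invented figures (nanosecond units chosen only for the arithmetic):

```python
# Execution time T = N * tm: reducing the instruction count N pays off
# exactly as much as reducing the mean instruction time tm.
# All figures below are hypothetical.
N_conv, tm_conv = 100_000, 1000   # conventional: 100k steps at 1000 ns
N_cps,  tm_cps  =  10_000,  500   # fewer, faster steps

speedup = (N_conv * tm_conv) / (N_cps * tm_cps)
```

Here a tenfold reduction in N combined with a twofold reduction in tm yields a twentyfold speedup.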
To systematize the various computer architectures, a
classification in terms of instruction and data streams has
been introduced [16], so that problems such as contention of
resources, synchronization, task splitting, and scheduling
can be put in perspective [17], [3]. From this viewpoint, we
see that the CPS machine shares some characteristics of the
single-instruction-single-data (SISD) architectures and of
the multiple-instruction-multiple-data (MIMD) architectures. In the sense that there are single streams of pages from
the memory, of quadruplets from the program storage, and
of input data from the environment, the machine enjoys the
automatic synchronization, the absence of contentions, and
the simplicity of programming of the SISD machines. In the
sense that each page contains numerous data, that PN is a
number of processors which can also perform independent
parallel tasks, and that the page organization can change
during processing, the machine has capabilities of the
MIMD architectures. This double aspect of the CPS machine has been made possible by the page discipline and the
modularity of DS's. That is, the primitives here are no longer
the instruction and the operand, but rather the instant
processor (the DS) and a related data block (the page). A
further point of interest is that task splitting becomes a
natural feature of the program design when a process is
modeled in the form of interacting dynamically generated
SDM's.
The CPS system permits recursion locally, whenever it is
needed. Every page can generate another page (as
exemplified in state 1 of Fig. 8) and the new page can be in
the same state as the generating page; in turn, each newly
generated page can generate a further page, in response to
specific conditions. Data can be transferred directly between
adjacent pages, and through Q'N between distant pages. The
capability of generating new pages locally can be used for
several purposes; an obvious one is to hold a process
needing a specific computation or the outcome of a recognition, and to generate pages locally for the execution of those
computations or recognitions. A different application that
deserves to be explored is the generation of new pages for the
local interpretation and execution of transformations F
expressed in a higher level language.
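By way of analogy only (a sequential Python sketch, not the machine mechanism; all names invented), recursion through locally generated pages can be pictured as a worklist in which each page spawns a successor and partial results flow back between adjacent pages:

```python
def factorial_by_pages(n):
    """Each 'page' carries its argument as a key; a page with argument
    k > 1 generates a page for k - 1. A simple list plays the role of
    the circulating page memory, and results flow back on the way out."""
    pages = [n]                         # worklist of spawned pages
    while pages[-1] > 1:
        pages.append(pages[-1] - 1)     # generate a new page locally
    result = 1
    while pages:                        # results passed between adjacent pages
        result *= pages.pop()
    return result
```

The sketch only conveys the shape of the idea: page creation stands in for the recursive call, and the data exchange between adjacent pages stands in for the return.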
Programs derived from conventional programming languages show a tendency to use small subsets of code and
data for significant periods of time. This fact led to the
development of paging techniques, by which it is possible to
execute, even with a small main memory, several programs
which were each written for a much larger virtual memory
[18]. Fundamental for these techniques are the models of
program behavior [19] and the algorithms for optimal
replacement of the pages in the main memory. It has been
argued [20] that locality is affected by the design and
structure of the programs. In particular, locality would be
enhanced by a modularity of the programs, especially with a
maximum functional autonomy given to the modules. In
this respect, it can be noted that the programs in the system
described in this paper are modular in terms of DS's, which
typically correspond to fairly autonomous small processes.
Moreover, the related data sets (the pages) are, so to speak,
automatically constructed and managed by the processes
themselves, as described in Section II.
Many other considerations of various natures are
triggered by characteristics of the programming language of
this machine, but they are beyond the scope of this paper.VII. SUMMARY
A signal processing system has been presented that permits the simultaneous attainment of the efficiency of special-purpose processors and the total applicability of
general-purpose computers, characteristics normally
thought of as being mutually exclusive. Basically, the
approach consists of specializing the machine by programming the hardware structure, rather than by adding software
systems to it. To implement such an approach, data are
organized in a particular page discipline, resulting in a
multiplicity of data blocks which constitute independent
and dynamic individual memories for the processes. Computation is performed in a programmable network of
processors in terms of program blocks, each of which can be
viewed as the description of a special-purpose machine. The
control is given to data rather than to program.
Because of the flexibility of this system, the machine
programming language exhibits an interesting high-level
structure and it can be applied directly as a suitable user
language for certain classes of problems. The feasibility of a
significant correspondence between the user's modeling of a
process and the actual machine execution is shown. The
structure of the language used in this context may be of
interest as an implementation model for programming
languages of a larger scope.
The use of a programmable network of processors and of
a functional memory provides examples of solutions to
known problems in multiprocessor systems. The programming of the processor network gives an example of extensive
horizontal and vertical microprogramming. The organization of the data and the modularity of programs used in this
system permit effective program switching, which may be
of interest for multiprogram systems. Results are shown in
real-time processing of radar signals.
ACKNOWLEDGMENT
I wish to thank Prof. J. Allen for the discussions on several
topics and the suggestions given in regard to this paper.
REFERENCES
[1] J. Allen, "Computer architecture for signal processing," Proc. IEEE, vol. 63, pp. 624-633, Apr. 1975.
[2] "Workshop on Data Flow Computer and Program Organization," D. P. Misunas, Ed., Comput. Archit. News, ACM SIGARCH, vol. 6, no. 4, Oct. 1977.
[3] "Special Issue on Operating Systems," Computer, vol. 9, no. 10, Oct. 1976.
[4] J. L. Baer, "Multiprocessing systems," IEEE Trans. Comput., vol. C-25, pp. 1271-1277, Dec. 1976.
[5] D. H. Lawrie, T. Layman, D. Baer, and J. M. Randal, "Glypnir-A programming language for Illiac IV," Commun. Assoc. Comput. Mach., vol. 18, pp. 157-164, Mar. 1975.
[6] P. Wegner, "Data structure models for programming languages," in Proc. Symp. on Data Structures in Programming Languages, SIGPLAN Notices, vol. 6, no. 2, pp. 1-54, Feb. 1971.
[7] M. R. Schaffner, "A computer modeled after an automaton," in Computers and Automata. Brooklyn, NY: Polytechnic Press, 1971, pp. 635-650.
[8] M. R. Schaffner, "Research study of a self-organizing computer," Final Rep., Contr. NASW-2276 (NASA), July 1974.
[9] M. A. Arbib, Theories of Abstract Automata. Englewood Cliffs, NJ: Prentice-Hall, 1969.
[10] M. R. Schaffner, "Computers formed by the problems rather than problems deformed by the computers," COMCON Dig., pp. 259-264, 1972.
[11] M. R. Schaffner, "Comments on 'Applications of radar to meteorological operations and research'," Proc. IEEE, vol. 63, pp. 731-733, Apr. 1975.
[12] M. R. Schaffner, "On the characterization of weather radar echoes," in Prepr. 17th Radar Meteorology Conf., Amer. Meteorol. Soc., Boston, MA, 1976, pp. 474-485.
[13] T. R. H. Sizer, The Digital Differential Analyzer. London: Chapman and Hall, 1968.
[14] J. B. Dennis, D. P. Misunas, and C. K. Keung, "A highly parallel processor using a data flow machine language," Computation Structures Group Memo 134, Laboratory for Computer Science, M.I.T., Cambridge, MA, Jan. 1977.
[15] C. G. Bell and A. Newell, Computer Structures: Readings and Examples. New York: McGraw-Hill, 1971.
[16] M. J. Flynn, "Very high-speed computing systems," Proc. IEEE, vol. 54, pp. 1901-1909, Dec. 1966.
[17] M. J. Flynn, "Some computer organizations and their effectiveness," IEEE Trans. Comput., vol. C-21, pp. 948-960, Sept. 1972.
[18] C. A. R. Hoare and R. M. McKeag, "A survey of store management techniques," in Operating System Techniques, C. A. R. Hoare and R. H. Perrott, Eds. London: Academic, 1972, pp. 117-151.
[19] P. J. Denning, "The working set model for program behavior," Commun. Assoc. Comput. Mach., vol. 11, pp. 323-333, May 1968.
[20] P. J. Courtois and H. Vantilborgh, "A decomposable model of program paging behaviour," Acta Informatica, vol. 6, pp. 251-275, 1976.
Mario R. Schaffner (A'69-M'72) received the Dr. degree in electrical engineering from the University of Pisa, Pisa, Italy, in 1948.
From 1948 to 1961, he worked at the Microwave Center of the Italian National Research Council, at the Magneti Marelli Company, and at the FACE Standard Company. From 1957 to 1960, he also taught radar engineering at the Italian Air Force Academy. In 1961, he came to the United States with a NATO fellowship, and has worked at the Massachusetts Institute of Technology, the Harvard College Observatory, and the Smithsonian Astrophysical Observatory. He has also served as a consultant for Raytheon and IBM. His major activity has been in the development of processing systems for research projects. He is presently at the Advanced Study Program of the National Center for Atmospheric Research, Boulder, CO.
A Method to Simplify a Boolean Function into a Near Minimal Sum-of-Products for Programmable Logic Arrays

ZOSIMO AREVALO, MEMBER, IEEE, AND JON G. BREDESON, SENIOR MEMBER, IEEE
Abstract-This paper describes an algorithm for minimizing an arbitrary Boolean function. The approach differs from most previous procedures, in which first all prime implicants are found and then a minimal set is determined. This procedure imposes a set of conditions on the selection of the next prime implicant in order to obtain a near minimal sum-of-products realization. Extension to the multiple output and incompletely specified function cases is given. An important characteristic of the proposed procedure is the relatively small amount of computer time spent to solve a problem, as compared to other procedures. The MINI algorithm may give better results for a large number of inputs and outputs if relatively few product terms are needed. This procedure is also well suited to finding a solution for programmable logic arrays (PLA's), which internally implement large Boolean functions as a sum-of-products.

Index Terms-Large-scale functions, multiple output combinational circuits, near minimal sum-of-products, programmable logic arrays (PLA's).

INTRODUCTION

IT IS A well known fact that the problem of minimization of Boolean switching functions using a two-level AND/OR network has been divided into two parts: 1) the determination of all the prime implicants (PI's) and 2) the selection of those PI's which minimally cover the given function [1], [2]. Several papers have treated the problem as a whole [3]-[5], and this paper also handles the problem in a similar way. Nevertheless, it differs from previous work in the selection processes for sequentially finding the PI's until a covering is obtained. In the present paper, both the decimal and the logic representations of the minterms are used for convenience, and it should be easy to see which is being used.

A high level of interest in finding near minimal two-level networks for large functions has occurred since the introduction of programmable logic arrays (PLA's). PLA's are LSI circuits that are mask programmable (some field programmable versions also exist) by the PLA supplier. PLA's implement multiple output combinational circuits in a sum-of-products form. National Semiconductors' DM 7575 (also DM 8575, DM 7576, and DM 8576) PLA can realize any Boolean function with up to 14 input variables and 8 output functions that require no more than 96 product terms. The actual sum-of-products form which determines the custom mask must be specified by the customer. Most PLA's can also complement any or all outputs, which effectively means the product of sums form can be implemented. Therefore, the cost for a PLA will not be a function of gate inputs, but a fixed cost if no more than 96 product terms are needed.

Manuscript received August 19, 1976; revised October 31, 1977 and March 30, 1978.
Z. Arevalo is with the Department of Electronic Engineering, Universidad Distrital, Bogota, Colombia, S.A.
J. G. Bredeson is with the Department of Electrical Engineering, The Pennsylvania State University, University Park, PA 16802.
0018-9340/78/1100-1028$00.75 © 1978 IEEE
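The classical two-phase decomposition mentioned in the introduction can be fixed in the mind with a small sketch. What follows is the standard Quine-McCluskey merge for phase 1 and a greedy stand-in for the phase-2 covering problem; it is shown only for illustration and is not the authors' sequential selection procedure, which chooses PI's one at a time under covering conditions.

```python
# Phase 1: generate all prime implicants by repeatedly merging implicants
# (value, mask) that differ in exactly one literal; mask bit 1 marks a
# don't-care position. Phase 2: greedily cover the minterms with PI's.
from itertools import combinations

def prime_implicants(minterms):
    terms = {(m, 0) for m in minterms}
    primes = set()
    while terms:
        merged, nxt = set(), set()
        for (v1, m1), (v2, m2) in combinations(sorted(terms), 2):
            diff = v1 ^ v2
            if m1 == m2 and bin(diff).count("1") == 1:
                nxt.add((v1 & ~diff, m1 | diff))   # merged term, new don't-care
                merged |= {(v1, m1), (v2, m2)}
        primes |= terms - merged                   # unmerged terms are prime
        terms = nxt
    return primes

def covers(pi, minterm):
    value, mask = pi
    return (minterm & ~mask) == value

def greedy_cover(primes, minterms):
    """Greedy substitute for exact covering: pick the PI covering the most
    still-uncovered minterms until every minterm is covered."""
    uncovered, chosen = set(minterms), []
    while uncovered:
        best = max(primes,
                   key=lambda p: sum(1 for m in uncovered if covers(p, m)))
        chosen.append(best)
        uncovered -= {m for m in uncovered if covers(best, m)}
    return chosen
```

For the two-variable function with minterms {0, 1, 2}, phase 1 yields the two prime implicants (0, 1) and (0, 2) (i.e., a' and b'), and the greedy cover must select both. The greedy step is where near-minimal procedures such as the one in this paper differ, since exact covering is expensive for large functions.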