Processing by Data and Program Blocks

MARIO R. SCHAFFNER, MEMBER, IEEE

Abstract-A processing system is presented that implements simultaneously the efficiency of the special-purpose processor and the total applicability of the general-purpose computer, characteristics commonly thought of as being mutually exclusive. This is achieved by specializing the machine through programming the hardware structure, rather than by adding software systems to it. Data are organized in circulating pages which form a multiplicity of local dynamic memories, one for each process. Programs are made up of modules, each describing a transient special-purpose machine. A characteristic of this approach is that the processes are data-driven, rather than program-driven. The programming language offers significant flexibility and efficiency in modeling certain classes of problems, and it may be of interest as an implementation model in a broader context. Applications to real-time processing of radar signals are reported. The relevance of characteristics of this system to problems in multiprogramming and multiprocessing systems is discussed.

Index Terms-Computer architecture, data-driven processing, implementation models, microprogramming, multiprocessors, multiprogramming, paging systems, radar signal processing, real-time signal processing, structural programming language.

I. INTRODUCTION

THE CONTEXT of this paper is real-time digital processing of large quantities of data, for both research and operational applications. In this context, the typical approach is to use special-purpose digital processors that are designed to optimize the needed performance in the given environment. The drawback of this approach, however, is that new equipment has to be designed and procured every time there is a change in processing. The availability of computers today suggests that a more appropriate solution is to use general-purpose computers, with specialization obtained through software. However, many interesting processes cannot be achieved in real time by means of affordable general-purpose computers, a fact that makes the expectation of generality deceptive. Moreover, the development of software systems sometimes constitutes a large task in itself. Also, the processing is frequently in the class of pattern recognition, which typically requires a large degree of parallelism and which often necessitates a variety of strategies that are difficult to formulate in a single programming language. Often, coding in machine language becomes necessary, with a consequent loss of the benefits of high-level user languages. One direction of investigation is to find computer architectures for general-purpose machines by which the tasks of digital signal processing can be handled efficiently [1]. We have investigated the question of whether the efficiency of the special-purpose processor and the generality of the general-purpose computer are necessarily mutually exclusive.

Manuscript received February 20, 1977; revised March 27, 1978. This work was supported by the National Aeronautics and Space Administration under Contracts NASr-158, NSR-09-015-033, and NASW-2276. M. R. Schaffner was with the Department of Meteorology, Massachusetts Institute of Technology, Cambridge, MA. He is now with the National Center for Atmospheric Research, Boulder, CO 80307.
In the context specified above, we present a solution to this problem of conflicting goals: specializing the machine by programming the hardware structure, rather than by adding software systems to it. In this solution, it turns out that for the classes of problems referred to in this paper, the effort of programming becomes significantly simplified. The key to implementing such an approach is to define a language that permits both 1) an effective and efficient representation of the processes in terms of computational structures and data structures, and 2) a direct implementation of these structures by means of suitable hardware. Section II describes the frame common to the language and to the hardware. Section III gives more details on the hardware, and Section IV gives more details on the language, with an example of actual programming. Section V reports on applications and on experiments which have been conducted. This processing solution belongs to the class of data-driven machines [2]; a brief discussion on the subject is in Section VI.

Some aspects of the solution described in Section II are also relevant to the problems of multiprogramming and multiprocessing systems. Most modern computers operate with some degree of multiprogramming, ranging from simultaneously handling different parts of a large program to simultaneously serving numerous users (time sharing). A design criterion in multiprogramming is the balance between the advantages of running other programs when a program must wait for data or resources and the overhead introduced by switching these programs. An overview of these topics can be found in [3]. Varying degrees of multiprocessing are also present in most modern computers, ranging from the simple simultaneous execution of the different activities of a process (e.g., computation, I/O, data handling) to the actual simultaneous computations performed by the large parallel or array computers. A recent brief survey of multiprocessing systems can be found in [4]. The main problem in multiprocessing is programming for an efficient use of the resources available. The scheduling of the different processors adds complexity to the operating system and the application software, and even necessitates special features in the programming languages, e.g., [5]. Aspects of the solution presented that are relevant to these topics are also discussed in Section VI.

Fig. 1. Frame for the modeling of processes.

II. THE IMPLEMENTATION MODEL

A. The Page Discipline

A particular discipline, here referred to as the page discipline, is assumed for the data. Basically, this discipline consists of keeping the working data sets, i.e., the data presently in use by each process or portion of a large process, grouped together in (small) pages which move as units in the system. Thus, the pages constitute a number of local, dynamic memories, one for each process or portion of a process. No formal definition of these pages is given, but a constructive procedure is described in the following. To help visualize these pages, the reader can refer to the frame in Fig. 1, in which pages move as data blocks from one register array to another. In one station, the assembler, the pages can acquire new data; in the programmable network, the pages transform their data; in another station, the packer, they can route data; in the memory, the pages rest.
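To make the page discipline concrete, the following minimal Python sketch (not from the paper; all names are illustrative assumptions) models a page as a small working set of named variables carried together with a key, and the page memory as a FIFO of such pages, in the spirit of the frame of Fig. 1.

    from collections import deque
    from dataclasses import dataclass, field

    @dataclass
    class Page:
        """A circulating working data set: a few named variables plus a key.
        The key (Section II-C) names the description of structure (DS) that
        the page should be paired with on its next circulation."""
        key: int
        vars: dict = field(default_factory=dict)

    # The page memory of Fig. 1: pages rest here between circulations,
    # and are acquired one at a time in FIFO order by the assembler.
    page_memory = deque()
    page_memory.append(Page(key=1, vars={"A": 0.0, "B": 0.0, "C": 0.0}))
    page_memory.append(Page(key=2, vars={"A": 5.0}))

    # One circulation visits, in order: assembler -> programmable network
    # -> packer -> memory (the four stations of Fig. 1).
    print(page_memory.popleft())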
A random-access discipline, which obviously is needed in certain cases, is also employed in the system, as described in Section II-E. But it has, so to speak, an auxiliary role; it is used only when random access is an inherent characteristic of the process itself or when it constitutes the simplest way to handle particular data.

B. Description of Structure

If the page discipline is used, it is no longer necessary to decompose a process into sequences of instructions (instruction = opcode + addresses). It is more convenient to view a process as a (minimal) set of global transformations for the page; a global transformation here means what can be done at the present time with the data presently in the page. For instance, if a page contains variables A, B, and C, a global transformation at a certain point of the process may consist of the following operations:

A ← (A + C)/2,   C ← C + 1,   B ← (B + C)/2,   D ← 2A + B

In this data transformation we also see an example of creating data variables in the page. Variable D did not exist previously in the page; it has been introduced by the global transformation. In this particular case, one of two choices may be specified: either to execute all operations simultaneously or to perform the last operation after the others, with a different value resulting for D.

If the approach of global transformations is assumed, since the data of the page are all available simultaneously, it is convenient to execute these data transformations by means of special operational structures which include the data page, that is, a special-purpose processor for the specific data transformation. Because there are several variables in the page, this processor will be composed of a network of smaller processors. Because the data transformation to be performed changes continuously, this network needs to be programmable. For this reason, the station where the data transformations are performed is indicated in Fig. 1 as a programmable network, or PN; actual implementations of PN's are discussed in Section III. We indicate the description of such a data transformation implemented by such a programmable network globally with the symbol F.

Before each data transformation F, we may want to add new data to the page. A station preceding the programmable network, indicated in Fig. 1 as the assembler, is in the best position to do so. The sources of input data are therefore connected to the assembler. We indicate globally with the symbol I the prescription of new input data to be added to the page in the assembler at the present time.

After each transformation F, we may wish to route some results elsewhere. A station subsequent to the programmable network, indicated in Fig. 1 as the packer, is devoted to this task. The output devices are therefore connected to the packer. We indicate globally with the symbol R the prescriptions of the routings to be executed at the present time in the packer.

When we delineate a page transformation as a part of a process, it is also necessary to indicate which other transformation should come next. Given the complex activity that can be included in these F's, a linear succession of F's (like a succession of instructions) is not the most frequent occurrence, at least in significant classes of problems. It is convenient, instead, to use, after each F, a transition function for establishing the transfer to one of several other data transformations F, in response either to results obtained in the page or to outside signals. Whatever the complexity of such a transition function, it is best implemented by using the facility of the programmable network, which already exists for the data transformations F; in this case, PN is used for testing the conditions that at the present time lead to different transitions. We indicate globally with the symbol T such a transition function so implemented.

From the above we have arrived at the description of a complete processing structure, symbolized in the form of a quadruplet [IFTR]. It includes input connections in the assembler (component I), a data transformation and a transition function in the programmable network (components F and T), and output connections in the packer (component R). We call such a quadruplet a description of structure, or DS. The structure described by a DS is an instant structure, and to implement a process we generally need several such DS's. For instance, a process may be implemented by the sequence DS1, DS2, DS2, DS3, DS2, DS2, DS4; for such a process, the program consists of the set DS1, DS2, DS3, DS4. The mechanics of executing the process is to make a page (a data working set) circulate through the assembler, PN, packer, and memory of Fig. 1 and to have a DS associated with the page at each circulation. The selection of the DS for each circulation is given by the transition functions in the DS's themselves.
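As an illustration of the quadruplet notation, the following Python sketch (an assumption-laden illustration, not the paper's machine representation) writes a DS as four components I, F, T, and R, with the global transformation of the example above as F, executed "simultaneously" by evaluating all right-hand sides on the values present on entry.

    from typing import Callable, NamedTuple

    class DS(NamedTuple):
        """A description of structure [IFTR]: input prescription, data
        transformation, transition function, and routing prescription."""
        I: Callable[[dict], None]   # add new input data to the page
        F: Callable[[dict], None]   # global transformation of the page
        T: Callable[[dict], int]    # returns the next key
        R: Callable[[dict], None]   # route results out of the page

    def f_example(page: dict) -> None:
        # All operations use the values present on entry (simultaneous form);
        # D is created by the transformation.
        a, b, c = page["A"], page["B"], page["C"]
        page.update(A=(a + c) / 2, C=c + 1, B=(b + c) / 2, D=2 * a + b)

    ds_example = DS(I=lambda p: None, F=f_example,
                    T=lambda p: 0,                       # key 0: the page disappears
                    R=lambda p: print("route:", p.get("D")))

    page = {"A": 1.0, "B": 2.0, "C": 3.0}
    ds_example.F(page)
    print(page)   # D computed from the pre-transformation A and B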
C. The Control

As was said above, for producing processing activities we need to associate pages with DS's. We can think of three approaches to that. 1) An additional device calls for the program modules (DS's) and for the data blocks (pages), as needed, providing for the appropriate pairings. This is a completely general solution, but also the most expensive. 2) The program modules are given references for the pages to be called for. This approach is similar to the one used in conventional computers (data addresses). 3) The pages are given a reference for the program modules to be called for. Data are dynamic in that their structure, location, and scope vary in time. Therefore, a reference system for data (addresses) inevitably grows quickly in complexity as soon as the activity becomes even a little articulated, a well-known problem in present computers. On the other hand, a program typically is static; outside of self-developing programs, a program module does not change shape, identity, or place. It therefore appears simpler to make reference to program modules rather than to data. In the case of program sharing, this approach does not require additional control. For all these reasons, approach 3) is taken here. Accordingly, each page is provided with a key (to be more fully described later) which refers to the DS that the page should be paired with.

Now we can describe in more detail the activity within the frame of Fig. 1. When a page arrives at the assembler, it calls for a DS, that is, a quadruplet, in response to its key. Then the first component of the acquired quadruplet, the input prescription I, calls for new input data to be added to the page. Subsequently, the page is transferred into the programmable network, carrying components F, T, and R of the quadruplet. In the programmable network PN, the data transformation F is implemented, and then the transition function T is executed. The outcome of the transition function is precisely to determine the value of the key in the page, so that the page may possibly call for a different DS at the next circulation. When the page is transferred into the packer, the routings prescribed in R are implemented. As will be described in more detail later, different routings can be associated with different outcomes of the transition function. Among the possible routings for an already existing page there is the insertion of a new key into circulation. When such a key arrives at the assembler, it will call for a quadruplet and thus start a new process. In this way, a page can generate one or several other pages. One of the possible outcomes of the transition functions is a key equal to zero, which means that the page disappears after the completion of the routing in the packer. If no specific prescriptions are given, the page memory has a FIFO discipline; that is, the assembler acquires one page at a time from the page memory, in the same order in which the packer fed them into the memory. But among the possible routings by the pages there are also prescriptions for structuring the page memory, for instance, in order to switch the circulation to different page segments.

From the described behavior it can be noted that, through their existence and movement, the pages (data) strongly participate in the control of the activity of the entire system. Although everything is predetermined in the DS's, we can say that the processing activity here is data-driven; more precisely, it is page-driven processing.
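Pulling the two sketches above together, the circulation just described can be caricatured as a small interpreter loop: the assembler pairs each page with the DS named by its key, I and F act on the page, T computes the next key, R routes results, and a key of zero removes the page. This is only a behavioral sketch under the stated assumptions, not the hardware mechanism.

    from collections import deque

    # A toy program: keys mapped to quadruplets (I, F, T, R), each a function
    # of the page's variable dictionary; T returns the next key.
    program = {
        1: dict(I=lambda p, inp: p.update(a=next(inp)),     # acquire input a
                F=lambda p: p.update(A=p["A"] + p["a"]),     # accumulate
                T=lambda p: 1 if p["A"] < 10 else 0,         # stay or disappear
                R=lambda p: print("routed A =", p["A"])),
    }

    def run(program, pages, inputs):
        memory = deque(pages)                 # FIFO page memory
        while memory:
            page = memory.popleft()           # assembler acquires a page
            ds = program[page["key"]]         # pairing by key
            ds["I"](page, inputs)             # input prescription
            ds["F"](page)                     # data transformation in PN
            page["key"] = ds["T"](page)       # transition function sets the key
            ds["R"](page)                     # routings in the packer
            if page["key"] != 0:              # key 0: the page disappears
                memory.append(page)           # otherwise back to the memory

    run(program, [{"key": 1, "A": 0}], iter([3, 4, 5, 6]))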
D. Structure-and-Data Machines

From the previous description of the circulation of the pages, it is apparent that different processing activities can occur as a consequence of different pairings of pages and DS's. These pairings in general result from the outcomes of the transition functions. These outcomes depend on the results, which in turn depend also on the data that are acquired; therefore, a variety of changing activities can easily be realized with simple program means. In order to have a criterion for distinguishing these different activities, we introduce the notion of structure-and-data machines, or SDM's; each SDM is implemented by a page (the data) through a pattern of pairings with DS's (the structure). Denoting specific patterns of DS's by means of indexed brackets, we can write symbolically

page_n + [DS]_m → SDM_k.

Different pages which follow the same pattern m of DS's constitute different SDM's, because they use different data. Moreover, a page which previously performed an SDM can remain alive (rather than disappear) and enter a different [DS]_m for performing a new SDM, thus implementing a simple parameter passing between two SDM's. The partition into SDM's is quite arbitrary, but it is useful in that the SDM's can be associated with meaningfully distinct computations.
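As a toy illustration of this notion (again an assumption-laden sketch in the style of the loop above), two pages that traverse the same pattern of DS's with different data realize two distinct SDM's, and a surviving page that enters another pattern carries its variables with it as parameters.

    def run_pattern(page, pattern):
        """Apply a fixed pattern (list) of page transformations to one page."""
        for f in pattern:
            f(page)
        return page

    # One pattern of DS's, abstracted here to their data transformations F.
    pattern_m = [lambda p: p.update(A=p["A"] * 2),
                 lambda p: p.update(A=p["A"] + p["B"])]

    page_1 = {"A": 1, "B": 10}   # page_1 + [DS]_m -> one SDM
    page_2 = {"A": 3, "B": -1}   # page_2 + [DS]_m -> a different SDM
    print(run_pattern(page_1, pattern_m), run_pattern(page_2, pattern_m))

    # The surviving page_1 now enters a different pattern; its variables act
    # as parameters passed between the two SDM's.
    pattern_p = [lambda p: p.update(C=p["A"] ** 2)]
    print(run_pattern(page_1, pattern_p))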
E. Facilities

The SDM's described above are transient machines which are created for performing a specific task. To make the work efficient, several facilities are provided in the system. The input and output facilities (the assembler and the packer, respectively) have already been introduced, and contribute to the pipeline structure. Two other essential facilities, the auxiliary page storage and the functional memory, are described in the following.

Exchange of data among different pages is a frequent need. Because each page that does an activity passes through the programmable network PN, a simple way to implement data exchange is to provide PN with an auxiliary page storage Q'N (Fig. 1), that is, a one-to-one replica of the registers QN that hold the page in PN. When a page is in PN, it can transfer some of its data into Q'N; then other pages can acquire or exchange those data from Q'N. The data in Q'N are available to the data transformations F, so that common values can be used by several pages; the data in Q'N are not removed during the flow of pages in PN. Also, control information can be stored in Q'N. A very useful feature is the transmission of a key through Q'N by one page to specific other pages; in these pages the new key overrides the outcome of their transition functions. In this way, it is possible for one page to direct the transitions of other pages; in the programming language this is called a driven transition.

As mentioned previously, there are cases in which data have to be organized in an addressable form. For this purpose, an addressable memory is provided (the functional memory in Fig. 1), and the routings R and the input prescriptions I provide for the communication between pages and this memory. However, in most cases in which data are stored outside the page, some simple operation is actually requested, such as accumulating a sequence of data, storing the maximum (or minimum) value in a sequence of data, or counting the occurrences of a set of values (e.g., to produce a distribution). The routing prescription indicates the requested function, and the memory provides for its execution by means of a controllable, simple processor at its input; for this reason this memory is called functional. This facility frees the SDM's from many clerical tasks.
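The functional memory's behavior, as described, can be mimicked by a small class in which each routing names the reduction to be applied at the memory's input; this is a behavioral sketch only (the operation names and interface are assumptions, not the machine's actual repertoire).

    class FunctionalMemory:
        """Addressable cells with a simple processor at the input: a routed
        value is combined with the cell according to a requested function."""
        def __init__(self):
            self.cells = {}

        def route(self, name, value, func="store"):
            old = self.cells.get(name)
            if func == "acc":                      # accumulate a sequence
                self.cells[name] = (old or 0) + value
            elif func == "max":                    # keep the maximum seen
                self.cells[name] = value if old is None else max(old, value)
            elif func == "count":                  # count occurrences
                self.cells[name] = (old or 0) + 1
            else:                                  # plain store
                self.cells[name] = value

        def read(self, name):
            return self.cells.get(name)

    fm = FunctionalMemory()
    for s in [3, 7, 2]:
        fm.route("M", s, "acc")     # e.g., accumulating echo samples
        fm.route("N", s, "count")   # e.g., counting routed values
    print(fm.read("M"), fm.read("N"))   # 12 3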
At this point, the role of the pages can be described in more detail. A page is created when a specific task is needed; with circulations through the structure shown in Fig. 1, the page keeps its working data set updated, typically changing in size. Data can be acquired from input sources, from the functional memory, and from other pages; data can be routed to output devices, to the functional memory, and to other pages. The pages provide for the computations, and any page can take control of other pages. When its task is accomplished, a page normally disappears. All such activities are accomplished by each page in response to the DS's that the page acquires in its life, following the transition functions in the DS's; therefore, no central control is needed, regardless of the number of simultaneous SDM's. Because the pages are created by other pages when and where (in the sequence of pages) they are needed, no central scheduling is necessary. Because the pages flow in a known sequence through the pipeline structure depicted in Fig. 1, synchronization results from the modeling of the process as an activity of concurrent SDM's.

At first, one might wonder whether this modeling in terms of data that circulate continuously might lead to implementations with an exceedingly high rate of data transfers and thus be inefficient. However, a more detailed analysis of the work in the frame of Fig. 1 should clarify the question: the pages are pure working sets; new data are introduced into the pages precisely in the cycle in which they are needed; results are normally routed out of the page as soon as they are produced; and, within PN, all data transformations are executed without references to memory. When data are not suitable for a page discipline, the addressable functional memory is available (see Section II-E). The example described in Section IV shows a simultaneous use of data managed in self-updating pages and in addressable form in the functional memory.

The implementation model described is in the same framework as Wegner's information structure models [6]. Here, the modeling of the processes is achieved by mechanizing an appropriate pairing of data structures (the pages) with processing structures (the DS's). Data and programs not only are separate structures, but also are treated independently and reside in different storage structures. Nonetheless, the union of program and data is very deep; DS's and pages are indistinguishable in the PN. The gist of this approach can be summarized as follows: the conventional computer model is based on a processor which scans a passive memory; in this model, a number of dynamic memories flow through an instantly specializable processor.

III. PHYSICAL IMPLEMENTATION

The implementation model described in the previous section developed from the need to process radar signals in real time in ways that could easily be changed to follow developing research. After several special processors based on circulating words related to independent processes, built at the Radio Meteor Project of the Smithsonian Astrophysical Observatory, Cambridge, MA, and at the Weather Radar Project of the Massachusetts Institute of Technology, the first programmable machine of this type, called CPL 1 [7], was constructed in about 1969 and put into operation at the Smithsonian meteor radar station in Havana, IL, in 1970. Then the machine was brought to MIT, where it has since been used for experiments on real-time characterizations of weather radar echoes. On the basis of these experiences, an analysis of the approach was undertaken [8], and the construction of a more generalized machine of this type has started. Presently the machine is at the National Center for Atmospheric Research, Boulder, CO, for experiments in real-time processing of radar signals.

Fig. 2. Typical connectivity configuration in a computer.
Fig. 3. The programmable network of the CPL 1 machine.

Because the main characteristics of this type of processing machine are the page discipline, the automatic circulation of the pages, and the instant structures of the processor, we will call these machines Circulating Page and Structure (CPS) machines. The architecture of a CPS machine directly follows that in Fig. 1. The organization of the work is as described in Section II. Most characteristic is the programmable network, to which most of the following discussion will be related. In conventional general-purpose machines, data are moved individually between storage and resources by means of addresses; Fig. 2 shows a typical configuration for this traffic. In a CPS machine, there is the basic flow of data blocks (the pages); and within the page, it is the hardware configuration that changes, rather than data that move. The PN structure of the CPL 1 machine, shown in Fig. 3, will be used as a concrete example for the discussion. A page is transferred in parallel from the assembler register array Qa into the PN register array QN.
(In the assembler of the CPL 1 machine, there are three replicas of array Qa, not indicated in Fig. 3, cascaded in a FIFO discipline, to mitigate the effect of the differences in the times for page, quadruplet, and input acquisition and for the execution in PN.) After the operations in PN, a page is transferred in parallel into the register array Qp of the packer. The array QN comprises registers A, B, C, and D for four variables; registers a, b, c, and d for four input data; register K for the key; and registers W for holding words w of the components F and T of the quadruplet. In the figure, registers A', B', C', D', and K' of the auxiliary array Q'N are also shown. Each register for the variables has its own ALU (lined boxes in Fig. 3), with one of each ALU's inputs multiplexed to several other registers. The ALU of variable A has connections with all the data in the page; the other variables have a direct connection with one input register, one auxiliary register, and one neighbor variable. The connections of this PN can be considered as constituting the minimum connectivity needed to make up a CPS machine. With this PN, the data available for the work of each page are the four variables A, B, C, and D; four new input data a, b, c, and d; the four auxiliary variables A', B', C', and D'; the variables of the previous page in the packer; and the variables of the following page in the assembler.

Fig. 4. Types of connectivity for a programmable network: (a) crossbar; (b) restricted connectivity.
Fig. 5. A configuration of program words in the CPL 1 machine (fields: prefix, root, specifier, suffix).

The selection of the operations in the ALU's and the activation of the connections to the ALU's (the controllable connections are indicated by circles in Fig. 3) are determined by the outputs of a unit indicated in Fig. 3 as a logic array. This unit (a combinational circuit or a ROM) has as many outputs as there are controllable elements in PN and as many inputs as there are bits in the words w, plus some outputs from a timer. This timer is controlled by the content of the words w and provides the clock pulses to the registers; for certain operations, the timer also produces sequential changes of the connections and functions in the ALU's through the logic array. In short, each word in the registers W provides an extended horizontal microprogramming, and the sequence of words in W provides a high-level vertical microprogramming of PN. The data transformation that was indicated globally with the symbol F in Section II here has the form of a sequence of words in the registers W, which are presented one at a time to the input of the logic array and timer. The transition function that was indicated with the symbol T consists of other words in W which make use of the ALU's for testing relations among the values in the registers, and consequently transfer new values into register K. Register K' is used for transmitting a driven transition to other pages.
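To illustrate how a word w presented to the logic array can configure the network, the following sketch treats the logic array as a lookup table from word values to register-transfer settings and applies them to the variable registers. The word values, field meanings, and operations here are invented for illustration and do not reproduce the CPL 1 encoding.

    import operator

    # Hypothetical logic array: each microword selects, for each variable
    # register, a source connection and an ALU operation for one PN step.
    LOGIC_ARRAY = {
        0o01: {"A": ("a", operator.add)},            # A <- A + a (new input)
        0o02: {"B": ("A", operator.add),             # B <- B + A
               "C": ("c", lambda x, y: y)},          # C <- c (load input)
    }

    def pn_step(regs, word):
        """Apply one microword: all transfers use the register values
        present at the start of the step (parallel, 'horizontal' control)."""
        old = dict(regs)
        for dst, (src, op) in LOGIC_ARRAY[word].items():
            regs[dst] = op(old[dst], old[src])
        return regs

    regs = {"A": 1, "B": 2, "C": 0, "D": 0, "a": 10, "c": 7}
    for w in (0o01, 0o02):     # the sequence of words in W: vertical control
        pn_step(regs, w)
    print(regs)                # A = 11, B = 13, C = 7, ...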
The choice of the connectivity pattern within PN is a major design factor of a programmable network. A crossbar solution, as sketched in Fig. 4(a), enables all the variables to have complete independence of operation; however, it is highly redundant, and it is not suitable for standardization of hardware applicable to PN's of different sizes. The solution sketched in Fig. 4(b) has a restricted connectivity, but it is appealing for implementing standard hardware suitable for PN's with any number of variables. The maximum number of variables that can be stored simultaneously in PN is another design factor of a CPS machine.

Another interesting point is the efficiency and readability that can be obtained in the microprogram words for a programmable network. In the CPL 1 machine, the program words are 12 bits long, corresponding to the columns of punch cards, and are structured in different ways, depending on the classes of operations to which they refer. As an example, Fig. 5 shows one format for parallel operation. The field labeled "root" specifies one of six functions; the field labeled "specifier" indicates the argument for the prescribed function (that is, the connections among registers to be activated in PN); the field labeled "prefix" selects details specific to each function (e.g., in the accumulation, whether the result should overflow or remain at the maximum value); and the field labeled "suffix" specifies the variables (A, B, C, D) that perform the operation. Two unused values of the root indicate other word structures: one for unary operations and the other for special functions that involve several variables. Using this technique, the majority of the 4096 bit combinations of these 12-bit words can be used for meaningful configurations of the programmable network. What is more important, the user does not need to look in a dictionary of 4096 entries; instead, he can compose these words in terms of meaningful small tables that, with a little practice, can even be memorized. The logical next step is to have automatic generation of these words from conventional mnemonic expressions. The experience with the CPL 1 machine suggests that microprogramming can be made fully available to the user.

In the CPL 1 machine, the page memory has a FIFO discipline in terms of pages. The circulation of the pages and the allocation of their words in the registers of the page arrays Q proceed automatically. A simple supervisor program establishes various modes for the page circulation, such as asynchronous, synchronized with a radar, or controlled by time signals from a digital clock.

IV. THE PROGRAMMING LANGUAGE

By programming language we mean here the machine language of the CPS machine. However, because of the organization of this machine and its working level, this machine language also constitutes a suitable user language for certain classes of processes, such as those described in Section V. There is also a more general point: this machine language was developed for expressing varieties of recognition strategies which could be executed in real time, and it can be used directly by those who are conceiving the experiments. Undoubtedly, these circumstances have resulted in effectiveness and conciseness of expression in this context. In this sense, the language might constitute an interesting implementation model for features of programming languages in a broader context.

This machine language is modular in terms of DS's; every program is a set of DS's related to each other by transition functions or by data transfers. The external structure of a set of DS's related by transition functions is operatively similar to that of the finite-state machines defined in automata theory; for this reason, we will also refer to the DS's as states, and we will view the programs as state diagrams whenever it is useful to do so.
Because of this functional similarity, some of these external structures can have uses similar to those of the finite-state machines, such as for the recognition of patterns or the memorization of past events. Abundant use of this property has been made in the applications reported in Section V. In contrast to the hierarchical abstract machines defined in automata theory, these state structures have a text within the states, the quadruplet, which permits us to change the characteristics of the machine at each state; for instance, at each state, I and R select subsets of the input and output alphabets. Moreover, a set of data (the page), possibly renewed at each discrete time, is associated with the states; that is, these machines have memory. The text in each state may have a description of a data transformation; thus, computation can be provided when needed. As a matter of fact, when only straight computation is required, a program may consist of a single state with an F made up of a long sequence of subexpressions similar to a conventional listing. In effect, if primarily the functional memory is used, a CPS machine can simulate the activity of a conventional computer. On the other hand, if the page memory is considered as a closed-loop tape system, a CPS machine can be reduced to a multitape or multicell (and consequently, a linear-tape) Turing machine, making the usual assumption that additional memory is provided whenever needed, e.g., [9]. From the above we see that in this programming language characteristics of abstract automata and of conventional programming languages are equally available and embedded in each other. Probably, the possibility of using these different frames simultaneously for modeling the processes is an important reason for the peculiar effectiveness of this language that has been experienced in several classes of problems.

The state structure mentioned above, the modularity given by the quadruplets, and the several years of experiments with the CPL 1 machine have suggested the representation of these programs in a particular form of state diagram. In the present context, the state diagrams are the working representation for the user, and they also assume the role of reference representation. The actual coding for the machine is done as the last operation, and because of the typical one-to-one correspondence between elements of the state diagrams and machine codes, it does not constitute a significant phase. Given this context, there is much less lexical constraint in the state diagrams than in conventional programs; indeed, the modeling of a process should be developed with notation-independent thinking. The chosen graphical structure of the state diagram provides much of the syntax.

In order to interpret the example given below, some notations used to make up these state diagrams are described (a full description can be found in [8]). States are represented as encircled domains (see Fig. 8). The data transformations F and the transition functions T are indicated inside these domains, above and below a horizontal line, respectively. The outcomes of the transition functions are indicated by arrows leading to the states.
For the data transformations F, self-explanatory notations are used: capital letters A, B, C, ... indicate variables in the page; the same letters with a prime indicate the corresponding variables in the auxiliary storage Q'N; capital letters M, N, O, ... indicate variables in the functional memory. Operations are executed in the sequence indicated by the succession of the expressions. Parallel execution is indicated with a single infix expression comprising several variables and several arguments in corresponding order (e.g., if indexes i and i+1 denote the values before and after the execution of an operation, respectively, the expression ABC ← sAB means A(i+1) ← s, B(i+1) ← A(i), C(i+1) ← B(i)). The input prescriptions I comprise numerical constants and identifiers; the actual values of the identified quantities are those that were available from the source at the time at which the page was in the assembler (e.g., the present radar sample s in the above example of a parallel expression). But there is no need to spell out the input prescriptions I in the state diagrams; the appearance of input data in the expressions of F and T is sufficient for documenting a program. Routings are indicated beside the encircled domains when they are state-dependent (i.e., the routing is executed every time a page is in that state). Routings are indicated beside an arrow when they are transition-dependent (i.e., they are executed only if the transition function selects the path indicated by that arrow). These output features correspond to those which in automata theory are assigned to the Moore and Mealy machines, respectively. When a routing has the name of a variable without further specification, the present value of that variable is routed to an output buffer which operates with a FIFO discipline. When a routing is to the functional memory, the name of the variable in the page is followed by an arrow and by the name of the variable in the functional memory, with the function indicated in parentheses. Some specific notations and graphical means used in the example are described in Fig. 6.

Fig. 6. Notations used for the state diagrams: Pg(k), generation of a new page in state k; ST k, driven transition to state k for the following pages; stopover transition, in which the path followed by the page is predetermined (regardless of the transition functions in the states visited); the page circulates n times idle before entering state k (the dot indicates the path chosen when the test is true); the page stays in state k for n times before following the transition function of state k; a succession of states (or cycles in the same state) occurs without the page leaving PN; the page disappears.

Fig. 7. Area selected for a real-time processing of radar signals (range r, azimuth z).

The example to be described is a real-time processing of radar signals, from the applications to be discussed in Section V. This example is chosen because, while simple to follow, it gives an opportunity for seeing how the efficiency of the special-purpose processor and the generality of the general-purpose computer are achieved simultaneously. The environment is depicted in Fig. 7; range r and azimuth z are considered at discrete, equidistant values, i and j, respectively, to form a grid of i, j points. For each of these points, an estimate of the echo intensity x(i,j) is computed as the mean of 32 consecutive digital samples s (weather echoes are highly fluctuating) obtained from the radar output during the antenna rotation of one j unit (say 32 consecutive pulses per one j increment):

x(i,j) = (1/32) Σ s(i,t),  the sum being taken over the 32 pulses t of azimuth interval j.     (1)
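The estimate of expression (1) is simply a mean over 32 pulses; the short sketch below (illustrative only; the sample source is simulated) also mirrors the parallel expression ABC ← sAB with Python's simultaneous tuple assignment.

    import random

    def x_estimate(samples):
        """Mean echo intensity over the 32 pulses of one azimuth increment,
        per expression (1)."""
        assert len(samples) == 32
        return sum(samples) / 32.0

    pulses = [random.random() for _ in range(32)]   # simulated radar samples
    print(x_estimate(pulses))

    # The parallel expression ABC <- sAB of the state-diagram notation:
    # all right-hand values are taken before any assignment happens.
    A, B, C, s = 1.0, 2.0, 3.0, 0.5
    A, B, C = s, A, B     # A(i+1) <- s, B(i+1) <- A(i), C(i+1) <- B(i)
    print(A, B, C)        # 0.5 1.0 2.0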
We want to characterize, in a single antenna passage, the precipitation in a specific region, such as the sector a in Fig. 7, delimited by ranges R1 and R2 and azimuths Z1 and Z2. The characterization considered here consists of three global parameters: the area A covered by echoes of intensity above a given value h, the mean echo intensity M in that area, and the mean gradient G in the echo pattern of that area. These three parameters are approximated with the following algorithms:

n(i,j) = 1 if x(i,j) > h, and n(i,j) = 0 otherwise;

A = Σ_j Σ_i n(i,j);

M = (1/A) Σ_j Σ_i n(i,j) x(i,j);

G = (1/A) Σ_j Σ_i [ |x(i,j) − x(i,j−1)| + |x(i,j) − x(i−1,j)| ],     (2)

with the differences computed only if either of the two x involved is > h.

There is no preliminary storage of data in the memory of the computer. Rather, the memory is structured in the form of pages, according to the arriving data, so that the processing can be accomplished on the spot, with a minimum of resources and time. In this program, two types of SDM's can be delineated, as in Fig. 8: one, comprising states 1-7, implemented by one page, which has the role of general control; and the other, comprising states 8-11, implemented by a number of pages performing the computation at each point. The functional memory is also used. The input data used by this process are the digital echo samples s, the current digital values of the azimuth z and range r, and constants such as Z1, Z2, R1, R2, and h. For the sake of description, we divide the process into the three following phases.

Fig. 8. State diagram of a process for real-time characterization of weather echoes (control page, states 1-7; computation pages, states 8-11).

Initialization Phase

Before the radar beam crosses the region of interest (the lined area in Fig. 7), the operator activates a page in state 1, locked to range R1; that is, the page transfers from the assembler to PN when r has the value R1, so that this page will always correspond to range R1. In this way, the processing machine is synchronized with the radar; the pages make one circulation per radar period, with the first page working precisely when the echo from range R1 has arrived. In state 1, this page simply acquires the present value z of the radar azimuth and checks whether it is equal to Z1. When this is the case (which means we are at point P0 in Fig. 7: the page is at azimuth Z1 and range R1), the page transfers to state 3 and routes a key for a page in state 2. This page produces a sequence of pages in state 8, each at a specific range, filling the range interval R1-R2, and then disappears. At each circulation, the consecutive pages in state 8 accumulate the samples s of the radar echo in variable A. After 32 idle circulations, the control page is in state 3 and produces a double driven transition, so that the pages in state 8 stop over to state 9 and then return to state 8. In this way, a first mean value is computed and stored in variable B of these pages, which will have the role of the x(i,j−1) in expression (2), and variable A starts a new accumulation of samples s. At the same time, the control page starts an accumulation of samples s in variable A and clears variables M, N, and Q in the functional memory.

Continuous Computation Phase

This phase lasts from azimuth Z1 to azimuth Z2, and the pages continuously repeat a cycle of computation 32 circulations long. The control page stays in state 4 for 31 circulations and in state 5 for one circulation, while the computation pages follow a pattern through states 8, 9, 10, and 11.
The computation simultaneously comprises the preparation of the next values of x(i,j) in variables A, the holding of the most recent values x(i,j) in variables B, the holding of the x(i,j−1) in variables C for as long as is necessary, and the computation of the differences among the x that appear in expression (2). The following description of the states involved explains the mechanization adopted.

State 4: Variable A of the control page accumulates the successive samples s from the range associated with the page. The purpose of this computation in the control page is to provide the x(i−1,j) for the first of the following computation pages.

State 5: The accumulation performed in state 4 is divided by 32, thus producing a mean-echo value x. Then there is a triple transfer of data: the computed x is transferred from A to C' (in the auxiliary storage); A and B are given the present sample s and the current azimuth z, respectively. In this way, variable A can start a new accumulation, and the test on B performed by the transition function of this state can determine when the region of interest (the lined area in Fig. 7) has passed. Moreover, a driven transition to state 9 is produced for the following pages, in order to synchronize each cycle of computation for the entire page set.

State 8: When in state 8, all computation pages simply accumulate the current echo sample s into variable A.

State 9: The content of A is divided by 32 to obtain the x values. A triple transfer occurs: the previous x(i,j) in B, which has now become x(i,j−1), is transferred into C; the x just computed in A, which now becomes x(i,j), is transferred into B; and A is initialized with the present sample, so as to be ready for a new accumulation. At this point, the transition function acts: if B (value x(i,j)) is larger than h, the page transfers to state 11; if not, but C (value x(i,j−1)) is larger than h, the page stops over in state 11 and then goes to state 10; otherwise, the page stops over in state 8 (for timing reasons) and goes to state 10. Only in the case of the first of the above transitions (case of x(i,j) > h) is a routing made: the value of B (x(i,j)) is accumulated into variable M of the functional memory, and variable N of the functional memory is incremented by one. In this way, M is accumulating the x for the computation of M, and N is computing A.

State 10: First, the present sample is added to variable A, in order not to interrupt the accumulation for the new x. Second, variable C copies the value in variable B (which is x(i,j)). Third, C interchanges values with C' in the auxiliary storage; because all pages make this interchange in the same circulation, a shift of content among the consecutive pages occurs in C; that is, in each page, C comes to contain x(i−1,j). Finally, if C is > h, the page stops over in state 11 and then goes to state 8; otherwise it goes directly to state 8.

State 11: This state provides for the computation of the absolute value of the difference between two x; it is reached only when either of the two x involved is greater than h, as can be determined from the transition functions of states 9 and 10. First, the present sample s is added to A, again so as not to interrupt the accumulation; then the difference is computed; and finally, by routing, the value of this difference is accumulated into variable Q of the functional memory.
The transition function of this state (which acts only when the state is reached not in stopover) is an unconditional first stopover to state 10, a second stopover to state 11, and a final transfer to state 8. It can easily be checked that when state 11 is reached from state 9, the computed difference is |x(i,j) − x(i,j−1)|; when it is reached from state 10, the difference is |x(i,j) − x(i−1,j)|. The simple feature of the stopover transition allows this real-time computation to be executed quickly and with little means.

Output Phase

When the current azimuth z has surpassed the value Z2, the control page transfers from state 5 to state 6 instead of state 4. In state 6, the variables of the page acquire the four delimiting coordinates and route them to the output. Then the control page remains idle for four circulations (to allow the other pages to finish their computations) and goes to state 7. In state 7, the present content of variables M, N, and Q in the functional memory is acquired by variables A, B, and C, respectively, of the page. The content of B and C is divided by the content of A and then routed to the output. These routed values are the covered area A, the mean intensity M, and the mean gradient G of expressions (2). Finally, a record of the data routed to the output is commanded (to a tape recorder or printer), a driven transition to state 0 (disappearing) is produced for the other pages, and the control page itself disappears. Each record automatically contains, as heading data, the elapsed time, the date, and program numbers. A variant of states 6 and 7 transfers the control page again to state 1, for a preset number of radar antenna rotations or for a given time interval, in order to produce a longer record containing the time evolution of parameters A, M, and G, computed in the lined area of Fig. 7, at every antenna rotation.

This simple program also shows how real-time processing can be set up directly by the operator for a given environment. After it has been set up, the activity proceeds automatically, depositing the results in the output device and causing the memory structure created for the process to disappear, thus leaving the machine clean for other processing, without the need for garbage collection.

V. APPLICATIONS

As indicated previously, this approach developed from the need for real-time processing of radar signals, and it is in this context that it has been applied. In [10] there is a description of a program for recognizing the echoes of faint meteors (which have an intensity in the range of the noise) and recording their rates, by echo energy and duration. In [10] there is also a first comparison with corresponding programs written in different languages. In [11] and [12] an account is given of real-time characterizations of weather echo patterns.
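These characterizations rest on expression (2) of Section IV. As a point of reference, and as a contrast with the page-based mechanization, which avoids storing the grid at all, the same three parameters can be written as a plain batch computation over a stored grid of estimates x(i,j); the following sketch is only such a reference rendering, with illustrative array names.

    def characterize(x, h):
        """A, M, G per expression (2), computed offline over a stored grid.
        x is a list of rows, x[j][i] being the echo estimate at azimuth j, range i."""
        A = M = G = 0.0
        for j, row in enumerate(x):
            for i, xij in enumerate(row):
                if xij > h:
                    A += 1                      # area covered by echo > h
                    M += xij                    # accumulate intensity
                for prev in ((x[j - 1][i] if j > 0 else None),   # x(i, j-1)
                             (row[i - 1] if i > 0 else None)):   # x(i-1, j)
                    if prev is not None and (xij > h or prev > h):
                        G += abs(xij - prev)    # gradient terms
        return (A, M / A if A else 0.0, G / A if A else 0.0)

    grid = [[0.0, 2.0, 3.0],
            [1.0, 4.0, 5.0]]
    print(characterize(grid, h=1.5))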
When conventional general-purpose computers are used for real-time applications of this type, programs have to be prepared in machine language, thus losing the facilities offered by high-level languages. Another difficulty is that standard computers cannot manage the large quantity of data produced by the radar in real time, so that special preprocessing units need to be added. The result is that the generality inherent in the programming language and the computational capability inherent in the computer cannot, in practice, be put to full use. It is precisely in this context that the approach described gives the greatest benefit. In regard to the first point, the machine language of the CPS system appears, for this type of application, to be at a level comparable to that of high-level languages, and it certainly has more flexibility. In regard to the second point, the dynamic evolution of the pages and the level of activity performed by a page in each circulation solve, in a natural way, the problems of high-speed data handling. Because of these facilities, the feasibility of real-time processing is extended. In the case of processing radar signals, computations can be made on the entire set of raw data, thus using the full resolution and information given by the radar. As an example, a family of processes, one program of which is described in Section IV, has made it possible to ascertain the strong correlation that typically exists between the mean gradient in a weather echo pattern and the type of precipitation (such as snow, stratiform rain, or convective showers) associated with the echo pattern [12]. Also, it has been possible to experiment with a variety of algorithms for real-time discrimination of ground echoes, based either on the morphological characteristics of the echo pattern or on the characteristics of the pulse-to-pulse fluctuation [11], [12].

One of the most helpful features of this system in signal processing experiments is the possibility of observing the several variables during the execution of the processing itself. The packer, as indicated in Fig. 1, has an output toward the memory to which all the words of all pages are transmitted sequentially; the packer also has synchronization signals suitable for selecting specific words in a specific page or over an entire sequence of pages. Therefore, a simple display permits one to observe not only the evolution of the pages, but also the values (in either digital or analog form) of any quantity involved, during the actual execution of an experiment. Such a facility is also of great value in debugging complex processes, a fact that in turn makes it possible to experiment with a large variety of processing techniques within a given amount of effort.

Exploratory experiments have also been conducted with various other classes of processes. The structure of a CPS machine has similarities with that of digital differential analyzers [13], and therefore the programming and execution are efficient in those classes of problems in which differential analyzers have been used. Taking advantage of the existence of several variables in the page and of the direct communication between adjacent pages, experiments have been performed on numerical models, by associating a page with each grid point. The flexibility of the page structure has also suggested experiments in sorting algorithms. Perhaps the most interesting exercises have been in implementing recursive functions, taking advantage of the facts that pages can be created locally and can exchange data, and that the key gives each page an independent status.

VI. DISCUSSION

As pointed out in Section II-C, in a CPS machine, processing is page-driven. This associates the CPS system with data-flow computers and languages. A bibliography of the works in this field can be found in [2]. A most active group has been the one directed by Jack Dennis at the Laboratory for Computer Science of MIT. The differences and similarities between the system described in this paper and Dennis' work can be traced from the different starting objectives.
Dennis' group studies a general procedure for automatically achieving highly parallel execution of programs. Machines are being designed that execute instructions at the arrival of their operands, so that parallelism can be exploited on a global basis, regardless of the original structure of the programs. The needed machine representation of programs, basically a directed graph at the operand level, would be generated by translation programs from the application programs written in conventional textual languages [14]. In this approach, some overhead is expected, as well as some constraints in consequence of the automatic translation. In the system described here, as mentioned in Section III, the objective has been to make the machine execute the processes in a form as close as possible to the process representation conceived by the user, in order to facilitate the development of new processes and the real-time interaction between user and machine. In the context in question there is a facility for thinking in terms of paths followed by the data of each task [8]; correspondingly, it turned out that this machine is data-driven or, more exactly, page-driven. In this approach, the machine needs a high degree of flexibility (Fig. 1 shows the solution presented), and a clear visualization of the processes is required on the part of the programmer. In both approaches the machine language programs are a direct encoding of programs expressed in an intermediate language; these languages have differences. In the data-flow machines, data are funneled into the processing units asynchronously and relatively independently of the processes' structure; there is no notion of sequential control. In the CPS machine, data transformations are executed on working data sets (the pages), in the sequence established during the modeling of the processes. The configuration of the network of processors (PN) and the organization of the pages adapt themselves to the processes. The management of the working data sets (the pages) is automatically established when the state diagrams are delineated. This paper does not address the subject of generating these state diagrams from programs written in textual languages.

In the following, some preliminary comments and elaborations are made on certain aspects of the CPS machine which have relevance to multiprogramming and multiprocessing systems in general. The issues are only mentioned here, and a more detailed discussion is postponed to the time when results from the new machine become available.

The characteristic that is most readily apparent in a CPS system is its suitability for multiprogramming. In effect, the work of the machine itself is a continuous multiprogramming through the implementation of a number of SDM's. Each change of page in PN is equivalent to an interrupt with automatic preservation of the status of the processes involved. Because programs are composed of self-sufficient modules (the quadruplets), the complete status of a process is rendered by a single word, the key. It appears that a key with an extensible format would be sufficient for carrying all the auxiliary information that may be attached to the key in a complex system. This is considerably less than the 25-100 words used by present computers to hold the process status [15]. The basic reason for such an efficient representation of the process status is that the page discipline eliminates specific memory references within the computation described by a quadruplet.
A further contribution to this efficiency is due to the pipeline structure of the assembler and the packer, which provide independently for input and output according to the information within the quadruplet. Moreover, the key is part of the page. Therefore, increasing the number of simultaneous programs does not increase the complexity of the machine; the number of programs is limited only by the capacities of the page and program storage. A natural extension of the machine described in this paper is the partition of the page memory into several segments which circulate at different times, under controls routed by pages working in a supervisory role. However, again because of the modularity of the programs, this extension does not increase the complexity of the process status. In summary, given the almost nonexistent overhead of program switching, it appears that aspects of the CPS approach are of interest for time-sharing systems.

Two forms of parallelism can be seen in the CPS system. The first is the straight parallel execution of simultaneous and independent operations on different variables in PN. The number of these variables is small, and this form of parallelism matches quite well the small parallelism that is present in almost all computations. Of interest is the fact that the same hardware can be used in different configurations for specialized functions when there is no parallelism at all in the computation. No less interesting is the fact that no specific detection of parallelism is needed; when a computation is being modeled in the form of an SDM, the small parallelism of this kind becomes evident in a very natural way. The second form of parallelism is given by the multiplicity of pages. In effect, the pages can be viewed as virtual replicas of PN. A sequence of pages can perform the same or different processes, depending on the keys they contain; a page sequence can be of any size within the capacity of the memory; and the number of pages can change dynamically without introducing overhead. From an external viewpoint, this form of processing is equivalent either to sequential processing or to parallel processing, depending on the speed of execution with respect to the rate of data input and output needed by the environment. Because of the fast execution permitted by the PN and the direct acquisition of new inputs and production of outputs at each cycle, we expect that the processing will be equivalent to parallel processing in very demanding situations also. For instance, the set of pages in the right part of Fig. 8 perform their operations virtually simultaneously in regard to the radar, the operator, and the program.

It has been noted that special-algorithm processors can outperform a stored program by a factor of 10-100 [15]. This is because there are fewer memory references and fewer separate instructions to be executed. However, although present technology makes this approach practical, no general applications have yet materialized. The programmable network of the CPS machine is suitable for various degrees of hardware specialization, especially with the inclusion of look-up tables made convenient by present ROM's; and, more important, the page discipline (making a number of operands available simultaneously) permits an efficient insertion of special-algorithm executions at any point in the programs. These facts suggest the following consideration.
These facts suggest the following consideration. The time T needed for the execution of a program can be expressed, in a first approximation, as the product of the number N of instructions to be executed and the mean execution time tm of these instructions: T = N x tm. Great attention has always been given to reducing tm. It appears that equal attention has not been given to reducing N. The programmable network and the page discipline of the CPS system are effective in reducing both N and tm; since the two factors multiply, an order-of-magnitude reduction in each yields a combined reduction of two orders of magnitude. Comparing certain classes of programs run in the CPL 1 machine with equivalent programs for conventional computers, the execution time is found to be reduced by one to two orders of magnitude. A similar reduction is found in the number of machine-program bytes, and this reduction is also due to the concise modeling of a process in the SDM form.

To systematize the various computer architectures, a classification in terms of instruction and data streams has been introduced [16], so that problems such as contention for resources, synchronization, task splitting, and scheduling can be put in perspective [17], [3]. From this viewpoint, we see that the CPS machine shares some characteristics of the single-instruction-single-data (SISD) architectures and of the multiple-instruction-multiple-data (MIMD) architectures. In the sense that there are single streams of pages from the memory, of quadruplets from the program storage, and of input data from the environment, the machine enjoys the automatic synchronization, the absence of contentions, and the simplicity of programming of the SISD machines. In the sense that each page contains numerous data, that PN is a number of processors which can also perform independent parallel tasks, and that the page organization can change during processing, the machine has capabilities of the MIMD architectures. This double aspect of the CPS machine has been made possible by the page discipline and the modularity of the DS's. That is, the primitives here are no longer the instruction and the operand, but rather the instant processor (the DS) and a related data block (the page).

A further point of interest is that task splitting becomes a natural feature of the program design when a process is modeled in the form of interacting, dynamically generated SDM's. The CPS system permits recursion locally, whenever it is needed. Every page can generate another page (as exemplified in state 1 of Fig. 8), and the new page can be in the same state as the generating page; in turn, each newly generated page can generate a further page, in response to specific conditions. Data can be transferred directly between adjacent pages, and through Q'N between distant pages. The capability of generating new pages locally can be used for several purposes; an obvious one is to hold a process needing a specific computation or the outcome of a recognition, and to generate pages locally for the execution of those computations or recognitions. A different application that deserves to be explored is the generation of new pages for the local interpretation and execution of transformations F expressed in a higher-level language.
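The following C sketch is only an analogy for this local generation of pages (the names, the work-list representation, and the splitting rule are all hypothetical): a page in a given state appends new pages in the same state to the circulating set, which yields recursion without a conventional call stack.

```c
/*
 * Illustrative sketch only: page generation modeled as appending records
 * to a work list.  Names and structure are hypothetical; the intent is to
 * show how a page in a given state can create further pages in the same
 * state, so that recursion arises locally without a call stack.
 */
#include <stdio.h>

#define MAX_PAGES 64

typedef struct { int state; int lo; int hi; } Page;  /* key plus working data */

static Page pool[MAX_PAGES];
static int  n_pages = 0;

/* A page "generates" a new page by adding one to the circulating set. */
static void generate_page(int state, int lo, int hi) {
    if (n_pages < MAX_PAGES)
        pool[n_pages++] = (Page){ state, lo, hi };
}

int main(void) {
    generate_page(1, 0, 16);                  /* initial page in state 1 */
    /* Each pass processes the pages currently present; a page whose
       interval is still wide spawns two pages in the same state. */
    for (int i = 0; i < n_pages; i++) {
        Page p = pool[i];
        if (p.state == 1 && p.hi - p.lo > 4) {
            int mid = (p.lo + p.hi) / 2;
            generate_page(1, p.lo, mid);      /* new page, same state    */
            generate_page(1, mid, p.hi);
        } else {
            printf("leaf page handles [%d,%d)\n", p.lo, p.hi);
        }
    }
    return 0;
}
```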
Programs derived from conventional programming languages show a tendency to use small subsets of code and data for significant periods of time. This fact led to the development of paging techniques, by which it is possible to execute, even with a small main memory, several programs each of which was written for a much larger virtual memory [18]. Fundamental to these techniques are the models of program behavior [19] and the algorithms for optimal replacement of the pages in the main memory. It has been argued [20] that locality is affected by the design and structure of the programs. In particular, locality would be enhanced by a modularity of the programs, especially with a maximum of functional autonomy given to the modules. In this respect, it can be noted that the programs in the system described in this paper are modular in terms of DS's, which typically correspond to fairly autonomous small processes. Moreover, the related data sets (the pages) are, so to speak, automatically constructed and managed by the processes themselves, as described in Section II. Many other considerations of various natures are prompted by characteristics of the programming language of this machine, but they are beyond the scope of this paper.

VII. SUMMARY

A signal processing system has been presented that permits the simultaneous attainment of the efficiency of special-purpose processors and the total applicability of general-purpose computers, characteristics normally thought of as being mutually exclusive. Basically, the approach consists of specializing the machine by programming the hardware structure, rather than by adding software systems to it. To implement such an approach, data are organized in a particular page discipline, resulting in a multiplicity of data blocks which constitute independent and dynamic individual memories for the processes. Computation is performed in a programmable network of processors in terms of program blocks, each of which can be viewed as the description of a special-purpose machine. The control is given to the data rather than to the program. Because of the flexibility of this system, the machine programming language exhibits an interesting high-level structure, and it can be applied directly as a suitable user language for certain classes of problems. The feasibility of a significant correspondence between the user's modeling of a process and the actual machine execution is shown. The structure of the language used in this context may be of interest as an implementation model for programming languages of larger scope. The programmable network of processors and the functional memory are examples of solutions to known problems in multiprocessor systems. The programming of the processor network gives an example of extensive horizontal and vertical microprogramming. The organization of the data and the modularity of the programs used in this system permit effective program switching, which may be of interest for multiprogramming systems. Results are shown in real-time processing of radar signals.

ACKNOWLEDGMENT

I wish to thank Prof. J. Allen for the discussions on several topics and the suggestions given in regard to this paper.

REFERENCES

[1] J. Allen, "Computer architecture for signal processing," Proc. IEEE, vol. 63, pp. 624-633, Apr. 1975.
[2] "Workshop on Data Flow Computer and Program Organization," D. P. Misunas, Ed., Comput. Archit. News, ACM SIGARCH, vol. 6, no. 4, Oct. 1977.
[3] "Special Issue on Operating Systems," Computer, vol. 9, no. 10, Oct. 1976.
[4] J. L. Baer, "Multiprocessing systems," IEEE Trans. Comput., vol. C-25, pp. 1271-1277, Dec. 1976.
[5] D. H. Lawrie, T. Layman, D. Baer, and J. M. Randal, "Glypnir-A programming language for Illiac IV," Commun. Assoc. Comput. Mach., vol. 18, pp. 157-164, Mar. 1975.
[6] P. Wegner, "Data structure models for programming languages," in Proc. Symp. on Data Structures in Programming Languages, SIGPLAN Notices, vol. 6, no. 2, pp. 1-54, Feb. 1971.
[7] M. R. Schaffner, "A computer modeled after an automaton," in Computers and Automata. Brooklyn, NY: Polytechnic Press, 1971, pp. 635-650.
[8] M. R. Schaffner, "Research study of a self-organizing computer," Final Rep., Contr. NASW-2276 (NASA), July 1974.
[9] M. A. Arbib, Theories of Abstract Automata. Englewood Cliffs, NJ: Prentice-Hall, 1969.
[10] M. R. Schaffner, "Computers formed by the problems rather than problems deformed by the computers," COMPCON Dig., pp. 259-264, 1972.
[11] M. R. Schaffner, "Comments on 'Applications of radar to meteorological operations and research'," Proc. IEEE, vol. 63, pp. 731-733, Apr. 1975.
[12] M. R. Schaffner, "On the characterization of weather radar echoes," in Prepr. 17th Radar Meteorology Conf., Amer. Meteorol. Soc., Boston, MA, 1976, pp. 474-485.
[13] T. R. H. Sizer, The Digital Differential Analyzer. London: Chapman and Hall, 1968.
[14] J. B. Dennis, D. P. Misunas, and C. K. Leung, "A highly parallel processor using a data flow machine language," Computation Structures Group Memo 134, Laboratory for Computer Science, M.I.T., Cambridge, MA, Jan. 1977.
[15] C. G. Bell and A. Newell, Computer Structures: Readings and Examples. New York: McGraw-Hill, 1971.
[16] M. J. Flynn, "Very high-speed computing systems," Proc. IEEE, vol. 54, pp. 1901-1909, Dec. 1966.
[17] M. J. Flynn, "Some computer organizations and their effectiveness," IEEE Trans. Comput., vol. C-21, pp. 948-960, Sept. 1972.
[18] C. A. R. Hoare and R. M. McKeag, "A survey of store management techniques," in Operating System Techniques, C. A. R. Hoare and R. H. Perrott, Eds. London: Academic, 1972, pp. 117-151.
[19] P. J. Denning, "The working set model for program behavior," Commun. Assoc. Comput. Mach., vol. 11, pp. 323-333, May 1968.
[20] P. J. Courtois and H. Vantilborgh, "A decomposable model of program paging behaviour," Acta Informatica, vol. 6, pp. 251-275, 1976.

Mario R. Schaffner (A'69-M'72) received the Dr. degree in electrical engineering from the University of Pisa, Pisa, Italy, in 1948. From 1948 to 1961 he worked at the Microwave Center of the Italian National Research Council, at the Magneti Marelli Company, and at the FACE Standard Company. From 1957 to 1960 he also taught radar engineering at the Italian Air Force Academy. In 1961 he came to the United States with a NATO fellowship, and he has worked at the Massachusetts Institute of Technology, the Harvard College Observatory, and the Smithsonian Astrophysical Observatory. He has also served as a consultant for Raytheon and IBM. His major activity has been in the development of processing systems for research projects. He is presently at the Advanced Study Program of the National Center for Atmospheric Research, Boulder, CO.