Parsing in functional unification grammar

MARTIN KAY

Language is a system for encoding and transmitting ideas. A theory that seeks to explain linguistic phenomena in terms of this fact is a functional theory. One that does not misses the point. In particular, a theory that shows how the sentences of a language are all generable by rules of a particular formal system, however restricted that system may be, does not explain anything. It may be suggestive, to be sure, because it may point to the existence of an encoding device whose structure that formal system reflects. But, if it points to no such device, it simply constitutes a gratuitous and wholly unwelcome addition to the set of phenomena to be explained. A formal system that is decorated with informal footnotes and amendments explains even less. If I ask why some phenomenon, say relativization from within the subject of a sentence, does not take place in English and am told that it is because it does not take place in any language, I go away justifiably more perplexed than I came. The theory that attempts to explain things in this way is not functional. It tells me only that the source of my perplexity is more widespread than I had thought. The putative explanation makes no reference to the only assertion that is sufficiently self-evident to provide a basis for linguistic theory, namely that language is a system for encoding and transmitting ideas. But, surely there is more to functionalism than this. To fill their role as systems for encoding and transmitting ideas, languages must first be learnable. Learnability is a functional property and language learning needs to be explained. But what is involved here is a derivative notion of function. A satisfactory linguistic theory will at least make it plausible that children could learn to use language as a system for encoding and transmitting ideas. It will not show how a child might learn to distinguish sentences from nonsentences,
a skill with little survival value and one for which evolution probably furnished no special equipment. It follows that any reasonable linguistic theory will be functional. To use the word to characterize one particular theory, as I shall shortly do, must therefore be accounted pretentious. However, while it is true that the label has been used before, it is manifestly untrue that linguistic theories have typically been functional. Just recently, there has been a partial and grudging retreat from the view that to formalize is to explain. This has not been because of any widespread realization of the essential vacuity of purely formal explanations, but for two other reasons. The first is the failure of the formalists to produce workable criteria on which to distinguish competing theories, and the second is the apparent impossibility of constructing theories whose most cogent remarks are made within the formalism rather than in footnotes and amendments. The search for sources of constraint to impose on formal grammar has led to an uneasy alliance with the psychologists and a belated rekindling of interest in parsing and other performance issues (for some discussion, see Gazdar, 1982; Kaplan, 1972, 1973, 1978; Kay, 1973, 1977, 1979). My aim in this polemic is not to belittle the value of formalisms. Without them linguistics, like most other scientific enterprises, would be impotent. It is only to discredit them as an ultimate basis for the explanation of the contingent matters that are the stuff of science. What I shall describe under the banner of functional unification grammar is indeed a formalism, but one that has been designed to accommodate functionally revealing, and therefore explanatorily satisfying, grammars.

7.1. Functional unification grammar

A functionally adequate grammar must either be a particular transducer, or some kind of data structure that more general kinds of transducer - generators and parsers - can interpret.
It must show not only what strings can be associated with what meanings, but how a given meaning can be expressed and a given utterance understood. Furthermore, it cannot take logic as the measure of meaning, abjuring any responsibility for distinctions that are not readily reflected in the predicate calculus. The semantic side of the transducer must traffic in quantifiers and connectives, to be sure, but also in topic and emphasis and given and new. Functional grammar will, therefore, at the very least, adopt some form of functional sentence perspective. In practice, I take it that the factors that govern the production of a sentence typically come from a great variety of different sources, logical, textual, interpersonal, and so forth. In general, each of these, taken by itself, underdetermines what comes out. When they jointly overdetermine it, there must be priorities enabling a choice to be made among the demands of the different sources. When they jointly underdetermine the outcome, the theory must provide defaults and unmarked cases. The point is that we must be prepared to take seriously the claim that language in general, and individual utterances in particular, fill many different functions and that these all affect the theory, even at the syntactic level. I have outlined a broad program for linguistic theory, and the possibility of carrying it through rests on our being able to design clean, simple formalisms. This applies to the formal descriptions by which words, phrases, sentences, grammars, and languages are known and also to the operations that the theory allows to be performed on these descriptions. Much of the character of the theory I shall outline comes from the set-theoretic properties that are imputed to descriptions of all kinds in everyday life. These properties do not, in fact, carry over to the descriptions provided for in most linguistic formalisms.
In this theory, there is no limit on how detailed a description can be and no requirement that everything in it should serve some grammatical end. Generally speaking, to add more detail to a description is to narrow the class of objects described, and to remove material from a description is to widen its coverage. In fact, descriptions and the sets of things they refer to are mathematical duals of one another with respect to the operations of set theory. In other words, the intersection of a pair of descriptions describes the union of the sets of objects that they describe separately, and the union of a pair of descriptions describes the intersection of the corresponding pairs of sets. These are properties we are entitled to look for in anything to which the term "description" is seriously applied. Descriptions are sets of descriptors and prominent among the kinds of object that go to make up the sets are pairs consisting of an attribute, like number, with an associated value, like singular. An important subclass of attributes consists of grammatical functions like Subject, Modifier, and Connective. The claim that this theory makes on the word "functional" in its title is therefore supported in three ways. First, it gives primary status to those aspects of language that have often been called functional; logical aspects are not privileged. Second, it describes linguistic structures in terms of the function that a part fills in a whole, rather than in terms of parts of speech and ordering relations. Third, and most important for this paper, it requires its grammars to function; that is, they must support the practical enterprises of language generation and analysis.

7.1.1. Compilation

The view that a grammar is a data structure that can be interpreted by one or more transducers has some attractions over the one that would have it actually be a transducer.
On the one hand, while the grammar itself remains the proper repository of linguistic knowledge, and the formalism an encapsulation of formal linguistic universals, the language-independent transducer becomes the embodiment of a theory of performance. But there is no reason to suppose that the same transducer should be reversible, taking responsibility for both generation and parsing. While the strongest hypothesis would indeed provide only one device, a more conservative position would separate these functions. This paper takes the conservative position. Given that generation and parsing are done by different transducers, a strong hypothesis would have both transducers interpret the same grammar; a more conservative position would associate a different formalism with each transducer and provide some mechanism for ensuring that the same facts were represented in each. This paper takes the conservative position. Specifically, the position is that the generator operates directly on the canonical form of the grammar - the competence grammar - and that the parser operates on a translation of this grammar into a different formalism. This paper will concentrate on how this translation is actually carried out; it will, in short, be about machine translation between grammatical formalisms. The kind of translation to be explored here is known in computer science as compilation, and the computer program that does it is called a compiler. Typically, compilers are used to translate programs from so-called high-level programming languages, like FORTRAN or ALGOL, into the low-level languages that directly control computers. But, compilers have also proved very useful in other situations. Whatever the specific application, the term "compilation" almost always refers to a process that translates a text produced by a human into a text that is functionally equivalent, but not intended for human consumption. There are at least two reasons for doing this.
One is that those properties of a language that make for perspicuity and ease of expression are very different from those that make for simple, cheap, reliable computing. Put simply, computers are not easy to talk to and it is sometimes easier to use an interpreter. It is economical and efficient for man and machine to talk different languages and to interact through an intermediary. The second reason has to do with flexibility and is therefore also economic. By compilation, the programming enterprise is kept, in large measure, independent of particular machines. Through different compilers, programs can be made to run on a variety of computers, existing or yet to be invented. All in all, compilers facilitate the business of writing programs for computers to execute. The kind of compiler to be described here confers practical benefits on the linguist by facilitating the business of obtaining parsing grammars. It does this by making the enterprise essentially indistinguishable from writing competence grammars. It is possible to say things in the native language of a computer that never should be said, because they are either redundant or meaningless. Adding zero to a number, or calculating a value that is never used, is redundant and therefore wasteful. Attempting to divide a number by zero makes no sense, and an endless cycle of operations prevents a result from ever being reached. These things can almost always be assumed to be errors on the part of the programmer. Some of these errors are difficult or impossible to detect simply by inspecting the program and emerge only in the course of the actual computation. But some errors can be dealt with by simply not providing any way of committing them in the higher-level language. Thus, an expression n/0, meaning the result of dividing n by 0, would be deemed outside the language, and the compiler would not be able to translate a program that contained it.
The general point is this: compiling can confer additional benefits when it is used to translate one language into a subset of another. In the case of programming languages, the idea is to exclude useless constructions from the subset. This benefit is easy to appreciate in the linguistic case. As I have said, competence grammars are written in a formal language designed to enshrine restrictions that have been found characteristic of the human linguistic faculty, in short, linguistic universals. The guarantee that performance grammars reflect these same universals can be provided by embedding them in the language of those grammars. But it can also be provided by deriving the performance grammars from the competence grammars by means of a process that is guaranteed to preserve all important properties. The language of the performance grammar is relatively unconstrained and would allow the expression of theoretically unmotivated things, but the translation procedure ensures that only a properly motivated subset is used. This strategy clearly makes for a stronger theory by positing a close connection between competence and performance. It has the added advantage of freeing the designer of the performance formalism of any concern for matters already accounted for in the competence system.

7.1.2. Attributes and values

As I have already pointed out, functional unification grammar knows things by their functional descriptions (FDs). An attribute is a symbol, that is, a string of letters. A value is either a symbol or another FD. The equal sign, "=", is used to separate an attribute from its value so that, in α = β, α is the attribute and β the value. Thus, for example, (1) might be an FD, albeit a simple one, of the sentence He saw her.

(1)  [CAT  = S
      SUBJ = [CAT = PRON
              GENDER = MASC
              NUMBER = SING
              PERSON = 3]
      DOBJ = [CAT = PRON
              GENDER = FEM
              NUMBER = SING
              PERSON = 3]
      VERB = SEE
      TENSE = PAST
      VOICE = ACTIVE]
A simple FD is a set of descriptors and a descriptor is a constituent set, a pattern, or an attribute with an associated value. I shall come to the form and function of constituent sets and patterns shortly. For the moment, we consider only attribute-value pairs. The list of descriptors that make up an FD is written in square brackets, no significance attaching to the order. The attributes in an FD must be distinct from one another so that if an FD F contains the attribute a, it is always possible to use the phrase "the a of F" to refer unambiguously to a value. If the values of SUBJ and DOBJ are reversed in (1), and the value of VOICE changed to PASSIVE, it becomes an FD for the sentence She was seen by him. However, in both this and the original sentence, he is the protagonist (PROT), or logical subject, and she the goal (GOAL) of the action, or logical direct object. In other words, both sentences are equally well described by (2).

(2)  [CAT  = S
      PROT = [CAT = PRON
              GENDER = MASC
              NUMBER = SING
              PERSON = 3]
      GOAL = [CAT = PRON
              GENDER = FEM
              NUMBER = SING
              PERSON = 3]
      VERB = SEE
      TENSE = PAST]

In the sense of transformational grammar, (2) shows a deeper structure than (1). However, in functional unification grammar, if a given linguistic entity has two different FDs, a single FD containing the information in both can be constructed by the process of unification, which we shall examine in detail shortly. The FD (3) results from unifying (1) and (2). A pair of FDs is said to be incompatible if they have a common attribute with different symbols, or incompatible FDs, as values. Grammatically ambiguous sentences have two or more incompatible FDs. Thus, for example, the sentence He likes writing books might be described by (4) or (5). Incompatible simple FDs F1 ... Fk can be combined into a single complex FD {F1 ... Fk} which describes the union of the sets of objects that its components describe. The notation allows common parts of components to be factored in the obvious way, so that (6) describes all those objects that are described by either (4) or (5). The use of braces to indicate alternation between incompatible FDs or sub-FDs provides a compact way of describing large classes of disparate objects. In fact, as we shall see, given a few extra conventions, it makes it possible to claim that the grammar of a language is nothing more than a single complex FD.

[Examples (3)-(7) are too garbled in the source to reproduce.]

7.1.3. Unification

A string of atoms enclosed in angle brackets constitutes a path and there is at least one that identifies every value in an FD. The path ⟨a1 a2 ... ak⟩ identifies the value of the attribute ak in the FD that is the value of ⟨a1 a2 ... ak-1⟩. Paths are always interpreted as beginning in the largest FD that encloses them. Attributes are otherwise taken as belonging to the smallest enclosing FD. A pair consisting of a path in an FD and the value that the path leads to is a feature of the object described. If the value is a symbol, the pair is a basic feature of the FD. Any FD can be represented as a list of basic features. For example, (8) can be represented by the list (9).

(8)  [CAT  = S
      SUBJ = PROT = [CAT = PRON
                     GENDER = MASC
                     CASE = NOM
                     NUMBER = SING
                     PERSON = 3]
      DOBJ = GOAL = [CAT = PRON
                     GENDER = FEM
                     CASE = ACC
                     NUMBER = SING
                     PERSON = 3]
      VERB = [CAT = VERB
              WORD = SEE]
      TENSE = PAST
      VOICE = ACTIVE
      ASPECT = [PERFECT = -
                PROGRESSIVE = -]]

It is in the nature of FDs that they blur the usual distinction between features and structures. Example (8) shows FDs embedded in other FDs, thus stressing their structural properties. Rewriting (8) as (9) stresses the componential nature of FDs.
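The rewriting of an FD as a list of basic features is easy to model. The following sketch is my own illustration, not Kay's notation: a simple FD is modeled as a Python dict whose values are symbols (strings) or embedded FDs (nested dicts), and a basic feature is a pair of a path and a symbol.

```python
def basic_features(fd, prefix=()):
    """Flatten a feature description (nested dict) into a list of
    (path, symbol) pairs -- the 'basic features' of the FD."""
    feats = []
    for attr, value in fd.items():
        path = prefix + (attr,)
        if isinstance(value, dict):          # embedded FD: recurse into it
            feats.extend(basic_features(value, path))
        else:                                # symbol: a basic feature
            feats.append((path, value))
    return feats

fd = {"CAT": "S",
      "SUBJ": {"CAT": "PRON", "NUMBER": "SING", "PERSON": "3"},
      "TENSE": "PAST"}
# basic_features(fd) contains, among others, the pairs
# (("SUBJ", "NUMBER"), "SING") and (("TENSE",), "PAST")
```

Note that this flat view ignores structure sharing between paths, as in SUBJ = PROT in (8); a fuller implementation would represent the shared value by a single shared object.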
It is the possibility of viewing FDs as unstructured sets of features that makes them subject to the standard operations of set theory. However, it is also a crucial property of FDs that they are not closed under set-theoretic operations. Specifically, the union of a pair of FDs is not, in general, a well-formed FD. The reason is this: the requirement that a given attribute appear only once in an FD implies a similar constraint on the set of features corresponding to an FD. A path must uniquely identify a value. But, if the FD F1 has the basic feature ⟨a⟩ = x and the FD F2 has the basic feature ⟨a⟩ = y, then either x = y or F1 and F2 are incompatible and their union is not a well-formed FD. So, for example, if F1 describes a sentence with a singular subject and F2 describes a sentence with a plural subject, then S1 ∪ S2, where S1 and S2 are the corresponding sets of basic features, is not well formed because it would contain ⟨SUBJ NUMBER⟩ = SINGULAR and ⟨SUBJ NUMBER⟩ = PLURAL. When two or more simple FDs are compatible, they can be combined into one simple FD describing those things that they both describe, by the process of unification. Unification is the same as set union except that it yields the null set when applied to incompatible arguments. The "=" sign is used for unification, so that α = β denotes the result of unifying α and β. Examples (10)-(12) show the results of unification in some simple cases.

[Examples (10)-(12) are too garbled in the source to reproduce.]

The result of unifying a pair of complex FDs is, in general, a complex FD with one term for each compatible pair of terms in the original FDs. Thus {a1 ... am} = {b1 ... bn} becomes an FD of the form {c1 ... ck} in which each ch (1 ≤ h ≤ k) is the result of unifying a compatible pair ai = bj (1 ≤ i ≤ m, 1 ≤ j ≤ n). This is exemplified in (13).
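Unification of simple FDs, as just described, can be sketched in a few lines. This is an illustrative simplification of my own, not Kay's implementation: FDs are dicts, the null result is None, and neither paths nor the alternations of complex FDs are handled.

```python
def unify(a, b):
    """Unify two simple FDs (nested dicts). Behaves like set union on
    their descriptors, but returns None on incompatible values."""
    result = dict(a)
    for attr, value in b.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            sub = unify(result[attr], value)
            if sub is None:
                return None        # incompatible embedded FDs
            result[attr] = sub
        elif result[attr] != value:
            return None            # same attribute, different symbols
    return result

# Compatible FDs pool their information; incompatible ones fail:
# unify({"SUBJ": {"NUMBER": "SING"}}, {"TENSE": "PAST"})
#   -> {"SUBJ": {"NUMBER": "SING"}, "TENSE": "PAST"}
# unify({"SUBJ": {"NUMBER": "SING"}}, {"SUBJ": {"NUMBER": "PLUR"}}) -> None
```

For compatible arguments the result is the same whichever order the FDs are taken in, mirroring the order independence claimed for unification below.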
(9)  ⟨CAT⟩ = S
     ⟨SUBJ CAT⟩ = PRON
     ⟨SUBJ GENDER⟩ = MASC
     ⟨SUBJ CASE⟩ = NOM
     ⟨SUBJ NUMBER⟩ = SING
     ⟨SUBJ PERSON⟩ = 3
     ⟨PROT CAT⟩ = PRON
     ⟨PROT GENDER⟩ = MASC
     ⟨PROT CASE⟩ = NOM
     ⟨PROT NUMBER⟩ = SING
     ⟨PROT PERSON⟩ = 3
     ⟨DOBJ CAT⟩ = PRON
     ⟨DOBJ GENDER⟩ = FEM
     ⟨DOBJ CASE⟩ = ACC
     ⟨DOBJ NUMBER⟩ = SING
     ⟨DOBJ PERSON⟩ = 3
     ⟨GOAL CAT⟩ = PRON
     ⟨GOAL GENDER⟩ = FEM
     ⟨GOAL CASE⟩ = ACC
     ⟨GOAL NUMBER⟩ = SING
     ⟨GOAL PERSON⟩ = 3
     ⟨VERB CAT⟩ = VERB
     ⟨VERB WORD⟩ = SEE
     ⟨TENSE⟩ = PAST
     ⟨VOICE⟩ = ACTIVE
     ⟨ASPECT PERFECT⟩ = -
     ⟨ASPECT PROGRESSIVE⟩ = -

[Example (13) is too garbled in the source to reproduce; it included a German PP whose noun-phrase head shares the CASE of the preposition mit, written CASE = ⟨CASE⟩.]

Unification is the fundamental operation underlying the analysis and synthesis of sentences using functional unification grammar. It is important to understand the difference between saying that a pair of FDs are equal and saying that they are unified. Clearly, when a pair of FDs has been unified, they are equal; the inverse is not generally true. To say that a pair of FDs A and B are unified is to say that A and B are two names for one and the same description. Consequently, if A is unified with C, the effect is to unify A, B, and C. On the other hand, if A and B, though possibly equal, are not unified, then unifying A and C does not affect B and, indeed, if A and C are not equal, a result of the unification will be to make A and B different. A crucial consequence of this is that the result of unifying various pairs of FDs is independent of the order in which the operations are carried out. This, in its turn, makes for a loose coupling between the grammatical formalism as a whole and the algorithms that use it.

7.1.4. Patterns and constituent sets

We come now to the question of recursion in the grammar and how constituency is represented.
I have already remarked that functional unification grammar deliberately blurs the distinction between structures and sets of features. It is clear from the examples we have considered so far that some parts of the FD of a phrase typically belong to the phrase as a whole, whereas others belong to its constituents. For example, in (8), the value of SUBJ is the FD of a constituent of the sentence, whereas the value of ASPECT is not. The purpose of constituent sets and patterns is to identify constituents and to state constraints on the order of their occurrence. Example (14) is a version of (8) that specifies the order. (SUBJ VERB DOBJ) is a pattern stating that the values of the attributes SUBJ, VERB, and DOBJ are FDs of constituents and that they occur in that order. As the example illustrates, patterns appear in FDs as the values of attributes. In general, the value is a list of patterns, and not just one as in (14). The attribute Pattern is distinguished, its value being the one used to determine the order of the immediate constituents of the item being described. The attribute C-set is also distinguished, its value being a single list of paths identifying the immediate constituents of the current item, but imposing no order on them.

(14) [Pattern = (SUBJ VERB DOBJ)
      CAT  = S
      SUBJ = PROT = [CAT = PRON
                     GENDER = MASC
                     CASE = NOM
                     NUMBER = SING
                     PERSON = 3]
      DOBJ = GOAL = [CAT = PRON
                     GENDER = FEM
                     CASE = ACC
                     NUMBER = SING
                     PERSON = 3]
      VERB = [CAT = VERB
              WORD = SEE]
      TENSE = PAST
      VOICE = ACTIVE
      ASPECT = [PERFECT = -
                PROGRESSIVE = -]]

Here, as elsewhere in the formalism, one-step paths can be, and usually are, written without enclosing angle brackets. The patterns are templates that the string of immediate constituents must match. Each pattern is a list whose members can be:

1. A path. The path may have as its value
   a. An FD. As in the case of the constituent set, the FD describes a constituent.
   b. A pattern. The pattern is inserted into the current one at this point.
2. A string of dots (...). This matches any number of constituents.
3. The symbol #. This matches any one constituent.
4. An FD. This will match any constituent whose description is unifiable with it. The unification is made with a copy of the FD in the pattern, rather than with the FD itself, because the intention is to impute its properties to the constituent, but not to unify all the constituents that match this part of the pattern.
5. An expression of the form (* fd), where fd is an FD. This matches zero or more constituents, provided they can all be unified with a copy of fd.

The ordering constraints in (14) could have been represented by many other sets of patterns, for example, those in (15).

[(15), a set of alternative patterns jointly equivalent to (SUBJ VERB DOBJ), is too garbled in the source to reproduce.]

The pattern (16) requires exactly one constituent to have the property [TRACE = NP]; all others must have the property [TRACE = NONE].

(16) ((* [TRACE = NONE]) [TRACE = NP] (* [TRACE = NONE]))

Clearly, patterns, like attribute-value pairs, can be incompatible, thus preventing the unification of FDs. This is the case in the examples in (17).

[(17), pairs of incompatible patterns, is too garbled in the source to reproduce.]

The last of these could be a compatible pair just in case VERB and DOBJ were unified, but the names suggest that the first would have the feature [CAT = VERB] and the other [CAT = NP], ruling out this possibility. The value of the C-set attribute covers all constituents. If two FDs are unified, and both have C-set attributes, the C-set attribute of the result is the intersection of these. If a member of the pair has no such attribute, the value is taken from the other member. In other words, the universal set of constituents can be written by simply omitting the attribute altogether. If there are constituents, but no patterns, then there are no constraints on the order in which the constituents can occur. If there are patterns, then each of the constituents must be assimilated to all of them.

7.1.5. Grammar

A functional unification grammar is a single FD. A sentence is well formed if a constituent-structure tree can be assigned to it each of whose nodes is labeled with an FD that is compatible with the grammar. The immediate descendants of each node must be properly specified by constituent sets and patterns. Generally speaking, grammars will take the form of alternations, each clause of which describes a major category; that is, they will have the form exhibited in (18).

(18) {[CAT = C1
       ...]
      [CAT = C2
       ...]
      ...}

Example (19) shows a simple grammar, corresponding to a context-free grammar containing the single rule (20).

(19) {[CAT = S
       C-set = (SUBJ VERB SCOMP)
       SUBJ = [CAT = NP]
       VERB = [CAT = VERB]
       {[SCOMP = [CAT = S]]
        [SCOMP = NONE]}]
      [CAT = NP]
      [CAT = VERB]}

(20) S → NP VERB (S)

FD (19) describes either sentences or verbs or noun phrases. Nothing is said about the constituency of the verbs or noun phrases described - they are treated as terminal constituents. The sentences have either two or three constituents depending on the choice made in the embedded alternation. All constituents must match the FD (19). Since the first constituent has the feature [CAT = NP], it can only match the second term in the main alternation. Likewise, the second constituent can only match the third term. If there is a third constituent, it must match the first term in the alternation, because it has the feature [CAT = S]. It must therefore also have two or three constituents also described by (19). If (19) consisted only of the first term in the outer alternation, it would have a null extension because the first constituent, for example, would be required to have the incompatible features [CAT = NP] and [CAT = S]. On the other hand, if the inner alternation were replaced by its second term, so that [SCOMP = NONE] were no longer an option, then the FD would correspond to the rule (21), whose derivations do not terminate.

(21) S → NP VERB S

7.2. The parser

7.2.1. The General Syntactic Processor

As I said at the outset, the transducer that is used for generating sentences operates on grammars in essentially the form in which they have been given here. The process is quite straightforward. The input is an FD that constitutes the specification of a sentence to be uttered. The more detail it contains, the more closely it constrains the sentence that will be produced. The transducer attempts to unify this FD with the grammar. If this cannot be done - that is, if it produces a null result because of incompatibilities between the two descriptions - there is no sentence in the language that meets the specification. If the unification is successful, the result will, in general, be to add detail to the FD originally provided. If the FD that results from this step has a constituent set, the process is repeated, unifying each constituent in turn with the grammar. A constituent that has no constituents of its own is a terminal that must match some entry in the lexicon. Parsing is by no means so straightforward. Grammars, as I have characterized them, do not, for example, enable one to discern, in any immediate way, what properties the first or last word of a sentence might have. It is mainly for this reason that a compiler is required to convert the grammar into a new form. Compilation will result in a set of procedures, or miniature programs, designed to be embedded in a general parsing program that will call upon them at the appropriate time. The general program, a version of the General Syntactic Processor, has been described in several places. The basic idea on which it is based is this. There are two principal data structures, the chart and the agenda. The chart is a directed graph each of whose edges maps onto a substring of the sentence being analyzed. The chart therefore contains a vertex for each of the possible end points of such substrings, k + 1 vertices for a sentence of k words.
Vertices are directed from left to right. The only loops that can occur consist of single edges incident from and to the same vertex. Each word in the sentence to be parsed is represented by an edge labeled with an FD obtained by looking that word up in the lexicon. If the word is ambiguous, that is, if it has more than one FD, it is represented by more than one edge. All the edges for the ith word clearly go from vertex i - 1 to vertex i. As the parser identifies higher-order constituents, edges with the appropriate FDs are added to the chart. A particular analysis is complete when an edge is added from the first to the last vertex and labeled with a suitable FD, say one with the feature [CAT = S]. The edges just alluded to are all inactive; they represent words and phrases. Active edges each represent a step in the recognition process. Suppose a phrase with c constituents is recognized. Since the only knowledge the parser has of words and phrases is recorded in the chart, it must be the case that the constituent edges are entered in the chart before the process of recognizing the phrase is complete, though the recognition process can clearly get under way before they are all there. If the phrase begins at vertex v1, the chart will contain c edges (possibly among others) beginning at v1, each ending at a vertex where one of the constituents ends. The one that ends where the final constituent ends is the one that represents the phrase itself. That one is inactive. The remainder are active edges and each records what is known about a phrase when only the first so many of its constituents have been seen, and also what action must be taken to incorporate the next constituent to the right. The label on an active edge therefore has two parts, an FD describing what is known about the putative phrase, and a procedure that will carry the recognition of the phrase one step further forward.
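The inactive layer of the chart that represents the words is easy to picture in code. The following sketch is an illustration with an invented toy lexicon, not the General Syntactic Processor's actual data structures: an edge is a triple (start vertex, end vertex, FD), and an ambiguous word contributes one edge per lexical FD.

```python
# Hypothetical toy lexicon: each word maps to one or more FDs (dicts).
LEXICON = {
    "he":  [{"CAT": "PRON", "CASE": "NOM"}],
    "saw": [{"CAT": "VERB", "TENSE": "PAST"},    # verb reading
            {"CAT": "NOUN", "NUMBER": "SING"}],  # noun reading
}

def initial_chart(words):
    """Inactive edges for the words of the sentence: the edges for the
    i-th word run from vertex i-1 to vertex i."""
    edges = []
    for i, word in enumerate(words, start=1):
        for fd in LEXICON[word]:
            edges.append((i - 1, i, fd))
    return edges

# A two-word sentence has 3 vertices (0, 1, 2); the ambiguous word
# "saw" is represented by two edges, both from vertex 1 to vertex 2.
```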
It is these procedures that constitute the parsing grammar and that the compiler is responsible for constructing. Parsing proceeds in a series of steps, in each of which the procedure on an active edge is applied to a pair of FDs, one coming from that same active edge, and the other from an inactive edge that leaves the vertex where the active edge ends. In other words, if a and i are an active and an inactive edge respectively, a being incident to the vertex that i is incident from, the step consists in evaluating Pa(fa, fi), where fa and fi are the FDs on a and i, and Pa is the procedure. The step is successful if fi meets the requirements that Pa makes of it, one of which is that it unify with the value of some path in a copy of fa. This copy then becomes part of the label on a new edge beginning where a begins and ending where i ends. The requirements imposed on fi, the path in the copy of fa with which it is unified, and the procedure to be incorporated in the label of the new edge, are all built into Pa. The last step completes the recognition of a phrase, and the new edge that is produced is inactive and therefore has no procedure in its label.

This process is carried out for every pair consisting of an active followed by an inactive edge that comes to be part of the chart. Each successful step leads to the introduction of one new edge, but this edge may result in several new pairs. Each new pair produced therefore becomes a new item on the agenda, which serves as a queue of pairs waiting to be processed. The initial step in the recognition of a phrase is one in which the active member of the pair is incident from and to the same vertex - the same vertex from which the inactive edge is also incident. This is reasonable because active edges are like a snapshot of the process of recognizing a phrase when it is still incomplete. The initial active edge is therefore a snapshot of the process before any work has been done.
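The agenda regime can be sketched like this. The sketch is my reconstruction, not the paper's code: edges are dicts, a procedure is a Python function playing the role of Pa, and the helper `seeking` is invented for the example.

```python
from collections import deque

# Edges are modeled as dicts {"start", "end", "fd", "proc"}; "proc" is
# None on inactive edges. A procedure takes (active, inactive) and
# returns a new edge, or None when the step fails.

def run_agenda(edges):
    agenda, seen = deque(), set()

    def pairs_for(edge):
        # an active edge pairs with inactive edges leaving its end
        # vertex; an inactive edge with active edges ending at its start
        if edge["proc"] is not None:
            return [(edge, i) for i in edges
                    if i["proc"] is None and i["start"] == edge["end"]]
        return [(a, edge) for a in edges
                if a["proc"] is not None and a["end"] == edge["start"]]

    for e in list(edges):
        agenda.extend(pairs_for(e))
    while agenda:
        active, inactive = agenda.popleft()
        key = (id(active), id(inactive))
        if key in seen:
            continue
        seen.add(key)
        new_edge = active["proc"](active, inactive)  # the step Pa(fa, fi)
        if new_edge is not None:                     # the step succeeded
            edges.append(new_edge)
            agenda.extend(pairs_for(new_edge))       # one edge, new pairs
    return edges

def seeking(cat, then, done_fd=None):
    """Make a procedure that accepts a constituent of category `cat` and
    labels the resulting edge with procedure `then` (None = inactive)."""
    def proc(active, inactive):
        if inactive["fd"].get("CAT") != cat:
            return None
        return {"start": active["start"], "end": inactive["end"],
                "fd": done_fd if then is None else dict(active["fd"]),
                "proc": then}
    return proc
```

Seeded with a looping active edge at vertex 0, this loop will combine a determiner and a noun into an inactive NP edge spanning both.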
These edges constitute the only cycles in the chart. Their labels clearly contain a procedure, but no FD. The question that remains to be answered is how these initial active edges find their way into the chart. Suppose the first step in the recognition of all phrases, of whatever type, is carried out by a single procedure. I leave open, for the moment, the question of whether this is true of the output of the functional grammar compiler. This procedure will be the only one that is ever found in the label of initial, looping, active edges. If a looping edge labeled with this procedure is introduced at every vertex in the chart, then the strategy I have outlined will clearly cause all substrings of the string that are phrases to be analyzed as such, and all the analyses of the string as a whole will be among them. This technique corresponds to what is sometimes called undirected, bottom-up parsing.¹ If there is a number of different procedures that can begin the recognition of a phrase, then a loop must be introduced for each of them at each vertex. Undirected top-down parsing is modeled by a strategy in which the only loops introduced initially are at the first vertex, and the procedures in their labels are those that can initiate the search for a sentence. Others are introduced as follows: when the FD that a given procedure is looking for on an inactive edge corresponds to a phrase rather than a lexical item, and there is no loop at the relevant vertex that would initiate the search for a phrase of the required type, one is created.

7.2.2. The parsing grammar

The parsing grammar, as we have seen, takes the form of a set of procedures, each of which operates on a pair of FDs. One of these FDs, the matrix FD, is a partial description of a phrase, and the other, the constituent FD, is as complete a description as the parser will ever have of a candidate for inclusion as a constituent of that phrase.
There may be various ways in which the constituent FD can be incorporated. Suppose, for example, that the matrix is a partial FD of a noun phrase and that it already contains the description of a determiner. Suppose, further, that the edge representing the determiner ends where the current active edge ends. The current procedure will therefore be a specialist in noun-phrase constituents that follow initial determiners. Offered a constituent FD that describes an adjective, it will incorporate it into the matrix FD as a modifier and create a new active edge whose label may contain the procedure itself - on the theory that what can follow a determiner can also follow a determiner and an adjective. But, if the constituent FD describes a noun, that must be incorporated into the matrix as its head. At least two new edges will be produced in this case: an inactive edge representing a complete noun phrase, and an active edge labeled with a procedure that specializes in incorporating prepositional phrases and relative clauses.

We have seen that the process of recognizing a phrase is initiated by an active edge that loops back to its starting vertex, and which has a null FD in its label. For the case of functional grammar, all such loops will be labeled with the same procedure, one capable of recognizing initial members of all constituents and building a matrix FD that incorporates them appropriately. Thus, for example, if this procedure is given a constituent FD that is a determiner description, it will put a new active edge in the chart whose FD is a partial NP description incorporating that determiner FD. The picture that emerges, then, is of a network of procedures, akin in many ways to an augmented transition network (ATN) (Woods, 1970). The initial procedure corresponds to the initial state and, in general, there is an arc connecting a procedure to the procedures that it can use to label the new active edges it introduces.
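The noun-phrase specialist described above might look as follows in the same dict-based modeling. The function names `after_determiner` and `after_head` and the attribute names MODS and HEAD are my own illustrative inventions, not the paper's.

```python
def after_determiner(active, inactive):
    """NP specialist for constituents that follow an initial determiner.
    Returns the new edges licensed by one recognition step."""
    cat = inactive["fd"].get("CAT")
    matrix = dict(active["fd"])                    # work on a copy
    span = {"start": active["start"], "end": inactive["end"]}
    if cat == "ADJ":
        matrix["MODS"] = matrix.get("MODS", []) + [inactive["fd"]]
        # what can follow a determiner can also follow a determiner
        # and an adjective, so the new edge carries this same procedure
        return [dict(span, fd=matrix, proc=after_determiner)]
    if cat == "N":
        matrix["HEAD"] = inactive["fd"]
        matrix["CAT"] = "NP"
        return [dict(span, fd=matrix, proc=None),        # complete NP
                dict(span, fd=matrix, proc=after_head)]  # seek PP, relative
    return []                                            # step fails

def after_head(active, inactive):
    """Specialist for prepositional phrases and relative clauses
    (left as a stub in this sketch)."""
    return []
```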
The arc that is followed in a particular case depends on the properties that the procedure finds in the constituent FD it is applied to. Consider the situation that obtains in a simple English sentence in the active voice after a noun phrase, a verb, and a second noun phrase have been seen. Three different grammar procedures are involved in getting this far. Let us now consider what would face a fourth procedure. If the verb is of the appropriate kind, the question of whether the second noun phrase is the direct or the indirect object is still open. If the current constituent FD is a noun phrase, then one possible way to continue the analysis will accept that noun phrase as the direct object and the preceding one as the indirect object. But, regardless of what the constituent FD is, another possibility that must be explored is that the preceding noun phrase was in fact the direct object. Let us assume that what can follow is the same in both cases; in ATN terms, a transition is made to the same state. A grammar procedure that does this is given below in (22).

(22)  [1]  (PROG ((OLDMATRIX MATRIX)
      [2]         S1)
      [3]    (SELECTQ (PATH 4 CAT)
      [4]      (NP)
      [5]      (NIL (SETQ S1 T))
      [6]      (GO L1))
      [7]    (AND S1
      [8]         (NEWPATH 4 CAT (QUOTE NP)))
      [9]    (ASSIGN 3 IOBJ)
      [10]   (ASSIGN 4 DOBJ)
      [11]   (TO S/DOBJ)
      [12] L1 (SETQ MATRIX OLDMATRIX)
      [13]   (ASSIGN 3 DOBJ)
      [14]   (TO S/DOBJ)
      [15] OUT)

The procedure is written in Interlisp, a general programming language, but only a very small subset of the language is used, and the salient points about the example will be readily understood with the help of the following remarks. The major idiosyncrasy of the language for present expository purposes is that the name of a function or procedure, instead of being written before a parenthesized list of arguments, is written as the first item within the parentheses. So what would normally be written as F(x) is written (F x).
The heart of the procedure is in lines 3 through 6, which constitute a call on the function SELECTQ. SELECTQ examines the value of the CAT attribute of the constituent FD, as specified by the first argument, (PATH 4 CAT). This returns the value of the (CAT) path of the fourth, that is, the current, constituent.² The SELECTQ function can have any number of arguments, and the remaining ones give the action to be taken for various values of the first one. All but the last give the action for specific values, or sets of values, and the last says what is to be done in all other cases. In (22), the specific values are NP (line 4) and NIL (line 5). The nonspecific case is given in line 6: (GO L1).

Now let us examine these cases more closely. The last case is the most straightforward. If the constituent FD has some value other than NP or NIL for its CAT attribute, then it must describe something other than a noun phrase. NP is the value it would have if it did describe a noun phrase, and NIL is a purely conventional value that the procedure finds when the attribute is absent altogether. The only other possibility is that the attribute has some other substantive value, like VP. If this happens, the instruction (GO L1) is executed, causing processing to resume at line 12. The first thing that is done here, (SETQ MATRIX OLDMATRIX), is of minor interest, and we will return to it shortly. After this, only two instructions remain, on lines 13 and 14. The effect of the first of these, (ASSIGN 3 DOBJ), is to unify the description of the third constituent of the phrase - the noun phrase immediately preceding the one that this procedure examines - with the value of the path DOBJ. In short, it implements the hypothesis that that constituent was the direct object. The instruction (TO S/DOBJ), intentionally reminiscent of the language of ATNs, causes a new active edge to be created, labeled with the procedure S/DOBJ.
In the remaining cases, the constituent either has, or could be given, the feature [CAT = NP]. These are the cases in which the current constituent could be the direct, and the preceding constituent the indirect, object. However, if the value of the CAT attribute of the current constituent is NIL, it must be assigned the value NP before the hypothesis is acted upon, and the procedure sets the local variable S1 to T (Interlisp's own name for "true") in line 5 to remind it to do this. It is actually done by the instructions on lines 7 and 8; if S1 has a nonnull value, the instruction (NEWPATH 4 CAT (QUOTE NP)) is carried out, causing NP to become the new value of the CAT attribute of the constituent FD. Setting and testing the local variable S1 is an efficiency measure, enabling the cost of the NEWPATH operation to be avoided when the required value is already in place. The instructions on lines 9 and 10 are similar to the one on line 13. They cause the third and fourth constituents to be assigned the functions IOBJ and DOBJ, respectively. (TO S/DOBJ) then causes a new active edge to be created as before. The procedure then goes on to do what it would have done in the default case described above. But it must first restore the FDs to the condition they were in when the procedure was entered, and this is the purpose of the instruction (SETQ MATRIX OLDMATRIX) on line 12.³ The original matrix FD was preserved as the value of the temporary variable OLDMATRIX in line 1, and the constituent FD is embedded in the same data structure by a subterfuge that is best left unexamined. If, in the case we have just examined, the grammar made no provision for constituents following the direct object, the instruction (TO S/DOBJ) appearing on lines 11 and 14 would have been replaced by (DONE). This has the effect of creating an inactive edge labeled with the current matrix FD. More realistically,
the procedure should probably have both instructions in both places, on the theory that material can follow the direct object, but it is optional.

Before going on to discuss how procedures of this kind can be compiled from functional grammars, a few words are in order on why procedures like (22) have the particular structure they have. Why, in particular, is the local variable S1 used in the way it is, rather than simply embedding the (NEWPATH 4 CAT (QUOTE NP)) in the SELECTQ instruction? The example at (23) will help make this clear. The procedure succeeds, and a transition to the state NEXT is made, just in case the current constituent has the features [CAT = NP] and [ANIMATE = NONE]. If the attribute CAT in the constituent FD has the value NIL, then, as we have seen, the function NEWPATH must be called to assign the value NP to this attribute. However, there is no point in doing this if the other requirements of the constituent are not also met. Accordingly, in this case, the procedure verifies the feature [ANIMATE = NONE] before making any reassignments of values.

(23)  (PROG ((OLDMATRIX MATRIX)
             S2 S1)
        (SELECTQ (PATH 1 CAT)
          (NP)
          (NIL (SETQ S1 T))
          (GO OUT))
        (SELECTQ (PATH 1 ANIMATE)
          (NONE)
          (NIL (SETQ S2 T))
          (GO OUT))
        (AND S2
             (NEWPATH 1 ANIMATE (QUOTE NONE)))
        (AND S1
             (NEWPATH 1 CAT (QUOTE NP)))
        (TO NEXT)
      OUT)

A second point that may require clarification involves instructions like (ASSIGN 3 DOBJ), which can cause the assignment of constituents other than the current one to grammatical functions - attributes - in the matrix FD. In principle, all such assignments could be made when the constituent in question is current, in which case no special instruction would be required and it would not be necessary to use numbers to identify particular constituents. These devices are the cost of some considerable gains in efficiency. A situation that illustrates this is now classic in discussions of ATN grammars.
Suppose that the first noun phrase in a sentence is to be assigned to the SUBJ attribute if the sentence is active, and to the DOBJ attribute if it is passive. It is not known, at the time the noun phrase is encountered, what the voice of the sentence is. A perfectly viable solution to the problem this raises is to make both assignments, in different active edges, and to unify the corresponding value with that of the VOICE attribute. When the voice is eventually determined, only the active edge with the proper assignment for the first noun phrase will be able to continue. But, in the meantime, parallel computations will have been pursued redundantly. The numbers that appear in the ASSIGN instructions serve as a temporary label and enable us to defer the decision as to the proper role for a constituent.

7.3. The compiler

The compiler has two major sections. The first part is a straightforward application of the generation program to put the grammar, effectively, into disjunctive normal form. The second is concerned with actually building the procedures.

A grammar is a complex FD, typically involving a great many alternations. As with any algebraic formula involving and and or, it can be restructured so as to bring all the alternations to the top level. In other words, if F is a grammar, or indeed any complex FD, it is always possible to recast it in the form F1 ∨ F2 ∨ ... ∨ Fn, where the Fi (1 ≤ i ≤ n) each contain no alternations. This remains true even when account is taken of the alternations implicit in patterns, for if the patterns in an FD do not impose a unique ordering on the constituents, then each allowable order is a different alternative. Now, the process of generation from a particular FD f effectively selects those members of F1 ... Fn that can be unified with f, and then repeats this procedure recursively for each constituent. But F is, in general, a conjunct containing some atomic terms and some alternations.
Ignoring patterns for the moment, the procedure is as follows:

1. Unify the atomic terms of F with f. If this fails, the procedure as a whole fails. Some number of alternations now remain to be considered. In other words, that part of F that remains to be unified with f is an expression F′ of the form
(a1,1 ∨ a1,2 ∨ ... ∨ a1,k1) ∧ (a2,1 ∨ a2,2 ∨ ... ∨ a2,k2) ∧ ... ∧ (am,1 ∨ am,2 ∨ ... ∨ am,km).

2. Rewrite F′ as an alternation by multiplying out the terms of an arbitrary alternation in F′, say the first one. This gives an expression F″ of the form
(a1,1 ∧ (a2,1 ∨ a2,2 ∨ ... ∨ a2,k2) ∧ ... ∧ (am,1 ∨ am,2 ∨ ... ∨ am,km)) ∨
(a1,2 ∧ (a2,1 ∨ a2,2 ∨ ... ∨ a2,k2) ∧ ... ∧ (am,1 ∨ am,2 ∨ ... ∨ am,km)) ∨ ... ∨
(a1,k1 ∧ (a2,1 ∨ a2,2 ∨ ... ∨ a2,k2) ∧ ... ∧ (am,1 ∨ am,2 ∨ ... ∨ am,km)).

3. Apply the whole procedure (steps 1-3) separately to each conjunct of F″.

If f is null, then there are no constraints on what is generated and the effect is to enumerate all the sentences in the language, a process that presumably only terminates in trivial cases. Unconstrained generation does terminate, however, if it is applied only to a single level in the constituency hierarchy and, indeed, the effect is to generate precisely the disjunctive normal form of the grammar required by the compiler.

It remains to spell out the alternatives that are implicit in the patterns. This is quite straightforward and can be done in a separate part of the procedure applied to each of the FDs that the foregoing procedure delivers. The basic idea is to generate all permutations of the constituent set of the FD and to eliminate those that do not match all the patterns. This process, like the one just described, can be considerably streamlined. Permutations are generated in an analog of alphabetical order, so that all those that share a given prefix are produced together.
But, if any such prefix itself violates the requirements of the patterns, the process is truncated and the corresponding suffixes are not generated. The result of this phase of the compilation is a list of simple FDs, containing no alternations, and having either no pattern or a single pattern that specifies the order of constituents uniquely. Those that have no pattern become lexical entries, and they are of no further interest to the compiler. It seems likely that realistic grammars will give rise to lists which, though quite long, are entirely manageable. If a little care is exercised and list-processing techniques are used in a thoroughgoing manner, a great deal of structure turns out to be shared among members of the list.

When FDs are simplified in this way, an analogy between them and phrase structure rules begins to emerge. In fact, a simple FD F of this kind could readily be recast in the form F → F1 ... Fk, where F1 ... Fk are the subdescriptions identified by the terms in the pattern, taken in the order given there. The analogy is not complete because the items that make up the right-hand side of such a rule cannot be matched directly against the left-hand sides of other rules. A rule of this kind can be used to expand a given item, not if its left-hand side is that item, but if it can be unified with it.

The second phase of the compiler centers around a procedure which, given a list of simple FDs and an integer n, attempts to find an attribute, or path, on the basis of which the nth constituent of those FDs can be distinguished. In the general case, the result of this process is (1) a path A, (2) a set of values for A, each associated with the subset of the list of FDs whose nth constituent has that value of A, and (3) a residual subset of the list consisting of FDs whose nth constituent has no value for the attribute A. The procedure attempts to find a path that minimizes the size of the largest of these sets.
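The discrimination step just described can be sketched as follows, under the assumption (mine) that simple FDs are flat dicts whose numeric keys hold the constituents' FDs. The scoring rule is one reasonable reading of "minimizes the size of the largest of these sets": since a residual member is compatible with every value, it is counted into each set when sizes are compared.

```python
def discriminate(fds, n):
    """Split a list of simple FDs on the value their nth constituent
    gives to some path, choosing the path that minimizes the largest
    resulting set. Returns (path, value_sets, residual) or None."""
    best = None
    candidate_paths = {p for fd in fds for p in fd.get(n, {})}
    for path in candidate_paths:
        buckets, residual = {}, []
        for fd in fds:
            value = fd.get(n, {}).get(path)
            if value is None:
                residual.append(fd)   # no value for this path at all
            else:
                buckets.setdefault(value, []).append(fd)
        # residual members effectively join every value set, so they
        # count toward the size of the largest set
        worst = max(len(b) for b in buckets.values()) + len(residual)
        if best is None or worst < best[3]:
            best = (path, buckets, residual, worst)
    return best[:3] if best else None
```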
The residual set cannot be distinguished from the other sets on the basis of the chosen path; in other words, for each member R of the residual set, a new member can be added to each of the other sets by adding a path with the appropriate value to a copy of R. The same procedure is applied to each of the resulting sets, with the same constituent number, until the resulting sets cannot be discriminated further. The overall result of this is to construct a discrimination tree that can be applied to an FD to determine in which of the members of the list it could be incorporated as the nth constituent.

The discrimination procedure also has the responsibility of detecting features that are shared by all the FDs in the list provided to it. For this purpose, it examines the whole FD and not just the nth constituent. Clearly, if the list is the result of a previous discrimination step, there will always be at least one such feature, namely the path and value on the basis of which that previous discrimination was made. The discrimination procedure thus collects almost all the information required to construct a grammar procedure. All that is lacking is information about the other procedures that a given one must nominate to label active chart edges that it produces. This in its turn is bound up with the problem of reducing equivalent pairs of procedures to one. Before going on to these problems, we shall do well to examine one further example of a grammar procedure to see how the operation of the discrimination process is reflected in its structure.

The procedure below, (24), shows the reflexes of all the important parts of the procedure. It is apparent that two sets of FDs have been discriminated, the discriminating path being CAT. If the value of the path is VERB, a further discrimination is made on the basis of the TRANSITIVE attribute of the third constituent.
If the constituent is a transitive verb, the next active arc will be labeled with the procedure V-TRANS, and if intransitive, with V-INTRANS. If the category is AUX, the next procedure will be AUX-STATE. This discrimination is reflected in lines 5-23. Lines 3 and 4 reflect the fact that all the FDs involved have the subject in first position, and that the subject is singular. The instruction (OR (UNIFY (SUBJ) (1)) (GO OUT)), for example, says that either the value of the subject attribute can be unified with the first constituent or the procedure terminates without further ado.

(24)  [1]  (PROG ((OLDFEATURES FEATURES)
      [2]         S1 S2)
      [3]    (OR (UNIFY (SUBJ) (1)) (GO OUT))
      [4]    (OR (UNIFY SING (SUBJ NUM)) (GO OUT))
      [5]    (SELECTQ (PATH (3 CAT))
      [6]      (VERB (GO L2))
      [7]      (AUX (GO L1))
      [8]      (NIL (SETQ S1 T))
      [9]      (GO OUT))
      [10]   (NEWPATH 3 CAT (QUOTE VERB))
      [11] L2 (SELECTQ (PATH (3 TRANSITIVE))
      [12]     (PLUS (GO L3))
      [13]     (MINUS (GO L4))
      [14]     (NIL (SETQ S2 T))
      [15]     (GO OUT))
      [16]   (NEWPATH 3 TRANSITIVE (QUOTE PLUS))
      [17] L3 (TO V-TRANS)
      [18] L4 (SETQ FEATURES OLDFEATURES)
      [19]   (AND S2 (NEWPATH 3 TRANSITIVE (QUOTE MINUS)))
      [20]   (TO V-INTRANS)
      [21]   (OR S1 (GO OUT))
      [22]   (NEWPATH 3 CAT (QUOTE AUX))
      [23] L1 (TO AUX-STATE)
           OUT)

It remains to discuss how a procedure is put in touch with its successor procedures. In the examples I have shown, instructions like (TO V-INTRANS) provide more or less mnemonic names for the procedures that will be used to label new active edges but, in doing so, I was exercising expository license. When the time comes to insert a name like this into the procedure, the discrimination process is invoked recursively to compile the procedure in question. Having done this, it compares the result with a list of previously compiled procedures. If, as frequently happens, it finds that the same procedure has been compiled before, it uses the old name and throws away the result of the compilation just completed; otherwise it assigns a new name. The effect of this is to substantially reduce the total number of procedures required.
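The reuse of previously compiled procedures amounts to interning procedure bodies in a table. The sketch below is my own, with an invented naming scheme; it only shows the comparison-and-reuse idea, not the compilation itself.

```python
class ProcedureTable:
    """Reuse names for procedures whose compiled bodies are identical."""

    def __init__(self):
        self.by_body = {}   # canonical body -> assigned name
        self.count = 0

    def intern(self, body):
        """Return a name for `body`, reusing an old one when the same
        procedure has been compiled before."""
        key = repr(body)    # canonical printed form of the body
        if key not in self.by_body:
            self.count += 1
            self.by_body[key] = f"PROC-{self.count}"
        return self.by_body[key]
```

Because identical bodies share one name, chains of procedures that would otherwise form a tree are merged wherever their continuations coincide.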
In ATN terms, the effect is to conflate similar tails of the transition networks. If this technique were not used, the network would in fact have the form of a tree.

7.4. Conclusion

The compilation scheme just outlined is based on the view, expressed at the outset, that the competence grammar is written in a formalism with theoretical status. This is the seat of linguistic universals. Since the parsing grammar is tightly coupled to this by the compiler, the language that it is written in is of only minor theoretical interest. The obvious choice is therefore a standard programming language. It is not only relatively straightforward to compile the grammar into such a language, as I hope to have shown, but the result can then be further compiled into the language of a particular computer. The result is therefore an exceedingly efficient parser. But there is a concomitant inefficiency in the research cycle involved in developing the grammar. This comes from the fact that the compilation itself is a long and very time-consuming enterprise. Two things can be said to mitigate this to some extent. First, the parsing and generation grammars do indeed describe exactly the same languages, so that much of the work involved in testing prototype grammars can be done with a generator that works directly and efficiently off the competence grammar. The second point is this: it is not in fact necessary to compile the whole grammar in order to obtain an entirely satisfactory parser. Suppose, for example, that every constituent is known to have a value for the attribute CAT. Some such assumption is almost always in order. Suppose, further, that a parsing grammar is compiled ignoring completely all other attributes; in other words, the compiler behaves as though any attribute-value pair in the grammar that did not mention CAT was not there at all.
The resulting set of parsing procedures clearly recognizes at least all the sentences of the language intended, though possibly others in addition. However, the results of the parsing process can be unified with the original grammar to eliminate false analyses.

Notes

1. See Kay (1980) for a fuller discussion of these terms.
2. The functions PATH, NEWPATH, and ASSIGN all take an indefinite number of arguments which, since the functions are all FEXPRs, are unevaluated. The first argument is a constituent number and the remainder constitute a path.
3. Argumentative LISP programmers may find this construction unconvincing on the grounds that OLDMATRIX in fact preserves only a pointer to the data structure and not the data structure itself. They should subside, however, on hearing that the data structure is an association list and the only changes made to it involve consing new material on the beginning, thus masking old versions.

References

Gazdar, Gerald. 1982. Phrase structure grammar. In: Pauline Jacobson and Geoffrey K. Pullum (eds.), The nature of syntactic representation. Dordrecht, Holland: D. Reidel. pp. 131-86.
Kaplan, Ronald M. 1972. Augmented transition networks as psychological models of sentence comprehension. Artificial Intelligence 3:77-100.
Kaplan, Ronald M. 1973. A General Syntactic Processor. In: Rustin (ed.), Natural language processing. Englewood Cliffs, N.J.: Prentice-Hall.
Kaplan, Ronald M. 1978. Computational resources and linguistic theory. Theoretical Issues in Natural Language Processing 2.
Kay, Martin. 1973. The MIND system. In: Rustin (ed.), Natural language processing. Englewood Cliffs, N.J.: Prentice-Hall.
Kay, Martin. 1977. Morphological and syntactic analysis. In: A. Zampolli (ed.), Syntactic structures processing. Amsterdam: North Holland.
Kay, Martin. 1979. Functional grammar. Proceedings of the Fifth Annual Meeting of the Berkeley Linguistics Society.
Kay, Martin. 1980. Algorithm schemata and data structures in syntactic processing. CSL-80-12, Xerox Palo Alto Research Center.
Woods, William A. 1970.
Transition network grammars for natural language analysis. Communications of the Association for Computing Machinery 13:591-606.