Download Sentence Disambiguation by a Shift

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Stemming wikipedia , lookup

Embodied language processing wikipedia , lookup

Probabilistic context-free grammar wikipedia , lookup

Transcript
Sentence Disambiguation
by a Shift-Reduce Parsing Technique*
Stuart M. Shieber
Artificial Intelligence Center
SRI International
333 Ravenswood Avenue
Menlo Park, CA 94025
Abstract
constraints, e.g., the difl]culty of parsing garden-path sentences,
could be modeled. His claim about deterministic parsing was
quite strong. Not only was the behavior of the parser required
to be deterministic, but, as Marcus claimed,
Native speakers of English show definite and consistent
preferences for certain readings of syntactically ambiguous sentences. A user of a natural-language-processing system would
naturally expect it to reflect the same preferences. Thus, such
systems must model in some way the linguistic performance as
well as the linguistic competence of the native speaker. We
have developed a parsing algorithm--a variant of the LALR(I}
shift.-reduce algorithm--that models the preference behavior of
native speakers for a range of syntactic preference phenomena
reported in the psycholinguistic literature, including the recent
data on lexical preferences. The algorithm yields the preferred
parse deterministically, without building multiple parse trees
and choosing among them. As a side effect, it displays appropriate behavior in processing the much discussed garden-path
sentences. The parsing algorithm has been implemented and has
confirmed the feasibility of our approach to the modeling of these
phenomena.
The interpreter cannot use some general rule to take
a nondeterministic grammar specification and impose arbitrary constraints to convert it to a deterministic specification {unless, of course, there is a
general rule which will always lead to the correct
decision in such a case). [Marcus, 1980, p.14]
We have developed and implemented a parsing system
that. given a nondeterministic grammar, forces disambiguation
in just the manner Marcus rejected (i.e. t .hrough general rules};
it thereby exhibits the same preference behavior that psycbolinguists have attributed to native speakers of English for a certain range of ambiguities. These include structural ambiguities
[Frazier and Fodor, 1978, Frazier and Fodor, 1980, Wanner, 1980l
and lexical preferences [Ford et aL, 1982l, as well as the gardenpath sentences as a side effect. The parsing system is based on
the shih.-reduee scheduling technique of Pereira [forthcoming].
1. I n t r o d u c t i o n
Our parsing algorithm is a slight variant of LALR{ 1) parsing, and, as such, exhibits the three conditions postulated by
Marcus for a deterministic mechanism: it is data-driven, reflects
expectations, and has look-ahead. Like Marcus's parser, our
parsing system is deterministic. Unlike Marcus's parser, the
grammars used by our parser can be ambiguous.
For natural language processing systems to be useful, they
must assign the same interpretation to a given sentence that a
native speaker would, since t h a t is precisely the behavior users
will expect.. Consider, for example, the case of ambiguous sentences. Native speakers of English show definite and consistent
preferences for certain readings of syntactically ambiguous sentences [Kimball, 1973, Frazier and Fodor, 1978, Ford et aL, 1982].
A user of a natural-language-processing system would naturally
expect, it to reflect the same preferences. Thus, such systems
must model in some way the lineuistie performance as well as
the linguistic competence of the native speaker.
2. T h e P h e n o m e n a
t o be M o d e l e d
The parsing system was designed to manifest preferences
among ,~tructurally distinct parses of ambiguous sentences. It,
does this by building just one parse t r e e - - r a t h e r than building multiple parse trees and choosing among them. Like the
Marcus parsing system, ours does not do disambiguation requiring "extensive semantic processing," hut, in contrast to Marcus,
it does handle such phenomena as PP-attachment insofar as
there exist a priori preferences for one attachment over another.
By a priori we mean preferences that are exhibited in contexts
where pragmatic or plausibility considerations do not tend to
favor one reading over the other. Rather than make such value
judgments ourselves, we defer to the psycholinguistic literature
{specifically [Frazier and Fodor, 1978], [Frazier and Fodor, 1980]
and [Ford et al., 1982]) for our examples.
This idea is certainly not new in the artificial-intelligence
literature. The pioneering work of Marcus [Marcus, 1980] is perhaps the best. known example of linguistic-performance modeling
in AI. Starting from the hypothesis that ~deterministic" parsing
of English is possible, he demonstrated that certain performance
"This research was supported by the Defense Advanced Research Proiects
Agency under Contract NOOO39-80-C-0575 with the Naval Electronic
Systems Command. The views and conclusions contained in this document
are those of the author and should not be interpreted a.s representative of
the oh~cial policies, either expressed or implied, of the Defense Advanced
Research Projects Agency or the United States government.
113
The parsing system models the following phenomena:
the parse and a shift-reduce table for guiding the parse, At each
step in the parse, the table is used for deciding between two basic
types of operations: the shift operation, which adds the next
word in the sentence (with its pretcrminal category) to the top
of the stack, and the reduce operation, which removes several
elements from the top of the stack and replaces them with a
new element--for instance, removing an N P and a V P from the
top of the stack and replacing them with an S. The state of the
parser is also updated in accordance with the shift-reduce table
at each stage. The combination of the stack, input, and state of
the parser will be called a configuration and will be notated as,
for example,
Right Association
Native speakers of English tend to prefer readings in which
constituents are "attached low." For instance, in the sentence
Joe bought the book that I hod been trving to obtain for
~usan.
the preferred reaL~lng is one in w~lch the
prepositional
phrase "for Susan~ is associated with %o obtain ~ rather
than %ought. ~
Minlmal Attachment
On the other hand, higher attachment in preferred in eerrain cases such as
1
Joe bought the book [or Suean.
NPv IIMar,
110 1
where the stack contains the nonterminals N P and V, the input
contains the lexical item Mary and the parser is in state 10.
in which "for Susan* modifies %he book" rather than
"bought." Frazier and Fodor [1978] note that these are
canes in which the higher attachment includes fewer nodes
in the parse tree. Ore" analysis is somewhat different.
By way of example,
parser (using the grammar
tence "John loves Mary. ~
input has been consumed.
Lexical Preference
Ford et al. [10821 present evidence that attachment
preferences depend on lexical choice. Thus, the preferred
reading for
we demonstrate the operation of the
of Appendix I) on the oft-cited senInitially the stack is empty and no
The parser begins in state 0.
IIahn 10..Mar,
i0 i
As elements are shifted to the stack, they axe replaced by their
preterminal category." T.he shiR-reduce table for the grammar
of Appendix I states that in state 0, with a proper noun as the
next word in the input, the appropriate action is a shift. The
new configuration, therefore, is
The woman wanted the dresm on that rock.
has low attachment of the PP, whereas
The tnoman positioned the dreu on that rack.
has high attachment.
i
Garden-Path S e n t e n c e s
Grammatical sentences such as
PNOUN
lo~e8 Mar~l
i 4
!
The next operation specified is a reduction of the proper noun
to a noun phrase yielding
The horse raced pamt the barn fell.
seem actually to receive no parse by the native speaker
until some sort of "conscioun parsing" is done. Following
Marcus [Marcus, 1980], we take this to be a hard failure
of the human sentence-processing mechanism.
,
NP iI loves Mary
[2
i
The verb and second proper noun axe now shifted, in accordance
with the shift-reduce table, exhausting the input, and the proper
noun is then reduced to an NP.
It will be seen that all these phenomena axe handled in oux
parser by the same general rules. The simple context-free grammar used t (see Appendix I) allows both parses of the ambiguous
sentences as well as one for the garden-path sentences. The parser disambiguates the grammar and yields only the preferred
structure. The actual output of the parsing system can be found
in Appendix II.
NPv !l Ma,,
v P.ouN il
NP V NP i]
!1o
!,
:14
Finally, the verb and noun phrase on the top of the stack are
reduced to a V P
3. T h e P a r s i n g S y s t e m
i
The parsing system we use is a shift-reduce purser. Shiftreduce parsers [Aho and Johnson, 19741 axe a very general class
of bottom-up parsers characterized by the following architecture.
They incorporate a stock for holding constituents built up during
NP V P II
!I
!
~6
Il
which is in turn reduced, together with the subject NP, to an S.
i
sJl
,'I )
This final configuration is an accepting configuration, since all
I W e make no claims a4 to the accuracy of the sample grammar. It is
obviously a gross simplific~t.ion of English syntax. Ins role is merely to
show that the parsing system is sble to dis,~mbiguate the sentences under
consideration correctly.
2But see Section 3.'2.for an exception.
114
the input has been consumed and an S derived. Thus the sentence is grammatical ia the grammar of Appendix I, as expected.
3.1 D i f f e r e n c e s f r o m t h e S t a n d a r d
equivocal. Although the implementation acts as a simulator
for a nondeterministic machine, the nondeterminism is a priori
bounded, given a particular grammar and lexicon. 3 Thus. the
nondeterminism could be traded in for a larger, albeit still finite,
set of states, unlike the nondeterminism found in other parsing algorithms. Another way of looking at the situation is to
note that there is no observable property of the algorithm that
would distinguish the operation of the parser from a deterministic one. In some sense, there is no interesting difference between
the limited nondeterminism of this parser, and Marcus's notion
of strict determinism. In fact, the implementation of Marcus's
parser also embodies a bounded nondeterminism in much the
same way this parser does.
is
LR Techniques
The shift-reduce table mentioned above is generated
automatically from a context-free grammar by the standard algorithm [Aho and Johnson, 1974]. The parsing alogrithm differs,
however, from the standard LALR(1) parsing algorithm in two
ways. First, instead of assigning preterminal symbols to words
as they are shifted, the algorithm allows the assignment to be
delayed if the word is ambiguous among preterminals. When
the word is used in a reduction, the appropriate preterminal is
assigned.
The differentiating property between this parser and that
of Marcus is a slightly different one, namely, the property of
qaaM-real-time operation. 4 By quasi-real-time operation, Marcus
means that there exists a maximum interval of parser operation
for which no output can be generated. If the parser operates for
longer than this, it must generate some output. For instance,
the parser might be guaranteed to produce output (i.e., structure) at least every three words. However, because preterminal
assignment can be delayed indefinitely in pathological grammars,
there may exist sentences in such grammars for which arbitrary
numbers of words need to be read before output can be produced.
It is not clear whether this is a real disadvantage or not, and,
if so, whether there are simple adjustments to the algorithm
that would result in quasi-real-time behavior. In fact, it is a
property of bottom-up parsing in general that quasi-real-time
behavior is not guaranteed. Our parser has a less restrictive but
similar property, fairneaH, that is, our parser generates output
linear in the input, though there is no constant over which output is guaranteed. For a fuller discussion of these properties, see
Pereira and S h i e b e r [forthcoming].
Second, and most importantly, since true LR parsers exist
only for unambiguous grammars, the normal algorithm for deriving LALR(1) shift-reduce tables yields a table t h a t may specify
conflicting actions under certain configurations. It is through the
choice made from the options in a conflict that the preference
behavior we desire is engendered.
3.2 P r e t e r m i n a l
Delaying
One key advantage of shift-reduce parsing that is critical
in our system is the fact that decisions about the structure to
be assigned to a phrase are postponed as long as possible. In
keeping with this general principle, we extend the algorithm
to allow the ~ssignment of a preterminal category to a lexical
item to be deferred until a decision is forced upon it, so to
speak, by aa encompassing reduction. For instance, we would not
want to decide on the preterminal category of the word "that,"
which can serve as either a determiner (DET) or complementizer
(THAT), until some further information is available. Consider
the sentences
To summarize, preterminal delaying, as an intrinsic part
of the algorithm, does not actually change the basic properties
of the algorithm in any observable way. Note, however, that
preterminal assignments, like reductions, are irrevocable once
they are made {as a byproduct of the determinism of the algorithm}. Such decisions can therefore lead to garden paths, as
they do for the sentences presented in Section 3.6.
That problem i* important.
That problema are difficult to naive ia important.
Instead of a.~signiag a preterminal to ~that," we leave open the
possibility of assigning either DET or THAT until the first reduction that involves the word. In the first case, this reduction
will be by the rule NP ~ D E T NOM, thus forcing, once and for
all, the assignment of DET as preterminal. In the second ease,
the DET NOM analysis is disallowed oa the basis of number
agreement, so that the first applicable reduction is the C O M P S
reduction to S, forcing the assignment o f THAT as preterminal.
We now discuss the central feature of the algorithm.
namely, the resolution of shift-reduce conflicts.
3.3 T h e D i s a m b i g u a t i o n
Of course, the question arises as to what state the parser goes into after shitting the lexical item ~that." The answer
is quite straightforward, though its interpretation t,i~ d t,,a the
determinism hypothesis is subtle. The simple answer is that
the parser enters into a state corresponding to the union of the
states entered upon shifting a DET and upon shifting a THAT
respectively, in much the same way as the deterministic simulation of a nondeterministic finite automaton enters a ~uniou"
state when faced with a nondeterministic choice. Are we then
merely simulating a aoadeterministic machine here.~ The anss~er
Rules
Conflicts arise in two ways: aM/t-reduce conflicts, in which
the parser has the option of either shifting a word onto the stack
or reducing a set of elements on the stack to a new element;
reduce-reduce conflicts, in which reductions by several grammar
3The boundedness comes about because only a finite amount or informatie,n is kept per state (an integer) and the nondeterrninlsm stops at the
prcterminat level, so that, the splitting of states does not. propogate,
41 am indebted to Mitch Marcus for this .bservation and the previous
comparison with his parser.
i15
rules are possible. The parser uses two rules to resolve these
conflicts:5
Joe bou¢ht the book for Su,an.
(I)
Resolve shift-reduce conflicts by shifting.
demonstrates resolution of a reduce-reduce conflict. At some
point in the parse, the parser is in the following configuration:
(2)
Resolve reduce-reduce conflicts by performing
the longer reduction.
[
These two rules suffice to engender the appropriate behavior in the parser for cases of right association and minimal
attachment. Though we demonstrate our system primarily with
PP-attachment examples, we claim that the rules are generally
valid for the phenomena being modeled [Pereira and Shieber,
forthcoming].
NP V NP PP ii
120
I
with a reduce-reduce conflict. Either a more complex NP or a
VP can be built. The conflict is resolved in favor of the longer
reduction, i.e., the VP reduction. The derivation continues:
I
NP VP [I
I
I8
sll
!
1! I
ending in an accepting state with the following generated structure:
3.4 S o m e E x a m p l e s
Some examples demonstrate these principles. Consider the
sentence
[sdoe{v~,bought[Npthe bookl[Ppfor Susan]I]
Joe took the book that I bought for Sum,re.
After a certain amount of parsing has beta completed deterministically, the parser will be in the following coniigttration:
I
NP v
that
V Ill°r S,...
I
3.5 Lexical P r e f e r e n c e
with a shift-reduce confict, since the V can be reduced to a
VP/NP ° or the P can be shifted. The principle* presented would
solve the conflict in favor of the shift, thereby leading to the
following derivation:
NP V NP that NP V P l] Su,an
" N P V NP that N P V P NP II
NP v NP that NP V PP !l
NPVNPthatNPVP/NP
II
NP V NP that S/NP
NP v NP II
,,2
Iq'P V NP, 11.
.,
NP VP t1
....
sll
112
119
To handle the lexical-preferenee examples, we extend the
second rule slightly. Preterminal-word pairs can be stipulated as
either weak or strong. The second rule becomes
(2} Resolve reduce-reduce conflicts by performing
the longest reduction with the stroncest &ftmost
stack element. 7
)
I
124
I
i 22
I
Therefore, if it is assumed that the lexicon encodes the
information that the triadic form of ~ a n t " iV2 in the sample
grammar) and the dyadic form of ~position" (V1) are both weak,
we can see the operation of the shift-reduce parser on the ~dress
on that rack" sentences of Section 2. Both sentences are similar
in form and will thus have a similar configuration when the
reduce-reduce conflict arises. For example, the first sentence will
be in the following configuration:
.1O I
I7 I
}14
I
I8
I
I'
I
t
NPwanted NPPP i[
120 i
In this case, the longer reduction would require assignment of the
preterminat category V2 to ~ a n t , " which is the weak form: thus,
the shorter reduction will be preferred, leading to the derivation:
which yields the structure:
[sdoe{vptook{Nl,{xethe book][gthat I bought for Susanl]]]
The sentence
I
NP wanted NP ]1
]
NP VP II
I
5The original notion of using a shift-reduce parser and general scheduling
principles to handle right association and minlmal attachment, together
with the followingtwo rules, are due to Fernando Pereira [Pereira, 1982[.
The formalizationof preterminal delayingand the extensionsto the Ionictlpreference cases and garden-path behavior are due to the author.
8The "slash-category" analysis of long-distance dependencies used here is
loosely based on the work of Gaadar [lggl]. The Appendix 1 grammar
does not incorporate the full range of slashed rules, however, but merely a
representative selection for illustrative purposes.
11,1
:,':
sli
i 6
il
and the underlying structure:
[sthe woman[vpwaated[Np{Npthe dress][ppoa that
7Note that, strength takes precedence over length.
116
r~klll]
In the ca~e in which the verb is "positioned," however, the longer
reduction does not yield the weak form of the verb; it will therefore be invoked, reslting in the structure:
[sthe woman [vP positioned [Npthe dress][ppon that
3.6 G a r d e n - P a t h
Ford, M., J. Bresnan, and R. Kaplan, 1982: "A CompetenceBased Theory of Syntactic Closure," in The Mental
Representation o/Grammatical Relations, J. Bresnan, ed.
(Cambridge, Massachusetts: MIT Press).
Frazier, L., and J.D. radar, 1978: ~I'he Sausage Machine: A
New Two-Stage Parsing Model," Cognition, Volume 6, pp.
291-325.
rackl]]
Frazier, L., and J.D. Fodor, 1980: "Is the Human Sentence
Parsing Mechanism aa ATN?" Cognition, Volume 8, pp.
411-459.
Sentences
As a side effect of these conflict resolution rules, certain
sentences in the language of the grammar will receive no parse
by the parsing system just discussed. These sentences are apparently the ones classified as "garden-path" sentences, a class
that humans also have great difficulty parsing. Marcus's conjecture that such difficulty stems from a hard failure of the normal
sentence-processing mechanism is directly modeled by the parsing system presented here.
Gazdar, G., 1981: "Unbounded dependencies and coordinate
structure," Linquistic Inquiry, Volume 12, pp. 105-179.
Kimball, d., 1973: "Seven Principles of Surface Structure Parsing
in Natural Language," Cognition, Volume 2, Number 1,
pp. 15-47.
Marcus, M., 1980: A Theory of Syntactic Recognition/or Natural
Lanquagc, (Cambridge, Massachusetts: MIT Press).
For instance, the sentence
The horse raced past the barn fell
Pereira, F.C.N., forthcoming: "A New Characterization of
Attachment Preferences," to appear in D. Dowry,
L. Karttunen,
and A. gwicky (eds.)
Natural
exhibits a reduce-reduce conflict before the last word. If the
participial form of "raced" is weak, the finite verb form will be
chosen; consequently, "raced pant the barn" will be reduced to a
VP rather than a participial phrase. The parser will fail shortly,
since the correct choice of reduction was not made.
Language Prate,int. Psyeholingui, tic, Computational,
and Theoretical Perspective~, Cambridge, England:
Cambridge University Press.
Pereira, F.C.N., and S.M. Shieber, forthcoming: "ShiR-Reduce
Scheduling and Syntactic Closure/ to appear.
Similarly, the sentence
That scaly, deep-sea fish ,hould be underwater i~ important.
Wanner, E., 1980: "The ATN and the Sausage Machine: Which
One is Baloney?" Caanition, Volume 8, pp. '209-225.
will fail. though grammatical. Before the word %hould" is
shifted, a reduce-reduce conflict arises in forming an NP from
either "That scaly, deep-sea l~h" or "scaly, deep-sea fish." The
longer (incorrect} reduction will be performed and the parser will
fail.
A p p e n d i x I. T h e T e s t G r a m m a r
The following is the grammar used to test the parting
~ystem descibed in the paper. Not a robust grammar of English
by any means, it is presented only for the purpose of establishing
that the preference rules yield the correct, results.
S - - NP VP
VP - - V3 INF
S--gVP
V P - - V 4 ADJ
NP - - DET NOM
VP - - V5 PP
NP - - NOM
5-- that S
NP - - PNOUN
INF - - to VP
NP -- NP S/NP
PP - - P NP
NP - - NP PARTP
PARTP - - VPART PP
NP -- NP PP
S / N P - - that S / N P
DET - - N P ' s
S/NP - - VP
NOM - - N
S / N P - - NP V P / N P
NOM -- ADJ NOM
VP/NP - - Vl
VP - - AUX VP
VP/NP - - V2 PP
Other examples, e.g., "the boy got fat melted," or "the
prime number few" would be handled similarly by the parser,
though the sample grammar of Appendix I does not parse them
[Pcreira and Shieber, forthcoming].
4. C o n c l u s i o n
To be useful, aatttral-language systems must model the
behavior, if not the method, of the native speaker. We have
demonstrated that a parser using simple general rules for disambiguating sentences can yield appropriate behavior for a large
class of performance phenomena--right a-~soeiation, minimal attachment, lexical preference, and garden-path sentences--and
that, morever, it can do so deterministically wit,hour generating
all the parses and choosing among them. The parsing system
has been implemented and has confirmed the feasibility of ottr
approach to the modeling of these phenomena.
VP -- V0
VP - - Vl NP
VP - - V2 NP PP
VP/NP - - V3 INF/NP
VP/NP - . AUX VP/NP
INF/NP --* to VP/NP
References
A p p e n d i x II. S a m p l e R u n s
Aho, A.V.. and S.C. Johnson, 1974: "LR Parsing," Computi,, 9
Sur,,eys. Volume 6, Number 2, pp. 99-i24 ISpring).
>>
117
bought the hook t h a t
f o r Susan
do*
I
had b e l n t r y i n E t o o b t . i n
Accepted:
Is
Cup Cpnonn Joe))
(vp
Cvl bought)
Cap
(up (dec the)
(uoa (n book)))
(sbar/np
(that that)
Cs/np
Cup ( p n o u I))
Cvp/up
(uuz bud)
(vp/np
(auz been)
(vp/np Cv3 t r y i n l )
(t-~/np
(~plup
(v2 obtain)
(pp (p for}
(up (pnoun Saul]
>> Joe bought the book for S u u u
Accepted:
[8 (up (puoun Joe))
(vp (v2 boucht)
Cup Cdet the)
Chum Cn book)))
(pp (p for)
Cup (puoun Sueua]
>> The vomam vatted the dreou on thnt r ~ h
Accepted:
Is Cup Cdut The)
Cue= (u vomu)))
(Tp (vt v~ted)
Cap (up (den the)
(no= (n d r u u ) ) )
(pp (p on)
(rip (det that)
Curt (u rack]
>> The youth poeitioued the dreue on that rack
Accepted: Is (up (den The)
(noa (n vol,~)))
(vp (~2 poaitioued)
(up (den the)
(nee (~ dreJl)))
(pp Cp on)
(up (den that}
Cuom (. rack]
>> The horse raced p u t
Parse failed.
8tare:
(l)
the barn fell
Currant confiEurltlon:
Is Cap (4*t me)
(not (u horse)))
(vp (v6 rncea)
(pp (p p u t )
(up (4et the)
(aou (u b~rn]
stack:
<(0)>
input:
(tO f e l l )
Cend)
)) That ecal! ~eep-let fish should be undes=l~tur i8 importer
Parse failed.
Current cou~ilOlrttiou:
118
sta~e:
(1)
stack:
<(0)>
input:
(v4 is)
(|dj itportut)
(end)
[e [up (den Thlt)
(non (IdJ scaly)
Chum (~tJ 4eup-ssl)
(mum (u fish]
C,p Can should)
(vp (v4 be)
(adj u a d u ~ t e r ]