Some Predictions of Optimality Theory on Sentence Processing, Grammaticality
Perception, and Corpus Frequency*
Gisbert Fanselow
University of Potsdam
0. Introduction
The distinction between ‘grammatical’ and ‘processing’ factors determining our linguistic behaviour, which has been one of the guiding assumptions of generative grammar since Chomsky (1957, 1965), is a very natural one. There is ample evidence that aspects of human cognition such as working memory constraints influence language processing even though they are not language-particular. On the other hand, it cannot be denied that many aspects of grammar are not grounded in general cognition, since they are idiosyncratic and language-particular. That certain prepositions govern genitive case, while others combine with the dative, surely does not follow from general laws of cognition.
For a long time, the view had been popular that performance data can tell us little about the structure of grammatical knowledge (grammatical competence). This view seemed to stem from the perception that the “Derivational Theory of Complexity”1 was a complete failure (Fodor, Bever & Garrett 1974, Pritchett & Whitman 1995, Phillips 1996). However, this perception was based on only a few empirical findings (which were often not interpreted in an optimal way), and it arose relative to a syntactic theory (the Aspects model, Chomsky 1965) that has now been abandoned for more than 30 years because of its grammatical inadequacies. More recent grammatical models are much more compatible with models of cognition. E.g., Pritchett (1992) constitutes an attempt to derive the psycholinguistic phenomenon of preferred readings in sentence processing from key aspects of Government & Binding (GB) Theory (Chomsky 1981).
MacWhinney, Bates, & Kliegl (1978) had shown that preferences of the human parser in the grammatical analysis of sentences can be modeled very effectively by assuming conflicting (language-independent) cues for the assignment of structure, and language-dependent strategies for resolving conflicts among these cues. The grammatical model of Optimality Theory (OT) (Prince & Smolensky 1993, Grimshaw 1997) has an architecture reminiscent of such processing models. According to OT, Universal Grammar is made up of a set of universal constraints or principles. In contrast to what was assumed in GB theory, the UG constraints may imply incompatible requirements for single sentences. Such conflicts among universal principles are resolved in a language-particular way, by giving the principles different ranks in different languages.

*
The contents of this paper have grown out of two research projects funded by the Deutsche Forschungsgemeinschaft (DFG): the “Innovationskolleg Formale Modelle Kognitiver Komplexität” and the “Forschergruppe Konfligierende Regeln”. For fruitful discussions, I would like to thank my collaborators in these groups: Damir Cavar, Caroline Féry, Stefan Frisch, Reinhold Kliegl, Matthias Schlesewsky, and Ralf Vogel.

1
According to the Derivational Theory of Complexity, the processing complexity of a sentence is a function of the number of derivational steps needed to generate the sentence in terms of the algorithm characterizing its grammaticality.
The structural similarity between OT and successful processing models suggests that the grammatical principles of an OT grammar may be of explanatory value in the psycholinguistic domain. Indeed, this has been argued quite frequently (Stevenson & Smolensky 1997, Fanselow, Schlesewsky, Cavar & Kliegl 1999, Artstein 2000, Smolensky & Stevenson in prep., Kuhn 2000). In the spirit of such models, the present paper discusses three aspects of the grammar-processing interface that support a grammatical model with conflicting principles of the kind constituted by OT. These aspects are: reading time preferences (sect. 1) and graded grammaticality phenomena (sect. 2) resulting from principles with a very low rank, and the role grammatical competition plays in determining degrees of grammaticality (sect. 3).
1. Optimal Parsing
Our initial observations are borrowed from Fanselow et al. (1999), a paper that tries to
interpret certain similarities between explanations in sentence processing models and
Optimality Theory. Much insight has been gained into the functioning of the human sentence
processor by investigating how it copes with local ambiguities in the assignment of syntactic
structure and grammatical functions. E.g., constituent questions require the preposing of the
wh-phrase in languages such as English and German. In sentence (1), the grammatical
function (subject vs. object) of the preposed wh-phrase welche Frau is not unambiguously
determined until the determiner der or den of the second noun phrase is processed, since
welche Frau is morphologically ambiguous between a nominative and an accusative
interpretation.
(1)
a. welche Frau hat gestern der Mann angerufen
   which woman has yesterday the-nom man called
   “which woman did the man call yesterday?”
b. welche Frau hat gestern den Mann angerufen
   which woman has yesterday the-acc man called
   “which woman called the man yesterday?”
When the human parser encounters such a local ambiguity, it will prefer a subject analysis of the preposed wh-phrase. This was first established by Frazier & Flores d’Arcais (1989) for Dutch, and it was later shown to be correct for English, German, Italian, and many other languages.
Because of the limited scope of the present paper, we will refrain from entering a detailed
discussion of factors modulating the subject preference (see, e.g., Kaan 1997). Rather, let us
focus on what is responsible for the subject preference in (1). At first glance, it may seem that
grammar (understood as the rule system underlying linguistic competence and behavior) does
not distinguish between the two sentences in (1). After all, both obey the grammatical rule
that the wh-phrase must be fronted in German (and English) constituent questions irrespective
of its grammatical function. It is not surprising, then, that extragrammatical factors such as
overall clause type frequency (see Cuetos & Mitchell 1988) or specific parsing heuristics of
the language processing system (Frazier 1988, Frazier & Clifton 1996) (which may itself
reduce to more profound principles of cognition) have been made responsible for parsing
preferences.
Such accounts are, however, not entirely satisfactory, because they leave the relation between
laws of grammar and laws of processing quite open. Processing (and learnability) demands
certainly have a shaping influence on grammar (see Fanselow 2002, and below), and they
establish a relation between processing constraints and laws of grammar reflecting them, but
parallels between grammar rules and processing facts go beyond the domain where one can
easily argue for the primacy of the processing system. Consequently, attempts have been
made to derive processing facts from a direct and transparent application of principles of
grammar in the course of online processing. Pritchett (1992) is an excellent example of such
models.
Pritchett’s leading idea is that many observations concerning processing derive from the so-called Theta Criterion (2)2 of Chomsky (1981). When it processes a local ambiguity, the human parser is claimed to minimize the number of local/temporary violations of (2) (and other principles) in the partial structure it hypothesizes. Thus, when two grammatical analyses are possible for a locally ambiguous item, the parser will choose the one which involves fewer violations of (2).

(2)
Each argument expression (NPs, PPs) must be linked to exactly one argument slot (theta-role) of a predicate. Each theta role of a predicate must be linked to exactly one argument expression.

2
The theta-criterion excludes sentences such as *John saw her a cat and *Bill donated
In the case of (1), the clause-initial wh-phrase welche Frau could be analyzed as a subject or an object before the determiner of the second NP is perceived. The human parser goes for the subject interpretation, and it can be argued that it does so because the subject interpretation incurs fewer temporary violations of (2) than the object-initial alternative. In case the wh-phrase is taken to be a subject, the partial analysis (3) can be built up by the parser after analyzing the first part of the input string. In this structure, V can be assumed to be intransitive. (3) violates the theta-criterion (2) once, because welche Frau is not yet linked to an argument slot of a concrete verb.

(3)
[CP welche Frau … [IP t [VP V]]]
Compare this to the object analysis of clause-initial welche Frau. If there is an accusative object, there must be a subject, so the partial structure that needs to be postulated is (4). Here, we are confronted with (at least) two violations of (2): welche Frau is not linked to a theta-role of a verb (as in (3)), but the verb must be transitive (a subject is necessary when there is a direct object), so that there are two thematic roles, one of which cannot be linked to an NP.

(4)
[CP welche Frau … [IP NP* [VP t V]]]
If the parser operates such that it minimizes the number of theta-criterion violations, it will go for (3) rather than (4), so that the preference seems to be accounted for.
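As an illustration only (this toy sketch is ours, not Pritchett's implementation), the choice between (3) and (4) amounts to comparing counts of temporary theta-criterion violations:

```python
# Toy illustration of Pritchett-style ambiguity resolution: each partial
# analysis is scored by its temporary violations of the theta-criterion,
# and the parser picks the analysis with the lower score.

def theta_violations(unlinked_args, unassigned_roles):
    """One temporary violation of (2) per argument expression without a
    theta-role and per theta-role without an argument expression."""
    return unlinked_args + unassigned_roles

# Subject analysis (3): 'welche Frau' not yet linked to a theta-role;
# V may be intransitive, so no theta-role is left dangling.
subject_analysis = theta_violations(unlinked_args=1, unassigned_roles=0)

# Object analysis (4): 'welche Frau' is unlinked here too, and the
# obligatory subject forces a transitive verb with one spare theta-role.
object_analysis = theta_violations(unlinked_args=1, unassigned_roles=1)

preferred = "subject" if subject_analysis < object_analysis else "object"
print(subject_analysis, object_analysis, preferred)  # 1 2 subject
```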
The Pritchett model has, at least, three shortcomings, however. First, the grammatical
principles cannot always be interpreted in the way they are in grammar when they figure in
the parsing process. The theta-criterion of Chomsky 1981 says little, if anything, about the
behaviour of structural slots for verbs and NPs that have been predicted in the parsing
process. Second, there are more grammatical principles than just (2), and one needs a choice
mechanism for situations in which principle X is less often violated in the analysis A, while
principle Y is violated less often in the alternative B. Finally, Chomsky (1981) and similar grammatical theories do not allow their constraints to be violated. The ‘real’ prediction
these models make when applied directly in the parsing process is the following: whenever a
structural hypothesis violates some principle, the parsing hypothesis should no longer be
pursued. In effect, this excludes an application of grammatical principles in the parsing
process.
All these problems are circumvented in Optimality Theory syntax (Grimshaw 1997), one of the two more recent elaborations of the Chomsky (1981) model. In the first place, OT constraints are all assumed to be violable. That they will not be fulfilled in a partial syntactic representation because the structure is not yet complete thus constitutes no problem: OT principles can also be violated in complete grammatical representations. Furthermore, their violability implies that they can be formulated in a very simple and general way. E.g., one could assume that surface order is determined by a constraint such as “Subjects precede objects” for German or English. Since it is violable, the constraint can be postulated in spite of the fact that these languages have object-initial sentences. One only needs to assume that there are reasons that warrant the violation of the subject-first serialization principle3. Such a surface structure constraint would, however, correctly describe the “normal word order patterns” of German (if applied as a grammar principle), which we find in the absence of factors licensing its violation. Furthermore, the surface constraint will imply the parsing preference in (1) (if applied in sentence processing): only if welche Frau is analyzed as a subject will the surface serialization principle be respected. Given the violable nature of the constraint, its application to complete structures does not differ from its application to structures that are only partially specified (as in syntax parsing). When later information forces the parser to abandon the subject hypothesis, no principled problem arises.
The simple and general principles of OT often impose conflicting requirements on syntactic
structure, so OT needs (and has) a conflict resolution component. According to OT, the
principles are ordered in a language particular hierarchy, and in case of a conflict between two
principles, the one with the higher rank in the hierarchy wins. The conflict resolution
component needed for sentence parsing models thus comes for free in OT syntax. Not only the syntactic principles but even the overall organization of grammar can be taken over from OT syntax into the OT parsing model.
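The evaluation step just described, ranked violable constraints selecting among candidate structures, can be sketched in a few lines of Python. The constraint names and the candidate set below are invented for illustration; they are not from the paper:

```python
# Minimal sketch of OT conflict resolution: constraints are ranked, each
# candidate has a violation count per constraint, and the winner is the
# candidate that does best on the highest-ranked constraint on which the
# candidates differ (lexicographic comparison of violation profiles).

def optimal(candidates, ranking):
    """candidates: {name: {constraint: violation count}}.
    ranking: constraints ordered from highest to lowest rank."""
    def profile(cand):
        return [candidates[cand].get(c, 0) for c in ranking]
    return min(candidates, key=profile)

# Hypothetical German-style ranking: wh-fronting outranks subject-first.
ranking = ["WH-FIRST", "SUBJECT-FIRST"]
candidates = {
    "subject-initial wh-question": {"WH-FIRST": 0, "SUBJECT-FIRST": 0},
    "object-initial wh-question":  {"WH-FIRST": 0, "SUBJECT-FIRST": 1},
    "wh-in-situ":                  {"WH-FIRST": 1, "SUBJECT-FIRST": 0},
}
print(optimal(candidates, ranking))  # subject-initial wh-question
```

Because the comparison is lexicographic, re-ranking the same universal constraints yields different winners, which is exactly the language-particular conflict resolution the text describes.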
Ideally, an OT parsing model takes a very simple form. Suppose the parser has already computed a structure S for the part of the input string processed so far, and suppose that W is the next word to be integrated. The generative procedure of the grammar then computes the set of grammatical possibilities S* by which W can be combined with S4. By applying the hierarchy of grammatical constraints to S*, the “best” extension E of S by W can be identified: when compared to its structural alternatives E’ in S*, E always violates the highest-ranked constraint on which they differ less often than E’. Several models for explaining parsing facts in terms of a direct application of OT syntax principles in processing have been proposed (Stevenson & Smolensky 1997, Fanselow, Schlesewsky, Cavar & Kliegl 1999, Artstein 2000, Smolensky & Stevenson in prep.). The identification of the principles that determine how the class of candidate representations S* is restricted constitutes a particular challenge for OT parsing models, and so does the characterization of the principles that trigger backtracking and reanalysis.

3
E.g., both languages require that constituent questions begin with a wh-phrase, a constraint that is more important than the subject > object principle.

4
The generative GEN component of OT grammars is unconstrained, so it will offer an infinite number of possible ways of linking W to S. There is, e.g., no restriction on the number of phonetically empty elements one might need to link W and S. In order to make OT parsing psychologically plausible, one thus needs restrictions in the structure building component. See Fanselow et al. (1999) for a proposal.
One aspect that deserves particular mention here is a prediction that seems to be particular to OT parsing models: it is to be expected that the syntactic processing of a language L is also influenced by syntactic constraints visible in other languages L’ even if they are of little or zero overt relevance in language L. It is easy to see why this prediction is made. OT assumes that all constraints are universal. Grammars do not differ in the constraints that they apply; rather, only the hierarchies among the principles are language-particular. Consequently, conflicts between universal constraints will be resolved in different ways in different languages, yielding different grammatical outputs. Typically, the principles with a low rank will have little or no impact at all.
Consider, e.g., a grammatical principle that we might call AgreeRel, which requires that the head noun of a relative clause and the relative pronoun agree in Case, independent of the Case imposed by the heads governing the two elements5. As Harbert (1983) shows, a principle implying this is necessary in an account of relative clauses in Ancient Greek and some other languages, in which the relative pronoun and the head noun can indeed agree even if that implies that one of them violates the Case requirements of the governing head, as is true for the genitive relative pronoun in (5), which fails to realize the accusative assigned by “possess”.
(5)
áksioi tes eleutherías hes kéktesthe
worthy the freedom-gen which-gen you-possess
Within OT, AgreeRel must be a universal principle. In German syntax, however, it has (nearly) no observable effect: the relative pronoun must always bear the case governed by the relative clause verb, as (6) illustrates: dessen cannot take over genitive from the head of the relative clause. An OT syntax of German has no problems in capturing this: AgreeRel only needs to be assigned a rank much lower than that of the constraint GovCase, which requires that Cases governed by verbs be visibly realized.
5
For a theory of Case matching in (free) relative clauses, the reader is referred to Vogel (2001).
(6)
*wegen des Mannes dessen du siehst
because the-gen man-gen who-gen you see
When we consider the well-formedness of German relative clauses, AgreeRel has no chance
to show any effect. Given its low rank, its requirements will always be ignored in the interest
of respecting higher principles that require the realization of governed cases.
The situation is quite different in the online processing of German relative clauses. Recall that German is a verb-final language, which implies that the verb that actually governs the Case of the relative pronoun is the last word encountered in the processing of a relative clause. In spite of its high rank, GovCase therefore has no chance of exerting any effect (forcing a particular case) throughout the processing of the relative clause until the very last word is perceived.
Consequently, AgreeRel is expected to influence the processing of the relative clause for
quite some time, but when the verb is encountered, it will have to give way to GovCase.
Consider (7) in this respect, an abstract representation of a relative clause embedded in, e.g., an object NP; the points a, b, and c mark successive stages of the parse (a: after the head noun N, b: after the relative pronoun Rel, c: at the verb V). When the online processing of the construction has reached point a, the Case of the head noun N of the relative clause will have been identified in most situations. Suppose N bears accusative Case, since it is a standard object.

(7)
X [NP .. N [CP Rel ….. t …. V]] Y
        a        b          c
The relative pronoun encountered in the next step of the parsing process may have a
morphologically marked case. Under this circumstance, AgreeRel will not have any effect on
Rel, since the parser must not ignore explicit morphological information. But suppose that Rel
is a Case-ambiguous relative pronoun (such as German die “who, fem.”, which could either
be nominative or accusative). The morphology does not resolve the Case ambiguity, but
AgreeRel will favor the accusative option, since it would be violated by a nominative relative
pronoun. All other things being equal, AgreeRel will thus imply that Rel = accusative at stage
b in processing the construction.
Given that the relative clause verb has not been encountered at point b, GovCase cannot overwrite this effect of AgreeRel in the initial segment of the relative clause. Only when point c is reached and the verb is encountered does the choice suggested by AgreeRel have to be given up, namely when the nature of the verb or its morphology excludes the possibility that Rel is an accusative. In that case, one should see a reanalysis effect (e.g., higher reading times) on the verb, or, more generally, at any point in the relative clause at which the predictions of the low-ranked principle AgreeRel have to be abandoned because constraints with a higher rank take precedence. And in fact, such reanalysis effects can be found in experimental results concerning the processing of German relative clauses, as Fanselow, Schlesewsky, Cavar & Kliegl (1999) show.
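The two-stage behaviour just described can be rendered as a toy simulation. The constraint names follow the text, but the control flow is our illustration, not an implemented parser:

```python
# Toy simulation of the AgreeRel/GovCase interaction in German relative
# clause processing: at point b only the low-ranked AgreeRel applies, so
# a case-ambiguous relative pronoun (e.g. 'die', nom/acc) copies the head
# noun's case; at point c the higher-ranked GovCase applies and, if it
# disagrees, overrides the earlier choice, flagging a reanalysis.

def choose_case(head_noun_case, verb_governed_case=None):
    """Return (case hypothesis, reanalysis flag) for a case-ambiguous
    relative pronoun. verb_governed_case=None models point b, before the
    clause-final verb has been encountered."""
    # Point b: only AgreeRel is applicable -> agree with the head noun.
    hypothesis = head_noun_case
    if verb_governed_case is None:
        return hypothesis, False
    # Point c: GovCase applies and, being higher ranked, wins; a changed
    # hypothesis models the reanalysis cost seen in reading times.
    return verb_governed_case, verb_governed_case != hypothesis

# Accusative head noun; the clause-final verb turns out to demand a
# nominative relative pronoun -> predicted reanalysis effect at the verb.
print(choose_case("acc"))         # ('acc', False)
print(choose_case("acc", "nom"))  # ('nom', True)
```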
In a similar vein, Frazier (2000) discusses the following parsing preference in English and
French. For a constellation in which a main clause is preceded by an initial subordinate
clause, she observes a strong tendency to take the subjects of the two clauses to be
coreferential when one of the subjects is pronominal. Frazier proposes that this is due to a
violable grammatical principle that has a very low rank in English and French, and can exert
its influence on parsing in the way described above. Frazier points out that in switch-reference languages such as Mojave or Diyari, the pertinent principle seems to have a much higher rank, since there one needs to obligatorily mark sentences for a switch in the reference of the two subjects in exactly the same constellation (initial adjunct clause followed by main clause) in which the reading preference can be observed in English. Her data constitute a further instance of the idea that principles with a low rank can have considerable parsing effects.
One of the outstanding features of OT approaches to language processing thus lies in the
prediction that grammatical laws G of some language L can be visible in the processing of
some other language L’ even if G has no detectable effect on grammaticality in L’. This
property makes OT parsing models extremely interesting, but only very few of the predictions
have so far been tested empirically.
2. Gradient Acceptability
In this section, we will see that, and why, graded acceptability facts are also in line with the key property of Optimality Theory just discussed: even if its low rank makes it invisible in terms of pure categorical grammaticality, a syntactic principle may have a clear effect on the degree of acceptability of a sentence.
Standard syntactic models presuppose that grammaticality is a categorical notion: a sentence
either is grammatical (i.e., compatible with the structural requirements of the language), or it
is not. In contrast to this, acceptability is not categorical but gradient. When confronted with a particular construction, we are often not certain whether it is well-formed or not; we judge constructions as only marginally acceptable; and we can (and do) rank sentence types according to their acceptability.
Just as for parsing, the question arises what role grammar and grammar-external factors play
in yielding graded acceptability. Ideally, we can uphold the assumption that grammaticality is
a categorical concept, so that the graded nature of acceptability must be due to the other
components6 that interact with grammar in determining acceptability. E.g., a sentence can be
hard to process at different degrees, so any effect processing difficulty has on acceptability
should be gradient. Chomsky & Miller (1963) convincingly argue that the low acceptability of center-embedded structures is not a consequence of low grammaticality; rather, it is caused by the enormous processing difficulty that center-embedded structures such as (8) come with.
(8)
the man who the woman who the mosquito bit loves kicked the horse.
Garden-path sentences such as (9) constitute a further straightforward case of a processing influence on acceptability: the difficulty of identifying the correct structural analysis of the clause in (naïve) online parsing implies low or zero acceptability, in spite of the fact that (9) is fully well-formed according to the laws of English syntax.
(9)
the horse raced past the barn fell
Examples such as (8) and (9) are more or less fully unacceptable because of the processing
problem they come with. However, there are numerous examples of a less drastic reduction of
acceptability due to processing problems. For example, German sentences that begin with an
object are consistently rated as less acceptable than sentences that begin with a subject. This
effect does not completely disappear when the sentence to be judged is placed into a favorable
context, as Keller (2000) has shown. Corresponding processing difficulties of object-initial
structures have been amply documented since Krems (1984), see also Hemforth (1993),
Schlesewsky, Fanselow, Kliegl & Krems (2000), among others. The reduced acceptability of
such sentences can be explained in terms of their processing problems if the latter do not
merely reflect the grammatical difficulty. Indeed, the grammar-independent nature of the additional processing load of object-initial sentences can be argued for easily. Based on a self-paced reading study, Fanselow, Kliegl & Schlesewsky (1999) argue that the processing difficulty of object-initial structures results from memory load: the preposed object is
“reconstructed” by the human parser to its canonical position following the subject. Up to the
point at which the subject is encountered in the parsing of a sentence, this reconstruction is
impossible, so that the preposed phrase must be kept in memory. Recent ERP studies such as
Felser, Clahsen & Münte (2003), Fiebach, Schlesewsky & Friederici (2002), and Matzke,
Mai, Nager, Rüsseler & Münte (2002) support this interpretation. See Fanselow & Frisch
(2005) for more data supporting a processing influence on graded acceptability.
6
Other factors influencing acceptability are the difficulty in constructing a context in which the utterance would
be acceptable, the stylistic homogeneity of the utterance, its content, etc.
Furthermore, Fanselow & Frisch (2005) have elaborated on an important aspect of the
processing influence on acceptability. They discuss various structural constellations in which
the intermediate analysis of a certain part of some construction is not in line with the final
analysis. In spite of the fact that the intermediate analysis is finally abandoned, Fanselow and
Frisch show that this temporary analysis has an influence on the overall acceptability of the
sentence.
On the one hand, local ambiguities can have a mitigating effect on (un-) acceptability7. When
a structure begins with a locally ambiguous part, and one (but not both) of the readings is
compatible with grammatical requirements (and if it is the locally preferred option), then the
availability of this locally grammatical reading will increase the global acceptability of the
sentence even if the structure is later disambiguated towards the other, ungrammatical interpretation. Such locally ambiguous constructions are more acceptable than structures that do
not involve such an initial local ambiguity, while they are structurally identical otherwise. The
local appearance of grammaticality can thus increase the overall acceptability of a structure
independent of its global status.
Similarly, a fully grammatical structure may also appear less acceptable. This can be
illustrated by so-called “case-clash” phenomena first discussed (for German) by Kvam (1983)
and experimentally investigated by Fanselow & Frisch (2005). They consider structures in
which a wh-phrase or a relative pronoun has been moved out of a complement clause into the
matrix clause. The grammatical properties of this preposed phrase must be in line with the
selectional requirements of V2, and, in addition, they can but need not be compatible with the
grammatical requirements that the matrix verb V1 would impose on its arguments.
(10)
… WH-PHRASE … V1 [CP … t V2 ]
When the WH-phrase in (10) could also be an object of matrix V1 (e.g., if the Case potentials
of V1 and V2 are identical), the structure is rated as more acceptable than sentences in which
the wh-phrase could not belong to the matrix clause (it has a Case incompatible with V1),
even though the wh-phrase fits the requirements of the embedded verb V2 in both cases, and
must be interpreted as a preposed complement clause object because of the selectional
requirements in the complement clause.
Fanselow & Frisch explain this result in the following way: in the initial phase of the processing of (10), the WH-phrase is checked for its availability as a matrix argument, and a Case clash between it and the requirements of V1 leads to a perception of ungrammaticality. This temporary perception of ungrammaticality reduces the overall acceptability of the clause, even when the initial matrix clause attachment of the wh-phrase turns out to be completely irrelevant for the grammatical parse of the tree because the wh-phrase turns out to have been moved out of the complement clause.

7
This happens, e.g., with noun phrases that are temporarily ambiguous for number, and for which later material forces a singular interpretation that is incompatible with other grammatical requirements.
If the processing difficulty of intermediate structural analyses has an impact on the overall
acceptability of a sentence, we expect there to be a second type of an influence by grammatical constraints on acceptability, which is indirect and gradient. Each grammatical
principle comes with a certain processing cost, and this processing cost contributes to
acceptability. The gradience of this indirect effect does not reflect a property of grammatical
principles as such, but stems from the fact that this additional effect of grammar on
acceptability is mediated by different degrees of processing difficulty.
Furthermore, in an OT model one again expects that this mediated processing effect of
grammatical principles on gradient acceptability can be triggered by constraints that are not
effective in the grammar of the language at all, in the sense that they never (or rarely) yield
true ungrammaticality because of their very low rank. Again, there is evidence for such an
impact of low-ranked principles on acceptability. E.g., German does not respect the that-trace filter that renders (11) ungrammatical. Subjects can be extracted from clauses
introduced by a complementizer in German, in contrast to what holds in English.
(11)
*who do you think that _ loves Mary
(12)
wer denkst du dass die Wahl gewinnen wird
who.nom think you that the election win will
“who do you think will win the election”
In spite of the fact that (12) is by no means ungrammatical in German, experimental studies have shown that native speakers consider subject extraction from a complement clause less acceptable than corresponding object extractions (Featherston 2005a). Likewise, the superiority condition blocks object fronting in English multiple questions in the presence of a wh-subject (13). German does not rule such structures out (14), but experimental studies have shown that native speakers consider (14) much less acceptable than its subject-initial counterpart (Featherston 2005a, Fanselow & Frisch 2005).
(13)
*who did who see?
(14)
wen hat wer eingeladen
who.acc has who.nom invited
Linguistic constraints thus can have effects on reading preferences and on graded
acceptability even if they are irrelevant for grammaticality in a certain language. This insight
constitutes an interesting constraint linguistic models must meet when they try to explain
natural language syntax. The design of Optimality Theory predicts the existence of such
phenomena.
There is further, though indirect, evidence for acceptability being reduced due to an effect of
grammatical constraints with a low rank. Bresnan and her colleagues (cf., e.g., Bresnan &
Nikitina 2003, Bresnan, Dingare & Manning 2001) have shown that in a variety of domains
(among them passives, the dative alternation in English, subject-verb agreement) “soft
constraints” of a language like English are mirrored by “hard constraints” in other languages.
A hard constraint is a rule that is strongly respected in the language in question (certain
person-number combinations for subjects and objects are ungrammatical), while “soft
constraints” are identified by statistical data in Bresnan’s work: certain grammatical
constellations are not ungrammatical in a certain language, but only (very) infrequent. As
Bresnan points out, the correlation between soft and hard constraints is easily explained in
Optimality Theory, and thus constitutes another type of processing evidence for that model.
How does corpus frequency relate to graded acceptability, and how are soft constraints captured in OT according to Bresnan and her colleagues?
Kempen & Harbusch (2004, 2005), Featherston (2005b), and many others have shown that corpus frequency and acceptability are correlated, but the relation is not a linear one: only when the acceptability of a structure is fairly high does it have a chance to occur in corpora with a non-negligible frequency. What is the reason for this correlation?
To a certain extent, one can assume that production and perception difficulty are influenced by comparable constraints. This assumption is often made in theories that relate corpus frequency to processing difficulty (see, e.g., Hawkins 1994). Under the OT perspective pursued here, one expects the grammatical principles to figure both in perception and production, while the two processing systems might (but need not) differ with respect to the non-grammatical factors that determine perception/production difficulty.
The more difficult a structure is to produce, the more time-consuming its production will be, so that easier and faster competitors for formulating a certain thought are more likely to be articulated in most cases. By contributing to production difficulty, principles of grammar will influence the relative corpus frequencies of competing structures, even when they have a low rank (and would thus count as "soft" constraints).
Bresnan and her colleagues derive the relation between hard and soft constraints in a different way. Suppose that the production procedure for sentences indeed makes use of the hierarchy of grammatical principles in a language in a fairly direct way. Given a strict ranking of grammatical principles, a certain type of thought (a certain "input", in OT terms) should thus always be formulated in exactly the same way.8 Psychological models (such as the "Elimination by Aspects" (EBA) model) that deal with decision making on the basis of ranked principles (and are, thus, isomorphic to OT) assume that the hierarchy is not always perfectly worked through in actual decision making (see Jungermann, Pfister & Fischer 1998). Rather, there is a certain likelihood (inversely correlated with the rank within the hierarchy) that a specific constraint is simply skipped in processing. Following Boersma's (1998) proposal of Stochastic OT (SOT), Bresnan and her colleagues make a comparable assumption about production: when the production system operates on a certain hierarchy, there is always a certain likelihood that a constraint C is not evaluated with the rank it actually should have, but at a different relative position in the hierarchy.
SOT and EBA make comparable, but of course not identical, predictions concerning syntax production. Both imply that sentences are not always produced on the basis of the "real" hierarchy of the language: there is a certain likelihood that production will skip a principle or invert the rank of principles, which can yield an output different from the one generated by the "real" hierarchy. The frequencies of the different structural alternatives in a corpus should then be a function of the likelihood that the principle crucial for the choice of a particular structure is skipped or reduced in rank. Corpus frequencies are thus expected to mirror the constraint hierarchy.
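The production model just described can be sketched computationally. The following is a minimal illustration, not the authors' implementation: constraint names, ranking values, the noise parameter, and the candidate structures are all invented, and the Gaussian perturbation of ranking values follows Boersma's Stochastic OT scheme only in outline.

```python
import random

# Hypothetical constraint hierarchy: each constraint carries a ranking
# value; higher value = higher rank. Names and values are invented.
RANKING = {"C_high": 100.0, "C_mid": 95.0, "C_low": 88.0}

# Invented violation profiles for two competing candidate structures.
CANDIDATES = {
    "structure_A": {"C_high": 0, "C_mid": 1, "C_low": 0},
    "structure_B": {"C_high": 0, "C_mid": 0, "C_low": 1},
}

def evaluate_once(noise_sd=2.0, rng=random):
    """One production event in the spirit of Stochastic OT: perturb each
    ranking value with Gaussian noise, then pick the candidate that wins
    under the resulting strict hierarchy."""
    perturbed = {c: v + rng.gauss(0.0, noise_sd) for c, v in RANKING.items()}
    order = sorted(perturbed, key=perturbed.get, reverse=True)

    def profile(cand):
        # Violations listed from highest- to lowest-ranked constraint, so
        # ordinary tuple comparison is lexicographic OT comparison.
        return tuple(CANDIDATES[cand][c] for c in order)

    return min(CANDIDATES, key=profile)

def simulate(n=10000, seed=1):
    """Repeat the production event n times and count the outcomes."""
    rng = random.Random(seed)
    counts = {c: 0 for c in CANDIDATES}
    for _ in range(n):
        counts[evaluate_once(rng=rng)] += 1
    return counts
```

Since structure_B violates only the lowest-ranked constraint, it wins nearly always; but the noise occasionally inverts C_mid and C_low, so structure_A surfaces with a small, non-zero frequency. In this way relative corpus frequencies mirror the constraint hierarchy.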
Given the stochastic nature of human cognition, SOT and/or EBA are plausible descriptions of what happens syntactically in the production process. Nothing we have said so far is incompatible with the possibility that the stochastic nature of processing also contributes to corpus frequencies, in addition to grammar-based production difficulty and further factors. In that sense, there is no conflict between SOT and our assumptions.
Note, however, that although SOT and EBA models may be good at modelling corpus frequencies, they have nothing (direct) to say about gradient acceptability. Corpus frequencies can be predicted in these models because they result from the repeated execution of the same computational task, the outcome of which will be influenced by the stochastic properties of the processing system. If applied to the task of formulating an acceptability judgement, SOT and EBA would lead us to expect that we assess the acceptability of a structure differently on different occasions. There is no direct way, however, of explaining why a sentence may have a gradient acceptability value at a single evaluation. Stochastic models cannot explain this without further assumptions.9
8 This would not necessarily hold in case the grammar contains principles with an identical rank ("tied" principles), a possibility we can ignore here.
3. The relative nature of grammatical judgments
The topic of the preceding two sections has been a particular implication of Optimality
Theory for language processing: principles with a very low rank can influence the parsing and
the production process directly, and gradient acceptability, indirectly. There seems to be no
other current grammatical model that makes these predictions, and to the extent that the
available evidence is representative, it seems that these predictions are borne out.
Let us conclude by turning to a further consequence Optimality Theory has for language
processing and acceptability judgements: sharp and categorical ungrammaticality should not
be visible in that part of language that is characterized by what Pesetsky (1997, 1998) called
“ineffability”.
As we have mentioned above, parsing models using Optimality Theory need to answer a very
profound question: under what conditions will the parser give up a temporary analysis? In
standard parsing approaches, this question has a simple answer. Suppose the parser has
postulated the analysis S for the string it has analyzed so far, and suppose that the next word
to be parsed is W. If there is no way of combining S with W without violating a grammatical
principle, S will have to be abandoned, and a new analysis S’ must be found for the input
string parsed so far.
In OT parsing, the violation of a principle of grammar can never be a sufficient reason for abandoning a structural hypothesis. After all, according to OT, there is no grammatical output in any language of the world that would not violate one or the other grammatical principle.10
Furthermore, given the construction of the generative component GEN of OT, it is hardly
conceivable that a given structure S and a word W could not be combined structurally in some
(albeit extremely complex) way. The procedure of OT parsing will only select the best of the
various complex ways of linking S and W, and there will necessarily be one “best” candidate.
9 One might entertain the hypothesis that the perception of different degrees of frequency influences gradient acceptability, perhaps because it has an impact on the degree of confidence in the acceptability judgements.
10 E.g., the two constraints that require a head to appear at the left or the right periphery of a phrase, respectively, cannot be fulfilled simultaneously.
A simple OT parser cannot decide that none of the postulated structures is "good enough" to be maintained. Syntactic hypotheses are evaluated only relative to other possibilities; on principled grounds, there is never an evaluation relative to some absolute standard.
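The purely relative character of OT evaluation can be made concrete with a small sketch. All constraint names and violation counts here are invented for illustration; the point is only that lexicographic comparison always yields a winner, with no absolute threshold a candidate must pass.

```python
# A strict constraint hierarchy, highest-ranked first (invented names).
HIERARCHY = ["Constraint1", "Constraint2", "Constraint3"]

def violation_profile(violations):
    """Order a candidate's violation counts by rank, so that ordinary
    tuple comparison implements lexicographic OT comparison."""
    return tuple(violations.get(c, 0) for c in HIERARCHY)

def optimal(candidates):
    """Return the candidate(s) with the best (smallest) profile. There is
    no absolute threshold: a candidate violating every constraint still
    wins if all of its competitors are worse."""
    best = min(violation_profile(v) for v in candidates.values())
    return [name for name, v in candidates.items()
            if violation_profile(v) == best]

# Every candidate violates something, yet one of them is still optimal.
candidates = {
    "cand_a": {"Constraint1": 1},
    "cand_b": {"Constraint2": 2, "Constraint3": 1},
    "cand_c": {"Constraint2": 2, "Constraint3": 3},
}
```

Here cand_b wins: it avoids the highest-ranked constraint, and its violations of the lower constraints are weighed only against cand_a and cand_c, never against an absolute standard.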
This difficulty for incremental OT parsing is a serious one, but can be resolved (see Kuhn
2000) if one adopts the idea that there is bidirectional optimization, and that bidirectional
optimization is what happens in the parsing process. In standard OT, the grammatical
competition tries to identify the best form for a given input/meaning. In simple OT parsing,
one tries to identify the best interpretation (structurally, and, finally, semantically) for a given
form. The “direction” of optimization is thus different in the two domains. If syntactic parsing
is indeed bidirectional (as Kuhn 2000 suggests), each parsing step will first identify what is
the best structure S and meaning M for the segment I of the input string that has been
analysed so far. But then, in a second step, the parser computes what would be the best
(partial) expression I’ for the meaning M it has just hypothesized. If I’ coincides with the
input string I that has been parsed so far, parsing proceeds to the next word, but if the input
string I is not the best way of expressing the meaning M the parser has hypothesized for it, the
structural hypothesis for I must be given up. Crucially, this means that a structural hypothesis
is abandoned whenever there is a better way than the input string itself of expressing the
meaning hypothesized for the input string. The (temporary) perception of ungrammaticality is thus not based on the parsing of an input string alone, but also on a production-oriented optimization of the form: a structure will be perceived as (temporarily) ungrammatical only if it is kicked out of the competition by the identification of a better way of formulating its meaning.
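Kuhn's bidirectional check can be rendered schematically as follows. The two optimization directions are replaced by simple lookup tables over invented forms and meanings; in a real OT system each would be a full constraint-based competition.

```python
# Toy stand-ins for the two optimization directions. In a real OT system
# both would be full constraint-based competitions; here they are lookup
# tables over invented forms and meanings.
BEST_MEANING = {"form_A": "M1", "form_B": "M1"}  # comprehension direction
BEST_FORM = {"M1": "form_A"}                     # production direction

def bidirectional_parse_step(input_so_far):
    """One incremental step of bidirectional OT parsing:
    1. find the best meaning M for the input parsed so far;
    2. find the best form for M in the production direction;
    3. keep the hypothesis only if that form is the input itself."""
    meaning = BEST_MEANING.get(input_so_far)
    if meaning is None:
        return ("abandon", None)
    if BEST_FORM[meaning] == input_so_far:
        return ("keep", meaning)
    # A better way of expressing M exists: the input is perceived as
    # (temporarily) ungrammatical and the hypothesis is given up.
    return ("abandon", meaning)
```

form_A survives the check, while form_B is abandoned even though a meaning was found for it, because the production direction prefers form_A for that meaning.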
Within standard Optimality Theory, this reflection is of little importance. Standard OT assumes that the application of the grammatical hierarchy is always successful, in the sense that the optimal form for realizing an underlying representation in phonology, a morphological constellation, or some structured meaning can always be found. For logical reasons, there must be (at least) one structure that is not worse than its competitors. It is identified as optimal and grammatical, and the ungrammaticality of the others is caused by the optimality of the winning competitor.
However, as Pesetsky (1997, 1998) pointed out for syntax, it is simply not true that every concept or every thought can be formulated: there is "ineffability". See Fanselow & Féry (2002) for a discussion of various approaches to ineffability that have been proposed in OT. Fanselow & Féry (2002) also try to characterize the domains of language in which ineffability occurs. One remarkable aspect they did not comment on is that ineffable domains rarely come with "absolute" ungrammaticality: although there is no really good expression for the input in cases of ineffability, at least one of the candidate representations seems marginally possible.
In German syntax, ineffability arises in a number of domains: there is no clear-cut resolution
for the conflict between the general requirement that verbs must appear in second position,
and some principles that rule out the movement of particular verbs11, or block verb movement
when it affects semantic scope. When a wh-phrase is moved out of a syntactic island, there is
no better way of expressing the pertinent wh-question, while extraction from the island is
impossible as well. In a free relative clause, the relative pronoun must meet the case
requirements of the matrix and the complement verb. If these do not coincide, ineffability
may arise, depending on the position of the free relative clause, and on dialect.
Remarkably, absolute ungrammaticality seems rare in such domains. See Vogel (2005) for the
gradient nature of case conflict violations in German free relative clauses. That island
violations do not (always) yield strong ungrammaticality is a notorious fact. Even for the verb
movement problem, it seems that one can always identify one structural candidate which is
not perfect, but marginally possible. Exactly the same seems to be true in the domain of morphology. E.g., the addition of the diminutive marker -chen triggers umlaut in German, but if the umlauted vowel is in an unstressed position, a phonological condition is violated (Féry 1994). In this constellation, diminutive formation is blocked, but some of the conceivable realizations of the diminutive form (e.g., one in which umlaut is not applied) do not sound extremely bad (see Fanselow & Féry 2002). This seems to be true of many other morphological gaps, too: the 'impossible' form is not completely ungrammatical.
If such examples are representative, we can conclude that the third prediction of the OT
parsing model is fulfilled as well. A (partial) structure is perceived as ungrammatical only
when a better competitor has been identified. A situation in which all candidate representations are absolutely ungrammatical cannot arise, but the acceptability of the winner may be negatively affected by further, extragrammatical conditions. "Ineffability" arises whenever these extragrammatical conditions are strong enough to sharply reduce acceptability. For the syntax, sentence processing difficulty may lead to ineffability; for the morphology, a lexical control component (see Orgun & Sprouse 1999, and Fanselow & Féry 2002 for discussion) comes into play.
11 For example, verbs with two prefixes cannot be moved to the second position in matrix clauses; they are fully grammatical in final position only. See Haider (1997).
Sometimes, however, ineffability seems to go with complete unacceptability. In syntax, extractions out of adjunct clauses may be a case in point (*howi did you praise him although he behaves ti). We can try to derive the high degree of unacceptability from the assumption that questioning part of an although-clause is conceptually ill-formed (the proposition expressed by the although-clause cannot be directly asserted or questioned). Many languages impose strong and (apparently) absolute restrictions (relative to person and animacy features) against certain subject-object combinations. Here, the lexicon might help to explain the high degree of unacceptability: it lacks an entry for the pertinent inflectional form of the verb.
The gradient nature of ineffability constitutes indirect evidence for the claim that ungrammaticality arises only when a better competitor has been identified. Runner, Sussman
& Tanenhaus (2004) provide further evidence coming from an eye-movement study
concerned with the processing of reflexive and personal pronouns in syntactic constellations
crucial for binding theory. They show that, early on in processing, experimental subjects consider referents incompatible with the requirements of the binding theory, for both reflexives and pronouns. In the case of personal pronouns, the crucial fact is that referents appropriate for reflexives are also looked at. Apparently, all (relevant) options are checked,
and grammatical principles (probably, the availability of a reflexive expression) then
eliminate incorrect choices.
4. Conclusions
Linguistic theorizing is confronted with the problem that the facts of language underdetermine the choice of the correct grammar. It would thus be welcome if more data (corpus frequency, reading preferences, degrees of ungrammaticality) could bear on this issue. Due to its architectural properties, OT can be applied in a very direct way in processing theories. It makes very specific predictions, which seem to be borne out, concerning the processing role played by principles with a low rank.
References
Artstein, R. 2000. Case constraints and empty categories in Optimality Theory parsing.
Unpublished manuscript.
Boersma, P. 1998. Functional Phonology: Formalizing the Interactions between Articulatory
and Perceptual Drives. The Hague: Holland Academic Graphics.
Bresnan, J., Dingare, S., and Manning, C. 2001. Soft constraints mirror hard constraints. In Proceedings of the LFG01 Conference. Stanford: CSLI Publications. 13-32.
Bresnan, Joan and Tatiana Nikitina. 2003. On the gradience of the dative alternation.
Unpublished manuscript. http://www-lfg.stanford.edu/bresnan/new-dative.pdf.
Chomsky, Noam. 1957, Syntactic Structures. The Hague: Mouton.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N., & G. Miller. 1963. Finitary Models of Language Users. In: R. D. Luce, R. Bush, & E. Galanter (eds). Handbook of Mathematical Psychology Vol. II. New York: John Wiley.
Cuetos, F. & D.C. Mitchell, 1988. Cross-linguistic differences in parsing: Restrictions on the
use of the Late Closure strategy in Spanish. Cognition, 30, 73-105.
Fanselow, G. 2003. Wie ihr Gebrauch die Sprache prägt. In: S. Krämer& E. König (eds.):
Gibt es eine Sprache hinter dem Sprechen? Frankfurt: Suhrkamp, 2003: 229 - 261.
Fanselow, G. & C. Féry. 2002. Ineffability. In: G. Fanselow & C. Féry (eds.). Resolving
Conflicts in Grammar. Special Issue of Linguistische Berichte 265-307.
Fanselow, G. & S. Frisch. 2005. Effects of processing difficulty on judgments of
acceptability. In: G. Fanselow, C. Féry, M. Schlesewsky, & R. Vogel (eds.). Gradience in
Grammar. Oxford: OUP. To appear.
Fanselow, Gisbert, Reinhold Kliegl & Matthias Schlesewsky. 1999. Processing Difficulty and
Principles of Grammar. In: S. Kemper & R. Kliegl (eds). Constraints on Language. Aging,
Grammar, and Memory. Boston: Kluwer. 171-201.
Fanselow, G., M. Schlesewsky, D. Ćavar, & R. Kliegl. 1999. Optimal Parsing. Unpublished ms., Rutgers Optimality Archive ROA 382.
Featherston S. 2005a. Universals and grammaticality: Wh-constraints in German and English.
Linguistics 43 (4).
Featherston, S. 2005b. The Decathlon Model of Empirical Syntax. Ms.
Felser, C., H. Clahsen, & T. Münte. 2003. Storage and Integration in the Processing of Filler-Gap Dependencies: An ERP Study of Topicalization and Wh-Movement in German. Brain and Language 87: 345-354.
Féry, Caroline. 1994. Umlaut and inflection in German. Ms., University of Tübingen. [also at: Rutgers Optimality Archive ROA-33; http://roa.rutgers.edu].
Fiebach, C., M. Schlesewsky, & A. Friederici. 2002. Separating syntactic memory costs and
syntactic integration costs during parsing: The processing of German WH-questions. Journal of Memory and Language 47: 250-272.
Fodor, J., T. Bever, & M. Garrett. 1974. The psychology of language. New York: McGraw
Hill.
Frazier, L. 1978. On comprehending sentences: Syntactic parsing strategies. Doctoral
dissertation, University of Connecticut.
Frazier, L. 2002. The pronoun bias effect strikes again: a reply to Pynte and Colonna. Journal
of Psycholinguistic Research. 30: 601-604.
Frazier, L. & C. Clifton. 1996. Construal, Cambridge, Mass.: MIT-Press.
Frazier, L., & Flores d' Arcais, G., 1989. Filler-driven parsing: A study of gap filling in
Dutch. Journal of Memory and Language 28: 331-344.
Grimshaw, Jane. 1997. Projection, heads, and optimality. Linguistic Inquiry 28: 373-422.
Haider, Hubert. 1997. Typological implications of a directionality constraint on projections. In A. Alexiadou & T. Hall (eds). Studies on universal grammar and typological variation. Amsterdam: Benjamins. 17-33.
Harbert, W. 1983. On the Nature of the Matching Parameter. The Linguistic Review 2: 237-284.
Hawkins, J. 1994. A performance theory of word order and constituency. Cambridge: CUP.
Hemforth, B., 1993. Kognitives Parsing: Repräsentation und Verarbeitung sprachlichen
Wissens. Sankt Augustin: Infix.
Jungermann, H., H.-R. Pfister & K. Fischer 1998. Die Psychologie der Entscheidung.
Heidelberg: Spektrum.
Kaan, E., 1997. Processing subject-object ambiguities in Dutch. Doctoral dissertation,
Rijksuniversiteit Groningen.
Keller, F. 2000. Evaluating Competition-based Models of Word Order. In: L. R. Gleitman & A. K. Joshi (eds.). Proceedings of the 22nd Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum. 747-752.
Kempen, G. & K. Harbusch. 2004. A corpus study into word order variation in German
subordinate clauses: Animacy affects linearization independently of grammatical function
assignment. In T. Pechmann and C. Habel, eds., Multidisciplinary approaches to language
production. Mouton De Gruyter, Berlin.
Kempen, G. & K. Harbusch. 2005. How flexible is constituent order in the midfield of German subordinate clauses? A corpus study revealing unexpected rigidity. Ms.
Krems, J. 1984. Erwartungsgeleitete Sprachverarbeitung. Computersimulierungen von
Verstehensprozessen. Frankfurt/Main: Lang Verlag.
Kuhn, J. 2000. Generation and parsing in Optimality Theoretic syntax: Issues in the formalization of OT-LFG. In: P. Sells (ed.). Formal and empirical issues in Optimality Theoretic Syntax. Stanford: CSLI Publications.
Kvam, Sigmund. 1983. Linksverschachtelung im Deutschen und Norwegischen. Tübingen: Niemeyer.
MacWhinney, B., Bates, E., & Kliegl, R. (1984). Cue validity and sentence interpretation in
English, Italian and German. Journal of Verbal Learning and Verbal Behavior, 23, 127-150.
Matzke, M., Mai, H., Nager, W., Rüsseler, J., & Münte, T. F. 2002. The cost of freedom: An ERP study of non-canonical sentences. Clinical Neurophysiology 113: 844-852.
Orgun, C. & R. Sprouse. 1999. From MPARSE to CONTROL: deriving ungrammaticality. Phonology 16: 191-224.
Pesetsky, D. 1997. Optimality theory and syntax: movement and pronunciation. In: D. Archangeli & D. T. Langendoen (eds). Optimality Theory: An Overview. Oxford: Blackwell.
Pesetsky, David (1998): Some Optimality Principles of Sentence Pronunciation. In: Barbosa,
P. et al. (eds.) Is the Best Good Enough? Cambridge, MA: MIT-Press. 337–383.
Phillips, Colin. 1996. Order and Structure. Doctoral dissertation, MIT.
Prince, A. & P. Smolensky. 1993. Optimality Theory. Constraint Interaction in
Generative Grammar. Ms.
Pritchett, Bradley. 1992. Grammatical Competence and Parsing Performance. Chicago: University of Chicago Press.
Pritchett, Bradley & John Whitman. 1995. Syntactic Representation and Interpretive
Preference. In: R. Mazuka & N. Nasai (eds.) Japanese Sentence Processing. Hillsdale, NJ:
LEA: 65-76.
Runner, J., R. Sussman, & M. K. Tanenhaus. 2004. Assigning reference to reflexives and
pronouns in picture noun phrases: Experimental tests of Binding Theory (submitted).
Schlesewsky, M., Fanselow, G., Kliegl, R., & Krems, J. 2000. The subject preference in the processing of locally ambiguous WH-questions in German. In B. Hemforth & L. Konieczny (eds.), German sentence processing (pp. 65-93). Dordrecht: Kluwer.
Stevenson, S., & P. Smolensky (1997). Optimal Sentence Processing. Conference on
Computational Psycholinguistics (CPL '97).
Smolensky, P & Stevenson, S. 2005. Optimality in sentence processing. Ms.
Vogel, R. 2001. Case Conflict in German Free Relative Constructions: An Optimality Theoretic Treatment. In: G. Müller & W. Sternefeld (eds.), Competition in Syntax. Berlin: Mouton de Gruyter. 341-375.
Vogel, R. 2005. Degraded Acceptability and Markedness in Syntax, and the Stochastic
Interpretation of Optimality Theory. To appear in: G. Fanselow, C. Féry, M. Schlesewsky, &
R. Vogel (eds). Gradience in Grammar. Oxford, Oxford UP.