Some Predictions of Optimality Theory on Sentence Processing, Grammaticality Perception, and Corpus Frequency*

Gisbert Fanselow
University of Potsdam

0. Introduction

The distinction between 'grammatical' and 'processing' factors determining our linguistic behaviour, which has been one of the guiding assumptions in generative grammar since Chomsky (1957, 1965), is a very natural one. There is ample evidence that aspects of human cognition such as working memory constraints influence language processing even though they are not language-particular. On the other hand, it cannot be denied that many aspects of grammar are not grounded in general cognition, since they are idiosyncratic and language-particular. That certain prepositions govern genitive case, while others combine with the dative, surely does not follow from general laws of cognition.

For a long time, the view had been popular that performance data can tell us little about the structure of grammatical knowledge/grammatical competence. This view seemed to stem from the perception that the "Derivational Theory of Complexity"1 was a complete failure (Fodor, Bever & Garrett 1974, Pritchett & Whitman 1995, Phillips 1996). However, this perception was based on only a few empirical insights (that were often not interpreted in an optimal way), and it arose relative to a syntactic theory (the Aspects model, Chomsky 1965) that has now been abandoned for more than 30 years because of its grammatical inadequacies. More recent grammatical models are much more compatible with models of cognition. E.g., Pritchett (1992) constitutes an attempt to derive the psycholinguistic phenomenon of preferred readings in sentence processing from key aspects of Government & Binding (GB) Theory (Chomsky 1981).
MacWhinney, Bates & Kliegl (1978) had shown that preferences of the human parser in the grammatical analysis of sentences can be modeled in a very effective way by assuming conflicting (language-independent) cues for the assignment of structure, and language-dependent strategies for resolving conflicts among these cues. The grammatical model of Optimality Theory (OT) (Prince & Smolensky 1993, Grimshaw 1997) has an architecture reminiscent of such processing models. According to OT, Universal Grammar is made up of a set of universal constraints or principles. In contrast to what was assumed in GB theory, the UG constraints may impose incompatible requirements on single sentences. Such conflicts among universal principles are resolved in a language-particular way, by giving the principles different ranks in different languages. The structural similarity between OT and successful processing models suggests the idea that the grammatical principles of an OT grammar may be of explanatory value in the psycholinguistic domain. Indeed, this has been argued for quite frequently (Stevenson & Smolensky 1997, Fanselow, Schlesewsky, Cavar & Kliegl 1999, Artstein 2000, Smolensky & Stevenson in prep., Kuhn 2000).

* The contents of this paper have grown out of two research projects funded by the Deutsche Forschungsgemeinschaft DFG: the "Innovationskolleg Formale Modelle Kognitiver Komplexität" and the "Forschergruppe Konfligierende Regeln". For fruitful discussions, I would like to thank my collaborators in these groups: Damir Cavar, Caroline Féry, Stefan Frisch, Reinhold Kliegl, Matthias Schlesewsky, and Ralf Vogel.

1 According to the Derivational Theory of Complexity, the processing complexity of a sentence is a function of the number of derivational steps needed to generate the sentence in terms of the algorithm characterizing its grammaticality.
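The conflict resolution scheme just described can be made concrete with a small sketch. Given a set of candidate structures, each annotated with how often it violates each constraint, the optimal candidate is the one whose violation profile is lexicographically best under the language-particular ranking. The constraint names and violation counts below are invented for illustration and are not taken from any published OT analysis.

```python
def optimal(candidates, ranking):
    """Return the candidate whose violation profile is best under `ranking`.

    candidates: dict mapping candidate name -> {constraint: violation count}
    ranking:    list of constraint names, highest-ranked first
    """
    def profile(cand):
        # Violation counts ordered from highest- to lowest-ranked constraint;
        # lexicographic comparison means a higher-ranked constraint always
        # outweighs any number of lower-ranked violations.
        return tuple(candidates[cand].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

# Toy competition: in a German-style wh-question, fronting the wh-object
# violates a subject-first constraint but satisfies a higher-ranked
# requirement that wh-phrases come first.
ranking = ["WH-FIRST", "SUBJECT-FIRST"]
candidates = {
    "wh-object fronted":           {"WH-FIRST": 0, "SUBJECT-FIRST": 1},
    "subject-initial, wh in situ": {"WH-FIRST": 1, "SUBJECT-FIRST": 0},
}
print(optimal(candidates, ranking))  # -> wh-object fronted
```

Reranking the two constraints reverses the winner, which is exactly the OT picture of cross-linguistic variation: the same constraints, a different hierarchy.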
In the spirit of such models, the present paper discusses three aspects of the grammar-processing interface that support a grammatical model with conflicting principles of the kind constituted by OT. These aspects are: reading time preferences (sect. 1) and graded grammaticality phenomena (sect. 2) resulting from principles with a very low rank, and the role grammatical competition plays in determining degrees of grammaticality (sect. 3).

1. Optimal Parsing

Our initial observations are borrowed from Fanselow et al. (1999), a paper that tries to interpret certain similarities between explanations in sentence processing models and Optimality Theory. Much insight has been gained into the functioning of the human sentence processor by investigating how it copes with local ambiguities in the assignment of syntactic structure and grammatical functions. E.g., constituent questions require the preposing of the wh-phrase in languages such as English and German. In sentence (1), the grammatical function (subject vs. object) of the preposed wh-phrase welche Frau is not unambiguously determined until the determiner der or den of the second noun phrase is processed, since welche Frau is morphologically ambiguous between a nominative and an accusative interpretation.

(1) a. welche Frau hat gestern der Mann angerufen
       which woman has yesterday the-nom man called
       "which woman did the man call yesterday?"
    b. welche Frau hat gestern den Mann angerufen
       which woman has yesterday the-acc man called
       "which woman called the man yesterday?"

When the human parser encounters such a local ambiguity, it will prefer a subject analysis of the preposed wh-phrase. This was first established by Frazier & Flores d'Arcais (1989) for Dutch, and it was later shown to be correct for English, German, Italian, and many other languages.
Because of the limited scope of the present paper, we will refrain from entering a detailed discussion of factors modulating the subject preference (see, e.g., Kaan 1997). Rather, let us focus on what is responsible for the subject preference in (1). At first glance, it may seem that grammar (understood as the rule system underlying linguistic competence and behavior) does not distinguish between the two sentences in (1). After all, both obey the grammatical rule that the wh-phrase must be fronted in German (and English) constituent questions, irrespective of its grammatical function. It is not surprising, then, that extragrammatical factors such as overall clause type frequency (see Cuetos & Mitchell 1988) or specific parsing heuristics of the language processing system (Frazier 1988, Frazier & Clifton 1996) (which may themselves reduce to more profound principles of cognition) have been made responsible for parsing preferences. Such accounts are, however, not entirely satisfactory, because they leave the relation between laws of grammar and laws of processing quite open. Processing (and learnability) demands certainly have a shaping influence on grammar (see Fanselow 2002, and below), and they establish a relation between processing constraints and the laws of grammar reflecting them, but parallels between grammar rules and processing facts go beyond the domain where one can easily argue for the primacy of the processing system. Consequently, attempts have been made to derive processing facts from a direct and transparent application of principles of grammar in the course of online processing. Pritchett (1992) is an excellent example of such models. Pritchett's leading idea is that many observations concerning processing derive from the so-called Theta Criterion (2)2 of Chomsky (1981). When it processes a local ambiguity, the human parser is claimed to minimize the number of local/temporary violations of (2) (and other principles) in the partial structure it hypothesizes.
Thus, when two grammatical analyses are possible for a locally ambiguous item, the parser will choose the one which involves fewer violations of (2).

(2) Each argument expression (NPs, PPs) must be linked to exactly one argument slot (theta-role) of a predicate. Each theta role of a predicate must be linked to exactly one argument expression.

In the case of (1), the clause-initial wh-phrase welche Frau could be analyzed as a subject or an object before the determiner of the second NP is perceived. The human parser goes for the subject interpretation, and it can be argued that it does so because the subject interpretation incurs fewer temporary violations of (2) than the object-initial alternative. In case the wh-phrase is taken to be a subject, the partial analysis (3) can be built up by the parser after analyzing the first part of the input string. In this structure, V can be assumed to be intransitive. (3) violates the theta-criterion (2) once, because welche Frau is not yet linked to an argument slot of a concrete verb.

(3) [CP welche Frau … [IP t [VP V]]]

Compare this to the object analysis of clause-initial welche Frau. If there is an accusative object, there must be a subject, so the partial structure that needs to be postulated is (4). Here, we are confronted with (at least) two violations of (2): welche Frau is not linked to a theta-role of a verb (as in (3)), but the verb must be transitive (a subject is necessary when there is a direct object), so that there are two thematic roles, one of which cannot be linked to an NP.

(4) [CP welche Frau … [IP NP* [VP t V]]]

If the parser operates such that it minimizes the number of violations of thematic roles, it will go for (3) rather than (4), so that the preference seems to be accounted for. The Pritchett model has, however, at least three shortcomings.

2 The theta-criterion excludes sentences such as *John saw her a cat and *Bill donated.
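Pritchett's violation-counting idea can be paraphrased in a few lines. The encoding below (counting unlinked argument expressions and unfilled theta-roles in a partial structure) is a deliberately simplified illustration, not Pritchett's actual formalism.

```python
# A temporary theta-criterion violation is either an argument expression not
# yet linked to a theta-role, or a (predicted) theta-role not yet filled.
def theta_violations(unlinked_args, unfilled_roles):
    return unlinked_args + unfilled_roles

# (3), the subject analysis: "welche Frau" is not yet linked to a verbal
# theta-role; the verb may still turn out to be intransitive.
subject_analysis = theta_violations(unlinked_args=1, unfilled_roles=0)

# (4), the object analysis: "welche Frau" is likewise unlinked, and the verb
# must be transitive, so a subject theta-role is predicted but unfilled.
object_analysis = theta_violations(unlinked_args=1, unfilled_roles=1)

# The parser minimizes temporary violations, hence the subject preference.
assert subject_analysis < object_analysis
print(subject_analysis, object_analysis)  # -> 1 2
```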
First, the grammatical principles cannot always be interpreted in the way they are in the grammar when they figure in the parsing process. The theta-criterion of Chomsky (1981) says little, if anything, about the behaviour of structural slots for verbs and NPs that have been predicted in the parsing process. Second, there are more grammatical principles than just (2), and one needs a choice mechanism for situations in which principle X is violated less often in analysis A, while principle Y is violated less often in the alternative B. Finally, Chomsky (1981) and similar grammatical theories do not allow their constraints to be violated. The 'real' prediction these models make when applied directly in the parsing process is the following: whenever a structural hypothesis violates some principle, the parsing hypothesis should no longer be pursued. In effect, this excludes an application of grammatical principles in the parsing process.

All these problems are circumvented in Optimality Theory Syntax (Grimshaw 1997), one of the two more recent elaborations of the Chomsky (1981) model. In the first place, OT constraints are all assumed to be violable. That they will not be fulfilled in a partial syntactic representation because the structure is not yet completed thus constitutes no problem: OT principles can also be violated in complete grammatical representations. Furthermore, their violability implies that they can be formulated in a very simple and general way. E.g., one could assume that surface order is determined by a constraint such as "Subjects precede objects" for German or English. Since it is violable, the constraint can be postulated in spite of the fact that these languages have object-initial sentences. One only needs to assume that there are reasons that warrant the violation of the subject-first serialization principle.3
Such a surface structure constraint would, however, correctly describe the "normal word order patterns" of German (if applied as a grammar principle) that we find in the absence of factors licensing its violation. Furthermore, the surface constraint will imply the parsing preference in (1) (if applied in sentence processing): only if welche Frau is analyzed as a subject will the surface serialization principle be respected. Given the violable nature of the constraint, its application to complete structures does not differ from its application to structures that are only partially specified (as in syntax parsing). When later information forces the parser to abandon the subject hypothesis, no principled problem arises.

The simple and general principles of OT often impose conflicting requirements on syntactic structure, so OT needs (and has) a conflict resolution component. According to OT, the principles are ordered in a language-particular hierarchy, and in case of a conflict between two principles, the one with the higher rank in the hierarchy wins. The conflict resolution component needed for sentence parsing models thus comes for free in OT syntax. Not only the syntactic principles, but even the overall organization of grammar can be taken over from OT syntax to the OT parsing model. Ideally, an OT parsing model takes a very simple form. Suppose the parser has already computed the structure S for the partial input string analyzed so far, and suppose that W is the next word to be integrated. The generative procedure of the grammar then computes the set of grammatical possibilities S* by which W can be combined with S.4 By applying the hierarchy of grammatical constraints to S*, the "best" extension E of S by W can be identified: when compared to its structural alternatives E′ in S*, E always violates the highest constraint on which they differ less often than E′ does.
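The incremental step just sketched (GEN proposes candidate attachments of W to S, and the ranked constraints select the best one) can be illustrated with a toy Python sketch. The function names, the toy GEN, and the single SUBJECT-FIRST constraint are all invented for illustration; a real model would need a much richer candidate set and constraint inventory.

```python
def best_extension(structure, word, gen, ranking):
    """Pick the optimal way of attaching `word` to `structure`.

    gen:     function (structure, word) -> list of (extension, violations),
             where violations maps constraint name -> violation count
    ranking: constraint names, highest-ranked first
    """
    candidates = gen(structure, word)
    # Lexicographic comparison of ranked violation profiles picks the winner.
    return min(candidates,
               key=lambda c: tuple(c[1].get(r, 0) for r in ranking))[0]

# Toy GEN for the ambiguity in (1): attach the case-ambiguous wh-phrase as
# subject or as object; only the object attachment violates SUBJECT-FIRST.
def toy_gen(structure, word):
    return [
        (structure + [(word, "subject")], {"SUBJECT-FIRST": 0}),
        (structure + [(word, "object")],  {"SUBJECT-FIRST": 1}),
    ]

parse = best_extension([], "welche Frau", toy_gen, ["SUBJECT-FIRST"])
print(parse)  # -> [('welche Frau', 'subject')]
```

Applied to the local ambiguity in (1), the highest-ranked (here: only) constraint that distinguishes the candidates favors the subject attachment, mirroring the observed preference.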
3 E.g., both languages require that constituent questions begin with a wh-phrase, a constraint that is more important than the subject > object principle.

4 The generative GEN component of OT grammars is unconstrained, so it will offer an infinite number of possible ways of linking W to S. There is, e.g., no restriction on the number of phonetically empty elements one might need to link W and S. In order to make OT parsing psychologically plausible, one thus needs restrictions in the structure building component. See Fanselow et al. (1999) for a proposal.

Several models for explaining parsing facts in terms of a direct application of OT syntax principles in processing have been proposed (Stevenson & Smolensky 1997, Fanselow, Schlesewsky, Cavar & Kliegl 1999, Artstein 2000, Smolensky & Stevenson in prep.). The identification of the principles that determine how the class of candidate representations S* is restricted constitutes a particular challenge for OT parsing models, and so does the characterization of the principles that trigger backtracking and reanalysis.

One aspect that deserves particular mention here is a prediction that seems to be specific to OT parsing models: it is to be expected that the syntactic processing of a language L is also influenced by syntactic constraints visible in other languages L′ even if they are of little or zero overt relevance in L. It is easy to see why this prediction is made. OT assumes that all constraints are universal. Grammars do not differ in the constraints they apply; rather, only the hierarchies among the principles are language-particular. Consequently, conflicts between universal constraints will be resolved in different ways in different languages, yielding different grammatical outputs. Typically, the principles with a low rank will have little or no impact at all.
Consider, e.g., a grammatical principle that we might call AgreeRel, which requires that the head noun of a relative clause and the relative pronoun agree in Case, independent of the Case imposed by the heads governing the two elements.5 As Harbert (1983) shows, a principle implying this is necessary in an account of relative clauses in Ancient Greek and some other languages, in which the relative pronoun and the head noun can indeed agree even if that implies that one of them violates the Case requirements of the governing head, as is true for the genitive relative pronoun in (5), which fails to realize the accusative assigned by "possess".

(5) áksioi tes eleutherías hes kéktesthe
    worthy the freedom-gen which-gen you-possess

Within OT, AgreeRel must be a universal principle. In German syntax, however, it has (nearly) no observable effect: the relative pronoun must always bear the case governed by the relative clause verb, as (6) illustrates: dessen cannot take over genitive from the head of the relative clause. An OT syntax of German has no problems in capturing this: AgreeRel only needs to be assigned a rank much lower than that of the constraint GovCase, which requires that Cases governed by verbs be visibly realized.

(6) *wegen des Mannes dessen du siehst
    because the-gen man-gen who-gen you see

When we consider the well-formedness of German relative clauses, AgreeRel has no chance to show any effect. Given its low rank, its requirements will always be ignored in the interest of respecting higher principles that require the realization of governed cases. The situation is quite different in the online processing of German relative clauses. Recall that German is a verb-final language, which implies that the verb that actually governs the Case of the relative pronoun is the last word encountered in the processing of a relative clause.

5 For a theory of Case matching in (free) relative clauses, the reader is referred to Vogel (2001).
In spite of its high rank, GovCase therefore has no chance of exerting any effect (forcing a particular case) throughout the processing of the relative clause until the very last word is perceived. Consequently, AgreeRel is expected to influence the processing of the relative clause for quite some time, but when the verb is encountered, it will have to give way to GovCase. Consider (7) in this respect, an abstract representation of a relative clause embedded in, e.g., an object NP. When the online processing of the construction has reached point a, the Case of the head noun N of the relative clause will have been identified in most situations. Suppose N bears accusative Case, since it is a standard object.

(7) X [NP … N [CP (a) Rel (b) … t … V (c) ]] Y

The relative pronoun encountered in the next step of the parsing process may have a morphologically marked case. Under this circumstance, AgreeRel will not have any effect on Rel, since the parser must not ignore explicit morphological information. But suppose that Rel is a Case-ambiguous relative pronoun (such as German die "who, fem.", which could either be nominative or accusative). The morphology does not resolve the Case ambiguity, but AgreeRel will favor the accusative option, since it would be violated by a nominative relative pronoun. All other things being equal, AgreeRel will thus imply that Rel = accusative at stage b in processing the construction. Given that the relative clause verb has not been encountered at point b, GovCase cannot overwrite this effect of AgreeRel in the initial segment of the relative clause. Only when point c is reached and the verb is encountered will the choice suggested by AgreeRel have to be given up, if the nature of the verb or its morphology excludes the possibility that Rel is an accusative.
In that case, one should see a reanalysis effect (e.g., higher reading times) on the verb, or, more generally, at any point in the relative clause at which the predictions of the low-ranked principle AgreeRel have to be abandoned because constraints with a higher rank assert themselves. And in fact, such reanalysis effects can be found in experimental results concerning the processing of German relative clauses, as Fanselow, Schlesewsky, Cavar & Kliegl (1999) show.

In a similar vein, Frazier (2000) discusses the following parsing preference in English and French. For a constellation in which a main clause is preceded by an initial subordinate clause, she observes a strong tendency to take the subjects of the two clauses to be coreferential when one of the subjects is pronominal. Frazier proposes that this is due to a violable grammatical principle that has a very low rank in English and French, and can exert its influence on parsing in the way described above. Frazier points out that in switch-reference languages such as Mojave or Diyari, the pertinent principle seems to have a much higher rank, since there, one must obligatorily mark sentences for a switch in the reference of the two subjects in exactly the same constellation (initial adjunct clause followed by main clause) in which the reading preference can be observed in English. Her data constitute a further instance of the idea that principles with a low rank can have considerable parsing effects.

One of the outstanding features of OT approaches to language processing thus lies in the prediction that grammatical laws G of some language L can be visible in the processing of some other language L′ even if G has no detectable effect on grammaticality in L′. This property makes OT parsing models extremely interesting, but only very few of the predictions have so far been tested empirically.

2. Gradient Acceptability

In this section, we will see that and why graded acceptability facts are also in line with the key property of Optimality Theory just discussed: even if its low rank makes it invisible in terms of pure categorical grammaticality, a syntactic principle may have a clear effect on the degree of acceptability of a sentence. Standard syntactic models presuppose that grammaticality is a categorical notion: a sentence either is grammatical (i.e., compatible with the structural requirements of the language), or it is not. In contrast, acceptability is not categorical but gradient. When confronted with a particular construction, we are often not certain whether it is well-formed or not, we judge constructions as only marginally acceptable, and we can (and do) rank sentence types according to their acceptability.

Just as for parsing, the question arises what role grammar and grammar-external factors play in yielding graded acceptability. Ideally, we can uphold the assumption that grammaticality is a categorical concept, so that the graded nature of acceptability must be due to the other components6 that interact with grammar in determining acceptability. E.g., a sentence can be hard to process to different degrees, so any effect processing difficulty has on acceptability should be gradient. Chomsky & Miller (1963) convincingly argue that the low acceptability of center-embedded structures is not a consequence of low grammaticality; rather, it is caused by the enormous processing difficulty that center-embedded structures such as (8) come with.

(8) the man who the woman who the mosquito bit loves kicked the horse
Garden path sentences such as (9) constitute a further straightforward case of a processing influence on acceptability: the difficulty of identifying the correct structural analysis of the clause in (naïve) online parsing implies low or zero acceptability, in spite of the fact that (9) is fully well-formed according to the laws of English syntax.

(9) the horse raced past the barn fell

Examples such as (8) and (9) are more or less fully unacceptable because of the processing problem they come with. However, there are numerous examples of a less drastic reduction of acceptability due to processing problems. For example, German sentences that begin with an object are consistently rated as less acceptable than sentences that begin with a subject. This effect does not completely disappear when the sentence to be judged is placed in a favorable context, as Keller (2000) has shown. Corresponding processing difficulties of object-initial structures have been amply documented since Krems (1984); see also Hemforth (1993), Schlesewsky, Fanselow, Kliegl & Krems (2000), among others. The reduced acceptability of such sentences can be explained in terms of their processing problems if the latter do not merely reflect the grammatical difficulty. Indeed, the grammar-independent nature of the additional processing load of object-initial sentences can easily be argued for. Based on a self-paced reading study, Fanselow, Kliegl & Schlesewsky (1999) argue that the processing difficulty of object-initial structures results from memory load: the preposed object is "reconstructed" by the human parser to its canonical position following the subject. Up to the point at which the subject is encountered in the parsing of a sentence, this reconstruction is impossible, so that the preposed phrase must be kept in memory.
Recent ERP studies such as Felser, Clahsen & Münte (2003), Fiebach, Schlesewsky & Friederici (2002), and Matzke, Mai, Nager, Rüsseler & Münte (2002) support this interpretation. See Fanselow & Frisch (2005) for more data supporting a processing influence on graded acceptability.

Furthermore, Fanselow & Frisch (2005) have elaborated on an important aspect of the processing influence on acceptability. They discuss various structural constellations in which the intermediate analysis of a certain part of some construction is not in line with the final analysis. In spite of the fact that the intermediate analysis is finally abandoned, Fanselow and Frisch show that this temporary analysis has an influence on the overall acceptability of the sentence. On the one hand, local ambiguities can have a mitigating effect on (un)acceptability.7 When a structure begins with a locally ambiguous part, and one (but not both) of the readings is compatible with grammatical requirements (and if it is the locally preferred option), then the availability of this locally grammatical reading will increase the global acceptability of the sentence even if the structure is later disambiguated towards the other, ungrammatical interpretation. Such locally ambiguous constructions are more acceptable than structures that do not involve such an initial local ambiguity, while being structurally identical otherwise. The local appearance of grammaticality can thus increase the overall acceptability of a structure independent of its global status. Similarly, a fully grammatical structure may also appear less acceptable. This can be illustrated by the so-called "case clash" phenomena first discussed (for German) by Kvam (1983) and experimentally investigated by Fanselow & Frisch (2005).

6 Other factors influencing acceptability are the difficulty of constructing a context in which the utterance would be acceptable, the stylistic homogeneity of the utterance, its content, etc.
They consider structures in which a wh-phrase or a relative pronoun has been moved out of a complement clause into the matrix clause. The grammatical properties of this preposed phrase must be in line with the selectional requirements of V2, and, in addition, they can but need not be compatible with the grammatical requirements that the matrix verb V1 would impose on its arguments.

(10) … WH-PHRASE … V1 [CP … t V2 ]

When the wh-phrase in (10) could also be an object of the matrix verb V1 (e.g., if the Case potentials of V1 and V2 are identical), the structure is rated as more acceptable than sentences in which the wh-phrase could not belong to the matrix clause (it has a Case incompatible with V1), even though the wh-phrase fits the requirements of the embedded verb V2 in both cases, and must be interpreted as a preposed complement clause object because of the selectional requirements in the complement clause.

Fanselow & Frisch explain this result in the following way: in the initial phase of the processing of (10), the wh-phrase is checked for its availability as a matrix argument, and a Case clash between it and the requirements of V1 leads to a perception of ungrammaticality. This temporary perception of ungrammaticality reduces the overall acceptability of the clause, even when the initial matrix clause attachment of the wh-phrase turns out to be completely irrelevant for the grammatical parse of the tree, because the wh-phrase turns out to have been moved out of the complement clause. If the processing difficulty of intermediate structural analyses has an impact on the overall acceptability of a sentence, we expect there to be a second type of influence of grammatical constraints on acceptability, which is indirect and gradient.

7 This happens, e.g., with noun phrases that are temporarily ambiguous for number, and for which later material forces a singular interpretation that is incompatible with other grammatical requirements.
Each grammatical principle comes with a certain processing cost, and this processing cost contributes to acceptability. The gradience of this indirect effect does not reflect a property of grammatical principles as such, but stems from the fact that this additional effect of grammar on acceptability is mediated by different degrees of processing difficulty. Furthermore, in an OT model one again expects that this mediated processing effect of grammatical principles on gradient acceptability can be triggered by constraints that are not effective in the grammar of the language at all, in the sense that they never (or rarely) yield true ungrammaticality because of their very low rank.

Again, there is evidence for such an impact of low-ranked principles on acceptability. E.g., German does not respect the that-trace filter that renders (11) ungrammatical: subjects can be extracted from clauses introduced by a complementizer in German, in contrast to what holds in English.

(11) *who do you think that _ loves Mary

(12) wer denkst du dass die Wahl gewinnen wird
     who.nom think you that the election win will
     "who do you think will win the election"

In spite of the fact that (12) is by no means ungrammatical in German, experimental studies have shown that native speakers consider subject extraction from a complement clause less acceptable than corresponding object extractions (Featherston 2005a). Likewise, the superiority condition blocks object fronting in English multiple questions in the presence of a wh-subject (13). German does not rule such structures out (14), but experimental studies have shown that native speakers consider (14) much less acceptable than its subject-initial counterpart (Featherston 2005a, Fanselow & Frisch 2005).

(13) *who did who see?

(14) wen hat wer eingeladen
     who.acc has who.nom invited
Linguistic constraints can thus have effects on reading preferences and on graded acceptability even if they are irrelevant for grammaticality in a certain language. This insight constitutes an interesting constraint that linguistic models must meet when they try to explain natural language syntax. The design of Optimality Theory predicts the existence of such phenomena.

There is further, albeit indirect, evidence for acceptability being reduced by grammatical constraints with a low rank. Bresnan and her colleagues (cf., e.g., Bresnan & Nikitina 2003, Bresnan, Dingare & Manning 2001) have shown that in a variety of domains (among them passives, the dative alternation in English, and subject-verb agreement) "soft constraints" of a language like English are mirrored by "hard constraints" in other languages. A hard constraint is a rule that is strictly respected in the language in question (certain person-number combinations for subjects and objects are ungrammatical), while "soft constraints" are identified by statistical data in Bresnan's work: certain grammatical constellations are not ungrammatical in a certain language, but only (very) infrequent. As Bresnan points out, the correlation between soft and hard constraints is easily explained in Optimality Theory, and thus constitutes another type of processing evidence for that model.

How does corpus frequency relate to graded acceptability? How are soft constraints captured in OT according to Bresnan and her colleagues? Kempen & Harbusch (2004, 2005), Featherston (2005b), and many others have shown that corpus frequency and acceptability are correlated, but the relation is not a linear one. Only when the acceptability of a structure is fairly high will it have a chance to occur in corpora with a non-negligible frequency. What is the reason for this correlation? Up to a certain extent, one can assume that production and perception difficulty are influenced by comparable constraints.
This assumption is often made in theories that relate corpus frequency to processing difficulty (see, e.g., Hawkins 1994). Under the OT perspective pursued here, one expects that the grammatical principles figure in both perception and production, while the two processing systems might (but need not) differ with respect to the non-grammatical factors that determine perception/production difficulty. The more difficult a structure is to produce, the more time-consuming its production will be, so that easier and faster competitors for formulating a certain thought are more likely to be articulated in most cases. By contributing to production difficulty, principles of grammar will influence the relative corpus frequencies of competing structures, even when they have a low rank (and would thus count as “soft” constraints). Bresnan and her colleagues derive the relation between hard and soft constraints in a different way. Suppose that the production procedure for sentences indeed makes use of the hierarchy of grammatical principles in a language in a fairly direct way. Given a strict ranking of grammatical principles, a certain type of thought (a certain “input”, in OT terms) should thus always be formulated in exactly the same way.8 Psychological models (such as the “Elimination by Aspects” (EBA) model) that deal with decision making on the basis of ranked principles (and that are thus isomorphic to OT) assume that the hierarchy is not always perfectly worked through in actual decision making (see Jungermann, Pfister & Fischer 1998). Rather, there is a certain likelihood (inversely correlated with the rank within the hierarchy) that a specific constraint is simply skipped in processing.
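The EBA-style idea that a constraint is skipped with a likelihood inversely correlated with its rank can be made concrete with a small simulation. Constraint names, skip probabilities, and the tableau below are invented for illustration; the decision-making literature does not fix this particular implementation:

```python
import random

# A toy simulation of rank-sensitive constraint skipping (invented
# names and probabilities). Each ranked constraint is skipped with a
# probability that grows as its rank decreases, so repeated production
# runs yield a frequency distribution over outputs rather than a
# single fixed winner.

def stochastic_winner(candidates, constraints, skip_prob, rng=random):
    """One production run: filter candidates through the ranked
    constraints, probabilistically skipping each constraint."""
    remaining = set(candidates)
    for c in constraints:
        if rng.random() < skip_prob[c]:
            continue                      # constraint skipped this run
        best = min(candidates[k].get(c, 0) for k in remaining)
        remaining = {k for k in remaining
                     if candidates[k].get(c, 0) == best}
        if len(remaining) == 1:
            break
    return sorted(remaining)[0]           # arbitrary tie-break

constraints = ["C1", "C2"]                # C1 outranks C2
skip_prob = {"C1": 0.05, "C2": 0.30}      # lower rank -> skipped more often
tableau = {
    "form-A": {"C2": 1},                  # optimal under the full hierarchy
    "form-B": {"C1": 1},
}
counts = {"form-A": 0, "form-B": 0}
for _ in range(10_000):
    counts[stochastic_winner(tableau, constraints, skip_prob)] += 1
print(counts)  # form-A dominates; form-B surfaces when C1 happens to be skipped
```

Averaged over many runs, the occasional skipping of the top constraint is what lets the dispreferred form surface at all, so relative output frequencies track the hierarchy.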
Following Boersma’s (1998) proposal of Stochastic OT (SOT), Bresnan and her colleagues make a comparable assumption about production: when the production system operates on a certain hierarchy, there is always a certain likelihood that a constraint C is not evaluated with the rank it actually should have, but at a different relative position in the hierarchy. SOT and EBA make comparable, but of course not identical, predictions concerning syntax production. Both imply that sentences are not always produced on the basis of the “real” hierarchy of the language: there is a certain likelihood that production will skip a principle or invert the rank of principles. This should yield an output different from the one generated by the “real” hierarchy. The frequencies of the different structural alternatives in a corpus should then be a function of the likelihood that the principle crucial for the choice of a particular structure is skipped or reduced in rank. Corpus frequencies are thus expected to mirror the constraint hierarchy. Given the stochastic nature of human cognition, SOT and/or EBA are plausible descriptions of what happens syntactically in the production process. Nothing of what we have said so far is incompatible with the possibility that the stochastic nature of processing also contributes to corpus frequencies, in addition to grammar-based production difficulty and further factors. In that sense, there is no conflict between SOT and our assumptions. Note, however, that although SOT and EBA models may be good at modelling corpus frequencies, they have nothing (direct) to say about gradient acceptability.

8 This would not necessarily hold in case the grammar contains principles with an identical rank (“tied” principles), a possibility we can ignore here.

Corpus frequencies can
be predicted in these models because these frequencies result from the repeated execution of the same computational task, the outcome of which will be influenced by the stochastic properties of the processing system. If applied to the task of formulating an acceptability judgement, SOT and EBA would lead us to expect that we assess the acceptability of a structure differently on different occasions. There is no direct way, however, of explaining why a sentence may have a gradient acceptability value at a single evaluation. Stochastic models cannot explain this without further assumptions.9

3. The relative nature of grammatical judgments

The topic of the preceding two sections has been a particular implication of Optimality Theory for language processing: principles with a very low rank can influence the parsing and the production process directly, and gradient acceptability indirectly. There seems to be no other current grammatical model that makes these predictions, and to the extent that the available evidence is representative, these predictions appear to be borne out. Let us conclude by turning to a further consequence Optimality Theory has for language processing and acceptability judgements: sharp and categorical ungrammaticality should not be visible in that part of language that is characterized by what Pesetsky (1997, 1998) called “ineffability”. As we have mentioned above, parsing models using Optimality Theory need to answer a very profound question: under what conditions will the parser give up a temporary analysis? In standard parsing approaches, this question has a simple answer. Suppose the parser has postulated the analysis S for the string it has analyzed so far, and suppose that the next word to be parsed is W. If there is no way of combining S with W without violating a grammatical principle, S will have to be abandoned, and a new analysis S’ must be found for the input string parsed so far.
In OT parsing, the violation of a principle of grammar can never be a sufficient reason for abandoning a structural hypothesis. After all, according to OT, there is no grammatical output in any language of the world that would not violate one or the other grammatical principle.10 Furthermore, given the construction of the generative component GEN of OT, it is hardly conceivable that a given structure S and a word W could not be combined structurally in some (albeit extremely complex) way. The procedure of OT parsing will only select the best of the various complex ways of linking S and W, and there will necessarily be one “best” candidate. A simple OT parser cannot decide that none of the postulated structures is “good enough” to be maintained. Syntactic hypotheses are evaluated only relative to other possibilities; there is, on principled grounds, never an evaluation relative to some absolute standard. This difficulty for incremental OT parsing is a serious one, but it can be resolved (see Kuhn 2000) if one adopts the idea that there is bidirectional optimization, and that bidirectional optimization is what happens in the parsing process. In standard OT, the grammatical competition tries to identify the best form for a given input/meaning. In simple OT parsing, one tries to identify the best interpretation (structurally and, finally, semantically) for a given form. The “direction” of optimization is thus different in the two domains.

9 One might entertain the hypothesis that the perception of different degrees of frequency influences gradient acceptability, perhaps because it has an impact on the degree of confidence in the acceptability judgements.

10 E.g., the two constraints that require that a head appear at the left or the right periphery of a phrase, respectively, cannot be fulfilled simultaneously.
If syntactic parsing is indeed bidirectional (as Kuhn 2000 suggests), each parsing step will first identify the best structure S and meaning M for the segment I of the input string that has been analysed so far. But then, in a second step, the parser computes what would be the best (partial) expression I’ for the meaning M it has just hypothesized. If I’ coincides with the input string I that has been parsed so far, parsing proceeds to the next word; but if the input string I is not the best way of expressing the meaning M the parser has hypothesized for it, the structural hypothesis for I must be given up. Crucially, this means that a structural hypothesis is abandoned whenever there is a better way than the input string itself of expressing the meaning hypothesized for the input string. The (temporary) perception of ungrammaticality is thus not based on the parsing of an input string alone, but rather on a production-oriented optimization of the form. A structure will be perceived as (temporarily) ungrammatical only if it is kicked out of the competition by the identification of a better way of formulating its meaning. Within standard Optimality Theory, this reflection is of little importance. Standard OT assumes that the application of the grammatical hierarchy is always successful, in the sense that the optimal form for realizing an underlying representation in phonology, a morphological constellation, or some structured meaning can always be found. For logical reasons, there must be (at least) one structure that is not worse than its competitors. It is identified as optimal and grammatical, and the ungrammaticality of the others is caused by the optimality of the winning competitor. However, as pointed out by Pesetsky (1997, 1998) for syntax, it is simply not true that every concept or every thought can be formulated: there is “ineffability”.
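The two-step procedure just described (optimize form-to-meaning, then meaning-to-form, and abandon the hypothesis on a mismatch) can be sketched with a toy fragment. All strings, meaning labels, and costs below are invented placeholders; this is a sketch of the idea, not Kuhn's (2000) formalization itself:

```python
# Toy sketch of bidirectional optimization in parsing. Step 1 finds
# the best meaning for the input prefix (comprehension direction);
# step 2 finds the best form for that meaning (production direction).
# The structural hypothesis survives only if step 2 returns the very
# prefix that was parsed.

def best_meaning(prefix, interpret):
    """Comprehension direction: optimal meaning for a given form."""
    meanings = interpret[prefix]            # candidate meanings with costs
    return min(meanings, key=meanings.get)

def best_form(meaning, express):
    """Production direction: optimal form for a given meaning."""
    forms = express[meaning]                # candidate forms with costs
    return min(forms, key=forms.get)

def bidirectional_check(prefix, interpret, express):
    m = best_meaning(prefix, interpret)
    return best_form(m, express) == prefix  # keep or abandon hypothesis

# Invented fragment: "who saw" is itself the best way to express its
# hypothesized meaning, so the hypothesis is kept; "that saw" loses to
# a cheaper paraphrase and is abandoned although some analysis exists.
interpret = {
    "who saw":  {"Q(see)": 1},
    "that saw": {"REL(see)": 2},
}
express = {
    "Q(see)":   {"who saw": 1},
    "REL(see)": {"which saw": 1, "that saw": 2},
}
print(bidirectional_check("who saw", interpret, express))   # True
print(bidirectional_check("that saw", interpret, express))  # False
```

The design point worth noting is that abandonment is never triggered by a constraint violation as such, only by the existence of a better competitor found in the generation direction, in line with the discussion above.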
See Fanselow & Féry (2002) for a discussion of various approaches to ineffability that have been proposed in OT. Fanselow & Féry (2002) also try to characterize the domains of language in which ineffability occurs. One remarkable aspect that they did not comment on is that ineffable domains rarely come with “absolute” ungrammaticality. Although there is no really good expression for the input in case of ineffability, at least one of the candidate representations seems marginally possible. In German syntax, ineffability arises in a number of domains: there is no clear-cut resolution of the conflict between the general requirement that verbs must appear in second position and some principles that rule out the movement of particular verbs11, or block verb movement when it affects semantic scope. When a wh-phrase is moved out of a syntactic island, there is no better way of expressing the pertinent wh-question, while extraction from the island is impossible as well. In a free relative clause, the relative pronoun must meet the case requirements of the matrix and the complement verb. If these do not coincide, ineffability may arise, depending on the position of the free relative clause and on dialect. Remarkably, absolute ungrammaticality seems rare in such domains. See Vogel (2005) for the gradient nature of case conflict violations in German free relative clauses. That island violations do not (always) yield strong ungrammaticality is a notorious fact. Even for the verb movement problem, it seems that one can always identify one structural candidate which is not perfect, but marginally possible. Exactly the same seems to be true in the domain of morphology. E.g., the addition of the diminutive marker -chen triggers umlaut in German, but if the umlauted vowel is in an unstressed position, a phonological condition is violated (Féry 1994).
In this constellation, diminutive formation is blocked, but some of the conceivable realizations of the diminutive form (e.g., if umlaut is not applied) do not sound extremely bad (see Fanselow & Féry 2002). This seems to be true of many other morphological gaps, too: the ‘impossible’ form is not completely ungrammatical. If such examples are representative, we can conclude that the third prediction of the OT parsing model is fulfilled as well. A (partial) structure is perceived as ungrammatical only when a better competitor has been identified. A situation in which all candidate representations are absolutely ungrammatical cannot arise, but the acceptability of the winner may be negatively affected by further, extragrammatical conditions. “Ineffability” arises whenever these extragrammatical conditions are strong enough to sharply reduce acceptability. For syntax, sentence processing difficulty may lead to ineffability; for morphology, a lexical control component (see Orgun & Sprouse 1999, and Fanselow & Féry 2002 for discussion) comes into play. Sometimes ineffability seems to go with complete unacceptability, however. In syntax, extractions out of adjunct clauses may be a case in point (*how_i did you praise him although he behaves t_i). We can try to derive the high degree of unacceptability from the assumption that questioning part of an although-clause is conceptually ill-formed (the proposition expressed by the although-clause cannot be directly asserted or questioned). Many languages impose strong and (apparently) absolute restrictions (relative to person and animacy features) against certain subject-object combinations. Here, the lexicon might help to explain the high degree of unacceptability: it lacks an entry for the pertinent inflectional form of the verb.

11 For example, verbs with two prefixes cannot be moved to the second position in matrix clauses; they are fully grammatical in final position only. See Haider (1997).
The gradient nature of ineffability constitutes indirect evidence for the claim that ungrammaticality arises only when a better competitor has been identified. Runner, Sussman & Tanenhaus (2004) provide further evidence, coming from an eye-movement study concerned with the processing of reflexive and personal pronouns in syntactic constellations crucial for binding theory. They show that experimental subjects consider referents incompatible with the requirements of binding theory early on in processing, both for reflexives and for pronouns. In the case of personal pronouns, the crucial fact is that referents appropriate for reflexives are also looked at. Apparently, all (relevant) options are checked, and grammatical principles (probably, the availability of a reflexive expression) then eliminate incorrect choices.

4. Conclusions

Linguistic theorizing is confronted with the problem that the facts of language underdetermine the choice of the correct grammar. It would thus be welcome if more data (corpus frequency, reading preferences, degrees of ungrammaticality) could bear on this issue. Due to its architectural properties, OT can be applied in a very direct way in processing theories. It makes very specific predictions concerning the processing role played by principles with a low rank, and these predictions seem to be borne out.

References

Artstein, R. 2000. Case constraints and empty categories in Optimality Theory parsing. Unpublished manuscript.
Boersma, P. 1998. Functional Phonology: Formalizing the Interactions between Articulatory and Perceptual Drives. The Hague: Holland Academic Graphics.
Bresnan, J., S. Dingare, & C. Manning. 2001. Soft constraints mirror hard constraints. In: Proceedings of the LFG01 Conference. Stanford: CSLI Publications. 13-32.
Bresnan, J. & T. Nikitina. 2003. On the gradience of the dative alternation. Unpublished manuscript. http://www-lfg.stanford.edu/bresnan/new-dative.pdf.
Chomsky, N. 1957. Syntactic Structures. The Hague: Mouton.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. & G. Miller. 1963. Finitary models of language users. In: R.D. Luce, R. Bush & E. Galanter (eds.). Handbook of Mathematical Psychology, Vol. II. New York: John Wiley.
Cuetos, F. & D.C. Mitchell. 1988. Cross-linguistic differences in parsing: Restrictions on the use of the Late Closure strategy in Spanish. Cognition 30: 73-105.
Fanselow, G. 2003. Wie ihr Gebrauch die Sprache prägt. In: S. Krämer & E. König (eds.). Gibt es eine Sprache hinter dem Sprechen? Frankfurt: Suhrkamp. 229-261.
Fanselow, G. & C. Féry. 2002. Ineffability. In: G. Fanselow & C. Féry (eds.). Resolving Conflicts in Grammar. Special Issue of Linguistische Berichte. 265-307.
Fanselow, G. & S. Frisch. 2005. Effects of processing difficulty on judgments of acceptability. In: G. Fanselow, C. Féry, M. Schlesewsky, & R. Vogel (eds.). Gradience in Grammar. Oxford: OUP. To appear.
Fanselow, G., R. Kliegl & M. Schlesewsky. 1999. Processing difficulty and principles of grammar. In: S. Kemper & R. Kliegl (eds.). Constraints on Language. Aging, Grammar, and Memory. Boston: Kluwer. 171-201.
Fanselow, G., M. Schlesewsky, D. Ćavar, & R. Kliegl. 1999. Optimal parsing. Unpublished ms., Rutgers Optimality Archive ROA 382.
Featherston, S. 2005a. Universals and grammaticality: Wh-constraints in German and English. Linguistics 43 (4).
Featherston, S. 2005b. The Decathlon Model of empirical syntax. Ms.
Felser, C., H. Clahsen, & T. Münte. 2003. Storage and integration in the processing of filler-gap dependencies: An ERP study of topicalization and wh-movement in German. Brain and Language 87: 345-354.
Féry, C. 1994. Umlaut and inflection in German. Ms., University of Tübingen. [Also at Rutgers Optimality Archive ROA-33; http://roa.rutgers.edu.]
Fiebach, C., M. Schlesewsky, & A. Friederici. 2002.
Separating syntactic memory costs and syntactic integration costs during parsing: The processing of German wh-questions. Journal of Memory and Language 47: 250-272.
Fodor, J., T. Bever, & M. Garrett. 1974. The Psychology of Language. New York: McGraw Hill.
Frazier, L. 1978. On comprehending sentences: Syntactic parsing strategies. Doctoral dissertation, University of Connecticut.
Frazier, L. 2002. The pronoun bias effect strikes again: A reply to Pynte and Colonna. Journal of Psycholinguistic Research 30: 601-604.
Frazier, L. & C. Clifton. 1996. Construal. Cambridge, Mass.: MIT Press.
Frazier, L. & G. Flores d'Arcais. 1989. Filler-driven parsing: A study of gap filling in Dutch. Journal of Memory and Language 28: 331-344.
Grimshaw, J. 1997. Projection, heads, and optimality. Linguistic Inquiry 28: 373-422.
Haider, H. 1997. Typological implications of a directionality constraint on projections. In: A. Alexiadou & T. Hall (eds.). Studies on Universal Grammar and Typological Variation. Amsterdam: Benjamins. 17-33.
Harbert, W. 1983. On the nature of the matching parameter. The Linguistic Review 2: 237-284.
Hawkins, J. 1994. A Performance Theory of Word Order and Constituency. Cambridge: CUP.
Hemforth, B. 1993. Kognitives Parsing: Repräsentation und Verarbeitung sprachlichen Wissens. Sankt Augustin: Infix.
Jungermann, H., H.-R. Pfister & K. Fischer. 1998. Die Psychologie der Entscheidung. Heidelberg: Spektrum.
Kaan, E. 1997. Processing subject-object ambiguities in Dutch. Doctoral dissertation, Rijksuniversiteit Groningen.
Keller, F. 2000. Evaluating competition-based models of word order. In: L.R. Gleitman & A.K. Joshi (eds.). Proceedings of the 22nd Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum. 747-752.
Kempen, G. & K. Harbusch. 2004. A corpus study into word order variation in German subordinate clauses: Animacy affects linearization independently of grammatical function assignment. In: T. Pechmann & C.
Habel (eds.). Multidisciplinary Approaches to Language Production. Berlin: Mouton de Gruyter.
Kempen, G. & K. Harbusch. 2005. How flexible is constituent order in the midfield of German subordinate clauses? A corpus study revealing unexpected rigidity. Ms.
Krems, J. 1984. Erwartungsgeleitete Sprachverarbeitung. Computersimulierungen von Verstehensprozessen. Frankfurt/Main: Lang Verlag.
Kuhn, J. 2000. Generation and parsing in Optimality Theoretic syntax: Issues in the formalization of OT-LFG. In: P. Sells (ed.). Formal and Empirical Issues in Optimality Theoretic Syntax. Stanford: CSLI Publications.
Kvam, S. 1983. Linksverschachtelung im Deutschen und Norwegischen. Tübingen: Niemeyer.
MacWhinney, B., E. Bates, & R. Kliegl. 1984. Cue validity and sentence interpretation in English, Italian and German. Journal of Verbal Learning and Verbal Behavior 23: 127-150.
Matzke, M., H. Mai, W. Nager, J. Rüsseler, & T.F. Münte. 2002. The cost of freedom: An ERP study of non-canonical sentences. Clinical Neurophysiology 113: 844-852.
Orgun, C. & R. Sprouse. 1999. From MPARSE to CONTROL: Deriving ungrammaticality. Phonology 16: 191-224.
Pesetsky, D. 1997. Optimality theory and syntax: Movement and pronunciation. In: D. Archangeli & D.T. Langendoen (eds.). Optimality Theory: An Overview. Oxford: Blackwell.
Pesetsky, D. 1998. Some optimality principles of sentence pronunciation. In: P. Barbosa et al. (eds.). Is the Best Good Enough? Cambridge, Mass.: MIT Press. 337-383.
Phillips, C. 1996. Order and Structure. Doctoral dissertation, MIT.
Prince, A. & P. Smolensky. 1993. Optimality Theory: Constraint Interaction in Generative Grammar. Ms.
Pritchett, B. 1992. Grammatical Competence and Parsing Performance. Chicago: University of Chicago Press.
Pritchett, B. & J. Whitman. 1995. Syntactic representation and interpretive preference. In: R. Mazuka & N. Nagai (eds.). Japanese Sentence Processing. Hillsdale, NJ: LEA. 65-76.
Runner, J., R. Sussman, & M.K. Tanenhaus. 2004.
Assigning reference to reflexives and pronouns in picture noun phrases: Experimental tests of Binding Theory. Submitted.
Schlesewsky, M., G. Fanselow, R. Kliegl, & J. Krems. 2000. The subject preference in the processing of locally ambiguous wh-questions in German. In: B. Hemforth & L. Konieczny (eds.). German Sentence Processing. Dordrecht: Kluwer. 65-93.
Stevenson, S. & P. Smolensky. 1997. Optimal sentence processing. Conference on Computational Psycholinguistics (CPL '97).
Smolensky, P. & S. Stevenson. 2005. Optimality in sentence processing. Ms.
Vogel, R. 2001. Case conflict in German free relative constructions: An Optimality Theoretic treatment. In: G. Müller & W. Sternefeld (eds.). Competition in Syntax. Berlin: Mouton de Gruyter. 341-375.
Vogel, R. 2005. Degraded acceptability and markedness in syntax, and the stochastic interpretation of Optimality Theory. To appear in: G. Fanselow, C. Féry, M. Schlesewsky, & R. Vogel (eds.). Gradience in Grammar. Oxford: OUP.