BMN ANGD A2 Linguistic Theory
Lecture 5: Derivation Versus Representation

1 Standard and Extended Standard Theory
The beginning of the 1970s marked a shift in linguistic theorising from a number of perspectives, one of which, arising from the introduction of constraints, we discussed last week. Alongside this, the X-bar theory of phrase structure was introduced (Chomsky 1970), which at the time was taken to be a set of constraints on possible phrase structure rules, in much the same way that island constraints constrained possible movements. Thus, we went from a theory which looked as in (1) in the 1960s to one which looked as in (2) in the 1970s:
(1)
    Phrase Structure Rules + Lexicon
        ↓
    Deep Structures
        ↓  (Transformations)
    Surface Structures

(2)
    Phrase Structure Rules (constrained by X-bar Theory) + Lexicon
        ↓
    D-Structures
        ↓  (Transformations, subject to Constraints on Transformations)
    S-Structures
The change of terms from ‘Deep Structure’ to ‘D-structure’ and ‘Surface Structure’ to ‘S-structure’ was merely a way of avoiding the words ‘deep’ and ‘surface’, which Chomsky felt had unfortunate connotations. The important differences lie in the additions to the theory. The original theory became known as the Standard Theory and that in (2) as the Extended Standard Theory.
2 Further Developments
Before we continue to the main theme of the present lecture it is necessary to mention a few developments that also took place in the 1970s which, while having no direct bearing on our topic, will make our discussion more straightforward if they are introduced at this point.
The first concerns the complementiser and associated elements. During the 1960s, if the complementiser was included in the discussion at all, it was simply assumed to be part of the S node, along with the subject and VP:
(3)  [S that [NP the man] [VP ran away]]
Indeed, it was only in the last part of the 1960s that the term complementiser started to be used to refer to this element (Rosenbaum 1967). In 1970, Bresnan proposed that the complementiser stood outside the clause, forming another constituent with it, which she labelled S̄ (‘S-bar’). This analysis was rapidly accepted, along with the assumption that the position occupied by the complementiser also hosted the fronted wh-element in interrogatives and relative clauses. The standard analyses of such clauses were as below:
(4) a. [S̄ [COMP that] [S the man ran away]]
    b. [S̄ [COMP why [COMP e]] [S the man ran away]]
As we can see, the wh-element was not assumed to occupy the same position as the complementiser, but to be adjoined to it. We need not delay over the justifications of these assumptions, but they are necessary to know about for the following discussion.
The second important development, which has its roots in several works of the early 1970s but is usually attributed to Fiengo (1974), is the assumption that when an element undergoes a movement, it leaves behind an abstract, phonologically empty element of the same category as itself, known as a trace. Traces are usually indicated by the letter ‘t’, co-indexed with the moved element to show which part of the structure it is the trace of. Again, we do not need to go too far into the justification of this assumption at the moment, but it is easy to see that it limits the power of transformations in terms of what they are capable of doing:
(5)  D-structure: [S [NP e] [Aux was] [VP [V seen] [NP John]]]
     S-structure: [S [NP John] [Aux was] [VP [V seen]]]

(6)  D-structure: [S [NP e] [Aux was] [VP [V seen] [NP John]]]
     S-structure: [S [NP John1] [Aux was] [VP [V seen] [NP t1]]]
As we can see from (5), if the trace convention is not used, the transformation rearranges the structure in some quite drastic ways, as whole parts of the tree are made to disappear. Moreover, the lexical items involved also undergo changes: the verb, which is in a transitive environment at D-structure, is in an intransitive environment at S-structure, which is tantamount to saying that a transitive verb is made intransitive by the transformation. In (6), where a trace is assumed, the only thing the transformation does is move the object into subject position and insert the trace itself. Structurally and lexically, nothing much changes. Without traces, then, transformations have to be assumed to be capable of more radical changes and hence are more powerful mechanisms.
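To make the difference concrete, here is a minimal sketch in Python of the passive transformation in (5) and (6), with and without the trace convention. The flat list-of-pairs representation of the clause is invented purely for this illustration and is not a standard notation:

    # A clause as a flat list of (category, form) pairs; "e" marks an
    # unfilled position. The passive D-structure shared by (5) and (6).
    d_structure = [("NP", "e"), ("Aux", "was"), ("V", "seen"), ("NP", "John")]

    def passivise_without_trace(clause):
        # Destructive version (5): the object becomes the subject and its
        # original position vanishes, leaving "seen" in an intransitive frame.
        return [clause[3], clause[1], clause[2]]

    def passivise_with_trace(clause, index=1):
        # Trace version (6): the object moves, but a co-indexed trace of
        # the same category is left behind, so the structure is preserved.
        cat, form = clause[3]
        return [(cat, form + str(index)), clause[1], clause[2],
                (cat, "t" + str(index))]

    print(passivise_without_trace(d_structure))
    # [('NP', 'John'), ('Aux', 'was'), ('V', 'seen')]
    print(passivise_with_trace(d_structure))
    # [('NP', 'John1'), ('Aux', 'was'), ('V', 'seen'), ('NP', 't1')]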
Having introduced these notions, we can now proceed to the main topic of the lecture.
3 Filters
Besides using constraints on transformations to counteract over-generation, Ross also used another device, which has become known as a filter. Filters do the same job as constraints, deeming ungrammatical some structure which would be predicted to be grammatical by the free application of a transformation. However, instead of imposing a restriction on the transformation itself, filters impose restrictions on the structures that are formed by transformations, i.e. S-structures. A simple example can be taken from a paper by Ross written in 1972.
We have previously mentioned the idea of early transformational grammar that gerunds
might be derived from underlying sentences by a transformation. Thus, there is an obvious
relationship between the following examples which is exactly the sort of thing that
transformations were formulated to account for:
(7) a. John wrote a letter
    b. John’s writing a letter (pleased his mother)
However we are to deal with this relationship, note that it is the first verbal element which bears the gerund -ing form, whether this be a main verb or an auxiliary:
(8) a. John’s writing a letter                  (John wrote a letter)
    b. John’s having written a letter           (John had written a letter)
    c. the letter’s being written (by John)     (the letter was written (by John))
However, there is an unexpected gap in this paradigm: we cannot form a gerund when the progressive auxiliary is the first element:
(9) a. John’s having been writing a letter      (John had been writing a letter)
    b. * John’s being writing a letter          (John was writing a letter)
Note that there is nothing wrong with the underlying sentence associated with the gerund in (9b), so the problem seems to be due only to whatever is involved in forming the gerund itself. Ross’s suggestion was that what is wrong with (9b) has nothing to do with the process by which it is formed: we are simply not allowed to have two verbal elements in the ‘-ing’ form next to each other. That this is so can be seen from the following data, which are produced using mechanisms different to those in (9):
(10) a. it began to rain           it began raining
     b. it is beginning to rain    * it is beginning raining
In this case we have a verb in its continuous form followed by a gerund complement, rather than an auxiliary in its gerund form followed by a continuous verb. Hence the two structures are different, yet the result is the same: two adjacent verbs in the -ing form are ungrammatical.
We might propose therefore a condition which simply states that however the structure is
formed, if the result has two adjacent –ing verbs, it will be ungrammatical:
(11)
The Double-ing Filter
* … V-ing V-ing …
The first thing to note is that this does not work in the same way that a constraint does: a constraint directly restricts the operation of a transformation and so limits what S-structures can be associated with a given D-structure. A filter, on the other hand, simply says that certain S-structures, no matter how they are formed, are ungrammatical, and so completely ignores the issue of the relationship between D- and S-structure.
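To illustrate, the Double-ing Filter can be written as a simple predicate over surface strings. The Python sketch below assumes the tokens have already been tagged (the tagging scheme is invented for the example); note that the check consults nothing but the S-structure string itself:

    # Tokens are (word, tag) pairs; "V-ing" marks any verbal element in
    # the -ing form, whether main verb or auxiliary. Tags are assumed
    # given, not computed here.
    def violates_double_ing(tokens):
        """True if two adjacent verbal elements both bear the -ing form."""
        return any(t1 == "V-ing" and t2 == "V-ing"
                   for (_, t1), (_, t2) in zip(tokens, tokens[1:]))

    ok = [("it", "NP"), ("is", "Aux"), ("beginning", "V-ing"),
          ("to", "to"), ("rain", "V")]
    bad = [("it", "NP"), ("is", "Aux"), ("beginning", "V-ing"),
           ("raining", "V-ing")]
    print(violates_double_ing(ok), violates_double_ing(bad))  # False True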
We can extend the model of the grammar given in (2) in the following way to include filters:
(12)
    Phrase Structure Rules (constrained by X-bar Theory) + Lexicon
        ↓
    D-Structures
        ↓  (Transformations, subject to Constraints on Transformations)
    S-Structures
        ↓
    Filters
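The different points of application can be pictured as a toy generate-and-test pipeline. In the Python sketch below (a schematic model of (12), not a serious grammar), constraints block transformations from applying at all, while filters discard finished S-structures however they were produced:

    def grammatical_s_structures(d_structure, transformations,
                                 constraints, filters):
        """Toy model of (12): constraints restrict the derivation itself;
        filters inspect only the resulting S-structures."""
        outputs = []
        for transform in transformations:
            # A constraint blocks a transformation from applying.
            if any(blocks(transform, d_structure) for blocks in constraints):
                continue
            outputs.append(transform(d_structure))
        # A filter rules out certain outputs, however they were derived.
        return [s for s in outputs if not any(f(s) for f in filters)]

    # Minimal use: one vacuous transformation, no constraints, and a
    # filter banning the substring "for for" (cf. (14) below).
    print(grammatical_s_structures(
        "she longs for for him to leave",
        transformations=[lambda s: s],
        constraints=[],
        filters=[lambda s: "for for" in s]))  # [] -- filtered out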
A number of points are raised by these suggestions. Why, for example, do we need two different mechanisms to do the same job? Both constraints and filters serve to check the over-generation of transformations, so why do we need both?
I suspect that the answer to this question, at the time the early filters were proposed, was simply that they were the most obvious way to state the restriction. To formulate a constraint that prevented the generation of double -ing structures, although perhaps not impossible, would result in something rather complex and inelegant. The Double-ing Filter at least has simplicity on its side. Moreover, there is an underlying assumption here that might not be totally accurate: that every phenomenon that might be ruled out by a filter is the result of some transformation. This is not guaranteed. For example, the offending phenomenon may be the result of the phrase structure rules and may survive into S-structure because transformations do not affect it. In this case it would be difficult to get a constraint on transformations to rule out the problematic S-structure.
This said, however, there is a serious theoretical point here concerning the nature of human grammatical systems. Given that constraints and filters both perform the same function, a system that made use of just one of these mechanisms would obviously be simpler than one that makes use of both. Only if there is empirical evidence that both are needed would we be wise to continue with the more complex assumption. As it turns out, although it may not be possible to replicate the effect of a particular filter by a particular constraint, or vice versa, it is always possible to replicate the effect of any system which makes use of only filters with one which makes use of only constraints, and vice versa. For example, we can arrange the grammar so that a particular S-structure configuration is simply never generated, without there being a specific constraint which accounts for its absence. Thus the question is still valid as to which of the two mechanisms for countering over-generation should be used. This turns out to be an extremely important issue which ultimately relates to the question of whether we need transformations at all. We will return to this issue in a later section.
A second point that one might raise concerning the use of filters is that, as they are associated with surface phenomena, they seem rather superficial themselves, merely translating observations into grammatical mechanisms meant to account for those observations. In other words, are filters not hopelessly descriptive in nature, unable to achieve any level of explanation? The issue revolves around just how surface-based the filters have to be. For example, the Double-ing Filter seems to be very surface-based, given that it directly addresses a particular structure of English. However, this does not necessarily mean that it cannot be restated in a less surface-based way. There is evidence that the double-ing effect is not just a one-off observation, but is part of a larger set of observations which indicate that something more general is going on. For example, Chomsky and Lasnik (1977) proposed a filter to account for the following observation:
(13) a. solitude is what she longs for
     b. for him to leave is what she longs for
     c. she longs for solitude
     d. * she longs for for him to leave
     e. she longs for him to leave
The verb long takes a PP complement headed by for. It can also be associated with an infinitival clause introduced by the complementiser for. However, it cannot have the preposition and the complementiser together, though, as shown by (13b), there is nothing to stop these appearing in the same construction as long as they are separated. Thus we might suppose a filter which rules out the adjacency of the preposition and complementiser for:
(14)
The For-For Filter
* … for [S̄ for …
Obviously this filter is just as surface-based as the Double-ing Filter, but the two also have something in common: both involve the adjacency of two elements with identical forms expressing different grammatical entities, namely the gerund and progressive ing, and the preposition and complementiser for. Is this just coincidence, or is there something deeper going on? We can note that similar observations can be made concerning other phenomena in other languages. Take NPs in Hungarian, for example:
(15) a. az ő kalapja
        the he hat
        “his hat”
     b. az ember kalapja
        the person hat
        “the person’s hat”
     c. * az az ember kalapja
     d. az embernek a kalapja
In (15a) we see that the Hungarian determiner a(z) precedes the possessor. In this case, this
determiner must be associated with the possessed noun, not the possessor, as Hungarian
pronouns cannot co-occur with determiners:
(16) * az ő olvasott egy könyvet
       the he read a book
The possessor can also be an NP with its own determiner, as shown in (15b). However, we
cannot have both a determiner for the possessed noun and the possessor together in such a
way that the two determiners are adjacent (15c), though they are fine when separated (15d).
Again, we seem to have a ban on two identical forms with different functions occurring adjacent to one another.
Strangely enough, similar observations can be made in phonology too. Leben (1973) proposed a kind of phonological filter which ruled out two identical tones being adjacent: while we find HL and LH tone patterns in various languages, we do not find HH and LL patterns. This restriction was termed the Obligatory Contour Principle. It may be that the double-ing, double-for and double-az phenomena are syntactic versions of violations of the Obligatory Contour Principle, in which case it seems that there is something very deep-rooted, and not at all surface-based, that is being filtered out by the linguistic system. What this deep filter is and how to formulate it are matters which take us beyond the boundaries of this lecture, however.
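If the syntactic cases really are reflexes of the same pressure as the Obligatory Contour Principle, one could imagine collapsing them into a single ban on adjacent identical forms. The Python sketch below is offered purely speculatively, to make the shared pattern explicit; it deliberately ignores the grammatical function of the items compared, and for the double-ing case the comparison would of course have to run over the affixes rather than the whole words:

    def violates_identity_adjacency(forms):
        """OCP-style check: two adjacent identical forms are banned,
        whatever their grammatical function."""
        return any(a == b for a, b in zip(forms, forms[1:]))

    print(violates_identity_adjacency("she longs for for him to leave".split()))  # True
    print(violates_identity_adjacency("az az ember kalapja".split()))             # True
    print(violates_identity_adjacency(["H", "L", "H"]))                           # False
    print(violates_identity_adjacency(["H", "H", "L"]))                           # True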
On the other hand, it has to be admitted that many of the filters that were proposed in the
1970s were not particularly deep and had the feel of a descriptive device waiting for
something more explanatory to come along and replace it. For example, it is a straightforward
observation that complementisers such as that only introduce subordinate clauses and not
main clauses:
(17) a. I think [that he left]
     b. * that he left
From a traditional point of view, this is not remarkable, as complementisers were classified as
subordinators, i.e. things which introduce subordinate clauses. Yet from the generative
position, there was something to be explained. To start with, given that main clauses can take
a fronted wh-element and assuming these to move to the COMP position, this means that
main clauses have COMP positions, even if they cannot be filled by a complementiser.
Moreover, not all subordinate clauses are introduced by a visible complementiser. The idea emerged, then, that all clauses have a COMP position, but that this may be left unfilled, or perhaps filled by some phonologically empty complementiser:
(18)  [S̄ [COMP e] [S he left]]
The question that arises is why the COMP position must be empty in main clauses, given that
it may be filled in subordinate clauses. Chomsky and Lasnik (1977) proposed that this was
due to a filter that deemed root clauses to be ungrammatical if beginning with a
complementiser:
(19)
The Root Clause Filter
* [S̄ COMP [S …
if COMP is filled by an overt complementiser and S̄ is a root clause
Apparently not all languages behave like this, and so this has to be seen as a language-specific filter, which perhaps makes it even more unappealing, as it simply describes the surface distribution of a category in certain languages. It might be better to deal with this in terms of a lexical property ascribed to complementisers, more or less admitting that they have a subordinating function.
Another very well-known, but ultimately descriptive, filter from the same paper as the Root Clause Filter accounts for the following observations:
(20) a. a man [who1 COMP [I met t1]]
     b. a man [who1 (that) [I met t1]]     (who1 subsequently deleted)
     c. * a man [who1 that [I met t1]]
A restrictive relative clause in English may be introduced by a wh-element, in which case the
COMP position is left empty. On the other hand, the wh-element may undergo a deletion
process, in which case the COMP position may or may not be left empty. What is not allowed
is for the wh-element to be undeleted and the COMP position to be filled. Thus Chomsky and
Lasnik proposed the following:
(21)
The Doubly Filled COMP Filter
* [S̄ WH + COMP [S …
if WH is not deleted and COMP is not empty
While this has the feel of a descriptive device, it is difficult to come up with a deeper account of it. Certainly it would be hard to claim that it is the result of some lexical property.
One of Chomsky and Lasnik’s descriptive filters turned out to be the basis of a deeper principle of the 1980s, which we will talk about next week. The observation was a rather puzzling one, which had been known about since Perlmutter (1971). It seems that while the presence of a complementiser does not affect the grammaticality of a clause with a wh-element extracted from object position, it does when the wh-element is extracted from subject position:
(22) a. who1 did you think [that Mary called t1]
     b. who1 did you think [Mary called t1]
(23) a. * who1 did you think [that t1 called Mary]
     b. who1 did you think [t1 called Mary]
Given that what seems to be the problem here is the combination of a movement from the
subject position and an overt complementiser, Chomsky and Lasnik proposed the following
filter:
(24)
The that-trace Filter
* [that [ t …
In other words, when the complementiser is immediately followed by a trace, the result is
ungrammatical. While this is clearly descriptive, as mentioned above, it did serve as the
starting point for a more explanatory account in the 1980s. We will review this development
in a later lecture.
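Stated over a trace-annotated surface string, the that-trace Filter is again a purely local adjacency check. Here is a minimal Python sketch (traces are written ‘t1’, ‘t2’, etc., and the whitespace tokenisation is an assumption made for the example):

    def violates_that_trace(tokens):
        """(24): the complementiser 'that' immediately followed by a trace."""
        def is_trace(w):
            return w.startswith("t") and w[1:].isdigit()
        return any(a == "that" and is_trace(b)
                   for a, b in zip(tokens, tokens[1:]))

    print(violates_that_trace("who1 did you think that t1 called Mary".split()))  # True
    print(violates_that_trace("who1 did you think that Mary called t1".split()))  # False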
The final filter we will discuss here is also one introduced by Chomsky and Lasnik. The starting point is the following set of observations:
(25) a. they want [him to leave]
     b. they want [__ to leave]
     c. they want very much [for him to leave]
     d. * they want very much [for __ to leave]
The examples in (25) demonstrate the distribution pattern of subjects in infinitival clauses. Essentially, this subject can be overt when it is immediately preceded by certain verbs or by the complementiser for. However, the subject can only be covert in the absence of the complementiser. While this might appear to be related to that-trace effects, in that an overt complementiser is followed by a covert element in subject position, it turns out that the two phenomena are unrelated. To start with, the covert subject in (25) is not a trace, as none of these examples involve movement: the subject of want is semantically related to this verb and hence has not moved from another position. Chomsky and Lasnik propose that the covert position is occupied by a phonologically null pronoun, which they term PRO.¹ Another reason to suspect that this observation is independent of the that-trace filter is the fact that there are dialects of English in which (25d) is grammatical. But even in these dialects that-trace violations are not allowed, demonstrating that the two phenomena have different sources.
The filter proposed by Chomsky and Lasnik is again rather descriptive in nature:
(26)
The for-to filter
* [for [__ to …
In other words, this filter rules out structures in which the complementiser for is immediately followed by the infinitival to in the surface string (i.e. there is no phonological material between them). Despite this, it laid the foundations of one of the most successful applications of filters. Shortly before the publication of their paper, Chomsky and Lasnik received a letter from Jean-Roger Vergnaud commenting on a draft they had sent him. In this letter Vergnaud proposed a different account of the data in (25), based on the idea that NPs occupy positions in which they are assigned Case, even if this is not realised morphologically in a language. The Case positions Vergnaud suggested are the subject of a finite clause (which he called Subject Case, but which is traditionally Nominative) and positions following verbs and prepositions (which he called Governed Case, but which is traditionally Accusative). To account for the for-to filter data, Vergnaud suggested that Case is relevant for all NPs except the phonologically null one which appears in the subject position of infinitives. The idea is that the complementiser preceding the infinitive forces the subject position to be a Governed Case position, and this is incompatible with a null subject.

¹ This analysis has its roots in another transformational process called Equi-NP deletion, which we will introduce in a future lecture. Chomsky and Lasnik’s analysis was subsequently taken up and as such played a role in the elimination of deletion-type transformations as part of the programme of restricting their power.
Chomsky (1981) reworked Vergnaud’s informal account into a more formal theory which
made use of the following filter:
(27)
The Case Filter
* NP, where NP is overt and does not sit in a Case position
The more formal aspects of this theory involved the definition of Case positions, which we do not need to go into here; descriptively speaking, they are the same positions as Vergnaud proposed. This theory then accounts for the surface distribution of NPs in general, including observations about why some NPs undergo movements and others do not:
(28) a. it seems [he is rich]
     b. * it seems [him to be rich]
     c. he1 seems [t1 to be rich]
Note that when the complement clause is finite, its subject is not forced to move (28a), but when it is non-finite, it is (28b,c). As the subject position of a finite clause is a Case position, the Case Filter is satisfied in (28a); but the subject position of a non-finite clause is only a Case position if it is preceded by certain verbs (such as believe) or by the complementiser for. Hence in (28b) the Case Filter is violated and the sentence is ungrammatical. This is obviously a far more general account, which subsumes the effects of the for-to filter.
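A toy version of the Case Filter can be checked mechanically if we simply record, for each NP, whether it is overt and whether it sits in one of Vergnaud’s Case positions. The Python sketch below is descriptive only; deciding which positions count as Case positions is assumed to have been done in advance:

    # Each NP is a pair (overt, in_case_position). Case positions here are
    # taken, following Vergnaud, to be the subject of a finite clause and
    # the complement of a verb or preposition (including for).
    def violates_case_filter(nps):
        """(27): an overt NP outside a Case position is ungrammatical."""
        return any(overt and not case_pos for overt, case_pos in nps)

    # (28a) it seems [he is rich]: "he" is overt in a finite subject position.
    print(violates_case_filter([(True, True), (True, True)]))   # False
    # (28b) *it seems [him to be rich]: "him" is overt but Caseless.
    print(violates_case_filter([(True, True), (True, False)]))  # True
    # (28c) he1 seems [t1 to be rich]: the trace is not overt, so it is exempt.
    print(violates_case_filter([(True, True), (False, False)])) # False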
To conclude this section, I think it is worthwhile pointing out the importance of Chomsky and Lasnik’s paper. Even though the specific filters they proposed were highly descriptive in nature, many of them laid the foundations for much of the work done in the 1980s, which reached a level of explanation not previously achieved. Much of this later work was also based on filters, though filters of a far more general kind than those originally proposed.
4 Is Derivation Necessary?
Finally in this lecture we turn to an issue that the notion of filters gives rise to, one which has been the source of much disagreement since filters were first proposed in the 1970s. As we have already mentioned, although filters and constraints work in different ways, so that it is not always possible to replicate the effects of a particular constraint by a particular filter and vice versa, it is always possible to replicate the effects of a constraint by a grammar that does not make use of constraints at all, but uses only filters. For example, consider the Complex NP Constraint, which claimed that no transformation could move any element out of a clause contained in an NP. Such a constraint, being a constraint on transformations, is in effect a constraint on which D- and S-structures can be paired. It might seem to follow that no filter can do the same job, as filters only examine the S-structure and not the D-structure.
However, with the development of the trace convention, certain aspects of D-structure came to be encoded in S-structure, as the trace marks the original position of the moved element. Hence an S-structure annotated with traces is like a D-structure and an S-structure collapsed into one. Because of this, it is possible to come up with a filter which does the same job as the Complex NP Constraint:
(29)
The Complex NP filter
* … XP1 … [NP … [S … t1 …
I present this only as a demonstration of what traces allow us to do, rather than as a serious proposal. The point is that traces, by making D-structure information visible at S-structure, in many ways negate the need for D-structures altogether.
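To see how such a filter could actually be checked representationally, here is a Python sketch of (29) run over trace-annotated structures encoded as nested lists. The encoding, and the restriction to single-digit indices, are simplifications invented for the demonstration:

    # A structure is a list whose first element is its label; indexed
    # items are strings such as "what1" (a binder) or "t1" (a trace).
    def violates_complex_np_filter(tree):
        """(29): a trace inside a clause contained in an NP must not be
        bound from outside that complex NP."""
        traces, binders = {}, {}

        def walk(node, inside_np, in_np_clause):
            if isinstance(node, str):
                if node[-1:].isdigit():
                    idx = node[-1]
                    target = traces if node[:-1] == "t" else binders
                    target[idx] = in_np_clause
                return
            label = node[0]
            for child in node[1:]:
                walk(child,
                     inside_np or label == "NP",
                     in_np_clause or (inside_np and label == "S"))

        walk(tree, False, False)
        # Violation: a trace inside an NP-contained clause whose binder
        # sits outside any such configuration.
        return any(in_clause and not binders.get(idx, False)
                   for idx, in_clause in traces.items())

    bad = ["S", "what1", "did", "you", "believe",
           ["NP", "the", "claim", ["S", "that", "John", "saw", "t1"]]]
    ok = ["S", "what1", "did", "you", "believe",
          ["S", "that", "John", "saw", "t1"]]
    print(violates_complex_np_filter(bad), violates_complex_np_filter(ok))
    # True False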
One might have thought that if we do not need transformations, and therefore derivations, it would be simpler to do away with them. However, there are two main arguments used by transformationalists to support the derivational approach. The first concerns the obvious surface/descriptive nature of filters, which often merely translate observations into grammatical mechanisms. As we have seen, this is not a necessary property of filters, and hence it remains to be shown whether or not filters can handle all grammatical phenomena in a more explanatory way. The second argument is that, in order to achieve full descriptive power, non-derivational theories always have to adopt mechanisms that do the same sort of things that transformations do. For example, suppose we generate structures with traces already in place, rather than having them inserted by a movement rule. There will then have to be a set of filters which ensure that any ‘moved’ element is always associated with a trace in the relevant position. Transformationalists would claim that the mechanisms that allow traces to be generated, plus those which ensure their valid inclusion in a structure, add up to a mechanism equivalent to a transformation, and hence no real simplicity is gained.
Nonetheless, non-transformational theories have been proposed, starting from the end of the
1970s. An early example is Lexical Functional Grammar, which assumes two levels of
representing syntactic expressions: one, an f-structure, which is not a constituent structure,
but which represents semantic relationships between the elements of the expression, such as
what is the subject and the object of a given verb, and the other, a c-structure, which is a
simplified constituent structure which does not indicate movement at all. Another example is
Generalised Phrase Structure Grammar, which had various mechanisms for dealing with
apparently displaced elements, including slash categories, which we mentioned a few weeks ago.
More modern theories also seem to be divided along these lines. The Minimalist Programme, which started in the early 1990s, is a derivational theory which assumes just one syntactic level of representation, with transformations of a general kind being involved in the step-by-step building of that structure. It makes use of one very general filter, Full Interpretation, which is satisfied only if every element in the structure is able to receive a valid interpretation with respect to the position it is placed in. In Optimality Theory, which also started in the early 1990s, grammaticality is determined solely by filters; but instead of defining grammaticality as absolute conformity to the filters, the filters are used to determine which of a number of possible structures is best, and hence grammaticality is a relative rather than an absolute matter. In this way the filters are kept as general as possible, as they do not necessarily describe exact surface phenomena.
The fact that both derivational and representational theories still survive is an indication that the debate between the two has been inconclusive. In fact it is very difficult to argue between them in theoretical terms, and virtually impossible to distinguish between them on empirical grounds. In this situation we are left with Chomsky’s original suggestion, made in 1957: the theories have to be continually developed, as explicitly as possible, to see if anything materialises which might argue for or against either one. We appear to be a long way from this position.
References
Bresnan, Joan W. 1970. ‘On Complementizers: Toward a Syntactic Theory of Complement Types’. Foundations of Language 6, 297–321.
Chomsky, Noam 1970. ‘Remarks on Nominalization’. In R. Jacobs and P. Rosenbaum (eds.), Readings in English Transformational Grammar, 184–221. Waltham, MA: Ginn.
Chomsky, Noam 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam and Howard Lasnik 1977. ‘Filters and Control’. Linguistic Inquiry 8, 425–504.
Fiengo, Robert 1974. Semantic Conditions on Surface Structure. Doctoral dissertation, MIT, Cambridge, MA.
Leben, William 1973. Suprasegmental Phonology. Doctoral dissertation, MIT, Cambridge, MA.
Perlmutter, David 1971. Deep and Surface Structure Constraints in Syntax. New York: Holt, Rinehart and Winston.
Rosenbaum, Peter S. 1967. The Grammar of English Predicate Complement Constructions. Cambridge, MA: MIT Press.
Ross, John R. 1972. ‘Doubl-ing’. In J. Kimball (ed.), Syntax and Semantics 1, 157–186. New York: Seminar Press.
Vergnaud, Jean-Roger 1977. Letter to Noam Chomsky and Howard Lasnik. http://norvin.dlp.mit.edu/~norvin/24.902/Vergnaud.pdf