Corpus Linguistics and
Grammar Teaching
Douglas Biber & Susan Conrad
Susan Conrad is Professor of Applied Linguistics
at Portland State University. In addition to
doing teacher training, she has taught ESL/EFL
in South Korea, southern Africa, and numerous
places in the U.S.
Douglas Biber is Regents’ Professor of English
(Applied Linguistics) at Northern Arizona
University. His research efforts have focused
on corpus linguistics, English grammar,
and register variation (in English and cross-linguistic; synchronic and diachronic).
Dr. Biber and Dr. Conrad have written several books, including the Longman Grammar
of Spoken and Written English and Real Grammar.
1. Why do grammar teachers need
corpus-based studies?
Similarly, the author must decide how much space to
devote to each feature. At a more detailed level, the author
must decide how to illustrate the targeted grammatical
features in example sentences — e.g. what contexts to use
and what specific words to use in the grammatical structure
(e.g. what verbs to use to illustrate past tense).
Speaking with English teachers and students throughout
the world, we have discovered that most of them believe
that the authors of their textbooks had some special source
of information to help them write the book. This information
— they believe — made clear to the textbook author what
content to cover and how to cover it. With this source of
information, the coverage in the book must be correct,
useful, and realistic, right?
All of these decisions have important implications:
teachers and students assume that the grammatical features
at the beginning of the book are easier, more basic, or
more important, while omitted features are not important.
Features that get many pages of coverage are assumed to be
important and difficult to grasp. Perhaps most importantly,
the contexts and example language are taken to be typical
of natural discourse.
Unfortunately, no special source of information for
textbook writers exists. Authors’ intuition, anecdotal
evidence, and traditions about what should be in a grammar
book play major roles in determining the content of
textbooks. This usually works just fine for basic descriptions
of grammatical structure. The intuition of a native speaker
or a couple of examples is sufficient evidence for how to form
accurate grammatical structures in English. For example,
intuition or anecdotal evidence works well to tell which
verbs are irregular, to describe the way to form perfect
aspect verb phrases, or to list the relative pronouns that
are possible in English.
Unfortunately, decisions about the sequencing of
material, typical contexts, and natural discourse are not
served as well by intuition and anecdotal evidence as
judgments of accuracy are. First, the intuition of even
experienced teachers is not consistent. For example, when
they rely purely on their intuition, teachers usually disagree
over whether simple present tense or present progressive
is more common in typical conversations and therefore
deserving of more practice.
However, as any experienced English teacher can attest,
accurately describing how to form grammatical structures
is only a small part of grammar teaching. And like teachers,
textbook writers must make myriad decisions. An author
must decide how to sequence the grammatical information:
Which features should be presented in the first chapters
versus features to discuss in later chapters versus features
to exclude because there is no room to discuss them at all?
In addition, it turns out that intuitions about typical
language choices are often wrong. In many cases, we simply
don’t notice the most typical grammatical features because
they are so common. For example, most native speakers of
English cannot identify the most common lexical verb used
in conversation; we use this verb so frequently that we don’t
even notice it. (We describe the use of this verb in Section 2
below; in the meantime, you can try to identify it on your own.)
c. 20 million words of text overall, with c. 4-5 million
words from each of these four registers. All frequency counts
reported below have been normalized to a common basis
(a count per 1 million words of text), so that they are
directly comparable across registers.
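The normalization described here can be sketched in a few lines of Python. This is a minimal illustration only: the function name `per_million`, the register sizes, and the raw counts are hypothetical and are not figures from the LSWE Corpus.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Convert a raw count into a rate per 1 million words of text."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size


# Hypothetical raw counts and sub-corpus sizes (assumed values for illustration).
registers = {
    "conversation": {"count": 9_000, "words": 4_500_000},
    "academic prose": {"count": 5_000, "words": 5_000_000},
}

# Rates on a common basis can be compared even though the sub-corpora differ in size.
normalized = {
    name: per_million(data["count"], data["words"])
    for name, data in registers.items()
}

for name, rate in normalized.items():
    print(f"{name}: {rate:.0f} per million words")
```

Without this step, a feature could look more frequent in one register simply because that register contributes more words to the corpus.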
Finally, to make matters worse, specialist books on
teaching methods and materials development provide little
guidance about making specific decisions for sequencing
or for gathering information about language choices in
natural discourse. As Byrd (1995) wrote, “often design
decisions are based on traditions about grammar materials
and their organization rather than on careful rethinking of
either the content or its organization” (p. 46).
In the LGSWE, corpus-based investigations of
grammatical features were carried out using a variety of
computational, interactive, and detailed ‘manual’ textual
analyses. In all cases, the overarching concern was to
achieve an accurate description of the distributional
patterns of the target grammatical feature. Computational
techniques made it feasible to analyze the patterns of
use in a 20-million-word corpus. However, whenever
automatic techniques produced skewed or inaccurate
results, we shifted to interactive analyses or even manual
analyses carried out on random sub-samples of texts from
each register. The guiding principles were to achieve an
accurate grammatical description efficiently, using whatever
techniques were required for that purpose, based on the
most representative sample of corpus texts that could
reasonably be analyzed using those techniques.
Studies from the field of second language acquisition
can provide some guidance for textbook writers, for
example about what grammar structures are acquired
first by English language learners, and about kinds of
activities that are likely to help learners begin the process
of acquisition. But what about the content for grammar
teaching — the questions about what forms are more
common, what examples will best exemplify naturally
occurring language, and what words are most frequent
with grammatical structures? Answers to these kinds of
questions have, in recent years, been coming from research
that uses the tools and techniques of corpus linguistics to
describe English grammar.
Corpus linguistics uses computer-assisted analyses
along with human interpretations of language functions
to study the language patterns in a large “corpus” of texts
(see Biber, Conrad, and Reppen, 1998, and McEnery,
Xiao, and Tono, 2006, for useful introductions). The texts can
be written or transcribed spoken texts. In most cases, a
corpus is designed to represent particular “registers” (such
as conversation, newspaper writing, or academic prose).
As a result, grammatical studies based on corpora can
describe differences across registers. Because the corpora
are large, covering many different speakers and writers, it is
possible to see what is typical for a large group of language
users in various contexts. Thus, corpus-based research
provides textbook writers and teachers with a new source
of information — a data-based source, rather than intuition
— to consider when making decisions.
2. Frequency Information
The simplest kind of information available from corpus
analysis is frequency: identifying the grammatical features
that are especially common or rare. However, as the
following two case studies show, even this kind of analysis
often reveals surprising patterns of use that have important
implications for grammar teaching.
Progressive vs. Simple Verb Tenses
One strongly held intuition about language use among
many English-teaching professionals is that progressive
aspect (the ‘present continuous’) is the most common
choice in conversation. This belief is reflected in grammar
textbooks, which typically cover the progressive in one of
the very first chapters of lower-level books.
Corpus-based studies of English grammar have proven
to be especially useful for descriptions of language use.
That is, they help us understand what speakers and
writers actually do with the linguistic resources available
in English. Three types of results are most important for
grammar teaching:
Corpus-based research has found that progressive aspect
is in fact more common in conversation than in most
written registers (Figure 1). The contrast with academic
prose is especially noteworthy: progressive aspect is extremely
rare in academic prose but more common in conversation.
Progressive verb phrases are nearly as common in fiction
as in conversation, and they are relatively common in news
as well.
• frequency information
• register comparisons
• associations between grammatical structures and
words (lexico-grammar)
[Figure 1. Frequency of progressive aspect across registers]
We illustrate each of these areas with case studies taken
from research conducted for the Longman Grammar of
Spoken and Written English (LGSWE; Biber et al., 1999),
discussing how such information is useful to textbook
writers and teachers. The case studies are based on analysis
of corpora from four registers: conversation, fiction,
newspaper language, and academic prose. Although these
are general registers, they differ in important ways from
one another (e.g., with respect to mode, interactiveness,
production circumstances, purpose, and target audience).
The analyses were carried out on the Longman Spoken
and Written English (LSWE) Corpus, which contains
However, when progressive aspect is compared to simple
aspect in conversation, the corpus research shows that the
intuition about progressives being typical is dramatically
wrong. Simple aspect verb phrases are more than 20 times
as common as progressives in conversation (Figure 2).
This case study illustrates how textbook authors and
teachers can be misled by intuitions, in this case believing
that progressive verbs are much more prevalent than they
in fact are. Unfortunately, misperceptions like this are
passed along to our students through textbooks and other
teaching materials that overemphasize the progressive, and
promote over-use by students. Corpus-based findings are a
strong antidote to misperceptions of this type.
[Figure 2. Simple versus progressive aspect in conversation]
The most common lexical verbs in conversation
Frequency information from corpus studies can also
help materials writers decide what words to use as they
give examples and write exercises for grammatical
structures. Even when they are focused on common, easy
vocabulary, for example, materials writers have to choose
from literally dozens of common lexical verbs in English.
For example, nearly 400 different verb forms occur over
20 times per million words in the LSWE Corpus (see Biber
et al., 1999, pp. 370–371). These include many everyday verbs such
as pull, throw, choose, fall, etc.
Given this large inventory of relatively common verbs,
it might be easy to assume that no individual verbs
stand out as being particularly frequent. However, this is
not at all the case: there are only 63 lexical verbs that occur
more than 500 times per million words in a register, and
only 12 verbs occur more than 1,000 times per million
words in the LSWE Corpus (Biber et al., 1999, pp. 367–378).
These 12 most common verbs are: say, get, go, know, think,
see, make, come, take, want, give, mean.
The following excerpt illustrates the normal reliance on simple
aspect in natural conversation:
[conversation among friends at a party, talking about movies]
Michelle: Do you guys want more sour cream chips or
do you want Doritos?
John: I want pretzels. [pause] Thanks.
Sheila: This is really good. [pause] What the hell
was that movie?
John: You can’t remember the name of it?
Michelle: Was it an older movie or a new one?
Sheila: No, not very recent, within the last three
years I’d say. [pause] This guy is an American.
He goes — his dad had gone to the United
States — he actually gets back into Germany
and he works on a train during World War
Two. He gets embroiled in this family’s mess.
[pause] Oh, it was really good — I wish I
could remember — I’ll think of it.
John: You don’t remember who was in it?
Sheila: No, nobody recognizable.
To give an indication of the importance of these 12 verbs,
Figure 3 plots their combined frequency compared to the
overall frequency of all other verbs. Taken as a group, these
12 verbs are especially important in conversation, where
they account for almost 45% of the occurrences of all
lexical verbs!
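The share that a small group of verbs contributes to all lexical verb occurrences can be computed directly from a frequency list. The sketch below assumes a hypothetical helper `coverage` and uses made-up toy counts, not LSWE data; it only illustrates the arithmetic behind a statement such as "these verbs account for almost 45% of occurrences".

```python
from collections import Counter


def coverage(verb_counts: Counter, top_n: int) -> float:
    """Return the proportion of all verb tokens covered by the top_n most frequent verbs."""
    total = sum(verb_counts.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _, count in verb_counts.most_common(top_n))
    return top_total / total


# Toy counts, invented for illustration only.
toy_counts = Counter(
    {"get": 40, "say": 25, "go": 20, "pull": 5, "throw": 5, "choose": 5}
)

share = coverage(toy_counts, top_n=3)
print(f"Top 3 verbs cover {share:.0%} of verb tokens")
```

The same calculation, run separately on each register, would show whether a handful of verbs dominates one register more than another.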
[Figure 3. Combined frequency of the 12 most common lexical verbs compared with all other verbs]
In contrast, progressive aspect is much less frequent and
used for special effects, usually focusing on the fact that an
event is in progress or about to take place, for example:
What’s she doing?
But she’s coming back tomorrow.
There are also some large differences among the
frequencies of these 12 verbs. Figure 4 plots the frequencies
for these verbs in the LSWE conversation corpus. Some of
these very common verbs are not surprising. For instance,
most people are not surprised that say is very frequent
in conversation, given how often the speech of others
is reported. The verb say is actually common in both
conversations and in written texts such as newspapers,
for example:
Another special effect of progressives occurs with
non-dynamic verbs, when the progressive can refer to a
temporary state that exists over a period of time, as in:
I was looking at that one just now.
You should be wondering why.
We were waiting for the train.
that they can express. Thus, students are likely to need
extensive exposure to those verbs. In the past, authors
have been forced to rely on intuitions regarding the typical
patterns of language use, resulting in a skewed coverage of
language patterns. However, with the availability of the
frequency information that can be obtained from corpus
analyses, it is now possible to ensure coverage of the most
common words and grammatical patterns, as well as
coverage of vocabulary breadth.
You said you didn’t have it. (Conversation)
He said this campaign raised ‘doubts about the
authenticity of free choice’. (Newspaper)
[Figure 4. Frequencies of the 12 most common lexical verbs in conversation]
3. Associations between Grammar
and Words
The usefulness of frequency information for materials
writers extends beyond simple counts of frequencies of
grammatical structures or vocabulary, as illustrated in
the last section. Corpus-based research has consistently
found that there are actually strong associations between
grammatical structures and the words used with them. In
other words, not every word is equally likely to occur in
a given grammatical structure. It makes sense, then, for
materials writers to give explanations and examples that
reflect the typical associations and give students practice
with them.
But the extremely high frequency of the verb get in
conversation is more surprising for most people. This verb
goes largely unnoticed, yet in conversation it is by far the
single most common lexical verb. The main reason that
get is so common is that it is extremely versatile, being
used with a wide range of meanings.
For example, the last section discussed the rarity of
progressive aspect in conversation. However, it is the case
that a few verbs usually are in the progressive
aspect when they are used. These verbs include bleeding,
chasing, shopping, starving, joking, kidding, and moaning.
Explanations, examples, and activities with progressives
thus need to include these verbs. As noted in the last
section, this is not to say that expanding students’
vocabularies is not useful; rather, it doesn’t make sense
to neglect the most common words used in a grammatical
structure even when also expanding vocabulary.
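One way to find verbs that usually occur in the progressive is to compute, for each verb, the proportion of its occurrences that are progressive. The sketch below assumes the corpus has already been reduced to hypothetical `(lemma, aspect)` pairs; the function name, the tag values, and the sample data are all invented for illustration and do not reproduce the LGSWE procedure.

```python
from collections import defaultdict


def progressive_share(tokens):
    """Map each verb lemma to the proportion of its tokens tagged as progressive."""
    totals = defaultdict(int)
    progressive = defaultdict(int)
    for lemma, aspect in tokens:
        totals[lemma] += 1
        if aspect == "progressive":
            progressive[lemma] += 1
    return {lemma: progressive[lemma] / totals[lemma] for lemma in totals}


# Invented sample of tagged verb tokens (assumed input format).
sample = [
    ("joke", "progressive"),
    ("joke", "progressive"),
    ("joke", "simple"),
    ("know", "simple"),
    ("know", "simple"),
]

shares = progressive_share(sample)
# Keep only verbs whose progressive uses outnumber their other uses.
mostly_progressive = [lemma for lemma, share in shares.items() if share > 0.5]
print(mostly_progressive)
```

A real analysis would also need a minimum frequency cutoff, since a verb seen only once or twice gives an unreliable proportion.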
These include:
Obtaining something: See if they can get some of
that beer.
Possession: They’ve got a big house.
Moving to or away from something: Get in the car.
Causing something to move or happen: It gets people
talking again, right?
Understanding something: Do you get it?
Changing to a new state: So I’m getting that way now.
Another illustration of the usefulness of corpus studies’
lexico-grammatical findings can be seen with verb + gerund
and verb + infinitive constructions. Teachers and students
have long been plagued by long lists of verbs that take
gerunds and other lists of verbs that take infinitives. The
lists are, in fact, so long that — while they are useful for
reference — they can be overwhelming for students and
teachers. You probably will not be surprised at this point
to learn, however, that not all the possible combinations
are frequent. For example, in the case of verb + to infinitive,
only four verbs are especially common in both speech
and writing (occurring more than 200 times per million
words): want, try, seem, and like. In addition, begin to is
very frequent in fiction writing, while tend to and appear
to are common in academic writing.
Several other verbs are also extremely common in
conversation: go, know, and to a lesser extent, think, see,
come, want, and mean. News, on the other hand, shows
a quite different pattern, with only the verb say being
extremely frequent. However, it should be noted that all
12 of these verbs are notably common in both conversation
and written newspaper language when compared to most
verbs in English.
An important function of grammar textbooks is the
introduction of new vocabulary, and frequency has an
important guiding role. Frequent words will be more useful
to students receptively and in production, in a wider range
of circumstances, whereas relatively rare words will prove
less useful.
Other relatively common verbs fit the same kinds of
meaning, so that the most common verb + infinitive pairs
can be grouped into general meaning categories:
The point here is not to argue against inclusion of a
wide range of vocabulary. Relatively rare words can be very
useful for learners to know. However, there is no reason
why relatively rare words should be illustrated to the
exclusion of common words. High-frequency verbs are
difficult to learn because of the wide range of meanings
want/need verbs: hope, like, need, want, want NP, wish
effort verbs: attempt, fail, manage, try
begin/continue verbs: begin, continue, start
“seem” verbs: appear, seem, tend
A useful way to make the verb + infinitive materials
more manageable and the practice more focused is to
concentrate on these four categories of meaning, beginning
with the seven most common verbs and then expanding to
these other relatively common verbs with related meanings.
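[Figure 5. Adjectives, participial adjectives, and nouns as premodifiers in conversation versus newspaper writing]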
4. Register Comparisons
Another of the most important general findings in
corpus-based studies of grammar is the importance of
register variation. It turns out that most descriptions
of common grammatical features and their use are not
valid for English overall. Rather, strong patterns of use in
one register often represent only weak patterns in other
registers. And even when there are similarities, there are
still often differences across registers. For example, in
fiction writing and newspaper writing, the verb say is much
more frequent than any other lexical verb; in conversation,
the verbs go and know are as frequent as say, and get is
more frequent than any of those three verbs; while in
academic writing, the only especially frequent verb is BE.
However, a very different pattern of use is found in
the written registers. The pattern in newspaper writing is
especially noteworthy: Nouns as premodifiers are extremely
frequent and nearly as common as adjectives, whereas
participial forms are surprisingly rare.
It might be argued that the grammar of nouns as
premodifiers is somehow simpler than that of adjectives
or participial forms, and therefore they require little overt
attention. However, in actual fact, premodifying nouns can
express a bewildering array of meanings, with no surface-level
clues to guide the reader. For example, consider the
relationships between the modifying noun (N1) and the
head noun (N2) in the following pairs:
This example illustrates the most important kind
of information for teachers and materials writers: the
grammatical distinctions between conversation and
informational writing. The typical grammar of conversation
is radically different from the typical grammar of
informational writing. The two registers differ dramatically
in the grammatical features they most commonly use. But
even when the two registers use the same grammatical
constructions, they often do so in very different ways and
associated with different sets of words. The following two
case studies illustrate these patterns: The case of noun
premodifiers provides an illustrative example of the
way conversation and writing use different grammatical
structures. The case of the definite article illustrates how
they use the same structure in different ways.
glass windows, metal seat, tomato sauce (N2 is made
from N1)
pencil case, brandy bottle, patrol car (N2 is used for the
purpose of N1)
sex magazine, sports diary (N2 is about N1)
farmyard manure, computer printout (N2 comes from N1)
Noun premodifiers in conversation and writing
In most textbooks, adjectives are typically characterized
as words that describe something, and books usually
include extensive coverage of attributive adjectives as the
major grammatical device used for noun modification (e.g.,
the big house). Most textbooks also describe the adjectival
role of -ing and -ed participles (e.g., an exciting game, an
interested couple). In contrast, the adjectival role of nouns
(e.g., a grammar lesson) is less commonly acknowledged
in textbooks. This difference seems to reflect a widely
held belief that adjectives and participial adjectives are
the primary devices used for noun modification, whereas
nouns are considered to be much less important as
premodifiers of another noun. Here again, corpus-based
analysis provides a very different picture.
summer rains, Paris conference (N1 gives the time or
location of N2)
These are only a few of the many different meaning
relations found with nouns as premodifiers (see Biber
et al., 1999, pp. 589–591). These forms are potentially
difficult to understand in addition to being extremely
frequent in written registers. Students at intermediate and
advanced levels are likely to need greater exposure to these
commonly encountered forms than comparatively rare
forms like participial adjectives, and especially at advanced
levels, they are likely to need practice producing them
appropriately for the concise, condensed writing expected
in the informational writing of many disciplines.
Figure 5 presents the frequencies of adjectives, participial
adjectives, and nouns as nominal premodifiers, comparing
their use in conversation versus newspaper writing. In
conversation, adjectives are the primary device used for
noun modification (although most noun phrases in
conversation do not include modifiers). If conversational
English is the primary target for a textbook, an exclusive
focus on adjectives seems justified.
The Definite Article
The definite article (the) occurs fairly commonly in both
speaking and writing, but it is used for different functions
in the two registers. In general, definite articles are required
in English for one of four reasons:
The noun was introduced previously in the text:
The differences between registers can be crucially
important for both comprehension (listening versus
academic reading) and production (conversational skills
versus academic writing). In this case, corpus research
provides the essential information required to support
ESP/EAP approaches to teaching. However, even in more
specialized approaches, it is rare that students need to
be taught just one register; rather, they need to be made
aware of register variation. Even within EAP programs,
for example, students must learn to comprehend and use
grammar appropriately in conversation to interact with
English-speaking friends and participate in group work,
while the grammar of academic writing is required for
reading course texts and writing academic papers.
A 12-year-old boy got mad at his parents Friday
night <…> and drove off in one of his parent’s four cars
<…> The boy was found unharmed <…>
Shared situational context specifies the noun:
Bob, put the dog out, would you please?
Modifiers of the noun specify the noun:
The introduction [of technology into teaching] should
include support and training.
The specific noun can be inferred from earlier discourse:
5. Putting it all together: Integrating
frequency, lexico-grammar, and
register information
…an old pale blue Ford rattled into view. The driver
swung wide around my car.
By using information from corpus research — frequencies,
word associations, and analysis of register differences —
materials developers and teachers can increase the meaningful
input that is provided to learners. This is not to say that corpus
research provides all the answers. Other factors are equally
important — for example, some grammatical topics are
required as building blocks for later topics; some grammatical
topics are more difficult and therefore require more practice
than others. In many cases, though, pedagogical decisions are
made based on little empirical evidence. Teachers and authors
all share the same goals of presenting the typical and most
important patterns first, moving on to more specialized topics
in later chapters and more advanced books. However, lacking
empirical studies, authors have been forced to rely on their
intuitions for these judgments, and widely accepted traditions
have arisen to support those intuitions.
Definite articles are used for other reasons, including
idioms and generic reference, but these four functions are
the most common.
Corpus-based research has found that the single most
common reason for use of the definite article differs
between informational writing and conversation (Table 1).
In conversation, a shared situational context accounts for
most uses of the definite article (55% of all
occurrences), while modifiers of the noun account for
only about 5%. In informational writing, modifiers of the
noun are far more likely to be the reason for the use of the
definite article; they account for 30-40% of all occurrences.
Reason for Definite Article Use    Conversation    Informational Writing
introduced previously in text      25%             25-30%
shared situational context         55%             10%
modifiers of the noun              5%              30-40%
inference                          5%              15%
other                              10%             10%
With the rise of corpus-based analyses, we are beginning
to see empirical descriptions of language use, identifying the
patterns that are actually frequent (or not) and documenting
the differential reliance on specific forms and words in
different registers. In some cases, our intuitions as authors
have turned out to be correct; in many other cases, we have
been wrong. For those latter cases, revising pedagogy to reflect
actual use, as shown by frequency studies, can result in radical
changes that facilitate the learning process for students.
Table 1. Most Common Reasons for Use of the Definite Article
In both registers, about 25% of all definite articles
occur when a noun is mentioned previously in the text.
Many learners of English know this rule. Similarly, many
learners know the rule that the definite article is used when
speakers are both familiar with the specific noun being
referred to. What is important from the corpus-based
study is that the presence of a noun modifier is such an
important reason for definite article use in informational
writing. The modifiers can be before the noun (such as in
the most affordable homes) but they more often come
after the noun, in prepositional phrases or relative clauses.
These modifiers rarely contain superlatives that make
it easy to identify a referent as unique, so the choice of
definite article is difficult for many learners to understand.
Few textbooks, however, call attention to these structures
for students or provide practice with them.
© Copyright by Pearson Education
REFERENCES
Biber, D., S. Conrad, and R. Reppen. (1998). Corpus linguistics:
Investigating language structure and use. Cambridge: Cambridge
University Press.
Biber, D., S. Johansson, G. Leech, S. Conrad, & E. Finegan. (1999).
Longman grammar of spoken and written English. London: Longman.
Biber, D. and R. Reppen. (2002). What does frequency have to do with
grammar teaching? Studies in Second Language Acquisition, 24, 199–208.
Byrd, P. (1995). Issues in the writing of grammar textbooks. In P. Byrd
(Ed.), Materials writer’s guide (pp. 45–63). Boston: Heinle & Heinle.
McEnery, T., R. Xiao, and Y. Tono. (2006). Corpus-based language
studies: An advanced resource book. London: Routledge.
0-13-210552-7 978-0-13-210552-1