Corpus Linguistics and Grammar Teaching

Douglas Biber & Susan Conrad

Susan Conrad is Professor of Applied Linguistics at Portland State University. In addition to doing teacher training, she has taught ESL/EFL in South Korea, southern Africa, and numerous places in the U.S. Douglas Biber is Regents’ Professor of English (Applied Linguistics) at Northern Arizona University. His research efforts have focused on corpus linguistics, English grammar, and register variation (in English and cross-linguistically; synchronic and diachronic). Dr. Biber and Dr. Conrad have written several books, including the Longman Grammar of Spoken and Written English and Real Grammar.

1. Why do grammar teachers need corpus-based studies?

Speaking with English teachers and students throughout the world, we have discovered that most of them believe that the authors of their textbooks had some special source of information to help them write the book. This information, they believe, made clear to the textbook author what content to cover and how to cover it. With this source of information, the coverage in the book must be correct, useful, and realistic, right?

Unfortunately, no special source of information for textbook writers exists. Authors’ intuition, anecdotal evidence, and traditions about what should be in a grammar book play major roles in determining the content of textbooks. This usually works just fine for basic descriptions of grammatical structure. The intuition of a native speaker or a couple of examples is sufficient evidence for how to form accurate grammatical structures in English. For example, intuition or anecdotal evidence works well to tell which verbs are irregular, to describe how to form perfect aspect verb phrases, or to list the relative pronouns that are possible in English.

However, as any experienced English teacher can attest, accurately describing how to form grammatical structures is only a small part of grammar teaching. And like teachers, textbook writers must make myriad decisions. An author must decide how to sequence the grammatical information: Which features should be presented in the first chapters, which should be left for later chapters, and which should be excluded because there is no room to discuss them at all? Similarly, the author must decide how much space to devote to each feature. At a more detailed level, the author must decide how to illustrate the targeted grammatical features in example sentences, e.g. what contexts to use and what specific words to use in the grammatical structure (e.g. what verbs to use to illustrate past tense).

All of these decisions have important implications: teachers and students assume that the grammatical features at the beginning of the book are easier, more basic, or more important, while omitted features are not important. Features that get many pages of coverage are assumed to be important and difficult to grasp. Perhaps most importantly, the contexts and example language are taken to be typical of natural discourse.

Unfortunately, decisions about the sequencing of material, typical contexts, and natural discourse are not served as well by intuition and anecdotal evidence as judgments of accuracy are. First, the intuition of even experienced teachers is not consistent. For example, when they rely purely on their intuition, teachers usually disagree over whether simple present tense or present progressive is more common in typical conversations and therefore deserving of more practice. In addition, it turns out that intuitions about typical language choices are often wrong. In many cases, we simply don’t notice the most typical grammatical features because they are so common. For example, most native speakers of English cannot identify the most common lexical verb used in conversation; we use this verb so frequently that we don’t even notice it. (We describe the use of this verb in Section 2 below; in the meantime, you can try to identify it on your own.)

Finally, to make matters worse, specialist books on teaching methods and materials development provide little guidance about making specific decisions for sequencing or for gathering information about language choices in natural discourse. As Byrd (1995) wrote, “often design decisions are based on traditions about grammar materials and their organization rather than on careful rethinking of either the content or its organization” (p. 46).

Studies from the field of second language acquisition can provide some guidance for textbook writers, for example about what grammar structures are acquired first by English language learners, and about kinds of activities that are likely to help learners begin the process of acquisition. But what about the content for grammar teaching: the questions about what forms are more common, what examples will best exemplify naturally occurring language, and what words are most frequent with grammatical structures? Answers to these kinds of questions have, in recent years, been coming from research that uses the tools and techniques of corpus linguistics to describe English grammar.

Corpus linguistics uses computer-assisted analyses along with human interpretations of language functions to study the language patterns in a large “corpus” of texts (see Biber, Conrad, and Reppen, 1998, and McEnery, Xiao, and Tono, 2006, for useful introductions). The texts can be written or transcribed spoken texts. In most cases, a corpus is designed to represent particular “registers” (such as conversation, newspaper writing, or academic prose). As a result, grammatical studies based on corpora can describe differences across registers. Because the corpora are large, covering many different speakers and writers, it is possible to see what is typical for a large group of language users in various contexts. Thus, corpus-based research provides textbook writers and teachers with a new source of information, a data-based source rather than intuition, to consider when making decisions.

Corpus-based studies of English grammar have proven to be especially useful for descriptions of language use. That is, they help us understand what speakers and writers actually do with the linguistic resources available in English. Three types of results are most important for grammar teaching:

• frequency information
• register comparisons
• associations between grammatical structures and words (lexico-grammar)

We illustrate each of these areas with case studies taken from research conducted for the Longman Grammar of Spoken and Written English (LGSWE; Biber et al., 1999), discussing how such information is useful to textbook writers and teachers. The case studies are based on analysis of corpora from four registers: conversation, fiction, newspaper language, and academic prose. Although these are general registers, they differ in important ways from one another (e.g., with respect to mode, interactiveness, production circumstances, purpose, and target audience). The analyses were carried out on the Longman Spoken and Written English (LSWE) Corpus, which contains c. 20 million words of text overall, with c. 4-5 million words from each of these four registers. All frequency counts reported below have been normalized to a common basis (a count per 1 million words of text), so that they are directly comparable across registers.

In the LGSWE, corpus-based investigations of grammatical features were carried out using a variety of computational, interactive, and detailed ‘manual’ textual analyses. In all cases, the overarching concern was to achieve an accurate description of the distributional patterns of the target grammatical feature. Computational techniques made it feasible to analyze the patterns of use in a 20-million-word corpus. However, whenever automatic techniques produced skewed or inaccurate results, we shifted to interactive analyses or even manual analyses carried out on random sub-samples of texts from each register. The guiding principle was to achieve an accurate grammatical description efficiently, using whatever techniques were required for that purpose, based on the most representative sample of corpus texts that could reasonably be analyzed using those techniques.

2. Frequency Information

The simplest kind of information available from corpus analysis is frequency: identifying the grammatical features that are especially common or rare. However, as the following two case studies show, even this kind of analysis often reveals surprising patterns of use that have important implications for grammar teaching.

Progressive vs. Simple Verb Tenses

One strongly held intuition about language use among many English-teaching professionals is that progressive aspect (the ‘present continuous’) is the most common choice in conversation. This belief is reflected in grammar textbooks, which typically cover the progressive in one of the very first chapters of lower-level books.

Corpus-based research has found that progressive aspect is in fact more common in conversation than in most written registers (Figure 1). The contrast with academic prose is especially noteworthy: progressive aspect is extremely rare in academic prose but much more common in conversation. Progressive verb phrases are nearly as common in fiction as in conversation, and they are relatively common in news as well.

FIG. 1

However, when progressive aspect is compared to simple aspect in conversation, the corpus research shows that the intuition about progressives being typical is dramatically wrong. Simple aspect verb phrases are more than 20 times as common as progressives in conversation (Figure 2).
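Because every count in these studies is normalized to a rate per 1 million words, a comparison such as “more than 20 times as common” comes down to two divisions. The Python sketch below only illustrates that arithmetic: the function names are our own, and every count and corpus size in it is hypothetical, not an LSWE figure.

```python
def per_million(raw_count: int, corpus_words: int) -> float:
    """Normalize a raw count to a rate per 1 million words of text."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    # Multiply first so that whole-number rates come out exact.
    return raw_count * 1_000_000 / corpus_words


def rate_ratio(count_a: int, words_a: int, count_b: int, words_b: int) -> float:
    """How many times more frequent feature A is than feature B, per million words."""
    return per_million(count_a, words_a) / per_million(count_b, words_b)


# All numbers below are HYPOTHETICAL, chosen only to illustrate the arithmetic.
conversation_words = 4_000_000
fiction_words = 4_500_000
simple_in_conversation = 210_000
progressive_in_conversation = 10_000
progressive_in_fiction = 10_800

print(per_million(progressive_in_conversation, conversation_words))  # 2500.0
print(per_million(progressive_in_fiction, fiction_words))            # 2400.0
print(rate_ratio(simple_in_conversation, conversation_words,
                 progressive_in_conversation, conversation_words))   # 21.0
```

The fiction line shows why normalization matters: its raw count is larger than the conversation count, yet its rate per million words is slightly lower, because the fiction sample is larger. Since each LSWE register contains a different amount of text (c. 4-5 million words), comparing raw counts directly would be misleading.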
The following excerpt illustrates the normal reliance on simple aspect in natural conversation:

[conversation among friends at a party, talking about movies]
Michelle: Do you guys want more sour cream chips or do you want Doritos?
John: I want pretzels. [pause] Thanks.
Sheila: This is really good. [pause] What the hell was that movie?
John: You can’t remember the name of it?
Michelle: Was it an older movie or a new one?
Sheila: No, not very recent, within the last three years I’d say. [pause] This guy is an American. He goes — his dad had gone to the United States — he actually gets back into Germany and he works on a train during World War Two. He gets embroiled in this family’s mess. [pause] Oh, it was really good — I wish I could remember — I’ll think of it.
John: You don’t remember who was in it?
Sheila: No, nobody recognizable.

In contrast, progressive aspect is much less frequent and used for special effects, usually focusing on the fact that an event is in progress or about to take place, for example:

What’s she doing?
But she’s coming back tomorrow.

Another special effect of progressives occurs with non-dynamic verbs, when the progressive can refer to a temporary state that exists over a period of time, as in:

I was looking at that one just now.
You should be wondering why.
We were waiting for the train.

This case study illustrates how textbook authors and teachers can be misled by intuitions, in this case believing that progressive verbs are much more prevalent than they in fact are. Unfortunately, misperceptions like this are passed along to our students through textbooks and other teaching materials that overemphasize the progressive and promote its overuse by students. Corpus-based findings are a strong antidote to misperceptions of this type.

FIG. 2

The most common lexical verbs in conversation

Frequency information from corpus studies can also help materials writers decide what words to use as they give examples and write exercises for grammatical structures. Even when they are focused on common, easy vocabulary, materials writers have to choose from literally dozens of common lexical verbs in English. For example, nearly 400 different verb forms occur over 20 times per million words in the LSWE Corpus (see Biber et al., 1999, pp. 370–371). These include many everyday verbs such as pull, throw, choose, fall, etc. Given this large inventory of relatively common verbs, it might be easy to assume that no individual verbs stand out as being particularly frequent. However, this is not at all the case: there are only 63 lexical verbs that occur more than 500 times per million words in a register, and only 12 verbs occur more than 1,000 times per million words in the LSWE Corpus (Biber et al., 1999, pp. 367–378). These 12 most common verbs are: say, get, go, know, think, see, make, come, take, want, give, mean.

To give an indication of the importance of these 12 verbs, Figure 3 plots their combined frequency compared to the overall frequency of all other verbs. Taken as a group, these 12 verbs are especially important in conversation, where they account for almost 45% of the occurrences of all lexical verbs!

FIG. 3

There are also some large differences among the frequencies of these 12 verbs. Figure 4 plots the frequencies for these verbs in the LSWE conversation corpus. Some of these very common verbs are not surprising. For instance, most people are not surprised that say is very frequent in conversation, given how often the speech of others is reported. The verb say is actually common both in conversation and in written texts such as newspapers, for example:

You said you didn’t have it. (Conversation)
He said this campaign raised ‘doubts about the authenticity of free choice’. (Newspaper)

But the extremely high frequency of the verb get in conversation is more surprising for most people. This verb goes largely unnoticed, yet in conversation it is by far the single most common lexical verb. The main reason that get is so common is that it is extremely versatile, being used with a wide range of meanings. These include:

Obtaining something: See if they can get some of that beer.
Possession: They’ve got a big house.
Moving to or away from something: Get in the car.
Causing something to move or happen: It gets people talking again, right?
Understanding something: Do you get it?
Changing to a new state: So I’m getting that way now.

Several other verbs are also extremely common in conversation: go, know, and to a lesser extent, think, see, come, want, and mean. News, on the other hand, shows a quite different pattern, with only the verb say being extremely frequent. However, it should be noted that all 12 of these verbs are notably common in both conversation and newspaper writing when compared to most verbs in English.

FIG. 4

An important function of grammar textbooks is the introduction of new vocabulary, and frequency has an important guiding role. Frequent words will be more useful to students receptively and in production, in a wider range of circumstances, whereas relatively rare words will prove less useful. The point here is not to argue against inclusion of a wide range of vocabulary. Relatively rare words can be very useful for learners to know. However, there is no reason why relatively rare words should be illustrated to the exclusion of common words. High-frequency verbs are difficult to learn because of the wide range of meanings that they can express. Thus, students are likely to need extensive exposure to those verbs. In the past, authors have been forced to rely on intuitions regarding the typical patterns of language use, resulting in a skewed coverage of language patterns. However, with the frequency information that can now be obtained from corpus analyses, it is possible to ensure coverage of the most common words and grammatical patterns, as well as coverage of vocabulary breadth.

3. Associations between Grammar and Words

The usefulness of frequency information for materials writers extends beyond simple counts of frequencies of grammatical structures or vocabulary, as illustrated in the last section. Corpus-based research has consistently found that there are actually strong associations between grammatical structures and the words used with them. In other words, not every word is equally likely to occur in a given grammatical structure. It makes sense, then, for materials writers to give explanations and examples that reflect the typical associations and give students practice with them.

For example, the last section discussed the rarity of progressive aspect in conversation. However, there are a few verbs that usually occur in the progressive aspect when they are used. These include bleeding, chasing, shopping, starving, joking, kidding, and moaning. Explanations, examples, and activities with progressives thus need to include these verbs. As noted in the last section, this is not to say that expanding students’ vocabularies is not useful; rather, it doesn’t make sense to neglect the most common words used in a grammatical structure even when also expanding vocabulary.

Another illustration of the usefulness of corpus studies’ lexico-grammatical findings can be seen with verb + gerund and verb + infinitive constructions. Teachers and students have long been plagued by long lists of verbs that take gerunds and other lists of verbs that take infinitives. The lists are, in fact, so long that, while they are useful for reference, they can be overwhelming for students and teachers. You probably will not be surprised at this point to learn, however, that not all the possible combinations are frequent. For example, in the case of verb + to-infinitive, only four verbs are especially common in both speech and writing (occurring more than 200 times per million words): want, try, seem, and like. In addition, begin to is very frequent in fiction writing, while tend to and appear to are common in academic writing.

Other relatively common verbs fit the same kinds of meaning, so that the most common verb + infinitive pairs can be grouped into general meaning categories:

want/need verbs: hope, like, need, want, want NP, wish
effort verbs: attempt, fail, manage, try
begin/continue verbs: begin, continue, start
“seem” verbs: appear, seem, tend

A useful way to make the verb + infinitive materials more manageable and the practice more focused is to concentrate on these four categories of meaning, beginning with the seven most common verbs and then expanding to these other relatively common verbs with related meanings.

4. Register Comparisons

Another of the most important general findings in corpus-based studies of grammar is the importance of register variation. It turns out that most descriptions of common grammatical features and their use are not valid for English overall. Rather, strong patterns of use in one register often represent only weak patterns in other registers. And even when there are similarities, there are still often differences across registers. For example, in fiction writing and newspaper writing, the verb say is much more frequent than any other lexical verb; in conversation, the verbs go and know are as frequent as say, and get is more frequent than any of those three verbs; while in academic writing, the only especially frequent verb is BE.

This example illustrates the most important kind of information for teachers and materials writers: the grammatical distinctions between conversation and informational writing. The typical grammar of conversation is radically different from the typical grammar of informational writing. The two registers differ dramatically in the grammatical features they most commonly use. But even when the two registers use the same grammatical constructions, they often do so in very different ways and with different sets of words. The following two case studies illustrate these patterns: the case of noun premodifiers shows how conversation and writing use different grammatical structures, and the case of the definite article shows how they use the same structure in different ways.

Noun premodifiers in conversation and writing

In most textbooks, adjectives are typically characterized as words that describe something, and books usually include extensive coverage of attributive adjectives as the major grammatical device used for noun modification (e.g., the big house). Most textbooks also describe the adjectival role of -ing and -ed participles (e.g., an exciting game, an interested couple). In contrast, the adjectival role of nouns (e.g., a grammar lesson) is less commonly acknowledged in textbooks. This difference seems to reflect a widely held belief that adjectives and participial adjectives are the primary devices used for noun modification, whereas nouns are considered to be much less important as premodifiers of another noun. Here again, corpus-based analysis provides a very different picture.

Figure 5 presents the frequencies of adjectives, participial adjectives, and nouns as nominal premodifiers, comparing their use in conversation versus newspaper writing. In conversation, adjectives are the primary device used for noun modification (although most noun phrases in conversation do not include modifiers). If conversational English is the primary target for a textbook, an exclusive focus on adjectives seems justified.

FIG. 5

However, a very different pattern of use is found in the written registers. The pattern in newspaper writing is especially noteworthy: nouns as premodifiers are extremely frequent and nearly as common as adjectives, whereas participial forms are surprisingly rare.

It might be argued that the grammar of nouns as premodifiers is somehow simpler than that of adjectives or participial forms, and that they therefore require little overt attention. In actual fact, however, premodifying nouns can express a bewildering array of meanings, with no surface-level clues to guide the reader. For example, consider the relationships between the modifying noun (N1) and the head noun (N2) in the following pairs:

glass windows, metal seat, tomato sauce (N2 is made from N1)
pencil case, brandy bottle, patrol car (N2 is used for the purpose of N1)
sex magazine, sports diary (N2 is about N1)
farmyard manure, computer printout (N2 comes from N1)
summer rains, Paris conference (N1 gives the time or location of N2)

These are only a few of the many different meaning relations found with nouns as premodifiers (see Biber et al., 1999, pp. 589–591). These forms are potentially difficult to understand in addition to being extremely frequent in written registers. Students at intermediate and advanced levels are likely to need greater exposure to these commonly encountered forms than to comparatively rare forms like participial adjectives, and, especially at advanced levels, they are likely to need practice producing them appropriately for the concise, condensed writing expected in the informational writing of many disciplines.

The Definite Article

The definite article (the) occurs fairly commonly in both speaking and writing, but it is used for different functions in the two registers. In general, definite articles are required in English for one of four reasons:

The noun was introduced previously in the text:
A 12-year-old boy got mad at his parents Friday night <…> and drove off in one of his parents’ four cars <…> The boy was found unharmed <…>

Shared situational context specifies the noun:
Bob, put the dog out, would you please?

Modifiers of the noun specify the noun:
The introduction [of technology into teaching] should include support and training.

The specific noun can be inferred from earlier discourse:
…an old pale blue Ford rattled into view. The driver swung wide around my car.

Definite articles are used for other reasons, including idioms and generic reference, but these four functions are the most common. Corpus-based research has found that the single most common reason for use of the definite article differs between informational writing and conversation (Table 1). In conversation, definite articles are usually used because of a shared situational context, which accounts for 55% of all occurrences, while modifiers of the noun account for only about 5%. In informational writing, modifiers of the noun are far more likely to be the reason for the use of the definite article; they account for 30-40% of all occurrences.

Table 1. Most Common Reasons for Use of the Definite Article

Reason for definite article use     Conversation    Informational writing
Introduced previously in text       25%             25-30%
Shared situational context          55%             10%
Modifiers of the noun               5%              30-40%
Inference                           5%              15%
Other                               10%             10%

In both registers, about 25% of all definite articles occur when a noun is mentioned previously in the text. Many learners of English know this rule. Similarly, many learners know the rule that the definite article is used when speaker and listener are both familiar with the specific noun being referred to. What is important from the corpus-based study is that the presence of a noun modifier is such an important reason for definite article use in informational writing. The modifiers can come before the noun (as in the most affordable homes), but they more often come after the noun, in prepositional phrases or relative clauses. These modifiers rarely contain superlatives that make it easy to identify a referent as unique, so the choice of definite article is difficult for many learners to understand. Few textbooks, however, call attention to these structures for students or provide practice with them.

The differences between registers can be crucially important for both comprehension (listening versus academic reading) and production (conversational skills versus academic writing). In this case, corpus research provides the essential information required to support ESP/EAP approaches to teaching. However, even in more specialized approaches, it is rare that students need to be taught just one register; rather, they need to be made aware of register variation. Even within EAP programs, for example, students must learn to comprehend and use grammar appropriately in conversation to interact with English-speaking friends and participate in group work, while the grammar of academic writing is required for reading course texts and writing academic papers.

5. Putting it all together: Integrating frequency, lexico-grammar, and register information

By using information from corpus research (frequencies, word associations, and analysis of register differences), materials developers and teachers can increase the meaningful input that is provided to learners. This is not to say that corpus research provides all the answers. Other factors are equally important: for example, some grammatical topics are required as building blocks for later topics, and some grammatical topics are more difficult and therefore require more practice than others. In many cases, though, pedagogical decisions are made based on little empirical evidence. Teachers and authors all share the same goals of presenting the typical and most important patterns first, moving on to more specialized topics in later chapters and more advanced books. However, lacking empirical studies, authors have been forced to rely on their intuitions for these judgments, and widely accepted traditions have arisen to support those intuitions.

With the rise of corpus-based analyses, we are beginning to see empirical descriptions of language use, identifying the patterns that are actually frequent (or not) and documenting the differential reliance on specific forms and words in different registers. In some cases, our intuitions as authors have turned out to be correct; in many other cases, we have been wrong. For those latter cases, revising pedagogy to reflect actual use, as shown by frequency studies, can result in radical changes that facilitate the learning process for students.

© Copyright by Pearson Education

REFERENCES

Biber, D., S. Conrad, and R. Reppen. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.
Biber, D., S. Johansson, G. Leech, S. Conrad, and E. Finegan. (1999). Longman grammar of spoken and written English. London: Longman.
Biber, D., and R. Reppen. (2002). What does frequency have to do with grammar teaching? Studies in Second Language Acquisition, 24, 199–208.
Byrd, P. (1995). Issues in the writing of grammar textbooks. In P. Byrd (Ed.), Materials writer’s guide (pp. 45–63). Boston: Heinle & Heinle.
McEnery, T., R. Xiao, and Y. Tono. (2006). Corpus-based language studies: An advanced resource book. London: Routledge.

ISBN 0-13-210552-7 / 978-0-13-210552-1
