Montclair State University
Introduction to Corpus Linguistics for Advanced Structure of American English

Introduction
Shifts in Word Meaning
Collocations
    Powerful tea and cooked coffee?
    Stand on line or in line?
Part of Speech Identification
    Testing textbook claims about POS
Exemplifying Standard and non-standard forms
    Adverbs or Adjectives in intransitive sentences?
Syntactic Constructions
    The passive with by
    Subject and object selection
    Verb complementation

Introduction

Textbooks on English grammar attempt to describe the language as a system of rules that might explain how the child comes to know the language so quickly. This knowledge of language is referred to as grammatical competence. However, the use of language in everyday situations, known as grammatical performance, often affects competence since it provides the data that the child hears. Corpus linguistics aims to look at the actual use of language, written and spoken. The tasks below are designed to familiarize you with this approach and to help you appreciate some of its possibilities for your own research and teaching.

Shifts in Word Meaning

Most aspects of language remain fairly constant over time, but words can shift meaning, or develop new meanings, rather rapidly. The usage of a word is also restricted by the domain in which it occurs. The word hot is a good example of this. Using the Virtual Language Centre (VLC) Web Concordancer, compare the usage of the word hot in the Brown Corpus (data from a wide variety of English texts) with its usage in computer texts from the 1990s.

ASAE. Corpus Tutorial 1

To do so, click on "Simple search" under "English". In the VLC Web Concordancer (English), type the word "hot" in the second box after "Search string:"; then go to "Select corpus:" and, on the pull-down menu, select "Brown Corpus".

What word does "hot" most frequently modify among these 136 entries?
What is the most common meaning of "hot" in the Brown Corpus?
There are only 7 entries for "hot" in "Business and Economy", but what is the most common meaning here?

Collocations

Powerful tea and cooked coffee?

Certain words commonly occur together: coffee is brewed, not cooked; lights are turned off, not closed. These collocational patterns are language-specific and, as such, are often mind-boggling to the language learner. They are learned by constant exposure. A corpus can provide this exposure quickly.
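
The questions above (which word does hot most often modify, which verb goes with coffee) come down to counting a keyword's immediate neighbors. The Python sketch below is only an illustration of that idea, not the method used by the VLC Web Concordancer. The function names tokenize and neighbor_counts, the crude tokenizer, and the three-sentence sample text are all assumptions invented for this example; a real study would read a corpus file instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def neighbor_counts(tokens, keyword, offset=1):
    """Count the words found `offset` positions away from each occurrence of `keyword`.

    offset=1 looks at the word immediately to the right of the keyword;
    offset=-1 looks at the word immediately to its left.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        j = i + offset
        if tok == keyword and 0 <= j < len(tokens):
            counts[tokens[j]] += 1
    return counts


# Hypothetical sample text, invented for illustration only.
sample = (
    "She brewed strong coffee and poured hot water into the pot. "
    "He wanted hot coffee, not hot water. "
    "They brewed fresh coffee every morning."
)
tokens = tokenize(sample)

# Which words follow "hot", and which words precede "coffee"?
right_of_hot = neighbor_counts(tokens, "hot", offset=1)
left_of_coffee = neighbor_counts(tokens, "coffee", offset=-1)

print(right_of_hot.most_common())
print(left_of_coffee.most_common())
```

Counting at a fixed offset mirrors what sorting a concordance to the left or right lets you see by eye: the most frequent neighbor rises to the top of the list.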

The following sentences are ESL student productions in which the underlined word is not a standard collocate of the following word(s). Using the VLC concordancer, find a better word for each of the underlined words below.

A powerful dollar overseas hurts European markets. (Search Business and Economy and Sort left.)
This was his single chance for success. (Search Brown and Sort left.)
I like being with people, knowing new people, etc. (Search Times for 'new people', and Sort left.)

Stand on line or in line?

Should ESL students be taught to 'stand on line' or 'stand in line', or does it matter? You can check the use of the prepositions at the Collins CoBuild website. In the "Type in your query" box, type stand +1line. The +1 allows a single word to intervene between stand and line.

How many times does stand on line occur? _________ stand in line? __________
Do Americans use on or in more often? _________

Part of Speech Identification

Many grammar texts give form criteria by which to judge the part of speech of a word. For example, a word is classified as a noun if it can occur with a plural or possessive ending, or if it has a noun-making morpheme like -ness or -tion, while it is classified as a verb if it can occur with the tense and participial endings, or if it has a verb-making morpheme like -ize or -ate. However, many words in English have the same form as nouns and verbs, e.g., house, button, garden, progress, permit, record. How is part of speech determined in such cases?

To answer this question, look at the usage of the word permit as it occurs in the Brown corpus. (Sort left.) The word permit can be a noun or a verb in English and, in its base form, there is no formal, morphological way of telling whether it is a noun or a verb.

Find the first three occurrences of permit as a verb. How do you know permit is being used as a verb here?
If you're not sure about assigning parts of speech, you might want to check out the online Part of Speech Tagger at the University of Colorado.

Testing textbook claims about POS

Klammer, Schulz and Della Volpe's Analyzing English Grammar says that completely, absolutely, totally, extremely, and excessively are adverbs even though they fit the qualifier frame The handsome man seems ______ handsome. These words do fit the adverb form test (they end in -ly) but they fail all the function tests (they can't modify verbs and they can't move within the sentence). What’s going on?

Looking at the usage of one of the -ly degree words in MICASE, the Michigan Corpus of Academic Spoken English, will clarify what's going on with these words.

If we classify adverbs as words that (1) modify verbs and (2) can be moved within a sentence, find an example in the data above of totally used as an adverb.

If we classify qualifiers as words that (1) modify adjectives or adverbs and (2) can fit the slot in the frame sentence The handsome man is ________________ handsome, find an example in the data above of totally used as a qualifier.

This exercise shows that the part of speech class of a word depends in part on
a. the morphological form of the word
b. the context in which the word occurs
c. the grammatical function of the word

Exemplifying Standard and non-standard forms

Adverbs or Adjectives in intransitive sentences?

Many grammar texts claim that there is a usage issue relating to the use of adverbs vs. adjectives following intransitive verbs, as in doing well vs. doing good, with the latter considered informal. Find all instances in MICASE of doing good and doing well. Which form is more common?

MICASE classifies its data by speech event. (The speech events are listed to the left of the data.) Is there any correlation between the informal nature of the speech event and doing good, as opposed to the formal nature of the speech event and doing well?

Speech Event: ADV – advising session; COL – colloquia; DIS – discussion section; LAB – lab sections; LEL – large lecture; LES – small lecture; MTG – meetings; OFC – office hours; SEM – seminars; SGR – study groups; TUT – tutorials

How about the age and/or status of the speakers? (Speaker characteristics are listed to the right of the data.)

Syntactic Constructions

The passive with by

The often politically motivated claim "Mistakes were made" has recently been dubbed the 'past exonerative'. How often does the passive occur without by?

To see how often mistakes occurs with made, with and without the by, in Collins' 56-million-word database, type the following in the Collins Cobuild site's query box and click "Show Concs":

mistakes+2made

The +2 allows for a maximum of two words to intervene between mistakes and made. (If you want results for mistake and mistakes, type mistake*.)

Searching for passives is tricky because the past participle used in the passive voice often has the same form as the past tense. For example: I made a mistake (past tense); A mistake was made (past participle). The Collins Cobuild site distinguishes past tense forms, which it labels VBD, from past participle forms, which it labels VBN, and verbs can be searched for as, for example, made/VBN. (Be aware that the VBN tag will give you all past participles, i.e., both has/have made and am, is, are, was, were, be made.) So you can try the search again as follows:

mistake*+2made/VBN

Of the 40 samples that you see, how many have a by phrase? _________

You can see how often the passive form of a specific verb occurs with a by phrase in the entire 56-million-word database by asking for the T-score under "Collocation Sampler". What is the joint frequency of the following items?

mistake*+2made and by? __________ (Type the first term; the by will appear in the table.)
mistake*/NOUN and made? __________

What percentage of the time does by appear with mistakes were made?
_____________

Subject and object selection

According to the Collins Cobuild data, what subjects and objects can the verb prove take? Animate, inanimate, abstract, concrete, mass, count?

Verb complementation

Can you pretend something, i.e. a noun or a noun phrase? Type pretend/VERB+NOUN into the Collins Cobuild box to find out. (Note the occasional errors in the POS tagging of pretend.)

What somethings can one pretend? _________________________________________

Can pretend be followed by "-ing" forms, infinitives or anything else? Type pretend* in the box to find out.
______________________________________________________________________________
______________________________________________________________________________

Know Your Tools

A concordance, in its simplest form, is an alphabetical listing of the words in a text, given together with the contexts in which they appear. The most common form of concordance today is the Keyword-in-Context (KWIC) index, in which each word is centered in a fixed-length field (e.g., 80 characters). The example given below was produced by Conc 1.70 (Macintosh), from a plain ASCII text version of the first book of Dickens' A Tale of Two Cities. Note that the line numbers are as calculated by Conc.

Figure 3.1.1: Concordance of poor in Tale of Two Cities, Book 1

1320   taste it is that such  poor  cattle always have in their mouths
 948          of sparing the  poor  child the inheritance of any part of
 778    small property of my  poor  father, whom I never saw--so long
1870    desolate, while your  poor  heart pined away, weep for it
 947            Miss, if the  poor  lady had suffered so intensely
1884          the love of my  poor  mother hid his torture from me
1615  stockings, and all his  poor  tatters of clothes, had, in a long
1577       faded away into a  poor  weak stain. So sunken and
1001      on your way to the  poor  wronged gentleman, and, with a
1036     detachment from the  poor  young lady, by laying a brawny hand

A concordancer is a software tool that produces such a list.

A collocate list is a list of words that occur in the neighborhood of the keyword. For example, a search for the keyword so in the Hong Kong Web Concordancer, with a request for words that occur at a distance of two words from so, returns the following words as its top right-hand collocates:

Right collocates for 'so'
The   132
that  127
as    110
to     77
in     57
a      49
and    47
it     46
He     40
of     35

A part-of-speech tagger automatically tags each word in a text with its part of speech. Current taggers are about 97% accurate (as are human experts). The Collins CoBuild Concordancer allows you to search for part of speech strings rather than strings of words.

Searching, in the context of corpus work, means looking in the online text for a specific keyword, phrase, part-of-speech tag, etc.

Browsing means reading through the documents in the corpus. This is a useful activity only if the documents have been classified. For example, the MICASE corpus is categorized by speech event (lab, lecture, office hour, etc.) and by speaker (professor, undergraduate, grad student, native speaker, non-native speaker, male, female, etc.). This classification allows you to get a sense of the differences between one speech event or speaker type and another.

Sorting means listing words in alphabetical order. The Hong Kong Web Concordancer allows you to sort the collocates immediately to the right or to the left of the keyword.

Websites to get you started

The Internet Grammar of English is an online course in English grammar written primarily for university undergraduates. IGE does not assume any prior knowledge of grammar. It includes interactive exercises.

English Grammar on the Web is a resource designed to support ESL/EFL teachers, but it has valuable lists of links to other web resources on English grammar.
Particularly helpful is its Lists of Grammar Lists.

The Hong Kong Web Concordancer is a concordance program that allows you to search several million words of English sampled from many sources.

Alternate URL for The Hong Kong Web Concordancer

The Collins CoBuild Concordancer allows you to search 56 million words of contemporary written and spoken documents. It also allows you to tag these documents for part of speech.

A Ten-step Introduction to Concordancing through the Collins Cobuild Corpus Concordance Sampler. Instructions for using Collins Cobuild effectively.

MICASE, the Michigan Corpus of Academic Spoken English, allows you to search 1,848,364 words of English transcribed from lectures, conversations, service encounters, etc. recorded at the University of Michigan.

POS Tagger, a statistically based part-of-speech tagger from the University of Colorado; it returns your sentence with Penn Treebank part-of-speech tags assigned.

Bookmarks for Corpus Linguists is a list of web resources on using corpora and links to software and corpus collections.

The Web Concordances and Workbooks from the University of Dundee English Department. This site is devoted to the study of literature using literary computer concordancing, a form of analysing text. It attempts to help students understand what is meant by literary concordancing.

WordNet, a lexical database for the English language.

Concordances and Corpora. A tutorial by Catherine Ball on the design of corpora, the use of concordances, and available concordancing software.

The Montclair Electronic Language Database (MELD). A collection of ESL student essays and background information on L1, native country, age, gender, and other languages spoken.
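
Returning to the Know Your Tools section: a Keyword-in-Context listing sorted on the right-hand context can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how Conc or any of the web concordancers listed here work. The function name kwic and its width parameter are invented for this example, and the sample text is three fragments taken from the Tale of Two Cities concordance shown earlier, joined into one string.

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for `keyword`, sorted by right context.

    Each line shows up to `width` characters on either side of the keyword.
    The left context is right-aligned so the keyword forms a single column.
    """
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    pattern = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
    rows = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        rows.append((left, match.group(0), right))
    # "Sort right": order rows alphabetically by what follows the keyword.
    rows.sort(key=lambda row: row[2].lower())
    return ["%s%s%s" % (left.rjust(width), kw, right) for left, kw, right in rows]


# Fragments from the concordance figure, joined here purely for illustration.
sample = (
    "the love of my poor mother hid his torture from me. "
    "Miss, if the poor lady had suffered so intensely. "
    "small property of my poor father, whom I never saw."
)

lines = kwic(sample, "poor", width=20)
for line in lines:
    print(line)
```

Sorting on the right context groups identical following words together, which is what makes recurring patterns after the keyword easy to spot; sorting on the reversed left context would do the same for the words that precede it.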