Corpus 05
Grammar
• Unlike lexicography, grammar does not
have a long tradition of empirical study.
• Prescriptive vs. descriptive: traditionally,
grammatical studies aimed to provide
a relatively complete catalogue of the forms in a
language and a description of the rules for
combining forms.
• Traditional approaches did not analyze the
patterned use of grammatical features, focus
on variation in language use, or pay attention to
the functional reasons for choosing between
alternatives.
• These neglected areas turn out to be the
strengths of corpus studies: the frequency and
distribution of various constructions,
association patterns between grammatical
structures and other linguistic and
nonlinguistic factors, and the factors that
affect the choice between structural
variants.
• Question 1: How can the use and function of
morphological characteristics be better understood
by analyzing their distribution across registers?
• Question 2: How can the use and function of
grammatical classes be better understood by
analyzing their distribution across registers?
• Question 3: How can the function of syntactic
constructions be better understood by analyzing
their distribution across registers?
• Question 4: What linguistic and nonlinguistic
features are associated with the choice between
seemingly synonymous structural variants?
Morphological study
• Goal: to learn the frequency and distribution of
morphological characteristics and the differing
functions of particular variants.
• Rather straightforward: use the search
function of a concordancer on an untagged corpus.
Nominalization
• Nouns that are related to verbs or adjectives
morphologically.
• -tion, -sion, -ness, -ment, -ity
• Watch for words with similar endings that are not
nominalizations: cushion, dandelion, mansion
• Case study: frequency of nominalizations
• Frequency of nominalizations across 3
registers
• Per million words
Academic prose: 44,000
Fiction: 11,200
Speech: 11,300
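The suffix search and the per-million normalization can be sketched in Python. This is only an illustration, not the procedure used in the case study: the regular expression, the tokenizer, the function names, and the sample sentence are all assumptions, and a real study would need a much longer exclusion list plus manual checking.

```python
import re

# Endings listed on the slide; the optional plural "s" is an added assumption.
NOMINAL_SUFFIX = re.compile(r"(tion|sion|ness|ment|ity)s?$")

# Non-nominalizations named on the slide. Only "mansion" actually matches
# the pattern above; the other two would matter for a broader "-ion" search.
NOT_NOMINALIZATIONS = {"cushion", "dandelion", "mansion"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens (deliberately simple)."""
    return re.findall(r"[a-z]+", text.lower())


def nominalization_candidates(tokens):
    """Keep tokens with a nominalizing ending that are not on the exclusion list."""
    return [t for t in tokens
            if NOMINAL_SUFFIX.search(t) and t not in NOT_NOMINALIZATIONS]


def per_million(count, total_tokens):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / total_tokens if total_tokens else 0.0


# Invented sample sentence, not taken from any corpus.
sample = ("The evaluation of the mansion showed movement "
          "towards happiness and stability.")
tokens = tokenize(sample)
hits = nominalization_candidates(tokens)
print(hits)
print(round(per_million(len(hits), len(tokens))))
```

Normalizing to a common base such as one million words is what makes counts from registers of different sizes comparable.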
Findings in nominalization
Academic prose uses nominalizations to treat actions and
processes as abstract objects separated from human
participants.
Nominalizations in academic prose discuss the
generalized action of moving, rather than a particular
person moving.
Fiction and spoken discourse are more concerned with
people and use verbs and adjectives to describe how
they are behaving.
Academic prose more often refers to a process with a
stative nominalization, whereas fiction and speech
describe a specific person's action with a verb or
adjective.
Nominalization endings
• Proportion of nominalizations by ending
        acad   fic   speech
-tion   68%    51%   56%
-ment   15%    21%   24%
-ness    2%    13%    5%
-ity    15%    15%   15%
• 1. Though -tion is the majority ending in all three
registers, its share is highest in academic prose.
• 2. The -ment suffix accounts for a greater
percentage of the nominalizations in fiction
and speech.
• 3. The -ness ending is more important in fiction
than in the other two registers.
• The -tion ending converts an action expressed by a
verb into a noun, usually referring to a generalized
process or state.
• -ment: the process of making or doing something;
occurs in all three registers.
• Many -ment nouns are noncount nouns describing mental
states. These are rare in academic prose and speech but
relatively common in fiction, where they describe the
mental states of characters.
• -ness accounts for a high share in fiction. The -ness
ending converts adjectives into nouns that often describe
personal qualities.
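Each percentage in the proportion table is one ending's share of all nominalizations found in a register. A minimal sketch of that calculation follows; the helper names and the short word list are hypothetical, and the input is assumed to be a list of nominalizations that has already been checked.

```python
from collections import Counter

# The four endings reported in the proportion table.
ENDINGS = ("tion", "ment", "ness", "ity")


def ending_of(word):
    """Return the first listed ending the word carries, or None."""
    for ending in ENDINGS:
        if word.endswith(ending):
            return ending
    return None


def ending_shares(nominalizations):
    """Percentage share of each ending among the given nominalizations."""
    counts = Counter(e for e in map(ending_of, nominalizations) if e)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ending: 100 * n / total for ending, n in counts.items()}


# Invented word list standing in for one register's nominalizations.
words = ["evaluation", "action", "movement", "happiness"]
shares = ending_shares(words)
print(shares)
```

Running the same function once per register would give one column of the table.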
Counting grammatical categories
• Nouns as adjectives: depends on the goal of
counting
• If the goal is to count the extent to which
nominal versus verbal references are used, it
is appropriate to include nouns used to
modify other nouns.
Counting grammatical categories
• Pronouns: similar to nouns in that they refer
to a nominal entity, different in that they do
not refer to anything when used in isolation.
However, if we want a count of words that
directly refer to things, then it seems most
appropriate to omit pronouns.
• Verbs: auxiliaries
• Auxiliaries should not be included in the overall
verb count, as they mark aspectual meanings or
negation.
Noun-to-verb ratios in three registers
                                                 Academic prose   Fiction   Speech
A. All nouns and verbs                           2.2:1            1.2:1     1.2:1
B. All nouns and verbs excluding auxiliaries     2.9:1            1.5:1     1.6:1
C. Nouns excluding premodifiers of other nouns
   and verbs excluding auxiliaries               2.5:1            1.3:1     1.3:1
• Fiction and speech have similar ratios,
while academic prose is close to double that.
• This reflects the emphasis in academic prose on
objects, states, and processes rather than human agents
and their actions.
• In fiction and speech, pronouns
take the place of many nouns, and
this reduces the noun-to-verb ratio.
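The three counting schemes (A, B, C) behind the noun-to-verb ratios can be sketched as follows. This is an illustration under stated assumptions: the tag names NOUN, VERB, and AUX are a hypothetical tagset, the tags are assigned by hand, and "a noun immediately followed by another noun" is used as a rough stand-in for "noun premodifying another noun".

```python
def noun_verb_ratio(tagged, scheme="A"):
    """Noun-to-verb ratio for a list of (word, tag) pairs.

    scheme "A": all nouns, all verbs including auxiliaries
    scheme "B": all nouns, verbs excluding auxiliaries
    scheme "C": nouns excluding premodifiers of other nouns,
                verbs excluding auxiliaries
    """
    nouns = verbs = 0
    for i, (_, tag) in enumerate(tagged):
        if tag == "NOUN":
            next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
            if scheme == "C" and next_tag == "NOUN":
                continue  # heuristic: this noun premodifies the next noun
            nouns += 1
        elif tag == "VERB":
            verbs += 1
        elif tag == "AUX" and scheme == "A":
            verbs += 1
    return nouns / verbs if verbs else float("inf")


# Hand-tagged fragment (hypothetical tags, for illustration only).
tagged = [
    ("the", "DET"), ("livestock", "NOUN"), ("building", "NOUN"),
    ("should", "AUX"), ("be", "AUX"), ("given", "VERB"),
    ("priority", "NOUN"),
]

for scheme in ("A", "B", "C"):
    print(scheme, noun_verb_ratio(tagged, scheme))
```

Even on this tiny fragment the ratio shifts with the scheme, which is why the counting decisions need to be fixed before registers are compared.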
Comparison of noun-verb ratios
• Academic prose: objects, states, and processes,
all referred to with nouns
• Fiction and speech: human agents and their
actions, described with verbs.
Excerpt from academic prose
• In planning a livestock building or
conversion, the psychological and health
requirements of the livestock should
undoubtedly be given absolute priority
together with the basic needs of the
stockman.
• (9 nouns and 2 verbs)
Excerpt from fiction
• He emerged and locked the door. He
unsnapped the protective strap on his
holster and scanned the parking lot. He
walked quickly to the glass door of the
bank.
• (7 nouns and 5 verbs)
Excerpt from a conversation
• A: Oh yeah, it’s called washing your hair.
Don’t you know how to wash your hair?
• B: Might be.
• A: I know. I know how to have a bath.
• B: Go away, I’m cooking…. Excuse me
please, I’m trying to cook. I haven’t got
enough potatoes.
• (4 nouns and 14 verbs.)
Syntactic construction: that and to complements
• How to search them
• Findings
Searching
• Both that and to have multiple meanings
• That: complement clause, determiner, demonstrative
pronoun, relative pronoun, complex clause connector.
• To: complement, adverbial clause, relative clause,
prepositional phrase
• That can also be omitted.
• Use a computer program to automatically identify
constructions that are likely to be that-clauses or
to-clauses. Then an interactive checking program is
used to edit the codes. Finally, another program is
used to calculate frequency counts.
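The three-step workflow (automatic identification, interactive checking, frequency counting) might look roughly like the sketch below. It is not the program used in the study: the "controlling verb followed by that/to" heuristic, the function names, and the example sentences are assumptions, and the heuristic misses inflected verb forms and that-clauses where that is omitted.

```python
from collections import Counter

# Base-form verbs taken from the examples on the slides.
THAT_VERBS = {"imagine", "mention", "suggest", "conclude", "guess", "argue"}
TO_VERBS = {"begin", "start", "like", "love", "try", "want"}


def find_candidates(tokens):
    """Step 1: flag likely complement clauses as (index, code) pairs."""
    candidates = []
    for i in range(1, len(tokens)):
        prev, word = tokens[i - 1], tokens[i]
        if word == "that" and prev in THAT_VERBS:
            candidates.append((i, "that-clause"))
        elif word == "to" and prev in TO_VERBS:
            candidates.append((i, "to-clause"))
    return candidates


def apply_corrections(candidates, corrections):
    """Step 2 stand-in for interactive checking.

    corrections maps a token index to a new code, or to None to drop it.
    """
    checked = []
    for index, code in candidates:
        code = corrections.get(index, code)
        if code is not None:
            checked.append((index, code))
    return checked


def count_codes(checked):
    """Step 3: frequency count of the checked codes."""
    return Counter(code for _, code in checked)


tokens = "i guess that they want to go to that shop".split()
found = find_candidates(tokens)
print(count_codes(apply_corrections(found, {})))

# "that" is a determiner here, so a human checker removes the candidate.
false_hit = find_candidates("we mention that book".split())
print(apply_corrections(false_hit, {2: None}))
```

The second example shows why the checking step matters: the automatic pass cannot tell complement that from determiner that.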
Distribution of that-clauses & to-clauses
              Conversation     Academic prose
that-clause   **************   ****
to-clause     *********        ********
Each * represents 5000 occurrences per million words
That-clauses are very common in conversation but not so
common in academic prose. To-clauses are moderately
common in both.
Distribution in terms of lexicogrammatical association
• Most verbs control only one or the other
type of complement clause.
• That-clause: imagine, mention, suggest,
conclude, guess, argue
• To-clause: begin, start, like, love, try, and
want.
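One way to check which complement type a verb controls is to tabulate coded occurrences by verb. The sketch below assumes a hypothetical list of (verb, complement type) observations that have already been extracted and checked; the observations are invented for illustration and are not corpus results.

```python
from collections import Counter, defaultdict


def complement_profile(observations):
    """Map each verb to {complement type: share of that verb's occurrences}."""
    by_verb = defaultdict(Counter)
    for verb, clause_type in observations:
        by_verb[verb][clause_type] += 1
    return {
        verb: {ctype: n / sum(counts.values()) for ctype, n in counts.items()}
        for verb, counts in by_verb.items()
    }


# Invented observations, not taken from any corpus.
observations = [
    ("suggest", "that"), ("suggest", "that"),
    ("want", "to"), ("want", "to"), ("want", "to"),
    ("hope", "to"), ("hope", "that"),
]

profile = complement_profile(observations)
for verb, shares in profile.items():
    print(verb, shares)
```

A verb whose profile has a single entry controls only one complement type in the data; a split profile marks a verb that takes both.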
Extraposed
• With verb predicates: I want to sleep here.
• With adjective predicates: It’s possible to
adjust the limit upwards.
                          Conversation   Academic prose
Extraposed that-clauses   **             ******
Extraposed to-clauses     **             ***************
Fig. 3.2 Use of that-clauses and to-clauses in extraposed constructions
(each * represents 100 occurrences per million words)
                                                  Conversation   Academic prose
Extraposed to-clauses with verb predicates        *              *
Extraposed to-clauses with adjective predicates   *              **************
Fig. 3.3 Use of to-clauses in extraposed constructions controlled by verbs
versus adjectives
(each * represents 100 occurrences per million words)
Explanation for preference
• Extraposed adjective predicates frame a
proposition in terms of a static condition
rather than a dynamic action or process. The
typical grammatical associations of to-clauses
fit well with the typical
communicative priorities of academic prose,
resulting in a greater reliance on to-clauses
in that register.