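NLP Week 1 Lecture
Linguistic Essentials
(Ch 3)
Statistical Language Processing
• Statistics?
• Suppose we are looking at symbols written in a completely unknown language.
– Prediction?
– Compression?
– Statistics?
• Can language problems then be solved statistically?
– Google is currently achieving major results with statistical methods
– Chinese-English machine translation: departs slightly from pure statistics
– Statistical approaches that reflect linguistic phenomena
• Long distance dependency
• Context-free???
Statistical vs. Rule-Based
• Comparison using a Korean spell checker
– Limits of the rule-based method
– Limits of the statistical method
– Which one was the development centered on?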
Competence and Performance
• Innate vs. learned, categorical vs. statistical
– CFG (Context free grammar)
• Performance
The Description of Language
• Grammar
– set of rules which describe what is allowable in a language
• Classic Grammars (Quirk et al.)
– meant for humans who know the language
– definitions and rules are mainly supported by examples
– no (or almost no) formal description tools; cannot be programmed
• Explicit Grammar (CFG, LFG, GPSG, HPSG, Dependency Grammars, Link Grammars, ...)
– formal description
– can be programmed & tested on data (texts)
Levels of (Formal) Description
• 6 basic levels (more or less explicitly present in most theories):
– and beyond (pragmatics/logic/...)
– meaning (semantics)
– (surface) syntax
– morphology
– phonology
– phonetics/orthography
• Properties
– each level has an input and output representation
– output from one level is the input to the next (upper) level
– sometimes levels might be skipped (merged) or split
Phonetics/Orthography
• Input:
– acoustic signal (phonetics) / text (orthography)
• Output:
– phonetic alphabet (phonetics) / text (orthography)
• Deals with:
– Phonetics:
• consonant & vowel (& others) formation in the vocal tract
• classification of consonants, vowels, ... in relation to frequencies, shape & position of the tongue and various muscles in the vocal tract
• intonation
– Orthography: normalization, punctuation, etc.
Phonology
• Input:
– sequence of phones/sounds (in a phonetic alphabet); or
“normalized” text (sequence of (surface) letters in one
language’s alphabet) [NB nota bene (note well): phones vs.
phonemes]
• Output:
– sequence of phonemes (~ (lexical) letters; in an abstract
alphabet)
• Deals with:
– relation between sounds and phonemes (units which might
have some function on the upper level)
– e.g.: [u] ~ oo (as in book), [æ] ~ a (cat); i ~ y (flies)
Morphology
• Input:
– sequence of phonemes (~ (lexical) letters)
• Output:
– sequence of pairs (lemma, (morphological) tag)
• Deals with:
– composition of phonemes into word forms and
their underlying lemmas (lexical units) +
morphological categories (inflection, derivation,
compounding)
– e.g. quotations ~ quote/V + -ation(der.V->N) +
NNS.
(Surface) Syntax
• Input:
– sequence of pairs (lemma, (morphological) tag)
• Output:
– sentence structure (tree) with annotated nodes (all lemmas,
(morphosyntactic) tags, functions), of various forms
• Deals with:
– the relation between lemmas & morph. categories and the
sentence structure
– uses syntactic categories such as Subject, Verb, Object,...
– e.g.: I/PP1 see/VB a/DT dog/NN ~
((I/sg)SB ((see/pres)V (a/ind dog/sg)OBJ)VP)S
Meaning (semantics)
• Input:
– sentence structure (tree) with annotated nodes (lemmas, (morphosyntactic) tags, surface functions)
• Output:
– sentence structure (tree) with annotated nodes (autosemantic lemmas, i.e. lemmas that have meaning in isolation; (morphosyntactic) tags; deep functions)
• Deals with:
– relation between categories such as “Subject”, “Object” and (deep) categories such as “Agent”, “Effect”; adds other categories
– e.g. ((I)SB ((was seen)V (by Tom)OBJ)VP)S ~
(I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f)
...and Beyond
• Input:
– sentence structure (tree): annotated nodes (autosemantic lemmas, (morphosyntactic) tags, deep functions)
• Output:
– logical form, which can be evaluated (true/false)
• Deals with:
– assignment of objects from the real world to the nodes of the sentence structure
– e.g.: (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f) ~
see(Mark-Twain[SSN:...],Tom-Sawyer[SSN:...])[Time:bef 99/9/27/14:15][Place:39°19’40”N 76°37’10”W]
Phonology
• (Surface <-> Lexical) Correspondence
• “symbol-based” (no complex structures)
• En.: (stem-final change)
– lexical: b a b y + s (+ denotes start of ending)
– surface: b a b i e s (phonetic-related:
bébì0s)
• Arabic: (interfixing, inside-stem doubling) (lit. ‘read’)
– lexical: kTb+uu+CVCCVC
(CVCC...vowel/consonant pattern)
– surface: kuttub
Phonology Examples
• German (umlaut) (Satz ~ sentence)
– lexical: s A t z + e (A denotes “umlautable” a)
– surface: s ä t z e (phonetic: zæcƏ, vs. zac)
• Turkish (vowel harmony)
– lexical: e v + l A r (houses); b a š + l A r (heads)
– surface: e v l e r; b a š l a r
• Czech (e-insertion & palatalization)
– lexical: m a t E K + 0 (mothers/gen.); m a t E K + ě (mother/dat.)
– surface: m a t e k; m a t c e
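The Turkish vowel-harmony correspondence can be sketched in a few lines of Python. This is a minimal illustration, not a full two-level phonology: it assumes the lexical symbol A surfaces as e after a stem whose last vowel is front and as a otherwise. The function name `realize` and the vowel sets are introduced here for illustration only.

```python
# Assumed vowel classes (simplified); "A" is the lexical archiphoneme.
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")

def realize(lexical):
    """Map a lexical form such as 'ev+lAr' to its surface form."""
    stem, _, ending = lexical.partition("+")
    vowels = FRONT_VOWELS | BACK_VOWELS
    # The last vowel of the stem decides how A surfaces in the ending.
    last_vowel = next((ch for ch in reversed(stem) if ch in vowels), None)
    if last_vowel is None:
        raise ValueError(f"no vowel found in stem {stem!r}")
    a_surface = "e" if last_vowel in FRONT_VOWELS else "a"
    return stem + ending.replace("A", a_surface)

print(realize("ev+lAr"))   # evler
print(realize("baš+lAr"))  # bašlar
```

The rule is purely symbol-based: it inspects and rewrites characters without building any larger structure, which is the sense in which this level has "no complex structures".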
Parts of Speech and Morphology
• Parts of Speech correspond to syntactic or
grammatical categories such as noun, verb, adjective,
adverb, pronoun, determiner, conjunction, and
preposition.
• Word categories are systematically related by
morphological processes such as the formation of
plural form from the singular form.
• The major types of morphological processes are
inflection, derivation and compounding.
Parts of Speech
• Correspond to syntactic or grammatical
categories such as noun, verb, adjectives,
prepositions….
• Word categories are systematically related by
morphological processes such as the
formation of plural form from the singular form,
past tense from present tense.
The Parts of Speech
• Noun – refers to entities such as people, places, things or ideas.
• Pronoun – words that take the place of nouns.
• Proper noun – names.
• Determiner – describes the particular reference of a noun.
• Adjective – describes the properties of nouns or
pronouns.
• Verb – action in a sentence.
• Adverb – describes a verb, an adjective or another
adverb.
• And many more
POS Labeling
• Children(NOUN) eat(VERB) sweet(ADJECTIVE) candy(NOUN)
• The(ARTICLE) children(NOUN) ate(VERB) the(ARTICLE) cake(NOUN)
• The(ARTICLE) news(NOUN) has(AUXILIARY) been(MAIN VERB) quite(ADVERB) sad(ADJECTIVE) in(PREPOSITION) fact(NOUN) .(PERIOD)
Morphology: Morphemes & Order
• Handles what is an isolated form in written text
• Grouping of phonemes into morphemes
– sequence deliverables → deliver, able and s (3
units)
– could as well be some “ID” numbers:
• e.g. deliver ~ 23987, s ~ 12, able ~ 3456
• Morpheme Combination
– certain combinations/sequencing possible, others not:
• deliver+able+s, but not able+deliver+s; noun+s, but not noun+ing
• typically fixed (in any given language)
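One way to picture the ordering constraint is as a small table of which morpheme class may follow which. The sketch below is a toy under stated assumptions: the class names, the three-entry lexicon and the transition table are all made up for this example and cover only the deliver+able+s case.

```python
# Hypothetical morpheme classes for the slide's example.
MORPHEME_CLASS = {"deliver": "verb_root", "able": "adj_suffix", "s": "plural"}

# For each class, the classes that are allowed to come next.
ALLOWED_NEXT = {
    "START": {"verb_root"},
    "verb_root": {"adj_suffix", "END"},
    "adj_suffix": {"plural", "END"},
    "plural": {"END"},
}

def is_valid_sequence(morphemes):
    """Check that the morphemes appear in an allowed order."""
    state = "START"
    for morpheme in morphemes:
        cls = MORPHEME_CLASS.get(morpheme)
        if cls is None or cls not in ALLOWED_NEXT[state]:
            return False
        state = cls
    # The word may only stop in a state that allows END.
    return "END" in ALLOWED_NEXT[state]

print(is_valid_sequence(["deliver", "able", "s"]))  # True
print(is_valid_sequence(["able", "deliver", "s"]))  # False
```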
Morphology: From Morphemes
to Lemmas & Categories
• Lemma: lexical unit, “pointer” to lexicon
– might as well be a number, but typically is represented as
the “base form”, or “dictionary headword”
• possibly indexed when ambiguous/polysemous:
– state1 (verb), state2 (state-of-the-art), state3
(government)
– from one or more morphemes (“root”, “stem”,
“root+derivation”, ...) (derivation vs. inflection)
• Categories: non-lexical
– small number of possible values (< 100, often < 5-10)
Morphology Level: The Mapping
• Formally: $A^+ \to 2^{(L, C_1, C_2, \ldots, C_n)}$
– $A$ is the alphabet of phonemes ($A^+$ denotes any non-empty sequence of phonemes)
– $L$ is the set of possible lemmas, uniquely identified
– $C_i$ are morphological categories, such as:
• grammatical number, gender, case
• person, tense, negation, degree of comparison, voice, aspect, ...
• tone, politeness, ...
• part of speech (not quite morphological category, but...)
– $2^{(L, C_1, C_2, \ldots, C_n)}$ denotes the power set of $(L, C_1, C_2, \ldots, C_n)$
– $A$, $L$ and $C_i$ are obviously language-dependent
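The power set in the mapping means that one word form maps to a set of readings, not to a single one. A minimal Python sketch of that idea follows; the lookup table, the Penn-style tags and the function name `analyze` are assumptions for illustration, and a real analyzer would derive the readings from a lexicon and inflection patterns instead of listing them.

```python
# Toy table: word form -> set of (lemma, tag) readings.
ANALYSES = {
    "flies": {("fly", "NNS"), ("fly", "VBZ")},
    "babies": {("baby", "NNS")},
    "saw": {("see", "VBD"), ("saw", "NN"), ("saw", "VB")},
}

def analyze(word_form):
    """Return all (lemma, tag) readings; an empty set if the form is unknown."""
    return set(ANALYSES.get(word_form.lower(), set()))

for form in ("flies", "saw", "xyzzy"):
    print(form, sorted(analyze(form)))
```

Ambiguous forms such as "flies" return more than one pair; choosing among them is left to later processing (tagging or syntax).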
The Dictionary (or Lexicon)
• Repository of information about words:
– Morphological:
• description of morphological “behavior”: inflection
patterns/classes
– Syntactic:
• Part of Speech
• relations to other words:
– subcategorization (or “surface valency frames”)
– Semantic:
• semantic features
• valency frames
– ...and any other! (e.g., translation)
The Categories: Part of Speech:
Open and Closed Categories
• Part of Speech - POS (pretty much stable set across languages)
– not so much morphological (can be looked up in a dictionary), but:
– morphological “behavior” is typically consistent within a POS category
– Open categories: (“open” to additions)
• verb, noun, pronoun, adjective, numeral, adverb
– subject to inflection (in general); subject to cross-category
derivations
– newly coined words always belong to open POS categories
– potentially unlimited number of words
– Closed categories:
• preposition, conjunction, article, interjection, clitic, particle
– not a base for derivation (possibly only by compounding)
– finite and (very) small number of words
The Categories: Part of Speech,
Open Categories: Verbs
• Verbs:
– infl. categories: person, number, tense, voice, aspect, [gender, neg.], ...
– syntactic/semantic: classification:
• ordinary: (to) speak, (to) write
• auxiliaries: be, have, will, would, do, go (going)
• modals: can, could, may, should, must, want
• phrasal: begin, end, start
– morphological classification
• conjugation type: regular/irregular, (Ge.: weak/strong/irregular)
– conjugation class: (Cz.: 5 classes + ~100 combinations)
The Categories: Part of Speech,
Open Categories: Nouns
• Nouns: infl. categories: number, [gender, case, negation, ...]
– semantic classification:
• human/animal/(non-living) things: driver/bird/stone
• concrete/abstract: computer/thought
• common/proper: table/Hopkins
– syntactic classification: countable/unc.: book, water
– morphological classification:
• pluralia/singularia tantum: data (is), police (are)
• declension type (“pattern” or “class”) (Cz.: 14 basic patterns,
plus deviations: ~300 patterns, + irregular inflection)
• “adverbial” nouns: afternoon, home, east (no inflection)
The Categories: Part of Speech,
Open Categories: Pronouns
• Pronouns: infl. categories: number, gender, case, negation; person
– much like nouns (syntactic usage also similar)
– (pro)noun ~ “stands for” a noun
– classification (mostly syntactic/semantic):
• personal: I, you, he, she, it, we, you, they
• demonstrative: this, that
• possessive: my, your, her, his, its, our, their; mine, yours, ours,...
• reflexive: myself, yourself, herself,..., oneself
• interrogative: what, which, who, whom, whose, that
• indefinite (“nominal”): somebody, something, one
– morphological classification: mostly idiosyncratic pattern
The Categories: Part of Speech,
Open Categories: Adjectives
• Adjectives:
– infl. categories: degree of comp., [number, gender, case, negation]
– classification:
• ordinary: new, interesting, [test (equipment)]
• possessive: John’s, driver’s
• proper: Appalachian (Mountains)
• often derived from verbs/nouns: teaching (assistant), trendy,
stylish
– morphological classification
• mostly regular declension (Cz.: 4 basic patterns, ~ 10 total)
• degrees of comparison (En.: big, bigger, biggest)
• but: large number of forms (agreement, cf. section on syntax)
The Categories: Part of Speech,
Open Categories: Adverbs
• Adverbs: “infl.” categories: degree of comp., [negation]
– open cat.: regular derivation from adjectives common:
• new → newly, interesting → interestingly
– non-derived adverbs:
• ordinary: so, well, just, too, then, often, there
• wh-adverbs (interrogative): why, when, where, how
• degree adverbs/qualifiers: very, too
– morphological classification (not much, really...)
• degree of comparison: well, better, best
– soon, sooner (other lang.: all 3 degrees regular)
The Categories: Part of Speech,
Open Categories: Numerals
• Numerals: infl. categories: number, gender, case, negation
– open cat.: compounding (Ge.: einundzwanzig, 21)
– classification:
• cardinals: one, five, hundred
– NB: million etc. often considered noun
• ordinals/fractionals: first, second, thirtieth
• quantifiers: all, many, some, none
• multiplicative: times, twice (Cz.: dvaadvacetkrát, 22-times)
• multilateral: single, triple, twofold
– morphological classification: as nouns/adjectives; many irreg.
The Categories: Part of Speech,
Closed Categories
• Closed categories: preposition, conjunction, article, interjection, clitic, particle
– Morphological behavior: indeclinable (no declension, no conjugation)
• preposition: of, without, by, to;
• conjunction:
coordinating: and, but, or, however
subordinating: that, if, because, before, after, although, as
• article: a, the;
• interjection: wow, eh, hello;
• clitic: ‘s; may be attached to whole phrases (at the end)
• particle: yes, no, not; to (+verb);
– many (otherwise) prepositions if part of phrasal verbs, e.g. (look) up
The Categories: Number and
Gender
• Grammatical Number: Singular, Plural
– nouns, pronouns, verbs, adjectives, numerals
• computer / computers; (he) goes / (they) go
– In some languages (Czech): Dual (nouns, pronouns, adjectives)
• (Pl.) nohami / (Dl.) nohama (Cz.; (by) legs (of sth)/(by) legs (of sb))
• Grammatical Gender: Masculine, Feminine, Neuter
– nouns, pronouns, verbs, adjectives, numerals
• he/she/it; читал, читала, читало (Ru.; (he/she/it) was-reading)
• nouns: (mostly) do not change gender for a single lexical unit
– Also: animate/inanimate (gram., some genders), etc.
• Mädchen (Ge.; girl, neuter); děti (Cz.; children, masc. inanim.)
The Categories: Case
• Case
– English: only personal pronouns/possessives, 2 forms
– other languages: 4 (German), 6 (Russian), 7 (Czech, Slovak, ...)
• nouns, pronouns, adjectives, numerals
– most common cases (forms in singular/plural; Cz. třída = class)
• nominative: I/we (work); třída/třídy
• genitive: (picture of) me/us; třídy/tříd
• dative: (give to) me/us; třídě/třídám
• accusative: (see) me/us; třídu/třídy
• vocative: -; třído/třídy
• locative: (about) me/us; třídě/třídách
• instrumental: (by) me/us; třídou/třídami
The Categories: Person, Tense
• Person
– verbs, personal pronouns
• 1st, 2nd, 3rd: (I) go, (you) go, (he) goes; (we) go, (you) go, (they) go
• jdu, jdeš, jde; jdeme, jdete, jdou (Cz.)
• Tense (En.; Cz.: go; Pol.: go)
– past: (you) went; Pol. szliście
– present: (you pl.) go; Cz. jdete; Pol. idziecie
– future (!if not “analytical”): Cz. půjdete
– concurrent (gerund): going; Cz. jda; Pol. idąc
– preceding: Pol. szedłszy
Note on Tense
• Grammars: more (syntactic/semantic) tenses  Time X
– but: morphology handles isolated words → some tenses can be defined & handled only at an upper level (surface syntax)
• Examples of (traditional) tense (synthetical and analytical):
• infinitive: (to) write (tenseless, personless, ..., except negation
(Cz.))
• simple present/past: (I) write/(she) writes; (I,she) wrote
• progressive present/past: (I) am writing; (I) was writing
• perfect present/past: (I) have written; (I) had written
• all in passive voice (cf. later), too:
– (the book) is being/has been/had been written etc.
• all in conditional mood, too (mood: in Eng. not a morph. category!)
– (the book) would have been written
The Categories: Voice &
Aspect
• Voice
– active vs. passive
• (I) drive / (I am being) driven
• (Ich) setzte (mich) / (Ich bin) gesetzt (Ge.: to sit down)
• Aspect
– imperfective vs. perfective:
• покупал / купил (Ru.: I used to buy, I was buying / I (have) bought)
– imperfective continuous vs. iterative (repeating)
• spal / spával (Cz.: I was sleeping / I used to sleep (every ...))
The Categories:
Negation, Degree of Comparison
• Negation:
– even in English: impossible (~ not possible)
• Cz: every verb, adjective, adverb, some nouns; prefix ne-
• Degree of Comparison (non-analytical):
– adjectives, adverbs:
• positive (big), comparative (bigger), superlative (biggest)
• Pol.: (new) nowy, nowszy, najnowszy
• Combination (by prefixing):
– order? both possible: (neg.: Cz./Pol.: ne-/nie-, sup.: nej-/naj-)
• Cz.: nejnemožnější (the most impossible)
• Pol.: nienajwierniejszy (the most unfaithful)
Typology of Languages
• By morphological features
– Analytical: using (function) words to express categories
• English, also French, Italian, ..., Japanese, Chinese
– I would have been going ~ (Pol.) szłabym
– Inflective: using prefix/suffix/infix, combines several categ.
• Slavic: Czech, Russian, Polish,... (not Bulgarian); also French,
German; Arabic
– (Cz. new(acc.)) novou (Adj, Fem., Sg., Acc., Non-neg., Pos.)
– Agglutinative: one category per (non-lexical) morpheme
• Finnish, Turkish, Hungarian
– (Fin. plural): -i-
Categories & Tags
• Tagset:
– list of all possible combinations of category values for a given language
– $T \subseteq C_1 \times C_2 \times \cdots \times C_n$
– typically string of letters & digits:
• compact system: short idiosyncratic abbreviations:
– NNS (gen. noun, plural)
• positional system: each position i corresponds to $C_i$:
– AAMP3----2A---- (gen. Adj., Masc., Pl., 3rd case (dative), comparative (2nd degree of comparison), Affirmative (no negation))
– tense, person, variant, etc.: N/A (marked by “empty position”, or ‘-’)
• Famous tagsets: Brown, Penn, Multext[-East], ...
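A positional tag can be decoded by pairing each character with the name of its position. The sketch below is illustrative only: the slide glosses just the filled positions, so the full list of 15 position names is an assumption modelled on Prague-style positional tagsets, and `decode` is a name introduced here.

```python
# Assumed names for the 15 positions (not given in full on the slide).
POSITIONS = [
    "pos", "subpos", "gender", "number", "case",
    "possgender", "possnumber", "person", "tense",
    "grade", "negation", "voice", "reserve1", "reserve2", "variant",
]

def decode(tag):
    """Return {position name: value}, skipping positions marked '-' (N/A)."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} positions, got {len(tag)}")
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}

print(decode("AAMP3----2A----"))
```

For the example tag this keeps seven positions: adjective (pos and subpos), masculine, plural, case 3, grade 2, and affirmative negation value, matching the gloss in the slide.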
Words’ Syntactic Functions
• Typically, nouns refer to entities in the world like
people, animals and things.
• Determiners describe the particular reference of a
noun and adjectives describe the properties of nouns.
• Verbs are used to describe actions, activities and
states.
• Adverbs modify a verb in the same way as adjectives
modify nouns. Prepositions are typically small words
that express spatial or time relationships. Prepositions
can also be used as particles to create phrasal verbs.
Conjunctions and complementizers link two words,
phrases or clauses.
Syntax or Phrase Structure: A simple context-free grammar
The Grammar
• S --> NP VP
• NP --> AT NNS | AT NN | NP PP
• VP --> VP PP | VBD | VBD NP
• PP --> IN NP
The Lexicon
• AT --> the
• NNS --> children | students | mountains
• VBD --> slept | ate | saw
• IN --> in | of
• NN --> cake
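To see what this grammar accepts, the rules can be encoded directly and the number of parse trees per sentence counted. The following is a minimal sketch, not a production parser: it reads the prepositional-phrase rule as PP --> IN NP, relies on the grammar having no empty rules and no unary cycles, and the function name `count_parses` is made up for this example.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("AT", "NNS"), ("AT", "NN"), ("NP", "PP")],
    "VP": [("VP", "PP"), ("VBD",), ("VBD", "NP")],
    "PP": [("IN", "NP")],
}
LEXICON = {
    "AT": {"the"},
    "NNS": {"children", "students", "mountains"},
    "VBD": {"slept", "ate", "saw"},
    "IN": {"in", "of"},
    "NN": {"cake"},
}

def count_parses(sentence, start="S"):
    """Count distinct parse trees of the sentence under the grammar above."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of trees rooted in `symbol` that cover words[i:j].
        if symbol in LEXICON:
            return int(j - i == 1 and words[i] in LEXICON[symbol])
        total = 0
        for rhs in GRAMMAR[symbol]:
            if len(rhs) == 1:
                total += count(rhs[0], i, j)
            else:
                left, right = rhs
                # Both parts must be non-empty, so the split is strictly inside.
                for k in range(i + 1, j):
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words))

print(count_parses("the children ate the cake"))                   # 1
print(count_parses("the children ate the cake in the mountains"))  # 2
print(count_parses("children slept"))                              # 0
```

The second sentence gets two trees because the PP "in the mountains" can attach either to the noun phrase "the cake" or to the verb phrase, which is exactly the attachment ambiguity that the rules NP --> NP PP and VP --> VP PP allow.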
Syntax or Phrase Structure: A Parse Tree
A Simple Context-Free
Grammar
• The Grammar rules
• S -> NP V
• NP -> N
• The Lexicon
• N -> John, Gaurav, Ram ……
• V -> walks, talks, eats, went …..
Tag Sets
• A tag indicates the various conventional
parts of speech.
• Different Tag Sets have been used: E.g.,
Brown Tag Set, Penn Treebank Tag Set.
• Tag examples: NP Proper noun, NN Singular noun, AT Article, DET Determiner.
Stochastic Grammars
• Grammars obtained by adding
probabilities in a fairly transparent way
to “algebraic” (i. e., non-probabilistic)
grammars.
• Stochastic grammars supplement
underlying algebraic grammars.
Dependencies
• Local Dependency: dependence
between two words expressed within
the same syntactic rule. (n-grams
model this well)
• Non-local dependency: is an instance
in which two words can be syntactically
dependent even though they occur far
apart in a sentence.
Ambiguities
1. “Children eat sweet candy”
2. “Too much boiling will candy the
molasses”
• In sentence (1) candy is a noun while in (2) it is a verb.
• Word category (POS) ambiguity needs
to be resolved.
Ambiguities (Cont.)
• Semantic Roles: Determining thematic roles in
a sentence.
• Agent, Patient, Experiencer, Instrument, Goal
….
• Raju(AGENT) hit us (PATIENT) with a ball (INSTRUMENT).
• Complicated by the notions of direct and
indirect object, active and passive voice.
Ambiguities (Cont.)
• Attachment ambiguities occur with
phrases that could have been generated
by two different nodes in the parse tree.
E.g.: saw the man in the house with a
pole.
• Rare Usage and spurious usage: A
hectare is a hundred ares.
Garden-Path Sentences
• Garden-Path sentences are sentences
that lead you along a path that suddenly
turns out not to work.
E.g.: The horse raced past the barn fell.
Local and Non-Local Dependencies
• A local dependency is a dependency between two
words expressed within the same syntactic rule.
• A non-local dependency is an instance in which two
words can be syntactically dependent even though
they occur far apart in a sentence (e.g., subject-verb agreement; long-distance dependencies such as wh-extraction).
• Non-local phenomena are a challenge for certain
statistical NLP approaches (e.g., n-grams) that model
local dependencies.
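The limitation of n-grams can be made concrete with bigrams, which record only adjacent word pairs. In the sketch below the sentence is an invented example and `bigrams` is a helper defined here; the point is that the subject and its verb never appear together in the model once a phrase intervenes.

```python
from collections import Counter

def bigrams(tokens):
    """Return the list of adjacent word pairs."""
    return list(zip(tokens, tokens[1:]))

tokens = "the children in the mountains sleep".split()
pairs = Counter(bigrams(tokens))

# Adjacent words are seen by the model ...
print(("children", "in") in pairs)     # True
# ... but the subject-verb pair, separated by a PP, is not.
print(("children", "sleep") in pairs)  # False
```

A bigram model therefore conditions "sleep" only on "mountains", so it has no direct evidence for agreement between "children" and "sleep".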
The Place of Syntax
• Between Morphology and Meaning
• Morphology provides/expects:
– lemmas (now it’s time to extract syntactic information from a
dictionary)
– tags (Part-of-Speech and combination of morphological
categories, such as number, case, tense, voice, ...)
– and of course, we also have word order now to look at/provide
• Typically multiple input (non-disambiguated morphology) / output
(multiple syntactic structures, non-disambiguated)
Words, Phrases, Clauses,
Sentences
• Words
– smallest units on the syntax level
• function/autosemantic
• Phrases
– consist of words and/or phrases; “constituents”
• Clauses
– have predicative meaning (single predicate)
• Sentences
– consist of clauses (one or more)
Words
• Words
– lexical units
• auxiliary (function) words: have grammatical function
• autosemantic words (“lexical” words)
– idioms
• fixed phrases (non-compositional) -> “words”
• Relate to other words
– dictionary: repository of information for each word about its (idiosyncratic) relations to other words
Phrases
• Phrases
– sequences of words and/or phrases (i.e. of constituents)
• may be discontinuous, sometimes
• Types of Phrases:
– Simple/Clausal (i.e. clauses, which consist of phrases,
behave like phrases... recursively!)
– According to head type:
• Noun: a new book
• Adjective: brand new
• Adverbial: so much
• Prepositional: in a class
• Verb: catch a ball
Noun Phrases
• Head: noun
– water
– a book
– new ideas
– that small village
– The greatest rise of interest rates since W.W.II within a single
year
– an operating system which, despite great efforts on the part
of our administrators, fails all too often
Adjective Phrases
• Head: adjective
• Simple APs very common, complex APs rare
– old
– very old
– really very old
– five times older than the oldest elephant in our
ZOO
– (was) sure, as far as I know, to be there first
Adverbial and Numerical
Phrases
• Head: adverb
– three times as much
– quickly
– really
– (... speaks) more loudly than anybody could
imagine
– yesterday
• Numerical Phrases
– (... lasted) three hours
– twenty-two
Prepositional Phrases
• Head: preposition
• In fact, play the role of Adverbial Phrases often
– in the City
– at five o’clock
– to a brightest future
– without a glitch
– to the point where neither of them could get out of
it
– up to five points
– instead of Charles
Verb Phrases
• Head: verb
– (It) rains
– ... could ever see a large Unidentified Flying
Object
– ..., why (we) have got so much rain
– Please!
– On Sunday, (he) was driven to the hospital
– (It) began to snow
– (...) prohibits smoking in this area
Coordination of Phrases
• “Head”: conjunction, punctuation
– and, or, but
• cats and dogs
• new or even newer
• quickly and precisely
• he came to the conclusion that it makes no sense to hide himself anymore and therefore we could hear him today
• (trains) from and to Baltimore
• eat your lunch now or at the picnic table
Ellipsis
• Word or Phrase missing where one would normally expect one;
often happens in dialogues
– Whom did you see there?
– Peter. ?? verb ??
• Most common in coordination (written text)
– Pittsburgh leads 4-0 but Detroit only 3-1. ??verb in 2nd part??
• Systematic in many languages: pro-drop (leave out a pers.
pronoun in the Subject position)
– [She] Passed the exam easily.
Clauses
• Predicative function:
– some activity of some subjects/objects, somewhere in time,
under certain circumstances
• Main clause
– not part of a greater clause
• Embedded clause
– part of other clause, having some function (like a phrase)
• Function of a Clause
– same as for phrase, plus some (direct speech/discourse
etc.)
Gaps (Non-Continuous
Constituents)
• Constituent moves from the expected position:
– happens in questions and relative clauses
• Who(m) do you work for <gap>whom?
– strictly speaking, do you work should be you (do work)
• I don’t know why we have got so much rain <gap>why?
• On Sundays, I usually work <gap>On Sundays but I stay home on
Tuesdays.
• The story he never wrote <gap>the story
• And finally the car she was supposed to use <gap>the car for her
trip to New York broke.
– The last two: also could be considered ellipsis (which) plus
a gap.
Sentences
• Consist of a single or several main clauses
• If several main clauses:
– coordination, much like coordinated phrases
– more coordinating conjunctions:
• and, or, but, (and) therefore, ...
• In written text, starts with a capital letter
• Ends by period/question mark/exclamation mark
• not all periods end a sentence!
• Sometimes even semicolon (;) might be a sentence break
(...vague)
Syntax: Representation
• Tree structure (“tree” in the sense of graph theory)
– one tree per sentence
• Two main ideas for the shape of the tree:
– phrase structure (~ derivation tree, cf. parsing later)
• using bracketed grouping
• brackets annotated by phrase type
• heads (often) explicitly marked
– dependency structure (lexical relations “local”, functions)
• basic relation: head (governor) - dependent
• links (edges) annotated by syntactic function (Sb, Obj, ...)
• phrase structure: implicitly present (but 1:n mapping Dep→PS)
Phrase Structure Tree
• Example:
((DaimlerChrysler’s shares)NP (rose (three eighths)NUMP (to 22)PP-NUM )VP )S
Dependency Tree
• Example:
rose/Pred(shares/Sb(DaimlerChrysler’s/Atr), eighths/Adv(three/Atr), to/AuxP(22/Adv))
Semantic Roles
• Most commonly, noun phrases are arguments
of verbs. These arguments have semantic roles:
the agent of an action, the patient and other
roles such as the instrument or the goal.
• In English, these semantic roles correspond to
the notions of subject and object.
• But things are complicated by the notions of
direct and indirect object, active and passive
voice.
Subcategorization
• Different verbs can relate different numbers of
entities: transitive versus intransitive verbs.
• Tightly related verb arguments are called
complements but less tightly related ones are called
adjuncts. Prototypical examples of adjuncts tell us
time, place, or manner of the action or state described
by the verb.
• Verbs are classified according to the type of
complements they permit. This is called
subcategorization. Subcategorization frames capture
syntactic as well as semantic regularities.
Attachment Ambiguity and Garden-Path Sentences
• Attachment ambiguities occur with phrases that could
have been generated by two different nodes in the
parse tree.
The child ate the cake with a spoon.
• Genuinely ambiguous: Fruit flies like a banana.
• Garden-Path sentences are sentences that lead along
a path that suddenly turns out not to work.
The horse raced past the barn fell.
Semantics
• Semantics is the study of the meaning of words, constructions,
and utterances.
• Semantics can be divided into two parts: lexical semantics and
combination semantics.
• Lexical semantics: hypernymy, hyponymy, antonymy,
meronymy, holonymy, synonymy, homonymy, polysemy, and
homophony.
• Compositionality: the meaning of the whole often differs from
the meaning of the parts.
• Idioms correspond to cases where the compound phrase means
something completely different from its parts.
Pragmatics
• Pragmatics is the area of studies that goes
beyond the study of the meaning of a sentence
and tries to explain what the speaker really is
expressing.
• Understand the scope of quantifiers, speech
acts, discourse analysis, anaphoric relations.
• The resolution of anaphoric relations is crucial
to the task of information extraction.