COMP 791A: Statistical Language Processing
Linguistic Essentials
Chap. 3
1
Levels of study of NLP

Lexical
  Possible words in a given language
    rose, ?gellapou
Phonetics & phonology
  How words are related to sounds
    rose [roz]
Parts-of-speech & Morphology
  How words are constructed from basic meaning units (morphemes)
    friend + ly --> friendly
    rose + ly ≠ rosely
    friend + s --> friends
    woman + s ≠ womans
Phrase Structure and Syntax
  How words can be ordered to form correct sentences
    ?Red the is rose / adj det verb noun
    The rose is red / det noun verb adj
2
Levels of study of NLP (cont'd)

Semantics
  What words mean (lexical semantics, word sense disambiguation)
    chair --> furniture / person
  How word meanings are combined into the meaning of sentences
    The chair is broken.
    The chair is sick.
Pragmatics
  How language conventions affect the literal meaning (interpretation)
    Do you have the time?
    Do you have the children?
Discourse
  How surrounding sentences affect interpretation
    The chair's leg is broken. He went skiing last week-end.
    The chair's leg is broken. Someone placed a 500kg package on it.
World-Knowledge
  How general knowledge about the world affects interpretation
    The prof sent the student to see the chair because he was fed up with his behavior.
    The prof sent the student to see the chair because he wanted to see him.
    The prof sent the student to see the chair because he was talking in class.
3
Levels of study of NLP

Lexical
Phonetics & phonology
Parts-of-speech & Morphology
Phrase Structure and Syntax
Semantics
Pragmatics
Discourse
World-Knowledge
4
Parts of Speech and Morphology

Parts of Speech (POS)
  word/lexical/syntactic/grammatical categories/tag/class
  Ex: noun, verb, adjectives, prepositions, …
Morphology
  study and description of word formation in a language
  modification of a root form (stem) by affixes
  affix: prefixes, suffixes, infixes, circumfixes
  and exceptions… thief --> thieves, chief --> chiefs
Word categories are systematically related by morphological processes
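
The "rule plus exceptions" idea behind thief --> thieves versus chief --> chiefs can be sketched in a few lines of Python. This is only an illustration: the exception list and the name pluralize are assumptions made here, not part of the course material, and real English pluralization needs many more rules.

```python
# Hypothetical exception lexicon: irregular forms simply have to be listed.
IRREGULAR_PLURALS = {"thief": "thieves", "woman": "women"}


def pluralize(noun: str) -> str:
    """Return a plural form using a small regular rule plus listed exceptions."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Regular rule: add -es after a sibilant-like ending, otherwise add -s.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


for noun in ["friend", "car", "chief", "thief", "woman"]:
    print(noun, "-->", pluralize(noun))
```

The regular rule handles chief, while thief and woman are only right because they are stored explicitly.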
5
Morphological processes

Inflection
  to indicate case, gender, number, tense, person, mood, or voice
  does not change the word's grammatical class or meaning significantly
    car --> cars
    talk --> talking
Derivation
  creation of a new word
  may have different meaning and/or grammatical class
    infect --> disinfect
    grateful --> ungrateful
    wide (adjective) --> widely (adverb)
    teach (verb) --> teacher (noun)
Compounding
  merging 2 or more words into a single one
  written as separate words
  but pronounced as a single word / denotes 1 single concept
  so merits an entry in lexicon
    tea kettle, disk drive, mad cow disease
6
Classes of POS

Open (lexical) class
  things, actions, events, …
    ex. cat, John, eat
  new words can be added easily
  nouns, verbs, adjectives, adverbs
  some languages do not have all these categories
Closed (functional) class
  generally function/grammatical words
    ex. the, in, and, for
  relatively fixed membership
  prepositions, determiners, pronouns, conjunctions, particles, numerals, auxiliary verbs
7
Main POS

Open class
  Noun – refers to entities like people, places, things or ideas.
  Adjective – describes the properties of nouns or pronouns.
  Verb – describes actions, activities and states.
  Adverb – describes a verb, an adjective or another adverb.
Closed class
  Pronoun – word that takes the place of a noun or other expression.
  Determiner – describes the particular reference of a noun.
  Preposition – expresses spatial or time relationships.
  …
8
Nouns (open)

Entities like people, places, things or ideas
  ex: dog, tree, Mary, idea
Typical inflections:
  number (singular, plural)
  gender (masculine, feminine, neuter)
  case (nominative, genitive, accusative, dative)
Sub-categories:
  proper nouns (John)
  adverbial nouns (today, home)
9
Verbs (open)

Actions, activities, and states
  The men work in the field.
  The men are working in the field.
  The men are in the field.
Typical inflections:
  tenses: present, past, future
  other inflection: number, person
  aspect: progressive, perfective
  voice: active, passive
Sub-category:
  auxiliaries (considered closed-class words)
    ex: be, do, will
  modal verbs (considered closed-class words)
    ex: can, should, could
  main verbs
10
Main verbs

Transitive
  requires a direct object (found with questions: what? or whom?)
    ?The child broke.
    The child broke a glass.
Intransitive
  does not require a direct object
    The train arrived.
Some verbs can be both transitive and intransitive
  The ship sailed the seas. (transitive)
  The ship sails at noon. (intransitive)
  I met my friend at the airport. (transitive)
  The delegates met yesterday. (intransitive)
11
Adjectives (open)

Properties and attributes
  long road
  rainy day
  attractive hat
Typical inflections:
  number, gender, case
Sub-categories:
  comparative (richer)
  superlative (richest)
12
Adverbs (open)

words added to a verb, an adjective, another adverb or other element to expand its meaning
  You must set up the copy now.
  Mary walks gracefully.
  Sometimes I take a walk in the woods.
  Jack usually leaves the house at seven.
  I have always admired her.
sub-categories:
  locative (here)
  degree (very)
  manner (slowly)
  temporal (late, yesterday (noun?))
13
Closed class categories

Determiners:
  words that make specific the denotation of a noun phrase
    articles: the hat, a hat
    demonstrative: this hat, that hat
    possessive: John's hat, my hat, her book
    wh-determiner: which hat, whose hat
    quantifier: some hat, every hat
Prepositions:
  words that show the relationship between certain words in a sentence
    by, to, at, …
    The accident occurred under the bridge.
Conjunctions:
  words used to join other words or groups of words
    or, when, but, and, …
Auxiliary & modal verbs:
  be, do, can, may, should, …
14
Closed class categories (cont'd)

Particles:
  words that are added to main verbs to construct different verbs
    check+out = check out, make+up = make up
  Ex:
    She made up a story
    She made it up
  particles vs. prepositions
    she <ran up> a bill / she <ran> <up> a hill
Numerals:
  one, third
15
Closed class categories (cont'd)

Pronouns:
  a word that replaces a noun or even another sentence
    ex: she, ourselves, mine, that
  subcategories:
    Personal:
      You are very nice.
    Possessive:
      Mine is nicer.
    Interrogative: used to ask questions: who?, what?, which?
      Who is that girl?
    Demonstrative: points out definite persons, places or things: this, these, that
      This is my book.
      He said he was busy, but that was a lie.
    Relative: joins the clause it introduces to its antecedent: who, which, that
      She is the girl who won the race.
    ...
16
Other parts of speech

Interjections:
  Ouch!
Negatives:
  no, not
Politeness markers:
  Hello, bye
Existential:
  There are 3 students sleeping.
17
Summary

Open class
  nouns            cat, spirit
  verbs            eat, cook
  adjectives       slow, large
  adverbs          slowly
Closed class
  prepositions     on, under, at
  determiners      a, the, some
  pronouns         she, who, I, other
  conjunctions     and, but, or
  auxiliary verbs  can, may, should
  particles        up, on, off
  numerals         one, two, first
18
The substitution test

Basic test to determine if 2 words belong to the same POS class
  The { sad | intelligent | green | fat | … } one is in the corner.
19
POS Tagging

Automatically assign POS tags to words in a text.
  Children/NOUN eat/VERB sweet/ADJECTIVE candy/NOUN
  The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN
  The/ARTICLE news/NOUN has/AUXILIARY been/MAIN VERB quite/ADVERB sad/ADJECTIVE in/PREPOSITION fact/NOUN ./PERIOD
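
As a rough illustration of the word/TAG output format, here is a minimal lookup tagger in Python. The lexicon, the default tag and the function name tag are assumptions made for this sketch. It gives each word form a single tag, so it cannot handle words whose tag depends on context.

```python
# Hypothetical toy lexicon: each word form is mapped to exactly one tag.
LEXICON = {
    "children": "NOUN",
    "eat": "VERB",
    "ate": "VERB",
    "sweet": "ADJECTIVE",
    "candy": "NOUN",
    "cake": "NOUN",
    "the": "ARTICLE",
}


def tag(sentence: str, default: str = "NOUN") -> str:
    """Tag each whitespace-separated token by dictionary lookup."""
    tokens = sentence.split()
    # Unknown words fall back to the default tag (an arbitrary choice here).
    return " ".join(f"{tok}/{LEXICON.get(tok.lower(), default)}" for tok in tokens)


print(tag("Children eat sweet candy"))
print(tag("The children ate the cake"))
```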
20
Why do POS Tagging?



1st step towards NLU
easier than full NLU (results > 95% accuracy)
Useful for:
  speech recognition / synthesis (better accuracy)
    how to recognize/pronounce a word
    CONtent/noun vs conTENT/adj
  stemming in IR
    which morphological affixes the word can take
    adjective - ly = noun (friendly - ly = friend)
  indexing in IR
    pick out nouns, which may be more important than other words in indexing documents
21
Tag Sets


A tag labels a word with one of the conventional parts of speech.
Different tag sets have been used
  Ex. Brown Tag Set, Penn Treebank Tag Set
Tag examples:
  NP   Proper noun
  NN   Singular noun
  AT   Article
  DET  Determiner
More on this later
22
Penn Treebank Tag Set

Tag  Description                               Examples
CC   conjunction, coordinating                 and but either et for less minus neither nor or plus so therefore
CD   numeral, cardinal                         mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one
DT   determiner                                all an another any both del each either every half la many much
IN   preposition or subordinating conjunction  astride among upon whether out inside pro despite on by throughout
JJ   adjective or numeral, ordinal             third ill-mannered pre-war regrettable oiled calamitous first
JJR  adjective, comparative                    bleaker braver breezier briefer brighter brisker broader bumper
NN   noun, common, singular or mass            common-carrier cabbage knuckle-duster Casino afghan shed
NNP  noun, proper, singular                    Motown Venneboerger Czestochwa Ranzer Conchita Trumplane
NNS  noun, common, plural                      undergraduates scotches bric-a-brac products bodyguards facets
PRP  pronoun, personal                         hers herself him himself it itself me myself one oneself ours
RB   adverb                                    occasionally unabatingly maddeningly adventurously professedly
RP   particle                                  aboard about across along apart around aside at away back
TO   "to" as preposition or infinitive marker  to
VB   verb, base form                           ask assemble assess assign assume atone attention avoid bake
VBD  verb, past tense                          dipped pleaded swiped wore soaked tidied convened halted
VBG  verb, present participle or gerund        telegraphing stirring focusing angering judging stalling lactating
VBN  verb, past participle                     imitated dilapidated aerosolized chaired languished panelized used
VBP  verb, present tense, not 3rd p. singular  predominate wrap resort sue twist spill cure lengthen brush
VBZ  verb, present tense, 3rd p. singular      bases reconstructs marks mixes displeases seals carps weaves
…
23
Ambiguities in POS tagging

Children eat sweet candy/NOUN.
Too much boiling will candy/VERB the molasses.

Fruit flies/? like/? a banana.
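
To see why such sentences are hard for a tagger, one can list every tag sequence that a lexicon allows. The Python sketch below does this for the banana sentence. The POSSIBLE_TAGS lexicon and the function name candidate_taggings are hypothetical and deliberately small.

```python
from itertools import product

# Hypothetical lexicon listing the tags each word could take in isolation.
POSSIBLE_TAGS = {
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREPOSITION"],
    "a": ["ARTICLE"],
    "banana": ["NOUN"],
}


def candidate_taggings(sentence):
    """Return every tag sequence the lexicon permits, as lists of (word, tag)."""
    words = sentence.lower().split()
    options = [POSSIBLE_TAGS[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


taggings = candidate_taggings("Fruit flies like a banana")
for tagging in taggings:
    print(" ".join(f"{word}/{tag}" for word, tag in tagging))
print(len(taggings), "candidate taggings")
```

With two ambiguous words of two tags each, the tagger already has four candidates to choose from. The number of candidates is the product of the per-word tag counts, so it grows quickly with sentence length.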

24
Syntax or Phrase Structure

Syntax
  study of the regularities and constraints of word order and phrase structure
    the book is red
    vs red book is the
Grammar
  expresses the relations among the constituents of a sentence
26
Constituents


also called syntactic structures
Main constituents:
  S: sentence
    The boy is happy.
  NP: noun phrase
    the little boy
    Sam Smith
    I
    three boys from Montreal
  VP: verb phrase
    eat an apple
    sing
    leave Boston in the morning
  PP: prepositional phrase
    in the morning
    about my ticket
  AdjP: adjective phrase
    really funny
    rather clear
    very large
  AdvP: adverb phrase
    slowly
    really slowly
27
Sentence Moods/Types

Declarative
  Mary eats.
  S --> NP VP
Imperative
  Eat!
  S --> VP
Yes-No Question
  Did Mary eat?
  S --> Aux NP VP
Wh-Question
  When did Mary eat?
  S --> WH-pro Aux NP VP
28
Noun Phrases

NP --> pre-modifiers head post-modifiers

head: central noun in NP
  the little boy, the boy from Montreal
pre-modifiers:
  determiners, cardinal, ordinal, quantifier
    the boy, two boys, first boy, several boys
  AdjP
    funny boy, really funny boy
post-modifiers:
  PP
    flights from Montreal
  non-finite clause
    gerundive (-ing)
      flights arriving from Montreal
    -ed
      dinner served on board, jewels stolen from the queen
    infinitive form
      flight to arrive from Montreal
  relative clause
    flight that arrives from Montreal, girl who won the race
29
Verb Phrases

VP --> head-verb complements adjuncts

Some VPs:
  Verb         eat.
  Verb NP      leave Montreal.
  Verb NP PP   leave Montreal in the morning.
  Verb PP      leave in the morning.
  Verb S       think I would like the fish.
  Verb VP      want to leave.
               want to leave Montreal.
               want to leave Montreal in the morning.
               want to want to leave Montreal in the morning.
30
Subcategorisation frames

Some verbs can take complements that others cannot
  I want to fly.
  * I find to fly.
Verbs are subcategorized according to the complements they can take --> subcategorisation frames
  traditionally: transitive vs intransitive
  nowadays: up to 100 subcategories / frames

Frame         Verb            Example
empty         eat, sleep      I eat.
NP            prefer, find    I prefer apples.
NP NP         show, give      Show me your hand.
PPfrom PPto   fly, travel     I fly from Montreal to Toronto.
VPto          prefer, want    I prefer to leave.
S             mean            Does this mean you are going to leave me?
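
One simple way to use subcategorisation frames in a program is a lookup table from verbs to the frames they license. The Python sketch below encodes only the verb and frame pairs listed above. The names SUBCAT and licenses are assumptions made for illustration, and a real lexicon would list many more frames per verb.

```python
# Verb -> set of frames, restricted to the pairs shown in the table above.
SUBCAT = {
    "eat": {"empty"},
    "sleep": {"empty"},
    "prefer": {"NP", "VPto"},
    "find": {"NP"},
    "show": {"NP NP"},
    "give": {"NP NP"},
    "fly": {"PPfrom PPto"},
    "travel": {"PPfrom PPto"},
    "want": {"VPto"},
    "mean": {"S"},
}


def licenses(verb: str, frame: str) -> bool:
    """True if the (toy) lexicon lets `verb` take complements of type `frame`."""
    return frame in SUBCAT.get(verb, set())


print(licenses("want", "VPto"))   # I want to fly.
print(licenses("find", "VPto"))   # * I find to fly.
```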
31
Prepositional phrases

PP --> Preposition NP


from Japan
inside my blue bag
32
Adjective Phrases

AdjP --> Adj Modifiers

tall

very tall

taller than Mary
33
Adverb Phrases

AdvP --> Adv Modifiers

affirmatively

very graciously

rather secretively
34
Context Free Grammars

set of non-terminal symbols
  constituents & parts-of-speech
  S, NP, VP, PP, Det, N, V, ...
set of terminal symbols
  lexicon of words & punctuation
  cat, mouse, nurses, eat, ...
a non-terminal designated as the starting symbol
  sentence S
a set of re-write rules
  having a single non-terminal on the LHS and one or more terminals or non-terminals in the RHS
  S --> NP VP
  NP --> Pro | PN | Det Nominal
35
A simple context-free grammar








The Grammar
  S --> NP VP
  NP --> AT NNS
  NP --> AT NN
  NP --> NP PP
  VP --> VP PP
  VP --> VBD
  VP --> VBD NP
  PP --> IN NP
The Lexicon
  NNS --> children
  NNS --> students
  NNS --> mountains
  VBD --> slept
  VBD --> ate
  VBD --> saw
  AT --> the
  IN --> in
  IN --> of
  NN --> cake
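
The grammar and lexicon above are small enough to test mechanically. The Python sketch below is a memoised recognizer that only answers whether the grammar derives a sentence; it does not build a tree. It assumes the last grammar rule is meant to define PP, the symbol used by NP --> NP PP and VP --> VP PP. The names GRAMMAR, LEXICON, recognizes and derives are introduced here for illustration.

```python
from functools import lru_cache

# Rules and lexicon transcribed from the slide (last rule read as PP --> IN NP).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("AT", "NNS"), ("AT", "NN"), ("NP", "PP")],
    "VP": [("VP", "PP"), ("VBD",), ("VBD", "NP")],
    "PP": [("IN", "NP")],
}
LEXICON = {
    "NNS": {"children", "students", "mountains"},
    "VBD": {"slept", "ate", "saw"},
    "AT": {"the"},
    "IN": {"in", "of"},
    "NN": {"cake"},
}


def recognizes(sentence: str) -> bool:
    """True if the start symbol S derives the whole sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Can `symbol` derive exactly the words in positions i..j-1?
        if symbol in LEXICON:
            return j - i == 1 and words[i] in LEXICON[symbol]
        for rhs in GRAMMAR.get(symbol, []):
            if len(rhs) == 1 and derives(rhs[0], i, j):
                return True
            # Binary rule: try every split into two non-empty parts.
            if len(rhs) == 2 and any(
                derives(rhs[0], i, k) and derives(rhs[1], k, j)
                for k in range(i + 1, j)
            ):
                return True
        return False

    return len(words) > 0 and derives("S", 0, len(words))


print(recognizes("The children ate the cake"))
print(recognizes("The students slept in the mountains"))
print(recognizes("Ate the children cake"))
```

Because every binary split produces two non-empty spans, the left-recursive rules (NP --> NP PP, VP --> VP PP) always recurse on a strictly shorter span, so the search terminates.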
36
A parse tree

a tree representation of the application of the grammar to a specific sentence.

[S [NP [AT The] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]
37
Stochastic Grammars

Grammars obtained by adding probabilities to “algebraic” (i.e., non-probabilistic) grammars.
The probabilities of all rules with the same left-hand side sum to 1.

  1.0  S --> NP VP
  0.4  NP --> AT NNS
  0.4  NP --> AT NN
  0.2  NP --> NP PP
  0.1  VP --> VP PP
  0.1  VP --> VBD
  0.8  VP --> VBD NP
  1.0  PP --> IN NP
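
With rule probabilities in hand, the probability of a parse tree is the product of the probabilities of the rules it uses. The Python sketch below computes this for "The children ate the cake". Two things are assumptions made here. First, the last rule is read as PP --> IN NP. Second, lexical rules such as NN --> cake are given probability 1, because the slide gives no lexical probabilities. The result is therefore only the probability of the syntactic part of the derivation.

```python
import math

# Rule probabilities from the slide, keyed by (LHS, RHS).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("AT", "NNS")): 0.4,
    ("NP", ("AT", "NN")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("VP", "PP")): 0.1,
    ("VP", ("VBD",)): 0.1,
    ("VP", ("VBD", "NP")): 0.8,
    ("PP", ("IN", "NP")): 1.0,
}

# A tree is (label, child, ...); a preterminal node is (tag, word).
TREE = (
    "S",
    ("NP", ("AT", "the"), ("NNS", "children")),
    ("VP", ("VBD", "ate"), ("NP", ("AT", "the"), ("NN", "cake"))),
)


def tree_probability(tree) -> float:
    """Multiply the probabilities of all phrase-structure rules used in the tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0  # lexical rule: assumed probability 1 in this sketch
    rhs = tuple(child[0] for child in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        prob *= tree_probability(child)
    return prob


print(f"{tree_probability(TREE):.3f}")  # 1.0 * 0.4 * 0.8 * 0.4
```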
38
Syntactic Dependencies

Local dependency
  dependency between two words expressed within the same syntactic rule
    The 3/plural books/plural.
  n-gram models capture this very well.
Non-local dependency
  two words can be syntactically dependent even though they occur far apart in a sentence
  Ex: subject-verb agreement
    The children who found a wallet on the street yesterday while walking their dog were given a reward.
  challenge for certain statistical NLP approaches (ex. n-grams) that model only local dependencies.
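
The Python sketch below makes the problem concrete. It shows which words an n-gram model conditions on when it predicts the verb "were" in the example sentence, and whether the subject "children" falls inside that window. The helper name ngram_context is an assumption made for this illustration.

```python
SENTENCE = (
    "The children who found a wallet on the street yesterday "
    "while walking their dog were given a reward"
)


def ngram_context(tokens, index, n):
    """Return the n-1 preceding words an n-gram model conditions on at `index`."""
    return tokens[max(0, index - (n - 1)):index]


tokens = SENTENCE.split()
subject = tokens.index("children")
verb = tokens.index("were")
print("distance between subject and verb:", verb - subject)

for n in (2, 3, 5):
    context = ngram_context(tokens, verb, n)
    print(n, context, "children" in context)
```

In this sentence the subject is 13 words before the verb. Bigram, trigram and even 5-gram contexts therefore never contain the word that determines the plural agreement.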
39
Difficulties in parsing

Attachment ambiguity

The children ate the cake with a spoon.


The children ate (the cake with a spoon).??
The children (ate with a spoon).??
40
Other difficulties

NP bracketing
plastic cat food can cover
--> ? (plastic cat) (food can) cover

--> ? plastic (cat food can) cover
--> ? (plastic cat food) (can cover)

Conjunctions and appositives

Maddy, my dog, and Samy
--> ?(Maddy, my dog), and (Samy)
--> ?(Maddy), (my dog), and (Samy)
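
For the NP bracketing example, a short Python sketch can enumerate every binary bracketing of a word sequence, which shows how quickly the ambiguity grows. It ignores flat groupings of three or more words and all linguistic plausibility. The function name bracketings is an assumption made here.

```python
def bracketings(words):
    """Return all binary bracketings of a list of words as strings."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Split the sequence at every position and combine the sub-bracketings.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"({left} {right})")
    return results


options = bracketings("plastic cat food can cover".split())
print(len(options))
for option in options[:3]:
    print(option)
```

The count follows the Catalan numbers. A five-word compound has 14 binary bracketings, and a parser has to pick one without help from surface word order.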
41
Another Ambiguity: Garden-Path Sentences




well-studied class of syntactic ambiguity
sentence is re-analysed when the last word is encountered
humans have difficulty analysing such sentences
Example:
  The horse raced past the barn fell.
  (the horse that was raced past the barn) fell.
42
Garden Path: Wrong Parse
[S [NP The horse] [VP raced past the barn]]fell
dt: determiner
n: noun
v: verb
p: preposition
S: sentence
NP: noun phrase
VP: verb phrase
PP: prepositional phrase
43
Garden Path: Right Parse
[S [NP The horse [PAP raced past the barn]][VP fell]]
dt: determiner
n: noun
v: verb
p: preposition
S: sentence
NP: noun phrase
VP: verb phrase
PP: prepositional phrase
PAP: passive phrase
44
Semantics


the study of the meaning of words, constructions, and utterances
can be divided into two parts:
  lexical semantics
    meaning of words
  compositional semantics
    meaning of sentences and discourse
    the meaning of the whole often differs from the meaning of the parts.
46
Lexical Semantics

Meaning of individual words
  I went to the bank of Montreal and deposited $50.
  I went to the bank of the river and dangled my feet.
Word Sense Disambiguation
  Determining which sense of a word is used in a specific sentence
Semantic relations between words:
  hypernymy, hyponymy, synonymy, antonymy, meronymy, holonymy, polysemy, homonymy and homophony.
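
A very rough way to disambiguate "bank" in the two example sentences is to count how many context words overlap with a hand-written signature for each sense. The Python sketch below does this. It is a simplified overlap heuristic in the spirit of dictionary-based methods, not the method taught in the course, and the SENSES signatures are invented for this illustration.

```python
# Hypothetical sense signatures, written by hand for this illustration only.
SENSES = {
    "financial institution": {"money", "deposit", "deposited", "account", "loan"},
    "river side": {"river", "water", "shore", "stream"},
}


def disambiguate(sentence: str) -> str:
    """Pick the sense whose signature shares the most words with the sentence."""
    context = {word.strip(".,") for word in sentence.lower().split()}
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))


print(disambiguate("I went to the bank of Montreal and deposited $50."))
print(disambiguate("I went to the bank of the river and dangled my feet."))
```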
47
Meaning of sentences

The cat eats the mouse = The mouse is eaten by the cat.

Goal:
  build a representation of the meaning of the sentence
  attach semantic roles to constituents
Some characteristics of a sentence that influence semantic interpretation:
  Type      declarative, interrogative, imperative, exclamatory
  Polarity  positive, negative
  Tense     past, present, future
  Voice     active, passive
Some semantic roles (different from syntactic roles):
  Agent       the doer of a volitional act
  Patient     the thing that is affected by an act
  Recipient   the receiver of an object
  Instrument  the instrument used to perform an act
  Time        the time the act is performed
  Location    the location of an act or object
  …
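
The cat/mouse equivalence can be made concrete by mapping both sentences to the same role frame. In the Python sketch below the syntactic analysis (subject, verb lemma, other argument, voice) is supplied by hand. The function name role_frame and the dictionary layout are assumptions made for illustration.

```python
def role_frame(subject: str, verb_lemma: str, other: str, voice: str) -> dict:
    """Map a hand-analysed clause to semantic roles, depending on its voice."""
    if voice == "active":
        # Active: the syntactic subject is the agent.
        return {"predicate": verb_lemma, "AGENT": subject, "PATIENT": other}
    if voice == "passive":
        # Passive: the syntactic subject is the patient; the by-phrase is the agent.
        return {"predicate": verb_lemma, "AGENT": other, "PATIENT": subject}
    raise ValueError(f"unknown voice: {voice}")


active = role_frame("the cat", "eat", "the mouse", "active")
passive = role_frame("the mouse", "eat", "the cat", "passive")
print(active)
print(active == passive)
```

The two clauses have different syntactic subjects but produce the same frame. This is why semantic roles are described as different from syntactic roles.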
48
Semantic Roles

Ex:
  John[AGENT] hit Peter[PATIENT] with a ball[INSTRUMENT].
Ex:
  I ate spaghetti with meatballs[INGREDIENT_OF_SPAGHETTI]
  I ate spaghetti with salad[SIDE_DISH_OF_SPAGHETTI]
  I ate spaghetti with a fork[INSTRUMENT]
  I ate spaghetti with a friend[ACCOMPANIER_OF_EATING]
Important for machine translation…
  I[AGENT: PERSON_LACKING_SOMEONE] miss you[PATIENT: PERSON_MISSED]
  ?Je[PATIENT: PERSON_MISSED] te[AGENT: PERSON_LACKING_SOMEONE] manque.
  Tu[PATIENT: PERSON_MISSED] me[AGENT: PERSON_LACKING_SOMEONE] manques.
49
Pragmatics



goes beyond the study of the meaning of a sentence
tries to explain what the speaker is really expressing
understanding how people use language socially (ex. figures of speech, speech acts, discourse analysis, …)

Ex: Could you spare some change?
51
Discourse Analysis


In logic: A ∧ B ∧ C ≡ C ∧ B ∧ A
Not in NL:
  John visited Paris. He bought Mary some expensive cologne. Then he flew home. He went to Kmart. He bought some underwear.
  John visited Paris. Then he flew home. He went to Kmart. He bought Mary some expensive cologne. He bought some underwear.
NL text must be coherent
  ? Bill went to see his mother. The trunk is what makes the bonsai, it gives it both its grace and power.
52
Using world knowledge

Using our general knowledge of the world to interpret a sentence/discourse

Ex: A man was killed yesterday because a jealous husband returned home earlier than usual.

Ex: Silence of the lambs…
53