POS TAGGING AND HMM
Text Mining Team
Adapted from Heng Ji

Outline
• POS Tagging and HMM
What is a Part of Speech (POS)?
• Generally speaking, word classes (= POS):
• Verb, Noun, Adjective, Adverb, Article, …
• We can also include inflection:
• Verbs: tense, number, …
• Nouns: number, proper/common, …
• Adjectives: comparative, superlative, …
• …
Parts of Speech
• 8 (ish) traditional parts of speech
• Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.
• Called: parts of speech, lexical categories, word classes, morphological classes, lexical tags...
• Lots of debate within linguistics about the number, nature, and universality of these
• We'll completely ignore this debate.
7 Traditional POS Categories

N    noun         chair, bandwidth, pacing
V    verb         study, debate, munch
ADJ  adjective    purple, tall, ridiculous
ADV  adverb       unfortunately, slowly
P    preposition  of, by, to
PRO  pronoun      I, me, mine
DET  determiner   the, a, that, those
POS Tagging
• The process of assigning a part-of-speech or lexical class marker to each word in a collection.

WORD    TAG
the     DET
koala   N
put     V
the     DET
keys    N
on      P
the     DET
table   N
Penn TreeBank POS Tag Set
• Penn Treebank: hand-annotated corpus of Wall Street Journal, 1M words
• 45 tags
• Some particularities:
• to/TO not disambiguated
• Auxiliaries and verbs not distinguished

Penn Treebank Tagset
Why is POS tagging useful?
• Speech synthesis:
• How to pronounce "lead"?
• INsult vs. inSULT
• OBject vs. obJECT
• OVERflow vs. overFLOW
• DIScount vs. disCOUNT
• CONtent vs. conTENT
• Stemming for information retrieval
• A search for "aardvarks" can also return "aardvark"
• Parsing, speech recognition, etc.
• Possessive pronouns (my, your, her) followed by nouns
• Personal pronouns (I, you, he) likely to be followed by verbs
• Need to know if a word is an N or V before you can parse
• Information extraction
• Finding names, relations, etc.
• Machine Translation
Open and Closed Classes
• Closed class: a small, fixed membership
• Prepositions: of, in, by, …
• Auxiliaries: may, can, will, had, been, …
• Pronouns: I, you, she, mine, his, them, …
• Usually function words (short common words which play a role in grammar)
• Open class: new ones can be created all the time
• English has 4: Nouns, Verbs, Adjectives, Adverbs
• Many languages have these 4, but not all!
Open Class Words
• Nouns
• Proper nouns (Boulder, Granby, Eli Manning)
• English capitalizes these.
• Common nouns (the rest).
• Count nouns and mass nouns
• Count: have plurals, get counted: goat/goats, one goat, two goats
• Mass: don't get counted (snow, salt, communism) (*two snows)
• Adverbs: tend to modify things
• Unfortunately, John walked home extremely slowly yesterday
• Directional/locative adverbs (here, home, downhill)
• Degree adverbs (extremely, very, somewhat)
• Manner adverbs (slowly, slinkily, delicately)
• Verbs
• In English, have morphological affixes (eat/eats/eaten)
Closed Class Words
Examples:
• prepositions: on, under, over, …
• particles: up, down, on, off, …
• determiners: a, an, the, …
• pronouns: she, who, I, …
• conjunctions: and, but, or, …
• auxiliary verbs: can, may, should, …
• numerals: one, two, three, third, …

Prepositions from CELEX

English Particles

Conjunctions
POS Tagging: Choosing a Tagset
• There are so many parts of speech, potential distinctions we can draw
• To do POS tagging, we need to choose a standard set of tags to work with
• Could pick very coarse tagsets
• N, V, Adj, Adv.
• More commonly used set is finer grained, the "Penn TreeBank tagset", 45 tags
• PRP$, WRB, WP$, VBG
• Even more fine-grained tagsets exist
Using the Penn Tagset
• The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
• Prepositions and subordinating conjunctions marked IN ("although/IN I/PRP…")
• Except the preposition/complementizer "to" is just marked "TO".
POS Tagging
• Words often have more than one POS: back
• The back door = JJ
• On my back = NN
• Win the voters back = RB
• Promised to back the bill = VB
• The POS tagging problem is to determine the POS tag for a particular instance of a word.
(These examples are from Dekang Lin.)
How Hard is POS Tagging? Measuring Ambiguity
Current Performance
• How many tags are correct?
• About 97% currently
• But baseline is already 90%
• Baseline algorithm (a sketch follows this slide):
• Tag every word with its most frequent tag
• Tag unknown words as nouns
• How well do people do?
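A minimal sketch of this baseline in Python (the function names and the corpus format, a list of sentences of (word, tag) pairs, are assumptions, not from the slides):

from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    # Remember the most frequent tag for every word in the training corpus.
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_baseline(words, most_frequent_tag):
    # Known words get their most frequent tag; unknown words are tagged as nouns (NN).
    return [(w, most_frequent_tag.get(w, "NN")) for w in words]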
Quick Test: Agreement?
• the students went to class
• plays well with others
• fruit flies like a banana

DT: the, this, that
NN: noun
VB: verb
P: preposition
ADV: adverb
Quick Test
• the students went to class
  DT  NN       VB   P  NN
• plays well with others
  VB    ADV  P    NN
  NN    NN   P    DT
• fruit flies like a  banana
  NN    NN    VB   DT NN
  NN    VB    P    DT NN
  NN    NN    P    DT NN
  NN    VB    VB   DT NN
How to do it? History
• 1960s: Brown Corpus created (EN-US, 1 million words); Greene and Rubin, rule based: 70%
• 1970s: Brown Corpus tagged; LOB Corpus created (EN-UK, 1 million words); HMM tagging (CLAWS): 93%-95%
• 1980s: LOB Corpus tagged; DeRose/Church, efficient HMM handling sparse data: 95%+; POS tagging separated from other NLP
• 1990s: Penn Treebank Corpus (WSJ, 4.5M); transformation-based tagging (Eric Brill), rule based: 95%+; tree-based statistics (Helmut Schmid), rule based: 96%+; trigram tagger (Kempe): 96%+; neural network: 96%+; British National Corpus (tagged by CLAWS)
• 2000s: combined methods: 98%+
Two Methods for POS Tagging
1. Rule-based tagging (ENGTWOL)
2. Stochastic
• Probabilistic sequence models
• HMM (Hidden Markov Model) tagging
• MEMMs (Maximum Entropy Markov Models)
Rule-Based Tagging
• Start with a dictionary
• Assign all possible tags to words from the dictionary
• Write rules by hand to selectively remove tags
• Leaving the correct tag for each word.
Rule-based taggers
• Early POS taggers all hand-coded
• Most of these (Harris, 1962; Greene and Rubin, 1971)
and the best of the recent ones, ENGTWOL (Voutilainen,
1995) based on a two-stage architecture
• Stage 1: look up word in lexicon to give list of potential POSs
• Stage 2: Apply rules which certify or disallow tag sequences
• Rules originally handwritten; more recently Machine
Learning methods can be used
Start With a Dictionary
• she: PRP
• promised: VBN, VBD
• to: TO
• back: VB, JJ, RB, NN
• the: DT
• bill: NN, VB
• Etc… for the ~100,000 words of English with more than 1 tag
Assign Every Possible Tag

                         NN
                         RB
         VBN             JJ           VB
PRP      VBD       TO    VB     DT    NN
She      promised  to    back   the   bill
Write Rules to Eliminate Tags
Eliminate VBN if VBD is an option when VBN|VBD follows "<start> PRP"

                         NN
                         RB
                         JJ           VB
PRP      VBD       TO    VB     DT    NN
She      promised  to    back   the   bill
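A toy sketch of the two-stage rule-based idea in Python, using the dictionary and the single elimination rule from this example (everything here is illustrative, not the ENGTWOL system):

LEXICON = {
    "she": {"PRP"}, "promised": {"VBN", "VBD"}, "to": {"TO"},
    "back": {"VB", "JJ", "RB", "NN"}, "the": {"DT"}, "bill": {"NN", "VB"},
}

def rule_tag(words):
    # Stage 1: assign every tag the dictionary allows.
    candidates = [set(LEXICON[w.lower()]) for w in words]
    # Stage 2: eliminate VBN if VBD is an option when VBN|VBD follows <start> PRP.
    if len(words) > 1 and candidates[0] == {"PRP"} and {"VBN", "VBD"} <= candidates[1]:
        candidates[1].discard("VBN")
    return candidates

print(rule_tag("She promised to back the bill".split()))
# "promised" is narrowed to {'VBD'}; the other words keep all their dictionary tags.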
POS tagging

Training data (hand-tagged corpus):
The/DT involvement/NN of/IN ion/NN channels/NNS in/IN B/NN and/CC T/NN lymphocyte/NN activation/NN is/VBZ supported/VBN by/IN many/JJ reports/NNS of/IN changes/NNS in/IN ion/NN fluxes/NNS and/CC membrane/NN …
…

The tagged corpus is fed to a Machine Learning Algorithm; the learned tagger is then applied to unseen text:
We demonstrate that … → We/PRP demonstrate/VBP that/IN …
Goal of POS Tagging
• We want the best set of tags for a sequence of words (a sentence)
• W: a sequence of words
• T: a sequence of tags
• Our goal:

\hat{T} = \arg\max_T P(T \mid W)

• Example: P((NN NN P DET ADJ NN) | (heat oil in a large pot))
But, the Sparse Data Problem …
• Rich models often require vast amounts of data
• Count up instances of the string "heat oil in a large pot" in the training corpus, and pick the most common tag assignment to the string.
• Too many possible combinations
POS Tagging as Sequence Classification
• We are given a sentence (an “observation” or “sequence of
observations”)
• Secretariat is expected to race tomorrow
• What is the best sequence of tags that corresponds to this
sequence of observations?
• Probabilistic view:
• Consider all possible sequences of tags
• Out of this universe of sequences, choose the tag sequence which is
most probable given the observation sequence of n words w1…wn.
Getting to HMMs
• We want, out of all sequences of n tags t1…tn, the single tag sequence such that P(t1…tn|w1…wn) is highest:

\hat{t}_1^n = \arg\max_{t_1 \ldots t_n} P(t_1 \ldots t_n \mid w_1 \ldots w_n)

• Hat ^ means "our estimate of the best one"
• \arg\max_x f(x) means "the x such that f(x) is maximized"
Getting to HMMs
• This equation is guaranteed to give us the best tag sequence
• But how to make it operational? How to compute this value?
• Intuition of Bayesian classification:
• Use Bayes rule to transform this equation into a set of other probabilities that are easier to compute

Reminder: Apply Bayes' Theorem (1763)

P(T \mid W) = \frac{P(W \mid T) \, P(T)}{P(W)}

posterior = likelihood × prior / marginal likelihood. Our goal: to maximize the posterior!

Reverend Thomas Bayes, Presbyterian minister (1702-1761)
How to Count

\hat{T} = \arg\max_T P(T \mid W)
        = \arg\max_T \frac{P(W \mid T) \, P(T)}{P(W)}
        = \arg\max_T P(W \mid T) \, P(T)

(P(W) is constant for a given word sequence, so it can be dropped from the argmax.)
• P(W|T) and P(T) can be counted from a large hand-tagged corpus and smoothed to get rid of the zeroes
Count P(W|T) and P(T)
Assume each word in the sequence depends only on its corresponding tag:

P(W \mid T) = \prod_{i=1}^{n} P(w_i \mid t_i)
Count P(T)
By the chain rule (each tag is conditioned on its full history):

P(t_1, \ldots, t_n) = P(t_1) \cdot P(t_2 \mid t_1) \cdot P(t_3 \mid t_1 t_2) \cdot \ldots \cdot P(t_n \mid t_1, \ldots, t_{n-1})

• Make a Markov assumption and use N-grams over tags ...
• P(T) is a product of the probability of N-grams that make it up

P(t_1, \ldots, t_n) \approx P(t_1) \prod_{i=2}^{n} P(t_i \mid t_{i-1})
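Both quantities can be estimated by simple counting. A minimal sketch (the function name and corpus format are assumptions; real taggers smooth these estimates):

from collections import Counter

def estimate_hmm(tagged_sents):
    # Count emissions (tag, word) and tag bigrams (prev_tag, tag).
    emit, trans = Counter(), Counter()
    for sent in tagged_sents:
        prev = "<s>"  # sentence-initial pseudo-tag
        for word, tag in sent:
            emit[(tag, word)] += 1
            trans[(prev, tag)] += 1
            prev = tag
    # Turn counts into relative frequencies P(w|t) and P(t|t_prev).
    emit_tot, trans_tot = Counter(), Counter()
    for (t, _), c in emit.items():
        emit_tot[t] += c
    for (p, _), c in trans.items():
        trans_tot[p] += c
    p_emit = {(t, w): c / emit_tot[t] for (t, w), c in emit.items()}
    p_trans = {(p, t): c / trans_tot[p] for (p, t), c in trans.items()}
    return p_emit, p_trans  # unsmoothed MLE; smoothing removes the zeroes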
Part-of-speech tagging with Hidden Markov Models

P(t_1 \ldots t_n \mid w_1 \ldots w_n) = \frac{P(w_1 \ldots w_n \mid t_1 \ldots t_n) \, P(t_1 \ldots t_n)}{P(w_1 \ldots w_n)}
  \propto P(w_1 \ldots w_n \mid t_1 \ldots t_n) \, P(t_1 \ldots t_n)
  \approx \prod_{i=1}^{n} P(w_i \mid t_i) \, P(t_i \mid t_{i-1})

where t are tags, w are words, P(w_i | t_i) is the output probability, and P(t_i | t_{i-1}) is the transition probability.
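Read concretely, the factorization says the joint probability is one emission term and one transition term per word. A sketch that scores a single candidate tag sequence under tables like those returned by estimate_hmm above (the name hmm_score is my own):

def hmm_score(words, tags, p_emit, p_trans):
    # P(words, tags) ~ product over i of P(w_i | t_i) * P(t_i | t_{i-1})
    prob, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        prob *= p_emit.get((t, w), 0.0) * p_trans.get((prev, t), 0.0)
        prev = t
    return prob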
Analyzing "Fish sleep."

A Simple POS HMM
[Figure: states start, noun, verb, end, with transition probabilities:
start → noun 0.8, start → verb 0.2;
noun → noun 0.1, noun → verb 0.8, noun → end 0.1;
verb → noun 0.2, verb → verb 0.1, verb → end 0.7]
Word Emission Probabilities P(word | state)
• A two-word language: "fish" and "sleep"
• Suppose in our training corpus,
• "fish" appears 8 times as a noun and 5 times as a verb
• "sleep" appears twice as a noun and 5 times as a verb
• Emission probabilities:
• Noun: P(fish | noun) = 0.8, P(sleep | noun) = 0.2
• Verb: P(fish | verb) = 0.5, P(sleep | verb) = 0.5
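The toy model is small enough to write down directly; a sketch in Python (the dictionary encoding and the names TRANS/EMIT are my own):

# Transition and emission tables for the fish/sleep HMM above.
TRANS = {
    ("start", "noun"): 0.8, ("start", "verb"): 0.2,
    ("noun", "noun"): 0.1, ("noun", "verb"): 0.8, ("noun", "end"): 0.1,
    ("verb", "noun"): 0.2, ("verb", "verb"): 0.1, ("verb", "end"): 0.7,
}
EMIT = {
    ("noun", "fish"): 0.8, ("noun", "sleep"): 0.2,
    ("verb", "fish"): 0.5, ("verb", "sleep"): 0.5,
}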
Viterbi Probabilities
Trellis: one row per state (start, verb, noun, end), one column per step (0-3).

Initialization (step 0):
         0      1      2      3
start    1
verb     0
noun     0
end      0
Token 1: fish
         0      1
start    1      0
verb     0      .2 × .5 = .1
noun     0      .8 × .8 = .64
end      0      0
Token 2: sleep (if 'fish' is a verb)
verb: .1 × .1 × .5 = .005
noun: .1 × .2 × .2 = .004
Token 2: sleep (if 'fish' is a noun)
verb: .64 × .8 × .5 = .256
noun: .64 × .1 × .2 = .0128
Token 2: take the maximum, set back pointers
verb: max(.005, .256)  = .256  (back pointer: noun)
noun: max(.004, .0128) = .0128 (back pointer: noun)

         0      1      2
start    1      0      0
verb     0      .1     .256
noun     0      .64    .0128
end      0      0      -
Token 3: end — take the maximum, set back pointers
end: max(.256 × .7, .0128 × .1) = .1792 (back pointer: verb)
Decode (follow the back pointers from end):
fish = noun, sleep = verb

         0      1      2      3
start    1      0      0      0
verb     0      .1     .256   -
noun     0      .64    .0128  -
end      0      0      -      .1792
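The trellis computation above is exactly the Viterbi algorithm. A compact sketch over the TRANS/EMIT tables defined earlier (the function is a hypothetical helper, assuming those names):

def viterbi(words, states=("noun", "verb")):
    # best maps each state to (probability, tag path) of its best partial parse.
    best = {"start": (1.0, [])}
    for w in words:
        step = {}
        for s in states:
            # Pick the best predecessor for state s at this time step.
            step[s] = max(
                (p * TRANS.get((prev, s), 0.0) * EMIT.get((s, w), 0.0), path + [s])
                for prev, (p, path) in best.items()
            )
        best = step
    # Close off with the transition into the end state.
    return max((p * TRANS.get((prev, "end"), 0.0), path)
               for prev, (p, path) in best.items())

prob, tags = viterbi(["fish", "sleep"])
# prob = 0.1792 (= .256 × .7) and tags = ['noun', 'verb'], matching the trellis.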
Markov Chain for a Simple Name Tagger
[Figure: a Markov chain with states START, PER, LOC, X, END. Transition probabilities label the edges; emission probabilities label each state:
PER: George 0.3, W. 0.3, Bush 0.3, Iraq 0.1
LOC: Iraq 0.8, George 0.2
X: discussed 0.7, W. 0.3
END: $ 1.0]
Exercise
• Tag names in the following sentence:
• George W. Bush discussed Iraq.
POS taggers
• Brill’s tagger
• http://www.cs.jhu.edu/~brill/
• TnT tagger
• http://www.coli.uni-saarland.de/~thorsten/tnt/
• Stanford tagger
• http://nlp.stanford.edu/software/tagger.shtml
• SVMTool
• http://www.lsi.upc.es/~nlp/SVMTool/
• GENIA tagger
• http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/
• More complete list at:
http://www-nlp.stanford.edu/links/statnlp.html#Taggers
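For a quick start, NLTK also ships a ready-made tagger that uses the Penn Treebank tagset (this example, not from the list above, assumes NLTK is installed and its data downloaded):

import nltk
# One-time downloads: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
tokens = nltk.word_tokenize("Secretariat is expected to race tomorrow")
print(nltk.pos_tag(tokens))
# e.g. [('Secretariat', 'NNP'), ('is', 'VBZ'), ('expected', 'VBN'),
#       ('to', 'TO'), ('race', 'VB'), ('tomorrow', 'NN')]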