Part-Of-Speech Tagging
Creating Better Language Models
Idea: Increase the information about the
language contained in the model structure
• Lexical information: parts of speech, word
classes, semantics, …
• Structural information: phrase construction,
attachment, hierarchy, …
Parts of Speech
• Generally speaking, the “grammatical type”
of word:
– Verb, Noun, Adjective, Adverb, Article, …
• We can also include inflection:
– Verbs: Tense, number, …
– Nouns: Number, proper/common, …
– Adjectives: comparative, superlative, …
– …
• Most commonly used POS sets for English
have 50-80 different tags
BNC Parts of Speech
• Nouns:
NN0 Common noun, neutral for number (e.g. aircraft)
NN1 Singular common noun (e.g. pencil, goose, time)
NN2 Plural common noun (e.g. pencils, geese, times)
NP0 Proper noun (e.g. London, Michael, Mars, IBM)
• Pronouns:
PNI Indefinite pronoun (e.g. none, everything, one)
PNP Personal pronoun (e.g. I, you, them, ours)
PNQ Wh-pronoun (e.g. who, whoever, whom)
PNX Reflexive pronoun (e.g. myself, itself, ourselves)
• Verbs:
VVB finite base form of lexical verbs (e.g. forget, send, live, return)
VVD past tense form of lexical verbs (e.g. forgot, sent, lived)
VVG -ing form of lexical verbs (e.g. forgetting, sending, living)
VVI infinitive form of lexical verbs (e.g. forget, send, live, return)
VVN past participle form of lexical verbs (e.g. forgotten, sent, lived)
VVZ -s form of lexical verbs (e.g. forgets, sends, lives, returns)
VBB present tense of BE, except for is
…and so on: VBD VBG VBI VBN VBZ
VDB finite base form of DO: do
…and so on: VDD VDG VDI VDN VDZ
VHB finite base form of HAVE: have, 've
…and so on: VHD VHG VHI VHN VHZ
VM0 Modal auxiliary verb (e.g. will, would, can, could, 'll, 'd)
• Articles
AT0 Article (e.g. the, a, an, no)
DPS Possessive determiner (e.g. your, their, his)
DT0 General determiner (this, that)
DTQ Wh-determiner (e.g. which, what, whose, whichever)
EX0 Existential there, i.e. occurring in “there is…” or “there are…”
• Adjectives
AJ0 Adjective (general or positive) (e.g. good, old, beautiful)
AJC Comparative adjective (e.g. better, older)
AJS Superlative adjective (e.g. best, oldest)
• Adverbs
AV0 General adverb (e.g. often, well, longer (adv.), furthest)
AVP Adverb particle (e.g. up, off, out)
AVQ Wh-adverb (e.g. when, where, how, why, wherever)
• Miscellaneous:
CJC Coordinating conjunction (e.g. and, or, but)
CJS Subordinating conjunction (e.g. although, when)
CJT The subordinating conjunction that
CRD Cardinal number (e.g. one, 3, fifty-five, 3609)
ORD Ordinal numeral (e.g. first, sixth, 77th, last)
ITJ Interjection or other isolate (e.g. oh, yes, mhm, wow)
POS The possessive or genitive marker 's or '
TO0 Infinitive marker to
PUL Punctuation: left bracket - i.e. ( or [
PUN Punctuation: general separating mark - i.e. . , ! , : ; - or ?
PUQ Punctuation: quotation mark - i.e. ' or "
PUR Punctuation: right bracket - i.e. ) or ]
XX0 The negative particle not or n't
ZZ0 Alphabetical symbols (e.g. A, a, B, b, c, d)
Task: Part-Of-Speech Tagging
• Goal: Assign the correct part-of-speech to
each word (and punctuation) in a text.
• Example:
Two/CRD old/AJ0 men/NN2 bet/VVD on/PP0 the/AT0 game/NN1 ./PUN
• Learn a local model of POS dependencies,
usually from pretagged data
• No parsing!!!
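As a concrete illustration of the task (a sketch, not part of the slides), an off-the-shelf tagger can be run on the example sentence. The snippet below uses NLTK's default tagger, which outputs Penn Treebank tags rather than the BNC C5 tags listed above; resource names may vary slightly between NLTK versions.

    import nltk

    # One-time model downloads (assumes network access; resource names may
    # differ slightly across NLTK versions).
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    tokens = nltk.word_tokenize("Two old men bet on the game.")
    print(nltk.pos_tag(tokens))
    # Prints a list of (word, tag) pairs in the Penn Treebank tag set,
    # e.g. ('men', 'NNS'), ('the', 'DT'), ('game', 'NN').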
Hidden Markov Models
• Assume: POS generated as random process,
and each POS randomly generates a word
[Figure: toy HMM with states AT0, AJ0, NN1, NN2. Arcs carry transition probabilities (values such as 0.2, 0.3, 0.5, 0.9), and each state emits words such as “a”, “the”, “cat”, “cats”, “men”, “bet” with emission probabilities (e.g. 0.6, 0.4, 0.1).]
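The slide's point is that the HMM is a generative story: a tag sequence is drawn at random, and each tag then draws a word. A minimal sampling sketch follows; the arcs and probability values are assumptions loosely modelled on the figure, since the diagram itself is not fully recoverable.

    import random

    # m(t_prev -> t): transition probabilities (invented for illustration)
    TRANS = {
        "<s>": {"AT0": 0.6, "AJ0": 0.4},
        "AT0": {"AJ0": 0.3, "NN1": 0.5, "NN2": 0.2},
        "AJ0": {"NN1": 0.5, "NN2": 0.5},
        "NN1": {"</s>": 1.0},
        "NN2": {"</s>": 1.0},
    }
    # m(w | t): emission probabilities (invented for illustration)
    EMIT = {
        "AT0": {"the": 0.4, "a": 0.6},
        "AJ0": {"old": 1.0},
        "NN1": {"cat": 0.9, "bet": 0.1},
        "NN2": {"cats": 0.5, "men": 0.5},
    }

    def pick(dist):
        """Draw one key from a {key: probability} dict."""
        return random.choices(list(dist), weights=list(dist.values()))[0]

    def generate():
        """Sample a (word, tag) sequence from the HMM's generative process."""
        tag, pairs = "<s>", []
        while True:
            tag = pick(TRANS[tag])
            if tag == "</s>":
                return pairs
            pairs.append((pick(EMIT[tag]), tag))

    print(generate())   # e.g. [('a', 'AT0'), ('cat', 'NN1')]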
HMMs For Tagging
• First-order (bigram) Markov assumptions:
– Limited Horizon: Tag depends only on previous tag
P(ti+1 = tk | t1=tj1,…,ti=tji) = P(ti+1 = tk | ti = tj)
– Time invariance: No change over time
P(ti+1 = tk | ti = tj) = P(t2 = tk | t1 = tj) = P(tj → tk)
• Output probabilities:
– Probability of getting word wk for tag tj: P(wk | tj)
– Assumption:
Not dependent on other tags or words!
Combining Probabilities
• Probability of a tag sequence:
P(t1 t2 … tn) = P(t1) P(t1→t2) P(t2→t3) … P(tn-1→tn)
Assume t0 – starting tag:
= P(t0→t1) P(t1→t2) P(t2→t3) … P(tn-1→tn)
• Prob. of word sequence and tag sequence:
P(W,T) = ∏i P(ti-1→ti) P(wi | ti)
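A minimal sketch of this factorisation; the dictionaries, tag names and numbers below are invented for illustration and are not taken from the slides.

    # Toy transition probabilities m(t_prev -> t) and emission probabilities
    # m(w | t); all numbers are invented.
    TRANS = {
        ("<s>", "AT0"): 0.5, ("AT0", "AJ0"): 0.4, ("AJ0", "NN1"): 0.7,
    }
    EMIT = {
        ("AT0", "the"): 0.4, ("AJ0", "old"): 0.1, ("NN1", "cat"): 0.05,
    }

    def joint_prob(words, tags, trans=TRANS, emit=EMIT):
        """P(W, T) = prod_i P(t_{i-1} -> t_i) * P(w_i | t_i), with t_0 = <s>."""
        p, prev = 1.0, "<s>"
        for w, t in zip(words, tags):
            p *= trans.get((prev, t), 0.0) * emit.get((t, w), 0.0)
            prev = t
        return p

    print(joint_prob(["the", "old", "cat"], ["AT0", "AJ0", "NN1"]))
    # = 0.5*0.4 * 0.4*0.1 * 0.7*0.05 = 0.00028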
Training from Labeled Data
• Labeled training data = each word has a POS tag
• Thus:
PMLE(tj) = C(tj) / N
PMLE(tj→tk) = C(tj, tk) / C(tj)
PMLE(wk | tj) = C(tj:wk) / C(tj)
• Smoothing applies as usual
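A small sketch of these counts and maximum-likelihood estimates over a tagged corpus; the two-sentence toy corpus is an assumption for illustration, and no smoothing is applied.

    from collections import Counter

    # Each sentence is a list of (word, tag) pairs; <s> plays the role of t0.
    corpus = [
        [("the", "AT0"), ("cat", "NN1"), ("sleeps", "VVZ")],
        [("the", "AT0"), ("old", "AJ0"), ("cats", "NN2"), ("sleep", "VVB")],
    ]

    tag_count, trans_count, emit_count = Counter(), Counter(), Counter()
    for sent in corpus:                      # C(tj), C(tj, tk), C(tj:wk)
        prev = "<s>"
        for word, tag in sent:
            tag_count[tag] += 1
            trans_count[(prev, tag)] += 1
            emit_count[(tag, word)] += 1
            prev = tag
    tag_count["<s>"] = len(corpus)

    N = sum(tag_count.values())
    p_tag = {t: c / N for t, c in tag_count.items()}                       # PMLE(tj)
    p_trans = {tt: c / tag_count[tt[0]] for tt, c in trans_count.items()}  # PMLE(tj -> tk)
    p_emit = {tw: c / tag_count[tw[0]] for tw, c in emit_count.items()}    # PMLE(wk | tj)

    print(p_trans[("AT0", "NN1")])   # C(AT0, NN1) / C(AT0) = 1/2
    print(p_emit[("AT0", "the")])    # C(AT0:the) / C(AT0) = 2/2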
Three Basic Problems
• Compute the probability of a text:
Pm(W1,N)
• Compute the maximum probability tag sequence:
arg maxT1,N Pm(T1,N | W1,N)
• Compute the maximum likelihood model:
arg maxm Pm(W1,N)
Forward Algorithm
Define ak(i) = P(w1,k, tk=ti)
1. For i = 1 To Bt: a1(i) = m(t0→ti) m(w1 | ti)
2. For k = 2 To N; For j = 1 To Bt:
   ak(j) = [ Σi ak-1(i) m(ti→tj) ] m(wk | tj)
3. Then:
   Pm(W1,N) = Σi aN(i)
Complexity = O(Bt² N)
Forward Algorithm
[Figure: trellis for a 3-word text w1 w2 w3 over tags t1–t5. Each node tj at position k holds ak(j); arcs carry the transition probabilities m(ti→tj), starting from m(t0→ti), and summing the last column gives Pm(W1,3).]
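A minimal sketch of the forward computation; the function name, dictionary representation and the toy model at the bottom are assumptions for illustration.

    def forward_prob(words, tags, trans, emit):
        """Pm(W1,N) = sum_i aN(i), where ak(j) = P(w_{1,k}, t_k = t_j)."""
        # Initialisation: a1(i) = m(t0 -> ti) * m(w1 | ti)
        a = {t: trans.get(("<s>", t), 0.0) * emit.get((t, words[0]), 0.0)
             for t in tags}
        # Recursion: ak(j) = [ sum_i ak-1(i) m(ti -> tj) ] * m(wk | tj)
        for w in words[1:]:
            a = {tj: sum(a[ti] * trans.get((ti, tj), 0.0) for ti in tags)
                     * emit.get((tj, w), 0.0)
                 for tj in tags}
        # Termination: sum over the last column; overall cost is O(Bt^2 N)
        return sum(a.values())

    # Toy model (invented numbers, for illustration only)
    tags = ["AT0", "NN1"]
    trans = {("<s>", "AT0"): 0.9, ("<s>", "NN1"): 0.1,
             ("AT0", "AT0"): 0.2, ("AT0", "NN1"): 0.8,
             ("NN1", "AT0"): 0.7, ("NN1", "NN1"): 0.3}
    emit = {("AT0", "the"): 0.6, ("NN1", "cat"): 0.2}

    print(forward_prob(["the", "cat"], tags, trans, emit))   # 0.0864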
Backward Algorithm
Define bk(i) = P(wk+1,N | tk=ti)
--note the difference!
1. For i = 1 To Bt: bN(i) = 1
2. For k = N-1 To 1; For j = 1 To Bt:
   bk(j) = Σi m(tj→ti) m(wk+1 | ti) bk+1(i)
3. Then:
   Pm(W1,N) = Σi m(t0→ti) m(w1 | ti) b1(i)
Complexity = O(Bt² N)
Backward Algorithm
[Figure: trellis for a 3-word text w1 w2 w3 over tags t1–t5. Each node tj at position k holds bk(j); arcs carry the transition probabilities m(ti→tj), and Pm(W1,3) is obtained from the first column via Σi m(t0→ti) m(w1 | ti) b1(i).]
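A matching sketch of the backward computation, under the same assumed dictionary representation and toy model as the forward sketch above; on the same input it must return the same probability.

    def backward_prob(words, tags, trans, emit):
        """Pm(W1,N) via bk(i) = P(w_{k+1,N} | t_k = t_i)."""
        b = {t: 1.0 for t in tags}                 # Initialisation: bN(i) = 1
        # Recursion: bk(j) = sum_i m(tj -> ti) m(w_{k+1} | ti) b_{k+1}(i)
        for w in reversed(words[1:]):
            b = {tj: sum(trans.get((tj, ti), 0.0) * emit.get((ti, w), 0.0) * b[ti]
                         for ti in tags)
                 for tj in tags}
        # Termination: Pm(W1,N) = sum_i m(t0 -> ti) m(w1 | ti) b1(i)
        return sum(trans.get(("<s>", t), 0.0) * emit.get((t, words[0]), 0.0) * b[t]
                   for t in tags)

    # Same toy model as in the forward sketch (invented numbers)
    tags = ["AT0", "NN1"]
    trans = {("<s>", "AT0"): 0.9, ("<s>", "NN1"): 0.1,
             ("AT0", "AT0"): 0.2, ("AT0", "NN1"): 0.8,
             ("NN1", "AT0"): 0.7, ("NN1", "NN1"): 0.3}
    emit = {("AT0", "the"): 0.6, ("NN1", "cat"): 0.2}

    print(backward_prob(["the", "cat"], tags, trans, emit))  # 0.0864, as with forward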
Viterbi Tagging
• Most probable tag sequence given text:
T* = arg maxT Pm(T | W)
= arg maxT Pm(W | T) Pm(T) / Pm(W)
(Bayes’ Theorem)
= arg maxT Pm(W | T) Pm(T)
(W is constant for all T)
= arg maxT ∏i [ m(ti-1→ti) m(wi | ti) ]
= arg maxT Σi log[ m(ti-1→ti) m(wi | ti) ]
[Figure: Viterbi trellis for a 3-word text w1 w2 w3 over tags t1, t2, t3, starting from t0.
Each node holds the best log-probability of any tag sequence ending in that tag at that position:
        w1      w2      w3
t1     -3.0    -6.0    -7.3
t2     -3.4    -4.7   -10.3
t3     -2.7    -6.7    -9.3 ]
-log m (transitions)   t1     t2     t3
t0 →                   2.3    1.7    1
t1 →                   1.7    1      2.3
t2 →                   0.3    3.3    3.3
t3 →                   1.3    1.3    2.3
-log m (emissions)     w1     w2     w3
t1                     0.7    2.3    2.3
t2                     1.7    0.7    3.3
t3                     1.7    1.7    1.3
Viterbi Algorithm
1. D(0, START) = 0
2. for each tag t != START do: D(0, t) = -∞
3. for i ← 1 to N do:
   a. for each tag tj do:
      D(i, tj) ← maxk [ D(i-1, tk) + lm(wi|tj) + lm(tk→tj) ]
4. log P(W,T) = maxj D(N, tj)
where lm(wi|tj) = log m(wi|tj) and so forth
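A small sketch of this procedure in Python, run directly on the -log m tables of the worked example above; it additionally keeps back-pointers so the best tag sequence itself can be recovered, which the slide leaves implicit.

    # -log m values from the worked example (transition and emission tables)
    neg_log_trans = {
        ("t0", "t1"): 2.3, ("t0", "t2"): 1.7, ("t0", "t3"): 1.0,
        ("t1", "t1"): 1.7, ("t1", "t2"): 1.0, ("t1", "t3"): 2.3,
        ("t2", "t1"): 0.3, ("t2", "t2"): 3.3, ("t2", "t3"): 3.3,
        ("t3", "t1"): 1.3, ("t3", "t2"): 1.3, ("t3", "t3"): 2.3,
    }
    neg_log_emit = {
        ("t1", "w1"): 0.7, ("t1", "w2"): 2.3, ("t1", "w3"): 2.3,
        ("t2", "w1"): 1.7, ("t2", "w2"): 0.7, ("t2", "w3"): 3.3,
        ("t3", "w1"): 1.7, ("t3", "w2"): 1.7, ("t3", "w3"): 1.3,
    }
    tags = ["t1", "t2", "t3"]

    def viterbi(words):
        D = [{"t0": 0.0}]        # D(0, START) = 0; all other tags implicitly -inf
        back = [{}]
        for w in words:
            col, bp = {}, {}
            for tj in tags:
                # D(i, tj) = max_k D(i-1, tk) + lm(wi | tj) + lm(tk -> tj)
                best = max(D[-1], key=lambda tk: D[-1][tk] - neg_log_trans[(tk, tj)])
                col[tj] = D[-1][best] - neg_log_trans[(best, tj)] - neg_log_emit[(tj, w)]
                bp[tj] = best
            D.append(col)
            back.append(bp)
        # log P(W,T) = max_j D(N, tj); trace back-pointers to recover T*
        best_final = max(D[-1], key=D[-1].get)
        score = D[-1][best_final]
        seq = [best_final]
        for i in range(len(words), 1, -1):
            seq.append(back[i][seq[-1]])
        return list(reversed(seq)), score

    print(viterbi(["w1", "w2", "w3"]))
    # Best final score is D(3, t1) ≈ -7.3, matching the trellis above.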