KEY COMPONENTS OVERVIEW
POS TAGGING
Heng Ji
[email protected]
September 9, 2016
Key NLP Components
• Baseline Search
• Math basics, Information Retrieval
• Question Understanding
• Lexical Analysis, Part-of-Speech Tagging, Parsing
• Deep Document Understanding
• Syntactic Parsing
• Semantic Role Labeling
• Dependency Parsing
• Name Tagging
• Coreference Resolution
• Relation Extraction
• Temporal Information Extraction
• Event Extraction
• Answer Ranking, Confidence Estimation
• Knowledge Base Construction, Population and Utilization
What We Have as Input
• Question: when and where did Billy Mays die?
• Source Collection: millions of news, web blogs, discussion
forums, tweets, Wikipedia, …
• Question Parsing:
• [Billy Mays, date-of-death]
• [Billy Mays, place-of-death]
Question Analysis
Part-of-Speech Tagging and
Syntactic Parsing
[Parse tree for "School of Theatre and Dance presents Wonderful Town": S splits into an NP subject "School of Theatre and Dance" (NN School, IN of, coordinated NPs "Theatre" and "Dance" joined by CC "and") and a VP "presents Wonderful Town" (VBZ presents, NP with JJ Wonderful and NN Town).]
Semantic Role Labeling: Adding
Semantics into Trees
[Parse tree with semantic role labels for "The Supreme Court gave states working leeway": "gave" (VBD) is the predicate, "The Supreme Court" (DT JJ NN) is labeled NP/ARG0, and the remaining NPs ("states"/NNS and "working leeway"/VBG NN) carry the ARG2 and ARG1 labels; "working" is itself marked as a predicate, with "leeway" as its ARG0.]
Core Arguments
• Arg0 = agent
• Arg1 = direct object / theme / patient
• Arg2 = indirect object / benefactive / instrument / attribute / end state
• Arg3 = start point / benefactive / instrument / attribute
• Arg4 = end point
Dependency Parsing
They hid the letter on the shelf
Dependency Relations
Name Tagging
• Since its inception in 2001, <name ID=“1” type=“organization”>Red</name> has caused a stir in <name ID=“2” type=“location”>Northeast Ohio</name> by stretching the boundaries of classical by adding multimedia elements to performances and looking beyond the expected canon of composers.
• Under the baton of <name ID=“3” type=“organization”>
Red</name> Artistic Director <name ID=“4”
type=“person”>Jonathan Sheffer</name>, <name ID=“5”
type=“organization”>Red</name> makes its debut
appearance at <name ID=“6” type=“organization”> Kent
State University</name> on March 7 at 7:30 p.m.
Coreference
• But the little prince could not restrain admiration:
• "Oh! How beautiful you are!"
• "Am I not?" the flower responded, sweetly. "And I was born
at the same moment as the sun . . ."
• The little prince could guess easily enough that she was not
any too modest--but how moving--and exciting--she was!
• "I think it is time for breakfast," she added an instant later.
"If you would have the kindness to think of my needs--"
• And the little prince, completely abashed, went to look for a
sprinkling-can of fresh water. So, he tended the flower.
Relation Extraction
relation: a semantic relationship between two entities
ACE relation type (example):
• Agent-Artifact: Rubin Military Design, the makers of the Kursk
• Discourse: each of whom
• Employment/Membership: Mr. Smith, a senior programmer at Microsoft
• Place-Affiliation: Salzburg Red Cross officials
• Person-Social: relatives of the dead
• Physical: a town some 50 miles south of Salzburg
• Other-Affiliation: Republican senators
Temporal Information Extraction
• In 1975, after being fired from Columbia amid allegations that he
used company funds to pay for his son's bar mitzvah, Davis founded
Arista
• Is ‘1975’ related to the employee_of relation between Davis
and Arista?
• If so, does it indicate START, END, HOLDS… ?
• Each classification instance represents a temporal expression in the
context of the entity and slot value.
• We consider the following classes (a toy classification sketch follows the list):
  • START: Rob joined Microsoft in 1999.
  • END: Rob left Microsoft in 1999.
  • HOLDS: In 1999 Rob was still working for Microsoft.
  • RANGE: Rob has worked for Microsoft for the last ten years.
  • NONE: Last Sunday Rob’s friend joined Microsoft.
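To make the setup concrete, here is a toy sketch of how such classification instances might be represented, with a trivial keyword heuristic standing in for a trained classifier. The class names are the ones above, but everything else (the TemporalInstance fields, the classify rules) is purely illustrative and not taken from any actual system.

```python
from dataclasses import dataclass

@dataclass
class TemporalInstance:
    entity: str        # e.g. "Rob"
    slot_value: str    # e.g. "Microsoft" for employee_of
    time_expr: str     # e.g. "1999"
    context: str       # sentence containing entity, slot value, and time expression

def classify(inst: TemporalInstance) -> str:
    """Toy stand-in for a trained classifier over START/END/HOLDS/RANGE/NONE."""
    text = inst.context.lower()
    if inst.entity.lower() not in text or inst.slot_value.lower() not in text:
        return "NONE"
    if "joined" in text or "founded" in text:
        return "START"
    if "left" in text or "fired" in text:
        return "END"
    if "for the last" in text:
        return "RANGE"
    return "HOLDS"

print(classify(TemporalInstance("Rob", "Microsoft", "1999",
                                "Rob joined Microsoft in 1999.")))  # START
```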
Event Extraction
• An event is a specific occurrence that implies a change of state
• event trigger: the main word which most clearly expresses an event occurrence
• event arguments: the mentions that are involved in an event (participants)
• event mention: a phrase or sentence within which an event is described,
including trigger and arguments
• ACE defined 8 types of events, with 33 subtypes
ACE event type/subtype (example event mention):
• Life/Die: Kurt Schork died in Sierra Leone yesterday (trigger: "died"; argument: "Kurt Schork", role = victim)
• Transaction/Transfer: GM sold the company in Nov 1998 to LLC
• Movement/Transport: Homeless people have been moved to schools
• Business/Start-Org: Schweitzer founded a hospital in 1913
• Conflict/Attack: the attack on Gaza killed 13
• Contact/Meet: Arafat’s cabinet met for 4 hours
• Personnel/Start-Position: She later recruited the nursing student
• Justice/Arrest: Faison was wrongly arrested on suspicion of murder
Putting It Together: Candidate Answer Extraction
 Mays, 50, had died in his sleep at his Tampa home the morning of June 28.
[Figure: annotated dependency graph over this sentence. "died" is a Death trigger; "Mays" (PER.Individual, NAM) matches the query entity Billy Mays; "50" (NUM) fills the per:age slot; "his" (PER.Individual, PRO) is coreferent with Mays; "home" (FAC.Building-Grounds, NOM) is located_in Tampa; "June 28" is the date expression. Dependency edges include nsubj, aux, amod, nn, poss, prep_at, prep_in, and prep_of.]
Ranking Answers/Confidence Estimation
• Problems:
• different information sources may generate claims with varying trustworthiness
• various systems may generate erroneous, conflicting,
redundant, complementary, ambiguously worded, or interdependent claims from the same set of documents
Ranking Answers
Hard Constraints
Used to capture the deep syntactic and semantic knowledge
that is likely to pertain to the propositional content of claims.
Node Constraints
 e.g., Entity type, subtype and mention type
Path Constraints
 Trigger phrases, relations and events, path length
 Position of a particular node/edge type in the path
Interdependent Claims
 Conflicting slot fillers
 Inter-dependent slot types
Ranking Answers
• Soft Features:
• Cleanness
• Informativeness
• Local Knowledge Graph
• Voting
NLP+Knowledge Base
• Knowledge Base (KB)
  • Attributes (a.k.a. “slots”) derived from Wikipedia infoboxes are used to create the reference KB
• Source Collection
  • A large corpus of unstructured texts
Knowledge Base Linking (Wikification)
• Example query: “James Parsons”; a mention is linked to the corresponding KB entry, or to NIL if the KB contains no entry for it.
Knowledge Base Population (Slot Filling)
<query id="SF114">
<name>Jim Parsons</name>
<docid>eng-WL-11-174592-12943233</docid>
<enttype>PER</enttype>
<nodeid>E0300113</nodeid>
<ignore>per:date_of_birth
per:age per:country_of_birth
per:city_of_birth</ignore>
</query>
School Attended: University of Houston
KB Slots
Person
per:alternate_names
per:date_of_birth
per:age
per:country_of_birth
per:stateorprovince_of_birth
per:city_of_birth
per:origin
per:date_of_death
per:country_of_death
per:stateorprovince_of_death
per:city_of_death
per:cause_of_death
per:countries_of_residence
per:stateorprovinces_of_residence
per:cities_of_residence
per:schools_attended
per:title
per:member_of
per:employee_of
per:religion
per:spouse
per:children
per:parents
per:siblings
per:other_family
per:charges
Organization
org:alternate_names
org:political/religious_affiliation
org:top_members/employees
org:number_of_employees/members
org:members
org:member_of
org:subsidiaries
org:parents
org:founded_by
org:founded
org:dissolved
org:country_of_headquarters
org:stateorprovince_of_headquarters
org:city_of_headquarters
org:shareholders
org:website
Text to Speech Synthesis
• ATT:
• http://www.research.att.com/~ttsweb/tts/demo.php
• IBM
• http://www-306.ibm.com/software/pervasive/tech/demos/tts.shtml
• Cepstral
• http://www.cepstral.com/cgi-bin/demos/general
• Rhetorical (= Scansoft)
• http://www.rhetorical.com/cgi-bin/demo.cgi
• Festival
• http://www-2.cs.cmu.edu/~awb/festival_demos/index.html
Information Retrieval
[Figure: retrieval pipeline]
① Relevant Document Set
② Sentence Set [Tree representation]
③ Extracted Paths
Question Answering
Outline
• Course Overview
• Introduction to NLP
• Introduction to Watson
• Watson in NLP View
Machine Learning for NLP
• Why NLP is Hard
Models and Algorithms
• By models we mean the formalisms that are used to
capture the various kinds of linguistic knowledge we need.
• Algorithms are then used to manipulate the knowledge
representations needed to tackle the task at hand.
Some Early NLP History
• 1950s:
• Foundational work: automata, information theory, etc.
• First speech systems
• Machine translation (MT) hugely funded by military (imagine that)
• Toy models: MT using basically word-substitution
• Optimism!
• 1960s and 1970s: NLP Winter
• Bar-Hillel (FAHQT) and ALPAC reports kill MT
• Work shifts to deeper models, syntax
• … but toy domains / grammars (SHRDLU, LUNAR)
• 1980s/1990s: The Empirical Revolution
• Expectations get reset
• Corpus-based methods become central
• Deep analysis often traded for robust and simple approximations
• Evaluate everything
Two Generations of NLP
• Hand-crafted Systems – Knowledge Engineering
[1950s– ]
• Rules written by hand; adjusted by error analysis
• Require experts who understand both the systems
and domain
• Iterative guess-test-tweak-repeat cycle
• Automatic, Trainable (Machine Learning) System
[1985– ]
• The tasks are modeled in a statistical way
• More robust techniques based on rich annotations
• Perform better than rules (Parsing 90% vs. 75% accuracy)
Corpora
• A corpus is a collection of text
• Often annotated in some way
• Sometimes just lots of text
• Balanced vs. uniform corpora
• Examples
• Newswire collections: 500M+ words
• Brown corpus: 1M words of tagged
“balanced” text
• Penn Treebank: 1M words of parsed
WSJ
• Canadian Hansards: 10M+ words of
aligned French / English sentences
• The Web: billions of words of who
knows what
Resources: Existing Corpora
• Brown corpus, LOB Corpus, British National Corpus
• Bank of English
• Wall Street Journal, Penn Tree Bank, Nombank, Propbank, BNC, ANC, ICE, WBE, Reuters Corpus
• Canadian Hansard: parallel corpus English-French
• York-Helsinki Parsed Corpus of Old Poetry
• Tiger corpus – German
• CORII/CODIS – contemporary written Italian
• MULTEX – 1984 and The Republic in many languages
• MapTask
Paradigms
• In particular..
• State-space search
• To manage the problem of making choices during processing
when we lack the information needed to make the right choice
• Dynamic programming
• To avoid having to redo work during the course of a state-space
search
• CKY, Earley, Minimum Edit Distance, Viterbi, Baum-Welch (a minimal edit-distance sketch follows this list)
• Classifiers
• Machine learning based classifiers that are trained to make
decisions based on features extracted from the local context
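As a concrete illustration of the dynamic-programming idea (tabulating subproblem results so that work is never redone), here is a minimal minimum edit distance sketch in Python. The unit costs for insertion, deletion, and substitution are an assumption for the example, not something specified in the slides.

```python
def min_edit_distance(source, target):
    """Levenshtein distance via a dynamic-programming table D[i][j]."""
    n, m = len(source), len(target)
    # D[i][j] = cost of turning source[:i] into target[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i                      # deletions only
    for j in range(1, m + 1):
        D[0][j] = j                      # insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # delete
                          D[i][j - 1] + 1,        # insert
                          D[i - 1][j - 1] + sub)  # substitute / copy
    return D[n][m]

print(min_edit_distance("intention", "execution"))  # 5 with these unit costs
```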
Information Units of Interest - Examples
• Explicit units:
• Documents
• Lexical units: words, terms (surface/base form)
• Implicit (hidden) units:
• Word senses, name types
• Document categories
• Lexical syntactic units: part of speech tags
• Syntactic relationships between words – parsing
• Semantic relationships
Data and Representations
• Frequencies of units
• Co-occurrence frequencies
• Between all relevant types of units (term-doc, term-term, termcategory, sense-term, etc.)
• Different representations and modeling
• Sequences
• Feature sets/vectors (sparse)
Learning Methods for NLP
• Supervised: identify hidden units (concepts) of explicit
units
• Syntactic analysis, word sense disambiguation, name
classification, relations, categorization, …
• Trained from labeled data
• Unsupervised: identify relationships and properties
of explicit units (terms, docs)
• Association, topicality, similarity, clustering
• Without labeled data
• Semi-supervised: Combinations
Avoiding/Reducing Manual Labeling
• Basic supervised setting – examples are annotated
manually by labels (sense, text category, part of speech)
• Settings in which labeled data can be obtained without
manual annotation:
• Anaphora, target word selection
The system displays the file on the monitor and prints it.
• Unsupervised/Semi-supervised Learning approaches
  • Sometimes referred to as unsupervised learning, though it actually addresses a supervised task of identifying an externally imposed class (“unsupervised” training)
Outline
• POS Tagging and HMM
• Formal Grammars
• Context-free grammar
• Grammars for English
• Treebanks
• Parsing and CKY Algorithm
What is Part-of-Speech (POS)
• Generally speaking, Word Classes (=POS) :
• Verb, Noun, Adjective, Adverb, Article, …
• We can also include inflection:
• Verbs: Tense, number, …
• Nouns: Number, proper/common, …
• Adjectives: comparative, superlative, …
• …
Parts of Speech
• 8 (ish) traditional parts of speech
• Noun, verb, adjective, preposition, adverb, article, interjection,
pronoun, conjunction, etc
• Called: parts-of-speech, lexical categories, word classes,
morphological classes, lexical tags...
• Lots of debate within linguistics about the number, nature, and
universality of these
• We’ll completely ignore this debate.
7 Traditional POS Categories
• N (noun): chair, bandwidth, pacing
• V (verb): study, debate, munch
• ADJ (adjective): purple, tall, ridiculous
• ADV (adverb): unfortunately, slowly
• P (preposition): of, by, to
• PRO (pronoun): I, me, mine
• DET (determiner): the, a, that, those
POS Tagging
• The process of assigning a part-of-speech or lexical
class marker to each word in a collection.
WORD     tag
the      DET
koala    N
put      V
the      DET
keys     N
on       P
the      DET
table    N
Penn TreeBank POS Tag Set
• Penn Treebank: hand-annotated corpus of Wall Street
Journal, 1M words
• 46 tags
• Some particularities:
• to /TO not disambiguated
• Auxiliaries and verbs not distinguished
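For hands-on experimentation, an off-the-shelf tagger can produce Penn Treebank style tags. The sketch below uses NLTK's default tagger, which is not one of the systems discussed in these slides, and assumes the named NLTK resources are available (resource names can vary across NLTK versions).

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "The grand jury commented on a number of other topics."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))
# Expected output along the lines of:
# [('The', 'DT'), ('grand', 'JJ'), ('jury', 'NN'), ('commented', 'VBD'), ...]
```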
Penn Treebank Tagset
Why POS tagging is useful?
• Speech synthesis:
• How to pronounce “lead”?
• INsult vs. inSULT
• OBject vs. obJECT
• OVERflow vs. overFLOW
• DIScount vs. disCOUNT
• CONtent vs. conTENT
• Stemming for information retrieval
• A search for “aardvarks” can also return “aardvark”
• Parsing, speech recognition, etc.
• Possessive pronouns (my, your, her) followed by nouns
• Personal pronouns (I, you, he) likely to be followed by verbs
• Need to know if a word is an N or V before you can parse
• Information extraction
• Finding names, relations, etc.
• Machine Translation
Open and Closed Classes
• Closed class: a small fixed membership
• Prepositions: of, in, by, …
• Auxiliaries: may, can, will, had, been, …
• Pronouns: I, you, she, mine, his, them, …
• Usually function words (short common words which play a
role in grammar)
• Open class: new ones can be created all the time
• English has 4: Nouns, Verbs, Adjectives, Adverbs
• Many languages have these 4, but not all!
Open Class Words
• Nouns
• Proper nouns (Boulder, Granby, Eli Manning)
• English capitalizes these.
• Common nouns (the rest).
• Count nouns and mass nouns
• Count: have plurals, get counted: goat/goats, one goat, two goats
• Mass: don’t get counted (snow, salt, communism) (*two snows)
• Adverbs: tend to modify things
• Unfortunately, John walked home extremely slowly yesterday
• Directional/locative adverbs (here, home, downhill)
• Degree adverbs (extremely, very, somewhat)
• Manner adverbs (slowly, slinkily, delicately)
• Verbs
• In English, have morphological affixes (eat/eats/eaten)
Closed Class Words
Examples:
• prepositions: on, under, over, …
• particles: up, down, on, off, …
• determiners: a, an, the, …
• pronouns: she, who, I, …
• conjunctions: and, but, or, …
• auxiliary verbs: can, may, should, …
• numerals: one, two, three, third, …
Prepositions from CELEX
English Particles
Conjunctions
POS Tagging
Choosing a Tagset
• There are so many parts of speech, potential distinctions we can draw
• To do POS tagging, we need to choose a standard set of tags to work with
• Could pick very coarse tagsets
  • N, V, Adj, Adv.
• More commonly used set is finer grained, the “Penn TreeBank tagset”, 45 tags
  • PRP$, WRB, WP$, VBG
• Even more fine-grained tagsets exist
Using the Penn Tagset
• The/DT grand/JJ jury/NN commented/VBD on/IN a/DT
number/NN of/IN other/JJ topics/NNS ./.
• Prepositions and subordinating conjunctions marked IN
(“although/IN I/PRP..”)
• Except the preposition/complementizer “to” is just marked
“TO”.
POS Tagging
• Words often have more than one POS: back
• The back door = JJ
• On my back = NN
• Win the voters back = RB
• Promised to back the bill = VB
• The POS tagging problem is to determine the POS tag for
a particular instance of a word.
These examples from Dekang Lin
How Hard is POS Tagging? Measuring Ambiguity
Current Performance
• How many tags are correct?
• About 97% currently
• But baseline is already 90%
• Baseline algorithm (sketched in code below):
• Tag every word with its most frequent tag
• Tag unknown words as nouns
• How well do people do?
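A minimal sketch of the most-frequent-tag baseline described above. The tiny training set and the function names are illustrative only; a real baseline would be trained on a tagged corpus such as the Penn Treebank.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sents):
    """Learn each word's most frequent tag from (word, tag) training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_baseline(words, most_frequent_tag, unknown_tag="NN"):
    """Tag every word with its most frequent tag; unknown words become nouns."""
    return [(w, most_frequent_tag.get(w.lower(), unknown_tag)) for w in words]

# Tiny illustrative training set.
train = [[("the", "DT"), ("koala", "NN"), ("put", "VBD"), ("the", "DT"),
          ("keys", "NNS"), ("on", "IN"), ("the", "DT"), ("table", "NN")]]
model = train_baseline(train)
print(tag_baseline("the aardvark put the keys on the shelf".split(), model))
```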
Quick Test: Agreement?
• the students went to class
• plays well with others
• fruit flies like a banana
DT: the, this, that
NN: noun
VB: verb
P: preposition
ADV: adverb
Quick Test
• the students went to class
DT NN
VB P NN
• plays well with others
VB ADV P NN
NN NN P DT
• fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
How to do it? History
Timeline (approximate):
• 1960s: Brown Corpus created (EN-US, 1 million words); Greene and Rubin: rule-based tagging, ~70% accuracy
• 1970s: Brown Corpus tagged; LOB Corpus created (EN-UK, 1 million words)
• 1980s: LOB Corpus tagged; HMM tagging (CLAWS), 93%-95%; DeRose/Church: efficient HMM handling sparse data, 95%+; POS tagging separated from other NLP
• 1990s: Penn Treebank corpus (WSJ, 4.5M words); British National Corpus (tagged by CLAWS); transformation-based tagging (Eric Brill), rule-based, 95%+; tree-based statistics (Helmut Schmid), 96%+; trigram tagger (Kempe), 96%+; neural network taggers, 96%+
• 2000s: Combined methods, 98%+
Two Methods for POS Tagging
1. Rule-based tagging (e.g., ENGTWOL)
2. Stochastic
   • Probabilistic sequence models
     • HMM (Hidden Markov Model) tagging
     • MEMMs (Maximum Entropy Markov Models)
Rule-Based Tagging
• Start with a dictionary
• Assign all possible tags to words from the dictionary
• Write rules by hand to selectively remove tags
• Leaving the correct tag for each word.
Rule-based taggers
• Early POS taggers all hand-coded
• Most of these (Harris, 1962; Greene and Rubin, 1971)
and the best of the recent ones, ENGTWOL (Voutilainen,
1995) based on a two-stage architecture
• Stage 1: look up word in lexicon to give list of potential POSs
• Stage 2: Apply rules which certify or disallow tag sequences
• Rules originally handwritten; more recently Machine
Learning methods can be used
Start With a Dictionary
• she: PRP
• promised: VBN, VBD
• to: TO
• back: VB, JJ, RB, NN
• the: DT
• bill: NN, VB
• Etc… for the ~100,000 words of English with more than 1 tag
Assign Every Possible Tag
She/PRP promised/{VBN, VBD} to/TO back/{VB, JJ, RB, NN} the/DT bill/{VB, NN}
Write Rules to Eliminate Tags
Eliminate VBN if VBD is an option when VBN|VBD follows
“<start> PRP”
Applying the rule to “She promised to back the bill” leaves:
She/PRP promised/VBD to/TO back/{VB, JJ, RB, NN} the/DT bill/{VB, NN}
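A minimal sketch of the two-stage rule-based idea from the last few slides: dictionary lookup assigns every possible tag, then a hand-written rule removes tags. The lexicon entries mirror the slides, but the rule encoding is a simplification for illustration, not the ENGTWOL formalism.

```python
# Stage 1: dictionary lookup gives every possible tag for each word.
lexicon = {
    "she": {"PRP"}, "promised": {"VBN", "VBD"}, "to": {"TO"},
    "back": {"VB", "JJ", "RB", "NN"}, "the": {"DT"}, "bill": {"NN", "VB"},
}

def assign_all_tags(words):
    # Copy each tag set so rules can prune it without touching the lexicon.
    return [set(lexicon.get(w.lower(), {"NN"})) for w in words]

# Stage 2: hand-written rules remove tags.
def rule_drop_vbn_after_initial_prp(words, tagsets):
    """Eliminate VBN if VBD is an option when VBN|VBD follows '<start> PRP'."""
    if len(words) > 1 and "PRP" in tagsets[0] and {"VBN", "VBD"} <= tagsets[1]:
        tagsets[1].discard("VBN")
    return tagsets

words = "She promised to back the bill".split()
tagsets = rule_drop_vbn_after_initial_prp(words, assign_all_tags(words))
for w, ts in zip(words, tagsets):
    print(w, sorted(ts))
```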
POS tagging
• Training: a machine learning algorithm is trained on hand-tagged text, e.g.
  The/DT involvement/NN of/IN ion/NN channels/NNS in/IN B/NN and/CC T/NN lymphocyte/NN activation/NN is/VBZ supported/VBN by/IN many/JJ reports/NNS of/IN changes/NNS in/IN ion/NN fluxes/NNS and/CC membrane/NN …
• The trained tagger is then applied to unseen text, e.g.
  We/PRP demonstrate/VBP that/IN …
Goal of POS Tagging
 We want the best set of tags for a sequence of words (a sentence)
 W — a sequence of words
 T — a sequence of tags
 Our goal: $\hat{T} = \arg\max_T P(T \mid W)$
 Example: P((NN NN P DET ADJ NN) | (heat oil in a large pot))
But, the Sparse Data Problem …
• Rich Models often require vast amounts of data
• Count up instances of the string "heat oil in a large pot" in the training corpus, and pick the most common tag assignment to the string.
• Too many possible combinations
POS Tagging as Sequence Classification
• We are given a sentence (an “observation” or “sequence of
observations”)
• Secretariat is expected to race tomorrow
• What is the best sequence of tags that corresponds to this
sequence of observations?
• Probabilistic view:
• Consider all possible sequences of tags
• Out of this universe of sequences, choose the tag sequence which is
most probable given the observation sequence of n words w1…wn.
Getting to HMMs
• We want, out of all sequences of n tags t1…tn the single tag sequence
such that P(t1…tn|w1…wn) is highest.
• Hat ^ means “our estimate of the best one”
• Argmaxx f(x) means “the x such that f(x) is maximized”
Getting to HMMs
• This equation is guaranteed to give us the best tag
sequence
• But how to make it operational? How to compute this
value?
• Intuition of Bayesian classification:
• Use Bayes rule to transform this equation into a set of other
probabilities that are easier to compute
Reminder: Apply Bayes’ Theorem (1763)

$$P(T \mid W) = \frac{P(W \mid T)\,P(T)}{P(W)}$$

where $P(T \mid W)$ is the posterior (our goal: to maximize it!), $P(W \mid T)$ is the likelihood, $P(T)$ is the prior, and $P(W)$ is the marginal likelihood.

Reverend Thomas Bayes — Presbyterian minister (1702-1761)
How to Count

$$\hat{T} = \arg\max_T P(T \mid W) = \arg\max_T \frac{P(W \mid T)\,P(T)}{P(W)} = \arg\max_T P(W \mid T)\,P(T)$$

(The denominator P(W) is the same for every candidate tag sequence, so it can be dropped from the argmax.)
 P(W|T) and P(T) can be counted from a large hand-tagged corpus; smooth them to get rid of the zeroes
Count P(W|T) and P(T)
Assume each word in the sequence depends only on its corresponding tag:

$$P(W \mid T) = \prod_{i=1}^{n} P(w_i \mid t_i)$$
Count P(T)
By the chain rule, conditioning each tag on its full history:

$$P(t_1, \ldots, t_n) = P(t_1)\,P(t_2 \mid t_1)\,P(t_3 \mid t_1 t_2) \cdots P(t_n \mid t_1, \ldots, t_{n-1})$$

 Make a Markov assumption and use N-grams over tags …
 P(T) is a product of the probabilities of the N-grams that make it up
With a bigram (first-order Markov) assumption:

$$P(t_1, \ldots, t_n) = P(t_1) \prod_{i=2}^{n} P(t_i \mid t_{i-1})$$
Part-of-speech tagging with Hidden Markov Models

$$P(t_1 \ldots t_n \mid w_1 \ldots w_n) = \frac{P(w_1 \ldots w_n \mid t_1 \ldots t_n)\,P(t_1 \ldots t_n)}{P(w_1 \ldots w_n)} \propto P(w_1 \ldots w_n \mid t_1 \ldots t_n)\,P(t_1 \ldots t_n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)\,P(t_i \mid t_{i-1})$$

where $P(w_i \mid t_i)$ is the output (emission) probability and $P(t_i \mid t_{i-1})$ is the transition probability.
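A small sketch of how the emission probabilities P(w_i | t_i) and transition probabilities P(t_i | t_{i-1}) can be estimated by counting over a hand-tagged corpus. The inline toy corpus and the function names are illustrative; in practice the counts would come from something like the Penn Treebank and would be smoothed, as noted above.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus: a list of sentences, each a list of (word, tag) pairs.
tagged_sents = [
    [("fish", "NN"), ("sleep", "VB")],
    [("fish", "NN"), ("swim", "VB")],
    [("people", "NN"), ("fish", "VB")],
]

emit_counts = defaultdict(Counter)    # tag -> word counts
trans_counts = defaultdict(Counter)   # previous tag -> tag counts
tag_counts = Counter()

for sent in tagged_sents:
    prev = "<s>"                      # sentence-start pseudo-tag
    for word, tag in sent:
        emit_counts[tag][word] += 1
        trans_counts[prev][tag] += 1
        tag_counts[tag] += 1
        prev = tag

def emission_prob(word, tag):
    """Maximum-likelihood estimate of P(word | tag); zeros would need smoothing."""
    return emit_counts[tag][word] / tag_counts[tag]

def transition_prob(tag, prev_tag):
    """Maximum-likelihood estimate of P(tag | prev_tag)."""
    return trans_counts[prev_tag][tag] / sum(trans_counts[prev_tag].values())

print(emission_prob("fish", "NN"))    # 2/3 ≈ 0.667
print(transition_prob("VB", "NN"))    # 1.0
```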
Analyzing “Fish sleep.”
A Simple POS HMM
States: start, noun, verb, end. Transition probabilities:
• start → noun: 0.8, start → verb: 0.2
• noun → noun: 0.1, noun → verb: 0.8, noun → end: 0.1
• verb → noun: 0.2, verb → verb: 0.1, verb → end: 0.7
Word Emission Probabilities
P ( word | state )
• A two-word language: “fish” and “sleep”
• Suppose in our training corpus,
• “fish” appears 8 times as a noun and 5 times as a verb
• “sleep” appears twice as a noun and 5 times as a verb
• Emission probabilities:
• Noun
• P(fish | noun) : 0.8
• P(sleep | noun) : 0.2
• Verb
• P(fish | verb) : 0.5
• P(sleep | verb) : 0.5
Viterbi Probabilities
Fill in a trellis with one column per position (start, then one per token) and one row per state; each cell holds the probability of the best tag path ending in that state at that position.

Token 1: fish
• verb: 1 × 0.2 (start → verb) × 0.5 (P(fish | verb)) = 0.1
• noun: 1 × 0.8 (start → noun) × 0.8 (P(fish | noun)) = 0.64

Token 2: sleep
• if ‘fish’ is a verb:
  • verb: 0.1 × 0.1 (verb → verb) × 0.5 (P(sleep | verb)) = 0.005
  • noun: 0.1 × 0.2 (verb → noun) × 0.2 (P(sleep | noun)) = 0.004
• if ‘fish’ is a noun:
  • verb: 0.64 × 0.8 (noun → verb) × 0.5 = 0.256
  • noun: 0.64 × 0.1 (noun → noun) × 0.2 = 0.0128
• Take the maximum and set back pointers: verb = 0.256 (from noun), noun = 0.0128 (from noun)

Token 3: end
• from verb: 0.256 × 0.7 (verb → end) = 0.1792
• from noun: 0.0128 × 0.1 (noun → end) = 0.00128
• Take the maximum and set the back pointer: end = 0.1792 (via verb)

Decode by following the back pointers: fish = noun, sleep = verb
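A compact Viterbi sketch that reproduces the trellis above for “fish sleep”. The probabilities are the ones from the simple POS HMM; the code structure itself is just one reasonable way to implement the algorithm.

```python
# Transition and emission probabilities from the simple POS HMM above.
trans = {
    "start": {"noun": 0.8, "verb": 0.2},
    "noun":  {"noun": 0.1, "verb": 0.8, "end": 0.1},
    "verb":  {"noun": 0.2, "verb": 0.1, "end": 0.7},
}
emit = {
    "noun": {"fish": 0.8, "sleep": 0.2},
    "verb": {"fish": 0.5, "sleep": 0.5},
}

def viterbi(words, states=("noun", "verb")):
    # v[i][s] = probability of the best path ending in state s after word i.
    v = [{"start": 1.0}]
    back = [{}]
    for i, w in enumerate(words):
        v.append({})
        back.append({})
        for s in states:
            best_prev, best_p = max(
                ((p, v[i][p] * trans[p].get(s, 0.0) * emit[s].get(w, 0.0))
                 for p in v[i]),
                key=lambda x: x[1])
            v[i + 1][s] = best_p
            back[i + 1][s] = best_prev
    # Transition into the end state, then follow back pointers.
    last, prob = max(((s, v[-1][s] * trans[s].get("end", 0.0)) for s in states),
                     key=lambda x: x[1])
    path = [last]
    for i in range(len(words), 1, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path)), prob

print(viterbi(["fish", "sleep"]))   # (['noun', 'verb'], ~0.1792)
```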
Markov Chain for a Simple Name Tagger
[Figure: a Markov chain with states START, PER, LOC, X, and END; transition probabilities label the edges. Emission probabilities include PER: George 0.3, W. 0.3, Bush 0.3, Iraq 0.1; LOC: Iraq 0.8, George 0.2; X: discussed 0.7, W. 0.3; END emits the end-of-sentence symbol $ with probability 1.0.]
Exercise
• Tag names in the following sentence:
• George W. Bush discussed Iraq.
POS taggers
• Brill’s tagger
• http://www.cs.jhu.edu/~brill/
• TnT tagger
• http://www.coli.uni-saarland.de/~thorsten/tnt/
• Stanford tagger
• http://nlp.stanford.edu/software/tagger.shtml
• SVMTool
• http://www.lsi.upc.es/~nlp/SVMTool/
• GENIA tagger
• http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/
• More complete list at:
http://www-nlp.stanford.edu/links/statnlp.html#Taggers