M. Ali Fauzi
POS TAGGING AND HMM
Tim Teks Mining
Adapted from Heng Ji

Outline
• POS Tagging and HMM

What is Part-of-Speech (POS)?
• Generally speaking, word classes (= POS): verb, noun, adjective, adverb, article, …
• We can also include inflection:
  • Verbs: tense, number, …
  • Nouns: number, proper/common, …
  • Adjectives: comparative, superlative, …

Parts of Speech
• 8 (ish) traditional parts of speech
  • Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.
• Also called: parts of speech, lexical categories, word classes, morphological classes, lexical tags, …
• There is much debate within linguistics about the number, nature, and universality of these; we will completely ignore that debate here.

7 Traditional POS Categories
• N    noun         chair, bandwidth, pacing
• V    verb         study, debate, munch
• ADJ  adjective    purple, tall, ridiculous
• ADV  adverb       unfortunately, slowly
• P    preposition  of, by, to
• PRO  pronoun      I, me, mine
• DET  determiner   the, a, that, those

POS Tagging
• The process of assigning a part-of-speech or lexical class marker to each word in a collection:
  WORD: the  koala  put  the  keys  on  the  table
  TAG:  DET  N      V    DET  N     P   DET  N

Penn TreeBank POS Tag Set
• Penn Treebank: hand-annotated corpus of Wall Street Journal text, 1M words
• 46 tags
• Some particularities:
  • to/TO is not disambiguated
  • Auxiliaries and verbs are not distinguished
• (The full tagset table is shown as a figure on the slide.)

Why is POS tagging useful?
• Speech synthesis: how to pronounce “lead”?
  • INsult vs. inSULT, OBject vs. obJECT, OVERflow vs. overFLOW, DIScount vs. disCOUNT, CONtent vs. conTENT
• Stemming for information retrieval: a search for “aardvarks” can also retrieve “aardvark”
• Parsing, speech recognition, etc.:
  • Possessive pronouns (my, your, her) are followed by nouns
  • Personal pronouns (I, you, he) are likely to be followed by verbs
  • We need to know whether a word is an N or a V before we can parse
• Information extraction: finding names, relations, etc.
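Tagged text like the koala example above is commonly represented in code as a list of (word, tag) pairs. A minimal sketch (the tags follow the slide's informal tagset, and the word/TAG rendering convention is the one used throughout these slides):

```python
# Represent a POS-tagged sentence as (word, tag) pairs,
# using the koala example from the slide.
tagged = [("the", "DET"), ("koala", "N"), ("put", "V"), ("the", "DET"),
          ("keys", "N"), ("on", "P"), ("the", "DET"), ("table", "N")]

# Render in the common word/TAG notation.
rendered = " ".join(f"{w}/{t}" for w, t in tagged)
print(rendered)  # → the/DET koala/N put/V the/DET keys/N on/P the/DET table/N
```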
• Machine translation

Open and Closed Classes
• Closed class: a small, fixed membership
  • Prepositions: of, in, by, …
  • Auxiliaries: may, can, will, had, been, …
  • Pronouns: I, you, she, mine, his, them, …
  • Usually function words (short, common words that play a role in grammar)
• Open class: new members can be created all the time
  • English has 4: nouns, verbs, adjectives, adverbs
  • Many languages have these 4, but not all!

Open Class Words
• Nouns
  • Proper nouns (Boulder, Granby, Eli Manning); English capitalizes these
  • Common nouns (the rest)
  • Count nouns and mass nouns
    • Count nouns have plurals and get counted: goat/goats, one goat, two goats
    • Mass nouns don’t get counted (snow, salt, communism; *two snows)
• Adverbs: tend to modify things
  • Unfortunately, John walked home extremely slowly yesterday
  • Directional/locative adverbs (here, home, downhill)
  • Degree adverbs (extremely, very, somewhat)
  • Manner adverbs (slowly, slinkily, delicately)
• Verbs
  • In English, have morphological affixes (eat/eats/eaten)

Closed Class Words
• Examples:
  • prepositions: on, under, over, …
  • particles: up, down, on, off, …
  • determiners: a, an, the, …
  • pronouns: she, who, I, …
  • conjunctions: and, but, or, …
  • auxiliary verbs: can, may, should, …
  • numerals: one, two, three, third, …
• (Tables of prepositions from CELEX, English particles, and conjunctions are shown as figures on the slides.)

POS Tagging: Choosing a Tagset
• There are many parts of speech and many potential distinctions we could draw
• To do POS tagging, we need to choose a standard set of tags to work with
• We could pick a very coarse tagset: N, V, Adj, Adv
• A more commonly used set is finer grained: the Penn TreeBank tagset, 45 tags (PRP$, WRB, WP$, VBG, …)
• Even more fine-grained tagsets exist

Using the Penn Tagset
• The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
• Prepositions and subordinating conjunctions are marked IN (“although/IN I/PRP …”)
• Except the preposition/complementizer “to”, which is just marked TO

POS Tagging
• Words often have more than one POS, e.g. back:
  • The back door = JJ
  • On my back = NN
  • Win the voters back = RB
  • Promised to back the bill = VB
• The POS tagging problem is to determine the POS tag for a particular instance of a word (these examples are from Dekang Lin)

How Hard is POS Tagging? Measuring Ambiguity
• (The ambiguity table is shown as a figure on the slide.)

Current Performance
• How many tags are correct? About 97% currently
• But the baseline is already 90%
• Baseline algorithm:
  • Tag every word with its most frequent tag
  • Tag unknown words as nouns
• How well do people do?

Quick Test: Agreement?
• the students went to class
• plays well with others
• fruit flies like a banana
• (Tagset: DT: the, this, that; NN: noun; VB: verb; P: preposition; ADV: adverb)

Quick Test
• the students went to class: DT NN VB P NN
• plays well with others: VB ADV P NN / NN NN P DT
• fruit flies like a banana: NN NN VB DT NN / NN VB P DT NN / NN NN P DT NN / NN VB VB DT NN

How to do it? A brief history:
• 1960s: Brown Corpus created (EN-US), 1 million words; rule-based tagging (Greene and Rubin), ~70%
• 1970s: Brown Corpus tagged; LOB Corpus created (EN-UK), 1 million words
• 1980s: LOB Corpus tagged; HMM tagging (CLAWS), 93%–95%; efficient HMM with sparse-data handling (DeRose/Church), 95%+; POS tagging separated from other NLP
• 1990s: Penn Treebank corpus created (WSJ, 4.5M words); British National Corpus (tagged by CLAWS); transformation-based tagging (Eric Brill), rule-based, 95%+; tree-based statistics (Helmut Schmid), 96%+; trigram tagger (Kempe), 96%+; neural networks, 96%+
• 2000s: combined methods, 98%+

Two Methods for POS Tagging
1. Rule-based tagging (ENGTWOL)
2. Stochastic:
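The 90% baseline mentioned above (tag every word with its most frequent training tag; tag unknown words as nouns) is simple enough to sketch directly. The tiny training list below is made up for illustration:

```python
from collections import Counter, defaultdict

# Baseline tagger: most frequent tag per word, unknowns default to noun.
# The miniature "training corpus" here is invented for illustration.
train = [("the", "DT"), ("back", "NN"), ("door", "NN"),
         ("win", "VB"), ("the", "DT"), ("voters", "NNS"),
         ("back", "RB"), ("back", "NN")]

counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def baseline_tag(word):
    if word in counts:
        return counts[word].most_common(1)[0][0]  # most frequent tag
    return "NN"  # unknown words are tagged as nouns

print([baseline_tag(w) for w in ["the", "back", "aardvark"]])
```

Despite its simplicity, this is the baseline that already reaches about 90% on real corpora, which is why the 97% of modern taggers is less impressive than it first sounds.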
• Probabilistic sequence models:
  • HMM (Hidden Markov Model) tagging
  • MEMMs (Maximum Entropy Markov Models)

Rule-Based Tagging
• Start with a dictionary
• Assign all possible tags to words from the dictionary
• Write rules by hand to selectively remove tags, leaving the correct tag for each word

Rule-Based Taggers
• Early POS taggers were all hand-coded
• Most of these (Harris, 1962; Greene and Rubin, 1971), and the best of the recent ones, ENGTWOL (Voutilainen, 1995), are based on a two-stage architecture:
  • Stage 1: look up each word in a lexicon to get a list of potential POS tags
  • Stage 2: apply rules that certify or disallow tag sequences
• Rules were originally handwritten; more recently, machine learning methods can be used

Start With a Dictionary
• she: PRP
• promised: VBN, VBD
• to: TO
• back: VB, JJ, RB, NN
• the: DT
• bill: NN, VB
• etc., for the ~100,000 words of English with more than one tag

Assign Every Possible Tag
• She/PRP promised/{VBN,VBD} to/TO back/{VB,JJ,RB,NN} the/DT bill/{NN,VB}

Write Rules to Eliminate Tags
• Rule: eliminate VBN if VBD is an option when VBN|VBD follows “<start> PRP”
• Result: She/PRP promised/VBD to/TO back/{VB,JJ,RB,NN} the/DT bill/{NN,VB}

POS tagging in practice (biomedical text):
• The/DT involvement/NN of/IN ion/NN channels/NNS in/IN B/NN and/CC T/NN lymphocyte/NN activation/NN is/VBZ supported/VBN by/IN many/JJ reports/NNS of/IN changes/NNS in/IN ion/NN fluxes/NNS and/CC membrane/NN …
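The two-stage rule-based procedure described above (look up all dictionary tags, then eliminate tags by rule) can be sketched as follows, using the slide's dictionary entries and its single example rule. The function names `tag_candidates` and `apply_rule` are illustrative, not part of any real tagger:

```python
# Two-stage rule-based tagging sketch (ENGTWOL-style).
# Lexicon entries and the one elimination rule are taken from the slides;
# function names are invented for this sketch.
lexicon = {
    "she": ["PRP"], "promised": ["VBN", "VBD"], "to": ["TO"],
    "back": ["VB", "JJ", "RB", "NN"], "the": ["DT"], "bill": ["NN", "VB"],
}

def tag_candidates(words):
    # Stage 1: assign every possible tag from the dictionary.
    return [list(lexicon[w]) for w in words]

def apply_rule(cands):
    # Stage 2 (single example rule from the slide): eliminate VBN if VBD
    # is an option when VBN|VBD follows "<start> PRP".
    for i in range(1, len(cands)):
        sentence_initial_prp = (i == 1 and cands[0] == ["PRP"])
        if sentence_initial_prp and "VBN" in cands[i] and "VBD" in cands[i]:
            cands[i].remove("VBN")
    return cands

words = ["she", "promised", "to", "back", "the", "bill"]
cands = apply_rule(tag_candidates(words))
print(cands[1])  # candidates left for "promised" → ['VBD']
```

A real system like ENGTWOL has on the order of a thousand such constraints; this sketch only shows the mechanism.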
• Training: hand-tagged text is fed to a machine learning algorithm; the learned model then tags unseen text, e.g. “We demonstrate that …” becomes “We/PRP demonstrate/VBP that/IN …”

Goal of POS Tagging
• We want the best set of tags for a sequence of words (a sentence)
• W: a sequence of words; T: a sequence of tags
• Our goal: T̂ = argmax_T P(T | W)
• Example: P((NN NN P DET ADJ NN) | (heat oil in a large pot))

But, the Sparse Data Problem …
• Rich models often require vast amounts of data
• Counting up instances of the string “heat oil in a large pot” in the training corpus and picking the most common tag assignment for the string will not work: there are too many possible combinations

POS Tagging as Sequence Classification
• We are given a sentence (an “observation” or “sequence of observations”)
  • Secretariat is expected to race tomorrow
• What is the best sequence of tags that corresponds to this sequence of observations?
• Probabilistic view:
  • Consider all possible sequences of tags
  • Out of this universe of sequences, choose the tag sequence that is most probable given the observation sequence of n words w1…wn

Getting to HMMs
• We want, out of all sequences of n tags t1…tn, the single tag sequence such that P(t1…tn | w1…wn) is highest
• The hat (^) means “our estimate of the best one”
• argmax_x f(x) means “the x such that f(x) is maximized”

Getting to HMMs
• This equation is guaranteed to give us the best tag sequence, but how do we make it operational? How do we compute this value?
• Intuition of Bayesian classification: use Bayes’ rule to transform this equation into a set of other probabilities that are easier to compute

Reminder: Apply Bayes’ Theorem (1763)
• P(T | W) = P(W | T) · P(T) / P(W)
  (posterior = likelihood × prior / marginal)
• Our goal: to maximize it!
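The Bayes decomposition above suggests a brute-force tagger: score every candidate tag sequence by P(W | T) · P(T) and take the argmax. A minimal sketch, with probability tables borrowed from the two-word fish/sleep language used later in these slides:

```python
from itertools import product

# Score tag sequences by P(W|T) * P(T) = prod P(w_i|t_i) * prod P(t_i|t_{i-1}).
# Tables are from the fish/sleep example in these slides ("<s>" marks the start).
emit = {("fish", "N"): 0.8, ("sleep", "N"): 0.2,
        ("fish", "V"): 0.5, ("sleep", "V"): 0.5}
trans = {("<s>", "N"): 0.8, ("<s>", "V"): 0.2,
         ("N", "N"): 0.1, ("N", "V"): 0.8,
         ("V", "N"): 0.2, ("V", "V"): 0.1}

def score(words, tags):
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= trans[(prev, t)] * emit[(w, t)]
        prev = t
    return p

words = ["fish", "sleep"]
# Enumerate all tag sequences and pick the most probable one.
best = max(product("NV", repeat=len(words)), key=lambda ts: score(words, ts))
print(best, score(words, best))  # → ('N', 'V') 0.256
```

Enumeration is exponential in sentence length, which is exactly why the slides move on to the Viterbi algorithm, a dynamic program that finds the same argmax in polynomial time.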
• The denominator P(W) is the marginal likelihood
• (Reverend Thomas Bayes, Presbyterian minister, 1702–1761)

How to Count
• T̂ = argmax_T P(T | W) = argmax_T P(W | T) · P(T) / P(W) = argmax_T P(W | T) · P(T)
• P(W | T) and P(T) can be counted from a large hand-tagged corpus; smooth them to get rid of the zeroes

Count P(W | T)
• Assume each word in the sequence depends only on its corresponding tag:
  P(W | T) = ∏_{i=1..n} P(w_i | t_i)

Count P(T)
• By the chain rule over the tag history:
  P(t_1, …, t_n) = P(t_1) · P(t_2 | t_1) · P(t_3 | t_1 t_2) · … · P(t_n | t_1, …, t_{n-1})
• Make a Markov assumption and use N-grams over tags: P(T) becomes a product of the probabilities of the N-grams that make it up; with bigrams:
  P(t_1, …, t_n) = P(t_1) · ∏_{i=2..n} P(t_i | t_{i-1})

Part-of-Speech Tagging with Hidden Markov Models
• P(t_1…t_n | w_1…w_n) = P(w_1…w_n | t_1…t_n) · P(t_1…t_n) / P(w_1…w_n)
• P(w_1…w_n | t_1…t_n) · P(t_1…t_n) = ∏_{i=1..n} P(w_i | t_i) · P(t_i | t_{i-1})
  (P(w_i | t_i) is the output (emission) probability; P(t_i | t_{i-1}) is the transition probability)

Analyzing “Fish sleep.”

A Simple POS HMM
• States: start, noun, verb, end
• Transition probabilities (from the slide’s diagram):
  • start → noun: 0.8; start → verb: 0.2
  • noun → noun: 0.1; noun → verb: 0.8; noun → end: 0.1
  • verb → noun: 0.2; verb → verb: 0.1; verb → end: 0.7

Word Emission Probabilities P(word | state)
• A two-word language: “fish” and “sleep”
• Suppose in our training corpus:
  • “fish” appears 8 times as a noun and 5 times as a verb
  • “sleep” appears twice as a noun and 5 times as a verb
• Emission probabilities:
  • Noun: P(fish | noun) = 0.8; P(sleep | noun) = 0.2
  • Verb: P(fish | verb) = 0.5; P(sleep | verb) = 0.5

Viterbi Probabilities
• Fill in a trellis with rows start, verb, noun, end and columns for token positions 0–3
• Position 0: start = 1; verb, noun, end = 0
• Token 1 (“fish”): verb = 0.2 × 0.5 = 0.1; noun = 0.8 × 0.8 = 0.64
• Token 2 (“sleep”), if “fish” is a verb: verb = 0.1 × 0.1 × 0.5; noun = 0.1 × 0.2 × 0.2
• Those paths give verb = 0.005 and noun = 0.004
• Token 2 (“sleep”), if “fish” is a noun: verb = 0.64 × 0.8 × 0.5 = 0.256; noun = 0.64 × 0.1 × 0.2 = 0.0128
• Take the maximum over predecessors and set back pointers: verb = 0.256, noun = 0.0128
• Token 3 (end): via verb, 0.256 × 0.7 = 0.1792; via noun, 0.0128 × 0.1 = 0.00128
• Take the maximum and set back pointers
• Decode by following the back pointers: fish = noun, sleep = verb

Markov Chain for a Simple Name Tagger
• States START, PER, LOC, X, END, with transition probabilities (0.6, 0.5, 0.3, 0.2, 0.1, …) shown on the slide’s diagram, and per-state emission probabilities:
  • PER: George: 0.3; W.: 0.3; Bush: 0.3; Iraq: 0.1
  • LOC: Iraq: 0.8; George: 0.2
  • X: W.: 0.3; discussed: 0.7

Exercise
• Tag the names in the following sentence: George W. Bush discussed Iraq.
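The fish/sleep trellis walked through above can be implemented directly. This is a minimal Viterbi sketch using exactly the transition and emission probabilities given on the slides, not production code:

```python
# Viterbi decoding for the simple fish/sleep HMM from the slides.
states = ["noun", "verb"]
trans = {("start", "noun"): 0.8, ("start", "verb"): 0.2,
         ("noun", "noun"): 0.1, ("noun", "verb"): 0.8, ("noun", "end"): 0.1,
         ("verb", "noun"): 0.2, ("verb", "verb"): 0.1, ("verb", "end"): 0.7}
emit = {("noun", "fish"): 0.8, ("noun", "sleep"): 0.2,
        ("verb", "fish"): 0.5, ("verb", "sleep"): 0.5}

def viterbi(words):
    # v[s] = probability of the best path ending in state s after the
    # current token; initialized with the transition out of "start".
    v = {s: trans[("start", s)] * emit[(s, words[0])] for s in states}
    backptrs = []
    for w in words[1:]:
        nv, bp = {}, {}
        for s in states:
            # "Take maximum, set back pointers": best predecessor for s.
            prev = max(states, key=lambda p: v[p] * trans[(p, s)])
            bp[s] = prev
            nv[s] = v[prev] * trans[(prev, s)] * emit[(s, w)]
        v = nv
        backptrs.append(bp)
    # Final transition into the end state.
    last = max(states, key=lambda s: v[s] * trans[(s, "end")])
    prob = v[last] * trans[(last, "end")]
    # Follow back pointers to recover the tag sequence.
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return path, prob

path, prob = viterbi(["fish", "sleep"])
print(path, prob)  # → ['noun', 'verb'] 0.1792
```

The returned probability, 0.256 × 0.7 = 0.1792, matches the trellis value at the end state, and the decoded path matches the slide's result (fish = noun, sleep = verb).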
POS Taggers
• Brill’s tagger: http://www.cs.jhu.edu/~brill/
• TnT tagger: http://www.coli.uni-saarland.de/~thorsten/tnt/
• Stanford tagger: http://nlp.stanford.edu/software/tagger.shtml
• SVMTool: http://www.lsi.upc.es/~nlp/SVMTool/
• GENIA tagger: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/
• A more complete list: http://www-nlp.stanford.edu/links/statnlp.html#Taggers