13-1 Chapter 13: Part-of-Speech Tagging

13-2 POS Tagging + HMMs
• Part-of-speech tagging – what and why?
• What information is available?
• Visible Markov models
• Hidden Markov models
• Training and initialization
• Other methods

13-3 Part-of-Speech Tagging
• Assign syntactic categories to words in text
  – The-AT representative-NN put-VBD chairs-NNS on-IN the-AT table-NN.
  – The-AT representative-JJ put-NN chairs-VBZ on-IN the-AT table-NN.
  – Tag set (see next slide)
• Usefulness
  – Lexical acquisition, shallow/partial parsing
  – Information extraction
  – Question answering

13-4 Brown/Penn Tag Sets
(tag-set tables shown on the slide)

13-5 Sources of Information
• Syntagmatic structural information
  – looking at information about tag sequences
  – AT JJ NN vs. AT JJ VBP
  – 77% accuracy (Greene and Rubin, 1971)
• Lexical information
  – predicting a tag based on the word concerned
  – the word "flour" is much more likely to be a noun than a verb

13-6 Visible Markov Models
• View the tag sequence as a Markov chain
  – Limited horizon: a word's tag depends only on the previous tag
    P(X_{i+1} = t^j \mid X_1, \ldots, X_i) = P(X_{i+1} = t^j \mid X_i)
  – Time invariant: the dependency does not change over time
    P(X_{i+1} = t^j \mid X_i) = P(X_2 = t^j \mid X_1)

13-8 Find the best tagging t_{1,n} for a sentence w_{1,n}:
  \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n})
    = \arg\max_{t_{1,n}} P(w_{1,n} \mid t_{1,n}) P(t_{1,n}) / P(w_{1,n})
    = \arg\max_{t_{1,n}} P(w_{1,n} \mid t_{1,n}) P(t_{1,n})
• Words are independent of each other:
  P(w_{1,n} \mid t_{1,n}) P(t_{1,n})
    = [\prod_{i=1}^{n} P(w_i \mid t_{1,n})] \, P(t_n \mid t_{1,n-1}) P(t_{n-1} \mid t_{1,n-2}) \cdots P(t_2 \mid t_1)
• A word's identity depends only on its tag:
    = [\prod_{i=1}^{n} P(w_i \mid t_i)] \, P(t_n \mid t_{n-1}) P(t_{n-1} \mid t_{n-2}) \cdots P(t_2 \mid t_1)
    = \prod_{i=1}^{n} P(w_i \mid t_i) P(t_i \mid t_{i-1})
  (we define P(t_1 \mid t_0) = 1.0)

13-9 The final equation for determining the optimal tags for a sentence:
  \hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i) P(t_i \mid t_{i-1})
with maximum-likelihood estimates from a tagged corpus:
  P(t^k \mid t^j) = C(t^j, t^k) / C(t^j)
  P(w^l \mid t^j) = C(w^l, t^j) / C(t^j)

13-10 Viterbi Algorithm
(the slide presents the algorithm; a code sketch follows at the end of this section)

13-11 Unknown Words
• Simplest models
  – an unknown word can be of any part of speech
  – or of any open-class part of speech, i.e., nouns, verbs, and so on
• Morphological and other cues
  – -ed: past tense forms or past participles

13-12 Transformation-Based Learning of Tags
• A specification of which "error-correcting" transformations are admissible
• The learning algorithm (see the sketch at the end of this section)
  – tag each word in the training corpus with its most frequent tag
  – construct a ranked list of transformations that turns the initial tagging into a tagging that is close to correct

13-13 Transformations
• Rewrite rule t1 → t2: replace tag t1 by tag t2
• Triggering environment: where the rewrite may apply, e.g.
  – tag t^j occurs in one of the three previous positions
  – tag t^j occurs two positions earlier and tag t^k occurs in the following position

13-14 Locations where a trigger will be sought
(1) Triggered by tags
(2) Triggered by words: e.g., to work in a school vs. go to school (?); cut, put; more valuable player; don't, shouldn't
(3) Triggered by morphology: e.g., unknown words are tagged as proper nouns (NNP) if capitalized and as common nouns (NN) otherwise

13-15
• Example transformation for unknown words: replace NN by NNS if the unknown word's suffix is -s.
• At each learning step, reduce the error rate E(C_k): the number of words that are mistagged in tagged corpus C_k.

13-16 Tagging Accuracy
• Typically 95%~97%; accuracy depends on:
  – the amount of training data available
  – the tag set
  – the difference between the training corpus and dictionary on the one hand and the corpus of application on the other
  – unknown words
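
To make the estimation equations on slide 13-9 concrete, here is a minimal sketch of computing the MLE transition and emission tables from a tagged corpus. It is an illustration, not code from the chapter; the `<s>` sentence-start pseudo-tag and the dictionary-of-pairs representation are assumptions of the sketch.

```python
from collections import defaultdict

def train(tagged_sents):
    """MLE estimates matching slide 13-9:
    P(t^k|t^j) = C(t^j,t^k)/C(t^j) and P(w^l|t^j) = C(w^l,t^j)/C(t^j)."""
    tag_count = defaultdict(int)
    trans = defaultdict(float)   # (prev_tag, tag)  -> count, then probability
    emit = defaultdict(float)    # (word, tag)      -> count, then probability
    for sent in tagged_sents:    # each sent is a list of (word, tag) pairs
        prev = "<s>"             # sentence-start pseudo-tag (an assumption here)
        tag_count[prev] += 1
        for w, t in sent:
            trans[(prev, t)] += 1
            emit[(w, t)] += 1
            tag_count[t] += 1
            prev = t
    # Normalize counts into conditional probabilities. Sentence-final tags
    # have no outgoing transition, so their rows sum to < 1; acceptable
    # for a sketch.
    for (p, t) in list(trans):
        trans[(p, t)] /= tag_count[p]
    for (w, t) in list(emit):
        emit[(w, t)] /= tag_count[t]
    return trans, emit
```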
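
Slide 13-10 names the Viterbi algorithm, which finds the argmax of the product on slide 13-9 by dynamic programming in O(n·|T|²) time instead of enumerating all tag sequences. The sketch below assumes the `trans`/`emit` defaultdicts produced by the previous snippet; it is one plausible rendering, not the chapter's own code.

```python
def viterbi(words, tags, trans, emit):
    """Return the tag sequence maximizing prod_i P(w_i|t_i) P(t_i|t_{i-1}).
    Assumes a non-empty `words` list and the "<s>" start pseudo-tag."""
    # delta[t] = probability of the best partial tagging ending in tag t
    delta = {t: trans[("<s>", t)] * emit[(words[0], t)] for t in tags}
    backptrs = []
    for w in words[1:]:
        new_delta, back = {}, {}
        for t in tags:
            # best previous tag for each current tag t
            best_prev = max(tags, key=lambda p: delta[p] * trans[(p, t)])
            new_delta[t] = delta[best_prev] * trans[(best_prev, t)] * emit[(w, t)]
            back[t] = best_prev
        delta = new_delta
        backptrs.append(back)
    # follow back-pointers from the best final tag
    best = max(tags, key=lambda t: delta[t])
    seq = [best]
    for back in reversed(backptrs):
        seq.append(back[seq[-1]])
    return list(reversed(seq))
```

In practice one would work in log space and smooth the counts: with unsmoothed MLE estimates, any unseen word or tag bigram zeroes out every path and the tie is broken arbitrarily.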
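
A tiny illustration of the morphological cues for unknown words on slide 13-11. The specific tag choices are assumptions made for this sketch, not rules given in the chapter.

```python
def guess_tag(word):
    """Hypothetical morphological fallback for words unseen in training."""
    if word[0].isupper():
        return "NNP"   # capitalized -> proper noun (cf. slide 13-14)
    if word.endswith("ed"):
        return "VBD"   # past tense; past participle (VBN) is equally possible
    if word.endswith("s"):
        return "NNS"   # -s suffix -> plural noun (cf. slide 13-15)
    return "NN"        # default: common noun
```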
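
Slides 13-12 to 13-15 describe transformation-based (Brill-style) learning. Below is a compact sketch of the greedy loop; the rule representation (source tag, target tag, trigger tag, look-back window) is a simplification invented for this illustration, covering only the "trigger tag occurs in one of the previous positions" environment from slide 13-13.

```python
def apply_transform(tags, rule):
    """Apply one rewrite rule t1 -> t2 in its triggering environment."""
    src, dst, trigger, window = rule
    out = list(tags)
    for i, t in enumerate(tags):
        # rewrite src -> dst if the trigger tag occurs in one of the
        # `window` previous positions (cf. slide 13-13)
        if t == src and trigger in tags[max(0, i - window):i]:
            out[i] = dst
    return out

def learn_transforms(tags, gold, candidates):
    """Greedy loop from slides 13-12/13-15: start from the
    most-frequent-tag baseline `tags` and repeatedly apply the candidate
    rule that most reduces E(C_k), the number of mistagged words."""
    def errors(t):
        return sum(a != b for a, b in zip(t, gold))
    learned = []
    while True:
        best = min(candidates, key=lambda r: errors(apply_transform(tags, r)))
        if errors(apply_transform(tags, best)) >= errors(tags):
            break  # no candidate reduces the error rate any further
        tags = apply_transform(tags, best)
        learned.append(best)
    return learned
```

The loop terminates because each accepted rule strictly reduces the error count, which is bounded below by zero; the `learned` list is exactly the ranked list of transformations that slide 13-12 asks for.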