13-1 Chapter 13: Part-of-Speech Tagging

13-2 POS Tagging + HMMs
• Part-of-speech tagging – what and why?
• What information is available?
• Visible Markov models
• Hidden Markov models
• Training and initialization
• Other methods

13-3 Part-of-Speech Tagging
• Assign syntactic categories to words in text
  – The-AT representative-NN put-VBD chairs-NNS on-IN the-AT table-NN.
  – The-AT representative-JJ put-NN chairs-VBZ on-IN the-AT table-NN.
  – Tag set (see next slide)
• Usefulness
  – Lexical acquisition, shallow/partial parsing
  – Information extraction
  – Question answering

13-4 Brown/Penn Tag Sets
(tag-set tables shown on the slide)

13-5 Sources of Information
• Syntagmatic structural information
  – looking at information about tag sequences
  – AT JJ NN vs. AT JJ VBP
  – 77% accuracy (Greene and Rubin, 1971)
• Lexical information
  – predicting a tag based on the word concerned
  – the word "flour" is much more likely to be a noun than a verb

13-6 Visible Markov Models
• View the tag sequence as a Markov chain
  – Limited horizon: a word's tag depends only on the previous tag
    P(X_{i+1} = t^j \mid X_1, \ldots, X_i) = P(X_{i+1} = t^j \mid X_i)
  – Time invariant: the dependency does not change over time
    P(X_{i+1} = t^j \mid X_i) = P(X_2 = t^j \mid X_1)

13-8 Find the best tagging t_{1,n} for a sentence w_{1,n}:
  \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n})
    = \arg\max_{t_{1,n}} P(w_{1,n} \mid t_{1,n}) P(t_{1,n}) / P(w_{1,n})
    = \arg\max_{t_{1,n}} P(w_{1,n} \mid t_{1,n}) P(t_{1,n})
• Words are independent of each other:
  P(w_{1,n} \mid t_{1,n}) P(t_{1,n})
    = [\prod_{i=1}^{n} P(w_i \mid t_{1,n})] \, P(t_n \mid t_{1,n-1}) P(t_{n-1} \mid t_{1,n-2}) \cdots P(t_2 \mid t_1)
• A word's identity depends only on its tag:
    = [\prod_{i=1}^{n} P(w_i \mid t_i)] \, P(t_n \mid t_{n-1}) P(t_{n-1} \mid t_{n-2}) \cdots P(t_2 \mid t_1)
    = \prod_{i=1}^{n} P(w_i \mid t_i) P(t_i \mid t_{i-1})
  (we define P(t_1 \mid t_0) = 1.0)

13-9 The final equation for determining the optimal tags for a sentence:
  \hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i) P(t_i \mid t_{i-1})
with maximum-likelihood estimates from a tagged corpus:
  P(t^k \mid t^j) = C(t^j, t^k) / C(t^j)
  P(w^l \mid t^j) = C(w^l, t^j) / C(t^j)

13-10 Viterbi Algorithm
(the slide presents the algorithm; a code sketch follows at the end of this section)

13-11 Unknown Words
• Simplest models
  – an unknown word can be of any part of speech
  – or of any open-class part of speech, i.e., nouns, verbs, and so on
• Morphological and other cues
  – -ed: past tense forms or past participles

13-12 Transformation-Based Learning of Tags
• A specification of which "error-correcting" transformations are admissible
• The learning algorithm (see the sketch at the end of this section)
  – tag each word in the training corpus with its most frequent tag
  – construct a ranked list of transformations that turns the initial tagging into a tagging that is close to correct

13-13 Transformations
• Rewrite rule t1 → t2: replace tag t1 by tag t2
• Triggering environment: where the rewrite may apply, e.g.
  – tag t^j occurs in one of the three previous positions
  – tag t^j occurs two positions earlier and tag t^k occurs in the following position

13-14 Locations where a trigger will be sought
(1) Triggered by tags
(2) Triggered by words: e.g., to work in a school vs. go to school (?); cut, put; more valuable player; don't, shouldn't
(3) Triggered by morphology: e.g., unknown words are tagged as proper nouns (NNP) if capitalized and as common nouns (NN) otherwise

13-15
• Example transformation for unknown words: replace NN by NNS if the unknown word's suffix is -s.
• At each learning step, reduce the error rate E(C_k): the number of words that are mistagged in tagged corpus C_k.

13-16 Tagging Accuracy
• Typically 95%~97%; accuracy depends on:
  – the amount of training data available
  – the tag set
  – the difference between the training corpus and dictionary on the one hand and the corpus of application on the other
  – unknown words
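
To make the estimation equations on slide 13-9 concrete, here is a minimal sketch of computing the MLE transition and emission tables from a tagged corpus. It is an illustration, not code from the chapter; the `<s>` sentence-start pseudo-tag and the dictionary-of-pairs representation are assumptions of the sketch.

```python
from collections import defaultdict

def train(tagged_sents):
    """MLE estimates matching slide 13-9:
    P(t^k|t^j) = C(t^j,t^k)/C(t^j) and P(w^l|t^j) = C(w^l,t^j)/C(t^j)."""
    tag_count = defaultdict(int)
    trans = defaultdict(float)   # (prev_tag, tag)  -> count, then probability
    emit = defaultdict(float)    # (word, tag)      -> count, then probability
    for sent in tagged_sents:    # each sent is a list of (word, tag) pairs
        prev = "<s>"             # sentence-start pseudo-tag (an assumption here)
        tag_count[prev] += 1
        for w, t in sent:
            trans[(prev, t)] += 1
            emit[(w, t)] += 1
            tag_count[t] += 1
            prev = t
    # Normalize counts into conditional probabilities. Sentence-final tags
    # have no outgoing transition, so their rows sum to < 1; acceptable
    # for a sketch.
    for (p, t) in list(trans):
        trans[(p, t)] /= tag_count[p]
    for (w, t) in list(emit):
        emit[(w, t)] /= tag_count[t]
    return trans, emit
```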
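
Slide 13-10 names the Viterbi algorithm, which finds the argmax of the product on slide 13-9 by dynamic programming in O(n·|T|²) time instead of enumerating all tag sequences. The sketch below assumes the `trans`/`emit` defaultdicts produced by the previous snippet; it is one plausible rendering, not the chapter's own code.

```python
def viterbi(words, tags, trans, emit):
    """Return the tag sequence maximizing prod_i P(w_i|t_i) P(t_i|t_{i-1}).
    Assumes a non-empty `words` list and the "<s>" start pseudo-tag."""
    # delta[t] = probability of the best partial tagging ending in tag t
    delta = {t: trans[("<s>", t)] * emit[(words[0], t)] for t in tags}
    backptrs = []
    for w in words[1:]:
        new_delta, back = {}, {}
        for t in tags:
            # best previous tag for each current tag t
            best_prev = max(tags, key=lambda p: delta[p] * trans[(p, t)])
            new_delta[t] = delta[best_prev] * trans[(best_prev, t)] * emit[(w, t)]
            back[t] = best_prev
        delta = new_delta
        backptrs.append(back)
    # follow back-pointers from the best final tag
    best = max(tags, key=lambda t: delta[t])
    seq = [best]
    for back in reversed(backptrs):
        seq.append(back[seq[-1]])
    return list(reversed(seq))
```

In practice one would work in log space and smooth the counts: with unsmoothed MLE estimates, any unseen word or tag bigram zeroes out every path and the tie is broken arbitrarily.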
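
A tiny illustration of the morphological cues for unknown words on slide 13-11. The specific tag choices are assumptions made for this sketch, not rules given in the chapter.

```python
def guess_tag(word):
    """Hypothetical morphological fallback for words unseen in training."""
    if word[0].isupper():
        return "NNP"   # capitalized -> proper noun (cf. slide 13-14)
    if word.endswith("ed"):
        return "VBD"   # past tense; past participle (VBN) is equally possible
    if word.endswith("s"):
        return "NNS"   # -s suffix -> plural noun (cf. slide 13-15)
    return "NN"        # default: common noun
```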
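
Slides 13-12 to 13-15 describe transformation-based (Brill-style) learning. Below is a compact sketch of the greedy loop; the rule representation (source tag, target tag, trigger tag, look-back window) is a simplification invented for this illustration, covering only the "trigger tag occurs in one of the previous positions" environment from slide 13-13.

```python
def apply_transform(tags, rule):
    """Apply one rewrite rule t1 -> t2 in its triggering environment."""
    src, dst, trigger, window = rule
    out = list(tags)
    for i, t in enumerate(tags):
        # rewrite src -> dst if the trigger tag occurs in one of the
        # `window` previous positions (cf. slide 13-13)
        if t == src and trigger in tags[max(0, i - window):i]:
            out[i] = dst
    return out

def learn_transforms(tags, gold, candidates):
    """Greedy loop from slides 13-12/13-15: start from the
    most-frequent-tag baseline `tags` and repeatedly apply the candidate
    rule that most reduces E(C_k), the number of mistagged words."""
    def errors(t):
        return sum(a != b for a, b in zip(t, gold))
    learned = []
    while True:
        best = min(candidates, key=lambda r: errors(apply_transform(tags, r)))
        if errors(apply_transform(tags, best)) >= errors(tags):
            break  # no candidate reduces the error rate any further
        tags = apply_transform(tags, best)
        learned.append(best)
    return learned
```

The loop terminates because each accepted rule strictly reduces the error count, which is bounded below by zero; the `learned` list is exactly the ranked list of transformations that slide 13-12 asks for.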