Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
IntroductiontoPartof SpeechTagging AlanRitter ManyslidesadaptedfromBrendanO’Connor ChrisManning Wherearewegoingwiththis? • Textclassification:bagsofwords • Sequencetagging • PartsofSpeech • NamedEntityRecognition • Otherareas:bioinformatics(geneprediction),etc… What’sapart-of-speech(POS)? • Syntax=howwordscomposetoformlargermeaningbearingunits • POS=syntacticcategoriesforwords • Youcouldsubstitutewordswithinaclassandhaveasyntacticallyvalid sentence • Givesinformationhowwordscombineintolargerphrases • Isawthedog • Isawthecat • Isawthe___ PartsofSpeechisanoldidea • PerhapsstartingwithAristotleintheWest(384–322BCE),therewas theideaofhavingpartsofspeech • Schoolgrammar:noun,verb,adjective,adverb,preposition, conjunction,pronoun,interjection • Manymorefinegrainedpossibilities https://www.youtube.com/watch?v=ODGA7ssL-6g&index=1&list=PL6795522EAD6CE2F7 Open class (lexical) words Nouns Proper IBM Italy Verbs Common cat / cats snow Closed class (functional) Determiners the some Conjunctions and or Pronouns he its Main see registered Adjectives old older oldest Adverbs slowly Numbers … more 122,312 one Modals can had Prepositions to with Particles off up Interjections Ow Eh … more Openvs.Closedclasses • Openvs.Closedclasses • Closed: • determiners:a,an,the • pronouns:she,he,I • prepositions:on,under,over,near,by,… • Why“closed”? • Open: • Nouns,Verbs,Adjectives,Adverbs. ManyTaggingStandards • PennTreebank(45tags)…thisisthemostcommonone • Browncorpus(85tags) • Coarsetagsets • UniversalPOStags(Petrovet.al.https://github.com/slavpetrov/universalpos-tags) • Motivation:cross-linguisticregularities Whatarepartsofspeechusefulfor? • Phraseidentification(chunking) • Namedentityrecognition • InformationExtraction • Parsing QuickandDirtyNounPhraseIdentification POSTagging • WordsoftenhavemorethanonePOS:back • • • • Theback door =JJ Onmyback =NN Winthevotersback =RB Promisedtoback thebill =VB • ThePOStaggingproblemistodeterminethePOStagforaparticular instanceofaword. POSTagging • Input: Playswellwithothers • Ambiguity:NNS/VBZUH/JJ/NN/RBINNNS • Output: Plays/VBZwell/RBwith/INothers/NNS • Uses: • • • • Penn Treebank POStags Text-to-speech(howdowepronounce“lead”?) Canwriteregexps like(Det)Adj*N+overtheoutputforphrases,etc. Asinputtoortospeedupafullparser Ifyouknowthetag,youcanbackofftoitinothertasks POStaggingperformance • Howmanytagsarecorrect?(Tagaccuracy) • About97%currently • Butbaselineisalready90% • Baselineisperformanceofstupidestpossiblemethod • Tageverywordwithitsmostfrequent tag • Tagunknown wordsasnouns • Partlyeasybecause • Manywordsareunambiguous • Yougetpointsforthem(the,a,etc.)andforpunctuationmarks! Decidingonthecorrectpartofspeechcanbe difficultevenforpeople • Mrs/NNPShaefer/NNPnever/RBgot/VBDaround/RP to/TO joining/VBG • All/DTwe/PRPgotta/VBNdo/VBis/VBZgo/VBaround/INthe/DT corner/NN • Chateau/NNPPetrus/NNPcosts/VBZaround/RB250/CD HowdifficultisPOStagging? • About11%ofthewordtypesintheBrowncorpusareambiguous withregardtopartofspeech • Buttheytendtobeverycommonwords.E.g.,that • Iknowthat heishonest=IN • Yes,that playwasnice=DT • Youcan’tgothat far=RB • 40%ofthewordtokensareambiguous It’shardforpeopletoo!