Slide 1: Putting Meaning Into Your Trees
Martha Palmer, University of Pennsylvania
Columbia University, New York City, January 29, 2004

Slide 2: Outline
- Introduction
- Background: WordNet, Levin classes, VerbNet
- Proposition Bank: capturing shallow semantics
- Mapping PropBank to VerbNet
- Mapping PropBank to WordNet

Slide 3: Ask Jeeves, a Q/A and IR example
Query: What do you call a successful movie? (Intended answer: Blockbuster)
Retrieved hits:
- Tips on Being a Successful Movie Vampire ... I shall call the police.
- Successful Casting Call & Shoot for "Clash of Empires" ... thank everyone for their participation in the making of yesterday's movie.
- Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague ...
- VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.

Slide 4: Ask Jeeves, filtering with POS tags
Query: What do you call a successful movie?
(The same hits are shown again; part-of-speech tagging alone does not filter them out.)

Slide 5: Filtering out "call the police"
Syntax: call(you, movie, what) ≠ call(you, police)

Slide 6: An English lexical resource is required
- One that provides sets of possible syntactic frames for verbs.
- And that provides clear, replicable sense distinctions.
- AskJeeves: Who do you call for a good electronic lexical database for English?

Slide 7: WordNet: call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; ...) -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; ...) -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; ...) -> LABEL
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER

Slide 8: WordNet, Princeton (Miller 1985, Fellbaum 1998)
- On-line lexical reference (dictionary)
- Nouns, verbs, adjectives, and adverbs grouped into synonym sets
- Other relations include hypernyms (ISA), antonyms, meronyms
- Limitations as a computational lexicon:
  - Contains little syntactic information
  - No explicit predicate argument structures
  - No systematic extension of basic senses
  - Sense distinctions are very fine-grained (inter-tagger agreement 73%)
  - No hierarchical entries

Slide 9: Levin classes (Levin, 1993)
- 3100 verbs, 47 top-level classes, 193 second- and third-level classes
- Each class has a syntactic signature based on alternations:
  - John broke the jar. / The jar broke. / Jars break easily.
  - John cut the bread. / *The bread cut. / Bread cuts easily.
  - John hit the wall. / *The wall hit. / *Walls hit easily.

Slide 10: Levin classes (Levin, 1993), with semantic components
Verb class hierarchy: 3100 verbs, 47 top-level classes, 193 second- and third-level classes.
Each class has a syntactic signature based on alternations:
- John broke the jar. / The jar broke. / Jars break easily. -> change of state
- John cut the bread. / *The bread cut. / Bread cuts easily. -> change of state, recognizable action, sharp instrument
- John hit the wall. / *The wall hit. / *Walls hit easily. -> contact, exertion of force
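To make the alternation signatures on the last two slides concrete, here is a minimal illustrative sketch of how a verb's Levin-style signature could be recorded and queried. The dictionary name and alternation flags are invented for illustration; they are not part of Levin's index or VerbNet.

```python
# Hypothetical encoding of Levin-style syntactic signatures.
# Each verb records which alternations its class licenses; the flags
# mirror the break/cut/hit judgments from the slide.

LEVIN_SIGNATURES = {
    "break": {"causative_inchoative": True,  "middle": True},   # The jar broke. / Jars break easily.
    "cut":   {"causative_inchoative": False, "middle": True},   # *The bread cut. / Bread cuts easily.
    "hit":   {"causative_inchoative": False, "middle": False},  # *The wall hit. / *Walls hit easily.
}

def allows(verb: str, alternation: str) -> bool:
    """Return True if the verb's class licenses the given alternation."""
    return LEVIN_SIGNATURES.get(verb, {}).get(alternation, False)

if __name__ == "__main__":
    for v in ("break", "cut", "hit"):
        print(v,
              "middle:", allows(v, "middle"),
              "causative/inchoative:", allows(v, "causative_inchoative"))
```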
Slide 11: (figure only; no recoverable text)

Slide 12: Confusions in Levin classes?
- Not semantically homogeneous: {braid, clip, file, powder, pluck, etc.}
- Multiple class listings: homonymy or polysemy?
- Conflicting alternations? Carry verbs disallow the conative (*she carried at the ball), yet include {push, pull, shove, kick, draw, yank, tug}, which are also in the push/pull class, which does take the conative (she kicked at the ball).

Slide 13: Intersective Levin classes (Dang, Kipper & Palmer, ACL98)
(Diagram: alternation cues such as "apart" signal change of state (CH-STATE), "across the room" signals change of location (CH-LOC), and "at" signals no change of location (¬CH-LOC).)

Slide 14: Intersective Levin classes (Dang, Kipper & Palmer, IJCAI00, Coling00)
- More syntactically and semantically coherent:
  - sets of syntactic patterns
  - explicit semantic components
  - relations between senses
- VerbNet: www.cis.upenn.edu/verbnet

Slide 15: VerbNet (Karin Kipper)
- Class entries:
  - Capture generalizations about verb behavior
  - Organized hierarchically
  - Members have common semantic elements, semantic roles and syntactic frames
- Verb entries:
  - Refer to a set of classes (different senses)
  - Each class member is linked to WordNet synset(s) (not all WordNet senses are covered)

Slide 16: Semantic role labels
Julia broke the LCD projector.
break(agent(Julia), patient(LCD-projector))
cause(agent(Julia), broken(LCD-projector))
agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
patient(P) -> affected(P), change(P), ...

Slide 17: Hand-built resources vs. real data
- VerbNet is based on linguistic theory: how useful is it?
- How well does it correspond to syntactic variations found in naturally occurring text?
- Answer: PropBank

Slide 18: Proposition Bank: from sentences to propositions
Powell met Zhu Rongji / Powell and Zhu Rongji met / Powell met with Zhu Rongji / Powell and Zhu Rongji had a meeting (similarly: battle, wrestle, join, debate, consult)
Proposition: meet(Powell, Zhu Rongji), i.e. meet(Somebody1, Somebody2) ...
"When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane."
meet(Powell, Zhu)
discuss([Powell, Zhu], return(X, plane))

Slide 19: Capturing semantic roles
- [SUBJ Owen] broke [ARG1 the laser pointer].
- [SUBJ [ARG1 The windows]] were broken by the hurricane.
- [SUBJ [ARG1 The vase]] broke into pieces when it toppled over.
(*See also FrameNet, http://www.icsi.berkeley.edu/~framenet/)

Slide 20: An English lexical resource is required
- One that provides sets of possible syntactic frames for verbs with semantic role labels.
- And that provides clear, replicable sense distinctions.

Slide 21: A TreeBanked sentence
"Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company."
(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))

Slide 22: The same sentence, PropBanked
(S Arg0:expect (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1:expect (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S Arg0:give (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    Arg2:give (NP the U.S. car maker)
                                    Arg1:give (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
Resulting propositions:
expect(Analysts, GM-Jaguar pact)
give(GM-Jaguar pact, US car maker, 30% stake)
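The two propositions read off this PropBanked tree can be pictured as simple records. The class and field names below are illustrative only, not the actual PropBank release format or API.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Proposition:
    """A PropBank-style predicate with its numbered arguments (illustrative)."""
    rel: str
    args: Dict[str, str] = field(default_factory=dict)

# The two propositions from the slide's PropBanked sentence.
expect_prop = Proposition(
    rel="expect",
    args={"Arg0": "Analysts",
          "Arg1": "a GM-Jaguar pact that would give the U.S. car maker "
                  "an eventual 30% stake in the British company"},
)

give_prop = Proposition(
    rel="give",
    args={"Arg0": "a GM-Jaguar pact",   # filled via the trace *T*-1
          "Arg2": "the U.S. car maker",
          "Arg1": "an eventual 30% stake in the British company"},
)

for p in (expect_prop, give_prop):
    print(p.rel, p.args)
```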
Slide 23: Frames file example: expect
Roles: Arg0: expecter; Arg1: thing expected
Example (transitive, active): Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL:  expect
  Arg1: further declines in interest rates

Slide 24: Frames file example: give
Roles: Arg0: giver; Arg1: thing given; Arg2: entity given to
Example (double object): The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL:  gave
  Arg2: the chefs
  Arg1: a standing ovation
(A toy encoding of these frames files appears in the code sketch after this group of slides.)

Slide 25: Word senses in PropBank
- Orders to ignore word sense were not feasible for 700+ verbs:
  - Mary left the room.
  - Mary left her daughter-in-law her pearls in her will.
- Frameset leave.01, "move away from": Arg0: entity leaving; Arg1: place left
- Frameset leave.02, "give": Arg0: giver; Arg1: thing given; Arg2: beneficiary
- How do these relate to traditional word senses in VerbNet and WordNet?

Slide 26: Annotation procedure
- Penn Treebank II: extraction of all sentences with a given verb
- Create a frames file for that verb: Paul Kingsbury (3100+ lemmas, 4400 framesets, 118K predicates); over 300 created automatically via VerbNet
- First pass: automatic tagging (Joseph Rosenzweig), http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
- Second pass: double-blind hand correction (Paul Kingsbury); tagging tool highlights discrepancies (Scott Cotton)
- Third pass: "Solomonization" (adjudication): Betsy Klipple, Olga Babko-Malaya

Slide 27: Trends in argument numbering
- Arg0 = agent
- Arg1 = direct object / theme / patient
- Arg2 = indirect object / benefactive / instrument / attribute / end state
- Arg3 = start point / benefactive / instrument / attribute
- Arg4 = end point
- Per-word vs. frame-level numbering: which is more general?

Slide 28: Additional tags (arguments or adjuncts?)
A variety of ArgM function tags (beyond the numbered arguments):
- TMP: when?
- LOC: where at?
- DIR: where to?
- MNR: how?
- PRP: why?
- REC: himself, themselves, each other
- PRD: this argument refers to or modifies another
- ADV: others

Slide 29: Inflection
- Verbs are also marked for tense/aspect:
  - Passive/active
  - Perfect/progressive
  - Third singular (is, has, does, was)
  - Present/past/future
  - Infinitives/participles/gerunds/finites
- Modals and negations are marked as ArgMs

Slide 30: Frames: multiple framesets
- Out of the 787 most frequent verbs: 1 frameset: 521; 2 framesets: 169; 3+ framesets: 97 (includes light verbs)
- 94% inter-tagger agreement
- Framesets are not necessarily consistent between different senses of the same verb
- Framesets are consistent between different verbs that share similar argument structures (like FrameNet)

Slide 31: Ergative/unaccusative verbs
Roles (no Arg0 for unaccusative verbs):
- Arg1 = logical subject, patient, thing rising
- Arg2 = EXT, amount risen
- Arg3 = start point
- Arg4 = end point
Examples: Sales rose 4% to $3.28 billion from $3.16 billion. / The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
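The expect and give frames files above, together with the unaccusative role set for rise, can be pictured as plain records. This is a simplified sketch: the real frames files are XML, and the frameset id for rise is assumed here purely for illustration.

```python
# Hypothetical, simplified rendering of PropBank frames-file entries as dicts.
# The actual frames files are XML; structure and the rise frameset id are illustrative.

FRAMES = {
    "expect.01": {
        "roles": {"Arg0": "expecter", "Arg1": "thing expected"},
        "example": "Portfolio managers expect further declines in interest rates.",
    },
    "give.01": {
        "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
        "example": "The executives gave the chefs a standing ovation.",
    },
    "rise.01": {  # unaccusative: no Arg0
        "roles": {"Arg1": "thing rising", "Arg2": "amount risen (EXT)",
                  "Arg3": "start point", "Arg4": "end point"},
        "example": "Sales rose 4% to $3.28 billion from $3.16 billion.",
    },
}

def roleset(frameset_id: str) -> dict:
    """Look up the role set for a frameset id such as 'give.01'."""
    return FRAMES[frameset_id]["roles"]

print(roleset("give.01"))
```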
Slide 32: Actual data for leave
http://www.cs.rochester.edu/~gildea/PropBank/Sort/
- leave.01 "move away from": Arg0, rel, Arg1, Arg3
- leave.02 "give": Arg0, rel, Arg1, Arg2
Observed patterns (with counts):
- sub-ARG0, obj-ARG1: 44
- sub-ARG0: 20
- sub-ARG0, NP-ARG1-with, obj-ARG2: 17
- sub-ARG0, sub-ARG2, ADJP-ARG3-PRD: 10
- sub-ARG0, sub-ARG1, ADJP-ARG3-PRD: 6
- sub-ARG0, sub-ARG1, VP-ARG3-PRD: 5
- NP-ARG1-with, obj-ARG2: 4
- obj-ARG1: 3
- sub-ARG0, sub-ARG2, VP-ARG3-PRD: 3

Slide 33: PropBank/FrameNet (Rambow et al., PMLB03)
- Buy: Arg0 buyer; Arg1 goods; Arg2 seller; Arg3 rate; Arg4 payment
- Sell: Arg0 seller; Arg1 goods; Arg2 buyer; Arg3 rate; Arg4 payment
- PropBank is broader, more neutral, more syntactic; it maps readily to VN, TR, FN.

Slide 34: Annotator accuracy: ITA 84%
(Chart: per-annotator accuracy on primary labels, roughly 0.86 to 0.96, plotted against number of annotations on a log scale from 1,000 to 1,000,000; annotators include kingsbur, hertlerb, forbesk, solaman2, istreit, wiarmstr, ksledge, nryant, jaywang, malayao, ptepper, cotter, delilkan.)

Slide 35: An English lexical resource is required
- One that provides sets of possible syntactic frames for verbs with semantic role labels?
- And that provides clear, replicable sense distinctions.

Slide 36: An English lexical resource is required
- One that provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text?
- And that provides clear, replicable sense distinctions.

Slide 37: Automatic labelling of semantic relations (Gildea & Jurafsky, CL02; Gildea & Palmer, ACL02)
- Stochastic model
- Features: predicate; phrase type; parse tree path; position (before/after predicate); voice (active/passive); head word
(These features are sketched in code after this group of slides.)

Slide 38: Semantic role labelling accuracy, known boundaries
Accuracy of semantic role prediction when the system is given the constituents to classify:
- FrameNet (>= 10 instances), gold-standard parses: 82.0
- PropBank, gold-standard parses: 77.0; automatic parses: 73.6
- PropBank (>= 10 instances), gold-standard parses: 83.1; automatic parses: 79.6
Notes:
- FrameNet examples (training/test) are hand-picked to be unambiguous.
- Lower performance with unknown boundaries; higher performance with traces; these almost even out.

Slide 39: Additional automatic role labelers (Pradhan et al., ICDM03; Surdeanu et al., ACL03; Chen & Rambow, EMNLP03; Gildea & Hockenmaier, EMNLP03)
- Performance improved from 77% to 88% (Colorado; gold-standard parses, < 10 instances)
- Same features plus: named entity tags; head word POS; for unseen verbs, backoff to automatic verb clusters
- SVMs: role vs. not role; for each likely role and each Arg#, Arg# vs. not; no overlapping role labels allowed

Slide 40: Additional automatic role labelers (continued)
Same content as the previous slide, plus new results with the original features and labels: 88% (Colorado), 93% (Penn).

Slide 41: Word senses in PropBank (repeated from Slide 25)
- Orders to ignore word sense were not feasible for 700+ verbs: Mary left the room. / Mary left her daughter-in-law her pearls in her will.
- Frameset leave.01 "move away from": Arg0: entity leaving; Arg1: place left
- Frameset leave.02 "give": Arg0: giver; Arg1: thing given; Arg2: beneficiary
- How do these relate to traditional word senses in VerbNet and WordNet?
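Slide 37's feature set can be written down as a small feature-extraction function. The Constituent class and its attributes are hypothetical stand-ins for whatever a parser would provide; this is not the Gildea & Jurafsky system, just an illustration of the same six features.

```python
from dataclasses import dataclass

@dataclass
class Constituent:
    """Minimal stand-in for a parsed constituent (hypothetical attributes)."""
    phrase_type: str        # e.g. "NP"
    head_word: str          # e.g. "pact"
    path_to_predicate: str  # e.g. "NP^VP^VP^VP" (encoding is an assumption)
    before_predicate: bool

def srl_features(cand: Constituent, predicate: str, passive: bool) -> dict:
    """The six features listed on the slide, collected into one dict."""
    return {
        "predicate": predicate,
        "phrase_type": cand.phrase_type,
        "path": cand.path_to_predicate,
        "position": "before" if cand.before_predicate else "after",
        "voice": "passive" if passive else "active",
        "head_word": cand.head_word,
    }

cand = Constituent("NP", "pact", "NP^VP^VP^VP", before_predicate=False)
print(srl_features(cand, predicate="expect", passive=False))
```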
Slide 42: Mapping from PropBank to VerbNet
Frameset id = leave.02, sense = "give", VerbNet class = future_having-13.3
- Arg0 (giver) -> Agent
- Arg1 (thing given) -> Theme
- Arg2 (benefactive) -> Recipient
(A toy encoding of this mapping appears in the code sketch after this group of slides.)

Slide 43: Mapping from PropBank to VerbNet (figure only)

Slide 44: Mapping from PropBank to VerbNet
- Overlap with PropBank framesets: 50,000 PropBank instances; < 50% of VerbNet entries, > 85% of VerbNet classes
- Results: MATCH 78.63% (80.90% relaxed). (VerbNet isn't just linguistic theory!)
- Benefits: thematic role labels and semantic predicates; can extend PropBank coverage with VerbNet classes; WordNet sense tags
- Kingsbury & Kipper, NAACL03 Text Meaning Workshop; http://www.cs.rochester.edu/~gildea/VerbNet/

Slide 45: WordNet as a WSD sense inventory
- Senses unnecessarily fine-grained?
- Word sense disambiguation bakeoffs:
  - Senseval-1: Hector, ITA = 95.5%
  - Senseval-2: WordNet 1.7, ITA for verbs = 71%
  - Groupings of Senseval-2 verbs, ITA = 82%; used syntactic and semantic criteria

Slide 46: Groupings methodology (with Dang and Fellbaum) (SIGLEX01, SIGLEX02, JNLE04)
- Double-blind groupings, adjudication
- Syntactic criteria (VerbNet was useful):
  - Distinct subcategorization frames: call him a bastard / call him a taxi
  - Recognizable alternations, i.e. regular sense extensions: play an instrument / play a song / play a melody on an instrument

Slide 47: Groupings methodology (continued)
- Semantic criteria:
  - Differences in semantic classes of arguments: abstract/concrete, human/animal, animate/inanimate, different instrument types, ...
  - Differences in entailments: change of a prior entity or creation of a new entity?
  - Differences in types of events: abstract/concrete/mental/emotional/...
  - Specialized subject domains

Slide 48: Results, averaged over 28 verbs (Dang and Palmer, SIGLEX02; Dang et al., Coling02)
- Total WordNet polysemy: 16.28; group polysemy: 8.07
- ITA fine-grained: 71%; ITA grouped: 82%
- MX fine-grained: 60.2%; MX grouped: 69%
- MX = maximum entropy WSD, p(sense | context); features: topic, syntactic constituents, semantic classes (gains of +2.5%, +1.5 to +5%, +6%)

Slide 49: Grouping improved ITA and maxent WSD
- call: 31% of errors were due to confusion between senses within the same group 1:
  - name, call (assign a specified, proper name to; "They named their son David")
  - call (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard")
  - call (consider or regard as being; "I would not call her beautiful")
- 75% accuracy with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
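Slide 42's frameset-to-class mapping can be stored as a small lookup table. This is an illustrative sketch only; the real mappings live in the released PropBank-to-VerbNet resources, and the dictionary layout here is an assumption.

```python
from typing import Optional

# Hypothetical encoding of the PropBank -> VerbNet mapping shown for leave.02.

PB_TO_VN = {
    "leave.02": {
        "verbnet_class": "future_having-13.3",
        "role_map": {"Arg0": "Agent", "Arg1": "Theme", "Arg2": "Recipient"},
    },
}

def vn_role(frameset: str, pb_arg: str) -> Optional[str]:
    """Translate a PropBank argument label into its VerbNet thematic role."""
    entry = PB_TO_VN.get(frameset)
    return entry["role_map"].get(pb_arg) if entry else None

print(vn_role("leave.02", "Arg2"))   # -> Recipient
```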
Slide 50: WordNet: call, 28 senses, grouped
(Diagram: the 28 WordNet senses of "call", clustered into groups:)
- Loud cry: WN5, WN16, WN12
- Label: WN3, WN19, WN1, WN22
- Bird or animal cry: WN15, WN26
- Request: WN4, WN7, WN8, WN9
- Challenge: WN20, WN18, WN27
- Phone/radio: WN2, WN13, WN28, WN17, WN11
- Call a loan/bond: WN6, WN25
- Visit: WN23
- Bid: WN10, WN14, WN21, WN24

Slide 51: (same diagram repeated)

Slide 52: Overlap between groups and framesets: 95% (Palmer, Dang & Fellbaum, NLE 2004)
(Diagram for the verb "develop": its WordNet senses, WN1 through WN20, are partitioned between Frameset1 and Frameset2, and the sense groups align with the frameset boundaries.)

Slide 53: Sense hierarchy
- PropBank framesets: coarse-grained distinctions; 20 Senseval-2 verbs with more than one frameset; maxent WSD system: 73.5% baseline, 90% accuracy
- Sense groups (Senseval-2): intermediate level (includes Levin classes); 95% overlap with framesets; 69% accuracy
- WordNet: fine-grained distinctions; 60.2% accuracy

Slide 54: An English lexical resource is available
- One that provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text.
- And that provides clear, replicable sense distinctions.

Slide 55: A Chinese Treebank sentence
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
"The Congress passed the banking law recently."
(IP (NP-SBJ (NN 国会/Congress))
    (VP (ADVP (ADV 最近/recently))
        (VP (VV 通过/pass)
            (AS 了/ASP)
            (NP-OBJ (NN 银行法/banking law)))))

Slide 56: The same sentence, PropBanked
(IP (NP-SBJ arg0 (NN 国会))
    (VP argM (ADVP (ADV 最近))
        (VP f2 (VV 通过)
            (AS 了)
            arg1 (NP-OBJ (NN 银行法)))))
通过 (pass), frameset f2: arg0 国会 (Congress); argM 最近 (recently); arg1 银行法 (banking law)
(This annotation is rendered as a small data structure in the code sketch after this group of slides.)

Slide 57: Chinese PropBank status (with Bert Xue and Scott Cotton)
- Create a frames file for each verb; similar alternations (causative/inchoative, unexpressed object); 5000 lemmas, 3000 done (hired Jiang)
- First pass: automatic tagging, 2500 done; subcat frame matcher (Xue & Kulick, MT03)
- Second pass: double-blind hand correction, in progress (includes frameset tagging), 1000 done; ported RATS to CATS, in use since May
- Third pass: Solomonization (adjudication)
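The Chinese PropBank annotation on Slide 56 fits the same proposition shape as the English examples. Below is a tiny illustrative dict; the "ArgM (temporal)" gloss is my reading of the slide's bare argM label, and the structure is not the actual Chinese PropBank file format.

```python
# The Chinese PropBank annotation above, rendered as a simple dict (illustrative only).

pass_prop = {
    "rel": "通过",          # pass
    "frameset": "f2",       # frameset label as written on the slide
    "args": {
        "Arg0": "国会",      # Congress
        "ArgM": "最近",      # recently (a temporal modifier)
        "Arg1": "银行法",    # banking law
    },
}

print(pass_prop["rel"], pass_prop["args"])
```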
Slide 58: A Korean Treebank sentence
그는 르노가 3 월말까지 인수제의 시한을 갖고 있다고 덧붙였다.
"He added that Renault has a deadline until the end of March for a merger proposal."
(S (NP-SBJ 그/NPN+은/PAU)
   (VP (S-COMP (NP-SBJ 르노/NPR+이/PCA)
               (VP (VP (NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                       (VP (NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                           갖/VV+고/ECS))
                   있/VX+다/EFN+고/PAD))
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

Slide 59: The same sentence, PropBanked
(S Arg0 (NP-SBJ 그/NPN+은/PAU)
   (VP Arg2 (S-COMP Arg0 (NP-SBJ 르노/NPR+이/PCA)
                    (VP (VP ArgM (NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                            (VP Arg1 (NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                                갖/VV+고/ECS))
                        있/VX+다/EFN+고/PAD))
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)
Resulting propositions:
덧붙이다 (add): Arg0 = 그는 (he); Arg2 = 르노가 3 월말까지 인수제의 시한을 갖고 있다 (that Renault has a deadline until the end of March for a merger proposal)
갖다 (have): Arg0 = 르노가 (Renault); ArgM = 3 월말까지 (until the end of March); Arg1 = 인수제의 시한을 (a deadline for a merger proposal)

Slide 60: PropBank II
- Nominalizations (NYU); lexical frames DONE
- Event variables (including temporals and locatives)
- More fine-grained sense tagging: tagging nominalizations with WordNet senses; selected verbs and nouns
- Nominal coreference (not names)
- Clausal discourse connectives: a selected subset

Slide 61: PropBank II: event variables, sense tags, nominal reference, discourse connectives
{Also}, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].
- Event h23, REL help, sense help2,5: Arg0 = tax rates (tax rate1); Arg1 = the company keep its tax outlay flat
- Event k16, REL keep, sense keep1: Arg0 = the company (company1); Arg1 = its tax outlay; Arg3-PRD = flat; ArgM-ADV = relative to earnings growth
(This event-variable structure is sketched in code after this group of slides.)

Slide 62: Summary
- Shallow semantic annotation that captures critical dependencies and semantic role labels
- Supports training of supervised automatic taggers
- Methodology ports readily to other languages
- English PropBank release: spring 2004
- Chinese PropBank release: fall 2004
- Korean PropBank release: summer 2005

Slide 63: Word sense in machine translation
- Different syntactic frames:
  - John left the room. / Juan saiu do quarto. (Portuguese)
  - John left the book on the table. / Juan deixou o livro na mesa.
- Same syntactic frame?
  - John left a fortune. / Juan deixou uma fortuna.

Slide 64: Summary of multilingual TreeBanks and PropBanks (parallel corpora)
Chinese Treebank: Text: Chinese 500K / English 400K; Treebank: Chinese 500K / English 100K; PropBank I: Chinese 500K / English 350K; PropBank II: Chinese 100K / English 100K
Arabic Treebank: Text: Arabic 500K / English 500K; Treebank: Arabic 500K / English ?; PropBank I: ?; PropBank II: ?
Korean Treebank: Text: Korean 180K / English 50K; Treebank: Korean 180K / English 50K; PropBank I: Korean 180K / English 50K
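Slide 61's PropBank II example adds event variables, so an argument can point at another event rather than a string. Here is an illustrative sketch of that idea; the dict layout and the resolve helper are my own, not PropBank II's file format.

```python
# Illustrative sketch of PropBank II-style event variables: each proposition gets
# an id, and an argument filler may be another event id instead of a text span.

events = {
    "h23": {
        "rel": "help",
        "sense": "help2,5",
        "args": {
            "Arg0": "substantially lower Dutch corporate tax rates",
            "Arg1": "k16",                      # the embedded 'keep' event
        },
    },
    "k16": {
        "rel": "keep",
        "sense": "keep1",
        "args": {
            "Arg0": "the company",
            "Arg1": "its tax outlay",
            "Arg3-PRD": "flat",
            "ArgM-ADV": "relative to earnings growth",
        },
    },
}

def resolve(filler: str):
    """Follow an event id to its proposition; otherwise return the text span."""
    return events.get(filler, filler)

print(resolve(events["h23"]["args"]["Arg1"])["rel"])   # -> keep
```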
Slide 65: Levin class: escape-51.1-1
- WordNet senses: WN 1, 5, 8
- Thematic roles: Location[+concrete], Theme[+concrete]
- Frames with semantics:
  - Basic intransitive: "The convict escaped": motion(during(E), Theme), direction(during(E), Prep, Theme, ~Location)
  - Intransitive (+ path PP): "The convict escaped from the prison"
  - Locative preposition drop: "The convict escaped the prison"

Slide 66: Levin class: future_having-13.3
- WordNet senses: WN 2, 10, 13
- Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
- Frames with semantics:
  - Dative: "I promised somebody my time": Agent V Recipient Theme; has_possession(start(E), Agent, Theme), future_possession(end(E), Recipient, Theme), cause(Agent, E)
  - Transitive (+ Recipient PP): "We offered our paycheck to her": Agent V Theme Prep(to) Recipient
  - Transitive (Theme object): "I promised my house (to somebody)": Agent V Theme

Slide 67: Automatic classification
- Merlo & Stevenson automatically classified 59 verbs with 69.8% accuracy into: 1. unergative, 2. unaccusative, 3. object-drop
- 100M words automatically parsed; C5.0, using features: transitivity, causativity, animacy, voice, POS
- EM clustering: 61%, 2669 instances, 1M words
- Using gold-standard semantic role labels:
  1. float, hop, hope, jump, march, leap
  2. change, clear, collapse, cool, crack, open, flood
  3. borrow, clean, inherit, reap, organize, study

Slide 68: SENSEVAL: word sense disambiguation evaluation (NLE99, CHUM01, NLE02, NLE03)
DARPA-style bakeoff: training data, testing data, scoring algorithm.
- SENSEVAL-1 (1998): 3 languages; 24 systems; English lexical sample: yes; verbs/polysemy/instances: 13/12/215; sense inventory: Hector, 95.5%
- SENSEVAL-2 (2001): 12 languages; 90 systems; English lexical sample: yes; verbs/polysemy/instances: 29/16/110; sense inventory: WordNet, 73+%

Slide 69: Maximum entropy WSD (Hoa Dang), best performer on verbs
- Maximum entropy framework: p(sense | context)
- Contextual linguistic features:
  - Topical feature for W: keywords (determined automatically)
  - Local syntactic features for W: presence of subject, complements, passive?; words in subject and complement positions, particles, prepositions, etc.
  - Local semantic features for W: semantic class information from WordNet (synsets, etc.); named entity tag (PERSON, LOCATION, ...) for proper nouns; words within a +/- 2 word window
(A toy p(sense | context) classifier along these lines is sketched in code at the end of this section.)

Slide 70: Best verb performance, maxent WSD (Hoa Dang) (Dang and Palmer, SIGLEX02; Dang et al., Coling02)
- 28 verbs, averaged: total WordNet polysemy 16.28; ITA 71%; MX-WSD 60.2%
- MX = maximum entropy WSD, p(sense | context); features: topic, syntactic constituents, semantic classes (gains of +2.5%, +1.5 to +5%, +6%)

Slide 71: Role labels and framesets as features for WSD (preliminary results, Jinying Chen)
- Gold-standard PropBank annotation; decision tree (C5.0); grouped senses; 5 verbs
- Features: frameset tags, Arg labels
- Comparable results to maxent with PropBank features
- Syntactic frames and sense distinctions are inseparable

Slide 72: Lexical resources provide concrete criteria for sense distinctions
- PropBank: coarse-grained sense distinctions determined by different subcategorization frames (framesets)
- Intersective Levin classes: regular sense extensions through differing syntactic constructions
- VerbNet: distinct semantic predicates for each sense (verb class)
- Are these the right distinctions?

Slide 73: Results, averaged over 28 verbs
- Total WordNet polysemy: 16.28; group polysemy: 8.07
- ITA fine-grained: 71%; ITA grouped: 82%
- MX fine-grained: 60.2%; MX grouped: 69%
- JHU MLultra: 56.6%, 58.7%
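As a rough illustration of the p(sense | context) setup on Slides 69-70, here is a toy maximum-entropy WSD pipeline built with scikit-learn's logistic regression (a maxent classifier) over dictionary-valued features. The feature names, the three "call" senses, and the training examples are invented stand-ins, not Hoa Dang's system or the Senseval data.

```python
# Toy maximum-entropy WSD sketch: p(sense | context) with hand-made features.
# Requires scikit-learn; all feature names and examples are invented for illustration.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training instance is a bag of contextual features for one occurrence of "call".
train_feats = [
    {"topic": "finance", "has_obj": True,  "obj_head": "loan"},
    {"topic": "social",  "has_obj": True,  "obj_head": "friend"},
    {"topic": "social",  "has_obj": False, "prep": "at"},
]
train_senses = ["call.loan", "call.phone", "call.cry"]   # hypothetical sense labels

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_feats, train_senses)

test = {"topic": "finance", "has_obj": True, "obj_head": "bond"}
print(model.predict([test])[0], round(model.predict_proba([test]).max(), 3))
```

With so few examples this is only a shape demonstration; the real systems trained on the Senseval lexical-sample data with many more features per instance.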