Putting Meaning Into Your Trees
Martha Palmer
University of Pennsylvania
Columbia University
New York City
January 29, 2004

Outline
- Introduction
- Background: WordNet, Levin classes, VerbNet
- Proposition Bank – capturing shallow semantics
- Mapping PropBank to VerbNet
- Mapping PropBank to WordNet

Ask Jeeves – A Q/A, IR ex.
What do you call a successful movie? Blockbuster
- Tips on Being a Successful Movie Vampire ... I shall call the police.
- Successful Casting Call & Shoot for ``Clash of Empires'' ... thank everyone for their participation in the making of yesterday's movie.
- Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
- VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.

Ask Jeeves – filtering w/ POS tag
What do you call a successful movie?
- Tips on Being a Successful Movie Vampire ... I shall call the police.
- Successful Casting Call & Shoot for ``Clash of Empires'' ... thank everyone for their participation in the making of yesterday's movie.
- Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
- VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.

Filtering out “call the police”
Syntax:
call(you, movie, what) ≠ call(you, police)
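
To make the filtering step concrete, here is a minimal sketch (not from the original slides) in which the retrieved sentences have already been reduced to predicate-argument tuples by some parser or role labeler; the tuples and the matching rule are illustrative assumptions.

# Minimal sketch of predicate-argument filtering for Q/A retrieval.
# The tuples below are hand-written stand-ins for parser/role-labeler output.

query = ("call", ("you", "movie", "what"))       # What do you call a successful movie?

retrieved = [
    ("call", ("I", "police")),                   # "... I shall call the police."
    ("call", ("you", "movie", "Blockbuster")),   # the hit we actually want
]

def matches(query, candidate):
    """Keep a candidate only if its predicate matches and it fills every
    non-wh argument slot of the query with the same value."""
    q_pred, q_args = query
    c_pred, c_args = candidate
    if q_pred != c_pred or len(q_args) != len(c_args):
        return False
    return all(q == c or q == "what" for q, c in zip(q_args, c_args))

print([c for c in retrieved if matches(query, c)])
# [('call', ('you', 'movie', 'Blockbuster'))]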

English lexical resource is required
- That provides sets of possible syntactic frames for verbs.
- And provides clear, replicable sense distinctions.
AskJeeves: Who do you call for a good electronic lexical database for English?

WordNet – call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; …) -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; …) -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; …) -> LABEL
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
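
For readers who want to browse this inventory themselves, the short sketch below uses NLTK's WordNet reader (an addition, not part of the talk); note that current WordNet releases number and gloss the senses of call somewhat differently than the WordNet 1.7 senses shown on the slide.

# Sketch: list the verb senses of "call" with NLTK's WordNet interface.
# Requires: pip install nltk, then nltk.download('wordnet').
from nltk.corpus import wordnet as wn

for i, synset in enumerate(wn.synsets("call", pos=wn.VERB), start=1):
    lemmas = ", ".join(lemma.name() for lemma in synset.lemmas())
    hypernyms = synset.hypernyms()
    parent = hypernyms[0].lemmas()[0].name().upper() if hypernyms else "-"
    print(f"{i}. {lemmas} -- ({synset.definition()}) -> {parent}")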

WordNet – Princeton (Miller 1985, Fellbaum 1998)
- On-line lexical reference (dictionary)
- Nouns, verbs, adjectives, and adverbs grouped into synonym sets
- Other relations include hypernyms (ISA), antonyms, meronyms
- Limitations as a computational lexicon:
  - Contains little syntactic information
  - No explicit predicate argument structures
  - No systematic extension of basic senses
  - Sense distinctions are very fine-grained, ITA 73%
  - No hierarchical entries

Levin classes (Levin, 1993)
- 3100 verbs, 47 top-level classes, 193 second- and third-level classes
- Each class has a syntactic signature based on alternations:
  John broke the jar. / The jar broke. / Jars break easily.
  John cut the bread. / *The bread cut. / Bread cuts easily.
  John hit the wall. / *The wall hit. / *Walls hit easily.

Levin classes (Levin, 1993)
- Verb class hierarchy: 3100 verbs, 47 top-level classes, 193 second- and third-level classes
- Each class has a syntactic signature based on alternations:
  John broke the jar. / The jar broke. / Jars break easily.
    → change-of-state
  John cut the bread. / *The bread cut. / Bread cuts easily.
    → change-of-state, recognizable action, sharp instrument
  John hit the wall. / *The wall hit. / *Walls hit easily.
    → contact, exertion of force

Confusions in Levin classes?
- Not semantically homogeneous: {braid, clip, file, powder, pluck, etc.}
- Multiple class listings: homonymy or polysemy?
- Conflicting alternations? Carry verbs disallow the Conative (*she carried at the ball), but include {push, pull, shove, kick, draw, yank, tug}, also in the Push/Pull class, which does take the Conative (she kicked at the ball).

Intersective Levin Classes
[Diagram: intersective classes formed where alternation cues overlap – "apart" (CH-STATE), "across the room" (CH-LOC), "at" (¬CH-LOC)]
Dang, Kipper & Palmer, ACL98

Intersective Levin Classes
- More syntactically and semantically coherent:
  - sets of syntactic patterns
  - explicit semantic components
  - relations between senses
VERBNET: www.cis.upenn.edu/verbnet
Dang, Kipper & Palmer, IJCAI00, Coling00

VerbNet – Karin Kipper
- Class entries:
  - Capture generalizations about verb behavior
  - Organized hierarchically
  - Members have common semantic elements, semantic roles and syntactic frames
- Verb entries:
  - Refer to a set of classes (different senses)
  - Each class member linked to WN synset(s) (not all WN senses are covered)

Semantic role labels
Julia broke the LCD projector.
  break(agent(Julia), patient(LCD-projector))
  cause(agent(Julia), broken(LCD-projector))

agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
patient(P) -> affected(P), change(P), …
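
As an illustration only (not from the slides), the role labels and their entailments can be kept in a small lookup table and expanded for any labeled proposition; the table entries simply restate the implications listed above.

# Sketch: expand VerbNet-style role labels into the entailments they license.
proposition = ("break", {"agent": "Julia", "patient": "LCD-projector"})

ROLE_ENTAILMENTS = {
    "agent":   ["intentional", "sentient", "causer", "affector"],
    "patient": ["affected", "changed"],
}

predicate, arguments = proposition
for role, filler in arguments.items():
    for entailment in ROLE_ENTAILMENTS.get(role, []):
        print(f"{entailment}({filler})")    # e.g. intentional(Julia)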

Hand-built resources vs. real data
- VerbNet is based on linguistic theory – how useful is it?
- How well does it correspond to syntactic variations found in naturally occurring text?
PropBank

Proposition Bank: From Sentences to Propositions
Powell met Zhu Rongji
Powell and Zhu Rongji met
Powell met with Zhu Rongji
Powell and Zhu Rongji had a meeting
{battle, wrestle, join, debate, consult}

Proposition: meet(Powell, Zhu Rongji)
meet(Somebody1, Somebody2)
...
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss([Powell, Zhu], return(X, plane))

Capturing semantic roles*
- [SUBJ Owen] broke [ARG1 the laser pointer].
- [SUBJ, ARG1 The windows] were broken by the hurricane.
- [SUBJ, ARG1 The vase] broke into pieces when it toppled over.
*See also FrameNet, http://www.icsi.berkeley.edu/~framenet/

English lexical resource is required
- That provides sets of possible syntactic frames for verbs with semantic role labels.
- And provides clear, replicable sense distinctions.

A TreeBanked Sentence
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))

The same sentence, PropBanked
(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                        (SBAR (WHNP-1 that)
                              (S Arg0 (NP-SBJ *T*-1)
                                 (VP would
                                     (VP give
                                         Arg2 (NP the U.S. car maker)
                                         Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                                  (PP-LOC in (NP the British company))))))))))))

expect(Analysts, GM-Jaguar pact)
give(GM-Jaguar pact, US car maker, 30% stake)
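
A small sketch (an illustration, not the PropBank tooling) shows how the labeled constituents above collapse into the flat propositions at the bottom of the slide; the argument strings are copied from the annotation, with the trace *T*-1 resolved to its antecedent.

# Sketch: turn role-labeled constituents into flat propositions.
annotations = {
    "expect": {"Arg0": "Analysts",
               "Arg1": "a GM-Jaguar pact"},
    "give":   {"Arg0": "a GM-Jaguar pact",   # Arg0 is the trace *T*-1, resolved here
               "Arg1": "an eventual 30% stake in the British company",
               "Arg2": "the U.S. car maker"},
}

for rel, args in annotations.items():
    fillers = [args[key] for key in sorted(args)]       # Arg0, Arg1, Arg2, ...
    print(f"{rel}({', '.join(fillers)})")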

Frames File Example: expect
Roles:
  Arg0: expecter
  Arg1: thing expected

Example: transitive, active
  Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL:  expect
  Arg1: further declines in interest rates

Frames File Example: give
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to

Example: double object
  The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL:  gave
  Arg2: the chefs
  Arg1: a standing ovation
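
One convenient in-memory form for these entries is a plain dictionary keyed by frameset, as in the sketch below; the frameset ids and the label helper are illustrative assumptions, not the format PropBank actually distributes.

# Sketch: frames-file role sets as a dictionary, plus a helper that pairs
# annotated arguments with their role descriptions.
FRAMES = {
    "expect.01": {"Arg0": "expecter", "Arg1": "thing expected"},
    "give.01":   {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}

def label(frameset, filler_by_arg):
    roles = FRAMES[frameset]
    return {f"{arg} ({roles[arg]})": filler for arg, filler in filler_by_arg.items()}

print(label("give.01", {"Arg0": "The executives",
                        "Arg2": "the chefs",
                        "Arg1": "a standing ovation"}))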

Word Senses in PropBank
- Orders to ignore word sense were not feasible for 700+ verbs:
  - Mary left the room.
  - Mary left her daughter-in-law her pearls in her will.

Frameset leave.01 "move away from":
  Arg0: entity leaving
  Arg1: place left
Frameset leave.02 "give":
  Arg0: giver
  Arg1: thing given
  Arg2: beneficiary

How do these relate to traditional word senses in VerbNet and WordNet?

Annotation procedure
- PTB II – extraction of all sentences with a given verb
- Create Frame File for that verb (Paul Kingsbury)
  - 3100+ lemmas, 4400 framesets, 118K predicates
  - Over 300 created automatically via VerbNet
- First pass: automatic tagging (Joseph Rosenzweig)
  - http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
- Second pass: double-blind hand correction (Paul Kingsbury)
  - Tagging tool highlights discrepancies (Scott Cotton)
- Third pass: Solomonization (adjudication) (Betsy Klipple, Olga Babko-Malaya)

Trends in Argument Numbering
- Arg0 = agent
- Arg1 = direct object / theme / patient
- Arg2 = indirect object / benefactive / instrument / attribute / end state
- Arg3 = start point / benefactive / instrument / attribute
- Arg4 = end point
- Per word vs. frame level – more general?

Additional tags (arguments or adjuncts?)
- Variety of ArgM's (Arg# > 4):
  - TMP – when?
  - LOC – where at?
  - DIR – where to?
  - MNR – how?
  - PRP – why?
  - REC – himself, themselves, each other
  - PRD – this argument refers to or modifies another
  - ADV – others
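
In code, the function tags are convenient to keep as a small lookup table; the sketch below just restates the slide's glosses (the table and helper are illustrative, not an official PropBank API).

# Sketch: ArgM function tags and the informal glosses from the slide.
ARGM_TAGS = {
    "TMP": "when?",
    "LOC": "where at?",
    "DIR": "where to?",
    "MNR": "how?",
    "PRP": "why?",
    "REC": "reciprocals/reflexives (himself, themselves, each other)",
    "PRD": "secondary predication: refers to or modifies another argument",
    "ADV": "other adverbials",
}

def describe(tag):
    return ARGM_TAGS.get(tag.replace("ArgM-", ""), "unknown function tag")

print(describe("ArgM-TMP"))   # -> when?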

Inflection
- Verbs also marked for tense/aspect:
  - Passive/Active
  - Perfect/Progressive
  - Third singular (is, has, does, was)
  - Present/Past/Future
  - Infinitives/Participles/Gerunds/Finites
- Modals and negations marked as ArgMs

Frames: Multiple Framesets
- Out of the 787 most frequent verbs:
  - 1 frameset – 521
  - 2 framesets – 169
  - 3+ framesets – 97 (includes light verbs)
  - 94% ITA
- Framesets are not necessarily consistent between different senses of the same verb
- Framesets are consistent between different verbs that share similar argument structures (like FrameNet)

Ergative/Unaccusative Verbs
Roles (no Arg0 for unaccusative verbs):
  Arg1 = logical subject, patient, thing rising
  Arg2 = EXT, amount risen
  Arg3* = start point
  Arg4 = end point

Sales rose 4% to $3.28 billion from $3.16 billion.
The Nasdaq composite index added 1.01 to 456.6 on paltry volume.

Actual data for leave
http://www.cs.rochester.edu/~gildea/PropBank/Sort/
leave.01 "move away from": Arg0 rel Arg1 Arg3
leave.02 "give":           Arg0 rel Arg1 Arg2

sub-ARG0 obj-ARG1                     44
sub-ARG0                              20
sub-ARG0 NP-ARG1-with obj-ARG2        17
sub-ARG0 sub-ARG2 ADJP-ARG3-PRD       10
sub-ARG0 sub-ARG1 ADJP-ARG3-PRD        6
sub-ARG0 sub-ARG1 VP-ARG3-PRD          5
NP-ARG1-with obj-ARG2                  4
obj-ARG1                               3
sub-ARG0 sub-ARG2 VP-ARG3-PRD          3

PropBank/FrameNet
Buy            Sell
Arg0: buyer    Arg0: seller
Arg1: goods    Arg1: goods
Arg2: seller   Arg2: buyer
Arg3: rate     Arg3: rate
Arg4: payment  Arg4: payment

Broader, more neutral, more syntactic – maps readily to VN, TR, FN.
Rambow, et al., PMLB03

Annotator accuracy – ITA 84%
[Chart: annotator accuracy on primary labels only, plotted against number of annotations per annotator (log scale, roughly 1,000 to 1,000,000); individual annotator accuracies range from about 0.86 to 0.96]

English lexical resource is required
- That provides sets of possible syntactic frames for verbs with semantic role labels?
- And provides clear, replicable sense distinctions.

English lexical resource is required
- That provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text?
- And provides clear, replicable sense distinctions.

Automatic Labelling of Semantic Relations
- Stochastic model
- Features:
  - Predicate
  - Phrase type
  - Parse tree path
  - Position (before/after predicate)
  - Voice (active/passive)
  - Head word
Gildea & Jurafsky, CL02; Gildea & Palmer, ACL02
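
The parse tree path is the feature that carries most of the syntactic information, so a rough sketch of how it can be computed is shown below (an addition for illustration; it uses NLTK tree positions and an ASCII ^/_ rendering of the usual up/down arrows, and is not the authors' code).

# Sketch: a parse-tree-path feature between a candidate constituent and the
# predicate, written as labels up to their lowest common ancestor and back down.
from nltk.tree import Tree

def tree_path(tree, constituent_pos, predicate_pos):
    up, down = list(constituent_pos), list(predicate_pos)
    common = 0                                   # length of the shared prefix
    while common < min(len(up), len(down)) and up[common] == down[common]:
        common += 1
    labels_up = [tree[tuple(up[:i])].label() for i in range(len(up), common, -1)]
    ancestor = tree[tuple(up[:common])].label()
    labels_down = [tree[tuple(down[:i])].label()
                   for i in range(common + 1, len(down) + 1)]
    return "^".join(labels_up) + "^" + ancestor + "_" + "_".join(labels_down)

sent = Tree.fromstring(
    "(S (NP-SBJ (NNS Analysts)) (VP (VBP have) (VP (VBN been) "
    "(VP (VBG expecting) (NP (DT a) (NN pact))))))")
# (0,) is the NP-SBJ constituent, (1, 1, 1, 0) is the VBG predicate node
print(tree_path(sent, (0,), (1, 1, 1, 0)))       # NP-SBJ^S_VP_VP_VP_VBG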

Semantic Role Labelling Accuracy – Known Boundaries
                    FrameNet (≥ 10 inst)   PropBank   PropBank (≥ 10 instances)
Gold St. parses     –                      77.0       83.1
Automatic parses    82.0                   73.6       79.6

- Accuracy of semantic role prediction for known boundaries – the system is given the constituents to classify.
- FrameNet examples (training/test) are handpicked to be unambiguous.
- Lower performance with unknown boundaries.
- Higher performance with traces.
- Almost evens out.

Additional Automatic Role Labelers
- Performance improved from 77% to 88% (Colorado)
  - (Gold Standard parses, < 10 instances)
- Same features plus:
  - Named Entity tags
  - Head word POS
  - For unseen verbs – backoff to automatic verb clusters
- SVMs:
  - Role or not role
  - For each likely role, for each Arg#: Arg# or not
  - No overlapping role labels allowed
Pradhan et al., ICDM03; Surdeanu et al., ACL03; Chen & Rambow, EMNLP03; Gildea & Hockenmaier, EMNLP03

Additional Automatic Role Labelers
- Performance improved from 77% to 88% (Colorado)
- New results, original features, labels: 88%, 93% (Penn)
  - (Gold Standard parses, < 10 instances)
- Same features plus:
  - Named Entity tags
  - Head word POS
  - For unseen verbs – backoff to automatic verb clusters
- SVMs:
  - Role or not role
  - For each likely role, for each Arg#: Arg# or not
  - No overlapping role labels allowed
Pradhan et al., ICDM03; Surdeanu et al., ACL03; Chen & Rambow, EMNLP03; Gildea & Hockenmaier, EMNLP03

Word Senses in PropBank
- Orders to ignore word sense were not feasible for 700+ verbs:
  - Mary left the room.
  - Mary left her daughter-in-law her pearls in her will.

Frameset leave.01 "move away from":
  Arg0: entity leaving
  Arg1: place left
Frameset leave.02 "give":
  Arg0: giver
  Arg1: thing given
  Arg2: beneficiary

How do these relate to traditional word senses in VerbNet and WordNet?

Mapping from PropBank to VerbNet
Frameset id = leave.02; sense = "give"; VerbNet class = future_having-13.3

PropBank arg   Frames-file role   VerbNet thematic role
Arg0           Giver              Agent
Arg1           Thing given        Theme
Arg2           Benefactive        Recipient
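
A sketch of how such a mapping might be applied, with the table above stored as a dictionary (the data structure and helper are illustrative, not the released mapping format):

# Sketch: relabel PropBank arguments with VerbNet thematic roles.
PB_TO_VN = {
    ("leave.02", "future_having-13.3"): {
        "Arg0": "Agent",       # giver
        "Arg1": "Theme",       # thing given
        "Arg2": "Recipient",   # benefactive
    },
}

def to_thematic_roles(frameset, vn_class, args):
    mapping = PB_TO_VN[(frameset, vn_class)]
    return {mapping[arg]: filler for arg, filler in args.items()}

print(to_thematic_roles("leave.02", "future_having-13.3",
                        {"Arg0": "Mary",
                         "Arg1": "her pearls",
                         "Arg2": "her daughter-in-law"}))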

Mapping from PB to VerbNet

Mapping from PropBank to VerbNet
- Overlap with PropBank framesets:
  - 50,000 PropBank instances
  - < 50% of VN entries, > 85% of VN classes
- Results:
  - MATCH – 78.63% (80.90% relaxed)
  - (VerbNet isn't just linguistic theory!)
- Benefits:
  - Thematic role labels and semantic predicates
  - Can extend PropBank coverage with VerbNet classes
  - WordNet sense tags
Kingsbury & Kipper, NAACL03 Text Meaning Workshop
http://www.cs.rochester.edu/~gildea/VerbNet/

WordNet as a WSD sense inventory
- Senses unnecessarily fine-grained?
- Word Sense Disambiguation bakeoffs:
  - Senseval-1 – Hector, ITA = 95.5%
  - Senseval-2 – WordNet 1.7, ITA verbs = 71%
  - Groupings of Senseval-2 verbs, ITA = 82%
    - Used syntactic and semantic criteria

Groupings Methodology (w/ Dang and Fellbaum)
- Double-blind groupings, adjudication
- Syntactic criteria (VerbNet was useful):
  - Distinct subcategorization frames:
    - call him a bastard
    - call him a taxi
  - Recognizable alternations – regular sense extensions:
    - play an instrument
    - play a song
    - play a melody on an instrument
SIGLEX01, SIGLEX02, JNLE04

Groupings Methodology (cont.)
- Semantic criteria:
  - Differences in semantic classes of arguments
    - abstract/concrete, human/animal, animate/inanimate, different instrument types, …
  - Differences in entailments
    - change of prior entity or creation of a new entity?
  - Differences in types of events
    - abstract/concrete/mental/emotional/…
  - Specialized subject domains

Results – averaged over 28 verbs
Dang and Palmer, Siglex02; Dang et al., Coling02

                 Total
WN polysemy      16.28
Group polysemy    8.07
ITA-fine         71%
ITA-group        82%
MX-fine          60.2%
MX-group         69%

MX – Maximum Entropy WSD, p(sense|context)
Features: topic, syntactic constituents, semantic classes
(+2.5%, +1.5 to +5%, +6%)

Grouping improved ITA and Maxent WSD
- Call: 31% of errors due to confusion between senses within same group 1:
  - name, call -- (assign a specified, proper name to; They named their son David)
  - call -- (ascribe a quality to or give a name of a common noun that reflects a quality; He called me a bastard)
  - call -- (consider or regard as being; I would not call her beautiful)
- 75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses

WordNet: call, 28 senses, groups
[Diagram: the 28 WordNet senses of call clustered into groups – Loud cry; Label; Bird or animal cry; Request; Challenge; Phone/radio; Call a loan/bond; Visit; Bid]

Overlap between Groups and Framesets – 95%
[Diagram: the WordNet senses of develop grouped under two PropBank framesets, Frameset1 and Frameset2]
Palmer, Dang & Fellbaum, NLE 2004

Sense Hierarchy
- PropBank Framesets – coarse-grained distinctions
  - 20 Senseval-2 verbs w/ > 1 Frameset
  - Maxent WSD system: 73.5% baseline, 90% accuracy
- Sense Groups (Senseval-2) – intermediate level (includes Levin classes) – 95% overlap, 69%
- WordNet – fine-grained distinctions, 60.2%

English lexical resource is available
That provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text.
And provides clear, replicable sense distinctions.

A Chinese Treebank Sentence
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
"The Congress passed the banking law recently."

(IP (NP-SBJ (NN 国会/Congress))
    (VP (ADVP (ADV 最近/recently))
        (VP (VV 通过/pass)
            (AS 了/ASP)
            (NP-OBJ (NN 银行法/banking law)))))

The Same Sentence, PropBanked
(IP (NP-SBJ arg0 (NN 国会))
    (VP argM (ADVP (ADV 最近))
        (VP f2 (VV 通过)
            (AS 了)
            arg1 (NP-OBJ (NN 银行法)))))

通过(f2) (pass)
  arg0: 国会 (Congress)
  argM: 最近 (recently)
  arg1: 银行法 (banking law)

Chinese PropBank Status (w/ Bert Xue and Scott Cotton)
- Create Frame File for each verb
  - Similar alternations – causative/inchoative, unexpressed object
  - 5000 lemmas, 3000 DONE (hired Jiang)
- First pass: automatic tagging – 2500 DONE
  - Subcat frame matcher (Xue & Kulick, MT03)
- Second pass: double-blind hand correction
  - In progress (includes frameset tagging), 1000 DONE
  - Ported RATS to CATS, in use since May
- Third pass: Solomonization (adjudication)

A Korean Treebank Sentence
그는 르노가 3 월말까지 인수제의 시한을 갖고 있다고 덧붙였다.
"He added that Renault has a deadline until the end of March for a merger proposal."

(S (NP-SBJ 그/NPN+은/PAU)
   (VP (S-COMP (NP-SBJ 르노/NPR+이/PCA)
               (VP (VP (NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                       (VP (NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                           갖/VV+고/ECS))
                   있/VX+다/EFN+고/PAD))
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

The same sentence, PropBanked
(S Arg0 (NP-SBJ 그/NPN+은/PAU)
   (VP Arg2 (S-COMP (Arg0 NP-SBJ 르노/NPR+이/PCA)
                    (VP (VP (ArgM NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                            (VP (Arg1 NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                                갖/VV+고/ECS))
                        있/VX+다/EFN+고/PAD))
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

덧붙이다 (그는, 르노가 3 월말까지 인수제의 시한을 갖고 있다)
(add) (he) (Renault has a deadline until the end of March for a merger proposal)

갖다 (르노가, 3 월말까지, 인수제의 시한을)
(has) (Renault) (until the end of March) (a deadline for a merger proposal)

PropBank II
- Nominalizations (NYU)
  - Lexical frames DONE
- Event variables (including temporals and locatives)
- More fine-grained sense tagging
  - Tagging nominalizations w/ WordNet sense
  - Selected verbs and nouns
- Nominal coreference
  - Not names
- Clausal discourse connectives – selected subset

PropBank II: event variables; sense tags; nominal reference; discourse connectives
{Also}, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].

ID#  REL             Arg0                    Arg1                                  Arg3-PRD  ArgM-ADV
h23  help (help2,5)  tax rates (tax rate1)   the company keep its tax outlay flat
k16  keep (keep1)    the company (company1)  its tax outlay                        flat      relative to earnings…

Summary
- Shallow semantic annotation that captures critical dependencies and semantic role labels
- Supports training of supervised automatic taggers
- Methodology ports readily to other languages
English PropBank release – spring 2004
Chinese PropBank release – fall 2004
Korean PropBank release – summer 2005

Word sense in Machine Translation
- Different syntactic frames:
  - John left the room. / Juan saiu do quarto. (Portuguese)
  - John left the book on the table. / Juan deixou o livro na mesa.
- Same syntactic frame?
  - John left a fortune. / Juan deixou uma fortuna.

Summary of Multilingual TreeBanks, PropBanks
Parallel Corpora    Text                        Treebank                    PropBank I                  PropBank II
Chinese Treebank    Chinese 500K, English 400K  Chinese 500K, English 100K  Chinese 500K, English 350K  Chinese 100K, English 100K
Arabic Treebank     Arabic 500K, English 500K   Arabic 500K, English ?      ?                           ?
Korean Treebank     Korean 180K, English 50K    Korean 180K, English 50K    Korean 180K, English 50K    –

Levin class: escape-51.1-1
- WordNet senses: WN 1, 5, 8
- Thematic roles: Location[+concrete], Theme[+concrete]
- Frames with semantics:
  - Basic intransitive: "The convict escaped"
    motion(during(E), Theme), direction(during(E), Prep, Theme, ~Location)
  - Intransitive (+ path PP): "The convict escaped from the prison"
  - Locative preposition drop: "The convict escaped the prison"

Levin class: future_having-13.3
- WordNet senses: WN 2, 10, 13
- Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
- Frames with semantics:
  - Dative: "I promised somebody my time"
    Agent V Recipient Theme
    has_possession(start(E), Agent, Theme), future_possession(end(E), Recipient, Theme), cause(Agent, E)
  - Transitive (+ Recipient PP): "We offered our paycheck to her"
    Agent V Theme Prep(to) Recipient
  - Transitive (Theme object): "I promised my house (to somebody)"
    Agent V Theme

Automatic classification
- Merlo & Stevenson automatically classified 59 verbs with 69.8% accuracy
  - 1. unergative, 2. unaccusative, 3. object-drop
  - 100M words automatically parsed
  - C5.0, using features: transitivity, causativity, animacy, voice, POS
- EM clustering – 61%, 2669 instances, 1M words
  - Using Gold Standard semantic role labels
  1. float, hop/hope, jump, march, leap
  2. change, clear, collapse, cool, crack, open, flood
  3. borrow, clean, inherit, reap, organize, study
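
To make the setup concrete, here is a sketch in the spirit of that experiment (an addition; scikit-learn's decision tree stands in for C5.0, and the feature values and verbs are invented for illustration).

# Sketch: classify verbs into unergative / unaccusative / object-drop from
# aggregate corpus features (transitivity, causativity, animacy, passive voice).
from sklearn.tree import DecisionTreeClassifier

# one row per verb: [transitive%, causative%, animate-subject%, passive%]
X = [
    [0.10, 0.05, 0.90, 0.02],   # e.g. "jump"  -> unergative
    [0.15, 0.60, 0.20, 0.10],   # e.g. "open"  -> unaccusative
    [0.70, 0.05, 0.85, 0.30],   # e.g. "study" -> object-drop
]
y = ["unergative", "unaccusative", "object-drop"]

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[0.16, 0.55, 0.25, 0.08]]))   # -> ['unaccusative']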

SENSEVAL – Word Sense Disambiguation Evaluation
DARPA-style bakeoff: training data, testing data, scoring algorithm.

                       SENSEVAL-1 (1998)   SENSEVAL-2 (2001)
Languages              3                   12
Systems                24                  90
Eng. Lexical Sample    Yes                 Yes
Verbs/Poly/Instances   13/12/215           29/16/110
Sense Inventory        Hector, 95.5%       WordNet, 73+%

NLE99, CHUM01, NLE02, NLE03

Maximum Entropy WSD
Hoa Dang, best performer on verbs
- Maximum entropy framework, p(sense|context)
- Contextual linguistic features:
  - Topical feature for W: keywords (determined automatically)
  - Local syntactic features for W: presence of subject, complements, passive?; words in subject and complement positions, particles, preps, etc.
  - Local semantic features for W:
    - Semantic class info from WordNet (synsets, etc.)
    - Named Entity tag (PERSON, LOCATION, ...) for proper Ns
    - Words within +/- 2 word window
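
As a rough illustration of p(sense|context) (an addition, not Dang's system), the sketch below trains a maximum-entropy-style classifier over hand-written toy features; scikit-learn's multinomial logistic regression stands in for the original maxent toolkit.

# Sketch: maximum-entropy-style WSD, p(sense | context), over toy features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = [
    ({"subj": "she", "obj": "him", "obj2": "a taxi",    "topic": "travel"}, "call.summon"),
    ({"subj": "they", "obj": "him", "obj2": "a coward", "topic": "insult"}, "call.label"),
    ({"subj": "I",   "obj": "you", "particle": "up",    "topic": "phone"},  "call.telephone"),
]
contexts, senses = zip(*train)

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(contexts, senses)

test = {"subj": "he", "obj": "me", "obj2": "a fool", "topic": "insult"}
print(model.predict([test])[0])            # expected to prefer the "label" sense
print(model.predict_proba([test]).round(2))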

Best Verb Performance – Maxent-WSD
Hoa Dang
28 verbs – average

             Total
WN polysemy  16.28
ITA          71%
MX-WSD       60.2%

MX – Maximum Entropy WSD, p(sense|context)
Features: topic, syntactic constituents, semantic classes
(+2.5%, +1.5 to +5%, +6%)
Dang and Palmer, Siglex02; Dang et al., Coling02

Role Labels & Framesets as features for WSD
- Preliminary results (Jinying Chen)
  - Gold Standard PropBank annotation
  - Decision Tree C5.0; groups; 5 verbs
  - Features: Frameset tags, Arg labels
- Comparable results to Maxent with PropBank features
Syntactic frames and sense distinctions are inseparable.

Lexical resources provide concrete criteria for sense distinctions
- PropBank – coarse-grained sense distinctions determined by different subcategorization frames (Framesets)
- Intersective Levin classes – regular sense extensions through differing syntactic constructions
- VerbNet – distinct semantic predicates for each sense (verb class)
Are these the right distinctions?

Results – averaged over 28 verbs
               Total
WN             16.28
Grp             8.07
ITA-fine       71%
ITA-group      82%
MX-fine        60.2%
JHU – MLultra  56.6%, 58.7%
MX-group       69%