Penn
Putting Meaning Into Your Trees
Martha Palmer
University of Pennsylvania
Columbia University
New York City
January 29, 2004
Columbia, 1/29/04
Outline
• Introduction
• Background: WordNet, Levin classes, VerbNet
• Proposition Bank – capturing shallow semantics
• Mapping PropBank to VerbNet
• Mapping PropBank to WordNet
Ask Jeeves – A Q/A, IR ex.
What do you call a successful movie? Blockbuster
• Tips on Being a Successful Movie Vampire ... I shall call the police.
• Successful Casting Call & Shoot for ``Clash of Empires'' ... thank everyone for their participation in the making of yesterday's movie.
• Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
• VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
Ask Jeeves – filtering w/ POS tag
What do you call a successful movie?
• Tips on Being a Successful Movie Vampire ... I shall call the police.
• Successful Casting Call & Shoot for ``Clash of Empires'' ... thank everyone for their participation in the making of yesterday's movie.
• Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
• VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.
Filtering out “call the police”
Syntax
call(you,movie,what) ≠ call(you,police)
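The comparison on this slide can be sketched in a few lines of Python. This is only an illustration: the `Proposition` type and the `same_frame` helper are hypothetical names introduced here, and matching on predicate plus argument count is a deliberately crude stand-in for real syntactic frame matching.

```python
from typing import NamedTuple, Tuple


class Proposition(NamedTuple):
    """A predicate with its ordered arguments, e.g. call(you, police)."""
    predicate: str
    args: Tuple[str, ...]


def same_frame(a: Proposition, b: Proposition) -> bool:
    """True when both propositions share a predicate and an argument count."""
    return a.predicate == b.predicate and len(a.args) == len(b.args)


# The two propositions from the slide.
query = Proposition("call", ("you", "movie", "what"))
candidate = Proposition("call", ("you", "police"))

print(same_frame(query, candidate))  # False: three arguments vs. two
```

Under this check, a hit such as "I wouldn't go so far as to call it successful" would survive the filter, because it also has three arguments, while "call the police" would be dropped.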
English lexical resource is required
• That provides sets of possible syntactic frames for verbs.
• And provides clear, replicable sense distinctions.
AskJeeves: Who do you call for a good electronic lexical database for English?
WordNet – call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; …) -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; …) -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; …) -> LABEL
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
WordNet – Princeton (Miller 1985, Fellbaum 1998)
• On-line lexical reference (dictionary)
    • Nouns, verbs, adjectives, and adverbs grouped into synonym sets
    • Other relations include hypernyms (ISA), antonyms, meronyms
• Limitations as a computational lexicon
    • Contains little syntactic information
    • No explicit predicate argument structures
    • No systematic extension of basic senses
    • Sense distinctions are very fine-grained, inter-annotator agreement (ITA) 73%
    • No hierarchical entries
Levin classes (Levin, 1993)
• 3100 verbs, 47 top level classes, 193 second and third level
• Each class has a syntactic signature based on alternations.
    John broke the jar. / The jar broke. / Jars break easily.
    John cut the bread. / *The bread cut. / Bread cuts easily.
    John hit the wall. / *The wall hit. / *Walls hit easily.
Levin classes (Levin, 1993)
• Verb class hierarchy: 3100 verbs, 47 top level classes, 193 second and third level
• Each class has a syntactic signature based on alternations.
    John broke the jar. / The jar broke. / Jars break easily.
        change-of-state
    John cut the bread. / *The bread cut. / Bread cuts easily.
        change-of-state, recognizable action, sharp instrument
    John hit the wall. / *The wall hit. / *Walls hit easily.
        contact, exertion of force
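One way to read "syntactic signature" is as a tuple of acceptability judgments, one per alternation. The sketch below encodes the three example verbs that way and groups verbs whose tuples match. The table layout and the `group_by_signature` helper are assumptions made for illustration; Levin's actual classes rest on many more alternations than these three.

```python
# Judgments copied from the starred and unstarred examples on the slide.
# Tuple order: (transitive, inchoative "The X broke", middle "Xs break easily").
SIGNATURES = {
    "break": (True, True, True),
    "cut": (True, False, True),
    "hit": (True, False, False),
}


def group_by_signature(signatures):
    """Group verbs that accept exactly the same set of alternations."""
    groups = {}
    for verb, signature in signatures.items():
        groups.setdefault(signature, []).append(verb)
    return groups


classes = group_by_signature(SIGNATURES)
for signature, verbs in classes.items():
    print(signature, verbs)
```

Each of the three verbs ends up in its own group, which matches the slide's point that break, cut, and hit belong to different classes.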
Confusions in Levin classes?
• Not semantically homogeneous
    {braid, clip, file, powder, pluck, etc...}
• Multiple class listings
    homonymy or polysemy?
• Conflicting alternations?
    Carry verbs disallow the Conative (*she carried at the ball), but include {push, pull, shove, kick, draw, yank, tug}, which are also in the Push/pull class, which does take the Conative (she kicked at the ball)
Intersective Levin Classes
[Figure: intersecting verb classes, with labels “apart” CH-STATE, “across the room” CH-LOC, “at” ¬CH-LOC]
Dang, Kipper & Palmer, ACL98
Intersective Levin Classes
• More syntactically and semantically coherent
    sets of syntactic patterns
    explicit semantic components
    relations between senses
VERBNET
www.cis.upenn.edu/verbnet
Dang, Kipper & Palmer, IJCAI00, Coling00
VerbNet – Karin Kipper
• Class entries:
    Capture generalizations about verb behavior
    Organized hierarchically
    Members have common semantic elements, semantic roles and syntactic frames
• Verb entries:
    Refer to a set of classes (different senses)
    each class member linked to WN synset(s) (not all WN senses are covered)
Semantic role labels:
Julia broke the LCD projector.
break (agent(Julia), patient(LCD-projector))
cause(agent(Julia), broken(LCD-projector))
agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
patient(P) -> affected(P), change(P),…
Hand built resources vs. Real data
• VerbNet is based on linguistic theory – how useful is it?
• How well does it correspond to syntactic variations found in naturally occurring text?
PropBank
Proposition Bank: From Sentences to Propositions
Powell met Zhu Rongji
Powell and Zhu Rongji met
Powell met with Zhu Rongji
Powell and Zhu Rongji had a meeting
Proposition: meet(Powell, Zhu Rongji)
meet(Somebody1, Somebody2)
battle, wrestle, join, debate, consult, ...
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)
discuss([Powell, Zhu], return(X, plane))
Capturing semantic roles*
• Owen broke [ARG1 the laser pointer]. (SUBJ: Owen)
• [ARG1 The windows] were broken by the hurricane. (SUBJ: The windows)
• [ARG1 The vase] broke into pieces when it toppled over. (SUBJ: The vase)
*See also Framenet, http://www.icsi.berkeley.edu/~framenet/
English lexical resource is required
• That provides sets of possible syntactic frames for verbs with semantic role labels.
• And provides clear, replicable sense distinctions.
A TreeBanked Sentence
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.
(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
The same sentence, PropBanked
(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                        (SBAR (WHNP-1 that)
                              (S Arg0 (NP-SBJ *T*-1)
                                 (VP would
                                     (VP give
                                         Arg2 (NP the U.S. car maker)
                                         Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                                  (PP-LOC in (NP the British company))))))))))))
expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
Frames File Example: expect
Roles:
    Arg0: expecter
    Arg1: thing expected
Example: Transitive, active:
    Portfolio managers expect further declines in interest rates.
    Arg0: Portfolio managers
    REL: expect
    Arg1: further declines in interest rates
Frames File example: give
Roles:
    Arg0: giver
    Arg1: thing given
    Arg2: entity given to
Example: double object
    The executives gave the chefs a standing ovation.
    Arg0: The executives
    REL: gave
    Arg2: the chefs
    Arg1: a standing ovation
Word Senses in PropBank
• Orders to ignore word sense not feasible for 700+ verbs
    • Mary left the room
    • Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":
    Arg0: entity leaving
    Arg1: place left
Frameset leave.02 "give":
    Arg0: giver
    Arg1: thing given
    Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
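The two framesets can be held in a small lookup table. The sketch below is a minimal illustration, not PropBank's actual frames-file format: the dictionary layout and the `candidate_framesets` helper are hypothetical, and filtering on role coverage alone is a simplification of how annotators choose a frameset.

```python
# Roles and glosses copied from the slide.
FRAMESETS = {
    "leave.01": {
        "gloss": "move away from",
        "roles": {"Arg0": "entity leaving", "Arg1": "place left"},
    },
    "leave.02": {
        "gloss": "give",
        "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "beneficiary"},
    },
}


def candidate_framesets(lemma, observed_args):
    """Return frameset ids for `lemma` whose roles cover every observed label."""
    observed = set(observed_args)
    return sorted(
        frameset_id
        for frameset_id, entry in FRAMESETS.items()
        if frameset_id.split(".")[0] == lemma and observed <= set(entry["roles"])
    )


# "Mary left her daughter-in-law her pearls" has three core arguments.
print(candidate_framesets("leave", ["Arg0", "Arg1", "Arg2"]))  # ['leave.02']
```

With only Arg0 and Arg1 observed ("Mary left the room"), both framesets remain candidates, so the argument count narrows the choice only in one direction.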
Annotation procedure
• PTB II - Extraction of all sentences with given verb
• Create Frame File for that verb (Paul Kingsbury)
    • (3100+ lemmas, 4400 framesets, 118K predicates)
    • Over 300 created automatically via VerbNet
• First pass: Automatic tagging (Joseph Rosenzweig)
    • http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
• Second pass: Double blind hand correction (Paul Kingsbury)
    • Tagging tool highlights discrepancies (Scott Cotton)
• Third pass: Solomonization (adjudication)
    • Betsy Klipple, Olga Babko-Malaya
Trends in Argument Numbering
• Arg0 = agent
• Arg1 = direct object / theme / patient
• Arg2 = indirect object / benefactive / instrument / attribute / end state
• Arg3 = start point / benefactive / instrument / attribute
• Arg4 = end point
• Per word vs frame level – more general?
Additional tags (arguments or adjuncts?)
• Variety of ArgM’s (Arg#>4):
    • TMP - when?
    • LOC - where at?
    • DIR - where to?
    • MNR - how?
    • PRP - why?
    • REC - himself, themselves, each other
    • PRD - this argument refers to or modifies another
    • ADV - others
Inflection
• Verbs also marked for tense/aspect
    Passive/Active
    Perfect/Progressive
    Third singular (is has does was)
    Present/Past/Future
    Infinitives/Participles/Gerunds/Finites
• Modals and negations marked as ArgMs
Frames: Multiple Framesets
• Out of the 787 most frequent verbs:
    • 1 Frameset – 521
    • 2 Framesets – 169
    • 3+ Framesets – 97 (includes light verbs)
    • 94% ITA
• Framesets are not necessarily consistent between different senses of the same verb
• Framesets are consistent between different verbs that share similar argument structures (like FrameNet)
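As a quick check of the counts above, the shares can be worked out directly. The snippet only restates the slide's figures; the key names in `FRAMESET_COUNTS` are labels chosen here for readability.

```python
# Counts copied from the slide (787 most frequent verbs).
FRAMESET_COUNTS = {"1 frameset": 521, "2 framesets": 169, "3+ framesets": 97}

total = sum(FRAMESET_COUNTS.values())
shares = {
    label: round(100 * count / total, 1)
    for label, count in FRAMESET_COUNTS.items()
}
# Share of verbs that need any frameset disambiguation at all.
multi_share = round(100 * (total - FRAMESET_COUNTS["1 frameset"]) / total, 1)

print(total)        # 787
print(shares)
print(multi_share)
```

The three counts sum to 787, and roughly two thirds of these verbs have a single frameset, so frameset tagging is a real decision for only about one verb in three.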
Ergative/Unaccusative Verbs
Roles (no ARG0 for unaccusative verbs)
    Arg1 = Logical subject, patient, thing rising
    Arg2 = EXT, amount risen
    Arg3* = start point
    Arg4 = end point
Sales rose 4% to $3.28 billion from $3.16 billion.
The Nasdaq composite index added 1.01 to 456.6 on paltry volume.
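Applied to the first example sentence, the roles above give an annotation with no Arg0. The sketch below writes that annotation as a plain dictionary (an assumed representation, not PropBank's file format) and checks that the extent argument is consistent with the start and end points; the `dollars` helper is hypothetical.

```python
# "Sales rose 4% to $3.28 billion from $3.16 billion."
annotation = {
    "rel": "rose",
    "Arg1": "Sales",           # logical subject, thing rising
    "Arg2": "4%",              # EXT, amount risen
    "Arg3": "$3.16 billion",   # start point
    "Arg4": "$3.28 billion",   # end point
}


def dollars(text):
    """Extract the leading number from a string such as '$3.28 billion'."""
    return float(text.strip("$").split()[0])


start = dollars(annotation["Arg3"])
end = dollars(annotation["Arg4"])
growth = 100 * (end - start) / start

print(round(growth))  # 4, agreeing with the Arg2 extent "4%"
```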
Actual data for leave
• http://www.cs.rochester.edu/~gildea/PropBank/Sort/
Leave.01 “move away from”: Arg0 rel Arg1 Arg3
Leave.02 “give”: Arg0 rel Arg1 Arg2
Pattern                                  Count
sub-ARG0 obj-ARG1                        44
sub-ARG0                                 20
sub-ARG0 NP-ARG1-with obj-ARG2           17
sub-ARG0 sub-ARG2 ADJP-ARG3-PRD          10
sub-ARG0 sub-ARG1 ADJP-ARG3-PRD          6
sub-ARG0 sub-ARG1 VP-ARG3-PRD            5
NP-ARG1-with obj-ARG2                    4
obj-ARG1                                 3
sub-ARG0 sub-ARG2 VP-ARG3-PRD            3
PropBank/FrameNet
Buy                 Sell
Arg0: buyer         Arg0: seller
Arg1: goods         Arg1: goods
Arg2: seller        Arg2: buyer
Arg3: rate          Arg3: rate
Arg4: payment       Arg4: payment
Broader, more neutral, more syntactic – maps readily to VN, TR, FN
Rambow, et al, PMLB03
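Because the buy and sell role sets differ only in which participant is Arg0 and which is Arg2, one proposition can be rewritten as the other by swapping those two labels. The sketch below assumes a dictionary representation; `buy_to_sell` is a hypothetical helper and the example fillers are invented for illustration.

```python
def buy_to_sell(buy_args):
    """Re-express a buy proposition with sell's numbering (Arg0 <-> Arg2)."""
    swap = {"Arg0": "Arg2", "Arg2": "Arg0"}
    return {swap.get(role, role): filler for role, filler in buy_args.items()}


# Invented example: "Chuck bought a car from Jerry for $1000."
buy = {"Arg0": "Chuck", "Arg1": "a car", "Arg2": "Jerry", "Arg4": "$1000"}
sell = buy_to_sell(buy)

print(sell["Arg0"], sell["Arg2"])  # Jerry Chuck
```

Arg1, Arg3, and Arg4 pass through unchanged, which reflects the slide's table: goods, rate, and payment keep the same number under both verbs.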
Annotator accuracy – ITA 84%
[Figure: Annotator Accuracy, primary labels only. Accuracy (0.86 to 0.96) plotted against # of annotations (log scale, 1000 to 1000000), one point per annotator: hertlerb, forbesk, solaman2, istreit, wiarmstr, kingsbur, ksledge, nryant, jaywang, malayao, ptepper, cotter, delilkan]
English lexical resource is required
• That provides sets of possible syntactic frames for verbs with semantic role labels?
• And provides clear, replicable sense distinctions.
English lexical resource is required
• That provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text?
• And provides clear, replicable sense distinctions.
Automatic Labelling of Semantic Relations
• Stochastic Model
• Features:
    Predicate
    Phrase Type
    Parse Tree Path
    Position (Before/after predicate)
    Voice (active/passive)
    Head Word
Gildea & Jurafsky, CL02, Gildea & Palmer, ACL02
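Of the features listed, the parse tree path is the least self-explanatory: it records the chain of node labels leading up from the predicate to the lowest common ancestor and then down to the candidate constituent. The sketch below is one minimal way to compute it. The `Node` class, the helper names, and the toy tree are assumptions made for illustration and are not taken from the cited systems.

```python
class Node:
    """A parse tree node that knows its parent."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def tree_path(predicate, constituent):
    """Labels going up from the predicate, then down to the constituent."""
    up = ancestors(predicate)
    down = ancestors(constituent)
    down_ids = [id(n) for n in down]
    # Lowest common ancestor: first node above the predicate that is
    # also on the constituent's chain to the root.
    lca_index = next(i for i, n in enumerate(up) if id(n) in down_ids)
    up_labels = [n.label for n in up[: lca_index + 1]]
    below_lca = down[: down_ids.index(id(up[lca_index]))]
    down_labels = [n.label for n in reversed(below_lca)]
    path = "↑".join(up_labels)
    if down_labels:
        path += "↓" + "↓".join(down_labels)
    return path


# Toy tree for a simple transitive clause: (S (NP ...) (VP (VB ...) (NP ...)))
subj, verb, obj = Node("NP"), Node("VB"), Node("NP")
vp = Node("VP", [verb, obj])
s = Node("S", [subj, vp])

print(tree_path(verb, subj))  # VB↑VP↑S↓NP
print(tree_path(verb, obj))   # VB↑VP↓NP
```

The two paths differ even though both constituents are NPs, which is why the path is informative for telling subject-like from object-like arguments.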
Semantic Role Labelling Accuracy - Known Boundaries
                    FrameNet (≥ 10 inst)   PropBank   PropBank (≥ 10 instances)
Gold St. parses     -                      77.0       83.1
Automatic parses    82.0                   73.6       79.6
• Accuracy of semantic role prediction for known boundaries--the system is given the constituents to classify.
• FrameNet examples (training/test) are handpicked to be unambiguous.
• Lower performance with unknown boundaries.
• Higher performance with traces.
• Almost evens out.
Additional Automatic Role Labelers
• Performance improved from 77% to 88% (Colorado)
    • (Gold Standard parses, < 10 instances)
    • Same features plus
        • Named Entity tags
        • Head word POS
        • For unseen verbs – backoff to automatic verb clusters
    • SVM’s
        • Role or not role
        • For each likely role, for each Arg#, Arg# or not
        • No overlapping role labels allowed
Pradhan, et al., ICDM03, Surdeanu, et al., ACL03, Chen & Rambow, EMNLP03, Gildea & Hockenmaier, EMNLP03
Additional Automatic Role Labelers
• Performance improved from 77% to 88% (Colorado)
• New results, original features, labels, 88%, 93% (Penn)
    • (Gold Standard parses, < 10 instances)
    • Same features plus
        • Named Entity tags
        • Head word POS
        • For unseen verbs – backoff to automatic verb clusters
    • SVM’s
        • Role or not role
        • For each likely role, for each Arg#, Arg# or not
        • No overlapping role labels allowed
Pradhan, et al., ICDM03, Surdeanu, et al., ACL03, Chen & Rambow, EMNLP03, Gildea & Hockenmaier, EMNLP03
Word Senses in PropBank
• Orders to ignore word sense not feasible for 700+ verbs
    • Mary left the room
    • Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from":
    Arg0: entity leaving
    Arg1: place left
Frameset leave.02 "give":
    Arg0: giver
    Arg1: thing given
    Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?
Mapping from PropBank to VerbNet
Frameset id = leave.02
Sense = give
VerbNet class = future-having 13.3
PropBank   Frameset role   VerbNet role
Arg0       Giver           Agent
Arg1       Thing given     Theme
Arg2       Benefactive     Recipient
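The mapping for leave.02 amounts to a per-frameset lookup from numbered arguments to thematic roles. The sketch below assumes a dictionary layout and a hypothetical `to_verbnet` helper; it is not the format of the actual mapping files.

```python
# Mapping copied from the slide for a single frameset.
PB_TO_VN = {
    "leave.02": {
        "class": "future-having 13.3",
        "roles": {"Arg0": "Agent", "Arg1": "Theme", "Arg2": "Recipient"},
    },
}


def to_verbnet(frameset_id, args):
    """Relabel numbered PropBank arguments with VerbNet thematic roles."""
    role_map = PB_TO_VN[frameset_id]["roles"]
    return {role_map[role]: filler for role, filler in args.items()}


# "Mary left her daughter-in-law her pearls in her will"
propbank_args = {
    "Arg0": "Mary",
    "Arg2": "her daughter-in-law",
    "Arg1": "her pearls",
}
print(to_verbnet("leave.02", propbank_args))
```

The relabelled proposition carries Agent, Recipient, and Theme, which is the information VerbNet's semantic predicates are written in terms of.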
Mapping from PB to VerbNet
Mapping from PropBank to VerbNet
• Overlap with PropBank framesets
    • 50,000 PropBank instances
    • < 50% VN entries, > 85% VN classes
• Results
    • MATCH - 78.63%. (80.90% relaxed)
    • (VerbNet isn’t just linguistic theory!)
• Benefits
    • Thematic role labels and semantic predicates
    • Can extend PropBank coverage with VerbNet classes
    • WordNet sense tags
Kingsbury & Kipper, NAACL03, Text Meaning Workshop
http://www.cs.rochester.edu/~gildea/VerbNet/
WordNet as a WSD sense inventory
• Senses unnecessarily fine-grained?
• Word Sense Disambiguation bakeoffs
    Senseval1 – Hector, ITA = 95.5%
    Senseval2 – WordNet 1.7, ITA verbs = 71%
    Groupings of Senseval2 verbs, ITA = 82%
        • Used syntactic and semantic criteria
Groupings Methodology (w/ Dang and Fellbaum)
• Double blind groupings, adjudication
• Syntactic Criteria (VerbNet was useful)
    Distinct subcategorization frames
        • call him a bastard
        • call him a taxi
    Recognizable alternations – regular sense extensions:
        • play an instrument
        • play a song
        • play a melody on an instrument
SIGLEX01, SIGLEX02, JNLE04
Groupings Methodology (cont.)
• Semantic Criteria
    Differences in semantic classes of arguments
        • Abstract/concrete, human/animal, animate/inanimate, different instrument types,…
    Differences in entailments
        • Change of prior entity or creation of a new entity?
    Differences in types of events
        • Abstract/concrete/mental/emotional/….
    Specialized subject domains
Results – averaged over 28 verbs
Dang and Palmer, Siglex02, Dang et al, Coling02
                  Total
WN polysemy       16.28
Group polysemy    8.07
ITA-fine          71%
ITA-group         82%
MX-fine           60.2%
MX-group          69%
MX – Maximum Entropy WSD, p(sense|context)
Features: topic (+2.5%), syntactic constituents (+1.5 to +5%), semantic classes (+6%)
Grouping improved ITA and Maxent WSD
• Call: 31% of errors due to confusion between senses within same group 1:
    • name, call -- (assign a specified, proper name to; They named their son David)
    • call -- (ascribe a quality to or give a name of a common noun that reflects a quality; He called me a bastard)
    • call -- (consider or regard as being; I would not call her beautiful)
• 75% with training and testing on grouped senses vs.
• 43% with training and testing on fine-grained senses
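The effect of grouping on scoring can be illustrated by mapping fine-grained senses to groups before comparing predictions with gold labels. In the sketch below the sense-to-group table covers only the four senses of call shown earlier in the talk, and the gold and predicted labels are made-up toy data, not results from the evaluation.

```python
# Partial sense-to-group table for "call", using the labels shown with
# the first four WordNet senses earlier in the talk.
SENSE_TO_GROUP = {
    "WN1": "LABEL",
    "WN2": "TELECOMMUNICATE",
    "WN3": "LABEL",
    "WN4": "ORDER",
}


def accuracy(gold, predicted, key=lambda sense: sense):
    """Fraction of items whose gold and predicted senses agree under `key`."""
    hits = sum(key(g) == key(p) for g, p in zip(gold, predicted))
    return hits / len(gold)


# Toy data (invented): one within-group confusion, one cross-group error.
gold = ["WN1", "WN3", "WN2", "WN4"]
predicted = ["WN3", "WN3", "WN2", "WN1"]

fine = accuracy(gold, predicted)
grouped = accuracy(gold, predicted, key=SENSE_TO_GROUP.get)
print(fine, grouped)  # 0.5 0.75
```

The WN1/WN3 confusion counts as an error at the fine-grained level but not at the group level, which is the kind of error the slide attributes 31% of the mistakes on call to.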
WordNet: - call, 28 senses, groups
[Figure: the 28 WordNet senses of call clustered into groups. Group labels: Loud cry, Label, Bird or animal cry, Request, Challenge, Phone/radio, Call a loan/bond, Visit, Bid. Sense clusters as printed: WN5 WN16 WN12; WN3 WN19; WN1 WN22; WN15 WN26; WN4 WN7 WN8 WN9; WN20; WN18 WN27; WN2 WN13; WN28; WN17 WN11; WN6; WN25; WN23; WN10 WN14 WN21 WN24]
Overlap between Groups and Framesets – 95%
[Figure: WordNet senses of develop clustered into sense groups within Frameset1 and Frameset2. Sense clusters as printed: WN1 WN2; WN3 WN4; WN6 WN7 WN8; WN5 WN9 WN10; WN11 WN12 WN13; WN19; WN14; WN20]
Palmer, Dang & Fellbaum, NLE 2004
Sense Hierarchy
• PropBank Framesets – coarse grained distinctions
    20 Senseval 2 verbs w/ > 1 Frameset
    Maxent WSD system, 73.5% baseline, 90% accuracy
• Sense Groups (Senseval-2) intermediate level (includes Levin classes) – 95% overlap, 69%
• WordNet – fine grained distinctions, 60.2%
English lexical resource is available
• That provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text.
• And provides clear, replicable sense distinctions.
A Chinese Treebank Sentence
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
“The Congress passed the banking law recently.”
(IP (NP-SBJ (NN 国会/Congress))
    (VP (ADVP (ADV 最近/recently))
        (VP (VV 通过/pass)
            (AS 了/ASP)
            (NP-OBJ (NN 银行法/banking law)))))
The Same Sentence, PropBanked
(IP (NP-SBJ arg0 (NN 国会))
    (VP argM (ADVP (ADV 最近))
        (VP f2 (VV 通过)
            (AS 了)
            arg1 (NP-OBJ (NN 银行法)))))
通过(f2) (pass)
    arg0: 国会 (congress)
    argM: 最近
    arg1: 银行法 (law)
Chinese PropBank Status (w/ Bert Xue and Scott Cotton)
• Create Frame File for that verb
    • Similar alternations – causative/inchoative, unexpressed object
    • 5000 lemmas, 3000 DONE, (hired Jiang)
• First pass: Automatic tagging 2500 DONE
    • Subcat frame matcher (Xue & Kulick, MT03)
• Second pass: Double blind hand correction
    • In progress (includes frameset tagging), 1000 DONE
    • Ported RATS to CATS, in use since May
• Third pass: Solomonization (adjudication)
A Korean Treebank Sentence
그는 르노가 3 월말까지 인수제의 시한을 갖고 있다고 덧붙였다.
He added that Renault has a deadline until the end of March for a merger proposal.
(S (NP-SBJ 그/NPN+은/PAU)
   (VP (S-COMP (NP-SBJ 르노/NPR+이/PCA)
               (VP (VP (NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                       (VP (NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                           갖/VV+고/ECS))
                   있/VX+다/EFN+고/PAD))
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)
The same sentence, PropBanked
(S Arg0 (NP-SBJ 그/NPN+은/PAU)
   (VP Arg2 (S-COMP Arg0 (NP-SBJ 르노/NPR+이/PCA)
                    (VP (VP ArgM (NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                            (VP Arg1 (NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                                갖/VV+고/ECS))
                        있/VX+다/EFN+고/PAD))
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)
덧붙이다 (그는, 르노가 3 월말까지 인수제의 시한을 갖고 있다)
(add) (he) (Renault has a deadline until the end of March for a merger proposal)
갖다 (르노가, 3 월말까지, 인수제의 시한을)
(has) (Renault) (until the end of March) (a deadline for a merger proposal)
PropBank II
• Nominalizations (NYU)
    • Lexical Frames DONE
• Event Variables (including temporals and locatives)
• More fine-grained sense tagging
    • Tagging nominalizations w/ WordNet sense
    • Selected verbs and nouns
• Nominal Coreference
    • not names
• Clausal Discourse connectives – selected subset
PropBank II
Event variables; sense tags; nominal reference; discourse connectives
{Also}, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3PRD flat] [ArgM-ADV relative to earnings growth]].
ID#   REL       Arg0          Arg1                                   Arg3PRD   ArgM-ADV
h23   help      tax rates     the company keep its tax outlay flat
      help2,5   tax rate1     k16
k16   keep      the company   its tax outlay                         flat      relative to earnings…
      keep1     company1
Summary
• Shallow semantic annotation that captures critical dependencies and semantic role labels
• Supports training of supervised automatic taggers
• Methodology ports readily to other languages
English PropBank release – spring 2004
Chinese PropBank release – fall 2004
Korean PropBank release – summer 2005
Word sense in Machine Translation
• Different syntactic frames
    John left the room.
    Juan saiu do quarto. (Portuguese)
    John left the book on the table.
    Juan deixou o livro na mesa.
• Same syntactic frame?
    John left a fortune.
    Juan deixou uma fortuna.
Summary of Multilingual TreeBanks, PropBanks
Parallel Corpora    Text            Treebank        PropBank I      Prop II
Chinese Treebank    Chinese 500K    Chinese 500K    Chinese 500K    Ch 100K
                    English 400K    English 100K    English 350K    En 100K
Arabic Treebank     Arabic 500K     Arabic 500K     ?               ?
                    English 500K    English ?
Korean Treebank     Korean 180K     Korean 180K     Korean 180K
                    English 50K     English 50K     English 50K
Levin class: escape-51.1-1
• WordNet Senses: WN 1, 5, 8
• Thematic Roles: Location[+concrete], Theme[+concrete]
• Frames with Semantics
    Basic Intransitive
        "The convict escaped"
        motion(during(E),Theme) direction(during(E),Prep,Theme, ~Location)
    Intransitive (+ path PP)
        "The convict escaped from the prison"
    Locative Preposition Drop
        "The convict escaped the prison"
Levin class: future_having-13.3
• WordNet Senses: WN 2,10,13
• Thematic Roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
• Frames with Semantics
    Dative
        "I promised somebody my time"
        Agent V Recipient Theme
        has_possession(start(E),Agent,Theme)
        future_possession(end(E),Recipient,Theme) cause(Agent,E)
    Transitive (+ Recipient PP)
        "We offered our paycheck to her"
        Agent V Theme Prep(to) Recipient
    Transitive (Theme Object)
        "I promised my house (to somebody)"
        Agent V Theme
Automatic classification
• Merlo & Stevenson automatically classified 59 verbs with 69.8% accuracy
    • 1. Unergative, 2. unaccusative, 3. object-drop
    • 100M words automatically parsed
    • C5.0. Using features: transitivity, causativity, animacy, voice, POS
• EM clustering – 61%, 2669 instances, 1M words
    • Using Gold Standard semantic role labels
    • 1. float hop/hope jump march leap
    • 2. change clear collapse cool crack open flood
    • 3. borrow clean inherit reap organize study
SENSEVAL – Word Sense Disambiguation Evaluation
DARPA style bakeoff: training data, testing data, scoring algorithm.
                        SENSEVAL1 (1998)   SENSEVAL2 (2001)
Languages               3                  12
Systems                 24                 90
Eng. Lexical Sample     Yes                Yes
Verbs/Poly/Instances    13/12/215          29/16/110
Sense Inventory         Hector, 95.5%      WordNet, 73+%
NLE99, CHUM01, NLE02, NLE03
Maximum Entropy WSD
Hoa Dang, best performer on Verbs
• Maximum entropy framework, p(sense|context)
• Contextual Linguistic Features
    Topical feature for W:
        • keywords (determined automatically)
    Local syntactic features for W:
        • presence of subject, complements, passive?
        • words in subject, complement positions, particles, preps, etc.
    Local semantic features for W:
        • Semantic class info from WordNet (synsets, etc.)
        • Named Entity tag (PERSON, LOCATION,..) for proper Ns
        • words within +/- 2 word window
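In a maximum entropy model, p(sense|context) is obtained by summing the weights of the features that fire for each sense and normalising with a softmax. The sketch below shows that computation only. The feature names, sense labels, and weights are invented for illustration; in the system described here the weights would be learned from training data.

```python
import math


def maxent_probs(features, weights, senses):
    """p(sense | context) from summed feature weights, normalised by softmax."""
    scores = {
        sense: sum(weights.get((sense, feature), 0.0) for feature in features)
        for sense in senses
    }
    normaliser = sum(math.exp(score) for score in scores.values())
    return {sense: math.exp(score) / normaliser for sense, score in scores.items()}


# Hypothetical weights for two senses of "call" (not learned values).
WEIGHTS = {
    ("label", "has_object_complement"): 1.5,
    ("phone", "has_object_complement"): -0.5,
    ("phone", "keyword=telephone"): 2.0,
}

# Context resembling "call him a bastard": an object plus a complement.
probs = maxent_probs({"has_object_complement"}, WEIGHTS, ["label", "phone"])
best = max(probs, key=probs.get)
print(best)  # label
```

Topical, syntactic, and semantic features all enter the model the same way, as indicator features paired with a sense, so adding a new feature type only means adding new keys to the weight table.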
Best Verb Performance - Maxent-WSD
Hoa Dang
28 verbs - average
               Total
WN polysemy    16.28
ITA            71%
MX-WSD         60.2%
MX – Maximum Entropy WSD, p(sense|context)
Features: topic (+2.5%), syntactic constituents (+1.5 to +5%), semantic classes (+6%)
Dang and Palmer, Siglex02, Dang et al, Coling02
Role Labels & Framesets as features for WSD
• Preliminary results (Jinying Chen)
    Gold Standard PropBank annotation
• Decision Tree C5.0
    Groups
    5 verbs
    Features: Frameset tags, Arg labels
• Comparable results to Maxent with PropBank features
Syntactic frames and sense distinctions are inseparable
Lexical resources provide concrete criteria for sense distinctions
• PropBank – coarse grained sense distinctions determined by different subcategorization frames (Framesets)
• Intersective Levin classes – regular sense extensions through differing syntactic constructions
• VerbNet – distinct semantic predicates for each sense (verb class)
Are these the right distinctions?
Results – averaged over 28 verbs
                 Total
WN               16.28
Grp              8.07
ITA-fine         71%
ITA-group        82%
MX-fine          60.2%
JHU - MLultra    56.6%, 58.7%
MX-group         69%