Semantic Roles and Ontologies
Ontologies
• There is growing interest in the data structures known as ontologies
• Sets of language expressions covering individual domains, organized hierarchically
• E.g. the Top Ontology (TO) from EuroWordNet, which has been designed for English as a whole
• Others: SUMO/MILO, CYC, etc.
• How they are built: introspectively, usually designed from the ‘top’ down
• Large ontologies (e.g. the EWN TO) are not based on evidence, i.e. on language data coming from corpora
• In the Semantic Web area, ontologies are developed quite pragmatically, so their mutual compatibility is questionable
Semantic roles
• In NLP, much attention is paid to the ‘valency frames’ of verbs
• Data structures that describe the relational properties of verbs
• Thus we speak about the predicate-argument structure of verbs: e.g. for drink there is someone who drinks (labeled AGENT) and the beverage drunk (e.g. beer), labeled PATIENT
• Semantic roles are labels used to describe the arguments; they come in inventories (usually of about 40–50 roles)
• Inventories of semantic roles are also built mainly from the ‘top’ (i.e. introspectively)
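A valency frame can be pictured as a small data structure: a verb plus its arguments, each labeled with a role drawn from a fixed inventory. The sketch below is a minimal illustration, assuming a toy four-label inventory (`ROLE_INVENTORY`, `Argument` and `ValencyFrame` are hypothetical names, and a real inventory has roughly 40–50 labels). It encodes the drink example and reports any role not found in the inventory.

```python
from dataclasses import dataclass, field

# Hypothetical toy inventory; real inventories have roughly 40-50 labels.
ROLE_INVENTORY = {"AGENT", "PATIENT", "ADDRESSEE", "SUBSTANCE"}


@dataclass
class Argument:
    role: str    # semantic role label, e.g. "AGENT"
    filler: str  # example filler, e.g. "beer"


@dataclass
class ValencyFrame:
    verb: str
    arguments: list[Argument] = field(default_factory=list)

    def unknown_roles(self, inventory: set[str]) -> list[str]:
        """Return the role labels used in this frame that the inventory lacks."""
        return [arg.role for arg in self.arguments if arg.role not in inventory]


# The 'drink' example: someone who drinks (AGENT) and the beverage (PATIENT).
drink = ValencyFrame("drink", [Argument("AGENT", "someone"), Argument("PATIENT", "beer")])
print(drink.unknown_roles(ROLE_INVENTORY))  # []
```

Keeping the inventory separate from the frames makes it easy to swap in a different role set and see which frames no longer fit it.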
How are ontologies and SRs related?
• Inventories of semantic roles can, in fact, be viewed as a kind of ontology
• The number of verbs in Czech is about 35 000, and we want semantic roles for all of them
• We also need them for sense discrimination, which is a critical problem
• How can an adequate inventory of semantic roles be obtained? Certainly not from the ‘top’
• It is necessary to look at the language data that can be found in corpora
• There are tools that can help us with this
Verballex – Valency Frames for Czech
An example:
Pít:1/drink:1, impf
AG<person:1|animal:1>obl(kdo1)
VERB
SUBS<beverage:1>obl(co4)
- A lexicon of such frames is being built (presently covering about 8 000 Czech verbs)
- Approx. 3 000 of them are linked to their
English equivalents in Czech WordNet
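The slot notation in the example can be read mechanically. The sketch below parses it, assuming each slot has the form ROLE<literal:sense|literal:sense>status(formCASE), that `obl` marks an obligatory argument, and that the trailing digit is the Czech case number (1 = nominative, 4 = accusative) attached to the question word kdo/co. These readings, and the names `Slot`, `parse_slot` and `parse_frame`, are assumptions for illustration, not Verballex's own tooling.

```python
import re
from dataclasses import dataclass

# Assumed slot syntax, e.g. "AG<person:1|animal:1>obl(kdo1)".
SLOT_RE = re.compile(
    r"(?P<role>[A-Z]+)<(?P<literals>[^>]+)>(?P<status>[a-z]+)\((?P<form>[^\d)]+)(?P<case>\d)\)"
)


@dataclass
class Slot:
    role: str                         # first-level label, e.g. "AG"
    literals: list[tuple[str, int]]   # WordNet literals with sense numbers
    obligatory: bool                  # assumed meaning of "obl"
    form: str                         # question word: "kdo" (who), "co" (what)
    case: int                         # assumed Czech case number


def parse_slot(text: str) -> Slot:
    match = SLOT_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a slot: {text!r}")
    literals = []
    for literal in match["literals"].split("|"):
        word, sense = literal.rsplit(":", 1)  # "person:1" -> ("person", 1)
        literals.append((word, int(sense)))
    return Slot(match["role"], literals, match["status"] == "obl",
                match["form"], int(match["case"]))


def parse_frame(lines: list[str]) -> list[Slot]:
    """Parse the slot lines of a frame, skipping the VERB placeholder."""
    return [parse_slot(line) for line in lines if line.strip() != "VERB"]


frame = parse_frame(["AG<person:1|animal:1>obl(kdo1)", "VERB", "SUBS<beverage:1>obl(co4)"])
for slot in frame:
    print(slot.role, slot.literals, slot.case)
# AG [('person', 1), ('animal', 1)] 1
# SUBS [('beverage', 1)] 4
```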
Verballex – cont.
• The semantic roles in Verballex are two-level labels
• General labels such as AGENT, PATIENT, ADDRESSEE, SUBSTANCE, … (about 50), taken from the EWN TO
• Lower-level labels such as human:1, animal:1: literals from WordNet, approx. 200 of them (the list)
• The frames can be used to obtain semantic classes of verbs
• Our inventory of roles is based on WordNet, and this raises some problems
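One simple way frames could yield semantic verb classes is to group verbs whose frames share the same general roles. This is only a sketch: the frames in `FRAMES` are invented toy data (not taken from Verballex), the abbreviations AG, PAT, SUBS and ADDR are assumed, and `classes_by_signature` is a hypothetical helper. Grouping on general labels alone gives very coarse classes; adding the lower-level literals to the key would give finer ones.

```python
from collections import defaultdict

# Invented toy frames (general role labels only), not Verballex data.
# The abbreviations AG, PAT, SUBS and ADDR are assumed.
FRAMES: dict[str, tuple[str, ...]] = {
    "pít/drink": ("AG", "SUBS"),
    "jíst/eat": ("AG", "SUBS"),
    "dát/give": ("AG", "PAT", "ADDR"),
    "poslat/send": ("AG", "PAT", "ADDR"),
}


def classes_by_signature(frames: dict[str, tuple[str, ...]]) -> dict[tuple[str, ...], list[str]]:
    """Group verbs whose frames contain the same roles, ignoring their order."""
    classes: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for verb, roles in frames.items():
        classes[tuple(sorted(roles))].append(verb)
    return dict(classes)


for signature, verbs in classes_by_signature(FRAMES).items():
    print(signature, verbs)
# ('AG', 'SUBS') ['pít/drink', 'jíst/eat']
# ('ADDR', 'AG', 'PAT') ['dát/give', 'poslat/send']
```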
What if we look at real data?
• Verbs like vidět/see: we can see ANYTHING, so the right-hand argument is simply ENTITY
• This does not help us much; we want to describe what one can actually see
• Corpora and Word Sketches: the Word Sketch table for vidět; what follows from it?
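• AG(osoba/person|zvíře/animal|organizace/organization) – vidět/see –
PAT(SITUAT{situace/situation, problém/problem, věc/thing, svět/world, rozdíl/difference}|
CAUSE{důvod/reason, příčina/cause, smysl/point, chyba/mistake, nebezpečí/danger}|
STARTPOINT{východisko/starting point, perspektiva/perspective, budoucnost/future, možnost/possibility}|
OBJECT{film/film, karta/card, tvář/face, silueta/silhouette, obrys/outline, světlo/light, spousta/plenty, svět/world})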
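Turning a Word Sketch table into a frame like the one for vidět can be partly mechanized once a lemma-to-class lexicon exists. The sketch assumes a small hand-made lexicon (`LEXICON`, hypothetical, reusing the class names of the vidět table), lets a lemma such as svět belong to several classes, renders the result in the PAT(CLASS{…}) notation with groups separated by |, and returns unmapped lemmas for manual classification. The collocate list is illustrative, not actual Word Sketch output, and `group_collocates` and `format_pat` are hypothetical helpers.

```python
# Hypothetical hand-made lexicon: object collocates of vidět/see mapped to
# class names from the table. A lemma may belong to several classes.
LEXICON: dict[str, list[str]] = {
    "situace": ["SITUAT"], "problém": ["SITUAT"], "věc": ["SITUAT"],
    "svět": ["SITUAT", "OBJECT"], "rozdíl": ["SITUAT"],
    "důvod": ["CAUSE"], "příčina": ["CAUSE"], "chyba": ["CAUSE"],
    "film": ["OBJECT"], "tvář": ["OBJECT"], "světlo": ["OBJECT"],
}


def group_collocates(
    collocates: list[str], lexicon: dict[str, list[str]]
) -> tuple[dict[str, list[str]], list[str]]:
    """Put each collocate under every class it is listed for; collect unmapped ones."""
    groups: dict[str, list[str]] = {}
    unknown: list[str] = []
    for lemma in collocates:
        classes = lexicon.get(lemma, [])
        if not classes:
            unknown.append(lemma)
        for cls in classes:
            groups.setdefault(cls, []).append(lemma)
    return groups, unknown


def format_pat(groups: dict[str, list[str]]) -> str:
    """Render groups as PAT(CLASS{lemma, ...}|...)."""
    parts = [f"{cls}{{{', '.join(lemmas)}}}" for cls, lemmas in groups.items()]
    return "PAT(" + "|".join(parts) + ")"


# Illustrative collocate list; "nebezpečí" is deliberately missing from LEXICON.
collocates = ["situace", "film", "svět", "důvod", "nebezpečí", "tvář"]
groups, unknown = group_collocates(collocates, LEXICON)
print(format_pat(groups))  # PAT(SITUAT{situace, svět}|OBJECT{film, svět, tvář}|CAUSE{důvod})
print("classify by hand:", unknown)  # classify by hand: ['nebezpečí']
```

The unmapped list is where the human step comes in: each new lemma is either added to an existing class or prompts a new one.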
Looking at real data – cont.
• The verb slyšet/hear – Word Sketch table
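• AG(osoba/person|zvíře/animal|organizace/organization|ucho/ear) – slyšet/hear –
PAT(SOUND{hluk/noise, rachot/rumble, hukot/drone, hučení/humming, klapot/clatter}|
SHOOT{výstřel/shot, střelba/shooting, rána/bang}|
VOICE{křik/shouting, výkřik/scream, řev/roar, zpěv/singing, pláč/crying, nářek/wailing, smích/laughter, volání/calling}|
WORD{slovo/word}|
NOISE{bzukot/buzzing<hmyz/insect>, šplouchání/splashing<voda/water>, pleskání/slapping<voda/water>}|
IDIOM1{trávu růst/the grass grow<neodůvodněné podezření/unfounded suspicion>})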
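The slyšet table adds a second annotation in angle brackets after some lemmas (e.g. bzukot<hmyz>), read here as a restriction or gloss on the lemma; that reading is an assumption. A minimal parser for such CLASS{lemma<note>, …} groups might look like this; `parse_groups` is a hypothetical helper and the input string is a shortened excerpt of the table.

```python
import re
from typing import Optional

# A group is CLASS{item, item, ...}; class names may contain digits (IDIOM1).
GROUP_RE = re.compile(r"(?P<cls>[A-Z][A-Z0-9]*)\{(?P<body>[^}]*)\}")
# An item is a lemma optionally followed by <note>; what the note means
# (restriction or gloss) is an assumption.
ITEM_RE = re.compile(r"(?P<lemma>[^<]+?)\s*(?:<(?P<note>[^>]*)>)?")


def parse_groups(pat: str) -> dict[str, list[tuple[str, Optional[str]]]]:
    """Map each class name to its (lemma, note) pairs; note is None if absent."""
    groups: dict[str, list[tuple[str, Optional[str]]]] = {}
    for group in GROUP_RE.finditer(pat):
        items = []
        for raw in group["body"].split(","):
            match = ITEM_RE.fullmatch(raw.strip())
            if match:
                items.append((match["lemma"], match["note"]))
        groups[group["cls"]] = items
    return groups


pat = "PAT(WORD{slovo}|NOISE{bzukot<hmyz>, šplouchání<voda>}|IDIOM1{trávu růst<neodůvodněné podezření>})"
for cls, items in parse_groups(pat).items():
    print(cls, items)
```

Keeping the note as a separate field means a multiword idiom such as trávu růst can be stored with its meaning rather than being forced into a literal class.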
What is to be done?
• The inventory of semantic roles we use in Verballex cannot capture the semantic nature of some verb arguments
• This concerns frequent verbs
• We obviously need a better inventory/ontology for Czech/English verbs (and for other languages as well, given the universality of the VFs)
• The task: how can this be done using Word Sketches, possibly semi-automatically?
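The slides leave open how the grouping could be done semi-automatically. One standard option, swapped in here purely as an illustration and not proposed in the source, is to cluster collocates by how similar their contexts are: compute the Jaccard similarity of the sets of verbs each noun co-occurs with, and place nouns in one cluster whenever a chain of pairs at or above a threshold links them (single-link clustering). The context sets in `CONTEXTS` are invented, not corpus counts, and `jaccard` and `cluster` are hypothetical helpers; the resulting clusters would still need to be checked and labeled by hand.

```python
from itertools import combinations

# Invented context sets (verbs each noun co-occurs with); not corpus data.
CONTEXTS: dict[str, set[str]] = {
    "hluk": {"hear", "make", "reduce"},
    "rachot": {"hear", "make", "reduce"},
    "výstřel": {"hear", "fire"},
    "rána": {"hear", "fire", "deal"},
    "slovo": {"hear", "say", "write"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)


def cluster(contexts: dict[str, set[str]], threshold: float) -> list[list[str]]:
    """Single-link clustering: connected components of the 'similarity >= threshold' graph."""
    parent = {word: word for word in contexts}

    def find(word: str) -> str:
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    for a, b in combinations(contexts, 2):
        if jaccard(contexts[a], contexts[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for word in contexts:
        groups.setdefault(find(word), []).append(word)
    return list(groups.values())


print(cluster(CONTEXTS, threshold=0.5))
# [['hluk', 'rachot'], ['výstřel', 'rána'], ['slovo']]
```

Lowering the threshold merges everything here (all pairs share "hear"), which is why the threshold and the final labels are left to a human.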
One Solution – CPA?
• Do meanings (senses) exist?
• Can they be empirically justified?
• Can corpus data help us?
• An example: how many meanings of the verb držet/hold are there?
• Meaning potentials (Hanks, Pustejovsky): meanings are realized through contexts
• Learning from the failure of WSD (accuracy of only up to 80 %)
• What is the cause of the WSD failure?
One Solution – cont.
• WSD researchers make a wrong assumption: that meanings exist and can be enumerated (if we use dictionaries)
• But dictionaries differ and give different answers to this question; see držet/hold, or get in English
• What is the remedy?
• Look at the verb contexts, sort them, and design an adequate list of their semantic roles, i.e. prepare an adequate ontology of semantic roles
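The first step of that procedure, sorting contexts, can be illustrated in a drastically simplified form: given contexts already annotated with semantic classes, group them by (subject class, object class) and list the most frequent pattern first. The contexts of držet/hold and all class labels below are hypothetical, `sort_contexts` is a hypothetical helper, and this is not an implementation of CPA itself.

```python
from collections import defaultdict

# Hypothetical hand-annotated contexts of držet/hold:
# (subject class, object lemma, object class). Class labels are invented.
CONTEXTS: list[tuple[str, str, str]] = [
    ("HUMAN", "hrnek", "ARTIFACT"),     # hold a mug
    ("HUMAN", "slovo", "PROMISE"),      # keep one's word
    ("HUMAN", "sklenice", "ARTIFACT"),  # hold a glass
    ("HUMAN", "dieta", "REGIME"),       # keep to a diet
]


def sort_contexts(
    contexts: list[tuple[str, str, str]]
) -> list[tuple[tuple[str, str], list[str]]]:
    """Group contexts by (subject class, object class), most frequent pattern first."""
    patterns: dict[tuple[str, str], list[str]] = defaultdict(list)
    for subj, lemma, obj in contexts:
        patterns[(subj, obj)].append(lemma)
    return sorted(patterns.items(), key=lambda kv: (-len(kv[1]), kv[0]))


for (subj, obj), lemmas in sort_contexts(CONTEXTS):
    print(f"{subj} držet {obj}: {lemmas}")
# HUMAN držet ARTIFACT: ['hrnek', 'sklenice']
# HUMAN držet PROMISE: ['slovo']
# HUMAN držet REGIME: ['dieta']
```

Each resulting pattern is a candidate sense of držet, and the object classes it needs are candidates for the role ontology.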
CPA solution