Semantic Roles and Ontologies

Ontologies
• Growing interest in the data structures known as ontologies
• Language expressions covering individual domains, hierarchically organized
• E.g. the Top Ontology from EuroWordNet, which has been designed for English as a whole
• Others: SUMO/MILO, CYC, etc.
• How they are built: introspectively, usually designed from the ‘top’
• Large ontologies (EWN TO) are not based on evidence, i.e. on language data coming from corpora
• In the area called the Semantic Web, ontologies are developed quite pragmatically, so their compatibility is questionable

Semantic roles
• In NLP much attention is paid to the ‘valency frames’ of verbs
• These are data structures that describe the relational properties of verbs
• Thus we speak about the predicate-argument structure of verbs, e.g. for drink there is someone who drinks (AGENT) and the respective beverage (beer), labeled as PATIENT
• Semantic roles are labels used to describe the arguments; we have inventories of them (usually about 40-50 roles)
• Inventories of semantic roles are also built mainly from the ‘top’ (i.e. introspectively)

How are ontologies and SRs related?
• Inventories of semantic roles can, in fact, be viewed as a kind of ontology
• The number of verbs in Czech is about 35 000 and we want to have semantic roles for them
• We also need them for discriminating senses; this is a critical problem
• How to obtain an adequate inventory of semantic roles? Certainly not from the ‘top’
• It is necessary to look at the language data that can be found in corpora
• There are some tools that can help us with that

Verballex – Valency Frames for Czech
An example:
Pít:1/drink:1, impf
AG<person:1|animal:1>obl(kdo1) VERB SUBS<beverage:1>obl(co4)
- The lexicon of such frames is being built (presently about 8 000 Czech verbs)
- Approx. 3 000 of them are linked to their English equivalents in Czech WordNet

Verballex – cont.
• The semantic roles within Verballex are two-level labels
• General labels like AGENT, PATIENT, ADDRESSEE, SUBSTANCE, …: about 50, taken from EWN TO
• Lower-level labels such as human:1, animal:1: literals from WordNet, approx. 200 (the list)
• The frames can be used to obtain semantic classes of verbs
• Our inventory of roles is based on WordNet, which brings some problems

If we look at the real data
• Verbs like vidět/see: we can see ANYTHING, so the right argument is then ENTITY
• This does not help us much; we want to describe what one can really see
• Corpora and Word Sketches: the table for vidět, what follows from it?
• AG(osoba|zvíře|organizace) – vidět – PAT(
  SITUAT{situace, problém, věc, svět, rozdíl}|
  CAUSE{důvod, příčina, smysl, chyba, nebezpečí}|
  STARTPOINT{východisko, perspektiva, budoucnost, možnost}|
  OBJECT{film, karta, tvář, silueta, obrys, světlo, spousta, svět})

If we look at the real data – cont.
• The verb slyšet/hear: Word Sketch table
• AG(osoba|zvíře|organizace|ucho) – slyšet – PAT(
  SOUND{hluk, rachot, hukot, hučení, klapot}|
  SHOOT{výstřel, střelba, rána}|
  VOICE{křik, výkřik, řev, zpěv, pláč, nářek, smích, volání}|
  WORD{slovo}|
  NOISE{bzukot<hmyz>, šplouchání<voda>, pleskání<voda>}|
  IDIOM1{trávu růst<neodůvodněné podezření>})

What is to be done?
• The inventory of semantic roles we are using in Verballex cannot capture the semantic nature of some verb arguments
• This concerns frequent verbs
• We obviously need a better inventory/ontology for Czech/English verbs (and others as well: universality of the VFs)
• The task: how can this be done using Word Sketches, possibly semi-automatically?

One Solution – CPA?
• Do meanings (senses) exist?
• Can they be empirically justified?
• Can corpus data help us?
• An example: how many meanings of the verb držet are there?
• Meaning potentials (Hanks, Pustejovsky): realization through contexts
• Learning from the failure of WSD (up to 80 %)
• What is the cause of the WSD failure?

One Solution – cont.
• WSD people are making a wrong assumption: that meanings exist and can be enumerated (if we use dictionaries)
• But dictionaries differ and yield different answers to this question, see držet, or get in English
• What is the rescue?
• Look at the verb contexts, sort them, and design an adequate list of their semantic roles, i.e. prepare an adequate ontology of semantic roles

CPA solution
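
The step named above (look at the verb contexts, sort them, and read a role inventory off the groups) can be illustrated with a minimal Python sketch. It is not the CPA procedure itself, only a toy under stated assumptions: the class labels and their collocates are copied from the slyšet/hear slide, while the `observed` list, the lemma hudba, and the function names `build_lookup` and `sort_contexts` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict

# Object collocates of slyšet/hear and their labels, copied from the slide.
SLYSET_CLASSES = {
    "SOUND": ["hluk", "rachot", "hukot", "hučení", "klapot"],
    "SHOOT": ["výstřel", "střelba", "rána"],
    "VOICE": ["křik", "výkřik", "řev", "zpěv", "pláč", "nářek", "smích", "volání"],
    "WORD": ["slovo"],
}


def build_lookup(classes):
    """Invert {label: [lemmas]} into {lemma: label}."""
    return {lemma: label for label, lemmas in classes.items() for lemma in lemmas}


def sort_contexts(object_lemmas, classes, fallback="ENTITY"):
    """Group observed object lemmas under their role label.

    Lemmas without a known label fall back to the catch-all ENTITY,
    which is exactly the uninformative label the slides complain about.
    """
    lookup = build_lookup(classes)
    grouped = defaultdict(list)
    for lemma in object_lemmas:
        grouped[lookup.get(lemma, fallback)].append(lemma)
    return dict(grouped)


# Hypothetical object lemmas, as if read off corpus lines for slyšet.
# "hudba" is deliberately not in any class, to show the fallback.
observed = ["hluk", "křik", "výstřel", "slovo", "smích", "hudba"]
grouped = sort_contexts(observed, SLYSET_CLASSES)

for label, lemmas in grouped.items():
    print(label, lemmas)
```

Anything that ends up under ENTITY marks a gap in the inventory: a collocate that still needs a more specific label, which is where manual or semi-automatic work on the corpus data would continue.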