RCDL'2009

Related terms search based on WordNet / Wiktionary and its application in ontology matching

St. Petersburg Institute for Informatics and Automation of RAS
Jönköping University, Sweden
Feiyu Lin, A. Krizhanovsky (andrew.krizhanovsky at gmail.com)

Contents
• Wiki and Wiktionary intro
• MRD, parser and Wiktionaries comparison
• Correlation of relatedness measures
• Experiment scheme
• Result and comparison
• Results, applications and future

Goal
Is it possible to find related terms using the current version of Wiktionary as successfully as with WordNet?
• for ontology matching,
• for application in text search systems, etc.
What are the advantages?

Wiki resources
• Distributed users and authors (edit pages)
• Centralized storage (e.g. MySQL, Apache, PHP)
• Set of hyperlinked articles
• Each article has one or more categories (tree)
* Example: http://en.wikipedia.org

Wiktionary is a free-content multilingual dictionary

Wiktionary data: +, -, simplicity & complexity
+ Rich data
  + thesaurus (synonyms, antonyms)
  + phrase books
  + etymologies
  + pronunciations
  + sample quotations
  + translations
+ Fast growing data
+ Interwiki (additional data)
+ GNU FDL
− Different Wiktionaries have different levels of standardization.
− Fast growing data, but it is created by a huge community (so a parser must be very stable)

Wiktionary machine-readable dictionary database scheme

Size of Wiktionaries
WordNet (2006): 150,000 words, 115,000 synsets

A shortest path in Russian Wiktionary

Correlation of relatedness measures
Correlation with human judgments (353-TC) of relatedness measures based on WordNet, English Wikipedia, and Russian Wiktionary

Largest eight Wiktionary editions (March 2008)

Application of machine-readable dictionary (MRD)
• Thesaurus data: related terms search
  – Search request extension (by synonyms) / request reformulation (in search systems)
  – Request recognition in question-answering systems
  – Word sense disambiguation
• Media data (audio + pictures)
  – Language learning

Work plan: done and todo

Russian Wiktionary
• Extraction (by regular expressions, RE)
  – Definition
  – Relations (synonyms…)
  – Translation
  – Audio
  – Graphics
• Database API
• Visualization (MRD browser)
• Quiz & tests (test application)

Russian Wiktionary
• Database scheme
  – Definition
  – Relations (synonyms…)
  – Translation
  – Audio
  – Graphics
• Database API

English Wiktionary

Implementation
• Software based on Synarcher code
• Java
• MySQL or SQLite database
• JUnit test framework

Results
• The scheme of the experiment for calculating the semantic relatedness measure based on Russian Wiktionary data
• The parser of Russian Wiktionary
• Database scheme designed
• Database API implemented in Java
• Compared the results of related terms search based on Wiktionary and WordNet
Project site (Wiki tool kit): http://code.google.com/p/wikokit/

Future work
• Finish creating the MRD: database and software
  – Russian Wiktionary and English Wiktionary
• Visualization (JavaFX)
  – MRD browser
  – Quiz & tests (learning application)
• Online application (Java Web Start)

Thank you!
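
Supplementary sketch for the slide "A shortest path in Russian Wiktionary": one common way to search for related terms is a breadth-first search over a graph of semantic relations such as synonym links. The graph below is a made-up toy example, and the names `RELATIONS`, `shortest_path` and `path_length` are hypothetical. The slides do not state the exact relatedness measure or the wikokit API, so this is only an illustration under those assumptions, not the authors' implementation.

```python
from collections import deque

# Toy, invented graph of semantic relations (e.g. synonym links).
# This is NOT real Wiktionary data; it only illustrates the idea.
RELATIONS = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car"],
    "vehicle": ["car", "transport"],
    "transport": ["vehicle"],
    "banana": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return the words on a shortest path, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # word -> word it was reached from
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            # Walk back from the goal to the start, then reverse.
            path = []
            while word is not None:
                path.append(word)
                word = previous[word]
            return path[::-1]
        for neighbour in graph.get(word, []):
            if neighbour not in previous:
                previous[neighbour] = word
                queue.append(neighbour)
    return None  # the two words are not connected


def path_length(graph, start, goal):
    """Number of relation links on a shortest path, or None if unconnected."""
    path = shortest_path(graph, start, goal)
    return None if path is None else len(path) - 1


print(shortest_path(RELATIONS, "automobile", "transport"))
```

A shorter path suggests more closely related terms. How a path length is turned into a relatedness score is a design choice that the slides leave open.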