RCDL'2009
Related terms search based
on WordNet / Wiktionary
and its application in
ontology matching
St. Petersburg Institute
for Informatics and Automation of RAS
Jönköping University, Sweden
Feiyu Lin, A. Krizhanovsky (andrew.krizhanovsky at gmail.com)
Contents
Wiki and Wiktionary intro
MRD, parser and Wiktionaries comparison
Correlation of relatedness measures
Experiment scheme
Result and comparison
Results, applications and future
Goal
Is it possible to find related terms using the current version of Wiktionary as successfully as with WordNet?
for ontology matching,
for application in text search systems,
etc.
What are the advantages?
Wiki-resources
Distributed users and authors (edit pages)
Centralized storage (e.g. MySQL, Apache, PHP)
Set of hyperlinked articles
Each article has one or more categories (tree)
* Example: http://en.wikipedia.org
Wiktionary is a free-content multilingual dictionary
Wiktionary data: +, −, simplicity & complexity
+ Rich data
+ thesaurus (synonyms, antonyms)
+ phrase books
+ etymologies
+ pronunciations
+ sample quotations
+ translations
+ Fast growing data
+ Interwiki (additional data)
+ GNU FDL
− Different Wiktionaries have different levels of standardization.
− Fast growing data, but it is created by a huge community (a developed parser should be very stable)
Wiktionary machine-readable dictionary: database scheme
Size of Wiktionaries
WordNet (2006): 150,000 words, 115,000 synsets
A shortest path in Russian Wiktionary
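The shortest-path idea can be sketched as a breadth-first search over a graph whose nodes are words and whose edges are semantic relations. This is only an illustration under assumptions: the class `RelationGraphSketch` and its methods are hypothetical names, relations are treated as undirected, and nothing here is taken from the project's actual code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: words are nodes, semantic relations are undirected edges.
class RelationGraphSketch {
    private final Map<String, List<String>> edges = new HashMap<>();

    // Record a relation (e.g. synonymy) between two words in both directions.
    void addRelation(String a, String b) {
        edges.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        edges.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Breadth-first search: returns the words on a shortest path from source
    // to target (both included), or an empty list if no path exists.
    List<String> shortestPath(String source, String target) {
        Map<String, String> parent = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        parent.put(source, null);
        queue.add(source);
        while (!queue.isEmpty()) {
            String word = queue.remove();
            if (word.equals(target)) {
                // Walk the parent links back to the source, then reverse.
                List<String> path = new ArrayList<>();
                for (String w = word; w != null; w = parent.get(w)) {
                    path.add(w);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : edges.getOrDefault(word, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, word);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

The number of edges on the returned path (list size minus one) can then serve as a simple distance between two words; a shorter path suggests closer relatedness.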
Correlation of relatedness measures
Correlation of human relatedness judgments (353-TC test collection) with measures based on WordNet, English Wikipedia, and Russian Wiktionary
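The slides do not say which correlation coefficient was used. The sketch below assumes Spearman rank correlation, a common choice when comparing a relatedness measure with human judgments; the class `SpearmanSketch` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: Spearman correlation = Pearson correlation of the ranks.
class SpearmanSketch {
    // Ranks start at 1; tied values share the average of their ranks.
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            order[k] = k;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
        double[] rank = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over the run of values equal to values[order[i]].
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) {
                j++;
            }
            double average = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                rank[order[k]] = average;
            }
            i = j + 1;
        }
        return rank;
    }

    // Pearson correlation of two samples of equal length.
    static double pearson(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // human[i] and measure[i] are the two scores for the same word pair.
    static double spearman(double[] human, double[] measure) {
        return pearson(ranks(human), ranks(measure));
    }
}
```

A value near 1 means the measure orders word pairs almost the same way as the human annotators do; a value near 0 means the two orderings are unrelated.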
Eight largest Wiktionary editions (March 2008)
Application of a machine-readable dictionary (MRD)
Thesaurus data:
– Related terms search
– Search request extension (by synonyms) / request reformulation (in search systems)
– Request recognition in question-answering systems
– Word sense disambiguation
Media data (audio + pictures):
– Language learning
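Search request extension by synonyms can be illustrated with a small sketch. It assumes the thesaurus is already available as an in-memory map from a word to its synonyms; the class `QueryExpansionSketch` and the method `expand` are hypothetical names, not part of any existing API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: extend a search request with synonyms from a thesaurus.
class QueryExpansionSketch {
    // Returns the original terms first, followed by the synonyms of each
    // term, in order and without duplicates.
    static Set<String> expand(List<String> queryTerms,
                              Map<String, List<String>> thesaurus) {
        Set<String> expanded = new LinkedHashSet<>(queryTerms);
        for (String term : queryTerms) {
            // Terms missing from the thesaurus contribute no extra words.
            expanded.addAll(thesaurus.getOrDefault(term, List.of()));
        }
        return expanded;
    }
}
```

For example, with a thesaurus that maps "car" to "automobile" and "auto", the request "fast car" expands to the terms fast, car, automobile, auto.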
Work plan: done and todo
Russian Wiktionary
• Extraction (by RE)
– Definition
– Relations (synonyms…)
– Translation
– Audio
– Graphics
• Database API
• Visualization (MRD browser)
• Quiz & tests (test application)
Russian Wiktionary
• Database scheme
– Definition
– Relations (synonyms…)
– Translation
– Audio
– Graphics
• Database API
English Wiktionary
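Extraction by regular expressions (RE) might look roughly like the sketch below. It assumes a simplified wiki markup in which a section starts with a heading line such as `==== Synonyms ====` and related words appear as `[[word]]` or `[[word|label]]` links. Real Wiktionary entries are considerably more varied, and the class `SectionLinkExtractorSketch` is a hypothetical name, not the project's parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collect wiki links from one named section of an entry.
class SectionLinkExtractorSketch {
    // A heading line such as "==== Synonyms ===="; group 1 is the title.
    private static final Pattern HEADING =
        Pattern.compile("^=+\\s*(.*?)\\s*=+\\s*$");
    // A link [[target]] or [[target|label]]; group 1 is the target.
    private static final Pattern LINK =
        Pattern.compile("\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    static List<String> linksInSection(String wikiText, String sectionTitle) {
        List<String> result = new ArrayList<>();
        boolean inSection = false;
        for (String line : wikiText.split("\\R")) {
            Matcher heading = HEADING.matcher(line);
            if (heading.matches()) {
                // Every heading either opens the wanted section or closes it.
                inSection = heading.group(1).equals(sectionTitle);
                continue;
            }
            if (!inSection) {
                continue;
            }
            Matcher link = LINK.matcher(line);
            while (link.find()) {
                result.add(link.group(1).trim());
            }
        }
        return result;
    }
}
```

Passing the section title as a parameter lets the same routine serve other sections or other language editions, where the heading text differs. The note in the slides that a parser "should be very stable" applies here: community-edited markup deviates from any single pattern, so a real parser needs many more cases than this.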
Implementation
Software based on Synarcher code
Java
MySQL or SQLite database
JUnit test framework
Results
The scheme of the experiment for calculating the semantic relatedness measure based on Russian Wiktionary data
The parser of Russian Wiktionary
Database scheme designed
Database API implemented in Java
Compared the results of related terms search based on Wiktionary and WordNet
Project site (Wiki tool kit): http://code.google.com/p/wikokit/
Future work
Finish creating the MRD
Database and software
Russian Wiktionary and English Wiktionary
Visualization (JavaFX)
MRD browser
Quiz & tests (learning application)
Online application (Java Web Start)
Thank you!