Wikification via Link Co-occurrence
Presenter: WU, MIN-CONG
Authors: ZHIYUAN CAI, KAIQI ZHAO, KENNY Q. ZHU, AND HAIXUN WANG
2013, CIKM
Intelligent Database Systems Lab

Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation
• Wikipedia concepts are often represented by the link graph or by link distributions. However, the link structure or link distribution alone is often biased or incomplete, because Wikipedia pages are often sparsely linked.

Objectives
• We propose an iterative method that enriches sparsely-linked articles by adding more links, and then uses the resulting link co-occurrence matrix for wikification.

Methodology - Framework

Methodology - Preprocessing
Produced by Algorithm 1.

Methodology - Co-occurrence Matrix Generation
Matrix initialization problems:
• Computing the co-occurrence within the whole article is computationally demanding.
• An article may cover multiple topics, so concepts in different parts of the article might not be related at all.
Solution: two concepts are considered to co-occur only if they are less than Wc terms apart.
Set:

Methodology - Co-occurrence Matrix Generation
• Avoid Scc = 0
• No discrimination: Concept 1's score = 20, Concept 2's score = 19

Methodology - Wikify New Documents
• BCC
• Next step

Experiment - Parameter Settings

Experiment - Data Preparation
• First dataset: Cucerzan
• Second dataset: Kulkarni
• Third dataset: our own, extracted from 25 articles

Experiment - Effects of Wikipedia Corpus Sizes
• A larger corpus increases the cost in time and space but does not give a proportional gain.

Experiment - Iteration Results
• Accuracy stabilizes above 0.9.
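The windowed co-occurrence rule from the Methodology slides, where two linked concepts co-occur only if they are less than Wc terms apart, can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the authors' Algorithm 1 or their scoring function. It assumes each document has already been reduced to a list of (term position, concept) pairs for its links, uses raw pair counts as the co-occurrence score, and disambiguates a new mention by picking the candidate with the largest summed count against the context concepts. The names `cooccurrence_counts` and `disambiguate` and the toy data are hypothetical. The sketch also omits the iterative link enrichment and whatever smoothing the slides use to avoid Scc = 0.

```python
from collections import Counter

# Hypothetical input format (an assumption, not from the slides): each document
# is a list of (term_position, concept) pairs, one per link in the article.
Doc = list[tuple[int, str]]


def cooccurrence_counts(docs: list[Doc], window: int) -> Counter:
    """Count how often two distinct concepts appear less than `window` terms apart.

    `window` plays the role of Wc on the slides. Pairs are stored as frozensets,
    so the resulting co-occurrence matrix is symmetric.
    """
    counts: Counter = Counter()
    for links in docs:
        links = sorted(links)  # order links by term position
        for i, (pos_i, concept_i) in enumerate(links):
            for pos_j, concept_j in links[i + 1:]:
                if pos_j - pos_i >= window:
                    break  # every later link is even farther away
                if concept_i != concept_j:
                    counts[frozenset((concept_i, concept_j))] += 1
    return counts


def disambiguate(candidates: list[str], context: list[str], counts: Counter) -> str:
    """Pick the candidate with the highest summed co-occurrence with the context.

    This simple additive score is an illustrative assumption, not the paper's formula.
    """
    def score(candidate: str) -> int:
        # Counter returns 0 for unseen pairs instead of raising KeyError
        return sum(counts[frozenset((candidate, ctx))] for ctx in context if ctx != candidate)

    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy, made-up "articles" used only to exercise the functions
    docs = [
        [(0, "Apple_Inc."), (3, "iPhone"), (10, "Steve_Jobs")],
        [(0, "Apple"), (2, "Fruit"), (20, "Orchard")],
        [(5, "Apple_Inc."), (7, "Steve_Jobs")],
    ]
    counts = cooccurrence_counts(docs, window=5)
    print(disambiguate(["Apple", "Apple_Inc."], ["iPhone", "Steve_Jobs"], counts))  # Apple_Inc.
    print(disambiguate(["Apple", "Apple_Inc."], ["Fruit"], counts))  # Apple
```

Breaking out of the inner loop once the gap reaches `window` means only nearby link pairs are examined rather than every pair in the article. This reflects the slides' point that computing co-occurrence over the whole article is computationally demanding.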
Experiment - End-to-End Wikification Results

Conclusions
• Our evaluation shows that co-occurrence-based wikification can achieve high accuracy (about 82.58% F1) efficiently (over 1000 words per second).

Comments
• Advantages
  – High accuracy
  – Efficiency
• Applications
  – Phrase sense disambiguation