Towards Ontology Learning from Folksonomies

Jie Tang*, Ho-fung Leung#, Qiong Luo+, Dewei Chen*, and Jibin Gong*
*Dept. of Computer Science and Technology, Tsinghua University
#Dept. of Computer Science and Engineering, The Chinese U. of Hong Kong
+Dept. of Computer Science, Hong Kong U. of Science and Technology
July 14th, 2009

Motivation
• The Semantic Web aims to provide a Web environment in which each Web document is annotated with machine-readable metadata (e.g., concepts from an ontology).
– Manual annotation tools, e.g., Protégé (Noy, et al., IS'01)
– Automatic annotation methods using ML, e.g., iASA (Tang, et al., JoDS'05), TCRF (Tang, et al., ISWC'06)
• Folksonomy provides a way to annotate the Web…
– …but a completely free one.
– It also poses a big challenge in reliability and consistency due to the lack of terminological control.
• This work aims to learn ontologies from folksonomies.

Motivating Example
[Figure: a tag hierarchy rooted at "Things". Tags shown: web, Web2.0, web2, social web, semanticweb, semweb, Data mining, clustering, ontologyfolksonomy, ontology, folksonomy, tag, foaf, tag-clustering. "Merge" labels mark groups of tags to be combined.]
Several key challenges:
• How to define this problem in a principled way?
• How to model the synonym/hypernym/homonym relations between tags?
• How to construct the hierarchical ontology according to the modeling results?

Our Solution
1. Use topics to model tags and documents.
2. Define four divergence measures to estimate the difference between tags.
3. Present an algorithm to construct the hierarchical structure from the tags.
[Figure: tags (Web2.0, web2, social web, ontology, ..., tag, foaf) linked to the documents they annotate.]
Outline
• Related Work
• Our Approach
– Modeling Folksonomy
– Divergence Estimation
– Hierarchical Structure Construction
• Experiments
• Conclusion & Future Work

Previous Work
Ontology learning from text
• WebOntEx (Han and Elmasri, 03);
• Protégé plug-in (Buitelaar et al., 99);
• (Maedche and Staab, 2001; Sleeman et al., 03); etc.
Folksonomy integration
• Learning syno-/hyper-nym relations between tags (Li et al., 07);
• Clustering tags (Specia and Motta, 2007);
• Learning hierarchical relations between tags (Zhou et al., 07);
• Non-taxonomic relations (Mori et al., 06); etc.
Topic models
• PLSI (Hofmann, 1999); LDA (Blei et al., 03); Author-topic model (Steyvers et al., 04); etc.
[Figure: tags (Web2.0, web2, social web, ontology, ..., tag, foaf) linked to the documents they annotate.]

How to model tags and documents?
• Input: Assume that a tag t_i is used to annotate multiple documents and a document d contains a vector w_d of N_d words. Then a set of tags with the annotated documents can be represented as [equation not recoverable].
• Modeling: How to represent each document and each tag? And how to characterize the relationship between documents and tags?
[Figure: tags (Web2.0, web2, social web, ontology, ..., tag, foaf) linked to the documents they annotate.]
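The input described on this slide can be pictured with a small sketch. It assumes annotations arrive as (document words, tags) pairs and groups the word vectors w_d under each tag t_i. The function name `group_documents_by_tag` and the toy data are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def group_documents_by_tag(annotations):
    """Invert (document words, tags) pairs into a tag -> list-of-word-vectors map."""
    docs_by_tag = defaultdict(list)
    for words, tags in annotations:
        for tag in tags:
            # Each tag collects the word vector w_d of every document it annotates.
            docs_by_tag[tag].append(list(words))
    return dict(docs_by_tag)


# Toy annotations, invented for illustration only.
annotations = [
    (["generative", "model", "clustering", "documents"], ["clustering", "data-mining"]),
    (["ontology", "rdf", "owl"], ["semanticweb"]),
    (["hierarchical", "clustering", "tags"], ["clustering"]),
]

docs_by_tag = group_documents_by_tag(annotations)
for tag, docs in sorted(docs_by_tag.items()):
    # Print the tag and N_d (the word count) of each document it annotates.
    print(tag, [len(words) for words in docs])
```

A tag that annotates several documents simply accumulates several word vectors, which is the shape a tag-level topic model would consume.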
Tag-Topic (TT) Models
[Figure: graphical model linking words, tags, and topics.]

Generative Story of Tagging
• Generative process
[Figure: an example document and its tags. Topic labels shown: IR, NLP, ML, DM. Two example topic-word distributions P(w|z) are shown: (mining 0.23, clustering 0.19, classification 0.17, …) and (model 0.23, learning 0.19, boost 0.17, …).]
Example document: "Latent Dirichlet Co-clustering. We present a generative model for clustering documents and terms. Our model is a four hierarchical Bayesian model. We present efficient inference techniques based on Markov Chain Monte Carlo. We report results in document modeling, document and terms clustering …"
Tags: Data mining, clustering, probabilistic model

Tag-Topic (TT) Models
Generative process: [steps not recoverable]
[Figure: graphical model linking words, tags, and topics.]

Topic Smoothing
The new objective function: [equation not recoverable], with callouts marking the log-likelihood of the tag-topic (TT) model and the smoothing term.

Divergence Estimation
• Tag divergence
• Hypernym-divergence
• Merging-divergence
• Keep-divergence
[Equations not recoverable. Callouts: estimated topic distribution; posterior probability derived from the topic modeling results.]

Hierarchical Structure Construction
[Equations not recoverable: a two-step procedure (Step 1, Step 2). Callout: correspond to a divergence.]
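The divergence formulas are not legible in these slides, so the following is only a sketch of the general idea and not the authors' measures. It assumes each tag has an estimated topic distribution P(z | tag) and uses the Jensen-Shannon divergence, a stand-in chosen here, to flag pairs of tags as merge candidates. The function names, the threshold, and the toy probabilities are all hypothetical.

```python
import math
from itertools import combinations


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same topics.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and finite even when p or q has zeros."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def merge_candidates(tag_topics, threshold):
    """Return tag pairs whose topic distributions diverge by less than threshold."""
    pairs = []
    for a, b in combinations(sorted(tag_topics), 2):
        if js_divergence(tag_topics[a], tag_topics[b]) < threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical P(topic | tag) values, invented for illustration only.
tag_topics = {
    "semanticweb": [0.80, 0.15, 0.05],
    "semweb": [0.78, 0.17, 0.05],
    "clustering": [0.05, 0.10, 0.85],
}

print(merge_candidates(tag_topics, threshold=0.05))
```

With these toy numbers only the two near-identical distributions fall under the threshold. A full construction procedure would also need to decide between subordinate and keep relations, which this sketch does not attempt.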
Callout on the construction objective: a penalty on the complexity of the generated hierarchy.

Data Sets and Evaluation Measures
• Data sets
– PAPER: 4,841 papers and their associated tags (8,071 unique tags and a total of 37,010 tags) from CITEULIKE
– MOVIE: 4,009 movies and their tags (18,559 unique tags and a total of 142,498 tags) from IMDB
• Evaluation Measures
– Accuracy (against ODP or human judgement)
– Case study
• Baseline
– Hierarchical clustering

Accuracy Performance
[Figure: accuracy results; values not recoverable.]

Case Study: Movie
[Figure: three hierarchies learned from MOVIE tags, labeled "By clustering", "By TT", and "By TT with smoothing". Nodes are tag groups joined by "Merging", "Subordinate", and "Keep" relations. Tags shown include cartoon, looney tunes, bugs bunny, daffy-duck, donald duck, porky pig, tom and jerry, cat versus mouse, gambling, gambler, wager, money, horse racing, horse race, automobile, and automobile race.]

Case Study: Paper
[Figure, panel "By TT": hierarchy learned from PAPER tags, with "Merging" groups. Tags shown include ontologies, ontology, clustering, link-analysis, hierarchical-clustering, imaging, graph, graphs, web-graph, web, semantic, semantics, radiology, rdf, owl, semanticweb, semantic-web, semantic_web, integration, and mapping.]
[Figure, panel "By TT with smoothing": hierarchy learned from PAPER tags, with "Merging" groups. Tags shown include ontologies, ontology, clustering, hierarchical-clustering, web, image, imaging, semantic, semantics, indexing, knowledge, knowledgediscovery, semanticweb, semantic-web, semantic_web, medical-imaging, image-processing, systems, system, systembiology, rdf, owl, taxonomy, and ir.]

Conclusion
• Formalize a novel problem of ontology learning from folksonomies.
• Exploit a probabilistic topic model to model the tags and their annotated documents and propose four divergence measures.
• Present an algorithm to construct the hierarchical structure from tags.
• Experimental results on two different types of real-world data sets show that our method can effectively learn the ontological hierarchy from social tags.

Future Work
• Discover non-taxonomic relationships between tags
• Ontology learning from noisy tags
• Incremental ontology learning from the dynamic tagging space
• Applications:
– Personalized tag recommendation
– Social tagging: guiding the tagging process
– …

Thanks! Q&A
HP: http://keg.cs.tsinghua.edu.cn/persons/tj/
Open resource will be available soon at: http://arnetminer.org/resources