國立雲林科技大學 National Yunlin University of Science and Technology
Bridging Domains Using World Wide Knowledge for Transfer Learning
Evan Wei Xiang, Bin Cao, Derek Hao Hu, and Qiang Yang
TKDE, 2010
Presented by Wen-Chung Liao, 2010/05/12
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.

Outline
─ Motivation
─ Objectives
─ Methodology
─ Experiments
─ Conclusions
─ Comments

Motivation
Supervised learning requires sufficient labeled instances.
It is not easy or feasible to obtain new labeled data in a domain of interest.
To solve this problem, transfer learning techniques
─ capture the shared knowledge from some related domains (source domains) where labeled data are available
─ use the knowledge to improve the performance of data mining tasks in a target domain
─ domain adaptation techniques
However, transfer learning may not work well when the difference (information gap) between the source and target domains is large.

Objectives
To solve this problem, introduce a bridge between the two different domains
─ by leveraging additional knowledge sources: Wikipedia or the Open Directory Project (ODP)
─ treat the two domains as drawn from a single underlying distribution
─ the "domain adaptation problem" becomes a classification problem under the supervised setting or a semisupervised (transductive) setting
Introduce a novel domain adaptation algorithm called BIG (Bridging Information Gap).
─ We apply semisupervised learning (SSL) to domain adaptation problems based on the use of the auxiliary data (bridge). The inputs are:
  ─ the labeled data from the source domain
  ─ the unlabeled data from the target domain
  ─ an auxiliary data source such as Wikipedia

Support vector machines (SVMs)
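For reference, the SVM slide can be read against the standard soft-margin formulation. The form below is the usual textbook statement, not something recovered from the slides, and the symbols (w, b, the slack variables xi_i, the penalty C, and the number of labeled instances n) are conventional names assumed here rather than notation confirmed by the deck.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
  & \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad
  & y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
\end{aligned}
```

In this formulation the width of the margin is 2/||w||, so minimizing ||w||^2 maximizes the margin while C trades margin width against training errors.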
Methodology
─ Information gap with no background knowledge available: SVM
─ Information gap with background knowledge: TSVM
─ Selecting the set of unlabeled data {x_i} from K to minimize the margin: NP-hard

Experiments

Conclusions
Three major contributions
1) We view the problem from a new perspective, i.e., we consider the problem of transfer learning as one of filling in the information gap based on a large document corpus.
2) We show that we can successfully bridge the source and target domains using well-developed semisupervised learning algorithms.
3) We propose a min-margin algorithm that can effectively identify and reduce the information gap between two domains.
Future work
─ First, we plan to validate the effectiveness of our approach through other semisupervised learning algorithms and other relational knowledge bases.
─ We plan to extend our approach to be able to consider heterogeneous transfer learning.
─ Finally, we will try to develop online TSVM methods for incremental cross-domain transductive learning.

Comments
Advantage
─ new perspective
Disadvantage
Applications
─ Web and document data mining applications
─ information retrieval
─ spam detection
─ online advertisement
─ Web search
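Returning to the selection step listed under Methodology: the slides state only that choosing the unlabeled set {x_i} from K to minimize the margin is NP-hard, and they do not show how the paper approximates it. The sketch below is therefore an assumed greedy stand-in, not the authors' min-margin algorithm: given a current linear separator, it ranks auxiliary points by their distance to the hyperplane and keeps the closest ones. The function names, the linear model (w, b), and the toy data are all hypothetical.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Unsigned geometric distance |w.x + b| / ||w|| of x from the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("w must be non-zero")
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / norm


def select_min_margin(w, b, candidates, k):
    """Greedy stand-in (an assumption, not the paper's method):
    return the k candidates that lie closest to the current boundary."""
    ranked = sorted(candidates, key=lambda x: distance_to_hyperplane(w, b, x))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical separator and auxiliary points standing in for K.
    w, b = [1.0, -1.0], 0.0
    K = [[3.0, 0.0], [0.5, 0.4], [0.0, 2.0], [1.0, 1.2]]
    print(select_min_margin(w, b, K, k=2))
```

A greedy ranking like this is cheap (one sort over K) but gives no optimality guarantee, which is consistent with the slide's remark that the exact selection problem is NP-hard.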