National Yunlin University of Science and Technology
Bridging Domains Using World Wide Knowledge for Transfer Learning
Evan Wei Xiang, Bin Cao, Derek Hao Hu, and Qiang Yang
TKDE, 2010
presented by Wen-Chung Liao, 2010/05/12
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Outline
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
Motivation
Supervised learning requires sufficient labeled instances.
It is often difficult or infeasible to obtain new labeled data in a domain of interest.
To solve this problem, transfer learning techniques
─ capture the shared knowledge from some related domains (source domains) where labeled data are available
─ use that knowledge to improve the performance of data mining tasks in a target domain
─ include, e.g., domain adaptation techniques
However, transfer learning may not work well when the difference (information gap) between the source and target domains is large.
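The slides do not say how the information gap is measured. As a rough illustration only, one hypothetical proxy for text domains is vocabulary overlap: the fewer words the source and target documents share, the less a classifier trained on the source can rely on in the target. The sketch below (function names are our own, not from the paper) computes 1 minus the Jaccard similarity of the two vocabularies.

```python
def vocabulary(docs):
    """Set of lowercase, whitespace-separated tokens appearing in any document."""
    return {tok for doc in docs for tok in doc.lower().split()}


def jaccard_overlap(source_docs, target_docs):
    """Jaccard similarity of the two vocabularies (0 = disjoint, 1 = identical)."""
    vs, vt = vocabulary(source_docs), vocabulary(target_docs)
    union = vs | vt
    if not union:
        return 1.0  # two empty domains are trivially identical
    return len(vs & vt) / len(union)


def information_gap(source_docs, target_docs):
    """Hypothetical gap proxy (assumed, not the paper's measure): 1 - overlap."""
    return 1.0 - jaccard_overlap(source_docs, target_docs)


if __name__ == "__main__":
    source = ["stock market shares fall", "bank raises interest rates"]
    target = ["equity prices drop", "central bank lending rates"]
    # Only "bank" and "rates" are shared out of 13 distinct words.
    print(round(information_gap(source, target), 3))  # 0.846
```

A large value like this is the situation the slides describe: the two domains share too little vocabulary for direct transfer, which is where auxiliary documents that mention words from both sides could act as a bridge.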
Objectives
To solve this problem, introduce a bridge between the two different domains
─ by leveraging additional knowledge sources, such as Wikipedia or the Open Directory Project (ODP)
─ treating the two domains as drawn from a single underlying distribution
─ casting the “domain adaptation problem” as a classification problem under the supervised setting or a semisupervised (transductive) setting
Introduce a novel domain adaptation algorithm called BIG (Bridging Information Gap)
─ we apply semisupervised learning (SSL) to domain adaptation problems based on the use of auxiliary data (the bridge), drawing on three sources:
  ─ the labeled data from the source domain
  ─ the unlabeled data from the target domain
  ─ an auxiliary data source such as Wikipedia
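To make the roles of the three data sources concrete, here is a minimal sketch, assuming documents have already been turned into feature vectors and labels are in {-1, +1}. The container and function names (`BridgedDataset`, `build_bridged_dataset`) are hypothetical; the point is only that source data carry labels, while target and auxiliary data are pooled as unlabeled input for a transductive learner.

```python
from dataclasses import dataclass


@dataclass
class BridgedDataset:
    """Hypothetical container for the three data sources BIG draws on."""
    labeled_x: list    # source-domain feature vectors
    labeled_y: list    # their labels, in {-1, +1}
    unlabeled_x: list  # target-domain vectors first, then auxiliary (bridge) vectors
    n_target: int      # how many entries of unlabeled_x come from the target domain

    @property
    def target_x(self):
        """Target-domain points whose labels we actually want to predict."""
        return self.unlabeled_x[: self.n_target]

    @property
    def auxiliary_x(self):
        """Auxiliary (e.g. Wikipedia-derived) points used only as a bridge."""
        return self.unlabeled_x[self.n_target:]


def build_bridged_dataset(source_x, source_y, target_x, auxiliary_x):
    """Only source data carry labels; target and auxiliary data are pooled
    as unlabeled input for a semisupervised (transductive) learner."""
    if len(source_x) != len(source_y):
        raise ValueError("each source instance needs exactly one label")
    if any(y not in (-1, 1) for y in source_y):
        raise ValueError("labels must be -1 or +1")
    return BridgedDataset(
        labeled_x=list(source_x),
        labeled_y=list(source_y),
        unlabeled_x=list(target_x) + list(auxiliary_x),
        n_target=len(target_x),
    )


if __name__ == "__main__":
    ds = build_bridged_dataset([[1, 0]], [1], [[0, 1]], [[1, 1], [0, 0]])
    print(ds.target_x, ds.auxiliary_x)  # [[0, 1]] [[1, 1], [0, 0]]
```

Keeping target points first in the unlabeled pool makes it easy to read off predictions for the target domain after transductive training, while the auxiliary points only shape the decision boundary.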
Support vector machines (SVMs)
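The formulas on the SVM slides did not survive extraction. As a reminder of what a linear SVM optimizes, the sketch below minimizes the standard soft-margin primal objective by plain full-batch subgradient descent. The solver, the trade-off `c`, the step size `lr` and the toy data are our own illustrative choices, not the solver used in the paper.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(xs, ys, c=1.0, lr=0.01, epochs=1000):
    """Subgradient descent on the soft-margin primal objective
        0.5 * ||w||^2 + c * sum_i max(0, 1 - y_i * (w . x_i + b)).
    xs: list of feature lists; ys: labels in {-1, +1}. Returns (w, b)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # subgradient of 0.5 * ||w||^2 is w itself
        grad_b = 0.0
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1:  # inside the margin: hinge term is active
                grad_w = [g - c * y * xj for g, xj in zip(grad_w, x)]
                grad_b -= c * y
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Sign of the decision function w . x + b."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    xs = [[2, 2], [3, 3], [-2, -2], [-3, -3]]
    ys = [1, 1, -1, -1]
    w, b = train_linear_svm(xs, ys)
    print([predict(w, b, x) for x in xs])  # [1, 1, -1, -1]
```

On this separable toy set the maximum-margin solution is w = [0.25, 0.25], b = 0 (the points [2, 2] and [-2, -2] lie exactly on the margin); with a fixed step size the iterates oscillate close to it rather than converging exactly.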
Methodology
Information Gap with No Background Knowledge Available
─ SVM
Information Gap with Background Knowledge
─ TSVM
─ Selecting the set of unlabeled data {x_i} from the auxiliary knowledge base K so as to minimize the margin is NP-hard
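Because choosing the optimal subset of unlabeled data from K is NP-hard, some greedy approximation is needed. The sketch below shows one simple, assumed heuristic in the spirit of min-margin selection: given the current linear classifier (w, b), rank the auxiliary candidates by their geometric distance to the decision boundary and keep the k closest. The function name and the ranking rule are illustrative assumptions, not the paper's exact algorithm; in a full pipeline the selected documents would presumably be added to the TSVM's unlabeled pool and the classifier retrained.

```python
import math


def select_min_margin(candidates, w, b, k):
    """Greedy heuristic (assumed): return indices of the k auxiliary candidates
    closest to the decision boundary w . x + b = 0, i.e. with the smallest
    geometric margin |w . x + b| / ||w||."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    if norm == 0:
        raise ValueError("w must be non-zero")

    def margin(x):
        return abs(sum(wj * xj for wj, xj in zip(w, x)) + b) / norm

    ranked = sorted(range(len(candidates)), key=lambda i: margin(candidates[i]))
    return ranked[:k]


if __name__ == "__main__":
    pool = [[3, 0], [0.5, 1], [-0.2, 5], [2, 2]]
    # Margins under w = [1, 0], b = 0 are 3, 0.5, 0.2 and 2.
    print(select_min_margin(pool, w=[1, 0], b=0, k=2))  # [2, 1]
```

Ranking each candidate independently costs one pass over K plus a sort, which sidesteps the combinatorial search over subsets at the price of ignoring interactions between the selected points.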
Experiments
Conclusions

THREE MAJOR CONTRIBUTIONS
1) We view the problem from a new perspective: we treat transfer learning as the problem of filling in the information gap based on a large document corpus.
2) We show that we can successfully bridge the source and target domains using well-developed semisupervised learning algorithms.
3) We propose a minmargin algorithm that can effectively identify and reduce the information gap between two domains.

FUTURE WORK
─ First, we plan to validate the effectiveness of our approach with other semisupervised learning algorithms and other relational knowledge bases.
─ Second, we plan to extend our approach to heterogeneous transfer learning.
─ Finally, we will try to develop online TSVM methods for incremental cross-domain transductive learning.
Comments
Advantage
─ new perspective
Shortcomings
Applications
─ Web and document data mining applications
─ information retrieval
─ spam detection
─ online advertisement
─ Web search