Support Cluster Machine
Paper from ICML 2007
Presented by Haiqin Yang
2007-10-18
The paper "Support Cluster Machine" was written by Bin Li, Mingmin Chi,
Jianping Fan, and Xiangyang Xue, and published at ICML 2007.
Outline
 Background and Motivation
 Support Cluster Machine – SCM
 Kernel in SCM
 Experiments
 An Interesting Application: Privacy-preserving Data Mining
 Discussions
Background and Motivation
 Large-scale classification problems
 Decomposition methods
 Osuna et al., 1997
 Joachims, 1999
 Platt, 1999
 Collobert & Bengio, 2001
 Keerthi et al., 2001
 Incremental algorithms
 Cauwenberghs & Poggio, 2000
 Fung & Mangasarian, 2002
 Laskov et al., 2006
 Parallel techniques
 Collobert et al., 2001
 Graf et al., 2004
 Approximate formulations
 Fung & Mangasarian, 2001
 Lee & Mangasarian, 2001
 Choosing representatives
 Active learning – Schohn & Cohn, 2003
 Cluster-Based SVM – Yu et al., 2003
 Core Vector Machine (CVM) – Tsang et al., 2005
 Clustering SVM – Boley & Cao, 2004
Support Cluster Machine – SCM
 Given training samples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, $y_i \in \{-1, +1\}$
 Procedure
 Cluster each class of the training data into a mixture of Gaussians, so that each Gaussian component (cluster) becomes a training unit
 Train an SVM-style classifier over the clusters and use its decision function to classify test vectors (a sketch of the objective follows)
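The slide's equations did not survive extraction. As a hedged reconstruction, assuming the standard soft-margin SVM form with each Gaussian cluster density $p_i$ taking the place of a sample (the per-cluster weight $s_i$, meant to reflect cluster size, is my assumption):

\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} s_i \xi_i
\quad \text{s.t.} \quad y_i \big( \langle \mathbf{w}, \phi(p_i) \rangle + b \big) \ge 1 - \xi_i, \;\; \xi_i \ge 0,

where $N$ is the total number of clusters from both classes and $\phi$ is the feature map induced by the kernel on distributions defined on the following slides.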
SCM Solution
 Dual representation (see below)
 Decision function (see below)
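The formulas are missing from the transcript; under the formulation sketched above, the dual and the decision function would take the standard SVM form, with the kernel now evaluated between cluster densities (the upper bound $C s_i$ follows from the assumed cluster weights):

\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(p_i, p_j)
\quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C s_i

f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i=1}^{N} \alpha_i y_i \, K(p_i, \delta_{\mathbf{x}}) + b \Big)

Here a test vector $\mathbf{x}$ enters as a degenerate density $\delta_{\mathbf{x}}$, consistent with the later remark that training units are generative models while testing units are vectors.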
Kernel
 Probability product kernel
 By the Gaussian assumption, the kernel admits a closed form (see below)
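The kernel definitions are missing from the transcript. The probability product kernel (Jebara et al., 2004) is

K(p, p') = \int_{\mathbb{R}^d} p(\mathbf{x})^{\rho} \, p'(\mathbf{x})^{\rho} \, d\mathbf{x}

and with $\rho = 1$ (the expected likelihood kernel; this choice of $\rho$ is my assumption) and the Gaussian assumption $p = \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1)$, $p' = \mathcal{N}(\boldsymbol{\mu}_2, \Sigma_2)$, the integral has the closed form

K(p, p') = (2\pi)^{-d/2} \, |\Sigma_1 + \Sigma_2|^{-1/2} \exp\!\Big( -\tfrac{1}{2} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{\top} (\Sigma_1 + \Sigma_2)^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \Big)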
Kernel
 Property I
 That is
 Decision function
 Property II (both properties discussed below)
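The exact statements of the two properties did not survive extraction. Two consequences of the Gaussian closed form that fit the surrounding slides (matching them to Property I/II is my guess):

 With a degenerate (point-mass) test density, the kernel reduces to the likelihood of the test point under the cluster, $K(\mathcal{N}(\boldsymbol{\mu}_i, \Sigma_i), \delta_{\mathbf{x}}) = \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \Sigma_i)$, which is what lets the decision function above score individual test vectors.
 For Gaussians sharing an isotropic covariance $\sigma^2 I$, the kernel reduces to a scaled RBF kernel between the means, $K = (4\pi\sigma^2)^{-d/2} \exp\big( -\|\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2\|^2 / (4\sigma^2) \big)$, connecting SCM to the standard Gaussian-kernel SVM mentioned in the discussion.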
Experiments
 Datasets
 Toydata
 MNIST – handwritten digit (‘0’–’9’) classification
 Adult – privacy-preserving dataset
 Classification methods
 libSVM
 SVMTorch
 SVMlight
 CVM (Core Vector Machine)
 SCM
 Clustering algorithms
 Threshold Order Dependent (TOD)
 EM algorithm
 Model selection
 CPU: 3.0 GHz
Toydata
 Samples: 2,500 samples per class, generated from a mixture of Gaussians
 Clustering algorithm: TOD (a sketch of TOD follows)
 Clustering results: 25 positive clusters, 25 negative clusters
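Since TOD is used throughout the experiments, here is a minimal Python sketch of threshold order-dependent (leader-style) clustering. The Euclidean metric, the incremental mean update, and all names are illustrative assumptions, not the paper's exact algorithm:

import numpy as np

def tod_cluster(X, threshold):
    # Threshold Order Dependent clustering (leader-style sketch).
    # Scan samples in their given order: a sample joins the nearest
    # existing cluster if it lies within `threshold` of that cluster's
    # mean; otherwise it starts a new cluster.
    centers, members = [], []
    for x in X:
        if centers:
            dists = [np.linalg.norm(x - c) for c in centers]
            k = int(np.argmin(dists))
            if dists[k] < threshold:
                members[k].append(x)
                centers[k] = np.mean(members[k], axis=0)  # incremental mean
                continue
        centers.append(np.array(x, dtype=float))
        members.append([x])
    return centers, members

def to_gaussian_components(members, n_total):
    # Summarize each cluster as (prior, mean, covariance) for the SCM kernel.
    comps = []
    for pts in members:
        P = np.asarray(pts, dtype=float)
        mu = P.mean(axis=0)
        # fall back to a tiny spherical covariance for singleton clusters
        cov = np.cov(P, rowvar=False) if len(P) > 1 else 1e-6 * np.eye(P.shape[1])
        comps.append((len(P) / n_total, mu, cov))
    return comps

The order dependence is the point of the method: one pass over the data suffices, which is what makes it cheap enough for the large-scale setting.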
MNIST
 Data description
 10 classes: handwritten digits ‘0’–’9’
 Training samples: 60,000 (about 6,000 per class)
 Testing samples: 10,000
 Construct 45 one-versus-one binary classifiers, one per digit pair (C(10,2) = 45)
 Results
 25 clusters for the EM algorithm
MNIST
 Test results for the TOD algorithm
Privacy-preserving Data Mining
 Inter-enterprise data mining
 Problem: two parties owning confidential databases wish to build a decision-tree classifier on the union of their databases, without revealing any unnecessary information
 Horizontally partitioned
 Records (users) split across companies
 Example: credit card fraud detection model
 Vertically partitioned
 Attributes split across companies
 Example: associations across websites
Privacy-preserving Data Mining
 Randomization approach
[Figure: randomization flow. Original records (30 | 70K | ...; 50 | 40K | ...) pass through a randomizer to become perturbed records (65 | 20K | ...; 25 | 60K | ...); the distributions of Age and Salary are reconstructed from the perturbed records and fed to data mining algorithms, which output the model.]
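A minimal Python sketch of the additive-noise randomization idea behind the figure; the attribute values, the noise model, and the moment-based reconstruction are illustrative assumptions (the original approach reconstructs full distributions with an iterative Bayesian procedure):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical private attribute (e.g., Age) held by the users.
age = rng.normal(38.0, 12.0, size=10_000)

# Randomizer: each value is perturbed with noise drawn from a publicly
# known distribution before it leaves the user's machine.
noise_std = 20.0
randomized = age + rng.normal(0.0, noise_std, size=age.shape)

# The miner sees only the randomized values, but because the noise
# distribution is public, aggregate statistics can still be recovered:
est_mean = randomized.mean()               # E[x + r] = E[x] when E[r] = 0
est_var = randomized.var() - noise_std**2  # Var[x] = Var[x + r] - Var[r]
print(f"estimated Age mean {est_mean:.1f}, std {np.sqrt(est_var):.1f}")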
Classification Example
Age | Salary | Repeat Visitor?
----|--------|----------------
23  | 50K    | Repeat
17  | 30K    | Repeat
43  | 40K    | Repeat
68  | 50K    | Single
32  | 70K    | Single
20  | 20K    | Repeat

Decision tree:

Age < 25?
 Yes -> Repeat
 No  -> Salary < 50K?
         Yes -> Repeat
         No  -> Single
Privacy-preserving Dataset: Adult
 Data description
 Training samples: 30,162
 Testing samples: 15,060
 Percentage of positive samples: 24.78%
 Procedure
 Horizontally partition the data into three subsets (parties)
 Cluster each party’s data with the TOD algorithm
 Obtain three positive and three negative GMMs
 Combine the per-party GMMs into one positive and one negative GMM with modified priors (see the sketch below)
 Classify with SCM
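The combination step lends itself to a short Python sketch: the parties exchange only cluster-level summaries, and each component's prior is rescaled by its party's share of the pooled data so the merged mixture still sums to one (all names are illustrative):

def merge_gmms(party_gmms, party_sizes):
    # party_gmms: one GMM per party, each a list of (prior, mean, cov)
    #             tuples whose priors sum to 1 within that party.
    # party_sizes: number of samples each party contributed.
    total = sum(party_sizes)
    merged = []
    for gmm, n in zip(party_gmms, party_sizes):
        for prior, mean, cov in gmm:
            # modified prior: weight by the party's share of the data
            merged.append((prior * n / total, mean, cov))
    return merged  # priors sum to 1 across the merged mixture

Only priors, means, and covariances cross party boundaries, which is how individual records stay hidden while SCM can still be trained on the union of the databases.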
Privacy-preserving Dataset: Adult
 Partition results
 Experimental results
Discussions
 Solved problems
 Large-scale problems: downsample by clustering, then classify
 Privacy-preserving problems: hide individual information
 Differences from other methods
 Training units are generative models; testing units are vectors
 Training units carry complete statistical information
 Only one parameter for model selection
 Easy to implement
 Generalization ability is not yet clear, whereas for the RBF kernel in SVM a larger kernel width leads to a lower VC dimension
Discussions
 Advantages of using priors and covariances
Thank you!