Distance Metric Learning in Data Mining
Fei Wang
What is a Distance Metric?
From Wikipedia: a distance metric is a function D(x, y) satisfying four axioms:
– Non-negativity: D(x, y) ≥ 0
– Identity of indiscernibles: D(x, y) = 0 if and only if x = y
– Symmetry: D(x, y) = D(y, x)
– Triangle inequality: D(x, z) ≤ D(x, y) + D(y, z)
Distance Metric Taxonomy
[Taxonomy diagram, reconstructed as a list:]
 Fixed metrics
– Numeric: continuous (e.g., Euclidean) or discrete
– Categorical
 Learned metrics
– Distance characteristics: linear vs. nonlinear; global vs. local
– Learning algorithms: unsupervised, supervised, semi-supervised
Categorization of Distance Metrics: Linear vs. Non-linear
 Linear Distance
– First perform a linear mapping to project the data into some space, and
then evaluate the pairwise data distance as their Euclidean distance in
the projected space
– Generalized Mahalanobis distance
• D(x, y) = sqrt((x − y)^T M (x − y))
• If M is symmetric and positive definite, D is a distance metric;
• If M is symmetric and positive semi-definite, D is a pseudo-metric.
• M = I recovers the standard Euclidean distance.
• Factorizing M = W W^T shows that D(x, y) is the Euclidean distance
between the projected points W^T x and W^T y (see the sketch after this list).
 Nonlinear Distance
– First perform a nonlinear mapping to project the data into some space,
and then evaluate the pairwise data distance as their Euclidean distance
in the projected space
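A minimal numpy sketch of the generalized Mahalanobis distance above and its equivalence to a Euclidean distance after a linear map; the data, W, and M are illustrative, not from the slides:

```python
import numpy as np

def mahalanobis(x, y, M):
    """Generalized Mahalanobis distance D(x, y) = sqrt((x - y)^T M (x - y))."""
    d = x - y
    return np.sqrt(d @ M @ d)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2))   # a linear mapping into a 2-D space
M = W @ W.T                       # symmetric positive semi-definite
x, y = rng.standard_normal(3), rng.standard_normal(3)

# Because M = W W^T, D is exactly the Euclidean distance after projecting by W.
assert np.isclose(mahalanobis(x, y, M), np.linalg.norm(W.T @ x - W.T @ y))
```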
Categorization of Distance Metrics: Global vs. Local
 Global Methods
– The distance metric satisfies some global properties of the data set
• PCA: find a direction along which the variation of the whole data set
is the largest
• LDA: find a direction along which data from the same class are
clustered together while data from different classes are separated
• …
 Local Methods
– The distance metric satisfies some local properties of the data set
• LLE: find the low dimensional embeddings such that the local
neighborhood structure is preserved
• LSML: find the projection such that the local neighbors of
different classes are separated
• ….
Related Areas
 Metric learning
 Dimensionality reduction
 Kernel learning
Related area 1: Dimensionality reduction
 Dimensionality Reduction is the process of reducing the number of
features through Feature Selection and Feature Transformation.
 Feature Selection
 Find a subset of the original features
 Correlation, mRMR, information gain…
 Feature Transformation
 Transform the original features into a lower-dimensional space
 PCA, LDA, LLE, Laplacian Embedding…
 Each dimensionality reduction technique induces a distance metric:
first perform dimensionality reduction, then evaluate the Euclidean
distance in the embedded space (see the sketch below)
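A sketch of this induced metric, assuming scikit-learn and random placeholder data:

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).standard_normal((100, 10))  # placeholder data

# Step 1: dimensionality reduction; step 2: Euclidean distance in the embedded space.
Z = PCA(n_components=3).fit_transform(X)
induced_distances = pdist(Z)   # the pairwise distance metric induced by PCA
```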
Related area 2: Kernel learning
What is a Kernel?
Suppose we have a data set X with n data points. The kernel matrix K defined
on X is an n × n symmetric positive semi-definite matrix whose (i, j)-th element
represents the similarity between the i-th and j-th data points.
Kernel Learning vs. Distance Metric Learning
Kernel learning constructs a new kernel from the data, i.e., an inner-product
function in some feature space, while distance metric learning constructs a
distance function from the data.
Kernel Learning vs. Kernel Trick
• Kernel learning infers the n × n kernel matrix from the data.
• The kernel trick applies a predetermined kernel to map the data from the
original space to a feature space, such that a nonlinear problem in the
original space becomes a linear problem in the feature space.
B. Scholkopf, A. Smola. Learning with Kernels - Support Vector Machines, Regularization,
Optimization, and Beyond. 2001.
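A small numpy sketch of a kernel matrix, here with an RBF (Gaussian) kernel as an illustrative choice:

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """n x n kernel matrix with K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.clip(sq_dists, 0, None))

X = np.random.default_rng(0).standard_normal((5, 3))
K = rbf_kernel_matrix(X)
assert np.allclose(K, K.T)                      # symmetric
assert np.all(np.linalg.eigvalsh(K) >= -1e-10)  # positive semi-definite
```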
Different Types of Metric Learning
[Diagram, reconstructed as a list; in the original, red marks local methods and blue marks global methods:]
 Unsupervised
– Nonlinear: LE, LLE, ISOMAP, SNE
– Linear (with kernelized extensions): PCA, UMMP, LPP
 Supervised
– Linear (with kernelized extensions): LDA, MMDA, Xing, RCA, ITML, NCA, ANMM, LMNN
 Semi-Supervised
– SSDR, CMM, LRML
Metric Learning Meta-algorithm
 Embed the data in some space
– Usually formulated as an optimization problem
• Define objective function
– Separation based
– Geometry based
– Information theoretic based
• Solve optimization problem
– Eigenvalue decomposition
– Semi-definite programming
– Gradient descent
– Bregman projection
 Evaluate the Euclidean distance in the embedded space
Unsupervised metric learning
 Learn a pairwise distance metric purely from the data, i.e., without
any supervision
[Diagram, reconstructed as a list:]
 Linear (with kernelized extensions)
– Principal Component Analysis (global objective)
– Locality Preserving Projections (local objective)
 Nonlinear
– Laplacian Eigenmap, Locally Linear Embedding, Isometric Feature Mapping, Stochastic Neighborhood Embedding (all local objectives)
Outline
Part I - Applications
 Motivation and Introduction
 Patient similarity application
Part II - Methods
 Unsupervised Metric Learning
 Supervised Metric Learning
 Semi-supervised Metric Learning
 Challenges and Opportunities for Metric Learning
Principal Component Analysis (PCA)
 Find successive projection directions
which maximize the data variance
[Figure: 2-D data cloud with its 1st and 2nd principal components.]
Geometry based objective
Eigenvalue decomposition
Linear mapping
Global method
The pairwise Euclidean distances will not change after PCA when all
principal components are retained, since PCA is an orthogonal transform
(see the sketch below)
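A quick scikit-learn sketch of this property on random placeholder data:

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).standard_normal((50, 5))

# Keeping all components, PCA is an orthogonal rotation: distances are preserved.
Z_full = PCA(n_components=5).fit_transform(X)
assert np.allclose(pdist(X), pdist(Z_full))

# Keeping fewer components only approximates the original distances.
Z_2d = PCA(n_components=2).fit_transform(X)
```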
Kernel Mapping
Map the data into some high-dimensional feature space, such that a
nonlinear problem in the original space becomes a linear problem in the
high-dimensional feature space. We do not need to know the explicit form
of the mapping, but we do need to define a kernel function.
Figure from http://omega.albany.edu:8008/machine-learning-dir/notes-dir/ker1/ker1.pdf
Kernel Principal Component Analysis (KPCA)
Perform PCA in the feature space
[Figure: original data and the embedded data after KPCA; via the representer theorem, the projection directions are expressed as combinations of the mapped data points.]
Geometry based objective
Eigenvalue decomposition
Nonlinear mapping
Global method
Figure from http://en.wikipedia.org/wiki/Kernel_principal_component_analysis
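A scikit-learn sketch of KPCA on a toy nonlinear data set (concentric circles; the kernel and its parameters are illustrative choices):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# PCA in an RBF-kernel feature space; the two circles become easy to separate.
Z = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
```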
Locality Preserving Projections (LPP)
Find linear projections that can preserve the localities of the data set
[Figure: on the example data, the X-axis is the major PCA direction, while the Y-axis is the major LPP projection direction; LPP is built from the data's degree matrix and Laplacian matrix.]
Geometry based objective
Generalized eigenvalue decomposition
Linear mapping
Local method
X. He and P. Niyogi. Locality Preserving Projections. NIPS 2003.
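A numpy/scipy sketch of LPP under common simplifying assumptions (rows of X are data points; a 0/1 k-nearest-neighbor graph instead of a heat-kernel weighting):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

X = np.random.default_rng(0).standard_normal((100, 5))   # placeholder data

W = kneighbors_graph(X, n_neighbors=5, mode="connectivity").toarray()
W = np.maximum(W, W.T)                       # symmetrize the adjacency
D = np.diag(W.sum(axis=1))                   # degree matrix
L = D - W                                    # graph Laplacian

# LPP solves the generalized eigenproblem X^T L X a = lambda X^T D X a
# and keeps the eigenvectors with the smallest eigenvalues.
A = X.T @ L @ X
B = X.T @ D @ X + 1e-9 * np.eye(X.shape[1])  # small ridge for numerical stability
lam, V = eigh(A, B)                          # eigenvalues in ascending order
Z = X @ V[:, :2]                             # 2-D locality-preserving projection
```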
Laplacian Embedding (LE)
LE: find embeddings that preserve the localities of the data set by
minimizing Σ_ij W_ij ||y_i − y_j||², where y_i is the embedding of x_i and
W_ij encodes the relationship between x_i and x_j (subject to a
normalization constraint such as Y^T D Y = I).
Geometry based objective
Generalized eigenvalue decomposition
Nonlinear mapping
Local method
M. Belkin, P. Niyogi. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, June 2003; 15(6):1373-1396.
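A scipy sketch of this objective on a k-nearest-neighbor graph (placeholder data; closely related to scikit-learn's SpectralEmbedding):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

X = np.random.default_rng(0).standard_normal((100, 5))   # placeholder data

W = kneighbors_graph(X, n_neighbors=5, mode="connectivity").toarray()
W = np.maximum(W, W.T)
D = np.diag(W.sum(axis=1))
L = D - W

# Minimize sum_ij W_ij ||y_i - y_j||^2 = tr(Y^T L Y) subject to Y^T D Y = I:
# solve L y = lambda D y and drop the trivial constant eigenvector.
lam, V = eigh(L, D)
Y = V[:, 1:3]   # 2-D Laplacian embedding
```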
Locally Linear Embedding (LLE)
1. Obtain data relationships: find weights W_ij that best reconstruct each
point from its neighbors, minimizing Σ_i ||x_i − Σ_j W_ij x_j||²
2. Obtain data embeddings: find low-dimensional y_i that preserve those
weights, minimizing Σ_i ||y_i − Σ_j W_ij y_j||²
Geometry based objective
Generalized eigenvalue decomposition
Nonlinear mapping
Local method
Sam Roweis & Lawrence Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, v.290 no.5500, Dec. 22, 2000, pp. 2323-2326.
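Both steps are implemented in scikit-learn; a minimal sketch on placeholder data:

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

X = np.random.default_rng(0).standard_normal((100, 5))
Y = LocallyLinearEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
```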
Isometric Feature Mapping (ISOMAP)
1. Construct the neighborhood graph
2. Compute the shortest path length (geodesic distance) between pairwise data points
3. Recover the low-dimensional embeddings of the data by Multi-Dimensional Scaling (MDS), preserving those geodesic distances
Geometry based objective
Eigenvalue decomposition
Nonlinear mapping
Local method
J. B. Tenenbaum, V. de Silva and J. C. Langford. A Global Geometric Framework for Nonlinear Dimensionality
Reduction. Science 2000.
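The three steps above are implemented in scikit-learn's Isomap; a minimal sketch on placeholder data:

```python
import numpy as np
from sklearn.manifold import Isomap

X = np.random.default_rng(0).standard_normal((100, 5))
Y = Isomap(n_components=2, n_neighbors=10).fit_transform(X)
```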
Stochastic Neighbor Embedding (SNE)
The probability that point i picks point j as its neighbor:
p_{j|i} = exp(−||x_i − x_j||² / 2σ_i²) / Σ_{k≠i} exp(−||x_i − x_k||² / 2σ_i²)
The neighborhood distribution q_{j|i} is defined analogously in the embedded space.
Neighborhood distribution preservation: minimize Σ_i KL(P_i || Q_i)
Information theoretic objective
Gradient descent
Nonlinear mapping
Local method
Hinton, G. E. and Roweis, S. Stochastic Neighbor Embedding. NIPS 2003.
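SNE itself is not in scikit-learn, but its widely used t-SNE variant (a Student-t distribution in the embedded space) follows the same recipe; a minimal sketch on placeholder data:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).standard_normal((100, 5))
Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
```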
Summary: Unsupervised Distance Metric Learning
                        PCA   LPP   LLE   Isomap   SNE
Local                          √     √      √       √
Global                   √
Linear                   √     √
Nonlinear                            √      √       √
Separation
Geometry                 √     √     √      √
Information theoretic                                √
Extensibility            √     √
Outline
 Unsupervised Metric Learning
 Supervised Metric Learning
 Semi-supervised Metric Learning
 Challenges and Opportunities for Metric Learning
Supervised Metric Learning
 Learn a pairwise distance metric from data together with supervision, i.e.,
data labels or pairwise constraints (must-links and cannot-links)
[Diagram, reconstructed as a list:]
 Linear (with kernelized extensions)
– Linear Discriminant Analysis (global objective)
– Eric Xing's method (global objective)
– Information Theoretic Metric Learning (global objective)
– Locally Supervised Metric Learning (local objective)
Linear Discriminant Analysis (LDA)
Suppose the data are from C different classes
Separation based objective
Eigenvalue decomposition
Linear mapping
Global method
Figure from http://cmp.felk.cvut.cz/cmp/software/stprtool/examples/ldapca_example1.gif
Keinosuke Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press Professional, Inc., 1990.
Y. Jia, F. Nie, C. Zhang. Trace Ratio Problem Revisited. IEEE Trans. Neural Networks, 20:4, 2009.
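A scikit-learn sketch of LDA on the Iris data (an illustrative data set, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
# Directions that keep each class compact while separating different classes.
Z = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
```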
Distance Metric Learning with Side Information (Eric Xing’s method)
Must-link constraint set:
each element is a pair of data points that come from the same class or cluster
Cannot-link constraint set:
each element is a pair of data points that come from different classes or clusters
The method learns M by maximizing the summed distances between cannot-link
pairs while keeping the summed squared distances between must-link pairs
small, subject to M being positive semi-definite.
Separation based objective
Semi-definite programming
No direct mapping
Global method
E.P. Xing, A.Y. Ng, M.I. Jordan and S. Russell. Distance Metric Learning, with Application to Clustering with Side-Information. NIPS 2002.
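A projected-gradient sketch in the spirit of the method (not the paper's exact iterative-projection algorithm; the function names and step sizes are illustrative):

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the positive semi-definite cone."""
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0, None)) @ V.T

def learn_metric(X, must, cannot, n_iter=200, lr=0.01):
    """Shrink must-link distances, grow cannot-link distances, keep M PSD."""
    d = X.shape[1]
    M = np.eye(d)
    for _ in range(n_iter):
        grad = np.zeros((d, d))
        for i, j in must:          # pull must-link pairs together
            diff = (X[i] - X[j])[:, None]
            grad += diff @ diff.T
        for i, j in cannot:        # push cannot-link pairs apart
            diff = (X[i] - X[j])[:, None]
            grad -= diff @ diff.T
        M = project_psd(M - lr * grad)
    return M
```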
Information Theoretic Metric Learning (ITML)
Suppose we have an initial Mahalanobis distance parameterized by M₀, a set of
must-link constraints and a set of cannot-link constraints. ITML minimizes the
LogDet divergence between the learned M and M₀, subject to must-link pairs
being closer than an upper bound and cannot-link pairs being farther than a
lower bound under the learned distance.
Information theoretic objective
Bregman projection
No direct mapping
Global method
Scalable and efficient
Jason Davis, Brian Kulis, Prateek Jain, Suvrit Sra, & Inderjit Dhillon.
Information-Theoretic Metric Learning. ICML 2007
Best paper award
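The third-party metric-learn package ships an ITML implementation; a sketch assuming its ITML_Supervised API (pip install metric-learn), which samples must-links and cannot-links from class labels:

```python
from sklearn.datasets import load_iris
from metric_learn import ITML_Supervised

X, y = load_iris(return_X_y=True)
itml = ITML_Supervised(random_state=0)   # samples constraints from the labels
itml.fit(X, y)
M = itml.get_mahalanobis_matrix()        # the learned Mahalanobis matrix
```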
Summary: Supervised Metric Learning
                        LDA   Xing   ITML   LSML
Label                    √                    √
Pairwise constraints            √      √
Local                                         √
Global                   √      √      √
Linear                   √      √      √      √
Nonlinear
Separation               √      √             √
Geometry
Information theoretic                  √
Extensibility            √                    √
Outline
 Unsupervised Metric Learning
 Supervised Metric Learning
 Semi-supervised Metric Learning
 Challenges and Opportunities for Metric Learning
Semi-Supervised Metric Learning
 Learn a pairwise distance metric using data both with and without
supervision
[Diagram, reconstructed as a list:]
 Nonlinear: Semi-Supervised Dimensionality Reduction (local objective)
 Linear (with kernelized extensions)
– Constraint Margin Maximization (global objective)
– Laplacian Regularized Metric Learning (local & global objective)
Constraint Margin Maximization (CMM)
CMM finds projections that maximize the variance (scatterness) of all the
data while maximizing the projected constraint margin: cannot-link pairs are
pushed apart (scatterness) and must-link pairs are pulled together
(compactness). No label information is used, only pairwise constraints.
Geometry & Separation based objective
Eigenvalue decomposition
Linear mapping
Global method
F. Wang, S. Chen, T. Li, C. Zhang. Semi-Supervised Metric Learning by Maximizing Constraint Margin. CIKM08
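A hedged numpy sketch of this idea (a plain eigendecomposition of variance plus constraint-margin terms; the weighting alpha and the function name are illustrative, not the paper's exact formulation):

```python
import numpy as np

def cmm_projection(X, must, cannot, n_components=2, alpha=1.0):
    """Top eigenvectors of: total scatter + cannot-link scatter - must-link scatter."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / len(X)                       # maximum-variance (scatterness) term
    for i, j in cannot:                          # push cannot-link pairs apart
        d = (X[i] - X[j])[:, None]
        S += alpha / len(cannot) * (d @ d.T)
    for i, j in must:                            # pull must-link pairs together
        d = (X[i] - X[j])[:, None]
        S -= alpha / len(must) * (d @ d.T)
    w, V = np.linalg.eigh(S)
    return V[:, ::-1][:, :n_components]          # leading eigenvectors
```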
Laplacian Regularized Metric Learning (LRML)
The objective combines a smoothness term (the learned metric should vary
smoothly over the data manifold, via a Laplacian regularizer), a compactness
term (must-link pairs should be close), and a scatterness term (cannot-link
pairs should be far apart).
Geometry & Separation based objective
Semi-definite programming
No mapping
Local & Global method
S. Hoi, W. Liu, S. Chang. Semi-Supervised Distance Metric Learning for Collaborative Image Retrieval. CVPR08
Semi-Supervised Dimensionality Reduction (SSDR)
Most nonlinear dimensionality reduction algorithms can be formalized as
solving an optimization problem of the form min_Y tr(Y^T B Y), where Y
contains the low-dimensional embeddings and B is an algorithm-specific
symmetric matrix. SSDR adds supervision by constraining some of the
embedded coordinates to known values.
Separation based objective
Eigenvalue decomposition
No mapping
Local method
X. Yang , H. Fu , H. Zha , J. Barlow. Semi-supervised nonlinear dimensionality reduction. ICML 2006.
Summary: Semi-Supervised Distance Metric Learning
                        CMM   LRML   SSDR
Label
Pairwise constraints     √     √
Embedded coordinates                   √
Local                          √       √
Global                   √     √
Linear                   √     √
Nonlinear                              √
Separation               √     √       √
Locality-Preservation          √       √
Information theoretic
Extensibility            √
Outline
 Unsupervised Metric Learning
 Supervised Metric Learning
 Semi-supervised Metric Learning
 Challenges and Opportunities for Metric Learning
Challenges and Opportunities for Metric Learning
 Scalability
– Online (Sequential) Learning (Shalev-Shwartz, Singer and Ng,
ICML 2004) (Davis et al., ICML 2007)
– Distributed Learning
 Efficiency
– Labeling efficiency (Yang, Jin and Sukthankar, UAI 2007)
– Updating efficiency (Wang, Sun, Hu and Ebadollahi, SDM 2011)
 Heterogeneity
– Data heterogeneity (Wang, Sun and Ebadollahi, SDM 2011)
– Task heterogeneity (Zhang and Yeung, KDD 2010)
 Evaluation
 Shai Shalev-Shwartz, Yoram Singer and Andrew Y. Ng. Online and
Batch Learning of Pseudo-Metrics. ICML 2004
 J. Davis, B. Kulis, P. Jain, S. Sra, & I. Dhillon. Information-Theoretic
Metric Learning. ICML 2007
 Liu Yang, Rong Jin and Rahul Sukthankar. Bayesian Active Distance
Metric Learning. UAI 2007.
 Fei Wang, Jimeng Sun, Shahram Ebadollahi. Integrating Distance
Metrics Learned from Multiple Experts and its Application in
Inter-Patient Similarity Assessment. SDM 2011.
 F. Wang, J. Sun, J. Hu, S. Ebadollahi. iMet: Interactive Metric
Learning in Healthcare Applications. SDM 2011.
 Yu Zhang and Dit-Yan Yeung. Transfer Metric Learning by Learning
Task Relationships. In: Proceedings of the 16th ACM SIGKDD
Conference on Knowledge Discovery and Data Mining (KDD), pp.
1199-1208. Washington, DC, USA, 2010.