Clustering on the Simplex
Morten Mørup
DTU Informatics (Informatics and Mathematical Modelling), Intelligent Signal Processing
Technical University of Denmark
EMMDS 2009, July 3rd, 2009
Joint work with
Christian Walder and Lars Kai Hansen
DTU Informatics, Intelligent Signal Processing
Technical University of Denmark
Clustering
"Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense." (Wikipedia)
Clustering approaches
 K-means iterative refinement algorithm (Lloyd, 1982; Hartigan, 1979)
Assignment step (S): assign each data point to the cluster with the closest mean value.
Update step (C): calculate the new mean value for each cluster.
(Both steps are sketched in code below.)
Guarantee of optimality: no single change in assignment is better than the current assignment (1-spin stability).
 The problem is NP-complete (Megiddo and Supowit, 1984).
Relaxations of the hard assignment problem:
 Annealing approaches based on a temperature parameter (as T→0 the original clustering problem is recovered; see for instance Hofmann and Buhmann, 1997)
 Fuzzy clustering (Hathaway and Bezdek, 1988)
 Expectation Maximization (Mixture of Gaussians)
 Spectral clustering
Drawback: previous relaxations are either not exact or depend on some problem-specific annealing parameter in order to recover the original binary combinatorial assignments.
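To make the two steps concrete, here is a minimal sketch of Lloyd's iterative refinement in Python. It is not the talk's implementation; the data layout (one observation per column) and the convergence test are illustrative choices.

```python
import numpy as np

def lloyd_kmeans(X, K, iters=100, seed=0):
    """Minimal Lloyd's algorithm; X is M x N with one observation per column."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    M, N = X.shape
    means = X[:, rng.choice(N, size=K, replace=False)]   # initialize from data
    assign = np.full(N, -1)
    for _ in range(iters):
        # Assignment step (S): each point goes to the cluster with closest mean.
        d2 = ((X[:, None, :] - means[:, :, None]) ** 2).sum(axis=0)  # K x N
        new_assign = d2.argmin(axis=0)
        if np.array_equal(new_assign, assign):
            break                      # 1-spin stable: no single change helps
        assign = new_assign
        # Update step (C): recompute the mean of each non-empty cluster.
        for k in range(K):
            members = assign == k
            if members.any():
                means[:, k] = X[:, members].mean(axis=1)
    return assign, means
```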
From the K-means objective to Pairwise Clustering
Pairwise Clustering (Buhmann and Hofmann, 1994) operates on a similarity matrix K; choosing K = X^T X makes the pairwise clustering objective equivalent to the K-means objective (standard forms of both objectives are given below).
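The objectives themselves are not reproduced in the transcript; the following are the standard forms consistent with the slide's text (my notation: X ∈ R^{M×N}, assignment matrix S ∈ {0,1}^{K×N} with columns summing to one, s_k the k-th row of S):

$$\min_{\mu,\,S}\ \|X-\mu S\|_F^2,\qquad s_{kn}\in\{0,1\},\ \textstyle\sum_k s_{kn}=1,$$

and, eliminating the optimal means $\mu = X S^\top \operatorname{diag}(S\mathbf{1})^{-1}$, the equivalent pairwise form in the similarity matrix $K = X^\top X$:

$$\max_{S}\ \sum_{k} \frac{s_k K s_k^\top}{s_k \mathbf{1}}.$$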
Although Clustering is hard there is room to be simple(x) minded!
Binary Combinatorial (BC) vs. Simplicial Relaxation (SR) (the constraint sets are sketched below).
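The slide's own formulas are not reproduced in the transcript; in the notation above, the relaxation keeps the normalization but drops the binary constraint, replacing it with its convex hull, the probability simplex (column-wise):

$$\text{BC:}\ \ s_{kn}\in\{0,1\},\ \textstyle\sum_k s_{kn}=1 \qquad\longrightarrow\qquad \text{SR:}\ \ s_{kn}\ge 0,\ \textstyle\sum_k s_{kn}=1.$$

As the conclusion's Theorems 1 and 2 state, hard assignments are nevertheless recovered at stationarity.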
The simplicial relaxation (SR) admits standard continuous optimization for solving the pairwise clustering problems, for instance by normalization-invariant projected gradient ascent:
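The update equations on this slide are not reproduced in the transcript. Below is a minimal sketch of plain projected gradient ascent on the relaxed pairwise objective from the previous slide; the fixed step size and the Euclidean simplex projection are simplifying assumptions, and the talk's normalization-invariant scheme may differ.

```python
import numpy as np

def project_columns_to_simplex(S):
    """Euclidean projection of each column of S onto the probability simplex
    (sort-based algorithm, cf. Duchi et al., 2008)."""
    K, N = S.shape
    U = -np.sort(-S, axis=0)               # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0       # cumulative sums minus the target sum 1
    idx = np.arange(1, K + 1)[:, None]
    rho = (U - css / idx > 0).sum(axis=0)  # number of active entries per column
    theta = css[rho - 1, np.arange(N)] / rho
    return np.maximum(S - theta, 0.0)

def sr_clustering(K_sim, num_clusters, step=0.1, iters=500, seed=0):
    """Projected gradient ascent on sum_k (s_k K s_k^T)/(s_k 1) over the simplex."""
    rng = np.random.default_rng(seed)
    N = K_sim.shape[0]
    S = project_columns_to_simplex(rng.random((num_clusters, N)))
    for _ in range(iters):
        SK = S @ K_sim                          # rows are s_k K
        num = np.einsum('kn,kn->k', S, SK)      # s_k K s_k^T per cluster
        den = S.sum(axis=1) + 1e-12             # s_k 1, guarded against empty rows
        grad = 2.0 * SK / den[:, None] - (num / den**2)[:, None]
        S = project_columns_to_simplex(S + step * grad)
    return S
```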
Synthetic data example
[Figure: K-means vs. SR-clustering on synthetic 2-D data]
The brown and grey clusters each contain 1000 data points in R^2, whereas the remaining clusters each have 250 data points.
The SR-clustering algorithm is driven by high-density regions.
Thus, the solutions are in general substantially better than those of Lloyd's algorithm, at the same computational complexity.
[Figure: Lloyd's K-means vs. SR-clustering with init=1 and init=0.01]
[Figure: K-means vs. SR-clustering (init=1 and init=0.01) for 10, 50, and 100 components]
SR-clustering for kernel-based semi-supervised learning
Kernel-based semi-supervised learning based on pairwise clustering (Basu et al., 2004; Kulis et al., 2005; Kulis et al., 2009).
The simplicial relaxation admits solving the problem as a (non-convex) continuous optimization problem.
Class labels can be handled explicitly by fixing the corresponding assignments in S.
Must-links and cannot-links can be absorbed into the kernel (a sketch of this is given below).
Hence the problem reduces more or less to a standard SR-clustering problem for the estimation of S.
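The slide does not spell out how the links enter the kernel. Below is a minimal sketch of one common construction, in the spirit of Kulis et al. (2005): must-links add a reward and cannot-links a penalty to the corresponding kernel entries. The weight w and the symmetric update are my assumptions, not necessarily the talk's exact scheme.

```python
import numpy as np

def absorb_links(K, must_links, cannot_links, w=1.0):
    """Fold pairwise supervision into a kernel matrix: reward must-linked
    pairs and penalize cannot-linked pairs (sketch, not the talk's scheme)."""
    K_mod = np.array(K, dtype=float, copy=True)
    for i, j in must_links:
        K_mod[i, j] += w
        K_mod[j, i] += w
    for i, j in cannot_links:
        K_mod[i, j] -= w
        K_mod[j, i] -= w
    return K_mod
```

The modified kernel is then handed to the SR-clustering routine unchanged.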
At stationarity, the gradients of the elements in each column of S that are 1 are larger than those of the elements that are 0. Thus, the impact of the supervision can be evaluated by estimating the minimal Lagrange multipliers that guarantee stationarity of the solution obtained by the SR-clustering algorithm; this is a convex optimization problem.
Thus, the Lagrange multipliers give a measure of conflict between the data and the supervision.
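As a sketch of the stationarity condition behind this claim (my notation; maximizing an objective f over the column-wise simplex constraints), the KKT conditions read

$$\frac{\partial f}{\partial s_{kn}} = \lambda_n - \mu_{kn},\qquad \mu_{kn}\ge 0,\qquad \mu_{kn}\, s_{kn} = 0,$$

so entries with $s_{kn} > 0$ attain the per-column maximum gradient $\lambda_n$, while the multiplier $\mu_{kn}$ of an entry held at 0 by the supervision measures how strongly the data gradient pushes against that constraint.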
Digit classification with one mislabeled data observation from each class.
Community Detection in Complex Networks
Communities/modules: natural divisions of network nodes into densely connected subgroups (Newman & Girvan, 2003).
[Figure: a graph G(V,E), its adjacency matrix A, and the permuted adjacency matrix PAP^T, where the permutation P is obtained from the clustering assignment S of a community detection algorithm]
Common community detection objectives
Generic problems of the form [objective not reproduced in transcript; see the standard forms below]:
 Hamiltonian (Fu & Anderson, 1986; Reichardt & Bornholdt, 2004)
 Modularity (Newman & Girvan, 2004)
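The slide's formulas are not reproduced in the transcript; the standard textbook forms of the two objectives are (with m edges, node degrees k_i, community labels c_i, and, for the Hamiltonian, a resolution parameter γ and null model p_ij):

$$Q = \frac{1}{2m}\sum_{ij}\Bigl(A_{ij}-\frac{k_i k_j}{2m}\Bigr)\,\delta(c_i,c_j),
\qquad
\mathcal{H} = -\sum_{i<j}\bigl(A_{ij}-\gamma\,p_{ij}\bigr)\,\delta(c_i,c_j).$$

Both are quadratic functions of the hard assignments, which is exactly the structure the simplicial relaxation targets.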
Again we can make an exact relaxation to the simplex!
SR-clustering of complex networks
The quality of the solutions is comparable to results obtained by extensive Gibbs sampling.
So far we have demonstrated how binary combinatorial constraints are recovered at stationarity when relaxing the problems to the simplex.
However, simplex constraints also hold promising data mining properties of their own!
The Convex Hull
Def: The convex hull (convex envelope) of X ∈ R^{M×N} is the minimal convex set containing X. (Informally it can be described as a rubber band wrapped around the data points.)
Finding the convex hull is solvable in linear time, O(N) (McCallum and Avis, 1979). However, the size of the convex set grows exponentially with the dimensionality of the data, O(log^{M−1} N) (Dwyer, 1988).
The Principal Convex Hull (PCH)
Def: The best convex set of size K according to some measure of distortion D(·|·) (Mørup et al., 2009). (Informally it can be described as a less flexible rubber band that wraps most of the data points.)
The mathematical formulation of the Principal Convex Hull (PCH) is given by two simplex constraints; "principal" is in terms of the Frobenius norm (the objective is sketched below).
C: gives the fractions in which the observations in X are used to form each feature (the distinct aspects, or "freaks"). In general C will be very sparse!
S: gives the fraction by which each observation resembles each of the distinct aspects XC.
(Note that when K is large enough, the PCH recovers the convex hull.)
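The formulation itself is not reproduced in the transcript; consistent with the description above (Frobenius norm, two simplex constraints, X ∈ R^{M×N}, C ∈ R^{N×K}, S ∈ R^{K×N}), it can be sketched as:

$$\min_{C,\,S}\ \|X - XCS\|_F^2
\quad\text{s.t.}\quad c_{nk}\ge 0,\ \textstyle\sum_n c_{nk}=1,\qquad s_{kn}\ge 0,\ \textstyle\sum_k s_{kn}=1.$$

Each column of C mixes a few observations into an aspect of XC, and each column of S expresses an observation as a convex combination of the aspects.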
Relation between the PCH model, low-rank decomposition and clustering approaches
PCH naturally bridges clustering and low-rank approximations! (One way to lay out the relation is sketched below.)
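The slide's comparison is not reproduced in the transcript; the standard model forms it presumably contrasts are (my layout):

$$\begin{aligned}
\text{SVD/PCA:}\quad & X \approx WH && W,\,H\ \text{unconstrained (low rank)}\\
\text{NMF:}\quad & X \approx WH && W \ge 0,\ H \ge 0\\
\text{K-means:}\quad & X \approx \mu S && s_{kn}\in\{0,1\},\ \textstyle\sum_k s_{kn}=1\\
\text{PCH:}\quad & X \approx XCS && C,\,S\ \text{column-wise on the simplex}
\end{aligned}$$

Relaxing the K-means assignments to the simplex and forcing the "means" to be convex combinations of data points lands exactly on the PCH form.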
Two important properties of the PCH model
 The PCH model is invariant to affine transformation and scaling.
 The PCH model is unique up to permutation of the components.
The features have more contrast than those obtained by clustering approaches. As such, PCH aims for distinct aspects/regions in the data.
The PCH model strives to attain Platonic "Ideal Forms".
The data contain 3 components:
 High-binding regions
 Low-binding regions
 Non-binding regions
Each voxel is given as a concentration fraction of these regions.
[Figure: the aspects XC and the fractions S]
NMF spectroscopy of samples of mixtures of propanol, butanol and pentanol.
Medium-size and large-size MovieLens data (www.grouplens.org)
Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users.
Large size: 10,000,054 ratings of 10,677 movies given by 71,567 users.
Conclusion
 The simplex offers unique data mining properties.
 Simplicial relaxations (SR) form exact relaxations of common hard-assignment clustering problems, i.e. K-means, pairwise clustering and community detection in graphs.
 SR enables solving binary combinatorial problems using standard solvers from continuous optimization.
 The proposed SR-clustering algorithm outperforms traditional iterative refinement algorithms.
 No need for an annealing parameter; hard assignments are guaranteed at stationarity (Theorems 1 and 2).
 Semi-supervised learning can be posed as a continuous optimization problem, with the associated Lagrange multipliers giving an evaluation measure of each supervised constraint.
Conclusion cont.
 The Principal Convex Hull (PCH) is formed by two types of simplex constraints.
 It extracts distinct aspects of the data.
 It is relevant for data mining in general, wherever low-rank approximation and clustering approaches have been invoked.
A reformulation of "Lex Parsimoniae"
"The simplest explanation is usually the best." - William of Ockham
"The simplex explanation is usually the best."
"Simplicity is the ultimate sophistication." - Leonardo da Vinci
"Simplexity is the ultimate sophistication."
The presented work is described in:
M. Mørup and L. K. Hansen, "An Exact Relaxation of Clustering", submitted to JMLR, 2009.
M. Mørup, C. Walder and L. K. Hansen, "Simplicial Semi-supervised Learning", submitted.
M. Mørup and L. K. Hansen, "Platonic Forms Revisited", submitted.