Clustering
Readings:
* Chapter 8.4 from Principles of Data Mining by Hand, Mannila, Smyth.
* Chapter 9 from Principles of Data Mining by Hand, Mannila, Smyth.
-----------------------------------------------------------------------------------------------------------------
1. Clustering versus Classification
* classification: assign a pre-determined label to a sample
* clustering: derive the relevant labels themselves from the structure of a given dataset
* clustering aims at maximal intra-cluster similarity and maximal inter-cluster dissimilarity
* Objectives:
  1. segmentation of the space
  2. finding natural subclasses
[Figures: "Analysis of Actual Image" - cumulative intensity difference with template (%) versus distance to print border (mm), blockfit = 95%; L1/L2-plane with six clusters, cluster centres C1...C6 shown as blue dots, the observed point as a green blob, the active cluster (here C1, L-region #2) in red; pre-clustering probability; "Types of True State Evolutions" - true state value versus relative time t/N for types I-IV, <d> = 0.278, noise = 0.300.]
Fig. 5.15. a (left): clustering of 10,000 points in the (x,y)-plane, disks indicate prototypes. b (right): four prototype evolution types for the true state function ul(t).
Cluster Analysis [book section 9.3]
Segmentation and Partitioning
Extra: Voronoi diagrams, see for example:
http://www.ics.uci.edu/~eppstein/gina/voronoi.html
VORONOI partitioning:
* Given a set of points x[k]
* The Voronoi cell V[k] is the part of the space whose points lie closer to x[k] than to any other point x[l]
Convex hull
Delaunay triangulation
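As an illustration (not from the book or the slides), a minimal Python sketch of Voronoi partitioning straight from the definition: each query point is assigned the index k of its nearest seed x[k]. The seed coordinates and the query grid are made-up values.

import numpy as np

# Illustrative seed points x[k] (assumed values, not from the slides)
seeds = np.array([[0.0, 0.0], [2.0, 1.0], [-1.0, 2.0]])

def voronoi_cell_index(point, seeds):
    """Return k such that `point` lies in the Voronoi cell V[k],
    i.e. the index of the closest seed x[k]."""
    distances = np.linalg.norm(seeds - point, axis=1)
    return int(np.argmin(distances))

# Partition a small grid of query points into Voronoi cells
xs, ys = np.meshgrid(np.linspace(-2, 3, 6), np.linspace(-1, 3, 5))
grid = np.column_stack([xs.ravel(), ys.ravel()])
labels = np.array([voronoi_cell_index(p, seeds) for p in grid])
print(labels.reshape(xs.shape))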
Partitional Clustering [book section 9.4]
score-functions
centroid
intra-cluster distance
inter-cluster distance
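One common way to make these score functions concrete (a standard formulation given here for illustration; the book's exact notation may differ): with centroids rk,
wc(C) = Σk Σ_{x ∈ Ck} ||x − rk||²   (within-cluster / intra-cluster distance: small for compact clusters)
bc(C) = Σ_{j<k} ||rj − rk||²        (between-cluster / inter-cluster distance: large for well-separated clusters)
Partitional clustering seeks a partition that makes wc small and/or bc large.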
C-means [book page 303]
while changes in clusters Ck
  % form clusters: assign each x to its nearest centroid
  for k = 1,…,K do
    Ck = {x | ||x − rk|| ≤ ||x − rl|| for all l ≠ k}
  end
  % compute new cluster centroids
  for k = 1,…,K do
    rk = mean({x | x ∈ Ck})
  end
end
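For reference, a runnable Python sketch of the same C-means loop (my own illustration; the toy data, K = 3, and the initialisation from random samples are assumptions, not part of the book's algorithm statement):

import numpy as np

def c_means(X, K, n_iter=100, seed=0):
    """Plain C-means / k-means: alternate cluster assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # initialise centroids r_k with K distinct random samples (assumed choice)
    r = X[rng.choice(len(X), size=K, replace=False)]
    labels = np.full(len(X), -1, dtype=int)
    for _ in range(n_iter):
        # form clusters: each x goes to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - r[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # no changes in clusters Ck
            break
        labels = new_labels
        # compute new cluster centroids as the mean of the assigned points
        for k in range(K):
            if np.any(labels == k):
                r[k] = X[labels == k].mean(axis=0)
    return r, labels

# toy data (assumed): three blobs in the plane
X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 0], [0, 5])])
centroids, labels = c_means(X, K=3)
print(centroids)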
Extra: Fuzzy C-means
μik is the membership of sample i to cluster k
ck is the centroid of cluster k
while changes in the memberships μik
  % compute new memberships
  for k = 1,…,K do
    for i = 1,…,N do
      μik = f(xi − ck)
    end
  end
  % compute new cluster centroids
  for k = 1,…,K do
    % weighted mean
    ck = Σi μik xi / Σi μik
  end
end
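A Python sketch of this loop. The slides only say μik = f(xi − ck); here the common fuzzy c-means choice μik ∝ ||xi − ck||^(−2/(m−1)) with fuzzifier m = 2 is assumed, so this is one possible instantiation of f rather than the course's exact choice.

import numpy as np

def fuzzy_c_means(X, K, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-means: soft memberships mu[i, k] instead of hard assignments."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), size=K, replace=False)]     # initial centroids c_k
    for _ in range(n_iter):
        # compute new memberships mu[i, k] from the distances to the centroids
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        mu = inv / inv.sum(axis=1, keepdims=True)         # each row sums to 1
        # compute new cluster centroids as membership-weighted means
        w = mu ** m
        c = (w.T @ X) / w.sum(axis=0)[:, None]
    return c, mu

# toy data (assumed): two blobs in the plane
X = np.vstack([np.random.randn(40, 2) + off for off in ([0, 0], [4, 4])])
centroids, memberships = fuzzy_c_means(X, K=2)
print(centroids)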
[Figure: Trajectory of Fuzzy MultiVariate Centroids (centroids labelled 1-5) in the plane during the iterations.]
Hierarchical Clustering [book section 9.5]
One major problem with partitional clustering is that the number of clusters (= number of classes) must be pre-specified!
This poses the question: what IS the real number of clusters in a given set of data?
Answer: it depends!
Agglomerative methods: bottom-up
Divisive methods: top-down
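A brief sketch of the agglomerative (bottom-up) route using SciPy's hierarchical clustering; the single-linkage criterion and the cut into 3 clusters are illustrative choices, not prescribed by the slides.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# toy data (assumed): three well-separated groups in the plane
X = np.vstack([np.random.randn(30, 2) + off for off in ([0, 0], [6, 0], [0, 6])])

# agglomerative (bottom-up): start from singletons and repeatedly merge the
# closest pair of clusters; Z encodes the whole merge tree (dendrogram)
Z = linkage(X, method="single")

# cutting the tree at a chosen number of clusters answers "how many clusters?"
# only relative to that cut -- the tree itself does not fix the number
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)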
Probabilistic Model-Based Clustering
using Mixture Models
The EM-algorithm [book section 8.4]
* Suppose we have data D, a model with parameters θ, and hidden variables H (e.g. the class label!).
Suppose:
1. p(D,H|θ) is the probability distribution of data set + hidden variables for given parameters θ
2. Q(H) is a probability distribution on the hidden variables H.
Then we can write the log-likelihood as [book pg 261]:
l(θ) = log ΣH p(D,H|θ)
     = log ΣH Q(H) p(D,H|θ)/Q(H)
     ≥ ΣH Q(H) log(p(D,H|θ)/Q(H))        (by Jensen's inequality, since log is concave)
     = ΣH Q(H) log p(D,H|θ) + ΣH Q(H) log(1/Q(H))
     ≡ F(Q,θ)
EM-algorithm:
E-step: maximize F with respect to Q with θ fixed
M-step: maximize F with respect to θ with Q(H) fixed
so:
E-step: Q^(k+1) = arg maxQ F(Q, θ^(k))
M-step: θ^(k+1) = arg maxθ F(Q^(k+1), θ)
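A compact Python sketch of these two steps for a one-dimensional, two-component Gaussian mixture (an illustrative instance of EM, not the book's code; the data, K = 2, and the initialisation are assumptions). The E-step computes Q(H) as the posterior responsibilities; the M-step re-estimates θ = (mixing weights, means, variances).

import numpy as np

def em_gaussian_mixture(x, K=2, n_iter=50, seed=0):
    """EM for a 1-D Gaussian mixture: alternate the E-step and the M-step."""
    rng = np.random.default_rng(seed)
    w = np.full(K, 1.0 / K)                      # mixing weights
    mu = rng.choice(x, size=K, replace=False)    # component means
    var = np.full(K, x.var())                    # component variances
    for _ in range(n_iter):
        # E-step: Q(H) = posterior probability that each point came from component k
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: maximize F over theta with Q fixed (responsibility-weighted estimates)
        Nk = resp.sum(axis=0)
        w = Nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return w, mu, var

# toy data (assumed): two overlapping 1-D Gaussians
x = np.concatenate([np.random.randn(200) - 2, np.random.randn(200) + 3])
print(em_gaussian_mixture(x))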
Adaptations to the EM-algorithm [book sections 9.2.4, 9.2.5, 9.6]
Probability Mixtures [idem]
Mixture Decomposition, e.g. Gaussian Mixture Decomposition [idem]