DenStream
Paul Voigtlaender
Supervision: Prof. Dr. T. Seidl
Dipl. Ing. Marwan Hassani
Proseminar Elementary Data Mining Techniques 29.11.2012
Paul Voigtlaender ()
DenStream
Proseminar
1 / 21
Outline
1. Motivation
2. References
3. DenStream
    Basic Concepts
    Algorithm Overview
    Merging Technique
    Pruning Strategy
    DenStream: The Algorithm
4. An Illustrative Example
5. Conclusion
Motivation
Why another stream clustering algorithm?
Consider this distribution:
Clusters are not spherical
CluStream can only find spherical clusters ⇒ it will fail on this distribution
DenStream can find clusters of arbitrary shape
DenStream can handle noise
References
1. Cao et al.: Density-Based Clustering over an Evolving Data Stream with Noise. SDM 2006 (DenStream, main source)
2. Ester et al.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996 (DBSCAN, used in DenStream)
3. Aggarwal et al.: A Framework for Clustering Evolving Data Streams. VLDB 2003 (CluStream, uses concepts similar to DenStream)
4. Data Mining Algorithm lecture given by i9 in WS 11/12
5. Temporal and Graph Data lecture given by i9 in WS 11/12
DenStream
Basic Concepts
Damped window: the weight of a data object decreases exponentially over time: $f(t) = 2^{-\alpha t}$, $\alpha > 0$
Micro-clusters: $MC = (WLS, WSS, w, t_c)$, where
$WLS = \sum_{i=1}^{n} f(t - T_i) \cdot p_i$ (weighted linear sum)
$WSS = \sum_{i=1}^{n} f(t - T_i) \cdot p_i^2$ (weighted squared sum)
$w$ (weight of MC)
$t_c$ (creation time of MC)
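The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function names are hypothetical, the squared sum is taken per dimension, and the weight is assumed to be the sum of the fading factors, $w = \sum_i f(t - T_i)$, which matches the update rule in which a newly merged point adds 1 to $w$.

```python
import numpy as np

def fading(dt, alpha):
    """Damped-window weight f(dt) = 2^(-alpha * dt) for an object of age dt."""
    return 2.0 ** (-alpha * dt)

def micro_cluster_stats(points, timestamps, t, alpha):
    """Return (WLS, WSS, w) at time t for points p_i that arrived at times T_i."""
    points = np.asarray(points, dtype=float)        # shape (n, d)
    ages = t - np.asarray(timestamps, dtype=float)  # t - T_i
    f = fading(ages, alpha)                         # one fading factor per point
    wls = (f[:, None] * points).sum(axis=0)         # weighted linear sum
    wss = (f[:, None] * points ** 2).sum(axis=0)    # weighted squared sum, per dimension
    w = f.sum()                                     # assumption: weight = sum of fading factors
    return wls, wss, w

# Two 2-d points that arrived at times 0 and 1, evaluated at t = 1 with alpha = 1.
wls, wss, w = micro_cluster_stats([[1.0, 2.0], [3.0, 4.0]], timestamps=[0, 1], t=1, alpha=1.0)
print(wls, wss, w)
```

The older point enters with factor 0.5 and the newer one with factor 1, so recent data dominates the summary.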
Micro-Clusters (1/2)
$c = \frac{WLS}{w}$ (center of MC)
$r = \sqrt{\frac{\|WSS\|_2}{w} - \left(\frac{\|WLS\|_2}{w}\right)^2} \le \varepsilon$ (radius of MC; $\varepsilon$ is the maximum radius)
The radius formula can produce negative arguments to the square root
Used for implementation (cf. MOA implementation):
$r = 1.8 \cdot \max_{1 \le i \le n} \sqrt{\frac{WSS_i}{w} - \left(\frac{WLS_i}{w}\right)^2}$
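Both radius formulas can be compared directly. The sketch below assumes that the index $i$ in the implementation formula runs over the dimensions of $WLS$ and $WSS$, and it clamps negative square-root arguments to zero. The clamping and the function names are illustrative choices, not taken from MOA.

```python
import numpy as np

def center(wls, w):
    """Center c = WLS / w."""
    return np.asarray(wls, dtype=float) / w

def radius_argument(wls, wss, w):
    """Argument of the square root in the norm-based radius formula."""
    return np.linalg.norm(wss) / w - (np.linalg.norm(wls) / w) ** 2

def radius(wls, wss, w):
    """Norm-based radius, with negative arguments clamped to zero (an assumption)."""
    return float(np.sqrt(max(radius_argument(wls, wss, w), 0.0)))

def radius_per_dimension(wls, wss, w, factor=1.8):
    """Implementation variant: factor times the largest per-dimension deviation."""
    wls = np.asarray(wls, dtype=float)
    wss = np.asarray(wss, dtype=float)
    var = np.maximum(wss / w - (wls / w) ** 2, 0.0)  # guards against rounding below zero
    return float(factor * np.sqrt(var).max())

# A micro-cluster holding the single point (3, 4) with weight 1.
single_wls, single_wss = np.array([3.0, 4.0]), np.array([9.0, 16.0])
print(radius_argument(single_wls, single_wss, 1.0))  # negative
print(radius_per_dimension(single_wls, single_wss, 1.0))

# Two equally weighted points (0, 0) and (2, 0).
pair_wls, pair_wss = np.array([2.0, 0.0]), np.array([4.0, 0.0])
print(center(pair_wls, 2.0), radius(pair_wls, pair_wss, 2.0))
```

The single-point case shows why the norm-based form needs care: its square-root argument is already negative for one point away from the origin, while the per-dimension form is a weighted variance and only goes negative through rounding.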
Micro-Clusters (2/2)
Micro-clusters are classified based on their weight $w$:
If $w \ge \mu$, MC is a core-micro-cluster (similar to core points of DBSCAN)
During the online part we distinguish between
Potential core-micro-clusters (p-micro-clusters), with $w \ge \beta \cdot \mu$
Outlier micro-clusters (o-micro-clusters), with $w < \beta \cdot \mu$
Micro-clusters can be maintained incrementally
If no points are merged into MC during a time interval $\delta t$, its weight decreases: $MC = (2^{-\alpha \delta t} \cdot WLS, 2^{-\alpha \delta t} \cdot WSS, 2^{-\alpha \delta t} \cdot w, t_c)$
If a point $p$ is merged, $MC = (WLS + p, WSS + p^2, w + 1, t_c)$
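The two update rules and the weight-based classification translate almost literally into code. The following is a minimal sketch; the class and function names are hypothetical, and $p^2$ is interpreted as the element-wise square.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MicroCluster:
    wls: np.ndarray  # weighted linear sum
    wss: np.ndarray  # weighted squared sum (per dimension)
    w: float         # weight
    tc: float        # creation time, never changed by updates

    def decay(self, dt, alpha):
        """No point arrived for dt time units: scale WLS, WSS and w by 2^(-alpha * dt)."""
        factor = 2.0 ** (-alpha * dt)
        self.wls = self.wls * factor
        self.wss = self.wss * factor
        self.w *= factor

    def merge(self, p):
        """Absorb a new point p, which enters with fading factor f(0) = 1."""
        p = np.asarray(p, dtype=float)
        self.wls = self.wls + p
        self.wss = self.wss + p ** 2
        self.w += 1.0

def classify(mc, beta, mu):
    """Online-phase classification by weight."""
    return "p-micro-cluster" if mc.w >= beta * mu else "o-micro-cluster"

mc = MicroCluster(wls=np.array([1.0, 2.0]), wss=np.array([1.0, 4.0]), w=1.0, tc=0.0)
mc.decay(dt=1, alpha=1.0)  # weight drops from 1 to 0.5
mc.merge([3.0, 4.0])       # weight rises to 1.5
print(mc, classify(mc, beta=0.5, mu=2.0))
```

Because both updates touch only the four stored values, no individual points have to be kept in memory.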
Algorithm Overview
Online and offline part
Initialization with DBSCAN
The online component maintains p-micro-clusters and o-micro-clusters
New points are merged using a merging algorithm
The pruning strategy is performed periodically
The DBSCAN-based offline component generates the final clusters on demand, using p-micro-clusters as virtual points
Figure: Online and offline phase
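The offline step can be illustrated with a small DBSCAN-style expansion over micro-cluster centers. This is a simplified sketch under stated assumptions: a p-micro-cluster counts as core if its weight is at least $\mu$, two micro-clusters are treated as neighbors if their centers are at most $2\varepsilon$ apart (the reachability condition as recalled from Cao et al.), and the function name is hypothetical.

```python
import numpy as np

def offline_clusters(centers, weights, eps, mu):
    """Label p-micro-clusters with a DBSCAN-style expansion; -1 marks unassigned ones."""
    centers = np.asarray(centers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = len(centers)
    # Pairwise Euclidean distances between micro-cluster centers.
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= 2 * eps) for i in range(n)]
    is_core = weights >= mu
    labels = np.full(n, -1)
    next_label = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            j = stack.pop()
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = next_label
                    if is_core[k]:  # only core micro-clusters keep expanding
                        stack.append(k)
        next_label += 1
    return labels

labels = offline_clusters(
    centers=[[0, 0], [1, 0], [2, 0], [10, 0]],
    weights=[5, 5, 1, 5],
    eps=0.6,
    mu=3,
)
print(labels.tolist())
```

In this example the first three micro-clusters form a chain and end up in one cluster, with the light third one attached as a border element, while the distant fourth one forms its own cluster.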
Merging Technique
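A minimal sketch of the merging step as described by Cao et al. (reference 1), under simplifying assumptions: a new point is first tried against the nearest p-micro-cluster, then against the nearest o-micro-cluster, and otherwise starts a new o-micro-cluster. An o-micro-cluster whose weight exceeds $\beta \cdot \mu$ after the merge is promoted. The class `MC`, the helper names, and the per-dimension radius without the factor 1.8 are illustrative choices, and decay between arrivals is left out for brevity.

```python
import numpy as np

class MC:
    """Hypothetical minimal micro-cluster holding (WLS, WSS, w, tc)."""
    def __init__(self, p, tc):
        p = np.asarray(p, dtype=float)
        self.wls, self.wss, self.w, self.tc = p.copy(), p ** 2, 1.0, tc

    def center(self):
        return self.wls / self.w

    def radius_if_merged(self, p):
        # Radius the micro-cluster would have after absorbing p (tentative merge).
        wls, wss, w = self.wls + p, self.wss + p ** 2, self.w + 1.0
        var = np.maximum(wss / w - (wls / w) ** 2, 0.0)
        return float(np.sqrt(var.max()))

    def merge(self, p):
        self.wls, self.wss, self.w = self.wls + p, self.wss + p ** 2, self.w + 1.0

def nearest(clusters, p):
    """Micro-cluster whose center is closest to p, or None if the list is empty."""
    if not clusters:
        return None
    return min(clusters, key=lambda c: np.linalg.norm(c.center() - p))

def merge_point(p, t, p_mcs, o_mcs, eps, beta, mu):
    p = np.asarray(p, dtype=float)
    c = nearest(p_mcs, p)
    if c is not None and c.radius_if_merged(p) <= eps:
        c.merge(p)
        return "merged into p-micro-cluster"
    c = nearest(o_mcs, p)
    if c is not None and c.radius_if_merged(p) <= eps:
        c.merge(p)
        if c.w > beta * mu:  # grown heavy enough: promote to p-micro-cluster
            o_mcs.remove(c)
            p_mcs.append(c)
            return "promoted to p-micro-cluster"
        return "merged into o-micro-cluster"
    o_mcs.append(MC(p, tc=t))  # nothing fits: start a new o-micro-cluster
    return "new o-micro-cluster"

p_mcs, o_mcs = [], []
results = [
    merge_point(point, t, p_mcs, o_mcs, eps=1.0, beta=0.5, mu=3.0)
    for t, point in enumerate([[0.0, 0.0], [0.2, 0.0], [10.0, 10.0]])
]
print(results)
```

The tentative radius check keeps every micro-cluster within the maximum radius $\varepsilon$, so a far-away point never inflates an existing micro-cluster.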
Pruning Strategy (1/2)
The pruning strategy is performed every $T_p$ time steps
$T_p = \left\lceil \frac{1}{\alpha} \cdot \log_2\left(\frac{\beta \mu}{\beta \mu - 1}\right) \right\rceil$
All p-micro-clusters with weight $w < \beta \cdot \mu$ are pruned
O-micro-clusters must be pruned too, to release memory space
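The formula for $T_p$ is consistent with solving $2^{-\alpha T_p} \cdot \beta\mu + 1 = \beta\mu$ for $T_p$, which appears to be the condition used by Cao et al. for a p-micro-cluster fading into an outlier. A small sketch of the computation follows; the function name is hypothetical, and the guard for $\beta\mu \le 1$ is an added assumption, since the logarithm is undefined there.

```python
import math

def pruning_period(alpha, beta, mu):
    """T_p = ceil((1 / alpha) * log2(beta*mu / (beta*mu - 1)))."""
    bm = beta * mu
    if bm <= 1.0:
        raise ValueError("beta * mu must be greater than 1 for T_p to be defined")
    return math.ceil(math.log2(bm / (bm - 1.0)) / alpha)

# beta * mu = 2, so log2(2 / 1) = 1 and T_p = ceil(1 / 0.25) = 4.
tp = pruning_period(alpha=0.25, beta=0.5, mu=4.0)
print(tp)
```

A smaller decay rate $\alpha$ gives a longer period between pruning runs, because weights fade more slowly.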
Pruning Strategy (2/2)
If o-micro-clusters are pruned too early, they cannot become p-micro-clusters
The weight of an o-micro-cluster is compared against
$\xi(t) = \frac{2^{-\alpha (t - t_c + T_p)} - 1}{2^{-\alpha T_p} - 1}$ ($t_c$: creation time of the o-micro-cluster)
The longer an o-micro-cluster exists, the higher its expected weight
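The behavior of $\xi$ is easy to check numerically. The sketch below evaluates the formula and applies the comparison, assuming (following Cao et al.) that an o-micro-cluster is dropped when its weight is below $\xi$. The function names are hypothetical.

```python
def xi(t, tc, alpha, tp):
    """Lower weight limit at time t for an o-micro-cluster created at time tc."""
    return (2.0 ** (-alpha * (t - tc + tp)) - 1.0) / (2.0 ** (-alpha * tp) - 1.0)

def should_prune_outlier(w, t, tc, alpha, tp):
    # Assumed rule: prune the o-micro-cluster if its weight is below the limit.
    return w < xi(t, tc, alpha, tp)

# With alpha = 1 and T_p = 1 the limit starts at 1 and grows towards 2.
values = [xi(t, tc=0, alpha=1.0, tp=1) for t in (0, 1, 2)]
print(values)
```

At creation time the limit equals 1, the weight of a single fresh point, and it approaches $1 / (1 - 2^{-\alpha T_p})$ as the o-micro-cluster ages, so older o-micro-clusters need more weight to survive.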
DenStream: The Algorithm
An Illustrative Example: Dataset
Figure: Initial distribution
Figure: Final distribution
Fading distribution
+15% Noise
Live Demo
Recognized Clusters
Conclusion
DenStream can find clusters of arbitrary shape and can handle noise
The online component maintains a set of p- and o-micro-clusters using a merging and a pruning algorithm
The offline component generates the final clusters on demand using a variant of DBSCAN