DenStream
Paul Voigtlaender
Supervision: Prof. Dr. T. Seidl
Dipl. Ing. Marwan Hassani
Proseminar Elementary Data Mining Techniques 29.11.2012
Paul Voigtlaender ()
DenStream
Proseminar
1 / 21
Outline
1 Motivation
2 References
3 DenStream
   Basic Concepts
   Algorithm Overview
   Merging Technique
   Pruning Strategy
   DenStream: The Algorithm
4 An Illustrative Example
5 Conclusion
Motivation
Why another stream clustering algorithm?
Consider this distribution:
Clusters are not spherical
CluStream can only find spherical clusters ⇒ it will fail on this distribution
DenStream can find clusters of arbitrary shape
DenStream can handle noise
References
1 Cao et al.: Density-Based Clustering over an Evolving Data Stream with Noise. SDM 2006 (DenStream, main source)
2 Ester et al.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996 (DBSCAN, used in DenStream)
3 Aggarwal et al.: A Framework for Clustering Evolving Data Streams. VLDB 2003 (CluStream, uses concepts similar to DenStream)
4 Data Mining Algorithm lecture given by i9 in WS 11/12
5 Temporal and Graph Data lecture given by i9 in WS 11/12
DenStream
Basic Concepts
Damped window: the weight of data objects decreases exponentially over time: $f(t) = 2^{-\alpha \cdot t}$, $\alpha > 0$
Micro-clusters: $MC = (WLS, WSS, w, t_c)$, where, for the points $p_1, \dots, p_n$ with arrival times $T_1, \dots, T_n$:
$WLS = \sum_{i=1}^{n} f(t - T_i) \cdot p_i$ (weighted linear sum)
$WSS = \sum_{i=1}^{n} f(t - T_i) \cdot p_i^2$ (weighted squared sum, $p_i^2$ taken element-wise)
$w = \sum_{i=1}^{n} f(t - T_i)$ (weight of MC)
$t_c$ (creation time of MC)
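As a concrete illustration of these definitions, the following Python sketch computes the micro-cluster features of a small batch of time-stamped points from scratch. It assumes points are lists of floats, that $p_i^2$ is taken element-wise, and that the weight is $w = \sum_i f(t - T_i)$; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fading(dt: float, alpha: float) -> float:
    """Damped-window weight f(dt) = 2^(-alpha * dt), with alpha > 0."""
    return 2.0 ** (-alpha * dt)


def micro_cluster_features(
    points: Sequence[Sequence[float]],
    arrival_times: Sequence[float],
    t: float,
    alpha: float,
) -> Tuple[List[float], List[float], float]:
    """Return (WLS, WSS, w) at time t for points p_i that arrived at times T_i."""
    d = len(points[0])
    wls = [0.0] * d
    wss = [0.0] * d
    w = 0.0
    for p, arrival in zip(points, arrival_times):
        f = fading(t - arrival, alpha)  # faded weight f(t - T_i) of this point
        for j in range(d):
            wls[j] += f * p[j]          # weighted linear sum, per dimension
            wss[j] += f * p[j] ** 2     # weighted squared sum, per dimension
        w += f                          # total weight of the micro-cluster
    return wls, wss, w


wls, wss, w = micro_cluster_features([[1.0, 2.0], [3.0, 4.0]], [0.0, 2.0], t=2.0, alpha=0.5)
print(wls, wss, w)  # [3.5, 5.0] [9.5, 18.0] 1.5
```

With $\alpha = 0.5$, the first point has faded to weight 0.5 at $t = 2$, so it contributes half as much to WLS, WSS and $w$ as the fresh second point.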
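Micro-Clusters (1/2)
$c = \frac{WLS}{w}$ (center of MC)
$r = \sqrt{\frac{\|WSS\|_1}{w} - \frac{\|WLS\|_2^2}{w^2}}$ (radius of MC)
$r \leq \varepsilon$ (maximum radius)
The radius formula can produce negative arguments to the square root (e.g. through floating-point cancellation)
Used for the implementation instead (cf. MOA implementation):
$r = 1.8 \cdot \max_{1 \leq i \leq d} \sqrt{\frac{WSS_i}{w} - \left(\frac{WLS_i}{w}\right)^2}$, where $i$ runs over the $d$ dimensions of the data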
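A minimal Python sketch of the center and the two radius variants, operating on the features $(WLS, WSS, w)$. The clamping to zero in `radius` is an added safeguard rather than part of the slide's formula, and `radius_moa_style` only mirrors the formula on this slide; it is not taken from the MOA source. All names are hypothetical.

```python
import math
from typing import List, Sequence


def center(wls: Sequence[float], w: float) -> List[float]:
    """Center of the micro-cluster: c = WLS / w."""
    return [x / w for x in wls]


def radius(wls: Sequence[float], wss: Sequence[float], w: float) -> float:
    """r = sqrt(||WSS||_1 / w - ||WLS||_2^2 / w^2).

    The argument is clamped at 0 (an added safeguard) so that rounding
    errors cannot make math.sqrt fail on a tiny negative number.
    """
    arg = sum(wss) / w - sum(x * x for x in wls) / (w * w)
    return math.sqrt(max(arg, 0.0))


def radius_moa_style(wls: Sequence[float], wss: Sequence[float], w: float,
                     factor: float = 1.8) -> float:
    """factor * max over dimensions i of sqrt(WSS_i / w - (WLS_i / w)^2)."""
    per_dim = [ss / w - (ls / w) ** 2 for ls, ss in zip(wls, wss)]
    return factor * math.sqrt(max(max(per_dim), 0.0))


# Two unit-weight points (0, 0) and (2, 0): WLS = (2, 0), WSS = (4, 0), w = 2.
print(center([2.0, 0.0], 2.0))                        # [1.0, 0.0]
print(radius([2.0, 0.0], [4.0, 0.0], 2.0))            # 1.0
print(radius_moa_style([2.0, 0.0], [4.0, 0.0], 2.0))  # 1.8
```

In this example all of the spread lies in the first dimension, so the per-dimension variant equals the exact radius scaled by 1.8.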
Micro-Clusters (2/2)
Micro-clusters are classified based on their weight $w$:
If $w \geq \mu$, the MC is a core-micro-cluster (similar to the core points of DBSCAN)
During the online part we distinguish between (with $0 < \beta \leq 1$)
potential core-micro-clusters (p-micro-clusters), with $w \geq \beta \cdot \mu$
outlier micro-clusters (o-micro-clusters), with $w < \beta \cdot \mu$
Micro-clusters can be maintained incrementally:
If no points are merged into an MC during a time interval $\delta t$, its weight decays: $MC = (2^{-\alpha \cdot \delta t} \cdot WLS,\ 2^{-\alpha \cdot \delta t} \cdot WSS,\ 2^{-\alpha \cdot \delta t} \cdot w,\ t_c)$
If a point $p$ is merged: $MC = (WLS + p,\ WSS + p^2,\ w + 1,\ t_c)$
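The two update rules can be sketched as methods on a small Python class. The field `t_last` (the time the features were last faded) is a hypothetical addition that the tuple on the slide does not contain; it lets the decay be applied lazily, just before the next merge. Classification uses the threshold $\beta \cdot \mu$; all names are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MicroCluster:
    wls: List[float]   # weighted linear sum
    wss: List[float]   # weighted squared sum (per dimension)
    w: float           # weight
    tc: float          # creation time
    t_last: float      # hypothetical: time of the last decay/update

    def decay_to(self, t: float, alpha: float) -> None:
        """No points merged during delta_t = t - t_last: scale all by 2^(-alpha*delta_t)."""
        factor = 2.0 ** (-alpha * (t - self.t_last))
        self.wls = [x * factor for x in self.wls]
        self.wss = [x * factor for x in self.wss]
        self.w *= factor
        self.t_last = t

    def merge_point(self, p: List[float], t: float, alpha: float) -> None:
        """Merge point p arriving at time t: fade first, then add p, p^2 and weight 1."""
        self.decay_to(t, alpha)
        self.wls = [x + pi for x, pi in zip(self.wls, p)]
        self.wss = [x + pi * pi for x, pi in zip(self.wss, p)]
        self.w += 1.0


def classify(mc: MicroCluster, beta: float, mu: float) -> str:
    """p-micro-cluster if w >= beta * mu, otherwise o-micro-cluster."""
    return "p-micro-cluster" if mc.w >= beta * mu else "o-micro-cluster"


mc = MicroCluster(wls=[1.0, 2.0], wss=[1.0, 4.0], w=1.0, tc=0.0, t_last=0.0)
mc.merge_point([3.0, 4.0], t=2.0, alpha=0.5)
print(mc.wls, mc.wss, mc.w)          # [3.5, 5.0] [9.5, 18.0] 1.5
print(classify(mc, beta=0.5, mu=4))  # o-micro-cluster
```

Fading lazily and then merging gives the same features as recomputing the sums from scratch, because every term in WLS, WSS and $w$ is multiplied by the same factor $2^{-\alpha \cdot \delta t}$.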
Algorithm Overview
Online and offline part
Initialization with DBSCAN
Maintains p-micro-clusters and o-micro-clusters during the online component
New points are merged using a merging algorithm
The pruning strategy is performed periodically
A DBSCAN-based offline component generates the final clusters on demand, using p-micro-clusters as virtual points
Figure: Online and offline phase
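The offline step can be pictured as DBSCAN run on the p-micro-clusters, each acting as a virtual point at its center. The Python sketch below is a simplified variant under stated assumptions: a virtual point counts as core if its weight is at least $\mu$, and two virtual points are neighbours if their centers are at most $2\varepsilon$ apart. The function name and these rules are illustrative, not taken from the slides.

```python
import math
from typing import List, Sequence


def offline_clusters(centers: Sequence[Sequence[float]],
                     weights: Sequence[float],
                     eps: float, mu: float) -> List[int]:
    """DBSCAN-style labelling of p-micro-clusters used as weighted virtual points.

    Assumptions (simplified): a virtual point is core if its weight is >= mu,
    and two virtual points are neighbours if their centers are <= 2 * eps apart.
    Returns one cluster label per micro-cluster; -1 means unassigned.
    """
    n = len(centers)
    neighbours = [
        [j for j in range(n) if j != i and math.dist(centers[i], centers[j]) <= 2 * eps]
        for i in range(n)
    ]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] != -1 or weights[i] < mu:
            continue                  # already labelled, or cannot seed a cluster
        labels[i] = next_label
        stack = [i]
        while stack:
            k = stack.pop()
            if weights[k] < mu:
                continue              # border micro-cluster: joins but does not expand
            for j in neighbours[k]:
                if labels[j] == -1:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


centers = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
print(offline_clusters(centers, [5.0, 5.0, 1.0, 5.0], eps=0.6, mu=3.0))  # [0, 0, 0, 1]
```

Here the light micro-cluster at (2, 0) joins the first cluster as a border point but does not extend it further, just as a non-core point would in DBSCAN.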
Merging Technique
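A minimal sketch of how a new point could be merged, following the three cases described in the DenStream paper (Cao et al.): try the nearest p-micro-cluster, then the nearest o-micro-cluster (promoting it once its weight exceeds $\beta \cdot \mu$), otherwise open a new o-micro-cluster. A merge is accepted only if the resulting radius stays within $\varepsilon$. To keep it short, the sketch assumes all micro-clusters have already been faded to the current time, and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(eq=False)  # compare micro-clusters by identity
class MC:
    wls: List[float]
    wss: List[float]
    w: float
    tc: float

    def center(self) -> List[float]:
        return [x / self.w for x in self.wls]

    def radius_with(self, p: Sequence[float]) -> float:
        """Radius this micro-cluster would have after absorbing p (weight 1)."""
        wls = [x + pi for x, pi in zip(self.wls, p)]
        wss = [x + pi * pi for x, pi in zip(self.wss, p)]
        w = self.w + 1.0
        arg = sum(wss) / w - sum(x * x for x in wls) / (w * w)
        return math.sqrt(max(arg, 0.0))

    def add(self, p: Sequence[float]) -> None:
        self.wls = [x + pi for x, pi in zip(self.wls, p)]
        self.wss = [x + pi * pi for x, pi in zip(self.wss, p)]
        self.w += 1.0


def nearest(clusters: List[MC], p: Sequence[float]) -> Optional[MC]:
    return min(clusters, key=lambda c: math.dist(c.center(), p), default=None)


def merge(p, t, p_mcs, o_mcs, eps, beta, mu) -> str:
    """Insert point p (arriving at time t) and report which case applied."""
    c = nearest(p_mcs, p)
    if c is not None and c.radius_with(p) <= eps:
        c.add(p)
        return "p-merge"
    c = nearest(o_mcs, p)
    if c is not None and c.radius_with(p) <= eps:
        c.add(p)
        if c.w > beta * mu:           # grown enough: promote to p-micro-cluster
            o_mcs.remove(c)
            p_mcs.append(c)
            return "o-merge, promoted"
        return "o-merge"
    o_mcs.append(MC(list(p), [x * x for x in p], 1.0, t))
    return "new o-micro-cluster"


p_mcs: List[MC] = []
o_mcs: List[MC] = []
for t, p in enumerate([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [5.0, 5.0]]):
    print(t, merge(p, t, p_mcs, o_mcs, eps=1.0, beta=0.5, mu=4.0))
```

With $\beta \cdot \mu = 2$, the first o-micro-cluster is promoted when its third point arrives, and the distant point (5, 5) opens a new o-micro-cluster because absorbing it would push the radius above $\varepsilon$.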
Pruning Strategy (1/2)
The pruning strategy is performed every $T_p$ time steps:
$T_p = \left\lceil \frac{1}{\alpha} \cdot \log_2\left(\frac{\beta\mu}{\beta\mu - 1}\right) \right\rceil$
All p-micro-clusters with weight $w < \beta \cdot \mu$ are pruned
O-micro-clusters must be pruned too, to release memory space
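One way to read the formula: $T_p$ solves $2^{-\alpha T_p} \cdot \beta\mu = \beta\mu - 1$, i.e. it is the time after which a weight of $\beta\mu$ has faded by exactly 1, rounded up to whole time steps. A small Python sketch (the function name is hypothetical; it requires $\beta\mu > 1$ so that the logarithm is defined and positive):

```python
import math


def pruning_period(alpha: float, beta: float, mu: float) -> int:
    """T_p = ceil((1 / alpha) * log2(beta*mu / (beta*mu - 1))); requires beta*mu > 1."""
    bm = beta * mu
    if bm <= 1.0:
        raise ValueError("beta * mu must be greater than 1")
    return math.ceil((1.0 / alpha) * math.log2(bm / (bm - 1.0)))


print(pruning_period(alpha=0.25, beta=0.5, mu=4.0))  # 4
print(pruning_period(alpha=1.0, beta=0.5, mu=4.0))   # 1
```

Faster fading (larger $\alpha$) shortens the period, since micro-clusters lose weight more quickly and must be checked more often.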
Pruning Strategy (2/2)
If o-micro-clusters are pruned too early, they cannot grow into p-micro-clusters
The weight of an o-micro-cluster is therefore compared against the lower limit
$\xi(t) = \frac{2^{-\alpha(t - t_c + T_p)} - 1}{2^{-\alpha T_p} - 1}$ ($t_c$: creation time of the o-micro-cluster)
O-micro-clusters with $w < \xi(t)$ are pruned
The longer an o-micro-cluster exists, the higher its expected weight
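The lower limit can be evaluated directly. In this Python sketch, o-micro-clusters are represented as hypothetical `(weight, creation_time)` pairs, and those whose weight falls below $\xi$ are dropped; $\xi$ depends on the creation time $t_c$ as well as on $t$.

```python
from typing import List, Tuple


def xi(t: float, tc: float, alpha: float, tp: float) -> float:
    """Lower weight limit for an o-micro-cluster created at time tc, evaluated at time t."""
    return (2.0 ** (-alpha * (t - tc + tp)) - 1.0) / (2.0 ** (-alpha * tp) - 1.0)


def prune_o_micro_clusters(o_mcs: List[Tuple[float, float]], t: float,
                           alpha: float, tp: float) -> List[Tuple[float, float]]:
    """Keep the (weight, creation time) pairs whose weight reaches xi; drop the rest."""
    return [(w, tc) for (w, tc) in o_mcs if w >= xi(t, tc, alpha, tp)]


print(xi(t=0.0, tc=0.0, alpha=0.25, tp=4.0))  # 1.0
print(xi(t=4.0, tc=0.0, alpha=0.25, tp=4.0))  # 1.5
print(prune_o_micro_clusters([(1.2, 0.0), (1.6, 0.0), (1.0, 4.0)], t=4.0, alpha=0.25, tp=4.0))
# [(1.6, 0.0), (1.0, 4.0)]
```

$\xi$ equals 1 when the o-micro-cluster is created and rises toward $1 / (1 - 2^{-\alpha T_p})$ (here 2) as it ages, so an older o-micro-cluster must have accumulated more weight to survive, while a freshly created one is kept.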
DenStream: The Algorithm
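Putting the merging and pruning steps together, the online phase can be sketched as a single loop. This is a compact illustration, not the paper's pseudocode: it assumes one point arrives per time step ($t = 0, 1, 2, \dots$), skips the DBSCAN initialization by starting from empty lists, fades every micro-cluster eagerly at each step, and uses hypothetical names throughout (including the extra `t_last` field).

```python
import math
from typing import List, Sequence, Tuple


class MicroCluster:
    """(WLS, WSS, w, t_c) plus a hypothetical t_last field for fading."""

    def __init__(self, p: Sequence[float], t: int) -> None:
        self.wls = list(p)
        self.wss = [x * x for x in p]
        self.w = 1.0
        self.tc = t
        self.t_last = t

    def fade(self, t: int, alpha: float) -> None:
        f = 2.0 ** (-alpha * (t - self.t_last))
        self.wls = [x * f for x in self.wls]
        self.wss = [x * f for x in self.wss]
        self.w *= f
        self.t_last = t

    def center(self) -> List[float]:
        return [x / self.w for x in self.wls]

    def radius_with(self, p: Sequence[float]) -> float:
        wls = [a + b for a, b in zip(self.wls, p)]
        wss = [a + b * b for a, b in zip(self.wss, p)]
        w = self.w + 1.0
        return math.sqrt(max(sum(wss) / w - sum(x * x for x in wls) / (w * w), 0.0))

    def add(self, p: Sequence[float]) -> None:
        self.wls = [a + b for a, b in zip(self.wls, p)]
        self.wss = [a + b * b for a, b in zip(self.wss, p)]
        self.w += 1.0


def denstream_online(stream: Sequence[Sequence[float]], eps: float, beta: float,
                     mu: float, alpha: float) -> Tuple[List[MicroCluster], List[MicroCluster]]:
    """Online phase: merge each point, prune every T_p steps; returns (p_mcs, o_mcs)."""
    tp = math.ceil(math.log2(beta * mu / (beta * mu - 1.0)) / alpha)
    p_mcs: List[MicroCluster] = []
    o_mcs: List[MicroCluster] = []
    for t, p in enumerate(stream):
        for mc in p_mcs + o_mcs:
            mc.fade(t, alpha)                  # bring every micro-cluster to time t
        merged = False
        for group in (p_mcs, o_mcs):           # nearest p-, then nearest o-micro-cluster
            if not group:
                continue
            c = min(group, key=lambda m: math.dist(m.center(), p))
            if c.radius_with(p) <= eps:
                c.add(p)
                merged = True
                if group is o_mcs and c.w > beta * mu:
                    o_mcs.remove(c)            # promote o- to p-micro-cluster
                    p_mcs.append(c)
                break
        if not merged:
            o_mcs.append(MicroCluster(p, t))   # start a new o-micro-cluster
        if t % tp == 0:                        # periodic pruning
            p_mcs = [m for m in p_mcs if m.w >= beta * mu]
            o_mcs = [m for m in o_mcs
                     if m.w >= (2.0 ** (-alpha * (t - m.tc + tp)) - 1.0)
                     / (2.0 ** (-alpha * tp) - 1.0)]
    return p_mcs, o_mcs


stream = [[0.0, 0.0]] * 10 + [[5.0, 5.0]]
p_mcs, o_mcs = denstream_online(stream, eps=0.5, beta=0.5, mu=4.0, alpha=0.25)
print(len(p_mcs), len(o_mcs))  # 1 1
```

The repeated point at the origin quickly grows into a p-micro-cluster, while the single late point at (5, 5) remains an o-micro-cluster; only the p-micro-clusters would be handed to the offline component.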
An Illustrative Example
An Illustrative Example: Dataset
Figure: Initial distribution
Figure: Final distribution
Fading distribution
+15% noise
Live Demo
Live Demo...
Recognized Clusters
Conclusion
DenStream can find clusters of arbitrary shape and can handle noise
The online component maintains a set of p- and o-micro-clusters using a merging and pruning algorithm
The offline component generates the final clusters on demand using a variant of DBSCAN