DenStream
Paul Voigtlaender
Supervision: Prof. Dr. T. Seidl, Dipl.-Ing. Marwan Hassani
Proseminar Elementary Data Mining Techniques, 29.11.2012

Outline
1 Motivation
2 References
3 DenStream
    Basic Concepts
    Algorithm Overview
    Merging Technique
    Pruning Strategy
    DenStream: The Algorithm
4 An Illustrative Example
5 Conclusion

1 Motivation

Why another stream clustering algorithm?
Consider this distribution (figure):
* The clusters are not spherical.
* CluStream can only find spherical clusters ⇒ it will fail on this distribution.
* DenStream can find clusters of arbitrary shape.
* DenStream can handle noise.

2 References

[1] Cao et al.: Density-Based Clustering over an Evolving Data Stream with Noise. SDM 2006 (DenStream, main source)
[2] Ester et al.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996 (DBSCAN, used in DenStream)
[3] Aggarwal et al.: A Framework for Clustering Evolving Data Streams. VLDB 2003 (CluStream, uses concepts similar to DenStream)
[4] Data Mining Algorithms lecture given by i9 in WS 11/12
[5] Temporal and Graph Data lecture given by i9 in WS 11/12

3 DenStream

Basic Concepts
* Damped window: the weight of a data object decreases exponentially over time:
    $f(t) = 2^{-\alpha t}$, $\alpha > 0$
* Micro-clusters: $MC = (WLS, WSS, w, t_c)$ for points $p_1, \dots, p_n$ with arrival times $T_1, \dots, T_n$, where
    $WLS = \sum_{i=1}^{n} f(t - T_i) \cdot p_i$ (weighted linear sum)
    $WSS = \sum_{i=1}^{n} f(t - T_i) \cdot p_i^2$ (weighted squared sum, squared per dimension)
    $w = \sum_{i=1}^{n} f(t - T_i)$ (weight of MC)
    $t_c$ (creation time of MC)

Micro-Clusters (1/2)
* Center of MC:
    $c = \frac{WLS}{w}$
* Radius of MC, bounded by the maximum radius $\varepsilon$:
    $r = \sqrt{\frac{\lVert WSS \rVert_2}{w} - \left(\frac{\lVert WLS \rVert_2}{w}\right)^2} \le \varepsilon$
* This radius formula can produce a negative argument to the square root.
* Used for implementation instead (cf. the MOA implementation), where $d$ is the number of dimensions:
    $r = 1.8 \cdot \max_{1 \le i \le d} \sqrt{\frac{WSS_i}{w} - \left(\frac{WLS_i}{w}\right)^2}$

Micro-Clusters (2/2)
* Micro-clusters are classified by their weight $w$:
    If $w \ge \mu$, MC is a core-micro-cluster (similar to the core points of DBSCAN).
* During the online part we distinguish between
    potential core-micro-clusters (p-micro-clusters), with $w \ge \beta \cdot \mu$, and
    outlier micro-clusters (o-micro-clusters), with $w < \beta \cdot \mu$.
* Micro-clusters can be maintained incrementally:
    If no points are merged into MC during a time interval $\delta t$, its weight decays:
    $MC = (2^{-\alpha \delta t} \cdot WLS,\ 2^{-\alpha \delta t} \cdot WSS,\ 2^{-\alpha \delta t} \cdot w,\ t_c)$
    If a point $p$ is merged:
    $MC = (WLS + p,\ WSS + p^2,\ w + 1,\ t_c)$
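The incremental maintenance rules and the two radius formulas above can be sketched in Python. This is a minimal sketch under stated assumptions, not the MOA code: the class `MicroCluster`, its methods and the helper `norm_radius_arg` are hypothetical names, points are plain lists of floats, and fading is applied explicitly through `fade(dt, alpha)`. The last lines build a one-point micro-cluster at (1, 1), for which the norm-based formula puts $\sqrt{2} - 2 < 0$ under the square root while the per-dimension variant gives 0.

```python
import math
from typing import List


class MicroCluster:
    """Hypothetical micro-cluster MC = (WLS, WSS, w, t_c); one list entry per dimension."""

    def __init__(self, p: List[float], t: float):
        # A new micro-cluster holds a single point with weight f(0) = 1.
        self.wls = list(p)
        self.wss = [x * x for x in p]
        self.w = 1.0
        self.tc = t

    def fade(self, dt: float, alpha: float) -> None:
        # No point merged during dt: scale WLS, WSS and w by 2^(-alpha*dt).
        factor = 2.0 ** (-alpha * dt)
        self.wls = [factor * x for x in self.wls]
        self.wss = [factor * x for x in self.wss]
        self.w *= factor

    def insert(self, p: List[float]) -> None:
        # Merge point p: (WLS + p, WSS + p^2, w + 1, t_c).
        self.wls = [a + x for a, x in zip(self.wls, p)]
        self.wss = [a + x * x for a, x in zip(self.wss, p)]
        self.w += 1.0

    def center(self) -> List[float]:
        return [x / self.w for x in self.wls]

    def radius(self, factor: float = 1.8) -> float:
        # MOA-style radius: factor * max over dimensions of the weighted standard deviation.
        # max(0.0, ...) only guards against tiny negative values caused by rounding.
        return factor * max(
            math.sqrt(max(0.0, ss / self.w - (ls / self.w) ** 2))
            for ls, ss in zip(self.wls, self.wss)
        )

    def kind(self, mu: float, beta: float) -> str:
        # Online classification: p-micro-cluster if w >= beta*mu, else o-micro-cluster.
        return "p" if self.w >= beta * mu else "o"


def norm_radius_arg(mc: MicroCluster) -> float:
    # Argument of the square root in r = sqrt(||WSS||_2/w - (||WLS||_2/w)^2).
    return math.hypot(*mc.wss) / mc.w - (math.hypot(*mc.wls) / mc.w) ** 2


single = MicroCluster([1.0, 1.0], t=0.0)
print(norm_radius_arg(single))  # negative, sqrt(2) - 2 (about -0.586)
print(single.radius())          # 0.0
```

Because fading scales WLS, WSS and w by the same factor, the center and both radius formulas are unchanged by `fade`; only the weight, and with it the p/o classification, changes. `math.hypot` with more than two arguments needs Python 3.8 or later.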
Algorithm Overview
* Online and offline part
* Initialization with DBSCAN
* The online component maintains p-micro-clusters and o-micro-clusters
* New points are merged using a merging algorithm
* A pruning strategy is performed periodically
* A DBSCAN-based offline component generates the final clusters on demand, using the p-micro-clusters as virtual points
Figure: Online and offline phase

Merging Technique

Pruning Strategy (1/2)
* The pruning strategy is performed every $T_p$ time steps:
    $T_p = \left\lceil \frac{1}{\alpha} \cdot \log_2\left(\frac{\beta\mu}{\beta\mu - 1}\right) \right\rceil$
  ($T_p$ is the minimal time a p-micro-cluster needs to fade into an o-micro-cluster.)
* All p-micro-clusters with weight $w < \beta \cdot \mu$ are pruned.
* O-micro-clusters must be pruned too, to release memory space.

Pruning Strategy (2/2)
* If o-micro-clusters are pruned too early, they cannot grow into p-micro-clusters.
* The weight of an o-micro-cluster is therefore compared against
    $\xi(t) = \frac{2^{-\alpha (t - t_c + T_p)} - 1}{2^{-\alpha T_p} - 1}$ ($t_c$: creation time of the o-micro-cluster)
  and the o-micro-cluster is pruned if its weight is below $\xi(t)$.
* The longer an o-micro-cluster exists, the higher its expected weight.

DenStream: The Algorithm

4 An Illustrative Example

An Illustrative Example: Dataset
Figure: Initial distribution
Figure: Final distribution
Fading distribution + 15% noise

Live Demo...
Recognized Clusters
Figure: Recognized clusters

5 Conclusion
* DenStream can find clusters of arbitrary shape and can handle noise.
* The online component maintains a set of p- and o-micro-clusters using a merging and a pruning algorithm.
* The offline component generates the final clusters on demand using a variant of DBSCAN.
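To make the online component summarized above concrete, here is a minimal Python sketch of its merging and pruning steps. It follows the merging procedure described by Cao et al. [1] (try the nearest p-micro-cluster, then the nearest o-micro-cluster, otherwise open a new o-micro-cluster) and the pruning rules with $T_p$ and $\xi(t)$ from the Pruning Strategy slides. All names (`MC`, `DenStreamOnline`, `pruning_period`, `xi`) are hypothetical. The sketch assumes the MOA-style radius, fades every micro-cluster eagerly when a point arrives, and uses an integer step counter as time. Both the DBSCAN initialization and the offline DBSCAN step are left out.

```python
import math
from typing import List, Optional


def pruning_period(alpha: float, beta: float, mu: float) -> int:
    # T_p = ceil((1/alpha) * log2(beta*mu / (beta*mu - 1))); requires beta*mu > 1.
    return math.ceil((1.0 / alpha) * math.log2(beta * mu / (beta * mu - 1.0)))


def xi(t: float, tc: float, alpha: float, tp: int) -> float:
    # Lower weight limit for an o-micro-cluster created at tc, evaluated at time t.
    return (2.0 ** (-alpha * (t - tc + tp)) - 1.0) / (2.0 ** (-alpha * tp) - 1.0)


class MC:
    """Hypothetical minimal micro-cluster (WLS, WSS, w, t_c)."""

    def __init__(self, p: List[float], t: float):
        self.wls, self.wss = list(p), [x * x for x in p]
        self.w, self.tc, self.t_last = 1.0, t, t

    def decay_to(self, t: float, alpha: float) -> None:
        # Fade WLS, WSS and w from the last update time to t.
        f = 2.0 ** (-alpha * (t - self.t_last))
        self.wls = [f * x for x in self.wls]
        self.wss = [f * x for x in self.wss]
        self.w *= f
        self.t_last = t

    def insert(self, p: List[float]) -> None:
        self.wls = [a + x for a, x in zip(self.wls, p)]
        self.wss = [a + x * x for a, x in zip(self.wss, p)]
        self.w += 1.0

    def center(self) -> List[float]:
        return [x / self.w for x in self.wls]

    def radius_with(self, p: List[float]) -> float:
        # MOA-style radius (factor 1.8) the cluster would have after merging p.
        w = self.w + 1.0
        wls = [a + x for a, x in zip(self.wls, p)]
        wss = [a + x * x for a, x in zip(self.wss, p)]
        return 1.8 * max(math.sqrt(max(0.0, s / w - (ls / w) ** 2)) for ls, s in zip(wls, wss))


class DenStreamOnline:
    """Hypothetical online component: merging plus periodic pruning, no DBSCAN steps."""

    def __init__(self, eps: float, mu: float, beta: float, alpha: float):
        self.eps, self.mu, self.beta, self.alpha = eps, mu, beta, alpha
        self.tp = pruning_period(alpha, beta, mu)
        self.p_mcs: List[MC] = []
        self.o_mcs: List[MC] = []

    @staticmethod
    def _nearest(mcs: List[MC], p: List[float]) -> Optional[MC]:
        # Micro-cluster whose center is closest to p (None if the list is empty).
        return min(mcs, key=lambda c: math.dist(c.center(), p), default=None)

    def process(self, p: List[float], t: int) -> None:
        # Point p arrives at integer time step t (non-decreasing across calls).
        for c in self.p_mcs + self.o_mcs:
            c.decay_to(t, self.alpha)
        self._merge(p, t)
        if t % self.tp == 0:
            self._prune(t)

    def _merge(self, p: List[float], t: int) -> None:
        # 1) Try the nearest p-micro-cluster.
        cp = self._nearest(self.p_mcs, p)
        if cp is not None and cp.radius_with(p) <= self.eps:
            cp.insert(p)
            return
        # 2) Try the nearest o-micro-cluster; promote it once w > beta*mu.
        co = self._nearest(self.o_mcs, p)
        if co is not None and co.radius_with(p) <= self.eps:
            co.insert(p)
            if co.w > self.beta * self.mu:
                self.o_mcs.remove(co)
                self.p_mcs.append(co)
            return
        # 3) Otherwise open a new o-micro-cluster for p.
        self.o_mcs.append(MC(p, t))

    def _prune(self, t: int) -> None:
        # Drop p-micro-clusters with w < beta*mu and o-micro-clusters with w < xi(t).
        self.p_mcs = [c for c in self.p_mcs if c.w >= self.beta * self.mu]
        self.o_mcs = [c for c in self.o_mcs if c.w >= xi(t, c.tc, self.alpha, self.tp)]
```

For example, with eps = 1.0, mu = 4.0, beta = 0.5 and alpha = 0.25 the pruning period is $T_p = \lceil 4 \cdot \log_2(2/1) \rceil = 4$, and $\xi$ grows from 1 at creation to 1.5 after four time steps. Feeding the point (0, 0) at t = 1, ..., 4 makes its o-micro-cluster exceed $\beta\mu = 2$ at t = 3, so it is promoted to a p-micro-cluster. Fading every micro-cluster on each arrival keeps the sketch short but costs time linear in the number of micro-clusters per point; since the center WLS/w is unchanged by fading, an implementation could instead fade a micro-cluster only when it is a merge candidate or is checked during pruning. `math.dist` needs Python 3.8 or later.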