Download Document

Time-focused density-based clustering of trajectories of moving objects Margherita D’Auria Mirco Nanni Dino Pedreschi Plan of the talk  Introduction     Density-based clustering on trajectories    Trajectory data model distance measure Results Temporal Focusing    Motivations Problem & context Density-based Clustering (OPTICS) A clustering quality measure Heuristics for optimal temporal interval Conclusions & future work 2 Motivations  Plenty of actual and future data sources for spatio-temporal data  Sophisticated analysis method are required, in order to fully exploit them  Data mining methods  Which kind of patterns/models?  Main objectives  A better understanding of the application domain  An improvement for private and public services 3 Problem & context  A distinguishing case: Mobile devices    PDAs Mobile phones LBS-enabled devices (may include the two above)  They (can) yield traces of their movement  An important problem:    Discovering groups of individuals that (approx.) move together in some period of time E.g.: detection of traffic jams during rush hours A candidate Data Mining reformulation of the problem  Clustering of individuals’ trajectories 4 Which kind of clustering?  Several alternatives are available  General requirements:  Non-spherical   clusters should be allowed E.g.: A traffic jam along a road It should be represented as a cluster which individuals form a “snake-shaped” cluster  Tolerance to noise  Low computational cost  Applicability to complex, possibly non-vectorial data  A suitable candidate: Density-based clustering  In particular, we adopt OPTICS 5 A crushed intro to OPTICS  A density threshold is defined through two parameters:  ε: A neighborhood radius  MinPts: Minimum number of points  Key concepts:  Core objects   Reachability-distance reach-d( p, q )   Objects with a ε-Neighborhood that contains at least MinPts objects (simplified definition:) Distance between objects p and q Example:    Object “q” is a core object if MinPts=2 Object “p” is not Their reach-d() is shown ε q reach-d(p,q) p ε –neighborhood of q 6 A crushed intro to OPTICS The algorithm: 1. Repeatedly choose a non-visited random object, until a core object is selected 2. Select the core object having the smallest reachability distance from all the visited core objects. If none can be found, go to step 1 Order of visit Output: reach-d() of all visited points (reachability plot) “jump” from left-hand group (0-9) to right-hand one (10-18) Reachability threshold Cluster 1 Cluster 2 7 Applying OPTICS to trajectories  Two key issues have to be solved  A suitable representation for  Which data model for trajectories?  A mean   trajectories is needed for comparing trajectories has to be provided Which distance between objects? OPTICS needs to define one to perform range queries 8 A trajectory data model  Raw input data:    Each trajectory is represented as a set of time-stamped coordinates T=(t1,x1,y1), …, (tn, xn, yn) => Object position at time ti was (xi,yi) Data model  Parametric-spaghetti: linear interpolation between consecutive points 9 A distance between trajectories  Adopted distance = average distance D( 1 , 2 ) |T  d ( (t ),   T 1 2 (t )) dt |T | It is a metric => efficient indexing methos allowed 10 A sample dataset  Set of trajectories forming 4 clusters + noise  Generated by the CENTRE system (KDDLab software) 11 OPTICS vs. HAC & K-means K-means HAC-average OPTICS 12 Temporal focusing  Different time intervals can show different behaviours  E.g.: objects that are close to each other within a time interval can be much distant in other periods of time  The time interval becomes a parameter  E.g.:  rush hours vs. low traffic times Problem: significant time intervals are not always known a priori  An automated mechanism is needed to find them 13 Temporal focusing  The proposed method 1. Provide a notion of interestingness to be associated with time intervals  2. We define it in terms of estimated quality of the clustering extracted on the given time interval Formalize the Temporal focusing task as an optimization problem  Discover the time interval interestingness measure that maximizes the 14 A quality measure for density-based clustering  General principle   High-density clusters separated by low-density noise are preferred The method  High-density clusters correspond to low dents in the reachability plot => Evaluate the global quality Q of the clustering output as the average reachability within clusters (noise is discarded)  HIGH DENSITY MEDIUM DENSITY LOW DENSITY Definition: given ε and dataset D, compute QD, ε as: QD, ε = - R (D, ε’) = - AVGo in D’ reach-d(o) D’ = D – {noise objects} 15 FAQs  How Q() is computed for a given time interval I ?     How is the reachability threshold set for each interval?     Step 1: trajectory segments out of I are clipped away Step 2: OPTICS is run on the clipped trajectories Step 3: Q(I) is computed on the output reachability plot A reachability threshold is needed in order to locate clusters (and noise) The threshold for the largest I is manually set by the user Thresholds for other intervals I’ I are computed from the first one by proportionally rescaling w.r.t. average reachability Is the optimal Q(I) biased towards tiny intervals? Yes. The problem has been fixed by defining Q’(I) = Q(I) / log |I| => A small decrease in Q(I) is accepted when it yields a much larger I  16 Esperiments  A more complex sample dataset (generated by CENTRE)  Clear clusters in the central time interval vs. dispersion on the borders 17 Optimizing Q()  Find the optimal Q() by plotting values for all time intervals  The optimum corresponds to the central time interval 18 Heuristics for optimum search  Each Q() value computation requires a run of the OPTICS algorithm  Computing all O(N2) values is too expensive  Alternative approaches are needed  Preliminary tests with hill-climbing (i.e., greedy) approach: starting points local optima global optimum (N=|{sub-intervals}|)  Test on the same dataset  Global optimum found in the 70,7% of runs  Avg. number of steps: 17  Avg. OPTICS runs: 49 19 Conclusions & Future works  Summary of the work  Extension of OPTICS to a trajectory data model & distance  Definition of the Temporal Focusing problem  Definition of a clustering quality measure  (Preliminary) Tests with exhaustive & greedy optimization  Future work  Experimental validation over broader benchmarks  Tighter integration between OPTICS and search strategy  Alternative, domain-specific definition of quality measures 20

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Document