An Approach to Active Spatial Data Mining Wei Wang Data Mining Lab, UCLA March 24, 1999 Outline • • • • • Introduction Spatial Data Mining Triggers Strategies Performance Conclusions Introduction • Huge amount of spatial data are generated everyday. Satellite Satellite Satellite – – – – – Earth Observing System National Spatial Data Infrastructure National Image Mapping Agency one meter resolution data digital earth Users are usually interested in the hidden information. Satellite dish Satellite dish Satellite dish – Aggregate information – Clustering – Patterns Introduction • Knowledge discovery processes are computationally expensive. • Today’s technology advances provide necessary computing power to carry out such complicated processes. Spatial Data mining Triggers • Since data evolves over time, interesting patterns are likely to emerge or change. • Goal: identify and find (most of) interesting patterns • Problems: – Knowledge discovery processes are expensive. It is not feasible to re-process the entire data set for every change. – Approach to periodically examine the data. • Long delays • Transient patterns might be missed Natural solution: Usage of triggers. Spatial Data mining Triggers • Traditional database triggers can not be directly applied: – Expressive power of traditional database triggers is limited, especially in describing spatial relationships. – Example: Trigger bandwidth reallocation when the size of a cluster exceeds 20. .. .... .. . .. ... .. . . . . . . Strategies • STING+ was designed to introduce and support spatial triggers efficiently. • Observation (spatial locality): Only objects added to the shaded area will contribute to the growth of cluster size at this moment. .. .... .. . .. ... .. . . . . . . Strategies • STING+ Strategy: Monitor only the area occupied by potential clusters and its neighborhood. • Observation (cumulative effect): at least 4 more objects are needed in order to make the cluster size be 20. • STING+ Strategy: Space is organized in a hierarchy so that updates can be suspended at some level in the hierarchy until the cumulative effect might cause the trigger to be fired. Strategies – Space is recursively divided into smaller rectangular cells down to a specified granularity and is organized via the inherit pyramid hierarchy. .. .... .. . .. ... .. . .. .... .. . .. ... . . . . .. . . . Level 1 . . . . Level 2 Strategies – STING+ decomposes a trigger into a set of sub-triggers associated with individual cells in the hierarchical structure to monitor the cumulative effect of data changes within the cell. ..... . . ..... . . . ... ... . ... ... .. .. Sub-trigger . . on cell Higher level sub-trigger on cell . . . . . Level 4 . . . . . Level 3 Strategies – Updates/insertions are suspended at some level in the hierarchy until such time that the cumulative effect of these insertions might cause the trigger condition to become satisfied. ..... . . ..... . . . ... ... . ... ... .. .. . . + ++ + + ++ + . . . . . . Level 0 . . . . Level 1 Strategies ........ . .. ... + ++ + ........ . .. ... .. . + ++ + . . . . . . . Level 2 .. . . . . Level 3 No update of cluster ! Performance • Comparison with periodic re-examination – If the period is set to be less than 4000 updates, STING+ consumes less CPU cycles. – Significant delay and transient patterns misses can occur for larger period. • Not acceptable in many applications – No delay and no transient patterns missed with STING+. Conclusions • STING+ is an efficient approach to active spatial data mining. – Users can define triggers to monitor the change of spatial data. – Updates to the data are carefully monitored – Evaluation of trigger is postponed until the cumulative effect of data updates might cause the trigger to be fired.