Spatiotemporal Stream Mining Applied to Seismic+ Data

Margaret H. Dunham
CSE Department
Southern Methodist University
Dallas, Texas 75275 USA
[email protected]

9/15/2008 CTBTO Data Mining/Data Fusion Workshop

Outline

• CTBTO Data
• CTBTO Modeling Requirements
• EMM

CTBTO Data

As a Data Miner I must first understand your DATA

• Diverse – Seismic, Hydroacoustic, Infrasound, Radionuclide
• Spatial (source and sensor)
• Temporal
• STREAM Data

From Sensors to Streams

• Stream Data - Data captured and sent by a set of sensors
• Real-time sequence of encoded signals which contain desired information.
• Continuous, ordered (implicitly by arrival time or explicitly by timestamp or by geographic coordinates) sequence of items
• Stream data is infinite - the data keeps coming.

(11/26/07 – IRADSN’07)

CTBTO & Data Mining

• Data Mining techniques must be defined based on your data and applications
• Can’t use predefined fixed models and prediction/classification techniques.
• Must not redo massive amounts of algorithms already created.

CTBTO + DM Requirements

• Model:
    • Handle different data types (seismic, hydroacoustic, etc.)
    • Spatial + Temporal (Spatiotemporal)
    • Hierarchical
    • Scalable
    • Online
    • Dynamic
• Anomaly Detection:
    • Not just specific wave type or data values
    • Relationships between arrival of waves/data
    • Combined values of data from all sensors

EMM (Extensible Markov Model)

• Time Varying Discrete First Order Markov Model
• Nodes are clusters of real world states.
• Overlap of learning and validation phases
• Learning:
    • Transition probabilities between nodes
    • Node labels (centroid or medoid of cluster)
    • Nodes are added and removed as data arrives
• Applications: prediction, anomaly detection

Research Objectives

• Apply proven spatiotemporal modeling technique to seismic data
• Construct EMM to model sensor data
    • Local EMM at location or area
    • Hierarchical EMM to summarize lower level models
    • Represent all data in one vector of values
    • EMM learns normal behavior
• Develop new similarity metrics to include all sensor data types (Fusion)
• Apply anomaly detection algorithms

EMM Creation/Learning

[Figure: an EMM with nodes N1, N2, and N3 built step by step from the input vectors <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, and <18,10,3,3,1,1,0>. Edges are labeled with transition probabilities written as fractions (1/1, 1/2, 1/3, 2/3, ...).]

Input Data Representation

• Vector of sensor values (numeric) at precise time points or aggregated over time intervals.
• Need not come from same sensor types.
• Similarity/distance between vectors used to determine creation of new nodes in EMM.

Anomaly Detection with EMM

Objective: Detect rare (unusual, surprising) events

Advantages:
• Dynamically learns what is normal
• Based on this learning, can predict what is not normal
• Do not have to a priori indicate normal behavior

Applications:
• Network Intrusion
• Data: IP traffic data, Automobile traffic data
• Seismic:
    • Unusual Seismic Events
    • Automatically Filter out normal events

[Figure: Minnesota DOT Traffic Data, weekdays vs. weekend. Detected unusual weekend traffic pattern.]

(11/3/04)

EMM with Seismic Data

• Input – Wave arrivals (all or one per sensor)
• Identify states and changes of states in seismic data
• Wave form would first have to be converted into a series of vectors representing the activity at various points in time.
• Initial Testing with RDG data
• Use amplitude, period, and wave type

New Distance Measure

Data = <amplitude, period, wave type>

• Different wave type = 100% difference
• For events of same wave type:
    • 50% weight given to the difference in amplitude.
    • 50% weight given to the difference in period.
• If the distance is greater than the threshold, a state change is required.

amplitude difference = | amplitude_new – amplitude_average | / amplitude_average
period difference = | period_new – period_average | / period_average

EMM with Seismic Data

States 1, 2, and 3 correspond to Noise, Wave A, and Wave B respectively.

Preliminary Testing

• RDG data
• February 1, 1981 – 6 earthquakes
• Find transition times close to known earthquakes
• 9 total nodes
• 652 total transitions
• Found all quakes

EMM Nodes

Node #   Average amplitude   Average period   Phase code
1        1.649m              0.119 sec        P (primary wave)
2        8.353m              0.803 sec        P (primary wave)
3        23.237m             0.898 sec        P (primary wave)
4        87.324m             0.997 sec        P (primary wave)
5        253.333m            1.282 sec        P (primary wave)
6        270.524m            0.96 sec         P (primary wave)
7        7.719m              20.4 sec         P (primary wave)
8        723.088m            1.962 sec        P (primary wave)
9        1938.772m           1.2 sec          P (primary wave)

Hierarchical EMM

[Figure: three-level hierarchy with a Summary EMM at the top, Regional EMMs in the middle, and Local EMMs at the bottom.]

Now What?

• Interest DM COMMUNITY
• DATA NEEDED
• NOISE MAY NOT BE BAD
• KDD CUP

References

• Zhigang Li and Margaret H. Dunham, “STIFF: A Forecasting Framework for SpatioTemporal Data,” Proceedings of the First International Workshop on Knowledge Discovery in Multimedia and Complex Data, May 2002, pp 1-9.
• Zhigang Li, Liangang Liu, and Margaret H. Dunham, “Considering Correlation Between Variables to Improve Spatiotemporal Forecasting,” Proceedings of the PAKDD Conference, May 2003, pp 519-531.
• Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
• Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
• Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
• Charlie Isaksson, Yu Meng, and Margaret H. Dunham, “Risk Leveling of Network Traffic Anomalies,” International Journal of Computer Science and Network Security, Vol 6, No 6, June 2006, pp 258-265.
• Margaret H. Dunham and Vijay Kumar, “Stream Hierarchy Data Mining for Sensor Data,” Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium, November 26, 2007, Shreveport Louisiana.
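
Appendix sketch: distance measure and state changes

The New Distance Measure slide and the EMM learning bullets can be illustrated with a short Python sketch. This is not the authors' implementation. The names Node, distance, and SimpleEMM, the threshold value, and the sample events are all hypothetical. The sketch also assumes details the slides do not state: an event is compared with every existing node and joins the nearest one, node averages are updated as running means, and amplitudes and periods are strictly positive.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Node:
    """One EMM state: running averages of the events assigned to it."""
    wave_type: str
    avg_amplitude: float
    avg_period: float
    count: int = 1


def distance(node, amplitude, period, wave_type):
    """Slide rule: 100% difference (1.0) if wave types differ, otherwise
    50% weight on relative amplitude difference and 50% on relative
    period difference. Same-type distances are not capped at 1.0."""
    if wave_type != node.wave_type:
        return 1.0
    d_amp = abs(amplitude - node.avg_amplitude) / node.avg_amplitude
    d_per = abs(period - node.avg_period) / node.avg_period
    return 0.5 * d_amp + 0.5 * d_per


class SimpleEMM:
    """Minimal sketch: nodes plus transition counts (assumed design)."""

    def __init__(self, threshold):
        # Assumption: keep threshold below 1.0 so that events with
        # different wave types never share a node.
        self.threshold = threshold
        self.nodes = []
        self.transitions = defaultdict(int)  # (from_index, to_index) -> count
        self.current = None

    def observe(self, amplitude, period, wave_type):
        """Assign one event to a node and count the transition."""
        best, best_d = None, float("inf")
        for i, node in enumerate(self.nodes):
            d = distance(node, amplitude, period, wave_type)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.threshold:
            # No node is close enough: create a new one.
            self.nodes.append(Node(wave_type, amplitude, period))
            best = len(self.nodes) - 1
        else:
            # Update the running averages of the matched node.
            node = self.nodes[best]
            node.count += 1
            node.avg_amplitude += (amplitude - node.avg_amplitude) / node.count
            node.avg_period += (period - node.avg_period) / node.count
        if self.current is not None:
            self.transitions[(self.current, best)] += 1
        self.current = best
        return best


# Hypothetical events (amplitude, period, wave type); not RDG data.
emm = SimpleEMM(threshold=0.3)
events = [(1.6, 0.12, "P"), (1.7, 0.12, "P"), (8.4, 0.80, "P"), (1.6, 0.12, "P")]
states = [emm.observe(*event) for event in events]
print(states)  # [0, 0, 1, 0]
```

Dividing each transition count by the total count leaving its source node would give the transition probabilities that the EMM slides describe. A transition into a rarely visited node, or along a rarely used edge, is the kind of event the anomaly detection slides aim to flag.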