Mining Weather Data for Decision Support
Roy George
Army High Performance Computing Research Center
Clark Atlanta University
Atlanta, GA 30314

Research
• Clustering Algorithms for Data Mining
  • Spatio-Temporal Domain
  • Parallelization of Algorithms
• Algorithms for Feature Extraction and Knowledge Discovery

Challenges of Geographical Data
• Complexities associated with data volume
• Domain complexities
• Systems are interconnected
• Data gathering and sampling
• Interesting signals hidden by stronger patterns
• Complexities caused by local variation
• Terabyte databases
• Interpretation of aggregated data
• Formalizing the domain

Background: Issues with Hard Clustering
• Issue: Hard clustering forces data with imprecision and/or uncertainty into discrete classes
• Result: Important outliers and boundary patterns are missed
• Approach: Use of an approximate clustering technique

Background: K-Means Clustering
• Partition the data into K clusters that are homogeneous
• Algorithm:
  • Select K time series as initial centroids
  • Assign all time series to the most similar centroid
  • Re-compute the centroids
  • Repeat until the centroids do not change
• Variations are based on different measures of similarity

Unsupervised Fuzzy K-Means (UKFM) Clustering
• Choose the initial number of clusters
• Develop a clustering using Fuzzy K-Means
• Merge the cluster pair that has maximum correlation
• Compute the validity measure
• Repeat until the termination condition is reached

UKFM Results: Weather Data Set
[Figure panels: Initial: 11 Clusters; Optimal: 8 Clusters; Final: 4 Clusters]

Global Earth Science Data
• Collaborative effort with V. Kumar (UMinn)
• Test bed for UKFM (comparison with existing techniques)
• Data Set:
  • Ocean Climate Indices
  • Global Sea Pressure (1989-1993)
  • Capture Teleconnections
• Result: UKFM can capture even weaker OCIs using coarse clusters

Global Climate Data (Sea Level Pressure)
[Figure: Intermediate: 60 Clusters]

Global Climate Data (Sea Level Pressure)
[Figure: Final: 26 Clusters]

Relation with SOI

Integrating Multiple Datasets in UKFM Clustering
• Motivation: Data-based approach to determining "interesting" clusters
• Validate using multiple datasets
• Rule: Retain clusters that have supporting data
• Applicable in data-rich environments

UKFM Clustering with Multi-Dataset Validation
• Choose the initial number of clusters
• Develop a clustering using Fuzzy K-Means
• Validate each cluster with the other datasets D_i, i = 1, ..., n
• Merge if the clusters are uncorrelated; else consider the next candidate pair to merge
• Repeat until the termination condition is reached

UKFM Multi-Dataset Results
[Figure panels: Height, Windspeed, Pressure, Temperature]

Multi-threading Parallel Algorithm
For each clustering stage
  For each iteration
    Slaves: Calculate M for each cluster
    Master: Normalize M
    Slaves: Calculate C for each cluster
    Master: Normalize C

Multi-threading Result
• Implemented on a Sun Fire workstation with four 900-MHz UltraSPARC® III processors
• Near-linear speedup obtained

Relevance to the Army
• Directly supports the FBKOF STO (B. Broome)
• Development of the Weather Information and Tactical Support (WITS) System

Weather Information and Tactical Support (WITS)
• Objective: Extract patterns from weather data and fuse them with external databases (logistics, terrain, forces, etc.) for higher-level planning

Approach
• Development of an OLAP Weather Repository
  • GA Weather (1981-2002)
  • Sources: Nat. Weather Svc, GA Env. Network
• Development of WITS Modules
  • Ad-hoc Querying
  • Real-time Analysis and Planning
  • Effects on Army Systems
  • Integration with IWEDA
• Abstract Data Representation
[Figure labels: MONTH, DAY, YEAR; TEMPERATURE, PRECIPITATION, WIND SPEED, etc.]

WITS System Design
[Figure components: TAPS Module; Data Mining Modules; Data Warehouse; User Interface; Knowledge Bases (IWEDA); Data Cleaning & Transformation; Query Modules; Data Acquisition Agents; IQ Module; Real Time Module]

[Figure-only slides: WITS/IQ (two slides); WITS/IWEDA; WITS/Analysis (two slides)]

Work in Progress
• Characterization of Analysis Queries
• Incorporation of Data Mining Algorithms into WITS
• Enhancement of WITS/TAPS
• Implementation of WITS/Real

Hybrid Genetic Fuzzy Systems for Feature Extraction and Knowledge Discovery

Project Goals
• Design and implement a hybrid genetic fuzzy system for knowledge discovery
• Develop API/Tools
• Apply the tools to Army-related problems

Contribution
• Hybrid system based on the Simple Genetic Algorithm (SGA)
• Enhanced the SGA by adding three levels of knowledge discovery
  • Level 1: Discovers up to k possible rules for a given set of inputs and outputs. It then attempts to minimize the number of rules and tune the knowledge base.
  • Level 2: Takes the set of rules from Level 1 and further minimizes the rules. It also tunes the knowledge base.
  • Level 3: Makes one last attempt to further tune the architecture of the knowledge base.

Rule Discovery
• Search for k possible rules from the set of p possible rules
• k is an input parameter of the GA application
• Discover the smallest value of k, thereby reducing the number of rules needed
• Example rules:
  • If INPUT_1 is low AND INPUT_2 is medium THEN OUTPUT_1 is high
  • If INPUT_1 is high THEN OUTPUT_1 is low

Relevance to the Army
• Collaborators: Jeff Passner, John Raby (ARL)
• IMETS weather modeling
• Post-processing used to predict additional parameters: visibility, turbulence, fog, etc.
• Use of knowledge discovery to predict parameters

Visibility Application
• Generate and tune a system that can predict visibility based on input parameters
• Tasks for the fuzzy genetic system:
  • Search for a set of k rules from p possible rules that describe the relationship of the input parameters with the output (visibility)
  • Concurrently discover the architecture and optimize the performance of the knowledge bases in relation to the k rules

Results for Low Visibility Classifier

Results for Medium Visibility Classifier
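
Illustrative sketch: evaluating fuzzy rules
The slides do not show how a discovered rule set is evaluated, so the following is only a minimal sketch of one common way to do it, using the two example rules from the Rule Discovery slide. Everything beyond the rule text is an assumption made for illustration: the inputs are scaled to [0, 1], the terms low/medium/high are triangular membership functions, AND is taken as the minimum, and the output is a weighted average of representative output levels. The names triangle, TERMS, OUTPUT_LEVEL, RULES, firing_strength, and infer are hypothetical and do not come from the project's API.

```python
# Illustrative sketch only. Membership shapes, the [0, 1] input scaling, the
# min operator for AND, and the weighted-average output are all assumptions;
# only the two rules themselves are taken from the slides.

def triangle(x, left, peak, right):
    """Triangular membership: 0 at or outside left/right, 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Assumed linguistic terms on a [0, 1] scale, as (left, peak, right).
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Assumed representative output value for each output term.
OUTPUT_LEVEL = {"low": 0.0, "medium": 0.5, "high": 1.0}

# The two example rules: (antecedent terms per input, consequent term).
RULES = [
    ({"INPUT_1": "low", "INPUT_2": "medium"}, "high"),
    ({"INPUT_1": "high"}, "low"),
]

def firing_strength(antecedent, inputs):
    """Degree to which a rule applies: minimum over its AND-ed conditions."""
    return min(
        triangle(inputs[name], *TERMS[term])
        for name, term in antecedent.items()
    )

def infer(rules, inputs):
    """Weighted average of rule outputs; None if no rule fires."""
    weighted = [
        (firing_strength(antecedent, inputs), OUTPUT_LEVEL[consequent])
        for antecedent, consequent in rules
    ]
    total = sum(weight for weight, _ in weighted)
    if total == 0:
        return None
    return sum(weight * level for weight, level in weighted) / total
```

Under these assumptions, infer(RULES, {"INPUT_1": 0.0, "INPUT_2": 0.5}) fires only the first rule and returns 1.0 (OUTPUT_1 is high), while an INPUT_1 of 1.0 fires only the second rule and returns 0.0. In a genetic fuzzy system of the kind described, the rule list and the membership parameters would be the quantities the search tunes; this sketch fixes them by hand.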