Comparing Predictive Power in Climate Data: Clustering Matters
Karsten Steinhaeuser
University of Minnesota
joint work with Nitesh Chawla & Auroop Ganguly
12th International Symposium on Spatial and Temporal Databases
Minneapolis, MN
August 24, 2011

Outline
• Motivation
• Networks Primer
• From Data to Networks
• Motivating Networks in Climate Science
• Descriptive Analysis and Predictive Modeling
• Empirical Evaluation & Comparison
• Conclusions

08/24/2011 University of Minnesota 2

Mining Complex Data
• Complex spatio-temporal data pose unique challenges
• Tobler’s First Law of Geography: “Everything is related, but near things more than distant.”
– But are all near things equally related?
– Are there phenomena explained by interactions among distant things? (teleconnections)

Networks Primer: What is a Network?
• Oxford English Dictionary: network, n.: Any netlike or complex system or collection of interrelated things, as topographical features, lines of transportation, or telecommunications routes (esp. telephone lines).
• My working definition: Any set of items that are connected or related to each other. (“items” and “connections” can be concrete or abstract)

Networks Primer: Community Detection in Networks
• Identify groups of nodes that are relatively more tightly connected to each other than to other nodes in the network
• Computationally challenging problem for real-world networks

From Data to Networks
• Networks are pervasive in social science, technology, and nature
• Many datasets explicitly define network structure
• But networks can also represent other types of data, providing a framework for identifying relationships, patterns, etc.
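As an illustration of how data without explicit network structure can still be represented as a network, here is a minimal sketch. It assumes each item is a time series, that relatedness is measured by the absolute Pearson correlation, and that weak links are dropped below a cutoff. The function names, the toy series, and the 0.5 threshold are all hypothetical choices for illustration, not values from the talk.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_network(series, threshold=0.5):
    """Return {(u, v): weight} for node pairs with |correlation| >= threshold.

    Assumption: link strength is the absolute correlation, and the
    threshold used for pruning is an arbitrary illustrative value.
    """
    edges = {}
    for u, v in combinations(sorted(series), 2):
        weight = abs(pearson(series[u], series[v]))
        if weight >= threshold:
            edges[(u, v)] = weight
    return edges


# Hypothetical toy data: three "locations", six time steps each.
series = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.1, 3.9, 6.2, 8.1, 9.8, 12.0],  # closely tracks A
    "C": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],   # only weakly related to A and B
}
edges = correlation_network(series, threshold=0.5)
```

With this toy input only the A–B link survives pruning, since C correlates weakly with both other series. Real gridded data would have many more nodes, and the choice of cutoff would need justification.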
Motivating Networks in Climate
Uncertainty derives from many known and often many more unknown sources

Motivating Networks in Climate
• Projections of climate rely on many factors
– Understanding of the physical processes
– Ability to implement this understanding in computational models
– Assumptions about the future
Source: IPCC SRES and AR-4

Motivating Networks in Climate
• Some processes well-understood and modeled, others much less credible
• Comparison to observations shows varying skills

Motivating Networks in Climate
• Models cannot capture some features/processes
• Comparison to observations illustrates severe geographic variability, topographic bias

Motivating Networks in Climate
Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?
Answer: Stay Tuned…

Knowledge Discovery for Climate
[Diagram. Labels: Data Storage & Management; Knowledge Discovery; System Inputs (Observations, GCM outputs); HPSS; Complex Networks; Oracle DB; Data Mining; High-Performance Computing; HPCC; Lens - Jaguar; Visualization; ARM Data; ArcGIS; Basic OLAP Capabilities; PowerWall; System Outputs (Novel Insights, Decision Support)]

Historical Climate Data
• NCEP/NCAR Reanalysis (proxy for observation)
• Monthly for 60 years (1948-2007) on 5° x 5° grid
• Seven variables:
– Sea surface temperature (SST)
– Sea level pressure (SLP)
– Geopotential height (GH)
– Precipitable water (PW)
– Relative Humidity (RH)
– Horizontal wind speed (HWS)
– Vertical wind speed (VWS)
[Figure labels: Raw Data, De-Seasonalize, Anomaly Series]

Network Construction
• View global climate system as a collection of interacting oscillators [Tsonis & Roebber, 2004]
– Vertices represent locations in space
– Edges denote correlation in variability
• Link strength estimated by correlation, low-weight edges are pruned from the network
• Construct networks only for locations over the oceans
– Relatively better captured by models

Geographic Properties
• Examine network structure in spatial context
– Link lengths computed as great-circle distance
– Compare autocorrelation / de-correlation lengths for different variables, interpret within the domain
[Figure panels: Sea Level Pressure, Precipitable Water, Vertical Wind Speed; annotations: Autocorrelation, Teleconnection]

Clustering Climate Networks
• Apply community detection to partition networks
– Use Walktrap algorithm [Pons & Latapy, 2006]
– Efficient and works well for dense networks
• Visualize spatial pattern using GIS tools
• Cluster structure suggests relationships within the climate system
[Figure panels: Sea Level Pressure, Precipitable Water]

Update!
Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?
Revised Answer: Yes… but that’s not all.

Descriptive Predictive
• Network representation is able to capture interactions, reveal patterns in climate
– Validate existing assumptions / knowledge
– Suggest potentially new insights or hypotheses for climate science
• Want to extract the relationships between atmospheric dynamics over ocean and land
– i.e., “Learn” physical phenomena from the data

Predictive Modeling
• Use network clusters as candidate predictors
• Create response variables for target regions around the globe (illustrated below)
• Build regression model relating ocean clusters to land climate

Illustrative Example
• Predictive model for air temperature in Peru
– Long-term variability highly predictable due to well-documented relation to El Niño
• Small number of clusters have majority of skill
– Feature selection (blue line) improves predictions
[Figure legend: Raw Data, All Clusters, Feature Selection]

Update!
Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?
Revised Answer: Yes and Yes… but wait, there’s more.

Results on Train/Test

Predictive Skill

Update!
Research Question: Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?
Revised Answer: Yes, Yes, and Yes.
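The predictive modeling idea from the slides above (ocean clusters as candidate predictors, a regression onto a land response, feature selection, and a train/test evaluation) can be sketched roughly as follows. This is a deliberately simplified stand-in, not the method used in the talk: it fits a one-predictor least-squares line per cluster and keeps the cluster with the lowest training error, whereas the talk describes regression over multiple clusters. All names and numbers are hypothetical, and the response is constructed as 2 * ocean_cluster_1 + 1 so that the expected selection is known.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(pred, actual):
    """Root-mean-square error between predictions and observations."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def select_and_evaluate(clusters, response, n_train):
    """Pick the cluster with the lowest training RMSE; return its test RMSE.

    Simplification (assumption): one predictor at a time, chosen greedily
    on the training period only.
    """
    best = None
    for name, x in clusters.items():
        slope, intercept = fit_line(x[:n_train], response[:n_train])
        train_pred = [slope * v + intercept for v in x[:n_train]]
        err = rmse(train_pred, response[:n_train])
        if best is None or err < best[1]:
            best = (name, err, slope, intercept)
    name, _, slope, intercept = best
    test_pred = [slope * v + intercept for v in clusters[name][n_train:]]
    return name, rmse(test_pred, response[n_train:])


# Hypothetical cluster-mean anomaly series (eight time steps each).
clusters = {
    "ocean_cluster_1": [0.2, -0.1, 0.4, 0.9, -0.5, 0.3, 0.7, -0.2],
    "ocean_cluster_2": [0.5, 0.5, -0.3, 0.1, 0.2, -0.4, 0.0, 0.6],
}
# Hypothetical land response, built as 2 * ocean_cluster_1 + 1.
response = [1.4, 0.8, 1.8, 2.8, 0.0, 1.6, 2.4, 0.6]

selected, test_error = select_and_evaluate(clusters, response, n_train=6)
```

Selecting on the training period and scoring on the held-out period keeps the skill estimate from being inflated by the selection step itself. A multi-predictor model would replace `fit_line` with a multivariate fit.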
Variations / Extensions
• Compare network approach to traditional clustering methods
– k-means, k-medoids, spectral, EM, etc.
• Compare different types of predictive models
– (linear) regression, regression trees, neural nets, support vector regression

Compare Clustering Methods

Compare Predictive Models

Refining Model Projections

Conclusions
• Networks capture behavior of the climate system
• Clusters (or “communities”) derived from these networks have useful predictive skill
– Statistically significantly better than predictors based on clusters derived using traditional methods
• Potential for advancing climate science
– Understanding of physical processes
– Complement climate model simulations

Upcoming Events
1. First International Workshop on Climate Informatics, New York, NY, August 26, 2011
http://www.nyas.org/climateinformatics
2. NASA Conference on Intelligent Data Understanding (CIDU), Mountain View, CA, Oct 19-21, 2011
https://c3.ndc.nasa.gov/dashlink/projects/43/
3. IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Vancouver, Canada, December 10, 2011
http://www.nd.edu/~dial/climkd11/

Thanks & Questions
Contact: [email protected]
Personal Homepage: http://www.nd.edu/~ksteinha
NSF Expeditions on Understanding Climate Change: http://climatechange.cs.umn.edu

This work was supported in part by the National Science Foundation under Grants OCI-1029584 and BCS-0826958.
This research was also funded in part by the project entitled “Uncertainty Assessment and Reduction for Climate Extremes and Climate Change Impacts” under the initiative “Understanding Climate Change Impact: Energy, Carbon, and Water Initiative” within the Laboratory Directed Research and Development (LDRD) Program of the Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract DE-AC05-00OR22725.