* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download expedition_RSV
Global warming controversy wikipedia , lookup
Global warming wikipedia , lookup
Soon and Baliunas controversy wikipedia , lookup
Heaven and Earth (book) wikipedia , lookup
Climate resilience wikipedia , lookup
Atmospheric model wikipedia , lookup
Politics of global warming wikipedia , lookup
Economics of global warming wikipedia , lookup
Climate change adaptation wikipedia , lookup
Climate change denial wikipedia , lookup
Climate change feedback wikipedia , lookup
Climate change in Tuvalu wikipedia , lookup
Michael E. Mann wikipedia , lookup
Instrumental temperature record wikipedia , lookup
Fred Singer wikipedia , lookup
Climate change and agriculture wikipedia , lookup
Climate engineering wikipedia , lookup
Citizens' Climate Lobby wikipedia , lookup
Climate change in the United States wikipedia , lookup
Public opinion on global warming wikipedia , lookup
Climatic Research Unit email controversy wikipedia , lookup
Media coverage of global warming wikipedia , lookup
Climate governance wikipedia , lookup
Solar radiation management wikipedia , lookup
Climate sensitivity wikipedia , lookup
Attribution of recent climate change wikipedia , lookup
Scientific opinion on climate change wikipedia , lookup
Climate change and poverty wikipedia , lookup
Effects of global warming on humans wikipedia , lookup
Climate change, industry and society wikipedia , lookup
IPCC Fourth Assessment Report wikipedia , lookup
General circulation model wikipedia , lookup
Surveys of scientists' views on climate change wikipedia , lookup
Mining Climate and Ecosystem Data l l l l l l Team Motivation Transformative Computer Science Research – Predictive Modeling – Complex Networks – Association Analysis – High Performance Computing Evaluation Plan Relationship to Physics Based Models Management and Collaboration Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› Team Vipin Kumar, UM Fred Semazzi, NCSU Auroop Ganguly, UTK/ORNL Nagiza Samatova, NCSU Arindam Banerjee, UM Joe Knight, UM Shashi Shekhar, UM Peter Snyder, UM Jon Foley, UM Alok Choudhary, NW Abdollah Homiafar, NCA&T Michael Steinbach, UM Singdhansu Chatterjee, UM Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› Team Qualifications Team members are well-recognized experts in Analysis of ecosystem and climate data • Foley, Ganguly, Kumar, Semazzi, Snyder, Steinbach Data mining, machine learning, nonlinear dynamics & signal processing • Banerjee, Chatterjee, Choudhary, Ganguly, Homaifar, Kumar, Samatova, Shekhar, Steinbach High performance computing • Choudhary, Kumar, Samatova Weather and climate models • Ganguly, Foley, Semazzi, Snyder Remote sensing and land cover change • Foley, Knight, Snyder Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› • The planet is warming • Multiple lines of evidence • Credible link to human GHG (green house gas) emissions • Consequences can be dire • Extreme weather events • Regional climate and ecosystem shifts • Abrupt climate change • Stress on key resources and critical infrastructures • There is an urgency to act • Adaptation: “Manage the unavoidabale” • Mitigation: “Avoid the unmanageable” • The societal cost of both action and inaction is large Mining Climate and Ecosystem Data Anomalies from 1880-1919 (K) Climate Change: The defining issue of our era Figure Courtesy: ORNL Key outstanding science challenge: Actionable predictive insights to credibly inform policy NSF RSV June 2010 ‹#› Physics-based Models are Essential but Not Adequate l l Models make relatively reliable predictions at global Disagreement between IPCC models scale for ancillary variables: – Sea Surface Temperature (SST) – Temperature/humidity profiles over land – Wind spread at different heights They provide least reliable predictions for variables that are crucial for impact assessment: – Regional precipitation and extremes – Hurricane intensity and frequency – Droughts and floods “The sad truth of climate science is that the most crucial information is the least reliable” (Nature, 2010) Regional hydrology (“P–E” changes in 2030s) exhibits large variations among major IPCC model projections Hypothesis-driven “manual” conceptual models try to address this gap: l l Hurricane models (Emanuel et al, BAMS, 2008) Regional-scale precipitation extremes (O’Gorman & Schneider, PNAS, 2008; Sugiyama et al, PNAS, 2010) We need a systematic approach to semi-automatic data-driven model inference. Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› Data-Driven Knowledge Discovery in Climate Science l l From data-poor to data-rich transformation – Sensor Observations: Remote sensors like satellites and weather radars as well as in-situ sensors and sensor networks like weather station and radiation measurements – Model Simulations: IPCC climate or earth system models as well as regional models of climate and hydrology, along with observed data based model reconstructions Data-guided insights can complement physics-based models – Transform global ancillary information to regional critical climate changes and extremes – Assess reliability of model projections and inform physics model parameterizations – Validate predictions with both held-out data and science understanding – Provide relatively hypothesis-free discovery processes to supplement hypothesisguided data analysis "The world of science has changed ... data-intensive science [is] so different that it is worth distinguishing [it] … as a new, fourth paradigm for scientific exploration." - Jim Gray Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› Challenges in Analyzing Eco-Climate Data Global Sea Surface Temperature Discovering teleconnections El Nino Events Correlation Between ANOM 1+2 and Land Temp (>0.2) 90 0.8 0.6 60 0.4 30 latitude 0.2 Nino 1+2 Index 0 0 -0.2 -30 -0.4 -60 -0.6 -0.8 -90 -180 -150 -120 -90 -60 -30 0 30 60 90 120 150 180 longitude Challenges due to data characteristics • • • • • • • Spatiotemporal, non-stationary, non-i.i.d. Nonlinear multiscale dependencies Low frequency variability Massive data sets Long range spatial dependencies Long memory temporal dependencies … Relationship between El Nino and Fires in Indonesia Changes in Global Forest Cover Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› Transformative Computer Science Research Enabling large-scale data-driven science for complex, multivariate, spatio-temporal, non-linear, and dynamic systems: The Expedition is an end-to-end demonstration of this major paradigm for future knowledge discovery process. Relationship Mining Enable discovery of complex dependence structures such as non linear associations or long range spatial dependencies Nonlinear, spatio-temporal, multivariate, persistence, long memory Complex Networks Enable studying of collective behavior of interacting ecoclimate systems Nonlinear, space-time lag, geographical, multi-scale • Fusion plasma • Combustion • Astrophysics • …. Predictive Modeling Enable predictive modeling relationships Community structure- of typical and extreme behavior from multivariate function-dynamics spatio-temporal data kernels , features, dependencies Nonlinear, spatio-temporal, multivariate High Performance Computing Enable efficient large-scale spatio-temporal analytics on future generation exascale HPC platforms with complex memory hierarchies Large scale, spatio-temporal, unstructured, dynamic Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› Relationship Mining Objective • Computer Science Innovations • • • • Dynamic behavior of the high and low pressure fields corresponding to NOA climate index (Portis et al, 2001) To develop methods for discovery of complex dependence structures (e.g., nonlinear associations, long-range spatial dependencies) Define a notion of “transactions” for association analysis in spatio-temporal data Adapt association analysis to continuous data Reduce the number of redundant patterns and assess statistical significance of the identified patterns Define a new approach to association analysis that trades off completeness for a smaller, simpler set of patterns and far smaller computational requirements Mining Climate and Ecosystem Data NSF RSV Impact • Facilitate discovery of complex dependence structures in dynamic systems • Understand relationships between fire size and frequency to precipitation, land cover, ocean temperature, etc • Characterize impacts of sea surface temperature on hurricane frequency and intensity • Understand feedback effects, the key uncertainty in climate science June 2010 ‹#› Complex Networks Objectives • To develop algorithms for characterization of network structure, function, and dynamics • To enable large-scale comparative analysis of climate networks to increase prediction confidence Node degree in a correlation-based climate network Computer Science Innovations Impact • Construct multivariate networks to capture nonlinear spatio-temporal interactions • • Characterize network dynamics at multiple spatial scales over different time periods • • Scale graph mining algorithms to the required size and number for comparative analysis of multiple real-world networks Understand intricate interplay between topology and dynamics of the climate system over many spatial scales • Explain the 20th century great climate shifts from the collective behavior of interacting subsystems Mining Climate and Ecosystem Data NSF RSV Apply network structure-function-dynamics methods to complex systems June 2010 ‹#› Alok Choudhary, NWU Nagiza Samatova, NCSU Vipin Kumar, UMN Predictive Modeling Objectives • To advance nonparametric multivariate spatiotemporal probabilistic regression models to incorporate nonlinear dependencies = GP1 To develop latent variable models to characterize spatio-temporal climate states • To advance multivariate extreme value theory through geometric and probabilistic generalizations of quantiles • • • • Take into account nonlinear spatial, temporal, and multivariate dependencies Minimize assumptions on temporal evolution, e.g., no`Markov’ assumptions Deal with curse of dimensionality in nonlinear extreme value analysis (recency, frequency, duration) and quantiles Scale algorithms yet ensure uncertainty quantification Mining Climate and Ecosystem Data GP2 Correlated Gaussian Processes for multivariate spatiotemporal regression w/ nonlinear dependencies • Computer Science Innovations + Multivariate quantiles for extreme value analysis • Impact • • • NSF RSV Uncertainty quantified predictions of climate variables, e.g., precipitation, over space-time Abrupt climate change detection, typical climate characterization Modeling regional climate change and extreme climate events, e.g., hurricanes Applications beyond climate data, e.g., finance, healthcare, bioinformatics, networks June 2010 ‹#› High Performance Computing Objectives • To investigate a “co-design” approach to scalable analysis of complex dependence structures • To develop parallel and scalable statistical and data mining functions and kernels • To develop sophisticated parallel I/O techniques that fully exploit complex memory hierarchy (disks, SSDs, DRAMs, accelerators) Computer Science Innovations • • The first“co-design” approach that will take into account exploration of nonlinear associations, long-range spatial dependencies, I/O requirements and optimizations, presence of accelerators and multiple suitable programming models for HPC systems Novel optimizations to deal with data-andcompute intensive computations on exascale platforms with complex memory hierarchies Mining Climate and Ecosystem Data NSF RSV An architecture with accelerators such as GPUs. Some/all nodes have accelerators and many cores. Impact • • Analytical kernels that will scale many data mining algorithms Analytics algorithms designed for effective exploration of complex spatiotemporal dependence structures June 2010 ‹#› Synergy with Physical Modeling Community CS and Climate Science Synergies • • Physics based models inform data mining • More credible variables (e.g., SST) more crucial variables (e.g., precipitations/hurricanes) • Better modeled processes (e.g., atmospheric physics) more crucial processes (e.g., land surface hydrology) • Global and century scale regional and decadal scale • • • Foley, Ganguly, Semazzi, and Snyder Collaborators: Potter (NASA), Erickson (ORNL) Active involvement in the IPCC assessment reports Strong Ties w/ Climate Community Data mining informs physics based models • • Prominent climate science expertise on the team • Better understanding of climate processes Partnership with ORNL, NCAR, LLNL, LBL climate modeling and simulated data archival groups: • Improved insights for parametrization schemes • Partnership with NASA, ORNL, NOAA observed and reanalysis data archival and analysis groups: • Mining Climate and Ecosystem Data NSF RSV Community Climate System Model (NCAR, ORNL, LBL), Earth Systems Grid, IPCC data archives (PCMDI at LLNL and HPSS at ORNL), Climate data analysis and extremes (ORNL, LBL) Remote sensed satellite and other data (NASA), in situ sensors like DOE's ARM (ORNL, PNNL), Reanalysis and observed data assimilation (NOAA) June 2010 ‹#› Measures of Success in CS Research Does the proposed research enable significantly better data analysis than before or enable new kinds of analysis? l l Predictive Modeling – Improved capabilities for multivariate spatio-temporal regression that can simultaneously capture the dependencies between spatial-temporal objects (e.g., temperature, pressure) and help find novel climate patterns not found by standard approaches – New capabilities for detecting extreme events using quantiles and quantile regression for multivariate data Complex Networks – Detection of known climate patterns and tracking of the evolution of climate patterns in time using complex networks that capture non-linear relationships in multivariate and multiscale data Mining Climate and Ecosystem Data l l NSF RSV Relationship Mining – Creation of entirely new capabilities in association mining from approaches that extend / modify traditional approaches to handle spatio-temporal data – Creation of a completely novel approach for association analysis that reduces the number of patterns and the time required to find them – Improvement in the performance of complex networks and predictive models by using nonlinear relationships that more faithfully represent reality High Performance Computing – Scalable analytics code for spatio-temporal data – Enabling of large scale data driven science that serves as a demonstration of the value of the data driven paradigm June 2010 ‹#› Climate-based Measures of Success l Ultimate success is to have data-driven analysis included as a standard part of climate projections and impact assessment (e.g., for IPCC). l Achieving this will require measuring and demonstrating success in two key areas – Filling critical gaps in climate science (Nature, 2010) – Improve climate change impact assessments and inform policy l Reduction of uncertainty in climate predictions at regional and decadal scales Improved predictive insights for key precipitation processes Providing credible assessments of extreme hydro-meteorological events New knowledge about climate processes (e.g., teleconnection patterns) Distinguishing between natural and anthropogenic causes Improved assessments of climate change risks across multiple sectors Specific use cases will provide the context for these evaluations Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› Science Impact View: Evaluation Methodology • Can we improve the projection (e.g., hurricane frequency, intensity)? • Can the data-driven model identify: • • which regional climate variables are informative/causal (e.g., SST, wind speed)? • what is the relationship (+/- feedback) between these variables? To what extent does the data-driven model inform climate scientists? Forecast & Compare w/ Train Model Observation Data Hindcast Prediction 1870 1950 • Reanalysis data • Observed data Forecast & Multimodel Comparison 2010 (now) 2100 • Model projection data • Observed data Illustrative Example for Hurricane Frequency Use Case Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› Conceptual View: Driving Use-Cases CS Technologies Characterize feedbacks contributing to abrupt climate regime changes Yr5+ Yr5 Identify drought and pluvial event triggers to improve prediction Understand and quantify the uncertainty of feedback effects in climate Yr4 Project climate extremes at regional and decadal scales Yr3 Understand relationships between fire size and frequency to precipitation, land cover, ocean temperature, etc. Yr2 Yr1 Characterize impacts of sea surface temperature on hurricane frequency and intensity Data Challenges Non-stationary Long-memory, long- Heterogeneous range, multiscale non-i.i.d Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› CS Technology View: Deliverables Roadmap Year 1-2 Multivariate spatio-temporal regression, Climate state estimation Extreme value theory based Uncertainty quantification in on geom./prob. quantiles single regime-shift detection Multivariate network Handling nonlinear spatio- Dynamic community construction temporal interactions detection & tracking A notion of “transactions” in spatio-temporal data Year 5-5+ Year 3-4 Adapt association analysis to continuous data Impact of SST on hurricane intensity & frequency Scalable parallel analysis kernels Statistical significance assessment of relationships Parallel I/O techniques to exploit memory hierarchy Identify drought and pluvial event triggers to improve prediction Mining Climate and Ecosystem Data Graph perturbation theory for “what-if” exploration NSF RSV Attribution in multiregime-shifts Comparative multiple network analysis Detection of nonlinear associations/relationships Co-design scalable complex dependence structures Data driven analysis as a standard (e.g. in IPCC) Community software, benchmarks outreach June 2010 ‹#› Use Case: Hurricane Frequency Steps for Discovery of Multivariate Non-linear Interactions Yr1: UTK/ORNL, UMN, NWU Yr1.5: NCSU, UMN, NWU IPCC AR4 Models: CMIP3 datasets Monthly mean sea surface temperature Monthly mean atmospheric temperature Daily horizontal wind at 250/850 hPa 1. Pre-process ancillary climate model outputs 2. Construct multivariate nonlinear climate network 3. Detect & track communities Steps for Predictive Modeling of Hurricanes Yr1.5: UMN, UTK/ORNL Yr2: UMN, UTK/ORNL Yr2.5: UMN, UTK/ORNL SST impact on hurricane frequency & intensity 3. Find non-linear relationships 4. Validate w/ hindcasts Mining Climate and Ecosystem Data NSF RSV 5. Determine non-stationary non-i.i.d. climates states & Build hurricane models June 2010 ‹#› Use Case: Regional Precipitation Step1: Conceptual physics models (O’Gorman and Schneider 2009) and relationship mining identify variables in 3D (space, time, vertical) neighborhoods with information relevant for predicting precipitation Step 2: Precipitation mean and extremes are projected with ancillary variables in 3D neighborhoods with predictive modeling Step 3: Complex networks are constructed over oceans using relationship mining Step 4: Complex networks develop proxies for global and regional scale ocean dynamics leading to set of potential predictors Step 6: The (Step 2) 3D Step 5: Teleconnections are developed to neighborhood-based predictions predict regional precipitation change and their are combined with teleconnctionextremes based on both relationship mining based predictions with fusion of and predictive modeling predictive modeling Mining Climate and Ecosystem Data NSF RSV Step 7: Regional precipitation prediction gains are run through cross-validation and interpreted with climate science June 2010 ‹#› Use Case: 1930s Dust Bowl 1930s Dust Bowl Drought Affected almost two-thirds of the U.S. Centered over the agriculturally productive Great Plains Drought initiated by anomalous tropical SSTs (Teleconnections) NASA NSIPP Model: 14 C20C datasets Ensemble of 14 100-year simulations of the 20th Century Forced with observed monthly SSTs Allows for assessment of how much the SSTs control Great Plains climate variations Information content in C20C model simulations along with ancillary However, the abrupt change observations can be used to identify in the precipitation regime and decade-long persistence and predict drought “triggers” and the contribution of complex feedbacks to of drought thought to be abrupt precipitation regime changes related to strong landand drought persistence atmosphere feedbacks (e.g., precipitation recycling) Global SST anomalies averaged over 1932-1938. Boxes represent subregions used to identify the relative importance of SST anomalies in initiating the the Dust Bowl drought. Time series of precipitation anomalies averaged over the Great Plains region for 14 ensemble members of C20C run. Anomalous Tropical SSTs explain most of the 20th Century drought events. Mining Climate and Ecosystem Data NSF RSV June 2010 ‹#› From Schubert et al. (2004) Use Case: Regime Shift in Sahara & Sahel Regime Shift in Sahara Sudden shift from `Green Sahara’ to `Desert Sahara’ Sahara consisted of extensive vegetation, lakes, wetlands according to Geologic data. Complex system with more than one stable state. Vegetation cover and precipitation patterns Sudden transition from vegetated to desert conditions around 5500 years ago Sahel zone -- What were underlying conditions? -- What triggered the regime shift? Data analysis for hypothesis generation, testing: Sahel drought led to widespread famine, ecosystem degradation, and dispersion of its inhabitants Reduction in Precipitation Such sudden, large change in environmental–or “regime shift”—often occurs without advance warning. Underlying conditions may predispose a system to a regime shift. A fairly small event, e.g., storm, drought, etc., may trigger the shift to a new stable state. Mining Climate and Ecosystem Data Northern Africa is dominated by the Sahara. The transition between the Sahara and the savanna/tropical forest to the south occurs in the Sahel. Key Questions stay unanswered: Regime Shift in Sahel Sahel is the transition zone between the Sahara to the north and the tropical savanna and evergreen forests to the south Onset of major 30-year drought over the Sahel region in 1969 Precipitation patterns are tightly correlated with vegetation cover patterns. NSF RSV + Reduction in Vegetation Possible hypothesis: Land-atmosphere feedback process June 2010 ‹#›