2011 Japan-America Frontiers of Engineering Symposium, June 6-8, 2011

Slide 1
Temporal Causal Models for Massive Time-series Data Mining: Climate Change Attribution and other Applications
Yan Liu
Computer Science Department, Viterbi School of Engineering, University of Southern California

Slide 2
Climate Change: One of the Most Critical Issues Mankind Faces in the 21st Century

Slide 3
Understanding the Climate System is Imperative to Devising Potential Solutions
  The climate system involves complex relationships between a large number of variables
  Need to understand and quantify the “causal” effects of the various parameters

Slide 4
Challenges with Existing Climate Models
  23 widely used global climate models
  Forward-simulation approach
  Model inter-comparison project: http://www.clivar.org/organization/wgcm/cmip.php

Slide 5
Massive Amount of Spatial-temporal Data on Climate and Climate-forcing Agents
  Surface and atmospheric climate
  Human agents: land cover
  Snow, ice and frozen ground
  Human agents: atmospheric constituents
  Solar radiation

Slide 6
Massive Climate Data: New Opportunities for Machine Learning

Slide 7
Machine Learning Solution for Climate Modeling and Analysis
  (Figure labels: Input; Output; Climate Change Attribution Analysis)

Slide 8
Roadmap
  Introduction of Granger Graphical Models
  Examples of Granger Graphical Models
  Granger Graphical Models for Climate Change Attribution
  Experiment Results on Biology Applications

Slide 9
Roadmap (as on Slide 8)

Slide 10
Temporal Causal Modeling by Graphical Granger Modeling Methods
  Our proposed approach for time-series analysis: graphical modeling using the notions of Granger causality and methods of variable selection
  Granger causality was introduced by the Nobel Prize-winning economist Clive Granger
  Definition: a time series x is said to “Granger cause” another time series y if and only if regressing y on past values of both y and x is statistically significantly better than regressing y on past values of y only
  (Diagram: x and y)

Slide 11
Graph Structure Learning
  Graph structure learning [Heckerman, 1995] has been an active research area for decades
  Recent progress on L1-penalized regression methods for graph structure learning:
    LASSO regression for neighborhood selection [Meinshausen and Bühlmann, Ann. Stat. 06]
      Consider the p-dimensional multivariate normal distributed random variable $X = (X_1, \ldots, X_p) \sim \mathcal{N}(\mu, \Sigma)$
      The neighborhood selection can be solved efficiently with the LASSO
    Block sub-gradient algorithm for finding the precision matrix [Banerjee, JMLR 08]
    Efficient fixed-point equations based on a sub-gradient algorithm [Friedman et al., Biostatistics 08]

Slide 12
Generic Temporal Causal Modeling Method [KDD 2007, joint work with Arnold, Abe]
  Neighborhood selection: an example of REG can be Lasso [Tibshirani, 1996]
  Granger causality
  Structure learning is possible even when the number of variables is significantly larger than the number of samples

Slide 13
Temporal Causal Modeling for Time-series Data Analysis
  Natural grouping of variables: group Lasso and group boosting [KDD 2009; ISMB 2009, with Lozano, Abe and Rosset]
  Non-stationary: dynamic linear system [KDD 2009, with Kalagnanam and Johnsen]
  Non-linear time-series: non-parametric approach [AAAI 2010, with Chen, Liu and Carbonell]
  Spatial time-series: spatio-temporal regression via group elastic net [KDD 2009, with Lozano et al.]
  Relational time-series: hidden Markov random field [Snowbird, ICML 2010, with Niculescu-Mizil, Lozano and Lu]
  Extreme event modeling: spatial-temporal extreme value models [KDD 2009, with Lozano et al.; NIPS 2011 in preparation]

Slide 14
Roadmap (as on Slide 8)

Slide 15
Example 1: Relational Multivariate Time-Series Data [ICML 2010, Liu et al.]
  Input: multivariate time-series X(1), …, X(M) and relational graph GM
  Goal: learn a reasonable temporal causal graph for each location/species

Slide 16
Proposed approach: Hidden Markov Random Field with L1 Penalty (HMRF-L1)

Slide 17
Proposed approach: Hidden Markov Random Field with L1 Penalty (HMRF-L1)
  Define a hidden Markov random field on the relational graph GM
  Assign a hidden state s(i) to each time-series X(i)
  Time-series that share the same state share component networks
  Use EM to jointly infer the hidden state assignments and the causal structure associated with each state

Slide 18
Climate Modeling and Analysis
  We used the following 18 variables containing climate, solar radiation and greenhouse gas data (variable table not reproduced)
  Data pre-processing (adhering to standard practices in climate modeling):
    2.5 x 2.5 degree grid for North America; monthly data for 1989-2002 with a temporal lag of 3 months
    Data interpolation: a common grid to join multiple data sources using smoothing splines
    De-seasonalization: removing seasonal averages

Slide 19
Experiment Results: Location-Specific Climate Modeling
  Clusters of US locations by our method (number of clusters = 3)
  Map of US CO2 concentration (http://www.purdue.edu/eas/carbon/vulcan/GEarth)
  Causal graphs associated with each state

Slide 20
Example 2: Extreme Event Modeling
  Extreme weather events happen from time to time
  Examples include heat waves, hurricanes, tornadoes and flooding
  They are rare events, but lead to severe consequences

Slide 21
Example 2: Extreme Event Modeling
  Key questions to be answered:
    Will extreme weather become more intense?
    Will extreme weather become more frequent?
  Our approach: hierarchical Bayesian spatio-temporal dynamic model via extreme value distribution
    Quantify the stochastic behavior of a process at unusually large or small levels
    A point process incorporating spatio-temporal dependence structures

Slide 22
Climate Extreme Event Attribution
  We used the following 18 variables containing climate, solar radiation and greenhouse gas data (variable table not reproduced)
  Output causal structures in decreasing degrees of sparsity

Slide 23
Roadmap (as on Slide 8)

Slide 24
Gene Regulatory Network Discovery [ISMB 2010]
  Gene expression regulatory networks for the human cancer cell line HeLa S3 [Whitfield et al., 2002]
  Existing methods in the literature are unable to:
    Accommodate lags greater than one
    Handle causality tests involving a large number of genes simultaneously
  Our method addresses both limitations, achieved higher accuracy, and uncovered previously uncaptured relationships:
    CCNA2 to PCNA, verified in [Liu et al., 2007]
    CCNE1 to ETF1, verified in [Merdzhanova et al., 2007]
    CCNE1 to CDC6, verified in [Furstenthal et al., 2001]
  (Figure labels: BioGRID; Recent Literature; Causal graphs discovered by our method)

  Evaluation against BioGRID
                         Precision   Recall   F1
    Our method           0.50        0.72     0.59
    Sambo et al. (2008)  0.36        0.44     0.40

Slide 25
Granger Graphical Models for Time-series Analysis
  A general framework to reveal important dependency information about time-series data
  Extensions to application data with different properties
    Applications: computational biology, climate science, production management
    Data properties: non-stationary, non-paranormal, relational data, spatial data, natural grouping
  On-going work
    Scalable models for massive data: online algorithms, parallel algorithms
    Anomaly detection and prediction: scalable and interpretable solutions
    Hidden variables: automatically identifying the existence of hidden variables
    Other applications: social-media analysis

Slide 26
Acknowledgments
  USC Melody Lab: Taha Bahadori, Yanting Wu, Shiv Prakash
  IBM Research: Aurelie Lozano, Naoki Abe, Hongfei Li, Alexandru Niculescu-Mizil
  Harvard Medical School: Yong Lu
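
Illustrative sketch (not part of the talk). The slides define Granger causality and name Lasso as one possible REG method for neighborhood selection, but they include no code. The following is a minimal sketch under stated assumptions: the series are zero-mean, the Lasso is a small hand-written coordinate-descent solver, and an edge x -> y is declared when any lagged coefficient of x is nonzero in the Lasso regression for y. That sparsity criterion stands in for the statistical significance test in the definition. All names (`lasso`, `granger_graph`, `max_lag`, `penalty`) and the synthetic data are hypothetical.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, penalty, sweeps=100):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent.

    No intercept is fitted, so inputs are assumed to be (roughly) zero-mean.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * coef[j]
            new = soft_threshold(rho, penalty) / col_sq[j]
            delta = new - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                coef[j] = new
    return coef


def granger_graph(series, max_lag, penalty):
    """Return edges (source, target) where some lag of source is selected for target."""
    names = sorted(series)
    length = len(series[names[0]])
    # One row per time step t: lags 1..max_lag of every series, grouped by series.
    rows = [[series[name][t - lag] for name in names for lag in range(1, max_lag + 1)]
            for t in range(max_lag, length)]
    edges = set()
    for target in names:
        y = [series[target][t] for t in range(max_lag, length)]
        coef = lasso(rows, y, penalty)
        for k, source in enumerate(names):
            block = coef[k * max_lag:(k + 1) * max_lag]
            if source != target and any(c != 0.0 for c in block):
                edges.add((source, target))
    return edges


# Synthetic example (invented for illustration): y depends on x one step back,
# while z is independent noise.
rng = random.Random(0)
T = 400
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
z = [rng.gauss(0.0, 1.0) for _ in range(T)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, T)]

edges = granger_graph({"x": x, "y": y, "z": z}, max_lag=2, penalty=0.3)
print(sorted(edges))
```

The penalty controls sparsity: larger values remove more candidate edges, so in practice it would be chosen by cross-validation or a similar model-selection procedure rather than fixed by hand as it is here.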