2011 Japan-America Frontiers of Engineering Symposium June 6-8, 2011
Temporal Causal Models for Massive Time-series Data Mining: Climate Change Attribution and other Applications
Yan Liu
Computer Science Department
Viterbi School of Engineering
University of Southern California
Slide 1
Climate Change: One of the Most Critical Issues Mankind Faces in the 21st Century
Slide 2
Understanding the Climate System is Imperative to Devising Potential Solutions
• The climate system involves complex relationships among a large number of variables
• Need to understand and quantify the “causal” effects of the various parameters
Slide 3
Challenges with Existing Climate Models
• Forward-simulation approach
• 23 widely used global climate models: Model intercomparison project: http://www.clivar.org/organization/wgcm/cmip.php
Slide 4
Massive Amount of Spatial-temporal Data on Climate and Climate-forcing Agents
[Figure panels: Surface and atmospheric climate; Human agents: Land Cover; Snow, Ice and Frozen Ground; Human agents: atmospheric constituents; Solar Radiation]
Slide 5
Massive Climate Data: New Opportunities for Machine Learning
Slide 6
Machine Learning Solution for Climate Modeling and Analysis
Input
Output
Climate Change Attribution Analysis
Slide 7
Roadmap
• Introduction of Granger Graphical Models
• Examples of Granger Graphical Models
• Granger Graphical Models for Climate Change Attribution
• Experiment Results on Biology Applications
Slide 8
Temporal Causal Modeling by Graphical Granger Modeling Methods
• Our proposed approach for time-series analysis
  ◦ Graphical modeling using the notions of Granger causality and methods of variable selection
• Granger causality, introduced by the Nobel Prize-winning economist Clive Granger
  ◦ Definition: a time series x is said to “Granger cause” another time series y if and only if regressing y on past values of both y and x is statistically significantly better than regressing y on past values of y only
[Figure: time series x and y]
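The definition above can be illustrated with a pairwise test. The sketch below is only an illustration under assumptions, not the method from the talk: it fits two ordinary least-squares regressions with NumPy and compares them with a standard F statistic. The function names (`lagged`, `rss`, `granger_f`), the lag order of 2, and the simulated series are all hypothetical.

```python
import numpy as np

def lagged(series, max_lag):
    """Columns hold series[t-1], ..., series[t-max_lag] for t = max_lag .. n-1."""
    n = len(series)
    return np.column_stack([series[max_lag - k : n - k] for k in range(1, max_lag + 1)])

def rss(design, target):
    """Residual sum of squares of a least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(x, y, max_lag=2):
    """F statistic for the hypothesis 'x Granger-causes y' at lag order max_lag."""
    target = y[max_lag:]
    restricted = lagged(y, max_lag)                            # past values of y only
    full = np.column_stack([restricted, lagged(x, max_lag)])   # past values of y and x
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = len(target) - (2 * max_lag + 1)                      # samples minus fitted parameters
    return ((rss_r - rss_f) / max_lag) / (rss_f / dof)

# Simulated data (an assumption for illustration): x drives y with a one-step delay.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

f_xy = granger_f(x, y)   # large: past x improves the prediction of y
f_yx = granger_f(y, x)   # small: past y does not help predict x
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

In practice the statistic would be compared against an F distribution with (max_lag, dof) degrees of freedom to decide significance.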
Slide 10
Graph Structure Learning
• Graph structure learning [Heckerman, 1995] has been an active research area for decades
• Recent progress on L1-penalized regression methods for graph structure learning
  ◦ LASSO regression for neighborhood selection [Meinshausen and Bühlmann, Ann. Stat. 06]
    Consider the p-dimensional multivariate normal distributed random variable: $X = (X_1, \ldots, X_p) \sim \mathcal{N}(\mu, \Sigma)$
    The neighborhood selection can be solved efficiently with the LASSO
  ◦ Block sub-gradient algorithm for finding precision matrix [Banerjee, JMLR 08]
  ◦ Efficient fixed-point equations based on a sub-gradient algorithm [Friedman et al., Biostatistics 08]
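A minimal sketch of Lasso-based neighborhood selection, in the spirit of the first approach listed above: each variable is regressed on all the others with an L1 penalty, and a nonzero coefficient is read as an edge. Everything here is an assumption made for illustration: the coordinate-descent solver `lasso_cd`, the OR rule for combining the two directions, the penalty value, and the simulated chain X0 - X1 - X2 with an isolated X3. It does not implement the block sub-gradient or fixed-point algorithms cited on the slide.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]        # take coordinate j out of the current fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            resid -= X[:, j] * beta[j]        # put the updated coordinate back
    return beta

def neighborhood_selection(X, lam):
    """Lasso-regress each column on the others; keep an edge if either direction selects it."""
    p = X.shape[1]
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta = lasso_cd(X[:, others], X[:, j], lam)
        adj[j, others] = beta != 0
    return adj | adj.T

# Simulated Gaussian data (an assumption): a chain X0 - X1 - X2 plus an unrelated X3.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardise so one penalty fits all columns

adj = neighborhood_selection(X, lam=0.2)
print(adj.astype(int))
```

X0 and X2 are correlated, but only through X1; the penalised regressions are meant to drop that indirect link, which is the conditional-independence structure that neighborhood selection targets.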
Slide 11
Generic Temporal Causal Modeling Method [KDD 2007 joint work with Arnold, Abe]
Neighborhood selection
An example of REG can be Lasso [Tibshirani, 1996]
Granger Causality
Structure learning is possible even when the number of variables is significantly larger than the number of samples
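The combination of neighborhood selection and Granger causality named on this slide can be sketched as follows, taking Lasso as the regression method REG. This is a hedged illustration rather than the KDD 2007 implementation: the solver `lasso_cd`, the function `lasso_granger`, the lag order, the penalty, and the simulated four-series system are all assumptions. Each series is regressed on the lagged values of every series, and series i is declared a Granger cause of series j when any lag of i receives a nonzero coefficient.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]
    return beta

def lasso_granger(series, max_lag, lam):
    """series has shape (T, p). graph[i, j] is True when some lag of series i
    has a nonzero Lasso coefficient in the regression for series j."""
    T, p = series.shape
    z = (series - series.mean(axis=0)) / series.std(axis=0)
    # Row for time t holds z[t-1], ..., z[t-max_lag]; lag k occupies columns (k-1)*p .. k*p-1.
    design = np.hstack([z[max_lag - k : T - k] for k in range(1, max_lag + 1)])
    graph = np.zeros((p, p), dtype=bool)
    for j in range(p):
        beta = lasso_cd(design, z[max_lag:, j], lam)
        graph[:, j] = (beta.reshape(max_lag, p) != 0).any(axis=0)
    return graph

# Simulated system (an assumption): series 0 -> series 1 -> series 2, series 3 is pure noise.
rng = np.random.default_rng(1)
T = 4000
noise = rng.normal(size=(T, 4))
data = np.zeros((T, 4))
for t in range(1, T):
    data[t, 0] = 0.5 * data[t - 1, 0] + noise[t, 0]
    data[t, 1] = 0.7 * data[t - 1, 0] + noise[t, 1]
    data[t, 2] = 0.7 * data[t - 1, 1] + noise[t, 2]
    data[t, 3] = noise[t, 3]

graph = lasso_granger(data, max_lag=2, lam=0.2)
cross_edges = graph & ~np.eye(4, dtype=bool)   # ignore each series' dependence on its own past
print(cross_edges.astype(int))
```

Because the penalty selects variables, the same routine can be run when the number of lagged variables exceeds the number of time points, which is the regime the slide highlights.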
Slide 12
Temporal Causal Modeling for Time-series Data Analysis
• Natural grouping of variables
  ◦ Group Lasso and group boosting [KDD 2009; ISMB 2009, with Lozano, Abe and Rosset]
• Non-stationary
  ◦ Dynamic linear system [KDD 2009, with Kalagnanam and Johnsen]
• Non-linear time-series
  ◦ Non-parametric approach [AAAI 2010, with Chen, Liu and Carbonell]
• Spatial time-series
  ◦ Spatio-temporal regression via group elastic net [KDD 2009, with Lozano et al.]
• Relational time-series
  ◦ Hidden Markov random field [Snowbird, ICML 2010, with Niculescu-Mizil, Lozano and Lu]
• Extreme event modeling
  ◦ Spatial-temporal extreme value models [KDD 2009, with Lozano et al.; NIPS 2011 in preparation]
Slide 13
Roadmap
• Introduction of Granger Graphical Models
• Examples of Granger Graphical Models
• Granger Graphical Models for Climate Change Attribution
• Experiment Results on Biology Applications
Slide 14
Example 1: Relational Multivariate Time-Series Data [ICML 2010, Liu et al.]
• Input: multivariate time-series X(1), …, X(M) and relational graph GM
• Goal: learn a reasonable temporal causal graph for each location/species ..
Slide 15
Proposed approach: Hidden Markov Random Field with L1 Penalty
(HMRF-L1)
Slide 16
Proposed approach: Hidden Markov Random Field with L1 Penalty (HMRF-L1)
• Define a hidden Markov Random Field on relational graph GM
  ◦ Assign a hidden state s(i) to each time-series X(i)
  ◦ Time-series that share the same state will share component networks
• Use EM to jointly infer the hidden state assignments and the causal structure associated with each state
Slide 17
Climate Modeling and Analysis
• We used the following 18 variables containing climate, solar radiation and greenhouse gas data
• Data pre-processing (adhering to standard practices in climate modeling)
  ◦ 2.5x2.5 degree grid for North America, monthly data for 1989-2002 with 3 months temporal lag
  ◦ Data interpolation: a common grid to join multiple data sources using smoothing splines
  ◦ De-seasonalization: removing seasonal averages
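Two of the pre-processing steps listed above, de-seasonalization and the 3-month temporal lag, can be sketched on a single grid cell. This is an illustration under assumptions only: the function names `deseasonalize` and `add_lags` and the synthetic monthly series are hypothetical, and the spline interpolation onto a common grid is not shown.

```python
import numpy as np

def deseasonalize(monthly):
    """Subtract the average of each calendar month from a (n_months, n_vars) array."""
    monthly = np.asarray(monthly, dtype=float)
    out = monthly.copy()
    for m in range(12):
        # Rows m, m+12, m+24, ... all fall in the same calendar month.
        out[m::12] -= monthly[m::12].mean(axis=0)
    return out

def add_lags(data, max_lag=3):
    """Stack data[t-1], ..., data[t-max_lag] side by side for t = max_lag .. n-1."""
    n = len(data)
    return np.hstack([data[max_lag - k : n - k] for k in range(1, max_lag + 1)])

# Synthetic series (an assumption): 1989-2002 is 14 years, i.e. 168 monthly values,
# built from a seasonal cycle plus a slow trend.
months = np.arange(14 * 12)
series = (10 * np.sin(2 * np.pi * months / 12) + 0.01 * months)[:, None]

anomalies = deseasonalize(series)        # seasonal cycle removed, trend signal kept
lagged = add_lags(anomalies, max_lag=3)  # predictors for a lag-3 regression
print(anomalies.shape, lagged.shape)
```

After this step the anomalies for each calendar month average to zero, so the regression is not dominated by the annual cycle shared by most climate variables.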
Slide 18
Experiment Results: Location-Specific Climate Modeling
Clusters of US locations by our method (number of clusters = 3)
Map of US CO2 Concentration (http://www.purdue.edu/eas/carbon/vulcan/GEarth)
Causal graphs associated with each state
Slide 19
Example 2: Extreme Event Modeling
• Extreme weather events happen from time to time
  ◦ Examples include heat waves, hurricanes, tornadoes and flooding
  ◦ They are rare events, but lead to severe consequences
Slide 20
Example 2: Extreme Event Modeling
• Key questions to be answered:
  ◦ Will extreme weather become more intense?
  ◦ Will extreme weather happen more frequently?
• Our approach: hierarchical Bayesian spatio-temporal dynamic model via extreme value distribution
  ◦ Quantify the stochastic behavior of a process at unusually large or small levels
  ◦ A point process incorporating spatio-temporal dependence structures
Slide 21
Climate Extreme Event Attribution
• We used the following 18 variables containing climate, solar radiation and greenhouse gas data
Output causal structures in decreasing degrees of sparsity
Slide 22
Roadmap
• Introduction of Granger Graphical Models
• Examples of Granger Graphical Models
• Granger Graphical Models for Climate Change Attribution
• Experiment Results on Biology Applications
Slide 23
Gene Regulatory Network Discovery [ISMB 2010]
• Gene expression regulatory networks for the human cancer cell HeLa S3 [Whitfield et al., 2002]
• Existing methods in the literature are unable to
  ◦ Accommodate lags greater than one
  ◦ Handle causality tests involving a large number of genes simultaneously
• Our method addresses both limitations, achieved higher accuracy, and was able to uncover previously uncaptured relationships
  ◦ CCNA2 to PCNA verified in [Liu, et al 2007]
  ◦ CCNE1 to ETF1 verified in [Merdzhanova, et al 2007]
  ◦ CCNE1 to CDC6 verified in [Furstenthal, et al 2001]
[Figure legend: BioGRID; Recent Literature]
Evaluation against BioGRID
Method                 Precision   Recall   F1
Our method             0.50        0.72     0.59
Sambo et al. (2008)    0.36        0.44     0.40
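As a quick consistency check on the table above, the F1 column can be recomputed from the reported precision and recall using the usual harmonic-mean formula F1 = 2PR / (P + R). The helper name `f1` and the dictionary layout are assumptions for illustration; the numbers are the ones from the slide.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs as reported in the evaluation against BioGRID.
reported = {
    "Our method": (0.50, 0.72),
    "Sambo et al. (2008)": (0.36, 0.44),
}

f1_scores = {name: round(f1(p, r), 2) for name, (p, r) in reported.items()}
for name, score in f1_scores.items():
    print(f"{name}: F1 = {score:.2f}")
```

Both recomputed values agree with the F1 column after rounding to two decimals.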
Causal graphs discovered by our method
Slide 24
Granger Graphical Models for Time-series Analysis
• A general framework to reveal important dependency information about time-series data
• Extensions to application data with different properties
  ◦ Applications: computational biology, climate science, production management
  ◦ Data properties: non-stationary, non-paranormal, relational data, spatial data, natural grouping
• On-going work
  ◦ Scalable models for massive data: online algorithms, parallel algorithms
  ◦ Anomaly detection and prediction: scalable and interpretable solutions
  ◦ Hidden variables: automatically identifying the existence of hidden variables
  ◦ Other applications: social-media analysis
Slide 25
Acknowledgements
• USC Melody Lab
  ◦ Taha Bahadori
  ◦ Yanting Wu
  ◦ Shiv Prakash
• IBM Research
  ◦ Aurelie Lozano
  ◦ Naoki Abe
  ◦ Hongfei Li
  ◦ Alexandru Niculescu-Mizil
• Harvard Medical School
  ◦ Yong Lu