```Vector Generalized Additive Models
and applications to extreme value
analysis
Olivier Mestre (1,2)
(1) Météo-France, Ecole Nationale de la Météorologie, Toulouse, France
(2) Université Paul Sabatier, LSP, Toulouse, France
Based on previous studies carried out in collaboration with:
Stéphane Hallegatte (CIRED, Météo-France)
Sébastien Denvil (LMD)
SMOOTHER
« Smoother=tool for summarizing the trend of a response measurement Y
as a function of predictors » (Hastie & Tibshirani)
estimate of the trend that is less variable than Y itself
 Smoothing matrix S
Y*=SY
The equivalent degrees of freedom (df) of the smoother S is the trace of S.
This allows comparison with parametric models.
 Pointwise standard error bands
$\mathrm{COV}(Y^*) = V = S S^{T} \sigma^2$. Given an estimate of $\sigma^2$, this allows approximate
confidence intervals (values ± 2 × the square root of the diagonal of V)
SCATTERPLOT SMOOTHING EXAMPLE
 Data: wind farm production vs numerical windspeed forecasts
SMOOTHING
 Problems raised by smoothers
How to average the response values in
each neighborhood?
How large to take the neighborhoods?

Tradeoff between bias and variance of Y*
SMOOTHING: POLYNOMIAL (parametric)
 Linear and cubic parametric least squares fits: MODEL DRIVEN
APPROACHES
SMOOTHING: BIN SMOOTHER
 In this example, optimum intervals are determined by means of a
regression tree
SMOOTHING: RUNNING LINE
 Running line
KERNEL SMOOTHER
SMOOTHING: LOESS
 The smooth at the target point is the value of a locally weighted linear fit
(tricube weights)
CUBIC SMOOTHING SPLINES
 This smoother is the solution of the following optimization problem:
among all functions f(x) with two continuous derivatives, choose the
one that minimizes the penalized sum of squares
$$\sum_{i=1}^{n} \{Y_i - f(X_i)\}^2 + \lambda \int_a^b \{f''(x)\}^2 \, dx$$
First term: closeness to the data
Second term: penalization of the curvature of f
It can be shown that the unique solution to this problem is a natural cubic
spline with knots at the unique values $x_i$
The smoothing parameter $\lambda$ can be set by means of cross-validation
CUBIC SMOOTHING SPLINES
 Cubic smoothing splines with equivalent df=5 and 10
 Gaussian Linear Model
$E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
 Gaussian Additive Model
$E[Y] = S_1(X_1) + S_2(X_2)$
S1, S2 smooth functions of predictors X1, X2, usually LOESS, SPLINE
Estimation of S1, S2 : « Backfitting Algorithm »
 PRINCIPLE OF THE BACKFITTING ALGORITHM
Y = S1(X1) + e → estimation S1*
Y - S1*(X1) = S2(X2) + e → estimation S2*
Y - S2*(X2) = S1(X1) + e → estimation S1**
Y - S1**(X1) = S2(X2) + e → estimation S2**
Y - S2**(X2) = S1(X1) + e → estimation S1***
Etc… until convergence
One efficient way to perform non-linear regression, but…
 Crucial point
2, 3 predictors at most
 Philosophy
DATA DRIVEN APPROACHES RATHER THAN
MODEL DRIVEN APPROACH
USEFUL AS EXPLORATORY TOOLS
 Approximate inference tests are possible, but full inferences are better
assessed by means of parametric models
 Extension to non-normal dependent variables
parameter of exponential family distributions (Poisson, Binomial, Gamma,
Gaussian…).
$g[\mu] = \eta = S_1(X_1) + S_2(X_2)$
 Vector Generalized Additive Models (VGAM): one step beyond…
Example 1
Annual number and maximum integrated
intensity (PDI) of hurricane tracks
over the North Atlantic
Number of Hurricanes
 Number of Hurricanes in North Atlantic ~ Poisson distribution
Factors influencing the number of hurricanes
 GAM applied to number of hurricanes (YEAR,SST,SOI,NAO)
GAM model
 $\log(\mu) = \beta_0 + S_1(\mathrm{SST}) + S_2(\mathrm{SOI})$
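PARAMETRIC model
 “Broken stick” model (with continuity constraint) in SOI, revealed by the GAM analysis
 $\log(\mu) = \beta_0 + \beta_{SOI}^{(1)}\,SOI + \beta_{SST}\,SST$ for $SOI < K$
 $\log(\mu) = \beta_0 + \beta_{SOI}^{(1)}\,SOI + \beta_{SOI}^{(2)}\,(SOI - K) + \beta_{SST}\,SST$ for $SOI \ge K$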
 The best fit is obtained for SOI value K=1
log-likelihood = -316.16, to be compared with -318.71 (linearity)
a standard deviance test allows us to reject linearity (p-value = 0.02)
 The expectation µ of the hurricane number is then straightforwardly
computed as a function of SOI and SST
EXPECTATION OF HURRICANE NUMBERS
OBSERVED vs EXPECTED: r=0.6
```
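The SMOOTHER and LOESS slides describe a linear smoother through its matrix S, its equivalent degrees of freedom (the trace of S), and pointwise error bands from the diagonal of V. The sketch below illustrates these three quantities for a locally weighted linear fit with tricube weights. It is only an illustration under assumptions that are not in the slides: the data are synthetic (not the wind farm data), the function name `local_linear_smoother_matrix` and the `span` parameter are hypothetical, and the variance estimate RSS / (n - df) is one simple choice among several.

```python
import numpy as np

def local_linear_smoother_matrix(x, span=0.5):
    """Rows of S for a locally weighted linear fit with tricube weights."""
    n = len(x)
    k = max(int(np.ceil(span * n)), 3)           # neighbours used at each target point
    s_mat = np.zeros((n, n))
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]                 # distance to the k-th nearest point
        w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3   # tricube weights
        design = np.column_stack([np.ones(n), x - x[i]])
        wd = design * w[:, None]
        # Row 0 of (X'WX)^{-1} X'W maps y to the fitted value at x[i].
        s_mat[i] = np.linalg.solve(design.T @ wd, wd.T)[0]
    return s_mat

# Synthetic data (assumption: stands in for any scatterplot to be smoothed).
rng = np.random.default_rng(1)
n = 100
x = np.sort(rng.uniform(0, 10, n))
y = np.sin(x) + rng.normal(0, 0.3, n)

s_mat = local_linear_smoother_matrix(x, span=0.5)
y_hat = s_mat @ y                                # Y* = S Y
df = np.trace(s_mat)                             # equivalent degrees of freedom
sigma2 = np.sum((y - y_hat) ** 2) / (n - df)     # simple estimate of sigma^2
v = s_mat @ s_mat.T * sigma2                     # COV(Y*) = S S^T sigma^2
se = np.sqrt(np.diag(v))
lower, upper = y_hat - 2 * se, y_hat + 2 * se    # approximate pointwise bands
```

Shrinking `span` increases df and narrows the neighbourhoods, which is the bias and variance tradeoff mentioned on the SMOOTHING slide.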
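The backfitting principle from the slides (smooth the partial residuals on X1, then on X2, and repeat until convergence) can be sketched as follows. This is a minimal sketch, not the authors' implementation: a Gaussian kernel smoother replaces the LOESS or spline smoothers named in the slides, the data are synthetic, and the intercept `alpha` together with the centering of each component is a common identifiability convention added here. The names `kernel_smoother_matrix`, `backfit`, and `bandwidth` are hypothetical.

```python
import numpy as np

def kernel_smoother_matrix(x, bandwidth):
    """Smoothing matrix S of a Gaussian kernel smoother, so that y_hat = S @ y."""
    diffs = (x[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    return weights / weights.sum(axis=1, keepdims=True)

def backfit(y, x1, x2, bandwidth=0.3, n_iter=50, tol=1e-8):
    """Estimate E[Y] = alpha + S1(X1) + S2(X2) by cycling over partial residuals."""
    s_mat1 = kernel_smoother_matrix(x1, bandwidth)
    s_mat2 = kernel_smoother_matrix(x2, bandwidth)
    alpha = y.mean()
    f1 = np.zeros_like(y)
    f2 = np.zeros_like(y)
    for _ in range(n_iter):
        f1_new = s_mat1 @ (y - alpha - f2)       # smooth residuals of S2 against X1
        f1_new -= f1_new.mean()                  # centre for identifiability
        f2_new = s_mat2 @ (y - alpha - f1_new)   # smooth residuals of S1 against X2
        f2_new -= f2_new.mean()
        change = np.max(np.abs(f1_new - f1)) + np.max(np.abs(f2_new - f2))
        f1, f2 = f1_new, f2_new
        if change < tol:                         # stop once both components are stable
            break
    return alpha, f1, f2

# Synthetic additive data (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(-2, 2, n)
x2 = rng.uniform(-2, 2, n)
y = np.sin(x1) + 0.5 * x2 ** 2 + rng.normal(0, 0.2, n)

alpha, f1, f2 = backfit(y, x1, x2)
fitted = alpha + f1 + f2
```

Plotting `f1` against `x1` and `f2` against `x2` gives the usual exploratory view of each additive component.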
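The comparison between the broken stick model and the linear model in SOI can be checked from the two log-likelihoods reported in the slides. The sketch below assumes the models differ by a single parameter (the change of slope above K) and that K is treated as fixed, so the deviance is referred to a chi-square distribution with 1 degree of freedom; if K is itself chosen by maximizing the likelihood, that reference distribution is only approximate. The helper `hinge` is a hypothetical name for the extra regressor (SOI - K) that is switched on above K, which is what keeps the fit continuous at K.

```python
import math

def hinge(soi, k):
    """Extra broken-stick regressor: 0 below K, (SOI - K) at or above K."""
    return max(soi - k, 0.0)

# Log-likelihoods as reported in the slides.
loglik_broken_stick = -316.16
loglik_linear = -318.71

# Deviance (likelihood ratio) statistic for nested models.
deviance = 2.0 * (loglik_broken_stick - loglik_linear)

# Survival function of a chi-square with 1 df: P(X > d) = erfc(sqrt(d / 2)).
p_value = math.erfc(math.sqrt(deviance / 2.0))

print(f"deviance = {deviance:.2f}, p-value = {p_value:.3f}")
```

Under these assumptions the p-value comes out at about 0.02, in line with the value quoted on the slide.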