Vector Generalized Additive Models and applications to extreme value analysis

Olivier Mestre (1,2)
(1) Météo-France, Ecole Nationale de la Météorologie, Toulouse, France
(2) Université Paul Sabatier, LSP, Toulouse, France

Based on previous studies carried out in collaboration with:
Stéphane Hallegatte (CIRED, Météo-France)
Sébastien Denvil (LMD)

SMOOTHER
• « Smoother = tool for summarizing the trend of a response measurement Y as a function of predictors » (Hastie & Tibshirani): an estimate of the trend that is less variable than Y itself.
• Smoothing matrix S: $Y^* = SY$. The equivalent degrees of freedom (df) of the smoother S is the trace of S. This allows comparison with parametric models.
• Pointwise standard error bands: $\mathrm{Cov}(Y^*) = V = S S^{T} \sigma^2$. Given an estimate of $\sigma^2$, this allows approximate confidence intervals (values: ±2 × square root of the diagonal of V).

SCATTERPLOT SMOOTHING EXAMPLE
• Data: wind farm production vs numerical windspeed forecasts

SMOOTHING
• Problems raised by smoothers:
  - How to average the response values in each neighborhood?
  - How large to take the neighborhoods?
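The smoothing-matrix quantities above ($Y^* = SY$, df = trace of S, $V = S S^{T} \sigma^2$) and the question of neighborhood size can be illustrated with a minimal sketch. It assumes a Gaussian-kernel smoother, which is only one possible choice of S; the toy data, the bandwidths and the noise variance `sigma2` are made up for illustration and do not come from the wind farm example.

```python
import math


def kernel_smoother_matrix(x, bandwidth):
    """Row-normalized Gaussian-kernel smoothing matrix S (one possible smoother)."""
    S = []
    for xi in x:
        weights = [math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in x]
        total = sum(weights)
        S.append([w / total for w in weights])
    return S


def smooth(S, y):
    """Fitted values Y* = S Y."""
    return [sum(s * yj for s, yj in zip(row, y)) for row in S]


def equivalent_df(S):
    """Equivalent degrees of freedom: trace of S."""
    return sum(S[i][i] for i in range(len(S)))


def pointwise_se(S, sigma2):
    """Square root of the diagonal of V = S S^T sigma^2."""
    return [math.sqrt(sigma2 * sum(s * s for s in row)) for row in S]


# Hypothetical toy data: a sine trend plus a small alternating perturbation.
x = [i / 10 for i in range(30)]
y = [math.sin(xi) + 0.1 * (-1) ** i for i, xi in enumerate(x)]
sigma2 = 0.01  # assumed noise variance, for illustration only

for bandwidth in (0.1, 0.5, 2.0):
    S = kernel_smoother_matrix(x, bandwidth)
    y_star = smooth(S, y)
    se = pointwise_se(S, sigma2)
    # Approximate band at the first point: fitted value +/- 2 standard errors.
    band = (y_star[0] - 2 * se[0], y_star[0] + 2 * se[0])
    print(f"bandwidth={bandwidth}: df={equivalent_df(S):.2f}, band at x[0]={band}")
```

Larger neighborhoods (larger bandwidth) give a smaller trace of S, hence fewer equivalent degrees of freedom and a smoother, less variable but potentially more biased fit.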
• Tradeoff between bias and variance of Y*

SMOOTHING: POLYNOMIAL (parametric)
• Linear and cubic parametric least squares fits: MODEL DRIVEN APPROACHES

SMOOTHING: BIN SMOOTHER
• In this example, optimum intervals are determined by means of a regression tree

SMOOTHING: RUNNING LINE
• Running line

KERNEL SMOOTHER
• Nadaraya-Watson

SMOOTHING: LOESS
• The smooth at the target point is the value of a locally weighted linear fit (tricube weights)

CUBIC SMOOTHING SPLINES
• This smoother is the solution of the following optimization problem: among all functions f(x) with two continuous derivatives, choose the one that minimizes the penalized sum of squares

$$\sum_{i=1}^{n} \left( Y_i - f(X_i) \right)^2 + \lambda \int_a^b \left( f''(x) \right)^2 \, dx$$

  The first term measures closeness to the data; the second is a penalization of the curvature of f.
• It can be shown that the unique solution to this problem is a natural cubic spline with knots at the unique values $x_i$
• Parameter $\lambda$ can be set by means of cross-validation

CUBIC SMOOTHING SPLINES
• Cubic smoothing splines with equivalent df=5 and 10

Additive models
• Gaussian Linear Model: $\mathbb{E}[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
• Gaussian Additive model: $\mathbb{E}[Y] = S_1(X_1) + S_2(X_2)$
  S1, S2 smooth functions of predictors X1, X2, usually LOESS, SPLINE
  Estimation of S1, S2: « Backfitting Algorithm »

PRINCIPLE OF THE BACKFITTING ALGORITHM
Y = S1(X1) + e → estimation S1*
Y - S1*(X1) = S2(X2) + e → estimation S2*
Y - S2*(X2) = S1(X1) + e → estimation S1**
Y - S1**(X1) = S2(X2) + e → estimation S2**
Y - S2**(X2) = S1(X1) + e → estimation S1***
Etc… until convergence

Additive models
• Additive models: one efficient way to perform non-linear regression, but…
• Crucial point: SUITABLE ONLY WHEN THERE ARE FEW PREDICTORS (2 or 3 predictors at most)

Additive models
• Philosophy: DATA DRIVEN APPROACHES RATHER THAN MODEL DRIVEN APPROACHES. USEFUL AS EXPLORATORY TOOLS
• Approximate inference tests are possible, but full inferences are better assessed by means of parametric models

Generalized Additive models (GAM)
• Extension to non-normal dependent variables
• Generalized additive models: additive modelling of the natural parameter of exponential family distributions (Poisson, Binomial, Gamma, Gaussian…): $g[\mu] = \eta = S_1(X_1) + S_2(X_2)$
• Vector Generalized Additive Models (VGAM): one step beyond…

Example 1
Annual number and maximum integrated intensity (PDI) of hurricane tracks over the North Atlantic

Number of Hurricanes
• Number of Hurricanes in North Atlantic ~ Poisson distribution

Factors influencing the number of hurricanes
• GAM applied to number of hurricanes (YEAR, SST, SOI, NAO)

GAM model
• $\log(\lambda) = \beta_0 + S_1(\mathrm{SST}) + S_2(\mathrm{SOI})$

PARAMETRIC model
• “broken stick model” (with continuity constraint) in SOI, revealed by GAM analysis:

$$\log(\lambda) = \begin{cases} \beta_0 + \beta_{\mathrm{SOI}}^{(1)} \mathrm{SOI} + \beta_{\mathrm{SST}} \mathrm{SST} & \mathrm{SOI} < K \\ \beta_0 + \beta_{\mathrm{SOI}}^{(1)} \mathrm{SOI} + \beta_{\mathrm{SOI}}^{(2)} (\mathrm{SOI} - K) + \beta_{\mathrm{SST}} \mathrm{SST} & \mathrm{SOI} \geq K \end{cases}$$

• The best fit is obtained for SOI value K=1: log-likelihood = -316.16, to be compared with -318.71 (linearity). A standard deviance test allows linearity to be rejected (p value = 0.02)
• Expectation $\lambda$ of the hurricane number is then straightforwardly computed as a function of SOI and SST

EXPECTATION OF HURRICANE NUMBERS
OBSERVED vs EXPECTED: r=0.6
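The deviance comparison quoted for the broken stick model can be checked with a short sketch. It assumes the test has one degree of freedom (the extra slope above K, with K treated as fixed), which is an assumption made here and not stated on the slides; the function names and the coefficient values in the usage lines are hypothetical, since the fitted coefficients are not given in the source.

```python
import math


def broken_stick_log_mean(soi, sst, beta0, beta_soi1, beta_soi2, beta_sst, k):
    """log(lambda) for a broken stick in SOI that is continuous at SOI = k."""
    extra = beta_soi2 * max(soi - k, 0.0)  # zero below k, so no jump at k
    return beta0 + beta_soi1 * soi + extra + beta_sst * sst


def deviance_test(loglik_full, loglik_reduced):
    """Deviance difference and its chi-square p-value, assuming 1 df."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Log-likelihoods quoted in the text: broken stick (K=1) vs linear in SOI.
stat, p_value = deviance_test(-316.16, -318.71)
print(f"deviance difference = {stat:.2f}, p = {p_value:.2f}")

# Hypothetical coefficients, only to show how the expectation is computed.
log_lam = broken_stick_log_mean(soi=1.5, sst=0.2, beta0=1.0, beta_soi1=0.1,
                                beta_soi2=0.3, beta_sst=0.5, k=1.0)
print(f"expected number of hurricanes = {math.exp(log_lam):.2f}")
```

Under the one-degree-of-freedom assumption, the deviance difference of about 5.10 gives a p-value that rounds to the 0.02 reported above.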
