Extreme Value Modelling in Climate Science: Why do it and how it can fail!
Professor David B. Stephenson, U. of Exeter
NCAR summer colloquium, 8 June 2011

Aims:
• What the heck do we mean by “extreme”?
• Summary of statistical methods used in climate science: statistics for modelling the process rather than for just making indices
• Some examples of extreme value modelling:
  • Problem 1: Properties of drought indices
  • Problem 2: Trends in extreme gridded temperatures
  • Problem 3: Trends in largest annual skew tides

Some wet and windy extremes
[Images: convective severe storm, hurricane, extra-tropical cyclone, polar low, extra-tropical cyclone]

Some dry and hot extremes
[Images: drought, dust storm, dust storm, wild fire]

All are complex multivariate spatio-temporal events!
So, to massively simplify, it is helpful to focus on the time evolution of a single variable related to the event, e.g. wind speeds of major extratropical cyclones passing by London, losses to an insurer, etc.
MARKED POINT PROCESS: random times, random marks

What do we mean by “extreme”?
NOTE! Extremeness is not a binary property of an event but an ordering of a process.
• Large meteorological values:
  • Maximum value (i.e. a local extremum)
  • Exceedance above a high threshold
  • Record breaker (time-varying threshold equal to max of previously observed values)
• Rare event in the tail of distribution (e.g. less than 1 in 100 years, p=0.01)
• Large losses (severe or high-impact) (e.g. $200 billion if hurricane hits Miami)
Risk combines hazard h, vulnerability V, and exposure e: Risk = V(h(x, t)) e(x, t)
[Image: Gare Montparnasse, 22 Oct 1895]
Stephenson, D.B. (2008): Chapter 1: Definition, diagnosis, and origin of extreme weather and climate events. In Climate Extremes and Society, R. Murnane and H. Diaz (Eds), Cambridge University Press, 348 pp.
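The threshold-exceedance and record-breaker definitions above can be illustrated with a minimal Python sketch. The function names `exceedances` and `record_breakers` are hypothetical and not from the talk; each returns (time index, value) pairs, i.e. the random times and random marks of a marked point process.

```python
def exceedances(values, threshold):
    """Return (time, value) pairs where the series exceeds a fixed threshold."""
    return [(t, v) for t, v in enumerate(values) if v > threshold]


def record_breakers(values):
    """Return (time, value) pairs that exceed every previously observed value.

    The threshold is time-varying: it equals the running maximum so far.
    By this convention the first observation always counts as a record.
    """
    records = []
    running_max = float("-inf")
    for t, v in enumerate(values):
        if v > running_max:
            records.append((t, v))
            running_max = v
    return records
```

Note that the same series yields different sets of "extremes" under the two definitions, which is one reason the choice of definition matters.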
IPCC 2001 definitions
Simple extremes: “individual local weather variables exceeding critical levels on a continuous scale”
Complex extremes: “severe weather associated with particular climatic phenomena, often requiring a critical combination of variables”
Extreme weather event: “An extreme weather event is an event that is rare within its statistical reference distribution at a particular place. Definitions of "rare" vary, but an extreme weather event would normally be as rare or rarer than the 10th or 90th percentile.”
Extreme climate event: “an average of a number of weather events over a certain period of time which is itself extreme (e.g. rainfall over a season)”
[Figure annotations: X ~ N(0,1), Y ~ N(0.5,1.5), p_x = rank(x)/(n+1)]

How might extreme events change?
Changes in location, scale, and shape all lead to big changes in the tail of the distribution.
Some physical arguments exist for changes in location and scale, e.g. a multiplicative change in precipitation due to increased humidity (change in scale).
Scale change impacts high quantiles!
Example: for a Normal variable, a 1% increase in the standard deviation s shifts the 10-year return value (x0.9) by 1.28% of s and the 200-year return value (x0.995) by 2.58% of s.

How can we relate the tails … to the bulk of the distribution?
Change in scale or … change in shape?
PDF = Probability Density Function, or Probable Dinosaur Function??

Quantile attribution
Describe the changes in quantiles in terms of changes in the location, the scale, and the shape of the parent distribution:
$\Delta X_p = \Delta X_{0.5} + \frac{\Delta \mathrm{IQR}}{\mathrm{IQR}} (X_p - X_{0.5}) + \text{shape changes}$
The quantile shift is the sum of:
• a location effect (shift in median)
• a scale effect (change in IQR)
• a shape effect
Ferro, C.A.T., D.B. Stephenson, and A. Hannachi, 2005: Simple non-parametric techniques for exploring changing probability distributions of weather, J. Climate, 18, 4344-4354.
Beniston, M. and Stephenson, D.B.
(2004): Extreme climatic events and their evolution under changing climatic conditions, Global and Planetary Change, 44, pp 1-9

Example: Regional Model Simulations of daily Tmax
[Figure panels (2071-2100 minus 1971-2000): T90; ΔT90; ΔT90-Δm; ΔT90-Δm-(T90-m)Δs/s]
Changes in location, scale and shape are all important.

Statistical methods used in climate science
• Extreme indices: sample statistics
• Basic extreme value modelling:
  • GEV modelling of block maxima
  • GPD modelling of excesses above high threshold
  • Point process model of exceedances
• More complex EVT models:
  • Inclusion of explanatory factors (e.g. trend, ENSO, etc.)
  • Spatial pooling
  • Max stable processes
  • Bayesian hierarchical models
  • + many more
• Other stochastic process models

Extreme indices are useful and easy but …
• They don’t always measure extreme values in the tail of the distribution!
• They often confound changes in rate and magnitude
• They strongly depend on threshold and so make model comparison difficult
• They say nothing about extreme behaviour for rarer extreme events at higher thresholds
• They generally don’t involve probability so fail to quantify uncertainty (no inferential model)
More informative approach: model the extremal process using statistical models whose parameters are then sufficient to provide complete summaries of all other possible statistics (and can simulate!)
See: Katz, R.W. (2010) “Statistics of Extremes in Climate Change”, Climatic Change, 100, 71-76

Furthermore … indices are not METRICS!
One should avoid the word “metric” unless the statistic has distance properties! Index, sample/descriptive statistic, or measure is a more sensible name!
Oxford English Dictionary: Metric - A binary function of a topological space which gives, for any two points of the space, a value equal to the distance between them, or a value treated as analogous to distance for analysis.
Properties of a metric:
• d(x, y) ≥ 0
• d(x, y) = 0 if and only if x = y
• d(x, y) = d(y, x)
• d(x, z) ≤ d(x, y) + d(y, z)

Universal Poisson process for extremes
N = number of points with Z > z between t = t1 and t = t2
For a large number n of independent and identically distributed values and a sufficiently high threshold z:
N ~ Poisson(Λ), i.e. $\Pr(N = n) = \frac{\Lambda^n e^{-\Lambda}}{n!}$
$\Lambda = (t_2 - t_1)\left[1 + \xi\,\frac{z - \mu}{\sigma}\right]^{-1/\xi}$
Miraculous limit theorem for tails of i.i.d. variables!

Probability models for maxima and excesses
In the limit n, z → ∞, with $\Pr(N = n)$ and $\Lambda$ as above:
Generalized Extreme Value (GEV) distribution: $\Pr(\max(Z) \le z) = \Pr(N(z) = 0) = e^{-\Lambda}$
Generalized Pareto Distribution (GPD): $\Pr(Z > z \mid Z > u) = \Lambda(z)/\Lambda(u) = \left[1 + \xi\,\frac{z - u}{\tilde\sigma(u)}\right]^{-1/\xi}$, where $\tilde\sigma(u) = \sigma + \xi(u - \mu)$
Note: extremal properties are characterised by only three parameters (for ANY underlying distribution!)

Why use these probability models?
• Model parameters are sufficient for providing a complete threshold-independent description of extremal properties. All other statistics of the extremal process are a function of these three parameters.
• The models provide a rigorous probability framework for making inference about extremal behaviour.
• Their mathematically justifiable parametric form allows more precise inference about tail properties.
• The model can be used to smoothly interpolate between empirical quantiles/probabilities. Such interpolation makes efficient use of all the large values.
• The model can be used to extrapolate out carefully to rarer, less frequently (or never!) observed events AND provide intervals for such predictions!

Problem 1: Do 2 drought series have similar extremal properties?
Observed index n=90 + Reconstructed index n=5000
Data example kindly provided by Eleanor Burke, Met Office

Return level plots
Empirical quantiles versus empirical return periods: points $\left((1 - i/(n+1))^{-1},\ y_{[i]}\right)$
Outlier in the extended data set?
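The plotting positions of a return level plot can be sketched in a few lines of Python. This is a minimal illustration, assuming that y[i] denotes the i-th smallest of n values and that return periods are measured in units of the sampling interval; the function name `return_level_points` is hypothetical.

```python
def return_level_points(sample):
    """Return (empirical return period, empirical quantile) pairs.

    The i-th smallest value y[i] (i = 1..n) has empirical non-exceedance
    probability i / (n + 1), so its return period is 1 / (1 - i / (n + 1)).
    """
    y = sorted(sample)
    n = len(y)
    return [(1.0 / (1.0 - i / (n + 1)), y[i - 1]) for i in range(1, n + 1)]
```

With this convention the largest of n values is assigned a return period of n + 1 sampling intervals, so a single outlier shows up as an isolated point at the far right of the plot.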
Slight kink at 2.5 in d1.

Quantile-Quantile plot
Empirical distributions are similar except for the big outlier in d1.

Modelling the excesses using GPD
$F(z) = \Pr(Z \le z \mid Z > u) = 1 - \left[1 + \xi\,\frac{z - u}{\tilde\sigma(u)}\right]^{-1/\xi}$
$f(z) = \frac{1}{\tilde\sigma}\left[1 + \xi\,\frac{z - u}{\tilde\sigma}\right]^{-1 - 1/\xi}$
$E(Z - u \mid Z > u) = \frac{\tilde\sigma(u)}{1 - \xi}$
Assumptions:
1. Asymptotic support: n, u → ∞
2. Independence of $Z_i$

Use of mean excess to find a suitable threshold u
[Figure: mean excess E(X - u | X > u) versus threshold u for observed d0 and simulated d1]
The mean excess is approximately linear in u, as the GPD implies, for u from about 0 to 1.
Try fits with u=0.5 as threshold.

Nested model approach
$f(z) = \frac{1}{\tilde\sigma}\left[1 + \xi\,\frac{z - u}{\tilde\sigma}\right]^{-1 - 1/\xi}$
Null model $H_0$: $\tilde\sigma = \tilde\sigma_0$, $\xi = \xi_0$
Contrast model $H_a$: $\tilde\sigma = \tilde\sigma_0 + \tilde\sigma_1 X_i$, $\xi = \xi_0 + \xi_1 X_i$
where $X_i = 0$ for the obs data and $X_i = 1$ for the extended data.

Maximum Likelihood Estimates (standard errors in parentheses)
Model | No. of params | Akaike Inf. Criterion | $\tilde\sigma_0$ | $\tilde\sigma_1$ | $\xi_0$ | $\xi_1$
Null | 2 | 1050.4 | 0.631 (0.023) | --- | -0.107 (0.024) | ---
Contrast | 4 | 1053.6 | 0.629 (0.023) | 0.234 (0.321) | -0.105 (0.024) | -0.299 (0.313)
Contrast with outlier | 4 | 1085.2 | 0.599 (0.021) | 0.264 (0.321) | -0.042 (0.019) | -0.361 (0.313)
Predicted upper limit: $u - \tilde\sigma_0/\xi_0 = 0.5 + 0.631/0.107 = 6.4$
Is there a statistically significant difference at 5% level?
• Difference in deviance: $D_2 - D_4 = 1050.4 - 1053.6 + 2 \times 2 = 0.8$, with $D_2 - D_4 \sim \chi^2_2$, p=0.67
• Parameter estimates: $\hat{\tilde\sigma}_1/\hat{s} = 0.234/0.321 = 0.729 \sim t_{n-p}$, p=0.23; $|\hat\xi_1|/\hat{s} = 0.299/0.313 = 0.956 \sim t_{n-p}$, p=0.17
No significant difference between the exceedances at 5% level.

Model checking: do the quantiles match?
Return period: $T = \frac{1}{\Pr(Z > z)} = \frac{1}{\Pr(Z > z \mid Z > u)\,\Pr(Z > u)} = \frac{1}{(1 - F(z))\,\Pr(Z > u)}$
Return value: $z_T = u + \frac{\tilde\sigma}{\xi}\left[\left(T \Pr(Z > u)\right)^{\xi} - 1\right]$
No! The null model underestimates the empirical quantiles.

Model checking: are estimates stable?
[Figure: parameter estimates versus threshold u]
No! Constant up to u=1.7 but then trends for larger values?!

Model checking: uniform in time?
Uniform distribution in time and exponential between events.

Problem 2: Extremes in surface temperature
Coelho, C.A.S., Ferro, C.A.T., Stephenson, D.B. and Steinskog, D.J.
(2008): Methods for exploring spatial and temporal variability of extreme events in climate data, Journal of Climate, 21, pp 2072-2092
Observed surface temperatures 1870-2005
• Monthly mean gridded surface temperature (HadCRUT2v)
• 5 degree resolution
• Summer months only: June, July, August
• Grid points with >50% missing values and SH are omitted.
[Figure: map of maximum monthly temperatures (Celsius)]

Non-stationarity due to seasonality and long term trends
Example: Grid point in Central Europe (12.5ºE, 47.5ºN)
[Figure: a) temperature (Celsius) by year, 2001-2006, showing the long term trend in mean, the 75th quantile threshold (uy,m = 16.2ºC), the excess (Ty,m – uy,m), and the 2003 exceedance]

GPD scale and shape estimates
$\Pr(Z > z \mid Z > u) = \left[1 + \xi\,\frac{z - u}{\sigma}\right]^{-1/\xi}$, with $\log\sigma = \sigma_0 + \sigma_1 x$
[Figure: maps of the estimated scale and shape parameters]
The scale parameter is large over high-latitude land areas AND shows some dependence on x=ENSO.
The shape parameter is mainly negative, suggesting a finite upper temperature.
Spatial pooling has been used to get more reliable, less noisy shape estimates.

How significant is ENSO on extremes?
The null hypothesis of no effect can only be rejected with confidence over the tropical Pacific and Northern Continents.

Use of covariates in models
“with four parameters I can fit an elephant and with five I can make him wiggle his trunk.” - John von Neumann

Model can be used to estimate return periods
[Figure: a) August 2003: excesses above 75% threshold (Celsius); b) return period for the August 2003 excess (years)]
Return period of 133 years for the August 2003 event over Europe.

Spatial pooling
Pool over local grid points but allow for spatial variation by including local spatial covariates to reduce bias (bias-variance tradeoff).
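The pooling idea can be sketched as a GPD negative log-likelihood summed over a 3x3 block of grid points. This is only an illustrative sketch under stated assumptions, not the implementation used in the talk: the names `gpd_logpdf` and `pooled_neg_loglik` are hypothetical, the local spatial covariates are taken to be the integer grid offsets (unit spacing), and only four parameters are used, whereas the talk's pooled model has five.

```python
import math


def gpd_logpdf(excess, sigma, xi):
    """Log density of the GPD at an excess (z - u) >= 0."""
    if sigma <= 0:
        return -math.inf
    if abs(xi) < 1e-9:                 # exponential limit as xi -> 0
        return -math.log(sigma) - excess / sigma
    t = 1.0 + xi * excess / sigma
    if t <= 0:                         # excess lies outside the support
        return -math.inf
    return -math.log(sigma) - (1.0 + 1.0 / xi) * math.log(t)


def pooled_neg_loglik(params, excesses, i, j):
    """Negative log-likelihood pooled over the 3x3 block centred on (i, j).

    excesses: dict mapping (row, col) -> list of excesses above the threshold.
    params = (s0, sx, sy, xi) with log sigma = s0 + sx*di + sy*dj
    (assumed form: grid offsets di, dj act as local spatial covariates)
    and a single shape xi shared by all points in the block.
    """
    s0, sx, sy, xi = params
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            sigma = math.exp(s0 + sx * di + sy * dj)
            for e in excesses.get((i + di, j + dj), []):
                total -= gpd_logpdf(e, sigma, xi)
    return total
```

Minimising `pooled_neg_loglik` over `params` with any numerical optimiser would give pooled estimates for the centre grid point; the slope terms let the scale vary across the block so that pooling does not simply average away real spatial differences.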
For each grid point, estimate 5 GPD parameters by maximising the following likelihood over the 8 neighbouring grid points:
$L_{i,j} = \prod_{\Delta i = -1}^{1} \prod_{\Delta j = -1}^{1} f\left(y_{i+\Delta i,\, j+\Delta j};\ \sigma_{i+\Delta i,\, j+\Delta j},\ \xi_{i+\Delta i,\, j+\Delta j}\right)$
$\log \sigma_{i+\Delta i,\, j+\Delta j} = \sigma^{0}_{i,j} + \sigma^{x}_{i,j}\,(x_{i+\Delta i} - x_i) + \sigma^{y}_{i,j}\,(y_{j+\Delta j} - y_j)$
$\xi_{i+\Delta i,\, j+\Delta j} = \xi^{0}_{i,j}$
No spatial pooling: 2 parameters from n data values
Local pooling: 5 parameters from 9n data values
Coelho et al., 2008: Methods for Exploring Spatial and Temporal Variability of Extreme Events in Climate Data, J. Climate

Teleconnections of extremes
Bivariate measure of extremal dependency (Coles et al., Extremes, 1999):
$\bar\chi = \frac{2 \log \Pr(Y > u)}{\log \Pr((X > u)\ \&\ (Y > u))} - 1$
[Figure: b) Chi bar (75th quantile), Central Europe]
Association with extremes in subtropical Atlantic.

Problem 3: Is there a time trend in extreme skew tides?
10 largest skew tides for each of n=149 years
Is there a time trend in the extremes?
[Figure: dots show largest values; line is linear fit to the mean of the 10 values]
Data example kindly provided by Tom Howard, Met Office

r Largest Order model
In the limit n, z → ∞:
$\Lambda = (t_2 - t_1)\left[1 + \xi\,\frac{z - \mu}{\sigma}\right]^{-1/\xi}$, $\Pr(N = n) = \frac{\Lambda^n e^{-\Lambda}}{n!}$
r Largest Order distribution: $\Pr(\max{}^{(r)}(Z) \le z) = \Pr(N(z) \le r - 1) = e^{-\Lambda} \sum_{s=0}^{r-1} \frac{\Lambda^s}{s!}$, where $\max^{(r)}$ denotes the r-th largest value
Trend model: $\mu = \mu_0 + \mu_1 X$, $\sigma = \sigma_0$, $\xi = \xi_0$

Maximum Likelihood Estimates (standard errors in parentheses)
Model | No. of params | AIC | $\mu_0$ | $\mu_1$ | $\sigma_0$ | $\xi_0$
Null r=10 | 3 | -6882.1 | 0.661 (0.0096) | --- | 0.147 (0.0065) | 0.033 (0.025)
Null r=5 | 3 | -2469.2 | 0.659 (0.0099) | --- | 0.146 (0.0068) | 0.050 (0.034)
Null r=1 | 3 | -91.5 | 0.658 (0.014) | --- | 0.146 (0.0099) | 0.031 (0.059)
Trend r=10 | 4 | -6880.4 | 0.754 (0.0095) | -4.57E-5 (???) | 0.147 (0.0042) | 0.032 (0.0025)
• Estimates for null model are similar for r=1,5,10
• Estimates get more precise for larger r
• Null model has slightly better AIC than trend model
• Trend model has trouble estimating trend parameter
Could either constrain shape=0 and/or pool over more data.

Model checking: null model r=10
Model slightly underestimates largest r=1 and r=2 quantiles.

Model checking: trend model r=10
Including a time trend does not improve the r=1 and 2 fits.

Summary
• Sufficiently large values of an independent identically distributed variable can be described by a 3-parameter non-homogeneous Poisson process;
• This leads to simple parametric forms for the distribution of maxima and r-largest values (GEV) and exceedances above a high threshold (GPD);
• MLE can be used to estimate the parameters (but estimates are often sensitive to individual values);
• Non-stationarity can be accounted for by making model parameters systematic functions of covariates;
• Spatial pooling can be used to obtain more precise estimates but covariates have to be included to avoid bias.

Some outstanding questions …
1. What do extreme indices really tell us about extremes?
2. How best to develop well-specified extreme value models that account for non-stationarity (non-identical distributions) caused by natural and climate change processes?
3. How to deal with large sampling uncertainty due to the rarity of events and shortness of available observational records? Robust estimation in the presence of outlier events?
4. What can imperfect climate models tell us about real world extremes? How to bias correct model errors in extremes?
5. How to develop and test well-specified inferential frameworks for prediction and attribution of real world extremes from multi-model ensembles?

References
Stephenson, D.B.
(2008): Chapter 1: Definition, diagnosis, and origin of extreme weather and climate events. In Climate Extremes and Society, Cambridge University Press, 348 pp.
Definitions of what we mean by extreme, rare, severe and high-impact events.
Ferro, C.A.T., D.B. Stephenson, and A. Hannachi, 2005: Simple non-parametric techniques for exploring changing probability distributions of weather, J. Climate, 18, 4344-4354.
Attribution of changes in extremes to changes in bulk distribution.
Beniston, M. and Stephenson, D.B. (2004): Extreme climatic events and their evolution under changing climatic conditions, Global and Planetary Change, 44, pp 1-9
Time-varying attribution of changes in heat wave extremes to changes in bulk distribution.
Coelho, C.A.S., Ferro, C.A.T., Stephenson, D.B. and Steinskog, D.J. (2008): Methods for exploring spatial and temporal variability of extreme events in climate data, Journal of Climate, 21, pp 2072-2092
GPD fits to gridded data including covariates. Spatial pooling and teleconnection methods.
Antoniadou, A., Besse, P., Fougeres, A.-L., Le Gall, C. and Stephenson, D.B. (2001): L'Oscillation Atlantique Nord (NAO) et son influence sur le climat européen, Revue de Statistique Appliquée, XLIX (3), pp 39-60
One of the earliest papers to use climate covariates in EVT fits: NAO effect on CET extremes.
Stuart Coles, An Introduction to Statistical Modeling of Extreme Values, Springer.
Excellent overview of extreme value theory.

There are worse things than extreme climate … e.g. extreme ironing!
Thanks for your attention

Tubing Boulder Creek on Sunday?
Sunday noon, whitewatertubing.com
See me today if you are interested.

Proposed taxonomy of atmospheric extremes
Rarity: Rare weather/climate events
Severity: Rare and Severe events; Rare and Non-Severe events
Rapidity:
• Rare, Severe, Acute events, e.g. hurricane in New England
• Rare, Severe, Chronic events, e.g. European blocking
• Rare, Non-severe, Acute events, e.g. hurricane over the South Atlantic ocean
• Rare, Non-severe, Chronic events, e.g. Atlantic blocking
Acute: Having a rapid onset and following a short but severe course.
Chronic: Lasting for a long period of time or marked by frequent recurrence.
Stephenson, D.B. (2008): Chapter 1: Definition, diagnosis, and origin of extreme weather and climate events. In Climate Extremes and Society, R. Murnane and H. Diaz (Eds), Cambridge University Press, 348 pp.