Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Extreme Value Analysis FISH 558 Decision Analysis in Natural Resource Management 12/4/2013 Noble Hendrix QEDA Consulting LLC Affiliate Faculty UW SAFS Lecture Overview • Motivating examples of extreme events • Generalized Extreme Value – Statistical Development – Case Study: the white cliffs of Dover • Generalized Pareto Distribution – Statistical Development – Case Study: whale strikes in SE Alaska • Additional resources 2 Why should we care about extreme events? • They are rare by definition, so why spend much time thinking about them? • Often the consequences of the event have significant impacts to the system – mortality, colonization, episodic recruitment • We tend to focus on averages, but extremes may be more important in some situations. • We may also be interested in estimating extremes beyond what has been observed 3 0.010 0.005 0.000 density 0.015 0.020 Distribution of outcomes ||||||||||||||| |||||||||||||||||||||| |||| | | 0 50 || 100 x 150 4 0.010 0.005 0.000 density 0.015 0.020 Distribution of outcomes ||||||||||||||| |||||||||||||||||||||| |||| | | 0 50 || 100 x 150 5 0.010 0.005 0.000 density 0.015 0.020 Distribution of outcomes ||||||||||||||| |||||||||||||||||||||| |||| | | 0 50 || 100 x 150 6 0.010 0.005 0.000 density 0.015 0.020 Distribution of outcomes ||||||||||||||| |||||||||||||||||||||| |||| | | 0 50 || 100 x 150 7 0.010 0.005 0.000 density 0.015 0.020 Distribution of outcomes ||||||||||||||| |||||||||||||||||||||| |||| | | 0 50 || 100 x 150 8 Motivation 100 year floodplain 9 Motivation Surpassing the 100 year floodplain • Road and home construction based on flood frequency and intensity i.e., 100 year floodplain 10 Motivation Hurricanes 11 Financial Markets 12 Central Limit Theorem Consider sequence of iid random variables, X1, … Xn We know that sum Sn = X1 + … + Xn, when normalized lead to the CLT: Statistical Foundations 13 Generalized Extreme Value Fisher-Tippet Asymptotic Theorem Define maxima of sequence of random variables Mn = max(X1, …, Xn) For normalized maxima, there is also a nondegenerate distribution H(x), which is a GEV distribution 14 Generalized Extreme Value Cumulative Density Function u – location s – scale v - shape 15 Generalized Extreme Value Variants of the GEV Shape parameter v defines several distributions: Gumbel: v = 0 Weibull: v < 0 Fréchet: v > 0 16 Generalized Extreme Value Shapes of GEV 0.1 0.6 shape = 0.0 shape = 0.5 shape = -0.5 0.0 Fréchet 0.4 0.2 Probability Gumbel 0.2 0.3 0.8 0.4 Weibull 0.0 Density Distribution function 1.0 Density function 0 1 2 3 x 4 5 0 1 2 3 4 5 x 17 Generalized Extreme Value Applicability Almost all common continuous distributions converge on H(x) for some value of v • Weibull – beta • Gumbel – normal, lognormal, hyperbolic, gamma, chi-squared • Fréchet – Pareto, inverse gamma, Student t, loggamma 18 Generalized Extreme Value Minima What about minima? min(X1, …, Xn) = - max(-X1, … ,-Xn) If H(x) is the limiting distribution for maxima, then 1 – H(-x) is the limiting distribution for minima, so can also be handled 19 Generalized Extreme Value Estimation Obtain data from an unknown distribution F • Let’s assume that there is an extreme value distribution Hv for some value of v • The true distribution of the n-block maximum Mn can be approximated for large enough n with a GEV distribution H(x) • Fit model to repeated observations of an nblock maximum, thus m blocks of size n 20 Generalized Extreme Value Example - Data Annual sea level height at Dover, Britain between 1912 and 1992 4.0 3.6 3.2 Sea Level (m) 4.4 Dover, Britain 1920 1940 1960 Year 1980 21 Generalized Extreme Value Example - Data Annual sea level height at Dover, Britain between 1912 and 1992 1.0 0.5 0.0 Density 1.5 Dover, Britain 3.2 3.4 3.6 3.8 4.0 4.2 4.4 4.6 Sea Level (m) 22 Generalized Extreme Value R package evd > > > > require(evd) data(sealevel) sl.no<-na.omit(sealevel[,1]) fgev(sl.no) Call: fgev(x = sl.no) Deviance: -5.022368 Estimates loc 3.59252 scale 0.20195 Standard Errors loc scale 0.02642 0.01874 shape -0.02107 shape 0.07730 23 Generalized Extreme Value Diagnostics Empirical 0.4 0.0 Model 3.5 4.0 4.5 5.0 Quantile Plot 0.8 Probability Plot 0.0 0.2 0.4 0.6 Empirical 0.8 1.0 3.4 3.6 3.8 4.0 4.2 4.4 Model 24 Return Level Plot Return level – “how long to wait on average until see another event equal to or more extreme” Return Level Plot 3.5 4.0 4.5 5.0 Return Level Generalized Extreme Value 0.2 1.0 5.0 20.0 Return Period 100.0 If H is the distribution of the n-block maximum, the k return level is the 1 – 1/k quantile of H 25 Generalized Extreme Value 4 0 -4 -2 log likelihood 2 0 -2 log likelihood 0 -2 -6 -4 -4 -8 -6 -6 log likelihood 2 2 4 4 Profile likelihood of parameters 3.50 3.55 3.60 3.65 location (u) 0.16 0.20 0.24 scale (s) 0.28 -0.2 0.0 0.2 shape (v) 26 Generalized Extreme Value Limitations • Limitations of the GEV: – Used for block maxima, e.g., annual precipitation, annual flow, – Only 1 exceedance per block – May ignore some important observations, – Some go so far as to say it is a wasteful method! (McNeil et al. 2005 Quantitative Risk Management, Princeton) 27 Generalized Pareto Distribution GEV has largely been surpassed by another method for extremes over a threshold Pickands (1975) developed a model for excesses y over threshold a Pickands 1975 Annals of Stats 3:119 28 Generalized Pareto Distribution a – threshold b – scale v - shape 29 Generalized Pareto Distribution Shapes of GPD Distribution function 0.8 0.6 0.4 shape = 0.0 shape = 0.5 shape = -0.5 0.2 Probability 0.6 0.4 0.2 Positive shape = limitless loss 0.0 Density 0.8 1.0 Density function 0 1 2 x 3 4 0 1 2 3 4 x 30 Generalized Pareto Distribution Applicability For any continuous distributions that converge on H(x) for some value of v, which was most of the continuous distributions of interest The same distributions will converge on G(x) as an excess distribution as the threshold a is raised 31 Generalized Pareto Distribution Estimation Obtain data from an unknown distribution F Calculate Yj = Xj – a for Na that exceed threshold a maximize log-likelihood: 32 Generalized Pareto Distribution Threshold Estimation Have an interesting problem: • Need a value of threshold a that must be high enough to satisfy the theoretical assumptions • Need enough data above the threshold a so that the parameters are well estimated • Use a sample mean residual life plot to help identify a reasonable threshold value a 33 Generalized Pareto Distribution Sample Mean Residual Life Plot Let Y = X – a0. At threshold a0, if Y is GPD with parameters b and v then E(Y) = b/(1 – v), v < 1 This is true for all thresholds ai > a0, but the scale parameter bi must be appropriate to the threshold ai E(X-ai| X > ai) = (bi + v*ai)/(1-v), Thus E(X - a| X > a) is a linear function of a where GPD appropriate, so can plot E(x-ai) (where x are our observed data) versus ai. This is the sample mean residual life plot, and confidence intervals added by assuming E(x-a) are approximately normally distributed 34 Generalized Pareto Distribution Example - Data Quantifying strike rates of whales in southeast Alaska 35 Generalized Pareto Distribution Distances to Whales 80 60 40 20 0 Whale distance metric 100 Minimum distances (i.e., D < 0) are where losses occur, so transform distance D into a positive loss metric, where value of 100 equates to D = 0 0 200 400 Observation number 600 800 36 Generalized Pareto Distribution 0.010 0.005 0.000 Density 0.015 Whale Distance Metric 0 20 40 60 80 100 Whale distance metric 37 Generalized Pareto Distribution Threshold determination 30 20 10 0 Mean Excess 40 Mean Residual Life Plot 50 60 70 80 90 • Looking for discontinuities in the mean excess, E(x-ai), at different threshold values ai • Identified value of 70 as the threshold (equates to a distance of 300m between whales and ships) Threshold 38 Generalized Pareto Distribution 100 60 library(POT) mrlplot(w.metric, xlim = c(50,90) ) tcplot(w.metric, u.range = c(50, 90) ) 20 Modified Scale Threshold determination 50 60 70 80 90 -0.6 Discontinuity in scale and shape estimates when threshold a > 70 -1.0 Shape -0.2 Threshold Mean residual life plot (previous slide) indicates a = 70 50 60 70 80 Threshold 90 39 Generalized Pareto Distribution Estimation > fitgpd(w.metric, thresh = 70, est = "mle") Estimator: MLE Deviance: 974.4418 AIC: 978.4418 Standard Error Type: observed Standard Errors scale shape 1.53542 0.07452 Varying Threshold: FALSE Threshold Call: 70 Number Above: 151 Proportion Above: 0.1946 Asymptotic Variance Covariance scale shape scale 2.357530 -0.106864 shape -0.106864 0.005553 Estimates scale shape 14.8380 -0.4706 Optimization Information Convergence: successful 40 Generalized Pareto Distribution Diagnostics 0.0 0.2 0.4 0.6 Empirical 0.8 1.0 90 80 70 ------------------------------------ -------------- ------------------------ ------------------------------------- ---------------------------------- ----------------------------------------------------------- ------------------------------ 100 QQ-plot Empirical 0.4 0.0 Model 0.8 Probability plot -- ------ - - - ------------ ---------------------------- --------------------------- -------------------- ---------------------------- ----------------------70 75 80 85 90 95 100 Model 41 Generalized Pareto Distribution -978 -986 -982 log likelihood -978 -982 -986 log likelihood -974 -974 Likelihood profiles 10 12 14 16 scale (b) 18 20 -0.6 -0.5 -0.4 -0.3 -0.2 shape (v) 42 Generalized Pareto Distribution 0 -2 -6 -4 a = 60 a = 70 a = 80 -10 -8 relative log likelihood -4 -6 -8 -10 -12 relative log likelihood -2 0 Likelihood profiles with different thresholds 10 15 20 scale (b) 25 30 -0.8 -0.6 -0.4 -0.2 0.0 0.2 shape (v) relative log likelihood - likelihood relative to maximum for that threshold value 43 Generalized Pareto Distribution Empirical and Estimated 0.02 0.04 Empirical GPD 0.00 Density 0.06 Density Plot 70 80 90 100 110 120 Comparison of empirical (no observed strikes) and GPD model estimates for a = 70 • Since 2000, 2 confirmed strikes • GPD provides better characterization of risk Quantile 44 Generalized Pareto Distribution Return Level 95 100 90 Return level – how many encounters where whales are less than 300m until a strike? 75 80 85 Conditional return level of approx. 500 70 Whale distance metric Return Level Plot 1 5 50 500 Absolute return level of approx. 2500 (1 in 5 encounters has an encounter < 300m) Return period (observations) 45 Summary: GEV and EVT • Generalized Extreme Value (GEV) distribution – Used for block maxima, e.g., maximum sea-level per year – Data loss due to only block maxima • Generalized Pareto Distribution (GPD) – Used for points over a threshold – All exceedances above some limit are used – Question about how to deal with selecting a threshold value 46 Additional Resources Books and Papers Coles, S. 2001. An Introduction to Statistical Modelling of Extreme Values. Springer Series in Statistics. London. McNeil, A. J., Frey, R., & Embrechts, P. 2005. Quantitative risk management: concepts, techniques, and tools. Princeton University Press. Embrechts, P. 1997. Modelling extremal events: for insurance and finance (Vol. 33). Springer. Bayesian GPD Modeling Coles, S. and L. Pericchi. 2003. Anticipating catastrophes through extreme value modeling. Applied Statistics 52(4): 405–416. Jagger. T. H. and J. B. Elsne 2004. Climatology models for extreme hurricane winds near the United States. Journal of Climate 19: 3220-3236. 47 Additional Resources Fitting models in R and BUGS A few R packages • Points over Threshold (POT) • Extreme Value Distributions (evd) • extRemes • Quantitative Risk Management (QRM) • evdbayes BUGS • OpenBUGS – GEV and GPD • WinBUGS/JAGS – GPD with 1’s trick 48