Chapter V: Extreme Value Theory and Frequency Analysis

5.1 INTRODUCTION TO EXTREME VALUE THEORY

Most statistical methods are concerned primarily with what goes on in the centre of a distribution and pay little attention to its tails, that is, to the most extreme values at either the high or the low end. Extreme event risk is present in all areas of risk management – market, credit, operational and insurance. One of the greatest challenges facing a risk manager is to implement risk management tools that allow rare but damaging events to be modelled and their consequences to be measured. Extreme value theory (EVT) plays a vital role in these activities.

The standard mathematical approach to modelling risks uses the language of probability theory. Risks are random variables, mapping unforeseen future states of the world into values representing profits and losses. These risks may be considered individually, or seen as part of a stochastic process in which present risks depend on previous risks. The potential values of a risk have a probability distribution which we will never observe exactly, although past losses due to similar risks, where available, may provide partial information about that distribution. Extreme events occur when a risk takes values from the tail of its distribution. We develop a model for a risk by selecting a particular probability distribution, perhaps estimated through statistical analysis of empirical data. In that case EVT is a tool which attempts to provide the best possible estimate of the tail area of the distribution. Even in the absence of useful historical data, EVT provides guidance on the kind of distribution we should select so that extreme risks are handled conservatively.

There are two principal kinds of model for extreme values. The older group is the block maxima models: models for the largest observations collected from large samples of identically distributed observations. For example, if we record daily or hourly losses and profits from trading a particular instrument or group of instruments, the block maxima/minima method provides a model which may be appropriate for the quarterly or annual maximum of such values. Block maxima/minima are fitted with the generalized extreme value (GEV) distribution. A more modern group is the peaks-over-threshold (POT) models: models for all large observations which exceed a high threshold. The POT models are generally considered the most useful for practical applications, for two main reasons. First, by taking all exceedances over a suitably high threshold into account, they use the data more efficiently. Second, they are easily extended to situations where one wants to study how the extreme levels of a variable Y depend on some other variable X – for instance, Y may be the level of tropospheric ozone on a particular day and X a vector of meteorological variables for that day. This kind of problem is almost impossible to handle through the annual maximum method. The POT methods are fitted with the generalized Pareto distribution (GPD).

5.2 GENERALIZED EXTREME VALUE DISTRIBUTION

The role of the generalized extreme value (GEV) distribution in the theory of extremes is analogous to that of the normal distribution in the central limit theory for sums of random variables.
Just as the normal distribution proves to be the important limiting distribution for sample sums or averages, as is made explicit in the central limit theorem, so the GEV distribution proves to be important in the study of the limiting behaviour of sample extrema. The three-parameter distribution function of the GEV is

$$
H_{\xi,\mu,\sigma}(x) =
\begin{cases}
\exp\!\left\{-\left[1+\xi\,\dfrac{x-\mu}{\sigma}\right]^{-1/\xi}\right\}, & \xi \neq 0,\\[8pt]
\exp\!\left\{-e^{-(x-\mu)/\sigma}\right\}, & \xi = 0,
\end{cases}
\qquad (5.2.1)
$$

defined on the set of x for which $1+\xi(x-\mu)/\sigma > 0$. Here µ and σ > 0 are the location and scale parameters, and ξ is the all-important shape parameter which determines the nature of the tail of the distribution.

The extreme value distribution in equation (5.2.1) is generalized in the sense that its parametric form subsumes three types of distributions which are known by other names according to the value of ξ: when ξ > 0 we have the Fréchet distribution with shape α = 1/ξ; when ξ < 0 we have the Weibull distribution with shape α = −1/ξ; when ξ = 0 we have the Gumbel distribution. The Weibull is a short-tailed distribution with a finite right endpoint. The Gumbel and the Fréchet have infinite right endpoints, but the tail of the Fréchet decays much more slowly than that of the Gumbel.

Here are a few basic properties of the GEV distribution. The mean exists if ξ < 1 and the variance if ξ < 1/2; more generally, the k-th moment exists if ξ < 1/k. The mean and variance are given by

$$
m_1 = E(X) = \mu + \frac{\sigma}{\xi}\left\{\Gamma(1-\xi)-1\right\}, \qquad \xi < 1,
$$

$$
m_2 = E\{(X-m_1)^2\} = \frac{\sigma^2}{\xi^2}\left\{\Gamma(1-2\xi)-\Gamma^2(1-\xi)\right\}, \qquad \xi < 1/2.
$$

One objection to the extreme value distributions is that many processes rarely produce observations that are independent and identically distributed; however, there is an extensive theory of extremes for non-IID processes. A second objection is that alternative distributional families are sometimes argued to fit the data better – for example, in the 1970s there was a lengthy debate among hydrologists over the use of extreme value distributions as compared with the log-Pearson Type III. There is no universal solution to this kind of debate.
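The GEV quantities above are easy to evaluate numerically. The following Python sketch (not part of the original analysis; it assumes NumPy and SciPy are available) implements the distribution function (5.2.1) and the moment formulas, and cross-checks them against scipy.stats.genextreme, whose shape parameter c corresponds to −ξ in the notation used here.

```python
import numpy as np
from scipy.stats import genextreme
from scipy.special import gamma

def gev_cdf(x, xi, mu, sigma):
    """GEV distribution function H_{xi,mu,sigma}(x), equation (5.2.1)."""
    z = (x - mu) / sigma
    if xi == 0.0:                       # Gumbel case
        return np.exp(-np.exp(-z))
    t = 1.0 + xi * z
    if t <= 0:                          # outside the support
        return 0.0 if xi > 0 else 1.0
    return np.exp(-t ** (-1.0 / xi))

def gev_mean_var(xi, mu, sigma):
    """Mean (valid for xi < 1) and variance (valid for xi < 1/2) of the GEV."""
    if xi == 0.0:                       # Gumbel limits
        return mu + sigma * 0.5772156649, (np.pi**2 / 6) * sigma**2
    mean = mu + sigma * (gamma(1 - xi) - 1) / xi
    var = sigma**2 * (gamma(1 - 2*xi) - gamma(1 - xi)**2) / xi**2
    return mean, var

# Cross-check against SciPy (shape parameter c = -xi).
xi, mu, sigma, x = 0.2, 0.0, 1.0, 1.5
print(gev_cdf(x, xi, mu, sigma), genextreme.cdf(x, c=-xi, loc=mu, scale=sigma))
print(gev_mean_var(xi, mu, sigma),
      genextreme.stats(c=-xi, loc=mu, scale=sigma, moments="mv"))
```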
5.2.1 The Fisher-Tippett Theorem

The Fisher-Tippett theorem is the fundamental result in EVT and can be considered to have the same status in EVT as the central limit theorem has in the study of sums. The theorem describes the limiting behaviour of appropriately normalized sample maxima. Suppose $X_1, X_2, \ldots$ is a sequence of independent random variables with common distribution function F, that is, $F(x) = P(X_j \le x)$ for each j and x. The distribution function of the maximum of the first n observations, $M_n = \max(X_1,\ldots,X_n)$, is the n-th power of F:

$$
P\{M_n \le x\} = P\{X_1 \le x, \ldots, X_n \le x\} = P\{X_1 \le x\}\cdots P\{X_n \le x\} = F^n(x).
$$

Suppose further that we can find sequences of real numbers $a_n > 0$ and $b_n$ such that $(M_n - b_n)/a_n$, the sequence of normalized maxima, converges in distribution; that is,

$$
P\{(M_n - b_n)/a_n \le x\} = P\{M_n \le a_n x + b_n\} = F^n(a_n x + b_n) \to H(x) \quad \text{as } n \to \infty \qquad (5.2.2)
$$

for some non-degenerate distribution function H(x). If this condition holds we say that F is in the maximum domain of attraction of H and we write F ∈ MDA(H). It was shown by Fisher and Tippett (1928) that

$$
F \in \mathrm{MDA}(H) \;\Rightarrow\; H \text{ is of the type } H_\xi \text{ for some } \xi.
$$

Thus, if we know that suitably normalized maxima converge in distribution, the limit distribution must be an extreme value distribution for some value of the parameters ξ, µ and σ.

The class of distributions F for which condition (5.2.2) holds is large. Gnedenko (1943) showed that for ξ > 0, F ∈ MDA(H_ξ) if and only if $1 - F(x) = x^{-1/\xi} L(x)$ for some slowly varying function L(x). This result essentially says that if the tail of the distribution function F(x) decays like a power function, then the distribution is in the domain of attraction of the Fréchet. The class of distributions whose tails decay like a power function is quite large and includes the Pareto, Burr, loggamma, Cauchy and t distributions as well as various mixture models; these are the so-called heavy-tailed distributions. Distributions in the maximum domain of attraction of the Gumbel, MDA(H_0), include the normal, exponential, gamma and lognormal distributions. The lognormal distribution has a moderately heavy tail and has historically been a popular model for loss severity distributions; however, it is not as heavy-tailed as the distributions in MDA(H_ξ) for ξ > 0. Distributions in the domain of attraction of the Weibull (H_ξ for ξ < 0) are short-tailed distributions such as the uniform and beta distributions.

The Fisher-Tippett theorem suggests fitting the GEV to data on sample maxima, when such data can be collected. There is a large literature on this topic, particularly in hydrology, where the so-called annual maxima method has a long history.

5.2.2 Fitting the GEV Distribution

The GEV distribution can be fitted using various methods – probability weighted moments, maximum likelihood and Bayesian methods. The latter two require numerical computation, and one disadvantage is that the computations are not easily performed with standard statistical packages; however, some programs specifically tailored to extreme values are available. In implementing maximum likelihood, it is assumed that the block size is quite large, so that, regardless of whether the underlying data are dependent or not, the block maxima observations can be taken to be independent. For the GEV, the density $p(x;\mu,\sigma,\xi)$ is obtained by differentiating (5.2.1) with respect to x. The likelihood based on block maxima observations $Y_1, Y_2, \ldots, Y_N$ is $\prod_{i=1}^{N} p(Y_i;\mu,\sigma,\xi)$, and so the log likelihood is

$$
l_Y(\mu,\sigma,\xi) = -N\log\sigma - \left(\frac{1}{\xi}+1\right)\sum_i \log\!\left(1+\xi\,\frac{Y_i-\mu}{\sigma}\right) - \sum_i \left(1+\xi\,\frac{Y_i-\mu}{\sigma}\right)^{-1/\xi}, \qquad (5.2.3)
$$

which must be maximized subject to the parameter constraints that σ > 0 and $1+\xi(Y_i-\mu)/\sigma > 0$ for all i. Although this is an irregular likelihood problem, because the parameter space depends on the values of the data, the consistency and asymptotic efficiency of the resulting MLEs can be established for the case ξ > −1/2.

In determining the number and size of blocks (n and m respectively) a trade-off necessarily takes place: roughly speaking, a large block size m leads to a more accurate approximation of the block maxima distribution by a GEV distribution and hence a low bias in the parameter estimates, while a large number of blocks n gives more block maxima data for the ML estimation and leads to a low variance in the parameter estimates. Note also that, in the case of dependent data, somewhat larger block sizes than are used in the IID case may be advisable; dependence generally has the effect that convergence to the GEV distribution is slower, since the effective sample size is mθ (θ being the extremal index), which is smaller than m.
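As an illustration of the maximum likelihood fitting just described, the following Python sketch codes the log likelihood (5.2.3) directly and maximizes it numerically; the simulated block maxima and the starting values are purely illustrative, and scipy.stats.genextreme.fit is used only as an independent check (its shape parameter equals −ξ).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def gev_negloglik(params, y):
    """Negative of the GEV log likelihood in (5.2.3)."""
    mu, sigma, xi = params
    if sigma <= 0:
        return np.inf
    z = 1.0 + xi * (y - mu) / sigma
    if np.any(z <= 0):                  # constraint 1 + xi*(y - mu)/sigma > 0
        return np.inf
    n = len(y)
    if abs(xi) < 1e-8:                  # Gumbel limit
        t = (y - mu) / sigma
        return n*np.log(sigma) + np.sum(t) + np.sum(np.exp(-t))
    return n*np.log(sigma) + (1.0/xi + 1.0)*np.sum(np.log(z)) + np.sum(z**(-1.0/xi))

# Illustrative block maxima; moment-based starting values.
rng = np.random.default_rng(0)
y = genextreme.rvs(c=-0.2, loc=6.0, scale=1.2, size=50, random_state=rng)
start = [np.mean(y), np.std(y), 0.1]
fit = minimize(gev_negloglik, start, args=(y,), method="Nelder-Mead")
print("mu, sigma, xi =", fit.x)
print("scipy check   =", genextreme.fit(y))   # returns (c, loc, scale) with c = -xi
```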
The following exploratory analyses can be carried out while fitting a GEV or GPD model to a given data set (a computational sketch of the fit itself follows Example 5.2.1 below):

- Time series plot
- Autocorrelation plot
- Histogram on the log scale
- Quantile-quantile plot against the exponential distribution
- Gumbel plot (Mondal, 2006d, p. 338)
- Sample mean excess plot (also called the mean residual life plot in survival analysis)
- Plot of the empirical distribution function
- Quantile-quantile plot of the residuals of a fitted model

Example 5.2.1 The highest water level during the period 1 March to 15 May of each year at the Sunamganj station on the Surma river in Bangladesh is given below. Fit a GEV model to the data.

Year    WL (m)     Year    WL (m)     Year    WL (m)
1975    8.354      1985    7.044      1995    6.082
1976    7.615      1986    6.874      1996    6.045
1977    7.442      1987    6.750      1997    5.956
1978    7.428      1988    6.650      1998    5.188
1979    7.308      1989    6.614      1999    5.090
1980    7.284      1990    6.590      2000    4.826
1981    7.250      1991    6.530      2001    4.040
1982    7.075      1992    6.214      2002    2.882
1983    7.056      1993    6.144      2003    2.806
1984    7.056      1994    6.134      2004    8.080

A Q-Q plot against the exponential distribution (Figure 5.2.1) shows a convex departure from a straight line, a sign of thin-tailed behaviour.

Figure 5.2.1 Q-Q plot of the water level data against the exponential distribution

Figure 5.2.2 shows the sample mean excess plot. The fitted points slope downward, which is again a sign of thin-tailed behaviour.

Figure 5.2.2 The sample mean excess plot for the water level data at Sunamganj

A GEV model was fitted to the data. The estimated parameters were µ̂ = 6.094991, σ̂ = 1.41705 and ξ̂ = −0.5997389. Since the shape parameter is negative, the distribution of the data is of Weibull type (thin-tailed). The period March-May is the dry season in Bangladesh, so these maxima come from low-flow periods, and a Weibull-type distribution is appropriate for low flows. The Q-Q plot of the residuals of the fitted GEV model (Figure 5.2.3) is roughly a straight line of unit slope passing through the origin, so the model can be considered a reasonably good fit to the data. Other distributions, such as the Gumbel and the lognormal, were also tried for this data set, but the GEV distribution was found to fit the data better. The estimated water levels corresponding to different return periods are:

Return period (years)    2       5       10      20      50      100
WL (m)                   6.56    7.50    7.85    8.06    8.23    8.31

Figure 5.2.3 The Q-Q plot of the fitted GEV residuals
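A possible computational reproduction of Example 5.2.1 is sketched below in Python (assuming SciPy). Because a different optimizer may be used, the fitted parameters and return levels can differ slightly from the values quoted above.

```python
import numpy as np
from scipy.stats import genextreme

# Annual 1 March - 15 May maximum water levels (m) at Sunamganj, 1975-2004,
# transcribed in the order listed in the table of Example 5.2.1.
wl = np.array([8.354, 7.615, 7.442, 7.428, 7.308, 7.284, 7.250, 7.075, 7.056, 7.056,
               7.044, 6.874, 6.750, 6.650, 6.614, 6.590, 6.530, 6.214, 6.144, 6.134,
               6.082, 6.045, 5.956, 5.188, 5.090, 4.826, 4.040, 2.882, 2.806, 8.080])

c, loc, scale = genextreme.fit(wl)          # SciPy's shape c corresponds to -xi
xi = -c
print(f"mu = {loc:.3f}, sigma = {scale:.3f}, xi = {xi:.3f}")   # xi < 0: Weibull type

# Return levels: x_T is the (1 - 1/T) quantile of the fitted GEV.
for T in (2, 5, 10, 20, 50, 100):
    print(f"{T:>3}-yr level: {genextreme.ppf(1 - 1/T, c, loc, scale):.2f} m")
```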
5.3 GENERALIZED PARETO DISTRIBUTION

The GPD is a two-parameter distribution with distribution function

$$
G_{\xi,\beta}(x) =
\begin{cases}
1-\left(1+\xi x/\beta\right)^{-1/\xi}, & \xi \neq 0,\\
1-\exp(-x/\beta), & \xi = 0,
\end{cases}
$$

where β > 0, and x ≥ 0 when ξ ≥ 0, while 0 ≤ x ≤ −β/ξ when ξ < 0. The parameters ξ and β are referred to as the shape and scale parameters respectively. The GPD is generalized in the sense that it subsumes certain other distributions under a common parametric form. If ξ > 0 then $G_{\xi,\beta}$ is a reparametrized version of the ordinary Pareto distribution (with α = 1/ξ and κ = β/ξ), which has a long history in actuarial mathematics as a model for large losses; ξ = 0 corresponds to the exponential distribution, i.e. a distribution with a medium-sized tail; and ξ < 0 gives a short-tailed, Pareto type II distribution. The mean of the GPD is defined provided ξ < 1 and is

$$
E(X) = \frac{\beta}{1-\xi}.
$$

The first case (ξ > 0) is the most relevant for risk management purposes, since the GPD is then heavy-tailed. Whereas the normal distribution has moments of all orders, a heavy-tailed distribution does not possess a complete set of moments. For the GPD with ξ > 0, $E[X^k]$ is infinite for k ≥ 1/ξ: when ξ = 1/2 the GPD has infinite variance (second moment), and when ξ = 1/4 it has an infinite fourth moment. The role of the GPD in EVT is as a natural model for the excess distribution over a high threshold. Certain types of large-claims data in insurance typically suggest an infinite second moment; similarly, econometricians might claim that certain market returns indicate a distribution with an infinite fourth moment. The normal distribution cannot model these phenomena, but the GPD is used to capture precisely this kind of behaviour.

5.3.1 Modeling Excess Distributions

Let X be a random variable with distribution function F. The distribution of excesses over a threshold u has distribution function

$$
F_u(y) = P\{X-u \le y \mid X > u\}, \qquad 0 \le y < x_0 - u,
$$

where $x_0 \le \infty$ is the right endpoint of F. The excess distribution function $F_u$ represents the probability that a loss exceeds the threshold u by at most an amount y, given that it exceeds the threshold. In survival analysis the excess distribution function is more commonly known as the residual life distribution function – it expresses the probability that, say, an electrical component which has functioned for u units of time fails in the time period [u, u + y]. It is very useful to observe that $F_u$ can be written in terms of the underlying F as

$$
F_u(y) = \frac{F(y+u)-F(u)}{1-F(u)}.
$$

The mean excess function of a random variable X with finite mean is given by

$$
e(u) = E(X-u \mid X > u),
$$

so e(u) expresses the mean of $F_u$ as a function of u. In survival analysis the mean excess function is known as the mean residual life function and gives the expected residual lifetime for components of different ages. Mostly we assume that the underlying F is a distribution with an infinite right endpoint, i.e. it allows the possibility of arbitrarily large losses, even if it attributes negligible probability to unreasonably large outcomes – the normal or t distributions, for example. It is also conceivable, in certain applications, that F could have a finite right endpoint. An example is the beta distribution on the interval [0, 1], which attributes zero probability to outcomes larger than 1 and which might be used, for example, as the distribution of credit losses expressed as a proportion of exposure.
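The sample mean excess function introduced above is simple to compute; the short Python sketch below (illustrative only, with simulated data) evaluates it over a grid of thresholds. For a GPD the theoretical mean excess e(u) = (β + ξu)/(1 − ξ) is linear in u, so the slope of the empirical plot hints at the tail behaviour, as used in Example 5.2.1.

```python
import numpy as np

def mean_excess(data, thresholds):
    """Sample mean excess e(u): average of (x - u) over observations x > u."""
    data = np.asarray(data)
    return np.array([np.mean(data[data > u] - u) if np.any(data > u) else np.nan
                     for u in thresholds])

# Heavy-tailed illustration: an upward-sloping mean excess plot suggests xi > 0,
# a downward-sloping one suggests a short (Weibull-type) tail.
x = np.random.default_rng(1).pareto(3.0, 2000)
u = np.quantile(x, np.linspace(0.5, 0.98, 25))
print(np.round(mean_excess(x, u), 3))
```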
5.3.2 The Pickands-Balkema-de Haan Theorem

The Pickands-Balkema-de Haan limit theorem (Balkema and de Haan, 1974; Pickands, 1975) is a key result in EVT and explains the importance of the GPD. It states that we can find a positive measurable function β(u) such that

$$
\lim_{u \to x_0}\; \sup_{0 \le y < x_0 - u} \left| F_u(y) - G_{\xi,\beta(u)}(y) \right| = 0
$$

if and only if F ∈ MDA(H_ξ). (If F is itself a GPD with parameters ξ and β, the excess distribution over u is again a GPD with the same shape and scale β(u) = β + ξu.) The theorem shows that, under MDA conditions, the generalized Pareto distribution is the limiting distribution of the excesses as the threshold tends to the right endpoint. All the common continuous distributions of statistics and actuarial science (normal, lognormal, χ², t, F, gamma, exponential, uniform, beta, etc.) are in MDA(H_ξ) for some ξ, so the theorem is a very widely applicable result which essentially says that the GPD is the natural model for the unknown excess distribution above sufficiently high thresholds.

5.3.3 Fitting a GPD Model

Given loss data $X_1, X_2, \ldots, X_n$ from F, a random number $N_u$ of them will exceed our threshold u; it is convenient to relabel these data $\tilde X_1, \tilde X_2, \ldots, \tilde X_{N_u}$. For each of these exceedances we calculate the excess loss $Y_j = \tilde X_j - u$. We wish to estimate the parameters ξ and β of a GPD model by fitting this distribution to the $N_u$ excess losses. There are various ways of fitting the GPD, including maximum likelihood (ML) and probability weighted moments (PWM). The former method is more commonly used and is easy to implement if the excess data can be assumed to be realizations of independent random variables, since the joint probability density of the observations is then a product of marginal densities. ML is the most general fitting method in statistics and also provides estimates of statistical error (standard errors) for the parameter estimates. Writing $g_{\xi,\beta}$ for the density of the GPD, the log likelihood is easily calculated to be

$$
\ln L(\xi,\beta; Y_1,\ldots,Y_{N_u}) = \sum_{j=1}^{N_u} \ln g_{\xi,\beta}(Y_j)
= -N_u \ln\beta - \left(1+\frac{1}{\xi}\right)\sum_{j=1}^{N_u} \ln\!\left(1+\xi\,\frac{Y_j}{\beta}\right),
$$

which must be maximized subject to the parameter constraints that β > 0 and $1+\xi Y_j/\beta > 0$ for all j. Solving the maximization problem yields a GPD model $G_{\hat\xi,\hat\beta}$ for the excess distribution $F_u$. The choice of the threshold is basically a compromise between choosing a sufficiently high threshold, so that the asymptotic theorem can be considered essentially exact, and choosing a sufficiently low threshold, so that there is sufficient material for estimation of the parameters.
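A minimal peaks-over-threshold sketch in Python follows (assuming SciPy; the simulated losses and the 95th-percentile threshold are illustrative choices, not prescriptions). It selects the exceedances, forms the excesses and fits the GPD by maximum likelihood with the location fixed at zero, as the excess distribution starts at zero by construction.

```python
import numpy as np
from scipy.stats import genpareto

def fit_pot(losses, u):
    """Peaks-over-threshold fit: GPD fitted by ML to the excesses over threshold u."""
    losses = np.asarray(losses)
    excesses = losses[losses > u] - u
    xi, _, beta = genpareto.fit(excesses, floc=0)   # genpareto's shape equals xi
    return xi, beta, len(excesses)

# Illustrative heavy-tailed loss sample (Student t with 4 d.o.f., so xi is near 1/4).
rng = np.random.default_rng(2)
losses = rng.standard_t(df=4, size=5000)
u = np.quantile(losses, 0.95)
xi, beta, n_u = fit_pot(losses, u)
print(f"u = {u:.3f}, N_u = {n_u}, xi = {xi:.3f}, beta = {beta:.3f}")
```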
5.4 MULTIVARIATE EXTREMES

Multivariate extreme value theory (MEVT) can be used to model the tails of multivariate distributions, and in particular the dependence structure of extreme events. Consider the random vector X = (X_1, …, X_d)^t, which represents losses of d different kinds measured at the same point in time. Assume that these losses have joint distribution $F(x_1,\ldots,x_d) = P\{X_1 \le x_1, \ldots, X_d \le x_d\}$ and that the individual losses have continuous marginal distributions $F_i(x) = P\{X_i \le x\}$. It was shown by Sklar (1959) that every joint distribution can be written as

$$
F(x_1,\ldots,x_d) = C\left[F_1(x_1),\ldots,F_d(x_d)\right]
$$

for a unique function C known as the copula of F. A copula is the joint distribution of uniformly distributed random variables: if $U_1,\ldots,U_n$ are U(0,1), then $C(u_1,\ldots,u_n) = P\{U_1 \le u_1,\ldots,U_n \le u_n\}$ is a copula, a function from $[0,1]\times\cdots\times[0,1]$ into [0,1]. More formally, a copula is a function C of n variables on the unit n-cube $[0,1]^n$ with the following properties:

- the range of C is the unit interval [0,1];
- C(u) is zero for all u in $[0,1]^n$ having at least one coordinate equal to zero [for the bivariate case, C(u, 0) = 0 = C(0, v) for every u, v in [0,1]];
- C(u) = u_k if all coordinates of u equal 1 except the k-th one [for the bivariate case, C(u, 1) = u and C(1, v) = v for every u, v in [0,1]];
- C is n-increasing, in the sense that for every a ≤ b in $[0,1]^n$ the volume assigned by C to the n-box [a, b] = [a_1, b_1] × … × [a_n, b_n] is non-negative [for the bivariate case, $C(u_2,v_2) - C(u_2,v_1) - C(u_1,v_2) + C(u_1,v_1) \ge 0$ for every $u_1 \le u_2$ and $v_1 \le v_2$ in [0,1]].

A copula may thus be thought of in two equivalent ways: as a function (with some technical restrictions) that maps values in the unit hypercube to values in the unit interval, or as a multivariate distribution function with standard uniform marginal distributions. The copula C does not change under (strictly) increasing transformations of the losses $X_1,\ldots,X_d$, so it makes sense to interpret C as the dependence structure of X or F, as the following simple illustration in d = 2 dimensions shows. Take the marginal distributions to be standard univariate normal, $F_1 = F_2 = \Phi$. We can then choose any copula C (i.e. any bivariate distribution with uniform marginals) and apply it to these marginals to obtain a bivariate distribution with normal marginals. For one particular choice of C, which we call the Gaussian copula and denote $C^{Ga}_\rho$, we obtain the standard bivariate normal distribution with correlation ρ. The Gaussian copula does not have a simple closed form and must be written as a double integral:

$$
C^{Ga}_\rho(u_1,u_2) = \int_{-\infty}^{\Phi^{-1}(u_1)}\int_{-\infty}^{\Phi^{-1}(u_2)}
\frac{1}{2\pi\sqrt{1-\rho^2}}
\exp\!\left\{-\frac{s^2 - 2\rho s t + t^2}{2(1-\rho^2)}\right\}\, ds\, dt.
$$

Another interesting copula is the Gumbel copula, which does have a simple closed form:

$$
C^{Gu}_\beta(v_1,v_2) = \exp\!\left[-\left\{(-\log v_1)^{1/\beta} + (-\log v_2)^{1/\beta}\right\}^{\beta}\right], \qquad 0 < \beta \le 1.
$$

Figure 5.4.? shows the bivariate distributions which arise when we apply the two copulas $C^{Ga}_{0.7}$ and $C^{Gu}_{0.5}$ to standard normal marginals. The left-hand picture is the standard bivariate normal with correlation 70%; the right-hand picture is a bivariate distribution with approximately equal correlation but with a tendency to generate extreme values of X_1 and X_2 simultaneously. It is, in this sense, a more dangerous distribution for risk managers. On the basis of correlation alone these distributions cannot be differentiated, yet they have entirely different dependence structures: the bivariate normal has rather weak tail dependence, whereas the normal-Gumbel distribution has pronounced tail dependence.
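The difference in tail dependence between the two copulas can also be seen numerically. The Python sketch below (assuming SciPy) evaluates both copulas at (q, q) and compares the joint exceedance probability 1 − 2q + C(q, q); the parameter values 0.7 and 0.5 mirror the copulas $C^{Ga}_{0.7}$ and $C^{Gu}_{0.5}$ discussed above.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def gaussian_copula(u1, u2, rho):
    """C^Ga_rho(u1, u2): bivariate normal CDF evaluated at the normal quantiles."""
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal.cdf([norm.ppf(u1), norm.ppf(u2)], mean=[0, 0], cov=cov)

def gumbel_copula(v1, v2, beta):
    """C^Gu_beta(v1, v2) = exp(-[(-log v1)^(1/beta) + (-log v2)^(1/beta)]^beta)."""
    return np.exp(-((-np.log(v1))**(1/beta) + (-np.log(v2))**(1/beta))**beta)

# Probability that both uniforms exceed a high level q is 1 - 2q + C(q, q).
for q in (0.95, 0.99, 0.999):
    joint_ga = 1 - 2*q + gaussian_copula(q, q, rho=0.7)
    joint_gu = 1 - 2*q + gumbel_copula(q, q, beta=0.5)
    print(q, round(joint_ga, 6), round(joint_gu, 6))
# The Gumbel copula places visibly more mass in the joint upper tail: tail dependence.
```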
One way of understanding MEVT is as the study of the copulas which arise in the limiting multivariate distribution of component-wise block maxima. Suppose we have a family of random vectors X_1, X_2, … representing d-dimensional losses at different points in time, where X_i = (X_{i1}, …, X_{id})^t. A simple interpretation might be that they represent daily (negative) returns for d instruments. As in the univariate discussion of block maxima, we assume that losses at different points in time are independent; this assumption simplifies the statement of the result, but can be relaxed to allow serial dependence of losses at the cost of some additional technical conditions. We define the vector of component-wise block maxima to be $M_n = (M_{1n},\ldots,M_{dn})^t$, where $M_{jn} = \max(X_{1j},\ldots,X_{nj})$ is the block maximum of the j-th component for a block of n observations. Now consider the vector of normalized block maxima $((M_{1n}-b_{1n})/a_{1n},\ldots,(M_{dn}-b_{dn})/a_{dn})^t$, where $a_{jn} > 0$ and $b_{jn}$ are normalizing sequences. If this vector converges in distribution to a non-degenerate limit, then the limit must have the form

$$
C\!\left(H_{\xi_1}\!\left(\frac{x_1-\mu_1}{\sigma_1}\right),\ldots,H_{\xi_d}\!\left(\frac{x_d-\mu_d}{\sigma_d}\right)\right)
$$

for some values of the parameters $\xi_j$, $\mu_j$ and $\sigma_j$ and some copula C. It must have this form because of univariate EVT: each marginal distribution of the limiting multivariate distribution must be a GEV, as noted earlier in this chapter. MEVT characterizes the copulas C which may arise in this limit – the so-called MEV copulas. It turns out that the limiting copulas must satisfy

$$
C(u_1^t,\ldots,u_d^t) = C^t(u_1,\ldots,u_d), \qquad t > 0.
$$

There is no single parametric family which contains all the MEV copulas, but certain parametric copulas are consistent with the above condition and might therefore be regarded as natural models for the dependence structure of extreme observations. In two dimensions the Gumbel copula is an example of an MEV copula, and it is moreover a versatile one. If the parameter β is 1 then $C^{Gu}_1(v_1,v_2) = v_1 v_2$, and the copula models independence of the components of the random vector $(X_1, X_2)^t$. If β ∈ (0,1) the Gumbel copula models dependence between X_1 and X_2; as β decreases the dependence becomes stronger, and in the limit β → 0 we obtain perfect dependence of X_1 and X_2, meaning $X_2 = T(X_1)$ for some strictly increasing function T. For β < 1 the Gumbel copula shows tail dependence – the tendency of extreme values to occur together, as observed in Figure 5.4.?.

5.4.1 Fitting a Tail Model Using a Copula

The Gumbel copula can be used to build tail models in two dimensions as follows. Suppose two risk factors $(X_1, X_2)^t$ have an unknown joint distribution F with marginals $F_1$ and $F_2$, so that, for some copula C, $F(x_1,x_2) = C[F_1(x_1), F_2(x_2)]$. Assume that we have n pairs of data points from this distribution. Using the univariate peaks-over-threshold method, we model the tails of the two marginal distributions by picking high thresholds $u_1$ and $u_2$ and using the tail estimators

$$
\hat F_i(x) = 1 - \frac{N_{u_i}}{n}\left(1+\hat\xi_i\,\frac{x-u_i}{\hat\beta_i}\right)^{-1/\hat\xi_i}, \qquad x > u_i,\; i = 1,2.
$$

We model the dependence structure of observations exceeding these thresholds using the Gumbel copula $C^{Gu}_{\hat\beta}$ for some estimated value $\hat\beta$ of the dependence parameter β. Putting the tail models and the dependence structure together gives a model for the joint tail of F:

$$
\hat F(x_1,x_2) = C^{Gu}_{\hat\beta}\!\left(\hat F_1(x_1), \hat F_2(x_2)\right), \qquad x_1 > u_1,\; x_2 > u_2.
$$

The estimate of the dependence parameter β can be determined by maximum likelihood, either in a second stage after the parameters of the tail estimators have been estimated, or in a single-stage procedure in which all parameters are estimated together.
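A rough sketch of this joint tail model in Python is given below. All parameter values (thresholds, GPD estimates, numbers of exceedances and the dependence parameter) are hypothetical placeholders standing in for the outputs of the two univariate POT fits and the copula maximum likelihood step described above.

```python
import numpy as np

def tail_estimator(x, u, xi, beta, n_exceed, n_total):
    """Univariate POT tail estimator F_hat(x), valid for x > u."""
    return 1.0 - (n_exceed / n_total) * (1.0 + xi * (x - u) / beta) ** (-1.0 / xi)

def gumbel_copula(v1, v2, beta_dep):
    """Gumbel copula with dependence parameter 0 < beta_dep <= 1."""
    return np.exp(-((-np.log(v1))**(1/beta_dep) + (-np.log(v2))**(1/beta_dep))**beta_dep)

def joint_tail(x1, x2, marg1, marg2, beta_dep):
    """F_hat(x1, x2) = C^Gu(F_hat1(x1), F_hat2(x2)) for x1 > u1 and x2 > u2."""
    return gumbel_copula(tail_estimator(x1, *marg1),
                         tail_estimator(x2, *marg2), beta_dep)

# Hypothetical marginal fits: (u, xi, beta, N_u, n) for each risk factor.
marg1 = (10.0, 0.3, 2.0, 120, 2500)
marg2 = (8.0, 0.2, 1.5, 140, 2500)
print(joint_tail(15.0, 12.0, marg1, marg2, beta_dep=0.6))
```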
5.5 INTRODUCTION TO FREQUENCY ANALYSIS

The magnitude of an extreme event is inversely related to its frequency of occurrence, very severe events occurring less frequently than more moderate events. The objective of frequency analysis is to relate the magnitude of extreme events to their frequency of occurrence through the use of probability distributions. Frequency analysis is the investigation of sample data in order to estimate recurrence intervals or probabilities of magnitudes. It is one of the earliest and most frequent uses of statistics in hydrology and the natural sciences. Early applications of frequency analysis were largely in the area of flood flow estimation; today nearly every phase of hydrology and the natural sciences is subjected to frequency analysis.

Two methods of frequency analysis are described here: a straightforward plotting technique to obtain the cumulative distribution, and an analytical technique based on frequency factors. The cumulative distribution function provides a rapid means of determining the probability of an event equal to or less than some specified quantity; its inverse is used to obtain recurrence intervals. The analytical frequency analysis is a simplified technique whose frequency factors depend on the distributional assumption made and on the mean, the variance and, for some distributions, the coefficient of skewness of the data.

Over the years, and continuing today, volumes of material have been written on the best probability distribution to use in various situations. One cannot, in most instances, analytically determine which probability distribution should be used. Certain limit theorems, such as the central limit theorem and the extreme value theorems, may provide guidance. One should also evaluate the experience accumulated with the various distributions and how well they describe the phenomena of interest. Certain properties of the distributions can be used to screen distributions for possible application in a particular situation: for example, the range, the general shape and the skewness of a distribution often indicate whether it may or may not be applicable in a given setting. When two or more distributions appear to describe a given set of data equally well, the distribution that has traditionally been used should be selected unless there are overriding reasons for selecting another distribution. However, if a traditionally used distribution is inferior, its use should not be continued just because "that's the way it's always been done". As a general rule, caution is needed when carrying out frequency analysis on records shorter than 10 years and when estimating frequencies of events with return periods greater than twice the record length.

5.6 PROBABILITY PLOTTING

A probability plot is a plot of magnitude versus probability. Determining the probability to assign to a data point is commonly referred to as determining the plotting position. If one is dealing with a population, determining the plotting position is merely a matter of determining the fraction of the values less (greater) than or equal to the value in question; thus the smallest (largest) population value would plot at 0 and the largest (smallest) population value would plot at 1.00. Assigning plotting positions of 0 and 1 should be avoided for sample data unless one has additional information on the population limits. A plotting position may be expressed as a probability from 0 to 1 or as a percentage from 0 to 100; which convention is being used should be clear from the context.
In some discussions of probability plotting, especially in the hydrologic literature, the probability scale is used to denote prob(X ≥ x), i.e. $1 - P_X(x)$. One can always transform the scale from $1 - P_X(x)$ to $P_X(x)$, or even to the return period $T_X(x)$, if desired. Probability plotting of hydrologic data requires that the individual observations or data points be independent of each other and that the sample data be representative of the population (unbiased).

There are four common types of sample data: the complete duration series, the annual series, the partial duration series and the extreme value series. The complete duration series consists of all available data – for example, all the available daily flow data for a stream; such a data set would most likely not have independent observations. The annual series consists of one value per year, such as the maximum peak flow of each year. The partial duration series consists of all values above (below) a certain base; all values above a threshold, for example, represent a partial duration series. This series may have more or fewer values than the annual series, since for a given base some years may not contribute any data. Frequently the annual series and the partial duration series are combined, so that the largest (smallest) annual value plus all independent values above (below) some base are used. The extreme value series consists of the largest (smallest) observation in a given time interval; the annual series is a special case of the extreme value series with a time interval of one year. Regardless of the type of sample data used, the plotting position is determined in the same manner.

Gumbel (1958) states the following criteria for plotting position relationships:

1. The plotting position must be such that all observations can be plotted.
2. The plotting position should lie between the observed frequencies of (m − 1)/n and m/n, where m is the rank of the observation beginning with m = 1 for the largest (smallest) value and n is the number of years of record (if applicable) or the number of observations.
3. The return period of a value equal to or larger than the largest observation, and the return period of a value equal to or smaller than the smallest observation, should converge toward n.
4. The observations should be equally spaced on the frequency scale.
5. The plotting position should have an intuitive meaning, be analytically simple and be easy to use.

Several plotting position relationships are presented in Table 5.6.1. Benson (1962a), in a comparative study of several plotting position relationships based on theoretical sampling from extreme value and normal distributions, found that the Weibull relationship provided estimates consistent with experience. The Weibull plotting position formula meets all five of the above criteria. (1) All of the observations can be plotted, since the plotting positions range from 1/(n + 1), which is greater than zero, to n/(n + 1), which is less than 1; probability paper for many distributions does not contain the points zero and one. (2) The relationship m/(n + 1) lies between (m − 1)/n and m/n for all values of m and n. (3) The return period of the largest value is (n + 1)/1, which approaches n as n becomes large.
(4) The difference between the plotting positions of the (m + 1)-th and the m-th values is 1/(n + 1) for all values of m and n. (5) The fact that condition 3 is met, together with the simplicity of the Weibull relationship, fulfils condition 5.

Cunnane (1978) studied the various available plotting position methods using the criteria of unbiasedness and minimum variance. An unbiased plotting method is one that, if used for plotting a large number of equally sized samples, results in the average of the plotting points for each value of m falling on the theoretical distribution line; a minimum variance plotting method is one that minimizes the scatter of the plotted points about that line. Cunnane concluded that the Weibull plotting formula is biased and plots the largest values of a sample at too small a return period. For normally distributed data he found that the Blom (1958) plotting position (c = 3/8) is closest to being unbiased, while for data distributed according to the Extreme Value Type I distribution the Gringorten (1963) formula (c = 0.44) is best. For the log-Pearson Type III distribution the optimal value of c depends on the coefficient of skewness, being larger than 3/8 when the data are positively skewed and smaller than 3/8 when the data are negatively skewed.

Table 5.6.1 Plotting position relationships

Name                    Source               Relationship
California              California (1923)    m/n
Hazen                   Hazen (1930)         (2m − 1)/2n
Weibull                 Weibull (1939)       m/(n + 1)
Blom                    Blom (1958)          (m − 3/8)/(n + 0.25)
Gringorten #            Gringorten (1963)    (m − a)/(n + 1 − 2a)
Cunnane *               Cunnane (1978)       (m − c)/(n − 2c + 1)
Benard's median rank    –                    (m − 0.3)/(n + 0.4)
M.S. Excel              –                    (m − 1)/(n − 1)

* c is 3/8 for the normal distribution and 0.4 if the applicable distribution is unknown. # Most plotting position formulas do not account for the sample size or length of record; one formula that does was given by Gringorten. In general a value of 0.4 is recommended in the Gringorten equation, with 3/8 used if the distribution is approximately normal and 0.44 if the data follow a Gumbel (EV1) distribution.

It should be noted that all of the relationships give similar values near the centre of the distribution but may vary considerably in the tails. Since the prediction of extreme events depends on the tails of the distribution, care must be exercised. The quantity $1 - P_X(x)$ represents the probability of an event with a magnitude equal to or greater than the event in question. When the data are ranked from the largest (m = 1) to the smallest (m = n), the plotting positions determined from Table 5.6.1 correspond to $1 - P_X(x)$. If the data are ranked from the smallest (m = 1) to the largest (m = n), the plotting position formulas are still valid; however, the plotting position now corresponds to the probability of an event equal to or smaller than the event in question, which is $P_X(x)$. Probability paper may contain scales of $P_X(x)$, $1 - P_X(x)$, $T_X(x)$ or a combination of these.

The following steps are necessary for a probability plot of a given set of data (a computational sketch of the plotting positions follows this list):

- Rank the data from the largest (smallest) to the smallest (largest) value. If two or more observations have the same value, several procedures can be used for assigning plotting positions.
- Calculate the plotting position of each data point using one of the relationships in Table 5.6.1.
- Select the type of probability paper to be used.
- Plot the observations on the probability paper.
- Draw a theoretical line; for the normal distribution, the line should pass through the mean plus one standard deviation at 84.1% and the mean minus one standard deviation at 15.9%.
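The plotting positions of Table 5.6.1 can be computed directly, as in the following Python sketch (the formula names and the small sample of flows are illustrative only). When the data are ranked from largest to smallest, the returned probabilities estimate P(X ≥ x) and their reciprocals the return periods.

```python
import numpy as np

def plotting_positions(data, formula="weibull", descending=True):
    """Plotting positions for ranked data, using relationships from Table 5.6.1."""
    x = np.sort(np.asarray(data, dtype=float))
    if descending:
        x = x[::-1]                              # rank m = 1 for the largest value
    n = len(x)
    m = np.arange(1, n + 1)
    if formula == "weibull":
        p = m / (n + 1)
    elif formula == "gringorten":                # a = 0.44, suited to the Gumbel (EV1)
        p = (m - 0.44) / (n + 0.12)
    elif formula == "cunnane":                   # c = 0.4, general-purpose choice
        p = (m - 0.4) / (n + 0.2)
    elif formula == "blom":                      # c = 3/8, near-unbiased for normal data
        p = (m - 0.375) / (n + 0.25)
    else:
        raise ValueError(f"unknown formula: {formula}")
    return x, p

# Illustrative annual peak flows; T = 1/p under the exceedance interpretation.
flows = [38500, 17900, 25400, 12300, 55900, 8560, 23700, 9190]
x, p = plotting_positions(flows, "weibull")
for xv, pv in zip(x, p):
    print(f"{xv:>9.0f}  P(X>=x) = {pv:.3f}  T = {1/pv:5.1f} yr")
```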
If a set of data plots as a straight line on a probability paper, the data can be said to follow the distribution corresponding to that paper. Since it would be rare for a set of data to plot exactly on a line, a decision must be made as to whether the deviations from the line are random or represent true departures indicating that the data do not follow the given probability distribution. Visual comparison of the observed and theoretical frequency histograms, and of the observed and theoretical cumulative frequency curves in the form of probability plots, can help determine whether a set of data follows a certain distribution; some formal statistical tests are also available.

When probability plots are made and a line is drawn through the data, the temptation to extrapolate to high return periods is great. The distance on the probability paper from a return period of 20 years to a return period of 200 years is not very large; however, if the data do not truly follow the assumed distribution with population parameters equal to the sample statistics (i.e. µ = x̄ and σ² = s² for the normal), the error in such an extrapolation can be quite large. This fact has already been referred to when it was stated that the estimation of probabilities in the tails of distributions is very sensitive to distributional assumptions. Since one of the usual purposes of a probability plot is to estimate events with long return periods, Blench (1959) and Dalrymple (1960) have criticized the blind use of analytical flood frequency methods because of this tendency toward extrapolation.

5.7 ANALYTICAL FREQUENCY ANALYSIS

Calculating the magnitude of an extreme event by directly inverting the fitted distribution requires that the probability distribution function be invertible; that is, given a value of T, or equivalently $F(x_T) = (T-1)/T$, the corresponding value of $x_T$ can be determined. Some probability distribution functions are not readily invertible, including the normal and Pearson Type III distributions, and an alternative method of calculating the magnitudes of extreme events is required for these distributions.

The magnitude $x_T$ of a hydrologic event may be represented as the mean µ plus the departure $\Delta x_T$ of the variate from the mean:

$$
x_T = \mu + \Delta x_T. \qquad (5.7.1)
$$

The departure may be taken as equal to the product of the standard deviation σ and a frequency factor $K_T$, that is, $\Delta x_T = K_T\,\sigma$. The departure $\Delta x_T$ and the frequency factor $K_T$ are functions of the return period and of the type of probability distribution used in the analysis. Equation (5.7.1) may therefore be expressed as

$$
x_T = \mu + K_T\,\sigma, \qquad (5.7.2)
$$

which may be approximated by

$$
x_T = \bar x + K_T\, s. \qquad (5.7.3)
$$

If the variable analysed is $y = \log x$, the same method is applied to the statistics of the logarithms of the data, using

$$
y_T = \bar y + K_T\, s_y, \qquad (5.7.4)
$$

and the required value of $x_T$ is found by taking the antilog of $y_T$. The frequency factor equation (5.7.2) was proposed by Chow (1951) and is applicable to many probability distributions used in hydrologic frequency analysis. For a given distribution, a K–T relationship can be determined between the frequency factor and the corresponding return period.
This relationship can be expressed in mathematical terms or by a table. Frequency analysis begins with the calculation, by the method of moments, of the statistical parameters required for the proposed probability distribution from the given data. For a given return period, the frequency factor is determined from the K–T relationship for the proposed distribution, and the magnitude $x_T$ is computed from equation (5.7.3) or (5.7.4). The theoretical K–T relationships for several probability distributions commonly used in frequency analysis are now described.

5.7.1 Normal and Lognormal Distributions

From equation (5.7.2), the frequency factor can be expressed as

$$
K_T = \frac{x_T - \mu}{\sigma}. \qquad (5.7.5)
$$

This is the same as the standard normal variable z. The value of z corresponding to an exceedance probability p (p = 1/T) can be calculated by first finding the value of an intermediate variable w,

$$
w = \left[\ln\!\left(\frac{1}{p^2}\right)\right]^{1/2}, \qquad 0 < p \le 0.5, \qquad (5.7.6)
$$

and then calculating z using the approximation

$$
z = w - \frac{2.515517 + 0.802853\,w + 0.010328\,w^2}{1 + 1.432788\,w + 0.189269\,w^2 + 0.001308\,w^3}. \qquad (5.7.7)
$$

When p > 0.5, 1 − p is substituted for p in (5.7.6) and the value of z computed from (5.7.7) is given a negative sign. The error of this formula is less than 0.00045 in z (Abramowitz and Stegun, 1965). The frequency factor $K_T$ for the normal distribution is equal to z, as mentioned above. For the lognormal distribution, the same procedure applies except that it is applied to the logarithms of the variables, and their mean and standard deviation are used in equation (5.7.4).

Example 5.7.1: Calculate the frequency factor for the normal distribution for an event with a return period of 50 years. (Chow et al., 1988, p. 390)

Solution: For T = 50 years, p = 1/50 = 0.02. From equation (5.7.6),

$$
w = \left[\ln\!\left(\frac{1}{0.02^2}\right)\right]^{1/2} = 2.7971.
$$

Substituting w into (5.7.7),

$$
K_T = z = 2.7971 - \frac{2.515517 + 0.802853(2.7971) + 0.010328(2.7971)^2}{1 + 1.432788(2.7971) + 0.189269(2.7971)^2 + 0.001308(2.7971)^3} = 2.054.
$$

5.7.2 Extreme Value Distributions

For the Extreme Value Type I distribution, Chow (1953) derived the expression

$$
K_T = -\frac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\!\left[\ln\!\left(\frac{T}{T-1}\right)\right]\right\}. \qquad (5.7.8)
$$

To express T in terms of $K_T$, this equation can be inverted as

$$
T = \frac{1}{1 - \exp\!\left\{-\exp\!\left[-\left(\gamma + \dfrac{\pi K_T}{\sqrt{6}}\right)\right]\right\}}, \qquad (5.7.9)
$$

where γ = 0.5772. When $x_T = \mu$, equation (5.7.5) gives $K_T = 0$ and equation (5.7.9) gives T = 2.33 years; this is the return period of the mean of the Extreme Value Type I distribution. For the Extreme Value Type II distribution, the logarithm of the variate follows the EV1 distribution; in this case (5.7.4) is used to calculate $y_T$, with the value of $K_T$ taken from (5.7.8).
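The frequency factors for the normal and Extreme Value Type I distributions can be coded directly from equations (5.7.6)-(5.7.8), as in the Python sketch below. It reproduces $K_{50} \approx 2.054$ from Example 5.7.1 and the EV1 factor $K_5 \approx 0.719$ used in Example 5.7.2, which follows.

```python
import numpy as np

def KT_normal(T):
    """Frequency factor for the normal distribution (eqs. 5.7.6-5.7.7)."""
    p = 1.0 / T
    sign = 1.0
    if p > 0.5:                         # substitute 1 - p and change the sign of z
        p, sign = 1.0 - p, -1.0
    w = np.sqrt(np.log(1.0 / p**2))
    z = w - (2.515517 + 0.802853*w + 0.010328*w**2) / \
            (1 + 1.432788*w + 0.189269*w**2 + 0.001308*w**3)
    return sign * z

def KT_ev1(T):
    """Frequency factor for the Extreme Value Type I distribution (eq. 5.7.8)."""
    return -np.sqrt(6)/np.pi * (0.5772 + np.log(np.log(T / (T - 1.0))))

print(KT_normal(50))   # ~2.054, as in Example 5.7.1
print(KT_ev1(5))       # ~0.719, used in Example 5.7.2
print(KT_ev1(2.33))    # ~0, the return period of the EV1 mean
```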
Example 5.7.2 Determine the 5-year return period rainfall for Chicago using the frequency factor method and the annual maximum rainfall data given below. (Chow et al., 1988, p. 391)

Year    Rainfall (in)    Year    Rainfall (in)    Year    Rainfall (in)
1913    0.49             1926    0.68             1938    0.52
1914    0.66             1927    0.61             1939    0.64
1915    0.36             1928    0.88             1940    0.34
1916    0.58             1929    0.49             1941    0.70
1917    0.41             1930    0.33             1942    0.57
1918    0.47             1931    0.96             1943    0.92
1920    0.74             1932    0.94             1944    0.66
1921    0.53             1933    0.80             1945    0.65
1922    0.76             1934    0.62             1946    0.63
1923    0.57             1935    0.71             1947    0.60
1924    0.80             1936    1.11
1925    0.66             1937    0.64

Solution: The mean and standard deviation of the annual maximum rainfalls at Chicago are x̄ = 0.649 in and s = 0.177 in. For T = 5, equation (5.7.8) gives

$$
K_T = -\frac{\sqrt{6}}{\pi}\left\{0.5772 + \ln\!\left[\ln\!\left(\frac{5}{5-1}\right)\right]\right\} = 0.719,
$$

and by (5.7.3),

$$
x_T = \bar x + K_T\, s = 0.649 + 0.719 \times 0.177 = 0.78 \text{ in}.
$$

5.7.3 Log-Pearson Type III Distribution

For this distribution the first step is to take the logarithms of the hydrologic data, y = log x; usually logarithms to base 10 are used. The mean ȳ, standard deviation $s_y$ and coefficient of skewness $C_s$ are calculated for the logarithms of the data. The frequency factor depends on the return period T and on the coefficient of skewness $C_s$. When $C_s = 0$ the frequency factor is equal to the standard normal variable z. When $C_s \neq 0$, $K_T$ is approximated by Kite (1977) as

$$
K_T = z + (z^2-1)k + \tfrac{1}{3}(z^3-6z)k^2 - (z^2-1)k^3 + z\,k^4 + \tfrac{1}{3}k^5, \qquad (5.7.10)
$$

where $k = C_s/6$. The value of z for a given return period can be calculated by the procedure of Example 5.7.1. Standard textbooks tabulate the frequency factor of the log-Pearson Type III distribution for various values of the return period and coefficient of skewness.

Example 5.7.3 Calculate the 5- and 50-year return period annual maximum discharges of the Guadalupe River near Victoria, Texas, using the lognormal and log-Pearson Type III distributions. The data in cfs from 1935 to 1978 are given below, arranged by decade and final digit of the year. (Chow et al., 1988, p. 393)

Year    0       1       2       3       4       5       6        7       8       9
1930    -       -       -       -       -       38500   179000   17200   25400   4940
1940    55900   58000   56000   7710    12300   22000   17900    46000   6970    20600
1950    13300   12300   28400   11600   8560    4950    1730     25300   58300   10100
1960    23700   55800   10800   4100    5720    15000   9790     70000   44300   15200
1970    9190    9740    58500   33100   25200   30200   14100    54500   12700   -

Solution: The logarithms of the discharge values are taken and their statistics calculated: ȳ = 4.2743, $s_y$ = 0.4027, $C_s$ = −0.0696.

Lognormal distribution: The frequency factor is obtained from equation (5.7.7). For T = 50 years, $K_T$ was computed in Example 5.7.1 as $K_{50} = 2.054$. By (5.7.4),

$$
y_{50} = \bar y + K_{50}\, s_y = 4.2743 + 2.054 \times 0.4027 = 5.101,
$$

so $x_{50} = 10^{5.101} = 126{,}300$ cfs. Similarly, $K_5 = 0.842$, $y_5 = 4.2743 + 0.842 \times 0.4027 = 4.6134$ and $x_5 = 10^{4.6134} = 41{,}060$ cfs.

Log-Pearson Type III distribution: For $C_s = -0.0696$, the value of $K_{50}$ is obtained by interpolation from a standard table or from equation (5.7.10). By interpolation for T = 50 years,

$$
K_{50} = 2.054 + \frac{2.000 - 2.054}{-0.1 - 0}\,(-0.0696 - 0) = 2.016,
$$

so $y_{50} = \bar y + K_{50}\, s_y = 4.2743 + 2.016 \times 0.4027 = 5.0863$ and $x_{50} = 10^{5.0863} = 121{,}990$ cfs. By a similar calculation, $K_5 = 0.845$, $y_5 = 4.6146$ and $x_5 = 41{,}170$ cfs.

The estimated annual maximum discharges are summarized below:

Distribution                           5 years     50 years
Lognormal ($C_s$ = 0)                  41,060      126,300
Log-Pearson Type III ($C_s$ = −0.07)   41,170      121,990

The effect of including the small negative coefficient of skewness in the calculations is to alter the estimated flows slightly, the effect being more pronounced at T = 50 years than at T = 5 years. Another feature of the results is that the 50-year estimates are about three times as large as the 5-year estimates; for this example, the increase in the estimated flood discharge is less than proportional to the increase in return period.
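The log-Pearson Type III calculation of Example 5.7.3 can be reproduced with Kite's approximation (5.7.10), as in the Python sketch below (assuming SciPy; the exact normal quantile is used in place of the approximation (5.7.7), so the results agree with the example only to within rounding).

```python
import numpy as np
from scipy.stats import norm

def kite_KT(T, Cs):
    """Log-Pearson Type III frequency factor via Kite's approximation (eq. 5.7.10)."""
    z = norm.ppf(1.0 - 1.0 / T)          # standard normal variate for p = 1/T
    k = Cs / 6.0
    return (z + (z**2 - 1)*k + (z**3 - 6*z)*k**2 / 3.0
            - (z**2 - 1)*k**3 + z*k**4 + k**5 / 3.0)

# Statistics of the logarithms of the Guadalupe River annual maxima (Example 5.7.3).
ybar, sy, Cs = 4.2743, 0.4027, -0.0696
for T in (5, 50):
    K = kite_KT(T, Cs)
    yT = ybar + K * sy
    print(f"T = {T:>2} yr: K_T = {K:.3f}, x_T = {10**yT:,.0f} cfs")
```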
5.7.4 Treatment of Zeros

Most hydrologic variables are bounded on the left by zero. A zero in a set of data that is being logarithmically transformed requires special handling. One solution is to add a small constant to all of the observations. Another method is to analyse the non-zero values and then adjust the relation to the full period of record; this method biases the results, as the zero values are essentially ignored. A third and theoretically sounder method is to use the theorem of total probability:

$$
P(X \ge x) = P(X \ge x \mid X = 0)\,P(X = 0) + P(X \ge x \mid X \neq 0)\,P(X \neq 0).
$$

Since $P(X \ge x \mid X = 0)$ is zero for x > 0, the relationship reduces to

$$
P(X \ge x) = P(X \neq 0)\,P(X \ge x \mid X \neq 0).
$$

Here $P(X \neq 0)$ is estimated by the fraction of non-zero values, and $P(X \ge x \mid X \neq 0)$ is estimated by a standard frequency analysis of the non-zero values, with the sample size taken equal to the number of non-zero values. The relation can be written in terms of cumulative probability distributions as

$$
1 - F(x) = k\left[1 - G(x)\right] \qquad \text{or} \qquad F(x) = 1 - k + k\,G(x), \qquad (5.7.11)
$$

where F(x) is the cumulative probability distribution of all X, k is the probability that X is not zero, and G(x) is the cumulative probability distribution of the non-zero values of X, i.e. $P(X \le x \mid X \neq 0)$. Equation (5.7.11) can be used to estimate the magnitude of an event with return period $T_X(x)$ by first solving for G(x) and then using the inverse transformation of G(x) to obtain the value of X. For example, the 10-year event with k = 0.95 is the value of X satisfying

$$
G(x) = \frac{F(x) - 1 + k}{k} = \frac{0.9 - 1 + 0.95}{0.95} = 0.89.
$$

Note that it is possible to generate negative estimates of G(x) from equation (5.7.11). For example, if k = 0.50 and F(x) = 0.05, the estimated G(x) is

$$
G(x) = \frac{0.05 - 1 + 0.50}{0.50} = -0.9.
$$

This merely means that the value of X corresponding to F(x) = 0.05 is zero.
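Equation (5.7.11) is straightforward to apply in code; the small Python sketch below reproduces the two numerical illustrations given above.

```python
def G_from_F(F, k):
    """Solve eq. (5.7.11), F(x) = 1 - k + k*G(x), for the conditional CDF G(x)."""
    return (F - 1.0 + k) / k

# 10-year event (F = 0.9) when 95% of the record is non-zero:
print(G_from_F(0.9, k=0.95))    # ~0.895; invert the fitted G at this probability
# F = 0.05 with only half the record non-zero gives a negative G,
# meaning the corresponding quantile of X is zero:
print(G_from_F(0.05, k=0.50))   # -0.9
```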