Likelihood inference for generalized Pareto distribution

J. del Castillo and I. Serra
Departament de Matemàtiques, Universitat Autònoma de Barcelona
EVT2013, September 2013

Table of contents
1 Introduction
2 Problem: Calibration of the GPD for likelihood inference
3 Solution: A good algorithm and a new methodology approach
4 Examples

1 Introduction

History of EVT: onset of GPD

Fisher-Tippett theorem (1928)
Let $(X_1, \dots, X_n)$ be a sequence of i.i.d. random variables and let $M_n = \max\{X_1, \dots, X_n\}$. If sequences $(a_n, b_n)$ and a non-degenerate distribution $F$ exist such that
$$\lim_{n\to\infty} P\left(\frac{M_n - b_n}{a_n} \le x\right) = F(x),$$
then $F$ belongs to the generalized extreme value family.

Pickands-Balkema-de Haan theorem (1974)
Let $(X_1, \dots, X_n)$ be a sequence of i.i.d. random variables and let $F_u$ be their conditional excess distribution function. Then
$$F_u(y) \to \mathrm{GPD}(k, \psi)(y) \quad \text{as } u \to \infty,$$
where GPD is the generalized Pareto distribution.

Widely used

Davis, H. T. and Feldstein, M. L. (1979).
The Generalized Pareto Law as a Model for Progressively Censored Survival Data. Biometrika.
Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high thresholds. With discussion and a reply by the authors. JRSS-B.
Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer-Verlag, Berlin.
McNeil, A. J., Frey, R. and Embrechts, P. (2005). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton Univ. Press.
Coles, S. and Sparks (2006). Extreme value methods for modelling historical series of large volcanic magnitudes. Statistics in Volcanology.
Rachev, S. T., Racheva-Iotova, B. and Stoyanov, S. (2010). Capturing fat tails, in Risk. Risk Management, Derivatives and Regulation.
Papastathopoulos, I. and Tawn, J. A. (2012). Extended generalised Pareto models for tail estimation. J. Statist. Plann. Inference.

Generalized Pareto distribution

The GPD is a two-parameter distribution for non-negative variables.

Probability density function:
$$f(x; k, \psi) = \frac{1}{\psi}\left(1 - \frac{kx}{\psi}\right)^{1/k - 1}, \qquad \psi > 0,\ k \in \mathbb{R}.$$
The range of $x$ is $(0, \infty)$ for $k \le 0$ and $(0, \psi/k)$ for $k > 0$.

Special cases: the GPD contains the exponential and uniform distributions,
$$f(x; 0, \psi) = \frac{1}{\psi}\exp(-x/\psi), \qquad f(x; 1, \psi) = \frac{1}{\psi} \ \text{on } (0, \psi).$$

Submodels

For k < 0: the Pareto submodel (range (0, ∞)).
For k = 0: the exponential distribution.
For k ∈ (0, 1): a family of decreasing probability density functions with finite support (0, ψ/k).
For k = 1: the uniform distribution.
For k > 1: a family of increasing densities with finite support.
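As a minimal sketch, the density on the slide above can be evaluated directly in the (k, ψ) parameterization used in this talk. The function name `gpd_pdf` is hypothetical and not taken from the slides; the case k = 0 is treated as the exponential limit.

```python
import math


def gpd_pdf(x, k, psi):
    """Density f(x; k, psi) = (1/psi) * (1 - k*x/psi)**(1/k - 1).

    Hypothetical helper for illustration. psi > 0 is the scale and k the
    shape: k < 0 is the Pareto submodel, k > 0 has support (0, psi/k).
    """
    if psi <= 0:
        raise ValueError("psi must be positive")
    if x < 0:
        return 0.0  # negative values are outside the support
    t = k * x / psi
    if k > 0 and t >= 1.0:
        return 0.0  # at or beyond the upper endpoint psi/k
    if k == 0:
        return math.exp(-x / psi) / psi  # exponential special case
    # log1p keeps precision when k*x/psi is very small
    return math.exp((1.0 / k - 1.0) * math.log1p(-t)) / psi


if __name__ == "__main__":
    print(gpd_pdf(0.5, 1.0, 2.0))   # k = 1: uniform on (0, 2)
    print(gpd_pdf(1.0, 0.0, 2.0))   # k = 0: exponential with mean 2
    print(gpd_pdf(2.0, -0.5, 2.0))  # k < 0: Pareto submodel
```

If SciPy is available, `scipy.stats.genpareto.pdf(x, c=-k, scale=psi)` should agree with this function, because SciPy's shape parameter `c` has the opposite sign to the k used here.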
Main use

By the Balkema-de Haan theorem, tails are classified as heavy, exponential or light according to k < 0, k = 0 or k > 0, respectively. Hence every distribution has a tail class within the GPD family. This classification is well defined because, for any threshold u > 0, the shape parameter k is invariant under taking the tail of a GPD:
$$\mathrm{GPD}_u(k, \psi) = \mathrm{GPD}(k, \psi - ku). \qquad (1)$$

The natural classification of tails: heavy-tail, exponential-tail and light-tail.

2 Problem: Calibration of the GPD for likelihood inference

Attempts and partial solutions

Hosking, J. R. M. and Wallis, J. R. (1987). Parameter and quantile estimation for the generalized Pareto distribution. Technometrics.
Castillo, E. and Hadi, A. S. (1997). Fitting the generalized Pareto distribution to data. J. Amer. Statist. Assoc.
Juarez, S. F. and Schucany, W. R. (2004). Robust and efficient estimation for the generalized Pareto distribution. Extremes.
Luceño, A. (2006). Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators. Comput. Statist. Data Anal.
Castillo, J. del and Daoudi, J. (2009). Estimation of generalized Pareto distribution. Statistics and Probability Letters.
Zhang, J. and Stephens, M. A. (2009).
A new and efficient estimation method for the generalized Pareto distribution. Technometrics.
Song, J. and Song, S. (2012). A quantile estimation for massive data with generalized Pareto distribution. Comput. Statist. Data Anal.

Likelihood inference

Goodness-of-fit test for the GPD
Choulakian, V. and Stephens, M. A. (2001). Goodness-of-Fit for the Generalized Pareto Distribution. Technometrics.
BUT it is available only when the parameters are estimated by maximum likelihood.
Zhang and Stephens (2009) propose a new estimation method, BUT the goodness-of-fit test is not modified accordingly.

Model selection
Several models for the same data can be compared through the Akaike and Bayesian information criteria, BUT the underlying theory uses maximum likelihood estimation.

Computing the maximum likelihood estimator

The log-likelihood function is given by
$$l(k, \psi) = n\left(-\log\psi + \left(\frac{1}{k} - 1\right)\frac{1}{n}\sum_{i=1}^{n}\log\left(1 - \frac{k x_i}{\psi}\right)\right),$$
where $\psi > 0$ for $k \le 0$ and $\psi > k x_{(n)}$ for $k > 0$.

The MLE exists, BUT the likelihood equations do not always have a solution. It is a non-regular problem.

In general, the MLE problem for non-regular cases has been studied intensively; the main results are in Smith (1985), Cheng and Amin (1983), Cheng and Iles (1987) and Hall and Wang (2005).

Best alternatives to MLE: ZSE and SSE

Currently, the best alternatives to the MLE are the ZSE, proposed by Zhang and Stephens (2009-2010), and the SSE, proposed by Song and Song (2012).
[Figure: comparison of the estimation methods ML, SS and ZS. The horizontal axis is the parameter k used to simulate the samples and the vertical axis is the parameter k estimated by each method. The grey intensity corresponds to the observed frequency in 1000 trials of sample size 50 in each range of k, where the ranges have width 0.1 and k is uniformly distributed within each range.]

Misspecification problem

Percentages of classification into each submodel, for several sample sizes n and values of k. Model A corresponds to the GPD submodel with k < 0, model B to k ∈ [0, 1] and model C to k > 1.

Table layout: estimators ZSE, SSE and MLE; sample sizes n = 15 and n = 100; columns A, B, C for each of k = -0.1, 0.1, 0.9, 1.1. The bold values correspond to well-classified categories.

Values in extraction order: 65.8 84.2 34.1 0.1 44.1 0.9 0.4 57.7 19.9 0.0 0.0 76.5 90.7 22.6 0.0 55.5 80.1 0.4 15.8 9.3 0.0 31.0 41.9 69.0 85.3 70.1 14.5 0.3 73.8 20.7 14.7 40.7 29.1 0.0 0.0 50.0 77.1 29.4 0.0 25.6 70.9 0.7 30.0 22.9 0.0 27.3 41.4 75.0 50.2 8.4 18.8 0.1 0.0 4.1 9.7 0.0 0.0 11.2 49.8 88.8 0.0 64.6 90.3 16.6 25.0 50.2 0.0 4.4 ZSE 15 100 SSE 15 100 44.6 72.7 MLE 15 100 95.9 95.6

3 Solution: A good algorithm and a new methodology approach

On the existence of MLE

Since the GPD consists of three models, A, B and C, separated by the exponential and uniform distributions, the existence of the MLE is analyzed for each submodel.
Proposition (motivation)
Let $f(x)$ be the probability density function of a random variable with support $[0, x_F]$ and $0 < f(x_F) < \infty$. Then its tail distribution is the uniform distribution.

That is, if a family of right-truncated distributions (truncated at a point where the density is non-zero) is used to model tails, then the model contains only one class of tail: the uniform distribution. This motivates choosing the submodel beforehand; once it is chosen, we show that the existence of the MLE is not a problem.

On the existence of MLE. Submodel A: k ≤ 0

From Castillo and Daoudi (2009), the MLE exists for the GPD model with k ∈ (−∞, 0]. The global maximum is attained in the interior of the domain of k if the empirical coefficient of variation is greater than 1. If the empirical coefficient of variation is less than 1, the likelihood has a local maximum at k = 0, and the authors remark that, from an empirical point of view, it is the global maximum. See also Kozubowski et al. (2009).

On the existence of MLE. Submodel B: 0 ≤ k ≤ 1

Choulakian and Stephens (2001) show that, for fixed k < 1, the equation ∂l/∂ψ = 0 has a single solution, which is a maximum, denoted by ψ̂(k). The set of points (k, ψ̂(k)) for k ∈ (0, 1) is called the Choulakian-Stephens curve.

Theorem
Consider the GPD model for 0 ≤ k ≤ 1. Then the global MLE exists. Moreover,
$$\bar{x} \le \hat\psi(k) \le x_{(n)}, \qquad \lim_{k\to 0}\hat\psi(k) = \bar{x}, \qquad \lim_{k\to 1}\hat\psi(k) = x_{(n)}.$$
(Idea: ψ̂(k), for k ∈ (0, 1), is continuous, monotonically increasing and differentiable.)
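The Choulakian-Stephens curve described above can be traced numerically. The following is a minimal sketch, not the authors' algorithm: it assumes the log-likelihood $l(k, \psi) = n\,(-\log\psi + (1/k - 1)\,\tfrac{1}{n}\sum_i \log(1 - k x_i/\psi))$, for which ∂l/∂ψ = 0 is equivalent to $(1-k)\,\tfrac{1}{n}\sum_i x_i/(\psi - k x_i) = 1$, and solves that equation by plain bisection inside the bounds given in the theorem. The function names and the sample values are hypothetical.

```python
import math


def score_psi(k, psi, data):
    """(psi/n) * dl/dpsi, i.e. (1 - k) * mean(x / (psi - k*x)) - 1."""
    n = len(data)
    return (1.0 - k) * sum(x / (psi - k * x) for x in data) / n - 1.0


def psi_hat(k, data, iters=200):
    """Solve dl/dpsi = 0 for fixed k in (0, 1) by bisection.

    The bracket [max(mean, k*x_max), x_max] uses the bounds stated in the
    theorem; the score is decreasing in psi on psi > k*x_max.
    """
    if not 0.0 < k < 1.0:
        raise ValueError("this sketch only covers 0 < k < 1")
    x_max = max(data)
    x_bar = sum(data) / len(data)
    # stay strictly inside the domain psi > k * x_max
    lo = max(x_bar, k * x_max * (1.0 + 1e-12))
    hi = x_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score_psi(k, mid, data) > 0.0:
            lo = mid  # root lies to the right
        else:
            hi = mid  # root lies to the left
    return 0.5 * (lo + hi)


def profile_loglik(k, data):
    """Log-likelihood evaluated on the curve, l(k, psi_hat(k))."""
    n = len(data)
    psi = psi_hat(k, data)
    s = sum(math.log(1.0 - k * x / psi) for x in data)
    return n * (-math.log(psi) + (1.0 / k - 1.0) * s / n)


if __name__ == "__main__":
    sample = [0.3, 1.1, 0.7, 2.4, 0.2, 1.6, 0.9, 0.5]  # made-up data
    for k in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(k, psi_hat(k, sample), profile_loglik(k, sample))
```

Maximizing `profile_loglik` over a grid of k in (0, 1) would give a candidate for the MLE within submodel B; bisection is chosen here only because it needs no derivative and cannot leave the bracket.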
On the existence of MLE. Submodel C: k ≥ 1

Proposition
Any solution of the likelihood equations for the GPD always satisfies k < 1.

Hence, for k ≥ 1 the MLE of the GPD model is not a solution of the likelihood equations. However, this does not contradict the existence of the MLE for the GPD model with k ≥ 1.

Proposition
The MLE of the GPD model for k ∈ [1, ∞) with k fixed is ψ̂ = k x_(n).

Procedure for calibration

This methodology provides the tools to obtain the most realistic model for the tail of the data.

1st step: Calculate the MLE for several thresholds.
2nd(a) step: If the value of k̂ varies near zero, consider the possibility that k = 0 and test this hypothesis.
2nd(b) step: If the value of k̂ varies around k = 1, consider this possibility and test this hypothesis.
2nd(c) step: Otherwise, the submodel is clear and our algorithm gives the MLE.

Remark
To compute the exact confidence interval for k, use the property that the coefficient of variation of the GPD depends only on k.
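Two facts used above lend themselves to a short sketch: the closed-form estimate ψ̂ = k x_(n) for fixed k ≥ 1, and the dependence of the coefficient of variation on k alone. The formula CV = 1/√(1 + 2k), valid for k > −1/2, is the standard GPD moment result rewritten in the (k, ψ) parameterization; it is not spelled out on the slides. All function names are hypothetical, and inverting the CV as done here is only an illustration, not the authors' exact-interval procedure.

```python
import math


def psi_hat_fixed_k(k, data):
    """MLE of psi for fixed k >= 1, as stated in the proposition: k * x_(n)."""
    if k < 1.0:
        raise ValueError("the closed form applies to k >= 1 only")
    return k * max(data)


def upper_endpoint(k, psi):
    """Upper end of the support, psi / k, for k > 0."""
    if k <= 0.0:
        raise ValueError("the support is unbounded for k <= 0")
    return psi / k


def cv_gpd(k):
    """Coefficient of variation of the GPD: 1 / sqrt(1 + 2k), for k > -1/2."""
    if k <= -0.5:
        raise ValueError("the variance is not finite for k <= -1/2")
    return 1.0 / math.sqrt(1.0 + 2.0 * k)


def k_from_cv(cv):
    """Invert cv_gpd: k = (1 / cv**2 - 1) / 2."""
    return 0.5 * (1.0 / cv ** 2 - 1.0)


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0]  # made-up excesses over a threshold
    psi = psi_hat_fixed_k(1.5, data)
    print(psi, upper_endpoint(1.5, psi))  # fitted endpoint equals max(data)
    print(cv_gpd(0.0), cv_gpd(1.0))       # exponential and uniform cases
```

With ψ̂ = k x_(n), the fitted upper endpoint ψ̂/k coincides with the sample maximum. A CV above 1 corresponds to k < 0, a CV of 1 to the exponential case and a CV of 1/√3 to the uniform case, which is why a CV-based check can separate the submodels.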
4 Examples

Clarification

To illustrate the advantages of the new methodology for the estimation procedure, we present two real-world, controversial examples. Each of them has been studied extensively in the literature.

On Nidd data

Many authors have explored these data, from Hosking and Wallis (1987) and Davison and Smith (1990) to Papastathopoulos and Tawn (2012). The data consist of 154 observations of high river levels of the River Nidd in Yorkshire above a threshold value of 65.

The MLE for the Nidd data, with the exact intervals (e.i.) and confidence intervals (c.i.) for the parameters k and ψ.

Threshold   n     MLE k    MLE ψ    e.i. for k = 0 (99%)   c.i. for ψ, exponential case (99%)
70          138   -0.32    21.64    (-0.23, 0.4)           (25.43, 39.48)
80          86    -0.34    25.22    (-0.22, 0.35)          (28.43, 49.66)
90          57    -0.24    33.55    (-0.21, 0.33)          (31.48, 62.53)
100         39     0.00    50.62    (-0.2, 0.31)           (34.78, 79.9)
110         31     0.07    56.38    (-0.2, 0.3)            (34.55, 88.01)
120         24     0.25    71.64    (-0.19, 0.28)          (35.47, 102.97)
130         22     0.14    59.43    (-0.18, 0.27)          (31.7, 96.63)
140         18     0.24    65.58    (-0.18, 0.26)          (30.66, 105.54)

For testing the exponential tail, there is another methodology from Castillo et al.
(2012), and the conclusions are the same. For data larger than 70, the value of the exponentiality test statistic is T3 = 16.95 and exponentiality is just rejected. For data larger than 90, T2 = 1.42 and exponentiality is not rejected.

[Figure: CV-plot. Vertical axis: coefficient of variation; horizontal axis: sample.]

On Bilbao waves data

These data were originally analyzed in Castillo and Hadi (1997) and consist of the zero-crossing hourly mean periods (in seconds) of the sea waves measured in the Bilbao bay, Spain. The data set was later revisited in Luceño (2006) and in Zhang and Stephens (2009). Only the 179 observations with periods above 7 seconds were taken into consideration.

We model these data by the GPD using thresholds at t = 7, 7.5, 8, 8.5, 9, 9.5, following the above-mentioned authors. They all agree that the MLE does not exist for the last three cases. We note that the MLE does exist, as we have seen, but as the solution of a non-regular problem.

The estimates of the GPD parameters for the Bilbao waves data, for each threshold and each estimator. The maximum, M, of the fitted distribution is also given. No maximum is listed for the SSE at threshold 9.5, where the estimated k is negative.

            SSE                     ZSE                    MLE
Threshold   k       ψ      M        k      ψ      M        k      ψ      M
7            0.84   2.44   9.90     0.81   2.38   9.95     0.86   2.50   9.91
7.5          0.56   1.60   10.32    0.71   1.75   9.99     0.77   1.86   9.92
8            0.63   1.42   10.25    0.77   1.51   9.96     0.86   1.65   9.91
8.5          0.77   1.18   10.04    0.83   1.21   9.95     1.06   1.49   9.90
9            0.80   0.81   10.02    0.88   0.83   9.94     1.19   1.07   9.90
9.5         -0.63   0.22            1.01   0.43   9.93     1.53   0.61   9.90

For each threshold of the Bilbao waves data, the table provides the sample size n and the exact intervals (e.i.) for k = 1 obtained with our algorithm for the MLE.

Threshold   n     e.i. for k = 1 (95%)   e.i. for k = 1 (99%)
7           179   (0.72, 1.36)           (0.64, 1.51)
7.5         154   (0.70, 1.41)           (0.62, 1.55)
8           106   (0.64, 1.50)           (0.55, 1.72)
8.5         69    (0.57, 1.65)           (0.47, 1.93)
9           41    (0.47, 1.93)           (0.35, 2.37)
9.5         17    (0.27, 2.83)           (0.13, 4.09)

[Figure: an example of the Choulakian-Stephens curve for the Bilbao waves data with threshold 7.5.]