D Important Probability Distributions

Development of stochastic models is facilitated by identifying a few probability distributions that seem to correspond to a variety of data-generating processes, and then studying the properties of these distributions. In the following tables, I list some of the more useful distributions, both discrete distributions and continuous ones. The names listed are the most common names, although some distributions go by different names, especially for specific values of the parameters. In the first column, following the name of the distribution, the parameter space is specified.

There are two very special continuous distributions, for which I use special symbols: the uniform over the interval $[a, b]$, designated $\mathrm{U}(a, b)$, and the normal (or Gaussian), denoted by $\mathrm{N}(\mu, \sigma^2)$. Notice that the second parameter in the notation for the normal is the variance. Sometimes, such as in the functions in R, the second parameter of the normal distribution is the standard deviation instead of the variance. A normal distribution with $\mu = 0$ and $\sigma^2 = 1$ is called the standard normal. I also often use the notation $\phi(x)$ for the PDF of a standard normal and $\Phi(x)$ for the CDF of a standard normal, and these are generalized in the obvious way as $\phi(x|\mu, \sigma^2)$ and $\Phi(x|\mu, \sigma^2)$.

Except for the uniform and the normal, I designate distributions by a name followed by symbols for the parameters, for example, binomial$(\pi, n)$ or gamma$(\alpha, \beta)$. Some families of distributions are subfamilies of larger families. For example, the usual gamma family of distributions is the two-parameter subfamily of the three-parameter gamma.

There are other general families of probability distributions that are defined in terms of a differential equation or of a form for the CDF. These include the Pearson, Johnson, Burr, and Tukey's lambda distributions.

Most of the common distributions fall naturally into one of two classes. They have either a countable support with positive probability at each point in the support, or a continuous (dense, uncountable) support with zero probability for any subset with zero Lebesgue measure. The distributions listed in the following tables are divided into these two natural classes.

There are situations for which these two distinct classes are not appropriate. For many such situations, however, a mixture distribution provides an appropriate model. We can express a PDF of a mixture distribution as

$$p_M(y) = \sum_{j=1}^{m} \omega_j p_j(y \mid \theta_j),$$

where the $m$ distributions with PDFs $p_j$ can be either discrete or continuous. A simple example is a probability model for the amount of rainfall in a given period, say a day. It is likely that a nonzero probability should be associated with zero rainfall, but with no other amount of rainfall. In the model above, $m$ is 2, $\omega_1$ is the probability of no rain, $p_1$ is a degenerate PDF with a value of 1 at 0, $\omega_2 = 1 - \omega_1$, and $p_2$ is some continuous PDF over $\mathbb{R}_+$, possibly similar to a distribution in the exponential family.

A mixture family that is useful in robustness studies is the $\epsilon$-mixture distribution family, which is characterized by a given family with CDF $P$ that is referred to as the reference distribution, together with a point $x_c$ and a weight $\epsilon$. The CDF of an $\epsilon$-mixture distribution family is

$$P_{x_c,\epsilon}(x) = (1 - \epsilon)P(x) + \epsilon I_{[x_c,\infty[}(x),$$

where $0 \le \epsilon \le 1$.
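To make the rainfall example concrete, here is a minimal Python sketch of sampling from such a two-component mixture. The function name `sample_rainfall` and the choice of an exponential distribution (with an arbitrary scale) for the continuous component $p_2$ are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_rainfall(n, omega1=0.6, scale=5.0):
    """Draw n values from the two-component rainfall mixture:
    with probability omega1 the amount is exactly 0 (the degenerate
    component p1); otherwise it comes from the continuous component
    p2, taken here (an assumption) to be exponential."""
    amounts = rng.exponential(scale, size=n)
    amounts[rng.random(n) < omega1] = 0.0
    return amounts

y = sample_rainfall(100_000)
print("P(Y = 0):", (y == 0).mean())   # approximately omega1
print("E(Y):    ", y.mean())          # approximately (1 - omega1) * scale
```

Note that a positive fraction of the sample sits exactly at 0, something neither a purely discrete nor a purely continuous model reproduces.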
Another example of a mixture distribution is a binomial with constant parameter $n$ but with a nonconstant parameter $\pi$. In many applications, if an identical binomial distribution is assumed (that is, a constant $\pi$), it is often the case that "over-dispersion" will be observed; that is, the sample variance exceeds what would be expected given an estimate of some other parameter that determines the population variance. This situation can occur in a model, such as the binomial, in which a single parameter determines both the first and second moments. The mixture model above, in which each $p_j$ is a binomial PDF with parameters $n$ and $\pi_j$, may be a better model.

Of course, we can extend this kind of mixing even further. Instead of $\sum_{j=1}^{m} \omega_j p_j(y \mid \theta_j)$ with $\omega_j \ge 0$ and $\sum_{j=1}^{m} \omega_j = 1$, we can take $\omega(\theta)p(y \mid \theta)$ with $\omega(\theta) \ge 0$ and $\int \omega(\theta)\, d\theta = 1$, from which we recognize that $\omega(\theta)$ is a PDF and $\theta$ can be considered to be the realization of a random variable.

Extending the example of the mixture of binomial distributions, we may choose some reasonable PDF $\omega(\pi)$. An obvious choice is a beta PDF. This yields the beta-binomial distribution, with PDF

$$p_{X,\Pi}(x, \pi) = \binom{n}{x} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \pi^{x+\alpha-1}(1-\pi)^{n-x+\beta-1}\, I_{\{0,1,\ldots,n\}\times]0,1[}(x, \pi).$$

This is a standard distribution, but I did not include it in the tables below. This distribution may be useful in situations in which a binomial model is appropriate, but the probability parameter is changing more-or-less continuously.

We recognize a basic property of any mixture distribution: it is a joint distribution factored as a marginal (prior) for a random variable, which is often not observable, and a conditional distribution for another random variable, which is usually the observable variable of interest. In Bayesian analyses, the first two assumptions (a prior distribution for the parameters and a conditional distribution for the observable) lead immediately to a mixture distribution. The beta-binomial above arises in a canonical example of Bayesian analysis.

Some distributions are recognized because of their use as conjugate priors and their relationship to sampling distributions. These include the inverted chi-squared and the inverted Wishart.
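As a rough numerical illustration of the over-dispersion discussion above, the following sketch compares binomial counts generated with a constant $\pi$ to beta-binomial counts generated by first drawing $\pi$ from a beta PDF $\omega(\pi)$. All parameter values are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, size = 20, 100_000
alpha, beta = 2.0, 3.0
pi_mean = alpha / (alpha + beta)     # 0.4, the common mean of pi

# Constant pi: ordinary binomial samples.
y_binom = rng.binomial(n, pi_mean, size=size)

# pi drawn from a beta PDF omega(pi): beta-binomial samples.
pis = rng.beta(alpha, beta, size=size)
y_bb = rng.binomial(n, pis)

print("binomial      mean, var:", y_binom.mean(), y_binom.var())
print("beta-binomial mean, var:", y_bb.mean(), y_bb.var())
# Both means are near n * pi_mean = 8, but the beta-binomial variance
# clearly exceeds n * pi_mean * (1 - pi_mean) = 4.8: over-dispersion.
```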
General References

Evans et al. (2000) give general descriptions of 40 probability distributions. Balakrishnan and Nevzorov (2003) provide an overview of the important characteristics that distinguish different distributions and then describe the important characteristics of many common distributions. Leemis and McQueston (2008) present an interesting compact graph of the relationships among a large number of probability distributions. Currently, the most readily accessible summary of common probability distributions is Wikipedia: http://wikipedia.org/ Search under the name of the distribution.

Table D.1. Discrete Distributions (PDFs are wrt counting measure)

discrete uniform; $a_1, \ldots, a_m \in \mathbb{R}$
  PDF: $1/m$, $y = a_1, \ldots, a_m$
  mean: $\sum a_i/m$
  variance: $\sum (a_i - \bar{a})^2/m$, where $\bar{a} = \sum a_i/m$

Bernoulli; $\pi \in \,]0,1[$
  PDF: $\pi^y(1-\pi)^{1-y}$, $y = 0, 1$
  mean: $\pi$
  variance: $\pi(1-\pi)$

binomial ($n$ Bernoullis); $n = 1, 2, \ldots$; $\pi \in \,]0,1[$
  PDF: $\binom{n}{y}\pi^y(1-\pi)^{n-y}$, $y = 0, 1, \ldots, n$
  CF: $(1 - \pi + \pi e^{it})^n$
  mean: $n\pi$
  variance: $n\pi(1-\pi)$

geometric; $\pi \in \,]0,1[$
  PDF: $\pi(1-\pi)^y$, $y = 0, 1, 2, \ldots$
  mean: $(1-\pi)/\pi$
  variance: $(1-\pi)/\pi^2$

negative binomial ($n$ geometrics); $n = 1, 2, \ldots$; $\pi \in \,]0,1[$
  PDF: $\binom{y+n-1}{n-1}\pi^n(1-\pi)^y$, $y = 0, 1, 2, \ldots$
  CF: $\left(\frac{\pi}{1-(1-\pi)e^{it}}\right)^n$
  mean: $n(1-\pi)/\pi$
  variance: $n(1-\pi)/\pi^2$

multinomial; $n = 1, 2, \ldots$; $\pi_i \in \,]0,1[$ with $\sum \pi_i = 1$, for $i = 1, \ldots, d$
  PDF: $\frac{n!}{\prod y_i!}\prod_{i=1}^{d}\pi_i^{y_i}$, $y_i = 0, 1, \ldots, n$, $\sum y_i = n$
  CF: $\left(\sum_{i=1}^{d}\pi_i e^{it_i}\right)^n$
  means: $n\pi_i$
  variances: $n\pi_i(1-\pi_i)$
  covariances: $-n\pi_i\pi_j$

hypergeometric; $N = 2, 3, \ldots$; $M = 1, \ldots, N$; $n = 1, \ldots, N$
  PDF: $\binom{M}{y}\binom{N-M}{n-y}\Big/\binom{N}{n}$, $y = \max(0, n-N+M), \ldots, \min(n, M)$
  mean: $nM/N$
  variance: $(nM/N)(1 - M/N)(N-n)/(N-1)$

Poisson; $\theta \in \mathbb{R}_+$
  PDF: $\theta^y e^{-\theta}/y!$, $y = 0, 1, 2, \ldots$
  CF: $e^{\theta(e^{it}-1)}$
  mean: $\theta$
  variance: $\theta$

power series; $\theta \in \mathbb{R}_+$; $\{h_y\}$ positive constants; $c(\theta) = \sum_y h_y\theta^y$
  PDF: $\frac{h_y}{c(\theta)}\,\theta^y$, $y = 0, 1, 2, \ldots$
  CF: $\sum_y h_y(\theta e^{it})^y/c(\theta)$
  mean: $\theta\frac{d}{d\theta}\log(c(\theta))$
  variance: $\theta\frac{d}{d\theta}\log(c(\theta)) + \theta^2\frac{d^2}{d\theta^2}\log(c(\theta))$

logarithmic; $\pi \in \,]0,1[$
  PDF: $-\frac{\pi^y}{y\log(1-\pi)}$, $y = 1, 2, 3, \ldots$
  mean: $-\pi/((1-\pi)\log(1-\pi))$
  variance: $-\pi(\pi + \log(1-\pi))/((1-\pi)^2(\log(1-\pi))^2)$

Benford's; $b$ integer $\ge 3$
  PDF: $\log_b(y+1) - \log_b(y)$, $y = 1, \ldots, b-1$
  mean: $b - 1 - \log_b((b-1)!)$

Table D.2. The Normal Distributions

normal; $\mathrm{N}(\mu, \sigma^2)$; $\mu \in \mathbb{R}$; $\sigma \in \mathbb{R}_+$
  PDF: $\phi(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(y-\mu)^2/2\sigma^2}$
  CF: $e^{i\mu t - \sigma^2 t^2/2}$
  mean: $\mu$
  variance: $\sigma^2$

multivariate normal; $\mathrm{N}_d(\mu, \Sigma)$; $\mu \in \mathbb{R}^d$; $\Sigma \succ 0 \in \mathbb{R}^{d\times d}$
  PDF: $\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\,e^{-(y-\mu)^{\mathrm{T}}\Sigma^{-1}(y-\mu)/2}$
  CF: $e^{i\mu^{\mathrm{T}}t - t^{\mathrm{T}}\Sigma t/2}$
  mean: $\mu$
  covariance: $\Sigma$

matrix normal; $M \in \mathbb{R}^{n\times m}$, $\Psi \succ 0 \in \mathbb{R}^{m\times m}$, $\Sigma \succ 0 \in \mathbb{R}^{n\times n}$
  PDF: $\frac{1}{(2\pi)^{nm/2}|\Psi|^{n/2}|\Sigma|^{m/2}}\,e^{-\mathrm{tr}(\Psi^{-1}(Y-M)^{\mathrm{T}}\Sigma^{-1}(Y-M))/2}$
  mean: $M$
  covariance: $\Psi \otimes \Sigma$

complex multivariate normal; $\mu \in \mathbb{C}^d$, $\Sigma \succ 0 \in \mathbb{C}^{d\times d}$
  PDF: $\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\,e^{-(z-\mu)^{\mathrm{H}}\Sigma^{-1}(z-\mu)/2}$
  mean: $\mu$
  covariance: $\Sigma$

Table D.3. Sampling Distributions from the Normal Distribution

chi-squared; $\chi^2_\nu$; $\nu \in \mathbb{R}_+$ (if $\nu \in \mathbb{Z}_+$, $\nu$ is called the degrees of freedom)
  PDF: $\frac{1}{\Gamma(\nu/2)2^{\nu/2}}\,y^{\nu/2-1}e^{-y/2}\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $\nu$
  variance: $2\nu$

t; $\nu \in \mathbb{R}_+$
  PDF: $\frac{\Gamma((\nu+1)/2)}{\Gamma(\nu/2)\sqrt{\nu\pi}}\,(1 + y^2/\nu)^{-(\nu+1)/2}$
  mean: $0$
  variance: $\nu/(\nu-2)$, for $\nu > 2$

F; $\nu_1, \nu_2 \in \mathbb{R}_+$
  PDF: $\frac{\nu_1^{\nu_1/2}\nu_2^{\nu_2/2}\,\Gamma((\nu_1+\nu_2)/2)}{\Gamma(\nu_1/2)\Gamma(\nu_2/2)}\,\frac{y^{\nu_1/2-1}}{(\nu_2+\nu_1 y)^{(\nu_1+\nu_2)/2}}\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $\nu_2/(\nu_2-2)$, for $\nu_2 > 2$
  variance: $2\nu_2^2(\nu_1+\nu_2-2)/(\nu_1(\nu_2-2)^2(\nu_2-4))$, for $\nu_2 > 4$

Wishart; $d = 1, 2, \ldots$; $\nu > d-1 \in \mathbb{R}$; $\Sigma \succ 0 \in \mathbb{R}^{d\times d}$
  PDF: $\frac{|W|^{(\nu-d-1)/2}\exp(-\mathrm{trace}(\Sigma^{-1}W)/2)}{2^{\nu d/2}|\Sigma|^{\nu/2}\Gamma_d(\nu/2)}\,I_{\{M \mid M \succ 0 \in \mathbb{R}^{d\times d}\}}(W)$
  mean: $\nu\Sigma$
  covariance: $\mathrm{Cov}(W_{ij}, W_{kl}) = \nu(\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk})$, where $\Sigma = (\sigma_{ij})$

noncentral chi-squared; $\nu, \lambda \in \mathbb{R}_+$
  PDF: $\frac{e^{-\lambda/2}}{2^{\nu/2}}\,y^{\nu/2-1}e^{-y/2}\sum_{k=0}^{\infty}\frac{(\lambda/2)^k}{k!\,\Gamma(\nu/2+k)\,2^k}\,y^k\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $\nu + \lambda$
  variance: $2(\nu + 2\lambda)$

noncentral t; $\nu \in \mathbb{R}_+$, $\lambda \in \mathbb{R}$
  PDF: $\frac{\nu^{\nu/2}e^{-\lambda^2/2}}{\Gamma(\nu/2)\pi^{1/2}}\,(\nu+y^2)^{-(\nu+1)/2}\sum_{k=0}^{\infty}\Gamma\!\left(\frac{\nu+k+1}{2}\right)\frac{(\lambda y)^k}{k!}\left(\frac{2}{\nu+y^2}\right)^{k/2}$
  mean: $\lambda(\nu/2)^{1/2}\,\Gamma((\nu-1)/2)/\Gamma(\nu/2)$, for $\nu > 1$
  variance: $\frac{\nu}{\nu-2}(1+\lambda^2) - \lambda^2\frac{\nu}{2}\left(\frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)}\right)^2$, for $\nu > 2$

noncentral F; $\nu_1, \nu_2, \lambda \in \mathbb{R}_+$
  PDF: $\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}\left(\frac{\nu_2}{\nu_2+\nu_1 y}\right)^{\nu_1/2+\nu_2/2}e^{-\lambda/2}\,y^{\nu_1/2-1}\times\sum_{k=0}^{\infty}\frac{(\lambda/2)^k\,\Gamma((\nu_2+\nu_1)/2+k)}{\Gamma(\nu_2/2)\Gamma(\nu_1/2+k)\,k!}\left(\frac{\nu_1}{\nu_2}\right)^k\left(\frac{\nu_2}{\nu_2+\nu_1 y}\right)^k y^k\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $\nu_2(\nu_1+\lambda)/(\nu_1(\nu_2-2))$, for $\nu_2 > 2$
  variance: $2\left(\frac{\nu_2}{\nu_1}\right)^2\frac{(\nu_1+\lambda)^2 + (\nu_1+2\lambda)(\nu_2-2)}{(\nu_2-2)^2(\nu_2-4)}$, for $\nu_2 > 4$
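A small caution connected with Table D.2: as noted at the beginning of this appendix, the tables parameterize the normal as $\mathrm{N}(\mu, \sigma^2)$, while R and, for example, SciPy take the standard deviation. A quick sketch of the conversion (the particular values are arbitrary):

```python
from scipy import stats

mu, sigma2 = 1.0, 4.0   # the N(mu, sigma^2) of Table D.2
# SciPy's `scale`, like R's `sd`, is the standard deviation, so the
# variance must be converted before constructing the distribution.
dist = stats.norm(loc=mu, scale=sigma2 ** 0.5)
print(dist.mean(), dist.var())   # 1.0 4.0, matching mu and sigma^2
```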
Table D.4. Distributions Useful as Priors for the Normal Parameters

inverted gamma; $\alpha, \beta \in \mathbb{R}_+$
  PDF: $\frac{1}{\Gamma(\alpha)\beta^\alpha}\left(\frac{1}{y}\right)^{\alpha+1}e^{-1/\beta y}\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $1/(\beta(\alpha-1))$, for $\alpha > 1$
  variance: $1/(\beta^2(\alpha-1)^2(\alpha-2))$, for $\alpha > 2$

inverted chi-squared; $\nu \in \mathbb{R}_+$
  PDF: $\frac{1}{\Gamma(\nu/2)2^{\nu/2}}\left(\frac{1}{y}\right)^{\nu/2+1}e^{-1/2y}\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $1/(\nu-2)$, for $\nu > 2$
  variance: $2/((\nu-2)^2(\nu-4))$, for $\nu > 4$

Table D.5. Distributions Derived from the Univariate Normal

lognormal; $\mu \in \mathbb{R}$; $\sigma \in \mathbb{R}_+$
  PDF: $\frac{1}{\sqrt{2\pi}\,\sigma}\,y^{-1}e^{-(\log(y)-\mu)^2/2\sigma^2}\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $e^{\mu+\sigma^2/2}$
  variance: $e^{2\mu+\sigma^2}(e^{\sigma^2}-1)$

inverse Gaussian; $\mu, \lambda \in \mathbb{R}_+$
  PDF: $\sqrt{\frac{\lambda}{2\pi y^3}}\,e^{-\lambda(y-\mu)^2/2\mu^2 y}\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $\mu$
  variance: $\mu^3/\lambda$

skew normal; $\mu, \lambda \in \mathbb{R}$; $\sigma \in \mathbb{R}_+$
  PDF: $\frac{1}{\pi\sigma}\,e^{-(y-\mu)^2/2\sigma^2}\int_{-\infty}^{\lambda(y-\mu)/\sigma}e^{-t^2/2}\,dt$
  mean: $\mu + \sigma\lambda\sqrt{2/(\pi(1+\lambda^2))}$
  variance: $\sigma^2(1 - 2\lambda^2/(\pi(1+\lambda^2)))$

Table D.6. Other Continuous Distributions (PDFs are wrt Lebesgue measure)

beta; $\alpha, \beta \in \mathbb{R}_+$
  PDF: $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,y^{\alpha-1}(1-y)^{\beta-1}\,I_{[0,1]}(y)$
  mean: $\alpha/(\alpha+\beta)$
  variance: $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$

Dirichlet; $\alpha \in \mathbb{R}^{d+1}_+$
  PDF: $\frac{\Gamma(\sum_{i=1}^{d+1}\alpha_i)}{\prod_{i=1}^{d+1}\Gamma(\alpha_i)}\prod_{i=1}^{d}y_i^{\alpha_i-1}\left(1 - \sum_{i=1}^{d}y_i\right)^{\alpha_{d+1}-1}I_{[0,1]^d}(y)$
  mean: $\alpha/\|\alpha\|_1$ ($\alpha_{d+1}/\|\alpha\|_1$ is the "mean of $Y_{d+1}$".)
  variance: $\frac{\alpha(\|\alpha\|_1 - \alpha)}{\|\alpha\|_1^2(\|\alpha\|_1+1)}$

uniform; $\mathrm{U}(\theta_1, \theta_2)$; $\theta_1 < \theta_2 \in \mathbb{R}$
  PDF: $\frac{1}{\theta_2-\theta_1}\,I_{[\theta_1,\theta_2]}(y)$
  mean: $(\theta_2+\theta_1)/2$
  variance: $(\theta_2^2 - 2\theta_1\theta_2 + \theta_1^2)/12$

Cauchy; $\gamma \in \mathbb{R}$; $\beta \in \mathbb{R}_+$
  PDF: $\frac{1}{\pi\beta\left(1 + \left(\frac{y-\gamma}{\beta}\right)^2\right)}$
  mean: does not exist
  variance: does not exist

logistic; $\mu \in \mathbb{R}$; $\beta \in \mathbb{R}_+$
  PDF: $\frac{e^{-(y-\mu)/\beta}}{\beta(1 + e^{-(y-\mu)/\beta})^2}$
  mean: $\mu$
  variance: $\beta^2\pi^2/3$

Pareto; $\alpha, \gamma \in \mathbb{R}_+$
  PDF: $\frac{\alpha\gamma^\alpha}{y^{\alpha+1}}\,I_{[\gamma,\infty[}(y)$
  mean: $\alpha\gamma/(\alpha-1)$, for $\alpha > 1$
  variance: $\alpha\gamma^2/((\alpha-1)^2(\alpha-2))$, for $\alpha > 2$

power function; $\alpha, \beta \in \mathbb{R}_+$
  PDF: $\frac{\alpha y^{\alpha-1}}{\beta^\alpha}\,I_{[0,\beta[}(y)$
  mean: $\alpha\beta/(\alpha+1)$
  variance: $\alpha\beta^2/((\alpha+2)(\alpha+1)^2)$

von Mises; $\mu \in \mathbb{R}$; $\kappa \in \mathbb{R}_+$
  PDF: $\frac{1}{2\pi I_0(\kappa)}\,e^{\kappa\cos(y-\mu)}\,I_{[\mu-\pi,\mu+\pi]}(y)$
  mean: $\mu$
  variance: $1 - (I_1(\kappa)/I_0(\kappa))^2$

gamma; $\alpha, \beta \in \mathbb{R}_+$
  PDF: $\frac{1}{\Gamma(\alpha)\beta^\alpha}\,y^{\alpha-1}e^{-y/\beta}\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $\alpha\beta$
  variance: $\alpha\beta^2$

three-parameter gamma; $\alpha, \beta \in \mathbb{R}_+$; $\gamma \in \mathbb{R}$
  PDF: $\frac{1}{\Gamma(\alpha)\beta^\alpha}\,(y-\gamma)^{\alpha-1}e^{-(y-\gamma)/\beta}\,I_{]\gamma,\infty[}(y)$
  mean: $\alpha\beta + \gamma$
  variance: $\alpha\beta^2$

exponential; $\theta \in \mathbb{R}_+$
  PDF: $\theta^{-1}e^{-y/\theta}\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $\theta$
  variance: $\theta^2$

double exponential (folded exponential); $\mu \in \mathbb{R}$; $\theta \in \mathbb{R}_+$
  PDF: $\frac{1}{2\theta}\,e^{-|y-\mu|/\theta}$
  mean: $\mu$
  variance: $2\theta^2$

Weibull; $\alpha, \beta \in \mathbb{R}_+$
  PDF: $\frac{\alpha}{\beta}\,y^{\alpha-1}e^{-y^\alpha/\beta}\,I_{\bar{\mathbb{R}}_+}(y)$
  mean: $\beta^{1/\alpha}\,\Gamma(\alpha^{-1}+1)$
  variance: $\beta^{2/\alpha}\left(\Gamma(2\alpha^{-1}+1) - (\Gamma(\alpha^{-1}+1))^2\right)$

extreme value (Type I); $\alpha \in \mathbb{R}$; $\beta \in \mathbb{R}_+$
  PDF: $\frac{1}{\beta}\,e^{-(y-\alpha)/\beta}\exp(-e^{-(y-\alpha)/\beta})$
  mean: $\alpha - \beta\Gamma'(1)$
  variance: $\beta^2\pi^2/6$
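The tabulated moments can be checked by simulation. For instance, the gamma$(\alpha, \beta)$ entry of Table D.6 uses $\beta$ as a scale parameter, which is the same (shape, scale) convention as NumPy's gamma generator; a sketch with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.5, 3.0
# Table D.6's gamma(alpha, beta): mean alpha*beta, variance alpha*beta^2.
y = rng.gamma(shape=alpha, scale=beta, size=1_000_000)
print(y.mean(), alpha * beta)        # both near 7.5
print(y.var(), alpha * beta ** 2)    # both near 22.5
```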