D
Important Probability Distributions
Development of stochastic models is facilitated by identifying a few probability distributions that seem to correspond to a variety of data-generating processes, and then studying the properties of these distributions. In the following
tables, I list some of the more useful distributions, both discrete distributions
and continuous ones. The names listed are the most common names, although
some distributions go by different names, especially for specific values of the
parameters. In the first column, following the name of the distribution, the
parameter space is specified.
There are two very special continuous distributions, for which I use special symbols: the uniform over the interval [a, b], designated U(a, b), and the normal (or Gaussian), denoted by N(µ, σ²). Notice that the second parameter in the notation for the normal is the variance. Sometimes, such as in the functions in R, the second parameter of the normal distribution is the standard deviation instead of the variance. A normal distribution with µ = 0 and σ² = 1 is called the standard normal. I also often use the notation φ(x) for the PDF of a standard normal and Φ(x) for the CDF of a standard normal, and these are generalized in the obvious way as φ(x|µ, σ²) and Φ(x|µ, σ²).
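As a quick check of these definitions, φ and Φ can be computed with nothing more than the error function; this is a minimal sketch (the function names phi and Phi are mine, mirroring the notation above):

```python
import math

def phi(x, mu=0.0, sigma2=1.0):
    """PDF of N(mu, sigma2); with the defaults, the standard normal phi(x)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def Phi(x, mu=0.0, sigma2=1.0):
    """CDF of N(mu, sigma2), expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * sigma2)))

print(round(phi(0.0), 6))   # 1/sqrt(2*pi) ≈ 0.398942
print(round(Phi(0.0), 6))   # 0.5, by symmetry
```

Note that the second parameter here is the variance, matching the N(µ, σ²) convention above rather than R's standard-deviation convention.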
Except for the uniform and the normal, I designate distributions by a
name followed by symbols for the parameters, for example, binomial(π, n) or
gamma(α, β). Some families of distributions are subfamilies of larger families.
For example, the usual gamma family of distributions is the two-parameter subfamily of the three-parameter gamma.
There are other general families of probability distributions that are defined in terms of a differential equation or of a form for the CDF. These include
the Pearson, Johnson, Burr, and Tukey’s lambda distributions.
Most of the common distributions fall naturally into one of two classes.
They have either a countable support with positive probability at each point
in the support, or a continuous (dense, uncountable) support with zero probability for any subset with zero Lebesgue measure. The distributions listed in
the following tables are divided into these two natural classes.
© Elements of Computational Statistics, Second Edition 2013
James E. Gentle
There are situations for which these two distinct classes are not appropriate. For many such situations, however, a mixture distribution provides an
appropriate model. We can express a PDF of a mixture distribution as
pM(y) = Σ_{j=1}^m ωj pj(y | θj),
where the m distributions with PDFs pj can be either discrete or continuous.
A simple example is a probability model for the amount of rainfall in a given
period, say a day. It is likely that a nonzero probability should be associated
with zero rainfall, but with no other amount of rainfall. In the model above,
m is 2, ω1 is the probability of no rain, p1 is a degenerate PDF with a value
of 1 at 0, ω2 = 1 − ω1 , and p2 is some continuous PDF over IR+ , possibly
similar to a distribution in the exponential family.
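A sketch of this two-component rainfall model, assuming (purely for illustration) an exponential distribution with mean 5 for the continuous component p2:

```python
import random

def sample_rainfall(omega1, theta, rng):
    # With probability omega1, the degenerate component p1: exactly zero rainfall.
    if rng.random() < omega1:
        return 0.0
    # Otherwise a continuous PDF over IR+; an exponential with mean theta
    # is one choice from the exponential family (my illustrative choice).
    return rng.expovariate(1.0 / theta)

rng = random.Random(42)
draws = [sample_rainfall(0.6, 5.0, rng) for _ in range(100_000)]
frac_zero = sum(d == 0.0 for d in draws) / len(draws)
print(frac_zero)  # close to omega1 = 0.6
```

The point mass at zero shows up directly in the simulated draws, which no purely continuous model could reproduce.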
A mixture family that is useful in robustness studies is the ε-mixture distribution family, which is characterized by a given family with CDF P that is referred to as the reference distribution, together with a point xc and a weight ε. The CDF of an ε-mixture distribution family is

P_{xc,ε}(x) = (1 − ε)P(x) + ε I_{[xc,∞[}(x),

where 0 ≤ ε ≤ 1.
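The ε-mixture CDF is straightforward to evaluate; a minimal sketch, taking the standard normal as the reference distribution P and with my own illustrative choices of xc and ε:

```python
import math

def eps_mixture_cdf(x, P, xc, eps):
    """CDF of the eps-mixture: (1 - eps) * P(x) + eps * I_[xc, inf)(x)."""
    return (1.0 - eps) * P(x) + eps * (1.0 if x >= xc else 0.0)

# Reference distribution: standard normal CDF via the error function.
std_normal_cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(eps_mixture_cdf(0.0, std_normal_cdf, xc=3.0, eps=0.05))  # 0.95 * 0.5 = 0.475
print(eps_mixture_cdf(4.0, std_normal_cdf, xc=3.0, eps=0.05))  # near 1
```

Below xc the mixture simply scales the reference CDF by 1 − ε; at xc it jumps by ε, contaminating the reference distribution with a point mass.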
Another example of a mixture distribution is a binomial with constant
parameter n, but with a nonconstant parameter π. In many applications, if
an identical binomial distribution is assumed (that is, a constant π), it is often
the case that “over-dispersion” will be observed; that is, the sample variance
exceeds what would be expected given an estimate of some other parameter
that determines the population variance. This situation can occur in a model,
such as the binomial, in which a single parameter determines both the first
and second moments. The mixture model above in which each pj is a binomial
PDF with parameters n and πj may be a better model.
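A small simulation illustrates the over-dispersion; the component probabilities and weights below are my own illustrative choices:

```python
import random
import statistics

def sample_mixed_binomial(n, pis, omegas, rng):
    # Pick component j with probability omega_j, then draw from binomial(pi_j, n).
    u, acc = rng.random(), 0.0
    for pi, w in zip(pis, omegas):
        acc += w
        if u < acc:
            return sum(rng.random() < pi for _ in range(n))
    return sum(rng.random() < pis[-1] for _ in range(n))

rng = random.Random(0)
n, pis, omegas = 20, [0.2, 0.8], [0.5, 0.5]
ys = [sample_mixed_binomial(n, pis, omegas, rng) for _ in range(50_000)]
pbar = statistics.mean(ys) / n          # overall proportion, about 0.5
print(statistics.variance(ys))          # sample variance, about 39 here
print(n * pbar * (1 - pbar))            # constant-pi binomial variance, about 5
```

The sample variance far exceeds nπ̄(1 − π̄) because the variation in π across components adds to the within-component binomial variance.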
Of course, we can extend this kind of mixing even further. Instead of Σ_{j=1}^m ωj pj(y | θj) with ωj ≥ 0 and Σ_{j=1}^m ωj = 1, we can take ω(θ)p(y | θ) with ω(θ) ≥ 0 and ∫ ω(θ) dθ = 1, from which we recognize that ω(θ) is a PDF and θ can be considered to be the realization of a random variable.
Extending the example of the mixture of binomial distributions, we may
choose some reasonable PDF ω(π). An obvious choice is a beta PDF. This
yields the beta-binomial distribution, with PDF
pX,Π(x, π) = C(n, x) (Γ(α + β)/(Γ(α)Γ(β))) π^{x+α−1} (1 − π)^{n−x+β−1} I_{{0,1,...,n}×]0,1[}(x, π).
This is a standard distribution but I did not include it in the tables below.
This distribution may be useful in situations in which a binomial model is
appropriate, but the probability parameter is changing more-or-less continuously.
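Integrating π out of the joint PDF above gives the familiar beta-binomial PMF, C(n, x) B(x + α, n − x + β)/B(α, β); a minimal sketch using log-gamma for numerical stability:

```python
from math import lgamma, exp, comb

def beta_binomial_pmf(x, n, alpha, beta):
    """Marginal PMF of X when X | pi ~ binomial(pi, n) and pi ~ beta(alpha, beta):
    C(n, x) * B(x + alpha, n - x + beta) / B(alpha, beta)."""
    log_beta = lambda a, b: lgamma(a) + lgamma(b) - lgamma(a + b)
    return comb(n, x) * exp(log_beta(x + alpha, n - x + beta) - log_beta(alpha, beta))

pmf = [beta_binomial_pmf(x, 10, 2.0, 3.0) for x in range(11)]
print(round(sum(pmf), 10))                               # 1.0: a proper PMF on {0, ..., 10}
print(round(sum(x * p for x, p in enumerate(pmf)), 10))  # mean n*alpha/(alpha+beta) = 4.0
```

The mean nα/(α + β) is the binomial mean with π replaced by its prior expectation, as one would expect from the mixture interpretation.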
We recognize a basic property of any mixture distribution: It is a joint
distribution factored as a marginal (prior) for a random variable, which is often
not observable, and a conditional distribution for another random variable,
which is usually the observable variable of interest.
In Bayesian analyses, the first two assumptions (a prior distribution for
the parameters and a conditional distribution for the observable) lead immediately to a mixture distribution. The beta-binomial above arises in a canonical
example of Bayesian analysis.
Some distributions are recognized because of their use as conjugate priors
and their relationship to sampling distributions. These include the inverted
chi-square and the inverted Wishart.
General References
Evans et al. (2000) give general descriptions of 40 probability distributions.
Balakrishnan and Nevzorov (2003) provide an overview of the important characteristics that distinguish different distributions and then describe the important characteristics of many common distributions. Leemis and McQueston
(2008) present an interesting compact graph of the relationships among a
large number of probability distributions.
Currently, the most readily accessible summary of common probability distributions is Wikipedia: http://wikipedia.org/; search under the name of the distribution.
Table D.1. Discrete Distributions (PDFs are wrt the counting measure)

discrete uniform
a_1, ..., a_m ∈ IR
  PDF       1/m,  y = a_1, ..., a_m
  mean      Σ a_i/m
  variance  Σ (a_i − ā)²/m, where ā = Σ a_i/m

Bernoulli
π ∈ ]0,1[
  PDF       π^y (1 − π)^{1−y},  y = 0, 1
  mean      π
  variance  π(1 − π)

binomial (n Bernoullis)
n = 1, 2, ...;  π ∈ ]0,1[
  PDF       C(n, y) π^y (1 − π)^{n−y},  y = 0, 1, ..., n
  CF        (1 − π + π e^{it})^n
  mean      nπ
  variance  nπ(1 − π)

geometric
π ∈ ]0,1[
  PDF       π(1 − π)^y,  y = 0, 1, 2, ...
  mean      (1 − π)/π
  variance  (1 − π)/π²

negative binomial (n geometrics)
n = 1, 2, ...;  π ∈ ]0,1[
  PDF       C(y + n − 1, n − 1) π^n (1 − π)^y,  y = 0, 1, 2, ...
  CF        (π/(1 − (1 − π)e^{it}))^n
  mean      n(1 − π)/π
  variance  n(1 − π)/π²

multinomial
n = 1, 2, ...;  for i = 1, ..., d, π_i ∈ ]0,1[,  Σ π_i = 1
  PDF          (n!/Π y_i!) Π_{i=1}^d π_i^{y_i},  y_i = 0, 1, ..., n,  Σ y_i = n
  CF           (Σ_{i=1}^d π_i e^{it_i})^n
  means        nπ_i
  variances    nπ_i(1 − π_i)
  covariances  −nπ_i π_j

hypergeometric
N = 2, 3, ...;  M = 1, ..., N;  n = 1, ..., N
  PDF       C(M, y) C(N − M, n − y)/C(N, n),  y = max(0, n − N + M), ..., min(n, M)
  mean      nM/N
  variance  (nM/N)(1 − M/N)(N − n)/(N − 1)

Poisson
θ ∈ IR+
  PDF       θ^y e^{−θ}/y!,  y = 0, 1, 2, ...
  CF        e^{θ(e^{it} − 1)}
  mean      θ
  variance  θ

power series
θ ∈ IR+;  {h_y} positive constants;  c(θ) = Σ_y h_y θ^y
  PDF       h_y θ^y/c(θ),  y = 0, 1, 2, ...
  CF        Σ_y h_y (θ e^{it})^y/c(θ)
  mean      θ (d/dθ)(log c(θ))
  variance  θ (d/dθ)(log c(θ)) + θ² (d²/dθ²)(log c(θ))

logarithmic
π ∈ ]0,1[
  PDF       −π^y/(y log(1 − π)),  y = 1, 2, 3, ...
  mean      −π/((1 − π) log(1 − π))
  variance  −π(π + log(1 − π))/((1 − π)² (log(1 − π))²)

Benford's
b integer ≥ 3
  PDF       log_b(y + 1) − log_b(y),  y = 1, ..., b − 1
  mean      b − 1 − log_b((b − 1)!)
Table D.2. The Normal Distributions

normal; N(µ, σ²)
µ ∈ IR;  σ ∈ IR+
  PDF       φ(y|µ, σ²) = (1/(√(2π) σ)) e^{−(y−µ)²/(2σ²)}
  CF        e^{iµt − σ²t²/2}
  mean      µ
  variance  σ²

multivariate normal; N_d(µ, Σ)
µ ∈ IR^d;  Σ ≻ 0 ∈ IR^{d×d}
  PDF         (2π)^{−d/2} |Σ|^{−1/2} e^{−(y−µ)^T Σ^{−1} (y−µ)/2}
  CF          e^{iµ^T t − t^T Σ t/2}
  mean        µ
  covariance  Σ

matrix normal
M ∈ IR^{n×m};  Ψ ≻ 0 ∈ IR^{m×m};  Σ ≻ 0 ∈ IR^{n×n}
  PDF         (2π)^{−nm/2} |Ψ|^{−n/2} |Σ|^{−m/2} e^{−tr(Ψ^{−1}(Y−M)^T Σ^{−1}(Y−M))/2}
  mean        M
  covariance  Ψ ⊗ Σ

complex multivariate normal
µ ∈ ℂ^d;  Σ ≻ 0 ∈ ℂ^{d×d}
  PDF         (2π)^{−d/2} |Σ|^{−1/2} e^{−(z−µ)^H Σ^{−1} (z−µ)/2}
  mean        µ
  covariance  Σ
Table D.3. Sampling Distributions from the Normal Distribution

chi-squared; χ²_ν
ν ∈ IR+; if ν ∈ ZZ+, ν is called the degrees of freedom
  PDF       (1/(Γ(ν/2) 2^{ν/2})) y^{ν/2−1} e^{−y/2} I_{IR+}(y)
  mean      ν
  variance  2ν

t
ν ∈ IR+
  PDF       (Γ((ν + 1)/2)/(Γ(ν/2) √(νπ))) (1 + y²/ν)^{−(ν+1)/2}
  mean      0
  variance  ν/(ν − 2), for ν > 2

F
ν_1, ν_2 ∈ IR+
  PDF       (ν_1^{ν_1/2} ν_2^{ν_2/2} Γ((ν_1 + ν_2)/2)/(Γ(ν_1/2)Γ(ν_2/2))) y^{ν_1/2−1}/(ν_2 + ν_1 y)^{(ν_1+ν_2)/2} I_{IR+}(y)
  mean      ν_2/(ν_2 − 2), for ν_2 > 2
  variance  2ν_2²(ν_1 + ν_2 − 2)/(ν_1(ν_2 − 2)²(ν_2 − 4)), for ν_2 > 4

Wishart
d = 1, 2, ...;  ν > d − 1, ν ∈ IR;  Σ ≻ 0 ∈ IR^{d×d}
  PDF         (|W|^{(ν−d−1)/2}/(2^{νd/2} |Σ|^{ν/2} Γ_d(ν/2))) exp(−trace(Σ^{−1}W)/2) I_{{M | M ≻ 0 ∈ IR^{d×d}}}(W)
  mean        νΣ
  covariance  Cov(W_ij, W_kl) = ν(σ_ik σ_jl + σ_il σ_jk), where Σ = (σ_ij)

noncentral chi-squared
ν, λ ∈ IR+
  PDF       (e^{−λ/2}/2^{ν/2}) y^{ν/2−1} e^{−y/2} Σ_{k=0}^∞ ((λ/2)^k/(k! Γ(ν/2 + k) 2^k)) y^k I_{IR+}(y)
  mean      ν + λ
  variance  2(ν + 2λ)

noncentral t
ν ∈ IR+;  λ ∈ IR
  PDF       (ν^{ν/2} e^{−λ²/2}/(Γ(ν/2) π^{1/2})) (ν + y²)^{−(ν+1)/2} × Σ_{k=0}^∞ Γ((ν + k + 1)/2) ((λy)^k/k!) (2/(ν + y²))^{k/2}
  mean      λ(ν/2)^{1/2} Γ((ν − 1)/2)/Γ(ν/2), for ν > 1
  variance  (ν/(ν − 2))(1 + λ²) − λ²(ν/2)(Γ((ν − 1)/2)/Γ(ν/2))², for ν > 2

noncentral F
ν_1, ν_2, λ ∈ IR+
  PDF       (ν_1/ν_2)^{ν_1/2} (ν_2/(ν_2 + ν_1 y))^{(ν_1+ν_2)/2} e^{−λ/2} y^{ν_1/2−1} × Σ_{k=0}^∞ ((λ/2)^k Γ((ν_1 + ν_2)/2 + k)/(Γ(ν_2/2) Γ(ν_1/2 + k) k!)) (ν_1/ν_2)^k (ν_2/(ν_2 + ν_1 y))^k y^k I_{IR+}(y)
  mean      ν_2(ν_1 + λ)/(ν_1(ν_2 − 2)), for ν_2 > 2
  variance  2(ν_2/ν_1)² ((ν_1 + λ)² + (ν_1 + 2λ)(ν_2 − 2))/((ν_2 − 2)²(ν_2 − 4)), for ν_2 > 4
Table D.4. Distributions Useful as Priors for the Normal Parameters

inverted gamma
α, β ∈ IR+
  PDF       (1/(Γ(α) β^α)) (1/y)^{α+1} e^{−1/(βy)} I_{IR+}(y)
  mean      1/(β(α − 1)), for α > 1
  variance  1/(β²(α − 1)²(α − 2)), for α > 2

inverted chi-squared
ν ∈ IR+
  PDF       (1/(Γ(ν/2) 2^{ν/2})) (1/y)^{ν/2+1} e^{−1/(2y)} I_{IR+}(y)
  mean      1/(ν − 2), for ν > 2
  variance  2/((ν − 2)²(ν − 4)), for ν > 4
Table D.5. Distributions Derived from the Univariate Normal

lognormal
µ ∈ IR;  σ ∈ IR+
  PDF       (1/(√(2π) σ)) y^{−1} e^{−(log(y)−µ)²/(2σ²)} I_{IR+}(y)
  mean      e^{µ+σ²/2}
  variance  e^{2µ+σ²}(e^{σ²} − 1)

inverse Gaussian
µ, λ ∈ IR+
  PDF       √(λ/(2πy³)) e^{−λ(y−µ)²/(2µ²y)} I_{IR+}(y)
  mean      µ
  variance  µ³/λ

skew normal
µ, λ ∈ IR;  σ ∈ IR+
  PDF       (1/(πσ)) e^{−(y−µ)²/(2σ²)} ∫_{−∞}^{λ(y−µ)/σ} e^{−t²/2} dt
  mean      µ + σλ√(2/(π(1 + λ²)))
  variance  σ²(1 − 2λ²/(π(1 + λ²)))
Table D.6. Other Continuous Distributions (PDFs are wrt Lebesgue measure)

beta
α, β ∈ IR+
  PDF       (Γ(α + β)/(Γ(α)Γ(β))) y^{α−1} (1 − y)^{β−1} I_{[0,1]}(y)
  mean      α/(α + β)
  variance  αβ/((α + β)²(α + β + 1))

Dirichlet
α ∈ IR_+^{d+1}
  PDF       (Γ(Σ_{i=1}^{d+1} α_i)/Π_{i=1}^{d+1} Γ(α_i)) Π_{i=1}^d y_i^{α_i−1} (1 − Σ_{i=1}^d y_i)^{α_{d+1}−1} I_{[0,1]^d}(y)
  mean      α/‖α‖₁  (α_{d+1}/‖α‖₁ is the "mean of Y_{d+1}")
  variance  α(‖α‖₁ − α)/(‖α‖₁²(‖α‖₁ + 1))

uniform; U(θ_1, θ_2)
θ_1 < θ_2 ∈ IR
  PDF       (1/(θ_2 − θ_1)) I_{[θ_1,θ_2]}(y)
  mean      (θ_2 + θ_1)/2
  variance  (θ_2² − 2θ_1θ_2 + θ_1²)/12

Cauchy
γ ∈ IR;  β ∈ IR+
  PDF       1/(πβ(1 + ((y − γ)/β)²))
  mean      does not exist
  variance  does not exist

logistic
µ ∈ IR;  β ∈ IR+
  PDF       e^{−(y−µ)/β}/(β(1 + e^{−(y−µ)/β})²)
  mean      µ
  variance  β²π²/3

Pareto
α, γ ∈ IR+
  PDF       (αγ^α/y^{α+1}) I_{[γ,∞[}(y)
  mean      αγ/(α − 1), for α > 1
  variance  αγ²/((α − 1)²(α − 2)), for α > 2

power function
α, β ∈ IR+
  PDF       (α/β^α) y^{α−1} I_{[0,β[}(y)
  mean      αβ/(α + 1)
  variance  αβ²/((α + 2)(α + 1)²)

von Mises
µ ∈ IR;  κ ∈ IR+
  PDF       (1/(2π I_0(κ))) e^{κ cos(y−µ)} I_{[µ−π,µ+π]}(y)
  mean      µ
  variance  1 − (I_1(κ)/I_0(κ))²

gamma
α, β ∈ IR+
  PDF       (1/(Γ(α) β^α)) y^{α−1} e^{−y/β} I_{IR+}(y)
  mean      αβ
  variance  αβ²

three-parameter gamma
α, β ∈ IR+;  γ ∈ IR
  PDF       (1/(Γ(α) β^α)) (y − γ)^{α−1} e^{−(y−γ)/β} I_{]γ,∞[}(y)
  mean      αβ + γ
  variance  αβ²

exponential
θ ∈ IR+
  PDF       θ^{−1} e^{−y/θ} I_{IR+}(y)
  mean      θ
  variance  θ²

double exponential (folded exponential)
µ ∈ IR;  θ ∈ IR+
  PDF       (1/(2θ)) e^{−|y−µ|/θ}
  mean      µ
  variance  2θ²

Weibull
α, β ∈ IR+
  PDF       (α/β) y^{α−1} e^{−y^α/β} I_{IR+}(y)
  mean      β^{1/α} Γ(α^{−1} + 1)
  variance  β^{2/α} (Γ(2α^{−1} + 1) − (Γ(α^{−1} + 1))²)

extreme value (Type I)
α ∈ IR;  β ∈ IR+
  PDF       (1/β) e^{−(y−α)/β} exp(−e^{−(y−α)/β})
  mean      α − βΓ′(1)
  variance  β²π²/6