Figure 1.1: Density function and cumulative distribution function (cdf) of N(4.5, 2), plotted against x.
1.6 Families of Distributions
Example 1.10. Normal distribution N(µ, σ²)
The density function is given by
\[
f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \tag{1.8}
\]
There are two parameters, the mean µ and the variance σ², which tell us about the position and the shape of the density curve.
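As a quick check (added here, not part of the original notes), the short Python sketch below evaluates formula (1.8) and compares it with scipy.stats.norm.pdf, which parameterizes the normal by loc = µ and scale = σ; the values µ = 4.5 and σ² = 2 are simply taken from Figure 1.1.

import numpy as np
from scipy.stats import norm

mu, sigma2 = 4.5, 2.0                       # N(4.5, 2), the values used in Figure 1.1
sigma = np.sqrt(sigma2)

def normal_pdf(x, mu, sigma):
    # formula (1.8)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(0.0, 9.0, 5)
print(np.allclose(normal_pdf(x, mu, sigma), norm.pdf(x, loc=mu, scale=sigma)))  # True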
There is an extensive theory of statistical analysis for data which are realizations
of normally distributed random variables. This distribution is the most common in
applications, but sometimes it is not feasible to assume that what we observe is
indeed a sample from such a population. In Example 1.6 we observe the time to a
specific response to a drug candidate. Such a variable can only take nonnegative
values, while a normal random variable takes values in all of R. A lognormal distribution
is often used in such cases: X has a lognormal distribution if log X is normally distributed.
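As an illustration of the lognormal–normal link (a sketch added here, not in the original text), the following code draws a lognormal sample with NumPy and checks that its logarithm has approximately the prescribed normal mean and variance; the parameter values are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5                   # arbitrary parameters of the underlying normal

x = rng.lognormal(mean=mu, sigma=sigma, size=100_000)   # lognormal sample
log_x = np.log(x)                                       # should be approximately N(mu, sigma^2)

print(log_x.mean(), log_x.var())       # close to 1.0 and 0.25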
Exercise 1.6. Recall the following discrete distributions:
1. Uniform U(n);
2. Bernoulli(p);
3. Geom(p);
4. Hypergeometric(n, M, N);
5. Poisson(λ).
Exercise 1.7. Recall the following continuous distributions:
1. Uniform U(a, b);
2. Exp(λ);
3. Gamma(α, λ);
4. Chi-squared with ν degrees of freedom, χ²_ν.
Note that all these distributions depend on some parameters, such as p, λ, µ or σ.
These values are usually unknown, so their estimation is one of the important
problems in statistical analysis. The parameters determine characteristics of the
shape of the pdf/pmf of a random variable, such as its location, spread or skewness.
We denote by Pθ the distribution function of a rv X depending on the parameter
θ (a scalar or an s-dimensional vector).
Definition 1.10. The distribution family is a set
\[
\mathcal{P} = \{P_\theta : \theta \in \Theta \subseteq \mathbb{R}^s\}
\]
defined on a common measurable space (Ω, A).
Example 1.11. Assume that in the efficacy Example 1.9 we have 3 mice. Then the
sample space Ω consists of triples of “efficacious” or “non-efficacious” responses,
\[
\Omega = \{(\omega_1, \omega_1, \omega_1), (\omega_1, \omega_1, \omega_2), \ldots, (\omega_2, \omega_2, \omega_2)\}.
\]
Take the σ-algebra A as the power set on Ω.
The random variable defined as the number of successes in n = 3 trials has a Binomial distribution with probability of success p:
X : Ω → {0, 1, 2, 3}.
Here p is the parameter which together with the distribution function defines the
family on the common measurable space (Ω, A).
For p = 1/2 we have the symmetric mass function

   X = x       0     1     2     3
   P(X = x)   1/8   3/8   3/8   1/8

while for p = 3/4, the asymmetric mass function

   X = x       0      1      2       3
   P(X = x)   1/64   9/64   27/64   27/64
Both functions belong to the same family of binomial distributions with n = 3.
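For illustration (a sketch added here, not part of the original notes), the two mass functions above can be reproduced with SciPy's binom.pmf:

from scipy.stats import binom

for p in (1/2, 3/4):
    pmf = [binom.pmf(k, n=3, p=p) for k in range(4)]
    print(p, [round(v, 4) for v in pmf])
# p = 1/2 -> 0.125, 0.375, 0.375, 0.125      (1/8, 3/8, 3/8, 1/8)
# p = 3/4 -> 0.0156, 0.1406, 0.4219, 0.4219  (1/64, 9/64, 27/64, 27/64)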
1.7 Expected Values
Definition 1.11. The expected value of a function g(X) is defined by
\[
E(g(X)) =
\begin{cases}
\displaystyle \int_{-\infty}^{\infty} g(x) f(x)\,dx, & \text{for a continuous r.v.,}\\[1ex]
\displaystyle \sum_{j=0}^{\infty} g(x_j)\, p(x_j), & \text{for a discrete r.v.,}
\end{cases}
\]
and g is any function such that E |g(X)| < ∞.
Two important special cases of g are:

Mean E(X), also denoted by E X, when g(X) = X;

Variance E(X − E X)², when g(X) = (X − E X)². The following relation is
very useful while calculating the variance:
\[
E(X - E X)^2 = E X^2 - (E X)^2.
\]
Example 1.12. Let X be a random variable such that
\[
f(x) =
\begin{cases}
\tfrac{1}{2}\sin x, & \text{for } x \in [0, \pi],\\
0 & \text{otherwise.}
\end{cases}
\tag{1.9}
\]
Then the expectation and variance are the following.
• Expectation
\[
E X = \frac{1}{2}\int_0^{\pi} x \sin x\,dx = \frac{\pi}{2}.
\]

• Variance
\[
\mathrm{var}(X) = E X^2 - (E X)^2
= \frac{1}{2}\int_0^{\pi} x^2 \sin x\,dx - \left(\frac{\pi}{2}\right)^2
= \frac{\pi^2}{4} - 2.
\]
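As a sanity check (added here, not part of the original notes), the two integrals of Example 1.12 can be evaluated numerically with scipy.integrate.quad:

import numpy as np
from scipy.integrate import quad

f = lambda x: 0.5 * np.sin(x)                     # the density of Example 1.12

ex, _  = quad(lambda x: x * f(x), 0, np.pi)       # E X
ex2, _ = quad(lambda x: x ** 2 * f(x), 0, np.pi)  # E X^2

print(ex, np.pi / 2)                              # both ~1.5708
print(ex2 - ex ** 2, np.pi ** 2 / 4 - 2)          # both ~0.4674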
Example 1.13. Geom(p) (the number of independent Bernoulli trials until the first
“success”): the support set and the pmf are, respectively, X = {1, 2, . . .} and
\[
P(X = x) = p(1 - p)^{x-1} = p\,q^{x-1}, \qquad x \in \mathcal{X},
\]
where p ∈ [0, 1] is the probability of success and q = 1 − p. Then
\[
E(X) = \frac{1}{p}, \qquad \mathrm{var}(X) = \frac{1-p}{p^2}.
\]
Here we will prove the formula for the expectation. By the definition of the expectation we have
\[
E(X) = \sum_{x=1}^{\infty} x\,p\,q^{x-1} = p \sum_{x=1}^{\infty} x\,q^{x-1}.
\]
Note that for x ≥ 1 the following equality holds:
\[
\frac{d}{dq}(q^x) = x\,q^{x-1}.
\]
Hence, since the power series converges for 0 ≤ q < 1, we may differentiate it term by term and obtain
\[
E(X) = p \sum_{x=1}^{\infty} x\,q^{x-1}
     = p \sum_{x=1}^{\infty} \frac{d}{dq}(q^x)
     = p\,\frac{d}{dq}\left(\sum_{x=1}^{\infty} q^x\right).
\]
The latter sum is a geometric series and is equal to q/(1 − q). Therefore,
\[
E(X) = p\,\frac{d}{dq}\left(\frac{q}{1-q}\right)
     = p\,\frac{1}{(1-q)^2}
     = \frac{1}{p}.
\]
A similar method can be used to show that var(X) = q/p² (applying the second derivative
of q^x with respect to q).
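A quick Monte Carlo check of these two formulas (a sketch added here, not in the original notes; the value p = 0.3 is arbitrary), using NumPy's geometric sampler, which also counts the number of trials up to and including the first success:

import numpy as np

rng = np.random.default_rng(1)
p = 0.3
x = rng.geometric(p, size=1_000_000)   # trials up to and including the first success

print(x.mean(), 1 / p)                 # both ~3.33
print(x.var(), (1 - p) / p ** 2)       # both ~7.78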
The following useful properties of the expectation follow from properties of integration (summation).
Theorem 1.5. Let X be a random variable and let a, b and c be constants. Then
for any functions g(x) and h(x) whose expectations exist we have:
(a) E[ag(X) + bh(X) + c] = a E[g(X)] + b E[h(X)] + c;
(b) If g(x) ≥ h(x) for all x, then E(g(X)) ≥ E(h(X));
(c) If g(x) ≥ 0 for all x, then E(g(X)) ≥ 0;
(d) If a ≥ g(x) ≥ b for all x, then a ≥ E(g(X)) ≥ b.
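As a small numerical illustration of part (a) of Theorem 1.5 (added here; the functions g, h and the constants are arbitrary choices):

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)          # X ~ N(0, 1)
a, b, c = 2.0, -1.0, 3.0
g, h = x ** 2, np.sin(x)                # g(X) = X^2, h(X) = sin(X)

print((a * g + b * h + c).mean())       # ~5.0
print(a * g.mean() + b * h.mean() + c)  # ~5.0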
Exercise 1.8. Show that E(X − b)2 is minimized by b = E(X).
The variance of a random variable and its mean are the most important
parameters used in the theory of statistics. The following theorem follows from
the properties of the expectation.
Theorem 1.6. If X is a random variable with a finite variance, then for any
constants a and b,
\[
\mathrm{var}(aX + b) = a^2\, \mathrm{var}(X).
\]
Exercise 1.9. Prove Theorem 1.6.
1.8 Moments and Moment Generating Functions
Definition 1.12. The nth moment (n ∈ N) of a random variable X is defined as
\[
\mu'_n = E X^n.
\]
The nth central moment of X is defined as
\[
\mu_n = E(X - \mu)^n,
\]
where µ = µ′₁ = E X.
Note that the second central moment is the variance of a random variable X, usually denoted by σ².
Moments give an indication of the shape of the distribution of a random variable.
Skewness and kurtosis are measured by the following functions of the third and
fourth central moment respectively:
the coefficient of skewness is given by
\[
\gamma_1 = \frac{E(X - \mu)^3}{\sigma^3} = \frac{\mu_3}{\sqrt{\mu_2^3}};
\]
the coefficient of kurtosis is given by
\[
\gamma_2 = \frac{E(X - \mu)^4}{\sigma^4} - 3 = \frac{\mu_4}{\mu_2^2} - 3.
\]
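For illustration (a sketch added here, not in the original notes), the code below estimates γ₁ and γ₂ from a large sample via its central moments and compares them with the exact values for the Exp(1) distribution, whose skewness is 2 and excess kurtosis is 6:

import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)

mu = x.mean()
m2 = ((x - mu) ** 2).mean()
m3 = ((x - mu) ** 3).mean()
m4 = ((x - mu) ** 4).mean()

gamma1 = m3 / m2 ** 1.5     # coefficient of skewness
gamma2 = m4 / m2 ** 2 - 3   # coefficient of kurtosis
print(gamma1, gamma2)       # roughly 2 and 6 for Exp(1)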
Moments can be calculated from the definition or by using the so-called moment generating function.
Definition 1.13. The moment generating function (mgf) of a random variable X
is a function MX : R → [0, ∞) given by
\[
M_X(t) = E\, e^{tX},
\]
provided that the expectation exists for t in some neighborhood of zero.
More explicitly, the mgf of X can be written as
\[
M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx, \quad \text{if } X \text{ is continuous,}
\]
\[
M_X(t) = \sum_{x \in \mathcal{X}} e^{tx} P(X = x), \quad \text{if } X \text{ is discrete.}
\]
The method to generate moments is given in the following theorem.
Theorem 1.7. If X has mgf MX (t), then
\[
E(X^n) = \frac{d^n}{dt^n} M_X(t)\Big|_{t=0}.
\]
That is, the n-th moment is equal to the n-th derivative of the mgf evaluated at
t = 0.
Proof. Assuming that we can differentiate under the integral sign, we may write
\[
\frac{d}{dt} M_X(t)
= \frac{d}{dt}\int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx
= \int_{-\infty}^{\infty} \frac{d}{dt}\, e^{tx} f_X(x)\,dx
= \int_{-\infty}^{\infty} x\, e^{tx} f_X(x)\,dx
= E(X e^{tX}).
\]
Hence, evaluating the last expression at zero we obtain
\[
\frac{d}{dt} M_X(t)\Big|_{t=0} = E(X e^{tX})\big|_{t=0} = E(X).
\]
For n = 2 we will get
\[
\frac{d^2}{dt^2} M_X(t)\Big|_{t=0} = E(X^2 e^{tX})\big|_{t=0} = E(X^2).
\]
Analogously, it can be shown that for any n ∈ N we can write
\[
\frac{d^n}{dt^n} M_X(t)\Big|_{t=0} = E(X^n e^{tX})\big|_{t=0} = E(X^n).
\]
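A short symbolic illustration of Theorem 1.7 (a sketch added here, not in the original notes), using SymPy and the mgf of the standard normal distribution, M(t) = e^{t²/2}, whose first four moments are 0, 1, 0, 3:

import sympy as sp

t = sp.symbols('t')
M = sp.exp(t ** 2 / 2)                        # mgf of the standard normal

moments = [sp.diff(M, t, n).subs(t, 0) for n in range(1, 5)]
print(moments)                                # [0, 1, 0, 3]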
Example 1.14. Find the mgf of X ∼ Exp(λ) and use the results of Theorem 1.7 to
obtain the mean and variance of X.
By definition the mgf can be written as
\[
M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx.
\]
For the exponential distribution we have
\[
f_X(x) = \lambda e^{-\lambda x} I_{(0,\infty)}(x),
\]
where λ ∈ R₊. Hence, integrating by the method of substitution, we get
\[
M_X(t) = \int_0^{\infty} e^{tx}\, \lambda e^{-\lambda x}\,dx
       = \lambda \int_0^{\infty} e^{(t-\lambda)x}\,dx
       = \frac{\lambda}{\lambda - t},
\]
provided that |t| < λ.
Now, using Theorem 1.7 we obtain the first and the second moments, respectively:
\[
E(X) = M_X'(0) = \frac{\lambda}{(\lambda - t)^2}\bigg|_{t=0} = \frac{1}{\lambda},
\]
\[
E(X^2) = M_X^{(2)}(0) = \frac{2\lambda}{(\lambda - t)^3}\bigg|_{t=0} = \frac{2}{\lambda^2}.
\]
Hence, the variance of X is
\[
\mathrm{var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.
\]
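The computations of Example 1.14 can also be verified symbolically (a sketch added here, not in the original notes), by differentiating the mgf found above:

import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)                        # the mgf of Exp(lambda)

EX  = sp.diff(M, t, 1).subs(t, 0)          # 1/lambda
EX2 = sp.diff(M, t, 2).subs(t, 0)          # 2/lambda**2
print(EX, EX2, sp.simplify(EX2 - EX ** 2)) # 1/lambda, 2/lambda**2, lambda**(-2)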
Exercise 1.10. Calculate mgf for Binomial and Poisson distributions.
Moment generating functions provide methods for comparing distributions or
finding their limiting forms. The following two theorems give us the tools.
Theorem 1.8. Let FX (x) and FY (y) be two cdfs all of whose moments exist. If the
mgfs of X and Y exist and are equal, i.e., MX (t) = MY (t) for all t in some
neighborhood of zero, then FX (u) = FY (u) for all u.
Theorem 1.9. Suppose that X1 , X2 , . . . is a sequence of random variables, each
with mgf MXi (t). Furthermore, suppose that
\[
\lim_{i\to\infty} M_{X_i}(t) = M_X(t), \quad \text{for all } t \text{ in a neighborhood of zero},
\]
and MX (t) is an mgf. Then, there is a cdf FX whose moments are determined by
MX (t) and, for all x where FX (x) is continuous, we have
\[
\lim_{i\to\infty} F_{X_i}(x) = F_X(x).
\]
This theorem means that the convergence of mgfs implies convergence of cdfs.
Example 1.15. We know that the Binomial distribution can be approximated by a
Poisson distribution when p is small and n is large. Using the above theorem we
can confirm this fact.
The mgfs of Xn ∼ Bin(n, p) and of Y ∼ Poisson(λ) are, respectively:
\[
M_{X_n}(t) = \left[p e^t + (1 - p)\right]^n, \qquad M_Y(t) = e^{\lambda(e^t - 1)}.
\]
We will show that the mgf of Xn tends to the mgf of Y as n → ∞ and p → 0 with np → λ.
We will need the following useful result given in the lemma:
Lemma 1.1. Let a1 , a2 , . . . be a sequence of numbers converging to a, that is,
lim_{n→∞} an = a. Then
\[
\lim_{n\to\infty} \left(1 + \frac{a_n}{n}\right)^n = e^a.
\]
Now, we can write
\[
M_{X_n}(t) = \left[p e^t + (1 - p)\right]^n
           = \left[1 + p(e^t - 1)\right]^n
           = \left[1 + \frac{np(e^t - 1)}{n}\right]^n
           \xrightarrow[n \to \infty]{} e^{\lambda(e^t - 1)} = M_Y(t),
\]
since np(e^t − 1) → λ(e^t − 1) and Lemma 1.1 applies. Hence, by Theorem 1.9, the Bin(n, p) distribution converges to the Poisson(λ) distribution as n → ∞ with np → λ.
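As a numerical illustration of this convergence (a sketch added here; λ = 2 and the values of n are arbitrary), one can compare the Bin(n, λ/n) and Poisson(λ) mass functions directly:

from scipy.stats import binom, poisson

lam = 2.0
for n in (10, 100, 1000):
    p = lam / n
    max_diff = max(abs(binom.pmf(k, n, p) - poisson.pmf(k, lam)) for k in range(20))
    print(n, max_diff)   # the difference shrinks toward 0 as n grows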