1.9.4 Hierarchical Models and Mixture Distributions
When a random variable depends on another random variable, we may model it in a hierarchical way. This leads to a mixture distribution. We illustrate this situation with the following example.
Example 1.27. Denote by N the number of eggs laid by an insect. Suppose that each egg survives, independently of the other eggs, with the same probability p. Let X denote the number of survivors. Then X|(N = n) is a binomial rv with n trials and probability of success p, i.e.,
X|(N = n) ∼ Bin(n, p).
Assume that the number of eggs the insect lays follows a Poisson distribution with
parameter λ, i.e.,
N ∼ Poisson(λ).
That is, we can write the following hierarchical model,
X|N ∼ Bin(N, p)
N ∼ Poisson(λ).
Then, the marginal distribution of the random variable X can be written as
P (X = x) =
∞
X
P (X = x, N = n)
n=0
=
=
∞
X
P (X = x|N = n)P (N = n)
n=0
∞ X
n=x
−λ n e λ
n x
n−x
p (1 − p)
,
x
n!
since X|N = n is binomial, N is Poisson, and the conditional probability is zero if n < x. Multiplying this expression by λ^x/λ^x and then simplifying gives
P(X = x) = Σ_{n=x}^∞ [n!/((n − x)! x!)] p^x (1 − p)^{n−x} · (e^{−λ} λ^n / n!) · (λ^x/λ^x)
         = [(λp)^x e^{−λ} / x!] Σ_{n=x}^∞ [(1 − p)λ]^{n−x} / (n − x)!
         = [(λp)^x e^{−λ} / x!] e^{(1−p)λ}
         = (λp)^x e^{−λp} / x!.
That is,

X ∼ Poisson(λp).
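Before generalizing, this thinning result can be illustrated with a short simulation. The following is a minimal sketch; the values λ = 10 and p = 0.4, the seed, and the replication count are arbitrary choices for illustration, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p, reps = 10.0, 0.4, 100_000   # illustrative values

N = rng.poisson(lam, size=reps)     # eggs laid: N ~ Poisson(lam)
X = rng.binomial(N, p)              # survivors: X | N ~ Bin(N, p)

print(X.mean())   # close to lam * p = 4.0
print(X.var())    # also close to 4.0: a Poisson rv has equal mean and variance
```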
Then we can easily obtain the expected number of surviving eggs as E(X) = λp. Note that in the above example

λp = p E(N) = E(pN) = E[E(X|N)],

since X|N ∼ Bin(N, p). The following theorem generalizes this fact.
Theorem 1.15. If X1 and X2 are any two random variables, then

E(X1) = E[E(X1|X2)],

provided that the expectations exist.
Proof. We will show the result for a continuous bivariate rv. The proof for the discrete case goes along the same lines; just replace integrals and pdfs with sums and pmfs. For any two continuous random variables X1 and X2 with joint pdf f_X(x1, x2) we can write

E(X1) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 f_X(x1, x2) dx1 dx2
      = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} x1 f(x1|x2) dx1 ] f_{X2}(x2) dx2
      = ∫_{−∞}^{∞} E(X1|x2) f_{X2}(x2) dx2 = E[E(X1|X2)].
In fact, we have the following, more general result:

E[g(X1)] = E[E(g(X1)|X2)].
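The theorem is easy to verify numerically in the discrete case. Below is a minimal sketch with a toy joint pmf; the numbers are made up purely for illustration:

```python
import numpy as np

# Toy joint pmf of (X1, X2): rows are x1 in {0, 1, 2}, columns are x2 in {0, 1}
pmf = np.array([[0.10, 0.20],
                [0.25, 0.15],
                [0.05, 0.25]])   # entries sum to 1
x1 = np.array([0.0, 1.0, 2.0])

E_direct = (x1[:, None] * pmf).sum()            # E(X1) straight from the joint pmf

p2 = pmf.sum(axis=0)                            # marginal pmf of X2
E_cond = (x1[:, None] * pmf).sum(axis=0) / p2   # E(X1 | X2 = x2) for each x2
E_iterated = (E_cond * p2).sum()                # E[E(X1|X2)]

print(E_direct, E_iterated)                     # both 1.0
```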
Another very useful result relates the variance of a hierarchical distribution to the variances of the conditional distributions. It is given in the following theorem.
Theorem 1.16. If X1 and X2 are any two random variables, then

var(X1) = E[var(X1|X2)] + var[E(X1|X2)],

provided that the expectations exist.
Proof. We have

var(X1|X2) = E(X1²|X2) − [E(X1|X2)]²

and hence

E[var(X1|X2)] = E[E(X1²|X2)] − E{[E(X1|X2)]²} = E(X1²) − E{[E(X1|X2)]²}.

Also,

var[E(X1|X2)] = E{[E(X1|X2)]²} − {E[E(X1|X2)]}² = E{[E(X1|X2)]²} − [E(X1)]².

Therefore,

E[var(X1|X2)] + var[E(X1|X2)] = E(X1²) − [E(X1)]² = var(X1).
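As a quick consistency check, apply Theorem 1.16 to Example 1.27. Since E(X|N) = Np and var(X|N) = Np(1 − p),

var(X) = E[Np(1 − p)] + var(Np) = λp(1 − p) + p²λ = λp,

which agrees with the variance of the Poisson(λp) distribution derived there.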
Example 1.28. A new drug is tested on n patients in phase III of a clinical trial. Assume that a response can be either efficacy (success) or no efficacy (failure). It is expected that each patient's chance of reacting positively to the drug is different, and that the probability of success follows a Beta(α, β) distribution. Then the response for each patient can be modeled as a hierarchical Bernoulli rv, that is,

Xi|Pi ∼ Bern(Pi), i = 1, 2, . . . , n,
Pi ∼ Beta(α, β).
We are interested in the expected total number of efficacious responses and also
in its variance.
Let us denote by Y the total number of successes. Then

Y = Σ_{i=1}^n Xi.
We will use Theorems 1.15 and 1.16 to calculate its expectation and variance.
E(Y) = Σ_{i=1}^n E(Xi) = Σ_{i=1}^n E[E(Xi|Pi)] = Σ_{i=1}^n E(Pi) = Σ_{i=1}^n α/(α + β) = nα/(α + β).
Now, we will calculate the variance of Y. The random variables Xi are independent, hence

var(Y) = var( Σ_{i=1}^n Xi ) = Σ_{i=1}^n var(Xi).
By Theorem 1.16 we have

var(Xi) = var[E(Xi|Pi)] + E[var(Xi|Pi)],   i = 1, 2, . . . , n.

The first term in the above expression is

var[E(Xi|Pi)] = var(Pi) = αβ / [(α + β)²(α + β + 1)],
since Pi ∼ Beta(α, β). The second term of the expression for the variance of Xi is

E[var(Xi|Pi)] = E[Pi(1 − Pi)]
             = [Γ(α + β)/(Γ(α)Γ(β))] ∫_0^1 pi (1 − pi) pi^{α−1} (1 − pi)^{β−1} dpi
             = [Γ(α + β)/(Γ(α)Γ(β))] ∫_0^1 pi^α (1 − pi)^β dpi
             = [Γ(α + β)/(Γ(α)Γ(β))] · [Γ(α + 1)Γ(β + 1)/Γ(α + β + 2)]
             = αβ / [(α + β)(α + β + 1)].
Here we used the fact that

[Γ(α + β + 2)/(Γ(α + 1)Γ(β + 1))] pi^α (1 − pi)^β

is the pdf of a Beta(α + 1, β + 1) random variable, and we also used the property of the Gamma function that Γ(z) = (z − 1)Γ(z − 1).
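As a quick numerical sanity check of this Gamma-function computation, one can compare both sides for particular parameter values; α = 2, β = 3 below is an arbitrary choice:

```python
from math import gamma

a, b = 2.0, 3.0   # arbitrary Beta parameters for the check

# E[P(1 - P)] via the Gamma-function computation above
lhs = gamma(a + b) / (gamma(a) * gamma(b)) * gamma(a + 1) * gamma(b + 1) / gamma(a + b + 2)

# Closed form a*b / ((a + b)(a + b + 1))
rhs = a * b / ((a + b) * (a + b + 1))

print(lhs, rhs)   # both 0.2
```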
Hence,

var(Xi) = var[E(Xi|Pi)] + E[var(Xi|Pi)]
        = αβ / [(α + β)²(α + β + 1)] + αβ / [(α + β)(α + β + 1)]
        = αβ / (α + β)².
Finally, as the variance of Xi does not depend on i, we get

var(Y) = nαβ / (α + β)².
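Both formulas can be checked by simulating the hierarchical model directly. A minimal sketch, with arbitrarily chosen n = 50, α = 2, β = 3:

```python
import numpy as np

rng = np.random.default_rng(1)
n, a, b, reps = 50, 2.0, 3.0, 200_000    # illustrative parameters

P = rng.beta(a, b, size=(reps, n))       # one success probability per patient
X = rng.binomial(1, P)                   # Bernoulli responses given P
Y = X.sum(axis=1)                        # total successes in each replicate

print(Y.mean(), n * a / (a + b))              # both close to 20
print(Y.var(), n * a * b / (a + b) ** 2)      # both close to 12
```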
Exercise 1.17. The number of spam messages Y sent to a server in a day has a Poisson distribution with parameter λ = 21. Each spam message independently has probability p = 1/3 of not being detected by the spam filter. Let X denote the number of spam messages getting through the filter. Calculate the expected daily number of spam messages that get into the server. Also, calculate the variance of X.
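A simulation sketch along the same lines as Example 1.27 can be used to check your answers; the seed and replication count below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, p, reps = 21.0, 1 / 3, 100_000   # parameters from the exercise

Y = rng.poisson(lam, size=reps)   # spam messages sent in a day
X = rng.binomial(Y, p)            # messages the filter fails to detect

print(X.mean(), X.var())          # compare with your analytic answers
```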
1.9.5 Covariance and Correlation
Covariance and correlation are two measures of the strength of a relationship between two rvs. We will use the following notation:

E(X1) = µ_{X1},   E(X2) = µ_{X2},
var(X1) = σ²_{X1},   var(X2) = σ²_{X2}.

Also, we assume that σ²_{X1} and σ²_{X2} are finite positive values. The simplified notation µ1, µ2, σ1², σ2² will be used when it is clear which rvs we refer to.
Definition 1.19. The covariance of X1 and X2 is defined by

cov(X1, X2) = E[(X1 − µ_{X1})(X2 − µ_{X2})].   (1.16)

Definition 1.20. The correlation of X1 and X2 is defined by

ρ(X1,X2) = corr(X1, X2) = cov(X1, X2) / (σ_{X1} σ_{X2}).   (1.17)
Some useful properties of the covariance and correlation are given in the following
two theorems.
Theorem 1.17.
1. For any random variables X1 and X2,

cov(X1, X2) = E(X1X2) − µ_{X1} µ_{X2}.
2. If the random variables X1 and X2 are independent, then

cov(X1, X2) = 0,

and then also ρ(X1,X2) = 0.
3. For any random variables X1 and X2 and any constants a and b,

var(aX1 + bX2) = a² var(X1) + b² var(X2) + 2ab cov(X1, X2).
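Properties 1 and 3 are easy to confirm numerically before proving them. A minimal simulation sketch; the construction of X2 and the constants a, b are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.normal(size=200_000)
X2 = 0.5 * X1 + rng.normal(size=200_000)   # correlated with X1 by construction
a, b = 2.0, -1.0                           # arbitrary constants

cov = (X1 * X2).mean() - X1.mean() * X2.mean()                  # property 1
lhs = np.var(a * X1 + b * X2)
rhs = a**2 * np.var(X1) + b**2 * np.var(X2) + 2 * a * b * cov   # property 3

print(lhs, rhs)   # both close to 3.25, up to sampling noise
```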
Exercise 1.18. Prove Theorem 1.17.
Theorem 1.18. For any random variables X1 and X2:
1. −1 ≤ ρ(X1,X2) ≤ 1,
2. |ρ(X1,X2)| = 1 iff there exist numbers a ≠ 0 and b such that

P(X2 = aX1 + b) = 1.

If ρ(X1,X2) = 1 then a > 0, and if ρ(X1,X2) = −1 then a < 0.
Exercise 1.19. Prove Theorem 1.18. Hint: Consider the roots (and so the discriminant) of the quadratic function of t:

g(t) = E{[(X1 − µ1)t + (X2 − µ2)]²}.
1.9.6 Bivariate Normal Distribution
Here we use matrix notation. A bivariate rv is treated as a random vector X = (X1, X2)^T. The expectation of a bivariate random vector is written as

µ = E(X) = (E(X1), E(X2))^T = (µ1, µ2)^T
and its variance-covariance matrix is

V = ( var(X1)        cov(X1, X2) )   =   ( σ1²      ρσ1σ2 )
    ( cov(X2, X1)    var(X2)     )       ( ρσ1σ2    σ2²   ).
Then the joint pdf of a normal bivariate rv X is given by

f_X(x) = [1 / (2π √det(V))] exp{ −(1/2) (x − µ)^T V^{−1} (x − µ) },   (1.18)

where x = (x1, x2)^T.
The determinant of V is

det V = det ( σ1²      ρσ1σ2 )
            ( ρσ1σ2    σ2²   )  =  (1 − ρ²) σ1² σ2².
Hence, the inverse of V is

V^{−1} = (1/det V) (  σ2²       −ρσ1σ2 )
                   ( −ρσ1σ2      σ1²   )

       = [1/(1 − ρ²)] (  σ1^{−2}               −ρσ1^{−1}σ2^{−1} )
                      ( −ρσ1^{−1}σ2^{−1}        σ2^{−2}         ).
Then the exponent in formula (1.18) can be written as

−(1/2) (x − µ)^T V^{−1} (x − µ)
  = −[1/(2(1 − ρ²))] (x1 − µ1, x2 − µ2) (  σ1^{−2}               −ρσ1^{−1}σ2^{−1} ) ( x1 − µ1 )
                                        ( −ρσ1^{−1}σ2^{−1}        σ2^{−2}         ) ( x2 − µ2 )
  = −[1/(2(1 − ρ²))] [ (x1 − µ1)²/σ1² − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)²/σ2² ].
So, the joint pdf of the two-dimensional normal rv X is

f_X(x) = [1 / (2πσ1σ2 √(1 − ρ²))]
         × exp{ −[1/(2(1 − ρ²))] [ (x1 − µ1)²/σ1² − 2ρ(x1 − µ1)(x2 − µ2)/(σ1σ2) + (x2 − µ2)²/σ2² ] }.
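The equivalence of the matrix form (1.18) and the expanded scalar form can be verified numerically. A minimal sketch, with arbitrarily chosen parameters and evaluation point:

```python
import numpy as np

# Arbitrary illustrative parameters
mu1, mu2, s1, s2, rho = 1.0, -2.0, 1.5, 0.5, 0.7
mu = np.array([mu1, mu2])
V = np.array([[s1**2, rho * s1 * s2],
              [rho * s1 * s2, s2**2]])

x = np.array([0.3, -1.1])   # evaluation point
d = x - mu

# Matrix form (1.18)
f_matrix = np.exp(-0.5 * d @ np.linalg.inv(V) @ d) / (2 * np.pi * np.sqrt(np.linalg.det(V)))

# Expanded scalar form
q = d[0]**2 / s1**2 - 2 * rho * d[0] * d[1] / (s1 * s2) + d[1]**2 / s2**2
f_scalar = np.exp(-q / (2 * (1 - rho**2))) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

print(f_matrix, f_scalar)   # identical up to floating-point error
```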
Note that when ρ = 0 it simplifies to

f_X(x) = [1 / (2πσ1σ2)] exp{ −(1/2) [ (x1 − µ1)²/σ1² + (x2 − µ2)²/σ2² ] },

which can be written as a product of the marginal pdfs of X1 and X2. Hence, if X = (X1, X2)^T has a bivariate normal distribution and ρ = 0, then the variables X1 and X2 are independent.
[Figure 1.2: Bivariate Normal pdf]