CHAPTER 1. ELEMENTS OF PROBABILITY DISTRIBUTION THEORY

1.9.4 Hierarchical Models and Mixture Distributions

When a random variable depends on another random variable, we may try to model it in a hierarchical way. This leads to a mixture distribution. We explain this situation by the following example.

Example 1.27. Denote by $N$ the number of eggs laid by an insect. Suppose that each egg survives, independently of the other eggs, with the same probability $p$. Let $X$ denote the number of survivors. Then $X \mid N = n$ is a binomial rv with $n$ trials and probability of success $p$, i.e., $X \mid (N = n) \sim \mathrm{Bin}(n, p)$. Assume that the number of eggs the insect lays follows a Poisson distribution with parameter $\lambda$, i.e., $N \sim \mathrm{Poisson}(\lambda)$. That is, we can write the following hierarchical model:
\[
X \mid N \sim \mathrm{Bin}(N, p), \qquad N \sim \mathrm{Poisson}(\lambda).
\]
Then, the marginal distribution of the random variable $X$ can be written as
\[
P(X = x) = \sum_{n=0}^{\infty} P(X = x, N = n)
= \sum_{n=0}^{\infty} P(X = x \mid N = n)\, P(N = n)
= \sum_{n=x}^{\infty} \binom{n}{x} p^x (1-p)^{n-x}\, \frac{e^{-\lambda}\lambda^n}{n!},
\]
since $X \mid N = n$ is binomial, $N$ is Poisson, and the conditional probability is zero for $n < x$. Multiplying this expression by $\lambda^x / \lambda^x$ and then simplifying gives
\[
P(X = x) = \sum_{n=x}^{\infty} \frac{n!}{(n-x)!\,x!}\, \frac{e^{-\lambda}\lambda^n}{n!}\, \frac{\lambda^x}{\lambda^x}\, p^x (1-p)^{n-x}
= \frac{(\lambda p)^x e^{-\lambda}}{x!} \sum_{n=x}^{\infty} \frac{\big[(1-p)\lambda\big]^{n-x}}{(n-x)!}
= \frac{(\lambda p)^x e^{-\lambda}}{x!}\, e^{(1-p)\lambda}
= \frac{(\lambda p)^x e^{-\lambda p}}{x!}.
\]
That is, $X \sim \mathrm{Poisson}(\lambda p)$. Then, we can easily get the expected number of surviving eggs as $E(X) = \lambda p$.

Note that in the above example
\[
\lambda p = p\,E(N) = E(pN) = E\big[E(X \mid N)\big],
\]
since $X \mid N \sim \mathrm{Bin}(N, p)$. The following theorem generalizes this fact.

Theorem 1.15. If $X_1$ and $X_2$ are any two random variables, then
\[
E(X_1) = E\big[E(X_1 \mid X_2)\big],
\]
provided that the expectations exist.

Proof. We will show the result for a continuous bivariate rv. The proof for the discrete case goes along the same lines; just replace integrals and pdfs with sums and pmfs.
For any two continuous random variables $X_1$ and $X_2$ with a joint pdf $f_X(x_1, x_2)$ we can write
\[
E(X_1) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x_1\, f_X(x_1, x_2)\,dx_1\,dx_2
= \int_{-\infty}^{\infty}\!\left[\int_{-\infty}^{\infty} x_1\, f(x_1 \mid x_2)\,dx_1\right] f_{X_2}(x_2)\,dx_2
= \int_{-\infty}^{\infty} E(X_1 \mid x_2)\, f_{X_2}(x_2)\,dx_2
= E\big[E(X_1 \mid X_2)\big].
\]
In fact, we have the following, more general result:
\[
E[g(X_1)] = E\big[E(g(X_1) \mid X_2)\big].
\]
Another very useful result relates the variance of a hierarchical distribution to the variances of the conditional distributions. It is given in the following theorem.

Theorem 1.16. If $X_1$ and $X_2$ are any two random variables, then
\[
\operatorname{var}(X_1) = E[\operatorname{var}(X_1 \mid X_2)] + \operatorname{var}[E(X_1 \mid X_2)],
\]
provided that the expectations exist.

Proof. We have
\[
\operatorname{var}(X_1 \mid X_2) = E(X_1^2 \mid X_2) - [E(X_1 \mid X_2)]^2,
\]
and hence
\[
E[\operatorname{var}(X_1 \mid X_2)] = E\big[E(X_1^2 \mid X_2)\big] - E\big\{[E(X_1 \mid X_2)]^2\big\} = E(X_1^2) - E\big\{[E(X_1 \mid X_2)]^2\big\}.
\]
Also,
\[
\operatorname{var}[E(X_1 \mid X_2)] = E\big\{[E(X_1 \mid X_2)]^2\big\} - \big\{E[E(X_1 \mid X_2)]\big\}^2 = E\big\{[E(X_1 \mid X_2)]^2\big\} - [E(X_1)]^2.
\]
Therefore,
\[
E[\operatorname{var}(X_1 \mid X_2)] + \operatorname{var}[E(X_1 \mid X_2)] = E(X_1^2) - [E(X_1)]^2 = \operatorname{var}(X_1).
\]

Example 1.28. A new drug is tested on $n$ patients in phase III of a clinical trial. Assume that a response can be either efficacy (success) or no efficacy (failure). It is expected that each patient's chance of reacting positively to the drug is different, and that the probability of success follows a $\mathrm{Beta}(\alpha, \beta)$ distribution. Then, the response of each patient can be modeled as a hierarchical Bernoulli rv, that is,
\[
X_i \mid P_i \sim \mathrm{Bern}(P_i), \quad i = 1, 2, \ldots, n, \qquad P_i \sim \mathrm{Beta}(\alpha, \beta).
\]
We are interested in the expected total number of efficacious responses and also in its variance. Let us denote by $Y$ the total number of successes. Then
\[
Y = \sum_{i=1}^{n} X_i.
\]
We will use Theorems 1.15 and 1.16 to calculate its expectation and variance.
\[
E(Y) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} E\big[E(X_i \mid P_i)\big] = \sum_{i=1}^{n} E(P_i) = \sum_{i=1}^{n} \frac{\alpha}{\alpha+\beta} = \frac{n\alpha}{\alpha+\beta}.
\]
Now, we will calculate the variance of $Y$. The random variables $X_i$ are independent, hence
\[
\operatorname{var}(Y) = \operatorname{var}\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{var}(X_i).
\]
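As an aside, the expectation just obtained, $E(Y) = n\alpha/(\alpha+\beta)$, can be checked with a quick Monte Carlo sketch of the hierarchical model. The parameter values below ($\alpha = 2$, $\beta = 3$, $n = 50$) are illustrative assumptions, not values from the text.

```python
import random

random.seed(1)

# Hierarchical model of Example 1.28 (illustrative parameters, not from the text):
# P_i ~ Beta(alpha, beta), X_i | P_i ~ Bern(P_i), Y = X_1 + ... + X_n.
alpha, beta, n = 2.0, 3.0, 50
trials = 2_000

def draw_y():
    # One realization of Y: draw each patient's success probability,
    # then the Bernoulli response given that probability.
    y = 0
    for _ in range(n):
        p = random.betavariate(alpha, beta)
        y += 1 if random.random() < p else 0
    return y

mean_y = sum(draw_y() for _ in range(trials)) / trials
print(mean_y)  # close to n*alpha/(alpha + beta) = 20
```

Only the standard-library `random` module is needed, since it provides `betavariate` directly; averaging `draw_y()` over many trials approximates $E(Y)$.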
By Theorem 1.16 we have
\[
\operatorname{var}(X_i) = \operatorname{var}\big[E(X_i \mid P_i)\big] + E\big[\operatorname{var}(X_i \mid P_i)\big], \quad i = 1, 2, \ldots, n.
\]
The first term in the above expression is
\[
\operatorname{var}\big[E(X_i \mid P_i)\big] = \operatorname{var}(P_i) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)},
\]
since $P_i \sim \mathrm{Beta}(\alpha, \beta)$. The second term of the expression for the variance of $X_i$ is
\[
E\big[\operatorname{var}(X_i \mid P_i)\big] = E\big[P_i(1 - P_i)\big]
= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 p_i (1 - p_i)\, p_i^{\alpha-1} (1 - p_i)^{\beta-1}\,dp_i
= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 p_i^{\alpha} (1 - p_i)^{\beta}\,dp_i
= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,\frac{\Gamma(\alpha+1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)}
= \frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}.
\]
Here we used the fact that
\[
\frac{\Gamma(\alpha+\beta+2)}{\Gamma(\alpha+1)\Gamma(\beta+1)}\, p_i^{\alpha} (1 - p_i)^{\beta}
\]
is the pdf of a $\mathrm{Beta}(\alpha+1, \beta+1)$ random variable, and we also used the property of the Gamma function that $\Gamma(z+1) = z\Gamma(z)$. Hence,
\[
\operatorname{var}(X_i) = \operatorname{var}\big[E(X_i \mid P_i)\big] + E\big[\operatorname{var}(X_i \mid P_i)\big]
= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} + \frac{\alpha\beta}{(\alpha+\beta)(\alpha+\beta+1)}
= \frac{\alpha\beta}{(\alpha+\beta)^2}.
\]
Finally, as the variance of $X_i$ does not depend on $i$, we get
\[
\operatorname{var}(Y) = \frac{n\alpha\beta}{(\alpha+\beta)^2}.
\]

Exercise 1.17. The number of spam messages $Y$ sent to a server in a day has a Poisson distribution with parameter $\lambda = 21$. Each spam message independently has probability $p = 1/3$ of not being detected by the spam filter. Let $X$ denote the number of spam messages getting through the filter. Calculate the expected daily number of spam messages which get into the server. Also, calculate the variance of $X$.

1.9.5 Covariance and Correlation

Covariance and correlation are two measures of the strength of a relationship between two rvs. We will use the following notation:
\[
E(X_1) = \mu_{X_1}, \quad E(X_2) = \mu_{X_2}, \quad \operatorname{var}(X_1) = \sigma_{X_1}^2, \quad \operatorname{var}(X_2) = \sigma_{X_2}^2.
\]
Also, we assume that $\sigma_{X_1}^2$ and $\sigma_{X_2}^2$ are finite positive values. The simplified notation $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$ will be used when it is clear which rvs we refer to.

Definition 1.19. The covariance of $X_1$ and $X_2$ is defined by
\[
\operatorname{cov}(X_1, X_2) = E\big[(X_1 - \mu_{X_1})(X_2 - \mu_{X_2})\big]. \tag{1.16}
\]

Definition 1.20. The correlation of $X_1$ and $X_2$ is defined by
\[
\rho_{(X_1,X_2)} = \operatorname{corr}(X_1, X_2) = \frac{\operatorname{cov}(X_1, X_2)}{\sigma_{X_1}\sigma_{X_2}}. \tag{1.17}
\]

Some useful properties of the covariance and correlation are given in the following two theorems.

Theorem 1.17.
1. For any random variables $X_1$ and $X_2$,
\[
\operatorname{cov}(X_1, X_2) = E(X_1 X_2) - \mu_{X_1}\mu_{X_2}.
\]
2. If random variables $X_1$ and $X_2$ are independent, then $\operatorname{cov}(X_1, X_2) = 0$. Then also $\rho_{(X_1,X_2)} = 0$.
3. For any random variables $X_1$ and $X_2$ and any constants $a$ and $b$,
\[
\operatorname{var}(aX_1 + bX_2) = a^2\operatorname{var}(X_1) + b^2\operatorname{var}(X_2) + 2ab\operatorname{cov}(X_1, X_2).
\]

Exercise 1.18. Prove Theorem 1.17.

Theorem 1.18. For any random variables $X_1$ and $X_2$:
1. $-1 \le \rho_{(X_1,X_2)} \le 1$;
2. $|\rho_{(X_1,X_2)}| = 1$ iff there exist numbers $a \ne 0$ and $b$ such that $P(X_2 = aX_1 + b) = 1$. If $\rho_{(X_1,X_2)} = 1$ then $a > 0$, and if $\rho_{(X_1,X_2)} = -1$ then $a < 0$.

Exercise 1.19. Prove Theorem 1.18. Hint: Consider the roots (and so the discriminant) of the quadratic function of $t$:
\[
g(t) = E\big[(X - \mu_X)t + (Y - \mu_Y)\big]^2.
\]

1.9.6 Bivariate Normal Distribution

Here we use matrix notation. A bivariate rv is treated as a random vector
\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}.
\]
The expectation of a bivariate random vector is written as
\[
\mu = E(X) = E\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
\]
and its variance-covariance matrix is
\[
V = \begin{pmatrix} \operatorname{var}(X_1) & \operatorname{cov}(X_1, X_2) \\ \operatorname{cov}(X_2, X_1) & \operatorname{var}(X_2) \end{pmatrix}
= \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.
\]
Then the joint pdf of a bivariate normal rv $X$ is given by
\[
f_X(x) = \frac{1}{2\pi\sqrt{\det(V)}} \exp\!\left\{-\frac{1}{2}(x - \mu)^T V^{-1} (x - \mu)\right\}, \tag{1.18}
\]
where $x = (x_1, x_2)^T$. The determinant of $V$ is
\[
\det V = \det\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} = (1 - \rho^2)\,\sigma_1^2\sigma_2^2.
\]
Hence, the inverse of $V$ is
\[
V^{-1} = \frac{1}{\det V}\begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}
= \frac{1}{1 - \rho^2}\begin{pmatrix} \sigma_1^{-2} & -\rho\sigma_1^{-1}\sigma_2^{-1} \\ -\rho\sigma_1^{-1}\sigma_2^{-1} & \sigma_2^{-2} \end{pmatrix}.
\]
Then the exponent in formula (1.18) can be written as
\[
-\frac{1}{2}(x - \mu)^T V^{-1} (x - \mu)
= -\frac{1}{2(1 - \rho^2)}\,(x_1 - \mu_1,\; x_2 - \mu_2)
\begin{pmatrix} \sigma_1^{-2} & -\rho\sigma_1^{-1}\sigma_2^{-1} \\ -\rho\sigma_1^{-1}\sigma_2^{-1} & \sigma_2^{-2} \end{pmatrix}
\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}
= -\frac{1}{2(1 - \rho^2)}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\,\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right].
\]
So, the joint pdf of the two-dimensional normal rv $X$ is
\[
f_X(x) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}
\exp\!\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\,\frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right]\right\}.
\]
Note that when $\rho = 0$ it simplifies to
\[
f_X(x) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\!\left\{-\frac{1}{2}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right]\right\},
\]
which can be written as a product of the marginal distributions of $X_1$ and $X_2$. Hence, if $X = (X_1, X_2)^T$ has a bivariate normal distribution and $\rho = 0$, then the variables $X_1$ and $X_2$ are independent.

Figure 1.2: Bivariate Normal pdf