1.7 Expected Values

Definition 1.11. The expected value of a function $g(X)$ is defined by
\[
E(g(X)) =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} g(x) f(x)\,dx, & \text{for a continuous r.v.,}\\[2ex]
\displaystyle\sum_{j=0}^{\infty} g(x_j)\, p(x_j), & \text{for a discrete r.v.,}
\end{cases}
\]
where $g$ is any function such that $E|g(X)| < \infty$.

Two important special cases of $g$ are:

Mean: $E(X)$, also denoted by $EX$, when $g(X) = X$;
Variance: $E(X - EX)^2$, when $g(X) = (X - EX)^2$.

The following relation is very useful when calculating the variance:
\[
E(X - EX)^2 = EX^2 - (EX)^2. \tag{1.9}
\]

Example 1.12. Let $X$ be a random variable with density
\[
f(x) =
\begin{cases}
\tfrac{1}{2}\sin x, & \text{for } x \in [0, \pi],\\
0, & \text{otherwise.}
\end{cases}
\]
Then the expectation and variance are as follows.

Expectation:
\[
EX = \frac{1}{2}\int_0^{\pi} x \sin x\,dx = \frac{\pi}{2}.
\]

Variance:
\[
\operatorname{var}(X) = EX^2 - (EX)^2
= \frac{1}{2}\int_0^{\pi} x^2 \sin x\,dx - \left(\frac{\pi}{2}\right)^2
= \frac{\pi^2 - 4}{2} - \frac{\pi^2}{4}
= \frac{\pi^2}{4} - 2.
\]

Example 1.13. $\operatorname{Geom}(p)$ (the number of independent Bernoulli trials until the first "success"): the support set and the pmf are, respectively,
\[
\mathcal{X} = \{1, 2, \ldots\} \quad\text{and}\quad P(X = x) = p(1-p)^{x-1} = pq^{x-1}, \quad x \in \mathcal{X},
\]
where $p \in [0, 1]$ is the probability of success and $q = 1 - p$. Then
\[
E(X) = \frac{1}{p}, \qquad \operatorname{var}(X) = \frac{1-p}{p^2}.
\]
Here we prove the formula for the expectation. By the definition of the expectation we have
\[
E(X) = \sum_{x=1}^{\infty} x\, p q^{x-1} = p \sum_{x=1}^{\infty} x q^{x-1}.
\]
Note that for $x \geq 1$ the following equality holds:
\[
\frac{d}{dq}\left(q^x\right) = x q^{x-1}.
\]
Hence,
\[
E(X) = p \sum_{x=1}^{\infty} x q^{x-1} = p \sum_{x=1}^{\infty} \frac{d}{dq}\left(q^x\right) = p\, \frac{d}{dq}\left(\sum_{x=1}^{\infty} q^x\right).
\]
The latter is the sum of a geometric sequence and is equal to $q/(1-q)$. Therefore,
\[
E(X) = p\, \frac{d}{dq}\left(\frac{q}{1-q}\right) = \frac{p}{(1-q)^2} = \frac{1}{p}.
\]
A similar method can be used to show that $\operatorname{var}(X) = q/p^2$ (the second derivative with respect to $q$ of $q^x$ can be applied for this).

The following useful properties of the expectation follow from the properties of integration (summation).

Theorem 1.5. Let $X$ be a random variable and let $a$, $b$ and $c$ be constants. Then for any functions $g(x)$ and $h(x)$ whose expectations exist we have:
(a) $E[a g(X) + b h(X) + c] = a E[g(X)] + b E[h(X)] + c$;
(b) if $g(x) \geq h(x)$ for all $x$, then $E(g(X)) \geq E(h(X))$;
(c) if $g(x) \geq 0$ for all $x$, then $E(g(X)) \geq 0$;
(d) if $a \geq g(x) \geq b$ for all $x$, then $a \geq E(g(X)) \geq b$.

Exercise 1.8. Show that $E(X - b)^2$ is minimized by $b = E(X)$.

The variance of a random variable, together with the mean, is among the most important parameters used in the theory of statistics. The following theorem is a consequence of the properties of the expectation.

Theorem 1.6. If $X$ is a random variable with a finite variance, then for any constants $a$ and $b$,
\[
\operatorname{var}(aX + b) = a^2 \operatorname{var} X.
\]

Exercise 1.9. Prove Theorem 1.6.

1.8 Moments and Moment Generating Functions

Definition 1.12. The $n$th moment ($n \in \mathbb{N}$) of a random variable $X$ is defined as
\[
\mu'_n = E X^n.
\]
The $n$th central moment of $X$ is defined as
\[
\mu_n = E(X - \mu)^n,
\]
where $\mu = \mu'_1 = EX$.

Note that the second central moment is the variance of the random variable $X$, usually denoted by $\sigma^2$.

Moments give an indication of the shape of the distribution of a random variable. Skewness and kurtosis are measured by the following functions of the third and fourth central moments, respectively:

the coefficient of skewness is given by
\[
\gamma_1 = \frac{E(X - \mu)^3}{\sigma^3} = \frac{\mu_3}{\mu_2^{3/2}};
\]
the coefficient of kurtosis is given by
\[
\gamma_2 = \frac{E(X - \mu)^4}{\sigma^4} - 3 = \frac{\mu_4}{\mu_2^2} - 3.
\]
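As an informal numerical check of Example 1.12 and of the moment formulas above, the short Python sketch below (assuming NumPy and SciPy are available; the helper name raw_moment is illustrative only) evaluates the first few moments of the density $f(x) = \tfrac{1}{2}\sin x$ on $[0, \pi]$ by numerical integration.

```python
import numpy as np
from scipy.integrate import quad

# Density from Example 1.12: f(x) = (1/2) sin(x) on [0, pi]
f = lambda x: 0.5 * np.sin(x)

# Raw moment E[X^n] by numerical integration over the support
def raw_moment(n):
    val, _ = quad(lambda x: x**n * f(x), 0, np.pi)
    return val

mean = raw_moment(1)                               # pi/2
var = raw_moment(2) - mean**2                      # pi^2/4 - 2
mu3 = quad(lambda x: (x - mean)**3 * f(x), 0, np.pi)[0]
skewness = mu3 / var**1.5                          # gamma_1

print(mean, np.pi / 2)                             # both approx 1.5708
print(var, np.pi**2 / 4 - 2)                       # both approx 0.4674
print(skewness)                                    # approx 0, since f is symmetric about pi/2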
Moments can be calculated from the definition or by using the so-called moment generating function.

Definition 1.13. The moment generating function (mgf) of a random variable $X$ is a function $M_X : \mathbb{R} \to [0, \infty)$ given by
\[
M_X(t) = E\, e^{tX},
\]
provided that the expectation exists for $t$ in some neighborhood of zero.

More explicitly, the mgf of $X$ can be written as
\[
M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx, \quad \text{if } X \text{ is continuous,}
\]
\[
M_X(t) = \sum_{x \in \mathcal{X}} e^{tx} P(X = x), \quad \text{if } X \text{ is discrete.}
\]

The method to generate moments is given in the following theorem.

Theorem 1.7. If $X$ has mgf $M_X(t)$, then
\[
E(X^n) = \left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0}.
\]
That is, the $n$th moment is equal to the $n$th derivative of the mgf evaluated at $t = 0$.

Proof. Assuming that we can differentiate under the integral sign, we may write
\[
\frac{d}{dt} M_X(t) = \frac{d}{dt}\int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx
= \int_{-\infty}^{\infty} \frac{d}{dt}\, e^{tx} f_X(x)\,dx
= \int_{-\infty}^{\infty} x e^{tx} f_X(x)\,dx
= E(X e^{tX}).
\]
Hence, evaluating the last expression at zero we obtain
\[
\left.\frac{d}{dt} M_X(t)\right|_{t=0} = \left.E(X e^{tX})\right|_{t=0} = E(X).
\]
For $n = 2$ we get
\[
\left.\frac{d^2}{dt^2} M_X(t)\right|_{t=0} = \left.E(X^2 e^{tX})\right|_{t=0} = E(X^2).
\]
Analogously, it can be shown that for any $n \in \mathbb{N}$ we can write
\[
\left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0} = \left.E(X^n e^{tX})\right|_{t=0} = E(X^n).
\]

Example 1.14. Find the mgf of $X \sim \operatorname{Exp}(\lambda)$ and use the results of Theorem 1.7 to obtain the mean and variance of $X$.

By definition the mgf can be written as
\[
M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx.
\]
For the exponential distribution we have $f_X(x) = \lambda e^{-\lambda x} I_{(0,\infty)}(x)$, where $\lambda \in \mathbb{R}_+$. Hence, integrating by the method of substitution, we get
\[
M_X(t) = \int_0^{\infty} e^{tx} \lambda e^{-\lambda x}\,dx = \lambda \int_0^{\infty} e^{(t-\lambda)x}\,dx = \frac{\lambda}{\lambda - t}, \quad \text{provided that } |t| < \lambda.
\]
Now, using Theorem 1.7 we obtain the first and the second moments, respectively:
\[
E(X) = M_X'(0) = \left.\frac{\lambda}{(\lambda - t)^2}\right|_{t=0} = \frac{1}{\lambda},
\qquad
E(X^2) = M_X^{(2)}(0) = \left.\frac{2\lambda}{(\lambda - t)^3}\right|_{t=0} = \frac{2}{\lambda^2}.
\]
Hence, the variance of $X$ is
\[
\operatorname{var}(X) = E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.
\]

Exercise 1.10. Calculate the mgf for the Binomial and Poisson distributions.

Moment generating functions provide methods for comparing distributions or finding their limiting forms. The following two theorems give us the tools.

Theorem 1.8. Let $F_X(x)$ and $F_Y(y)$ be two cdfs all of whose moments exist. Then, if the mgfs of $X$ and $Y$ exist and are equal, i.e., $M_X(t) = M_Y(t)$ for all $t$ in some neighborhood of zero, then $F_X(u) = F_Y(u)$ for all $u$.

Theorem 1.9. Suppose that $X_1, X_2, \ldots$ is a sequence of random variables, each with mgf $M_{X_i}(t)$. Furthermore, suppose that
\[
\lim_{i \to \infty} M_{X_i}(t) = M_X(t) \quad \text{for all } t \text{ in a neighborhood of zero,}
\]
and $M_X(t)$ is an mgf. Then there is a cdf $F_X$ whose moments are determined by $M_X(t)$ and, for all $x$ where $F_X(x)$ is continuous, we have
\[
\lim_{i \to \infty} F_{X_i}(x) = F_X(x).
\]
This theorem means that convergence of mgfs implies convergence of cdfs.

Example 1.15. We know that the Binomial distribution can be approximated by a Poisson distribution when $p$ is small and $n$ is large. Using the above theorem we can confirm this fact. The mgfs of $X_n \sim \operatorname{Bin}(n, p)$ and of $Y \sim \operatorname{Poisson}(\lambda)$ are, respectively:
\[
M_{X_n}(t) = \left[p e^t + (1 - p)\right]^n, \qquad M_Y(t) = e^{\lambda(e^t - 1)}.
\]
We will show that the mgf of $X_n$ tends to the mgf of $Y$ as $n \to \infty$ with $np \to \lambda$. We will need the following useful result given in the lemma:

Lemma 1.1. Let $a_1, a_2, \ldots$ be a sequence of numbers converging to $a$, that is, $\lim_{n \to \infty} a_n = a$. Then
\[
\lim_{n \to \infty} \left(1 + \frac{a_n}{n}\right)^n = e^a.
\]
Now, we can write
\[
M_{X_n}(t) = \left[p e^t + (1 - p)\right]^n
= \left[1 + \frac{1}{n}\, np(e^t - 1)\right]^n
= \left[1 + \frac{np(e^t - 1)}{n}\right]^n
\xrightarrow[n \to \infty]{} e^{\lambda(e^t - 1)} = M_Y(t).
\]
Hence, by Theorem 1.9 the Binomial distribution converges to a Poisson distribution.
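The mechanics of Theorem 1.7 and Example 1.14 can also be reproduced symbolically. The sketch below (assuming SymPy is available) takes the closed-form mgf $\lambda/(\lambda - t)$ derived above and differentiates it at $t = 0$ to recover the mean and variance of the exponential distribution.

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)

# mgf of Exp(lambda) from Example 1.14, valid for t < lambda
M = lam / (lam - t)

# Theorem 1.7: the nth moment is the nth derivative of the mgf at t = 0
EX  = sp.diff(M, t, 1).subs(t, 0)        # 1/lambda
EX2 = sp.diff(M, t, 2).subs(t, 0)        # 2/lambda**2
var = sp.simplify(EX2 - EX**2)           # 1/lambda**2

print(EX, EX2, var)
```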
1.9 Functions of Random Variables

If $X$ is a random variable with cdf $F_X(x)$, then any function of $X$, say $g(X) = Y$, is also a random variable. The question then is: what is the distribution of $Y$?

The function $y = g(x)$ is a mapping from the induced sample space of the random variable $X$, $\mathcal{X}$, to a new sample space, $\mathcal{Y}$, of the random variable $Y$, that is,
\[
g(x) : \mathcal{X} \to \mathcal{Y}.
\]
The inverse mapping $g^{-1}$ acts from $\mathcal{Y}$ to $\mathcal{X}$ and we can write
\[
g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}, \quad \text{where } A \subset \mathcal{Y}.
\]
Then we have
\[
P(Y \in A) = P(g(X) \in A) = P\left(\{x \in \mathcal{X} : g(x) \in A\}\right) = P\left(X \in g^{-1}(A)\right).
\]
The following theorem relates the cumulative distribution functions of $X$ and $Y = g(X)$.

Theorem 1.10. Let $X$ have cdf $F_X(x)$, let $Y = g(X)$, and let the domain and codomain of $g(X)$ be, respectively,
\[
\mathcal{X} = \{x : f_X(x) > 0\} \quad\text{and}\quad \mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}.
\]
(a) If $g$ is an increasing function on $\mathcal{X}$, then $F_Y(y) = F_X\left(g^{-1}(y)\right)$ for $y \in \mathcal{Y}$.
(b) If $g$ is a decreasing function on $\mathcal{X}$, then $F_Y(y) = 1 - F_X\left(g^{-1}(y)\right)$ for $y \in \mathcal{Y}$.

Proof. The cdf of $Y = g(X)$ can be written as
\[
F_Y(y) = P(Y \leq y) = P(g(X) \leq y) = P\left(\{x \in \mathcal{X} : g(x) \leq y\}\right) = \int_{\{x \in \mathcal{X} : g(x) \leq y\}} f_X(x)\,dx.
\]
(a) If $g$ is increasing, then
\[
\{x \in \mathcal{X} : g(x) \leq y\} = \{x \in \mathcal{X} : x \leq g^{-1}(y)\}.
\]
So we can write
\[
F_Y(y) = \int_{\{x \in \mathcal{X} : x \leq g^{-1}(y)\}} f_X(x)\,dx = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X\left(g^{-1}(y)\right).
\]
(b) Now, if $g$ is decreasing, then
\[
\{x \in \mathcal{X} : g(x) \leq y\} = \{x \in \mathcal{X} : x \geq g^{-1}(y)\}.
\]
So we can write
\[
F_Y(y) = \int_{\{x \in \mathcal{X} : x \geq g^{-1}(y)\}} f_X(x)\,dx = \int_{g^{-1}(y)}^{\infty} f_X(x)\,dx = 1 - F_X\left(g^{-1}(y)\right).
\]

Example 1.16. Find the distribution of $Y = g(X) = -\log X$, where $X \sim U(0, 1)$.

The cdf of $X$ is
\[
F_X(x) =
\begin{cases}
0, & \text{for } x < 0;\\
x, & \text{for } 0 \leq x \leq 1;\\
1, & \text{for } x > 1.
\end{cases}
\]
For $x \in (0, 1)$ the function $g(x) = -\log x$ maps onto $\mathcal{Y} = (0, \infty)$ and is decreasing. For $y > 0$, $y = -\log x$ implies that $x = e^{-y}$, i.e., $g^{-1}(y) = e^{-y}$, and
\[
F_Y(y) = 1 - F_X\left(g^{-1}(y)\right) = 1 - F_X\left(e^{-y}\right) = 1 - e^{-y}.
\]
Hence we may write
\[
F_Y(y) = \left(1 - e^{-y}\right) I_{(0,\infty)}(y).
\]
This is the exponential distribution function with $\lambda = 1$.

For continuous rvs we have the following result.

Theorem 1.11. Let $X$ have pdf $f_X(x)$ and let $Y = g(X)$, where $g$ is a monotone function. Suppose that $f_X(x)$ is continuous on its support $\mathcal{X} = \{x : f_X(x) > 0\}$ and that $g^{-1}(y)$ has a continuous derivative on the support $\mathcal{Y} = \{y : y = g(x) \text{ for some } x \in \mathcal{X}\}$. Then the pdf of $Y$ is given by
\[
f_Y(y) = f_X\left(g^{-1}(y)\right) \left|\frac{d}{dy} g^{-1}(y)\right| I_{\mathcal{Y}}(y).
\]

Proof.
\[
f_Y(y) = \frac{d}{dy} F_Y(y)
= \begin{cases}
\dfrac{d}{dy} F_X\left(g^{-1}(y)\right), & \text{if } g \text{ is increasing;}\\[1.5ex]
\dfrac{d}{dy}\left[1 - F_X\left(g^{-1}(y)\right)\right], & \text{if } g \text{ is decreasing,}
\end{cases}
= \begin{cases}
f_X\left(g^{-1}(y)\right) \dfrac{d}{dy} g^{-1}(y), & \text{if } g \text{ is increasing;}\\[1.5ex]
-f_X\left(g^{-1}(y)\right) \dfrac{d}{dy} g^{-1}(y), & \text{if } g \text{ is decreasing,}
\end{cases}
\]
which gives the thesis of the theorem.

Example 1.17. Suppose that $Z \sim N(0, 1)$. What is the distribution of $Y = Z^2$?

For $y > 0$, the cdf of $Y = Z^2$ is
\[
F_Y(y) = P(Y \leq y) = P(Z^2 \leq y) = P(-\sqrt{y} \leq Z \leq \sqrt{y}) = F_Z(\sqrt{y}) - F_Z(-\sqrt{y}).
\]
The pdf can now be obtained by differentiation:
\[
f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy}\left[F_Z(\sqrt{y}) - F_Z(-\sqrt{y})\right]
= \frac{1}{2\sqrt{y}} f_Z(\sqrt{y}) + \frac{1}{2\sqrt{y}} f_Z(-\sqrt{y}).
\]
Now, for the standard normal distribution we have
\[
f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty.
\]
This gives
\[
f_Y(y) = \frac{1}{2\sqrt{y}}\left(\frac{1}{\sqrt{2\pi}} e^{-(\sqrt{y})^2/2} + \frac{1}{\sqrt{2\pi}} e^{-(-\sqrt{y})^2/2}\right)
= \frac{1}{\sqrt{y}} \frac{1}{\sqrt{2\pi}} e^{-y/2}, \quad 0 < y < \infty.
\]
This is a chi-squared random variable with one degree of freedom, $\chi_1^2$.

Note that $g(Z) = Z^2$ is not a monotone function, but the range of $Z$, $(-\infty, \infty)$, can be partitioned so that $g$ is monotone on each of the subsets.

Exercise 1.11. Suppose that $Z \sim N(0, 1)$. Find the distribution of $Y = \mu + \sigma Z$ for constants $\mu \in \mathbb{R}$ and $\sigma \in \mathbb{R}_+$.
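Examples 1.16 and 1.17 can be checked informally by simulation: samples of the transformed variable should be close, in distribution, to the claimed law. The sketch below (assuming NumPy and SciPy are available) compares $-\log X$ for $X \sim U(0,1)$ with $\operatorname{Exp}(1)$, and $Z^2$ for $Z \sim N(0,1)$ with $\chi_1^2$, using the Kolmogorov-Smirnov distance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000

# Example 1.16: Y = -log X with X ~ U(0, 1) should follow Exp(1)
x = rng.uniform(size=n)
y = -np.log(x)
print(stats.kstest(y, 'expon'))          # small KS statistic, large p-value

# Example 1.17: Y = Z^2 with Z ~ N(0, 1) should follow chi-squared with 1 df
z = rng.standard_normal(n)
print(stats.kstest(z**2, 'chi2', args=(1,)))
```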
Exercise 1.12. Let $X$ be a random variable with moment generating function $M_X$.

(i) Show that the moment generating function of $Y = a + bX$, where $a$ and $b$ are constants, is given by
\[
M_Y(t) = e^{ta} M_X(tb).
\]
(ii) Derive the moment generating function of $Y \sim N(\mu, \sigma^2)$. Hint: first find $M_Z(t)$ for a standard normal rv $Z$.
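Before proving part (i), the identity can be sanity-checked numerically. The sketch below (assuming NumPy is available) uses $X \sim \operatorname{Exp}(1)$, whose mgf $M_X(t) = 1/(1-t)$ for $t < 1$ follows from Example 1.14, and compares a Monte Carlo estimate of $E\,e^{t(a + bX)}$ with $e^{ta} M_X(tb)$; the particular constants are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ Exp(1): M_X(t) = 1/(1 - t) for t < 1 (Example 1.14 with lambda = 1)
a, b, t = 2.0, 3.0, 0.1                      # chosen so that t*b < 1
x = rng.exponential(size=1_000_000)

lhs = np.mean(np.exp(t * (a + b * x)))       # Monte Carlo estimate of M_{a+bX}(t)
rhs = np.exp(t * a) / (1 - t * b)            # e^{ta} M_X(tb)
print(lhs, rhs)                              # the two should agree to a few decimals
```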