18:10 10/16/2000

TOPIC. Expectations.

This section deals with the notion of the expected value of a random variable. We start with some definitions and examples, then give some ways of thinking about expected values, and then present some properties of expectation along with examples.

Definitions. Let Ω be a sample space, P a probability measure on Ω, and X a real-valued random variable on Ω with distribution function F, so that

    P[ω ∈ Ω : X(ω) ≤ x] = F(x).

[Diagram omitted: a sample point ω, chosen from the sample space Ω according to P, is carried by X to the point X(ω) on the real line R.]

We are going to define E(X), the expected value of X(ω) when the sample point ω is chosen at random from Ω according to P. An alternative notation for E(X) is ∫_{ω∈Ω} X(ω) P(dω), or simply ∫ X dP.

Consider first the case where X is nonnegative: X(ω) ≥ 0 for all ω ∈ Ω. If X is discrete, taking finitely or countably many values x_1, x_2, ... with corresponding probabilities f(x_1), f(x_2), ... (here f denotes the probability mass function of X), one takes

    E(X) := Σ_k x_k f(x_k).                                              (1)

If X is continuous with density f, one takes

    E(X) := ∫_0^∞ x f(x) dx.                                             (2)

Formulas (1) and (2) are each special cases of the general definition

    E(X) := ∫_0^∞ x dF(x) := lim_{n→∞} ∫_0^n x dF(x),                    (3)

where the integral is taken to be a Riemann-Stieltjes integral. For this course you don't need to know much about Riemann-Stieltjes integration; you can just think of the RHS of (3) as a generic way of writing the RHSs of (1) and (2).
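As a quick numerical sanity check of (1) and (2) (a sketch using only the Python standard library, not part of the original notes): formula (1) for a fair die, and a midpoint Riemann sum approximating (2) for the standard exponential density e^{-x}, whose mean is 1.

```python
import math

# Formula (1): discrete case -- a fair die, f(x_k) = 1/6 for x_k = 1, ..., 6.
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6
E_die = sum(x * p for x, p in zip(values, pmf))   # exactly 3.5

# Formula (2): continuous case -- standard exponential density f(x) = e^{-x}.
# Midpoint Riemann sum over [0, 40]; the tail beyond 40 is negligible.
N, b = 200_000, 40.0
h = b / N
E_exp = sum((i + 0.5) * h * math.exp(-(i + 0.5) * h) * h for i in range(N))
# E_exp is within about 1e-6 of the exact value 1
```

The same midpoint-sum pattern works for any density with a rapidly decaying tail; only the truncation point b needs to change.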
We don't require the sum and integrals in (1)-(3) to converge to a finite value; E(X) = ∞ is allowed, and happens (see Example 1(b) below).

Example 1. (a) Suppose Z is a standard normal random variable, and consider

    Z⁺ := max(Z, 0) = { Z, if Z ≥ 0;  0, if Z < 0 }.

Z⁺ is a nonnegative random variable. Its distribution has a lump of mass of size 1/2 at 0 and density φ(z) = e^{-z²/2}/√(2π) over the interval (0, ∞). Hence

    E(Z⁺) = 0 × P[Z⁺ = 0] + ∫_0^∞ z φ(z) dz
          = 0 × 1/2 + (1/√(2π)) ∫_0^∞ z e^{-z²/2} dz
          = (1/√(2π)) ∫_0^∞ e^{-y} dy = 1/√(2π),                         (4)

substituting y = z²/2. In particular E(Z⁺) is finite.

(b) Suppose C is a standard Cauchy random variable, with density f(x) = 1/(π(1 + x²)) on R. Then C⁺ = max(C, 0) is a nonnegative random variable with expectation

    E(C⁺) = 0 × 1/2 + ∫_0^∞ x/(π(1 + x²)) dx
          = (1/(2π)) ∫_0^∞ 1/(1 + y) dy = (1/(2π)) log(1 + y) |_0^∞ = ∞,  (5)

substituting y = x². Note that E(C⁺) is infinite. •

Now consider the case where X can take both positive and negative values. Define random variables X⁺ and X⁻ on Ω by setting

    X⁺(ω) = max(X(ω), 0) = { X(ω), if X(ω) ≥ 0;  0, otherwise },         (6⁺)
    X⁻(ω) = max(−X(ω), 0) = { −X(ω), if X(ω) ≤ 0;  0, otherwise },       (6⁻)

for each ω ∈ Ω. [Diagram omitted: the graphs of X⁺ and X⁻ over Ω.] X⁺ is called the positive part of X, and X⁻ the negative part. Note that X⁺ and X⁻ are nonnegative random variables and that

    X(ω) = X⁺(ω) − X⁻(ω)  and  |X(ω)| = X⁺(ω) + X⁻(ω)

for all ω ∈ Ω; these identities are written more concisely as X = X⁺ − X⁻ and |X| = X⁺ + X⁻.
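The two pointwise identities for the positive and negative parts can be checked mechanically on a handful of sample values; a minimal sketch (not part of the notes):

```python
# The decomposition X = X+ - X- and |X| = X+ + X- holds pointwise in omega.
# Here xs plays the role of a few observed values X(omega).
xs = [-2.5, -0.1, 0.0, 0.7, 3.0]

x_pos = [max(x, 0.0) for x in xs]    # positive part X+
x_neg = [max(-x, 0.0) for x in xs]   # negative part X-

assert all(p - n == x for x, p, n in zip(xs, x_pos, x_neg))       # X = X+ - X-
assert all(p + n == abs(x) for x, p, n in zip(xs, x_pos, x_neg))  # |X| = X+ + X-
```

Note that at most one of X⁺(ω), X⁻(ω) is nonzero for each ω, which is why both identities can hold at once.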
One says that X has an expectation, or that E(X) exists, or that X is quasi-integrable, if at least one of E(X⁺) and E(X⁻) is finite; in that case the expected value, or mean, of X is taken to be

    E(X) := E(X⁺) − E(X⁻),                                               (7)

with the convention that ∞ − x = ∞ and x − ∞ = −∞ for any nonnegative real number x. One says that X is integrable, or that X has a finite expectation, if both E(X⁺) and E(X⁻) are finite, or, equivalently, if E(|X|) is finite. There are random variables X for which E(X⁺) = ∞ = E(X⁻); for such X's, E(X) is not defined.

Example 2. (a) Suppose Z is a standard normal random variable. By Example 1(a), we have E(Z⁺) = c := 1/√(2π) < ∞. Since Z⁻ and Z⁺ have the same distribution, we also have E(Z⁻) = c. Since both E(Z⁺) and E(Z⁻) are finite, Z is integrable; its (finite) expectation is

    E(Z) = E(Z⁺) − E(Z⁻) = c − c = 0.

(b) Suppose C is a standard Cauchy random variable. By Example 1(b) and symmetry, we have E(C⁺) = ∞ = E(C⁻). Thus C does not have an expectation, finite or otherwise.

(c) As in (b), suppose C is standard Cauchy. Put X = C⁺. Then

    X⁺ = X = C⁺  ⟹  E(X⁺) = ∞,  and  X⁻ = 0  ⟹  E(X⁻) = 0.

Consequently X has an expectation, namely E(X) = E(X⁺) − E(X⁻) = ∞ − 0 = ∞. In this case E(X) exists, but is infinite; this is an example of a random variable that is quasi-integrable, but not integrable.

(d) Suppose X is a continuous random variable with density f on R such that the integral ∫_{−∞}^∞ x f(x) dx is absolutely convergent. Since

    E(X⁺) + E(X⁻) = ∫_0^∞ x f(x) dx + ∫_{−∞}^0 (−x) f(x) dx = ∫_{−∞}^∞ |x| f(x) dx < ∞,

X is integrable with finite expectation

    E(X) = E(X⁺) − E(X⁻) = ∫_0^∞ x f(x) dx − ∫_{−∞}^0 (−x) f(x) dx = ∫_{−∞}^∞ x f(x) dx.   (8)

When it applies, (8) can be used to compute E(X) directly, without first computing E(X⁺) and E(X⁻). •

The strong law of large numbers (SLLN). Why is E(X) important? One of the main reasons is:

Theorem 1 (The SLLN). Suppose X_1, X_2, ... is an infinite sequence of independent random variables, each distributed like a random variable X. Put

    S_n = X_1 + X_2 + ⋯ + X_n

for each n ∈ N. If X has an expectation E(X) = µ (possibly ±∞), then

    P[S_n/n converges to µ as n → ∞] = 1.                                (9_1)

On the other hand, if X does not have an expectation, then

    P[lim sup_n |S_n/n| = ∞] = 1;                                        (9_2)

if in addition X is symmetric, then

    P[lim inf_n S_n/n = −∞] = 1 = P[lim sup_n S_n/n = ∞].                (9_3)

In other words, if the "population mean" µ exists, then the sample means X̄_n = S_n/n will converge to it almost surely as the sample size n tends to infinity; but if µ does not exist, the sample means X̄_n will behave very badly as n → ∞, as illustrated in Figure 1 below. The proof of the SLLN is not easy; we won't go into it here (but see Exercises 10 and 11 for some special cases).

[Figure 1: A graph of S_n/n versus n for a random sample of size 12000 from the standard Cauchy distribution. The running means, updated by X̄_n = ((n−1)/n) X̄_{n−1} + X_n/n, wander between about −4 and 0 without settling down.]

E(X) as a measure of location. The expected value of X is often used as a measure of the location of the distribution of X. To understand why, consider the case where X takes finitely many values x_1, x_2, ..., x_k with corresponding probabilities p_1, p_2, ..., p_k. We can represent the distribution of X by a physical system in which masses of weight p_i are placed above the points x_i, for i = 1, ..., k, on a dimensionless rod. Consider the center of gravity of this mass system, i.e., the point c at which the rod would balance if it were pivoted there. According to physics, c must satisfy the so-called balancing equation

    Σ_{i=1}^k p_i (x_i − c) = 0.

Since Σ_{i=1}^k p_i x_i = E(X) and Σ_{i=1}^k p_i = 1, the solution to the balancing equation is

    c = (Σ_{i=1}^k p_i x_i) / (Σ_{i=1}^k p_i) = E(X).                    (10)
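The dichotomy in Theorem 1, and the behavior shown in Figure 1, are easy to reproduce by simulation. A sketch using only the Python standard library (the seed and sample sizes are arbitrary choices, not from the notes); a standard Cauchy variable is generated as tan(π(U − 1/2)) for U uniform:

```python
import math, random

rng = random.Random(0)  # fixed seed so the run is reproducible

n = 100_000
# Normal case: E(X) = 0 exists, so by the SLLN S_n/n should be near 0.
normal_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n

# Cauchy case: E(X) does not exist.  In fact the sample mean of n standard
# Cauchy variables is itself standard Cauchy, so X-bar_n never settles down;
# repeated experiments give wildly scattered sample means.
cauchy_means = []
for _ in range(50):
    m = sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(1_000)) / 1_000
    cauchy_means.append(m)

spread = max(cauchy_means) - min(cauchy_means)
```

Here `normal_mean` lands within a few standard errors of 0, while the fifty Cauchy sample means scatter over a wide range no matter how large the per-experiment sample size is made.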
In general, for any integrable random variable X, the center of gravity of the distribution of X is c = E(X). There is an important corollary: moving a little bit of probability mass a long way from its initial position has a big effect on the expected value of X.

The expected value of a transformation of X. Suppose Y = t(X) is a transformation t of X. The expected value of Y can be expressed directly in terms of the distribution of X. To see how, consider the case where X is continuous with density f on (−∞, ∞) and the transformation t is regular from (−∞, ∞) to (0, ∞). Since Y has density

    f_Y(y) = f_X(u(y)) |u′(y)|,

where u = t⁻¹ is the inverse of t (see 3.19), Y has expectation

    E(Y) = ∫_{y>0} y f_Y(y) dy = ∫_{y>0} t(u(y)) f_X(u(y)) |u′(y)| dy
         = ∫_{−∞}^∞ t(x) f_X(x) dx                                       (11)

by the substitution x = u(y) (see 3.18). The point is that you can find E(Y) from (11) without having to first work out the distribution of Y. This important fact is true in general.

Theorem 2. Let X be an arbitrary (not necessarily continuous) random variable and let Y = t(X) for an arbitrary (not necessarily regular) transformation t. Put

    τ₊ = ∫ t⁺(x) dF_X(x)  and  τ₋ = ∫ t⁻(x) dF_X(x).

Then Y has an expectation if and only if at least one of τ₊ and τ₋ is finite, in which case

    E(Y) = τ₊ − τ₋ = ∫ t(x) dF_X(x).                                     (12)

With an appropriate definition of the integral, this formula is valid even if X is a (multi-dimensional) random vector. These results are proved in Stat 381.

Expressing E(X) in terms of Q and F. Let X be a random variable with quantile function Q and distribution function F. E(X) can be expressed directly in terms of Q, and also directly in terms of F. To see how, let U be a standard uniform random variable, with density f_U(u) = I_{(0,1)}(u). Since X and Q(U) have the same distribution by the IPT Theorem (Theorem 1.5), so do X⁺ and Q⁺(U), whence

    E(X⁺) = E(Q⁺(U)) = ∫_0^1 Q⁺(u) du.

The integral here is the area |A| of the region

    A = {(x, u) : 0 < x < Q(u)}.

[Diagram omitted: the regions A = {(x, u) : 0 < x < Q(u)} and B = {(x, u) : Q(u) < x < 0} between the graph of F and the strip 0 < u < 1; the infinitesimal horizontal strip from u to u + du has area Q⁺(u) du.]

By slicing A into infinitesimal vertical strips instead of horizontal ones, we can also compute its area as

    |A| = ∫_0^∞ (1 − F(x)) dx = ∫_0^∞ P[X > x] dx = ∫_0^∞ P[X ≥ x] dx.

Similarly

    E(X⁻) = E(Q⁻(U)) = ∫_0^1 Q⁻(u) du = |B| = ∫_{−∞}^0 F(x) dx,

where B = {(x, u) : Q(u) < x < 0}. This proves:

Theorem 3. Let X be a random variable with df F and quantile function Q, and let A and B be defined as above. X is quasi-integrable if and only if at least one of |A| and |B| is finite, and then

    E(X) = |A| − |B| = ∫_0^1 Q(u) du = ∫_0^∞ [−F(−x) + (1 − F(x))] dx.   (13)

Example 3. (a) For any random variable X,

    E(|X|) = ∫_0^∞ P[|X| ≥ x] dx.                                        (14)

Consequently X is integrable if and only if this integral is convergent. For a standard Cauchy random variable C, P[|C| ≥ x] ∼ 2/(πx) as x → ∞, so the integral diverges; this is another way to see that C is not integrable. Note that if X is integer-valued, then

    E(|X|) = Σ_{n=1}^∞ P[|X| ≥ n].                                       (15)
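Formula (15) can be checked numerically for a concrete integer-valued variable. A sketch (the geometric example is my choice, not from the notes): for X with pmf q^{k−1} p on k = 1, 2, ... one has P[X ≥ n] = q^{n−1}, and both sides of (15) equal 1/p.

```python
# (15): for integer-valued X >= 0, E(X) = sum_{n>=1} P[X >= n].
# Geometric variable with pmf q^{k-1} p on k = 1, 2, ...; mean 1/p.
p = 0.3
q = 1.0 - p
K = 2000  # truncation point; q**K underflows to 0, so the tails are exact

direct = sum(k * q ** (k - 1) * p for k in range(1, K))   # sum k q^{k-1} p
tails = sum(q ** (n - 1) for n in range(1, K))            # sum P[X >= n]
```

Both sums agree with 1/p = 10/3 to machine precision, since the truncated tail is astronomically small.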
(b) Let X be a standard exponential random variable, with density f(x) = e^{−x} I_{(0,∞)}(x) on R. Note that X takes on only nonnegative values. We have

    F(x) = ∫_0^x e^{−ξ} dξ = 1 − e^{−x}  for x ≥ 0,

and

    Q(u) = F⁻¹(u) = −log(1 − u)  for 0 < u < 1.

By calculus,

    ∫_0^∞ x f(x) dx = ∫_0^∞ x e^{−x} dx = Γ(2) = 1,
    ∫_0^∞ (1 − F(x)) dx = ∫_0^∞ e^{−x} dx = Γ(1) = 1,
    ∫_0^1 Q(u) du = ∫_0^1 −log(1 − u) du = ∫_0^1 −log(v) dv = 1.

Of course, all three integrals had to be the same, since they each give the value of E(X). •

Properties of E. We state without proof some basic properties of the expectation operator E. These properties are proved (perhaps under some further integrability assumptions) in elementary texts in the discrete and continuous case; they are proved in general in Stat 381.

Theorem 4. Expectation has the following properties.

E₊: If two random variables X and Y each have finite expectations, then so does X + Y, and

    E(X + Y) = E(X) + E(Y).                                              (16)

More generally, if E(X) and E(Y) exist (possibly as ±∞) and if the sum E(X) + E(Y) is defined (i.e., is not of the form +∞ − ∞ or −∞ + ∞), then E(X + Y) exists and is given by (16).

E_c: If X has an expectation and c is a finite real number, then cX has an expectation, given by

    E(cX) = cE(X).                                                       (17)

E≤: Suppose X and Y are two random variables such that X ≤ Y (i.e., X(ω) ≤ Y(ω) for all sample points ω). Then

    E(X) ≤ E(Y),                                                         (18)

provided both expectations exist. If in addition the expectations are equal and finite, then P[X = Y] = 1.

E_I: Suppose X_1, X_2, ..., X_n are independent random variables. Then the product X_1 X_2 ⋯ X_n has an expectation provided (a) all the X_k's are nonnegative, or (b) all the X_k's are integrable. In both of these cases,

    E(X_1 X_2 ⋯ X_n) = E(X_1) E(X_2) ⋯ E(X_n).                           (19)

In case (a) the product on the right-hand side is to be evaluated using the rule: ∞ × c = c × ∞ equals ∞ if 0 < c ≤ ∞, and equals 0 if c = 0.

Example 4. (a) Let X ∼ Gamma(r, λ), with density λ^r x^{r−1} e^{−λx}/Γ(r) for x > 0. Then Y = λX ∼ Gamma(r, 1), so E(X) = E(Y)/λ.
Moreover

    E(Y) = ∫_0^∞ y (1/Γ(r)) y^{r−1} e^{−y} dy
         = (Γ(r+1)/Γ(r)) ∫_0^∞ (1/Γ(r+1)) y^{(r+1)−1} e^{−y} dy = Γ(r+1)/Γ(r) = r

(see Exercise 5 for the last step). Hence

    E(X) = r/λ.                                                          (20)

(b) Suppose again that X ∼ Gamma(r, λ) and Y = λX. Then

    E(1/Y) = ∫_0^∞ (1/y) f_Y(y) dy = (1/Γ(r)) ∫_0^∞ y^{(r−1)−1} e^{−y} dy
           = { Γ(r−1)/Γ(r) = 1/(r−1), if r > 1;  ∞, if r ≤ 1 },

and

    E(1/X) = E(λ/Y) = { λ/(r−1), if r > 1;  ∞, otherwise }.              (21)

(c) Similar calculations (do them!) show that for X ∼ Beta(α, β), with density x^{α−1} (1−x)^{β−1}/B(α, β) for 0 < x < 1, one has

    E(X) = α/(α + β).                                                    (22)

(d) Suppose X ∼ χ²_n = Gamma(r, λ) for r = n/2 and λ = 1/2. Then

    E(X) = r/λ = (n/2)/(1/2) = n                                         (23)

and

    E(1/X) = { λ/(r−1) = 1/(n−2), if n > 2;  ∞, otherwise }.             (24)

(e) Suppose X ∼ UF(m, n). Thus X = SS_1/SS_2 where SS_1 ∼ χ²_m and SS_2 ∼ χ²_n, and SS_1 is independent of SS_2. Since each SS_i is nonnegative, (19) gives

    E(X) = E(SS_1) E(1/SS_2) = { m/(n−2), if n > 2;  ∞, otherwise }.     (25)

(f) Suppose X ∼ F(m, n). Then X = (SS_1/m)/(SS_2/n) = (n/m)Y where Y ∼ UF(m, n). Hence

    E(X) = (n/m) E(Y) = { n/(n−2), if n > 2;  ∞, otherwise }.            (26)

Example 5. Consider the following game. I am going to pick a number x at random from the F distribution with m = 3 and n = 4 degrees of freedom. Before I make my draw, you have to guess what my x will be; call your guess c. Then I'll make the draw, and you'll pay me

    (x − c)² − w

cents (or dollars!), where w is my wager, say 10 units. For example, if you guess my x exactly, I'll pay you 10 units. But if your guess is off by 2, I'll only pay you 10 − 4 = 6 units, whereas if your guess is off by 4, you'll pay me 16 − 10 = 6 units. Any takers?

[Classroom demonstration here.]

Some questions: (a) What is the best choice for your guess c? (b) Is it fair for me to wager w = 10 units? These questions will be answered in the next lecture. •
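The closed forms (20) and (22) can be sanity-checked by Monte Carlo, since the standard library's `random` module can draw from both families (the seed, sample size, and parameter values below are arbitrary choices):

```python
import random

rng = random.Random(1)
n = 200_000

# (20): X ~ Gamma(r, lambda) has E(X) = r/lambda.
# random.gammavariate takes a shape and a SCALE, so scale = 1/lambda.
r, lam = 3.0, 2.0
gamma_mean = sum(rng.gammavariate(r, 1.0 / lam) for _ in range(n)) / n
# gamma_mean is close to r/lam = 1.5

# (22): X ~ Beta(alpha, beta) has E(X) = alpha/(alpha + beta).
a, b = 2.0, 5.0
beta_mean = sum(rng.betavariate(a, b) for _ in range(n)) / n
# beta_mean is close to 2/7
```

With 200,000 draws the standard errors are on the order of a few thousandths, so both sample means land well inside a 0.02 window around the theoretical values.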
We close this section with a couple of simple but useful inequalities.

Theorem 5 (Markov's inequality). Let X be a nonnegative random variable. One has

    P[X ≥ c] ≤ E(X)/c                                                    (27)

for each number c > 0. Moreover, for any given c, equality holds in (27) if and only if P[X = 0 or X = c] = 1.

Proof. Let Ω be the sample space on which X is defined. Let V be the random variable on Ω defined by

    V(ω) = { c, if X(ω) ≥ c;  0, otherwise }.

[Diagram omitted: the graph of V lies on or below the graph of X.] Since

    V(ω) ≤ X(ω)                                                          (28)

for all ω, (18) implies that

    E(V) ≤ E(X);                                                         (29)

(27) follows since E(V) = 0 × P[V = 0] + c × P[V = c] = c × P[X ≥ c]. If equality holds in (27), then it also holds in (29). By the addendum to (18), equality must then hold in (28) for almost all sample points ω, and hence X can take only the values 0 and c, with probability one. Conversely, if X takes just those values, equality does hold in (27).

Theorem 6 (Chebychev's inequality). Let X be an integrable random variable with mean µ. One has

    P[|X − µ| ≥ c] ≤ E((X − µ)²)/c²                                      (30)

for each number c > 0. Moreover, for any given c, equality holds in (30) if and only if X takes the values µ − c, µ, and µ + c with probabilities (1 − p)/2, p, and (1 − p)/2 respectively, for some p ∈ [0, 1].

Chebychev's inequality follows easily from Markov's inequality; the proof is left to you as Exercise 7.

Exercise 1. Let Z be a standard normal random variable. Show that for positive integers k,

    E(Z^k) = { Π_{i=1}^j (2i − 1), if k = 2j is even;  0, if k is odd }.  (31)  ¦

Exercise 2. Let Y and Z be independent standard normal random variables. For positive integers n, put

    X_n := Y(1 + Z/√n).                                                  (32)

For k = 1, 2, ... find a simple computable expression for E(X_n^k) and show that E(X_n^k) → E(Y^k) as n → ∞. Evaluate E(X_n^k) for k = 1, ..., 4.  ¦
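Returning to Theorem 5 for a moment: Markov's bound (27) and its equality condition can both be checked exactly, with no simulation. For a standard exponential X the bound reads e^{−c} ≤ 1/c, and a two-point variable on {0, c} attains it. A sketch (my choice of examples, not from the notes):

```python
import math

# Markov's bound P[X >= c] <= E(X)/c for a standard exponential X:
# P[X >= c] = e^{-c} and E(X) = 1, so the bound reads e^{-c} <= 1/c.
# (True for all c > 0, since c*e^{-c} peaks at 1/e < 1.)
for c in [0.5, 1.0, 2.0, 5.0, 10.0]:
    assert math.exp(-c) <= 1.0 / c

# Equality case: X takes only the values 0 and c, with P[X = c] = p.
# Then E(X) = c*p, so E(X)/c = p = P[X >= c]: the bound is attained.
c, p = 3.0, 0.25
E_X = 0.0 * (1 - p) + c * p
assert E_X / c == p
```

The equality case is exactly the "all mass at 0 or c" condition stated in Theorem 5.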
Exercise 3. Let X be an integrable real random variable with distribution function F, quantile function Q, and mean µ = E(X). Let

    δ := E(|X − µ|) = ∫_{−∞}^∞ |x − µ| F(dx)                             (33)

be the so-called mean (absolute) deviation (MAD) of X about its mean. Show that

    δ = ∫_0^1 |Q(u) − µ| du = 2 ∫_{−∞}^µ F(x) dx = 2 ∫_µ^∞ (1 − F(x)) dx.   (34)  ¦

Exercise 4. Show that a random variable X is integrable if and only if Σ_{n=1}^∞ P[|X| ≥ n] < ∞.  ¦

Exercise 5. Show that the Gamma function Γ(r) := ∫_0^∞ x^{r−1} e^{−x} dx satisfies the recursion formula

    Γ(r + 1) = rΓ(r)                                                     (35)

for r > 0. [Hint: integrate by parts.]  ¦

Exercise 6. Find E(X) for random variables X having the following discrete distributions.

    Distribution      P[X = k]
    Binomial(n, p)    (n choose k) p^k (1 − p)^{n−k},  k = 0, ..., n
    Poisson(µ)        e^{−µ} µ^k / k!,  k = 0, 1, ...
    Geometric(p)      q^{k−1} p,  k = 1, 2, ...  (q = 1 − p)  ¦

Exercise 7. Prove Theorem 6.  ¦

Exercise 8 (A weak law of large numbers). Let X_1, X_2, ... be independent random variables, each distributed like a random variable X with E(X) = 0 and σ² := E(X²) < ∞. For each n ∈ N set S_n = X_1 + ⋯ + X_n. (a) Show that E(S_n) = 0 and E(S_n²) = nσ². (b) Show that for each ε > 0,

    lim_{n→∞} P[|S_n/n| ≥ ε] = 0.                                        (36)  ¦

Exercise 9. Let X_1, ..., X_n be independent random variables, each distributed like a random variable X with E(X) = 0 and E(X⁴) < ∞. (a) Show that X² and X³ are integrable. (b) Put S_n = X_1 + ⋯ + X_n. Show that

    E(S_n⁴) = nE(X⁴) + 3n(n − 1)(E(X²))².                                (37)  ¦

The following information is needed for the next two exercises. Let P be a probability measure on a sample space Ω. Let A_1, A_2, A_3, ... be an infinite sequence of events (i.e., subsets of Ω), and let lim sup_n A_n be the set of sample points ω ∈ Ω which belong to A_n for infinitely many n's. According to the first Borel-Cantelli lemma,

    P[lim sup_n A_n] = 0  provided  Σ_{n=1}^∞ P[A_n] < ∞.                (38)

According to the second Borel-Cantelli lemma,

    P[lim sup_n A_n] = 1  provided  Σ_{n=1}^∞ P[A_n] = ∞ and the A_n's are independent.   (39)
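The equality case of Chebychev's inequality (Theorem 6, proved in Exercise 7) can be verified exactly for the three-point distribution it describes; a sketch with one arbitrary choice of µ, c, p (values not from the notes):

```python
# Chebychev's inequality (30) is attained by the three-point distribution in
# Theorem 6: X takes mu - c, mu, mu + c with probabilities (1-p)/2, p, (1-p)/2.
mu, c, p = 1.0, 2.0, 0.5

values = [mu - c, mu, mu + c]
probs = [(1 - p) / 2, p, (1 - p) / 2]

mean = sum(v * q for v, q in zip(values, probs))
var = sum((v - mu) ** 2 * q for v, q in zip(values, probs))   # E (X - mu)^2
tail = sum(q for v, q in zip(values, probs) if abs(v - mu) >= c)

# P[|X - mu| >= c] equals E((X - mu)^2)/c^2 exactly here.
assert mean == mu
assert tail == var / c ** 2
```

Both sides of (30) equal 1 − p = 0.5 here, so the inequality is tight for every p ∈ [0, 1].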
Exercise 10. Let X_1, X_2, ... be an infinite sequence of independent standard Cauchy random variables and let c be a positive number. Use the second Borel-Cantelli lemma to show that for almost every sample point ω, X_n(ω) ≥ nc for infinitely many n's, and also X_n(ω) ≤ −nc for infinitely many n's. Use this fact to explain the behavior of S_n/n exhibited in Figure 1.  ¦

Exercise 11 (A SLLN). Let X_1, X_2, ... be independent random variables, each distributed like a random variable X with E(X) = 0 and E(X⁴) < ∞. Put S_n = X_1 + ⋯ + X_n for each n. Use Markov's inequality for S_n⁴, Exercise 9, and the first Borel-Cantelli lemma to show that

    P[|S_n|/n ≥ 1/n^{1/8} for infinitely many n] = 0                     (40)

and conclude that the set of sample points ω such that S_n(ω)/n → E(X) as n → ∞ has probability 1.  ¦

[Hint for Exercise 9: Write S_n⁴ as (Σ_{i=1}^n X_i)(Σ_{j=1}^n X_j)(Σ_{k=1}^n X_k)(Σ_{ℓ=1}^n X_ℓ) and expand the sums.]  ¦
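The fourth-moment identity (37) from Exercise 9 can be checked by Monte Carlo for standard normal summands, where E(X⁴) = 3 and E(X²) = 1, so (37) predicts E(S_n⁴) = 3n + 3n(n−1) = 3n². A sketch (seed, n, and replication count are arbitrary choices):

```python
import random

# Monte Carlo check of (37) for standard normal X: E(X^4) = 3, E(X^2) = 1,
# so E(S_n^4) = 3n + 3n(n-1) = 3n^2.  With n = 5 the exact value is 75.
rng = random.Random(2)
n, reps = 5, 200_000

total = 0.0
for _ in range(reps):
    s = sum(rng.gauss(0.0, 1.0) for _ in range(n))
    total += s ** 4
mc = total / reps

exact = 3 * n ** 2   # = 75
```

This agrees with the direct route: S_5 ∼ N(0, 5), and a N(0, σ²) variable has fourth moment 3σ⁴ = 75. The Monte Carlo standard error here is roughly 0.5, so the estimate lands well within a ±10 window of 75.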