7.4 Expected Value and Variance

Recall: A random variable is a function from the sample space of an experiment to the set of real numbers. That is, a random variable assigns a real number to each possible outcome.

Expected Values

People who buy lottery tickets regularly often justify the practice by saying that, even though they know that on average they will lose money, they are hoping for one significant gain, after which they believe they will quit playing. Unfortunately, when people who have lost money on a string of losing lottery tickets win some or all of it back, they generally decide to keep trying their luck instead of quitting. The technical way to say that on average a person will lose money on the lottery is to say that the expected value of playing the lottery is negative.

Definition 1. The expected value, also called the expectation or mean, of the random variable X on the sample space S is

E(X) = Σ_{s∈S} p(s)X(s).

The deviation of X at s ∈ S is X(s) − E(X), the difference between the value of X and the mean of X.

Remark 1. When the sample space S has n elements, S = {x_1, ..., x_n}, then E(X) = Σ_{i=1}^{n} p(x_i)X(x_i).

Remark 2. When there are infinitely many elements of the sample space, the expectation is defined only when the infinite series in the definition is absolutely convergent. In particular, the expectation of a random variable on an infinite sample space is finite if it exists.

Example 1. What is the expected number of times H comes up when a fair coin is flipped twice?

Solution. The sample space is S = {HH, HT, TH, TT}, and the random variable X counts the heads in each outcome: X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0. Because the coin is fair and the flips are independent, the probability of each outcome is 1/4. Consequently,

E(X) = (1/4)[X(HH) + X(HT) + X(TH) + X(TT)] = (1/4)(2 + 1 + 1 + 0) = 1.

Consequently, the expected number of heads that come up when a fair coin is flipped twice is 1.
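As a sanity check on Example 1, the expectation can be computed by brute-force enumeration of the sample space. The short Python sketch below (the names are my own) applies Definition 1 directly:

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin flips: HH, HT, TH, TT (each with probability 1/4).
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]
p = Fraction(1, 4)

# X(s) = number of heads in outcome s.
def X(s):
    return s.count("H")

# Definition 1: E(X) = sum over s in S of p(s) * X(s).
expectation = sum(p * X(s) for s in sample_space)
print(expectation)  # → 1
```

Using exact `Fraction` arithmetic avoids any floating-point rounding in the probabilities.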
If X is a random variable on a sample space S, let p(X = r) be the probability that X = r, that is,

p(X = r) = Σ_{s∈S, X(s)=r} p(s).

Example 2. Suppose that 500,000 people pay $5 each to play a lottery game with the following prizes: a grand prize of $1,000,000, 10 second prizes of $1,000 each, 1,000 third prizes of $500 each, and 10,000 fourth prizes of $10 each. What is the expected value of a ticket?

Solution. Each of the 500,000 lottery tickets has the same chance as any other of containing a winning lottery number, so p(x_k) = 1/500000 for all k = 1, 2, ..., 500000. Let x_1, x_2, ..., x_500000 be the net gains for the individual tickets, where x_1 = 999995 (the net gain for the grand prize ticket, which is one million dollars minus the $5 cost of the winning ticket), x_2 = x_3 = ··· = x_11 = 995 (the net gain for each of the 10 second prize tickets), x_12 = x_13 = ··· = x_1011 = 495 (the net gain for each of the 1,000 third prize tickets), and x_1012 = x_1013 = ··· = x_11011 = 5 (the net gain for each of the 10,000 fourth prize tickets). Since the remaining 488,989 tickets just lose $5, x_11012 = x_11013 = ··· = x_500000 = −5. The expected value of a ticket is therefore

Σ_{i=1}^{500000} x_i p(x_i) = (1/500000) Σ_{i=1}^{500000} x_i
  = (1/500000)(999995 + 10·995 + 1000·495 + 10000·5 + 488989·(−5))
  = (1/500000)(999995 + 9950 + 495000 + 50000 − 2444945)
  = −1.78.

In other words, a person who continues to play this lottery for a very long time will probably win some money occasionally but on average will lose $1.78 per ticket.

Lemma. If X is a random variable, then

E(X) = Σ_{r∈X(S)} p(X = r)·r.

Example 3. What is the expected value of the sum of the numbers that appear when a pair of fair dice is rolled?

Solution. Let X be the random variable equal to the sum of the numbers that appear when a pair of dice is rolled. The range of X is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
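The probabilities p(X = r) can be read off by enumerating the 36 equally likely outcomes; a short Python sketch of this enumeration (mine, not part of the text):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls (a, b) give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {r: Fraction(counts[r], 36) for r in sorted(counts)}

for r, pr in dist.items():
    print(f"p(X = {r}) = {pr}")

# Apply the lemma: E(X) = sum over r in X(S) of p(X = r) * r.
expectation = sum(pr * r for r, pr in dist.items())
print(expectation)  # → 7
```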
We have

p(X = 2) = p(X = 12) = 1/36,
p(X = 3) = p(X = 11) = 2/36,
p(X = 4) = p(X = 10) = 3/36,
p(X = 5) = p(X = 9) = 4/36,
p(X = 6) = p(X = 8) = 5/36,
p(X = 7) = 6/36.

Thus

E(X) = 2·(1/36) + 3·(2/36) + 4·(3/36) + 5·(4/36) + 6·(5/36) + 7·(6/36) + 8·(5/36) + 9·(4/36) + 10·(3/36) + 11·(2/36) + 12·(1/36) = 252/36 = 7.

Linearity of Expectations

Recall that if f_1 and f_2 are functions from A to R, we may add and multiply them:

(f_1 + f_2)(x) = f_1(x) + f_2(x)
(f_1 f_2)(x) = f_1(x)·f_2(x)

So if X_1 and X_2 are random variables with sample space S, we may add or multiply them to obtain new random variables.

Theorem 1 (Expectation is Linear). If X_i, i = 1, 2, ..., n, are random variables on S, and if a, b ∈ R, then

1. E(X_1 + X_2 + ··· + X_n) = E(X_1) + E(X_2) + ··· + E(X_n)
2. E(aX + b) = aE(X) + b

Example 4. Suppose that n Bernoulli trials are performed, where p is the probability of success on each trial. What is the expected number of successes?

Solution. Let X_i be the random variable with X_i((t_1, t_2, ..., t_n)) = 1 if t_i is a success and X_i((t_1, t_2, ..., t_n)) = 0 if t_i is a failure. The expected value of X_i is E(X_i) = 1·p + 0·(1 − p) = p for i = 1, 2, ..., n. Let X = X_1 + X_2 + ··· + X_n, so that X counts the number of successes when these n Bernoulli trials are performed. By linearity of expectation,

E(X) = E(X_1) + E(X_2) + ··· + E(X_n) = np.

The Geometric Distribution

We now turn our attention to a random variable with infinitely many possible outcomes.

Definition 2. A random variable X has a geometric distribution with parameter p if p(X = k) = (1 − p)^{k−1} p for k = 1, 2, 3, ..., where p ∈ R with 0 < p ≤ 1.

Lemma. Σ_{j=1}^{∞} j x^{j−1} = 1/(1 − x)² for |x| < 1.

Example 5. Suppose that the probability that a coin comes up tails is p. This coin is flipped repeatedly until it comes up tails. What is the expected number of flips until this coin comes up tails?

Solution.
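Before working through the solution, a quick simulation (a sketch of mine, not part of the text) suggests what the answer should be: with tails probability p = 0.25, the average number of flips comes out near 4, i.e. near 1/p.

```python
import random

# Flip a coin with tails probability p until the first tail,
# returning the total number of flips (a geometric random variable).
def flips_until_tails(p, rng):
    flips = 1
    while rng.random() >= p:  # a head: keep flipping
        flips += 1
    return flips

rng = random.Random(0)  # fixed seed for reproducibility
p = 0.25
trials = 100_000
average = sum(flips_until_tails(p, rng) for _ in range(trials)) / trials
print(average)  # close to 1/p = 4
```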
We first note that the sample space consists of all sequences that begin with any number of heads, denoted by H, followed by a tail, denoted by T. Therefore, the sample space is the set

S = {T, HT, HHT, HHHT, HHHHT, ...}.

Note that this is an infinite sample space. We can determine the probability of an element of the sample space by noting that the coin flips are independent and that the probability of a head is 1 − p. Therefore, p(T) = p, p(HT) = (1 − p)p, p(HHT) = (1 − p)²p, and in general the probability that the coin is flipped n times before a tail comes up, that is, that n − 1 heads come up followed by a tail, is (1 − p)^{n−1} p. Now let X be the random variable equal to the number of flips in an element of the sample space. That is, X(T) = 1, X(HT) = 2, X(HHT) = 3, and so on. Note that p(X = j) = (1 − p)^{j−1} p. The expected number of flips until the coin comes up tails equals E(X). Thus, using the above lemma with x = 1 − p:

E(X) = Σ_{j=1}^{∞} j·p(X = j) = Σ_{j=1}^{∞} j(1 − p)^{j−1} p = p Σ_{j=1}^{∞} j(1 − p)^{j−1} = p · (1/p²) = 1/p.

Theorem 2. If the random variable X has the geometric distribution with parameter p, then E(X) = 1/p.

Independent Random Variables

Definition 3. The random variables X and Y on a sample space S are independent if

p(X = r_1 and Y = r_2) = p(X = r_1)·p(Y = r_2)

for all real numbers r_1 and r_2; in words, the probability that X = r_1 and Y = r_2 equals the product of the probabilities that X = r_1 and that Y = r_2.

Theorem 3. If X and Y are independent random variables on a sample space S, then E(XY) = E(X)E(Y).

Proof. To prove this formula, we use the key observation that the event XY = r is the disjoint union of the events "X = r_1 and Y = r_2" over all r_1 ∈ X(S) and r_2 ∈ Y(S) with r = r_1 r_2.
We have

E(XY) = Σ_{r∈XY(S)} r·p(XY = r)
  = Σ_{r_1∈X(S), r_2∈Y(S)} r_1 r_2 · p(X = r_1 and Y = r_2)
  = Σ_{r_1∈X(S)} Σ_{r_2∈Y(S)} r_1 r_2 · p(X = r_1 and Y = r_2)
  = Σ_{r_1∈X(S)} Σ_{r_2∈Y(S)} r_1 r_2 · p(X = r_1)·p(Y = r_2)
  = Σ_{r_1∈X(S)} ( r_1·p(X = r_1) · Σ_{r_2∈Y(S)} r_2·p(Y = r_2) )
  = Σ_{r_1∈X(S)} r_1·p(X = r_1)·E(Y)
  = E(Y) · Σ_{r_1∈X(S)} r_1·p(X = r_1)
  = E(Y)E(X).

We complete the proof by noting that E(Y)E(X) = E(X)E(Y), which is a consequence of the commutative law for multiplication.

Variance

The expected value of a random variable tells us its average value. What if we want to know how far from the average the values of a random variable are distributed? For example, let X and Y be the random variables on the set S = {1, 2, 3, 4, 5, 6} with X(s) = 0 for all s ∈ S, and Y(s) = −1 if s ∈ {1, 2, 3} and Y(s) = 1 if s ∈ {4, 5, 6}. Then the expected values of X and Y are both zero. However, the random variable X never varies from 0, while the random variable Y always differs from 0 by 1. The variance of a random variable helps us characterize how widely a random variable is distributed. In particular, it provides a measure of how widely X is distributed about its expected value.

Definition 4. Let X be a random variable on a sample space S. The variance of X, denoted by V(X), is

V(X) = Σ_{s∈S} (X(s) − E(X))² p(s).

That is, V(X) is the weighted average of the square of the deviation of X. The standard deviation of X, denoted σ(X), is defined to be √V(X).

Theorem 4. If X is a random variable on a sample space S, then V(X) = E(X²) − E(X)².

Proof. Note that

V(X) = Σ_{s∈S} (X(s) − E(X))² p(s)
  = Σ_{s∈S} X(s)² p(s) − 2E(X) Σ_{s∈S} X(s)p(s) + E(X)² Σ_{s∈S} p(s)
  = E(X²) − 2E(X)·E(X) + E(X)²
  = E(X²) − E(X)².

We have used the fact that Σ_{s∈S} p(s) = 1 in the next-to-last step.

Using this theorem, we can prove the following result.

Theorem 5. If X and Y are two independent random variables on a sample space S, and a ∈ R, then V(X + Y) = V(X) + V(Y) and V(aX) = a²V(X).
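Theorem 4 is easy to check numerically on the two random variables used in the variance discussion (X identically 0, and Y = ±1 on S = {1, ..., 6}). The Python sketch below (my own) computes the variance both from Definition 4 and from E(X²) − E(X)²:

```python
from fractions import Fraction

S = range(1, 7)
p = Fraction(1, 6)  # uniform probability on S = {1, ..., 6}
X = {s: 0 for s in S}
Y = {s: -1 if s <= 3 else 1 for s in S}

def E(rv):
    return sum(p * rv[s] for s in S)

def V_def(rv):
    # Definition 4: weighted average of the squared deviation.
    mu = E(rv)
    return sum(p * (rv[s] - mu) ** 2 for s in S)

def V_thm(rv):
    # Theorem 4: V = E(X^2) - E(X)^2.
    return E({s: rv[s] ** 2 for s in S}) - E(rv) ** 2

for rv in (X, Y):
    assert V_def(rv) == V_thm(rv)
print(V_def(X), V_def(Y))  # → 0 1
```

As expected, X never deviates from its mean (variance 0), while Y always deviates by exactly 1 (variance 1).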
Example 6. What is the variance of the random variable X with X(t) = 1 if a Bernoulli trial is a success and X(t) = 0 if it is a failure, where p is the probability of success and q is the probability of failure?

Solution. Because X takes only the values 0 and 1, it follows that X²(t) = X(t). Hence,

V(X) = E(X²) − E(X)² = p − p² = p(1 − p) = pq.

Example 7. What is the variance of the number of successes when n independent Bernoulli trials are performed, where, on each trial, p is the probability of success and q is the probability of failure?

Solution. Let X_i be the random variable with X_i((t_1, t_2, ..., t_n)) = 1 if trial t_i is a success and X_i((t_1, t_2, ..., t_n)) = 0 if trial t_i is a failure. Let X = X_1 + X_2 + ··· + X_n, so that X counts the number of successes in the n trials. Because the trials are independent, Theorem 5 gives V(X) = V(X_1) + V(X_2) + ··· + V(X_n). By Example 6, V(X_i) = pq for i = 1, 2, ..., n. It follows that V(X) = npq.

Chebyshev's Inequality

How likely is it that a random variable takes a value far from its expected value?

Theorem 6 (Chebyshev's Inequality). Let X be a random variable on a sample space S with probability function p. If r is a positive real number, then

p(|X(s) − E(X)| ≥ r) ≤ V(X)/r².
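To illustrate how conservative Chebyshev's inequality can be, the sketch below (mine, not from the text) compares the exact tail probability with the Chebyshev bound for X = the number of heads in 100 fair coin flips, where E(X) = 50 and V(X) = 25 by Examples 4 and 7 with p = q = 1/2:

```python
from math import comb

n, r = 100, 10
mean = n / 2   # E(X) = np with p = 1/2
var = n / 4    # V(X) = npq with p = q = 1/2

# Exact tail probability p(|X - E(X)| >= r) for X ~ Binomial(n, 1/2):
# each of the 2^n flip sequences is equally likely, and comb(n, k)
# of them have exactly k heads.
tail = sum(comb(n, k) for k in range(n + 1) if abs(k - mean) >= r) / 2 ** n
bound = var / r ** 2  # Chebyshev: V(X) / r^2

print(tail, bound)  # the exact tail is well below the bound of 0.25
```

The exact probability of deviating from 50 by 10 or more is under 6%, while Chebyshev's inequality only guarantees it is at most 25%: the bound holds for every distribution with this mean and variance, so it is loose for any particular one.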