October 25, 2002
Expected Value (again), Variance, and Standard Deviation
Notes for Math 295
We introduce the notions of variance and standard deviation of a random
variable, and summarize the algebraic rules for manipulating the operations E( ), V( ),
and D( ).
These rules are also covered in the text, Sections 3.10-3.11 (expectation) and
3.12-3.13 (variance).
1. Expected Value (summary)
Here is a summary of the formulas we have for expected value.
If X is a discrete random variable, then
E(X) = \sum_k k \, p_X(k)        (1)
where p_X(k) denotes the probability function for X, and the sum is taken over all possible
values of k. Equation (1) means exactly the same thing as
E(X) = \sum_k k \, P(X = k).        (2)
We have seen that—if the sample space itself is discrete—then equation (1) gives the
same result as
E(X) = \sum_s X(s) \, P(s)        (3)
where this time, P(s) represents the probability function for the sample space, and the
sum is taken over all outcomes s.
If a second random variable Y is defined as a function of X, say Y = h(X), then
we can get the expected value of Y from the formula
E(Y) = \sum_k h(k) \, p_X(k).        (4)
This formula is never really necessary, because we can always just construct the
probability function p_Y(k) for Y and use equation (1) directly. But sometimes equation
(4) is easier, especially if we have already constructed a table of p_X(k).
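As a quick illustration (my addition, not part of the original notes), here is a minimal Python sketch of equations (1) and (4); the pmf used, a fair six-sided die, is an assumption chosen for the example:

```python
# Equation (1): E(X) = sum over k of k * p_X(k).
# Equation (4): E(h(X)) = sum over k of h(k) * p_X(k), without building p_Y.
# The pmf here (a fair six-sided die) is an assumption chosen for illustration.

pmf = {k: 1/6 for k in range(1, 7)}          # p_X(k) for k = 1, ..., 6

EX = sum(k * p for k, p in pmf.items())      # equation (1): 3.5
EhX = sum(k**2 * p for k, p in pmf.items())  # equation (4) with h(x) = x^2: 91/6

print(EX, EhX)                               # 3.5 15.1666...
```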
If X is a continuous random variable—that is, if X has a density function f_X(x)—then

E(X) = \int_{-\infty}^{\infty} x \, f_X(x) \, dx.        (5)
If Y is a function of X, say Y = h(X), then the expected value of Y is given by

E(Y) = \int_{-\infty}^{\infty} h(x) \, f_X(x) \, dx.        (6)
Equation (6) is more important than equation (4), because sometimes it is hard to get
E(Y) by any other method. We’ll see an example later on.
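For readers who want to experiment, here is a hedged numerical sketch of equations (5) and (6) (my addition); it uses SciPy's quad integrator, and the exponential density is an arbitrary illustrative choice:

```python
# Equations (5) and (6) evaluated numerically with SciPy's quad integrator.
# The density (exponential with rate 1) is an arbitrary illustrative choice;
# for it, E(X) = 1 and E(X^2) = 2 exactly.
import math
from scipy.integrate import quad

f = lambda x: math.exp(-x)                          # density f_X(x) for x >= 0

EX, _ = quad(lambda x: x * f(x), 0, math.inf)       # equation (5)
EX2, _ = quad(lambda x: x**2 * f(x), 0, math.inf)   # equation (6), h(x) = x^2

print(EX, EX2)                                      # approximately 1.0 and 2.0
```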
We also have a formula for E(X) in terms of F_X, the cumulative distribution
function (cdf) of X. This is rarely used, but it is worth writing down because every
random variable has a cdf, and so this formula applies even if X is not discrete and does
not have a density function:
E(X) = -\int_{-\infty}^{0} F_X(x) \, dx + \int_{0}^{\infty} \bigl(1 - F_X(x)\bigr) \, dx        (7)
whenever both integrals exist.
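Equation (7) is easy to sanity-check numerically. The sketch below is my addition, not the author's; the distribution used (normal with mean 0.5) is an arbitrary test case chosen so that both integrals are nonzero:

```python
# Equation (7): E(X) from the cdf alone. The distribution (normal with
# mean 0.5, sd 1) is an arbitrary test case, so the answer should be 0.5.
from scipy.integrate import quad
from scipy.stats import norm

F = lambda x: norm.cdf(x, loc=0.5, scale=1.0)           # cdf F_X(x)

left, _ = quad(F, -float("inf"), 0)                     # integral of F_X over (-inf, 0]
right, _ = quad(lambda x: 1 - F(x), 0, float("inf"))    # integral of 1 - F_X over [0, inf)

print(right - left)                                     # approximately 0.5
```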
2. Variance
If X is any random variable, then the variance of X, denoted V(X), is defined by
V(X) = E(X^2) - \bigl(E(X)\bigr)^2        (8)
provided both expected values E(X) and E(X^2) exist. Sometimes V(X) is written
Var(X).
Equation (8) is worth memorizing. It may be easier in the form of a slogan:
Variance = Mean of the Square minus Square of the Mean.
(Recall “mean” is a synonym for expected value.)
The standard deviation of X, denoted D(X), is defined by
D(X) = \sqrt{V(X)}.        (9)
Obviously, if you have the variance, you can get the standard deviation in one step, and
vice versa (since V(X) = D(X)^2).
People use the symbol σ (lower-case Greek sigma) for standard deviations.
This usage is so common that some people refer to the standard deviation of X as “the
sigma of X.” If there are other random variables in the problem, they might write σ_X
for D(X). Also, people use σ^2 or σ_X^2 for the variance.
(Note: The notation V(X) is almost universal, but not quite. Among strangers
you might want to write Var(X) to be absolutely clear. The notation D(X) isn’t standard.
Writing StDev(X) is clear but ugly. Most people just write σ_X.)
Example 1. Suppose that a discrete random variable X has the probability
function
k        0     1     2     3
P(X=k)   1/8   3/8   3/8   1/8
What are E(X), V(X), and D(X) ?
(By the way: This is a binomial distribution with n=3 and p=0.5. We saw it
before when we tossed three coins, and X was the number of heads.)
Solution: First,
E(X) = \sum_k k \, p_X(k)
= (0) (1/8) + (1) (3/8) + (2) (3/8) + (3) (1/8)
= 1.5.
(That’s no surprise. If we toss three coins, then on average, we get 1.5 heads.)
In order to get V(X), we need to calculate E(X^2). This is a good time to use
equation (4). (In this case we can write Y = X^2; then Y = h(X) where h(x) = x^2.)
E(X^2) = \sum_k k^2 \, p_X(k)
= (0)^2 (1/8) + (1)^2 (3/8) + (2)^2 (3/8) + (3)^2 (1/8)
= (0) (1/8) + (1) (3/8) + (4) (3/8) + (9) (1/8)
= 3.
So the variance is given by
V(X) = E(X^2) - E(X)^2 = 3 - (1.5)^2 = 0.75.        (10)
The standard deviation is just the square root of the variance:
D(X) = \sqrt{0.75} = \tfrac{1}{2}\sqrt{3} \approx 0.866.
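For readers who want to check this on a machine, here is a short Python verification of Example 1 (my addition, not the author's):

```python
# Verifying Example 1: the pmf table above gives E(X) = 1.5, V(X) = 0.75,
# and D(X) = sqrt(0.75) = 0.866...
import math

pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

EX = sum(k * p for k, p in pmf.items())       # equation (1)
EX2 = sum(k**2 * p for k, p in pmf.items())   # equation (4) with h(x) = x^2

V = EX2 - EX**2                               # equation (8)
print(EX, V, math.sqrt(V))                    # 1.5 0.75 0.8660...
```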
3. A small point of notation
Let’s agree that the notation E(X)^2 always means \bigl(E(X)\bigr)^2, and never means
anything else.
That allows us to save a set of parentheses in the definition of variance:
V(X) = E(X^2) - E(X)^2.
4. More examples
Example 2. Let X be the result of the spinner experiment. Then X has a uniform
density function on the interval from 0 to 1:
f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}
What are E(X), V(X), and D(X)?
Solution:

E(X) = \int_{-\infty}^{\infty} x \, f(x) \, dx = \int_0^1 x \cdot 1 \, dx = \left. \frac{x^2}{2} \right|_0^1 = \frac{1}{2}.
This should be no surprise, either. When we pick a number at random between 0 and 1,
the average value is ½.
Continuing: We need E(X^2).
E(X^2) = \int_{-\infty}^{\infty} x^2 \, f(x) \, dx = \int_0^1 x^2 \cdot 1 \, dx = \left. \frac{x^3}{3} \right|_0^1 = \frac{1}{3}.
So:
V(X) = E(X^2) - E(X)^2 = (1/3) - (1/2)^2 = 1/12.
And,
D(X) = \sqrt{1/12} = \tfrac{1}{6}\sqrt{3} \approx 0.289.
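The same numbers come out of a numerical integration (my addition, again using SciPy's quad as in the earlier sketch):

```python
# Verifying Example 2 numerically: for the uniform density on [0, 1],
# E(X) = 1/2, E(X^2) = 1/3, V(X) = 1/12, and D(X) = sqrt(1/12) = 0.288...
import math
from scipy.integrate import quad

f = lambda x: 1.0                             # density f(x) = 1 on [0, 1]

EX, _ = quad(lambda x: x * f(x), 0, 1)        # equation (5)
EX2, _ = quad(lambda x: x**2 * f(x), 0, 1)    # equation (6)

V = EX2 - EX**2
print(EX, V, math.sqrt(V))                    # 0.5 0.08333... 0.28867...
```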
5. Another way to look at the variance
There is another formula for the variance that is in common use. It isn’t usually
as good for computation, but it is good for insight.
If X is any random variable, and E(X) is its expected value, then the variance
of X is given by

V(X) = E\bigl( (X - E(X))^2 \bigr).        (11)
Some people write μ (lower-case Greek mu) for the mean—that is, μ is a synonym
for E(X)—so we could also write this formula as

V(X) = E\bigl( (X - \mu)^2 \bigr).        (12)
We will prove below that these equations really give the same result as
equation (8). For now, let’s concentrate on what equation (11) means.
The variance is a measure of how dispersed (or how scattered) the values of
X are. A low variance means that the values don’t really vary a lot, and that they are
usually close to their average value. A high variance means that they are often very far
from their average value.
With that in mind, let’s “deconstruct” equation (11).
The expression X – E(X) is just the difference between X and its mean. Notice
that E(X) is just a number, but that X is a random variable. So, the quantity X – E(X)
is itself a random variable. We call it the “deviation of X from the mean”, or just the
“deviation.”
(For example: Let X be the result of the spinner experiment. Suppose we carry
out the experiment three times, and X happens to have the values 0.766, 0.249, and
0.545. Then in these three cases, the random variable X – E(X) takes on the values
+0.266, -0.251, and +0.045. That is, X - E(X) is just another random variable, defined as
a function of X.)
If we want to measure the dispersion of X, then the sizes of the deviations are a
good start. But that’s a random variable, and we want a simple number. So what about
the average of the deviations?
Bad idea! The average of the deviations is always exactly zero. So, that’s a
useless concept. The trouble is that the deviations can be positive or negative, and they
cancel out when we take the average.
So, what about the average of the absolute values of the deviations? That’s a
better idea, but mathematicians have had better success by taking the squares of the
deviations. That makes them all positive, so the average really means something.
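A tiny simulation (my addition, not the author's) makes the point concrete for the spinner experiment, where E(X) = 1/2 and, as computed in Example 2, V(X) = 1/12:

```python
# Why we square the deviations: for the spinner experiment, the average
# deviation is essentially zero, while the mean square deviation comes out
# near V(X) = 1/12 (computed in Example 2).
import random
import statistics

xs = [random.random() for _ in range(100_000)]   # spinner outcomes
mu = 0.5                                         # E(X) for the spinner

devs = [x - mu for x in xs]
print(statistics.mean(devs))                     # close to 0: tells us nothing
print(statistics.mean([d**2 for d in devs]))     # close to 1/12 = 0.0833...
```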
Now: (X - E(X))^2 is the “square deviation.” So, E((X - E(X))^2) is the “mean
square deviation.”
Variance = Mean Square Deviation.
It may be hard to remember equation (11), but it’s not hard to remember the slogan in the
box.
So the variance is a summary measure of the dispersion of the random variable X.
It turns out to be a very useful measure from a theoretical point of view. But intuitively,
it is somewhat confusing, because the units are strange. For example, if X is measured in
inches, then V(X) is in square inches. If X is in hours, then V(X) is in square hours. If
you aren’t used to dealing in square hours, then you may find V(X) hard to use.
That’s why we have standard deviations. The standard deviation is just the square
root of the variance. In order to be cute, let’s just say “root” when we mean “square
root”:
Standard Deviation = Root Mean Square Deviation.
Have you heard the term “root-mean-square” before? Or the abbreviation, RMS? It is
used in several places, mostly in engineering applications, but also in measuring the
output of stereo speakers. It’s a kind of average. So, the standard deviation is a kind of
“average” or “typical” deviation of X from its own expected value.
Statisticians often use “standard” as a synonym for “root mean square.” So, now
you know where the “standard deviation” got its name. It’s not just a name, it’s a
formula!
6. Variances are never negative.
From equation (11), we can see that the variance of X is the expected value of a
square. Since squares are never negative, neither is this expected value.
V(X) is never negative.
Also, standard deviations are never negative.
7. Variances, Standard Deviations, and Repeated Experiments
Have you seen the terms “variance” and “standard deviation” in connection with a
list of numbers? For example: The standard deviation of the column of figures
4
5
6
7
8
9
10
is 2.0. Is that a familiar concept? If not, skip this section.
Suppose X is a random variable whose value depends on some experiment.
Suppose you repeat the experiment a large number of times, and write down the value of
X each time. You now have a list of numbers.
The mean of this list of numbers will be about E(X). The variance of this list of
numbers will be about V(X). The standard deviation will be about D(X). The more
times you repeat the experiment, the closer this approximation will be.
That’s the connection between the terms “variance” and “standard deviation” in
probability, and the same terms in statistics.
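Here is a quick simulation of that connection (my addition); the fair die is an arbitrary choice of experiment:

```python
# Repeating an experiment many times: the mean, variance, and standard
# deviation of the list of results approach E(X), V(X), and D(X).
# Illustrated with a fair die: E(X) = 3.5, V(X) = 35/12, D(X) = 1.707...
import random
import statistics

values = [random.randint(1, 6) for _ in range(100_000)]

print(statistics.mean(values))        # close to 3.5
print(statistics.pvariance(values))   # close to 35/12 = 2.9166...
print(statistics.pstdev(values))      # close to 1.7078
```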
8. E, V, and D are operators.
The notations E( ), V( ), and D( ) look like functions. In fact, they really are
functions, but their domains aren’t sets of numbers. Instead, the domains of E( ), V( ),
and D( ) are sets of random variables.
That means that the argument of E( ) or V( ) or D( ) is always a random variable.
And the value of E( ), V( ), or D( ) is always a number (if it exists). In other words: If
we write
E(X)=t
or
V(X)=w
then X is a random variable, and t and w are numbers.
A function whose domain is a set of functions is usually called an “operator.”
That’s actually just a fancy synonym for function. Since random variables are functions,
that means E( ), V( ), and D( ) are operators.
A fine point of notation: We hardly ever write “E(x)”. We use upper-case letters
for random variables, and lower-case letters for placeholders and other stuff. So, people
who write “E(x)” usually mean “E(X)”.
9. An easy example: Constant random variables
Sometimes a random variable isn’t really random. For example, suppose we have
some sample space S, and we define a random variable W by
W(s) = 20 for every outcome s in S.
Then W satisfies the definition of a random variable, even though it is a fairly silly one.
What is the pmf of W? It’s simple: p_W(20) = 1.0, and p_W(anything else) = 0.
What is the expected value of W? We have: E(W) = 20, obviously.
What is the variance of W? Equation (11) is useful here. The deviation W –
E(W) is always zero, so the square deviation is always zero, so the mean square deviation
is zero. So, the variance V(W) = 0. Also, D(W) = 0.
Constant random variables aren’t much use in themselves, but they come up once
in a while in computations, so it’s worth remembering:
E(constant) = the constant; and
V(constant) = D(constant) = 0.
For example,
E(20) = 20.        (13)
But isn’t that inconsistent with the previous section? Doesn’t the argument of E( )
have to be a random variable? We have to be careful how we interpret equation (13).
The last “20” is just a number. But the first “20” should be understood as a random
variable which happens to have the constant value 20.
10. Expectation of a sum
We have seen in an earlier set of notes that
E(X + Y) = E(X) + E(Y)        (14)
for any two random variables X and Y.
Example: Each day Joe eats a random number of apples and a random number of oranges. On
average, Joe eats 2.2 apples per day. On average, Joe eats 1.1 oranges per day. How many fruits does Joe
eat per day, on average?
Solution: E(apples + oranges) = E(apples) + E(oranges) = 2.2 + 1.1 = 3.3.
The example is in small type because it is so obvious. We have been using
equation (14) all our lives. Now we are just expressing it formally.
We didn’t prove equation (14) before, but now we have the machinery to prove it,
at least in the case of a discrete sample space. We saw an example in homework #5.
Here is the proof, based on equation (3) above.
Suppose Z = X + Y. That means that Z(s) = X(s) + Y(s) for every outcome s.
Therefore, using equation (3):
E(Z) = \sum_s Z(s) P(s)
     = \sum_s \bigl( X(s) + Y(s) \bigr) P(s)
     = \sum_s X(s) P(s) + \sum_s Y(s) P(s)
     = E(X) + E(Y).
In two words:
“Expectations add.”
11. Expectation is a Linear Operator
We also have this identity:
E( aX ) = a E(X)
for every random variable X and every number a.
Example: Joe sells a random number of books every day, for $10 each. On average, he sells 6
books per day. How much money does he receive per day, on average?
Solution: Let X = number of books sold, Y = number of dollars received. Then Y = 10X. So,
E(Y) = E( 10 X ) = 10 E(X) = 10 * 6 = 60.
This is also pretty easy to prove from equation (3).
Combining the results from this section and last section, we have
E(X + Y) = E(X) + E(Y) and
E(aX)
= a E(X).
Any operator with these properties is called a “linear operator.” So, we can summarize
these results by saying that E( ) is a linear operator. (You don’t have to remember this
definition, and we won’t use it again.) By the way, V( ) and D( ) are not linear operators.
We can also combine these results to get formulas like this:
E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n).
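As a numerical illustration of this combined formula (my addition; the die and spinner are arbitrary choices):

```python
# Linearity of E( ) by simulation: E(3*X1 + 5*X2) = 3*E(X1) + 5*E(X2).
# X1 is a fair die (E = 3.5) and X2 a spinner value (E = 0.5); linearity
# does not require X1 and X2 to be independent.
import random
import statistics

zs = [3 * random.randint(1, 6) + 5 * random.random() for _ in range(100_000)]
print(statistics.mean(zs))            # close to 3 * 3.5 + 5 * 0.5 = 13.0
```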
12. Our two definitions of variance are the same
Let’s prove that equation (11) gives the same result as equation (8). We calculate
starting with equation (11):

V(X) = E((X - E(X))^2)
     = E(X^2 - 2X E(X) + E(X)^2)          (I just expanded the square)
     = E(X^2) - E(2X E(X)) + E(E(X)^2)    (because “expectations add”)
     = E(X^2) - 2 E(X) E(X) + E(E(X)^2)   (because 2E(X) is just a number, and E(aX) = aE(X))
     = E(X^2) - 2 E(X) E(X) + E(X)^2      (because E(X)^2 is just a number too, and E(constant) = constant)
     = E(X^2) - E(X)^2.
13. How do constants affect variance and standard deviation?
Theorem: If X is any random variable and a is any constant, then
V(aX) = a^2 V(X).
Proof:
V(aX) = E((aX)^2) - E(aX)^2
      = E(a^2 X^2) - (a E(X))^2
      = a^2 E(X^2) - a^2 E(X)^2
      = a^2 (E(X^2) - E(X)^2)
      = a^2 V(X).
Theorem: If X is any random variable and a is any constant, then
D(aX) = |a| D(X).
Proof: In the previous theorem, just take square roots. //
Both of these results are tricky. It’s important to remember the square in the first
theorem, and the absolute value in the second theorem.
Theorem: If X is any random variable and c is any constant, then
V( X + c ) = V( X ) and
D( X + c ) = D( X ).
Proof. Omitted.
The last theorem says: “Adding a constant doesn’t affect variance.”
Example. Suppose X is a random variable, with E(X) = 110 and V(X) = 4.
Suppose that Y is related to X by
Y = –7 X + 23.
What are E(Y), V(Y), and D(Y) ?
Solution. First,
E( Y ) = E (–7 X + 23 )
= E(-7X) + E(23)
(because expectations add)
= –7 E(X) + E(23)
(because E(cX)=cE(X))
= –7 E(X) + 23
(because E(constant)=constant)
= -7 * 110 + 23 = -747.
Next,
V( Y ) = V (–7 X + 23)
= V ( -7X )
(because adding c doesn’t affect V)
= (-7)^2 V(X)
(by the last section)
= 49 * 4 = 196.
Finally,
D(Y) = \sqrt{V(Y)} = \sqrt{196} = 14, OR
D( Y ) = D ( -7X )
= |-7| D(X)
(by the last section)
= 7 * 2 = 14.
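A simulation check of this example (my addition): the example does not say what distribution X has, so the sketch below arbitrarily makes X normal with mean 110 and variance 4.

```python
# Checking the example: E(X) = 110, V(X) = 4, and Y = -7*X + 23 give
# E(Y) = -747, V(Y) = 196, D(Y) = 14. Making X normal is an arbitrary
# choice here; the rules hold for any X with these two moments.
import random
import statistics

xs = [random.gauss(110, 2) for _ in range(200_000)]   # sd 2, so V(X) = 4
ys = [-7 * x + 23 for x in xs]

print(statistics.mean(ys))            # close to -747
print(statistics.pvariance(ys))       # close to 196
print(statistics.pstdev(ys))          # close to 14
```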
14. Independent Random Variables
When X and Y are independent, we have:
E(XY) = E(X) E(Y).
(We saw in homework #7a that this isn’t always true when X and Y aren’t independent.)
We aren’t ready to prove this result yet, but we can use it, as follows:
Theorem. If X and Y are independent, then
V( X + Y ) = V(X) + V(Y).
(This isn’t usually true if X and
Y are not independent.)
Proof: If X and Y are independent, then
V(X + Y) = E((X + Y)^2) - E(X + Y)^2
         = E(X^2 + 2XY + Y^2) - E(X + Y)^2                 (expanding the square)
         = E(X^2) + 2E(XY) + E(Y^2) - E(X + Y)^2
         = E(X^2) + 2E(XY) + E(Y^2) - (E(X) + E(Y))^2      (expectations add)
         = E(X^2) + 2E(XY) + E(Y^2) - E(X)^2 - 2E(X)E(Y) - E(Y)^2
         = (E(X^2) - E(X)^2) + (E(Y^2) - E(Y)^2) + 2(E(XY) - E(X)E(Y))
           (note the last term is zero when X and Y are independent)
         = V(X) + V(Y).
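And a final simulation check of this theorem (my addition; the particular pair of independent random variables is an arbitrary choice):

```python
# V(X + Y) = V(X) + V(Y) for independent X and Y, by simulation.
# X is uniform on [0, 1] (V = 1/12) and Y an independent fair die
# (V = 35/12), so V(X + Y) should come out near 36/12 = 3.
import random
import statistics

sums = [random.random() + random.randint(1, 6) for _ in range(200_000)]
print(statistics.pvariance(sums))     # close to 3.0
```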