Probability and Statistics – Mean and Variance

I. The Mean

In this lesson we use f(x), the PDF, to compute two important parameters of a distribution – the mean and the variance. The mean is a weighted average of the x's, using the PDF f(x) as the weighting function. The variance is a weighted average of the squared dispersion of the x's about the mean, again using f(x) as the weighting function. These are theoretical objects: we do not use data to compute them. All that we need to compute the mean and variance is the PDF of the random variable X. Do not confuse this with estimating the mean and variance from data. The mean and variance (sometimes called the population mean and population variance) are computed from the PDF of X as follows:

E[X] = Σ x f(x) (discrete case) or E[X] = ∫ x f(x) dx (continuous case)
Var(X) = E[(X − E[X])²] = Σ (x − E[X])² f(x) (discrete case) or ∫ (x − E[X])² f(x) dx (continuous case)

The mean of the random variable X is the expected value of X. Note, however, that it is not the most likely value of X. In fact, the expected value of X may never occur, or may even be impossible (e.g., 2.5 children per household on average). The expected value of X, which we write as E[X], tells us the general vicinity where most of the probability under the curve f(x) is located. This is why we call it a measure of central tendency.

There are two other measures of central tendency – the median and the mode. The median is the point where 50% of the area under f(x) lies to the left and 50% lies to the right; it splits the probability of the density f(x) in half. The mode is where f(x) is highest; it can be considered the most likely value that X will take on. When the PDF is symmetric, the mean, mode, and median will be the same.

Suppose that we look at a discrete f(x) (called a probability mass function) given by f(0) = 1/2, f(1) = 1/3, and f(2) = 1/6, with f(x) = 0 elsewhere. Note that f(x) ≥ 0 and the probabilities sum to 1. Therefore, we can use it to discuss probability, since it is a PDF.
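The mode, median, and mean of this PMF can be checked with a short script. A minimal sketch in Python, using exact fractions so no rounding enters:

```python
from fractions import Fraction as F

# The discrete PMF from the text: f(0) = 1/2, f(1) = 1/3, f(2) = 1/6.
pmf = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}
assert sum(pmf.values()) == 1  # probabilities sum to 1, so f is a valid PMF

mean = sum(x * p for x, p in pmf.items())  # E[X] = sum of x * f(x)
mode = max(pmf, key=pmf.get)               # the x where f(x) is highest

# Median: the smallest x whose cumulative probability reaches 1/2.
cum = F(0)
for x in sorted(pmf):
    cum += pmf[x]
    if cum >= F(1, 2):
        median = x
        break

print(f"mean={mean}, mode={mode}, median={median}")  # mean=2/3, mode=0, median=0
```

The same loop works for any finite PMF; only the `pmf` dictionary changes.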
The highest point of f(x) is at x = 0, so the mode = 0. The median = 0, since 50% of the probability is at or below 0 and 50% is at or above 0. The mean is 2/3 and is computed as E[X] = 0f(0) + 1f(1) + 2f(2) = 0(1/2) + 1(1/3) + 2(1/6) = 2/3. To summarize, for the PDF that we have assumed, the mode = 0, the median = 0, and the mean = 2/3.

Now suppose that f(0) = 1/2, f(1) = 1/3, and f(3) = 1/6, with f(x) = 0 elsewhere. In this case, mode = 0, median = 0, and mean = 5/6. Note that the mean has increased but the median has remained the same. This is because the mean weights each x by f(x), while the median ignores the values of x themselves and depends only on how the probability (or area) under f(x) is split.

Let's suppose that we have a different, continuous PDF: f(x) = e^(−x) for x ≥ 0. The mode is simply x = 0, the median can be shown to equal x = ln 2 ≈ 0.693 (solve ∫ e^(−x) dx from 0 to m equal to 1/2), and the mean is E[X] = ∫ x e^(−x) dx from 0 to ∞, which equals 1. Note that in this case the mean is larger than the median.

Taking yet another case, suppose that the PDF of the random variable X is f(x) = x/2 for 0 ≤ x ≤ 2. Clearly the mode is x = 2, the median can be shown to be x = √2 ≈ 1.414 (solve m²/4 = 1/2), and the mean is easily computed as E[X] = ∫ x(x/2) dx from 0 to 2, which equals 4/3. Here the median exceeds the mean.

The mean is often determined by a parameter in the PDF. For example, the exponential PDF f(x) = λe^(−λx) has parameter λ and mean 1/λ, and the uniform PDF on [a, b], f(x) = 1/(b − a), has parameters a and b and mean (a + b)/2.

II. The Variance

Next, we turn to another type of constant or parameter that is derived from the PDF. It is called the variance. The (population or theoretical) variance is defined as the weighted average of the squared dispersion of X about its (population or theoretical) mean, E[X]. We define the variance as E[(X − E[X])²], and often it is written as σ². It is a single, nonnegative number completely determined by the PDF of the random variable X. In other words, different PDFs give different variances.
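As a numerical check of the two continuous examples above (here taken to be f(x) = e^(−x) on x ≥ 0 and f(x) = x/2 on [0, 2]), a short Python sketch using a midpoint Riemann sum:

```python
import math

def integrate(g, a, b, n=100_000):
    # Midpoint Riemann sum approximation of the integral of g over [a, b].
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def f1(x):
    # First continuous example (assumed): exponential PDF e^(-x) on x >= 0.
    return math.exp(-x)

def f2(x):
    # Second continuous example (assumed): f(x) = x/2 on [0, 2].
    return x / 2

mean1 = integrate(lambda x: x * f1(x), 0, 50)  # E[X] = 1 (tail beyond 50 is negligible)
mean2 = integrate(lambda x: x * f2(x), 0, 2)   # E[X] = 4/3

# Medians: the area under each PDF up to the median must equal 1/2.
assert abs(integrate(f1, 0, math.log(2)) - 0.5) < 1e-6    # median = ln 2
assert abs(integrate(f2, 0, math.sqrt(2)) - 0.5) < 1e-6   # median = sqrt(2)

print(round(mean1, 3), round(mean2, 3))  # 1.0 1.333
```

This confirms the mean exceeds the median (1 > ln 2) in the first case, while the median exceeds the mean (√2 > 4/3) in the second.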
Many economists consider the variance to be a measure of how risky the random variable X is. A larger variance implies a riskier random variable; if the variance of X is zero, then X is no longer random. It should be clear from the diagram above that PDF 1 has a smaller variance than PDF 2. A smaller variance usually means that the PDF is taller and narrower, while a larger variance makes the PDF shorter and fatter. The variance shows how the probability, or the area under f(x), is spread about the mean. As the variance becomes smaller and smaller, the probability closes in on E[X]; in the limit, when the variance is zero, the random variable ceases to be random and becomes deterministic, equaling E[X]. The idea that a variance can collapse to zero, so that a random variable becomes equal to its mean, is important in econometrics. It is absolutely central to the idea of consistency, which is an important property of estimators.

In the diagram above, judge which of the two PDFs has the larger variance. Now compute the variance of each and see if your intuition was correct. The variance of PDF 2 is approximately 0.875, while the variance of PDF 1 is approximately 0.055.

Sometimes we are interested in the square root of the variance, which is called the standard deviation of the random variable X. Consider PDF 1 in the diagram above. The mean is equal to 2/3 ≈ 0.667 and the standard deviation is approximately 0.237. One standard deviation above and below the mean covers much of the area under the PDF f(x).

III. Basic Laws Regarding the Mean and Variance of a Random Variable

Here is a set of laws dealing with the mean and variance.

Law #1. E[non-random variable] = non-random variable
Law #2. E[constant] = constant
Law #3. E[αX] = αE[X]
Law #4. E[X + Y] = E[X] + E[Y]
Law #5. E[XY] = E[X]E[Y] if X and Y are unrelated randomly (i.e., independent)
Law #6. If g(x) > 0 for all x, then E[g(X)] > 0
Law #7. E[X²] ≥ E[X]E[X] for any random variable X (with equality only when Var(X) = 0)
Law #8. E[E[X]] = E[X]
Law #9. Var(αX) = α²Var(X) for any α
Law #10. Var(X + Y) = Var(X) + Var(Y) if X and Y are not related randomly (i.e., independent)
Law #11. Var(non-random variable) = 0, including constants
Law #12. Var(X) = E[X²] − E[X]E[X] ≥ 0
Law #13. Var(X) = E[X²] if E[X] = 0

Here is a set of problems to study:

#1. Suppose f(x) is a PDF on the support 0 ≤ x ≤ 1.
a. Show f(x) is a PDF.
b. Graph f(x).
c. Find E[X].
d. Find E[X²].
e. Find Var(X) and show Var(X) = E[X²] − E²[X].
f. Find the standard deviation of X.
g. Show E[E[X]] = E[X].
h. Show Var(3X) = 9Var(X).
i. Show Var(−3X) = 9Var(X).
j. Show E[1/(1 + X)] ≠ 1/(1 + E[X]).

#2. Suppose f(x) is a PDF on the support 0 ≤ x ≤ 2.
a. Show f(x) is a PDF.
b. Graph f(x).
c. Find E[X].
d. Find E[X²].
e. Find Var(X).

#3. Suppose that X and Y are unrelated (i.e., independent). If Var(X) = 2 and Var(Y) = 3, then what does Var(4X − 2Y) equal?

#4. Suppose that E[X] = 1 and E[Y] = 2. Is it true that E[X/Y] = 1/2? If not, give an example.

#5. Suppose X_k is random and E[X_k] = k, for k = 1, 2, 3, …
a. Find E[X_k + k]. Also, find E[kX_k].
b. Finally, find Var(X_k + k) and Var(kX_k).

#6. Explain carefully why Var(X) = Var(X + constant).

#7. Is it true that Var(X − Y) = Var(X) − Var(Y) when X and Y are statistically independent? Explain your answer.

#8. Suppose that you have a Bernoulli random variable. In this case X = 1 with probability p and X = 0 with probability (1 − p).
a. Find E[X] and Var(X).
b. Also, find E[X²].
c. Show Var(X) = E[X²] − E²[X].

#9. Suppose you have N independent Bernoulli random variables, each with probability p that X_i = 1. Consider S = X_1 + … + X_N.
a. Find E[S].
b. Find Var(S).
c. Use parts a and b above to compute E[S²].

#10. Suppose that it is possible to hold the mean constant and let the variance of a random variable increase. Why would this cause the random variable to become riskier?

#11. The general Normal PDF is defined as f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)).
a. Graph f(x).
b. Explain why the mean, mode, and median are all the same at x = μ.
c. Show that the height of f(x) at the mean x = μ is determined by σ.
d. Show that the right and left inflection points are determined by the variance parameter σ².
e. What happens to the graph of f(x) when σ increases?
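Parts c and d of problem #11 can be spot-checked numerically. A minimal Python sketch, with μ = 0 and σ = 2 chosen purely for illustration:

```python
import math

# Illustrative parameter values (an assumption, not from the notes): mu = 0, sigma = 2.
MU, SIGMA = 0.0, 2.0

def normal_pdf(x, mu=MU, sigma=SIGMA):
    # General normal PDF with mean mu and standard deviation sigma.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# (c) The height at the mean is 1/(sigma * sqrt(2*pi)): a larger sigma means a shorter peak.
peak = normal_pdf(MU)
assert abs(peak - 1 / (SIGMA * math.sqrt(2 * math.pi))) < 1e-12

# (d) The second derivative changes sign at mu +/- sigma: those are the inflection points.
def second_deriv(x, h=1e-4):
    # Central finite-difference approximation of f''(x).
    return (normal_pdf(x + h) - 2 * normal_pdf(x) + normal_pdf(x - h)) / h ** 2

eps = 0.01
assert second_deriv(MU + SIGMA - eps) < 0 < second_deriv(MU + SIGMA + eps)
assert second_deriv(MU - SIGMA + eps) < 0 < second_deriv(MU - SIGMA - eps)

print(round(peak, 4))  # 0.1995
```

Rerunning with a larger SIGMA shows the peak falling and the inflection points moving outward, which answers part e qualitatively.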