November 19, 2010 [RYAN PALMER: A BRIEF REVIEW ON SAMPLE VARIANCE]

Estimation of Variance

Population Variance

In Unit 1, we were introduced to the formula generally used to calculate the variance of a population of finite size N. This is given as

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}, \quad \text{where } \mu = \frac{\sum x}{N}.$$

(See page 24 of Unit 1 course notes.) Practically, however, we usually do not know the true variance of the population, and it can be either time-consuming or costly to get a precise value by including every member of the population.

Sample Variance

Taking a sample of size n, where sample values are drawn independently with replacement, and where n < N, we can calculate the sample variance. There are two formulas that are usually used to obtain an estimate for the population variance when it is not known or is too costly to obtain. These formulas are:

$$S_n^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{\sum y^2}{n} - \bar{y}^2, \quad \text{where } \bar{y} = \frac{\sum y}{n};$$

and

$$S^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = \frac{1}{n - 1}\left(\sum y^2 - n\bar{y}^2\right).$$

By simple observation we see that

$$S^2 = \frac{n}{n - 1}\, S_n^2.$$

Again, checking page 24 of course notes, we observe that the formula that we use to estimate the population variance is $S^2$ rather than $S_n^2$. Although there is some difference in the formulas, for large enough samples the difference is not material.

Biased and unbiased estimate of population variance

While $S_n^2$ can be considered as the variance of the population when n = N, $S^2$ provides us an unbiased estimate of the population variance. In other words, the expected value of $S^2$ is $\sigma^2$, while the expected value of $S_n^2$ is $\frac{n - 1}{n}\sigma^2$ (you may verify this for yourselves). What this result tells us is that $S_n^2$ underestimates the true value of the population variance. This is because in order to calculate the sample variance, we take deviations with respect to the sample mean, $\bar{y}$. However, sample observations, $y_i$, tend to be closer to the sample mean than to the population mean. Therefore, the calculation $\sum (y - \bar{y})^2$ tends to be smaller than the corresponding sum of squared deviations about the population mean.
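The two sample variance formulas and their expected values can be checked with a small simulation. The sketch below is only an illustration and is not part of the course notes: the function names, the eight-value population, the sample size and the number of trials are all made up for the example. It draws many samples with replacement, averages both estimates, and compares them with the population variance.

```python
import random
from statistics import fmean


def variance_n(ys):
    """S_n^2: sum of squared deviations from the sample mean, divided by n."""
    n = len(ys)
    y_bar = fmean(ys)
    return sum((y - y_bar) ** 2 for y in ys) / n


def variance_n_minus_1(ys):
    """S^2: the same sum of squared deviations, divided by n - 1."""
    n = len(ys)
    y_bar = fmean(ys)
    return sum((y - y_bar) ** 2 for y in ys) / (n - 1)


def average_estimates(population, n, trials, seed=0):
    """Average S_n^2 and S^2 over many samples drawn with replacement."""
    rng = random.Random(seed)
    total_sn2 = 0.0
    total_s2 = 0.0
    for _ in range(trials):
        # Independent draws with replacement, as assumed in the notes.
        sample = rng.choices(population, k=n)
        total_sn2 += variance_n(sample)
        total_s2 += variance_n_minus_1(sample)
    return total_sn2 / trials, total_s2 / trials


# Hypothetical population (illustrative values only); its variance is 4.
population = [2, 4, 4, 4, 5, 5, 7, 9]
sigma2 = variance_n(population)  # n = N, so this is the population variance

mean_sn2, mean_s2 = average_estimates(population, n=5, trials=20000)
print("population variance:", sigma2)
print("average of S_n^2   :", mean_sn2)
print("average of S^2     :", mean_s2)
```

With n = 5, the average of S_n^2 should settle near (4/5) of the population variance, while the average of S^2 should settle near the population variance itself, up to simulation noise.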
Sample Standard Deviation

Generally speaking, therefore, when the course materials refer to the sample standard deviation, it is implied that the formula for calculating that statistic is

$$S = \sqrt{S^2} = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}.$$

This estimate of the standard deviation, however, is notoriously a biased estimate of the population standard deviation, although for large enough sample sizes this bias is negligible.

A Brief Aside As To Why The Sample Standard Deviation Is Biased

The square root function is a concave function, i.e., when we draw a tangent to it, the curve falls below the tangent as we move away from the point of tangency. In order for $E\left[\sqrt{S^2}\right] = \sigma$ (i.e., for the expected value of the square root of the sample variance to equal the population standard deviation), it would have to be the case that the square root function is a linear function (which it is not). According to a result called Jensen's Inequality, for a concave function like the square root function,

$$E[S] = E\left[\sqrt{S^2}\right] \le \sqrt{E\left[S^2\right]} = \sigma.$$

What this tells us is that, on average, the sample standard deviation underestimates the population standard deviation. Corrections are available for this bias, but this course is satisfied that for large enough sample sizes, the bias is negligible.

Comment: You are not required to know this.
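The downward bias of the sample standard deviation can also be seen numerically. The following is a minimal sketch under assumptions that do not come from the notes: the population is taken to be normal with standard deviation 2, and the function names, sample sizes and trial counts are chosen only for illustration. It averages S over many samples for a small and a large sample size.

```python
import math
import random


def sample_sd(ys):
    """S: square root of the sample variance with the n - 1 divisor."""
    n = len(ys)
    y_bar = sum(ys) / n
    return math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))


def mean_sample_sd(sigma, n, trials, seed=0):
    """Average S over many samples of size n from an assumed normal population."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += sample_sd(sample)
    return total / trials


sigma = 2.0  # assumed population standard deviation (illustrative)
small = mean_sample_sd(sigma, n=5, trials=20000)
large = mean_sample_sd(sigma, n=100, trials=2000)
print("sigma                :", sigma)
print("average S for n = 5  :", small)
print("average S for n = 100:", large)
```

Under these assumptions the average of S for n = 5 should fall noticeably below sigma, while for n = 100 it should be very close to sigma, which is consistent with the bias being negligible for large samples.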