November 19, 2010
[RYAN PALMER: A BRIEF REVIEW ON SAMPLE VARIANCE ]
Estimation of Variance
Population Variance
In Unit 1, we were introduced to the formula generally used for the variance of a population of finite size $N$:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}, \quad \text{where } \mu = \frac{\sum x}{N}.$$

(See page 24 of the Unit 1 course notes.)
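As a quick numerical illustration, the sketch below applies this formula to a small, purely hypothetical population and cross-checks the result against `statistics.pvariance` from Python's standard library, which also divides by $N$. The function name `population_variance` and the data are illustrative assumptions, not part of the course notes.

```python
from statistics import pvariance

def population_variance(xs):
    """Variance of a finite population of size N: sum((x - mu)**2) / N."""
    N = len(xs)
    mu = sum(xs) / N                          # population mean
    return sum((x - mu) ** 2 for x in xs) / N

population = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical population, N = 8
sigma2 = population_variance(population)
print(sigma2)                                  # 4.0
print(sigma2 == pvariance(population))         # True: same divisor N
```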
Practically, however, we usually do not know the true variance of the population,
and it can be either time-consuming or costly to get a precise value by including
every member of the population.
Sample Variance
Taking a sample of size $n < N$, with sample values drawn independently and with replacement, we can calculate the sample variance. Two formulas are commonly used to estimate the population variance when it is unknown or too costly to obtain. These formulas are:
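$$S_n^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{1}{n}\left[\sum y^2 - \frac{\left(\sum y\right)^2}{n}\right]; \text{ and}$$

$$S^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = \frac{1}{n - 1}\left[\sum y^2 - \frac{\left(\sum y\right)^2}{n}\right],$$

where $\bar{y} = \frac{\sum y}{n}$ is the sample mean.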
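To make the two formulas concrete, here is a minimal sketch computing $S_n^2$ and $S^2$ both from the definitional form (squared deviations from $\bar{y}$) and from the computational form $\sum y^2 - (\sum y)^2/n$. The function names and the sample values are hypothetical choices for illustration.

```python
def sample_variances(ys):
    """Return (S_n^2, S^2) using squared deviations from the sample mean."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    ybar = sum(ys) / n                          # sample mean
    ss = sum((y - ybar) ** 2 for y in ys)       # sum of squared deviations
    return ss / n, ss / (n - 1)

def sample_variances_shortcut(ys):
    """Same quantities via the computational form sum(y^2) - (sum y)^2 / n."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    ss = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return ss / n, ss / (n - 1)

sample = [3, 7, 8, 10]                          # hypothetical sample, n = 4
print(sample_variances(sample))                 # S_n^2 = 6.5, S^2 = 26/3
print(sample_variances_shortcut(sample))        # same values
```

Both forms agree algebraically; in floating-point arithmetic the shortcut form can lose precision when the values are large relative to their spread, which is why software usually works with deviations from the mean.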
By simple observation we see that $S^2 = \frac{n}{n-1} S_n^2$. Again, checking page 24 of the course notes, we observe that the formula we use to estimate the population variance is $S^2$ rather than $S_n^2$. Although the two formulas differ, for large enough samples the difference is not material, since the factor $\frac{n}{n-1}$ approaches 1 as $n$ grows.
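The relationship can be checked with Python's `statistics` module: applied to a sample, `pvariance` divides by $n$ (giving $S_n^2$) and `variance` divides by $n - 1$ (giving $S^2$). The sample below is hypothetical; the loop simply prints how the factor $n/(n-1)$ shrinks toward 1.

```python
import math
from statistics import pvariance, variance

sample = [3, 7, 8, 10]        # hypothetical sample
n = len(sample)
s_n2 = pvariance(sample)      # divides by n      -> S_n^2
s2 = variance(sample)         # divides by n - 1  -> S^2
assert math.isclose(s2, n / (n - 1) * s_n2)

# The factor linking S^2 to S_n^2 approaches 1 as the sample size grows.
for size in (2, 5, 10, 30, 100, 1000):
    print(size, round(size / (size - 1), 4))
```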
Biased and Unbiased Estimates of Population Variance
While $S_n^2$ can be regarded as the variance of the population when $n = N$, $S^2$ provides an unbiased estimate of the population variance. In other words, the expected value of $S^2$ is $\sigma^2$, while the expected value of $S_n^2$ is $\frac{n-1}{n}\sigma^2$ (you may verify this for yourselves). What this result tells us is that $S_n^2$ underestimates the true value of the population variance. This is because, in order to calculate the sample variance, we take deviations with respect to the sample mean $\bar{y}$. However, the sample observations $y_i$ tend to be closer to the sample mean than to the population mean. Therefore, the sum $\sum (y - \bar{y})^2$ tends to be smaller than $\sum (y - \mu)^2$.
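One way to verify these expected values without algebra is exact enumeration. The sketch below assumes a small hypothetical population and sample size $n = 3$; since draws are independent and with replacement, every ordered sample in `itertools.product(population, repeat=n)` is equally likely, so averaging over all of them (with exact `Fraction` arithmetic) gives the true expectations. It also checks, sample by sample, that deviations from $\bar{y}$ never exceed deviations from $\mu$.

```python
from fractions import Fraction
from itertools import product

def ss_dev(ys, centre):
    """Exact sum of squared deviations of ys from a given centre."""
    return sum((y - centre) ** 2 for y in ys)

population = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = ss_dev(population, mu) / N           # population variance (4)

n = 3
samples = list(product(population, repeat=n)) # all N**n equally likely samples
e_sn2 = sum(ss_dev(s, Fraction(sum(s), n)) / n for s in samples) / len(samples)
e_s2 = sum(ss_dev(s, Fraction(sum(s), n)) / (n - 1) for s in samples) / len(samples)

# Deviations about ybar are never larger than deviations about mu.
assert all(ss_dev(s, Fraction(sum(s), n)) <= ss_dev(s, mu) for s in samples)

print(sigma2, e_s2, e_sn2)                    # 4 4 8/3
```

The output shows $E[S^2] = \sigma^2 = 4$ and $E[S_n^2] = \frac{2}{3} \cdot 4 = \frac{8}{3}$, matching $\frac{n-1}{n}\sigma^2$ for $n = 3$.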
Sample Standard Deviation
Generally speaking, therefore, when the course materials refer to the sample standard deviation, the statistic implied is

$$S = \sqrt{S^2} = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}.$$

This estimate of the standard deviation, however, is notoriously a biased estimate of the population standard deviation, although for large enough sample sizes this bias is negligible.
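A minimal sketch of this statistic, assuming a hypothetical sample: `sample_sd` takes the square root of the $n - 1$ variance, and Python's `statistics.stdev` uses the same divisor, so the two should agree.

```python
import math
from statistics import stdev

def sample_sd(ys):
    """S = sqrt(S^2), where S^2 = sum((y - ybar)^2) / (n - 1)."""
    n = len(ys)
    ybar = sum(ys) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))

sample = [3, 7, 8, 10]            # hypothetical sample
print(sample_sd(sample))          # sqrt(26/3), about 2.944
print(stdev(sample))              # statistics.stdev also divides by n - 1
```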
A Brief Aside As To Why The Sample Standard Deviation Is Biased
The square root function is a concave function, i.e., its graph lies on or below any tangent line drawn to it, falling further below the tangent as we move away from the point of tangency.

In order for $E\left[\sqrt{S^2}\right] = \sigma$ (i.e., for the expected value of the square root of the sample variance to equal the population standard deviation), the square root function would have to be linear (which it is not).
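The tangent-line picture can be turned into a short derivation (the standard tangent-line argument behind Jensen's Inequality for concave functions). Assuming $\sigma > 0$ and using the earlier result $E[S^2] = \sigma^2$, take the tangent to $\sqrt{x}$ at $x = \sigma^2$, whose slope is $\frac{1}{2\sigma}$:

```latex
\sqrt{x} \;\le\; \sigma + \frac{x - \sigma^2}{2\sigma}
\quad \text{for all } x \ge 0
\qquad \Longleftrightarrow \qquad (\sqrt{x} - \sigma)^2 \ge 0 .

% Substitute x = S^2 and take expectations, using E[S^2] = \sigma^2:
E\left[\sqrt{S^2}\right] \;\le\; \sigma + \frac{E[S^2] - \sigma^2}{2\sigma} \;=\; \sigma .
```

Equality in the first line holds only at $x = \sigma^2$, so unless $S^2$ always equals $\sigma^2$, the inequality for $E[S]$ is strict.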
According to a result called Jensen's Inequality, for a concave function such as the square root function,

$$E\left[\sqrt{S^2}\right] \le \sqrt{E\left[S^2\right]} = \sqrt{\sigma^2} = \sigma.$$

What this tells us is that the sample standard deviation, on average, underestimates the population standard deviation.
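The size of this bias can be seen by exact enumeration on a small, hypothetical population. As before, every ordered sample drawn with replacement is equally likely, so the average of $S$ over `itertools.product(population, repeat=n)` is exactly $E[S]$ (up to floating-point rounding). Each value printed should fall below $\sigma$, as Jensen's Inequality predicts.

```python
import math
from itertools import product
from statistics import pstdev

def sample_sd(ys):
    """S = sqrt(sum((y - ybar)^2) / (n - 1))."""
    n = len(ys)
    ybar = sum(ys) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))

population = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical population
sigma = pstdev(population)               # population standard deviation

results = {}
for n in (2, 3, 4):
    # Average S over all len(population) ** n equally likely ordered samples.
    total = sum(sample_sd(s) for s in product(population, repeat=n))
    results[n] = total / len(population) ** n
    print(f"n = {n}: E[S] = {results[n]:.4f}, sigma = {sigma}")
```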
Corrections are available for this bias, but this course is satisfied that for large
enough sample sizes, the bias is negligible.
Note: You are not required to know the material in this aside.