Dr.Neal, WKU
MATH 382
The Sample Deviation
Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ of a measurement with unknown mean $\mu$ and unknown standard deviation $\sigma$. Let $\bar{x}$ be the sample mean. We then define the sample variance $S^2$ by
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$
The sample deviation is given by $S = \sqrt{S^2}$.
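As a quick check of the definition, here is a minimal Python sketch that computes $S^2$ and $S$ directly from the formula above and compares them with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by $n - 1$. The function names and the five data values are hypothetical, chosen only for illustration.

```python
import math
import statistics


def sample_variance(xs):
    """S^2: squared deviations from the sample mean, summed and divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("S^2 needs at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_deviation(xs):
    """S = sqrt(S^2)."""
    return math.sqrt(sample_variance(xs))


data = [4.0, 7.0, 5.0, 9.0, 10.0]  # hypothetical measurements, for illustration only
print(sample_variance(data), statistics.variance(data))  # 6.5 6.5
print(sample_deviation(data), statistics.stdev(data))
```

Here $\bar{x} = 7$, the squared deviations sum to 26, and $S^2 = 26/4 = 6.5$.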
Why Do We Divide by $n - 1$?

When $\{x_i\}$ is a census of measurements, we obtain the true variance by
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2,$$
which is the average squared distance from the mean $\mu$. To obtain this average, we necessarily divide by $n$. But when $\{x_i\}$ is only a random sample of measurements and $\mu$ is unknown, we cannot obtain $\sigma^2$. We can, however, use $\bar{x}$ as an unbiased estimate of $\mu$, because the average of all possible $\bar{x}$ equals $\mu$.

Likewise, we wish to define an unbiased estimator of $\sigma^2$. We should naturally begin with the expression
$$V^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$
However, the average of all such $V^2$ over all possible random samples of size $n$ does not equal $\sigma^2$. It can be shown that the average of all possible $V^2$ equals $\frac{n-1}{n}\sigma^2$. To adjust the average, we multiply $V^2$ by $\frac{n}{n-1}$ to obtain
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$
Now the average of all possible $S^2$ over all possible random samples of size $n$ equals $\sigma^2$.

By dividing by $n - 1$ in the definition of $S^2$, we make $S^2$ an unbiased estimator of $\sigma^2$; that is, $E[S^2] = \sigma^2$.
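The claim that $V^2$ averages to $\frac{n-1}{n}\sigma^2$ while $S^2$ averages to $\sigma^2$ can be checked empirically. The sketch below is a Monte Carlo simulation, not a proof: it draws from a normal population (the unbiasedness of $S^2$ does not depend on normality when observations are drawn independently), and the values $n = 5$, $\mu = 10$, $\sigma = 2$, the trial count, and the seed are arbitrary assumptions.

```python
import random


def average_estimates(n, mu, sigma, trials, seed=0):
    """Average V^2 (divide by n) and S^2 (divide by n - 1) over many normal samples."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    total_v2 = 0.0
    total_s2 = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations from xbar
        total_v2 += ss / n
        total_s2 += ss / (n - 1)
    return total_v2 / trials, total_s2 / trials


n, sigma = 5, 2.0
avg_v2, avg_s2 = average_estimates(n, mu=10.0, sigma=sigma, trials=100_000)
print(f"average V^2 = {avg_v2:.3f}, (n-1)/n * sigma^2 = {(n - 1) / n * sigma**2:.3f}")
print(f"average S^2 = {avg_s2:.3f}, sigma^2 = {sigma**2:.3f}")
```

With $n = 5$ and $\sigma = 2$, the first average should land near $3.2$ and the second near $4.0$; the small remaining discrepancies are simulation noise.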
How Good Is the Estimator $S^2$?

In order to determine how well $S^2$ estimates $\sigma^2$, we would like to know the magnitude of the variance of all possible $S^2$. That is, are the $S^2$ widely spread out, with some much less than $\sigma^2$ and others much more than $\sigma^2$? Or do the $S^2$ have small variance, which makes them consistently close to their average $\sigma^2$?

Only in certain cases can we find the variance of all possible $S^2$. When sampling from an arbitrary unknown population, we generally cannot determine the variance of $S^2$. However, when sampling from a normally distributed population, we do know the following results:
Theorem. When choosing random samples of size $n$ from a normally distributed measurement with mean $\mu$ and standard deviation $\sigma$,
(i) the distribution of all possible sample means $\bar{x}$ is normal, and
(ii) the variance of all possible sample variances $S^2$ is given by
$$\mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1}.$$
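Part (ii) of the theorem can likewise be checked by simulation. This hedged sketch draws many normal samples, records each $S^2$, and compares the empirical variance of those values with $2\sigma^4/(n-1)$; the parameters $n = 10$, $\mu = 0$, $\sigma = 2$, the trial count, and the seed are arbitrary assumptions.

```python
import random


def s2_values(n, mu, sigma, trials, seed=1):
    """Return S^2 for each of `trials` normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        out.append(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return out


n, mu, sigma = 10, 0.0, 2.0
values = s2_values(n, mu, sigma, trials=20_000)
mean_s2 = sum(values) / len(values)
# Variance of the whole collection of simulated S^2 values (divide by the count)
empirical_var = sum((v - mean_s2) ** 2 for v in values) / len(values)
theoretical_var = 2 * sigma**4 / (n - 1)
print(f"empirical Var(S^2) = {empirical_var:.3f}, 2*sigma^4/(n-1) = {theoretical_var:.3f}")
```

Swapping `rng.gauss` for a heavy-tailed distribution would generally make the two numbers disagree, which is why the formula is stated only for normal populations.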
Example. Suppose composite ACT scores are found to be normally distributed with mean $\mu = 22.4$ and standard deviation $\sigma = 4.2$. To check for discrepancies, various random samples of size $n = 400$ are collected in various regions. The sample means $\bar{x}$ and sample variances $S^2$ are noted in each case. What are the average, variance, and standard deviation of all possible sample means and of all possible sample variances?
Solution. Assuming that the population of test takers has a size $N$ much larger than $n = 400$, we can say
$$\mu_{\bar{x}} = \mu = 22.4,$$
$$\sigma^2_{\bar{x}} \approx \frac{\sigma^2}{n} = \frac{4.2^2}{400} = 0.0441,$$
$$\sigma_{\bar{x}} = \sqrt{\sigma^2_{\bar{x}}} \approx 0.21.$$
Thus $\bar{x}$ is normally distributed with a mean of 22.4 and a standard deviation of about 0.21. So about 68.27% of the time, an $\bar{x}$ from a random sample of size 400 should lie within $22.4 \pm 0.21$. That is, $P(22.19 \le \bar{x} \le 22.61) \approx 0.6827$.
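The arithmetic for the sample means can be reproduced with the standard library's `statistics.NormalDist` (Python 3.8+). This is a minimal sketch under the same assumption as the solution, namely that $N$ is large enough for $\sigma^2_{\bar{x}} \approx \sigma^2/n$.

```python
import math
from statistics import NormalDist

mu, sigma, n = 22.4, 4.2, 400

mean_xbar = mu                 # mu_xbar = mu
var_xbar = sigma**2 / n        # sigma_xbar^2 approx sigma^2 / n when N is much larger than n
sd_xbar = math.sqrt(var_xbar)  # sigma_xbar

xbar_dist = NormalDist(mean_xbar, sd_xbar)  # distribution of all possible sample means
p = xbar_dist.cdf(22.61) - xbar_dist.cdf(22.19)
print(f"var = {var_xbar:.4f}, sd = {sd_xbar:.2f}, P(22.19 <= xbar <= 22.61) = {p:.4f}")
```

It should print a variance of 0.0441, a standard deviation of 0.21, and a probability of about 0.6827, matching the hand computation.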
Because $S^2$ is an unbiased estimator, we can say
$$E[S^2] = \sigma^2 = 4.2^2 = 17.64.$$
Applying the theorem, we can further say
$$\mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1} = \frac{2 \times (4.2)^4}{399} \approx 1.5597$$
and $\sigma_{S^2} \approx 1.249$.
So all the sample variances $S^2$ should average out to 17.64 (the true measurement variance), with a standard deviation of about 1.249.
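Likewise, here is a minimal sketch of the $S^2$ computations, using only the formulas stated above; the value of $\mathrm{Var}(S^2)$ relies on the normality assumption of the theorem.

```python
import math

sigma, n = 4.2, 400

expected_s2 = sigma**2               # E[S^2] = sigma^2, since S^2 is unbiased
var_s2 = 2 * sigma**4 / (n - 1)      # Var(S^2) = 2 sigma^4 / (n - 1) for a normal population
sd_s2 = math.sqrt(var_s2)            # sigma_{S^2}
print(f"E[S^2] = {expected_s2:.2f}, Var(S^2) = {var_s2:.4f}, SD(S^2) = {sd_s2:.3f}")
```

The printed values should be 17.64, 1.5597, and 1.249.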
Arithmetic Relationships Between $S$ and $\sigma$
Your calculator or spreadsheet should display the sample deviation $S$ along with the basic statistics. Here are a few computational facts.

When $x_1, x_2, \ldots, x_n$ is a census, then $\bar{x} = \mu$. Thus,
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$
Therefore,
$$\sigma^2 = \frac{n-1}{n}\,S^2 \quad\text{and}\quad \sigma = \sqrt{\frac{n-1}{n}} \times S.$$
Thus if you have the value of $S$, you can multiply $S$ by $\sqrt{\frac{n-1}{n}}$ to obtain $\sigma$ if you actually have a census of data.

Moreover, if you wish to show work in computing $S^2$ and $S$ "by hand," then we know that
$$\sigma^2 = \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} - \mu^2,$$
where $\mu = \bar{x}$ is the mean of the data. Thus
$$S^2 = \frac{n}{n-1}\,\sigma^2 \quad\text{and}\quad S = \sqrt{\frac{n}{n-1}} \times \sigma.$$
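These conversions are easy to script. The following is a hedged sketch, not a prescribed procedure: the helper names are hypothetical and the data are made up. It checks the identities against the standard library's `statistics.pvariance`/`statistics.pstdev` (divide by $n$) and `statistics.variance`/`statistics.stdev` (divide by $n - 1$).

```python
import math
import statistics


def sigma_from_s(s, n):
    """For a census of n values: sigma = sqrt((n - 1) / n) * S."""
    return math.sqrt((n - 1) / n) * s


def s_from_sigma(sigma, n):
    """S = sqrt(n / (n - 1)) * sigma."""
    return math.sqrt(n / (n - 1)) * sigma


def by_hand(xs):
    """Shortcut: sigma^2 = (sum of squares) / n - mean^2, then rescale to S^2 and S."""
    n = len(xs)
    mean = sum(xs) / n
    sigma2 = sum(x * x for x in xs) / n - mean**2
    s2 = n / (n - 1) * sigma2
    return sigma2, s2, math.sqrt(s2)


data = [4.0, 7.0, 5.0, 9.0, 10.0]  # hypothetical data
sigma2, s2, s = by_hand(data)
print(sigma2, statistics.pvariance(data))
print(s2, statistics.variance(data))
```

The sum-of-squares shortcut is convenient for hand work, but in floating point it can lose accuracy when the values are large relative to their spread, so the deviation form of the definition is usually preferred in software.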
Exercises
1. A group of WKU freshmen were asked to give the number of hours that they spend on Facebook per week. The results were:

Hours per week    Number of freshmen
0                 12
3                 8
6                 16
10                14
12                20
15                14
20                9
25                5
30                2

Compute the sample mean and sample deviation.
2. Adult heights are found to be normally distributed with mean $\mu = 68$ inches and standard deviation $\sigma = 3.6$ inches. Suppose various random samples of size $n = 225$ are collected.
(a) What are the average, variance, and standard deviation of all possible sample means $\bar{x}$?
(b) What are the average, variance, and standard deviation of all possible sample variances $S^2$?
(c) What is the probability that a sample mean $\bar{x}$ is
(i) at most 67.9,
(ii) at least 68.05,
(iii) from 67.99 to 68.01?
(d) Find the bounds between which 80% of all sample mean heights lie for random samples of size $n = 200$.
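For Exercise 1, the data arrive as a frequency table rather than as a list of individual values. One way to handle that, sketched below under the assumption that each row means "this many freshmen reported this many hours," is to weight each value by its count; the helper name is hypothetical. The result is cross-checked by expanding the table into individual observations and calling `statistics.mean` and `statistics.stdev`.

```python
import math
import statistics


def grouped_mean_and_s(values, counts):
    """Sample mean and sample deviation S for data given as value/frequency pairs."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    # Each value contributes its squared deviation once per occurrence
    ss = sum(c * (v - mean) ** 2 for v, c in zip(values, counts))
    return mean, math.sqrt(ss / (n - 1))


hours = [0, 3, 6, 10, 12, 15, 20, 25, 30]     # hours on Facebook per week
freshmen = [12, 8, 16, 14, 20, 14, 9, 5, 2]   # number of freshmen reporting each value
mean, s = grouped_mean_and_s(hours, freshmen)

# Cross-check by expanding the table into individual observations
expanded = [v for v, c in zip(hours, freshmen) for _ in range(c)]
assert math.isclose(mean, statistics.mean(expanded))
assert math.isclose(s, statistics.stdev(expanded))
```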