“Teach A Level Maths”
Statistics 1
Estimating the Standard Deviation
© Christine Crisp
S1: Estimating the Standard Deviation
AQA
"Certain images and/or photos on this presentation are the copyrighted property of JupiterImages and are being used with
permission under license. These images and/or photos may not be copied or downloaded without permission from JupiterImages"
The formula for the standard error (the standard deviation of the sample means) is
$$\text{standard error (s.e.)} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the population standard deviation and $n$ is the sample size.
However, we may not know the population standard deviation, so we must estimate it from our sample.
The obvious quantity to use is the sample standard deviation, but it can be shown that this tends to be too small, so we need to make an adjustment.
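The standard error formula can be checked exactly on a tiny example. This Python sketch assumes a hypothetical population of three equally likely values (1, 3 and 5). It lists every ordered sample of size $n = 2$ drawn with replacement, which keeps the values in each sample independent. The variance of all the sample means then comes out as exactly $\frac{\sigma^2}{n}$. Fractions are used so there is no rounding.

```python
import math
from fractions import Fraction
from itertools import product

def mean(xs):
    """Exact mean of a sequence of ints or Fractions."""
    return Fraction(sum(xs), len(xs))

def pop_variance(xs):
    """Variance with divisor n, treating xs as a complete population."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

population = [1, 3, 5]   # hypothetical population, equally likely values
n = 2                    # sample size
sigma2 = pop_variance(population)   # sigma^2 = 8/3

# Every ordered sample of size n (with replacement) is equally likely,
# so these sample means form the full sampling distribution of x-bar.
sample_means = [mean(sample) for sample in product(population, repeat=n)]
var_of_means = pop_variance(sample_means)

print(var_of_means, sigma2 / n)     # 4/3 4/3
print(math.sqrt(var_of_means))      # standard error sigma/sqrt(n), about 1.1547
```

The calculation needs the true $\sigma$, and in practice that is exactly the quantity we do not know.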
When we are estimating in Statistics, we talk about biased and unbiased estimators.
An unbiased estimator is one that, on average, gives the value we are estimating.
So, for example, if the value we wanted to estimate was equal to 2, and all possible samples gave us these values:
1.8, 1.9, 2, 2.1, 2.2
the statistic giving these values would be unbiased: its mean is 2.
(We mustn’t worry that 4 of the 5 values are wrong. That isn’t the point. We must be right on average.)
If we had
1.8, 2, 2, 2, 2.3
our estimator would be biased, since the average (2.02) is not correct, even though more individual values are correct.
We want unbiased estimators.
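The two lists above can be compared directly. This is a minimal Python sketch that assumes each listed value comes from one equally likely sample. The helper name `bias` is hypothetical. It measures how far the average of the values is from the quantity being estimated, and it uses exact fractions so that 2.02 is not blurred by floating-point rounding.

```python
from fractions import Fraction

def bias(values, true_value):
    """Average of the estimator's values minus the quantity being estimated.

    Assumes each listed value is equally likely (one per possible sample).
    """
    values = [Fraction(v) for v in values]   # exact decimal arithmetic
    return sum(values) / len(values) - Fraction(true_value)

unbiased_case = ["1.8", "1.9", "2", "2.1", "2.2"]
biased_case = ["1.8", "2", "2", "2", "2.3"]

print(bias(unbiased_case, 2))   # 0
print(bias(biased_case, 2))     # 1/50, i.e. the average is 2.02
```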
Let’s look at the hens’ eggs again.
We’ve met 3 different standard deviations (s.d.s), so we need to be clear which s.d. we are talking about.
[Figure: the population with a 1st sample and the mean of the 1st sample; the population with 1000 sample means. The spreads are labelled $\sigma$ and $\frac{\sigma}{\sqrt{n}}$.]
The 1st s.d. is the population standard deviation, $\sigma$ (the one we want to estimate).
The 2nd s.d. is the standard error, or standard deviation of all the sample means, $\frac{\sigma}{\sqrt{n}}$. It is also unknown, as it depends on the unknown $\sigma$.
We are left with the 3rd s.d., the standard deviation, $s$, of our one sample.
It can be shown (although we don’t need to do it) that this is a biased estimator of $\sigma$: on average it is too small. However, we can tweak it to change it into an unbiased estimator.
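The bias is easiest to see exactly for the variance. This Python sketch assumes a hypothetical population of three equally likely values (1, 3 and 5) and lists every ordered sample of size 2 drawn with replacement. Averaged over all of these samples, $s^2$ (divisor $n$) comes out too small. The version adjusted by $\frac{n}{n-1}$ matches $\sigma^2$ exactly.

```python
from fractions import Fraction
from itertools import product

def mean(xs):
    return Fraction(sum(xs), len(xs))

def biased_variance(xs):
    """s^2: sum of squared deviations divided by n."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def unbiased_variance(xs):
    """S^2 = s^2 * n / (n - 1), i.e. divide by n - 1 instead of n."""
    n = len(xs)
    return biased_variance(xs) * Fraction(n, n - 1)

population = [1, 3, 5]                # hypothetical population
sigma2 = biased_variance(population)  # divisor n over the whole population gives sigma^2 = 8/3

samples = list(product(population, repeat=2))   # all 9 ordered samples of size 2
avg_s2 = sum(biased_variance(s) for s in samples) / len(samples)
avg_S2 = sum(unbiased_variance(s) for s in samples) / len(samples)

print(sigma2, avg_s2, avg_S2)   # 8/3 4/3 8/3
```

In general, the average of $s^2$ is $\frac{n-1}{n}\sigma^2$. With $n = 2$ that is exactly half of $\sigma^2$, as the output shows.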
The unbiased estimator of $\sigma$ is
$$S = s\sqrt{\frac{n}{n-1}}$$
where $s$ is the standard deviation of a sample. (Strictly, it is $S^2 = s^2 \times \frac{n}{n-1}$ that is an unbiased estimator of $\sigma^2$; its square root, $S$, is then used as the estimate of $\sigma$.)
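Here is a minimal Python sketch of the adjustment, using a made-up sample `[2, 4, 4, 4, 5, 5, 7, 9]`. The helper `unbiased_sd_estimate` is a hypothetical name. The standard library's `statistics.pstdev` (divisor $n$) supplies $s$, and `statistics.stdev` (divisor $n-1$) serves only as a cross-check.

```python
import math
import statistics

def unbiased_sd_estimate(s, n):
    """Convert a sample s.d. s (divisor n) into S = s * sqrt(n / (n - 1))."""
    return s * math.sqrt(n / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical sample, n = 8
s = statistics.pstdev(data)        # divisor n: the sample s.d. s
S = unbiased_sd_estimate(s, len(data))

print(s)                           # 2.0
print(S, statistics.stdev(data))   # both about 2.138
```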
The unbiased estimator of $\sigma^2$
In your formula book you will find the unbiased estimator of $\sigma^2$, the population variance, written as
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$$
To use this, replace the capital $X_i$ by $x_i$ (the sample data) and $\bar{X}$ by $\bar{x}$ (the sample mean).
It gives the same result as $S^2 = s^2 \times \frac{n}{n-1}$.
(For the standard deviation, just take the square root.)
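The two forms agree because $s^2$ has divisor $n$, as one line of algebra shows:

```latex
s^2 = \frac{\sum (x_i - \bar{x})^2}{n}
\;\Longrightarrow\;
\sum (x_i - \bar{x})^2 = n s^2
\;\Longrightarrow\;
S^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{n s^2}{n-1} = s^2 \times \frac{n}{n-1}
```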
However, you’ll probably be using calculator functions rather than a formula, and your calculator gives the unbiased estimator of the population standard deviation as well as the sample standard deviation.
Try the following:
Enter the following data in your calculator:
1, 3, 5
Select the list of statistics and you should find the values
$\sigma_{n-1}$ = 2 and $\sigma_n$ = 1·63299. . .
The 1st (larger) of these is the unbiased estimator of $\sigma$, $S$.
The other is the standard deviation of the sample, $s$.
Ignore the symbols used by the calculator. They are not the ones we use.
If you are not sure which to use, think about whether you are making estimates from a sample. If so, use the 1st (larger) value.
One further point: if $n$ is large, $\frac{n}{n-1}$ is very close to 1, so the biased and unbiased estimators are nearly the same.
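Python's standard `statistics` module behaves like the calculator here, and its names also differ from ours. `statistics.stdev` divides by $n-1$, so it gives $S$. `statistics.pstdev` ("population" s.d.) divides by $n$, so it gives the sample s.d. $s$. This sketch also prints the correction factor $\sqrt{\frac{n}{n-1}}$ for a few sample sizes, to show it shrinking towards 1.

```python
import math
import statistics

data = [1, 3, 5]

S = statistics.stdev(data)    # divisor n - 1: the unbiased-estimator version, S
s = statistics.pstdev(data)   # divisor n: the sample standard deviation, s
print(S, s)                   # 2.0 and about 1.63299

# The correction factor sqrt(n / (n - 1)) gets closer to 1 as n grows.
for n in (3, 10, 100, 1000):
    print(n, round(math.sqrt(n / (n - 1)), 4))
```

For $n = 1000$ the factor is about 1·0005, so $s$ and $S$ differ by roughly 0·05%.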
SUMMARY
• An unbiased estimator is one where the average of all possible values equals the quantity being estimated.
• To estimate the variance of a population we use the unbiased estimator, $S^2$, where
$$S^2 = s^2 \times \frac{n}{n-1}$$
and $s^2$ is the variance of a sample of size $n$.
• $S^2$ can also be found from
$$S^2 = \frac{\sum (x - \bar{x})^2}{n-1}$$
where $x$ represents each data item and $\bar{x}$ is the sample mean.
• Calculators give the values of both $s$, the sample standard deviation, and $S$, the unbiased estimator of the population standard deviation, but we must ignore the calculator notation.
e.g. 1. Six people in a factory were selected and asked how long they took to get to work.
The results, in minutes, were as follows:
7, 12, 13, 20, 30, 35
Calculate the mean and variance of the times in the sample and hence find unbiased estimates of the mean and variance of the times for all the workers.
Solution:
Although I’m showing the formulae in the solution, I’m using the calculator functions to find each answer.
Sample mean, $\bar{x} = \frac{\sum x}{n}$ = 19·5
This is the unbiased estimate of the population mean, $\mu$.
Sample variance, $s^2 = \frac{\sum x^2}{n} - \bar{x}^2$ = (10·0457. . .)² = 101 (3 s.f.)
The calculator gives the sample s.d. as 10·0457. . . so we need to square it to find the variance. I’ve written down 3 s.f. but will use the more exact calculator value.
The unbiased estimate of the variance, $\sigma^2$, is
$S^2 = s^2 \times \frac{n}{n-1}$ = 100·916. . . × 6/5 = 121 (3 s.f.)
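Here is a short Python check of e.g. 1 that follows the same steps. It finds $s^2$ from $\frac{\sum x^2}{n} - \bar{x}^2$ and then applies the $\frac{n}{n-1}$ adjustment. The standard library's `statistics.variance` divides by $n-1$ directly and is used only as an independent cross-check; it is not part of the original working.

```python
import statistics

times = [7, 12, 13, 20, 30, 35]   # minutes, from e.g. 1
n = len(times)

x_bar = statistics.fmean(times)                  # sample mean
s2 = sum(x * x for x in times) / n - x_bar ** 2  # s^2 = (sum of x^2)/n - x_bar^2
S2 = s2 * n / (n - 1)                            # unbiased estimate of sigma^2

print(x_bar)                      # 19.5
print(round(s2, 3))               # 100.917
print(round(S2, 3))               # 121.1
print(statistics.variance(times)) # 121.1, the same estimate via the n - 1 formula
```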
e.g. 2. The following sets of data are from samples, each from a different Normal population. Find unbiased estimates of the mean, $\mu$, and standard deviation, $\sigma$, of each of the populations.
(a) 17, 24, 25, 31, 42
(b) $\sum x = 422$, $\sum x^2 = 18002$, $n = 10$
(c) $\sum x = 330$, $\sum (x - \bar{x})^2 = 828$, $n = 5$
Solutions:
(a) Sample: $\bar{x}$ = 27·8, $s$ = 8·38
Unbiased estimates of population parameters are:
$\mu$ = 27·8, $\sigma$ = 9·36
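Part (a) can be reproduced in Python as a check on calculator work. This sketch uses the standard `statistics` module. `fmean` gives $\bar{x}$, `pstdev` (divisor $n$) gives $s$, and `stdev` (divisor $n-1$) gives $S$. Python's names do not match the notation used here.

```python
import statistics

data = [17, 24, 25, 31, 42]            # e.g. 2 (a)

mu_hat = statistics.fmean(data)        # estimate of mu: the sample mean
s = statistics.pstdev(data)            # sample s.d., divisor n
S = statistics.stdev(data)             # estimate of sigma, divisor n - 1

print(mu_hat, round(s, 2), round(S, 2))   # 27.8 8.38 9.36
```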
(b) $\sum x = 422$, $\sum x^2 = 18002$, $n = 10$
Sample mean, $\bar{x} = \frac{\sum x}{n}$ = 42·2
Unbiased estimate of the mean, $\mu$, is 42·2
Sample variance, $s^2 = \frac{\sum x^2}{n} - \bar{x}^2$ = 18002/10 − 42·2² = 19·36
Estimate of population variance: $S^2 = s^2 \times \frac{n}{n-1}$
$\Rightarrow S^2$ = 19·36 × 10/9 = 21·5 (3 s.f.)
Unbiased estimate of population standard deviation, $\sigma$, is
$S$ = √21·5 = 4·64 (3 s.f.)
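When only $\sum x$, $\sum x^2$ and $n$ are given, the working for part (b) can be wrapped in a small function. This is a sketch only. The name `estimates_from_sums` is hypothetical, and plain floating point is assumed to be accurate enough for data of this size.

```python
import math

def estimates_from_sums(sum_x, sum_x2, n):
    """Return (estimate of mu, estimate of sigma) from sum x, sum x^2 and n.

    Uses s^2 = (sum x^2)/n - x_bar^2, then S^2 = s^2 * n / (n - 1).
    """
    x_bar = sum_x / n
    s2 = sum_x2 / n - x_bar ** 2
    S2 = s2 * n / (n - 1)
    return x_bar, math.sqrt(S2)

mu_hat, S = estimates_from_sums(422, 18002, 10)   # e.g. 2 (b)
print(round(mu_hat, 1), round(S, 2))              # 42.2 4.64
```

In floating point, the $\frac{\sum x^2}{n} - \bar{x}^2$ form can lose accuracy when the values are large and close together. For hand-sized data like this it is fine.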
(c) $\sum x = 330$, $\sum (x - \bar{x})^2 = 828$, $n = 5$
Sample mean, $\bar{x} = \frac{\sum x}{n}$ = 66
Unbiased estimate of the mean, $\mu$, is 66
Unbiased estimate of $\sigma$ is $S$ where
$S^2 = \frac{\sum (x - \bar{x})^2}{n-1}$ = 828/4 = 207
$\Rightarrow S$ = √207 = 14·4 (3 s.f.)
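When $\sum (x - \bar{x})^2$ is given directly, as in part (c), the formula-book version needs no rescaling. This is again a sketch, and `estimates_from_sxx` is a hypothetical name.

```python
import math

def estimates_from_sxx(sum_x, sxx, n):
    """Return (estimate of mu, estimate of sigma) from sum x, sum (x - x_bar)^2 and n.

    S^2 = sum (x - x_bar)^2 / (n - 1): the formula-book version.
    """
    return sum_x / n, math.sqrt(sxx / (n - 1))

mu_hat, S = estimates_from_sxx(330, 828, 5)   # e.g. 2 (c)
print(mu_hat, round(S, 1))                    # 66.0 14.4
```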
Exercise
The following sets of data are from samples, each from a different Normal population. Find unbiased estimates of the mean, $\mu$, and standard deviation, $\sigma$, of each of the populations.
(a) 5·2, 7·9, 8·1, 9·3
(b) $\sum x = 678$, $\sum x^2 = 52302$, $n = 10$
(c) $\sum x = 282$, $\sum (x - \bar{x})^2 = 1046$, $n = 5$
Solutions:
(a) Sample: $\bar{x}$ = 7·625, $s$ = 1·50
Unbiased population estimates: $\mu$ = 7·63 (3 s.f.), $\sigma$ = 1·73
(b) $\sum x = 678$, $\sum x^2 = 52302$, $n = 10$
Sample mean, $\bar{x} = \frac{\sum x}{n}$ = 67·8
Unbiased estimate of the mean, $\mu$, is 67·8
Sample variance, $s^2 = \frac{\sum x^2}{n} - \bar{x}^2$ = 52302/10 − 67·8² = 633·36
$S^2 = s^2 \times \frac{n}{n-1}$ = 704 (3 s.f.)
Unbiased estimate of $\sigma$ is $S$ = √704 = 26·5 (3 s.f.)
(c) $\sum x = 282$, $\sum (x - \bar{x})^2 = 1046$, $n = 5$
Sample mean, $\bar{x} = \frac{\sum x}{n}$ = 56·4
Unbiased estimate of the mean, $\mu$, is 56·4
Unbiased estimate of $\sigma$ is $S$ where
$S^2 = \frac{\sum (x - \bar{x})^2}{n-1}$ = 1046/4 = 261·5
$\Rightarrow S$ = √261·5 = 16·2 (3 s.f.)
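To check your exercise answers, this script combines the three routes used in e.g. 2 (raw data, $\sum x$ and $\sum x^2$, and $\sum (x - \bar{x})^2$). It is a sketch only, and all helper names are hypothetical. Values are printed to 3 d.p. so you can compare them with the 3 s.f. answers above. Python's `round` uses round-half-to-even, so round 7·625 to 3 s.f. by hand (7·63) rather than with `round`.

```python
import math
import statistics

def from_raw(data):
    """(estimate of mu, estimate of sigma) from the raw sample values."""
    return statistics.fmean(data), statistics.stdev(data)

def from_sums(sum_x, sum_x2, n):
    """From sum x, sum x^2 and n: S^2 = ((sum x^2)/n - x_bar^2) * n / (n - 1)."""
    x_bar = sum_x / n
    return x_bar, math.sqrt((sum_x2 / n - x_bar ** 2) * n / (n - 1))

def from_sxx(sum_x, sxx, n):
    """From sum x, sum (x - x_bar)^2 and n: S^2 = sum (x - x_bar)^2 / (n - 1)."""
    return sum_x / n, math.sqrt(sxx / (n - 1))

answers = {
    "a": from_raw([5.2, 7.9, 8.1, 9.3]),
    "b": from_sums(678, 52302, 10),
    "c": from_sxx(282, 1046, 5),
}
for part, (mu_hat, S) in answers.items():
    print(part, round(mu_hat, 3), round(S, 3))
```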