CHAPTER 8 : ESTIMATION OF POPULATION PARAMETERS

CHAPTER: POINT ESTIMATION AND CONFIDENCE INTERVALS

Contents
0 Introduction
1 Point Estimation and Unbiasedness
2 Unbiased Estimators for Population Mean and Variance
3 Unbiased Estimator for Population Proportion
4 Confidence Interval for Population Mean, $\mu$, when Population is Normal and $\sigma^2$ known
5 Confidence Interval for Population Mean, $\mu$, when Population is Normal and $\sigma^2$ unknown
6 Confidence Interval for Population Proportion
7 Miscellaneous Examples

 0 Introduction
Suppose that a population has an unknown parameter, such as the mean, or the variance, or the proportion of
‘successes’. Then an estimate of the unknown parameter can be made from the information supplied by a
random sample (or samples) taken from the population. A statistic used to estimate the value of a parameter is
called an estimator and it is denoted by a capital letter (e.g. U, T, ...). The numerical value taken by the
estimator in a particular instance is called an estimate and is denoted by a small letter (e.g. u, t, ...).
For example, a coin is known to be biased but the probability of obtaining a head, p, is unknown.
If this coin is tossed 100 times, we know that X ~ Bin(100, p), where X is the random variable ‘the number of
heads in 100 tosses’. If in this experiment we observed 65 heads, then we could estimate p by
$\frac{65}{100} = 0.65$. In general, if x heads are observed, we can use $\frac{x}{100}$ as an estimate of p.
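The distinction between the estimator (a random variable) and an estimate (one observed value) can be illustrated with a small simulation. This is only a sketch: the value `true_p = 0.65`, the seed and the function name `estimate_p` are assumptions made for the illustration, since in practice p is unknown.

```python
import random

def estimate_p(tosses):
    """Point estimate of P(head): the observed proportion of heads."""
    return sum(tosses) / len(tosses)

rng = random.Random(1)      # fixed seed so the run is repeatable
true_p = 0.65               # hypothetical value; in practice p is unknown

# Repeat the 100-toss experiment five times: the estimator X/100 is the
# same rule each time, but it produces a different estimate per sample.
estimates = []
for _ in range(5):
    tosses = [rng.random() < true_p for _ in range(100)]
    estimates.append(estimate_p(tosses))

print(estimates)
```

Each run of the experiment gives its own estimate x/100, while the rule X/100 that produces them stays the same.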
1 Point Estimation and Unbiasedness
An estimate of a population parameter given by a single number is called a point estimate.
An estimator is unbiased if the expected value of its sampling distribution equals the parameter it is estimating.
For example, $\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$ is an unbiased estimator of $\mu$ because $E(\bar{X}) = \mu$.
We could form many estimators; the most efficient estimator is the one which
a) is unbiased, and
b) has the smallest variance. For $\bar{X}$, $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \to 0$ as $n \to \infty$.
Notations for    Population    Sample       Unbiased estimator
Mean             $\mu$         $\bar{x}$    $\hat{\mu}$
Variance         $\sigma^2$    $s^2$        $\hat{\sigma}^2$
Proportion       $p$           $p_s$        $\hat{p}$
Example 1.1
An educational psychologist at a major university wants to estimate the mean IQ of the students. To this end,
n = 30 students are to be randomly selected and given an IQ test. Suppose the IQ scores of the students selected
are as follows :
107 99 101 93 99 103 134 132 103 109 104 103 101 128 113
106 126 103 131 106 119 102 98 116 108 103 111 119 112 105
Compute $\bar{X}$. [Note: $\sum x = 3294$] [109.8]
Solution
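A quick numerical check of this example, written as a minimal Python sketch (the variable names are illustrative and not part of the original notes):

```python
# IQ scores from Example 1.1 (n = 30)
iq_scores = [
    107, 99, 101, 93, 99, 103, 134, 132, 103, 109, 104, 103, 101, 128, 113,
    106, 126, 103, 131, 106, 119, 102, 98, 116, 108, 103, 111, 119, 112, 105,
]

n = len(iq_scores)
total = sum(iq_scores)      # the note in the example gives 3294
x_bar = total / n           # point estimate of the population mean IQ

print(n, total, x_bar)
```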
Example 1.2
If $X_1, X_2, X_3$ is a random sample taken from a population with mean $\mu$ and variance $\sigma^2$, find which of the
following estimators for $\mu$ are unbiased, and which is the most efficient of these.
$T_1 = \frac{X_1 + X_2 + X_3}{3}$, $T_2 = \frac{X_1 + 2X_2}{3}$, $T_3 = \frac{X_1 + 2X_2 + 3X_3}{3}$
[T1, T2; T1]
Solution
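One way to check the answer is to note that, for independent observations, a linear estimator $T = \sum w_i X_i$ has $E(T) = (\sum w_i)\mu$ and $\mathrm{Var}(T) = (\sum w_i^2)\sigma^2$. The sketch below applies this with exact fractions; the helper name `mean_and_variance_factors` is an assumption for illustration, and independence of $X_1, X_2, X_3$ is assumed because they form a random sample.

```python
from fractions import Fraction

def mean_and_variance_factors(weights):
    """For T = sum(w_i * X_i) with independent X_i, each with mean mu and
    variance sigma^2: E(T) = (sum w_i) * mu, Var(T) = (sum w_i^2) * sigma^2.
    Returns the two multipliers (sum w_i, sum w_i^2)."""
    return sum(weights), sum(w * w for w in weights)

third = Fraction(1, 3)
estimators = {
    "T1": [third, third, third],          # (X1 + X2 + X3) / 3
    "T2": [third, 2 * third, 0],          # (X1 + 2 X2) / 3
    "T3": [third, 2 * third, 3 * third],  # (X1 + 2 X2 + 3 X3) / 3
}

factors = {name: mean_and_variance_factors(w) for name, w in estimators.items()}

# Unbiased means E(T) = mu, i.e. the weights sum to 1.
unbiased = [name for name, (m, _) in factors.items() if m == 1]
# Among the unbiased estimators, the most efficient has the smallest variance.
most_efficient = min(unbiased, key=lambda name: factors[name][1])

for name, (m, v) in factors.items():
    print(f"{name}: E = {m} * mu, Var = {v} * sigma^2")
print("unbiased:", unbiased, "most efficient:", most_efficient)
```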
 2 Unbiased Estimators for Population Mean and Variance
From a population with unknown mean $\mu$ and unknown variance $\sigma^2$, take a random sample of size n.
Then the most efficient unbiased estimator for the population mean is
$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ or $a + \frac{\sum (x_i - a)}{n}$,
and the most efficient unbiased estimator for the population variance is
$\hat{\sigma}^2 = \frac{n s^2}{n-1}$,
where $s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$ or $\frac{1}{n}\sum x^2 - (\bar{x})^2$ or $\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$ or $\frac{\sum (x_i - a)^2}{n} - \left(\frac{\sum (x_i - a)}{n}\right)^2$, where a is any constant.
Hence, it can also be written as $\hat{\sigma}^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$.
Example 2.1 (YJC 96/2/10a modified)
A random sample of 100 observations of a random variable X gives the following results: $\sum (x - 49) = -25$ and
$\sum (x - 49)^2 = 195$. Calculate the unbiased estimates of the population mean and the population variance.
[48.8, 1.91]
Solution
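The coded-sum formulas for the two estimates can be sketched in Python as follows. The function name `unbiased_estimates_from_coded_sums` is an illustrative assumption; the arithmetic follows the formulas for $\hat{\mu}$ and $\hat{\sigma}^2$ given in this section.

```python
def unbiased_estimates_from_coded_sums(n, a, sum_dev, sum_dev_sq):
    """Unbiased estimates of the population mean and variance from
    sum(x - a) and sum((x - a)^2) for a sample of size n."""
    mean_hat = a + sum_dev / n
    s_sq = sum_dev_sq / n - (sum_dev / n) ** 2   # sample variance s^2
    var_hat = n * s_sq / (n - 1)                 # n s^2 / (n - 1)
    return mean_hat, var_hat

# Example 2.1: n = 100, a = 49, sum(x - 49) = -25, sum((x - 49)^2) = 195
mean_hat, var_hat = unbiased_estimates_from_coded_sums(100, 49, -25, 195)
print(round(mean_hat, 2), round(var_hat, 2))
```

Rounded to three significant figures, these agree with the stated answers 48.8 and 1.91.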
Example 2.2
Find unbiased estimates of the mean and variance of the population from which each of the following samples is
drawn:
a) 35, 42, 38, 55, 70, 69
b) $\sum x = 120$, $\sum (x - \bar{x})^2 = 302$, n = 8
c) $\sum x = 120$, $\sum x^2 = 2102$, n = 8
d) $\sum (t - 300) = 2012$, $\sum (t - 300)^2 = 525\,262$, n = 200
[51.5, 241, 15.0, 43.1, 15.0, 43.1, 310, 2537]
Solution
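For raw data such as part (a), Python's standard `statistics` module already separates the two divisors: `pvariance` divides by n (the $s^2$ of these notes) and `variance` divides by n - 1 (the unbiased estimate $\hat{\sigma}^2$). A minimal sketch of the check:

```python
from statistics import mean, pvariance, variance

sample = [35, 42, 38, 55, 70, 69]   # part (a)
n = len(sample)

x_bar = mean(sample)                # unbiased estimate of the mean
s_sq = pvariance(sample)            # divisor n: the sample variance s^2
sigma_hat_sq = variance(sample)     # divisor n - 1: unbiased estimate

print(x_bar, round(s_sq, 2), round(sigma_hat_sq, 1))
```

The two variances are related by $\hat{\sigma}^2 = \frac{n s^2}{n-1}$, which can be confirmed from the printed values.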
 3 Unbiased Estimator for Population Proportion
Take a binomial population in which an unknown p is the proportion of successes.
A random sample of size n is taken.
Let X be the random variable ‘the number of successes obtained from a sample of size n’,
and $P_s$ be the random variable ‘the proportion of successes in the sample’.
Then an unbiased estimator for the population proportion p, $\hat{p}$, is $P_s$, where $P_s = \frac{X}{n}$, i.e. $E(P_s) = p$.
Note:
$E(P_s) = E\left(\frac{X}{n}\right) = \frac{E(X)}{n} = \frac{np}{n} = p$
$\mathrm{Var}(P_s) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{npq}{n^2} = \frac{pq}{n}$, where $q = 1 - p$. When $n \to \infty$, $\mathrm{Var}(P_s) \to 0$. Hence $P_s$ is an efficient estimator.
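The two results in the note, $E(P_s) = p$ and $\mathrm{Var}(P_s) = \frac{pq}{n}$, can be checked numerically by summing over the binomial probabilities. This is a sketch only; the values n = 50 and p = 0.12 are hypothetical choices for illustration, and `proportion_moments` is not a name from the notes.

```python
from math import comb

def proportion_moments(n, p):
    """Exact mean and variance of P_s = X / n for X ~ Bin(n, p),
    computed by summing over the binomial probabilities."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * pr for k, pr in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * pr for k, pr in enumerate(pmf))
    return mean, var

n, p = 50, 0.12                      # hypothetical illustrative values
mean, var = proportion_moments(n, p)
print(mean, p)                       # E(P_s) compared with p
print(var, p * (1 - p) / n)          # Var(P_s) compared with pq/n
```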
Example 3.1
A random sample of 50 children from a large school is chosen and the number who are left handed is noted.
It is found that 6 are left handed. Obtain an unbiased estimate of the proportion of children in the school who
are left handed. [0.12]
Solution
Example 3.2
A drawing pin is tossed 100 times. It lands ‘point-up’ 64 times. Obtain an unbiased estimate of the proportion of
‘point-up’ tosses. [0.64]
Solution
4 Confidence Interval for Population Mean, $\mu$, when Population is Normal and $\sigma^2$ known
When we have a population with unknown parameters (such as the mean, the variance, . . . etc.), we can estimate
these parameters by using either point estimates (discussed above) or interval estimates.
A confidence-interval estimate of an unknown population parameter is a random interval constructed so that it
has a given probability of including the parameter.
Consider a population with unknown parameter $\theta$. If we can find an interval (a, b) such that $P(a < \theta < b) = 0.95$,
we say that (a, b) is a 95% confidence interval for $\theta$. In this case, 0.95 is the probability that the interval
includes $\theta$. (It is not the probability that $\theta$ lies in the interval: $\theta$ is a fixed value, and it is the interval that varies from sample to sample.) Values a and b are the 95% confidence limits.
Note: In interval estimation
a) The shorter the length of the interval (a,b), the better the estimation.
b) The higher the confidence level, the better the estimation.
c) The end values of the interval a and b are called the confidence limits.
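Consider a normal population with mean $\mu$ and variance $\sigma^2$, i.e. $X \sim N(\mu, \sigma^2)$.
Now take a random sample of size n (n can be big or small) from the population, $X_1, X_2, \dots, X_n$,
and consider the distribution of the sample mean $\bar{X}$, where $\bar{X} = \frac{1}{n}\sum X_i$.
Since X is normally distributed, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
Standardising, we have $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, where $Z \sim N(0, 1)$.
From the standard normal table, we know that the central 95% of N(0, 1) lies between the values $\pm 1.96$.
[Figure: standard normal curve N(0, 1), with the central 95% lying between -1.96 and 1.96]
So, $P(-1.96 \le Z \le 1.96) = 0.95$
$\Rightarrow P\left(-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$
$\Rightarrow P\left(-1.96\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
$\Rightarrow P\left(-1.96\frac{\sigma}{\sqrt{n}} - \bar{x} \le -\mu \le 1.96\frac{\sigma}{\sqrt{n}} - \bar{x}\right) = 0.95$
$\Rightarrow P\left(1.96\frac{\sigma}{\sqrt{n}} + \bar{x} \ge \mu \ge \bar{x} - 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$ (multiplying by -1)
$\Rightarrow P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$ (rearranging)
So we have found an interval $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}}, \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$ such that the probability that the interval includes $\mu$ is
0.95. This is called the 95% confidence interval for $\mu$.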
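The statement that the interval includes $\mu$ with probability 0.95 can be illustrated by simulation: draw many samples from a normal population with a known mean, build the interval from each, and count how often it contains that mean. The sketch below is illustrative only; `mu=100`, `sigma=15`, `n=25`, the number of trials and the seed are arbitrary assumptions.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, trials, z=1.96, seed=0):
    """Fraction of simulated samples whose interval
    x_bar +/- z * sigma / sqrt(n) contains the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

# All four arguments are arbitrary illustrative choices.
rate = coverage(mu=100, sigma=15, n=25, trials=10_000)
print(rate)   # expected to be close to 0.95
```

The mean stays fixed throughout; it is the interval that changes from sample to sample, and roughly 95% of the intervals capture it.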
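Similarly, a 90% confidence interval for $\mu$ is given by $\left(\bar{x} - 1.645\frac{\sigma}{\sqrt{n}}, \bar{x} + 1.645\frac{\sigma}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 1.645\frac{\sigma}{\sqrt{n}}\right)$,
a 98% confidence interval for $\mu$ is given by $\left(\bar{x} - 2.326\frac{\sigma}{\sqrt{n}}, \bar{x} + 2.326\frac{\sigma}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 2.326\frac{\sigma}{\sqrt{n}}\right)$,
and a 99% confidence interval for $\mu$ is given by $\left(\bar{x} - 2.575\frac{\sigma}{\sqrt{n}}, \bar{x} + 2.575\frac{\sigma}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 2.575\frac{\sigma}{\sqrt{n}}\right)$.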
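The multipliers 1.645, 1.96, 2.326 and 2.575 come from the standard normal table. As a sketch, they can be reproduced with `statistics.NormalDist` from the Python standard library; the helper name `critical_value` is an assumption. The exact 99% value is about 2.5758, so the code prints 2.576 where the table value used in these notes is 2.575.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z such that P(-z <= Z <= z) = confidence for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(critical_value(level), 3))
```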
Note: If the population is not normally distributed, then we require n to be large ($n \ge 30$) for the result to be
used. This is because, by the Central Limit Theorem, if X is not normally distributed, then $\bar{X}$ is approximately
normally distributed, i.e. $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately, only when n is large.
Example 4.1
A random sample of six items with sample mean 12.45 cm is taken from a normal population with variance 4.5
cm². Find the 95% confidence interval for the population mean $\mu$. [(10.8, 14.1)]
Solution
Example 4.2
On the basis of the results obtained from a random sample of 100 men from a particular district, the 95%
confidence interval for the mean height of the men in the district is found to be (177.22cm, 179.18cm).
Find the value of $\bar{x}$, the mean of the sample, and $\sigma$, the standard deviation of the normal population from which
the sample is drawn. Calculate the 98% confidence interval for the mean height.
[178.2, 5, (177.04, 179.36)]
Solution
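This example runs the interval formula backwards: the midpoint of the interval is $\bar{x}$ and the half-width equals $1.96\frac{\sigma}{\sqrt{n}}$. A minimal sketch of that reasoning follows; the function name is an illustrative assumption.

```python
from math import sqrt

def mean_and_sigma_from_interval(low, high, n, z=1.96):
    """Recover x_bar and sigma from a z-interval (low, high):
    x_bar is the midpoint and the half-width equals z * sigma / sqrt(n)."""
    x_bar = (low + high) / 2
    sigma = (high - low) / 2 * sqrt(n) / z
    return x_bar, sigma

x_bar, sigma = mean_and_sigma_from_interval(177.22, 179.18, 100)

# 98% interval, using the table value 2.326 from the text
half_98 = 2.326 * sigma / sqrt(100)
print(round(x_bar, 2), round(sigma, 2))
print(round(x_bar - half_98, 2), round(x_bar + half_98, 2))
```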
Example 4.3
A machine produces washers whose diameter has a standard deviation 0.04 cm. In order to find the mean
diameter of the washers produced, a random sample of 9 washers is taken whose mean diameter is found to be
3.14 cm. Calculate symmetric
a) 95%
b) 98 % confidence intervals for the mean diameter of washers produced by the machine.
[(3.114cm, 3.166cm), (3.109cm, 3.171 cm)]
Solution
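The interval $\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$ for a known standard deviation can be sketched as a small function and applied to this example. The name `z_interval` is illustrative, the multipliers are the table values from this section, and the population is assumed normal because n = 9 is small.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z):
    """Confidence interval x_bar +/- z * sigma / sqrt(n) for known sigma."""
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Example 4.3: sigma = 0.04 cm, n = 9, sample mean 3.14 cm
for label, z in (("95%", 1.96), ("98%", 2.326)):
    low, high = z_interval(3.14, 0.04, 9, z)
    print(label, round(low, 3), round(high, 3))
```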
5 Confidence Interval for Population Mean, $\mu$, when Population is Normal and $\sigma^2$ unknown
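When the population is normally distributed and $\sigma^2$ is unknown, it is necessary to use an unbiased estimator,
$\hat{\sigma}^2 = \frac{n s^2}{n-1}$, where $s^2$ is the sample variance.
Then a 95% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 1.96\frac{\hat{\sigma}}{\sqrt{n}}\right) = \left(\bar{x} \pm 1.96\sqrt{\frac{n s^2}{n-1}} \cdot \frac{1}{\sqrt{n}}\right) = \left(\bar{x} \pm 1.96\frac{s}{\sqrt{n-1}}\right)$.
Similarly, a 98% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 2.326\frac{\hat{\sigma}}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 2.326\frac{s}{\sqrt{n-1}}\right)$,
and a 99% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 2.575\frac{\hat{\sigma}}{\sqrt{n}}\right)$ or $\left(\bar{x} \pm 2.575\frac{s}{\sqrt{n-1}}\right)$.
Note:
When n is large ($n \ge 30$), $\frac{n}{n-1} \approx 1$. So $\hat{\sigma}^2 \approx s^2$, i.e. $\hat{\sigma} \approx s$.
Then a 95% confidence interval for $\mu$ can be given approximately by $\left(\bar{x} \pm 1.96\frac{s}{\sqrt{n}}\right)$.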
Example 5.1
A random sample of 120 measurements taken from a normal population gave the following data: n = 120,
$\sum x = 1008$, $\sum (x - \bar{x})^2 = 172.8$. Find
a) a 97% and
b) a 99% confidence interval for the population mean $\mu$.
[(8.16, 8.64), (8.12, 8.68)]
Solution
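A sketch of the calculation for this example, following the approach of this section: estimate $\sigma^2$ by $\hat{\sigma}^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$ and use a normal multiplier. The multiplier is computed with `statistics.NormalDist` rather than read from a table, and the helper name `interval` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

n = 120
sum_x = 1008
sum_sq_dev = 172.8                       # sum of (x - x_bar)^2

x_bar = sum_x / n
sigma_hat = sqrt(sum_sq_dev / (n - 1))   # root of the unbiased variance estimate
std_error = sigma_hat / sqrt(n)

def interval(confidence):
    """Central interval x_bar +/- z * sigma_hat / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return x_bar - z * std_error, x_bar + z * std_error

for level in (0.97, 0.99):
    low, high = interval(level)
    print(level, round(low, 2), round(high, 2))
```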
6 Confidence Interval for the Population Proportion ($n \ge 30$)
Consider a binomial population where p, the proportion of ‘successes’ in the population, is unknown.
Take a random sample of size n from the population.
Let $P_s$ be the random variable ‘the proportion of successes in the sample’.
Then $P_s \sim N\left(p, \frac{pq}{n}\right)$, where $q = 1 - p$. Now, as p is unknown, we use an estimator for it.
An unbiased estimator for p is $p_s$. So, assume that an estimator for $\frac{pq}{n}$ is $\frac{p_s q_s}{n}$, where $q_s = 1 - p_s$.
Since n is large ($n \ge 30$), by the Central Limit Theorem, we have $P_s \sim N\left(p, \frac{p_s q_s}{n}\right)$ approximately.
Standardising, we have $Z = \frac{p_s - p}{\sqrt{\frac{p_s q_s}{n}}}$, where $Z \sim N(0, 1)$.
Since, from the standard normal table, $P(-1.96 < Z < 1.96) = 0.95$, we have $P\left(-1.96 < \frac{p_s - p}{\sqrt{\frac{p_s q_s}{n}}} < 1.96\right) = 0.95$.
So, rewriting, $P\left(p_s - 1.96\sqrt{\frac{p_s q_s}{n}} < p < p_s + 1.96\sqrt{\frac{p_s q_s}{n}}\right) = 0.95$.
If, in a random sample of size n ($n \ge 30$), the proportion with a particular property is $p_s$, the 95% confidence interval
for the population proportion p is given by $\left(p_s - 1.96\sqrt{\frac{p_s q_s}{n}}, p_s + 1.96\sqrt{\frac{p_s q_s}{n}}\right)$, where $q_s = 1 - p_s$.
This can be written as $\left(p_s \pm 1.96\sqrt{\frac{p_s q_s}{n}}\right)$.
Similarly, a 98% confidence interval for p is $\left(p_s \pm 2.326\sqrt{\frac{p_s q_s}{n}}\right)$
and a 99% confidence interval for p is $\left(p_s \pm 2.575\sqrt{\frac{p_s q_s}{n}}\right)$.
Example 6.1
A manufacturer wants to assess the proportion of defective items in a large batch produced by a particular
machine. He tests a random sample of 300 items and finds that 45 are defective. Calculate
a) a 95%
b) a 98% confidence intervals for the proportion of defective items in a complete batch.
[(0.110, 0.190), (0.102, 0.198)]
Solution
Example 6.2
In a sample of 400 carpet shops taken in 1998, it was discovered that 136 of them sold carpets at below the list
prices which had been recommended by manufacturers.
a) Estimate the percentage of all carpet selling shops selling below list prices.
b) Calculate the 95% confidence interval for this estimate, and explain briefly what these mean.
c) What size sample would have to be taken in order to estimate the percentage to within $\pm 2\%$?
[34%, (29.4%, 38.6%), 2156]
Solution
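A sketch of the three parts in Python. For part (c) it assumes that the sample proportion 0.34 from part (a) is used as the planning value for p and that the 95% multiplier 1.96 applies; with those assumptions the required n is the smallest integer with $1.96\sqrt{\frac{p_s q_s}{n}} \le 0.02$. Both function names are illustrative.

```python
from math import ceil, sqrt

def proportion_interval(successes, n, z=1.96):
    """Sample proportion and the interval p_s +/- z * sqrt(p_s * q_s / n)."""
    p_s = successes / n
    half_width = z * sqrt(p_s * (1 - p_s) / n)
    return p_s, p_s - half_width, p_s + half_width

def sample_size_for_margin(p_s, margin, z=1.96):
    """Smallest n with z * sqrt(p_s * q_s / n) <= margin."""
    return ceil(z**2 * p_s * (1 - p_s) / margin**2)

p_s, low, high = proportion_interval(136, 400)
required_n = sample_size_for_margin(p_s, 0.02)

print(round(100 * p_s, 1), round(100 * low, 1), round(100 * high, 1))
print(required_n)
```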
Example 6.3
An opinion poll is taken as to how an electorate of 20 million will vote in a forthcoming referendum.
Out of a random sample of 100, 40 say 'yes' and 60 say 'no'. What is the 95 % confidence interval for the
proportion who will vote 'yes'? [between 30.4% and 49.6%]
Solution
 7 Miscellaneous Examples
Example 7.1
The weights, x kg, of a random sample of 100 fifteen-year-old girls from a school were taken and the data
obtained are summarised by $\sum (x - 40) = 82$ and $\sum (x - 40)^2 = 362$.
a) Calculate the unbiased estimate of the population variance.
b) Construct a symmetric 98% confidence interval for the population mean.
c) Fifty schools are taken and the symmetric 98% confidence interval for the population mean is determined
for each school. Find the expected number of these intervals that would contain the population mean.
[2.977, (40.4, 41.2), 49]
Solution
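A sketch of the arithmetic for this example, using the coded sums and the table value 2.326 for 98%. The variable names are illustrative; both the mean and the variance estimates are computed because the interval needs both.

```python
from math import sqrt

n, a = 100, 40
sum_dev, sum_dev_sq = 82, 362             # sum(x - 40) and sum((x - 40)^2)

mean_hat = a + sum_dev / n                # unbiased estimate of the mean
s_sq = sum_dev_sq / n - (sum_dev / n) ** 2
var_hat = n * s_sq / (n - 1)              # unbiased estimate of the variance

z_98 = 2.326                              # table value used in the text
half_width = z_98 * sqrt(var_hat / n)
low, high = mean_hat - half_width, mean_hat + half_width

# Part (c): each of the 50 intervals contains the mean with probability 0.98.
expected_covering = 50 * 0.98

print(round(mean_hat, 2), round(var_hat, 3))
print(round(low, 1), round(high, 1), expected_covering)
```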
SUMMARY
Notations for    Population    Sample       Unbiased estimator    Formula
Mean             $\mu$         $\bar{x}$    $\hat{\mu}$           $\frac{1}{n}\sum_{i=1}^{n} x_i$ or $a + \frac{\sum (x_i - a)}{n}$
Variance         $\sigma^2$    $s^2$        $\hat{\sigma}^2$      $\frac{n s^2}{n-1}$
Proportion       $p$           $p_s$        $\hat{p}$             $p_s$
where $s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$ or $\frac{1}{n}\sum x^2 - (\bar{x})^2$ or $\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$ or $\frac{\sum (x_i - a)^2}{n} - \left(\frac{\sum (x_i - a)}{n}\right)^2$.

If $\bar{X}$ is the mean of a random sample of size n taken from a normal population with known variance $\sigma^2$:
a central 90% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 1.645\frac{\sigma}{\sqrt{n}}\right)$,
a central 95% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}\right)$,
a central 98% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 2.326\frac{\sigma}{\sqrt{n}}\right)$,
a central 99% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 2.575\frac{\sigma}{\sqrt{n}}\right)$.

If $\bar{X}$ and $s^2$ are the mean and variance of a random sample of size n from a normal population with unknown
variance $\sigma^2$, then a central 95% confidence interval for $\mu$ is given by $\left(\bar{x} \pm 1.96\frac{s}{\sqrt{n-1}}\right)$.

If, in a random sample of size n ($n \ge 30$), the proportion with a particular property is $p_s$,
the 95% confidence interval for the population proportion p is $\left(p_s \pm 1.96\sqrt{\frac{p_s q_s}{n}}\right)$, where $q_s = 1 - p_s$.
Similarly, a 98% confidence interval for p is $\left(p_s \pm 2.326\sqrt{\frac{p_s q_s}{n}}\right)$
and a 99% confidence interval for p is $\left(p_s \pm 2.575\sqrt{\frac{p_s q_s}{n}}\right)$.

Must it be a Polar Bear?
There is this familiar story about how a hunter travelled one mile south, one mile east and one mile north, ended
up in the same spot where he started off and shot a bear. The colour of the bear he shot was white because he
can only be at the North Pole.
But with more thought, there are actually many places, in fact an infinite number of places, where one can travel
one mile south, one mile east and one mile north and yet end up at the same starting point. Can you find out
where these places are? [This is not a trick question. It can actually be proven mathematically.]
Puzzle taken from lectures by Prof Tan Eng Chye, NUS Maths Dept