Interval Estimation
(Means and Proportions)
Recall: The observed value of an estimator of a population parameter
is called a point estimate. For example, the sample mean x̄ = 35 is a point
estimate of the population mean µ.
Question: How precise is the estimate?
We could use the standard error of the estimate as a measure of the error.
It measures approximately the error that we will observe on average.
We will now refine this measure of error into a confidence interval,
which will allow us to measure the precision of the estimate.
Before constructing a confidence interval for a population mean µ, we will
need to define the concept of an upper quantile from the standard normal
distribution.
Definition: Let Z ∼ N(0, 1), that is, a standard normal random variable.
Its upper quantile of order A is the number z_A such that
$$A = P(Z > z_A) = 1 - \Phi(z_A),$$
that is, the area under the density of Z to the right of the value z_A is A.
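Remark: Since T(∞) = N(0, 1), that is, the Student t distribution with infinitely many degrees of freedom is the standard normal, we have z_A = t_{A,∞}. Thus, in the row ν = ∞
of Table 17.4 in the textbook, we find some quantiles of the standard normal N(0, 1).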
The following graph illustrates an upper quantile of order 5% for the standard
normal distribution, that is, z_{0.05} = 1.645.
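Instead of reading a table, the upper quantile z_A can also be computed numerically. The sketch below assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist`, whose `inv_cdf` method inverts Φ; the function name `upper_quantile` is our own, not part of any library.

```python
from statistics import NormalDist

def upper_quantile(a):
    """Return z_A with P(Z > z_A) = A for Z ~ N(0, 1)."""
    if not 0.0 < a < 1.0:
        raise ValueError("A must lie strictly between 0 and 1")
    # P(Z > z_A) = 1 - Phi(z_A) = A, so z_A = Phi^{-1}(1 - A).
    return NormalDist().inv_cdf(1.0 - a)

for a in (0.05, 0.025, 0.01, 0.005):
    print(f"z_{a} = {upper_quantile(a):.3f}")
```

Running it should reproduce z_{0.05} ≈ 1.645 from the text, along with z_{0.025} ≈ 1.960, z_{0.01} ≈ 2.326 and z_{0.005} ≈ 2.576.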
Estimating the population mean µ (σ known)
Conditions:
• normal population or a large sample size (n ≥ 30);
• population variance σ² is known.
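Under these conditions,
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
follows an N(0, 1) distribution; this is only approximate if n is large but the population is not normal. Hence
$$1 - \alpha = P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right).$$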
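Hence, if the population is normal or if the sample size is large (n ≥ 30),
then a 100(1 − α)% confidence interval for µ is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right].$$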
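A minimal sketch of this interval in Python, assuming σ is known and the conditions above hold. `z_interval` is a hypothetical helper name; in the usage line x̄ = 35 is the point estimate from the text, while σ = 5 and n = 36 are made-up illustrative values.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """100*conf % confidence interval for mu when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper quantile z_{alpha/2}
    half_width = z * sigma / sqrt(n)          # z_{alpha/2} * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Illustrative values only: xbar = 35 (as in the text), sigma = 5, n = 36.
lo, hi = z_interval(xbar=35, sigma=5, n=36)
print(f"95% CI for mu: ({lo:.2f}, {hi:.2f})")
```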
Remarks:
• 1 − α is known as the confidence level. Often we set the level to 95%,
but any value close to 100% is reasonable, for example 90%, 98% or 99%.
• The length of the interval will be used as a measure of precision. The
shorter the interval, the more precise the estimate; a longer interval is
interpreted as a less precise estimate.
Further remarks: In practice we usually do not know σ, so we use
the sample standard deviation s instead of σ. Recall that
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1) \text{ approximately}$$
when n is large (n ≥ 40). Thus, we obtain the following confidence interval.
Confidence interval for µ
(Large sample case, that is n ≥ 40)
If n is large (n ≥ 40), then a 100(1 − α)% confidence interval for µ is
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}.$$
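When the raw observations are available, x̄ and s can be computed directly. In this sketch `statistics.stdev` gives the sample standard deviation s (divisor n − 1), the n ≥ 40 check mirrors the rule of thumb above, and the data in the usage line (the integers 1 to 50) are purely hypothetical. `large_sample_interval` is our own name.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def large_sample_interval(data, conf=0.95):
    """xbar +/- z_{alpha/2} * s / sqrt(n), intended for n >= 40."""
    n = len(data)
    if n < 40:
        raise ValueError("large-sample interval needs n >= 40")
    xbar = mean(data)
    s = stdev(data)   # sample standard deviation, divisor n - 1
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical data: the integers 1, 2, ..., 50.
lo, hi = large_sample_interval(list(range(1, 51)))
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```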
Example 1: Consider the following summary statistics for the length of
the unsuccessful songs of crickets.
(a) Construct a 95% confidence interval for the true mean length of an unsuccessful song.
(b) Give a 98% interval for the population mean length of an unsuccessful song.
(c) Compare the lengths of the intervals from parts (a) and (b).
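Part (c) can be anticipated before any computation. Assuming both intervals use the large-sample form x̄ ± z_{α/2} s/√n with the same x̄, s and n, only the quantile changes: z_{0.025} for 95% and z_{0.01} for 98%. This short Python sketch checks the ratio of lengths; it does not use the summary statistics themselves.

```python
from statistics import NormalDist

std_normal = NormalDist()
z_95 = std_normal.inv_cdf(1 - 0.025)  # z_{0.025}, used for a 95% interval
z_98 = std_normal.inv_cdf(1 - 0.01)   # z_{0.01}, used for a 98% interval

# Both lengths equal 2 * z * s / sqrt(n) with the same s and n,
# so their ratio depends only on the two quantiles.
ratio = z_98 / z_95
print(f"98% interval / 95% interval length ratio: {ratio:.3f}")
```

The ratio is about 2.326/1.960 ≈ 1.19, so the 98% interval is roughly 19% longer than the 95% one.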
Remark: Frequentist interpretation of the confidence level. Say 1 − α =
95% and that the population mean is µ = 4. Suppose we collect n observations and
compute a 95% confidence interval for µ, say [3.5, 4.2]. Either the
value of the mean is in the interval or it is not. In this case it is,
so with probability 1, µ = 4 falls in the interval [3.5, 4.2]. So what does the
95% represent? As we repeatedly collect other samples, these intervals
will vary. Sometimes the true value is in the interval and sometimes it is
not. However, if we repeat this process a large number of
times, about 95% of the constructed intervals will contain the true value of
the mean. So we say that we are 95% confident that µ is in our observed
confidence interval, since the interval belongs to a class of intervals of which
95% contain the true mean.
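The frequentist interpretation can be checked by simulation. The sketch below assumes a normal population with µ = 4 (as in the remark) and a hypothetical σ = 1 treated as known, draws many samples of size n = 10, builds the interval x̄ ± z_{α/2} σ/√n for each, and reports the fraction that contain µ. The fixed seed only makes the run reproducible; `simulate_coverage` is our own name.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_coverage(mu=4.0, sigma=1.0, n=10, conf=0.95, reps=10_000, seed=1):
    """Fraction of z intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(f"observed coverage: {simulate_coverage():.3f}")
```

The printed fraction should be close to 0.95, though not exactly equal to it for a finite number of repetitions.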
Precision:
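We will use the length of the interval as a measure of the precision:
$$\left(\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) - \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \frac{2 z_{\alpha/2}\,\sigma}{\sqrt{n}}.$$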
Remarks:
• A short interval is interpreted as a precise estimate.
• The precision is a function of the confidence level, the standard deviation and the sample size n.
• The more dispersed the population, the less precise the estimate.
Note: We cannot manipulate σ.
• If we increase the level of confidence, then the estimate is less precise.
• If we increase the sample size, then the estimate is more precise.
• We want both a precise estimate and a high level of confidence. To
achieve both, we fix the level of confidence to a high level, say 95%,
then we choose the appropriate sample size to control the precision.
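The remarks above can be read off from the length formula 2 z_{α/2} σ/√n. In this small Python sketch, σ = 4 and the sample sizes are hypothetical values, and `interval_length` is our own name.

```python
from math import sqrt
from statistics import NormalDist

def interval_length(sigma, n, conf=0.95):
    """Length 2 * z_{alpha/2} * sigma / sqrt(n) of the z interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / sqrt(n)

base = interval_length(sigma=4, n=25)
# Larger n: quadrupling the sample size halves the length.
print(interval_length(sigma=4, n=100) / base)
# Higher confidence: the 99% interval is longer than the 95% one.
print(interval_length(sigma=4, n=25, conf=0.99) / base)
# More dispersion: doubling sigma doubles the length.
print(interval_length(sigma=8, n=25) / base)
```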
Sample Size
If x̄ is used as a point estimate for µ, then we are 100(1 − α)% confident
that the error |x̄ − µ| will not be greater than E if the sample size satisfies
$$n \ge \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2.$$
Remarks:
• If n is not an integer, we round up to the next integer.
• If σ is unknown, then in practice we can use information from past
experiments, or perform a preliminary study and use the
corresponding sample standard deviation s in place of σ in the formula
to calculate n.
Example 2: Suppose that the standard deviation of the length of an unsuccessful
cricket song is σ = 4 minutes. How large should the sample size be in
order to be 95% confident that the error in estimating the mean will not
exceed 1 minute?
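A sketch of the sample-size formula in Python, applied to Example 2 (σ = 4, E = 1, 95% confidence). `math.ceil` implements the round-up rule; `sample_size_mean` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, E, conf=0.95):
    """Smallest integer n with n >= (z_{alpha/2} * sigma / E)^2."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / E) ** 2)   # round up to the next integer

print(sample_size_mean(sigma=4, E=1))  # Example 2
```

Here (1.96 × 4/1)² ≈ 61.5, which rounds up to n = 62; using the exact quantile 1.95996… instead of 1.96 gives the same answer.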
Estimating the mean from a normal population (σ unknown)
For the particular case of a normal population, it is possible to construct
a confidence interval for the mean µ even when σ is unknown, regardless of
the sample size.
Recall: Consider a random sample X₁, . . . , Xₙ from a normal population
with mean µ and standard deviation σ. If we standardize the sample mean,
but use the sample standard deviation S instead of the population standard
deviation σ, then
$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$
does not follow a normal distribution. In fact it follows a distribution known
as Student’s t with ν = n − 1 degrees of freedom.
Thus, if the population is normal, a 100(1 − α)% confidence interval for
µ is
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}.$$
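The Python standard library has no Student t quantile function, so this sketch takes t_{α/2,n−1} as an argument, read from a t table such as Table 17.4 in the textbook (a library such as SciPy could supply it instead). It is applied to the radish growth data of Example 3, where n = 14, so ν = 13 and t_{0.025,13} = 2.160. `t_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """xbar +/- t_crit * s / sqrt(n), where t_crit = t_{alpha/2, n-1} from a t table."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                 # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Radish growth (mm) from Example 3; n = 14, so nu = 13 and t_{0.025,13} = 2.160.
radish = [15, 20, 22, 20, 29, 37, 11, 35, 15, 30, 33, 8, 10, 25]
lo, hi = t_interval(radish, t_crit=2.160)
print(f"95% CI for mean growth: ({lo:.2f}, {hi:.2f}) mm")
```

With these values x̄ ≈ 22.14 mm and s ≈ 9.62 mm, giving an interval of roughly (16.6, 27.7) mm.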
Example 3: Consider the growth (in mm) of radish after three days in
darkness:
15, 20, 22, 20, 29, 37, 11, 35, 15, 30, 33, 8, 10, 25
Below we find a quantile-quantile plot.
(a) Does it appear to be reasonable to model the distribution of the radish
growth with a normal distribution?
(b) Using the following summary data, produce a 95% confidence interval
for the mean growth.
(c) Use the following commands in Minitab to produce a 95% confidence
interval for the mean growth. We assume that the observations are in
column C1.
MTB > onet c1;
SUBC> Confidence 95.
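For part (a), the coordinates of a normal quantile-quantile plot can be computed directly: sort the data and pair each value with a standard normal quantile. This sketch uses plotting positions (i − 0.5)/n, which is one common convention and an assumption here (Minitab and other packages may use slightly different positions). A correlation between the two coordinates near 1 indicates that the points lie close to a straight line; this is an informal check, not a formal test. The helper names are our own.

```python
from math import sqrt
from statistics import NormalDist

def normal_qq_points(data):
    """Pairs (theoretical normal quantile, sorted observation)."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5)/n: one common convention (assumption).
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, xs))

def pearson_r(u, v):
    """Plain Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

radish = [15, 20, 22, 20, 29, 37, 11, 35, 15, 30, 33, 8, 10, 25]
points = normal_qq_points(radish)
r = pearson_r([z for z, _ in points], [x for _, x in points])
print(f"Q-Q correlation: {r:.3f}")
```

For the radish data the correlation comes out at roughly 0.98, consistent with a nearly straight plot.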
Estimating a population proportion p
Consider a sample proportion P̂ = X/n, where X is the number of successes among n independent trials.
So X ∼ B(n, p). But X/n is an average of 0s and 1s, so by the central
limit theorem
$$\hat{P} \sim N\left(\mu_{\hat{P}}, \sigma^2_{\hat{P}}\right) = N\left(p, \frac{p(1-p)}{n}\right)$$
approximately. So
$$1 - \alpha \approx P\left(-z_{\alpha/2} \le \frac{\hat{P} - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right) = P\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le p \le \hat{P} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right).$$
Thus, for large n, a 100(1 − α)% confidence interval for the population
proportion p is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}.$$
Problem: In the above formula the confidence interval uses the unknown
parameter p. In practice, we use its point estimate p̂ instead to get an
approximate confidence interval.
For large n, as judged by the rule of thumb
$$n\hat{p} \ge 5 \quad\text{and}\quad n(1 - \hat{p}) \ge 5,$$
an approximate 100(1 − α)% confidence interval for p is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
Example 4: Consider the following table concerning the location of nests
of sparrows.
Location    vines   building   low tree   cavities
Frequency   56      60         46         49
Compute a 95% confidence interval for the population proportion of sparrows that build nests in vines.
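A sketch of the approximate interval for Example 4 in Python. It assumes that the 211 nests in the table (56 + 60 + 46 + 49) form the sample, so n = 211 and x = 56 nests in vines; the rule-of-thumb check mirrors the condition stated above, and `proportion_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf=0.95):
    """Approximate interval p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    # Rule of thumb from the notes: n p_hat >= 5 and n (1 - p_hat) >= 5.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Nest locations from Example 4 (assumed to be the whole sample).
counts = {"vines": 56, "building": 60, "low tree": 46, "cavities": 49}
n = sum(counts.values())   # 211 nests
lo, hi = proportion_interval(counts["vines"], n)
print(f"95% CI for p(vines): ({lo:.3f}, {hi:.3f})")
```

This gives p̂ = 56/211 ≈ 0.265 and an interval of roughly (0.206, 0.325).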
Sample size: If we use p̂ as a point estimate for p, we are 100(1 − α)%
confident that the error |p̂ − p| will not exceed E when the size of the sample
satisfies
$$n \ge \left(\frac{z_{\alpha/2}}{E}\right)^2 p(1-p).$$
Problem: The formula involves the unknown quantity p.
Solution: Consider p(1 − p) = p − p². It is a downward-opening parabola
with zeros at p = 0 and p = 1, so its axis of symmetry
is p = 1/2. Thus, the maximum value of p(1 − p) is (1/2)(1 − 1/2) = 0.25. If
we compute n with p = 1/2, we obtain a value that is at least
as large as the sample size required for the true proportion p.
Thus, if p̂ is an estimate for p, we are at least 100(1 − α)% confident that
the error |p̂ − p| will not be greater than E with a sample size
$$n \ge \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25).$$
Example 5: Suppose that we would like to be 95% confident that the error in estimating a population proportion p is at most 0.025. Compute
the required sample size.
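A sketch of the conservative sample-size formula, applied to Example 5 (E = 0.025, 95% confidence, worst case p = 1/2). `sample_size_proportion` is a hypothetical helper; passing a prior estimate of p instead of 0.5 would give the first, less conservative formula.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(E, conf=0.95, p=0.5):
    """Smallest n with n >= (z_{alpha/2} / E)^2 * p * (1 - p); p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z / E) ** 2 * p * (1 - p))

print(sample_size_proportion(E=0.025))  # Example 5
```

Here (1.96/0.025)² × 0.25 ≈ 1536.6, which rounds up to n = 1537.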