Section 7.1: Large-Sample Confidence Interval for a Population Mean
We now know how to construct probability statements about $\bar{X}$ using the sampling distribution of $\bar{X}$ obtained from the Central Limit Theorem (CLT).
Now, we'll use the sampling distribution of $\bar{X}$ to compute something called a confidence interval (CI) for $\mu$. This is an interval estimator of the population mean $\mu$.
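All the confidence intervals for $\mu$ considered here have the following form:
$$(\bar{x} - K\sigma_{\bar{x}},\ \bar{x} + K\sigma_{\bar{x}}),$$
where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $K$ is a constant that depends on $n$ and a confidence level that you specify.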
The confidence level of an interval is the percentage of the time that the interval will enclose the population parameter if the sampling procedure is repeated a large number of times and an interval is computed each time.
The confidence coefficient is the confidence level expressed as a fraction (e.g., a 95% confidence level corresponds to a confidence coefficient of 0.95).
Typical confidence levels are 90%, 95%, and 99%.
How would you report a confidence interval?
Example:
Say we have a 95% CI of $(4.1, 7.9)$ for $\mu$.
Report: We are 95% confident that the true mean of .... lies between 4.1 and 7.9 (units).
How would you interpret a confidence interval?
Recall that we have no idea what the true mean is, so we take a sample and compute the 95% CI, say, $(4.1, 7.9)$.
Say another sample is taken and a new 95% CI is computed. Would you expect the 95% CI from the second sample to be $(4.1, 7.9)$? No.
Say the second sample gives a 95% CI of $(2.7, 6.5)$. Both intervals are 95% CIs for $\mu$.
What does the 95% mean?
The 95% means that if we repeated the whole process (take a sample, compute the interval) 100 times, we would expect about 95 of the 100 intervals to contain the true mean $\mu$. The 95% describes the long-run success rate of the procedure; any single computed interval either contains $\mu$ or does not.
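The repeated-sampling interpretation can be checked by simulation. The sketch below is a minimal illustration under stated assumptions: the population is Normal with a hypothetical mean $\mu = 5$ and standard deviation $\sigma = 2$, each sample has $n = 50$ observations, and each interval is the large-sample interval $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$ described in this section. The function name `coverage_rate` and all parameter values are illustrative choices, not part of the notes.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def coverage_rate(mu=5.0, sigma=2.0, n=50, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated large-sample CIs that contain the true mean mu.

    Assumes a Normal population N(mu, sigma^2); all defaults are illustrative.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar, s = fmean(sample), stdev(sample)
        half_width = z * s / math.sqrt(n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


print(coverage_rate())  # near 0.95, not exactly 0.95
```

The printed fraction should land near 0.95 but will not equal it exactly; with $s$ standing in for $\sigma$ at $n = 50$, it tends to run slightly below 0.95.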
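Now, one must decide what to use for the constant $K$.
For large-sample estimation, we can use the CLT to compute $K$. The CLT says that for $n \ge 30$,
$$\bar{X} \approx N(\mu_{\bar{x}}, \sigma_{\bar{x}}^2),$$
where $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.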
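Keeping the CLT in mind, one may expect to use the Normal distribution in constructing a large-sample CI.
A large-sample $100(1-\alpha)\%$ CI for $\mu$ is given by:
$$(\bar{x} - z_{\alpha/2}\sigma_{\bar{x}},\ \bar{x} + z_{\alpha/2}\sigma_{\bar{x}}) \quad\text{or}\quad \bar{x} \pm z_{\alpha/2}\sigma_{\bar{x}},$$
or, equivalently,
$$\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right).$$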
If $\sigma$ is not known, which is usually the case, and $n$ is large, then an approximate CI is
$$\bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right),$$
where $s$ = sample standard deviation.
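As a minimal sketch, the approximate large-sample interval can be computed directly, with $z_{\alpha/2}$ obtained from the standard normal inverse CDF (Python's `statistics.NormalDist().inv_cdf`) instead of a printed table. The function name `large_sample_ci` and the summary statistics in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def large_sample_ci(xbar, s, n, conf):
    """Approximate 100*conf% CI for mu: xbar +/- z_{alpha/2} * s / sqrt(n).

    Meant for large n (the notes use n >= 30), where s stands in for sigma.
    """
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # area alpha/2 lies to the right of z
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical summary statistics: xbar = 50, s = 8, n = 64, 95% confidence.
lo, hi = large_sample_ci(50, 8, 64, 0.95)
print(round(lo, 2), round(hi, 2))  # 48.04 51.96
```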
How do you find $z_{\alpha/2}$?
$z_{\alpha/2}$ is the z value such that the area to its right under the standard normal curve is $\alpha/2$.
By choosing $z_{\alpha/2}$, we are specifying that
- the confidence level of the CI is $100(1-\alpha)\%$,
- or, equivalently, the confidence coefficient is $(1-\alpha)$.
Thus, in practice we first choose the confidence level of our CI, and that choice determines $z_{\alpha/2}$.
Example: 95% CI $\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
90% CI $\Rightarrow \alpha = 0.10 \Rightarrow \alpha/2 = 0.05$
See Table 7.2 and follow Example 7.1.
Example: In a study to estimate the mean number of years of service of bank executives with degrees in business or economics, 96 such bank executives were sampled and the number of years of service of each was determined. The sample had a mean of $\bar{x} = 23.43$ years and a standard deviation of $s = 10.82$ years.
a) Construct a 90% CI for $\mu$.
$$\bar{x} \pm z_{0.05}\left(\frac{s}{\sqrt{n}}\right) = 23.43 \pm (1.645)\left(\frac{10.82}{\sqrt{96}}\right),$$
giving $23.43 \pm 1.82$, or $(21.61, 25.25)$, as the required CI.
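To reproduce the $z_{\alpha/2}$ lookup and double-check part (a), the sketch below computes $z_{\alpha/2}$ for a few common confidence levels from the standard normal inverse CDF (an assumption standing in for Table 7.2) and then recomputes the 90% interval for the bank-executive data.

```python
import math
from statistics import NormalDist

# z_{alpha/2}: the standard normal value with area alpha/2 to its right.
z_table = {}
for conf in (0.80, 0.90, 0.95, 0.99):
    alpha = 1 - conf
    z_table[conf] = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{conf:.0%} CI: alpha/2 = {alpha / 2:.3f}, z = {z_table[conf]:.3f}")

# Part (a): 90% CI for the bank-executive data.
xbar, s, n = 23.43, 10.82, 96
margin = z_table[0.90] * s / math.sqrt(n)
print(f"{xbar} +/- {margin:.2f} -> ({xbar - margin:.2f}, {xbar + margin:.2f})")
# 23.43 +/- 1.82 -> (21.61, 25.25)
```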
b) Interpret the CI in the context of the problem.
We can be 90% confident that the true mean number of years of service is between 21.61 and 25.25 years.
c) Construct an 80% CI for $\mu$.
$$\bar{x} \pm z_{0.10}\left(\frac{s}{\sqrt{n}}\right) = 23.43 \pm (1.28)\left(\frac{10.82}{\sqrt{96}}\right),$$
giving $23.43 \pm 1.41$, or $(22.02, 24.84)$, as the CI. Note that a decrease in confidence level corresponds to a decrease in the width of the interval (i.e., a narrower interval).
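The effect of the confidence level on width can be seen by computing the full width $2z_{\alpha/2}\,s/\sqrt{n}$ for the bank-executive data at several levels. This is a sketch; `ci_width` is a hypothetical helper, and $z_{\alpha/2}$ comes from the inverse normal CDF rather than a table.

```python
import math
from statistics import NormalDist


def ci_width(s, n, conf):
    """Full width 2 * z_{alpha/2} * s / sqrt(n) of the large-sample CI."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * s / math.sqrt(n)


s, n = 10.82, 96  # bank-executive data from the example
for conf in (0.80, 0.90, 0.95, 0.99):
    print(f"{conf:.0%} CI width: {ci_width(s, n, conf):.2f} years")
```

With the unrounded $z_{0.10} \approx 1.2816$, the 80% interval works out to about $(22.01, 24.85)$; the small difference from $(22.02, 24.84)$ comes only from rounding $z_{0.10}$ to 1.28.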
What would happen if $n$ is increased?
If $\sigma$ and $\alpha$ are fixed, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ decreases as $n$ increases (the value of $\bar{x}$ only shifts the center of the interval), which implies that the width of the CI decreases as $n$ increases.
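A quick numerical check of the effect of $n$, assuming (hypothetically) that the standard deviation stays at 10.82 years while $n$ grows: since the width is proportional to $1/\sqrt{n}$, quadrupling $n$ should halve the width.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.95)  # z_{0.05}, for a 90% CI
s = 10.82  # hypothetical: assume the sample SD stays at this value as n grows

# Width 2 * z * s / sqrt(n); each n below is 4 times the previous one.
widths = {n: 2 * z * s / math.sqrt(n) for n in (96, 384, 1536)}
for n, w in widths.items():
    print(f"n = {n:4d}: width = {w:.2f} years")
```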
Section 7.2: Small-Sample Confidence Interval for a Population Mean
In the large-sample situation ($n \ge 30$), the CLT helped us to formulate a CI for a population mean by making it possible for us to use a standard normal percentile $z_{\alpha/2}$.
However, in some cases, we may not be able to obtain a large sample, but may still want to formulate a CI.
Because the sample size is small, we now have 2 potential problems:
1. We can no longer use the CLT.
2. The sample standard deviation ($s$) may be a poor approximation of $\sigma$.
In the case of the first problem, we will make the following assumption:
If the population being sampled is approximately Normal, the sampling distribution of $\bar{X}$ will be approximately Normal, even for small sample sizes.
Now, if we have a good approximation of $\sigma$ (from past data, for instance), and our assumption of approximate Normality is correct, we can use the following:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
However, if we don't know $\sigma$ and use $s$ computed from the sample to estimate it, and our assumption of approximate Normality is correct, we may use the following CI:
$$\bar{x} \pm t_{\alpha/2,(n-1)}\frac{s}{\sqrt{n}},$$
where $t_{\alpha/2,(n-1)}$ is the percentile of the t-distribution with $(n-1)$ degrees of freedom that has area $\alpha/2$ to its right.
This CI is based on the t-statistic
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}},$$
which is said to have a t-distribution with $(n-1)$ degrees of freedom.
The t-distribution is a continuous, symmetric, mound-shaped distribution, similar to a $N(0,1)$.
However, the t-distribution is a squashed $N(0,1)$. That is, the t-distribution is not as tall as a $N(0,1)$ and it has more area in the tails (see Figure 7.7).
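One can view $t_{\alpha/2,(n-1)}$ as a conservative replacement for $z_{\alpha/2}$: because the t-distribution has more area in the tails, $t_{\alpha/2,(n-1)}$ is larger than $z_{\alpha/2}$, so the resulting CI is wider. We need to be conservative because $s$ may be a poor approximation of $\sigma$.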
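The claim that $t_{\alpha/2,(n-1)}$ exceeds $z_{\alpha/2}$ can be checked numerically. The sketch below is a stand-in for Table VI, not the notes' method: it integrates the t density with Simpson's rule and finds the percentile by bisection, using only the Python standard library. The helper names are hypothetical; in practice a statistics library (for example SciPy's `scipy.stats.t.ppf`) would normally be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(x, df, steps=1000):
    """P(T > x) for x > 0: 0.5 minus the Simpson's-rule integral over [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_percentile(area, df):
    """t_{area, df}: the value with `area` to its right (0 < area < 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection: the upper-tail area decreases as x grows
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > area:
            lo = mid  # too much area to the right, so the percentile is larger
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)  # z_{0.025}
for df in (4, 9, 24, 99):
    print(f"df = {df:2d}: t_0.025 = {t_percentile(0.025, df):.3f}  (z_0.025 = {z:.3f})")
```

The computed percentiles should agree with Table VI to about three decimals (for example, $t_{0.025,24} \approx 2.064$), and each one exceeds $z_{0.025} \approx 1.960$, shrinking toward it as the degrees of freedom grow.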
The variance of a t-distribution depends on the sample size ($n$). The smaller the sample size, the more variability. This dependence of the t-distribution on sample size is expressed through the degrees of freedom (df) of the t-distribution. If we have $n$ observations in the sample, we will have $(n-1)$ df for the t-statistic.
Example: Compute the t percentile given $n = 25$ and confidence level 95%. (Use Table VI and look at Figure 7.8.)
Since $(n-1) = 24$ and $\alpha/2 = 0.025$, we look up Table VI with df = 24 and find $t_{0.025,24}$ to be 2.064.
Example: (Exercise 7.24) Health insurers and the federal government are both putting pressure on hospitals to shorten the average length of stay (LOS) of their patients. A random sample of 20 hospitals in one state had a mean LOS in 1990 of 3.8 days and a standard deviation of 1.2 days.
a) Use a 90% CI to estimate the population mean LOS for the state's hospitals in 1990.
$n = 20$, $\bar{x} = 3.8$ days, $s = 1.2$ days, and $t_{0.05,19} = 1.729$.
A 90% CI for $\mu$ is:
$$\bar{x} \pm t_{0.05,19}\frac{s}{\sqrt{n}} = 3.8 \pm (1.729)\frac{1.2}{\sqrt{20}} = 3.8 \pm 0.46 = (3.34, 4.26)$$
b) Interpret the interval in (a).
We are 90% confident that the true mean LOS in 1990 is between 3.34 and 4.26 days.
c) Suppose $\sigma$ is known to be 1.2 days (say, from past data). How will the interval change?
Since $\sigma$ is known, we may use $z_{\alpha/2}$ instead of a t percentile here (still assuming approximate Normality).
A 90% CI for $\mu$ is:
$$\bar{x} \pm z_{0.05}\frac{\sigma}{\sqrt{n}} = 3.8 \pm (1.645)\frac{1.2}{\sqrt{20}} = 3.8 \pm 0.44 = (3.36, 4.24)$$
Note that the CI in (c) is narrower than the CI in (a), because $z_{0.05} = 1.645$ is smaller than $t_{0.05,19} = 1.729$.
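The arithmetic in parts (a) and (c) can be reproduced in a few lines. This sketch copies $t_{0.05,19} = 1.729$ and $z_{0.05} = 1.645$ from the tables as used in the notes rather than computing them; the helper `interval` is hypothetical.

```python
import math

# Exercise 7.24 summary statistics (LOS in days).
n, xbar = 20, 3.8
s = 1.2      # sample SD, used in part (a)
sigma = 1.2  # "known" sigma, assumed in part (c)

# Percentiles as read from the tables in the notes (not computed here).
t_05_19 = 1.729  # t_{0.05,19}
z_05 = 1.645     # z_{0.05}


def interval(center, percentile, spread, n):
    """center +/- percentile * spread / sqrt(n), rounded to 2 decimals."""
    margin = percentile * spread / math.sqrt(n)
    return round(center - margin, 2), round(center + margin, 2)


t_ci = interval(xbar, t_05_19, s, n)   # part (a): sigma unknown, t with s
z_ci = interval(xbar, z_05, sigma, n)  # part (c): sigma known, z with sigma
print("(a)", t_ci)  # (a) (3.34, 4.26)
print("(c)", z_ci)  # (c) (3.36, 4.24)
```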
Section 7.4: Determining the Sample Size Necessary to Estimate a Population Mean
The question that needs to be answered now is: "How big of a sample do I need to take?"
Usually we want a sample that is just "big" enough to be able to estimate the population mean to within a bound $B$ with $100(1-\alpha)\%$ confidence, i.e., the width of a $100(1-\alpha)\%$ CI for $\mu$ must be at most $2B$.
Thus, if we have the right $n$, then $B$ is equal to one-half the width of the CI, i.e.,
$$B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
Solving this equation for $n$ results in the value
$$n = \frac{(z_{\alpha/2})^2\sigma^2}{B^2}.$$
To use this formula to get a value for $n$, the researcher needs to specify values for $\alpha$, $\sigma$, and $B$. For $\sigma$, researchers usually use a value from a previous study of similar data.
Example: (Exercise 7.61) The USGA tests all new brands of golf balls to assure that they meet USGA specs. One test is to have the balls hit by "Iron Byron". Suppose the USGA wants to estimate the mean distance for a new brand to within 1 yard with 90% confidence. Past tests indicate that the standard deviation of distances hit is approximately 10 yards. How many balls need to be hit to achieve the desired accuracy?
For a 90% confidence interval we need to use $z_{0.05} = 1.645$. We are given that $\sigma = 10$ yards and $B = 1$ yard. Thus:
$$n = \frac{(z_{\alpha/2})^2\sigma^2}{B^2} = \frac{(1.645)^2(10)^2}{1^2} = 270.6025$$
Thus the needed sample size is (rounding up the above answer) at least 271 balls.
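A minimal sketch of the sample-size formula, rounding up with `math.ceil` because $n$ must be a whole number and rounding down would leave the half-width slightly larger than $B$. The function names are hypothetical; the second call swaps the table value 1.645 for the unrounded $z_{0.05}$ from the inverse normal CDF.

```python
import math
from statistics import NormalDist


def z_half_alpha(conf):
    """z_{alpha/2} for a 100*conf% confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def sample_size(z, sigma, bound):
    """Smallest whole n with z * sigma / sqrt(n) <= bound: ceil((z*sigma/B)^2)."""
    return math.ceil((z * sigma / bound) ** 2)


print(sample_size(1.645, 10, 1))               # 271, using the table value
print(sample_size(z_half_alpha(0.90), 10, 1))  # 271, using unrounded z_{0.05}
```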