12.2 THE z TEST OF THE POPULATION MEAN, μ
Introduction
Consider a null hypothesis bootstrap step 1 population, constructed as
in Section 12.1. In the central limit theorem of Chapter 11, bootstrap
sample means approximately follow a standard normal density provided
we standardize using the bootstrap population’s theoretical mean and
standard deviation and the number of observations n used to compute the
sample means is at least 20. In the hypothesis-testing problem of Figure
12.1, we note that the relative frequency histogram of the sample data looks
roughly like a normal density. Although not discussed in Chapter 11, when
such a rough resemblance to the shape of a normal distribution holds,
sample means will approximately follow a normal density even when the
number of observations n used to compute each such mean is much less
than 20. Indeed, the closer the shape of the original data is to normal (in turn
suggesting a population distribution roughly normally shaped), the smaller
the sample size can be with the sample means still approximately obeying a
normal distribution.
Suppose we believe that the bootstrap sample means approximately
obey a normal distribution, because of the central limit theorem when
the sample size is large or because we have evidence that the population
distribution is itself roughly normal. Then to test hypotheses about the
population mean we can use the normal table (Appendix E) to tell us about
the probability behavior of the bootstrap sample means under the null
hypothesis rather than do 100 bootstrap simulations of sample means as we
did in the previous section.
Recall that we made an analogous decision to use the chi-square table
to test hypotheses in Chapter 7 because the relative frequency histograms
of the simulated chi-square statistics appeared to be similar to the smooth
chi-square density. That decision is also supported by a theory of the same
kind as the central limit theorem, and hence using the chi-square density to
compute probabilities was also theoretically justified.
At step 5, then, instead of counting the proportion of the bootstrap-simulated means that are too big (or too small), we will theoretically
compute this proportion using the normal density of Chapter 8. But in
order to use the normal curve, we need a mean and standard deviation for
standardizing the observed sample mean.
We will now address the Key Problem hypothesis using a normal
distribution based z test. Consider how we must modify the six-step
solution to the Key Problem presented in Section 12.1. In step 4 the 100
bootstrap-simulated null hypothesis means have a mean of 98.6048 and a
standard deviation of 0.0606. Thus we do have available the needed mean
and standard deviation to use for standardizing the observed sample mean.
Now we can substitute the following for step 5:
New Step 5. Estimation of the Probability of the Obtained Average
or Less (the Probability of a Successful Trial): In order to compute a
probability for a statistic that obeys a normal distribution, we must first
standardize the statistic by subtracting an appropriate mean and dividing
by an appropriate standard deviation, namely the standard error of the
statistic. The most straightforward way to do this when the statistic has
been repeatedly simulated is to subtract the sample average of the statistic
and divide by the sample standard deviation. As the result of sampling from
the null hypothesis population, we have obtained 100 bootstrap-sampled
means. Thus we can standardize the obtained mean of 98.25 by centering
at (subtracting) the grand sample mean of the 100 bootstrapped means
and normalizing (dividing) by the sample standard deviation of these 100
bootstrapped means. That is, we obtain the z statistic:
z = (obtained mean − mean of 100 bootstrapped means) / (sample standard deviation of 100 bootstrapped means)
  = (98.25 − 98.6048) / 0.0606 = −5.85
Now we apply the central limit theorem to conclude under H0 that such
a z is approximately standard normal, and hence we use the normal table
(Appendix E) to find the probability that a standard normal variable is less
than −5.85. Appendix E shows that this probability is essentially 0—that the
chance that a bootstrap-simulated mean is less than or equal to the obtained
mean of 98.25 is essentially 0. (Note that using the regular step 5 based on
bootstrap simulations also gave the same answer of 0.)
Step 6. Decision: We make the same decision as in step 6 of Section 12.1
because a z of −5.85 implies the probability is essentially 0 that a bootstrap
sample mean is ≤ 98.25, assuming H0. We reject the null hypothesis and
declare that the population mean is not 98.6.
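The new step 5 can be checked in a few lines of Python. This is a minimal sketch, not part of the original text: the values 98.25, 98.6048, and 0.0606 are taken from the example above, and the standard normal probability is computed with the error function rather than read from Appendix E.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

obtained_mean = 98.25   # sample mean of the 130 temperatures
boot_mean = 98.6048     # mean of the 100 bootstrapped means
boot_sd = 0.0606        # standard deviation of the 100 bootstrapped means

z = (obtained_mean - boot_mean) / boot_sd
p = normal_cdf(z)       # chance of a bootstrap mean <= 98.25 under H0
print(round(z, 2), p)   # z is about -5.85; p is essentially 0
```

As in the text, the computed probability is so small that the null hypothesis is rejected.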
The new step 5, which took advantage of the fact that bootstrap sample
means follow a normal distribution, did not save much time, because in
order to find the standardizing mean and standard deviation of the sample
mean we still had to bootstrap-sample the 100 means from the invented
null population, and doing so required 100 × 130 = 13,000 simulated
observations.
The good news is that we can deduce what theoretical mean and
standard deviation should be used to standardize 98.25, based only on the
actual 130 obtained data points—that is, without doing the time-consuming
100 bootstrap simulations (needing 13,000 body temperatures).
Recall from Section 8.6 that when standardizing to produce a statistic it is
better, when possible, to center and divide by the theoretical mean and theoretical
standard deviation of that statistic, which here is SD/√(sample size),
where SD denotes the theoretical standard deviation of the null hypothesis
step 1 box model population. Can we do so? A review of Section 11.5 shows
us we can. There we learned to center X̄ at the mean of the population being
sampled from. In our setting, we have constructed the step 1 population to
have its mean given by the null hypothesis, which is 98.6. Moreover, we
learned to divide by SD/√(sample size), the theoretical standard error of the
sample mean when sampling from the constructed null hypothesis population. Here we have used the fact that
the standard deviation in our constructed step 1 population turns out to be
S, the standard deviation of the original data. This is true because the only
difference between the original sample and the constructed population is
that we have added .35 to each of the sample points, which does not change
the standard deviation.
We have already seen that the sample standard deviation is S = 0.73.
Thus the theoretical standard error of a bootstrap-sampled X̄ is given by

S/√(sample size) = 0.73/√130 = 0.0640
Thus we have the two quantities needed to standardize 98.25. Note that
they are quite close to the bootstrap-sampled values 98.6048 and 0.0606 that
we used before.
These results show that in the equation for z in step 5 we can replace
the bootstrapped 98.6048 by the population mean under H0, which is 98.6,
and we can replace the bootstrap-based standard deviation of the X̄'s, namely
0.0606, by the sample standard deviation of the obtained data divided by
√130, which we have seen is 0.0640. The new standardized z is then
z = (obtained mean − theoretical mean under H0) / (S/√n)
  = (98.25 − 98.6) / 0.0640 = −5.47
The probability of having a standard normal less than −5.47 is essentially 0,
so we again conclude that the null hypothesis is not plausible, and we have
saved a lot of time.
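The theoretical version of the test can be sketched the same way. This Python fragment is an illustration only, with the numbers taken from the text; it reproduces the standard error and z computed above.

```python
import math

obtained_mean = 98.25   # sample mean of the 130 temperatures
mu_0 = 98.6             # population mean under H0
s = 0.73                # sample standard deviation of the data
n = 130

se = s / math.sqrt(n)                            # theoretical standard error, about 0.0640
z = (obtained_mean - mu_0) / se
p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # P(Z <= z)
print(round(se, 4), round(z, 2), p)              # z is about -5.47; p is essentially 0
```

No bootstrap resampling appears anywhere: the 130 original observations and the hypothesized mean are all that is needed.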
The z Test
We now have an alternative approach to testing the null hypotheses in
Section 12.1, one that requires no bootstrap sampling. It works well when
our interest is in making inferences about the mean of the population and
when the sample size is reasonably large—say, n ≥ 30—or for smaller n
when the original data set is roughly shaped like a normal density. The basic
change is in steps 4 and 5. We will illustrate by repeating the example of
the husbands’ and wives’ years of education using this normal-curve-based
approach.
The null hypothesis is exactly the same:
H0 : The average difference in education for the population
of Illinois husband-wife couples is 0.
The six steps are next.
1. Choice of a Model (Population): The population represents the difference (husband − wife) in number of years of education of husband-wife
couples in Illinois, shifted so that the average of these differences in the population is 0—that is, so that H0 holds. Since we will not be doing bootstrap
sampling from a null hypothesis population, we do not need to explicitly
build a null hypothesis box model.
2. Definition of a Trial (Sample): Under the general bootstrap approach,
a trial would consist of randomly choosing 177 differences without replacement from the large created null hypothesis population. Again, we will not
be doing simulation trials, so we can skip this step.
3. Definition of a Successful Trial: The trial is a success if the average of
the 177 differences is larger than the observed average difference 0.24. We
will use this 0.24 in step 5, but we will not be doing the simulation trials of
steps 2 through 4.
4. Repetition of Trials: Instead of actually bootstrap-sampling 100 times
from the null hypothesis population and finding the means, we use our new
z-test approach, which bypasses bootstrap sampling and yields theoretically
justified means and standard deviations for centering the z statistic. In
particular, we need to standardize the observed sample mean of 0.24. The
centering is, as we learned, at the theoretical mean of the null hypothesis
population, namely, 0. The estimated standard error of the sample mean
that we will divide by is given by
SD of the 177 differences / √(sample size) = 2.58/√177 = 2.58/13.304 = 0.194
(We omit details showing that the standard deviation of the 177 differences
is 2.58.) Thus we do not repeat the sampling of 177 couples 100 times!
5. Estimation of the Probability of the Obtained Average or More
(Probability of a Successful Trial): We want to know the chance of
obtaining an average difference as large as or larger than 0.24. Because we
are using the normal curve with mean 0 and standard deviation 0.194, we
have to standardize:
z = (0.24 − 0) / 0.194 = 1.24
The area under the normal curve below 1.24 is 0.8925, so the area above 1.24
is 0.1075. By standardizing z and using a normal distribution table, then,
we have avoided the process of simulating trials to obtain the experimental
probability of success.
6. Decision: If the null hypothesis is true, the chance that a bootstrap
sample mean difference is as high as 0.24 is estimated to be about 0.1.
(Compare that with the 0.09 we found in Section 12.1 using bootstrap
sampling.) Again, we will decide to accept (barely) the null hypothesis that
the average difference in the population is 0.
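Steps 4 and 5 for the education example can likewise be checked in a few lines of Python. This is a sketch with the numbers from the text; the upper-tail area comes from the error function rather than Appendix E.

```python
import math

obtained_mean = 0.24    # observed average (husband - wife) difference
mu_0 = 0.0              # population mean of differences under H0
s = 2.58                # SD of the 177 differences
n = 177

se = s / math.sqrt(n)   # estimated standard error, about 0.194
z = (obtained_mean - mu_0) / se
p_upper = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(Z >= z)
print(round(z, 2), round(p_upper, 4))   # z is about 1.24; upper tail about 0.11
```

The upper-tail probability of roughly 0.1 is well above 0.05, matching the decision to (barely) accept the null hypothesis.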
We now are in the same place with z testing as with doing chi-square
testing using a chi-square table. Namely, no simulation is required in order to
carry out the z test. This z test is heavily used in statistics. It applies whenever
the null hypothesis concerns the population mean and whenever the sample
size is fairly large (n ≥ 30 is the convention usually followed by professional
statisticians) or the population itself is known to be approximately normal
in shape and hence n can be small.
You might ask whether the bootstrap method of Section 12.1 is ever
the preferred method. The answer is a definite yes. Whenever the shape
of the population cannot be assumed to be roughly normal and the size
of the sample is well less than 30, the method of Section 12.1 is the one
many statisticians would use to test a hypothesis about the population
mean. This situation occurs often in statistical applications. Indeed, this
bootstrap approach is becoming a keystone of modern statistics, because
computer power is inexpensive and widely available, thus allowing fast
and inexpensive simulations whenever needed.
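A minimal sketch of that bootstrap approach in Python follows, under the assumption, as in Section 12.1, that the null hypothesis population is built by shifting the data so its mean satisfies H0. The function name and its defaults are our own choices, not notation from the text.

```python
import random
import statistics

def bootstrap_p_value(data, mu_0, trials=100, seed=0):
    """Estimate P(bootstrap sample mean <= observed mean) under H0: mean = mu_0.

    The null population is formed by shifting the data so its mean equals
    mu_0; each trial resamples len(data) values with replacement and
    records whether the resampled mean is as small as the observed mean.
    """
    rng = random.Random(seed)
    obs_mean = statistics.mean(data)
    shifted = [x - obs_mean + mu_0 for x in data]   # mean is now exactly mu_0
    n = len(data)
    hits = sum(
        statistics.mean(rng.choices(shifted, k=n)) <= obs_mean
        for _ in range(trials)
    )
    return hits / trials
```

For a small sample from a clearly non-normal population, a call such as `bootstrap_p_value(sample, hypothesized_mean)` plays the role of steps 2 through 5 of Section 12.1 without appealing to the normal table.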
Between the bootstrap method of Section 12.1 and the z test of Section 12.2
we have two methods from which we can always choose one for any
test involving a hypothesis concerning a population mean. Thus you are
empowered to test a hypothesis about a population mean in any setting.
Sometimes both methods are appropriate, and one can use both and compare
their answers.
SECTION 12.2 EXERCISES
1. The popularity of a television show is found
by taking a random sample of 100 households
around the country. The producers of a particular show believe the rating for their show is
33, meaning 33% of the TV viewers are tuned
to that show. For the sample of 100 households, the mean rating was 31 with a standard
deviation of 4.3. Do you believe the producers
are correct?
2. It is believed that the length of a particular
microorganism is 25.5 micrometers. A set of
64 independent measurements of the length
of the microorganism is made. The mean
value of the measurements is 27.5 micrometers with a standard deviation of 3.2 micrometers. Is the length of the microorganism
really 25.5 micrometers?
3. The chancellor of the University of Illinois at
Urbana believes that undergraduates study,
on average, 20 hours per week. A random
sample of 40 students found the mean study
time was 19.5 hours per week with a standard
deviation of 4.05 hours. Do you believe the
chancellor is correct?
4. For Exercises 1–2 in Section 12.1, use the z
test to determine whether the true population
mean is 69 inches. Is your answer the same as
before?
5. For Exercises 3–4 in Section 12.1, use the z
test to determine whether the true population
mean is 4200 pounds. Is your answer the same
as before?
For additional exercises, see page 731.
12.3 MAKING A WRONG DECISION
In Section 12.1 we decided that the average body temperature in the
population was less than 98.6 and that the difference in years of education
between the husbands and wives could be 0. Were we correct? It is
impossible to know without actually surveying the entire populations in
question (usually a practical impossibility), but we certainly could have
made a mistake. There are two types of errors one can make:
• Type I error: rejecting the null hypothesis when the null hypothesis is true
• Type II error: accepting the null hypothesis when the null hypothesis is not true
Unfortunately “type I error” and “type II error” are not well-chosen terms
in the sense of being easy to memorize, but they are what statisticians say!
As we have already stated, it is fairly standard practice in statistical
work to want the probability of making a type I error to be smaller than
some small probability, such as 0.10 or 0.05 or 0.01. This chosen value is
called the level of significance. Usually 0.05 is used. The way to guarantee
that your chance of making a type I error is less than 0.05, say, is to reject
the null hypothesis only when the probability is less than 0.05 that the null
hypothesis model could yield a result as extreme as or more extreme than
your data. In this case we say that the data are statistically significant at
level 0.05. In the body temperature example in Sections 12.1 and 12.2, we
figured that the chance of obtaining an average temperature as low as or
lower than the data’s 98.25 was essentially 0. Because 0 is well below 0.05,
it is safe to reject the null hypothesis that the average body temperature
in the population is 98.6. By contrast, in the educational levels example of
Section 12.2, the chance of obtaining an average difference between husband
and wife as large as or larger than the data’s 0.24 was 0.09 or 0.10, depending
on which method one relies on. Because these values are greater than 0.05,
we have to “accept” the null hypothesis that the husband-wife difference
is 0, knowing we are not sure that the hypothesis is in fact true but lacking
strong evidence that it is not.
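The decision rule just described can be written down directly. In this sketch the function name is hypothetical; it rejects exactly when the computed probability falls below the chosen level of significance.

```python
def decide(p_value, level=0.05):
    """Reject H0 when the data are statistically significant at `level`."""
    return "reject H0" if p_value < level else "accept H0"

# Body temperature example: probability essentially 0, well below 0.05
print(decide(0.0))    # reject H0
# Education example: probability about 0.10, above 0.05
print(decide(0.10))   # accept H0
```

Choosing a smaller level, such as 0.01, makes a type I error less likely but makes it harder to reject a false null hypothesis, which is the trade-off between the two error types described above.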