12.2 THE z TEST OF THE POPULATION MEAN, μ
Introduction
Consider a null hypothesis bootstrap step 1 population, constructed as
in Section 12.1. In the central limit theorem of Chapter 11, bootstrap
sample means approximately follow a standard normal density provided
we standardize using the bootstrap population’s theoretical mean and
standard deviation and the number of observations n used to compute the
sample means is at least 20. In the hypothesis-testing problem of Figure
12.1, we note that the relative frequency histogram of the sample data looks
roughly like a normal density. Although not discussed in Chapter 11, when
such a rough resemblance to the shape of a normal distribution holds,
sample means will approximately follow a normal density even when the
number of observations n used to compute each such mean is much less
than 20. Indeed, the closer the shape of the original data is to normal (in turn
suggesting a population distribution roughly normally shaped), the smaller
the sample size can be with the sample means still approximately obeying a
normal distribution.
Suppose we believe that the bootstrap sample means approximately
obey a normal distribution, because of the central limit theorem when
the sample size is large or because we have evidence that the population
distribution is itself roughly normal. Then to test hypotheses about the
population mean we can use the normal table (Appendix E) to tell us about
the probability behavior of the bootstrap sample means under the null
hypothesis rather than do 100 bootstrap simulations of sample means as we
did in the previous section.
Recall that we made an analogous decision to use the chi-square table
to test hypotheses in Chapter 7 because the relative frequency histograms
of the simulated chi-square statistics appeared to be similar to the smooth
chi-square density. That decision is also supported by a theory of the same
kind as the central limit theorem, and hence using the chi-square density to
compute probabilities was also theoretically justified.
At step 5, then, instead of counting the proportion of the bootstrap-simulated means that are too big (or too small), we will theoretically
compute this proportion using the normal density of Chapter 8. But in
order to use the normal curve, we need a mean and standard deviation for
standardizing the observed sample mean.
We will now address the Key Problem hypothesis using a normal
distribution based z test. Consider how we must modify the six-step
solution to the Key Problem presented in Section 12.1. In step 4 the 100
bootstrap-simulated null hypothesis means have a mean of 98.6048 and a
standard deviation of 0.0606. Thus we do have available the needed mean
and standard deviation to use for standardizing the observed sample mean.
Now we can substitute the following for step 5:
New Step 5. Estimation of the Probability of the Obtained Average
or Less (the Probability of a Successful Trial): In order to compute a
probability for a statistic that obeys a normal distribution, we must first
standardize the statistic by subtracting an appropriate mean and dividing
by an appropriate standard deviation, namely the standard error of the
statistic. The most straightforward way to do this when the statistic has
been repeatedly simulated is to subtract the sample average of the statistic
and divide by the sample standard deviation. As the result of sampling from
the null hypothesis population, we have obtained 100 bootstrap-sampled
means. Thus we can standardize the obtained mean of 98.25 by centering
at (subtracting) the grand sample mean of the 100 bootstrapped means
and normalizing (dividing) by the sample standard deviation of these 100
bootstrapped means. That is, we obtain the z statistic:
z = (obtained mean − mean of 100 bootstrapped means) / (sample standard deviation of 100 bootstrapped means)
  = (98.25 − 98.6048) / 0.0606 = −5.85
Now we apply the central limit theorem to conclude under H0 that such
a z is approximately standard normal, and hence we use the normal table
(Appendix E) to find the probability that a standard normal variable is less
than −5.85. Appendix E shows that this probability is essentially 0—that the
chance that a bootstrap-simulated mean is less than or equal to the obtained
mean of 98.25 is essentially 0. (Note that using the regular step 5 based on
bootstrap simulations also gave the same answer of 0.)
Step 6. Decision: We make the same decision as in step 6 of Section 12.1
because a z of −5.85 implies the probability is essentially 0 that a bootstrap
sample mean is ≤ 98.25, assuming H0. We reject the null hypothesis and
declare that the population mean is not 98.6.
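The new step 5 can be checked in a few lines of Python. This is a minimal sketch, not part of the original text: the values 98.25, 98.6048, and 0.0606 are taken from the example above, and the standard normal probability is computed with the error function rather than read from Appendix E.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

obtained_mean = 98.25   # sample mean of the 130 temperatures
boot_mean = 98.6048     # mean of the 100 bootstrapped means
boot_sd = 0.0606        # standard deviation of the 100 bootstrapped means

z = (obtained_mean - boot_mean) / boot_sd
p = normal_cdf(z)       # chance of a bootstrap mean <= 98.25 under H0
print(round(z, 2), p)   # z is about -5.85; p is essentially 0
```

As in the text, the computed probability is so small that the null hypothesis is rejected.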
The new step 5, which took advantage of the fact that bootstrap sample
means follow a normal distribution, did not save much time, because in
order to find the standardizing mean and standard deviation of the sample
mean we still had to bootstrap-sample the 100 means from the invented
null population, and doing so required 100 × 130 = 13,000 simulated
observations.
The good news is that we can deduce what theoretical mean and
standard deviation should be used to standardize 98.25, based only on the
actual 130 obtained data points—that is, without doing the time-consuming
100 bootstrap simulations (needing 13,000 body temperatures).
Recall from Section 8.6 that when standardizing to produce a statistic it is
better, when possible, to center and divide by the theoretical mean and theoretical
standard deviation of that statistic, which here is SD/√(sample size),
where SD denotes the theoretical standard deviation of the null hypothesis
step 1 box model population. Can we do so? A review of Section 11.5 shows
us we can. There we learned to center X̄ at the mean of the population being
sampled from. In our setting, we have constructed the step 1 population to
have its mean given by the null hypothesis, which is 98.6. Moreover, we
learned to divide by SD/√(sample size), the theoretical standard error of the
sample mean when sampling from the constructed null hypothesis population. Here we have used the fact that
the standard deviation in our constructed step 1 population turns out to be
S, the standard deviation of the original data. This is true because the only
difference between the original sample and the constructed population is
that we have added .35 to each of the sample points, which does not change
the standard deviation.
We have already seen that the sample standard deviation is S = 0.73.
Thus the theoretical standard error of a bootstrap-sampled X̄ is given by

S/√(sample size) = 0.73/√130 = 0.0640
Thus we have the two quantities needed to standardize 98.25. Note that
they are quite close to the bootstrap-sampled values 98.6048 and 0.0606 that
we used before.
These results show that in the equation for z in step 5 we can replace
the bootstrapped 98.6048 by the population mean under H0, which is 98.6,
and we can replace the bootstrap-based standard deviation of the X̄'s, namely
0.0606, by the sample standard deviation of the obtained data divided by
√130, which we have seen is 0.0640. The new standardized z is then
z = (obtained mean − theoretical mean under H0) / (S/√n)
  = (98.25 − 98.6) / 0.0640 = −5.47
The probability of having a standard normal less than −5.47 is essentially 0,
so we again conclude that the null hypothesis is not plausible, and we have
saved a lot of time.
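The theoretical version of the test can be sketched the same way. This Python fragment is an illustration only, with the numbers taken from the text; it reproduces the standard error and z computed above.

```python
import math

obtained_mean = 98.25   # sample mean of the 130 temperatures
mu_0 = 98.6             # population mean under H0
s = 0.73                # sample standard deviation of the data
n = 130

se = s / math.sqrt(n)                            # theoretical standard error, about 0.0640
z = (obtained_mean - mu_0) / se
p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # P(Z <= z)
print(round(se, 4), round(z, 2), p)              # z is about -5.47; p is essentially 0
```

No bootstrap resampling appears anywhere: the 130 original observations and the hypothesized mean are all that is needed.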
The z Test
We now have an alternative approach to testing the null hypotheses in
Section 12.1, one that requires no bootstrap sampling. It works well when
our interest is in making inferences about the mean of the population and
when the sample size is reasonably large—say, n ≥ 30—or for smaller n
when the original data set is roughly shaped like a normal density. The basic
change is in steps 4 and 5. We will illustrate by repeating the example of
the husbands’ and wives’ years of education using this normal-curve-based
approach.
The null hypothesis is exactly the same:
H0 : The average difference in education for the population
of Illinois husband-wife couples is 0.
The six steps are next.
1. Choice of a Model (Population): The population represents the difference (husband − wife) in number of years of education of husband-wife
couples in Illinois, shifted so that the average of these differences in the population is 0—that is, so that H0 holds. Since we will not be doing bootstrap
sampling from a null hypothesis population, we do not need to explicitly
build a null hypothesis box model.
2. Definition of a Trial (Sample): Under the general bootstrap approach,
a trial would consist of randomly choosing 177 differences without replacement from the large created null hypothesis population. Again, we will not
be doing simulation trials, so we can skip this step.
3. Definition of a Successful Trial: The trial is a success if the average of
the 177 differences is larger than the observed average difference 0.24. We
will use this 0.24 in step 5, but we will not be doing the simulation trials of
steps 2 through 4.
4. Repetition of Trials: Instead of actually bootstrap-sampling 100 times
from the null hypothesis population and finding the means, we use our new
z-test approach, which bypasses bootstrap sampling and yields theoretically
justified means and standard deviations for centering the z statistic. In
particular, we need to standardize the observed sample mean of 0.24. The
centering is, as we learned, at the theoretical mean of the null hypothesis
population, namely, 0. The estimated standard error of the sample mean
that we will divide by is given by
SD of the 177 differences / √(sample size) = 2.58/√177 = 2.58/13.304 = 0.194
(We omit details showing that the standard deviation of the 177 differences
is 2.58.) Thus we do not repeat the sampling of 177 couples 100 times!
5. Estimation of the Probability of the Obtained Average or More
(Probability of a Successful Trial): We want to know the chance of
obtaining an average difference as large as or larger than 0.24. Because we
are using the normal curve with mean 0 and standard deviation 0.194, we
have to standardize:
z = (0.24 − 0) / 0.194 = 1.24
The area under the normal curve below 1.24 is 0.8925, so the area above 1.24
is 0.1075. By standardizing z and using a normal distribution table, then,
we have avoided the process of simulating trials to obtain the experimental
probability of success.
6. Decision: If the null hypothesis is true, the chance that a bootstrap
sample mean difference is as high as 0.24 is estimated to be about 0.1.
(Compare that with the 0.09 we found in Section 12.1 using bootstrap
sampling.) Again, we will decide to accept (barely) the null hypothesis that
the average difference in the population is 0.
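Steps 4 and 5 for the education example can likewise be checked in a few lines of Python. This is a sketch with the numbers from the text; the upper-tail area comes from the error function rather than Appendix E.

```python
import math

obtained_mean = 0.24    # observed average (husband - wife) difference
mu_0 = 0.0              # population mean of differences under H0
s = 2.58                # SD of the 177 differences
n = 177

se = s / math.sqrt(n)   # estimated standard error, about 0.194
z = (obtained_mean - mu_0) / se
p_upper = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(Z >= z)
print(round(z, 2), round(p_upper, 4))   # z is about 1.24; upper tail about 0.11
```

The upper-tail probability of roughly 0.1 is well above 0.05, matching the decision to (barely) accept the null hypothesis.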
We now are in the same place with z testing as with doing chi-square
testing using a chi-square table. Namely, no simulation is required in order to
carry out the z test. This z test is heavily used in statistics. It applies whenever
the null hypothesis concerns the population mean and whenever the sample
size is fairly large (n ≥ 30 is the convention usually followed by professional
statisticians) or the population itself is known to be approximately normal
in shape and hence n can be small.
You might ask whether the bootstrap method of Section 12.1 is ever
the preferred method. The answer is a definite yes. Whenever the shape
of the population cannot be assumed to be roughly normal and the size
of the sample is well less than 30, the method of Section 12.1 is the one
many statisticians would use to test a hypothesis about the population
mean. This situation occurs often in statistical applications. Indeed, this
bootstrap approach is becoming a keystone of modern statistics, because
computer power is inexpensive and widely available, thus allowing fast
and inexpensive simulations whenever needed.
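A minimal sketch of that bootstrap approach in Python follows, under the assumption, as in Section 12.1, that the null hypothesis population is built by shifting the data so its mean satisfies H0. The function name and its defaults are our own choices, not notation from the text.

```python
import random
import statistics

def bootstrap_p_value(data, mu_0, trials=100, seed=0):
    """Estimate P(bootstrap sample mean <= observed mean) under H0: mean = mu_0.

    The null population is formed by shifting the data so its mean equals
    mu_0; each trial resamples len(data) values with replacement and
    records whether the resampled mean is as small as the observed mean.
    """
    rng = random.Random(seed)
    obs_mean = statistics.mean(data)
    shifted = [x - obs_mean + mu_0 for x in data]   # mean is now exactly mu_0
    n = len(data)
    hits = sum(
        statistics.mean(rng.choices(shifted, k=n)) <= obs_mean
        for _ in range(trials)
    )
    return hits / trials
```

For a small sample from a clearly non-normal population, a call such as `bootstrap_p_value(sample, hypothesized_mean)` plays the role of steps 2 through 5 of Section 12.1 without appealing to the normal table.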
Between the bootstrap method of Section 12.1 and the z test of Section 12.2
we have two methods from which we can always choose one for any
test involving a hypothesis concerning a population mean. Thus you are
empowered to test a hypothesis about a population mean in any setting.
Sometimes both methods are appropriate, and one can use both and compare
their answers.
SECTION 12.2 EXERCISES
1. The popularity of a television show is found
by taking a random sample of 100 households
around the country. The producers of a particular show believe the rating for their show is
33, meaning 33% of the TV viewers are tuned
to that show. For the sample of 100 households, the mean rating was 31 with a standard
deviation of 4.3. Do you believe the producers
are correct?
2. It is believed that the length of a particular
microorganism is 25.5 micrometers. A set of
64 independent measurements of the length
of the microorganism is made. The mean
value of the measurements is 27.5 micrometers with a standard deviation of 3.2 micrometers. Is the length of the microorganism
really 25.5 micrometers?
3. The chancellor of the University of Illinois at
Urbana believes that undergraduates study,
on average, 20 hours per week. A random
sample of 40 students found the mean study
time was 19.5 hours per week with a standard
deviation of 4.05 hours. Do you believe the
chancellor is correct?
4. For Exercises 1–2 in Section 12.1, use the z
test to determine whether the true population
mean is 69 inches. Is your answer the same as
before?
5. For Exercises 3–4 in Section 12.1, use the z
test to determine whether the true population
mean is 4200 pounds. Is your answer the same
as before?
For additional exercises, see page 731.
12.3 MAKING A WRONG DECISION
In Section 12.1 we decided that the average body temperature in the
population was less than 98.6 and that the difference in years of education
between the husbands and wives could be 0. Were we correct? It is
impossible to know without actually surveying the entire populations in
question (usually a practical impossibility), but we certainly could have
made a mistake. There are two types of errors one can make:
• Type I error: rejecting the null hypothesis when the null hypothesis is true
• Type II error: accepting the null hypothesis when the null hypothesis is not true
Unfortunately “type I error” and “type II error” are not well-chosen terms
in the sense of being easy to memorize, but they are what statisticians say!
As we have already stated, it is fairly standard practice in statistical
work to want the probability of making a type I error to be smaller than
some small probability, such as 0.10 or 0.05 or 0.01. This chosen value is
called the level of significance. Usually 0.05 is used. The way to guarantee
that your chance of making a type I error is less than 0.05, say, is to reject
the null hypothesis only when the probability is less than 0.05 that the null
hypothesis model could yield a result as extreme as or more extreme than
your data. In this case we say that the data are statistically significant at
level 0.05. In the body temperature example in Sections 12.1 and 12.2, we
figured that the chance of obtaining an average temperature as low as or
lower than the data’s 98.25 was essentially 0. Because 0 is well below 0.05,
it is safe to reject the null hypothesis that the average body temperature
in the population is 98.6. By contrast, in the educational levels example of
Section 12.2, the chance of obtaining an average difference between husband
and wife as large as or larger than the data’s 0.24 was 0.09 or 0.10, depending
on which method one relies on. Because these values are greater than 0.05,
we have to “accept” the null hypothesis that the husband-wife difference
is 0, knowing we are not sure that the hypothesis is in fact true but lacking
strong evidence that it is not.
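The decision rule just described can be written down directly. In this sketch the function name is hypothetical; it rejects exactly when the computed probability falls below the chosen level of significance.

```python
def decide(p_value, level=0.05):
    """Reject H0 when the data are statistically significant at `level`."""
    return "reject H0" if p_value < level else "accept H0"

# Body temperature example: probability essentially 0, well below 0.05
print(decide(0.0))    # reject H0
# Education example: probability about 0.10, above 0.05
print(decide(0.10))   # accept H0
```

Choosing a smaller level, such as 0.01, makes a type I error less likely but makes it harder to reject a false null hypothesis, which is the trade-off between the two error types described above.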