4. Repetition of Trials: Instead of actually sampling 100 times from the population and finding the means, we imagine what the results of such sampling would be. We can skip to step 5 and use a t table instead of doing simulations to evaluate p(t > 1.32).

5. Estimation of the Probability of the Obtained Average or More (Probability of a Successful Trial): We want to know the chance of obtaining an average increase in number of hours of sleep as large as or larger than 0.75. This will equal the chance that when we sample from a population with a mean of 0, the t statistic is as large as or larger than 1.32. We have to look to the t table given by Appendix F. As with the chi-square tables, we need to know the number of degrees of freedom (df). Recall that n denotes our sample size. For this t test the number of degrees of freedom is n − 1, which in this case is 9. Looking in the t table, we see that for df = 9, the number 1.38 is in the column under 0.10. What this means is that the chance that the t statistic is above 1.38 is 0.1. We want the chance that the t statistic is above 1.32, which must be slightly more than 0.1.

6. Decision: We found in step 5 that the probability of obtaining from the null-hypothesis population a t statistic of 1.38 or greater is 0.1, so the probability of a t statistic of 1.32 or greater is slightly more than 0.1. A result this likely is not unusual under the null hypothesis. Thus we decide to accept the null hypothesis: we are not convinced that the dextro treatment does any good.

SECTION 12.5 EXERCISES

1. For the data in Table 12.4, test whether the mean population increase in hours due to the laevo drug is 0. The mean of the 10 observations is 2.33, and the standard deviation is 1.90.

2. For the data in Table 12.4, test whether the mean population difference in hours of sleep between laevo and dextro is 0. The mean of the 10 differences is 1.58, and the standard deviation is 1.17.

3. A farmer believes the corn yield from his land will have a mean of 50 bushels per acre. In a random sample of eight plots, the farmer finds a mean yield of 49.25 bushels per acre with a standard deviation of 3.25 bushels per acre. Test the farmer’s claim. (The data have an approximately normal histogram.)

4. Suppose the producer of cattle feed in Exercise 1 of Section 12.4 claims that the mean weight gain will be 160 pounds, not 100 pounds. Use the data from the problem to test the company’s claim.

For additional exercises, see page 731.

12.6 COMPARING TWO POPULATION MEANS

The Bootstrap Approach

Until now we have tested hypotheses for one population. But the most common statistical problem is to compare two population means. That is the topic of this section.

A small study was conducted several years ago to determine whether taking the drug LSD has deleterious effects on one’s chromosomes. Four users of LSD and four controls (nonusers) were studied. A sample of cells was taken from each person, and the percentage of cells in which there was chromosomal breakage was recorded. Here are the data:

          Controls    Users
             3.3       0.9
             4.8       2.6
             6.4       3.4
             7.1      11.5
    Mean     5.40      4.60
    SD       1.47      4.08

That is, the first control had breakage in 3.3% of the studied cells, the second control had 4.8% breakage, and so on. Looking at the mean breakages, we see that the users actually had a slightly smaller amount of breakage. Is that difference due to chance? Because there is such a small sample, one would be inclined to say yes. But how can one formally test the hypothesis?

What, precisely, is the null hypothesis? It is that the mean breakages in the two populations are equal, the populations being the population of users and the population of controls:

    H0: The mean breakage in the control population = the mean breakage in the user population

We will focus on the difference between the sample means of the controls and the users: 5.40 − 4.60 = 0.80. Is this difference small enough to be compatible with the null hypothesis?
What is the chance of getting a difference that large if the two population means are the same?

Let’s turn to the six steps. There are some modifications because we have two populations to worry about instead of just one.

1. Choice of a Model (Two Populations): We have two populations now, the controls and the users. The null hypothesis insists only that the two populations have the same mean; beyond that, we merely expect the populations to have the same distributional shapes as the samples. We have no reason to assume that the populations are normal in shape, so we will not try a t-test approach. Instead, we will use a bootstrap approach, as introduced in Section 12.1. We invent two populations. The control population will be the invented population obtained by replicating the control data:

    Invented control population
    3.3   4.8   6.4   7.1
    3.3   4.8   6.4   7.1
    3.3   4.8   6.4   7.1
    ...   ...   ...   ...
    3.3   4.8   6.4   7.1

The mean of that invented control population is still 5.40. In order for our invented user population to have that same mean, we have to add 0.80 to each value in the sample of users before replicating, so that the new values are 0.9 + 0.8 = 1.7, 2.6 + 0.8 = 3.4, 3.4 + 0.8 = 4.2, and 11.5 + 0.8 = 12.3. These new numbers have a mean of 5.40, as desired. Thus we now have two populations with the same population mean of 5.40. Moreover, we have created two populations whose shapes closely approximate the unknown shapes of the true populations. This is true because the two sample histograms that form our created populations are in fact good estimates of the unknown population histograms.

    Invented user population
    1.7   3.4   4.2   12.3
    1.7   3.4   4.2   12.3
    1.7   3.4   4.2   12.3
    ...   ...   ...   ...
    1.7   3.4   4.2   12.3

2. Definition of a Trial (Sample): A trial consists of randomly choosing without replacement four persons from the control population and four from the user population.
(We could have avoided creating the large null-hypothesis populations and instead sampled with replacement from the small populations; review Section 12.1 if needed to understand this assertion.)

3. Definition of a Successful Trial: The statistic of interest is the difference between the mean of the four controls and the mean of the four users sampled in step 2 (controls minus users). A trial is a success if the difference is 0.8 or more.

4. Repetition of Trials: We repeat steps 2 and 3 100 times. Thus we obtain 100 simulated differences in means. Table 12.5 gives the stem-and-leaf plot of the 100 simulated differences.

5. Estimation of the Probability of the Obtained Difference in Means or More (Probability of a Successful Trial): The difference in means obtained from the original data was 0.8. If we look at the stem-and-leaf plot of step 4, we see that 36 of the simulated differences were 0.8 or more.

6. Decision: From step 5 we estimate a 0.36 chance that, under the null hypothesis, the difference in the means will be as large as or larger than what we obtained. That is, it is not at all unusual to see a difference of 0.8 when the populations have the same mean. Thus we accept the null hypothesis.

Table 12.5 One Hundred Simulated Mean Differences for Sleep Example

    Stem | Leaf
     −6  | 4
     −5  |
     −4  | 310
     −3  | 8887652210
     −2  | 9754222
     −1  | 65433220
     −0  | 9766664443332222221100
      0  | 122223344456689
      1  | 022333378888
      2  | 000122456788899
      3  | 002448
      4  |
      5  | 0

    Key: “−6 | 4” stands for −6.4.

Now is a good time to reiterate that accepting the null hypothesis is not the same as believing that the null hypothesis is actually true. All we have done is show that the null hypothesis and the data are compatible. It could very well be that the reason we do not see much difference in the means is that we have too small a sample. This analysis does not prove that LSD has no effect on chromosomal breakage. Nor does it prove that LSD does affect chromosomal breakage. We cannot conclude much of anything!
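The six bootstrap steps for the LSD data can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used to produce Table 12.5: the function name `bootstrap_two_sample`, the default of 10,000 trials (the text uses 100), and the fixed random seed are all assumptions made here. It samples with replacement from the small samples, which the parenthetical remark in step 2 describes as an alternative to replicating the data into large invented populations.

```python
import random
from statistics import mean


def bootstrap_two_sample(controls, users, trials=10_000, seed=0):
    """Estimate the chance, under the null hypothesis, of a difference in
    sample means (controls minus users) at least as large as the observed one.

    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)
    observed = mean(controls) - mean(users)

    # Step 1: shift the users so both invented populations share one mean.
    shifted_users = [u + observed for u in users]

    successes = 0
    for _ in range(trials):
        # Step 2: one trial = a resample of each group, drawn with replacement.
        c = rng.choices(controls, k=len(controls))
        u = rng.choices(shifted_users, k=len(users))
        # Step 3: success if the simulated difference reaches the observed one
        # (the tiny tolerance guards against floating-point rounding at 0.8).
        if mean(c) - mean(u) >= observed - 1e-9:
            successes += 1

    # Steps 4 and 5: the proportion of successful trials estimates the chance.
    return observed, successes / trials


controls = [3.3, 4.8, 6.4, 7.1]
users = [0.9, 2.6, 3.4, 11.5]
observed, p_estimate = bootstrap_two_sample(controls, users)
print(f"observed difference = {observed:.2f}")
print(f"estimated probability of a difference this large = {p_estimate:.2f}")
```

Because the estimate comes from random resampling, it will vary a little with the seed and the number of trials, just as a second set of 100 hand simulations would not reproduce the 36 successes exactly.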
In fact, the actual means suggest that LSD could actually retard chromosomal damage, something that is very unlikely to be true. Perhaps the real problem is the very small sample size of four from each population used!

Larger Sample Sizes: Another z Test

In Section 11.7 we learned that if both sample sizes are fairly large (each ≥ 30 is a common rule), the difference in the two sample means is approximately normally distributed. Hence, if the sample sizes are fairly large, we can use this fact to find a z statistic to test a hypothesis involving the equality of two population means. Let’s consider an example.

A beginning statistics class (STAT 100) at the University of Illinois at Urbana-Champaign had 104 students, 64 women and 40 men. The average percentage on homework assignments among the women was 78.56, and that among the men was 75.04. Thus the women did on average 3.52 percentage points better than the men. Was this due to chance, or do women in general perform better on the homework? The standard deviations were 25.08 for the men and 19.54 for the women.

We first have to decide what the populations we are trying to compare are. The students in the class are not a random sample from the entire population of the United States, nor even of the university. It does seem reasonable to think of these people as a random sample of all the people who take STAT 100 now or will do so in the near future. We will go on that supposition. The null hypothesis is that the average percentage scores on homework are the same for the populations of women and of men:

    H0: The mean homework score for the population of women = the mean homework score for the population of men

We will follow the six steps, except that we will introduce a new z statistic to replace the bootstrap simulations. That is, we will develop a z test for comparing the means.
Although it is good practice for educational purposes to develop the bootstrap-based steps 1 to 4, we will, as before with z testing, not need to do any simulations.

1. Choice of a Model (Populations): The two populations are the women who take STAT 100 and the men who take STAT 100. The null hypothesis is that the two populations have the same theoretical mean. We invent two populations so that they have a common theoretical mean. The population of men is a large replication of the sample of 40 men. Their mean is 75.04, so to create a population of women with the same mean, we have to subtract 3.52 from each woman’s score before replication, and then replicate the new data as often as we did for the male population (maybe 500 or 1000 times for each original observation).

2. Definition of a Trial (Sample): A trial consists of randomly choosing 64 students without replacement from the women and 40 without replacement from the men. (Again, we could have replaced replication and sampling without replacement with no replication and sampling with replacement.)

3. Definition of a Successful Trial: The statistic is the difference between the mean of the 64 women and the mean of the 40 men sampled in step 2 (women minus men, matching the observed 3.52). A trial is a success if the simulated difference is 3.52 or more.

4. Repetition of Trials: Instead of actually bootstrap sampling from the step 1 population, we appeal to theory, which tells us what the theoretical average and standard deviation of such differences of two sample means are. The theoretical mean of the differences in the two sample means is 0 under the null hypothesis. The theoretical standard deviation of the differences in the two sample means is

$$
\sqrt{\text{theoretical variance of difference in means}}
= \sqrt{\frac{(\text{theoretical SD of men})^2}{\text{number of men}} + \frac{(\text{theoretical SD of women})^2}{\text{number of women}}}
$$

This formula is based on a result that is fully developed in Chapter 14 and also was discussed in Section 11.1. Here we give a brief partial justification.
First note that by theoretical SD of men we mean the theoretical SD of the male population, and similarly for the theoretical standard deviation of the women. Chapter 14 will show that the theoretical variance of either the sum or difference of two independent statistics is the sum of the individual variances. Thus

$$
\begin{aligned}
&\text{Theoretical variance of (sample average of men} - \text{sample average of women)} \\
&\quad = \text{theoretical variance of sample average of men} + \text{theoretical variance of sample average of women} \\
&\quad = \frac{\text{theoretical variance of male population}}{\text{number of men in sample}} + \frac{\text{theoretical variance of female population}}{\text{number of women in sample}} \\
&\quad = \frac{(\text{theoretical SD of men})^2}{\text{number of men in sample}} + \frac{(\text{theoretical SD of women})^2}{\text{number of women in sample}}
\end{aligned}
$$

The women’s and men’s theoretical standard deviations in the formula are estimated from the corresponding sample standard deviations computed from the women’s and men’s data. Thus for our data

$$
\text{Estimated SD of the differences in means} = \sqrt{\frac{(25.08)^2}{40} + \frac{(19.54)^2}{64}} = \sqrt{21.69} = 4.66.
$$

Now we can define the z statistic, using the fact that the subtracted 0 is the hypothesized difference under the null hypothesis:

$$
z = \frac{\text{Difference in sample means} - 0}{\text{Sample SD of difference in means}} = \frac{3.52}{4.66} = 0.76
$$

5. Estimation of the Probability of the Obtained Difference in Means or More (Probability of Success): We wish to calculate the chance that the difference in means is equal to or exceeds 3.52, which is approximately the chance that a standard normal exceeds z = 0.76, so all we need to do is look up 0.76 in the normal table (Appendix E). The area to the left of 0.76 is 0.7764, so the area to the right is 1 − 0.7764 = 0.2236. That is, there is approximately a 0.22 chance of seeing a difference of 3.52 or more just by chance.

6. Decision: The chance calculated in step 5, 0.22, shows that it is not unusual to see such a difference when the null hypothesis that the populations have the same means is true. Thus we accept the null hypothesis.
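The arithmetic of this z test can be checked with a short Python sketch. The function name `two_sample_z` and its parameter names are hypothetical, chosen here for illustration; the sketch uses `statistics.NormalDist` from the Python standard library in place of the normal table in Appendix E. Because it does not round z to 0.76 before looking up the area, its probability may differ from the table-based value in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (estimated SD of the difference in means, z, upper-tail area).

    Hypothetical helper: tests H0 'population means are equal' against the
    alternative that population A has the larger mean.
    """
    # Variance of a difference of independent sample means = sum of variances.
    sd_diff = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    # The subtracted 0 is the hypothesized difference under the null hypothesis.
    z = ((mean_a - mean_b) - 0) / sd_diff
    # Chance that a standard normal is at least z.
    p_upper = 1 - NormalDist().cdf(z)
    return sd_diff, z, p_upper


# STAT 100 homework data: women (group A) versus men (group B).
sd_diff, z, p_upper = two_sample_z(78.56, 19.54, 64, 75.04, 25.08, 40)
print(f"estimated SD of difference = {sd_diff:.2f}")
print(f"z = {z:.2f}")
print(f"chance of a difference this large or larger = {p_upper:.2f}")
```

The same function can be reused for any two-sample problem with large samples by passing in the two means, standard deviations, and sample sizes.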
As an educational “thought experiment,” imagine how you would do a bootstrap-based test of the above hypothesis.

SECTION 12.6 EXERCISES

1. To test the effectiveness of the new weight-loss drug Redux, 40 women were split into two groups: group A, the control group, took a placebo drug, and group B, the experimental group, took Redux. The amount of weight lost by each member of the groups over a six-month period is given below. Explain how you would test the claim that the population mean weight loss of the two groups is the same. (Hint: You cannot use the z test because the sample sizes are not large enough.)

    Group A: 3 4 5 6 7 10 11 12 15 18 19 20 23 24 25 30 33 38 40 42
    Group B: 5 5 7 7 10 10 10 10 15 18 20 24 28 29 38 42 44 49 50 55

2. The following is a stem-and-leaf plot of 100 simulated differences between the population means of the amount of weight lost by the two groups in Exercise 1 (group A − group B). (Group A was used as the control group in the simulation.) Each difference is based on a bootstrap sample of 20 from the invented control population and 20 from the invented treatment population. Using the stem-and-leaf plot, test the claim that the difference between the population means is 0.

    Stem | Leaf
    −11  | 32
    −10  | 54
     −9  | 0
     −8  | 20
     −7  | 74322
     −6  | 840
     −5  | 955431
     −4  | 987443
     −3  | 87630
     −2  | 88775441
     −1  | 885410
     −0  | 993
      0  | 00126789
      1  | 001356679
      2  | 02233589
      3  | 00335
      4  | 02244478
      5  | 45679
      6  | 5799
      7  | 234
      8  | 0

    Key: “8 | 0” stands for 8.0.

3. Scientists want to study the effect of exercise on the amount of weight loss. One hundred people are randomly divided into two equal groups. Both groups follow the same diet plan, but the first group also follows an exercise program. In the exercise group the mean weight lost over three months was 25.2 pounds with a standard deviation of 10 pounds. In the nonexercise group the mean weight lost was 20.4 pounds with a standard deviation of 6.3 pounds. Test the claim that the difference between the population means is 0.

4. On a college campus, a professor wanted to test the effect on learning of using a computer program to teach calculus. In his first class of 43 students, he taught using the computer as an instructional aid. In his second class of 35 students, he taught without using the computer. On the final exam the computer class scored a mean of 78.32 with a standard deviation of 8.07, while the noncomputer class scored a mean of 80.41 with a standard deviation of 8.53. Test the claim that the difference of the population means is 0.

5. Fifty independent measurements of the weight of a chemical compound were made on each of two scales. On the first scale the mean of the 50 measurements was 19.45 grams with a standard deviation of 0.49 gram. On the second scale, the mean of the 50 measurements was 18.42 grams with a standard deviation of 0.27 gram. Test the claim that the difference between the population means is 0.

6. One hundred and fifty high school students were randomly assigned to two groups of 75. Group A used a new math text, and group B used the usual math text. The mean and standard deviation of the SAT math scores for group A were 549.45 and 21.12, and the mean and standard deviation for group B were 539.25 and 20.91. Test the claim that the difference between the population means is 0.

For additional exercises, see page 732.