CHAPTER 3
Section 3.1 Solutions
3.1 This mean is a population parameter; notation is µ.
3.2 This correlation is a population parameter; notation is ρ.
3.3 This proportion is a sample statistic; notation is p̂.
3.4 This proportion is a population parameter; notation is p.
3.5 This mean is a sample statistic; notation is x̄.
3.6 This is a population parameter for a proportion, so the correct notation is p. We have p = 170,000/78,000,000 = 0.00218.
3.7 This is a population parameter for a mean, so the correct notation is µ. We have µ = 30,795/95 = 324.2 students as the average enrollment per charter school.
3.8 This is a sample statistic for a proportion, so the correct notation is p̂. We have p̂ = 0.82.
3.9 This is a sample statistic from a sample of size n = 200 for a correlation, so the correct notation is r. We have r = 0.037.
3.10 This is a sample statistic for a mean, so the correct notation is x̄. We have x̄ = 13.10 phone calls a day.
3.11 This is a population parameter for a correlation, so the correct notation is ρ. We use technology to see that ρ = −0.131.
3.12 We expect the sampling distribution to be centered at the value of the population proportion, so we estimate that the population parameter is p = 0.30. The standard error is the standard deviation of the distribution of sample proportions. The middle 95% of the distribution goes from about 0.16 to 0.44, about 0.14 on either side of p = 0.30. By the 95% rule, we estimate that SE ≈ 0.14/2 = 0.07. (Answers may vary slightly.)
3.13 We expect the sampling distribution to be centered at the value of the population mean, so we estimate that the population parameter is µ = 85. The standard error is the standard deviation of the distribution of sample means. The middle 95% of the distribution goes from about 45 to 125, about 40 on either side of µ = 85. By the 95% rule, we estimate that SE ≈ 40/2 = 20. (Answers may vary slightly.)
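The 95% rule used in 3.12 and 3.13 amounts to dividing the width of the middle 95% of the sampling distribution by 4. A minimal Python sketch of that arithmetic follows; the function name is our own, and the endpoints are the approximate values read off the dotplots above.

```python
def se_from_middle_95(low, high):
    """Estimate the standard error with the 95% rule.

    The middle 95% of a bell-shaped sampling distribution spans
    roughly center - 2*SE to center + 2*SE, so SE is about width / 4.
    """
    return (high - low) / 4


# Endpoints read (approximately) from the sampling distributions.
se_312 = se_from_middle_95(0.16, 0.44)   # solution 3.12
se_313 = se_from_middle_95(45, 125)      # solution 3.13

print(round(se_312, 3), se_313)
```

Because the endpoints are read by eye from a plot, the resulting standard errors are only rough estimates, which is why answers may vary slightly.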
3.14 We expect the sampling distribution to be centered at the value of the population mean, so we estimate that the population parameter is µ = 300. The standard error is the standard deviation of the distribution of sample means. The middle 95% of the distribution goes from about 290 to 310, about 10 on either side of µ = 300. By the 95% rule, we estimate that SE ≈ 10/2 = 5. (Answers may vary slightly.)
3.15 We expect the sampling distribution to be centered at the value of the population proportion, so we estimate that the population parameter is p = 0.80. The standard error is the standard deviation of the distribution of sample proportions. The middle 95% of the distribution goes from about 0.74 to 0.86, about 0.06 on either side of p = 0.80. By the 95% rule, we estimate that SE ≈ 0.06/2 = 0.03. (Answers may vary slightly.)
3.16 (a) We see in the sampling distribution that a sample proportion of p̂ = 0.1 is rare for a sample of this size but similar sample proportions occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally.
(b) We see in the sampling distribution that a sample proportion of p̂ = 0.35 is not at all unusual with samples of this size, so this value is (i): reasonably likely to occur.
(c) We see in the sampling distribution that there are no sample proportions even close to p̂ = 0.6 so this sample proportion is (iii): extremely unlikely to ever occur using samples of this size.
3.17 (a) We see in the sampling distribution that a sample mean of x̄ = 70 is not unusual for samples of this size, so this value is (i): reasonably likely to occur.
(b) We see in the sampling distribution that a sample mean of x̄ = 100 is not unusual for samples of this size, so this value is (i): reasonably likely to occur.
(c) We see in the sampling distribution that a sample mean of x̄ = 140 is rare for a sample of this size but similar sample means occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally.
3.18 (a) We see in the sampling distribution that there are no sample means even close to x̄ = 250 so this sample mean is (iii): extremely unlikely to ever occur using samples of this size.
(b) We see in the sampling distribution that a sample mean of x̄ = 305 is not unusual for samples of this size, so this value is (i): reasonably likely to occur.
(c) We see in the sampling distribution that a sample mean of x̄ = 315 is rare for a sample of this size but similar sample means occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally.
3.19 (a) We see in the sampling distribution that a sample proportion of p̂ = 0.72 is rare for a sample of this size but similar sample proportions occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally.
(b) We see in the sampling distribution that a sample proportion of p̂ = 0.88 is rare for a sample of this size but similar sample proportions occurred several times in this sampling distribution. This value is (ii): unusual but might occur occasionally.
(c) We see in the sampling distribution that there are no sample proportions even close to p̂ = 0.95 so this sample proportion is (iii): extremely unlikely to ever occur using samples of this size.
3.20 The population is all internet users in the US. The population parameter of interest is p, the proportion of internet users who have customized their home page. For this sample, p̂ = 469/1675 = 0.28. Unless we have additional information, the best point estimate of the population parameter p is p̂ = 0.28. To find p exactly, we would have to obtain information about the home page of every internet user in the US, which is unrealistic.
3.21 We are estimating p, the proportion of all US adults who own a laptop computer. The quantity that gives the best estimate is p̂, the proportion of our sample who own a laptop computer. The best estimate is p̂ = 1238/2252 = 0.55. Since the true proportion is unknown, our best estimate for the proportion comes from our sample. We estimate that 55% of all US adults own a laptop computer.
3.22 (a) We are estimating ρ, the correlation between pH and mercury levels of fish for all the lakes in Florida. The quantity that gives the best estimate is our sample correlation r = −0.575. We estimate that the correlation between pH levels and levels of mercury in fish in all Florida lakes is −0.575.
(b) We use an estimate because it would be very difficult and costly to find the exact population correlation. We would need to measure the pH level and the mercury in fish level for all the lakes in Florida, and there are over 7700 of them.
3.23 (a) The value 30 is a population parameter and the notation is µ = 30. The value 27.90 is a sample statistic and the notation is x̄ = 27.90.
(b) The distribution will be bell-shaped and the center will be at the population mean of 30. The sample mean 27.90 would represent one point on the dotplot.
(c) The dotplot will have 1000 dots and each dot will represent the mean for a sample of 75 co-payments.
3.24 (a) The two distributions centered at the population average are probably unbiased, distributions A and D. The two distributions not centered at the population average (µ = 2.61) are biased, dotplots B and C. The sampling for Distribution B gives an average too high, and has large households overrepresented. The sampling for Distribution C gives an average too low and may have been done in an area with many people living alone.
(b) The larger the sample size the lower the variability, so distribution A goes with samples of size 100, and distribution D goes with samples of size 500.
3.25 (a) As the sample size goes up, the accuracy improves, which means the spread goes down. We see that distribution A goes with sample size n = 20, distribution B goes with n = 100, and distribution C goes with n = 500.
(b) We see in dotplot A that quite a few of the sample proportions (when n = 20) are less than 0.25 or greater than 0.45, so being off by more than 0.10 would not be too surprising. While it is possible to be that far away in dotplot B (when n = 100), such points are much more rare, so it would be somewhat surprising for a sample of size n = 100 to miss by that much. None of the points in dotplot C are more than 0.10 away from p = 0.35, so it would be extremely unlikely to be that far off when n = 500.
(c) Many of the points in dotplot A fall outside of the interval from 0.30 to 0.40, so it is not at all surprising for a sample proportion based on n = 20 to be more than 0.05 from the population proportion. Even dotplot B has quite a few values below 0.30 or above 0.40, so being off by more than 0.05 when n = 100 is not too surprising. Such points are rare, but not impossible in dotplot C, so a sample of size n = 500 might possibly give an estimate that is off by more than 0.05, but it would be pretty surprising.
(d) As the sample size goes up, the accuracy of the estimate tends to increase.
3.26 The quantity we are trying to estimate is µm − µo where µm represents the average grade for all fourth-grade students who study mixed problems and µo represents the average grade for all fourth-grade students who study problems one type at a time. The quantity that gives the best estimate is x̄m − x̄o, where x̄m represents the average grade for the fourth-grade students in the sample who studied mixed problems and x̄o represents the average grade for the fourth-grade students in the sample who studied problems one type at a time. The best estimate for the difference in the average grade based on study method is x̄m − x̄o = 77 − 38 = 39.
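The sample-size effect described in 3.25 can be checked with a small simulation. The sketch below is only an illustration, not the method used to produce the dotplots: it assumes a population proportion of p = 0.35 (the value mentioned in 3.25), uses 1000 simulated samples per sample size, and the function name and random seed are our own choices.

```python
import random
import statistics


def simulate_sample_proportions(p, n, reps=1000, seed=0):
    """Draw `reps` samples of size n from a population with proportion p
    and return the list of sample proportions (a simulated sampling
    distribution)."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


# Standard deviation of the sample proportions (the standard error)
# for each of the sample sizes discussed in 3.25.
results = {}
for n in (20, 100, 500):
    props = simulate_sample_proportions(0.35, n)
    results[n] = statistics.stdev(props)
    print(n, round(results[n], 3))
```

The printed standard errors should shrink as n grows, matching the narrowing spread from dotplot A to dotplot C.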
3.27 The quantity we are trying to estimate is pa − pt where pa represents the proportion of adult cell phone users who text message and pt represents the proportion of teen cell phone users who text message. The quantity that gives the best estimate is p̂a − p̂t, where p̂a represents the proportion of the adult cell phone users in the sample of 2,252 who text message and p̂t represents the proportion of teen cell phone users in the sample of 800 who text message. The best estimate for the difference in the proportion who text is p̂a − p̂t = 0.72 − 0.87 = −0.15.
3.28 (a) We expect means of samples of size 30 to be much less spread out than values of budgets of individual movies. This leads us to conclude that Boxplot A represents the sampling distribution and Boxplot B represents the values in a single sample. We can also consider the shapes. Boxplot A appears to be symmetric and Boxplot B appears to be right skewed. Since we expect a sampling distribution to be symmetric and bell-shaped, Boxplot A is the sampling distribution and the skewed Boxplot B shows values in a single sample.
(b) Boxplot B shows the data from one sample of size 30. Each data value represents the budget, in millions of dollars, for one Hollywood movie made in 2011. There are 30 values included in the sample. The budgets range from about 1 million to 145 million for this sample. We see in the boxplot that the median is about 30 million dollars. Since the data are right skewed, we expect the mean to be higher. We estimate the mean to be about 40 million or 45 million. This is the mean of a sample, so we have x̄ ≈ 45 million dollars. (Answers may vary.)
(c) Boxplot A shows the data from a sampling distribution using samples of size 30. Each data value represents the mean of one of these samples. There are 1000 means included in the distribution. They range from about 27 to 79 million dollars.
The center of the distribution is a good estimate of the population parameter, and the center appears to be about µ ≈ 53 million dollars, where µ represents the mean budget, in millions of dollars, for all movies coming out of Hollywood in 2011. (Answers may vary.)
3.29 (a) Both distributions are centered at the population parameter, so at 0.05.
(b) The proportions for samples of size n = 100 go from about 0 to 0.12. The proportions for samples of size n = 1000 go from about 0.025 to 0.07.
(c) The standard error for samples of size n = 100 is about 0.02 (since it appears that about 95% of the data are between 0.01 and 0.09). The standard error for samples of size n = 1000 is about 0.005 (since it appears that about 95% of the data are between 0.04 and 0.06).
(d) A sample proportion of 0.08 is relatively likely from a sample of 100, but extremely unlikely with a sample size of 1,000.
3.30 (a) It is not unlikely to get a sample mean more than 2 screws on either side of 50. It is, however, very unlikely to see a mean below 45 or above 55, so it is unlikely for the sample mean to be more than 5 or 10 screws away.
(b) The distribution shows that finding a mean number of screws equal to 42 from a sample of 10 boxes is very unlikely if the company’s claim is accurate, so, yes, it would be reasonable to conclude that the company’s claim is likely to be incorrect.
(c) The sampling distribution shows us that a mean of 42 screws is very unlikely, but this does not imply that one box containing 42 screws is very unlikely. So a box of 42 screws does not give us information one way or another about the company’s claim.
3.31 (a) Answers will vary. Here is one possible set of randomly selected Points values.
Points: 26, 18, 3, 16, 57
x̄ = 24.0
(b) Answers will vary. Here is another possible set of randomly selected Points values.
Points: 48, 34, 13, 18, 26
x̄ = 27.8
(c) The mean number of points for all 24 players is µ = 26.46 points for the season.
Most sample means found in parts (a) and (b) will be somewhat close to this but not exactly the same.
(d) The distribution will be roughly symmetric with a peak at the center of 26.46. See the figure.
3.32 (a) Answers will vary. Here is one sample:
Minutes: 140.42, 151.72, 127.27, 141.85, 144.32, 161.13, 140.38, 138.25, 137.70, 149.47
x̄ = 143.3
(b) Answers will vary. Here is another sample:
Minutes: 145.05, 135.00, 140.42, 159.02, 161.13, 146.92, 137.93, 137.83, 143.78, 143.15
x̄ = 145.0
(c) The mean of all times of the 76 finishers is µ = 141.1 minutes, or about 2 hours 21 minutes. The sample means found in parts (a) and (b) were probably close to this but not exactly the same.
(d) The distribution will be roughly symmetric with a peak at the center of 141.1. See the figure.
3.33 Answers will vary, but a typical distribution is shown below. The smallest mean is just below 10 and the largest is just below 50 (but answers will vary). The standard deviation of these 1000 sample means is about 7.2.
3.34 Answers will vary, but a typical distribution is shown below. The smallest mean is about 134 minutes and the largest is about 148 minutes. The standard deviation of these sample means is about 2.2.
3.35 (a) This is a population proportion so the correct notation is p. We have p = 41/273 = 0.150.
(b) We expect it to be symmetric and bell-shaped and centered at the population proportion of 0.150.
3.36 (a) This is a population proportion so the correct notation is p. We have p = 181/273 = 0.663.
(b) We expect it to be symmetric and bell-shaped and centered at the population proportion of 0.663.
3.37 (a) The standard error is the standard deviation of the sampling distribution (given in the upper right corner of the sampling distribution box of StatKey) and is likely to be about 0.11. Answers will vary, but the sample proportions should go from 0 to about 0.5 (as in the dotplot below).
In that case, the farthest sample proportion from p = 0.15 is p̂ ≈ 0.5, and it is 0.5 − 0.15 = 0.35 off from the correct population value. In other simulations the maximum proportion might be as high as 0.6 or even 0.7.
(b) The standard error is the standard deviation of the sampling distribution and is likely to be about 0.08. Answers will vary, but the sample proportions should go from 0 to about 0.4 (as shown in the dotplot below). In that case, the farthest sample proportion from p = 0.15 is p̂ ≈ 0.4, and it is 0.4 − 0.15 = 0.25 off from the correct population value. Some simulations might produce even larger discrepancies.
(c) The standard error is the standard deviation of the sampling distribution and is likely to be about 0.05. Answers will vary, but the sample proportions should go from near 0 to about 0.3 (as shown in the dotplot below). In that case, the farthest sample proportion from p = 0.15 is p̂ ≈ 0.3, and it is 0.3 − 0.15 = 0.15 off from the correct population value. Some simulations might have even larger discrepancies.
(d) Accuracy improves as the sample size increases. The standard error gets smaller, the range of values gets smaller, and values tend to be closer to the population value of p = 0.150.
3.38 (a) The standard error is the standard deviation of the sampling distribution (given in the upper right corner of the sampling distribution box in StatKey) and is likely to be about 0.15. Answers will vary, but the sample proportions should go from about 0.2 to about 1.0 (as shown in the dotplot below). In that case, the farthest sample proportion from p = 0.663 is p̂ = 0.2, and it is 0.663 − 0.2 = 0.463 off from the correct population value.
(b) The standard error is the standard deviation of the sampling distribution and is likely to be about 0.11. Answers will vary, but the sample proportions should go from about 0.35 to about 0.95 (as shown in the dotplot below).
In that case, the farthest sample proportion from p = 0.663 is p̂ = 0.35, and it is 0.663 − 0.35 = 0.313 off from the correct population value.
(c) The standard error is the standard deviation of the sampling distribution and is likely to be about 0.06. Answers will vary, but the sample proportions should go from about 0.44 to about 0.84 (as shown in the dotplot below). In that case, the farthest sample proportion from p = 0.663 is p̂ = 0.44, and it is 0.663 − 0.44 = 0.223 off from the correct population value.
(d) Accuracy improves as the sample size increases. The standard error gets smaller, the range of values gets smaller, and values tend to be closer to the population value of 0.663.
Section 3.2 Solutions
3.39 Using ME to represent the margin of error, an interval estimate for µ is x̄ ± ME = 25 ± 3 so an interval estimate of plausible values for the population mean µ is 22 to 28.
3.40 Using ME to represent the margin of error, an interval estimate for p is p̂ ± ME = 0.37 ± 0.02 so an interval estimate of plausible values for the population proportion p is 0.35 to 0.39.
3.41 Using ME to represent the margin of error, an interval estimate for ρ is r ± ME = 0.62 ± 0.05 so an interval estimate of plausible values for the population correlation ρ is 0.57 to 0.67.
3.42 Using ME to represent the margin of error, an interval estimate for µ1 − µ2 is (x̄1 − x̄2) ± ME = 5 ± 8 so an interval estimate of plausible values for the difference in population means is −3 to 13.
3.43 (a) Yes, plausible values of µ are values in the interval.
(b) Yes, plausible values of µ are values in the interval.
(c) No. Since 105.3 is not in the interval estimate, it is a possible value of µ but is not a very plausible one.
3.44 (a) No. Since 0.85 is not in the interval estimate, it is a possible value of p but is not a very plausible one.
(b) Yes, plausible values of p are values in the interval.
(c) No.
Since 0.07 is so far out of the interval estimate, it is an extremely unlikely value of the population parameter p.
3.45 The 95% confidence interval estimate is p̂ ± 2 · SE = 0.32 ± 2(0.04) = 0.32 ± 0.08, so the interval is 0.24 to 0.40. We are 95% confident that the true value of the population proportion p is between 0.24 and 0.40.
3.46 The 95% confidence interval estimate is x̄ ± 2 · SE = 55 ± 2(1.5) = 55 ± 3, so the interval is 52 to 58. We are 95% confident that the true value of the population mean µ is between 52 and 58.
3.47 The 95% confidence interval estimate is r ± 2 · SE = 0.34 ± 2(0.02) = 0.34 ± 0.04, so the interval is 0.30 to 0.38. We are 95% confident that the true value of the population correlation ρ is between 0.30 and 0.38.
3.48 The interval estimate is r ± margin of error = −0.46 ± 0.05, so the interval is −0.51 to −0.41. We are 95% confident that the true value of the population correlation ρ is between −0.51 and −0.41.
3.49 The 95% confidence interval estimate is (x̄1 − x̄2) ± margin of error = 3.0 ± 1.2, so the interval is 1.8 to 4.2. We are 95% confident that the true difference in the population means µ1 − µ2 is between 1.8 and 4.2 (which means we believe that the mean of population 1 is between 1.8 and 4.2 units larger than the mean of population 2).
3.50 The interval estimate is (p̂1 − p̂2) ± margin of error = 0.08 ± 0.03, so the interval is 0.05 to 0.11. We are 95% confident that the true difference in population proportions p1 − p2 is between 0.05 and 0.11 (which means we believe that the proportion for population 1 is between 0.05 and 0.11 larger than the proportion for population 2).
3.51 (a) The information is from a sample, so it is a statistic. It is a proportion, so the correct notation is p̂ = 0.30.
(b) The parameter we are estimating is the proportion, p, of all young people in the US who have been arrested by the age of 23. Using the information in the sample, we estimate that p ≈ 0.30.
(c) If the margin of error is 0.01, the interval estimate is 0.30 ± 0.01 which gives 0.29 to 0.31. Plausible values for the proportion p range from 0.29 to 0.31.
(d) Since the plausible values for the true proportion are those between 0.29 and 0.31, it is very unlikely that the actual proportion is less than 0.25.
3.52 (a) The population is all people ages 18 and older living in the US. The sample is the 147,291 people who were actually contacted and asked whether or not they got health insurance from an employer. The parameter of interest is p, the proportion of the entire population of US adults who get health insurance from an employer. The relevant statistic is p̂ = 0.45, the proportion of people in the sample who get health insurance from an employer.
(b) An interval estimate is found by taking the best estimate (p̂ = 0.45) and adding and subtracting the margin of error (±0.01). We are relatively confident that the population proportion is between 0.44 and 0.46, or that the percent of the entire population that receive health insurance from an employer is between 44% and 46%.
3.53 We are 95% confident that the proportion of all adults in the US who think a car is a necessity is between 0.83 and 0.89.
3.54 (a) The population is all cell phone users age 18 and older in the US. The population parameter of interest is µ, the mean number of text messages sent and received per day. The best point estimate for µ is the sample mean, x̄ = 41.5.
(b) The point estimate is x̄, so a 95% confidence interval is given by:
x̄ ± 2 · SE
41.5 ± 2(6.1)
41.5 ± 12.2
29.3 to 53.7.
We are 95% confident that the mean number of text messages a day for all cell phone users in the US is between 29.3 and 53.7.
3.55 We are estimating p, the proportion of all US adults who agree with the statement that each person has one true love. The best point estimate is p̂ = 735/2625 = 0.28. We find the confidence interval using:
p̂ ± 2 · SE
0.28 ± 2(0.009)
0.28 ± 0.018
0.262 to 0.298.
The margin of error for our estimate is 0.018 or 1.8%. We are 95% sure that the proportion of all US adults who agree with the statement on one true love is between 0.262 and 0.298.
3.56 We are estimating pM − pF, the difference in proportions between males and females. For males, we have p̂M = 372/1213 = 0.31 and for females, we have p̂F = 363/1412 = 0.26. The best point estimate for the difference in proportions is p̂M − p̂F = 0.31 − 0.26 = 0.05. We find the confidence interval using:
(p̂M − p̂F) ± 2 · SE
(0.31 − 0.26) ± 2(0.018)
0.05 ± 0.036
0.014 to 0.086.
We are 95% confident that the difference in proportion agreeing that we have only one true love between males and females is between 0.014 and 0.086. Since zero is not in this interval, it is not one of the plausible values for the difference. We are fairly sure that the difference in these proportions is positive; thus men are more likely than women to agree with the statement on one true love.
3.57 (a) We are 95% confident that the mean response time for game players minus the mean response time for non-players is between −1.8 and −1.2. In other words, mean response time for game players is less than the mean response time for non-players by between 1.2 and 1.8 seconds.
(b) It is not likely that they are basically the same, since the option of the difference in means being zero is not in the interval. The game players are faster, and we can tell this because the confidence interval for µg − µng has only negative values so the mean time is smaller for the game players.
(c) We are 95% confident that the mean accuracy score for game players minus the mean accuracy score for non-players is between −4.2 and 5.8.
(d) It is likely that they are basically the same, since the option of the difference in means being zero is in the interval. There is little discernible difference in accuracy between game players and non-game players.
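The interval calculations in 3.54 to 3.57 all follow the same pattern: statistic ± 2 · SE, followed by a check of whether a value of interest (such as zero for a difference) lies in the interval. A minimal Python sketch of that pattern is given here, using the numbers from 3.56 and the interval reported in 3.57(c); the function names are our own.

```python
def ci_95(statistic, se):
    """Approximate 95% confidence interval: statistic ± 2 * SE."""
    margin = 2 * se
    return statistic - margin, statistic + margin


def contains(interval, value):
    """Return True if value is one of the plausible values in the interval."""
    low, high = interval
    return low <= value <= high


# Solution 3.56: difference in proportions 0.05 with SE = 0.018.
interval = ci_95(0.05, 0.018)
print(tuple(round(v, 3) for v in interval))
print(contains(interval, 0))            # is "no difference" plausible?

# Solution 3.57(c): the reported interval for the difference in accuracy.
print(contains((-4.2, 5.8), 0))
```

When zero lies outside the interval, as in 3.56, "no difference" is not a plausible value; when zero lies inside, as in 3.57(c), the data do not show a discernible difference.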
3.58 (a) This is a matched pairs design since all participants participated in both treatments (canned soup for five days and fresh soup for five days). There might be a great deal of variability in people’s BPA concentrations and a matched pairs experiment reduces that variability.
(b) The population is all people, and we are estimating µC − µF, where µC is mean urinary BPA concentration after eating canned soup for five days and µF is mean urinary BPA concentration after eating fresh soup for five days. Since this is a matched pairs design, we could also use µD where µD is the mean difference in urinary BPA concentration between the two treatments.
(c) We are 95% confident that BPA concentration is, on average, between 19.6 and 25.5 µg/L higher in people who have eaten canned soup for five days than it is in people who have eaten fresh soup for five days.
(d) A larger sample size increases the accuracy, so we would expect the confidence interval to be narrower.
3.59 (a) Using the margin of error, we see that the likely proportion voting for Candidate A ranges from 49% to 59%. Since this interval includes some proportions below 50% as plausible values for the election proportion, we cannot be very confident in the outcome.
(b) Using the margin of error, we see that the likely proportion voting for Candidate A ranges from 51% to 53%. Since all values in this interval are over 50%, we can be relatively confident that Candidate A will win.
(c) Using the margin of error, we see that the likely proportion voting for Candidate A ranges from 51% to 55%. Since all values in this range are over 50%, we can be relatively confident that Candidate A will win.
(d) Using the margin of error, we see that the likely proportion voting for Candidate A ranges from 48% to 68%. Since this interval includes some proportions below 50% as plausible values for the election proportion, we cannot be very confident in the outcome.
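The reasoning in 3.59 reduces to one check: is every plausible value in the interval above 50%? The short Python sketch below applies that check to the four ranges quoted in the solution; the function name and the dictionary layout are our own illustration.

```python
def clears_majority(interval, threshold=50):
    """Return True when the whole interval lies above the threshold (in %)."""
    low, _high = interval
    return low > threshold


# Likely ranges (in percent) for Candidate A from parts (a) to (d) of 3.59.
polls = {"a": (49, 59), "b": (51, 53), "c": (51, 55), "d": (48, 68)}

confident = {part: clears_majority(iv) for part, iv in polls.items()}
print(confident)
```

Only parts (b) and (c) pass the check, which agrees with the conclusions in the solution: a larger sample percentage does not help if the margin of error is wide enough to include values below 50%.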
3.60 (a) The parameter of interest is µ, the mean effect on weight 2.5 years after a month of overeating and being sedentary.
(b) The only way to find the exact value would be to have all members of a population overeat and be inactive for a month and then measure the effect 2.5 years later. This is not a good idea!
(c) The 95% confidence interval using the standard error is x̄ ± 2 · SE = 6.8 ± 2(1.2) = 6.8 ± 2.4. We are 95% sure that the mean weight gain over 2.5 years by people who overeat for a month is between 4.4 and 9.2 pounds.
(d) The margin of error is ±2.4 which means we are relatively confident that our estimate of 6.8 pounds is within 2.4 pounds of the true mean weight gain for the population.
3.61 Let µ represent the mean time for a golden shiner fish to find the yellow mark. A 95% confidence interval is given by
x̄ ± 2 · SE
51 ± 2(2.4)
51 ± 4.8
46.2 to 55.8.
A 95% confidence interval for the mean time for fish to find the mark is between 46.2 and 55.8 seconds. We are 95% sure that the mean time it would take fish to find the target for all fish of this breed is between 46.2 seconds and 55.8 seconds. In other words, the plausible values for the population mean µ are those values between 46.2 and 55.8. Therefore, 60 is not a plausible value for the mean time for all fish, but 55 is.
3.62 We are 95% confident that schools of fish in this situation will end up going with the majority over the opinionated minority only between 9% and 26% of the time. It is not plausible that the schools of fish in this situation are equally likely to go for either option since that would indicate a proportion of p = 0.5 for each option, and 0.5 is not in the range of plausible values. The highly opinionated fish are definitely having an effect!
3.63 We are estimating the difference in population proportions p1 − p2 where p1 is the proportion of times a school of fish will pick the majority option if there is an opinionated minority, a less passionate majority, and also some additional members with no preference and p2 is the proportion of times a school of fish will pick the majority option if there is an opinionated minority and a less passionate majority and no other fish in the group, as described above in Fish Democracies. (We could also have defined the proportions in the other order.) The best point estimate is p̂1 − p̂2 = 0.61 − 0.17 = 0.44. We find a 95% confidence interval as follows:
(p̂1 − p̂2) ± 2 · SE
(0.61 − 0.17) ± 2(0.14)
0.44 ± 0.28
0.16 to 0.72.
We are 95% sure that the proportion of schools of fish picking the majority option is 0.16 to 0.72 higher if fish with no preference are added to the group. If adding the indifferent fish had no effect, then the population proportions with and without the indifferent fish would be the same, which means the difference in proportions would be zero. Since zero is not a plausible value for the difference in proportions, it is very unlikely that adding indifferent fish has no effect. The indifferent fish are helping the majority carry the day.
3.64 (a) Interval is for the mean, not all students.
(b) Interval is for the population mean, not the sample mean.
(c) The interval is not uncertain, only whether or not it captures the population mean.
(d) Interval is trying to capture the mean, not 95% of individual student pulse rates.
(e) Scope of inference could apply to the mean pulse rate for all students at this college, but sample was not taken from all U.S. college students.
(f) The population mean pulse rate is a single fixed value.
(g) Interval is for the population mean, not other sample means.
Section 3.3 Solutions
3.65 (a) No. The value 12 is not in the original sample.
(b) No.
A bootstrap sample has the same sample size as the original sample.
(c) Yes.
(d) No. A bootstrap sample has the same sample size as the original sample.
(e) Yes.
3.66 (a) Yes.
(b) Yes.
(c) No. A bootstrap sample has the same sample size as the original sample.
(d) No. The value 78 is not in the original sample.
(e) Yes.
(f) Yes.
3.67 The distribution appears to be centered near 0.7 so the point estimate is about 0.7. Using the 95% rule, we estimate that the standard error is about 0.1 (since about 95% of the values appear to be within 0.2 of the center). Thus our interval estimate is
Statistic ± 2 · SE
0.7 ± 2(0.1)
0.7 ± 0.2
0.5 to 0.9.
The parameter being estimated is a proportion p, and the interval 0.5 to 0.9 gives plausible values for the population proportion p. Answers may vary.
3.68 The distribution appears to be centered near 25 so the point estimate is about 25. Using the 95% rule, we estimate that the standard error is about 3 (since about 95% of the values appear to be within 6 of the center). Thus our interval estimate is
Statistic ± 2 · SE
25 ± 2(3)
25 ± 6
19 to 31.
The parameter being estimated is a mean µ, and the interval 19 to 31 gives plausible values for the population mean µ. Answers may vary.
3.69 The distribution appears to be centered near 0.4 so the point estimate is about 0.4. Using the 95% rule, we estimate that the standard error is about 0.05 (since about 95% of the values appear to be within 0.1 of the center). Thus our interval estimate is
Statistic ± 2 · SE
0.4 ± 2(0.05)
0.4 ± 0.1
0.3 to 0.5.
The parameter being estimated is a correlation ρ, and the interval 0.3 to 0.5 gives plausible values for the population correlation ρ. Answers may vary.
3.70 The distribution appears to be centered near 6 so the point estimate is about 6. Using the 95% rule, we estimate that the standard error is about 4 (since about 95% of the values appear to be within 8 of the center).
Thus our interval estimate is Statistic ± 2 · SE = 6 ± 2(4) = 6 ± 8, or −2 to 14. The parameter being estimated is a difference in means µ1 − µ2, and the interval −2 to 14 gives plausible values for the difference in population means µ1 − µ2. Answers may vary.
3.71 The statistic for the sample is p̂ = 35/100 = 0.35. Using technology, the standard deviation of the sample proportions for 1000 bootstrap samples is about 0.048 (answers may vary slightly), so we estimate the standard error is SE ≈ 0.048. Thus our interval estimate is Statistic ± 2 · SE = 0.35 ± 2(0.048) = 0.35 ± 0.096, or 0.254 to 0.446. Plausible values of the population proportion range from 0.254 to 0.446.
3.72 The statistic for the sample is p̂ = 180/250 = 0.72. Using technology, the standard deviation of the sample proportions for 1000 bootstrap samples is about 0.028 (answers may vary slightly), so we estimate the standard error is SE ≈ 0.028. Thus our interval estimate is Statistic ± 2 · SE = 0.72 ± 2(0.028) = 0.72 ± 0.056, or 0.664 to 0.776. Plausible values of the population proportion range from 0.664 to 0.776.
3.73 The statistic for the sample is p̂ = 112/400 = 0.28. Using technology, the standard deviation of the sample proportions for 1000 bootstrap samples is about 0.022 (answers may vary slightly), so we estimate the standard error is SE ≈ 0.022. Thus our interval estimate is Statistic ± 2 · SE = 0.28 ± 2(0.022) = 0.28 ± 0.044, or 0.236 to 0.324. Plausible values of the population proportion range from 0.236 to 0.324.
3.74 The statistic for the sample is p̂ = 382/1000 = 0.382. Using technology, the standard deviation of the sample proportions for 1000 bootstrap samples is about 0.015 (answers may vary slightly), so we estimate the standard error is SE ≈ 0.015. Thus our interval estimate is Statistic ± 2 · SE = 0.382 ± 2(0.015) = 0.382 ± 0.03, or 0.352 to 0.412. Plausible values of the population proportion range from 0.352 to 0.412.
3.75 (a) The best point estimate is the sample proportion, p̂ = 26/174 = 0.149.
(b) We can estimate the standard error using the 95% rule, or we can find the standard deviation of the bootstrap statistics in the upper right of the figure. We see that the standard error is about 0.028. Answers will vary slightly with other simulations. (c) We have p̂ ± 2 · SE = 0.149 ± 2(0.028) = 0.149 ± 0.056, or 0.093 to 0.205. We are 95% confident that the percent of all snails of this kind that will live after being eaten by a bird is between 9.3% and 20.5%. (d) Yes, 20% is within the range of plausible values in the 95% confidence interval.
3.76 (a) We find for the 8 values in the table that x̄ = 34.0 and s = 14.63. (b) We put the 8 values on the 8 slips of paper and mix them up. Draw one and write down the value and put it back. Mix them up, draw another, and do this 8 times. The resulting 8 numbers form a bootstrap sample, and the mean of those 8 numbers forms one bootstrap statistic. (c) We expect that the bootstrap distribution will be bell-shaped and centered at approximately 34. (d) The population parameter of interest is the mean, µ, number of ants on all possible peanut butter sandwich bits set near this ant hill. There are other possible answers for the population; for example, you might decide to limit it to the time of day at which the student conducted the study. The best point estimate is the sample mean x̄ = 34. (e) We have x̄ ± 2 · SE = 34.0 ± 2(4.85) = 34.0 ± 9.7, or 24.3 to 43.7. We are 95% confident that the mean number of ants to climb on a bit of peanut butter sandwich left near an ant hill is between 24.3 ants and 43.7 ants.
3.77 (a) The mean is x̄ = 67.59 and the standard deviation is s = 50.02. (b) Select 20 values at random (with replacement) from the original set of skateboard prices and record the mean for those 20 values as the bootstrap statistic. (c) We expect the bootstrap distribution to be symmetric and bell-shaped and to be centered at the sample mean: 67.59.
(d) We find the 95% confidence interval: x̄ ± 2 · SE = 67.59 ± 2(10.9) = 67.59 ± 21.8, or 45.79 to 89.39. We are 95% confident that the mean price of skateboards for sale online is between $45.79 and $89.39.
3.78 The mean of the five sales numbers is x̄ = 605. Using StatKey or other technology, we obtain a bootstrap distribution for sample means like the one below. The standard deviation of these means shows the standard error is about 72.24. This will vary for other sets of bootstrap samples. We find a 95% confidence interval by Statistic ± 2 · SE = 605 ± 2(72.24) = 605 ± 144.5, or 460.5 to 749.5. We would tell the CEO that we are 95% confident the average of all monthly sales of Saabs in the US is between about 460.5 and 749.5 cars.
3.79 (a) The best point estimate for the proportion, p, of rats showing empathy is p̂ = 23/30 = 0.767. (b) On 23 of the slips, we write “yes” (showed empathy) and on the other 7, we write “no”. We then mix up the slips of paper, draw one out and record the result, yes or no. Put the slip of paper back and repeat the process 30 times. This set of yes’s and no’s is our bootstrap sample. The proportion of yes’s in the sample is our bootstrap statistic. (c) Using technology, we see that the bootstrap distribution is bell-shaped and centered approximately at 0.767. We also see that the standard error is about 0.077. (d) We have p̂ ± 2 · SE = 0.767 ± 2(0.077) = 0.767 ± 0.154, or 0.613 to 0.921. For all laboratory rats, we are 95% confident that the proportion of rats that will show empathy in this manner is between 61.3% and 92.1%.
3.80 The sample proportion of females showing compassion is p̂F = 6/6 = 1.0. The sample proportion of males showing compassion is p̂M = 17/24 = 0.708. The best point estimate for the difference in proportions pF − pM is p̂F − p̂M = 1.0 − 0.708 = 0.292. Using StatKey to create a bootstrap distribution for a difference in proportions using this sample data, we see a standard error of 0.094.
We have (p̂F − p̂M) ± 2 · SE = (1.0 − 0.708) ± 2(0.094) = 0.292 ± 0.188, or 0.104 to 0.480. Based on this interval the percentage of female rats likely to show compassion is between 10.4% and 48% higher than the percentage of male rats likely to show compassion. Since zero is not in the interval estimate, it is not very plausible that male and female rats are equally compassionate.
3.81 (a) The standard error is about 0.015 since most (roughly 95%) of the bootstrap distribution is between 0.12 and 0.18, which is about two standard deviations on either side of the center at 0.15. (b) The 95% confidence interval is given by: (p̂t − p̂a) ± 2 · SE = (0.87 − 0.72) ± 2 · (0.015) = 0.15 ± 0.03, or 0.12 to 0.18. We are 95% sure that the proportion of teens who text is between 0.12 and 0.18 higher than the proportion of adults who text.
3.82 Using StatKey or other technology, we create a bootstrap distribution to estimate the difference in means µt − µc where µt represents the mean immune response for tea drinkers and µc represents the mean immune response for coffee drinkers. In the original sample the means are x̄t = 34.82 and x̄c = 17.70, respectively, so the point estimate for the difference is x̄t − x̄c = 34.82 − 17.70 = 17.12. We see from the bootstrap distribution that the standard error for the differences in bootstrap means is about SE = 7.9. This will vary for other sets of bootstrap differences. For a 95% confidence interval, we have (x̄t − x̄c) ± 2 · SE = (34.82 − 17.70) ± 2(7.9) = 17.12 ± 15.8, or 1.32 to 32.92. We are 95% sure that the mean immune response is between 1.32 and 32.92 units higher in tea drinkers than it is in coffee drinkers.
3.83 (a) We are estimating µD, the mean difference in delay time for public transportation for all traffic situations in Dresden, Germany. (b) Put all 24 slips in a container. Pull out one and write down the value and put it back in the container.
Mix up the slips, pull out one and repeat that process until there are 24 values written down. Those 24 values form one bootstrap sample. (c) Record the sample mean for the 24 values in the bootstrap sample. (d) The distribution will be bell-shaped and centered at 61. (e) We calculate the standard deviation of the bootstrap statistics. (f) For a 95% confidence interval, we have x̄D ± 2 · SE = 61 ± 2(3.1) = 61 ± 6.2, or 54.8 to 67.2. We are 95% confident that the average time savings is between 54.8 and 67.2 seconds, if the city moves to the new system.
3.84 (a) For the original sample the mean commute distance is 18.16 miles and the standard deviation is 13.8 miles. (b) One bootstrap distribution of distance means is shown below. It is bell-shaped, centered around 18.2, and shows sample means ranging between about 16.5 and 20.5 miles. (c) The standard error of the means for this set of 2000 bootstrap samples is 0.61 miles. (d) A 95% confidence interval is given by 18.16 ± 2(0.61) = 18.16 ± 1.22, or 16.94 to 19.38. We are 95% sure that the mean commuting distance for all Atlanta commuters is between 16.94 miles and 19.38 miles.
3.85 (a) We use technology to compute the correlation between commute distances and times, r = 0.807, for the 500 data values. (b) The distribution of bootstrap correlations (shown below) is fairly bell-shaped (perhaps a slight left skew), centered around 0.81, and ranges between about 0.70 and 0.90. (c) The standard deviation of the bootstrap correlations for this bootstrap distribution is 0.0355 so the margin of error is 2 · 0.0355 = 0.071. The interval estimate for the correlation between commute distances and time is 0.807 ± 0.071 or between 0.736 and 0.878. (d) The interval is shown on a dotplot of the bootstrap distribution below. The interval includes roughly 95% of the bootstrap correlations.
3.86 (a) The original sample is right-skewed with outliers at 107, 121, 175, and 190 minutes.
(b) We find that the mean is x̄ = 49.6 with a standard deviation of s = 49.1. (c) The distribution of bootstrap means is fairly symmetric and centered near 50. It does not show the same skewness as in the sample. (d) The standard deviation of these bootstrap means is SE = 9.86 (answers will vary for other simulations) which is much smaller than the standard deviation in the sample, s = 49.1. (e) For a 95% confidence interval, we have x̄ ± 2 · SE = 49.6 ± 2 · 9.86 = 49.6 ± 19.7, or 29.9 to 69.3 minutes. (f) The style of play on one team might be more or less aggressive than the league as a whole, so the estimate of mean penalty minutes could be biased.
3.87 The standard deviation for the sample of penalty minutes for n = 24 players is s = 49.1 minutes. For one set of 3000 bootstrap sample standard deviations (shown below), the estimated standard error is SE = 11.0. Based on this the interval estimate is s ± 2 · SE = 49.1 ± 2 · 11.0 = 49.1 ± 22.0, or 27.1 to 71.1. We estimate that the standard deviation in penalty minutes for all NHL players is somewhere between 27.1 and 71.1 minutes.
Section 3.4 Solutions
3.88 (a) We keep the middle 95% of values by chopping off 2.5% from each tail. (b) We keep the middle 90% of values by chopping off 5% from each tail. (c) We keep the middle 98% of values by chopping off 1% from each tail. (d) We keep the middle 99% of values by chopping off 0.5% from each tail.
3.89 (a) We keep the middle 95% of values by chopping off 2.5% from each tail. Since 2.5% of 1000 is 25, we eliminate the 25 highest and the 25 lowest values to create the 95% confidence interval. (b) We keep the middle 90% of values by chopping off 5% from each tail. Since 5% of 1000 is 50, we eliminate the 50 highest and the 50 lowest values to create the 90% confidence interval. (c) We keep the middle 98% of values by chopping off 1% from each tail.
Since 1% of 1000 is 10, we eliminate the 10 highest and the 10 lowest values to create the 98% confidence interval. (d) We keep the middle 99% of values by chopping off 0.5% from each tail. Since 0.5% of 1000 is 5, we eliminate the 5 highest and the 5 lowest values to create the 99% confidence interval.
3.90 To find a 99% confidence interval, we go farther out on either side than for a 95% confidence interval, so (A) is the most likely result.
3.91 To find a 90% confidence interval, we go less far out on either side than for a 95% confidence interval, so (C) is the most likely result.
3.92 If the sample size goes up, we get greater accuracy and the spread of the bootstrap distribution decreases, so the confidence interval will be narrower. Thus, (C) is the most likely result.
3.93 If the sample size is smaller, we have less accuracy and the spread of the bootstrap distribution increases, so the confidence interval will be wider. Thus, (A) is the most likely result.
3.94 As long as the number of bootstrap samples is reasonable, the width of the confidence interval does not change much as we take more or fewer bootstrap samples. Thus, (B) is the most likely result.
3.95 As long as the number of bootstrap samples is reasonable, the width of the confidence interval does not change much as we take more or fewer bootstrap samples. Thus, (B) is the most likely result.
3.96 The sample proportion who agree is p̂ = 35/100 = 0.35. One set of 1000 bootstrap proportions is shown in the figure below. For a 95% confidence interval we need to find the 2.5%-tile and 97.5%-tile, leaving 95% of the distribution in the middle. For this distribution those points are at 0.26 and 0.44, so we are 95% sure that the proportion in the population who agree is between 0.26 and 0.44. Answers will vary slightly for different simulations.
3.97 The sample proportion who agree is p̂ = 180/250 = 0.72. One set of 1000 bootstrap proportions is shown in the figure below.
For a 95% confidence interval we need to find the 2.5%-tile and 97.5%-tile, leaving 95% of the distribution in the middle. For this distribution those points are at 0.664 and 0.776, so we are 95% sure that the proportion in the population who agree is between 0.664 and 0.776. Answers will vary slightly for different simulations.
3.98 The sample proportion who agree is p̂ = 112/400 = 0.28. One set of 1000 bootstrap proportions is shown in the figure below. For a 90% confidence interval we need to find the 5%-tile and 95%-tile, leaving 90% of the distribution in the middle. For this distribution those points are at 0.242 and 0.315, so we are 90% sure that the proportion in the population who agree is between 0.242 and 0.315. Answers will vary slightly for different simulations.
3.99 The sample proportion who agree is p̂ = 382/1000 = 0.382. One set of 1000 bootstrap proportions is shown in the figure below. For a 99% confidence interval we need to find the 0.5%-tile and 99.5%-tile, leaving 99% of the distribution in the middle. For this distribution those points are at 0.343 and 0.423, so we are 99% sure that the proportion in the population who agree is between 0.343 and 0.423. Answers will vary slightly for different simulations.
3.100 (a) The bootstrap distribution is centered at about 100, so we estimate that the sample mean of the original IQ scores is x̄ ≈ 100. (b) Since we are finding a 99% confidence interval, we want to keep the middle 99%. That means we want an interval that includes the middle 990 of the 1000 bootstrap statistics. We need to cut off 5 values on each end, which appears to give an interval from about 88 to 112.
3.101 The 98% confidence interval uses the 1%-tile and 99%-tile from the bootstrap means. We are 98% sure that the mean number of penalty minutes for NHL players in a season is between 29.4 and 76.7 minutes.
3.102 Using StatKey or other technology, we produce a bootstrap distribution such as the figure shown below.
For a 90% confidence interval, we find the 5%-tile and 95%-tile points in this distribution to be 0.730 and 0.774. We are 90% confident that the percent of American adults who think exercise is an important part of daily life is between 73.0% and 77.4%.
3.103 Using StatKey or other technology, we produce a bootstrap distribution such as the figure shown below. For a 99% confidence interval, we find the 0.5%-tile and 99.5%-tile points in this distribution to be 0.467 and 0.493. We are 99% confident that the percent of all Europeans (from these nine countries) who can identify arm or shoulder pain as a symptom of a heart attack is between 46.7% and 49.3%. Since every value in this interval is below 50%, we can be 99% confident that the proportion is less than half.
3.104 The dog got p̂B = 33/36 = 0.917 or 91.7% of the breath samples correct and p̂S = 37/38 = 0.974 or 97.4% of the stool samples correct. (A remarkably high percentage in both cases!) We create a bootstrap distribution for the difference in proportions using StatKey or other technology (as in the figure below) and then find the middle 90% of values. Using the figure, the 90% confidence interval for pB − pS is −0.14 to 0.025. We are 90% confident that the difference between the proportion correct for breath samples and the proportion correct for stool samples for all similar tests we might give this dog is between −0.14 and 0.025. Since a difference of zero represents no difference, and zero is in the interval of plausible values, it is plausible that there is no difference in the effectiveness of breath vs stool samples in having this dog detect cancer.
3.105 Using one bootstrap distribution (as shown below), the standard error is SE = 0.19. The mean tip from the original sample is x̄ = 3.85, so a 95% confidence interval using the standard error is x̄ ± 2 · SE = 3.85 ± 2(0.19) = 3.85 ± 0.38, or 3.47 to 4.23.
For this bootstrap distribution, the 95% confidence interval using the 2.5%-tile and 97.5%-tile is 3.47 to 4.23. We see that the results (rounding to two decimal places) are the same. We are 95% confident that the average tip left at this restaurant is between $3.47 and $4.23.
3.106 (a) A 99% confidence interval is wider than a 90% confidence interval, so the 90% interval is A (3.55 to 4.15) and the 99% interval is B (3.35 to 4.35). (b) We multiply the lower and upper bounds for the average tip by 20 to get the average daily tip revenue (assuming 20 tables per day). With 90% confidence, the interval is 20 · 3.55 = 71 to 20 · 4.15 = 83. With 99% confidence, the interval is 20 · 3.35 = 67 to 20 · 4.35 = 87. We are 90% confident that this waitress will average between 71 and 83 dollars in tip income per day, and we are 99% confident that her mean daily tip income is between 67 and 87 dollars.
3.107 (a) We have p̂m = 27/193 = 0.140 and p̂f = 16/169 = 0.095 so the best point estimate for the difference in population proportions is p̂m − p̂f = 0.140 − 0.095 = 0.045. In this sample, a larger proportion of males smoke. (b) Using StatKey or other technology, we create a bootstrap distribution and find the boundaries for the middle 99% of values. We see that a 99% confidence interval for pm − pf is the interval from about −0.039 to 0.132. We are 99% confident that the difference between males and females in the proportion that smoke is between −0.039 and 0.132.
3.108 (a) The population of interest is all FA premier league football matches. The specific parameter of interest is the proportion of matches the home team wins. (b) Our best estimate for the parameter is 70/120 = 0.583. (c) Using StatKey or other technology, we create a bootstrap distribution as shown below. Taking 5% from each tail, the 90% confidence interval is 0.508 to 0.650. We are 90% sure that the home team wins between 50.8% and 65.0% of all FA premier league football matches.
(d) Using the same bootstrap distribution we see that a 99% confidence interval goes from 0.467 to 0.692. We are 99% sure that the home team wins between 46.7% and 69.2% of all FA premier league football matches. (e) If the population parameter is 0.50 or less, then no home field advantage is present. With the 90% confidence interval we are 90% confident the population parameter is between 0.508 and 0.650. Since this interval does not contain 0.50, we are 90% confident that there is a home field advantage. However the 99% confidence interval does contain 0.50, so we are not 99% confident that there is a home field advantage.
3.109 (a) We have x̄t − x̄c = 34.82 − 17.7 = 17.12, where x̄t represents the sample mean immune response for tea drinkers and x̄c represents the sample mean immune response for coffee drinkers. (b) We are estimating µt − µc where µt represents the mean immune response for all tea drinkers and µc represents the mean immune response for all coffee drinkers. (c) Using StatKey or other technology, we obtain a bootstrap distribution of sample differences in means as shown below. We see that a 90% confidence interval for the difference in means is about 4.17 to 29.70. We are 90% confident that tea drinkers have a mean immune response between 4.17 and 29.70 higher than the mean immune response for coffee drinkers. Answers may vary for other sets of bootstrap differences in means. (d) Using the same bootstrap distribution, we see that a 99% confidence interval for the difference in means is about −3.30 to 37.04. We are 99% confident that the difference in mean immune response is between −3.30 and 37.04. (e) We are 90% confident that tea drinkers have a stronger mean immune response, since all values in the 90% confidence interval are positive, but we are not 99% confident, since some plausible values for the difference in means in that interval are negative.
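The percentile method used in the Section 3.4 solutions (trim an equal number of bootstrap statistics from each tail) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the StatKey procedure itself; the function names `bootstrap_diff_means` and `percentile_interval` and the two small data sets are hypothetical and made up for the example, so the printed intervals will not match the values quoted in the solutions.

```python
import random
import statistics

def bootstrap_diff_means(group1, group2, reps=2000, seed=1):
    """Bootstrap differences in means: resample each group separately, with replacement."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        r1 = rng.choices(group1, k=len(group1))  # same size as the original group
        r2 = rng.choices(group2, k=len(group2))
        diffs.append(statistics.mean(r1) - statistics.mean(r2))
    return diffs

def percentile_interval(boot_stats, level):
    """Keep the middle `level` of the bootstrap statistics by trimming both tails."""
    ordered = sorted(boot_stats)
    cut = round(len(ordered) * (1 - level) / 2)  # values removed from each tail
    return ordered[cut], ordered[-cut - 1]

# Hypothetical measurements (made up for illustration, not the study data).
group_a = [12, 25, 31, 38, 40, 44, 52, 57]
group_b = [3, 9, 14, 18, 22, 27, 35]

diffs = bootstrap_diff_means(group_a, group_b)
for level in (0.90, 0.99):
    low, high = percentile_interval(diffs, level)
    print(f"{level:.0%} interval: {low:.2f} to {high:.2f}")
```

Because the 99% interval trims fewer values from each tail than the 90% interval, it can never be narrower, which is the reason an interval at one confidence level may exclude zero while a higher-confidence interval from the same bootstrap distribution includes it.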
3.110 (a) For one set of 1000 bootstrap sample standard deviations shown below, the 2.5%-tile and 97.5%-tile are 21.5 and 66.9, respectively. Thus we can say with 95% confidence that the standard deviation of the number of penalty minutes awarded to all NHL players in a season is between 21.5 and 66.9 minutes. (b) The midpoint of the interval in part (a) is (21.5 + 66.9)/2 = 44.2 which is less than the standard deviation of the original sample, s = 49.1. In general, an interval based on bootstrap percentiles does not need to be centered at the point estimate.
3.111 The mean area for the sample of ten countries is x̄ = 111.3 thousand square kilometers. Using technology we obtain a bootstrap distribution as shown below. From this distribution the 99% confidence interval is (30.1, 228.3). (Answers will vary.) We are 99% confident that the average country size for all 213 countries is between 30,100 and 228,300 square kilometers.
3.112 (a) We compute the regression line to be predicted PctRural = 29.0 + 0.079 · Area. The slope of the line for this sample is 0.079. (b) Using technology to produce the bootstrap distribution below for the sample slopes, we get a 95% confidence interval for the slope from 0.008 to 0.149. Answers will vary – for this small a sample with strongly skewed data the bootstrap slopes might contain some very extreme values. We are 95% confident that the slope of the regression line for all countries to predict percent rural from land area is between 0.008 and 0.149. (c) The 95% confidence interval from part (b) is (0.008, 0.149), so we don’t quite successfully capture the true population slope of 0. The lower bound is very close to zero, so this answer may vary, depending on the results of the simulation from part (b).
3.113 (a) We see that both cities have a significant number of outliers, with very long commute times. The quartiles and median are all bigger for Atlanta than for St.
Louis, so we expect that the mean commute time is larger for Atlanta. (b) We are estimating the difference between the cities in mean commute time for all commuters, µatl − µstl. We get a point estimate for the difference in mean commute times between the two cities with the difference in the sample means, x̄atl − x̄stl = 29.11 − 21.97 = 7.14 minutes. (c) Since the two samples were taken independently in different cities, for each bootstrap statistic we take 500 Atlanta times with replacement from the original Atlanta data and 500 St. Louis times with replacement from the original St. Louis sample, compute the mean within each sample, and take the difference. This constitutes one bootstrap statistic. (d) A bootstrap distribution for the difference in means with 2000 bootstrap samples is shown in the figure. The standard error for x̄atl − x̄stl, found in the upper corner of the figure, is SE = 1.125. We find an interval estimate for the difference in the population means with 7.14 ± 2 · 1.125 = 7.14 ± 2.25 = (4.89, 9.39). We are 95% confident that the average commuting time for commuters in Atlanta is somewhere between 4.89 and 9.39 minutes more than the average commuting time for commuters in St. Louis.
3.114 (a) The parameter of interest is ρ, the correlation between weight gain during a month of overeating and inactivity and weight gain over the next 2.5 years, for those adults who spend one month (possibly during December) overeating and being sedentary. The best point estimate for this parameter is r = 0.21. (b) To create the bootstrap sample, we sample from the original sample with replacement. In this case, we randomly select one of the 18 ordered pairs, write down the values, and return them to the pile. Then we randomly select one of the 18 ordered pairs (possibly the same one), and write down those values as our second pair. We do this until we have 18 ordered pairs, and that dataset is our bootstrap sample.
(c) For each bootstrap sample, we record the correlation between the one month and 2.5 year weight gains of the 18 ordered pairs. (d) We find the standard error by finding the standard deviation of the 1000 bootstrap correlations. (e) The interval estimate is r ± 2 · SE = 0.21 ± 2(0.14) = 0.21 ± 0.28, so a 95% confidence interval for the population correlation ρ is −0.07 to 0.49. (f) There is a reasonable possibility that there is no correlation at all between the amount of weight gained during the one month intervention and how much weight is gained over the long-term. We know that this is a reasonable possibility because 0 is inside the interval estimate so ρ = 0 is included as one of the plausible values of the population correlation. (g) A 90% confidence interval only needs to include the middle 90% of the statistics in a bootstrap distribution, so it will be narrower than a 95% confidence interval.
3.115 (a) We see that the bootstrap distribution is relatively symmetric and bell-shaped, so it is reasonable to use the distribution to estimate a 95% confidence interval for the standard deviation of prices of all used Mustang cars. Using either the standard error method or the percentile method (estimating values that include the middle 95%), we estimate a 95% confidence interval to be about 7 to 14. We are 95% confident that the standard deviation of all prices of used Mustangs is between 7 thousand dollars and 14 thousand dollars. (b) This bootstrap distribution is not symmetric and is not bell-shaped. It would not be appropriate to use this distribution to find a 95% confidence interval. The sample size is so small (at only n = 5) that the distribution ends up looking a bit bizarre. It is important to always look at the graph of the distribution. These methods apply only when the bootstrap distribution is reasonably symmetric and bell-shaped.
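One rough way to see why the two methods are interchangeable only for a well-behaved bootstrap distribution is to compute both intervals for the same set of bootstrap standard deviations. The sketch below is an illustration under stated assumptions, not the textbook's procedure: the helper names (`bootstrap_sds`, `se_interval`, `percentile_95`) are hypothetical and both price lists are made up, so they are not the Mustang data.

```python
import random
import statistics

def bootstrap_sds(sample, reps=2000, seed=2):
    """Standard deviations of `reps` resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.stdev(rng.choices(sample, k=n)) for _ in range(reps)]

def se_interval(statistic, boot_stats):
    """Statistic plus or minus 2 standard errors (the 95% rule)."""
    se = statistics.stdev(boot_stats)
    return statistic - 2 * se, statistic + 2 * se

def percentile_95(boot_stats):
    """Middle 95% of the bootstrap statistics."""
    ordered = sorted(boot_stats)
    cut = round(0.025 * len(ordered))  # values trimmed from each tail
    return ordered[cut], ordered[-cut - 1]

# Hypothetical prices in thousands of dollars (made up for illustration).
large_sample = [3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18,
                19, 21, 22, 24, 26, 28, 30, 33, 35, 38, 41, 45]
small_sample = [4, 9, 15, 22, 40]

for label, sample in [("n = 25", large_sample), ("n = 5", small_sample)]:
    boot = bootstrap_sds(sample)
    s = statistics.stdev(sample)
    print(label, "SE method:", se_interval(s, boot))
    print(label, "percentile method:", percentile_95(boot))
    # Few distinct values signal a coarse, lumpy bootstrap distribution.
    print(label, "distinct bootstrap SDs:", len({round(b, 9) for b in boot}))
```

With only five data values there are at most 126 different resamples (ignoring order), so the bootstrap standard deviations can take only a limited set of values. A numerical comparison like this is a supplement to, not a replacement for, looking at the graph of the distribution.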
3.116 The bootstrap distribution for the standard deviations (shown below) has at least four completely separate clusters of dots. It is not at all symmetric and bell-shaped so it would not be appropriate to use this bootstrap distribution to find a confidence interval for the standard deviation. The clusters of dots represent the number of times the outlier is included in the bootstrap sample (with the cluster on the left containing statistics from samples in which the outlier was not included, the next one containing statistics from samples that included the outlier once, the next one containing statistics from samples that included the outlier twice, and so on).
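The clustering described in 3.116 can be reproduced with a small simulation that groups each bootstrap standard deviation by how many copies of the outlier its resample contains. This is a sketch under stated assumptions: the function name `bootstrap_sd_by_outlier_count` is hypothetical, and the data set (nine ordinary values plus one extreme value) is made up for illustration rather than taken from the exercise.

```python
import random
import statistics
from collections import defaultdict

def bootstrap_sd_by_outlier_count(sample, outlier, reps=2000, seed=0):
    """Group bootstrap standard deviations by how often `outlier` was drawn."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    n = len(sample)
    for _ in range(reps):
        resample = rng.choices(sample, k=n)  # sample with replacement
        groups[resample.count(outlier)].append(statistics.stdev(resample))
    return groups

# Hypothetical data: nine ordinary values and one extreme outlier.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 16, 100]
groups = bootstrap_sd_by_outlier_count(sample, outlier=100)

for count in sorted(groups):
    sds = groups[count]
    print(f"outlier drawn {count} time(s): {len(sds)} resamples, "
          f"SD from {min(sds):.1f} to {max(sds):.1f}")
```

For data like these, the ranges printed for zero, one, and two copies of the outlier do not overlap, which is what produces the separate clusters of dots in the bootstrap dotplot.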