Confidence Interval Review – Solutions

1) Julia enjoys jogging. She has been jogging over a period of several years, during which time her physical condition has remained relatively constant. Usually, she jogs 2 miles per day. The standard deviation of all of her times is σ = 1.80 minutes. During the past year Julia has recorded the times she required to run 2 miles. She has a random sample of 90 of these times. For these 90 times the mean is x̄ = 15.60 minutes. Let µ be the mean jogging time for the entire distribution of Julia’s 2-mile running times. Find a 95% confidence interval for µ. Follow the inference toolbox.

(1) Choose procedure and check conditions: Use a one-sample z confidence interval for a population mean (because we know the population standard deviation).
(a) SRS: We are told the sample is a random sample of her times.
(b) Normality: We can assume the sampling distribution of her sample mean times (x̄) is at least approximately normal (by the CLT, since n = 90 ≥ 30).
(c) Independence: We are willing to assume the 90 times make up less than 10% of her jogging times, since she has been jogging over the past several years (jogging every day for three years would produce over 900 times).
(2) Do the math: \( \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 15.60 \pm 1.960 \cdot \frac{1.80}{\sqrt{90}} = (15.228, 15.972) \)
(3) Conclusion: We are 95% confident that the mean time for Julia to jog 2 miles is between 15.228 and 15.972 minutes.

In the above problem, what sample size is required to have a margin of error of at most ±0.2 minutes?

\( z^* \frac{\sigma}{\sqrt{n}} \le m \;\Rightarrow\; 1.960 \cdot \frac{1.80}{\sqrt{n}} \le 0.2 \;\Rightarrow\; \frac{3.528}{\sqrt{n}} \le 0.2 \;\Rightarrow\; \frac{3.528}{0.2} \le \sqrt{n} \;\Rightarrow\; 311.1696 \le n \), so n ≥ 312.

2) How much does a sleeping bag cost? Let’s say you want a sleeping bag that should keep you warm in temperatures from 20 to 45°F. A random sample of prices ($) for sleeping bags in this temperature range was taken from Backpacker Magazine: Gear Guide (Vol. 25, Issue 157, No. 2):

80 90 100 120 75 37 30 23 100 110 105 95 105 60 110 120 95 90 60 70

Find a 90% confidence interval for the mean price µ of all sleeping bags used for this temperature range. Follow the steps of the inference toolbox.

(1) Choose procedure and check conditions: Use a one-sample t confidence interval for the population mean (we do not know the population standard deviation and must estimate it with the sample standard deviation s).
(a) SRS: We are told that this sample is a random sample from the population of sleeping bags in this temperature range.
(b) Normality:
[Figures: boxplot of Price; normal probability plot of Price (Percent vs. Price)]
The normal probability plot of the prices looks fairly linear with no apparent outliers or severe skewness; the boxplot shows no outliers but is not quite symmetric. With a sample size of 20 and no strong skewness or influential outliers, we can still use the t procedures, because the sampling distribution of sample mean prices (x̄) should be at least approximately normal.
(c) Independence: We assume there are more than 200 different prices of sleeping bags in this temperature range, so the sample makes up less than 10% of the total population, and the assumption of independence of observations is justified.
(2) Do the math: \( \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 83.75 \pm 1.729 \cdot \frac{28.97}{\sqrt{20}} = (72.55, 94.95) \), where t* = 1.729 is the critical value for df = 19.
(3) Conclusion: We are 90% confident that the mean price for sleeping bags useful for this temperature range is between 72.55 and 94.95 dollars.

3) Suppose that 800 students were selected at random from a student body of 20,000 college students and given shots to prevent a certain type of flu. All 800 students were exposed to the flu, and 600 of them did not get the flu. Let p represent the probability that the shot will be successful for any single student selected at random from the entire population of 20,000. Find a 99% confidence interval for p.
Follow the steps of the inference toolbox.

(1) Choose procedure and check conditions: Use a one-sample z confidence interval for a population proportion, with p̂ = 600/800 = .75.
(a) SRS: We are told the students were selected at random from this particular population of college students.
(b) Normality: 800(.75) = 600 > 10 and 800(.25) = 200 > 10, so we are willing to assume the sampling distribution of sample proportions of students avoiding the flu (p̂) is approximately normal.
(c) Independence: The sample of 800 makes up less than 10% of the total student body of 20,000, so we are willing to consider the individual observations independent of one another.
(2) Do the math: \( \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .75 \pm 2.576 \sqrt{\frac{(.75)(.25)}{800}} = (.71057, .78943) \)
(3) Conclusion: We are 99% confident that the proportion of students who do not get the flu after receiving the flu shot is between .71057 and .78943.

4) For large U.S. companies, what percent of their total income comes from foreign sales? A random sample of technology companies (IBM, Hewlett-Packard, Intel, and others) gave the following data:

62.8 55.7 47.0 59.6 55.3 41.0 65.1 51.1 53.4 50.8 48.5 44.6 49.4 61.2 39.3 41.8

Another independent random sample of basic consumer product companies (Goodyear, Sara Lee, H.J. Heinz, and others) gave the following data:

28.0 30.5 34.2 50.3 11.11 28.8 40.0 44.9 40.7 60.1 23.1 21.3 42.8 18.0 36.9 28.0 32.5

Find each sample mean, sample standard deviation, and sample size.

Technology: n = 16, x̄ = 51.66, s = 7.93
Consumer: n = 17, x̄ = 33.60, s = 12.26

Find a 90% confidence interval for µ1 − µ2, where µ1 represents the technology companies and µ2 represents the consumer companies. Follow the inference toolbox.

(1) Choose procedure and check conditions: Use a two-sample t confidence interval for the difference in two means.
(a) SRS: Both samples are said to be random samples from their respective populations and are independent of one another.
(b) Normality: For the technology sample, the normal probability plot of the sample data is roughly linear, so we assume the sampling distribution of sample mean percentages (x̄1) is approximately normal. The normal probability plot for the consumer companies is also roughly linear, so we can again assume the sampling distribution of sample mean percentages (x̄2) is approximately normal.
[Figure: normal probability plot of Tech and Consumer (Percent vs. Data)]
(2) Do the math: \( (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (51.66 - 33.60) \pm t^* \sqrt{\frac{7.93^2}{16} + \frac{12.26^2}{17}} \). The standard error is about 3.574. The interval reported here, (11.981, 24.143), corresponds to software degrees of freedom (df ≈ 27.6, t* ≈ 1.702); the conservative table value t* = 1.753 (df = 15) gives the slightly wider interval (11.80, 24.32).
(3) Conclusion: We are 90% confident that the difference between the mean percentage of income that comes from foreign sales for technology companies and the mean percentage of income that comes from foreign sales for consumer product companies is between 11.981 and 24.143 percentage points.

5) The U.S. Department of Commerce Environmental Data Service gave the following information about average temperature (°F) in January in Phoenix, Arizona, for the past 39 years. Assume σ = 3.04 °F.

52.8 52.3 50.4 52.2 51.6 50.7 52.7 43.7 54.0 53.8 49.7 52.4 54.5 49.9 51.5 48.5 52.6 48.4 53.3 51.2 46.7 52.4 50.7 53.0 51.7 54.2 51.4 43.2 48.7 49.6 51.4 56.0 54.6 42.8 54.0 52.3 48.5 51.9 54.9

Find a 99% confidence interval for the January mean temperature in Phoenix. Follow the inference toolbox.

(1) Choose procedure and check conditions: Use a one-sample z confidence interval for a population mean (because we are given the population standard deviation).
(a) SRS: We are not told that this is an SRS. However, it is probably safe to assume this sample is representative of the January temperatures in Phoenix.
(b) Normality: We are not told that the distribution of temperatures is normal. However, because our sample size is relatively large (n = 39), the Central Limit Theorem assures us that the sampling distribution of sample mean temperatures (x̄) is at least approximately normal.
(c) Independence: We are willing to assume that the observations are independent of one another (temperatures from year to year probably are, for the most part).
(2) Do the math: \( \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 51.133 \pm 2.576 \cdot \frac{3.04}{\sqrt{39}} = (49.879, 52.387) \)
(3) Conclusion: We are 99% confident that the true mean temperature in Phoenix, Arizona in January is between 49.879 and 52.387 °F.

6) Suppose an archaeologist discovers only 7 fossil skeletons from a previously unknown species of miniature horse. Reconstructions of the skeletons of the 7 miniature horses show the shoulder heights (in cm) to be:

45.3 47.1 44.2 46.8 46.5 45.5 47.6

Even though one condition is violated, find a 99% confidence interval for µ, the mean shoulder height of the entire population of such horses. Follow the inference toolbox. In your work, mention the condition that is violated and state why you can still carry on with the procedure.

(1) Choose a procedure and check conditions: Use a one-sample t confidence interval for a population mean.
(a) SRS: This is not quite an SRS: it is made up of all 7 skeletons that are known to exist. However, we will carry out the t procedure, keeping in mind that generalizing to the entire extinct population might be somewhat of a stretch.
(b) Normality: Although we only have 7 data points, the normal probability plot does not show any obvious deviations from normality, so we are willing to conclude that the sampling distribution of sample mean shoulder heights (x̄) is approximately normal. We should proceed with caution because our sample size is so small (i.e., our results might be somewhat inaccurate).
[Figure: normal probability plot of Shoulder Height (Percent vs. Shoulder Height)]
(c) Independence: We are willing to assume that the entire extinct population numbered over 70 (10 times the size of the sample), and hence will assume that these observations are independent of one another. (However, all of these skeletons were found in the same place, and it is conceivable that there is a reason for this that might be related to their shoulder heights.)
(2) Do the math: \( \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 46.142 \pm 3.707 \cdot \frac{1.19}{\sqrt{7}} = (44.475, 47.81) \), where t* = 3.707 is the critical value for df = 6.
(3) Conclusion: We are 99% confident that the mean shoulder height µ for this species of miniature horse is between 44.475 and 47.81 cm. (But keep in mind the warnings from step 1!)

7) David E. Brown is an expert in wildlife conservation. In his book The Wolf in the Southwest: The Making of an Endangered Species (University of Arizona Press), he records the following weights of adult gray wolves from two regions in Old Mexico.

Chihuahua region (in pounds), sample 1: 86 75 91 70 79 80 68 71 74 64
Durango region (in pounds), sample 2: 68 72 79 68 77 89 68 59 63 66 58 54 62 71 55 59 68 67

Find a 90% confidence interval for the difference in the mean weights between the wolves from the two regions. Follow the inference toolbox.

Chihuahua: n = 10, x̄ = 75.8, s = 8.324
Durango: n = 18, x̄ = 66.83, s = 8.867

(1) Choose a procedure and check conditions: Use a two-sample t confidence interval for the difference in two means.
(a) SRS: Although it is not stated, we will assume that both samples are random samples from their respective populations (regions), and that they are independent samples. However, if these sample data were not collected via SRSs, we will not be able to generalize our results to the two populations of wolves.
(b) Normality: Both normal probability plots look roughly linear, so we are willing to assume that the sampling distributions of sample mean Chihuahua wolf weights (x̄1) and Durango wolf weights (x̄2) are approximately normal.
[Figure: normal probability plot of Chihuahua and Durango (Percent vs. Data)]
(c) Independence: We will assume that both populations are at least 10 times the size of their corresponding sample, and hence will consider the observations in each sample independent of one another.
(2) Do the math: \( (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (75.8 - 66.83) \pm 1.833 \sqrt{\frac{8.32^2}{10} + \frac{8.86^2}{18}} = (2.812, 15.127) \), where t* = 1.833 is the conservative critical value for df = 9.
(3) Conclusion: We are 90% confident that the true difference between the mean weight of wolves from the Chihuahua region and the mean weight of those from the Durango region of Old Mexico is between 2.812 and 15.127 pounds.

8) What price do farmers get for their watermelons? In the third week of July, a random sample of 40 farming regions gave a sample mean of $6.88 per 100 pounds of watermelon. Assume that σ is known to be $1.92 per 100 pounds. (Reference: Agricultural Statistics, U.S. Department of Agriculture.) Find a 90% confidence interval for the population mean price (per 100 pounds) that farmers in this region get for their watermelon crop. Follow the inference toolbox.

(1) Choose a procedure and check conditions: Use a one-sample z confidence interval for a population mean (because we know the population standard deviation).
(a) SRS: We are told that the sample was a random sample from a population of farming regions.
(b) Normality: We are not told that the distribution of prices is normal. However, because n = 40 ≥ 30, the Central Limit Theorem (CLT) lets us conclude that the sampling distribution of sample mean watermelon prices per 100 lbs. (x̄) is at least approximately normal.
(c) Independence: We will assume the observations are independent of one another (i.e., we assume that there are more than 400 farming regions).
(2) Do the math: \( \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 6.88 \pm 1.645 \cdot \frac{1.92}{\sqrt{40}} = (6.3807, 7.3793) \)
(3) Conclusion: We are 90% confident that the true mean price for 100 pounds of watermelon is between 6.38 and 7.38 dollars.

Assume a confidence interval for the watermelon data was calculated and found to be (6.285, 7.475). What is the confidence level for this interval?

Since the mean is 6.88, the margin of error is 7.475 − 6.88 = .595. Then \( .595 = z^* \cdot \frac{1.92}{\sqrt{40}} = z^* (.30357) \), so z* ≈ 1.960, which matches the 1.96 that corresponds to the 95% confidence level.

9) Most married couples have two or three personality preferences in common. Myers used a random sample of 375 married couples and found that 132 had three preferences in common. Another sample of 571 couples found that 217 had two personality preferences in common. Let p1 be the population proportion of all married couples with three personality preferences in common and let p2 be the population proportion of all married couples with two personality preferences in common. Find a 90% confidence interval for the difference in these two proportions. Follow the steps of the inference toolbox.

(1) Choose procedure and check conditions: Use a two-sample z confidence interval for proportions, with p̂1 = 132/375 = .352 and p̂2 = 217/571 = .38.
(a) SRS: We are told that both samples are random samples from their respective populations, and we assume that the two samples are independent of one another.
(b) Normality: The counts of successes and failures are 132 and 243 in the first sample and 217 and 354 in the second. All are at least 5, so we can assume the sampling distributions of the sample proportions p̂1 and p̂2 are approximately normal.
(c) Independence: 10(375) = 3750 and 10(571) = 5710; we assume there are more than 5710 married couples. We can therefore assume independence of individual observations.
(2) Do the math: \( (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (.352 - .38) \pm 1.645 \sqrt{\frac{.352(1-.352)}{375} + \frac{.38(1-.38)}{571}} = (-.0806, .02452) \)
(3) Conclusion: We are 90% confident that the true difference between the proportion of couples that have 3 personality preferences in common and the proportion of couples that have 2 personality preferences in common is between −0.0806 and 0.02452.

10) The home run percentage is the number of home runs per 100 times at bat. A random sample of 43 professional baseball players gave the following statistics for home run percentages (Reference: The Baseball Encyclopedia, Macmillan). For these data, x̄ = 2.29 and s = 1.40. Compute a 95% confidence interval for the population mean µ of home run percentages for all professional baseball players. Follow the steps of the inference toolbox.

(1) Choose procedure and check conditions: Use a one-sample t confidence interval for a population mean (even though we’re given a standard deviation, note that it is the sample standard deviation s, hence we use t instead of z).
(a) SRS: It is stated that this is a random sample from the population of (presumably) all professional baseball players.
(b) Normality: We do not know if the home run percentages are normally distributed. However, because the sample size is 43, the CLT assures us that the sampling distribution of sample mean home run percentages (x̄) is at least approximately normal.
(c) Independence: Mathematically, we know that the sample size is less than 10% of the population. However, logically, there might be a dependent relationship between one player’s home run percentage and another player’s percentage, so we might want to be careful about the interpretation of the results of our calculations.
(2) Do the math: \( \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 2.29 \pm t^* \cdot \frac{1.40}{\sqrt{43}} \). With df = 42 (t* ≈ 2.018) this gives (1.8591, 2.7209); the table value t* = 2.021 (df = 40) gives nearly the same interval, (1.8585, 2.7215).
(3) Conclusion: We are 95% confident that the mean home run percentage (home runs per 100 at bats) for professional baseball players is between 1.8591 and 2.7209.
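The arithmetic in step (2) of Problem 10 can be checked with a short script. This is a minimal sketch and not part of the original solutions: the helper name `mean_interval` is made up for illustration, and the critical value is passed in by hand (from a table or software) because Python's standard library has no t quantile function.

```python
import math

def mean_interval(xbar, sd, n, crit):
    """Return (lower, upper) for xbar +/- crit * sd / sqrt(n).

    crit is the critical value (z* or t*); this illustrative helper
    does not look it up, so the caller must supply it.
    """
    margin = crit * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# Problem 10: sample mean 2.29, s = 1.40, n = 43, table value t* = 2.021
low, high = mean_interval(2.29, 1.40, 43, 2.021)
print(f"({low:.4f}, {high:.4f})")
```

The same helper covers the one-sample z intervals in this review: pass σ as `sd` and z* as `crit`, for example `mean_interval(15.60, 1.80, 90, 1.960)` for Problem 1.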
11) The manager of the dairy section of a large supermarket took a random sample of 250 egg cartons and found that 40 cartons had at least one broken egg. Find a 90% confidence interval for p, the proportion of egg cartons with at least one broken egg. Follow the steps of the inference toolbox.

(1) Choose procedure and check conditions: Use a one-sample z confidence interval for a population proportion, with p̂ = 40/250 = .16.
(a) SRS: We are told the sample is a random sample of egg cartons from the population of all egg cartons at this large supermarket.
(b) Normality: np̂ = 40 and n(1 − p̂) = 210, both well above 10, so we can assume the sampling distribution of sample proportions (p̂) is at least approximately normal.
(c) Independence: We are willing to assume the cartons are independent of one another, and we assume that the population consists of more than 2500 egg cartons.
(2) Do the math: \( \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .16 \pm 1.645 \sqrt{\frac{.16(1-.16)}{250}} = (.12186, .19814) \)
(3) Conclusion: We are 90% confident that the proportion p of egg cartons in the dairy section of this particular large supermarket that have at least one broken egg is between .12186 and .19814.

Assume the manager repeats this same process in a different week and finds the same sample proportion of cartons with broken eggs among 250 egg cartons. Since the manager knows some statistics, he calculates a confidence interval for the proportion and finds it to be (.11456, .20544). A couple of days later he looks at his work and realizes he forgot to write down the confidence level for his interval. Find the manager’s confidence level for this interval.

The margin of error is .20544 − .16 = .04544. Then \( z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = m \;\Rightarrow\; z^* \sqrt{\frac{.16(1-.16)}{250}} = .04544 \;\Rightarrow\; z^* (.023186) = .04544 \;\Rightarrow\; z^* \approx 1.960 \), which matches the z-score of 1.96 that corresponds to a confidence level of 95%.

12) Independent random samples of professional football and basketball players’ heights gave the following information (Reference: Sports Encyclopedia of Pro Football and Official NBA Basketball Encyclopedia).
Football: n1 = 45, x̄1 = 6.179 feet, and s1 = 0.366 feet
Basketball: n2 = 40, x̄2 = 6.453 feet, and s2 = 0.314 feet

Follow the steps of the inference toolbox to construct a 95% confidence interval for the difference between the mean heights of football and basketball players.

(1) Choose procedure and check conditions: Use a two-sample t confidence interval for the difference in two population means.
(a) SRS: We are told the samples are independent random samples from their corresponding populations.
(b) Normality: We are not told that the players’ heights vary normally. However, this is typically true of heights. Even if it is not, with n1 = 45 and n2 = 40 the CLT lets us conclude that the sampling distributions of x̄1 and x̄2 are at least approximately normal.
(c) Independence: We are willing to treat the observations as independent because each sample makes up less than 10% of its corresponding population.
(2) Do the math: \( (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (6.179 - 6.453) \pm t^* \sqrt{\frac{.366^2}{45} + \frac{.314^2}{40}} \). The interval reported here, (−.4207, −.1273), corresponds to software degrees of freedom (df ≈ 82.9, t* ≈ 1.989); the conservative table value t* = 2.042 (df = 30) gives the slightly wider interval (−.4246, −.1234).
(3) Conclusion: We are 95% confident that the difference between the mean height of professional football players and the mean height of professional basketball players is between −0.4207 and −0.1273 feet.

13) At a community hospital, the burn center is experimenting with a new plasma compress treatment. A random sample of n1 = 316 patients with minor burns received the plasma compress treatment. Of these patients, it was found that 259 had no visible scars after treatment. Another random sample of n2 = 419 patients with minor burns received no plasma compress treatment. For this group, it was found that 94 had no visible scars after treatment. Let p1 be the population proportion of all patients with minor burns receiving the plasma compress treatment that have no visible scars. Let p2 be the population proportion of all patients with minor burns not receiving the plasma compress treatment that have no visible scars. Find a 95% confidence interval for p1 − p2.
Follow the steps of the inference toolbox.

(1) Choose procedure and check conditions: Use a two-sample z confidence interval for the difference in two population proportions, with p̂1 = 259/316 ≈ .8196 and p̂2 = 94/419 ≈ .2243.
(a) SRS: We are told the samples are random samples from their corresponding populations, and we will assume that these samples are independent of one another.
(b) Normality: The counts of successes and failures are 259 and 57 in the first sample and 94 and 325 in the second. All are bigger than 5, so we can assume the sampling distribution of p̂1 − p̂2 is approximately normal.
(c) Independence: We can assume there are more than 316(10) = 3160 and 419(10) = 4190 burn patients in the respective populations, hence we are willing to consider each patient independent of the other patients within the same sample.
(2) Do the math: \( (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (.8196 - .2243) \pm 1.96 \sqrt{\frac{.8196(1-.8196)}{316} + \frac{.2243(1-.2243)}{419}} \approx (.53703, .65352) \), with the endpoints computed from the unrounded proportions.
(3) Conclusion: We are 95% confident that the true difference p1 − p2 between the proportion of patients that have no visible scars after plasma compress treatment (p1) and the proportion of patients that have no visible scars without the plasma compress treatment (p2) is between 0.53703 and 0.65352.

14) Attending sporting events is a popular source of entertainment. When 1000 people were surveyed, 590 said that getting together with friends was an important reason for attending a sporting event (USA Today). Find a 99% confidence interval for p. Follow the steps of the inference toolbox.

(1) Population and parameter: We wish to estimate p, the proportion of people who would say that getting together with friends is an important reason for attending a sporting event.
(2) Choose procedure and check conditions: Use a one-sample z confidence interval for a population proportion, with p̂ = 590/1000 = .59.
(a) SRS: We are not told whether this is a random sample. If it is not, we need to be careful about generalizing our findings to the entire population.
(b) Normality: np̂ = 590 and n(1 − p̂) = 410. Both are greater than 10, so we can assume that the sampling distribution of p̂ is at least approximately normal.
(c) Independence: We are not told that the sample observations are independent of one another, so we must be careful about interpreting the results of our calculations.
(3) Do the math: \( \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .59 \pm 2.576 \sqrt{\frac{.59(1-.59)}{1000}} = (.54994, .63006) \)
(4) Conclusion: We are 99% confident that the proportion of people who would say that getting together with friends is an important reason for attending a sporting event is between .54994 and .63006.

Assume we have the same sample proportion as stated above but an unknown sample size. If we want to be 99% confident and want the margin of error to be at most ±.06, how large a sample is needed?

\( 2.576 \sqrt{\frac{.59(1-.59)}{n}} \le .06 \;\Rightarrow\; \frac{2.576 \sqrt{.59(.41)}}{.06} \le \sqrt{n} \;\Rightarrow\; 21.116 \le \sqrt{n} \;\Rightarrow\; 21.116^2 \le n \;\Rightarrow\; n \ge 445.887 \)

So we need a sample size of at least n = 446.

15) From public records, individuals were identified as having been charged with drunken driving not less than 6 months or more than 12 months from the starting date of a given study. Two random samples from this group were studied. In the first sample of 30 individuals, the respondents were asked in face-to-face interviews if they had been charged with drunken driving in the last 12 months. Of these 30 people, 16 answered the question accurately. The second random sample consisted of 46 people who had been charged with drunken driving. During a telephone interview, 25 of these responded accurately. Assume the samples are representative of all the people recently charged with drunken driving. Let p1 be the population proportion of those interviewed face-to-face that answered correctly and let p2 be the population proportion of those interviewed over the phone that answered correctly. Find a 98% confidence interval for p1 − p2. Follow the steps of the inference toolbox.
(1) Choose procedure and check conditions: Use a two-sample z confidence interval for the difference in two population proportions, with p̂1 = 16/30 ≈ .5333 and p̂2 = 25/46 ≈ .5435.
(a) SRS: We are told both samples are random samples, and we are willing to assume that the samples are independent.
(b) Normality: n1p̂1 = 16 and n1(1 − p̂1) = 14, while n2p̂2 = 25 and n2(1 − p̂2) = 21. All are greater than 5, so we can assume that the sampling distributions of both sample proportions are at least approximately normal.
(c) Independence: We are not told that the sample observations are independent of one another, but this seems like a reasonable assumption in this case.
(2) Do the math: \( (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (.5333 - .5435) \pm 2.326 \sqrt{\frac{.5333(1-.5333)}{30} + \frac{.5435(1-.5435)}{46}} \approx (-.2823, .26205) \)
(3) Conclusion: We are 98% confident that the difference p1 − p2 between the proportion of individuals charged with drunken driving who answer accurately when interviewed face-to-face (p1) and the proportion who answer accurately when interviewed by phone (p2) is between −.2823 and .26205.
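The two-proportion calculation in step (2) of Problem 15 can be reproduced from the raw counts. The following is a minimal sketch, assuming the large-sample z interval used throughout this review; the function name `two_prop_interval` is hypothetical, and z* is taken from the standard library's `statistics.NormalDist` rather than from a table, so the last digits may differ slightly from hand calculations that use rounded values.

```python
import math
from statistics import NormalDist

def two_prop_interval(x1, n1, x2, n2, conf):
    """Large-sample z interval for p1 - p2 from success counts x1 and x2."""
    p1, p2 = x1 / n1, x2 / n2
    # Two-sided critical value, e.g. about 2.326 for 98% confidence
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Problem 15: 16 of 30 accurate face-to-face, 25 of 46 accurate by phone
low, high = two_prop_interval(16, 30, 25, 46, 0.98)
print(f"({low:.4f}, {high:.4f})")
```

Working from counts instead of rounded proportions avoids the small discrepancies that appear when p̂ is rounded to two or three decimals before it is plugged into the formula. The same function applies to Problems 9 and 13 by changing the counts and the confidence level.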