Confidence Intervals Math 283

Confidence Intervals for the Population Mean

Recall from the Empirical Rule that the interval of the mean plus or minus 2 standard deviations contains about 95% of the observations. So if \(\bar{X}\) is approximately normally distributed,

\[ P\left(\mu - 2\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 2\frac{\sigma}{\sqrt{n}}\right) \approx 0.95. \]

If we rearrange the inequality so that \(\mu\) is in the middle, then

\[ P\left(\bar{X} - 2\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 2\frac{\sigma}{\sqrt{n}}\right) \approx 0.95. \]

The interval \(\left(\bar{X} - 2\frac{\sigma}{\sqrt{n}},\ \bar{X} + 2\frac{\sigma}{\sqrt{n}}\right)\) has a probability of 0.95 of capturing the mean.

Definition: If \(\bar{X}\) is the sample mean of a random sample of size \(n\) from a population with variance \(\sigma^2\), a \((1 - \alpha)100\%\) confidence interval for \(\mu\) is given by

\[ \left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \]

where \(Z_{\alpha/2}\) is the Z value from the normal table with area \(\alpha/2\) to its right.

If \(\sigma\), the population standard deviation, is unknown, it can be replaced by \(s\), the sample standard deviation, with no serious loss of accuracy for large samples. If we use \(\alpha = 0.05\), we report that we are 95% confident that the population mean is within our interval.

Why are we able to say we are 95% confident? We know (from the Empirical Rule) that about 95% of all possible sample means lie within two standard errors of the actual population mean. We hope that our sample mean is one of these, because if it is, then our confidence interval will contain the population mean, and our estimate will be correct. If not, then our interval will be incorrect. But this happens only 5% of the time.

The term "95% confidence" means that if we took repeated samples and found a confidence interval for each sample, 95% of those confidence intervals would actually contain the population mean; 5% of them would not. Whether our own confidence interval contains the population mean, we will never know!

The Empirical Rule Theorem vs. a Confidence Interval

The Empirical Rule Theorem and a Confidence Interval for the Mean are used to answer two different research questions.
The Empirical Rule Theorem is used to answer the question "Most of the values for the variable fall between what two values?" This is a range of values used to discuss what we know about the individuals in our sample or population.

A Confidence Interval is used to answer the question "What is the mean of the population?" This is a range of values used to give reasonable values for the population mean.

Example: The average zinc concentration recovered from a sample of zinc measurements in 36 different locations in a river is found to be 2.6 grams per milliliter. Find the 95% and 99% confidence intervals for the mean zinc concentration in the river. Assume the population standard deviation is 0.3.

Example: An important property of plastic clays is the percent of shrinkage on drying. For a certain type of plastic clay, 45 test specimens showed an average shrinkage of 18.4% with a standard deviation of 1.2. Estimate the mean percent shrinkage for this type of clay with a 90% confidence interval.

Another way to think about the confidence interval is \(\bar{x} \pm \text{MOE}\), where MOE is the margin of error,

\[ \text{MOE} = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}. \]

Notice that the width, or precision, of our confidence interval depends on the confidence level \(1 - \alpha\), the sample size \(n\), and the standard deviation of the population. The accuracy of our sample mean depends on the sample size, \(n\), the standard deviation of the population, \(\sigma\), and bias.

Example: Decision Making with a Confidence Interval

The owners of General Light are planning to advertise their light bulbs in the Sunday edition of the newspaper. In the ad, they want to report "the mean lifetime of their light bulbs." To determine the mean lifetime of their light bulbs, they took a random sample of 40 light bulbs. For their sample, the bulbs lasted, on average, 299.5 hours with a standard deviation of 58 hours.

1. Construct a 95% confidence interval for the mean lifetime of light bulbs.
2. Should General Light advertise that the mean lifetime of their light bulbs is 350 hours? Why or why not?
3. Should General Light advertise that the mean lifetime of their light bulbs is 310 hours? Why or why not?

Determining Sample Size

When our objective is to estimate the population mean, \(\mu\), we should do the following to determine our sample size:

1. Determine the largest margin of error you are willing to accept and a confidence level.
2. Obtain or estimate the population standard deviation.
3. Find the sample size, \(n\), that makes the following true: \(\text{Your MOE} = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\). Solving for \(n\) gives \(n = \left(\frac{Z_{\alpha/2}\,\sigma}{\text{MOE}}\right)^2\), rounded up to the next whole number.
4. Check the sample size against your budget. If necessary, return to step 1.

Example: You are planning a survey of starting salaries for liberal arts major graduates from your college. From a pilot study you estimate that the standard deviation is about $9000. What sample size do you need to have a margin of error equal to $400 with 95% confidence?

Cautions:

∼ Data must come from an SRS (simple random sample).
∼ There is no correct method for inference from data collected haphazardly with bias of unknown size.
∼ The sample mean is not resistant to outliers. So look at your data carefully before determining a CI.
∼ If n is small and the population is not normal, the true confidence level may be different from what you used.
   o As long as \(n \ge 30\), the CLT applies.
   o If \(n \ge 15\), it is OK unless there are extreme outliers or quite strong skewness.
∼ Must know \(\sigma\).

The Case of the Unknown σ

If \(\bar{X}\) is the sample mean of a random sample of size \(n\), where \(X_1, \ldots, X_n\) are from a normal distribution, then the random variable

\[ t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \]

has a probability distribution called the t-distribution with \(n - 1\) degrees of freedom.

Properties of the t-distribution (or Student's t-distribution)

• Bell shaped with mean zero.
• The variance is \(\frac{\nu}{\nu - 2}\) (for \(\nu > 2\)), where \(\nu\) is the degrees of freedom.
• The limiting distribution of the t is the standard normal distribution as n goes to infinity. See the attached table.

A \((1 - \alpha)100\%\) Confidence Interval for µ when σ is unknown

Let \(\bar{X}\) and \(s\) be the sample mean and standard deviation of a random sample of size \(n\) from a normally distributed population. Then the confidence interval is given by

\[ \left(\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) \]

where \(t_{\alpha/2}\) is the value from the t-distribution with \(n - 1\) degrees of freedom and \(\alpha/2\) is the upper tail probability.

Note that this interval is fairly robust to non-normal data. If the data are not too skewed, the t procedure is useful when \(15 \le n < 40\). When \(n \ge 40\), the t procedure can be used even for skewed data.

Example: The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2, and 9.6 liters. Find a 95% confidence interval for the mean of all such containers, assuming the data are from a normal distribution.

Example: A random sample of 12 graduates of a certain secretarial school typed an average of 79.3 words per minute with a sample standard deviation of 7.8 words per minute. Assuming a normal distribution for the number of words typed per minute, find a 99% confidence interval for the mean number of words per minute for all graduates.

Confidence Interval for the Population Proportion

If \(X_1, \ldots, X_n\) are independent observations from a population with probability of success \(p\), then the random variable \(X = \sum_{i=1}^{n} X_i\) is distributed binomial with \(E(X) = np\) and \(V(X) = np(1 - p)\). We showed that

\[ Z_n = \frac{X - np}{\sqrt{np(1 - p)}} \]

approaches the standard normal distribution as n goes to infinity. So the sampling distribution of \(\hat{p} = X/n\) is approximately normal with \(\mu_{\hat{p}} = p\) and \(\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}\).

A \((1 - \alpha)100\%\) confidence interval for \(p\), the population proportion, is given by

\[ \left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) \]

where \(Z_{\alpha/2}\) is the Z value from the normal table with area \(\alpha/2\) to its right.
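The three interval formulas above (known σ, unknown σ, and a proportion) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original handout: the function names `z_critical`, `z_interval`, `t_interval`, and `proportion_interval` are made up for this sketch, and because the Python standard library has no t quantile function, the t critical value is passed in by hand after reading it from a t table. The inputs for the first two calls come from the zinc and sulfuric acid examples; the counts in the last call are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_critical(confidence):
    """Z value with area alpha/2 to its right, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar, sigma, n, confidence):
    """Interval xbar +/- Z * sigma / sqrt(n), for known sigma (or s with large n)."""
    moe = z_critical(confidence) * sigma / sqrt(n)
    return xbar - moe, xbar + moe


def t_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t * s / sqrt(n); t_crit is looked up with n - 1 d.f."""
    moe = t_crit * s / sqrt(n)
    return xbar - moe, xbar + moe


def proportion_interval(successes, n, confidence):
    """Interval p_hat +/- Z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    moe = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe


# Zinc example: n = 36, sample mean 2.6, sigma assumed to be 0.3.
print(z_interval(2.6, 0.3, 36, 0.95))
print(z_interval(2.6, 0.3, 36, 0.99))

# Sulfuric acid example: 7 observations, so 6 d.f.; for 95% confidence the
# upper tail probability is 0.025, and the t table gives 2.447.
acid = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]
print(t_interval(mean(acid), stdev(acid), len(acid), t_crit=2.447))

# Hypothetical counts (not from the handout): 60 successes in 200 trials.
print(proportion_interval(60, 200, 0.95))
```

Comparing the 95% and 99% zinc intervals shows the trade-off described earlier: raising the confidence level widens the interval when n and σ stay fixed.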
Example: A survey of 1280 student loan borrowers found that 448 had loans totaling more than $20,000 for their undergraduate education. Give a 95% confidence interval for the proportion of all student loan borrowers who have loans totaling more than $20,000 for their undergraduate degree.

Determining Sample Size

When our objective is to estimate the population proportion, \(p\), we should do the following to determine our sample size:

1. Determine the largest margin of error you are willing to accept and a confidence level.
2. Determine \(p\) from a previous study or use \(p = 0.5\).
3. Find the sample size, \(n\), that makes the following true: \(\text{Your MOE} = Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}\). Solving for \(n\) gives \(n = \left(\frac{Z_{\alpha/2}}{\text{MOE}}\right)^2 p(1 - p)\), rounded up to the next whole number.
4. Check the sample size against your budget. If necessary, return to step 1.

Example: You are planning an evaluation of an alcohol awareness program at your college that will take place six months after the program. How large a sample should you take if you want the margin of error for 95% confidence to be about 0.1?

Student's t Distribution

                      Upper tail probability
d.f.    0.125   0.100   0.075   0.050   0.025   0.010   0.005
1       2.414   3.078   4.165   6.314  12.706  31.821  63.656
2       1.604   1.886   2.282   2.920   4.303   6.965   9.925
3       1.423   1.638   1.924   2.353   3.182   4.541   5.841
4       1.344   1.533   1.778   2.132   2.776   3.747   4.604
5       1.301   1.476   1.699   2.015   2.571   3.365   4.032
6       1.273   1.440   1.650   1.943   2.447   3.143   3.707
7       1.254   1.415   1.617   1.895   2.365   2.998   3.499
8       1.240   1.397   1.592   1.860   2.306   2.896   3.355
9       1.230   1.383   1.574   1.833   2.262   2.821   3.250
10      1.221   1.372   1.559   1.812   2.228   2.764   3.169
11      1.214   1.363   1.548   1.796   2.201   2.718   3.106
12      1.209   1.356   1.538   1.782   2.179   2.681   3.055
13      1.204   1.350   1.530   1.771   2.160   2.650   3.012
14      1.200   1.345   1.523   1.761   2.145   2.624   2.977
15      1.197   1.341   1.517   1.753   2.131   2.602   2.947
16      1.194   1.337   1.512   1.746   2.120   2.583   2.921
17      1.191   1.333   1.508   1.740   2.110   2.567   2.898
18      1.189   1.330   1.504   1.734   2.101   2.552   2.878
19      1.187   1.328   1.500   1.729   2.093   2.539   2.861
20      1.185   1.325   1.497   1.725   2.086   2.528   2.845
21      1.183   1.323   1.494   1.721   2.080   2.518   2.831
22      1.182   1.321   1.492   1.717   2.074   2.508   2.819
23      1.180   1.319   1.489   1.714   2.069   2.500   2.807
24      1.179   1.318   1.487   1.711   2.064   2.492   2.797
25      1.178   1.316   1.485   1.708   2.060   2.485   2.787
26      1.177   1.315   1.483   1.706   2.056   2.479   2.779
27      1.176   1.314   1.482   1.703   2.052   2.473   2.771
28      1.175   1.313   1.480   1.701   2.048   2.467   2.763
29      1.174   1.311   1.479   1.699   2.045   2.462   2.756
30      1.173   1.310   1.477   1.697   2.042   2.457   2.750
35      1.170   1.306   1.472   1.690   2.030   2.438   2.724
40      1.167   1.303   1.468   1.684   2.021   2.423   2.704
45      1.165   1.301   1.465   1.679   2.014   2.412   2.690
50      1.164   1.299   1.462   1.676   2.009   2.403   2.678
55      1.163   1.297   1.460   1.673   2.004   2.396   2.668
60      1.162   1.296   1.458   1.671   2.000   2.390   2.660
90      1.158   1.291   1.452   1.662   1.987   2.368   2.632
120     1.156   1.289   1.449   1.658   1.980   2.358   2.617
∞       1.150   1.282   1.440   1.645   1.960   2.326   2.580

1. The average weight of 40 randomly selected minivans was 4150 pounds.
   a. Find and interpret a 98% confidence interval for the mean weight of all minivans. The standard deviation is known to be 480 pounds.
   b. What could we do to reduce the width of this interval?
   c. What are the advantages/disadvantages of your answers in b?
2. The weight of grapefruit follows a normal distribution. A random sample of 12 new hybrid grapefruit had a mean weight of 1.7 pounds with a standard deviation of 0.24 pounds. Find a 95% confidence interval for the mean weight of the population of new hybrid grapefruits.
3. A researcher wishes to estimate, within $25, the true average amount of postage that parents of college students spend each year. If she wishes to be 90% confident, how large a sample is necessary? The standard deviation is known to be $80.
4. A survey by Brides magazine found that 8 out of 10 brides are planning to take the surname of their new husband. How large a sample is needed to estimate the true proportion to within 3% with 98% confidence?
5. A researcher wishes to estimate the proportion of adult females under 5 feet tall. He wants to be 90% confident that his estimate is within 5% of the true proportion. What sample size should he use?
6. In a survey of 200 workers, 169 said they were interrupted three or more times an hour by phone messages, faxes, etc. Find and interpret a 90% confidence interval for the population proportion of workers who are interrupted three or more times an hour.
7. A sample of 17 states had these cigarette taxes (in cents): 112, 120, 98, 55, 71, 35, 99, 124, 64, 150, 150, 55, 100, 132, 35, 70, 93. Find a 98% confidence interval for the mean cigarette tax in all 50 states. What assumption is necessary?
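The sample-size steps from the two "Determining Sample Size" sections can be sketched the same way. This is a minimal illustration, not the handout's own method of working: the function names `sample_size_for_mean` and `sample_size_for_proportion` are assumptions made for this sketch, and the critical value comes from the Python standard library rather than the normal table, so it is 1.95996... instead of the rounded 1.96. The inputs are taken from the starting-salary and alcohol awareness examples.

```python
from math import ceil
from statistics import NormalDist


def z_critical(confidence):
    """Z value with area alpha/2 to its right, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def sample_size_for_mean(sigma, moe, confidence):
    """Smallest n with Z * sigma / sqrt(n) <= moe."""
    z = z_critical(confidence)
    return ceil((z * sigma / moe) ** 2)


def sample_size_for_proportion(moe, confidence, p=0.5):
    """Smallest n with Z * sqrt(p * (1 - p) / n) <= moe.

    p = 0.5 is the conservative default when no earlier estimate exists.
    """
    z = z_critical(confidence)
    return ceil(z ** 2 * p * (1 - p) / moe ** 2)


# Starting-salary example: sigma about $9000, margin of error $400, 95%.
print(sample_size_for_mean(9000, 400, 0.95))

# Alcohol awareness example: margin of error about 0.1, 95%, p = 0.5.
print(sample_size_for_proportion(0.1, 0.95))
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the target, and the result can then be checked against the budget as in step 4.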