Elementary Statistics
Triola, Elementary Statistics 11/e
Unit 14 The Confidence Interval for Means, σ Unknown

Estimation
Unit 14 The Confidence Interval for Means, σ Unknown (Section 7-4)

We are now ready to begin our exploration of how we make estimates of the population mean. Before we get started, I want to emphasize the importance of having collected a representative sample, i.e. one that is a simple random sample. Without that, our estimates are useless.

The best estimate of the mean that is available to us is $\bar{x}$, the mean of our sample. However, we do not expect $\bar{x}$ to equal $\mu$ exactly, so this single estimate, while a good start, is of limited use on its own because we do not know how far off from $\mu$ we might be. What we need is a Lower Bound and an Upper Bound such that we can have some confidence that $\mu$ falls between these two limits.

The first thing we need to quantify is the word "confidence". As an example, let's say that we want to be 95% confident, and we now explore what that means, other than "pretty darn sure". First note that we are working with averages, $\bar{x}$ and $\mu$. That means that the probability distribution we will be working with is the sampling distribution of the mean, whose mean is $\mu_{\bar{x}}$ and whose standard deviation is $\sigma_{\bar{x}}$. According to the Central Limit Theorem,

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}},$$

and so we will be working with those latter values.

Now picture the sampling distribution with $\mu$ at its center. All possible $\bar{x}$ are in the sampling distribution somewhere, and so if we find a value $E$ such that the interval $\mu \pm E$, which is centered on $\mu$, captures 95% of the area under the curve, it will also capture 95% of all possible $\bar{x}$. Unfortunately, I don't have any good pictures to show this, so I'll lecture on this in class.

One last piece: if 95% of the $\bar{x}$ lie within $E$ of $\mu$, then $\mu$ must lie within $E$ of 95% of the $\bar{x}$. To put this another way, 95% of the time we take a sample, we're going to get an $\bar{x}$ such that $\mu$ lies in $\bar{x} \pm E$.
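The last statement can be checked with a small simulation. The sketch below is an illustration only, not part of the original unit: the population values ($\mu = 50$, $\sigma = 10$), the sample size, and the function name `coverage` are all hypothetical, and it assumes $\sigma$ is known so that $E$ can be taken as the 0.975 quantile of the standard normal times $\sigma/\sqrt{n}$.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=25, trials=10_000, seed=1):
    """Fraction of simulated samples whose interval xbar +/- E contains mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(0.975)    # about 1.96 for a centered 95% area
    E = z * sigma / n ** 0.5           # margin of error with sigma known
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from the assumed normal population.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - E <= mu <= xbar + E:
            hits += 1
    return hits / trials

print(coverage())  # expected to be close to 0.95
```

The printed fraction varies slightly with the seed and the number of trials, but it should stay near 0.95, which is what "95% confident" means here.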
Therefore this is our 95% confidence interval, and $E$ is called the margin of error.

Now all we have to do is find $E$. Unfortunately, this is easier said than done. Let's go back to our sampling distribution. If $\mu \pm E$ encompasses 95% of the area under the normal curve, then $E$ must be 1.96 standard deviation units from $\mu$, because under any normal curve the central 95% of the area lies within 1.96 standard deviations of the mean. Hence,

$$E = 1.96\,\sigma_{\bar{x}} = 1.96\,\frac{\sigma}{\sqrt{n}}.$$

We're done, right? Well, not exactly. Remember, we set out to estimate $\mu$ because we didn't know it. So how would we know $\sigma$ if we don't know $\mu$? Fortunately, for very large samples, say size 100 or larger, we can use $s$, the sample standard deviation, in place of $\sigma$. Therefore, we can have

$$E = 1.96\,\frac{s}{\sqrt{n}},$$

but what do we do about smaller samples, say size 20 or 30?

This problem wasn't solved until around the turn of the 20th century, when William Gosset, working for the Guinness Brewery company, worked out a probability distribution that could be used to perform quality control tests using small samples. He published it under the pseudonym "Student", which is why it is known as the Student t distribution. This distribution is very similar in shape to the Normal distribution, except that it is wider, i.e. it has a larger standard deviation. Furthermore, the size of the standard deviation depends on the sample size: the smaller the sample, the larger the standard deviation. Take a look at the following figure, which shows different shapes for the t-distribution as a function of sample size and compares them to the Normal distribution.

[Figure: t-distribution curves for several sample sizes, compared with the Normal distribution]

Here are a few more rules for working with the t-distribution. If you know for a fact, or you strongly suspect (because you carefully examined the histogram of your sample), that the underlying population is normally distributed, then the sample size is not that important other than its effect on the shape of the t-curve.
However, if you suspect that the underlying population is not all that normally shaped, then your sample size should be a minimum of 30.

Nomenclature

When we were working with the Standard Normal distribution, we called the horizontal axis the z-axis. The z value that bordered the 95% area centered under the curve on the right was called a critical value and denoted $z_{\alpha/2}$. $\alpha$ is called the significance and is the sum of the areas of the tails, i.e. the area under the curve to the left and right of the centered 95% area. Hence, in this case $\alpha = 0.05$. See the figure below.

[Figure: standard normal curve showing the centered 95% area and the two tails]

The critical value corresponding to a centered 95% area is $z_{\alpha/2} = 1.96$. (In Excel, 1.96 = NORM.S.INV(.975).)

We have a completely analogous situation when it comes to using the t-distribution. The axis is called the t-axis, the critical value is $t_{\alpha/2}$, and

$$E = t_{\alpha/2}\,\frac{s}{\sqrt{n}}.$$

In order to find $t_{\alpha/2}$ we need to know two things: the confidence level and the size of the sample. There is one more twist when finding $t_{\alpha/2}$: the degrees of freedom. First, we work with $\alpha$, the significance, which is one minus the confidence level expressed as a decimal. For a confidence level of 95% we have

$$\alpha = 1.0 - 0.95 = 0.05.$$

We also need to use the degrees of freedom, which is simply the sample size minus one:

$$\text{Deg\_freedom} = n - 1.$$

Worked Example

We receive a batch of 50,000 washers, and we wish to estimate the average inside diameter of the washers. We carefully select a simple random sample of size 20 and find that the average inside diameter is 24.78 mm with a standard deviation of 1.62 mm. We want to calculate a 95% confidence interval for our estimate of the batch $\mu$.

First we calculate $t_{\alpha/2}$ using T.INV.2T, with Probability = 0.05 and Deg_freedom = 20 - 1 = 19. Note that the tool uses the word Probability instead of $\alpha$ or Significance.
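As a cross-check on T.INV.2T for readers working outside Excel, the critical value can be approximated with the Python standard library alone. This is a minimal sketch, not the method used in the unit: the helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical, the Student t cumulative area is obtained by Simpson's-rule integration of the density, and the critical value is then located by bisection. The search bracket of 0 to 100 is an assumption that is adequate for the sample sizes and confidence levels used here.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Area to the left of t, via Simpson's rule on [0, |t|] plus symmetry."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-tailed critical value t_{alpha/2}, found by bisection."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 100.0                # assumed bracket, see lead-in
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)   # counterpart of NORM.S.INV(.975)
t_crit = t_critical(0.95, 20 - 1)      # counterpart of T.INV.2T(0.05, 19)
print(f"z = {z_crit:.4f}, t = {t_crit:.4f}")
```

The t value comes out larger than the z value, which reflects the wider shape of the t-distribution for a sample of size 20.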
We see that $t_{\alpha/2} = 2.0930$, and we proceed to calculate $E$:

$$E = t_{\alpha/2}\,\frac{s}{\sqrt{n}} = \frac{(2.0930)(1.62)}{\sqrt{20}} = 0.7582$$

Finally, the confidence interval is

$$\bar{x} \pm E = 24.78 \pm 0.76 = (24.02,\ 25.54)$$

Below is the Excel spreadsheet that I used to calculate these values. If you double click on the table, you will bring up a copy of Excel. Then if you select any of the cells, such as the value for t, you will see the Excel formula in the formula bar, fx, toward the top.

x̄      s     n   t         E       x̄ - E     x̄ + E
24.78  1.62  20  2.093024  0.7582  24.02182  25.53818

One last note. Suppose that the manufacturer of the washers had claimed that the average inside diameter was 25.00 mm. On the basis of this sample, could you refute the claim? You could not, because 25.00 lies within (24.02, 25.54). Hence, there's no reason to doubt the manufacturer's claim.

Now it's your turn to have some fun. You'll want to open Excel 2010 and label some cells as I did. Here's the situation. Assume the population is normally distributed. For a sample size of 61, the average weight loss was 4.0 kg with a standard deviation of 6.4 kg. Find a 99% confidence interval for the mean of the population. Use Excel in exactly the same way I did. See the answer at the end of this unit.

This is the end of Unit 14. In class, you will get more practice with these concepts by working exercises in MyMathLab.

Answers

x̄    s    n   t         E       x̄ - E   x̄ + E
4.0  6.4  61  2.660283  2.1799  1.8201  6.1799
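The arithmetic in both tables can also be reproduced outside Excel. The following sketch is an illustration under stated assumptions: the function name `t_interval` is hypothetical, and the critical values are simply copied from the tables rather than computed.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (E, lower, upper) for the interval xbar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return margin, xbar - margin, xbar + margin

# Washer example: n = 20, 95% confidence, t copied from the spreadsheet.
E, lower, upper = t_interval(24.78, 1.62, 20, 2.093024)
print(f"E = {E:.4f}, interval = ({lower:.2f}, {upper:.2f})")
# The manufacturer's claimed mean of 25.00 mm is not refuted if it lies inside.
print("25.00 inside interval:", lower <= 25.00 <= upper)

# Weight-loss exercise: n = 61, 99% confidence, t copied from the Answers table.
E, lower, upper = t_interval(4.0, 6.4, 61, 2.660283)
print(f"E = {E:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

The printed margins and bounds should agree with the spreadsheet values to the precision shown there.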