MGMT 201: Statistics
Interval Estimation (ASW Chapter 8)

What is interval estimation?
An interval estimate is a range of values such that we believe a certain characteristic falls within that range with some given probability. This is useful in nearly every field; consider political polling, space flights, etc.

Interval Estimation of a Population Mean
Recall that $\bar{x} - \mu$ is defined as the sampling error. We will typically never know the sampling error, but we can say something about it.

Basic Intuition: The CLT tells us that $\bar{x}$ will be approximately normally distributed when n is large. We can therefore use the normal table to establish confidence intervals (and, in fact, do much more).

example: Consider the following population data.
14 91 87 48 18 24 32 12 34 33 80 25 42 24 33 67 24 67 95 77 82 36 0 5 35 85 30 18 82 91 44 25 76 0 86 88 2 91 75 91 4 70 25 58 31 72 63 73 48 66 24 55 16 25 64 34 97 32 12 88 96 35 94 72 90 91 78 41 97 29
$\mu = 43.686$; $\sigma = 30.251$

If we were to draw a random sample of size n from the population, we would be 95% certain that $\bar{x}$ will fall within 1.96 standard deviations of the mean. But we need the standard deviation of $\bar{x}$.

Suppose n = 36. Since we know the population (N = 70), we can use the finite population formula. Most software packages do not provide for this and instead use the infinite population formula. This is reasonable because if we know the population, there’s no need to be taking samples! Below are the solutions using both approaches.

Finite population approach (the correct one for this problem):
$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}} = \sqrt{\frac{70-36}{70-1}}\cdot\frac{30.251}{\sqrt{36}} = 3.54$
So, we are 95% sure that $\bar{x}$ will fall between $43.686 - 1.96(3.54)$ and $43.686 + 1.96(3.54)$ (36.747 and 50.621). This is our interval estimate.

Infinite population approach (the one used by software packages and encountered most often in reality):
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{30.251}{\sqrt{36}} = 5.04$
So, we are 95% sure that $\bar{x}$ will fall between $43.686 - 1.96(5.04)$ and $43.686 + 1.96(5.04)$ (33.804 and 53.568). This is our interval estimate.
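The two standard-error calculations can be checked with a short Python sketch. It uses only the standard library; the helper names `standard_error` and `interval` are our own illustration, not part of ASW or any package. Because it carries full precision instead of rounding $\sigma_{\bar{x}}$ to 3.54, the finite-population endpoints agree with the text only to within about 0.002.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard deviation of the sample mean.

    If the population size N is given, apply the finite population
    correction sqrt((N - n) / (N - 1)); otherwise use sigma / sqrt(n).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

def interval(center, se, z):
    """Return (center - z*se, center + z*se)."""
    return center - z * se, center + z * se

# Values from the population example in the notes.
mu, sigma, N, n = 43.686, 30.251, 70, 36

se_finite = standard_error(sigma, n, N)   # about 3.54
se_infinite = standard_error(sigma, n)    # about 5.04

# z-scores read from the normal table for 95% and 99% confidence.
for label, z in (("95%", 1.96), ("99%", 2.576)):
    lo_f, hi_f = interval(mu, se_finite, z)
    lo_i, hi_i = interval(mu, se_infinite, z)
    print(f"{label} finite:   [{lo_f:.3f}, {hi_f:.3f}]")
    print(f"{label} infinite: [{lo_i:.3f}, {hi_i:.3f}]")
```

The 99% rows reproduce the precision statements for 2.576 standard deviations that follow.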
We might alternatively look at a 99% confidence interval. From the normal table, we see that we are 99% sure that $\bar{x}$ will fall within 2.576 standard deviations of the mean.
Finite population: We are 99% sure that $\bar{x}$ will fall between 34.567 and 52.801.
Infinite population: We are 99% sure that $\bar{x}$ will fall between 30.698 and 56.674.
We call such statements precision statements. To be clear, let me stress that the finite population approach is the correct one for this problem. In nearly every “real” case, however, we will be using the infinite population approach.

In general,
$P\left(\mu - z_{\alpha/2}\,\sigma_{\bar{x}} \le \bar{x} \le \mu + z_{\alpha/2}\,\sigma_{\bar{x}}\right) = 1 - \alpha.$
Here, $\alpha$ is the probability of falling outside the given range and $1 - \alpha$ is the confidence level. $z_{\alpha/2}$ is the z-score that leaves an area of $\alpha/2$ in the upper tail. So if $\alpha = 5\%$, we want the entry in the normal table corresponding to a tail of area 2.5%. Because the table gives the area between 0 and z, we look up 0.5 − 0.025 = 0.475 to find the appropriate z-score.

example: A random variable has population mean 200 and population standard deviation 50. What is a 90% confidence interval for a sample size of 100?
$\alpha = 10\%$, so we need the z-score corresponding to a tail of 5% (i.e., we look up 0.45). From the normal table, we see that $z_{5\%} = 1.645$. Since $\sigma_{\bar{x}} = \frac{50}{\sqrt{100}} = 5$, a 90% confidence interval is $[200 - 1.645(5),\ 200 + 1.645(5)] = [191.775, 208.225]$. We are 90% sure that $\bar{x}$ will fall in this range when n = 100. Formally, we write
$\mu \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
for a $1 - \alpha$ confidence interval.

Now, suppose we want to choose n so that we are 95% sure that $\bar{x}$ will fall between 190 and 210. What is n? Recall that $z_{2.5\%} = 1.96$. We therefore need $\sigma_{\bar{x}}$ to satisfy $1.96\,\sigma_{\bar{x}} = 10$, so $\sigma_{\bar{x}} = 5.102$. Since $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$,
$n = \left(\frac{\sigma}{\sigma_{\bar{x}}}\right)^2 = \left(\frac{50}{5.102}\right)^2 = 96.04.$
Because n must be a whole number, we round up: we need a sample size of 97 to be 95% sure that $\bar{x}$ will fall between 190 and 210.

Notice that $z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = E$ here (E is the margin of error, specified in the units of the random variable), so we can rearrange the formula to get
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 50}{10}\right)^2.$
In this problem, E = 10, $\sigma = 50$, and $z_{\alpha/2} = 1.96$, so n = 96.04.

Interval Estimation using Sample Means
In most cases, we do not know the population parameters. In such cases, we must use the information contained in samples to learn about the underlying population. The only substantial difference in the approach is that we use s instead of $\sigma$.

example: Consider election polls and suppose that a poll of 1000 likely voters showed Bush with 48%. What is the margin of error? Recall that
$s_{\bar{p}} = \sqrt{\frac{\bar{p}(1-\bar{p})}{n-1}}$, so $s_{\bar{p}} = \sqrt{\frac{0.48(1-0.48)}{1000-1}} = 0.0158.$
The “margin of error” is typically quoted as a 95% confidence interval, so we are interested in the range of numbers within 1.96 standard deviations of the mean. In this case, $1.96 \times 0.0158 = 0.03097$. The margin of error is then about 3%.

example: Suppose that we want to do an election poll with a margin of error of 1%. What sample size do we need? We do not know $\bar{p}$ or p prior to taking the sample, so we must choose some value. Notice that $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}}$ is highest when p = 0.5; any other value results in a numerator p(1 − p) that is less than 0.25. Because p = 0.5 represents the worst-case scenario, it is common practice to assume p = 0.5 when designing a poll. We want
$1.96\sqrt{\frac{0.5(1-0.5)}{n}} = 0.01.$
Solving gives n = 9604.

example: Consider the following sample data:
1698 1926 1566 1812 1858 1807 1241 1248 1263 1367 1687 1388 1119 1714 1022 1544 1881 1636 1389 1039 1875 1492 1552 1827 1848 1341 1601 1053 1768 1408 1503 1372 1786 1550 1474 1257 1238 1625 1648 1842 1388 1454 1210 1604 1915 1103 1623 1496 1255 1536 1097 1872 1253 1538 1957 1586 1188 1885 1960 1549 1825 1991 1697 1251 1938 1989 1890 1885 1541 1817
$\bar{x} = 1565.14$; s = 274.95
What is a 92% confidence interval?
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{274.95}{\sqrt{70}} = 32.863$
From the normal table, we see that 0.46 corresponds to a z-score of about 1.75. We look up 0.46 because it is half of 92%; in that way, we allow for a 4% tail on each side of the distribution. So, a 92% confidence interval is $[1565.14 - 1.75(32.863),\ 1565.14 + 1.75(32.863)] = [1507.6, 1622.6]$.
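The normal-based calculations in this part of the notes (the 90% interval, the two sample-size problems, the poll margin of error, and the 92% interval) can be reproduced with a small Python sketch. The helper names are hypothetical. The standard library's `statistics.NormalDist.inv_cdf` is shown as a stand-in for the normal table, while the worked values use the rounded table z-scores so the results line up with the text.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """z-score leaving (1 - confidence) / 2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_interval(center, sd, n, z):
    """center +/- z * sd / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return center - half_width, center + half_width

def sample_size(z, sigma, E):
    """Smallest whole n with z * sigma / sqrt(n) <= E, i.e. ceil((z*sigma/E)**2)."""
    return math.ceil((z * sigma / E) ** 2)

def proportion_se(p_bar, n):
    """s_pbar as defined in the notes, with n - 1 in the denominator."""
    return math.sqrt(p_bar * (1 - p_bar) / (n - 1))

# Exact quantiles, for comparison with the table values 1.645, 1.75, 1.96.
for conf in (0.90, 0.92, 0.95):
    print(f"{conf:.0%}: z = {z_value(conf):.4f}")

# 90% interval for mu = 200, sigma = 50, n = 100 (table z = 1.645).
ci_90 = mean_interval(200, 50, 100, 1.645)

# Sample size so that xbar is within E = 10 of mu with 95% confidence.
n_mean = sample_size(1.96, 50, 10)

# Poll of 1000 voters with p_bar = 0.48: margin of error at 95%.
moe = 1.96 * proportion_se(0.48, 1000)

# Poll design with a 1% margin, worst case p = 0.5.
n_poll = sample_size(1.96, math.sqrt(0.5 * 0.5), 0.01)

# 92% interval from the sample summary (table z = 1.75).
ci_92 = mean_interval(1565.14, 274.95, 70, 1.75)

print("90% CI:", [round(v, 3) for v in ci_90])
print("n for E = 10:", n_mean)
print("poll margin of error:", round(moe, 4))
print("n for 1% margin:", n_poll)
print("92% CI:", [round(v, 1) for v in ci_92])
```

`math.ceil` applies the "round up" step, which is why the first sample-size problem gives 97 rather than 96.04.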
We are 92% certain that $\mu$ is within that range. Interestingly, I generated the data using a discrete uniform distribution between 1000 and 2000. That distribution has $\mu = 1500$, which lies outside our interval. We might therefore incorrectly conclude from the sample that the population mean is above 1500. This error is called a Type I error and will be considered in the next chapter.

Dealing with Small Samples
The Central Limit Theorem is great, but we are often faced with situations in which we have a small sample. Perhaps, for example, we are doing a test that destroys the product, so a large sample is extremely costly. If the underlying distribution is not approximately normal, we are in a bind. Unless we know the underlying distribution and can form other test statistics from it, we cannot reliably compute the statistics we need. In such cases, our only option is to increase the sample size to the point where the CLT is reasonable.

If, however, the underlying distribution is approximately normal, we have some hope. Why? Recall that if the underlying random variable is normally distributed, then $\bar{x}$ is normally distributed for any n. The catch is that we do not know $\sigma$ and must use s in its place, so the standardized quantity $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ has a normal-like shape but isn’t quite normal for small n. Fortunately, we know its distribution in those cases.

The t-Distribution
The t-distribution is used in precisely the same way as the normal, but it applies when n is small, $\sigma$ must be estimated by s, and the underlying distribution is approximately normal.

[Figure: density f(x) of the standard normal compared with the t-distribution for n = 15 and n = 30.]

Notice that as n gets larger, the t-distribution becomes closer and closer to the normal distribution.

example: Consider the following sample data:
47.00 67.33 43.10 37.22 28.16 33.10 52.44 47.66 31.53 62.76 60.95 40.22 61.98 39.13 42.26
What is an 80% confidence interval? Looking at the table (Table 2 in Appendix B), we see that the t-distribution requires us to choose something called the degrees of freedom.
Intuitively, the degrees of freedom are the number of opportunities the data has to vary. We have 15 observations, but 14 (that is, n − 1) degrees of freedom. Also notice that the t table differs from the normal table. Instead of having the z-score on the axis and the probabilities tabulated, the t table has the probability and degrees of freedom on the axes and the equivalent of the z-score tabulated.

For an 80% confidence interval, we need 10% in each tail. So, we look up the entry for 14 degrees of freedom and 0.10 in the upper tail. This gives us t = 1.345. Since $\bar{x} = 46.32$ and $s_{\bar{x}} = \frac{12.34}{\sqrt{15}} = 3.19$, the 80% confidence interval is $[46.32 - 1.345(3.19),\ 46.32 + 1.345(3.19)] = [42.04, 50.61]$. Formally, we write
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
for a $1 - \alpha$ confidence interval.
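The 80% t-interval for the 15 observations can be checked with the Python sketch below. The standard library's `statistics` module computes the mean and the standard deviation with the n − 1 divisor, but it has no t quantile function. The critical value 1.345 is therefore copied from the t table; a package such as SciPy (e.g. `scipy.stats.t.ppf(0.90, 14)`) could supply it instead.

```python
import math
import statistics

data = [47.00, 67.33, 43.10, 37.22, 28.16, 33.10, 52.44, 47.66,
        31.53, 62.76, 60.95, 40.22, 61.98, 39.13, 42.26]

n = len(data)                   # 15 observations
df = n - 1                      # 14 degrees of freedom
x_bar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)           # estimated standard deviation of x_bar

# t value for 14 df with 0.10 in the upper tail, taken from the t table.
t_10 = 1.345

lower, upper = x_bar - t_10 * se, x_bar + t_10 * se
print(f"df = {df}, x_bar = {x_bar:.2f}, s = {s:.2f}, se = {se:.2f}")
print(f"80% CI: [{lower:.2f}, {upper:.2f}]")
```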