Section 7.1: Large-Sample Confidence Interval for a Population Mean

We now know how to construct probability statements about $\bar{X}$ using the sampling distribution of $\bar{X}$ obtained by the CLT. Now, we'll use the sampling distribution of $\bar{X}$ to compute something called a confidence interval (CI) for $\mu$. This is an interval estimator of the population mean $\mu$.

All confidence intervals for $\mu$ are of the following form:

$$(\bar{x} - K\sigma_{\bar{x}},\; \bar{x} + K\sigma_{\bar{x}}), \qquad \text{where } \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}},$$

and $K$ is a constant that depends on $n$ and a confidence level that you specify.

The confidence level of an interval is the percentage of the time that the interval will enclose the population parameter if the sampling procedure is repeated, and the interval recomputed, a large number of times. The confidence coefficient is the confidence level expressed as a fraction. Typical confidence levels are 90%, 95%, and 99%.

How would you report a confidence interval?
Example: Say we have a 95% CI of (4.1, 7.9) for $\mu$.
Report: We are 95% confident that the true mean of .... lies between 4.1 and 7.9 (units).

How would you interpret a confidence interval? Recall that we have no idea what the true mean is, so we take a sample and compute the 95% CI, say, (4.1, 7.9). Say another sample is taken and a new 95% CI is computed. Would you expect the 95% CI from the second sample to be (4.1, 7.9)? No. Say the second sample gives a 95% CI of (2.7, 6.5). Both intervals are 95% CIs for $\mu$.

What does the 95% mean? It means that if we repeat the whole process 100 times and compute 100 intervals for $\mu$, about 95 of the intervals will contain the true mean $\mu$; in the long run, 95% of such intervals contain $\mu$. Any single computed interval either contains $\mu$ or it does not.

Now, one must decide what to use for the constant $K$. For large-sample estimation, we can use the CLT to compute $K$. The CLT says that for $n \ge 30$, $\bar{X}$ is approximately distributed as $N(\mu_{\bar{x}}, \sigma_{\bar{x}}^2)$, with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. Keeping the CLT in mind, one may expect to use the Normal distribution in constructing a large-sample CI.
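To make the repeated-sampling interpretation concrete, here is a minimal Python simulation sketch. It assumes a hypothetical Normal population with $\mu = 5$ and $\sigma = 2$ (values chosen only for illustration, with $\sigma$ treated as known), draws many samples of size $n = 36$, and builds $\bar{x} \pm K\sigma_{\bar{x}}$ with $K = 1.96$, the standard normal value for 95% confidence. It then reports the fraction of intervals that contain $\mu$. The function name `simulate_coverage` is hypothetical.

```python
import random
import statistics


def simulate_coverage(mu=5.0, sigma=2.0, n=36, k=1.96, reps=1000, seed=1):
    """Repeat the sampling procedure `reps` times and return the fraction of
    intervals (xbar - k*sigma/sqrt(n), xbar + k*sigma/sqrt(n)) containing mu.

    mu, sigma, n, and k are illustrative assumptions, not values from the notes.
    """
    rng = random.Random(seed)
    se = sigma / n ** 0.5  # standard error: sigma_xbar = sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        lower, upper = xbar - k * se, xbar + k * se
        if lower <= mu <= upper:  # did this interval capture the true mean?
            hits += 1
    return hits / reps


coverage = simulate_coverage()
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The printed fraction depends on the seed but should land near 0.95; each individual interval either captures $\mu$ or misses it.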
A large-sample $100(1-\alpha)\%$ CI for $\mu$ is given by

$$(\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}},\; \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}), \quad \text{or} \quad \bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}, \quad \text{or, equivalently,} \quad \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right).$$

If $\sigma$ is not known, which is usually the case, and $n$ is large, then an approximate CI is

$$\bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right),$$

where $s$ = sample standard deviation.

How do you find $z_{\alpha/2}$? $z_{\alpha/2}$ is the $z$ value such that the area under the standard normal curve to its right is $\alpha/2$. By choosing $z_{\alpha/2}$, we are specifying that
- the confidence level of the CI is $100(1-\alpha)\%$,
- or, equivalently, the confidence coefficient is $(1-\alpha)$.
Thus we choose the confidence level of our CI.

Examples:
95% CI $\Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025$
90% CI $\Rightarrow \alpha = 0.10 \Rightarrow \alpha/2 = 0.05$
See Table 7.2 and follow Example 7.1.

Example: In a study to estimate the mean number of years of service of bank executives with degrees in business or economics, 96 such bank executives were sampled and the number of years of service of each was determined. The sample had a mean of $\bar{x} = 23.43$ years and a standard deviation of $s = 10.82$ years.

a) Construct a 90% CI for $\mu$.

$$\bar{x} \pm z_{0.05}\left(\frac{s}{\sqrt{n}}\right) = 23.43 \pm (1.645)\left(\frac{10.82}{\sqrt{96}}\right),$$

giving $23.43 \pm 1.82$, or (21.61, 25.25), as the required CI.

b) Interpret the CI in the context of the problem.
We can be 90% confident that the true mean number of years of service is between 21.61 and 25.25 years.

c) Construct an 80% CI for $\mu$.

$$\bar{x} \pm z_{0.10}\left(\frac{s}{\sqrt{n}}\right) = 23.43 \pm (1.28)\left(\frac{10.82}{\sqrt{96}}\right),$$

giving $23.43 \pm 1.41$, or (22.02, 24.84), as the CI.

Note that a decrease in confidence level corresponds to a decrease in the width of the interval (i.e., a narrower interval).

What would happen if $n$ is increased? If $\alpha$ and $\sigma$ are fixed, and assuming $\bar{x}$ remains the same as $n$ increases, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ decreases, which implies that the width of the CI decreases as $n$ increases.

Section 7.2: Small-Sample Confidence Interval for a Population Mean

In the large-sample situation ($n \ge 30$), the CLT helped us to formulate a CI for a population mean by making it possible for us to use a standard normal percentile $z_{\alpha/2}$.
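The large-sample interval recapped above can be computed directly. Below is a minimal Python sketch using only the standard library; `z_percentile` and `large_sample_ci` are hypothetical helper names, not from the notes. `statistics.NormalDist().inv_cdf` stands in for the normal table, and the usage lines reproduce parts (a) and (c) of the bank-executive example with the table values 1.645 and 1.28.

```python
from math import sqrt
from statistics import NormalDist


def z_percentile(confidence):
    """Return z_{alpha/2}: the standard normal value with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def large_sample_ci(xbar, s, n, z):
    """Large-sample CI xbar +/- z * s / sqrt(n), with s standing in for unknown sigma."""
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin


# Bank-executive example: n = 96, xbar = 23.43 years, s = 10.82 years
print(round(z_percentile(0.90), 3))             # 1.645
lo90, hi90 = large_sample_ci(23.43, 10.82, 96, z=1.645)
print(round(lo90, 2), round(hi90, 2))           # 21.61 25.25
lo80, hi80 = large_sample_ci(23.43, 10.82, 96, z=1.28)
print(round(lo80, 2), round(hi80, 2))           # 22.02 24.84
```

The interval function takes `z` as an argument rather than computing it, so the rounded table percentiles used in the notes can be plugged in; passing the unrounded `z_percentile(0.80)` (about 1.2816) instead can shift the last reported digit.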
However, in some cases, we may not be able to obtain a large sample, but may still want to formulate a CI. Because the sample size is small, we now have 2 potential problems:
1. We can no longer use the CLT.
2. The sample standard deviation ($s$) may be a poor approximation of $\sigma$.

In the case of the first problem, we will make the following assumption: If the population being sampled is approximately Normal, the sampling distribution of $\bar{X}$ will be approximately Normal, even for small sample sizes.

Now, if we have a good approximation of $\sigma$ (from past data, for instance), and our assumption of approximate Normality is correct, we can use the following:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

However, if we don't know $\sigma$ and use $s$ computed from the sample to estimate it, and our assumption of approximate Normality is correct, we may use the following CI:

$$\bar{x} \pm t_{\alpha/2,(n-1)}\frac{s}{\sqrt{n}},$$

where $t_{\alpha/2,(n-1)}$ is a percentile from the t-distribution based on $(n-1)$ degrees of freedom. This CI is based on the t-statistic

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}},$$

which is said to have a t-distribution with $(n-1)$ degrees of freedom.

The t-distribution is a continuous, symmetric, mound-shaped distribution, similar to a $N(0,1)$. However, the t-distribution is a squashed $N(0,1)$. That is, the t-distribution is not as tall as a $N(0,1)$ and it has more area in the tails (see Figure 7.7). One can view $t_{\alpha/2,(n-1)}$ as a conservative estimate of $z_{\alpha/2}$: it is larger, so the interval is wider. We need to be conservative because $s$ may be a poor approximation of $\sigma$.

The variance of a t-distribution depends on the sample size ($n$). The smaller the sample size, the more variability. This dependence of the t-distribution on sample size is expressed through the degrees of freedom (df) of the t-distribution. If we have $n$ observations in the sample, we will have $(n-1)$ df for the t-statistic.

Example: Compute the t percentile given $n = 25$ and confidence level 95%. (Use Table VI and look at Figure 7.8.)
Since $(n-1) = 24$ and $\alpha/2 = 0.025$, we look up Table VI with df = 24 and find $t_{0.025,24}$ to be 2.064.

Example (Exercise 7.24): Health insurers and the federal government are both putting pressure on hospitals to shorten the average length of stay (LOS) of their patients. A random sample of 20 hospitals in one state had a mean LOS in 1990 of 3.8 days and a standard deviation of 1.2 days.

a) Use a 90% CI to estimate the population mean LOS for the state's hospitals in 1990.
$n = 20$, $\bar{x} = 3.8$ days, $s = 1.2$ days, and $t_{0.05,19} = 1.729$. A 90% CI for $\mu$ is:

$$\bar{x} \pm t_{0.05,19}\frac{s}{\sqrt{n}} = 3.8 \pm (1.729)\frac{1.2}{\sqrt{20}} = 3.8 \pm 0.46 = (3.34,\ 4.26)$$

b) Interpret the interval in (a).
We are 90% confident that the true mean LOS in 1990 was between 3.34 and 4.26 days.

c) Suppose $\sigma$ is known to be 1.2 days (say, from past data). How will the interval change?
Since $\sigma$ is known, we may use the Normal approximation here. A 90% CI for $\mu$ is:

$$\bar{x} \pm z_{0.05}\frac{\sigma}{\sqrt{n}} = 3.8 \pm (1.645)\frac{1.2}{\sqrt{20}} = 3.8 \pm 0.44 = (3.36,\ 4.24)$$

Note that the CI in (c) is narrower than the CI in (a).

Section 7.4: Determining the Sample Size Necessary to Estimate a Population Mean

The question that needs to be answered now is: "How big of a sample do I need to take?" Usually we want a sample that is just "big" enough to estimate the population mean to within a bound $B$ with $100(1-\alpha)\%$ confidence, i.e., the width of a $100(1-\alpha)\%$ CI for $\mu$ must be at most $2B$. Thus, if we have the right $n$, then $B$ is equal to one-half the width of the CI:

$$B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

Solving for $n$ results in

$$n = \frac{(z_{\alpha/2})^2\sigma^2}{B^2}.$$

To use this formula to get a value for $n$, the researcher needs to specify values for $\alpha$, $\sigma$, and $B$. For $\sigma$, researchers usually use a value from a previous study of similar data.

Example (Exercise 7.61): The USGA tests all new brands of golf balls to ensure that they meet USGA specs. One test is to have the balls hit by "Iron Byron". Suppose the USGA wants to estimate the mean distance for a new brand to within 1 yard with 90% confidence. Past tests indicate that the standard deviation of distances hit is approximately 10 yards.
How many balls need to be hit to achieve the desired accuracy?
For a 90% confidence interval we need to use $z_{0.05} = 1.645$. We are given that $\sigma = 10$ yards and $B = 1$ yard. Thus:

$$n = \frac{(z_{\alpha/2})^2\sigma^2}{B^2} = \frac{(1.645)^2(10)^2}{(1)^2} = 270.6025$$

Thus the needed sample size is (rounding up the above answer) at least 271 balls.
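The small-sample interval of Section 7.2 and the sample-size formula of Section 7.4 reduce to a few lines of arithmetic. The Python sketch below uses hypothetical helper names (`mean_ci`, `required_sample_size`). Because the Python standard library has no t-distribution, the critical value is passed in after being looked up in Table VI (or, for a known $\sigma$, taken from the normal table); this is an assumption of the sketch, not part of the notes. The usage lines reproduce the hospital LOS example and the golf-ball example.

```python
from math import ceil, sqrt


def mean_ci(xbar, sd, n, crit):
    """Return (xbar - crit*sd/sqrt(n), xbar + crit*sd/sqrt(n)).

    Use crit = t_{alpha/2,(n-1)} with the sample s, or crit = z_{alpha/2}
    when sigma is known. The caller looks up the critical value.
    """
    margin = crit * sd / sqrt(n)
    return xbar - margin, xbar + margin


def required_sample_size(z, sigma, bound):
    """Smallest integer n with z*sigma/sqrt(n) <= bound, i.e. ceil((z*sigma/bound)**2)."""
    return ceil((z * sigma / bound) ** 2)


# Hospital LOS example (Exercise 7.24): n = 20, xbar = 3.8 days, s = 1.2 days
lo_t, hi_t = mean_ci(3.8, 1.2, 20, crit=1.729)   # t_{0.05,19} from Table VI
print(round(lo_t, 2), round(hi_t, 2))             # 3.34 4.26

# Part (c): sigma known to be 1.2 days, so use z_{0.05} = 1.645
lo_z, hi_z = mean_ci(3.8, 1.2, 20, crit=1.645)
print(round(lo_z, 2), round(hi_z, 2))             # 3.36 4.24

# Golf-ball example (Exercise 7.61): 90% confidence, sigma = 10 yards, B = 1 yard
print(required_sample_size(1.645, 10, 1))         # 271
```

Using `ceil` matches the notes' rule of rounding 270.6025 up to 271; rounding to the nearest integer could leave the half-width slightly larger than $B$.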