7 Statistical Intervals Based on a Single Sample

7.1 Basic Properties of Confidence Intervals

The basic concepts and properties of confidence intervals (CIs) are most easily introduced by first focusing on a simple problem situation. Suppose that the parameter of interest is a population mean $\mu$ and that

1. The population distribution is normal
2. The value of the population standard deviation $\sigma$ is known

Irrespective of the sample size $n$, the sample mean $\bar{X}$ is normally distributed with expected value $\mu$ and standard deviation $\sigma/\sqrt{n}$. Standardizing $\bar{X}$ by first subtracting its expected value and then dividing by its standard deviation yields the standard normal variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \tag{7.1}$$

Because the area under the standard normal curve between $-1.96$ and $1.96$ is .95,

$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95 \tag{7.2}$$

Multiplying through by $\sigma/\sqrt{n}$, subtracting $\bar{X}$, and multiplying by $-1$ (which reverses the inequalities) gives equivalent sets of inequalities. The equivalence of each set of inequalities to the original set implies that

$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = .95 \tag{7.3}$$

To interpret (7.3), think of a random interval having left endpoint $\bar{X} - 1.96\,\sigma/\sqrt{n}$ and right endpoint $\bar{X} + 1.96\,\sigma/\sqrt{n}$. In interval notation, this becomes

$$\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \tag{7.4}$$

Substituting the observed sample mean $\bar{x}$ for $\bar{X}$ in (7.4) gives a 95% CI for $\mu$. This CI can be expressed either as

$$\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \text{ is a 95\% CI for } \mu$$

or as

$$\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}} \text{ with 95\% confidence.}$$

A concise expression for the interval is $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$, where $-$ gives the left endpoint (lower limit) and $+$ gives the right endpoint (upper limit).

Interpreting a Confidence Interval

With 95% confidence, we can say that $\mu$ lies within 1.96 standard errors ($1.96\,\sigma/\sqrt{n}$) of our sample mean $\bar{x}$.

• In 95% of all possible samples of size $n$, $\mu$ will fall in the computed confidence interval.
• In only 5% of samples would $\bar{x}$ be farther than $1.96\,\sigma/\sqrt{n}$ from $\mu$.

Example 2

The quantities needed for computation of the 95% CI for true average preferred height are $\sigma = 2.0$, $n = 31$, and $\bar{x} = 80.0$. The resulting interval is

$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 80.0 \pm (1.96)\frac{2.0}{\sqrt{31}} = 80.0 \pm .7 = (79.3,\ 80.7)$$

That is, we can be highly confident, at the 95% confidence level, that $79.3 < \mu < 80.7$.
This interval is relatively narrow, indicating that $\mu$ has been rather precisely estimated.

Other Levels of Confidence

As Figure 7.4 shows, a probability of $1 - \alpha$ is achieved by using $z_{\alpha/2}$ in place of 1.96.

Figure 7.4: $P(-z_{\alpha/2} \le Z < z_{\alpha/2}) = 1 - \alpha$

Definition

A $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by

$$\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) \tag{7.5}$$

or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.

The formula (7.5) for the CI can also be expressed in words as

point estimate of $\mu$ $\pm$ (z critical value)(standard error of the mean).

Example 3

The production process for engine control housing units of a particular type has recently been modified. Prior to this modification, historical data had suggested that the distribution of hole diameters for bushings on the housings was normal with a standard deviation of .100 mm. It is believed that the modification has not affected the shape of the distribution or the standard deviation, but that the value of the mean diameter may have changed. A sample of 40 housing units is selected and hole diameter is determined for each one, resulting in a sample mean diameter of 5.426 mm.

Let's calculate a confidence interval for true average hole diameter using a confidence level of 90%. This requires that $100(1 - \alpha) = 90$, from which $\alpha = .10$ and $z_{\alpha/2} = z_{.05} = 1.645$ (corresponding to a cumulative z-curve area of .9500). The desired interval is then

$$5.426 \pm (1.645)\frac{.100}{\sqrt{40}} = 5.426 \pm .026 = (5.400,\ 5.452)$$

With a reasonably high degree of confidence, we can say that $5.400 < \mu < 5.452$. This interval is rather narrow because of the small amount of variability in hole diameter ($\sigma = .100$).
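The interval (7.5), point estimate plus or minus the z critical value times $\sigma/\sqrt{n}$, can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `z_interval` is a hypothetical helper, and it assumes Python 3.8 or later so that `statistics.NormalDist` is available for the critical value. It recomputes the intervals of Examples 2 and 3 from the summary values quoted in the text.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided CI for a normal mean when sigma is known (formula 7.5).

    z_interval is a hypothetical helper name chosen for illustration.
    """
    alpha = 1.0 - conf
    # z_{alpha/2}: the value with upper-tail area alpha/2 under the z curve
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z_crit * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Example 2: sigma = 2.0, n = 31, xbar = 80.0, 95% confidence
ex2 = z_interval(80.0, 2.0, 31, conf=0.95)

# Example 3: sigma = .100, n = 40, xbar = 5.426, 90% confidence
ex3 = z_interval(5.426, 0.100, 40, conf=0.90)

print("Example 2: (%.1f, %.1f)" % ex2)
print("Example 3: (%.3f, %.3f)" % ex3)
```

The critical value is computed rather than read from a table, so the endpoints agree with the hand calculations only after rounding to the precision used in the examples.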
Properties of Confidence Intervals

• The user chooses the confidence level.
• We want both high confidence and a narrow interval.
• The interval gets narrower when $z_{\alpha/2}$ is smaller (a lower confidence level), $\sigma$ is smaller, or $n$ is larger.

Confidence Level and Sample Size

A general formula for the sample size $n$ necessary to ensure an interval width $w$ is obtained from equating $w$ to $2 \cdot z_{\alpha/2} \cdot \sigma/\sqrt{n}$ and solving for $n$.

The sample size necessary for the CI (7.5) to have a width $w$ is

$$n = \left(2 z_{\alpha/2} \cdot \frac{\sigma}{w}\right)^2$$

The smaller the desired width $w$, the larger $n$ must be. In addition, $n$ is an increasing function of $\sigma$ (more population variability necessitates a larger sample size) and of the confidence level $100(1 - \alpha)$ (as $\alpha$ decreases, $z_{\alpha/2}$ increases).

Example 4

Extensive monitoring of a computer time-sharing system has suggested that response time to a particular editing command is normally distributed with standard deviation 25 milliseconds. A new operating system has been installed, and we wish to estimate the true average response time $\mu$ for the new environment.

Assuming that response times are still normally distributed with $\sigma = 25$, what sample size is necessary to ensure that the resulting 95% CI has a width of (at most) 10?

The sample size $n$ must satisfy

$$10 = 2 \cdot (1.96)\left(25/\sqrt{n}\right)$$

Rearranging this equation gives

$$\sqrt{n} = 2 \cdot (1.96)(25)/10 = 9.80$$

so

$$n = (9.80)^2 = 96.04$$

Since $n$ must be an integer, a sample size of 97 is required.

7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion

A Large-Sample Interval for $\mu$

Section 7.1 presented a CI for $\mu$ that assumed that

1. The population distribution is normal
2. The value of $\sigma$ is known

This section presents a large-sample CI whose validity does not require these assumptions.

Let $X_1, X_2, \ldots, X_n$ be a random sample from a population having a mean $\mu$ and standard deviation $\sigma$.
Provided that $n$ is large, the Central Limit Theorem (CLT) implies that $\bar{X}$ has approximately a normal distribution whatever the nature of the population distribution. It then follows that

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

has approximately a standard normal distribution, so that

$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \approx 1 - \alpha$$

Proposition

If $n$ is sufficiently large, the standardized variable

$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has approximately a standard normal distribution. This implies that

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \tag{7.8}$$

is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1 - \alpha)\%$. This formula is valid regardless of the shape of the population distribution.

Generally speaking, $n > 40$ will be sufficient to justify the use of this interval. This is somewhat more conservative than the rule of thumb for the CLT because of the additional variability introduced by using $S$ in place of $\sigma$.

Example 6

Haven't you always wanted to own a Porsche? The author thought maybe he could afford a Boxster, the cheapest model. So he went to www.cars.com on Nov. 18, 2009, and found a total of 1113 such cars listed. Asking prices ranged from $3499 to $130,000 (the latter price was one of only two exceeding $70,000). The prices depressed him, so he focused instead on odometer readings (miles).

Here are reported readings for a sample of 50 of these Boxsters:

A boxplot of the data (Figure 7.5) shows that, except for the two outliers at the upper end, the distribution of values is reasonably symmetric (in fact, a normal probability plot exhibits a reasonably linear pattern, though the points corresponding to the two smallest and two largest observations are somewhat removed from a line fit through the remaining points).

Figure 7.5: A boxplot of the odometer reading data from Example 6

Summary quantities include $n = 50$, $\bar{x} = 45{,}679.4$, $\tilde{x} = 45{,}013.5$, $s = 26{,}641.675$, $f_s = 34{,}265$.
The mean and median are reasonably close (if the two largest values were each reduced by 30,000, the mean would fall to 44,479.4, while the median would be unaffected). The boxplot and the magnitudes of $s$ and $f_s$ relative to the mean and median both indicate a substantial amount of variability.

A confidence level of about 95% requires $z_{.025} = 1.96$, and the interval is

$$45{,}679.4 \pm (1.96)\left(\frac{26{,}641.675}{\sqrt{50}}\right) = 45{,}679.4 \pm 7384.7 = (38{,}294.7,\ 53{,}064.1)$$

That is, $38{,}294.7 < \mu < 53{,}064.1$ with 95% confidence. This interval is rather wide because a sample size of 50, even though large by our rule of thumb, is not large enough to overcome the substantial variability in the sample. We do not have a very precise estimate of the population mean odometer reading.

One-Sided Confidence Intervals (Confidence Bounds)

Starting with $P(-1.645 < Z) \approx .95$ and manipulating the inequality results in the upper confidence bound. A similar argument gives a one-sided bound associated with any other confidence level.

Proposition

A large-sample upper confidence bound for $\mu$ is

$$\mu < \bar{x} + z_{\alpha} \cdot \frac{s}{\sqrt{n}}$$

and a large-sample lower confidence bound for $\mu$ is

$$\mu > \bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}}$$

7.3 Intervals Based on a Normal Population Distribution

The CI for $\mu$ presented in Section 7.2 is valid when $n$ is large. The resulting interval can be used whatever the nature of the population distribution. The CLT cannot be invoked, however, when $n$ is small.

The result on which inferences are based introduces a new family of probability distributions called t distributions.

Theorem

When $\bar{X}$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, the rv

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \tag{7.13}$$

has a probability distribution called a t distribution with $n - 1$ degrees of freedom (df).

Properties of t Distributions

Let $t_\nu$ denote the t distribution with $\nu$ df.

1. Each $t_\nu$ curve is bell-shaped and centered at 0.
2. Each $t_\nu$ curve is more spread out than the standard normal (z) curve.
3. As $\nu$ increases, the spread of the corresponding $t_\nu$ curve decreases.
4. As $\nu \to \infty$, the sequence of $t_\nu$ curves approaches the standard normal curve (so the z curve is often called the t curve with df $= \infty$).

Figure 7.7 illustrates several of these properties for selected values of $\nu$.

Figure 7.7: $t_\nu$ and z curves

Notation

Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.

For example, $t_{.05,6}$ is the t critical value that captures an upper-tail area of .05 under the t curve with 6 df. The general notation is illustrated in Figure 7.8.

Figure 7.8: Illustration of a t critical value

The One-Sample t Confidence Interval

The standardized variable $T$ has a t distribution with $n - 1$ df, and the area under the corresponding t density curve between $-t_{\alpha/2,n-1}$ and $t_{\alpha/2,n-1}$ is $1 - \alpha$ (area $\alpha/2$ lies in each tail), so

$$P(-t_{\alpha/2,n-1} < T < t_{\alpha/2,n-1}) = 1 - \alpha \tag{7.14}$$

Expression (7.14) differs from expressions in previous sections in that $T$ and $t_{\alpha/2,n-1}$ are used in place of $Z$ and $z_{\alpha/2}$, but it can be manipulated in the same manner to obtain a confidence interval for $\mu$.

Proposition

Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean $\mu$. Then a $100(1 - \alpha)\%$ confidence interval for $\mu$ is

$$\left(\bar{x} - t_{\alpha/2,n-1} \cdot \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1} \cdot \frac{s}{\sqrt{n}}\right) \tag{7.15}$$

or, more compactly, $\bar{x} \pm t_{\alpha/2,n-1} \cdot s/\sqrt{n}$.

An upper confidence bound for $\mu$ is

$$\bar{x} + t_{\alpha,n-1} \cdot \frac{s}{\sqrt{n}}$$

and replacing $+$ by $-$ in this latter expression gives a lower confidence bound for $\mu$, both with confidence level $100(1 - \alpha)\%$.

Example 11

Even as traditional markets for sweetgum lumber have declined, large-section solid timbers traditionally used for construction bridges and mats have become increasingly scarce.
The article “Development of Novel Industrial Laminated Planks from Sweetgum Lumber” (J. of Bridge Engr., 2008: 64–66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber.

Here are data on the modulus of rupture (psi; the article contained summary data expressed in MPa):

6807.99 6981.46 6906.04 7295.54 7422.69 7637.06
7569.75 6617.17 6702.76 7886.87 6663.28 7437.88
6984.12 7440.17 6316.67 6165.03 6872.39 7093.71
8053.26 7713.65 6991.41 7663.18 7659.50 8284.75
7503.33 6992.23 6032.28 7378.61 7347.95 7674.99

Figure 7.9 shows a normal probability plot from the R software.

Figure 7.9: A normal probability plot of the modulus of rupture data

The straightness of the pattern in the plot provides strong support for assuming that the population distribution of MOR is at least approximately normal.

The sample mean and sample standard deviation are 7203.191 and 543.5400, respectively (for anyone bent on doing hand calculation, the computational burden is eased a bit by subtracting 6000 from each $x$ value to obtain $y_i = x_i - 6000$; then $\bar{y} = 1203.191$ and $s_y = s_x$ as given).

Let's now calculate a confidence interval for true average MOR using a confidence level of 95%. The CI is based on $n - 1 = 29$ degrees of freedom, so the necessary t critical value is $t_{.025,29} = 2.045$. The interval estimate is now

$$\bar{x} \pm t_{.025,29} \cdot \frac{s}{\sqrt{n}} = 7203.191 \pm (2.045) \cdot \frac{543.5400}{\sqrt{30}} = 7203.191 \pm 202.938 = (7000.253,\ 7406.129)$$

We estimate that $7000.253 < \mu < 7406.129$ with 95% confidence.

If we use the same formula on sample after sample, in the long run 95% of the calculated intervals will contain $\mu$. Since the value of $\mu$ is not available, we don't know whether the calculated interval is one of the “good” 95% or the “bad” 5%.

Even with the moderately large sample size, our interval is rather wide. This is a consequence of the substantial amount of sample variability in MOR values.
A lower 95% confidence bound would result from retaining only the lower confidence limit (the one with $-$) and replacing 2.045 with $t_{.05,29} = 1.699$.

Intervals Based on Nonnormal Population Distributions

The one-sample t CI for $\mu$ is robust to small or even moderate departures from normality unless $n$ is quite small. By this we mean that if a critical value for 95% confidence, for example, is used in calculating the interval, the actual confidence level will be reasonably close to the nominal 95% level.

If, however, $n$ is small and the population distribution is highly nonnormal, then the actual confidence level may be considerably different from the one you think you are using when you obtain a particular critical value from the t table. It would certainly be distressing to believe that your confidence level is about 95% when in fact it was really more like 88%! The bootstrap technique has been found to be quite successful at estimating parameters in a wide variety of nonnormal situations.

In contrast to the confidence interval, the validity of the prediction and tolerance intervals described in this section is closely tied to the normality assumption. These latter intervals should not be used in the absence of compelling evidence for normality.

The excellent reference Statistical Intervals, cited in the bibliography at the end of this chapter, discusses alternative procedures of this sort for various other situations.
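The one-sample t interval and lower confidence bound of Example 11 can be reproduced from the 30 modulus-of-rupture values listed there. This is a minimal sketch rather than anything from the original slides: the helper name `t_interval` is hypothetical, and because the Python standard library has no t distribution, the critical values 2.045 and 1.699 quoted in the example are passed in as table lookups instead of being computed.

```python
import math
import statistics

# Modulus of rupture data (psi) from Example 11
mor = [
    6807.99, 6981.46, 6906.04, 7295.54, 7422.69, 7637.06,
    7569.75, 6617.17, 6702.76, 7886.87, 6663.28, 7437.88,
    6984.12, 7440.17, 6316.67, 6165.03, 6872.39, 7093.71,
    8053.26, 7713.65, 6991.41, 7663.18, 7659.50, 8284.75,
    7503.33, 6992.23, 6032.28, 7378.61, 7347.95, 7674.99,
]


def t_interval(data, t_crit):
    """Two-sided one-sample t CI: xbar +/- t_crit * s / sqrt(n).

    t_interval is a hypothetical helper; t_crit must be the tabled
    critical value t_{alpha/2, n-1} for the desired confidence level.
    """
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


xbar = statistics.mean(mor)
s = statistics.stdev(mor)

# 95% two-sided interval with t_{.025,29} = 2.045 (from the text)
lower, upper = t_interval(mor, 2.045)

# 95% lower confidence bound with t_{.05,29} = 1.699 (from the text)
lower_bound = xbar - 1.699 * s / math.sqrt(len(mor))

print("mean = %.3f, s = %.4f" % (xbar, s))
print("95%% CI: (%.3f, %.3f)" % (lower, upper))
print("95%% lower bound: %.3f" % lower_bound)
```

Passing the critical value as an argument keeps the sketch dependency-free; with a statistics package that provides t quantiles, the table lookup could be replaced by a computed value.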
