Confidence Intervals, 6-Step Method

Copyright © Golde I. Holtzman 2002, 2007, 2010. All rights reserved. ci6step.docx, 7/18/2012

General Form of the 6-Step Method for Confidence Interval Estimation

1. Model:
   a. Verbally identify the underlying random variable of interest (characteristic and population).
   b. Verbally identify the underlying parameter of interest.
   c. State the assumptions being made regarding the underlying distribution.
2. Hypotheses: None
3. Formulate confidence limits and cite a reference:
   a. State the formula that is appropriate for the parameter under the assumptions, and
   b. cite a reference, e.g., Zar (1995) or Baldi and Moore (2009).
4. Design:
   a. Choose the confidence coefficient, 1 − α, which ultimately determines our confidence in the accuracy of the estimate, and partially determines the width (or length) of the interval and, thereby, partially determines the precision of the estimate.
   b. Choose the sample size, n, which partially determines the width (or length) of the interval and, thereby, partially determines the precision of the estimate.
5. Gather the data and:
   a. Compute the best point estimate of the parameter of interest.
   b. Compute the standard error of the estimate.
   c. Determine the percentile or critical value of the appropriate distribution. [1]
   d. Compute the lower (L) and upper (U) confidence limits.
6. State the conclusion verbally:
   a. Methods (or Statistical Methods). Confidence intervals are calculated assuming that [1-c, verbally] ([cite reference as in 3-b]).
   b. Results. "I am ____ % confident that __________________ is between ____ and ____."
      "I am [confidence coefficient] % confident that [parameter of interest, verbally] is between [L] and [U]."

Example 1, estimation of population mean when population SD known

Reconsider the 25 Ash trees of Region 1, Monongahela National Forest. Based on that sample, estimate the mean DBH of all ash trees in Region 1, with 95% confidence.
Assume that DBH is normally distributed, and (unrealistically) that the standard deviation of ash DBH is known to be 10.0 cm. Use the 6-step method.

1. Model:
   a. The underlying r.v. of interest (Yi) is the DBH (cm) of the ith randomly sampled ash tree from Region 1, i = 1, 2, …, n.
   b. The parameter of interest is the population mean, μ, the mean DBH (cm) of all ash trees in Region 1.
   c. Assume
      i. Population SD, σ = 10.0 cm (i.e., we are assuming population SD is known).
      ii. Yi distributed Normally.

[1] (Footnote to the percentile or critical value in step 5-c.) Some authors (e.g., Daniel, 1987) call this number the reliability coefficient. Others (e.g., Koopmans, Zar) call it a critical value.

2. Hypotheses: None
3. Formulate confidence limits:
   a. (sample mean) ± (margin of error) = (sample mean) ± [(Z)(Pop SE)]:
      $\bar{Y} \pm z_{1-\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ (quantile notation), or
      $\bar{Y} \pm z^{*}\,\dfrac{\sigma}{\sqrt{n}}$ (Baldi and Moore simplified notation)
   b. See Zar (2010), or Baldi and Moore (2009)
4. Design:
   a. Confidence coefficient = (1 − α) = 0.95.
   b. Sample size = n = 25.
5. Gather the data: (Recall that for the ash trees of Region 1, the sample mean was 15.5 cm.)
   a. The best estimate of the pop mean = the sample mean = $\bar{Y}$ = 15.5 cm.
   b. Pop SE = (Pop SD)/√n = σ/√n = 10.0/√25 = 10.0/5 = 2.00.
   c. $z_{0.975} = z^{*} = 1.96$.
      [Explanation: Since the parameter of interest is the population mean, and since the pop SD is (assumed to be) known, and since the random variable of interest is distributed Normally (also by assumption), it is appropriate to use a percentile of the standard normal distribution, i.e., the appropriate percentile is $z_{1-\alpha/2}$. In Step 4, the design step, we specified the confidence level (1 − α) = 0.95, therefore α = 0.05, therefore α/2 = 0.025, therefore (1 − α/2) = 0.975. Hence, the required percentile is the 97.5th percentile of the standard normal distribution, $z_{0.975} = 1.96$.]
   d. Confidence limits:
      = estimator ± (margin of error)
      = estimator ± [(percentile)(standard error)]
      = (sample mean) ± [($z_{1-\alpha/2}$)(Pop SE)]
      = $\bar{Y} \pm z_{1-\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
      = 15.5 ± [(1.96)(2.00)]
      = 15.5 ± 3.92
      = (11.6, 19.4)
6. State the conclusion verbally:
   a. Methods. Confidence intervals are calculated assuming that (i) the population standard deviation is 10.0 cm and (ii) the diameter at breast height of all ash trees in Region 1 is normally distributed (Sall et al., 2001, pp. 106-108).
   b. Results. I am 95% confident that the mean DBH of all ash trees in Region 1 is between 11.6 and 19.4 cm.

Example 2, estimation of population mean when population SD known

Estimate the mean DBH of all ash trees in Region 1, with 90% confidence. Assume that DBH is normally distributed, and (unrealistically) that the standard deviation of ash DBH is known to be 10.0 cm. Use the 6-step method.

1. Model: Same as earlier example.
2. Hypotheses: None
3. Formulate CL:
   a. $\bar{Y} \pm z_{1-\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ or $\bar{Y} \pm z^{*}\,\dfrac{\sigma}{\sqrt{n}}$
   b. See Zar (1995) or Baldi and Moore (2009, 2012)
4. Design:
   a. Confidence coefficient = (1 − α) = 0.90.
   b. n = 25.
5. Gather the data:
   a. Sample mean = 15.5 cm.
   b. Pop SE = (pop SD)/√n = 10.0/√25 = 10.0/5 = 2.00.
   c. $z_{1-\alpha/2} = z_{0.95} = 1.645$ (from the bottom line of the table of Student's t distribution, i.e., a standard normal $z = t_{\infty}$, a Student's t with infinite degrees of freedom).
   d. Confidence limits: = 15.5 ± (1.645)(2.00) = 15.5 ± 3.29 = (12.21, 18.79)
6. Conclusion.
   a. Methods. Confidence intervals are calculated assuming that (i) the population standard deviation is 10.0 cm and (ii) the diameter at breast height of all ash trees in Region 1 is normally distributed (Sall et al., 2001, pp. 106-108).
   b. Results. I am 90% confident that the mean DBH of all ash trees in Region 1 is between 12.2 and 18.8 cm.
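The arithmetic of Examples 1 and 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original handout: the function name `z_interval` and its parameters are hypothetical, and the critical value is taken from the standard library's `statistics.NormalDist` rather than from a printed table, so it carries more digits than 1.96 or 1.645.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, pop_sd, n, confidence):
    """Confidence limits for a population mean when the population SD is assumed known.

    Hypothetical helper for illustration; it follows steps 5-b to 5-d.
    """
    alpha = 1.0 - confidence
    # Percentile z_{1 - alpha/2} of the standard normal distribution (step 5-c).
    z_star = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    pop_se = pop_sd / sqrt(n)       # population standard error, sigma / sqrt(n)
    margin = z_star * pop_se        # margin of error
    return sample_mean - margin, sample_mean + margin


# Example 1 (95%) and Example 2 (90%): sample mean 15.5 cm, sigma 10.0 cm, n = 25.
for confidence in (0.95, 0.90):
    lower, upper = z_interval(15.5, 10.0, 25, confidence)
    print(f"{confidence:.0%} CI: ({lower:.1f}, {upper:.1f}) cm")
```

Rounded to one decimal place, the limits agree with the hand calculations: (11.6, 19.4) for 95% confidence and (12.2, 18.8) for 90% confidence.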
Notes on Examples 1 and 2

• The standard deviation used in Examples 1 and 2 was the population standard deviation (usually denoted by the Greek letter σ) rather than the sample standard deviation (usually denoted by the English letter s).
  o That the population standard deviation is known to be σ = 10.0 is an assumption. It is something the investigator assumed to be true before performing the study, before obtaining the sample, and before calculating the sample standard deviation.
  o Because the standard deviation we used was assumed to be the population standard deviation, we used the standard normal critical values (z = 1.96 for 95% confidence and z = 1.645 for 90% confidence). If we had not used that assumed population standard deviation and instead had used the sample standard deviation, then we would have used critical values from a different distribution, Student's t distribution, rather than from the standard normal z distribution. Use of the t distribution will be illustrated in the next example.
• There is a tradeoff between confidence in accuracy and precision. The only difference between Examples 1 and 2 is the confidence level.

                          Example 1                      Example 2
  Confidence Level        (1 − α) = 0.95                 (1 − α) = 0.90
  Critical Value          z = 1.96                       z = 1.645
  Confidence Limits       (11.6, 19.4)                   (12.2, 18.8)
  Length                  19.4 − 11.6 = 7.8              18.8 − 12.2 = 6.6
  Margin of Error (MoE)   7.8/2 = 3.9                    6.6/2 = 3.3
  Tradeoff                More confidence in accuracy    Less confidence in accuracy
                          (95%), less precision          (90%), more precision
                          (MoE = 3.9)                    (MoE = 3.3)

Example 3, estimation of population mean when population SD unknown

Reconsider the 25 Ash trees of Region 1, Monongahela National Forest. Based on that sample, estimate the mean DBH of all ash trees in Region 1, with 95% confidence.
As in Example 1 above, assume that DBH is normally distributed, but unlike in Example 1, drop the (unrealistic) assumption that the standard deviation of ash DBH is known to be 10.0 cm. Use the 6-step method.

In other words: Reconsider the 25 Ash trees of Region 1, Monongahela National Forest. Based on that sample, estimate the mean DBH of all ash trees in Region 1, with 95% confidence. Assume that DBH is normally distributed. Use the 6-step method.

1. Model
   a. Let Xi represent the DBH (cm) of the ith randomly sampled ash tree from Region 1, i = 1, 2, …, n.
   b. Let μ represent the mean DBH (cm) of all ash trees in Region 1.
   c. Assume Xi distributed Normally.
2. Hypotheses: None
3. Formulate confidence limits:
   (sample mean) ± (margin of error) = (sample mean) ± (t)(estimated SE):
   $\bar{X} \pm t_{n-1,\,1-\alpha/2}\,\dfrac{s}{\sqrt{n}}$ (quantile notation), or
   $\bar{X} \pm t^{*}\,\dfrac{s}{\sqrt{n}}$ (Baldi and Moore simplified notation),
   where $t^{*} = t_{n-1,\,1-\alpha/2}$ is the (1 − α/2) quantile of Student's t distribution with (n − 1) degrees of freedom (Zar 1995, p. 100; Baldi and Moore 2009, Chapter 17).
4. Design:
   a. Confidence coefficient = (1 − α) = 0.95.
   b. n = 25.
5. Gather the data and compute:
   a. Sample mean = 15.5 cm.
   b. Estimated SE = s/√n = 8.6/√25 = 8.6/5 = 1.72, where s = 8.6 cm is the sample standard deviation.
   c. (Since the parameter of interest is the population mean, and since the population standard deviation is not (assumed to be) known, and since the underlying random variable of interest is distributed normally (also by assumption), it is appropriate to use a percentile of Student's t distribution, i.e., the appropriate percentile is) $t_{n-1,\,1-\alpha/2} = t_{24,\,0.975} = 2.064$.
   d. Confidence limits: estimator ± MOE = estimator ± [(percentile)(standard error)] = 15.5 ± [(2.064)(1.72)] = 15.5 ± 3.55 = (12.0, 19.1)
6. Conclusion
   Methods. Confidence intervals are computed assuming that the DBH of all ash trees in Region 1 is normally distributed (Zar 1995, p. 100).
   Results. I am 95% confident that the mean DBH of all ash trees in Region 1 is between 12.0 and 19.1 cm.

Example 3-b, estimation of population mean when population SD unknown

Yet again, reconsider the 25 Ash trees of Region 1, Monongahela National Forest. Based on that sample, estimate the mean DBH of all ash trees in Region 1, with 90% (rather than 95%) confidence. Assume that DBH is normally distributed. Use the 6-step method.

1. Model
   a. Let Xi represent the DBH (cm) of the ith randomly sampled ash tree from Region 1, i = 1, 2, …, n.
   b. Let μ represent the mean DBH (cm) of all ash trees in Region 1.
   c. Assume Xi distributed Normally.
2. Hypotheses: None
3. Formulate confidence limits:
   $\bar{X} \pm t^{*}\,\dfrac{s}{\sqrt{n}}$ (Baldi and Moore simplified notation),
   where $t^{*} = t_{n-1,\,1-\alpha/2}$ is the (1 − α/2) quantile of Student's t distribution with (n − 1) degrees of freedom (Zar 1995, p. 100; Baldi and Moore 2009, Chapter 17).
4. Design:
   a. Confidence coefficient = (1 − α) = 0.90.
   b. n = 25.
5. Gather the data and compute:
   a. Sample mean = 15.5 cm.
   b. Estimated SE = s/√n = 8.6/√25 = 8.6/5 = 1.72.
   c. Percentile = $t^{*} = t_{n-1,\,1-\alpha/2} = t_{24,\,0.95} = 1.711$.
   d. Confidence limits: estimator ± MOE = estimator ± [(percentile)(standard error)] = $\bar{X} \pm t^{*}\,\dfrac{s}{\sqrt{n}}$ = 15.5 ± [(1.711)(1.72)] = 15.5 ± 2.94 = (12.56, 18.44)
6. Conclusion
   Methods. Confidence intervals are computed assuming that the DBH of all ash trees in Region 1 is normally distributed (Zar 1995, p. 100; Baldi & Moore 2009, 2012, Chapter 14).
   Results. I am 90% confident that the mean DBH of all ash trees in Region 1 is between 12.6 and 18.4 cm.

Practice Quiz

1. Does increasing the confidence coefficient increase, decrease, or not affect the width of the confidence interval?
2. A confidence interval is said to be accurate if it contains the true value of the parameter of interest. Consider 90% and 95% confidence intervals for the same data.
   a. Is it possible for the 90% CI to be accurate if the 95% CI is not?
   b. Is it possible for the 95% CI to be accurate if the 90% CI is not?
3. Does increasing the confidence coefficient increase, decrease, or not affect our confidence in the accuracy of the confidence interval?
4. The narrower a confidence interval is, the more precise it is said to be. Does increasing the confidence coefficient increase, decrease, or not affect the precision of the estimate?
5. Accuracy versus precision.
   a. Which is more important, accuracy or precision?
   b. Then which should be determined first, the confidence coefficient or the sample size?
6. Design.
   a. In general, does a larger sample size yield more, or less, information about the value of the parameter of interest?
   b. Does a larger sample size yield a larger, or smaller, standard error?
   c. For a fixed confidence coefficient, e.g., 95%, does a smaller standard error yield a wider, or narrower, confidence interval?
   d. Is a narrower confidence interval considered more, or less, precise?
   e. In general, for a fixed confidence coefficient, would a larger sample yield a more, or less, precise interval estimate?
7. How do we know whether to use Student's t distribution rather than the standard normal (z) distribution to find the required percentile?
8. Does use of the t distribution always require assuming that the underlying distribution is normal?
9. Explain the difference between the widths of the confidence intervals for the population mean when the population standard deviation is known, versus when the population standard deviation is unknown, as in Example 1 versus Example 3.
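For readers who want to check the hand calculations of Examples 3 and 3-b, the following is a minimal sketch. It assumes the critical values are looked up in a table, as in the examples (Python's standard library has no Student's t quantile function), and the function name `t_interval` is hypothetical.

```python
from math import sqrt


def t_interval(sample_mean, sample_sd, n, t_star):
    """Confidence limits for a population mean when the population SD is unknown.

    Hypothetical helper for illustration. t_star is the (1 - alpha/2) quantile
    of Student's t with n - 1 degrees of freedom, looked up in a table by the
    caller.
    """
    est_se = sample_sd / sqrt(n)    # estimated standard error, s / sqrt(n)
    margin = t_star * est_se        # margin of error
    return sample_mean - margin, sample_mean + margin


# Table values quoted in the examples: t(24, 0.975) = 2.064 and t(24, 0.95) = 1.711.
for label, t_star in (("95%", 2.064), ("90%", 1.711)):
    lower, upper = t_interval(15.5, 8.6, 25, t_star)
    print(f"{label} CI: ({lower:.2f}, {upper:.2f}) cm")
```

Passing the critical value in as an argument keeps the sketch faithful to the table-lookup workflow of step 5; a statistics package that provides t quantiles could supply `t_star` instead.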
Example 4-a, estimation of population proportion

With 95% confidence, estimate the proportion of trees in Region 1 of the MNF that are ash.

1. Model
   a. [The r.v. of interest is a categorical variable.] Let Xi denote the genus (ash, birch, spruce, …) of the ith randomly sampled tree from Region 1, i = 1, 2, …, n.
   b. [The parameter of interest is the population proportion, φ.] Let φ represent the proportion of all trees in Region 1 that are ash trees.
   c. Assumptions about the underlying distribution: None
      [When estimating the population proportion of a categorical variable, it is never necessary to make assumptions about the underlying categorical population distribution, as long as there is a simple random sample.]
2. Hypotheses: None
3. Formulate Confidence Limits:
   (sample proportion) ± (margin of error) = $p \pm z_{1-\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$,
   where $z^{*} = z_{1-\alpha/2}$ is the (1 − α/2) percentile of the standard normal Z distribution, and where p is the sample proportion. (See Snedecor and Cochran 1967, pp. 210-211; Baldi and Moore 2009, Chapter 19, "Large-sample confidence intervals for a proportion".)
4. Design:
   a. Confidence coefficient = (1 − α) = 0.95.
   b. n = 75.
5. Gather the data and compute:
   a. (Sample proportion) = 25/75 = 1/3
   b. Estimated SE = $\sqrt{\dfrac{(1/3)(2/3)}{75}} = \sqrt{0.0029630} = 0.054433$
   c. $z^{*} = z_{1-\alpha/2} = z_{0.975} = 1.96$ (from the bottom line of the table of Student's t distribution, i.e., a standard normal $z = t_{\infty}$, a Student's t with infinite degrees of freedom)
   d. Confidence limits: estimator ± MOE = estimator ± [(percentile)(standard error)] = 1/3 ± [(1.96)(0.054433)] = 0.33333 ± 0.10669 = (0.227, 0.440)
6. Conclusion
   Methods. Confidence intervals are computed by the method of Snedecor and Cochran 1967, pp. 210-211.
   Results. I am 95% confident that the proportion of all trees in Region 1 that are ash is between 22.7% and 44.0%.
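The large-sample calculation in Example 4-a can be sketched as follows. This is an illustration only: the function name `proportion_interval` is hypothetical, and the z percentile comes from the standard library's `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(count, n, confidence):
    """Large-sample confidence limits for a population proportion.

    Hypothetical helper for illustration; it follows steps 5-a to 5-d.
    """
    p = count / n                                   # sample proportion
    est_se = sqrt(p * (1.0 - p) / n)                # estimated standard error
    alpha = 1.0 - confidence
    z_star = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1 - alpha/2}
    margin = z_star * est_se                        # margin of error
    return p - margin, p + margin


# Example 4-a: 25 ash trees in a sample of 75 trees, 95% confidence.
lower, upper = proportion_interval(25, 75, 0.95)
print(f"95% CI for the proportion of ash trees: ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimal places, the limits match the hand calculation, (0.227, 0.440).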
Example 4-b, estimation of population proportion, BARE-BONES

It is not necessary to include all of the explanation once you know what you're doing. Here's the brief version.

With 95% confidence, estimate the proportion of trees in Region 1 of the MNF that are ash.

1. Model
   a. Let Xi denote the genus (ash, birch, spruce, …) of the ith randomly sampled tree from Region 1, i = 1, 2, …, n.
   b. Let φ represent the proportion of all trees in Region 1 that are ash trees.
   c. Assumptions about the underlying distribution: None
2. Hypotheses: None
3. Formulate Confidence Limits:
   $CL = p \pm z^{*}\sqrt{\dfrac{p(1-p)}{n}}$ [Baldi and Moore notation], or
   $CL = p \pm z_{1-\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$ [quantile notation].
   (See Snedecor and Cochran 1967, pp. 210-211; Baldi and Moore 2009, Chapter 19, "Large-sample confidence intervals for a proportion".)
4. Design:
   a. (1 − α) = 0.95.
   b. n = 75.
5. Gather the data and compute:
   a. p = 25/75 = 1/3
   b. SE = $\sqrt{\dfrac{(1/3)(2/3)}{75}} = \sqrt{0.0029630} = 0.054433$
   c. $z^{*} = z_{1-\alpha/2} = z_{0.975} = 1.96$
   d. CL = 1/3 ± [(1.96)(0.054433)] = 0.33333 ± 0.10669 = (0.227, 0.440)
6. Conclusion
   Methods. Confidence intervals are computed by the method of Snedecor and Cochran 1967, pp. 210-211.
   Results. I am 95% confident that the proportion of all trees in Region 1 that are ash is between 22.7% and 44.0%.

References Cited

Snedecor, G.W., and W.G. Cochran (1967). Statistical Methods, 6th ed. Iowa State University Press, Ames, Iowa.
Baldi, B., and D.S. Moore (2009). The Practice of Statistics in the Life Sciences. W.H. Freeman and Company, New York.

Example 5, estimation of population proportion, BARE-BONES

This is the 6-Step Method applied to Example 19.3 of Baldi and Moore (2009). The National AIDS Behavioral Surveys found that 170 of a sample of 2673 adult heterosexuals had multiple partners.
What can we say about the population of all adult heterosexuals? Estimate, with 99% confidence, the proportion of all adult heterosexuals who have multiple partners.

1. Model
   a. Xi = whether or not the ith randomly sampled adult heterosexual had multiple partners (multiple, not multiple), i = 1, 2, …, n.
   b. φ = the proportion of all adult heterosexuals who have multiple partners.
   c. Assumptions about the underlying distribution: None
2. Hypotheses: None
3. Formulate Confidence Limits:
   $CL = p \pm z^{*}\sqrt{\dfrac{p(1-p)}{n}}$ [Baldi and Moore notation], or
   $CL = p \pm z_{1-\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$ [quantile notation],
   where p denotes the sample proportion. (See Snedecor and Cochran 1967, pp. 210-211; Baldi and Moore 2009, Chapter 19, "Large-sample confidence intervals for a proportion".)
4. Design:
   a. (1 − α) = 0.99
   b. n = 2673
5. Gather the data and compute:
   a. p = 170/2673 = 0.06360
   b. SE = $\sqrt{\dfrac{0.06360\,(1-0.06360)}{2673}} = \sqrt{0.00002228} = 0.004720$
   c. $z_{1-\alpha/2} = z_{0.995} = 2.576$
   d. CL = 0.06360 ± [(2.576)(0.004720)] = 0.06360 ± 0.01216 = (0.0514, 0.0758)
6. Conclusion
   Methods. Confidence intervals are computed by the large-sample method of Baldi and Moore, 2009.
   Results. I am 99% confident that the proportion of all adult heterosexuals who have multiple partners is between 5.14% and 7.58%.

Example 6, estimation of population variance and standard deviation

With 95% confidence, estimate the variance and standard deviation of the DBH (cm) of all ash trees in Region 1 of the MNF.

1. Model
   a. The r.v. of interest is Xi, the DBH of the ith randomly sampled ash tree from Region 1 of the MNF, i = 1, 2, …, n.
   b. The parameter of interest is the population variance, the variance of the DBH of all ash trees in Region 1 of the MNF.
   c. Assume Xi distributed normally.
2. Hypotheses: None
3. Formulation of confidence limits:
   (lower limit) = $\dfrac{(n-1)\,s^{2}}{\chi^{2}_{n-1,\,1-\alpha/2}}$, (upper limit) = $\dfrac{(n-1)\,s^{2}}{\chi^{2}_{n-1,\,\alpha/2}}$,
   where $\chi^{2}_{n-1,\,1-\alpha/2}$ is the (1 − α/2) quantile of the chi-squared distribution with (n − 1) degrees of freedom, and where $\chi^{2}_{n-1,\,\alpha/2}$ is the (α/2) quantile of the chi-squared distribution with (n − 1) degrees of freedom. Percentiles of the Chi-Squared Distribution are posted. See Zar 1995, pp. 110-111.
4. Design:
   a. Confidence coefficient = (1 − α) = 0.95.
   b. n = 25.
5. Gather the data and compute:
   a. Sample variance = s² = (8.6)² = 73.96
   b. $\chi^{2}_{n-1,\,1-\alpha/2} = \chi^{2}_{24,\,0.975} = 39.364$, and $\chi^{2}_{n-1,\,\alpha/2} = \chi^{2}_{24,\,0.025} = 12.401$
   c. Confidence limits:
      Lower limit = (24)(73.96)/(39.364) = 45.093,
      Upper limit = (24)(73.96)/(12.401) = 143.14
6. Conclusion
   Methods. Confidence intervals are estimated assuming that the DBH (cm) of all ash trees in Region 1 is normally distributed (Zar 1995, pp. 110-111).
   Results. I am 95% confident that the variance of the DBH of all ash trees in Region 1 of the MNF is between 45.093 and 143.14 cm².
   Or: I am 95% confident that the standard deviation of the DBH of all ash trees in Region 1 of the MNF is between 6.72 and 12.0 cm.

Department of Statistics. Send suggestions or comments to Golde Holtzman. Last updated: 7/18/2012. URL: ../STAT5605/ci6step.pdf
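As a supplementary check on Example 6, the sketch below recomputes the variance and standard deviation limits from the chi-squared percentiles quoted in step 5 (39.364 and 12.401). It is an illustration under stated assumptions: the function name `variance_interval` is hypothetical, and the percentiles are passed in as table values because Python's standard library has no chi-squared quantile function.

```python
from math import sqrt


def variance_interval(sample_var, n, chi2_upper, chi2_lower):
    """Confidence limits for a population variance, assuming a normal population.

    Hypothetical helper for illustration. chi2_upper is the (1 - alpha/2)
    quantile and chi2_lower the (alpha/2) quantile of the chi-squared
    distribution with n - 1 degrees of freedom, looked up in a table.
    """
    numerator = (n - 1) * sample_var
    # Dividing by the larger quantile gives the lower limit, and vice versa.
    return numerator / chi2_upper, numerator / chi2_lower


s2 = 8.6 ** 2   # sample variance, (8.6)^2 = 73.96
var_lo, var_hi = variance_interval(s2, 25, 39.364, 12.401)
print(f"variance limits: ({var_lo:.3f}, {var_hi:.2f}) cm^2")
print(f"SD limits:       ({sqrt(var_lo):.2f}, {sqrt(var_hi):.1f}) cm")
```

The standard deviation limits are simply the square roots of the variance limits, which is why a single interval calculation serves both conclusions in step 6.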