Chapter 7: Confidence Intervals
J.C. Wang

Goal and Objectives
Goal: to learn confidence intervals
Objectives:
- To understand that each interval has two end-points (a lower and an upper bound) and to interpret the confidence interval
- To compute the confidence interval: a point estimate ± the margin of error
- To determine the sample size

Outline
- Introduction
- Applications, Definitions and Notation
- Confidence Intervals
- Computation
- z-Confidence Intervals
- t-Confidence Intervals
- Sample Size Determination
- An Example
- Comparing Two Populations
- Comparing Means of Two Independent Populations
- Comparing Means of Two Dependent Groups
- Confidence Interval for Proportion
- Confidence Interval for Population Proportion
- Sample Size Determination

Applications of Estimation in Business (examples)
- Store inventory value
- Manufacturing process
- Distribution process
- Drug delivery
- Auditor

Definitions
- Sample statistic: a value computed from the sample (i.e., from data).
- Point estimate (pt.est): a single sample statistic that estimates a population parameter such as the mean or proportion.
- Interval estimate: an estimate of the true population parameter that takes into account the sampling distribution of the point estimate, so that it has a lower bound and an upper bound.

Notation (to be discussed and used later)
- CI: confidence interval
- CVal: critical value
- ME: margin of error
- SE: standard error
- SD: standard deviation
- pt.est: point estimate
- $z_{\alpha/2}$: normal distribution critical value (use invNorm)
- $t_{n-1}$: Student's t distribution critical value with $n - 1$ degrees of freedom (use the math solver or invT)

Computation of confidence intervals
- CI = pt.est ± ME
- The point estimate estimates the population mean $\mu$ (by $\bar{x}$) or the population proportion $p$ (by $\hat{p}$)
- margin of error = critical value × standard error
- In other words, ME = (CVal)(SE).
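The recipe pt.est ± ME with ME = (CVal)(SE) can be sketched in a few lines of Python. This is only an illustration: the function name `confidence_interval` and the input numbers are hypothetical and do not come from the slides.

```python
def confidence_interval(pt_est, cval, se):
    """Return (lower bound, upper bound, ME) for pt.est ± ME, where ME = CVal × SE."""
    me = cval * se  # margin of error
    return pt_est - me, pt_est + me, me


# Hypothetical inputs, chosen only to show the arithmetic.
lower, upper, me = confidence_interval(pt_est=50.0, cval=2.0, se=1.5)
print(f"ME = {me}, CI = ({lower}, {upper})")
```

The same three inputs (point estimate, critical value, standard error) drive every interval in this chapter; only the way CVal and SE are obtained changes.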
iClicker Question 7.1 (pre-lecture)

Standard Error
- Most of the time we will not know the population SD, but we can compute the sample SE of the mean: $SE_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
- Likewise, we will not know the SD of the population proportion, but we can compute the SE of the sample proportion: $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$

Critical Value
- z for the normal distribution
- t for Student's t-distribution
- Student's t-distribution has $n - 1$ degrees of freedom, df = $n - 1$

z-Critical Value
- Notation: $z_{\alpha/2}$ = upper $(100 \times \alpha/2)$th standard normal percentile
- That is: $P(Z > z_{\alpha/2}) = \alpha/2 \iff P(Z \le z_{\alpha/2}) = 1 - \alpha/2$. So $z_{\alpha/2} = \text{invNorm}(1 - \alpha/2)$
- Example: a 95% confidence interval leaves 2.5% in each tail of the bell-shaped curve; therefore the z-CVal is $z_{cv} = z_{.025} = \text{invNorm}(1 - .025) = \text{invNorm}(.975) = 1.96$.

z-Critical Value continued
[Figure: standard normal curve. The central area is $1 - \alpha$, each tail has area $\alpha/2$, and the area to the left of $z_{\alpha/2}$ is $1 - \alpha/2$.]

t-Critical Value using TI calculators
1. math → solver → tcdf(L, U, D) − A/T, where
   - L = $t_{cv}$ (to be solved)
   - U = 9999 (i.e., U = ∞)
   - D = df = $n - 1$
   - A = $\alpha$ (error rate)
   - T = number of tails = 2 for a CI
2. or use invT($1 - \alpha/2$, df)

Cereal Box Packaging Example
Consider a cereal packaging plant in Battle Creek that is concerned with putting 368 grams of cereal into a box.
- What are the costs associated with putting too much cereal in a box?
- What are the costs associated with putting too little cereal in a box?
- Let's construct a 95% confidence interval.
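Before constructing the interval, the z critical value for a given confidence level can be checked outside the calculator. The sketch below assumes Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the role of invNorm; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-tailed z critical value: z_{alpha/2} = invNorm(1 - alpha/2)."""
    alpha = 1.0 - confidence_level  # error rate
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


z95 = z_critical(0.95)
print(round(z95, 2))  # 1.96
```

Calling `z_critical(0.90)` in the same way gives approximately 1.645.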
Cereal Box Packaging Example continued
- Suppose the sample size is $n = 25$
- Suppose the sample average is $\bar{x} = 365$ grams
- Suppose the SD is a process SD; therefore, $\sigma = 15$ grams
- Suppose we want a 95% confidence interval
- Therefore, the critical value is $z_{cv} = 1.96$

Cereal Box Packaging Example continued, margin of error
- Recall ME = CVal × SE
- For a 95% CI the area under the curve in one tail is 5% ÷ 2 = 0.025; therefore, $z_{cv} = \text{invNorm}(1 - .025) = \text{invNorm}(.975) = 1.96$
- $SE = \dfrac{\sigma}{\sqrt{n}} = \dfrac{15}{\sqrt{25}} = 3$
- $ME = 1.96 \times 3 = 5.88$

Cereal Box Packaging Example continued, confidence interval
- The confidence interval is pt.est ± ME
- CI = 365 ± 5.88 = (359.12, 370.88).
- Therefore, we are 95% confident that the population mean is between 359 and 371.
- Since 368, the value printed on the box, is within the interval, there is no reason to conclude that anything is wrong with the manufacturing process.

z-Confidence Interval Using TI Calculators (example)
Let's use the TI calculator:
- Do this: STAT → TESTS → ZInterval → STATS ↓ σ:15 ↓ x̄:365 ↓ n:25 ↓ C-Level:.95 ↓ CALCULATE
- READOUT: ZInterval (359.12, 370.88), x̄ = 365, n = 25
Since 368, the target of the package, is within the interval, production should continue.

Note on z-Confidence Intervals
- The value of z selected for constructing such a confidence interval is called the critical value for the distribution.
- There is a different critical value for each level of confidence (or confidence level, CL), $1 - \alpha$, where $\alpha$ = significance level, SL (or error rate).
- Frequently used $z_{cv}$:

  SL     CL     2-tailed CVal
  10%    90%    1.645
  5%     95%    1.96
  1%     99%    2.58

Note: There is a trade-off between the width of the confidence interval and the level of confidence: a higher confidence level requires a larger critical value and therefore a wider interval.

Problem When SD is Unknown
We have been dealing with $N(\mu, \sigma)$ where $\sigma$ (the population or process SD) is known. What happens when the standard deviation ($\sigma$) does not come from a population or process SD? Is this requirement rigid?
Can we compute the standard deviation from the sample? Let us review some history first.

History of the Student t Distribution
William Gosset, an employee of Guinness Breweries in Ireland, had a preoccupation with making statistical inferences about the mean when the SD was unknown. Since employees of the company were not allowed to publish their scientific work under their own names, he chose the pseudonym "Student." His contribution is therefore still known as Student's t-distribution.

Comparing Standard Normal Curve with t curves
[Figure: Comparison of Standard Normal with t Curves. Density curves for N(0,1), $t_1$, $t_5$ and $t_{10}$; horizontal axis x, vertical axis density.]

t-Confidence Interval for the Mean (Summer II quiz example)
Construct a 95% CI for the mean score for the Summer II Quiz Data of 14 students.
Given: 95% CL, $\bar{x} = 25$, $s = 10.777$, $n = 14$
- $SE = \dfrac{10.777}{\sqrt{14}} = 2.8803$
- $CVal = t_{\alpha/2,\,n-1} = t_{.025,\,13} = \text{invT}(1 - .025, 13) = 2.1604$
- $ME = 2.1604 \times 2.8803 = 6.2225$
- pt.est = 25
- 95% CI = pt.est ± ME = (18.778, 31.222)

t-Confidence Interval for the Mean using TI calculators
- Do this: STAT → TESTS ↓ TInterval → STATS ↓ x̄:25 ↓ Sx:10.777 ↓ n:14 ↓ C-Level:.95 ↓ CALCULATE
- READOUT: TInterval (18.778, 31.222), x̄ = 25, n = 14
- We are 95% confident that the true mean quiz score is between 18.8 and 31.2.

iClicker Question 7.2

Sample Size Determination based on confidence intervals
- What sample size should we use to determine the average quiz score if we want 95% confidence, ME = 5, and $\sigma = 10.777$?
- $n = \dfrac{z^2 \sigma^2}{ME^2} = \dfrac{1.96^2 \times 10.777^2}{5^2} = 17.8 \approx 18$ (rounded up)

Slow Wave Sleep Example (page 100, problem #1)
Data: 21 20 22 7 9 14 23 9 10 25 15 17 11
- (a) $\bar{x} = 15.6154$ and $s = 6.1310$
- (b) population average and SD: not possible to compute from a sample.
- (c) the sample average will typically miss the population average by about one SE.
- (d) $SE_{\bar{x}} = s/\sqrt{n} = 6.1310/\sqrt{13} = 1.7$
- (e) $ME = CVal \times SE = t_{.025,\,13-1} \times 1.7 = \text{invT}(.975, 12) \times 1.7 = 2.1788 \times 1.7 = 3.704$
- (f) 95% CI is 15.6154 ± 3.704 = (11.91, 19.32)

Slow Wave Sleep Example continued
- (f) (continued) can also do this (assuming the data have been entered into list 1, L1): STAT → TESTS ↓ TInterval → DATA ↓ List:L1 ↓ CALCULATE
- (g) If the confidence level is reduced to 90%, the new interval will be shorter.
- (h) 90% CI → (12.585, 18.646)
- (i) Interpret the 90% CI: We are 90 percent confident that the true (population) average is between 12.585 and 18.646.

Slow Wave Sleep Example continued
- (j) Does the 95% CI suggest that elderly men over 60 spend 20% of their sleep in REM? No, since 20 (%) is not in the 95% CI.
- (k) What sample size should we use if we change the ME to 2.5? $n = \dfrac{CVal^2 \times SD^2}{ME^2} = \dfrac{1.96^2 \times 6.13^2}{2.5^2} = 23.10$; round up and use $n = 24$.

Eg., CI
A manager of a consumer electronics company wants to investigate the TV viewing habits of residents of a small midwestern town. A random sample of 40 respondents is selected, and each respondent is instructed to keep a detailed record of all TV viewing in a certain week. The viewing time per week was $\bar{X} = 15.3$ hours with $S = 3.8$ hours, and 27 respondents watched the evening news at least three weeknights. Compute the margin of error for a 95% confidence interval.
- Error rate $\alpha = 1 - .95 = .05$, so $1 - \alpha/2 = 1 - .025 = .975$.
- Degrees of freedom = $n - 1 = 40 - 1 = 39$.
- CVal = invT(.975, 39) = 2.0227, $SE = \dfrac{3.8}{\sqrt{40}} = .6008$
- $ME = 2.0227 \times .6008 = 1.215$
- 95% CI: (15.3 − 1.215, 15.3 + 1.215) = (14.085, 16.515)

Calculate the Margin of Error from a given confidence interval
Since a confidence interval is computed as pt.est ± ME = (LB, UB), where LB = lower confidence bound and UB = upper confidence bound, we have
$ME = \dfrac{UB - LB}{2}$.
So, if a 95% confidence interval is known to be (14.085, 16.515), then
$ME = \dfrac{16.515 - 14.085}{2} = 1.215$

iClicker Question 7.3

Comparing Means of two independent populations
- We are not limited to comparing an average to a constant. Suppose we want to compare the means of two independent populations.
- Parameter of interest: $\delta = \mu_1 - \mu_2$
- Recall: CI is pt.est ± ME, with
  - pt.est = $\bar{x}_1 - \bar{x}_2$
  - ME = CVal × SE, where CVal = $t_{n_1+n_2-2}$ and $SE = \sqrt{SE_1^2 + SE_2^2}$

Example (battery example)
A statistics student designed an experiment to see if there was any real difference in battery life between brand-name AA batteries and generic AA batteries. He used six pairs of AA alkaline batteries from two major battery manufacturers: a well-known brand name and a generic brand. He measured the length of battery life while playing a CD player continuously, recording the time (in minutes) when the sound stopped.

Battery Example continued

             Generic    Brand Name
  x̄          206        187.4
  S          10.3       14.6
  n          6          6

Want a 95% CI.
- (a) What is the standard error?
- (b) What is the 95% CVal?
- (c) What is the ME?
- (d) What is the 95% CI?
- (e) Does this confidence interval suggest that generic AA batteries will last longer than brand-name AA batteries?
- (f) Interpret the 95% CI.

Battery Example continued, answers
- (a) pooled SD $= \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = 12.6$, and $SE = \sqrt{\dfrac{12.6^2}{6} + \dfrac{12.6^2}{6}} = 7.27$
- (b) CVal = $t_{n_1+n_2-2}$ = invT(.975, 10) = 2.2281
- (c) ME = CVal × SE = 2.2281 × 7.27 = 16.1983
- (d) 95% CI → (2.35, 34.85), i.e. (206 − 187.4) ± ME computed with the unrounded pooled SD

Battery Example continued, answers
- (e) Does this confidence interval suggest that generic AA batteries will last longer than brand-name AA batteries? Yes, because zero is not within the interval.
- (f) Interpret the 95% CI. We are 95% confident that the true mean difference is between 2.35 and 34.85 minutes.
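The battery answers can be reproduced with a short Python sketch of the pooled two-sample interval. The function name `pooled_two_sample_ci` is hypothetical, and the critical value 2.2281 is taken from the slide (invT(.975, 10)) rather than computed, because the Python standard library has no inverse t distribution.

```python
from math import sqrt


def pooled_two_sample_ci(x1, s1, n1, x2, s2, n2, t_cval):
    """CI for mu1 - mu2 using the pooled SD; t_cval has n1 + n2 - 2 df."""
    # Pooled SD combines the two sample variances, weighted by their df.
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    # SE = sqrt(sp^2/n1 + sp^2/n2)
    se = sp * sqrt(1 / n1 + 1 / n2)
    me = t_cval * se
    diff = x1 - x2  # point estimate
    return diff - me, diff + me, sp, se


# Generic vs. brand-name summary statistics from the battery example.
lower, upper, sp, se = pooled_two_sample_ci(206.0, 10.3, 6, 187.4, 14.6, 6, t_cval=2.2281)
print(f"pooled SD = {sp:.4f}, SE = {se:.4f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the sketch keeps the pooled SD unrounded, its bounds agree with the interval reported in (d) rather than with 18.6 ± 16.1983.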
Battery Example continued, using TI calculator
- Do this: STAT → TESTS ↓ 2-SampTInt → STATS ↓ x̄1:206.0 ↓ Sx1:10.3 ↓ n1:6 ↓ x̄2:187.4 ↓ Sx2:14.6 ↓ n2:6 ↓ C-Level:.95 ↓ Pooled:Yes ↓ CALCULATE
- READOUT: 2-SampTInt (2.3471, 34.853), df = 10, Sxp = 12.6342788
- Zero is not within this interval, so we can conclude that there is a difference between the two means.

iClicker Question 7.4

Comparing Means of 2 dependent groups
- We are not limited to comparing two averages of independent samples. Suppose we want to compare the means of two related samples.
- Remember CI = pt.est ± ME, where pt.est = $\bar{D} = \bar{X}_1 - \bar{X}_2$.
- ME = CVal × SE, where CVal = $t_{\alpha/2,\,n-1} = \text{invT}(1 - \alpha/2,\ n - 1)$ and $SE = \dfrac{S_{\text{diff}}}{\sqrt{n}}$

Example (computer stock prices)
We want to compare January 2002 prices vs. January 2003 prices of computer companies; see page 92.

Computer Stock Prices

            Jan. 02    Jan. 03    Diff.
  x̄         25.91      17.96      7.946
  s         6.34       5.65       6.1426
  size n    5          5          5

Computer Stock Prices Example continued
- What is the Standard Error?
- What is the 95% Critical Value?
- What is the 95% Margin of Error?
- What is a 95% Confidence Interval?
- Does this confidence interval suggest a difference in stock prices between Jan. 2002 and Jan. 2003?
- Interpret the 95% CI

Computer Stock Prices Example answers
- $SE = \dfrac{s_{\text{diff}}}{\sqrt{n}} = \dfrac{6.1426}{\sqrt{5}} = 2.7471$
- CVal = $t_{.025,\,n-1}$ = invT(1 − .025, 4) = 2.7764
- ME = 2.7764 × 2.7471 = 7.6271
- 95% CI → 7.946 ± 7.6271 = (0.3189, 15.573)
- Does this confidence interval suggest a difference in stock prices between Jan. 2002 and Jan. 2003? Yes, because zero is NOT within the CI.
- Interpret the 95% CI: We are 95% confident that the true mean difference is between .3189 and 15.573.
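The paired-sample arithmetic above works from the summary statistics of the differences alone. A minimal Python sketch follows; the function name `paired_ci` is hypothetical and the critical value 2.7764 is the slide's invT(.975, 4), not something computed here.

```python
from math import sqrt


def paired_ci(d_bar, s_diff, n, t_cval):
    """CI for the mean difference of paired data: d_bar ± t * s_diff / sqrt(n)."""
    se = s_diff / sqrt(n)  # standard error of the mean difference
    me = t_cval * se       # margin of error
    return d_bar - me, d_bar + me, se, me


# Summary statistics of the Jan. 02 minus Jan. 03 price differences.
lower, upper, se, me = paired_ci(d_bar=7.946, s_diff=6.1426, n=5, t_cval=2.7764)
print(f"SE = {se:.4f}, ME = {me:.4f}, 95% CI = ({lower:.4f}, {upper:.3f})")
```

Only the column of differences matters; the separate January means and SDs are not used once the differences have been formed.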
Computer Stock Prices Example answers, using TI calculators
- Do this: STAT → EDIT and enter the data into L1 and L2, then place the cursor on L3 and do 2nd 2 − 2nd 1 (i.e., L2 − L1) → STAT → TESTS ↓ TInterval → DATA ↓ List:L3 ↓ C-Level:.95 ↓ CALCULATE
- READOUT: TInterval (0.3189, 15.573), x̄ = 7.946, Sx = 6.1426, n = 5
- Zero is not within this interval, so we can conclude that there is a difference between the two means.

West Michigan Telecom Example (problem 13 on page 104)
Some stock market analysts have speculated that the parts of West Michigan Telecom might be worth more than the whole. For example, the company's communication systems in Ann Arbor and Detroit could be sold to other communications companies. Suppose that a stock market analyst chose nine (9) acquisition experts and asked each to predict the return (in percent) on investment (ROI) in the company held to the year 2003 if (i) it does business as usual, or (ii) it breaks up its communication system and sells all its parts. Their predictions follow:

West Michigan Telecom Example, continued

  Expert       1    2    3    4    5    6    7    8    9
  Not Break    12   21   8    20   16   5    18   21   10
  Break Up     15   25   12   17   17   10   21   28   15

- $SE = s_{\text{diff}}/\sqrt{n} = 2.8626/\sqrt{9} = 0.9542$
- CVal = $t_{\alpha/2,\,n-1} = t_{.025,\,8}$ = invT(1 − .025, 8) = 2.3060
- ME = 2.306 × .9542 = 2.2004
- 95% CI → (1.0218, 5.4226)
- Does this confidence interval suggest a difference between breaking up the company or not? Yes, because zero is NOT within the CI.
- Interpret the 95% CI: We are 95% confident that the true mean difference in the experts' predicted ROI is between 1.0218 and 5.4226.

West Michigan Telecom Example continued, using TI calculators
- STAT → EDIT and enter the data into L1 and L2, place the cursor on L3 and do 2nd 2 − 2nd 1 (i.e., L2 − L1), then do STAT → TESTS ↓ TInterval → DATA ↓ List:L3 ↓ C-Level:.95 ↓ CALCULATE
- READOUT: TInterval (1.02, 5.42), x̄ = 3.22, Sx = 2.8626, n = 9
- Zero is not within this interval, so we can conclude that there is a difference between the two means.
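The L2 − L1 step and the TInterval readout for the West Michigan Telecom data can be mirrored in Python. This is a sketch under two assumptions: the nine pairs below are read off the flattened table in the transcript (they reproduce the reported mean 3.22 and SD 2.8626), and the critical value 2.3060 is the slide's invT(.975, 8) rather than a computed value.

```python
from math import sqrt
from statistics import mean, stdev

# Predicted ROI (%) from the nine experts; values assumed from the flattened table.
not_break = [12, 21, 8, 20, 16, 5, 18, 21, 10]
break_up = [15, 25, 12, 17, 17, 10, 21, 28, 15]

# Equivalent of L3 = L2 - L1 on the calculator.
diffs = [b - a for a, b in zip(not_break, break_up)]

d_bar = mean(diffs)             # point estimate of the mean difference
s_diff = stdev(diffs)           # sample SD of the differences (n - 1 divisor)
se = s_diff / sqrt(len(diffs))  # standard error
t_cval = 2.3060                 # from the slide: invT(.975, 8)
me = t_cval * se
lower, upper = d_bar - me, d_bar + me

print(f"mean diff = {d_bar:.2f}, SD = {s_diff:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```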
Eg., CI #2
An on-line grocery store in a mid-sized midwestern city has more than 10,000 customers. The following statistics summarize the May 1999 prices for a shopping list of eight items at the on-line grocery and at a local supermarket: the average difference between the two shopping lists and the standard deviation of the differences are 0.0375 and 0.22, respectively. What is the 90% margin of error?
- Given: $n = 8$, $\bar{D} = .0375$, $S_{\text{diff}} = .22$, want 90% CI
- $\alpha = 1 - .9 = .1$, $1 - \alpha/2 = .95$, df = $n - 1 = 8 - 1 = 7$.
- CVal = invT(.95, 7) = 1.89458, $SE = \dfrac{.22}{\sqrt{8}} = .07778$
- ME = 1.89458 × .07778 = 0.1474
- If a 90% confidence interval was instead given, (−.1099, .18485), then $ME = \dfrac{.18485 - (-.1099)}{2} = 0.1474$

iClicker Question 7.5

Confidence Interval for population proportion
Suppose we want to estimate the population proportion using intervals.
- Recall CI is pt.est ± ME
- Therefore, use pt.est = $\hat{p} = \dfrac{x}{n} = \dfrac{\text{successes}}{\text{sample size}}$
- ME = CVal × SE
- This CI works well if $n \times p > 5$ and $n \times (1 - p) > 5$ (Note: that is, it works well if the expected number of successes and the expected number of failures are both greater than 5.)

EAS Sensor Example
If a sales clerk fails to remove the EAS sensor when an item is purchased, it can result in an embarrassing situation for the customer. A survey was conducted to study consumer reaction to such false alarms. Of 250 customers surveyed, 40 said that if they were to set off an EAS alarm because store personnel did not deactivate the merchandise, then "they would never shop at the store again."

EAS Sensor Example continued
- pt.est = $\dfrac{40}{250} = 0.16$, $SE_{\hat{p}} = \sqrt{\dfrac{.16(1 - .16)}{250}} = 0.02319$
- CVal = $z_{\alpha/2} = z_{.025}$ = invNorm(1 − .025) = 1.96
- ME = 1.96 × 0.02319 = 0.04544
- 95% CI → (.11456, .20544)
- Interpret the 95% CI: We are 95% confident that the true proportion is between 0.1146 and 0.2054.
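The EAS interval can be reproduced with a few lines of Python. The sketch assumes the standard-library `statistics.NormalDist().inv_cdf` as a stand-in for invNorm; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence_level):
    """z-interval for a population proportion: p_hat ± z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n                                     # point estimate
    se = sqrt(p_hat * (1 - p_hat) / n)                # standard error of p_hat
    alpha = 1 - confidence_level                      # error rate
    z = NormalDist().inv_cdf(1 - alpha / 2)           # two-tailed critical value
    me = z * se                                       # margin of error
    return p_hat - me, p_hat + me, me


# EAS sensor survey: 40 of 250 customers.
lower, upper, me = proportion_ci(x=40, n=250, confidence_level=0.95)
print(f"ME = {me:.5f}, 95% CI = ({lower:.5f}, {upper:.5f})")
```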
EAS Sensor Example continued, using TI calculators
- Do this: STAT → TESTS ↓ 1-PropZInt ↓ x:40 ↓ n:250 ↓ C-Level:.95 ↓ CALCULATE
- READOUT: (.11456, .20544), p̂ = .16, n = 250
- We are 95% confident that the true proportion is between 11.46% and 20.54%.

Exercise #17 on page 105
Given: $x = 600$, $n = 2000$
- pt.est = $\dfrac{600}{2000} = 0.3$
- $SE_{\hat{p}} = \sqrt{\dfrac{.3(1 - .3)}{2000}} = 0.0102$
- CVal = $z_{\alpha/2} = z_{.025}$ = invNorm(1 − .025) = 1.96
- ME = 1.96 × 0.0102 = 0.0201
- 95% CI → (.2799, .3201)
- Interpret the 95% CI: We are 95% confident that the true proportion is between 0.28 and 0.32.

Sample Size Determination based on CI of proportion
- What is the true proportion of success $p$?
- Decide which confidence level to use
- Determine the margin of error that you're willing to accept

EAS Example
Suppose that you are a student with a grant to study this EAS issue, and you realize that there are not enough funds to gather data on 250 subjects. You want to determine a new sample size by relaxing the confidence level to 90% while using $p = .16$ and the ME of 0.04544. What is the new sample size?

EAS Example continued, answer
$n = \dfrac{z^2 \times [\hat{p}(1 - \hat{p})]}{ME^2}$
- $CVal_{90\%} = z_{(1-.9)/2} = z_{.05}$ = invNorm(1 − .05) = 1.645
- ME = .04544
- $\hat{p} = .16$ (note: see the discussion on the next slide)
- $n = \dfrac{1.645^2}{.04544^2} \times [.16(1 - .16)] = 176.1$; round up and use $n = 177$

Sample Size Determination for Proportion (discussion)
$n = \dfrac{z^2 \times [\hat{p}(1 - \hat{p})]}{ME^2}$
- When a rough estimate of $p$ is available (such as one from a pilot study, or an educated guess), use it for $\hat{p}$ above. Otherwise, use $\frac{1}{2}$ for $\hat{p}$ (a conservative estimate, since $\hat{p}(1 - \hat{p})$ is largest at $\frac{1}{2}$).
- It is recommended to always round up to the next integer (as with $n = 177$ here, which is rounded up from 176.1).

Eg., CI #3
In a study reported in a well-known business newspaper, a large plastic container manufacturer surveyed 1007 U.S. workers. Of the people surveyed, 665 indicated that they take their lunch to work with them. Of the 665 taking their lunch, 200 reported that they carry the lunch in a brown bag. Consider the population of U.S.
workers who take their lunch to work with them. Set up a 90% confidence interval estimate of the population proportion who take brown-bag lunches.
- Given: $x = 200$, $n = 665$, 90% CI
- Do this: STAT → TESTS ↓ 1-PropZInt ↓ x:200 ↓ n:665 ↓ C-Level:.9 ↓ CALCULATE
- (.2715, .33) is a 90% CI for the population proportion who take brown-bag lunches.

iClicker Question 7.6

Eg., CI #3, continued
Recall the study: 1007 U.S. workers were surveyed, 665 indicated that they take their lunch to work with them, and 200 of those 665 reported that they carry the lunch in a brown bag. After reading this story, an analyst wants to determine the sample size necessary for a 95% confidence level of estimating the true proportion of workers who take their lunch to work with them when the margin of error is 0.03.
- An initial estimate is available: $\hat{p} = \dfrac{200}{665} \approx 0.3$.
- CVal = $z_{.025}$ = invNorm(.975) = 1.96 for the 95% confidence level
- The required sample size is $n = \dfrac{CVal^2}{ME^2} \times \hat{p} \times (1 - \hat{p}) = \dfrac{1.96^2}{0.03^2} \times 0.3 \times (1 - 0.3) = 896.4$, rounded up to 897

iClicker Question 7.7
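The sample-size formula for a proportion, including the round-up step, can be sketched in Python as follows. The function name `sample_size_for_proportion` is hypothetical, and `statistics.NormalDist().inv_cdf` is assumed as the equivalent of invNorm, so the raw values differ slightly in the decimals from the slides' hand calculations with 1.96 and 1.645.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_guess, me, confidence_level):
    """n = z^2 * p(1 - p) / ME^2, returned raw and rounded up to the next integer."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    n_raw = z**2 * p_guess * (1 - p_guess) / me**2
    return n_raw, ceil(n_raw)


# Brown-bag lunch example: p_hat = 0.3, ME = 0.03, 95% confidence.
n_raw, n_needed = sample_size_for_proportion(0.3, 0.03, 0.95)
print(f"raw n = {n_raw:.1f}, use n = {n_needed}")

# EAS example: p_hat = 0.16, ME = 0.04544, 90% confidence.
n_raw_eas, n_needed_eas = sample_size_for_proportion(0.16, 0.04544, 0.90)
print(f"raw n = {n_raw_eas:.1f}, use n = {n_needed_eas}")
```

When no estimate of the proportion is available, passing `p_guess=0.5` gives the conservative (largest) sample size.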