Chapter 21: What is a Confidence Interval?

Thought Question

To estimate the percentage of all adults who have an Internet connection in their homes, a properly chosen sample of 1100 adults across the U.S. was contacted, and 60% said “yes”.

If this poll were repeated many times, would it get the same sample proportion each time?

How close do you think this sample proportion is to the percentage of the entire country who have an internet connection? Within 30%? 10%? 5%? 1%? Exactly the same?

Estimating Population Parameters

Terminology
• Statistical inference: draws conclusions about a population on the basis of data about a sample.
• Parameter: fixed, unknown number that describes the population.
• Statistic: known value calculated from a sample, which can change from sample to sample.
• Population proportion p: a parameter, describes the proportion of the population with a particular characteristic.
• Sample proportion p̂: a statistic, calculates the proportion of the sample with a particular characteristic.

How do we estimate an unknown parameter?
Choose a sample from the population and use a sample statistic as an estimate.

Observations
• Statistical conclusions are uncertain because the sample isn’t the entire population.
• Statistical inference must both give conclusions and say how uncertain they are.

Review of Quantifying Uncertainty

We will never estimate the population parameter exactly. How far off are we?

Terminology
• Margin of Error: how close the sample statistic lies to the population parameter.
• Level of Confidence: what percentage of all possible samples satisfy the margin of error.
• Confidence Statement: combines margin of error and level of confidence.

Interpretation of Common Language
Statement: “The margin of error is plus or minus two percentage points.”
Translation: If we took many samples, 95% of them would give a value of p̂ within ± 2% of p.
Equivalently: If we took many samples, p is within ± 2% of 95% of the values of p̂.
Equivalently: If we took many samples, p will be captured by 95% of the intervals [p̂ − 0.02, p̂ + 0.02].

Interpretation of 95% Confidence using Margin of Error
• 95% of the time p̂ will be no more than the margin of error away from p.
• 5% of the time p̂ will “miss” p by more than the margin of error.
• Can’t tell if we “hit” or “miss.” This is just a fact of life!

Quick Method

Use p̂ from a simple random sample (SRS) of size n to estimate p. The margin of error for 95% confidence is $\approx 1/\sqrt{n}$.

Example: Internet Connection
Margin of error for 95% confidence: $\approx 1/\sqrt{1100} \approx 0.0302 = 3.02\%$.

Confidence statements:
(i) Margin of error interpretation: We are 95% confident that the true proportion of adults who have an internet connection in their homes is within ± 3.02% of our sample proportion 60%.
(ii) Interval interpretation: We are 95% confident that between 56.98% and 63.02% of adults have an internet connection in their homes.

Important Points
• The conclusion of a confidence statement always applies to the population, not to the sample.
• Our conclusion about the population is never completely certain.
• We can choose to use a confidence level other than 95%.
• It is usual to report the margin of error for 95% confidence.
• Take a larger sample to get a smaller margin of error with the same confidence.

Confidence Intervals For Sample Proportions

Terminology
• 95% confidence interval: an interval calculated from sample data by a process that is guaranteed to capture the true population parameter in 95% of all samples.

Facts About Sample Proportions
For large enough n:
• The sampling distribution of p̂ is approximately Normal.
• The mean of the sampling distribution is p.
• The standard deviation of the sampling distribution is $\sqrt{\frac{p(1 - p)}{n}}$.

Example: Internet Connection
Assume that the true proportion p of adults who have an internet connection in their homes is 0.62. What are the mean and standard deviation of the sample proportion p̂?
mean = p = 0.62
standard deviation = $\sqrt{\frac{0.62(1 - 0.62)}{1100}} \approx 0.015$

Using the 68-95-99.7 rule, determine two values of p̂ in between which 95% of all values of p̂ will lie.
95% of all values of p̂ will fall between:
mean − 2 standard deviations = 0.62 − 2 × 0.015 = 0.59
and
mean + 2 standard deviations = 0.62 + 2 × 0.015 = 0.65.

Note that:
• Margin of error interpretation: In 95% of all samples of size 1100, the statistic p̂ is within ± 0.030 of the parameter p.
• Confidence interval interpretation: Equivalently, 95% of all samples of size 1100 give an outcome p̂ such that the population truth p is captured by the interval [p̂ − 0.030, p̂ + 0.030].

In General
Approximately 95% of all samples catch p in the interval
$\hat{p} \pm 2\sqrt{\frac{p(1 - p)}{n}} = \left[\hat{p} - 2\sqrt{\frac{p(1 - p)}{n}},\ \hat{p} + 2\sqrt{\frac{p(1 - p)}{n}}\right]$.

Why can’t we use this formula to compute a confidence interval from our data?
We don’t know the value of p! Approximate p by p̂.

95% Confidence Interval for a Proportion
An approximate 95% confidence interval for p is
$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \left[\hat{p} - 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right]$.

Example: Internet Connection
Calculate a 95% confidence interval for the proportion of adults who have an internet connection at home.
$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.60 \pm 2\sqrt{\frac{0.60(1 - 0.60)}{1100}} \approx 0.60 \pm 0.0295 \approx [0.5705, 0.6295]$

Write the two versions of a confidence statement.
(i) Margin of error interpretation: We are 95% confident that the proportion of adults that have an internet connection in their homes is within ± 2.95% of our sample proportion 60%.
(ii) Confidence interval interpretation: We are 95% confident that between 57.05% and 62.95% of adults have an internet connection in their homes.

Understanding Confidence Intervals

So Far...
confidence interval = estimate ± margin of error

In General
A level C confidence interval for a parameter has two parts:
• An interval calculated from the data.
• A confidence level C, which gives the probability that the interval will capture the true parameter value with repeated samples.

Example: Internet Connection
Assume three additional polls of adults were taken, with the following results:
(i) 671 out of 1100 adults had an internet connection in their homes
(ii) 704 out of 1100 adults had an internet connection in their homes
(iii) 638 out of 1100 adults had an internet connection in their homes

Calculate a 95% confidence interval for each poll.
(i) $\hat{p} \pm 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \frac{671}{1100} \pm 2\sqrt{\frac{\frac{671}{1100}\left(1 - \frac{671}{1100}\right)}{1100}} \approx 0.610 \pm 0.029 \approx [0.581, 0.639]$
(ii) $\hat{p} \pm 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \frac{704}{1100} \pm 2\sqrt{\frac{\frac{704}{1100}\left(1 - \frac{704}{1100}\right)}{1100}} \approx 0.640 \pm 0.029 \approx [0.611, 0.669]$
(iii) $\hat{p} \pm 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \frac{638}{1100} \pm 2\sqrt{\frac{\frac{638}{1100}\left(1 - \frac{638}{1100}\right)}{1100}} \approx 0.580 \pm 0.030 \approx [0.550, 0.610]$

Which of these confidence intervals contain the true parameter 0.62?
The first and second.
If we sample forever, 95% of our confidence intervals will capture the true parameter.

Interpretation of 95% Confidence using Confidence Intervals
• 95% of the time the confidence interval will capture p.
• 5% of the time the confidence interval will not capture p.
• Can’t tell if we “hit” or “miss.” This is just a fact of life!

Changing the Confidence Level

What if we want a confidence level other than 95%?

Confidence Level C    50%   60%   70%   80%   90%   95%   99%   99.9%
Critical Value z*     0.67  0.84  1.04  1.28  1.64  1.96  2.58  3.29

Observations
• The sample proportion p̂ takes a value within z* standard deviations of p, with probability C.
• The interval extending z* standard deviations either side of p̂ captures p, with probability C.

Level C Confidence Interval for a Proportion
An approximate level C confidence interval for p is
$\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \left[\hat{p} - z^*\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z^*\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right]$,
where z* is the critical value for confidence level C.
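As a rough illustration of the level C formula, the Python sketch below computes the interval from a count and a sample size, then estimates by simulation how often such intervals capture a known p. The names proportion_ci and coverage, the seed, and the number of trials are assumptions made for this sketch and are not part of the original notes; the critical values are the usual rounded ones, with 3.29 for 99.9%.

```python
import math
import random

# Rounded critical values z* for common confidence levels
# (3.29 is the usual rounded value for 99.9%).
Z_STAR = {0.50: 0.67, 0.60: 0.84, 0.70: 1.04, 0.80: 1.28,
          0.90: 1.64, 0.95: 1.96, 0.99: 2.58, 0.999: 3.29}


def proportion_ci(successes, n, level=0.95):
    """Approximate level-C interval: p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    margin = Z_STAR[level] * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(p, n, level, trials, seed=0):
    """Fraction of simulated samples whose interval captures the true p."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n; each person says "yes" with probability p.
        successes = sum(rng.random() < p for _ in range(n))
        low, high = proportion_ci(successes, n, level)
        if low <= p <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # 660 "yes" answers out of 1100 corresponds to p_hat = 0.60.
    for level in (0.90, 0.95, 0.99):
        low, high = proportion_ci(660, 1100, level)
        print(f"{level:.0%} interval: [{low:.3f}, {high:.3f}]")
    print("simulated 95% coverage:", coverage(0.62, 1100, 0.95, trials=2000))
```

Raising the level widens the interval, and the simulated coverage should land near the stated level rather than exactly on it, since it is itself an estimate from a finite number of samples.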
Example: Internet Connection
Using the sample proportion p̂ = 0.60, calculate a 99% confidence interval for the proportion of adults who have an internet connection at home.
$\hat{p} \pm 2.58\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.60 \pm 2.58\sqrt{\frac{0.60(1 - 0.60)}{1100}} \approx 0.60 \pm 0.038 \approx [0.562, 0.638]$

Using the sample proportion p̂ = 0.60, calculate a 99.9% confidence interval for the proportion of adults who have an internet connection at home.
$\hat{p} \pm 3.29\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.60 \pm 3.29\sqrt{\frac{0.60(1 - 0.60)}{1100}} \approx 0.60 \pm 0.049 \approx [0.551, 0.649]$

What happens to the confidence interval width as the confidence level increases?
The confidence interval gets wider.

Review of Confidence Intervals; Means

Goal: draw conclusions about a population mean µ.

How do we estimate µ?
Choose a sample from the population and use the sample mean x̄ as an estimate of µ.

Why do different samples give different values of x̄?
Different samples are made up of different people/objects by random chance. So we get different values of x̄.

Why don’t we report just the value of x̄ that we calculate in order to draw conclusions about µ? Why do we give the margin of error or the confidence interval?
x̄ is most likely not the same as µ, because the sample isn’t the entire population. We also give the margin of error or confidence interval to indicate the uncertainty in our estimate of µ.

Sampling Distribution of the Sample Mean

Facts About Sample Means (Central Limit Theorem)
Choose an SRS of size n from a population with mean µ and standard deviation σ. For large enough n:
• The sampling distribution of x̄ is approximately Normal.
• The mean of the sampling distribution is µ.
• The standard deviation of the sampling distribution is $\sigma/\sqrt{n}$.

What proportion of possible x̄ values will fall within ± 2 standard deviations of the mean µ?
[Figure: Normal curve with axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ]
95% of x̄ values will be within ± 2 standard deviations of µ. This is a statement about x̄, not µ.
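The 95% claim can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a deliberately non-Normal choice), and the function name fraction_within_two_sd, the sample size, and the seed are all hypothetical.

```python
import math
import random


def fraction_within_two_sd(n, trials, seed=0):
    """Share of sample means that fall within 2*sigma/sqrt(n) of mu."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    mu, sigma = 1.0, 1.0       # assumed population: exponential, mean 1, sd 1
    sd_of_mean = sigma / math.sqrt(n)  # sd of the sampling distribution of x_bar
    hits = 0
    for _ in range(trials):
        # One simple random sample of size n and its sample mean.
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(x_bar - mu) <= 2 * sd_of_mean:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(fraction_within_two_sd(n=100, trials=5000))
```

With a sample size this large, the printed fraction should come out close to 0.95 even though the population itself is skewed, which is what the Central Limit Theorem suggests.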
Turn it around: µ will be within ± 2 standard deviations of 95% of the x̄ values. This is a statement about µ, not x̄.
[Figure: Normal curve with axis marked at µ − 3σ, µ − 2σ, µ − σ, µ, µ + σ, µ + 2σ, µ + 3σ]
So the interval $\left[\bar{x} - 2\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2\frac{\sigma}{\sqrt{n}}\right]$ is an approximate 95% confidence interval for µ. Approximately 95% of these intervals will capture µ.

Confidence Intervals: Sample Means
An approximate level C confidence interval for µ is
$\bar{x} \pm z^*\frac{\sigma}{\sqrt{n}} = \left[\bar{x} - z^*\frac{\sigma}{\sqrt{n}},\ \bar{x} + z^*\frac{\sigma}{\sqrt{n}}\right]$,
where z* is the critical value for confidence level C.

Why can’t we use this formula to compute a confidence interval from our data?
We don’t know the value of σ! Approximate σ by s.

Level C Confidence Interval for a Sample Mean
An approximate level C confidence interval for µ is
$\bar{x} \pm z^*\frac{s}{\sqrt{n}} = \left[\bar{x} - z^*\frac{s}{\sqrt{n}},\ \bar{x} + z^*\frac{s}{\sqrt{n}}\right]$,
where z* is the critical value for confidence level C.

Example: Blood Pressure
The medical director of a large company looks at the medical records of 72 executives between the ages of 35 and 44 years. He finds that the mean systolic blood pressure in this sample is x̄ = 126.1 and the standard deviation is s = 15.2.

Find a 95% confidence interval for µ, the unknown mean systolic blood pressure of all executives in the company.
Using z* ≈ 2 for 95% confidence:
$\bar{x} \pm z^*\frac{s}{\sqrt{n}} = 126.1 \pm 2 \times \frac{15.2}{\sqrt{72}} \approx 126.1 \pm 3.6 \approx [122.5, 129.7]$

Write a confidence statement.
We are 95% confident that the mean systolic blood pressure of all executives in the company is within ± 3.6 of 126.1.
or
We are 95% confident that the mean systolic blood pressure of all executives in the company is between 122.5 and 129.7.

Interpret the meaning of 95% confidence if we repeat the study many times.
95% of the time x̄ will be within ± 3.6 of µ.
or
95% of the confidence intervals will contain µ.

Find a 99% confidence interval for µ, the unknown mean systolic blood pressure of all executives in the company.
$\bar{x} \pm z^*\frac{s}{\sqrt{n}} = 126.1 \pm 2.58 \times \frac{15.2}{\sqrt{72}} \approx 126.1 \pm 4.6 \approx [121.5, 130.7]$

We can interpret 95% confidence in two ways:
(i) 95% of the time x̄ will be no more than the margin of error away from µ.
(ii) 95% of the time the confidence interval will capture µ.

Explain why these statements are equivalent.
If 95% of x̄ values are no more than the margin of error away from µ, then µ is no more than the margin of error away from 95% of the values of x̄. Therefore, the confidence interval x̄ ± margin of error will contain µ for 95% of the values of x̄.

What will happen to the confidence interval if we keep the sample size the same and increase the confidence level?
The confidence interval width will increase. The interval can’t “miss” µ as often, so it has to get wider.

What will happen to the confidence interval if we keep the confidence level the same and increase the sample size?
The confidence interval will shrink, since the margin of error decreases.

Example: ACT Scores
A college admissions counselor looks at the ACT scores of 1000 high school students. He finds that the mean ACT score in this sample is x̄ = 18.2 and the standard deviation is s = 5.8.

Find a 95% confidence interval for µ, the unknown mean ACT score for high school students.
$\bar{x} \pm z^*\frac{s}{\sqrt{n}} = 18.2 \pm 2 \times \frac{5.8}{\sqrt{1000}} \approx 18.2 \pm 0.37 \approx [17.83, 18.57]$

Write a confidence statement for the true mean ACT score.
We are 95% confident that the mean ACT score of all high school students is within ± 0.37 of 18.2.
or
We are 95% confident that the mean ACT score of all high school students is between 17.83 and 18.57.

The true mean ACT score is µ = 18. Did your confidence interval capture µ?
Yes!
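The calculations for a sample mean can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name mean_ci is hypothetical, the critical value is passed in by the caller, and the 4000-student sample is an invented comparison used only to show the effect of sample size.

```python
import math


def mean_ci(x_bar, s, n, z_star):
    """Approximate interval x_bar +/- z* * s / sqrt(n) for a large sample.

    Returns (low, high, margin).
    """
    margin = z_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin


if __name__ == "__main__":
    # Blood pressure example at 99% confidence (z* = 2.58).
    low, high, margin = mean_ci(126.1, 15.2, 72, 2.58)
    print(f"blood pressure, 99%: {126.1} +/- {margin:.1f} -> [{low:.1f}, {high:.1f}]")

    # ACT example with the rounded critical value z* = 2.
    low, high, margin = mean_ci(18.2, 5.8, 1000, 2)
    print(f"ACT, 95%: {18.2} +/- {margin:.2f} -> [{low:.2f}, {high:.2f}]")

    # Same confidence level, hypothetical larger sample: the margin shrinks.
    _, _, smaller = mean_ci(18.2, 5.8, 4000, 2)
    print(f"ACT with n = 4000: margin {smaller:.2f}")
```

Because the margin of error is proportional to 1/√n, quadrupling the sample size halves the margin at the same confidence level.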