ECO 72 - INTRODUCTION TO ECONOMIC STATISTICS
Topic 8: Confidence Intervals

These slides are copyright © 2003 by Tavis Barr. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).

Confidence Intervals

Assume a random interval

$$I = \left[\bar{X} - \frac{\sigma}{\sqrt{n}},\ \bar{X} + \frac{\sigma}{\sqrt{n}}\right]$$

Then,

$$P[\mu \in I] = P\left[\bar{X} - \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{\sigma}{\sqrt{n}}\right] = P\left[-1 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1\right] = P[-1 \le Z \le 1] = 0.68$$

since $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$.

• So for every sample, there is a 68% probability that the interval we construct includes μ.
• In other words, 68% of all the possible intervals we could create by selecting different samples would contain the true value μ.
• If in the previous example n = 50, X̄ = 1 and σ = 1, then I = [0.86, 1.14].
• The true value of μ may or may not be in that interval.
• We are only 68% confident that μ is in that interval.
• So we would like to raise the confidence level from 68% to 90%, 95% or even 99%.
• This has a cost: a higher confidence level widens the interval.

$$P(-1.645 < z < 1.645) = 0.90$$

[Figure: normal density f(z) plotted against z]

● A normally distributed variable with mean μ and standard deviation σ will be between μ − 1.645σ and μ + 1.645σ 90 percent of the time.
● This is called the 90 percent confidence interval.

$$P(-1.96 < z < 1.96) = 0.95$$

● A normally distributed variable with mean μ and standard deviation σ will be between μ − 1.96σ and μ + 1.96σ 95 percent of the time.
● This is called the 95 percent confidence interval.

$$P(-2.576 < z < 2.576) = 0.99$$

● A normally distributed variable with mean μ and standard deviation σ will be between μ − 2.576σ and μ + 2.576σ 99 percent of the time.
● This is called the 99 percent confidence interval.
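The critical values 1.645, 1.96 and 2.576 can be recovered from the standard normal distribution. The following is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the helper names `z_critical` and `z_interval` are illustrative and do not come from the slides.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_critical(confidence):
    """Return z such that P(-z <= Z <= z) equals the confidence level."""
    upper_tail = (1.0 - confidence) / 2.0
    return STD_NORMAL.inv_cdf(1.0 - upper_tail)


def z_interval(mean, sigma, n, z):
    """Interval mean +/- z * sigma / sqrt(n), with sigma assumed known."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


# Critical values for the three confidence levels on the slides.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")

# Probability that a standard normal lies within one standard deviation.
print(f"P(-1 <= Z <= 1) = {STD_NORMAL.cdf(1.0) - STD_NORMAL.cdf(-1.0):.2f}")

# The n = 50, sample mean 1, sigma = 1 example, first with z = 1 (68%)
# and then at 95% to show how the interval widens.
print(z_interval(1.0, 1.0, 50, 1.0))
print(z_interval(1.0, 1.0, 50, z_critical(0.95)))
```

Moving from z = 1 to the 95% critical value roughly doubles the half-width, which is the cost of higher confidence noted above.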
Example of a Confidence Interval

● A survey of 2,938 clients of homeless service programs found that the average client earned $367 per month, with a std. deviation of $354.
  Source: http://www.huduser.org/publications/homeless/homelessness/ch_2e.html
● Standard error: 354/√2938 = 354/54.2 = 6.53

Level   Lower bound of C.I.             Upper bound of C.I.
90%     367 − 1.645(6.53) = 356.26      367 + 1.645(6.53) = 377.74
95%     367 − 1.960(6.53) = 354.20      367 + 1.960(6.53) = 379.80
99%     367 − 2.576(6.53) = 350.18      367 + 2.576(6.53) = 383.82

90%: 356.26 to 377.74
95%: 354.20 to 379.80
99%: 350.18 to 383.82

[Figure: the three intervals drawn on an axis from 350 to 390]

Example of a Confidence Interval

● Sample of 300 households. Mean meat consumption is 0.4 lbs/day, std. deviation is 0.2.
● Standard error is 0.2/√300 = 0.2/17.32 = 0.0115

Level   Lower bound of C.I.              Upper bound of C.I.
90%     0.4 − 1.645(0.0115) = 0.381      0.4 + 1.645(0.0115) = 0.419
95%     0.4 − 1.960(0.0115) = 0.377      0.4 + 1.960(0.0115) = 0.423
99%     0.4 − 2.576(0.0115) = 0.370      0.4 + 2.576(0.0115) = 0.430

90%: 0.381 to 0.419
95%: 0.377 to 0.423
99%: 0.370 to 0.430

[Figure: the three intervals drawn on an axis from 0.3 to 0.5]

How do we handle small samples?

● As a rule of thumb, the Central Limit Theorem approximation needs a sample size over 30
● If the original variable is normally distributed, the standardized sample mean (computed with the sample standard deviation) follows the t distribution
● Even if it isn't, pretending it is may give us some guidance
● For large samples, we multiply the standard error by the same number for all sample sizes:
    90%: 1.645
    95%: 1.96
    99%: 2.576
● For the t distribution, we use a different number depending on how many observations there are
● If our sample has n observations, then we use the t distribution with n − 1 degrees of freedom
● Example:
  – A sample of students has the following test scores: 84, 76, 98, 34, 65, 76, 90, 92, 64, 87.
  – What is a 90% confidence interval for the population mean?
● We start by calculating the sample mean, sample standard deviation, and standard error the same way
● Sample mean is 76.6
● Sample s.d. is 18.7
● So standard error is 18.7/√10 = 5.91
● There are 10 observations so we use 10 − 1 = 9 degrees of freedom

Confidence Intervals                      80%     90%     95%     98%     99%     99.9%
Level of Significance, One-Tailed Test    0.100   0.050   0.025   0.010   0.005   0.0005
Level of Significance, Two-Tailed Test    0.20    0.10    0.05    0.02    0.01    0.001
df
 1                                        3.08    6.31    12.71   31.82   63.657  636.619
 2                                        1.89    2.92    4.30    6.97    9.925   31.599
 3                                        1.64    2.35    3.18    4.54    5.841   12.924
 4                                        1.53    2.13    2.78    3.75    4.604   8.610
 5                                        1.48    2.02    2.57    3.37    4.032   6.869
 6                                        1.44    1.943   2.45    3.14    3.707   5.959
 7                                        1.42    1.895   2.37    3.00    3.499   5.408
 8                                        1.40    1.860   2.31    2.90    3.355   5.041
 9                                        1.38    1.833   2.26    2.82    3.250   4.781
10                                        1.37    1.812   2.23    2.76    3.169   4.587
11                                        1.36    1.796   2.20    2.72    3.106   4.437
12                                        1.36    1.782   2.18    2.68    3.055   4.318
13                                        1.35    1.771   2.16    2.65    3.012   4.221
14                                        1.35    1.761   2.15    2.62    2.977   4.140
15                                        1.34    1.753   2.13    2.60    2.947   4.073
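The summary statistics for the test-score example, and the interval they lead to, can be reproduced in a few lines. This is a minimal sketch using only Python's standard library. The critical value 1.833 is copied by hand from the 90% column of the table at 9 degrees of freedom, because the standard library has no t-distribution quantile function. The variable names are illustrative.

```python
import math
import statistics

scores = [84, 76, 98, 34, 65, 76, 90, 92, 64, 87]

n = len(scores)
mean = statistics.mean(scores)
# statistics.stdev uses the n - 1 divisor, i.e. the sample standard deviation.
std_dev = statistics.stdev(scores)
std_err = std_dev / math.sqrt(n)
dof = n - 1

# Assumption: value read from the t table (90% confidence, 9 degrees of freedom).
T_CRIT_90_DF9 = 1.833

low = mean - T_CRIT_90_DF9 * std_err
high = mean + T_CRIT_90_DF9 * std_err

print(f"n = {n}, degrees of freedom = {dof}")
print(f"mean = {mean:.1f}, s.d. = {std_dev:.1f}, std. error = {std_err:.2f}")
print(f"90% confidence interval: {low:.2f} to {high:.2f}")
```

Because the code carries full precision rather than the rounded 18.7, 5.91 and 1.83, its bounds can differ from hand-computed ones in the second decimal place.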
● X̄ = 76.6, SE = 5.91
● 9 degrees of freedom
● 90% confidence interval: X̄ − 1.83 × SE to X̄ + 1.83 × SE
    = 76.6 − 1.83(5.91) to 76.6 + 1.83(5.91)
    = 65.78 to 87.42

Another Small Sample Example

The longevity of 7 patients with a rare cancer after metastasis:

X_i (weeks)    X_i − X̄     (X_i − X̄)²
29             −26.57       706.04
67              11.43       130.61
65               9.43        88.90
42             −13.57       184.18
33             −22.57       509.47
97              41.43      1716.33
56               0.43         0.18

Sum: 389 weeks               Sum of squares: 3335.71
Mean: 389/7 = 55.57 weeks
Variance: 3335.71/6 = 555.95
Std Dev: √555.95 = 23.58
Std Err: 23.58/√7 = 8.91

What is a 95% confidence interval for the average longevity in the population?
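The column-by-column calculation for the longevity data can be checked with a short script. This is a sketch in plain Python that spells out the deviations and squared deviations instead of calling a library routine, so that each intermediate figure can be compared with the slide. The variable names are illustrative.

```python
import math

weeks = [29, 67, 65, 42, 33, 97, 56]

n = len(weeks)
total = sum(weeks)
mean = total / n

# Deviation of each observation from the sample mean, and its square.
deviations = [x - mean for x in weeks]
squared = [d * d for d in deviations]

sum_of_squares = sum(squared)
variance = sum_of_squares / (n - 1)  # sample variance, n - 1 divisor
std_dev = math.sqrt(variance)
std_err = std_dev / math.sqrt(n)

for x, d, s in zip(weeks, deviations, squared):
    print(f"{x:3d}  {d:8.2f}  {s:9.2f}")
print(f"sum = {total}, mean = {mean:.2f}")
print(f"sum of squares = {sum_of_squares:.2f}, variance = {variance:.2f}")
print(f"std dev = {std_dev:.2f}, std err = {std_err:.2f}")
```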
Longevity of 7 patients with a rare cancer after metastasis:

Sample mean: 55.57
Std error: 8.91
Degrees of freedom: 6
95% confidence interval: 55.57 ± 2.45(8.91) = 33.74 to 77.40

Confidence Interval for Population Proportion

● A proportion is simply the fraction of responses in a dataset that equal a certain value
  – For a dummy (0/1, yes/no) variable, the fraction of “yes” or “1”
  – For a category variable, e.g., the brand of car that a respondent drives, what percent drive a Buick?
  – For a more general discrete variable, e.g., what percentage of people have exactly two children?
● All proportion variables can be thought of or re-cast as dummy variables
  – 1 for “Drives a Buick”, 0 for “Doesn't drive a Buick”
  – 1 for “Has exactly two children”, 0 for “Doesn't have exactly two children”
● Consider the question: “In a sample of size n of a dummy variable, what is the probability that we observe the value “1” k times?”
● This is a Binomial probability
● So a sample proportion is basically a Binomial variable divided by n
● Remember that as n gets big, a Binomial variable approximates a Normal with mean $np$ and standard deviation $\sqrt{np(1-p)}$
● So if we divide the variable by n, we get a Normal variable with mean $p$ and standard deviation

$$\frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$$

● So the expected value of the sample proportion is the population proportion, and its standard error is $\sqrt{p(1-p)/n}$
● We can use this expected value and standard error to generate confidence intervals
● Requirement: np ≥ 5 and np(1 − p) ≥ 5

● Example: A Zogby poll of 2,246 adults found that 83% think text messaging while driving should be illegal.
  Source: http://www.zogby.com/news/ReadNews.dbm?ID=1323
● What is a 90 percent confidence interval for the fraction of adults that thinks text messaging while driving should be illegal?

● Example: Suppose we decide that only 1% of our televisions should break within a year.
● We do a survey of 500 consumers and find that 8 have broken within the first year.
● What is the 90% confidence interval for the proportion that break within a year?
  – Proportion is p = 8/500 = 0.016
  – Standard error is √(p(1 − p)/n) = √(0.016 × 0.984/500) = √0.000031488 = 0.0056
  – 90% confidence interval? Same method as before:
      Lower bound: 0.016 − 1.645(0.0056) = 0.0068
      Upper bound: 0.016 + 1.645(0.0056) = 0.0252

[Figure: the interval 0.016 ± 1.645 × 0.0056 drawn on an axis from 0.000 to 0.030]

● A sample of 1000 likely voters finds that 560 support a campaign finance reform referendum
● What is a 99 percent confidence interval for the percentage of voters supporting the referendum?
  – Sample proportion: 560/1000 = 0.56
  – Standard error: √(p(1 − p)/n) = √(0.56 × 0.44/1000) = √0.0002464 = 0.0157
  – 99% confidence interval: 0.56 ± 2.576(0.0157) = 0.5196 to 0.6004

[Figure: the interval 0.56 ± 2.576 × 0.0157 drawn on an axis from 0.50 to 0.62]
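The same recipe applies to every proportion example in this topic, so it can be wrapped in one small function. The following is a sketch under stated assumptions: it uses Python's standard-library `statistics.NormalDist` (Python 3.8+) for the critical value, the function name `proportion_interval` is illustrative, and the sample-size check is the np ≥ 5 and np(1 − p) ≥ 5 rule of thumb quoted earlier, applied to the sample proportion. The Zogby poll figures (83% of 2,246 adults) are included because that question is posed without a worked answer.

```python
import math
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence):
    """Normal-approximation interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    # Rule of thumb from the slides, evaluated at the sample proportion.
    if n * p_hat < 5 or n * p_hat * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z * std_err, p_hat + z * std_err


examples = [
    ("Televisions, 90%", 8 / 500, 500, 0.90),
    ("Referendum, 99%", 560 / 1000, 1000, 0.99),
    ("Zogby poll, 90%", 0.83, 2246, 0.90),
]
for label, p_hat, n, confidence in examples:
    low, high = proportion_interval(p_hat, n, confidence)
    print(f"{label}: {low:.4f} to {high:.4f}")
```

The function keeps full precision for the standard error and the critical value, so its bounds may differ in the last digit from bounds computed by hand with rounded inputs.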