HOSP 1207 (Business Stats)
Learning Centre

Confidence Intervals

Confidence intervals are a statistics concept that many people find difficult to understand, yet we encounter them every day in polling data, drug test results, and marketing surveys, to name only a few examples. It is important to understand the meaning of these intervals and how they are used to influence and make decisions.

A confidence interval is a way of expressing an estimate of a population mean (μ) or proportion (p) based on the results of a study. Recall that in hypothesis testing, whenever we rejected the null hypothesis and accepted the alternative hypothesis, we were able to state that the population parameter was more than or less than a certain value. But how do we know for certain what the population parameter is? We can create an estimate based on a sample statistic. The sample mean or proportion provides a point estimate (single-number estimate). Because we know it’s unlikely that the sample mean (x̄) or proportion (p̂) we get from a study is exactly right, we give a range of values that we can say (with some statistical certainty) will include the true population parameter. The size of that range depends on the design of the study and how much certainty we wish to express.

Imagine the large circle below represents the population, and each dot represents a single value for that population. When we sample a number of values (“n” values) from the population, we get a sample mean, x̄ (or alternately a sample proportion, p̂). Each individual sample provides a point estimate, or single-number estimate, of the population parameter. Since there is variability within the individual data of a population, the sample mean we get depends on which n values are sampled, and it may or may not be close to the population mean.
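The idea that different samples give different point estimates can be tried out in a few lines of Python. This is only a minimal sketch: the population below is simulated (its mean of about 50 and standard deviation of about 10 are made-up values, not data from this handout), and the sample size n = 30 is likewise an assumption chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated values (assumed, not real data).
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # the population mean we normally cannot see

n = 30  # assumed sample size
sample_means = []
for i in range(3):
    sample = rng.sample(population, n)  # sampling without replacement
    x_bar = statistics.mean(sample)     # point estimate from this sample
    sample_means.append(x_bar)
    print(f"Sample {i + 1}: mean = {x_bar:.2f} (population mean = {mu:.2f})")
```

Each run of the loop plays the role of one study: the three sample means generally differ from each other and from the population mean.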
[Figure: a population (mean = μ, std dev = σ) with three samples drawn from it; each sample has its own sample mean and standard deviation (σx1, σx2, σx3).]

We can use the sample distribution from a study to construct a confidence interval. We express statistical certainty by giving a confidence level for the interval. Usually we talk about a 90% confidence interval (CI), a 95% CI, a 98% CI, or a 99% CI. The confidence level indicates how likely the confidence interval is to contain the population parameter. What that means is, if you were to repeat the study the way it was performed on varying samples from the population, the population parameter would be in the constructed interval 90% (or 95%, 98%, 99%...) of the times you repeat the study. A larger confidence level will result in a wider range.

© 2013 Vancouver Community College Learning Centre. Student review only. May not be reproduced for classes. Authored by Emily Simpson

A 90% confidence level means that 90% of the time the interval we construct around a sample mean or proportion will include (or capture) the true population parameter. It does NOT mean that we are 90% confident that the population parameter is within a certain interval. This is critical to understand.

We can imagine the concept of confidence levels and intervals like a game of horseshoes. The goal (post) is the population parameter. It is FIXED. What changes is our horseshoe toss – how close we get to the post. Each horseshoe toss is like using one sampling distribution and one confidence level to construct an interval that may or may not include the population parameter. A higher confidence level is like making a wider horseshoe, which makes it more likely that you capture the post (or true value) in a toss. So it makes sense that a higher confidence level would result in a larger RANGE of values in the confidence interval.
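The "repeat the study many times" reading of a confidence level can be checked with a small simulation. The sketch below makes several simplifying assumptions that are not part of this handout: the population is simulated as normal with made-up values μ = 100 and σ = 15, σ is treated as known so that a z critical value can be used for the mean, and the function names `z_interval` and `coverage` are hypothetical helpers written for this illustration.

```python
import math
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, confidence):
    """Interval x_bar ± z* · sigma/sqrt(n), assuming sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_star * sigma / math.sqrt(len(sample))
    x_bar = mean(sample)
    return x_bar - half_width, x_bar + half_width

def coverage(mu, sigma, n, confidence, trials, seed=1):
    """Fraction of repeated studies whose interval captures the fixed mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # one "study"
        low, high = z_interval(sample, sigma, confidence)
        if low <= mu <= high:  # did this horseshoe land around the post?
            hits += 1
    return hits / trials

for level in (0.90, 0.95, 0.99):
    rate = coverage(mu=100, sigma=15, n=25, confidence=level, trials=2000)
    print(f"{level:.0%} confidence level: captured mu in {rate:.1%} of studies")
```

The capture rates should land close to the stated confidence levels, and the higher levels achieve this only by using a larger critical value, that is, a wider interval.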
[Figure: the population parameter, μ or p, with the confidence interval using sample 1 and the confidence interval using sample 2.]

How do we construct a confidence interval?

In general a confidence interval has the following format:

(point estimate) ± (critical value)·(sample standard error)

where (critical value)·(sample standard error) is the half-width (HW).

The point estimate comes from the sample distribution – either a sample mean or sample proportion. The critical value is determined by the confidence level. Each confidence level corresponds to a particular t-score value (if we’re estimating the population mean) or a critical z-score value (if we’re estimating the population proportion). The quantity after the ± sign is called the half-width (HW) because it is exactly one half of the width of the confidence interval. When the half-width is added to and subtracted from the point estimate, we get the upper and lower limits of the confidence interval.

Estimating the Population Proportion

If we want to use a sample statistic to estimate p, the population proportion, by creating a confidence interval, there are only two slight adjustments. When checking the normality of the sampling distribution of p̂, instead of verifying that np and nq are ≥ 10, we use np̂ and nq̂ since we don’t know p. The sample standard error is also calculated using the sample statistic:

$$\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

So the formula for the confidence interval is:

$$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Example 1: You are trying to estimate the proportion of VCC students who own iPhones. A random sample of 100 students reveals that 35 of them own an iPhone. Estimate the percentage of all hospitality students who have iPhones with 98% confidence.

Solution: This question involves a sample PROPORTION. First we have to make sure the sample size is less than 5% of the population in order to use the binomial distribution, since sampling is done without replacement.
100/0.05 = 2,000

Since the student population is greater than 2,000, our sample is less than 5% of the population and we can use the binomial distribution. Also, np̂ and nq̂ must be ≥ 10 in order to approximate the binomial distribution with the normal distribution:

$$n\hat{p} = 100 \times 0.35 = 35, \qquad n\hat{q} = 100 \times 0.65 = 65$$

They are both greater than 10, so we can proceed. Using the table, a 98% confidence level gives a z* of 2.326.

$$\hat{p} \pm z^{*}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.35 \pm 2.326\sqrt{\frac{0.35 \times 0.65}{100}} = 0.35 \pm 0.111$$

The confidence interval (0.24, 0.46) captures the true proportion of all VCC students who own an iPhone, with 98% confidence.

Estimating the Population Mean

We use the sampling distribution of x̄ to build a confidence interval for the population mean, μ. After checking the normality of the sample value distribution, we can proceed to calculating the confidence interval:

$$\bar{x} \pm t^{*}\frac{s}{\sqrt{n}}$$

Example 2: The average spending for a gym membership by a random sample of 150 students at a university is $125, with a standard deviation of $30. Construct a 95% confidence interval estimate for the average expense of all university students on gym membership. You may assume that the membership costs are normally distributed.

Solution: We need to look up the t-score for 149 degrees of freedom (150 – 1) in Appendix 3 for the 95% confidence level. Recall that if we construct a 95% CI, this would leave 2.5% in each tail of the distribution. Therefore we want to use t0.025, which would be 1.976.

$$\bar{x} \pm t^{*}\frac{s}{\sqrt{n}} = 125 \pm 1.976 \times \frac{30}{\sqrt{150}} = 125 \pm 4.84$$

The confidence interval is ($120.16, $129.84), or the range from $120.16 to $129.84.

Exercises

1. A Smarties company wants to estimate the proportion of green candies being produced at its factory. In a sample of 125 candies, 32 were green. At a 95% confidence level, what is the half-width of the confidence interval? What is the half-width of the confidence interval at a 99% confidence level?

2. The Census wants to estimate the average income for Canadian residents between the ages of 25 and 35. After randomly selecting 100 respondents in that age category, the mean income was found to be $45,000. The standard deviation was $9,700. Find the 95% confidence interval.

3. A bank branch wants to estimate the proportion of all customers who visit the bank primarily to use the outdoor ATM. In a random sample of 300 customers, 223 had come to the branch primarily to use the ATM. Estimate the proportion of all customers who visit the branch to use the ATM with 90% confidence.

4. A real estate agency wants to estimate the average sale price for a 3 bedroom, 2 bath home in Vancouver. A sample of 45 houses sold reveals an average of $975,000 and a standard deviation of $60,000. Assuming the sale prices are approximately normally distributed, give the lower and upper limit of a 99% confidence interval for housing prices.

5. A survey of 60 employees living in Surrey revealed an average workweek of 48.5 hours, with a standard deviation of 3.9 hours. Provide a 98% confidence interval for the average hours per work week for all employees. The survey data appear to be normally distributed.

Continued concepts: use the equation for the confidence interval to determine the sample size needed to obtain the desired confidence level (hint: a little algebra should get you there).

6. How large a sample is needed to estimate, to within 5 percentage points (above or below) and with 90% confidence, the proportion of new graduates from a hospitality management diploma program who are willing to move to find a job? (If no estimate for p̂ is given, use p̂ = 0.50.)

7. Suppose in question 2, the accuracy of the average income estimate is unsatisfactory. If the Census wants to estimate the average income of Canadians aged 25-35 to within $1000, how large a sample must be taken? (To determine the sample size for an estimate of the mean, we use a z-score instead of a t-score. Solve the half-width equation for n, rounding up to the next whole number.)

Solutions

1. At 95% confidence level, half-width is 0.077 (or 7.7 percentage points). At 99% confidence level, half-width is 0.101 (or 10.1 percentage points).

2. $45,000 ± $1,924.48

3. 74.3% ± 4.1% of all customers, or (70.2%, 78.4%)

4. ($950,939.91, $999,060.09)

5. 48.5 ± 1.2 hours per week, or (47.3, 49.7)

Rearranging the half-width equation should give

$$n = \frac{(z^{*})^{2}\,\hat{p}\hat{q}}{HW^{2}}$$

when choosing the sample size to estimate p, or

$$n = \left(\frac{z^{*}\,s}{HW}\right)^{2}$$

when choosing the sample size to estimate μ.

6. 271 people

7. 362 residents
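The calculations in Example 1, Example 2, and questions 6 and 7 can be reproduced with a short Python sketch. The function names below are hypothetical helpers written for this illustration, not part of any course material. The z critical value is computed with the standard library rather than read from the table, and the t critical value is passed in by hand (assumed to come from Appendix 3), since the standard library has no t-distribution.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided z critical value, e.g. about 2.326 for 98% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_ci(successes, n, confidence):
    """p_hat ± z* · sqrt(p_hat · q_hat / n), after the n·p_hat, n·q_hat check."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    if n * p_hat < 10 or n * q_hat < 10:
        raise ValueError("normal approximation needs n*p_hat and n*q_hat >= 10")
    half_width = z_critical(confidence) * math.sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

def mean_ci(x_bar, s, n, t_star):
    """x_bar ± t* · s / sqrt(n); t_star is looked up in a t-table by the caller."""
    half_width = t_star * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def sample_size_proportion(half_width, confidence, p_hat=0.5):
    """Smallest n giving the requested half-width for a proportion."""
    z = z_critical(confidence)
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / half_width ** 2)

def sample_size_mean(half_width, confidence, s):
    """Smallest n giving the requested half-width for a mean (z-score used)."""
    z = z_critical(confidence)
    return math.ceil((z * s / half_width) ** 2)

low, high = proportion_ci(35, 100, 0.98)           # Example 1
print(f"({low:.2f}, {high:.2f})")                  # (0.24, 0.46)

low, high = mean_ci(125, 30, 150, t_star=1.976)    # Example 2
print(f"(${low:.2f}, ${high:.2f})")                # ($120.16, $129.84)

print(sample_size_proportion(0.05, 0.90))          # question 6: 271
print(sample_size_mean(1000, 0.95, s=9700))        # question 7: 362
```

Rounding up with `math.ceil` mirrors the instruction in question 7: a fractional sample size is always rounded up, never to the nearest whole number, so that the half-width requirement is still met.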