Confidence Intervals for Population Mean
Business Statistics

Plan for Today
• Inferential Statistics
• Point and Interval Estimates
• Confidence Intervals
• Estimating the required sample size
• Examples

Inferential Statistics
• Goal: use information obtained from a sample to increase our knowledge about the population from which the sample was taken (i.e., to estimate or make inferences about the population)
• Two types:
  – Estimating the value of a population parameter
  – Testing a hypothesis
• Using the Sampling Distribution of the Sample Mean (SDSM) is key

Estimating a population mean
• One purpose of randomly sampling a population is to get an estimate of the population mean
• Usually, the best estimate of a population mean is the sample mean. Example: the mean SEL test score for a group of 64 students is 77.4, so 77.4 is the best estimate of the mean score for the population of all students who take the SEL test
• The logic is that a sample mean of 77.4 is more likely to come from a population with a mean of 77.4 than from a population with any other mean: this is a point estimate

Point and Interval Estimates
• A point estimate is an estimate of a specific value of a population parameter
  – Accuracy of the point estimate = SD of the distribution of sample means (how much the sample means typically vary)
• An interval estimate is an estimate of a range in which the population parameter is likely to fall
  – You can do this because the distribution of sample means is generally a normal curve, so you know the percentage of sample means that lie in a given area of the distribution: about 68% of all sample means lie within the mean ± 1 SD

Terminology
• Point estimate: a single number designed to estimate a quantitative parameter of a population, usually the corresponding sample statistic
• Interval estimate: an interval bounded by two values that is calculated from the sample and that is used to estimate the value of a population parameter
• Confidence interval: an interval estimate with a specified level of confidence
• Level of confidence \(1-\alpha\): the proportion of all interval estimates that include the parameter being estimated (usually 90%, 95%, 98% or 99%)

Example
Take a city, like Trenton, NJ. We want to know how much time it takes workers living in Trenton to get to work and back: the commuting time.
• Sample = 36 workers from Trenton
• Mean \(\bar{x}\) = 49 minutes
• This mean becomes the point estimate for the population of all Trenton workers
• \(\sigma\) = 15 minutes

Example: continued
• This sample mean should be close to the population mean, \(\mu\)
• The SDSM and the CLT tell us how close this sample mean, a point estimate, is to the population mean, \(\mu\)
• Recall: with a large enough sample, the SDSM will be close to normally distributed

Recall: the Empirical Rule

Example: continued
If we knew the value of \(\mu\), the population mean, then we could calculate an interval within which about 95% of the sample average commuting times should fall:

\[
\mu - 2\sigma_{\bar{x}} \ \text{to}\ \mu + 2\sigma_{\bar{x}},
\quad\text{i.e.}\quad
\mu - 2\frac{\sigma}{\sqrt{n}} \ \text{to}\ \mu + 2\frac{\sigma}{\sqrt{n}},
\quad\text{i.e.}\quad
\mu - 2\cdot\frac{15}{\sqrt{36}} \ \text{to}\ \mu + 2\cdot\frac{15}{\sqrt{36}},
\]

i.e. from \(\mu - 5\) to \(\mu + 5\) minutes.

Sampling Distribution of \(\bar{x}\)'s, unknown \(\mu\)
In algebraic terms:

\[
P(\mu - 5 < \bar{x} < \mu + 5) \approx 95\%
\]

Example: continued
What are the bounds of the interval centered at \(\bar{x} = 49\) minutes?
From \(\bar{x} - 2\sigma_{\bar{x}}\) to \(\bar{x} + 2\sigma_{\bar{x}}\), i.e. from 49 − 5 to 49 + 5 minutes.
This means that the 95.44% confidence interval for \(\mu\) is from 44 to 54 minutes.
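The arithmetic of the commuting example can be checked with a few lines of Python. This is only a minimal sketch: the function name `z_interval` and its parameter names are illustrative choices, not part of the slides, and the multiplier 2 is the Empirical Rule value used in the example above.

```python
import math

def z_interval(sample_mean, sigma, n, z):
    """Return (lower, upper) for sample_mean ± z * sigma / sqrt(n).

    Assumes the population standard deviation sigma is known.
    """
    margin = z * sigma / math.sqrt(n)  # z times the standard error of the mean
    return sample_mean - margin, sample_mean + margin

# Commuting example: n = 36 workers, mean = 49 minutes, sigma = 15 minutes, z = 2
lower, upper = z_interval(49, 15, 36, 2)
print(lower, upper)  # 44.0 54.0
```

The same helper covers any of the later examples once the multiplier 2 is swapped for the appropriate confidence coefficient.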
Confidence Intervals

Summary: Calculating Confidence Intervals
• Sample mean: \(\bar{x}\)
• Sample size: \(n\)
• Population standard deviation: \(\sigma\)
• Level of confidence we wish to have: \(1-\alpha\)
\((1-\alpha)\cdot 100\%\) tells you how confident you can be that the population mean falls within the interval.
Example: 0.95 · 100% = 95%, so you are 95% confident that the population mean falls within this interval.

Step-by-step Estimation of Mean \(\mu\) (\(\sigma\) known)
Assumption: either the general population has a bell-shaped, symmetric distribution, or the sample size is at least 25.

Confidence Coefficient \(z(\alpha/2)\)

Constructing a Confidence Interval
• Step 1: Set-Up
  – Describe the population parameter of interest
• Step 2: The Confidence Interval Criteria
  – Check the assumptions
  – Identify the probability distribution and the formula to be used
  – State the level of confidence \(1-\alpha\)
• Step 3: The Sample Evidence
  – Collect the sample information
• Step 4: The Confidence Interval
  – Determine the confidence coefficient \(z(\alpha/2)\)
  – Find the error bound for a population mean: \(EBM = z(\alpha/2)\cdot\dfrac{\sigma}{\sqrt{n}}\)
  – Find the lower and upper confidence limits
• Step 5: State the confidence interval: from \(\bar{x} - EBM\) to \(\bar{x} + EBM\) (units)

The confidence coefficient
• Some useful numbers from the table:

  1 − α          z(α/2)
  0.80 (80%)     1.28
  0.90 (90%)     1.645
  0.94 (94%)     1.88
  0.95 (95%)     1.96
  0.96 (96%)     2.055
  0.98 (98%)     2.33
  0.99 (99%)     2.575

Check for yourself!

Example: textbook cost
A random sample of 60 students from X University revealed that their average annual textbook spending is $928. From previous studies, it is known that the standard deviation of annual textbook costs can be taken as $230. Find a 95% confidence interval for the mean annual textbook cost for all students at X University.
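Before working through the steps, the confidence coefficient needed here (and the other table values listed earlier) can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper name `confidence_coefficient` is an illustrative choice, not something from the slides. It looks up the z value that leaves an area of α/2 in the upper tail of the standard normal curve.

```python
from statistics import NormalDist

def confidence_coefficient(level):
    """Return z(alpha/2) for a confidence level 1 - alpha (e.g. 0.95)."""
    alpha = 1 - level
    # z such that the area to its left under the standard normal curve is 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.94, 0.95, 0.96, 0.98, 0.99):
    print(f"{level:.2f} -> {confidence_coefficient(level):.3f}")
```

The computed values agree with the table to about two decimal places; printed tables sometimes round slightly differently (for example 2.575 for the 99% level), so small differences in the third decimal are expected.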
Example: textbook costs
Step 1: What is the population parameter of interest? The mean annual textbook cost \(\mu\) for all students at X University.
Step 2: \(\sigma = \$230\) is known. Is a sample of 60 students good enough? We need the sampling distribution to be approximately normal, and \(n = 60\) is at least 25, so we will use the standard normal distribution. The level of confidence is \(1-\alpha = 0.95\) (95%).
Step 3: \(n = 60\), \(\bar{x} = \$928\)
Step 4: 0.95/2 = 0.475, so \(z(\alpha/2) = 1.96\) (table)

\[
EBM = z(\alpha/2)\cdot\frac{\sigma}{\sqrt{n}} = 1.96\cdot\frac{230}{\sqrt{60}} = 58.2
\]

\(\bar{x} - EBM = 869.8\), \(\bar{x} + EBM = 986.2\)
Step 5: The 95% confidence interval for the population mean \(\mu\) is from $870 to $986 (rounded to the same precision as the data).

How to decrease the error?
• To decrease the value of EBM (and thus the width of the confidence interval for \(\mu\)) there are two possibilities:
  (A) Decrease the confidence level. A smaller confidence level results in a smaller \(z(\alpha/2)\) and thus a smaller EBM.
  (B) Increase the sample size. A larger value of \(n\) means a larger value of \(\sqrt{n}\) and thus a smaller EBM.
• Tradeoffs: (A) less certain, (B) more costly

Example: practice
A survey by Future Shop involving 35 households in the area revealed mean spending of $850 on home electronics during the last year. Construct a 98% confidence interval for the average annual spending on home electronics for all households in the area, if the population standard deviation is known to be $300.
Answer: from $732 to $968.

Estimating the sample size
• If we wish the error EBM to be smaller than a certain value \(\varepsilon\), with the confidence level fixed at \(1-\alpha\), we can choose the necessary sample size:

\[
\varepsilon > EBM = z(\alpha/2)\cdot\frac{\sigma}{\sqrt{n}}
\quad\text{Thus,}\quad
n > \left(\frac{z(\alpha/2)\cdot\sigma}{\varepsilon}\right)^{2}
\]

• The number \(\left(\dfrac{z(\alpha/2)\cdot\sigma}{\varepsilon}\right)^{2}\), rounded up to the nearest integer, is denoted by \(n_{min}\): the minimum required sample size.
• Example: a supermarket manager needs to estimate the average weekly grocery spending of his customers at a 90% level of confidence and with an error not exceeding $10.
What is the minimum sample size needed, if he knows that the population standard deviation is $60?

Example: grocery shopping
• Solution.
• Given: \(1-\alpha = 0.9\), \(\sigma = \$60\), \(\varepsilon = \$10\)
• Find: \(n_{min}\)
• First, we have \(z(\alpha/2) = 1.645\)
• Now, we compute:

\[
\left(\frac{z(\alpha/2)\cdot\sigma}{\varepsilon}\right)^{2} = \left(\frac{1.645\cdot 60}{10}\right)^{2} = 97.4
\]

Thus, the minimum required sample size is \(n_{min} = 98\) customers.

Example: practice
An insurance company wants to estimate the average mileage driven by residents per week in Hamilton, so that the error does not exceed 20 km at the 99% level of confidence. From other studies they know that the population standard deviation can be taken as 100 km. Estimate the sample size needed for this study.
Answer: 166 drivers
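Both sample-size answers can be reproduced with a short Python sketch. The function name `min_sample_size` and its parameter names are illustrative, not from the slides; the confidence coefficients are taken from the table of useful values rather than computed.

```python
import math

def min_sample_size(z, sigma, max_error):
    """Return n_min: (z * sigma / max_error)^2 rounded up to an integer."""
    return math.ceil((z * sigma / max_error) ** 2)

# Grocery example: 90% confidence (z = 1.645), sigma = $60, error at most $10
print(min_sample_size(1.645, 60, 10))    # 98

# Mileage practice problem: 99% confidence (z = 2.575), sigma = 100 km, error at most 20 km
print(min_sample_size(2.575, 100, 20))   # 166
```

Rounding up rather than to the nearest integer matters here: 97.4 rounded down to 97 would give an error bound slightly larger than the $10 target.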