8.3: Confidence Intervals for Means
2.9.2017

Means vs. Proportions
• We started with calculating confidence intervals for proportions
  – Usually categorical/discrete variables
  – E.g. whether a card is red
• Now we’re going to do confidence intervals for means
  – Usually used for continuous variables
  – Could also be used for discrete variables (but not categorical ones)

Two Very Different Situations

We know the population standard deviation
• This is basically what we did in 8.1
• We use the population standard deviation to calculate the standard deviation of the sampling distribution
• Critical values follow the normal distribution (just like they did for proportions in 8.2)

We DON’T know the population standard deviation
• We have to estimate the standard deviation from the sample
• We use the sample standard deviation to estimate the standard deviation of the sampling distribution
• Critical values follow the t distribution

Known Population Standard Deviation
• This is the easier case
• CI = (point estimate) ± (critical value)(st. dev. of distribution)
• Standard deviation of the distribution equals the standard deviation of the population divided by the square root of n: σ/√n

Example
• Researchers would like to estimate the mean cholesterol level of a particular variety of monkey that is often used in laboratory experiments. They take a sample of 200 monkeys, and find a mean of 34.2 mg/dl. A previous study has found that the standard deviation of cholesterol levels is about 5 mg/dl. Find the 95% confidence interval.

Example
• (34.2) ± (critical value)(5/√200)
• (34.2) ± (2)(.354), rounding the critical value 1.96 up to 2
• (34.2) ± (.707)
• 33.493 to 34.907

Back to the Monkeys
• Researchers would like to estimate the mean cholesterol level of a particular variety of monkey that is often used in laboratory experiments. They would like their estimate to be within 1 mg/dl of the true value at a 95% confidence level. A previous study has found that the standard deviation of cholesterol levels is about 5 mg/dl.
What sample size do they need to take?

Back to the Monkeys
• ME = (critical value)(st. dev. of distribution)
• 1 = (critical value)(5/√n)
• 1 = (2)(5/√n)
• .5 = 5/√n
• √n = 5/.5
• √n = 10
• n = 100

Back to the Monkeys (again)
• What if we wanted a 90% confidence level instead of a 95% confidence level?
• 1 = (critical value)(5/√n)
• Critical value = |invnorm(.05,0,1)| = 1.645 (invnorm(.05,0,1) itself returns -1.645; the critical value is its absolute value)
• 1 = (1.645)(5/√n)
• .608 = 5/√n
• √n = 8.224
• n = 67.63
• So they should sample 68 monkeys (always round up)

When we DON’T know the population standard deviation
• This is the harder situation
  – And, unfortunately, more common
• Note: this only applies to means, NOT PROPORTIONS
• The difference occurs in the way that we find critical values
  – Before, when we knew the standard deviation of the population, we could easily calculate how many standard deviations we needed to go out from the mean to capture the desired proportion of the data
  – .95 for a 95% confidence interval gave us a critical value of 2 (1.96)

Critical Values
• So now we are replacing σ (population standard deviation) with s_x (sample standard deviation)
• z was our critical value
  – For proportions
  – For means when we know σ
• But now our critical value is t

Why do we need a new critical value?
• If we take a sample from a population and measure the standard deviation of the sample, it changes from sample to sample. An estimate built on the sample standard deviation is therefore (on average) going to vary more than one built on the standard deviation of the population. See the simulation below (

The t distribution
• This new distribution is called the t distribution
• It looks pretty similar to a normal distribution
• The difference is that it has more area in the tails than a normal distribution
  – Particularly with smaller sample sizes

So… how do we get critical values?
• Instead of using standard normal probabilities
  – Table A or invnorm()
• Now we will use the t distribution
  – The t distribution only has one input: ‘degrees of freedom’ or ‘df’
  – df = sample size minus 1

What are degrees of freedom?
• This is a common question that people want to ask: what do the degrees of freedom mean?
• The teacher’s edition, unsatisfyingly, says: “Unfortunately, there is no simple answer. For now, simply explain that the shape and spread of the t distributions depend on the degrees of freedom, which depend on the sample size. The larger the sample size—and the more degrees of freedom—the closer the t-distributions come to the standard Normal distribution”

What are degrees of freedom?
• That answer doesn’t help much
• Unfortunately, the answer has to do with ideas that are above the pay grade of AP statistics
• One fairly accurate way of thinking about it is that it is the number of independent pieces of information that we have that allow us to make an estimate
• For more precise (and more complicated) explanations: https://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)
• In practice, you don’t need to know what degrees of freedom really mean; you just need to know how many there are

So… how do we get critical values?
• Instead of using standard normal probabilities
• Now we will use the t distribution
  – The t distribution only has one input: df
• 2 options:
  1. Use Table B
  2. (if you have a TI-84 or above): invT()

Using Table B
• Table B is (I think) more intuitive to use than Table A, so it is a viable option
  – Find your degrees of freedom (df) on the rows, and your confidence level C on the columns
  – Where they intersect tells you the critical value

Using InvT
• Only if you have a TI-84 or above
  – Sorry TI-83 owners
• Works the same way as invnorm
  – You plug in the area (to the left)
  – And then the degrees of freedom
  – For a 95% confidence level with df = 11, the area to the left is .975, so that is invT(.975, 11)
• Table B told us the critical value was 2.201
• InvT tells us that the critical value is 2.200985143
  – Rounds to 2.201

Reminder about normality
• We can only use these procedures if the data look close enough to Normal or the sample size is big enough
• If n<15, only if the data appear to be approximately Normal (symmetric, single-peaked, no outliers)
• If 15≤n<30, can use unless there are significant outliers or strong skewness
• If n≥30, go for it!
• Sidenote: if you are choosing the sample size for a study, it should probably be bigger than 30 so that this isn’t a problem

An Example
• John Isner is a professional tennis player. I take an SRS of size 51 of his first serves and measure their speed in miles/hour.
• In the sample, the mean is 124 mph and the standard deviation is 8 mph
• Find the 98% confidence interval for the mean
• (Point estimate) ± ME
• (Point estimate) ± (critical value)(st. dev.)
• 124 ± (2.403)(s_x/√n), with df = 51 - 1 = 50
• 124 ± (2.403)(8/√51)
• 124 ± (2.403)(1.12)
• 124 ± 2.691
• 121.31 to 126.69
• We are 98% confident that John Isner’s mean first serve speed is between 121.31 and 126.69 mph

Different Example
• A random sample of 30 students received SAT math scores with a mean of 580 and a standard deviation of 80
• Find the 60% confidence interval
• (Point estimate) ± ME
• (Point estimate) ± (critical value)(st. dev.)
• 580 ± (.854)(s_x/√n), with df = 30 - 1 = 29
• 580 ± (.854)(80/√30)
• 580 ± (.854)(14.606)
• 580 ± 12.473
• 567.53 to 592.47
• We are 60% confident that the mean SAT math score is between 567.53 and 592.47
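
The interval and sample-size calculations in these slides can also be checked in software. The block below is a minimal sketch in Python using only the standard library. The function names (`inv_t`, `t_interval`, `z_interval`, `sample_size`) are made up for this illustration and are not calculator or textbook functions. Because the standard library has no t distribution, `inv_t` stands in for Table B or invT() by numerically integrating the t density and then searching for the critical value, so treat it as an approximation.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """Area to the left of x: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def inv_t(area_left, df):
    """Hypothetical stand-in for invT(area, df), found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area_left:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(mean, s, n, confidence):
    """Sigma unknown: mean ± t*(s/√n) with df = n - 1."""
    t_star = inv_t((1 + confidence) / 2, n - 1)
    margin = t_star * s / math.sqrt(n)
    return t_star, mean - margin, mean + margin


def z_interval(mean, sigma, n, confidence):
    """Sigma known: mean ± z*(sigma/√n)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return mean - margin, mean + margin


def sample_size(sigma, margin, confidence):
    """Smallest n with z*(sigma/√n) <= margin (always round up)."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z_star * sigma / margin) ** 2)


if __name__ == "__main__":
    print(inv_t(0.975, 11))               # compare with Table B: 2.201
    print(t_interval(124, 8, 51, 0.98))   # John Isner's serves
    print(t_interval(580, 80, 30, 0.60))  # SAT math scores
    print(z_interval(34.2, 5, 200, 0.95)) # monkeys, sigma known
    print(sample_size(5, 1, 0.90))        # monkeys at 90% confidence
```

Results may differ from the slides in the last decimal place because the slides round the critical value before multiplying. The known-σ results differ a little more: the slides round 1.96 up to 2, which gives n = 100 at 95% confidence, while the unrounded critical value gives a slightly narrower interval and a slightly smaller sample size.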