Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 18 Inferences About Means Copyright © 2014, 2012, 2009 Pearson Education, Inc. 1 Objectives Perform a t-test for the population mean, to include: writing appropriate hypotheses, checking the necessary assumptions, drawing an appropriate diagram, computing the P-value, making a decision, and interpreting the results in the context of the problem. 62. Compute and interpret in context a t-based confidence interval for the population mean, checking the necessary assumptions. 61. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 2 18.1 Getting Started: The Central Limit Theorem (Again) Copyright © 2014, 2012, 2009 Pearson Education, Inc. 3 The Central Limit Theorem (CLT) When a random sample is drawn from a population with mean m and standard deviation s, the sampling distribution has: • Mean: m • Standard deviation: • Approximately Normal distribution as long as the sample size is large. • The larger the distribution, the closer to Normal. s n Copyright © 2014, 2012, 2009 Pearson Education, Inc. 4 Weight of Angus Cows The weight of Angus cows is Normally distributed with m = 1309 pounds and s = 157 lbs. What does the CLT say about the mean weight of a random sample of 100 Angus cows? • Sample means y will average 1309 pounds. • SD( y ) s n 157 100 15.7 pounds Copyright © 2014, 2012, 2009 Pearson Education, Inc. 5 Weight of Angus Cows (Continued) • The CLT says the sampling distribution will be approximately Normal: N(1309, 15.7). For the means of all random samples, • 68% will be between 1293.3 and 1324.7 lb. • 95% will be between 1277.6 and 1340.4 lb. • 99.7% will be between 1261.9 and 1356.1 lb. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 6 The Challenge of the CLT s • CLT tells us SD( y ) • We would like to use this for Confidence Intervals and Hypothesis Testing. n Unfortunately, we almost never know s. s • Using s almost works: SE ( y ) , but not quite. n • When using s, the Normal model has some error. • • William Gosset came up with new models, one for each n that works better. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 7 18.2 Gosset’s t Copyright © 2014, 2012, 2009 Pearson Education, Inc. 8 Gosset the Brewer At Guinness, Gosset experimented with beer. • The Normal Model was not right, especially for small samples. • Still bell shaped, but details differed, depending on n • Came up with the “Student’s t Distribution” as the correct model Copyright © 2014, 2012, 2009 Pearson Education, Inc. 9 Degrees of Freedom • For every sample size n there is a different Student’s t distribution. • Degrees of freedom: df = n – 1. • Similar to the “n – 1” in the formula for sample standard deviation • It is the number of independent quantities left after we’ve estimated the parameters. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 10 Confidence Interval for Means Sampling Distribution Model for Means • With certain conditions (seen later), the standardized sample mean follows the Student’s t model with n – 1 degrees of freedom. y m t SE ( y ) • We estimate the standard deviation with SE ( y ) s n Copyright © 2014, 2012, 2009 Pearson Education, Inc. 11 One Sample t-Interval for the Mean • When the assumptions are met (seen later), the confidence interval for the mean is y t n* 1 SE ( y ) • * n 1 The critical value t depends on the confidence level, C, and the degrees of freedom n – 1. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 12 Contaminated Salmon A study of mirex concentrations in salmon found • n = 150, y = 0.0913 ppm, s = 0.0495 ppm • Find a 95% confidence interval for mirex concentrations in salmon. • df = 150 – 1 = 149 • SE ( y ) • 0.0495 150 0.0040 * t149 1.976 Copyright © 2014, 2012, 2009 Pearson Education, Inc. 13 Contaminated Salmon Confidence Interval for m • * y t149 SE ( y ) 0.0913 1.976(0.0040) = 0.0913 ± 0.0079 = (0.0834, 0.0992) • I’m 95% confident that the mean level of mirex concentration in farm-raised salmon is between 0.0834 and 0.0992 parts per million. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 14 Thoughts about z and t • • The Student’s t distribution: • Is unimodal. • Is symmetric about its mean. • Has higher tails than Normal. • Is very close to Normal for large df. • Is needed because we are using s as an estimate for s. If you happen to know s, which almost never happens, you could use the Normal model and not Student’s t (but we won’t do this in our course). Copyright © 2014, 2012, 2009 Pearson Education, Inc. 15 Assumptions and Conditions (know these!) Independence Condition • Randomization Condition: The data should arise from a suitably randomized experiment. • Sample size < 10% of the population size. Nearly Normal • For large sample sizes (n > 40), not severely skewed. • (15 ≤ n ≤ 40): Need unimodal and symmetric. • (n < 15): Need almost perfectly normal. • Check with a histogram. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 16 Assumptions for Salmon Contamination Study Researchers tested 150 salmon from 51 farms in eight regions in six countries. Are the assumptions and conditions for inference satisfied? Independence Assumption: Not a random sample, but likely independently selected. Nearly Normal Condition: The histogram is unimodal and not highly skewed. OK since sample size is 51. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 17 Using TI-83+ Stat -> Tests -> #8 Tinterval • A sample of 256 residents of a city indicated a monthly mean electric bill of $90 with a standard deviation of $24. Construct a 95% confidence interval for the mean monthly electric bill of all residents. • Stat…Tests…#8 for T Interval…enter • For inpt: choose Stats and fill in the rest of the screen with the inputted stats. • You should find the interval is $87.05 - $92.95. We are 95% confident that the mean electric monthly bill for residents is between $87.05 and $92.95. Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide 1- 18 18 Using TI-83+ Stat -> Tests -> #8 Tinterval Use this when the population (true) sigma is unknown. Relies on sample mean and sample standard deviation The survival times in weeks are given for 20 male rats which were exposed to a high level of radiation. 152 152 115 109 137 88 94 77 160 165 20 128 123 136 101 62 152 83 69 125 Determine the 95% confidence interval on the mean survival time for rats. First enter your data in L1 (if you have summary stats skip this) Tinterval: • Inpt: Data • List: L1 • Freq: 1 • C-level .95 • Result:94.57 weeks to 130.23 weeks. Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide 1- 19 19 23. Suppose the elapsed time of airline itineraries between Washington, D.C. and Boston is normally distributed with an unknown population mean and an unknown population standard deviation. Further suppose that a sample of size 25 (therefore, n=25 and degrees of freedom=24) was taken and the following statistics were gotten from the sample: sample mean (ybar) =135 and sample standard deviation (s) = 40. Construct confidence intervals around the sample mean corresponding to the following confidence levels (express the lower and upper bounds of the intervals as INTEGERS): Stat -> Tests -> Tinterval (here we’ll input stats instead of raw data) Inpt: Stats: x(bar): 135, Sx: 40, n:25, C-level: try .80, .90, .95, .98, & .99 a. 80% (124, 146) b. 90% (121, 149) c. 95% (118, 152) d. 98% (115, 155) e. 99% (113, 157) Hint: Your intervals should get wider and wider and should all be centered around 135. Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide 1- 20 20 How Much Sleep do College Students Get? n = 25, µ= 6.64, s = 1.075 Build a 90% Confidence Interval for the Mean. Think → • Plan: Data on 25 Students Model → • Randomization Condition The data are from a random survey. • Nearly Normal Condition Unimodal and slightly skewed, so OK • Use Student’s t-Model with df = 25 – 1 = 24. • One-sample t-interval for the mean Copyright © 2014, 2012, 2009 Pearson Education, Inc. 21 How Much Sleep? Show → • Mechanics: n = 25, y = 6.64, s = 1.075 • Stat->Tests-> TInterval 90% CI 6.64 0.368 (6.272, 7.008) Copyright © 2014, 2012, 2009 Pearson Education, Inc. 22 How Much Sleep do College Students Get? Tell → • Conclusion: I’m 90 percent confident that the interval from 6.272 and 7.008 hours contains the true population mean number of hours that college students sleep. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 23 Make a Picture, Make a Picture, Make a Picture Always Test the Normality Assumption • Create a histogram to check for near normality. • Good for seeing the nature of the deviations: outliers, not symmetric, skewed • Also create a normal probability plot to see that it is reasonably straight. • Good for checking for normality • With technology at hand, there is no excuse not to make these two plots. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 24 18.3 Interpreting Confidence Intervals Copyright © 2014, 2012, 2009 Pearson Education, Inc. 25 What Not to Say Don’t Say • “90% of all students sleep between 6.272 and 7.008 hours each night.” • The CI is for the mean sleep, not individual students. • “We are 90% confident that a randomly selected student will sleep between 6.272 and 7.008 hours per night.” • We are 90% confident about the mean sleep, not an individual’s sleep. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 26 What Not to Say (Continued) Don’t Say • “The mean amount of sleep is 6.64 hours 90% of the time.” • The population mean never changes. Only sample means vary from sample to sample. • “90% of all samples will have a mean sleep between 6.272 and 7.008 hours per night.” • This interval does not set the standard for all other intervals. This interval is no more likely to be correct than any other. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 27 What You Should to Say Do Say • “90% of all possible samples will produce intervals that actually do contain the true mean sleep.” • This is the essence of the CI, but it may be too formal for general readers. • “I am 90% confident that the true mean amount that students sleep is between 6.272 and 7.008 hours per night.” • This is more personal and less technical. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 28 18.4 A Hypothesis Test for the Mean Copyright © 2014, 2012, 2009 Pearson Education, Inc. 29 One-Sample t-Test for the Mean • • • Assumptions are the same. H0: m = m0 y m0 t n 1 SE ( y ) s • Standard Error of y : SE ( y ) • When the conditions are met and H0 is true, the statistic follows the Student’s t Model. • Use this model to find the P-value. n Copyright © 2014, 2012, 2009 Pearson Education, Inc. 30 Are the Salmon Unsafe? EPA recommended mirex screening is 0.08 ppm. • Are farmed salmon contaminated beyond the permitted EPA level? • Recap: Sampled 150 salmon. Mean 0.0913 ppm, Standard Deviation 0.0495 ppm. • H0: m = 0.08 • HA: m > 0.08 Copyright © 2014, 2012, 2009 Pearson Education, Inc. 31 Are the Salmon Unsafe? One-Sample t-Test for the Mean • n = 150, df = 149, y = 0.0913, s = 0.0495 • SE ( y ) 0.0495 150 0.0040, t149 0.0913 0.08 2.825 0.0040 • P(t149 2.825) 0.0027 • Since the P-value is so low, reject H0 and conclude that the population mean mirex level does exceed the EPA screening value. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 32 One sample T-test for the mean - TI-83+ Judy is a ad designer who designs the newspaper ads for the Giant grocery store. Electronic counters at the entrance total the number of people entering the store. Before Judy was hired, the mean number of people entering every day was 3018. Since she has started working at the Giant the management thinks that this average has increased. A random sample of 42 business days gave an average of 3333 people entering the store daily with a standard deviation of 287. Does this indicate that the average number of people entering the store every day has increased? Use an alpha of 0.01. Stat…. Tests…T-Test • Inpt: choose data or stats according to what is available (usually you use stats) • μ0 : stands for the hypothecated mean which is in your hypothesis In this case we are testing against the previous 3018. • X bar is the sample mean of 3333 • Sx is the standard deviation of 287 • n: the number in your sample 42 days • μ : choose the notation used in the alternative hypothesis In this case we are looking for an increase so choose > • Calculate We get a p value of .000000006 which is very, very slight. Therefore, we reject the null hypothesis and conclude that the average number of people entering the store has certainly increased. Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide 1- 33 33 24. Suppose the elapsed time of airline itineraries between Washington, D.C. and Boston is normally distributed with an unknown population mean and unknown population standard deviation. Suppose we randomly sample 25 itineraries and the sample average is calculated to be 135 minutes and the sample standard deviation is calculated to be 40. Further suppose that we want to test the hypothesis that the true population mean (μ) equals 150 minutes. Conduct 2-sided hypothesis tests with the following alpha levels: a. 0.20 b. 0.10 c. 0.05 d. 0.02 e. 0.01 H0: µ=150 HA: µ≠150 Test statistic (t) = -1.875 P-value = 0.0730 Conclusions: • a. 0.20 • b. 0.10 • c. 0.05 • d. 0.02 • e. 0.01 Reject Reject Don’t Reject Don’t Reject Don’t Reject Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide 1- 34 34 Do Students Get at Least Seven Hours of Sleep? 25 Surveyed. Think → • Plan: Does the mean amount of sleep exceed 7 hours? • Hypotheses: H0: m = 7 HA: m > 7 • Model Randomization Condition: The students were randomly and independently selected . Nearly Normal Condition: unimodal and symmetric • Use the Student’s t-model, df = 24 • One-sample t-test for the mean Copyright © 2014, 2012, 2009 Pearson Education, Inc. 35 Do Students Get at Least Seven Hours of Sleep? 25 Surveyed. Show → • Mechanics: n = 25, y = 6.64, s = 1.075 • Use StatTestsT-Test P-value P(t25 1.67) 0.054 Copyright © 2014, 2012, 2009 Pearson Education, Inc. 36 Do Students Get at Least Seven Hours of Sleep? 25 Surveyed. Tell → • Conclusion: P-value = 0.054 says that if students do sleep an average of 7 hrs., samples of 25 students can be expected to have an observed mean of 6.64 hrs. or less about 54 times in 1000. • With 0.05 cutoff, there is not quite enough evidence to conclude that the mean sleep is less than 7. • The 90% CI: (6.272, 7.008) contains 7. • Collecting a larger sample would reduce the ME and give us a better handle on the true mean hours of sleep. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 37 The Relationship between Intervals and Tests Confidence Intervals • Start with data and find plausible values for the parameter. • Always 2-sided Hypothesis Tests • Start with a proposed parameter value and then use the data to see if that value is not plausible. • 2-sided test: Within the confidence interval means fail to reject H0. P-value = 1 – C is the cutoff. • 1-sided test: P-value = (1 – C)/2 is the cutoff. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 38 Sleep, Confidence Intervals and Hypothesis Tests 90% Confidence interval: (6.272, 7.008) • For a 2-tailed test with a 10% cutoff, any 6.272 ≤ m0 ≤ 7.008 would result in failing to reject H0. • For a 1-tailed test (“<”) with a 5% cutoff, any m0 ≤ 7.008 would result in failing to reject H0. • For a 1-tailed test (“>”) with a 5% cutoff, any 6.272 ≤ m0 would result in failing to reject H0. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 39 The Special Case with Proportions Confidence Intervals • Use p̂ to calculate SE pˆ . Hypothesis Tests • Use p to calculate SD(p). If SE pˆ and SD(p) are far from each other, the relationship between the confidence interval and the hypothesis test breaks down. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 40 25. Hoping to lure more shoppers downtown, a city builds a new public parking garage. The city plans to pay for the structure through parking fees. During a two month period (44 week days) daily fees collected averaged $126 with a standard deviation of $15. If a consultant claimed that the average daily income would be $130, should we reject her claim using alpha=0.10 (perform a 2-sided test)? H0: µ=130 HA: µ≠130 Test statistic (t): -1.77 P-value: 0.08 Conclusion: Reject the null. It is likely that the consultant’s claim is false. Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide 1- 41 41 26. In 1998, the Nabisco Company announced a “1000 Chips Challenge” claiming that every 18 ounce bag of Chips Ahoy contained at least 1000 chocolate chips. Below are the counts of chips in selected bags. 1219 1214 1087 1200 1419 1121 1325 1345 1244 1258 1356 1132 1191 1270 1295 1135 Perform a one-sided test (HA: µ>1000). What does this evidence say about Nabisco’s claim (let alpha=0.05)? H0: µ=1000 HA: µ>1000 Test statistic (t) = 10.1 P-Value = tiny # Conclusion: We can reject the null hypothesis. It appears that Nabisco’s claim is valid. Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide 1- 42 42 27. When consumers apply for credit, their credit is rated using FICO scores. A random sample of credit ratings is obtained, and the FICO scores are summarized with these statistics: n=18, ybar=660.3, s=95.9. Use an alpha of 0.05 and do a 2sided hypothesis test to test the claim that the mean credit score (of the general population) is equal to 700 (Triola 2008). H0: µ=700 HA: µ≠700 Test statistic (t) = -1.76 P-Value = 0.097 Conclusion: We cannot reject the null hypothesis. There is not enough statistical evidence to conclude that the mean is different than 700. Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide 1- 43 43 28. Different cereals are randomly selected, and the sugar content is obtained for each cereal, with the results given below for Cheerios, Harmony, Smart Start, Cocoa Puffs, Lucky Charms, Corn Flakes, Fruit Loops, Wheaties, Cap’n Crunch, Frosted Flakes, Apple Jacks, Bran Flakes, Special K, Rice Krispies, Corn Pops, and Trix. Use an alpha of 0.05 to test the claim of a cereal lobbyist that the mean of all cereals is LESS than 0.3 g (Triola 2008). 0.03 0.24 0.30 0.47 0.43 0.07 0.47 0.13 0.44 0.39 0.48 0.17 0.13 0.09 0.45 0.43 H0: µ=0.3 HA: µ<0.3 Test statistic (t): -0.12 P-value: 0.45 Conclusion: Fail to reject the null. The statistical analysis does not back up the lobbyist’s claim. Copyright © 2014, 2012, 2009 Pearson Education, Inc. Slide 1- 44 44 What Can Go Wrong? Don’t confuse proportions and means. • When counting successes with a proportion, use the Normal model. • With quantitative data, use Student’s t. Beware of multimodality. • If the histogram is not unimodal, consider separating into groups and analyzing each group separately. Beware of skewed data. • Look at the normal probability plot and histogram. • Consider re-expressing if the data is skewed. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 45 What Can Go Wrong? Set outliers aside. • Outliers violate the Nearly Normal Condition. • If removing a data value substantially changes the conclusions, then you have an outlier. • Analyze without the outliers, but be sure to include a separate discussion about the outliers. Watch out for bias. • With bias, even a large sample size will not save you. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 46 What Can Go Wrong? Make sure cases are independent. • Look for violations of the independence assumption. Make sure the data are from an appropriately randomized sample. • Without randomization, both the confidence interval and the p-value are suspect. Interpret your confidence interval correctly. • The CI is about the mean of the population, not the mean of the sample, individuals in samples, or individuals in the population. Copyright © 2014, 2012, 2009 Pearson Education, Inc. 47