
Chapter 24: Comparing Means

Parameter of interest: $\mu_1 - \mu_2$, the difference between the population means
Statistic of interest: $\bar{y}_1 - \bar{y}_2$, the difference between the observed sample means

The standard deviation works the same way as with proportions (the variances add):
$$SD(\bar{y}_1 - \bar{y}_2) = \sqrt{Var(\bar{y}_1) + Var(\bar{y}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Since we don’t know the population standard deviations, we use the sample standard deviations instead:
$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

A sampling distribution for the difference between two means
When the conditions are met, the standardized sample difference between the means of two independent groups,
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)},$$
can be modeled by a Student’s t-model with a number of degrees of freedom found with a special formula. We estimate the standard error with
$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

Assumptions and Conditions:
- Independence Assumption
  - Randomization Condition
  - 10% Condition: usually not checked for differences of means. Check it only if there is a very small population or a very large sample.
- Normal Population Assumption
  - Nearly Normal Condition: check for both groups.
- Independent Groups Assumption

Two-sample t-interval
When the conditions are met, we are ready to find the confidence interval for the difference between the means of two independent groups, $\mu_1 - \mu_2$. The confidence interval is
$$(\bar{y}_1 - \bar{y}_2) \pm t_{df}^{*} \times SE(\bar{y}_1 - \bar{y}_2),$$
where the standard error of the difference of the means is
$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
The critical value $t_{df}^{*}$ depends on the particular confidence level, C, that you specify and on the number of degrees of freedom, which we get from the sample sizes and a special formula.
**Step-by-Step: pg. 551-553
**TI Tips: pg. 553-554
**Just Checking: pg. 554

Two-sample t-test
The conditions for the two-sample t-test for the difference between the means of two independent groups are the same as for the two-sample t-interval. We test the hypothesis
$$H_0: \mu_1 - \mu_2 = \Delta_0,$$
where the hypothesized difference $\Delta_0$ is almost always 0, using the statistic
$$t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE(\bar{y}_1 - \bar{y}_2)}.$$
The standard error of $\bar{y}_1 - \bar{y}_2$ is
$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
When the conditions are met and the null hypothesis is true, this statistic can be closely modeled by a Student’s t-model with a number of degrees of freedom given by a special formula. We use that model to obtain a P-value.
**Step-by-Step: pg. 556-558
**TI Tips: pg. 558
**Just Checking: pg. 558
HW: #1, 3, 5, 12, 14, 18, 24

*Pooled t-Test*
- If we are willing to assume that the variances of the two groups are equal, we could pool the data from the two groups to estimate the common variance. We’d estimate this pooled variance from the data, so we’d still use a Student’s t-model.
- Knowing that two means are equal doesn’t say anything about whether their variances are equal.
- Pooling is the theoretically correct method only when we have a good reason to believe that the variances are equal (it is never wrong not to pool).

Assumptions:
- Equal Variance Assumption: the variances of the two populations from which the samples have been drawn are equal.
  - Nearly Equal Spreads Condition: look at the boxplots to check that the spreads are not wildly different.

Estimate the common variance:
$$s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$
Substitute the pooled variance in place of each of the variances in the standard error formula:
$$SE_{pooled}(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_{pooled}^2}{n_1} + \frac{s_{pooled}^2}{n_2}} = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Degrees of freedom:
$$df = n_1 + n_2 - 2$$
Substitute the pooled-t estimate of the standard error and its degrees of freedom into the steps of the confidence interval or hypothesis test, and you’ll be using the pooled-t method. We will not use pooled!!!

Examples:
1. Resting pulse rates for a random sample of 26 smokers had a mean of 80 beats per minute (bpm) and a standard deviation of 5 bpm. Among 32 randomly chosen nonsmokers, the mean and standard deviation were 74 and 6 bpm. Both sets of data were roughly symmetric and had no outliers.
Is there evidence of a difference in mean pulse rate between smokers and nonsmokers? How big?
Hypotheses:
$$H_0: \mu_S - \mu_{NS} = 0 \qquad H_A: \mu_S - \mu_{NS} \neq 0$$
Model: We have independent random samples, each less than 10% of its population, and we are told that the data are roughly symmetric with no outliers, so they appear to be approximately Normal. It is OK to proceed with a two-sample t-test.
Mechanics: *sketch and label the model
$n_S = 26$, $\bar{y}_S = 80$, $s_S = 5$
$n_{NS} = 32$, $\bar{y}_{NS} = 74$, $s_{NS} = 6$
$$t = \frac{(80 - 74) - 0}{\sqrt{\frac{5^2}{26} + \frac{6^2}{32}}} \approx 4.15, \qquad df \approx 56, \qquad P \approx 0.0001$$
Conclusion: Because the P-value is so small, the observed difference is unlikely to be just sampling error. We reject the null hypothesis. We have strong evidence of a difference in mean pulse rates for smokers and nonsmokers.
Follow-up: How big is that difference?
$$(\bar{y}_S - \bar{y}_{NS}) \pm t_{56}^{*} \times SE(\bar{y}_S - \bar{y}_{NS}) = (80 - 74) \pm t_{56}^{*}\sqrt{\frac{5^2}{26} + \frac{6^2}{32}} \approx (3.1, 8.9)$$
We can be 95% confident that the average pulse rate for smokers is between 3.1 and 8.9 beats per minute higher than for nonsmokers.

2. Here are the saturated fat contents (in grams) for several pizzas sold by two national chains. Be sure that in checking the conditions, students plot both sets of data.
Brand D Brand PJ
17 8 6 11 12 12 7 3 10 15 11 4 8 7 9 5 8 11 4 8 10 11 4 5 10 13 7 5 5 13 9 16 11 16 12
We want to know whether the two pizza chains have significantly different mean saturated fat contents.
Hypotheses: The null hypothesis is that there is no difference in mean saturated fat content. The alternative hypothesis is that there is a difference in mean saturated fat content.
$$H_0: \mu_D - \mu_{PJ} = 0 \qquad H_A: \mu_D - \mu_{PJ} \neq 0$$
Model:
Independent Groups Assumption – The two samples of saturated fat contents were chosen independently of one another.
Randomization Condition – There is no mention of randomness, so we will assume that these pizzas are representative of all pizzas by these two chains.
Nearly Normal Condition – Both distributions of saturated fat content are roughly unimodal and symmetric.
Since the conditions have been met, we can do a two-sample t-test for the difference of means, with 32.757 degrees of freedom (from the approximation formula).
Mechanics: *sketch and label the distribution
$n_D = 20$, $\bar{y}_D = 11.25$, $s_D = 3.193$
$n_{PJ} = 15$, $\bar{y}_{PJ} = 6.53$, $s_{PJ} = 2.588$
$df = 32.757$ (from technology), $\bar{y}_D - \bar{y}_{PJ} = 4.72$
$$t = \frac{(\bar{y}_D - \bar{y}_{PJ}) - (\mu_D - \mu_{PJ})}{\sqrt{\frac{s_D^2}{n_D} + \frac{s_{PJ}^2}{n_{PJ}}}} = \frac{(11.25 - 6.53) - 0}{\sqrt{\frac{3.193^2}{20} + \frac{2.588^2}{15}}} \approx 4.823$$
$$P = 2 \cdot P(\bar{y}_D - \bar{y}_{PJ} > 4.72) = 2 \cdot P(t_{32.8} > 4.823) \approx 0.00003$$
Conclusion: Since the P-value is very low, we reject the null hypothesis. There is strong evidence that the two pizza brands have different mean saturated fat contents. Brand D appears to have more saturated fat on average than Brand PJ.
Follow-up: The conditions have been met, so we can create a two-sample t-interval for the difference in means, with 95% confidence.
$$(\bar{y}_D - \bar{y}_{PJ}) \pm t_{32.8}^{*} \times SE(\bar{y}_D - \bar{y}_{PJ}) = 4.72 \pm t_{32.8}^{*}\sqrt{\frac{3.193^2}{20} + \frac{2.588^2}{15}} \approx (2.73, 6.71)$$
I am 95% confident that the average saturated fat content for Brand D is between 2.73 and 6.71 grams higher than the average saturated fat content for Brand PJ.

3.
Feed | Diet A | Diet B
Number of pigs | 36 | 36
Mean weight gain | 55 lb | 53 lb
Standard deviation | 3 lb | 4 lb

The data are from an experiment to see whether there is any difference in the amount of weight gained by two groups of pigs fed different diets. Based on these data we estimate (with 95% confidence) that Diet A would produce a mean weight gain between 53.99 and 56.01 pounds, and Diet B between 51.65 and 54.35 pounds. These two intervals overlap, making it appear that the two diets might actually produce the same mean weight gain (maybe 54.2 pounds, for example).
This analysis is incorrect. It rests on the fact that the 2-pound difference is smaller than the sum of the two separate margins of error. Each of those margins of error is based on the standard error of the individual mean, here $3/\sqrt{36} = 0.5$ and $4/\sqrt{36} \approx 0.67$. We must instead sum the variances to examine the difference between the two means:
$$\sqrt{\left(\frac{3}{\sqrt{36}}\right)^2 + \left(\frac{4}{\sqrt{36}}\right)^2} \approx 0.83,$$
which is less than $0.5 + 0.67 = 1.17$. The 95% confidence interval for the difference in mean weight gains is 0.34 to 3.66 pounds. It does not look likely that the two diets produce the same results, because 0 is not in the confidence interval. (Note that there are about 65 degrees of freedom here rather than only 35, as there would be for each single-mean interval.)
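The notes refer to a "special formula" for the degrees of freedom without stating it. The usual choice for unpooled two-sample procedures is the Welch-Satterthwaite approximation; we assume that is what "technology" computed here, since it reproduces the 32.757 in Example 2:
$$df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
A minimal sketch in Python 3 (standard library only; the function names are our own) computes the standard error, the t-statistic and this approximate df from summary statistics:

```python
import math


def se_diff(s1: float, n1: int, s2: float, n2: int) -> float:
    """Unpooled standard error of y1bar - y2bar: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite approximation to the degrees of freedom (assumed formula)."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def two_sample_t(y1: float, s1: float, n1: int,
                 y2: float, s2: float, n2: int, delta0: float = 0.0) -> float:
    """t = ((y1bar - y2bar) - delta0) / SE(y1bar - y2bar)."""
    return ((y1 - y2) - delta0) / se_diff(s1, n1, s2, n2)


# Summary statistics from Examples 1-3: (mean, sd, n) for each group.
examples = {
    "pulse (smokers vs nonsmokers)": ((80, 5, 26), (74, 6, 32)),
    "pizza (Brand D vs Brand PJ)": ((11.25, 3.193, 20), (6.53, 2.588, 15)),
    "pigs (Diet A vs Diet B)": ((55, 3, 36), (53, 4, 36)),
}

for name, ((y1, s1, n1), (y2, s2, n2)) in examples.items():
    t = two_sample_t(y1, s1, n1, y2, s2, n2)
    df = welch_df(s1, n1, s2, n2)
    print(f"{name}: SE = {se_diff(s1, n1, s2, n2):.3f}, t = {t:.2f}, df = {df:.2f}")
```

Applied to the three examples, the approximation gives about 55.95, 32.76 and 64.91 degrees of freedom, in line with the 56, 32.757 and 65 used above. Note that the df is usually not a whole number, which is why the examples write $t_{32.8}$.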

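The pooled-t formulas can be sketched the same way. This is for comparison only, since the notes do not use pooling; applying it to Example 1 is our own illustration, and the function names are hypothetical:

```python
import math


def pooled_sd(s1: float, n1: int, s2: float, n2: int) -> float:
    """s_pooled: square root of the df-weighted average of the two sample variances."""
    var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    return math.sqrt(var)


def pooled_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """SE_pooled(y1bar - y2bar) = s_pooled * sqrt(1/n1 + 1/n2)."""
    return pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)


def unpooled_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """SE(y1bar - y2bar) = sqrt(s1^2/n1 + s2^2/n2), the method the notes use."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


# Example 1 (smokers vs nonsmokers), pooled for comparison only.
se_p = pooled_se(5, 26, 6, 32)
df_p = 26 + 32 - 2  # pooled df = n1 + n2 - 2
print(f"pooled:   SE = {se_p:.3f}, t = {(80 - 74) / se_p:.2f}, df = {df_p}")
print(f"unpooled: SE = {unpooled_se(5, 26, 6, 32):.3f}")

# Example 3 has equal group sizes (36 pigs each), so the two SEs coincide.
print(pooled_se(3, 36, 4, 36), unpooled_se(3, 36, 4, 36))
```

For Example 1 the pooled SE is about 1.472 (t ≈ 4.08, df = 56), against the unpooled 1.444. When the group sizes are equal, the two standard errors are algebraically identical, so pooling changes only the degrees of freedom (70 instead of about 65 for the pigs).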

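Finding P-values and critical values requires the t-model itself, which the notes leave to the calculator (see the TI Tips). As a stand-in, here is a sketch assuming Python 3 with the standard library only. Rather than calling a statistics package, it integrates the t density numerically with Simpson’s rule and finds $t^*$ by bisection. The names t_cdf, t_star and two_sample_summary are our own:

```python
import math


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """P(T <= x) for a Student's t-model with df degrees of freedom (df may be fractional).

    Integrates the t density from 0 to |x| with Simpson's rule, then uses symmetry.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u: float) -> float:
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_star(conf: float, df: float) -> float:
    """Critical value t* with P(-t* < T < t*) = conf, found by bisection on [0, 50]."""
    target = 0.5 + conf / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sample_summary(y1, s1, n1, y2, s2, n2, conf=0.95):
    """Unpooled two-sample t for H0: mu1 - mu2 = 0.

    Returns (t, df, two-sided P-value, confidence interval for mu1 - mu2).
    """
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Welch-Satterthwaite
    diff = y1 - y2
    t = diff / se
    p = 2 * (1 - t_cdf(abs(t), df))
    me = t_star(conf, df) * se  # margin of error
    return t, df, p, (diff - me, diff + me)


for name, args in [("pulse", (80, 5, 26, 74, 6, 32)),
                   ("pizza", (11.25, 3.193, 20, 6.53, 2.588, 15)),
                   ("pigs", (55, 3, 36, 53, 4, 36))]:
    t, df, p, (lo, hi) = two_sample_summary(*args)
    print(f"{name}: t = {t:.2f}, df = {df:.1f}, P = {p:.5f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Run as-is, it should reproduce the three intervals in the examples to about two decimals: (3.1, 8.9), (2.73, 6.71) and (0.34, 3.66). The pizza t-statistic comes out near 4.83 rather than 4.823, apparently because 6.53 is a rounded mean. Simpson’s rule is used only to keep the sketch dependency-free; a statistics library’s t distribution would be the usual choice in practice.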