"Would it be mean if I said that you were just average?" Traner Chapter 24: COMPARING TWO MEANS (Pages 547 -573) OVERVIEW: One can use the t-distribution on two samples that are not matched pairs. Comparing two means is not much different than comparing two proportions. Basically, one tests the null hypothesis that the two samples came from populations with equal means. The parameter of interest is the difference between the two means. Let = mean of sample of size n1 from population P1 = standard deviation calculated from the sample = mean of sample of size n2 from population P2 = standard deviation calculated from the sample = unknown mean of population P1 = unknown mean of population P2 SE ( x1 x2 ) Remember “variances add" but standard deviations do not. The expression under the radical is an approximation for the variance of the sampling distribution of mean differences. Hence, the square root is an approximation for the standard deviation of the mean differences. A confidence interval for the difference of population means is called a two-sample t-interval. The calculation for the degrees of freedom is crazy so we will let the TI do it for us. Of course we can’t perform a confidence interval or significance test until we meet some assumptions/conditions. Assumptions/Conditions: 1. Independence Assumption: the data in each group must be drawn independently. A) Randomization condition: Data must arise from a random sample. B) 10% condition: The sample is less than 10% of the population. 2. Normal Population Assumption: the underlying populations are each Normally distributed. A) Nearly Normal Condition: check to see if the data from both groups come from a distribution that unimodal and symmetric by making a histogram or Normal probability plot. 3. Independent Groups Assumption: To test the null hypothesis 1 2 = 0, we calculate the t statistic 1. 
Resting pulse rates for a random sample of 26 smokers had a mean of 80 beats per minute (bpm) and a standard deviation of 5 bpm. Among 32 randomly chosen nonsmokers, the mean and standard deviation were 74 and 6 bpm. Both sets of data were roughly symmetric and had no outliers. Is there evidence of a difference in mean pulse rate between smokers and nonsmokers? How big?

Solution:

Define the parameters.
$\mu_s$ = ______
$\mu_{ns}$ = ______

Hypotheses.
$H_0$: ______ between the smokers' and the non-smokers' mean resting pulse rate.
$H_a$: ______ between the smokers' and the non-smokers' mean resting pulse rate.

Model. We have independent random samples, each less than 10% of the population, and are told that the data appear to be approximately Normal. OK to proceed with a 2-sample t-test.

Mechanics.
$n_s$ = ______    $n_{ns}$ = ______
$\bar{x}_s$ = ______    $\bar{x}_{ns}$ = ______
$s_s$ = ______    $s_{ns}$ = ______
$t$ = ______

Now we run the test in the calculator to get the degrees of freedom of ______ and round up to ______. Therefore, P-value = ______.

Conclusion. Because the P-value is ______, the observed difference is unlikely to be just sampling error. We ______ the null hypothesis. We ______ strong evidence of a difference in mean pulse rates for smokers and nonsmokers.

Follow-up. How big is that difference?

$$(\bar{x}_s - \bar{x}_{ns}) \pm t^*_{56} \times SE(\bar{x}_s - \bar{x}_{ns})$$

We can be 95% confident that the average pulse rate for smokers is between ______ and ______ beats per minute ______ than for non-smokers.

2. Here are the saturated fat contents (in grams) for several pizzas sold by two national chains. Be sure that in checking the conditions, students plot both sets of data.

Solution: We want to know if the two pizza chains have significantly different mean saturated fat contents.

Define the parameter.
$\mu_D$ = ______
$\mu_{PJ}$ = ______

Hypotheses.
$H_0$: The null hypothesis is that there ______ difference in mean saturated fat content.
$H_a$: The alternative hypothesis is that there ______ difference in mean saturated fat content.

Model.
Independent Groups Assumption: The two samples of saturated fat contents were chosen independently of one another.
Randomization Condition: There is no mention of randomness, so we will assume that these pizzas are representative of all pizzas by these two chains.

[Plots of saturated fat content: Brand D, Brand PJ]

Nearly Normal Condition: Both distributions of saturated fat content are roughly unimodal and symmetric.

Since the conditions have been met, we can do a two-sample t-test for the difference of means, with ______ degrees of freedom (from the approximation formula).

Mechanics.
$n_D$ = ______    $n_{PJ}$ = ______
$\bar{x}_D$ = ______    $\bar{x}_{PJ}$ = ______
$s_D$ = ______    $s_{PJ}$ = ______
$\bar{x}_D - \bar{x}_{PJ}$ = ______

$$t = \frac{(\bar{x}_D - \bar{x}_{PJ}) - 0}{\sqrt{\frac{s_D^2}{n_D} + \frac{s_{PJ}^2}{n_{PJ}}}} = \text{______}$$

dof = ______
P-value = ______

Conclusion. Since the P-value ______, ______ the null hypothesis. There ______ to suggest that the two pizza brands have different mean saturated fat content. Brand ______ appears to have more saturated fat on average than Brand ______.

Follow-up. The conditions have been met, so we can create a two-sample t-interval for the difference in means, with 95% confidence.

$$(\bar{x}_D - \bar{x}_{PJ}) \pm t^*_{33} \times SE(\bar{x}_D - \bar{x}_{PJ})$$

I am 95% confident that the average saturated fat content for Brand D is between ______ and ______ grams ______ than the average saturated fat content for Brand PJ.
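
For anyone who wants to check the calculator's mechanics by hand, here is a minimal Python sketch that works from summary statistics only. It uses the pulse-rate numbers from Example 1 (26 smokers with mean 80 and SD 5; 32 nonsmokers with mean 74 and SD 6). The function name `two_sample_t_summary` is made up for this illustration, and the degrees-of-freedom line assumes the calculator uses the Welch-Satterthwaite approximation (the unpooled option), which these notes do not state explicitly. The P-value and the critical value t* are still left to the calculator or a t-table.

```python
import math

def two_sample_t_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Return (SE, t, df) for testing H0: mu1 - mu2 = 0 from summary statistics.

    Hypothetical helper for illustration; df uses the Welch-Satterthwaite
    approximation, assumed here to match the calculator's unpooled test.
    """
    v1 = sd1 ** 2 / n1            # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2            # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)       # variances add, then take the square root
    t = ((mean1 - mean2) - 0) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t, df

# Example 1: smokers (group 1) versus nonsmokers (group 2)
se, t, df = two_sample_t_summary(26, 80, 5, 32, 74, 6)
print(f"SE = {se:.4f}, t = {t:.3f}, df = {df:.2f}")
```

Under these assumptions the approximate degrees of freedom come out just below 56, which agrees with the $t^*_{56}$ used in the Example 1 follow-up once rounded up.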