STAT 651 Lecture 9
Copyright (c) Bani K. Mallick

Topics in Lecture #9
Comparing two population means
Output: detailed look
The t-test

Book Sections Covered in Lecture #9
Chapter 6.2

Relevant SPSS Tutorials
Transformations of Data
2-sample t-test
Paired t-test

Lecture 8 Review: Comparing Two Populations
There are two populations.
Take a sample from each population.
The sample sizes need not be the same.
Population 1: $n_1$
Population 2: $n_2$

Each sample will have a sample standard deviation.
Population 1: $s_1$
Population 2: $s_2$

Each sample will have a sample mean.
Population 1: $\bar{X}_1$
Population 2: $\bar{X}_2$
Those are the statistics. What are the parameters?

Each population will have a population standard deviation.
Population 1: $\sigma_1$
Population 2: $\sigma_2$

Each population will have a population mean.
Population 1: $\mu_1$
Population 2: $\mu_2$

How do we compare the population means $\mu_1$ and $\mu_2$?
The usual way is to take their difference: $\mu_1 - \mu_2$
If the population means are equal, their difference = 0.
Suppose we form a confidence interval for the difference. From this we learn whether 0 is in the confidence interval, and hence can make decisions about the hypothesis that the population means are equal.

Log(Saturated Fat) NHANES Comparison
Group Statistics

Health Status   N    Mean     Std. Deviation   Std. Error Mean
Healthy         60   2.9905   .6173            7.969E-02
Cancer          59   2.6969   .6423            8.362E-02

NHANES Comparison: what the output looks like
Independent Samples Test, Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means
                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       2.543   117       .012              .2937             .1155                   6.497E-02      .5223
Equal variances not assumed   2.542   116.627   .012              .2937             .1155                   6.488E-02      .5224

NHANES Comparison: the variable
The variable being compared is Log(Saturated Fat).

NHANES Comparison: the method
The two rows are "Equal variances assumed" and "Equal variances not assumed". If you think the variances are wildly different, try a transformation.

NHANES Comparison: the p-value
Sig. (2-tailed) = .012 in both rows.
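The Group Statistics table is enough to recompute several entries of the SPSS output by hand. The sketch below checks the "Std. Error Mean" column and the "Equal variances not assumed" row. It assumes that row uses the Welch-Satterthwaite formula, which the slides do not state; the function names are illustrative only.

```python
import math

# Summary statistics copied from the Group Statistics table.
n1, mean1, sd1 = 60, 2.9905, 0.6173   # Healthy
n2, mean2, sd2 = 59, 2.6969, 0.6423   # Cancer


def se_mean(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def welch_row(mean1, sd1, n1, mean2, sd2, n2):
    """t, df and standard error without assuming equal variances.

    Assumption: the Welch-Satterthwaite approximation is used for df.
    """
    a = sd1 ** 2 / n1
    b = sd2 ** 2 / n2
    se_diff = math.sqrt(a + b)
    t = (mean1 - mean2) / se_diff
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df, se_diff


t_welch, df_welch, se_welch = welch_row(mean1, sd1, n1, mean2, sd2, n2)
print(f"Std. Error Mean: {se_mean(sd1, n1):.5f} and {se_mean(sd2, n2):.5f}")
print(f"Welch row: t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.4f}")
```

Small discrepancies in the last digit are expected, because the table reports rounded means and standard deviations rather than the raw data.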
NHANES Comparison: the difference in sample means
Mean Difference = .2937 in both rows of the SPSS output.

NHANES Comparison: the standard error of the difference in sample means
Std. Error Difference = .1155 in both rows of the SPSS output.

NHANES Comparison: the 95% confidence interval
Equal variances assumed: 6.497E-02 to .5223
Equal variances not assumed: 6.488E-02 to .5224

NHANES Comparison
The "Mean Difference" is 0.2937. Since the healthy cases had a higher mean, this is Mean(Healthy) - Mean(Cancer).
The 95% CI is from 0.0650 to 0.5223.
What is this a CI for? The difference in population mean log(saturated fat) intake between healthy controls and cancer cases: $\mu(\text{Healthy}) - \mu(\text{Cancer})$

NHANES Comparison
Mean(Healthy) - Mean(Cancer): the 95% CI is from 0.0650 to 0.5223.
The null hypothesis of interest is that the population means are equal, i.e., $\mu(\text{Healthy}) - \mu(\text{Cancer}) = 0$
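The interval endpoints can be checked from the mean difference and its standard error. This is a minimal sketch that takes the critical value $t_{.025}(117) \approx 1.98$ as given; `confidence_interval` is an illustrative helper, not part of SPSS.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Two-sided CI: estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


mean_diff = 0.2937   # Mean(Healthy) - Mean(Cancer), from the SPSS output
se_diff = 0.1155     # Std. Error Difference, from the SPSS output
t_crit = 1.98        # assumed value of t_.025 with 117 degrees of freedom

lower, upper = confidence_interval(mean_diff, se_diff, t_crit)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

The lower endpoint comes out near 0.065, which matches the 6.497E-02 reported by SPSS.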
NHANES Comparison
Mean(Healthy) - Mean(Cancer): the 95% CI is from 0.0650 to 0.5223.
Is the p-value p < 0.05 or p > 0.05?
[Figure: number line with the hypothesized value 0 to the left of the confidence interval, which runs from 0.0650 to 0.5223]
Answer: p < 0.05 since the 95% CI does not cover zero.

Is the p-value p < 0.01 or p > 0.01?
Answer: You cannot tell from a 95% CI. However, from the SPSS output, p = 0.012, so p > 0.01.

NHANES Comparison: the 95% confidence interval
SPSS output, equal variances assumed: t = 2.543, df = 117, Sig. (2-tailed) = .012, 95% CI from 6.497E-02 to .5223.

NHANES Comparison
Mean(Healthy) - Mean(Cancer): the 95% CI is from 0.0650 to 0.5223.
What do we conclude from this confidence interval?
The population mean log(saturated fat) intake is greater in the Healthy cases by between 0.0650 and 0.5223, with 95% confidence. Exponentiate the endpoints to express the comparison in terms of grams of saturated fat, where it becomes a ratio rather than a difference.

Comparing Two Population Means: the Formulas
The data: $\bar{X}_1, s_1, n_1$ and $\bar{X}_2, s_2, n_2$
The populations: $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$
The aim: CI for $\mu_1 - \mu_2$
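Both uses of the interval, checking whether it covers 0 and moving back to the original scale, are short calculations. The sketch below assumes that "log" means the natural logarithm, which the slides do not state, and uses the hypothetical helper `covers`.

```python
import math

# 95% CI for mu(Healthy) - mu(Cancer) on the log scale.
lower, upper = 0.0650, 0.5223


def covers(interval, value):
    """True if value lies inside the closed interval (lo, hi)."""
    lo, hi = interval
    return lo <= value <= hi


# If the 95% CI excludes 0, the two-sided test rejects at the 5% level.
reject_at_5_percent = not covers((lower, upper), 0.0)

# Assumption: natural log. A difference of logs exponentiates to a ratio.
ratio_lower = math.exp(lower)
ratio_upper = math.exp(upper)

print(f"Reject equal means at the 5% level: {reject_at_5_percent}")
print(f"Healthy/Cancer ratio on the gram scale: {ratio_lower:.3f} to {ratio_upper:.3f}")
```

Under that assumption the interval says typical saturated fat intake is roughly 7% to 69% higher in the Healthy group.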
Comparing Two Populations
Does it matter which one you call population 1 and which one you call population 2?
Not at all. The key is to interpret the difference properly.

Comparing Two Populations
The aim: CI for $\mu_1 - \mu_2$
This is the difference in population means.
The estimate of the difference in population means is the difference in sample means, $\bar{X}_1 - \bar{X}_2$.
This is a random variable: it has sample-to-sample variability.

Comparing Two Populations
Difference of sample means: $\bar{X}_1 - \bar{X}_2$
The "population" mean from repeated sampling is $\mu_1 - \mu_2$.
The s.d. from repeated sampling is
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
You need reasonably large samples from BOTH populations.

Comparing Two Populations
If you can reasonably believe that the population sd's are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Comparing Two Populations
The standard error of $\bar{X}_1 - \bar{X}_2$ is then
$$s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
The number of degrees of freedom is $n_1 + n_2 - 2$.

Comparing Two Populations
A $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is
$$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Note how the sample sizes determine the CI length.

Comparing Two Populations
Generally, you should make your sample sizes nearly equal, or at least not wildly unequal. Consider a total sample size of 100 and the CI
$$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$\sqrt{1/n_1 + 1/n_2} \approx 1$ if $n_1 = 1$, $n_2 = 99$
$\sqrt{1/n_1 + 1/n_2} = 0.20$ if $n_1 = 50$, $n_2 = 50$
Thus, in the former case, your CI would be about 5 times longer!
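The effect of the sample-size split on the CI length can be tabulated directly. This is a small sketch, with the illustrative name `allocation_factor` for the term $\sqrt{1/n_1 + 1/n_2}$, holding $s_p$ and the critical value fixed.

```python
import math


def allocation_factor(n1, n2):
    """The sqrt(1/n1 + 1/n2) term that scales the CI half-width."""
    return math.sqrt(1.0 / n1 + 1.0 / n2)


total = 100
balanced = allocation_factor(50, 50)

# Compare several ways of splitting a total sample size of 100.
for n1 in (1, 10, 25, 50):
    n2 = total - n1
    factor = allocation_factor(n1, n2)
    print(f"n1={n1:3d}, n2={n2:3d}: factor={factor:.3f}, "
          f"CI length relative to 50/50: {factor / balanced:.2f}x")
```

The penalty grows slowly near the balanced split and sharply at the extremes, which is why "not wildly unequal" is usually good enough.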
Comparing Two Populations
The CI can of course be used to test hypotheses:
$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$
This is the same as
$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$
So we just need to check whether 0 is in the interval, just as we have done.

Comparing Two Populations: The t-test
$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$
There is something called a t-test, which gives you the information as to whether 0 is in the CI.
It does not tell you where the means lie, however, so it is of limited use.
P-values tell you the same thing.

Comparing Two Populations: The t-test
The t-statistic is defined by
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Comparing Two Populations: The t-test
You reject equality of means if
$$|t| > t_{\alpha/2}(n_1 + n_2 - 2)$$
In this case, is $p < \alpha$ or is $p > \alpha$?
Answer: $p < \alpha$

NHANES Comparison: the t-test
$t_{\alpha/2}(n_1 + n_2 - 2) = t_{.025}(117) \approx 1.98$
SPSS output, equal variances assumed: t = 2.543, df = 117, Sig. (2-tailed) = .012.
$t = 2.543 > t_{\alpha/2}(n_1 + n_2 - 2) \approx 1.98$, hence reject the hypothesis that the population means are equal, for $\alpha = 0.05$.

Comparing Two Populations
SPSS Demonstrations: bluebonnets and Framingham Heart Disease and Blood Pressure, as time permits
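The pooled t-test can be carried out from summary statistics alone. The sketch below applies the formulas for $s_p$, the standard error, and $t$ to the NHANES Group Statistics (Healthy: n = 60, mean 2.9905, sd .6173; Cancer: n = 59, mean 2.6969, sd .6423). The function name `pooled_t_test` is illustrative, and the critical value 1.98 is taken from the slide rather than computed.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Pooled standard deviation s_p.
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    # Standard error of the difference in sample means.
    se = sp * math.sqrt(1.0 / n1 + 1.0 / n2)
    t = (mean1 - mean2) / se
    return t, df, sp, se


t, df, sp, se = pooled_t_test(2.9905, 0.6173, 60, 2.6969, 0.6423, 59)
t_crit = 1.98                 # t_.025(117), as quoted on the slide
reject = abs(t) > t_crit      # two-sided test at alpha = 0.05

print(f"s_p = {sp:.4f}, SE = {se:.4f}, t = {t:.3f}, df = {df}")
print(f"Reject equal population means at alpha = 0.05: {reject}")
```

The result agrees with the SPSS row for equal variances assumed up to rounding of the inputs.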