Testing a Hypothesis about Means

The contents of this chapter are from Chapters 12 to 14 of the textbook. The chapter covers three topics: testing a single mean, testing two related means, and testing two independent means.

Testing a single mean

This chapter uses the gssft.sav data, which includes data for full-time workers only. The variables are:
Hrsl: number of hours worked last week
Agecat: age category
Rincome: respondent's income

Example

[Figure: histogram of the number of hours worked last week]

The histogram shows the number of hours worked in the previous week for 437 college graduates. The peak at 40 hours is higher than you would expect for a normal distribution. There is also a tail toward larger values of hours worked: people appear more likely to work a long week than a short week.

Example: basic statistics

Statistics: Number of hours worked last week
  N Valid                   437
  N Missing                 2
  Mean                      47.00
  Median                    45.00
  Mode                      40
  Std. Deviation            10.207
  Variance                  104.193
  Skewness                  1.240
  Std. Error of Skewness    .117
  Minimum                   15
  Maximum                   89

The sample mean (47) is not equal to the sample median (45). The distribution is right-skewed, which is consistent with the skewness of 1.24, so the distribution is not normal. How would you determine whether 47 is an unlikely sample mean if the population mean is 40?

Testing a single mean

The variance is unknown. The hypotheses are

$$H_0: \mu = \mu_0, \qquad H_1: \mu \neq \mu_0$$

The test statistic is

$$t = \sqrt{n}\,\frac{\bar{X} - \mu_0}{s}$$

The rejection region is

$$t < -t_{n-1}(\alpha/2) \quad \text{or} \quad t > t_{n-1}(\alpha/2)$$

The critical value of t can be found in many textbooks or obtained from SPSS.

The standard error of the mean is $10.2/\sqrt{437} = 0.49$. The t statistic is

$$t = \sqrt{437}\,\frac{47 - 40}{10.207} = 14.3$$

The 95% confidence interval for the difference is $6.04 \le \mu - 40 \le 7.96$.

One-Sample Test (Test Value = 40): Number of hours worked last week
  t                                                  14.326
  df                                                 436
  Sig. (2-tailed)                                    .000
  Mean Difference                                    6.995
  95% Confidence Interval of the Difference, Lower   6.04
  95% Confidence Interval of the Difference, Upper   7.96

The t-distribution

The statistic used on the previous slide follows a t distribution with n - 1 degrees of freedom. This is a two-tailed test.
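The standard error and t statistic in the One-Sample Test output can be recomputed from the summary statistics alone. The following is a minimal sketch in Python using only the standard library; the function name `one_sample_t` is a hypothetical helper for illustration, not part of SPSS.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """Return (standard error, t statistic, degrees of freedom) for H0: mu = mu0."""
    se = sd / math.sqrt(n)      # standard error of the mean
    t = (mean - mu0) / se       # equivalent to sqrt(n) * (mean - mu0) / sd
    return se, t, n - 1

# Summary statistics taken from the Statistics table (hours worked last week).
se, t, df = one_sample_t(mean=47.00, sd=10.207, n=437, mu0=40)
print(f"SE = {se:.3f}, t = {t:.2f}, df = {df}")
```

The t value from the rounded mean of 47.00 comes out slightly above the 14.326 reported by SPSS, which uses the unrounded mean difference of 6.995.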
The p-value is the probability that a sample t value is greater than 14.3 or less than -14.3. The p-value in this example is less than 0.0005. We can conclude that it is quite unlikely that college graduates work a 40-hour week on average.

Normal approximation

The number of degrees of freedom in this test is 437 - 1 = 436. With this many degrees of freedom the t distribution is very close to the standard normal distribution, so the critical values and the confidence interval can be determined from the normal distribution.

The 95% confidence interval is given by

$$\left(\bar{x} - 1.96\,\frac{s}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{s}{\sqrt{n}}\right) = \left(47 - 1.96\,\frac{10.207}{\sqrt{437}},\ 47 + 1.96\,\frac{10.207}{\sqrt{437}}\right) = (46.0430,\ 47.9570)$$

Descriptives: Number of hours worked last week
                                                  Statistic   Std. Error
  Mean                                            47.00       .488
  95% Confidence Interval for Mean, Lower Bound   46.04
  95% Confidence Interval for Mean, Upper Bound   47.96
  5% Trimmed Mean                                 46.23
  Median                                          45.00
  Variance                                        104.193
  Std. Deviation                                  10.207
  Minimum                                         15
  Maximum                                         89
  Range                                           74
  Interquartile Range                             10
  Skewness                                        1.240       .117
  Kurtosis                                        2.356       .233

Hypothesis Testing

There are two kinds of errors:
Type I error: rejecting a null hypothesis that is true (taking the true for false).
Type II error: failing to reject a null hypothesis that is false (taking the false for true).

The p-value is the probability of getting a test statistic equal to or more extreme than the sample result, given that the null hypothesis is true.

If the p-value $\ge \alpha$, you do not reject $H_0$.
If the p-value $< \alpha$, you reject $H_0$.
If the p-value is low, then $H_0$ must go.

Testing a Hypothesis about Two Related Means

We use the endoph.sav data set provided by the author. Dale et al. (1987) investigated the possible role of endorphins in the collapse of runners. Endorphins are morphine-like substances manufactured in the body. They measured plasma endorphin concentrations for 11 runners before and after they participated in a half-marathon run. The question of interest was whether average endorphin levels changed during a run.
Case Summaries (limited to first 100 cases)
  Case      before   after    diff
  1          4.30    29.60    25.30
  2          4.60    25.10    20.50
  3          5.20    15.50    10.30
  4          5.20    29.60    24.40
  5          6.60    24.10    17.50
  6          7.20    37.80    30.60
  7          8.40    20.20    11.80
  8          9.00    21.90    12.90
  9         10.40    14.20     3.80
  10        14.00    34.60    20.60
  11        17.80    46.20    28.40
  Total N   11       11       11

The paired-samples t test is recommended for this problem. It is equivalent to a one-sample t test of the differences against the value 0.

One-Sample Statistics
          N    Mean      Std. Deviation   Std. Error Mean
  diff    11   18.7364   8.32974          2.51151

One-Sample Test (Test Value = 0): diff
  t                                                  7.460
  df                                                 10
  Sig. (2-tailed)                                    .000
  Mean Difference                                    18.73636
  95% Confidence Interval of the Difference, Lower   13.1404
  95% Confidence Interval of the Difference, Upper   24.3324

The average difference is 18.74, which is large compared with the standard deviation of 8.3. The 95% confidence interval for the average difference is (13.14, 24.33). It does not include the value 0, so you can reject the hypothesis that the mean difference is zero. An equivalent way of testing the hypothesis is the t test: the p-value is less than 0.0005, so we should reject the hypothesis.

Paired Samples Statistics
                   Mean      N    Std. Deviation   Std. Error Mean
  Pair 1  before    8.4273   11   4.24832          1.28092
          after    27.1636   11   9.67794          2.91801

Paired Samples Correlations
                           N    Correlation   Sig.
  Pair 1  before & after   11   .515          .105

Paired Samples Test: Pair 1, before - after
  Paired Differences: Mean                           -18.73636
  Paired Differences: Std. Deviation                 8.32974
  Paired Differences: Std. Error Mean                2.51151
  95% Confidence Interval of the Difference, Lower   -24.33236
  95% Confidence Interval of the Difference, Upper   -13.14037
  t                                                  -7.460
  df                                                 10
  Sig. (2-tailed)                                    .000

diff Stem-and-Leaf Plot
  Frequency   Stem & Leaf
  1.00        0 . 3
  4.00        1 . 0127
  5.00        2 . 00458
  1.00        3 . 0
  Stem width: 10.00
  Each leaf: 1 case(s)

Each difference is shown using only its first two digits; the remaining digit is dropped rather than rounded (for example, 3.80 appears as stem 0, leaf 3).

All the differences are positive.
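The paired-samples results can be reproduced from the eleven before/after pairs in the Case Summaries table. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the code covers the mean difference, its standard error, and the t statistic, not the p-value.

```python
import math
import statistics

# Plasma endorphin values for the 11 runners, copied from the Case Summaries table.
before = [4.30, 4.60, 5.20, 5.20, 6.60, 7.20, 8.40, 9.00, 10.40, 14.00, 17.80]
after = [29.60, 25.10, 15.50, 29.60, 24.10, 37.80, 20.20, 21.90, 14.20, 34.60, 46.20]

diffs = [a - b for a, b in zip(after, before)]   # after minus before, per runner
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)                # sample SD (n - 1 denominator)
se_diff = sd_diff / math.sqrt(n)                 # standard error of the mean difference
t_stat = mean_diff / se_diff                     # tests H0: mean difference = 0
all_positive = min(diffs) > 0

print(f"mean = {mean_diff:.4f}, SD = {sd_diff:.4f}, SE = {se_diff:.4f}")
print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")
print(f"all differences positive: {all_positive}")
```

Working with the differences makes the link between the paired-samples test and the one-sample test explicit: once `diffs` is formed, the calculation is the single-mean t statistic with test value 0.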
That is, the after values are always greater than the before values. The stem-and-leaf plot doesn't suggest any obvious departures from normality. A normal probability plot, or Q-Q plot, can help us check the normality of the data.

Normal Probability Plot

For each data point, the Q-Q plot shows the observed value and the value that is expected if the data are a sample from a normal distribution. The points should cluster around a straight line if the data are from a normal distribution. The normal Q-Q plot of the difference variable is more or less linear, so the assumption of normality appears to be reasonable.

[Figure: normal Q-Q plot of diff]

Testing Two Independent Means

This section uses the gss.sav data set. Consider the number of hours of television viewing per day reported by internet users and nonusers. It is clear that neither distribution is normal.

There are some problems in the data. Some people report watching television for 24 hours a day, which is impossible. "Watching TV" is also not a very well-defined term: if you have the TV on while you are doing homework, are you studying or watching TV?

The observations in the two groups are independent, which is why this is a test of two independent means.

Descriptives: Hours per day watching TV, by "Use Internet?"
                                                  No                       Yes
                                                  Statistic   Std. Error   Statistic   Std. Error
  Mean                                            3.52        .128         2.42        .106
  95% Confidence Interval for Mean, Lower Bound   3.26                     2.22
  95% Confidence Interval for Mean, Upper Bound   3.77                     2.63
  5% Trimmed Mean                                 3.22                     2.18
  Median                                          3.00                     2.00
  Variance                                        7.801                    4.604
  Std. Deviation                                  2.793                    2.146
  Minimum                                         0                        0
  Maximum                                         24                       20
  Range                                           24                       20
  Interquartile Range                             2                        2
  Skewness                                        2.164       .112         3.066       .120
  Kurtosis                                        7.946       .224         16.086      .240

The two sample means are 2.42 hours of TV viewing per day for internet users and 3.52 hours for those who don't use the internet. The difference is about 1.1 hours.
The 5% trimmed means, which are calculated by removing the top and bottom 5% of the values, are roughly 0.3 hours less than the arithmetic means for both groups. The trimmed means are more meaningful in this case study because they are less affected by extreme values.

We want to test the hypothesis

$$H_0: \mu_1 = \mu_2, \qquad H_1: \mu_1 \neq \mu_2$$

There are several cases:
$\sigma_1^2 = \sigma_2^2$, both known
$\sigma_1^2$ and $\sigma_2^2$ known, but $\sigma_1^2 \neq \sigma_2^2$
$\sigma_1^2 = \sigma_2^2$, unknown
$\sigma_1^2 \neq \sigma_2^2$, unknown

When the population variances are known, the test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$

where
$\bar{X}_1$ = mean of the sample taken from population 1
$\mu_1$ = mean of population 1
$\sigma_1^2$ = variance of population 1
$n_1$ = size of the sample taken from population 1
$\bar{X}_2$ = mean of the sample taken from population 2
$\mu_2$ = mean of population 2
$\sigma_2^2$ = variance of population 2
$n_2$ = size of the sample taken from population 2

In most cases the variances are unknown. When they can be assumed equal, the test statistic is

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim t_{n_1 + n_2 - 2}$$

where the pooled variance is

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$$

and
$\bar{X}_1$ = mean of the sample taken from population 1
$s_1^2$ = variance of the sample taken from population 1
$n_1$ = size of the sample taken from population 1
$\bar{X}_2$ = mean of the sample taken from population 2
$s_2^2$ = variance of the sample taken from population 2
$n_2$ = size of the sample taken from population 2

Output from the t test for TV watching hours

Independent Samples Test: Hours per day watching TV
                                                      Equal variances   Equal variances
                                                      assumed           not assumed
  Levene's Test for Equality of Variances: F          20.261
  Levene's Test for Equality of Variances: Sig.       .000
  t-test for Equality of Means: t                     6.455             6.569
  df                                                  884               870.228
  Sig. (2-tailed)                                     .000              .000
  Mean Difference                                     1.092             1.092
  Std. Error Difference                               .169              .166
  95% Confidence Interval of the Difference, Lower    .760              .766
  95% Confidence Interval of the Difference, Upper    1.424             1.418

In the output, there are two different versions of the t test. One makes the assumption that the variances in the two populations are equal; the other does not.
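The t and confidence-interval entries in the Independent Samples Test output follow from the mean difference and its standard error. The following is a minimal Python sketch that recomputes them from the rounded values printed in the table. The helper `t_and_ci` is hypothetical, and the multiplier 1.96 is an assumption: it is the normal approximation, which is reasonable here because both versions of the test have more than 800 degrees of freedom.

```python
def t_and_ci(mean_diff, se, z=1.96):
    """Return the t statistic and an approximate 95% CI from a difference and its SE."""
    t = mean_diff / se
    return t, (mean_diff - z * se, mean_diff + z * se)

# (mean difference, standard error of the difference) copied from the SPSS output.
rows = {
    "Equal variances assumed": (1.092, 0.169),
    "Equal variances not assumed": (1.092, 0.166),
}
for label, (diff, se) in rows.items():
    t, (lo, hi) = t_and_ci(diff, se)
    print(f"{label}: t = {t:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Small discrepancies in the last digit against the SPSS columns are expected, since the inputs here are already rounded to three decimals.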
Both tests reject the hypothesis, with an observed significance level less than 0.0005. Both are two-tailed tests. Testing the equality of two variances is covered in the next section.

The 95% confidence interval for the true difference is [0.77, 1.42] when equal variances are not assumed and [0.76, 1.42] when equal variances are assumed. Neither interval covers the value 0, so we should reject the hypothesis.

F test for equality of Two Variances

$$F = \frac{s_1^2}{s_2^2} \sim F_{n_1 - 1,\, n_2 - 1}$$

where
$s_1^2$ = variance of sample 1
$s_2^2$ = variance of sample 2
$n_1$ = size of the sample taken from population 1
$n_2$ = size of the sample taken from population 2
$n_1 - 1$ = degrees of freedom from sample 1
$n_2 - 1$ = degrees of freedom from sample 2

The hypotheses are

$$H_0: \sigma_1^2 = \sigma_2^2, \qquad H_1: \sigma_1^2 \neq \sigma_2^2$$

Reject $H_0$ if

$$F < F_{n_1 - 1,\, n_2 - 1}(1 - \alpha/2) \quad \text{or} \quad F > F_{n_1 - 1,\, n_2 - 1}(\alpha/2)$$

From the results below we have

$$F = \frac{2.491^2}{1.866^2} = 1.7821$$

With 468 and 410 degrees of freedom the critical values are close to 1.00, so this value of F falls in the rejection region and we reject the hypothesis that the two populations have the same variance.

Group Statistics: Hours per day watching TV
  Use Internet?   N     Mean   Std. Deviation   Std. Error Mean
  No              469   3.40   2.491            .115
  Yes             411   2.35   1.866            .092

Levene's test for equality of variances

The SPSS report uses Levene's test (1960), which tests whether k samples have equal variances. Equality of variances across samples is called homogeneity of variance. Levene's test is less sensitive to departures from normality than some other tests. The SPSS output rejects the hypothesis of equal variances.

Effect of Outliers

Some respondents reported watching TV for a very long time, including 24 hours a day. Here the observations where the person watched TV for more than 12 hours are removed.

Independent Samples Test: Hours per day watching TV
                                                      Equal variances   Equal variances
                                                      assumed           not assumed
  Levene's Test for Equality of Variances: F          25.449
  Levene's Test for Equality of Variances: Sig.       .000
  t-test for Equality of Means: t                     7.013             7.145
  df                                                  878               857.737
  Sig. (2-tailed)                                     .000              .000
  Mean Difference                                     1.053             1.053
  Std. Error Difference                               .150              .147
  95% Confidence Interval of the Difference, Lower    .758              .763
  95% Confidence Interval of the Difference, Upper    1.347             1.342

The average difference between the two groups is reduced from 1.09 to 1.05. The conclusions do not change.

Introducing More Variables

Let us consider more variables related to TV watching time: age, education, and working hours.

Group Statistics
                                            Use Internet?   N     Mean    Std. Deviation   Std. Error Mean
  Age of respondent                         No              734   51.75   18.857           .696
                                            Yes             653   40.79   13.212           .517
  Highest year of school completed          No              733   12.05   2.702            .100
                                            Yes             652   14.55   2.523            .099
  Number of hours worked last week          No              356   40.80   13.960           .740
                                            Yes             532   43.74   13.481           .584
  Number of hours spouse worked last week   No              171   40.98   11.990           .917
                                            Yes             238   43.38   12.498           .810

We reject the hypothesis that in the population the two groups have the same average age, education, and hours worked. Internet users are significantly younger, better educated, and work more hours per week.

Independent Samples Test
                                            Levene's Test       t-test for Equality of Means
                                            F         Sig.      t         df         Sig.         Mean         Std. Error   95% CI   95% CI
                                                                                     (2-tailed)   Difference   Difference   Lower    Upper
  Age of respondent
    Equal variances assumed                 131.217   .000      12.388    1385       .000         10.957       .885         9.222    12.692
    Equal variances not assumed                                 12.637    1314.977   .000         10.957       .867         9.256    12.658
  Highest year of school completed
    Equal variances assumed                 7.327     .007      -17.752   1383       .000         -2.503       .141         -2.779   -2.226
    Equal variances not assumed                                 -17.823   1379.733   .000         -2.503       .140         -2.778   -2.227
  Number of hours worked last week
    Equal variances assumed                 .441      .507      -3.136    886        .002         -2.936       .936         -4.774   -1.099
    Equal variances not assumed                                 -3.114    742.904    .002         -2.936       .943         -4.787   -1.085
  Number of hours spouse worked last week
    Equal variances assumed                 1.050     .306      -1.948    407        .052         -2.400       1.232        -4.822   .022
    Equal variances not assumed                                 -1.961    375.077    .051         -2.400       1.224        -4.806   .006