Tests with two+ groups
We have examined tests of means for a single group, and tests for a difference when we have a matched sample (as in husbands and wives). Now we consider differences of means between two or more groups.

Two sample t test
Compare means on a variable for two different groups. Examples: income differences between males and females; average SAT scores for blacks and whites; mean time to failure for parts manufactured using two different processes.

New Test - Same Logic
Find the probability that the observed difference could be due to chance factors in taking the random sample. If the probability is very low, conclude that the difference did not happen by chance (reject the null hypothesis). If the probability is not low, we cannot reject the null hypothesis (no difference between groups).

Sampling Distributions
Note that in this case neither mean is in the critical region of the other's sampling distribution.
[Figure: two overlapping sampling distributions, labeled Mean 1 and Mean 2]

Sampling Distributions
Note that here each mean is well into the critical region of the other's sampling distribution.
[Figure: two well-separated sampling distributions, labeled Mean 1 and Mean 2]

Sampling Dist. of Difference
[Figure: sampling distribution of the difference of means, centered on the hypothesized zero difference, with big differences marked in the tails]

Procedure
1. Calculate the mean for each group.
2. Calculate the difference between the means.
3. Calculate the standard error of the difference.
4. Test whether the difference is bigger than "t" standard errors (small samples) or z standard errors (large samples). t and z are taken from tables at the 95 or 99 percent confidence level.

Standard error of difference
$$ s_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \, \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$
The first square root is the pooled estimate of the standard deviation; the second factor divides by the sample sizes.

t test
$$ t = \frac{\bar{y}_1 - \bar{y}_2}{s_{\bar{y}_1 - \bar{y}_2}} $$
The numerator is the difference of means; the denominator is the standard error of the difference of means. If the absolute value of t is greater than the table value of t at the 95% confidence level (with $n_1 + n_2 - 2$ degrees of freedom), reject the null hypothesis.

Three or more groups
If there are three or more groups, we cannot take a single difference, so we need a new test for differences among several means.
This test is called ANOVA, for ANalysis Of VAriance. It can also be used if there are only two groups.

Analysis of Variance
Note that the name of the test says we are looking at variance, or variability. The logic is to compare the variability between groups (differences among the means) with the variability within groups (variability of scores around their group mean). These are called the between variance and the within variance, respectively.

The logic
If the between variance is large relative to the within variance, we conclude that there are significant differences among the means. If the between variance is not so large, we cannot reject the null hypothesis.

Examples
[Figure: two examples, one with large between variance and one with small between variance; both have the same within variance]

Variance
Calculate the sum of squares and then divide by the degrees of freedom:
$$ s^2 = \frac{\sum (Y - \bar{Y})^2}{n - 1} $$
There are three ways to do this.

Total, Within, and Between
Total variance is the mean squared deviation of individual scores around the overall (total) mean. Within variance is the mean squared deviation of individual scores around each of the group means. Between variance is the mean squared deviation of group means around the overall (total) mean.

Total, Within, and Between
$$ SST = \sum (y - \bar{y})^2, \quad df_T = n - 1, \quad \text{Total} = SST/df_T $$
$$ SSW = \sum (y - \bar{y}_k)^2, \quad df_W = n - K, \quad \text{Within} = SSW/df_W $$
$$ SSB = \sum (\bar{y}_k - \bar{y})^2, \quad df_B = K - 1, \quad \text{Between} = SSB/df_B $$
Each sum runs over all n observations, $\bar{y}_k$ is the mean of the group containing y, and K is the number of groups.

F test for ANOVA
The F statistic has a distribution somewhat like the chi-square. It is the ratio of two variances. For our purpose, we compare the between and within estimates of variance by creating a ratio of the two, called an F ratio: the between variance divided by the within variance.

F-ratio
A table in the back of the book has critical values of the F statistic.
Like the t distribution, we have to know the degrees of freedom. Unlike the t distribution, there are two different degrees of freedom we need: between (numerator) and within (denominator).

Decision
If the F-ratio for our sample is larger than the critical value, we reject the null hypothesis of no differences among the means. If the F-ratio is not so large, we cannot reject the null hypothesis of no differences among the means.

Example (three groups)
Observations: group 1 = 1, 2, 3; group 2 = 4, 5, 6; group 3 = 7, 8, 9. The overall mean is 5.
$$ SST = (1-5)^2 + (2-5)^2 + (3-5)^2 + (4-5)^2 + (5-5)^2 + (6-5)^2 + (7-5)^2 + (8-5)^2 + (9-5)^2 = 60 $$

Example (within)
The group means are 2, 5, and 8.
$$ SSW = (1-2)^2 + (2-2)^2 + (3-2)^2 + (4-5)^2 + (5-5)^2 + (6-5)^2 + (7-8)^2 + (8-8)^2 + (9-8)^2 = 6 $$

Example (between)
Each observation is replaced by its group mean (2, 5, or 8); the overall mean is 5.
$$ SSB = (2-5)^2 + (2-5)^2 + (2-5)^2 + (5-5)^2 + (5-5)^2 + (5-5)^2 + (8-5)^2 + (8-5)^2 + (8-5)^2 = 54 $$

F-ratio
Between variance divided by within variance:
Between = 54 / 2 = 27 (remember K - 1 degrees of freedom, so df = 3 - 1 = 2)
Within = 6 / 6 = 1 (remember n - K degrees of freedom, so df = 9 - 3 = 6)
The F-ratio is 27 / 1 = 27, with 2 and 6 df. The critical value (95%) of F is 5.14. Since 27 is larger than 5.14, we reject the null hypothesis of no differences among the means.
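Returning to the two-group case, the pooled two-sample t test from the earlier slides can be sketched in Python using only the standard library. This is a minimal illustration, not the book's method: the function name `pooled_t` is hypothetical, the two groups simply reuse the first two groups of the ANOVA example for illustration, and the critical value 2.776 is the usual two-tailed 95% t-table value for 4 degrees of freedom, entered by hand as you would read it from a table.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Hypothetical helper: pooled two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance is the sample variance s^2 (n - 1 denominator)
    s1_sq, s2_sq = variance(group1), variance(group2)
    # pooled estimate of the standard deviation
    pooled_sd = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    # standard error of the difference of means ("divide by sample sizes")
    se_diff = pooled_sd * sqrt(1 / n1 + 1 / n2)
    t = (mean(group1) - mean(group2)) / se_diff
    return t, n1 + n2 - 2


# Illustrative data only: the first two groups of the three-group example
t, df = pooled_t([1, 2, 3], [4, 5, 6])
T_CRIT = 2.776  # two-tailed 95% table value for df = 4, typed in from a t table
print(f"t = {t:.3f}, df = {df}")
print("reject H0" if abs(t) > T_CRIT else "cannot reject H0")
```

With these groups the statistic is about -3.67 on 4 df, so its absolute value exceeds the table value. Comparing the absolute value of t handles a difference in either direction.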
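The three-group worked example can be checked with a short Python sketch of one-way ANOVA that follows the SST, SSW, and SSB definitions above. The function name `one_way_anova` is hypothetical, and the critical value 5.14 is the table value quoted in the slides rather than something the code computes.

```python
from statistics import mean


def one_way_anova(groups):
    """Hypothetical helper: return (SST, SSW, SSB, df_between, df_within, F)."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand_mean = mean(all_obs)
    # total: each score around the overall mean
    sst = sum((y - grand_mean) ** 2 for y in all_obs)
    # within: each score around its own group mean
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    # between: every observation contributes its group mean's squared deviation
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    df_b, df_w = k - 1, n - k
    f_ratio = (ssb / df_b) / (ssw / df_w)
    return sst, ssw, ssb, df_b, df_w, f_ratio


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sst, ssw, ssb, df_b, df_w, f_ratio = one_way_anova(groups)
F_CRIT = 5.14  # 95% table value for (2, 6) df, as given in the slides
print(f"SST={sst}, SSW={ssw}, SSB={ssb}, df=({df_b}, {df_w}), F={f_ratio}")
print("reject H0" if f_ratio > F_CRIT else "cannot reject H0")
```

The sketch reproduces the slide values (SST = 60, SSW = 6, SSB = 54, F = 27 on 2 and 6 df) and also shows that the total sum of squares splits into the within and between parts (60 = 6 + 54).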