Examples of two-mean tests - Independent samples

1. Equal Unknown Variances

Data set 1:
Mean                 1.00811
Standard Deviation   0.2991
Sample Variance      0.08946
Sum                  20.1622
Count                20

Data set 2:
Mean                 1.12896
Standard Deviation   0.30353
Sample Variance      0.09213
Sum                  22.5791
Count                20

The usual test statistic is
\[ \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{E}, \]
with $E$ given by
\[ E = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
(note that the first numerator is simply the sum of the two expressions $\sum_{k=1}^{n} (x_k - \bar{x})^2$, one for each sample). Under the usual hypotheses, this statistic has a $t_{n_1 + n_2 - 2}$ distribution. With our data, the test for equality of the two means results in

E         t-score   p-value
0.09529   1.26822   0.10622

2. Unequal Variances

If we knew the value of the two variances, we could use the statistic
\[ \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}, \]
which would be normally distributed, as the difference of two independent normal variables, but that’s an unlikely situation. The usual solution, with unknown variances, is to use the sample variances in place of the unknown variances:
\[ \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}. \]
However, this expression has a complicated distribution that actually depends on the true value of the variances! It turns out that a reasonable approximation is to use a Student distribution with an appropriate number of degrees of freedom. The “best” choice is impractical for table-based work, but is implemented in all software packages (see the formula in the file “Test statistics”). A “pessimistic” choice, which is considered a reasonable compromise, is to use $\min(n_1 - 1, n_2 - 1)$.

Consider two data sets:

Data set 1:
Mean                 1.00811
Standard Deviation   0.2991
Sample Variance      0.08946
Sum                  20.1622
Count                20

Data set 3:
Mean                 1.09599
Standard Deviation   0.08639
Sample Variance      0.00746
Sum                  21.9199
Count                20

These were simulated with variances 0.09 and 0.01 respectively.
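As a rough check on the quantities defined so far, the following Python sketch recomputes the standard errors and degrees of freedom from the summary statistics alone. The function names are hypothetical, chosen only for this illustration. The sketch also assumes that the “best” degrees-of-freedom choice is the Welch-Satterthwaite approximation; the handout defers that formula to another file, so this is an assumption rather than a quotation.

```python
import math


def pooled_standard_error(s1_sq, n1, s2_sq, n2):
    """E for the equal-variance test: pooled SD times sqrt(1/n1 + 1/n2)."""
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)


def unpooled_standard_error(s1_sq, n1, s2_sq, n2):
    """Denominator of the unequal-variance statistic."""
    return math.sqrt(s1_sq / n1 + s2_sq / n2)


def welch_satterthwaite_df(s1_sq, n1, s2_sq, n2):
    """Assumed form of the 'best' degrees of freedom (Welch-Satterthwaite)."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Summary statistics from the data sets above: (mean, sample variance, count).
set1 = (1.00811, 0.08946, 20)
set2 = (1.12896, 0.09213, 20)
set3 = (1.09599, 0.00746, 20)

# Section 1: data sets 1 and 2, equal unknown variances.
e_pooled = pooled_standard_error(set1[1], set1[2], set2[1], set2[2])
t_pooled = abs(set1[0] - set2[0]) / e_pooled  # magnitude of the t-score
df_pooled = set1[2] + set2[2] - 2

# Section 2: data sets 1 and 3, unequal variances.
e_unpooled = unpooled_standard_error(set1[1], set1[2], set3[1], set3[2])
df_best = welch_satterthwaite_df(set1[1], set1[2], set3[1], set3[2])
df_pessimistic = min(set1[2] - 1, set3[2] - 1)

print(f"pooled:   E = {e_pooled:.5f}, |t| = {t_pooled:.4f}, df = {df_pooled}")
print(f"unpooled: E = {e_unpooled:.5f}, df (assumed best) = {df_best:.2f}, "
      f"df (pessimistic) = {df_pessimistic}")
```

Because the inputs are the rounded summary values, the last digits may differ slightly from figures computed on the raw data.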
If we use this knowledge, we end up with the test result

                               Variable 1   Variable 2
Mean                           1.00811      1.09599
Known Variance                 0.09         0.01
Observations                   20           20
Hypothesized Mean Difference   0
Observed Mean Difference       −0.0879
z                              −1.2428
P (Z<=z) one-tail              0.10697
z Critical one-tail            1.64485
P (Z<=z) two-tail              0.21393
z Critical two-tail            1.95996

If we do not assume knowledge of the variances, but use the “sophisticated” choice for degrees of freedom, we get the test result

                               Variable 1   Variable 2
Mean                           1.00811      1.09599
Variance                       0.08946      0.00746
Observations                   20           20
Hypothesized Mean Difference   0
Observed Mean Difference       −0.0879
df                             22.1483
t Stat                         −1.2624
P (T<=t) one-tail              0.10997
t Critical one-tail            1.71664
P (T<=t) two-tail              0.21994
t Critical two-tail            2.07307

Using the simplified version, the result is

E         t-score   p-value
0.06961   1.2624    0.11104

In this simple example, the differences between the three approaches are minimal, but, of course, there is no guarantee that this will always be the case.
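To see how the one-tail p-values follow from the reported statistics, here is a minimal Python sketch using only the standard library. The helper names `t_density` and `t_one_tail` are hypothetical, and integrating the Student density with Simpson's rule is a convenience chosen for this illustration, not the method used by the software that produced the tables.

```python
import math
from statistics import NormalDist


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_one_tail(t, df, steps=2000):
    """P(T >= |t|): one half minus the Simpson's-rule integral over [0, |t|].

    steps must be even; df may be fractional.
    """
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 0.5 - total * h / 3


# Statistics and degrees of freedom as reported in the tables above.
reported = [
    ("pooled, df = 38", 1.26822, 38),
    ("unequal variances, df = 22.1483", -1.2624, 22.1483),
    ("unequal variances, df = 19", 1.2624, 19),
]
p_values = {label: t_one_tail(t, df) for label, t, df in reported}

# Known-variance case: one-tail probability from the standard normal.
p_z = NormalDist().cdf(-1.2428)

for label, p in p_values.items():
    print(f"{label}: one-tail p = {p:.5f}")
print(f"known variances (z): one-tail p = {p_z:.5f}")
```

The printed values can be compared with the reported p-values 0.10622, 0.10997, 0.11104 and 0.10697. Two-tail values are twice the one-tail ones, since both distributions are symmetric.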