Examples of two-mean tests - Independent samples
1. Equal Unknown Variances
                     Data set 1   Data set 2
Mean                 1.00811      1.12896
Standard Deviation   0.2991       0.30353
Sample Variance      0.08946      0.09213
Sum                  20.1622      22.5791
Count                20           20
The usual test statistic is
$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{E},$$
with $E$ given by
$$E = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
(note that the first numerator is simply the sum of the two expressions $\sum_{k=1}^{n} (x_k - \bar{x})^2$). Under the usual hypotheses, this statistic has a $t_{n_1 + n_2 - 2}$ distribution.
With our data, the test for equality of the two means results in

E         0.09529
t-score   1.26822
p-value   0.10622
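As a check on the arithmetic, the pooled statistic can be recomputed from the summary statistics given above. This is only a minimal sketch: the function name `pooled_t` and its argument order are illustrative choices, not part of the original handout, and the null hypothesis is taken to be $\mu_1 - \mu_2 = 0$.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (E, t, df) for the equal-variance two-sample t-test,
    under the null hypothesis mu1 - mu2 = 0 (hypothetical helper)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    E = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / E
    return E, t, df

# Data set 2 is passed first so that the t-score comes out positive.
E, t, df = pooled_t(1.12896, 0.09213, 20, 1.00811, 0.08946, 20)
print(f"E = {E:.5f}, t = {t:.4f}, df = {df}")
```

Because the inputs are rounded summary statistics, the last digit or two of the t-score can differ slightly from the value reported above.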
2. Unequal Variances
If we knew the value of the two variances, we could use the statistic
$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}},$$
which would be normally distributed, as the difference of two independent normal variables, but that’s an unlikely situation. The usual solution, with unknown variances, is to use the sample variances in place of the unknown variances:
$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}.$$
However, this expression has a complicated distribution that actually depends on the true value of the variances! It turns out that a reasonable approximation is to use a Student distribution with an appropriate number of degrees of freedom. The “best” choice is impractical for table-based work, but is implemented in all software packages (see the formula in the file “Test statistics”). A “pessimistic” choice, which is considered a reasonable compromise, is to use $\min(n_1 - 1, n_2 - 1)$.
Consider two data sets:

                     Data set 1   Data set 3
Mean                 1.00811      1.09599
Standard Deviation   0.2991       0.08639
Sample Variance      0.08946      0.00746
Sum                  20.1622      21.9199
Count                20           20
These were simulated with variances 0.09 and 0.01 respectively. If we use this knowledge, we end up with the test result
                               Variable 1   Variable 2
Mean                           1.00811      1.09599
Known Variance                 0.09         0.01
Observations                   20           20
Hypothesized Mean Difference   0
Observed Mean Difference       −0.0879
z                              −1.2428
P(Z<=z) one-tail               0.10697
z Critical one-tail            1.64485
P(Z<=z) two-tail               0.21393
z Critical two-tail            1.95996
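The known-variance result can be reproduced with Python's standard library alone. The sketch below assumes a null hypothesis of zero mean difference; the helper name `z_test` is hypothetical. The critical values 1.64485 and 1.95996 correspond to a 5% significance level.

```python
import math
from statistics import NormalDist

def z_test(mean1, var1, n1, mean2, var2, n2):
    """Two-sample z-test with known variances, null hypothesis mu1 - mu2 = 0.
    Returns (z, one-tail p, two-tail p). Hypothetical helper."""
    se = math.sqrt(var1 / n1 + var2 / n2)   # standard error of the difference
    z = (mean1 - mean2) / se
    one_tail = NormalDist().cdf(-abs(z))    # tail area beyond |z|
    return z, one_tail, 2 * one_tail

z, p_one, p_two = z_test(1.00811, 0.09, 20, 1.09599, 0.01, 20)
crit_one = NormalDist().inv_cdf(0.95)       # one-tail critical value, 5% level
crit_two = NormalDist().inv_cdf(0.975)      # two-tail critical value, 5% level
print(f"z = {z:.4f}, one-tail p = {p_one:.5f}, two-tail p = {p_two:.5f}")
print(f"critical values: {crit_one:.5f} (one-tail), {crit_two:.5f} (two-tail)")
```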
If we do not assume knowledge of the variances, but use the “sophisticated” choice for degrees
of freedom, we get the test result
                               Variable 1   Variable 2
Mean                           1.00811      1.09599
Variance                       0.08946      0.00746
Observations                   20           20
Hypothesized Mean Difference   0
Observed Mean Difference       −0.0879
df                             22.1483
t Stat                         −1.2624
P(T<=t) one-tail               0.10997
t Critical one-tail            1.71664
P(T<=t) two-tail               0.21994
t Critical two-tail            2.07307
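The fractional degrees of freedom reported here are consistent with the Welch-Satterthwaite approximation, which is assumed in the sketch below to be the “best” choice the text refers to (the referenced formula file is not reproduced here). The helper name `welch_statistic` is illustrative.

```python
import math

def welch_statistic(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df): the unequal-variance t statistic and the
    Welch-Satterthwaite degrees of freedom (assumed formula, hypothetical helper)."""
    a, b = var1 / n1, var2 / n2             # estimated variances of the two sample means
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

t, df = welch_statistic(1.00811, 0.08946, 20, 1.09599, 0.00746, 20)
df_simple = min(20 - 1, 20 - 1)             # the "pessimistic" choice of df
print(f"t = {t:.4f}, Welch df = {df:.2f}, simplified df = {df_simple}")
```

Since the variances entered are rounded to five decimals, the computed degrees of freedom agree with the reported 22.1483 only to about two decimal places.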
Using the simplified version, with $\min(n_1 - 1, n_2 - 1) = 19$ degrees of freedom, the result is

E         0.06961
t-score   1.2624
p-value   0.11104
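To see how the choice of degrees of freedom affects the one-tail p-value, the Student tail probability can be approximated numerically. The sketch below integrates the t density with Simpson's rule so that it needs only the standard library; the function names are hypothetical. A statistics package would normally supply this value directly (for instance `scipy.stats.t.sf`).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > |t|) as 0.5 minus the integral of the density
    over [0, |t|], using Simpson's rule (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3

t_stat = 1.2624
p_welch = t_upper_tail(t_stat, 22.1483)     # "sophisticated" degrees of freedom
p_simple = t_upper_tail(t_stat, 19)         # min(n1 - 1, n2 - 1)
print(f"one-tail p: {p_welch:.5f} (df = 22.1483), {p_simple:.5f} (df = 19)")
```

The smaller number of degrees of freedom gives the slightly larger p-value, which is why the simplified choice is described as pessimistic.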
In this simple example, the differences between the three approaches are minimal, but, of course,
there is no guarantee that this will always be the case.