Chapter Seven: Multi-Sample Methods
1/52
Introduction
The independent samples t test and the independent samples Z test
for a difference between proportions are designed to analyze data
from research designs that employ two groups of subjects.
You will now study methods that can be used to analyze data from
two or more groups.
We will also consider a few multiple comparison procedures (MCPs).
7.1 Introduction
2/52
One-Way ANOVA: Hypotheses
You will recall that the null hypothesis tested by the independent
samples t test is H0 : µ1 = µ2 . This can be interpreted as asserting
that the treatments afforded two groups of subjects were equal in
their effect.
The null hypothesis tested by the One-Way ANOVA F test is
H0 : µ1 = µ2 = · · · = µk
which can be interpreted as asserting that treatments afforded a
specified number of groups (k) of subjects were equal in their effect.
7.2 One-Way ANOVA F Test
3/52
Hypotheses (continued)
For a three group design the null hypothesis is
H0 : µ1 = µ2 = µ3
The alternative is any condition that makes the null hypothesis false,
which for the three group design would be
1. µ1 = µ2 ≠ µ3
2. µ1 ≠ µ2 = µ3
3. µ1 = µ3 ≠ µ2
4. µ1 ≠ µ2 ≠ µ3
4/52
Obtained F
The test statistic for the One-Way ANOVA F test is a ratio given by
F = MSb / MSw
The numerator is termed the mean square between (MSb) while
the denominator is termed the mean square within (MSw).
5/52
The Mean Square Within (MSw )
The mean square within (MSw ) is also a ratio and is defined as
MSw = SSw / (N − k)
Here SSw is the sum of squares within, N is the total number of
observations, and k is the number of groups. The quantity N − k is
termed the denominator degrees of freedom. For example, if there
are three groups with five subjects in each, N = 15, k = 3 and the
denominator degrees of freedom is 15 − 3 = 12.
6/52
The Sum of Squares Within (SSw )
The sum of squares within is the sum of the sums of squares for the
individual groups or
SSw = SS1 + SS2 + · · · + SSk
The sum of squares for a given group can be calculated by
SS = Σ(x − x̄)²
or equivalently
SS = Σx² − (Σx)²/n
7/52
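The equivalence of the deviation-score and computational forms of SS can be checked directly; a minimal sketch in Python (the sample values are hypothetical, chosen only for illustration):

```python
# Hypothetical sample used only to illustrate the two equivalent
# sum-of-squares formulas; any list of numbers works.
x = [12, 15, 11, 14, 18]
n = len(x)
mean = sum(x) / n

# Deviation-score form: SS = Σ(x − x̄)²
ss_deviation = sum((xi - mean) ** 2 for xi in x)

# Computational form: SS = Σx² − (Σx)²/n
ss_computational = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

print(ss_deviation, ss_computational)  # both 30.0 for this sample
```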
The Sum of Squares Within (SSw ) (continued)
Thus, SSw can be calculated by
SSw = [Σx1² − (Σx1)²/n1] + [Σx2² − (Σx2)²/n2] + · · · + [Σxk² − (Σxk)²/nk]
8/52
Example
The (fictitious) data in the accompanying table represents the weights
of subjects who have been engaged in three different dieting
regimens. Use these data to calculate MSw .
Diet One   Diet Two   Diet Three
  198        214         174
  211        200         176
  240        259         213
  189        194         201
  178        188         158
9/52
Solution
The sums of squares for the three individual groups are as follows.
SS1 = Σx1² − (Σx1)²/n1 = 208730 − (1016)²/5 = 2278.8
SS2 = Σx2² − (Σx2)²/n2 = 225857 − (1055)²/5 = 3252.0
SS3 = Σx3² − (Σx3)²/n3 = 171986 − (922)²/5 = 1969.2
10/52
Solution (continued)
SSw is then by Equation 7.4
SSw = SS1 + SS2 + SS3 = 2278.8 + 3252.0 + 1969.2 = 7500.0
Then by Equation 7.3
MSw = SSw / (N − k) = 7500.0 / (15 − 3) = 625
11/52
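The SSw and MSw calculations above can be reproduced in a few lines; a sketch in Python using the dieting data from slide #9:

```python
# Dieting data from slide #9, one list per group.
diet1 = [198, 211, 240, 189, 178]
diet2 = [214, 200, 259, 194, 188]
diet3 = [174, 176, 213, 201, 158]

def ss(group):
    """Sum of squares via the computational formula SS = Σx² − (Σx)²/n."""
    n = len(group)
    return sum(x ** 2 for x in group) - sum(group) ** 2 / n

ss_w = ss(diet1) + ss(diet2) + ss(diet3)   # 2278.8 + 3252.0 + 1969.2
N = len(diet1) + len(diet2) + len(diet3)   # 15 observations in total
k = 3                                      # number of groups
ms_w = ss_w / (N - k)                      # 7500.0 / 12

print(ss_w, ms_w)  # 7500.0 and 625.0
```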
The Mean Square Between (MSb )
As with the mean square within, the mean square between is a ratio
of a sum of squares to a degrees of freedom. More precisely,
MSb = SSb / (k − 1)
where SSb is the sum of squares between and k is the number of
groups. The quantity k − 1 is termed the numerator degrees of
freedom. For example, if there are three groups the numerator
degrees of freedom is 3 − 1 = 2.
12/52
The Sum of Squares Between SSb
SSb = (Σx1)²/n1 + (Σx2)²/n2 + · · · + (Σxk)²/nk − (Σ all x)²/N
The terms before the minus sign indicate that the observations in
each group are to be summed with the sum then being squared and
the result then being divided by the number of observations in the
group. This calculation is carried out for each group with the results
then being summed. The term after the minus sign indicates that all
observations are to be summed and the result squared. The division
of this term is by N which represents the total number of
observations—i.e., n1 + n2 + · · · + nk .
13/52
Example
Use the data in the table on slide #9 to calculate MSb . Then
calculate the One-Way ANOVA F statistic.
14/52
Solution
By Equation 7.8
SSb = (Σx1)²/n1 + (Σx2)²/n2 + (Σx3)²/n3 − (Σ all x)²/N
    = (1016)²/5 + (1055)²/5 + (922)²/5 − (2993)²/15
    = 599073 − 597203.267
    = 1869.73
15/52
Solution (continued)
Dividing SSb by the numerator degrees of freedom of
k − 1 = 3 − 1 = 2 yields MSb = 1869.73/2 = 934.87.¹
Using the mean square within calculation from slide #11
F = MSb / MSw = 934.87 / 625.00 = 1.50.
Thus, obtained F for the test of significance is 1.50.
¹ The result of 934.88 provided on page 269 of the text is based on a different
calculation where rounding was done a bit differently.
16/52
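The SSb, MSb, and F calculations can likewise be checked programmatically; a sketch in Python (full precision is kept throughout, so F comes out near 1.496 before rounding to 1.50):

```python
# Dieting data from slide #9.
groups = [
    [198, 211, 240, 189, 178],   # diet one
    [214, 200, 259, 194, 188],   # diet two
    [174, 176, 213, 201, 158],   # diet three
]

N = sum(len(g) for g in groups)            # 15
k = len(groups)                            # 3
grand_sum = sum(sum(g) for g in groups)    # 2993

# SSb = Σ (group sum)²/n  −  (grand sum)²/N   (Equation 7.8)
ss_b = sum(sum(g) ** 2 / len(g) for g in groups) - grand_sum ** 2 / N
ms_b = ss_b / (k - 1)

# SSw and MSw as on slide #11 (Equations 7.3 and 7.4).
ss_w = sum(sum(x ** 2 for x in g) - sum(g) ** 2 / len(g) for g in groups)
ms_w = ss_w / (N - k)

F = ms_b / ms_w
print(round(ss_b, 2), round(ms_b, 2), round(F, 2))  # 1869.73 934.87 1.5
```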
The Test of Significance
The test of significance is conducted by comparing obtained F to
critical F . If the former is greater than or equal to the latter, the null
hypothesis is rejected. Otherwise, the null hypothesis is not rejected.
Critical F is obtained by first noting that the numerator degrees of
freedom for the analysis are k − 1 = 3 − 1 = 2 and the denominator
degrees of freedom are N − k = 15 − 3 = 12. To use Appendix C, the
numerator degrees of freedom are located across the top of the table
and the denominator degrees of freedom down the side. For α = .05
with 2 and 12 degrees of freedom, Appendix C shows that critical F is
3.89. In this case 1.50 is not greater than or equal to 3.89 so the null
hypothesis is not rejected.
17/52
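Critical F need not be read from Appendix C; a sketch of the same lookup, assuming SciPy is available:

```python
from scipy.stats import f

alpha = 0.05
df_num, df_den = 2, 12   # k − 1 and N − k for the dieting example

# The critical value cuts off the upper alpha tail of the F distribution.
critical_f = f.ppf(1 - alpha, df_num, df_den)
print(round(critical_f, 2))   # 3.89, matching Appendix C

obtained_f = 1.50
print(obtained_f >= critical_f)   # False: the null hypothesis is not rejected
```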
Example
Suppose a study is conducted with treatments being administered to
four independent groups of subjects. Suppose further that n1 = 5,
n2 = 7, n3 = 8 and n4 = 4. Obtained F is calculated to be 4.19.
Use this information to conduct a One-Way ANOVA F test at
α = .05. What is the null hypothesis being tested? What is your
decision regarding the null hypothesis?
18/52
Solution
The numerator degrees of freedom are k − 1 = 4 − 1 = 3.
N = 5 + 7 + 8 + 4 = 24 so that the denominator degrees of freedom
are N − k = 24 − 4 = 20.
Reference to Appendix C gives critical F as 2.38.
Because obtained F of 4.19 is greater than critical F of 2.38, the null
hypothesis H0 : µ1 = µ2 = µ3 = µ4 is rejected.
19/52
The ANOVA Table
The results of a One-Way ANOVA analysis are traditionally reported in a
table similar to the one shown here.
Source of    Sum of            Mean          F          Critical
Variation    Squares    df     Squares       Ratio      F           p-value
Between      SSb        k−1    SSb/(k−1)     MSb/MSw    (table)     (computer)
Within       SSw        N−k    SSw/(N−k)
Total        SSt        N−1
20/52
Assumptions
The assumptions underlying the ANOVA F test are the same as those
underlying the independent samples t test, namely
1. Population normality
2. Homogeneous variances
3. Independence of observations
21/52
The 2 By k Chi-Square Test: Hypotheses
In Chapter 6 on page 230 you learned to test the null hypothesis
H0 : π1 = π2 by means of an independent samples Z test.
The 2 by k chi-square test extends this concept to test for equality of
any number of proportions.
This null hypothesis is stated as
H0 : π1 = π2 = · · · = πk
which asserts that all population proportions are equal.
The notation indicates that the equality extends to any number of
groups with the last group characterized as group k.
7.3 The 2 By k Chi-Square Test
22/52
The Alternative Hypothesis
The alternative hypothesis is any condition that renders the null
hypothesis false. Thus, given three groups, any of the following
conditions, barring a Type II error, would cause rejection of the null
hypothesis.
1. π1 = π2 ≠ π3
2. π1 ≠ π2 = π3
3. π1 = π3 ≠ π2
4. π1 ≠ π2 ≠ π3
When the null hypothesis is rejected, there is no way to know which
of the four conditions listed above caused the rejection.
23/52
Obtained χ2
As with other statistics with which you are now familiar, the
hypothesis test is carried out by calculating an obtained value with a
subsequent comparison to a critical value.
For the chi-square test the obtained value is calculated by
χ² = Σall cells (fo − fe)²/fe
where fo and fe are referred to respectively as the observed and
expected frequencies. The observed frequency is simply the number
of outcomes occurring in the given cell as shown in the table on the
next slide (slide #25).
24/52
2 by 3 Chi-Square Table
In this table we have used double subscripts to indicate the row and
column of each cell entry.
            Group     Group     Group
            One       Two       Three
Outcome 1   fo11      fo12      fo13
            fe11      fe12      fe13
Outcome 2   fo21      fo22      fo23
            fe21      fe22      fe23
25/52
Observed (fo ) and expected (fe ) frequencies
The observed frequency is simply the number of outcomes falling
into a given cell of the chi-square table.
The expected frequency represents the expected number of
outcomes to be found in each cell if the null hypothesis is true.
The expected frequency is calculated as follows.
fe = (NR)(NC)/N
where NR is the row total for the cell whose expected frequency is
being calculated and NC is the column total for the same cell.
26/52
Example
Suppose that in the treatment of a terminal illness, the following
results are obtained. Of the patients receiving treatment one, 17 are
dead at the end of five years while 52 are still alive. For treatment
two, 29 are dead while 54 remain alive and for treatment three 11 are
dead and 26 remain alive. Use these data to construct a chi-square
table then test the hypothesis H0 : π1 = π2 = π3 .
27/52
Solution
We begin by placing the observed frequency of each cell into a
chi-square table as shown on the next slide (#29). We then calculate
the expected frequency for each cell as follows.
fe11 = (ND)(NG1)/N = (57)(69)/189 = 20.81
fe12 = (ND)(NG2)/N = (57)(83)/189 = 25.03
fe13 = (ND)(NG3)/N = (57)(37)/189 = 11.16
fe21 = (NA)(NG1)/N = (132)(69)/189 = 48.19
fe22 = (NA)(NG2)/N = (132)(83)/189 = 57.97
fe23 = (NA)(NG3)/N = (132)(37)/189 = 25.84
28/52
Solution (continued)
         Group     Group     Group
         One       Two       Three
Dead     [17]      [29]      [11]       57
         (20.81)   (25.03)   (11.16)
Alive    [52]      [54]      [26]       132
         (48.19)   (57.97)   (25.84)
29/52
Solution (continued)
Obtained chi-square is then
χ² = Σall cells (fo − fe)²/fe
   = (17 − 20.81)²/20.81 + (29 − 25.03)²/25.03 + (11 − 11.16)²/11.16
   + (52 − 48.19)²/48.19 + (54 − 57.97)²/57.97 + (26 − 25.84)²/25.84
   = .70 + .63 + .00 + .30 + .27 + .00
   = 1.9
30/52
Solution (continued)
The critical value is obtained by entering Appendix D with k − 1
degrees of freedom where k is the number of groups. For α = .05 and
3 − 1 = 2 degrees of freedom, critical χ2 is 5.991.
The null hypothesis is rejected when obtained chi-square is greater
than or equal to critical chi-square. Because 1.9 is less than 5.991,
the null hypothesis is not rejected.
We conclude, therefore, that a difference between population
proportions cannot be demonstrated. In research terms, we conclude
that we could not show a difference in the effectiveness of the three
treatments.
31/52
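The whole 2 by 3 chi-square computation can be sketched as follows (expected frequencies are kept at full precision, so obtained χ² matches the hand calculation to one decimal place):

```python
# Observed frequencies from the terminal-illness example (slides #28-31).
observed = [[17, 29, 11],    # dead at five years, by treatment group
            [52, 54, 26]]    # alive at five years, by treatment group

row_totals = [sum(row) for row in observed]          # [57, 132]
col_totals = [sum(col) for col in zip(*observed)]    # [69, 83, 37]
N = sum(row_totals)                                  # 189

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / N       # fe = (NR)(NC)/N
        chi_square += (fo - fe) ** 2 / fe

# Below critical chi-square of 5.991 (alpha = .05, 2 df): do not reject H0.
print(round(chi_square, 1))   # 1.9
```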
Multiple Comparison Procedures: Introduction
You have learned that rejecting a true null hypothesis when
conducting a significance test results in a Type I error and that the
probability of such is α. For the purposes that follow we will term this
type of rejection a Per Comparison Error (PCE) and will
symbolize the probability of such as αPCE .
A Familywise Error (FWE) occurs when one or more true null
hypotheses are rejected in a series of tests. The probability of such is
symbolized αFWE .
7.4 Multiple Comparison Procedures
32/52
Introduction (continued)
Familywise errors occur in two broad contexts.
1. Multiple comparison analysis refers to the situation where multiple
   groups are being compared on a single outcome variable.
2. Multiple endpoint analysis refers to the situation where two groups
   are being compared on multiple outcome measures.
33/52
Determinants of Familywise Error
The following observations are demonstrated in the table on the following
slide (#35).
Other factors remaining fixed, as the number of comparisons (i.e.
significance tests) increases, αFWE increases.
Other factors remaining fixed, as αPCE decreases (increases), αFWE
decreases (increases).
34/52
Relationship Between αPCE and αFWE
αPCE    Number of    Number of       αFWE
        Groups       Comparisons
.05      3             3             .122
         5            10             .286
        10            45             .630
        20           190             .920
.01      3             3             .027
         5            10             .075
        10            45             .231
        20           190             .528
35/52
Controlling Familywise Errors
When you reject a single null hypothesis the interpretation is clear.
You have an αPCE probability that you did so incorrectly.
When you perform a series of tests and reject one or more null
hypotheses, the interpretation is not so clear. Did you reject these
hypotheses because they are false or because the familywise Type I
error rate is so high that rejections were highly likely even in the face
of true null hypotheses?
You were confident in your result for the single test because you were
able to control the probability of a false rejection at αPCE . You could
gain this same confidence in your results for multiple tests if you
could control αFWE to some specified level—.05 for example.
36/52
The Bonferroni Method Of Controlling Familywise Error
As shown in the table on slide #35, αFWE can be reduced by
reducing αPCE . But suppose you wish to establish αFWE at some
specified value—for example .05.
How low must you set αPCE in order to have αFWE be .05?
One of the oldest, simplest, and most widely used methods for finding
this level is known as the Bonferroni adjustment.
37/52
The Bonferroni Method (continued)
The adjustment is made by the following equation.
αPCE = αFWE / NT
where NT represents the number of tests to be performed.
Thus, for example, if we wish to control αFWE at .05 while we
perform three tests, each test would be carried out at the
.05/3 = .017 level of significance.
38/52
The Step-Down Bonferroni Method
In 1979, Holm proposed a modification to the Bonferroni procedure
that is usually more powerful than, is never less powerful than, and
maintains familywise error at the same level as, the classical
procedure.
This modified Bonferroni, or more properly, step-down Bonferroni
procedure is illustrated on slide #41 and is carried out as follows.
39/52
The Step-Down Bonferroni Method (continued)
1. The multiple test statistics are calculated.
2. The p-value for each statistic calculated in 1 is obtained.
3. The p-values are ordered from smallest to largest with the smallest
   being designated p(1), the second smallest p(2) and so forth with the
   largest being p(NT) where NT is the number of tests.
4. At the first step, p(1) is compared to αFWE/NT. If p(1) ≤ αFWE/NT, the
   test is declared significant and the second step is carried out. If
   p(1) > αFWE/NT, the test is declared nonsignificant and testing ceases
   with all remaining comparisons being declared not significant.
5. If the first step is significant, step two is carried out by comparing
   p(2) with αFWE/(NT − 1). If p(2) ≤ αFWE/(NT − 1), the result is declared
   significant and testing continues to the next step. Otherwise, the test
   is declared nonsignificant and testing ceases with all remaining tests
   being declared nonsignificant.
6. The steps are continued as shown in the figure on slide #41 until a
   nonsignificant result is obtained or until the last step is completed.
40/52
The Step-Down Bonferroni Method (continued)
Figure: An illustration of the step-down Bonferroni multiple comparison
procedure.
Step:        One         Two           Three         ...    NT
p-value:     p(1)        p(2)          p(3)          ...    p(NT)
Step-down:   αFWE/NT     αFWE/(NT−1)   αFWE/(NT−2)   ...    αFWE/1
Classical:   αFWE/NT     αFWE/NT       αFWE/NT       ...    αFWE/NT
41/52
Example
A researcher involved in a study employing multiple groups of subjects
wishes to test a series of null hypotheses by means of independent
samples t tests. The null hypotheses with accompanying p-values
associated with each test are given below. Use these results to
perform a step-down Bonferroni procedure with αFWE not to exceed
.05. How do these results compare to results that would be obtained
from classical Bonferroni tests?
H0 :        p-value
µ1 = µ3     .0111
µ2 = µ4     .0419
µ2 = µ5     .0090
µ3 = µ4     .0200
µ4 = µ5     .0181
42/52
Solution
The five p values, along with the hypothesis test from which each was
derived, are listed in ascending order below. Also shown are the
step-down values of αPCE (S-D) and the classical Bonferroni values of
αPCE (CB) for each test of significance.
As may be seen the tests of µ2 − µ5 and µ1 − µ3 are significant while
the remaining tests are not.
It is important to understand that the tests of µ3 − µ4 and µ2 − µ4
are automatically declared nonsignificant at this point due to the
stopping rule.
Notice that had the researcher employed the classical Bonferroni
method, which unfortunately is still common practice, only µ2 − µ5
would have been significant.
43/52
Solution (continued)
Test       p-value    S-D αPCE    CB αPCE    Result
µ2 − µ5    .0090      .0100       .0100      S
µ1 − µ3    .0111      .0125       .0100      S
µ4 − µ5    .0181      .0167       .0100      NS
µ3 − µ4    .0200      .0250       .0100      NS
µ2 − µ4    .0419      .0500       .0100      NS
44/52
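The step-down procedure worked through above can be sketched in code; the p-values are those of the example, and the dictionary keys are informal labels for the hypotheses:

```python
# Step-down (Holm) Bonferroni procedure with alpha_FWE = .05 applied to
# the five p-values from the example.
tests = {"mu2-mu5": .0090, "mu1-mu3": .0111, "mu4-mu5": .0181,
         "mu3-mu4": .0200, "mu2-mu4": .0419}
alpha_fwe = 0.05
n_t = len(tests)

results = {}
stopped = False
for step, (name, p) in enumerate(sorted(tests.items(), key=lambda t: t[1])):
    threshold = alpha_fwe / (n_t - step)   # αFWE/NT, then αFWE/(NT−1), ...
    if not stopped and p <= threshold:
        results[name] = "S"
    else:
        stopped = True                     # stopping rule: all remaining are NS
        results[name] = "NS"

print(results)   # only mu2-mu5 and mu1-mu3 come out significant
```

Under the classical Bonferroni method every p-value would instead be compared to .05/5 = .01, so only the test with p = .0090 would be significant, as noted above.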
Tukey’s HSD Method
Tukey’s HSD (Honestly Significant Difference) test is designed for use
in multiple comparison settings where all pairwise comparisons of
group means are to be carried out.
These tests are conducted by computing the test statistic, commonly
symbolized as q, for each of the k(k − 1)/2 comparisons with the
resultant q statistics then being referenced to an appropriate table of
critical values.
45/52
Tukey’s HSD Method (continued)
The test statistic is defined as follows.
qij = (x̄i − x̄j) / √(MSw /nh)
The subscripts i and j denote the two groups being compared so that
x̄i and x̄j are the means of groups i and j respectively. MSw is the
mean square within as computed for a one-way ANOVA via Equations
7.3 and 7.4.
46/52
Tukey’s HSD Method (continued)
The symbol nh represents the harmonic mean of the two sample sizes
and is computed as
nh = 2 / (1/ni + 1/nj)
When ni = nj , nh = n which is the sample size of either group.
47/52
Example
Use the data from the dieting study depicted on slide #9 to perform
Tukey’s HSD test. Begin by stating the null hypotheses to be tested,
then perform the tests and finally, state your conclusions. Maintain
αFWE at .05.
48/52
Solution
Because there are three groups and we wish to make all pairwise
comparisons, we will have 3(2)/2 = 3 hypotheses to test. They are
H0 : µ1 = µ2
H0 : µ1 = µ3
H0 : µ2 = µ3
49/52
Solution (continued)
The means of the three groups are as follows.
x̄1 = 203.2
x̄2 = 211.0
x̄3 = 184.4
Previous calculations (see slides #10 and 11) obtained when
performing a one-way ANOVA on these data provide the following.
MSw = 625
Because sample sizes are the same for all groups, nh will be
nh = 2 / (1/ni + 1/nj) = 2 / (1/5 + 1/5) = 5
for all comparisons.
50/52
Solution (continued)
The test statistics for the three comparisons are by Equation 7.13
q12 = (x̄1 − x̄2) / √(MSw /nh) = (203.2 − 211.0) / √(625/5) = −.698
q13 = (x̄1 − x̄3) / √(MSw /nh) = (203.2 − 184.4) / √(625/5) = 1.682
q23 = (x̄2 − x̄3) / √(MSw /nh) = (211.0 − 184.4) / √(625/5) = 2.379
51/52
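The three q statistics can be checked as follows; the means, MSw, and nh are the values computed on slides #50 and #51:

```python
import math

# Tukey HSD q statistics for the dieting data.
means = {"diet1": 203.2, "diet2": 211.0, "diet3": 184.4}
ms_w = 625.0    # mean square within from the one-way ANOVA (slide #11)
n_h = 5         # harmonic mean of the (equal) sample sizes

denom = math.sqrt(ms_w / n_h)    # √(625/5) = √125

q12 = (means["diet1"] - means["diet2"]) / denom
q13 = (means["diet1"] - means["diet3"]) / denom
q23 = (means["diet2"] - means["diet3"]) / denom

critical_q = 3.773   # Appendix E, 3 means, 12 degrees of freedom
for q in (q12, q13, q23):
    print(round(q, 3), abs(q) >= critical_q)   # none exceed the critical value
```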
Solution (continued)
Critical values of q are obtained from Appendix E. The table is
entered with the number of means in the analysis and the appropriate
degrees of freedom which are N − k.
Referencing Appendix E for 3 means and 12 degrees of freedom yields
a critical value of 3.773.
As may be seen, none of the hypotheses are rejected so that no
differences between group means can be demonstrated.
52/52