Download Comparison of a probability of the correct decision of multiple

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Probability wikipedia , lookup

Probability interpretations wikipedia , lookup

Transcript
Comparison of a probability of the correct decision
of multiple comparison procedures
Wojciech Zieliński
Summary
Four procedures of multiple comparisons are compared with respect to the probability of the correct decision.
Among the procedures there are two classical ones (Tukey and Newman-Keuls), the FTP procedure of
Caliński and Corsten and a new procedure W. On the basis of a Monte Carlo experiment it is shown that
none of the procedures is uniformly the best.
Introduction
Consider k normal populations N (µi , σ 2 ), 1 = 1, . . . , k. The problem is to verify the hypothesis
H0 : µ1 = · · · = µk .
This problem is known since thirties, when Fisher (1935) developed his analysis of variance. In analysis of
variance the decision is investigated means are equal (non–rejecting H0 ) or at least one mean is different from
others (rejecting H0 ). The second case from the practical point of view is less informative. The practician
would like to know which of means may be considered as equal, i. e. the practician would like to divide
the set {µ1 , . . . , µk } into subsets of equal means. Such subsets are called homogenous groups. There are
known several procedures of dividing the set of means into homogenous groups. The most famous and
used in practical applications are Tukey, Scheffé, Bonfferroni, LSD. Each of procedure may give a different
homogenous groups. The question is, which of divisions is nearest to reality, i. e. which procedure gives “the
best” division.
Almost all procedures of multiple comparisons may be found in Miller (1982) and Hochberg and Tamhane
(1988). Those procedures may be divide into three main groups: simultaneous confidence intervals, simultaneous hypothesis testing and “other”. The problem is in comparison of procedures and choosing “the best”
one. For example, simultaneous confidence intervals are compared due to the length of individual intervals.
But such comparison is not good (Zieliński 1990). More over, there is not reasonable criterion of comparison
of procedures of different types. A proposition of a criterion of comparison may be found in Zieliński (1991).
The criterion is the probability of correct decision, i. e. the probability of obtaining a division of a set of
means consistent with reality.
Definition. (Zieliński 1991). The subset {µi1 , . . . , µim } is called a homogeneous group if µi1 = · · · = µim
and any other mean is not equal to µi1 .
The aim of a multiple comparison procedure is to obtain a division of the set of means {µ1 , . . . , µk } into
homogeneous groups on the basis of a set {Xij : j = 1, . . . , n, i = 1, . . . , k} of observations. If obtained
division is equal to real one we said that procedure made a correct decision. We are interested in a probability
of correct decision, and a procedure with higher probability is better. In what follows, a comparison of four
procedures due that criterion is made.
In the paper we restrict ourselves to the simplest situation. We assume that we have samples with the
same number of observations and the samples are independent. Also, we assume equality of variances of the
compared distributions.
Procedures
Consider four procedures: simultaneous confidence intervals of Tukey, Newman–Keuls multiply hypothesis
test, cluster analysis procedure FTP of Caliński and Corsten, and a new procedure W . The first two procedures are classical procedures and they were chosen with respect to results of Zieliński (1991). Those procedures
may give non disjoint homogeneous groups. The two latter procedures divide a set of means into disjoint
homogeneous groups.
1
Let ν = k(n − 1) and
n
1X
Xi =
Xij , i = 1, . . . , k,
n j=1
k
n
XX
2
1
s =
Xij − X i .
k(n − 1) i=1 j=1
2
Tukey’s simultaneous confidence intervals.
Tukey’s simultaneous confidence intervals have the following form:
P
s
s
α
α
µi − µj ∈ X i − X j − qk,ν √ , X i − X j + qk,ν √
, for all i, j = 1, . . . , k, i 6= j = 1 − α
n
n
α
where qk,ν
is critical value of studentized range. If zero is in a confidence interval for µi − µj , then those two
means are considered as equal. Applying that rule to all confidence intervals a division into homogeneous
groups is obtained.
Multiple test of Newman–Keuls
Newman–Keuls procedure is based on testing hypothesis Hi1 ,...,im : µi1 = · · · = µim for all sets of indices
{i1 , . . . , im }, m = k, k − 1, . . . , 1 which are subsets of {1, . . . , k}. The hypothesis is rejected if
√ n
α
maxX i − minX i ≤ qm,ν
i∈I
i∈I
s
α
where I = {i1 , . . . , im } qk,ν
is critical value of studentized range. If hypothesis Hi1 ,...,im is not rejected, then
the decision is: µi1 = · · · = µim .
Newman–Keuls procedure starts with H1,...,k (which usually is noted as H0 ). If the hypothesis is rejected,
than go to second step else stop procedure. The second step consists of testing k subhypothesis µ1 = · · · =
µi−1 = µi+1 = · · · = µk , i = 1, . . . , k of H1,...,k . If i–th hypothesis is rejected, then k − 1 subhypothesis are
tested, else the set {µ1 , . . . , µi−1 , µi+1 , . . . , µk } is said to be a homogeneous group and any of subhypothesis
is tested. The procedure stops, if there is nothing left to test.
Caliński and Corsten F T P procedure
The F T P procedure is based on an idea of cluster analysis. Let J = {I1 , . . . , Ip } be a division of {1, . . . , k}
into disjoint subsets For J let
p X
X
S(p, J ) = n
(X j − X Ij )2 ,
i=1 j∈Ii
where
X Ij =
n
XX
Xjl /(nkj )
j∈Ij l=1
and kj is the number of elements of Ij . Let J ∗ be a division into p disjoint subsets such that S(p, J ∗ ) is
α
minimal among S(p, J ). Procedures starts with p = 1 and p is increased till S(p, J ∗ ) < s2 (k − 1)Fk−1,ν
,
α
where Fk−1,ν is a critical value of the F distribution with (k − 1, ν) degrees of freedom. In this way we obtain
a division J ∗ of a set of means into p disjoint homogeneous groups.
W procedures
The W procedure is very similar to the F T P procedure. It differs from the previous one in selection of critical
α
.
values. In the W procedure the number p of homogeneous groups is increased till S(p, J ∗ ) < s2 (k − p)Fk−p,ν
Criterion
Let S = {s1 , s2 , . . .} denote the set of all possible divisions of the set of means into homogenous groups.
Elements of the set S are disjoint subsets of Rk and for (µ1 , . . . , µk ) ∈ Rk there exists only one s ∈ S such
2
that (µ1 , . . . , µk ) ∈ s. Note that S is a finite set. The elements of the set S are commonly called states of
nature.
The aim of any multiple comparison procedure is to “detect” the true state of nature. Let D be a set of all
decisions which can be made on the basis of observations. The elements of the set D are called decisions. We
assume that D = S.
We define the loss function in the following manner
0, if d = s,
L(d, s) =
for d ∈ D and s ∈ S.
1, if d 6= s,
This loss function gives penalty of one when our decision is not correct.
If we denote by X the space of all observations, than the function δ : X → D is called a decision rule. Any
of the above mentioned procedures of multiple comparisons may be described as a decision rule.
A decision rule δ is characterized by its risk function, i.e. average loss. Let (µ1 , . . . , µk ) ∈ s. Then the risk
function of the rule δ equals to
Rδ (µ1 , . . . , µk ) = P(µ1 ,...,µk ) {δ(X) 6= s}.
Note that in general the risk depends on the differences between values of means (µ1 , . . . , µk ). For example,
if we assume k = 3 and σ 2 = 1, than it is easier to make a misclassification for m1 = µ2 = 1, µ3 = 1.1 than
for m1 = µ2 = 1, µ3 = 5, though both cases belong to the same state of nature. Only in case µ1 = · · · = µk
the risk does not depend on the values of means.
The risk of the rule δ is the probability of the false decision. This probability should as small as possible. In
our investigations we are interested in the probability of the correct decision which equals 1 − Rδ . We are
going to compare this probability for the four mentioned procedures.
Experiment
To compare a probability of a correct decision a Monte Carlo experiment was performed. In that experiment
k = 10, n = 8 and σ 2 = 1.
It is obvious that the probability of correct decision does not depend on values of compared means, but
only on differences between them, it was taken µ1 = 0 and µ1 ≤ µ2 ≤ · · · ≤ µ10 . Hence, there are 42
possibilities of dividing a set of means into disjoint homogeneous groups. All the possible states of nature
are shown in Table 1. Notation (i1 , i2 , . . . , im ) indicates m groups with i1 , i2 , . . . , im means. It is assumed
that i1 ≥ i2 ≥ · · · ≥ im and i1 + i2 + · · · + im = 10. For example, (4, 3, 2, 1) indicates the division into four
homogeneous groups: {µ1 , µ2 , µ3 , µ4 }, {µ5 , µ6 , µ7 }, {µ8 , µ9 } and {µ10 }.
Numbers of groups
10
9
8
7
6
5
4
3
2
1
State
(1,1,1,1,1,1,1,1,1,1)
(2,1,1,1,1,1,1,1,1)
(2,2,1,1,1,1,1,1), (3,1,1,1,1,1,1,1)
(2,2,2,1,1,1,1), (3,2,1,1,1,1,1), (4,1,1,1,1,1,1)
(2,2,2,2,1,1), (3,2,2,1,1,1), (3,3,1,1,1,1), (4,2,1,1,1,1), (5,1,1,1,1,1)
(2,2,2,2,2), (3,2,2,2,1), (3,3,2,1,1), (4,2,2,1,1)
(4,3,1,1,1), (5,2,1,1,1), (6,1,1,1,1)
(3,3,2,2), (4,2,2,2), (3,3,3,1), (4,3,2,1), (5,2,2,1)
(4,4,1,1), (5,3,1,1), (6,2,1,1), (7,1,1,1)
(4,3,3), (4,4,2), (5,3,2), (6,2,2), (5,4,1), (6,3,1), (7,2,1), (8,1,1)
(5,5), (6,4), (7,3), (8,2), (9,1)
(10)
In the Monte Carlo experiment one has to choose values of means. It is difficult to make “planned” experiment
in the sense of choosing mean values. Hence, values of means µ2 , . . . , µ10 were taken randomly according to the
uniform distribution. For example, for the state (5, 4, 1) we generate two random numbers, say 0 < z1 < z2 ,
3
and the values µ1 = · · · = µ5 = 0, µ6 = · · · = µ9 = z1 and µ10 = z2 were taken. Such a procedure was
applied 1000 times for each state.
At each generated point (µ1 , . . . , µ10 ), 1000 sets of ten samples of size eight were drawn from normal populations with means µi . To each sample all four procedures were applied and it was noted if the obtained
division was consistent with “reality”.
For generating random numbers from the uniform distribution, a 32-bit multiplicative generator was applied.
This generator was written by the author. To obtain normally distributed random numbers the algorithm
of Box and Muller (1958) was applied.
Results
The probability of the correct decision dependsPon the differentation of means. A classical measure of such a
k
differentation is the non-centrality parameter i=1 (µi − µ̄)2 , where µ̄ is the arithmetic mean of µ1 , . . . , µk .
But in the case of the probability of the correct decision this parameter is rather useless (Fig. 1). One can see
high irregularities in the dependence of the probability of a correct decision on the non-centrality parameter.
Therefore, presentation of results is made in a different way, namely in relation to the probability of the
correct decision of the W procedure. Figures 2, 3 and 4 present results of simulation for three states of nature:
(9, 1), (7, 1, 1, 1) and (3, 3, 1, 1, 1, 1), respectively. On the x-axis one can find probabilities of a correct decision
for W procedure, and on the y-axis differences between probabilities for W and F T P , Newman-Keuls and
Tukey procedures, respectively. If, for example, probability of a correct decision for W procedure is 0.40,
F T P - 0.28, Newman-Keuls - 0.30 and Tukey - 0.10, then in the figure we have three points with coordinates
(0.40, −0.12), (0.40, −0.10) and (0.40, −0.30).
Figures 2, 3 and 4 show comparisons of the Tukey, Newman-Keuls and F T P procedures with the W procedure. If the appropriate line is above zero then the procedure is better than the W procedure. If the line is
below zero, then the respective procedure is worse than W . In Figure 2 results for the state (9, 1) (two homogeneous groups of nine and one mean, respectively) are shown. One can see that the W procedure is better
than Tukey and Newman-Keuls and is comparable with the F T P procedure. In Figures 3 and 4 we may see
that the W procedure is, in general, better than the F T P and Tukey procedures. In the case (7, 1, 1, 1) the
Newman-Keuls and W procedures are comparable, but in the situation (3, 3, 1, 1, 1, 1) the W procedure may
be considered as the best one. For other states of nature figures are very similar to the presented ones.
Literature
Fisher R.A. (1935) The Design of Experiments, Edinburgh, Oliver and Boyd.
Hochberg Y., Tamhane A.C. (1988) Multiple Comparison Procedures, John Wiley & Sons.
Miller Jr. R.G. (1982) Simultaneous Statistical Inference, Springer Verlag, 2nd ed.
Zieliński W. (1990) Two remarks on the comparison of simultaneous confidence intervals, Biometrical Journal
32: 717–719.
Zieliński W. (1991) Monte Carlo comparison of multiple comparison procedures, Biometrical Journal 34:
291–296.
4
5
6