MULTIVARIATE INEQUALITY HYPOTHESES USING SAS/IML® SOFTWARE
Dan Jacobs, Maryland Sea Grant College, University of Maryland
Estelle Russek-Cohen, Department of Animal Sciences, University of Maryland
Abstract
Multivariate inequality hypotheses and multivariate one-sided tests have received considerable attention in the statistics literature (e.g., Barlow et al., 1972 and Shapiro, 1988). However, they are often avoided by practicing statisticians. The lack of readily available software may be a major obstacle in the use of these methods. While FORTRAN algorithms have been published for various pieces of the analyses required, no integrated package for these procedures exists. SAS/IML software modules have been developed that are modifications of the established FORTRAN procedures. These modules, along with the flexible programming features of SAS/IML software, make the testing of a number of multivariate inequality hypotheses fairly straightforward.
Introduction
The easiest way to understand what is meant by a multivariate one-sided test is by two examples. The first example is fairly straightforward. Suppose one had two treatments to compare, such as a treatment and a control, and may be only interested in the treatment if it improves one or more variables of interest. In pollution studies, one may be interested in the impact that a new water cooling system may have on the performance of a nuclear power plant. One records several factors or variables when the nuclear power plant is operated with and without the new water cooling system. Rather than test each variable individually, one may use an overall test to determine whether the new cooling system improves all of the variables. The commonly used multivariate Hotelling's T² test, while easily computed using PROC GLM (part of SAS/STAT® software), fails to test for the directionality of the response and would be less powerful than the one-sided alternatives we discuss below.
As a second example, suppose one had 4 treatments corresponding to 4 doses (or a control and 3 doses) of the same substance. One records $\bar{x}_i$, a sample mean, for each treatment and assumes that the means are normally distributed. We are interested in the relationship of dose to $\mu_i$, the mean of the ith treatment group. However, we are unwilling to assume a straight line equation. Instead, we are interested in the following hypotheses:
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
versus
$H_A: \mu_1 \le \mu_2 \le \mu_3 \le \mu_4$
where at least one strict inequality applies.
To see this as a multivariate one-sided test, let:
$\delta_i = \mu_{i+1} - \mu_i, \quad i = 1, 2, 3$ (Eq. 1)
If all the elements of $\delta' = (\delta_1, \delta_2, \delta_3)$ are positive, we say that $\delta \in O^+$, where $O^+$ denotes the positive orthant (i.e., the quadrant on the graph where all the elements are positive). The hypotheses of interest can then be restated as:
$H_0: \delta = 0$
versus
$H_A: \delta \in O^+$
For the case where $d$, the observed estimate of $\delta$, is distributed $\mathrm{MVN}(\delta, \Sigma)$ with $\Sigma$ known, a likelihood ratio test has been developed by Kudo (1963). Many hypotheses involving linear models and/or generalized linear models with sufficiently large sample size will reasonably satisfy these assumptions since S (the sample covariance matrix) can be used in place of $\Sigma$. Hence, this test is quite flexible for many practical examples including the two we have just presented.
The implementation of Kudo's procedure involves two parts. The likelihood ratio statistic involves the calculation of:
$Q = (d - \delta^*)' \Sigma^{-1} (d - \delta^*)$
where $\delta^*$ is selected to minimize $Q$ and is constrained to be nonnegative. That is, we find the value of $\delta^*$ using a quadratic programming algorithm (for details see pages 143-145 in the SAS/IML manual, version 6, first edition). The other part of the procedure involves the computation of a p-value. Kudo (1963) has shown that the statistic:
$\bar{\chi}^2 = n(d' \Sigma^{-1} d - Q)$
is the likelihood ratio statistic and under the null hypothesis is a weighted combination of chi-square random variables. The weight associated with $P(\chi^2_i \ge c)$, i = 1, 2, ..., p, is the probability that exactly i elements of p in a $\mathrm{MVN}(0, \Sigma)$ random variable are negative. When p is small (i.e., p = 2, 3, 4), closed form equations exist (see Kudo, 1963, or Wolak, 1987) and may be easily programmed. For the example we presented (i.e., $H_A: \mu_{i+1} \ge \mu_i$, i = 1, 2, 3, ..., p) with balanced n, a table in Barlow et al. (1972) is available for p ≤ 12. We have a SAS/IML procedure which computes these weights and it is similar to the FORTRAN procedure of Bohrer and Chow (1978). To implement this procedure, we also converted a FORTRAN procedure developed by Schervish (1984, 1985) for computing multivariate normal probabilities, which appeared to have good accuracy, into SAS/IML software modules. This subroutine, which computes multivariate normal probabilities for rectangular regions, is useful in its own right and is callable within SAS/IML software as a stand-alone procedure. Considerable effort was made to take advantage of SAS/IML software features rather than a simple line-by-line conversion of the original code. Also, an algorithm that calculates bivariate normal probabilities (Donnelly, 1973) was converted into SAS/IML software and used in place of the IMSL routines called by the multivariate FORTRAN program. An alternative approach would have been to compute orthant probabilities using Sun (1988).
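The quadratic programming step can be cross-checked outside SAS. The following is a minimal sketch in Python (not the authors' SAS/IML module), assuming p is small enough to enumerate every choice of coordinates fixed at zero. The function names `solve`, `kudo_projection`, and `kudo_statistic` and the `scale` argument are hypothetical and introduced only for illustration. The sketch relies on the fact that, when the coordinates in a set Z are fixed at zero, the minimized value is $d_Z' \Sigma_{ZZ}^{-1} d_Z$ and the remaining coordinates equal $d_F - \Sigma_{FZ} \Sigma_{ZZ}^{-1} d_Z$.

```python
from itertools import combinations


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def kudo_projection(d, sigma):
    """Return (delta_star, q): the nonnegative vector minimizing
    (d - x)' inv(sigma) (d - x), and the minimized value q."""
    p = len(d)
    best = None
    for k in range(p + 1):
        for zero in combinations(range(p), k):  # coordinates fixed at 0
            free = [i for i in range(p) if i not in zero]
            if zero:
                szz = [[sigma[i][j] for j in zero] for i in zero]
                y = solve(szz, [d[i] for i in zero])
                q = sum(d[i] * yi for i, yi in zip(zero, y))
            else:
                y, q = [], 0.0
            x = [0.0] * p
            for i in free:
                x[i] = d[i] - sum(sigma[i][j] * yj for j, yj in zip(zero, y))
            # keep the feasible candidate with the smallest objective value
            if all(x[i] >= 0 for i in free) and (best is None or q < best[1]):
                best = (x, q)
    return best


def kudo_statistic(d, sigma, scale):
    """scale * (d' inv(sigma) d - q), the form of the statistic given in the text."""
    _, q = kudo_projection(d, sigma)
    total = sum(di * yi for di, yi in zip(d, solve(sigma, d)))
    return scale * (total - q)
```

Enumeration costs 2^p small solves, so it is only a checking device for the small p discussed here; a proper quadratic programming routine is the better choice for larger problems.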
Once one has computed the likelihood ratio statistic, one can calculate $c = \sum_i w_i P(\chi^2_i \ge \bar{\chi}^2)$ using the PROBCHI function that is available in base SAS® software. If c ≤ α, the null hypothesis is rejected.
An Example
As an illustrative example, we used data from a toxicology problem reported by Perry (1991). Five cohorts of Daphnia pulex were grown over a period of 18 days. Survival and fecundity of offspring were recorded every three days. Each cohort was subjected to a pH corresponding to a given level of acid stress. The pH used were 4.4, 5.0, 5.5, and 7.0 (control). The estimated growth rate was determined using the Leslie matrix model and assumed to approximate a normally distributed random variable. The standard error for each growth rate estimate was based on a delta method. The growth rate estimates and the standard errors were used to calculate d and S, respectively. The following code shows how this may be done using SAS/IML software.
PROC IML; RESET FW=10;
START EX1;
  * D AND SIGMA;
  D = {-0.02, 0.50, 0.07};
  SIGMA = { 8.938 -2.074  0.000,
           -2.074  9.858 -7.784,
            0.000 -7.784 12.713};
  PRINT D SIGMA;
  P = NROW(D);      * NUMBER OF ELEMENTS;
  * QUADRATIC PROGRAMMING TO FIND D* VECTOR;
  SINV = INV(SIGMA);
  * OBJECTIVE FUNCTION COEFFICIENTS;
  C = -2 * SINV * D;  * LINEAR;
  H = 2 * SINV;       * QUADRATIC;
  * CONSTRAINTS OF THE FORM DELTAi* >= 0;
  G = I(P);           * CONSTRAINT COEFFICIENTS;
  B = J(P,1,0);       * RIGHT SIDE OF CONSTRAINTS;
  REL = {'>=', '>=', '>='};       * GX TO B;
  NAMES = {'D1*', 'D2*', 'D3*'};  * LABELS;
  RUN QP(NAMES,C,H,G,REL,B,D_STAR);
  * CALCULATE THE Q AND KUDO STATISTICS;
  Q = (D - D_STAR)` * SINV * (D - D_STAR);
  K = P * (D` * SINV * D - Q);
  BOUNDS = {0 ., 0 ., 0 .};  * 0 TO -INFINITY;
  EPS = {0.0001};            * MINIMUM ERROR (ACCURACY);
  * COMPUTE WEIGHTS AND PROBABILITIES;
  WGT = J(P,1,0);    * WEIGHT VECTOR;
  PROB = J(P,1,0);   * PROBABILITY VECTOR;
  INDEX = 1:P;       * INDEX VECTOR;
  DO P1 = {1} TO P;
    * 1 OF P NEGATIVE;
    PROB[P1,1] = 1 - PROBCHI(K, P1);
    I1 = P1; J1 = INDEX[,LOC(INDEX ^= P1)];
    B = BOUNDS[I1,];
    S1 = SIGMA[I1,I1] - SIGMA[I1,J1] *
         INV(SIGMA[J1,J1]) * SIGMA[J1,I1];
    RUN MULNOR(S1,B,EPS,WGT1,ERROR);
    B = BOUNDS[J1,];
    S1 = INV(SIGMA[J1,J1]);
    RUN MULNOR(S1,B,EPS,WGT2,ERROR);
    WGT[1,1] = WGT[1,1] + WGT1*WGT2;
    IF P > {2} THEN DO P2 = P1+1 TO P;
      * 2 OF P NEGATIVE;
      I2 = I1||P2; J2 = J1[,LOC(J1 ^= P2)];
      B = BOUNDS[I2,];
      S1 = SIGMA[I2,I2] - SIGMA[I2,J2] *
           INV(SIGMA[J2,J2]) * SIGMA[J2,I2];
      RUN MULNOR(S1,B,EPS,WGT1,ERROR);
      B = BOUNDS[J2,];
      S1 = INV(SIGMA[J2,J2]);
      RUN MULNOR(S1,B,EPS,WGT2,ERROR);
      WGT[2,1] = WGT[2,1] + WGT1*WGT2;
      IF P > {3} THEN DO P3 = P2+1 TO P;
        * 3 OF P NEGATIVE;
        I3 = I2||P3;
        J3 = J2[,LOC(J2 ^= P3)];
        B = BOUNDS[I3,];
        S1 = SIGMA[I3,I3] - SIGMA[I3,J3] *
             INV(SIGMA[J3,J3]) * SIGMA[J3,I3];
        RUN MULNOR(S1,B,EPS,WGT1,ERROR);
        B = BOUNDS[J3,];
        S1 = INV(SIGMA[J3,J3]);
        RUN MULNOR(S1,B,EPS,WGT2,ERROR);
        WGT[3,1] = WGT[3,1] + WGT1*WGT2;
      END;  * 3 OF P NEGATIVE WEIGHT;
    END;  * 2 OF P NEGATIVE WEIGHT;
  END;  * 1 OF P NEGATIVE WEIGHT;
  * LAST WEIGHT - P OF P ARE NEGATIVE;
  S1 = SIGMA; B = BOUNDS;
  RUN MULNOR(S1,B,EPS,MULNOR,ERROR);
  WGT[P,1] = MULNOR;
  PROB = WGT` * PROB;
  PRINT 'Ho: delta = 0' Q K PROB;
FINISH;
* QUADRATIC PROGRAMMING MODULE;
%INCLUDE QUADPROG/NOSOURCE2;
* MULNOR MODULES;
%INCLUDE MULNOR/NOSOURCE2;
RUN EX1;
QUIT;
In this case, with p = 3, the likelihood ratio statistic was computed to be 0.1896 (p-value = 0.5912). Therefore, one is unable to reject the hypothesis that the estimated growth rates are equal using α = 0.05. This also demonstrates how the code may be expanded to handle problems when p > 3. The statements from:
IF P > {3} THEN DO P3 = P2+1 TO P;
to:
END; * 3 OF P NEGATIVE WEIGHT;
inclusive, are an example of how to include the code necessary if p = 4. One simply nests the appropriate number of such blocks and changes the number suffix of the two index variables (I and J) to be equal to the number of negatives. This code modification may be done by expanding the program as shown in this example or by using the MACRO facility available in base SAS software.
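For p = 3 the weights can also be obtained in closed form, which gives an independent check on the p-value reported for this example. The sketch below is in Python rather than SAS/IML and is not the authors' MULNOR-based computation. It assumes three standard facts: the orthant probability of a zero-mean trivariate normal is 1/8 plus the sum of the arcsines of the three correlations divided by 4π; the weight on 3 degrees of freedom is the orthant probability under Σ and the weight on 0 degrees of freedom is the orthant probability under the inverse of Σ; and the even-indexed and odd-indexed weights each sum to 1/2. All function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(chi-square with integer df > x), built up by the usual recurrence."""
    z = x / 2.0
    if df % 2:
        sf, a = math.erfc(math.sqrt(z)), 0.5  # df = 1
    else:
        sf, a = math.exp(-z), 1.0             # df = 2
    while 2 * a < df:
        sf += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return sf


def orthant3(m):
    """P(all three components > 0) for a zero-mean trivariate normal with covariance m."""
    def r(i, j):
        return m[i][j] / math.sqrt(m[i][i] * m[j][j])
    total = math.asin(r(0, 1)) + math.asin(r(0, 2)) + math.asin(r(1, 2))
    return 0.125 + total / (4.0 * math.pi)


def inverse3(m):
    """Inverse of a symmetric 3x3 matrix by cofactors."""
    (a, b, c), (_, d, e), (_, _, f) = m
    cof = [[d * f - e * e, c * e - b * f, b * e - c * d],
           [c * e - b * f, a * f - c * c, b * c - a * e],
           [b * e - c * d, b * c - a * e, a * d - b * b]]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    return [[v / det for v in row] for row in cof]


def kudo_weights3(sigma):
    """Weights (w1, w2, w3) attached to chi-square with 1, 2, 3 df when p = 3."""
    w3 = orthant3(sigma)
    w0 = orthant3(inverse3(sigma))
    return 0.5 - w3, 0.5 - w0, w3


def kudo_pvalue3(k, sigma):
    """Weighted sum of chi-square tail probabilities evaluated at the statistic k."""
    weights = kudo_weights3(sigma)
    return sum(w * chi2_sf(k, df) for df, w in enumerate(weights, start=1))


sigma = [[8.938, -2.074, 0.0], [-2.074, 9.858, -7.784], [0.0, -7.784, 12.713]]
print(round(kudo_pvalue3(0.1896, sigma), 4))
```

Called with the statistic 0.1896 and the covariance matrix from the example, the sketch should agree with the reported p-value to about three decimal places. It does not generalize beyond p = 3, where the nested MULNOR calls remain the general route.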
Related Hypotheses
Gail and Simon (1985) developed a test for qualitative interaction which is appropriate in the comparison of two treatments when the subjects (i.e., patients) are divided into discrete strata. The assumptions are more rigorous than those given above since $\Sigma = I$ after standardizing the estimate of the treatment effect computed for each stratum. Under a null hypothesis of no qualitative interaction, we assume that treatment A is better than treatment B for all strata or alternatively that treatment B is better than treatment A for all strata. This implies that the treatment effect for the ith stratum is always of the same sign (e.g., the sign of $\mu_{iA} - \mu_{iB}$). The alternative assumes that this null hypothesis is not true. The hypotheses can be stated mathematically as:
$H_0: \delta \in O^+ \cup O^-$
versus
$H_A: \delta \notin O^+ \cup O^-$
The likelihood ratio test consists of rejecting the null if:
$Q = \min\left(\sum_i d_i^2 I(d_i > 0), \sum_i d_i^2 I(d_i < 0)\right) > c.$
Similar to Kudo's test, Q follows a linear combination of chi-square random variables. However, this time the weights simplify because the estimates are independent. The weights correspond to the probability that exactly i elements out of p-1 elements are positive. This can be calculated as BINOM(p-1, i, .5) using the PROBBNML function available in base SAS software. This procedure can also be readily programmed using SAS/IML software.
The following code is an example of how this may be done using SAS/IML software. The hypothesis of interest is that there is no qualitative interaction between two different treatment protocols for young women diagnosed as having breast cancer and positive nodes. The data used in this example are from Table 3 in Gail and Simon (1985, page 366). There are four strata: (1) patient age < 50, PR < 10; (2) patient age >= 50, PR < 10; (3) patient age < 50, PR >= 10; and (4) patient age >= 50, PR >= 10 (where PR is the progesterone receptor level in fmol). The first treatment was the combination of L-phenylalanine mustard and 5-fluorouracil and the second treatment was the same as the first with the addition of Tamoxifen. The log relative hazard was calculated for each of the patients. The difference between the two treatments and standard deviations were computed from these data.
PROC IML; RESET FW=10;
START EX2;
  * DIFFERENCE BETWEEN LOG RELATIVE HAZARDS;
  * FOR EACH OF THE FOUR STRATA;
  D = {0.531, -0.266, -0.030, -0.724};
  * STANDARD DEVIATIONS FOR EACH STRATA;
  S = {0.208, 0.182, 0.206, 0.207};
  * STANDARDIZE THE VALUES;
  D = D/S;
  P = NROW(D);   * NUMBER OF ELEMENTS;
  * COMPUTE THE Q STATISTIC;
  Q = MIN((D##{2})`*I(P)*(D > {0}),
          (D##{2})`*I(P)*(D < {0}));
  * COMPUTE WEIGHTS AND PROBABILITIES;
  P = P - 1;     * P - 1 ELEMENTS;
  WGT = J(P,1,0);
  PROB = J(P,1,0);
  DO I = {1} TO P;   * I POSITIVE ELEMENTS;
    WGT[I,1] = PROBBNML(0.5, P, I)
             - PROBBNML(0.5, P, I - 1);
    PROB[I,1] = 1 - PROBCHI(Q, I);
  END;
  PROB = WGT` * PROB;
  PRINT 'Ho: No interaction' Q PROB;
FINISH;
RUN EX2;
QUIT;
Q was calculated to be 6.54 and its probability equal to 0.029. Because 0.029 is less than 0.05, one rejects the null hypothesis of no qualitative interaction between the treatments using α = 0.05. This result is identical to what Gail and Simon (1985) reported.
Russek-Cohen and Simon (1992) proposed an extension to the test for qualitative interaction. It is essentially equivalent to a test proposed by Wolak (1987) for regression models. These tests involve the use of both the quadratic programming algorithm and the multivariate weights required in the test proposed by Kudo. SAS/IML code similar to the first example may be used to examine these hypotheses.
Sasabuchi (1980) developed a likelihood ratio test for a related hypothesis. Using the same notation as was presented in Equation 1 along with the additional assumption of negative correlations between each pair of the $d_i$, a test was presented to examine the null hypothesis versus:
$H_A: \delta > 0$
In contrast with the test proposed by Kudo (1963), each element of $\delta$ must be strictly positive. One is able to reject the null hypothesis when the minimum $d_i > z_{1-\alpha}$, where the level of the test is $\alpha$ and the probability that a standard normal is less than $z_{1-\alpha}$ is $1-\alpha$. Again, SAS/IML software code similar to the two examples presented above may be used to examine this type of hypothesis.
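The two tests in this section that need no quadratic programming are short enough to sketch directly. The Python below is an illustrative cross-check, not the authors' SAS/IML code. It assumes the Gail and Simon p-value is the binomial-weighted sum of chi-square tail probabilities over 1 to p-1 degrees of freedom described above, and that the Sasabuchi critical value is the upper α point of the standard normal. The names `chi2_sf`, `gail_simon`, and `sasabuchi_rejects` are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper tail P(chi-square with integer df > x), built up by the usual recurrence."""
    z = x / 2.0
    if df % 2:
        sf, a = math.erfc(math.sqrt(z)), 0.5  # df = 1
    else:
        sf, a = math.exp(-z), 1.0             # df = 2
    while 2 * a < df:
        sf += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return sf


def gail_simon(d, s):
    """Return (q, p_value) for the qualitative interaction test.

    d holds the stratum treatment differences and s their standard deviations.
    """
    z = [di / si for di, si in zip(d, s)]
    q_plus = sum(v * v for v in z if v > 0)
    q_minus = sum(v * v for v in z if v < 0)
    q = min(q_plus, q_minus)
    m = len(z) - 1  # weights are binomial(m, 0.5) on i = 1..m
    p_value = sum(math.comb(m, i) * 0.5 ** m * chi2_sf(q, i)
                  for i in range(1, m + 1))
    return q, p_value


def sasabuchi_rejects(z, alpha):
    """True when every standardized estimate exceeds the upper-alpha normal point."""
    return min(z) > NormalDist().inv_cdf(1.0 - alpha)


q, p = gail_simon([0.531, -0.266, -0.030, -0.724],
                  [0.208, 0.182, 0.206, 0.207])
print(round(q, 2), round(p, 3))
```

With the rounded inputs printed in the listing, the statistic comes out near 6.52 rather than exactly 6.54, and the p-value is roughly 0.03; the small gap presumably reflects rounding of the printed data.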
More On The Multivariate Normal Algorithm
The MULNOR subroutine used in this paper only requires the user to provide two matrices. The first matrix (n x 2) sets the bounds for each variable with the upper bound given first. The bounds should be standardized (centered about the mean). A period (.) may be used to indicate either + or - infinity. The second required matrix may be either a variance/covariance or correlation matrix (n x n). The four SAS/IML modules that comprise the subroutine are available from the authors upon request.
We have compared our subroutine to that of Schervish (1984, 1985) by computing probabilities for the same examples he provides. The answers obtained were identical for the first six decimal places. The minor differences can be attributed to machine accuracy. We also tested the computation of probabilities based on up to seven variables. However, for more than four variables, the computations can take a considerable length of time. Schervish (1984) suggests that considerable time can be saved during execution if the variables are arranged so that the largest ranges of integration correspond to the first two dimensions. This is also our recommendation for our SAS/IML software subroutine.
Conclusions
Multivariate inequality hypotheses offer statisticians the opportunity to test more precise hypotheses than general multivariate GLM-like procedures. We believe that the SAS/IML software programming demonstrated in this paper will facilitate the use of such procedures. In addition, the multivariate normal probability SAS/IML software subroutine will have considerable utility in its own right in such areas as simultaneous testing and for isotonic regression problems (Barlow et al., 1972).
Literature Cited
Barlow, R., D. J. Bartholomew, J. M. Bremner and H. D. Brunk. 1972. Statistical inference under order restrictions. Wiley, N.Y.
Bohrer, R. and W. Chow. 1978. Weights for one-sided multivariate inference. Appl. Stat. 27:100-104.
Donnelly, T. G. 1973. Algorithm 462. Bivariate normal distribution. Commun. Ass. Comput. Mach. 16:636.
Gail, M. and R. Simon. 1985. Tests for qualitative interactions between treatment means and patient subsets. Biometrics 41:361-372.
Kudo, A. 1963. A multivariate analogue of the one-sided test. Biometrika 50:403-418.
Perry, E. 1991. Distributional properties of parameters derived from Leslie matrix models. Unpublished Ph.D. Dissertation, University of Maryland.
Russek-Cohen, E. and R. Simon. 1992. Qualitative interaction in multifactor studies. Biometrics (in press).
Sasabuchi, S. 1980. A test of multivariate normal means with composite hypotheses determined by linear inequalities. Biometrika 67:429-439.
Schervish, M. 1984. Algorithm AS 195. Multivariate normal probabilities with error bound. Appl. Stat. 33:81-94.
Schervish, M. 1985. Corrections to AS 195. Appl. Stat. 34:103-104.
Shapiro, A. 1988. Toward a unified theory of inequality constrained testing in multivariate analysis. Inter. Stat. Inst. 56:49-62.
Sun, H. J. 1988. A FORTRAN subroutine for computing normal orthant probabilities of dimensions up to nine. Commun. in Stat. Simula. 17:1097-1111.
Wolak, F. A. 1987. An exact test for multiple inequality and equality constraints in the linear regression model. J. Am. Stat. Assn. 82:782-793.
SAS, SAS/IML, and SAS/STAT are registered trademarks or trademarks of SAS Institute, Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.