MULTIVARIATE INEQUALITY HYPOTHESES USING SAS/IML® SOFTWARE

Dan Jacobs, Maryland Sea Grant College, University of Maryland
Estelle Russek-Cohen, Department of Animal Sciences, University of Maryland

Abstract

Multivariate inequality hypotheses and multivariate one-sided tests have received considerable attention in the statistics literature (e.g., Barlow et al., 1972 and Shapiro, 1988). However, they are often avoided by practicing statisticians. The lack of readily available software may be a major obstacle in the use of these methods. While FORTRAN algorithms have been published for various pieces of the analyses required, no integrated package for these procedures exists. SAS/IML software modules have been developed that are modifications of the established FORTRAN procedures. These modules, along with the flexible programming features of SAS/IML software, make the testing of a number of multivariate inequality hypotheses fairly straightforward.

Introduction

The easiest way to understand what is meant by a multivariate one-sided test is by two examples. The first example is fairly straightforward. Suppose one had two treatments to compare, such as a treatment and a control, and may be interested in the treatment only if it improves one or more variables of interest. In pollution studies, one may be interested in the impact that a new water cooling system may have on the performance of a nuclear power plant. One records several factors or variables when the nuclear power plant is operated with and without the new water cooling system. Rather than test each variable individually, one may use an overall test to determine whether the new cooling system improves all of the variables. The commonly used multivariate Hotelling's T-test, while easily computed using PROC GLM (part of SAS/STAT software), fails to test for the directionality of the response and would be less powerful than the one-sided alternatives we discuss below.

As a second example, suppose one had 4 treatments corresponding to 4 doses (or a control and 3 doses) of the same substance. One records $\bar{x}_i$, a sample mean, for each treatment and assumes that the means are normally distributed. We are interested in the relationship of dose to $\mu_i$, the mean of the ith treatment group. However, we are unwilling to assume a straight line equation. Instead, we are interested in the following hypotheses:

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \quad \text{versus} \quad H_A: \mu_1 \le \mu_2 \le \mu_3 \le \mu_4$$

where at least one strict inequality applies.

To see this as a multivariate one-sided test, let:

$$\delta_i = \mu_{i+1} - \mu_i, \quad i = 1, 2, 3 \qquad \text{(Eq. 1)}$$

If all the elements of $\boldsymbol{\delta}' = (\delta_1, \delta_2, \delta_3)$ are positive, we say that $\boldsymbol{\delta} \in O^+$, where $O^+$ denotes the positive orthant (i.e., the quadrant on the graph where all the elements are positive). The hypotheses of interest can then be restated as:

$$H_0: \boldsymbol{\delta} = \mathbf{0} \quad \text{versus} \quad H_A: \boldsymbol{\delta} \in O^+$$

For the case where $\mathbf{d} \sim MVN(\boldsymbol{\delta}, \Sigma)$ with $\Sigma$ known, a likelihood ratio test has been developed by Kudo (1963). Many hypotheses involving linear models and/or generalized linear models with sufficiently large sample size will reasonably satisfy these assumptions, since S (the sample covariance matrix) can be used in place of $\Sigma$. Hence, this test is quite flexible for many practical examples, including the two we have just presented.

The implementation of Kudo's procedure involves two parts. The likelihood ratio statistic involves the calculation of:

$$Q = (\mathbf{d} - \boldsymbol{\delta}^*)' \Sigma^{-1} (\mathbf{d} - \boldsymbol{\delta}^*)$$

where $\boldsymbol{\delta}^*$ is selected to minimize $Q$ and is constrained to be nonnegative (each $\delta^*_i \ge 0$). That is, we find the value of $\boldsymbol{\delta}^*$ using a quadratic programming algorithm (for details see pages 143-145 in the SAS/IML manual, version 6, first edition).

The other part of the procedure involves the computation of a p-value. Kudo (1963) has shown that the statistic:

$$\bar{\chi}^2 = n(\mathbf{d}' \Sigma^{-1} \mathbf{d} - Q)$$

is the likelihood ratio statistic and under the null hypothesis is a weighted combination of chi-square random variables. The weight associated with $P(\chi^2_i \ge c)$, $i = 1, 2, \ldots, p$, is the probability that exactly $i$ of the $p$ elements of a $MVN(\mathbf{0}, \Sigma)$ random variable are negative. When p is small (i.e., p = 2, 3, 4), closed form equations exist (see Kudo, 1963, or Wolak, 1987) and may be easily programmed. For the example we presented (i.e., $H_A: \mu_{i+1} \ge \mu_i$, $i = 1, 2, 3, \ldots, p$) with balanced n, a table in Barlow et al. (1972) is available for $p \le 12$. We have a SAS/IML procedure which computes these weights, and it is similar to the FORTRAN procedure of Bohrer and Chow (1978).

To implement this procedure, we also converted a FORTRAN procedure developed by Schervish (1984, 1985) for computing multivariate normal probabilities, which appeared to have good accuracy, into SAS/IML software modules. This subroutine, which computes multivariate normal probabilities for rectangular regions, is useful in its own right and is callable within SAS/IML software as a stand-alone procedure. Considerable effort was made to take advantage of SAS/IML software features rather than perform a simple line-by-line conversion of the original code. In addition, an algorithm that calculates bivariate normal probabilities (Donnelly, 1973) was converted into SAS/IML software and used in place of the IMSL routines called by the multivariate FORTRAN program. An alternative approach would have been to compute orthant probabilities using Sun (1988).

Once one has computed the likelihood ratio statistic, one can calculate

$$\hat{\alpha} = \sum_i w_i P(\chi^2_i \ge \bar{\chi}^2)$$

using the PROBCHI function that is available in base SAS software. If $\hat{\alpha} < \alpha$, the null hypothesis is rejected.

An Example

As an illustrative example, we used data from a toxicology problem reported by Perry (1991). Five cohorts of Daphnia pulex were grown over a period of 18 days. Survival and fecundity of offspring were recorded every three days. Each cohort was subjected to a pH corresponding to a given level of acid stress. The pH levels used were 4.4, 5.0, 5.5, and 7.0 (control). The estimated growth rate of each cohort was determined using the Leslie matrix model and assumed to be approximately normally distributed. The standard error of each estimate was based on a delta method. The estimated growth rates and their standard errors were used to calculate $\mathbf{d}$ and S, respectively. The following code shows how this may be done using SAS/IML software.

    PROC IML;
    RESET FW=10;
    START EX1;
       * D AND SIGMA;
       D = {-0.02, 0.50, 0.07};
       SIGMA = { 8.938 -2.074  0.000,
                -2.074  9.858 -7.784,
                 0.000 -7.784 12.713};
       PRINT D SIGMA;
       P = NROW(D);               * NUMBER OF ELEMENTS;

       * QUADRATIC PROGRAMMING TO FIND D* VECTOR;
       SINV = INV(SIGMA);
       * OBJECTIVE FUNCTION COEFFICIENTS;
       C = -2 * SINV * D;         * LINEAR;
       H = 2 * SINV;              * QUADRATIC;
       * CONSTRAINTS OF THE FORM DELTAi* >= 0;
       G = I(P);                  * CONSTRAINT COEFFICIENTS;
       B = J(P,1,0);              * RIGHT SIDE OF CONSTRAINTS;
       REL = {'>=', '>=', '>='};  * GX TO B;
       NAMES = {'D1*', 'D2*', 'D3*'}; * LABELS;
       RUN QP(NAMES,C,H,G,REL,B,D_STAR);

       * CALCULATE THE Q AND KUDO STATISTICS;
       Q = (D - D_STAR)` * SINV * (D - D_STAR);
       K = P * (D` * SINV * D - Q);
       BOUNDS = {0 ., 0 ., 0 .};  * 0 TO -INFINITY;
       EPS = {0.0001};            * MINIMUM ERROR (ACCURACY);

       * COMPUTE WEIGHTS AND PROBABILITIES;
       WGT = J(P,1,0);            * WEIGHT VECTOR;
       PROB = J(P,1,0);           * PROBABILITY VECTOR;
       INDEX = 1:P;               * INDEX VECTOR;
       DO P1 = {1} TO P;          * 1 OF P NEGATIVE;
          PROB[P1,1] = 1 - PROBCHI(K, P1);
          I1 = P1;
          J1 = INDEX[,LOC(INDEX ^= P1)];
          B = BOUNDS[I1,];
          S1 = SIGMA[I1,I1] - SIGMA[I1,J1] * INV(SIGMA[J1,J1]) * SIGMA[J1,I1];
          RUN MULNOR(S1,B,EPS,WGT1,ERROR);
          B = BOUNDS[J1,];
          S1 = INV(SIGMA[J1,J1]);
          RUN MULNOR(S1,B,EPS,WGT2,ERROR);
          WGT[1,1] = WGT[1,1] + WGT1*WGT2;
          IF P > {2} THEN DO P2 = P1+1 TO P;   * 2 OF P NEGATIVE;
             I2 = I1 || P2;
             J2 = J1[,LOC(J1 ^= P2)];
             B = BOUNDS[I2,];
             S1 = SIGMA[I2,I2] - SIGMA[I2,J2] * INV(SIGMA[J2,J2]) * SIGMA[J2,I2];
             RUN MULNOR(S1,B,EPS,WGT1,ERROR);
             B = BOUNDS[J2,];
             S1 = INV(SIGMA[J2,J2]);
             RUN MULNOR(S1,B,EPS,WGT2,ERROR);
             WGT[2,1] = WGT[2,1] + WGT1*WGT2;
             IF P > {3} THEN DO P3 = P2+1 TO P;   * 3 OF P NEGATIVE;
                I3 = I2 || P3;
                J3 = J2[,LOC(J2 ^= P3)];
                B = BOUNDS[I3,];
                S1 = SIGMA[I3,I3] - SIGMA[I3,J3] * INV(SIGMA[J3,J3]) * SIGMA[J3,I3];
                RUN MULNOR(S1,B,EPS,WGT1,ERROR);
                B = BOUNDS[J3,];
                S1 = INV(SIGMA[J3,J3]);
                RUN MULNOR(S1,B,EPS,WGT2,ERROR);
                WGT[3,1] = WGT[3,1] + WGT1*WGT2;
             END;                 * 3 OF P NEGATIVE WEIGHT;
          END;                    * 2 OF P NEGATIVE WEIGHT;
       END;                       * 1 OF P NEGATIVE WEIGHT;

       * LAST WEIGHT - P OF P ARE NEGATIVE;
       S1 = SIGMA;
       B = BOUNDS;
       RUN MULNOR(S1,B,EPS,MULNOR,ERROR);
       WGT[P,1] = MULNOR;
       PROB = WGT` * PROB;
       PRINT 'Ho: delta = 0' Q K PROB;
    FINISH;

    * QUADRATIC PROGRAMMING MODULE;
    %INCLUDE QUADPROG/NOSOURCE2;
    * MULNOR MODULES;
    %INCLUDE MULNOR/NOSOURCE2;
    RUN EX1;
    QUIT;

In this case, with p = 3, the likelihood ratio statistic was computed to be 0.1896 (p = 0.5912). Therefore, one is unable to reject the hypothesis that the estimated growth rates are equal using $\alpha$ = 0.05. This also demonstrates how the code may be expanded to handle problems when p > 3. The statements from:

    IF P > {3} THEN DO P3 = P2+1 TO P;

to:

    END;                 * 3 OF P NEGATIVE WEIGHT;

inclusive, are an example of how to include the code necessary if p = 4. One simply nests the appropriate number of such blocks and changes the number suffix of the two index variables (I and J) to be equal to the number of negatives.
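The nested DO blocks in the listing enumerate, for each count k of negative elements, every index subset I of size k together with its complement J. As an illustration only (this is Python, not the authors' SAS/IML, and the function name `index_partitions` is hypothetical), the same enumeration can be sketched for arbitrary p without hand-nesting. The sketch covers only the index bookkeeping; the weights themselves would still require a multivariate normal routine such as MULNOR.

```python
from itertools import combinations

def index_partitions(p, k):
    """Yield (I, J) pairs: I is a k-subset of 1..p, J is its complement.

    Mirrors the I1/J1, I2/J2, I3/J3 variables built by the nested DO loops,
    in the same lexicographic order.
    """
    all_idx = range(1, p + 1)
    for subset in combinations(all_idx, k):
        complement = tuple(i for i in all_idx if i not in subset)
        yield subset, complement

if __name__ == "__main__":
    # For p = 4, list the partitions visited for 1, 2, and 3 negatives.
    for k in (1, 2, 3):
        for I, J in index_partitions(4, k):
            print(k, I, J)
```

For p = 4 and k = 2 this visits the six pairs that a hand-nested P1/P2 loop would visit; the case k = p has an empty complement and is handled separately in the listing.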
The modification for larger p may be made either by expanding the program as shown in this example or by using the MACRO facility available in base SAS software.

Related Hypotheses

Gail and Simon (1985) developed a test for qualitative interaction which is appropriate in the comparison of two treatments when the subjects (i.e., patients) are divided into discrete strata. The assumptions are more rigorous than those given above since $\Sigma = I$ after standardizing the estimate of the treatment effect computed for each stratum. Under a null hypothesis of no qualitative interaction, we assume that treatment A is better than treatment B for all strata or, alternatively, that treatment B is better than treatment A for all strata. This implies that the treatment effect has the same sign in every stratum (e.g., the sign of $\mu_{iA} - \mu_{iB}$ for the ith stratum). The alternative assumes that this null hypothesis is not true. The hypotheses can be stated mathematically as:

$$H_0: \boldsymbol{\delta} \in O^+ \cup O^- \quad \text{versus} \quad H_A: \boldsymbol{\delta} \notin O^+ \cup O^-$$

The likelihood ratio test consists of rejecting the null if:

$$Q = \min\Big\{ \sum_i d_i^2 \, I(d_i > 0), \; \sum_i d_i^2 \, I(d_i < 0) \Big\} > c$$

Similar to Kudo's test, Q follows a linear combination of chi-square random variables. However, this time the weights simplify because the estimates are independent. The weights correspond to the probability that exactly i elements out of p-1 elements are positive. This can be calculated as BINOM(p-1, i, .5) using the PROBBNML function available in base SAS software. This procedure can also be readily programmed using SAS/IML software.

The following code is an example of how this may be done using SAS/IML software. The hypothesis of interest is that there is no qualitative interaction between two different treatment protocols for young women diagnosed as having breast cancer and positive nodes. The data used in this example are from Table 3 in Gail and Simon (1985, page 366). There are four strata: (1) patient age < 50, PR < 10; (2) patient age >= 50, PR < 10; (3) patient age < 50, PR >= 10; and (4) patient age >= 50, PR >= 10 (where PR is the progesterone receptor level in fmol). The first treatment was the combination of L-phenylalanine mustard and 5-fluorouracil, and the second treatment was the same as the first with the addition of Tamoxifen. The log relative hazard was calculated for each of the patients. The difference between the two treatments and the standard deviations were computed from these data.

    PROC IML;
    RESET FW=10;
    START EX2;
       * DIFFERENCE BETWEEN LOG RELATIVE HAZARDS;
       * FOR EACH OF THE FOUR STRATA;
       D = {0.531, -0.266, -0.030, -0.724};
       * STANDARD DEVIATIONS FOR EACH STRATA;
       S = {0.208, 0.182, 0.206, 0.207};
       * STANDARDIZE THE VALUES;
       D = D/S;
       P = NROW(D);               * NUMBER OF ELEMENTS;
       * COMPUTE THE Q STATISTIC;
       Q = MIN(((D##{2})` * I(P) * (D > {0})),
               ((D##{2})` * I(P) * (D < {0})));
       * COMPUTE WEIGHTS AND PROBABILITIES;
       P = P - 1;                 * P - 1 ELEMENTS;
       WGT = J(P,1,0);
       PROB = J(P,1,0);
       DO I = {1} TO P;           * I POSITIVE ELEMENTS;
          WGT[I,1] = PROBBNML(0.5, P, I) - PROBBNML(0.5, P, I - 1);
          PROB[I,1] = 1 - PROBCHI(Q, I);
       END;
       PROB = WGT` * PROB;
       PRINT 'Ho: No interaction' Q PROB;
    FINISH;
    RUN EX2;
    QUIT;

Q was calculated to be 6.54 and its probability equal to 0.029. Because this probability is less than $\alpha$ = 0.05, one rejects the null hypothesis of no qualitative interaction between the treatments at that level. This result is identical to what Gail and Simon (1985) reported.

Russek-Cohen and Simon (1992) proposed an extension to the test for qualitative interaction. It is essentially equivalent to a test proposed by Wolak (1987) for regression models. These tests involve the use of both the quadratic programming algorithm and the multivariate weights required in the test proposed by Kudo. SAS/IML code similar to the first example may be used to examine these hypotheses.

Sasabuchi (1980) developed a likelihood ratio test for a related hypothesis. Using the same notation as was presented in Equation 1, along with the additional assumption of negative correlations between each pair of the $d_i$, a test was presented to examine a composite null hypothesis versus $H_A: \boldsymbol{\delta} > \mathbf{0}$. In contrast with the test proposed by Kudo (1963), each element of $\boldsymbol{\delta}$ must be strictly positive. One is able to reject the null hypothesis when the minimum $d_i > z_{1-\alpha}$, where the level of the test is $\alpha$ and the probability that a standard normal is less than $z_{1-\alpha}$ is $1-\alpha$. Again, SAS/IML software code similar to the two examples presented above may be used to examine this type of hypothesis.

More On The Multivariate Normal Algorithm

The MULNOR subroutine used in this paper only requires the user to provide two matrices. The first matrix (n x 2) sets the bounds for each variable, with the upper bound given first. The bounds should be standardized (centered about the mean). A period (.) may be used to indicate either + or - infinity. The second required matrix may be either a variance/covariance or correlation matrix (n x n). The four SAS/IML modules that comprise the subroutine are available from the authors upon request.

We have compared our subroutine to that of Schervish (1984, 1985) by computing probabilities for the same examples he provides. The answers obtained were identical for the first six decimal places. The minor differences can be attributed to machine accuracy. We also tested the computation of probabilities based on up to seven variables. However, for more than four variables, the computations can take a considerable length of time. Schervish (1984) suggests that considerable time can be saved during execution if the variables are arranged so that the largest ranges of integration correspond to the first two dimensions. This is also our recommendation for our SAS/IML software subroutine.

Conclusions

Multivariate inequality hypotheses offer statisticians the opportunity to test more precise hypotheses than general multivariate GLM-like procedures. We believe that the SAS/IML software programming demonstrated in this paper will facilitate the use of such procedures. In addition, the multivariate normal probability SAS/IML software subroutine will have considerable utility in its own right in such areas as simultaneous testing and isotonic regression problems (Barlow et al., 1972).

Literature Cited

Barlow, R., D. J. Bartholomew, J. M. Bremner and H. D. Brunk. 1972. Statistical inference under order restrictions. Wiley, N.Y.
Bohrer, R. and W. Chow. 1978. Weights for one-sided multivariate inference. Appl. Stat. 27:100-104.
Donnelly, T. G. 1973. Algorithm 462. Bivariate normal distribution. Commun. Ass. Comput. Mach. 16:636.
Gail, M. and R. Simon. 1985. Tests for qualitative interactions between treatment means and patient subsets. Biometrics 41:361-372.
Kudo, A. 1963. A multivariate analogue of the one-sided test. Biometrika 50:403-418.
Perry, E. 1991. Distributional properties of parameters derived from Leslie matrix models. Unpublished Ph.D. Dissertation, University of Maryland.
Russek-Cohen, E. and R. Simon. 1992. Qualitative interaction in multifactor studies. Biometrics (in press).
Sasabuchi, S. 1980. A test of multivariate normal means with composite hypotheses determined by linear inequalities. Biometrika 67:429-439.
Schervish, M. 1984. Algorithm AS 195. Multivariate normal probabilities with error bound. Appl. Stat. 33:81-94.
Schervish, M. 1985. Corrections to AS 195. Appl. Stat. 34:103-104.
Shapiro, A. 1988. Toward a unified theory of inequality constrained testing in multivariate analysis. Inter. Stat. Inst. 56:49-62.
Sun, H. J. 1988. A FORTRAN subroutine for computing normal orthant probabilities of dimensions up to nine. Commun. in Stat. Simula. 17:1097-1111.
Wolak, F. A. 1987. An exact test for multiple inequality and equality constraints in the linear regression model. J. Am. Stat. Assn. 82:782-793.

SAS, SAS/IML, and SAS/STAT are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
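Supplementary sketch. The Gail and Simon computation in the Related Hypotheses section (standardize the stratum differences, take the smaller of the positive and negative sums of squares, then combine chi-square tail probabilities with binomial weights) can be cross-checked outside SAS/IML. The following is a minimal Python sketch under stated assumptions, not the authors' code: the function names `chi2_sf` and `gail_simon` are hypothetical, only the standard library is used, and the chi-square tail is computed with the standard recurrence for integer degrees of freedom rather than PROBCHI.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(chi-square with integer df >= x), via the df -> df + 2 recurrence."""
    if x <= 0:
        return 1.0
    # Base cases: df = 1 is a normal tail, df = 2 is an exponential tail.
    if df % 2:
        sf, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        sf, k = math.exp(-x / 2.0), 2
    while k < df:
        sf += x ** (k / 2.0) * math.exp(-x / 2.0) / (2 ** (k / 2.0) * math.gamma(k / 2.0 + 1.0))
        k += 2
    return sf

def gail_simon(d, s):
    """Return (Q, p-value) for differences d with standard deviations s."""
    z = [di / si for di, si in zip(d, s)]        # standardized differences
    q_plus = sum(v * v for v in z if v > 0)      # sum of squares of positive elements
    q_minus = sum(v * v for v in z if v < 0)     # sum of squares of negative elements
    q = min(q_plus, q_minus)
    m = len(z) - 1                               # p - 1 elements, as in the listing
    # Binomial(m, 0.5) weights times chi-square tail probabilities, i = 1..m.
    p_value = sum(math.comb(m, i) * 0.5 ** m * chi2_sf(q, i) for i in range(1, m + 1))
    return q, p_value

if __name__ == "__main__":
    d = [0.531, -0.266, -0.030, -0.724]
    s = [0.208, 0.182, 0.206, 0.207]
    print(gail_simon(d, s))
```

With the rounded inputs listed in the example, this gives Q of about 6.52 and a p-value of about 0.030, in line with the 6.54 and 0.029 quoted in the text; the small difference in Q is presumably due to rounding of the published inputs.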