Statistics

THE BISQUARE WEIGHTED ANALYSIS OF VARIANCE: A TECHNIQUE FOR NONNORMAL DISTRIBUTIONS

Rebecca Anne Regeth & Wm Wren Stine
University of New Hampshire

The bisquare-weighted analysis of variance (bANOVA) is a technique that we have developed for comparing the weighted means of several groups, where the weights are designed to reduce the influence of, or completely eliminate, outliers (Stine & Regeth, In preparation). Unlike the analysis of variance (ANOVA), the bANOVA maintains high power with nonnormal distributions. It also has nearly the power of the ANOVA when used with normal distributions, making the bANOVA especially useful when it is difficult to tell if the underlying distribution is normal.

The ANOVA is used to compare the arithmetic averages, or means, of two or more groups (also called levels of an independent variable). One of the assumptions of the ANOVA procedure is that each group is sampled from a normally distributed population (Kirk, 1995, p. 99). Unfortunately, normal distributions are rare. For example, Micceri (1989) examined 440 data sets and found that 28% were symmetrical with moderate or heavy tails and 69% were moderately asymmetric with moderate or heavy tails. Heavy-tailed distributions, in particular, are very common. Bessel (1818, as cited in Hampel, Ronchetti, Rousseeuw, & Stahel, 1986, p. 22) examined a data set consisting of errors of observation of 300 star positions. Although this distribution has been widely cited as an example of a normal distribution (e.g., Maxwell & Delaney, 1990, p. 51), it appears to be heavy-tailed (Hampel et al., 1986, p. 22; Stine & Regeth, In preparation).

Many statistics books state that the ANOVA is robust with respect to violations of normality (e.g., Hays, 1981, p. 276; Keppel, 1982, pp. 85-86; Kirk, 1995, p. 99). However, there is considerable evidence to suggest that even for distributions with slightly heavy tails the procedure shows a drop in power.
Power drops considerably with heavier-tailed distributions. The main reason that power drops with heavy-tailed distributions is that the mean is influenced by outliers. One reason that previous Monte Carlo studies have not shown the ANOVA to be non-robust with respect to violations of normality is that these studies have not controlled for the inflation of variance when examining heavy-tailed distributions. We have found that when the normal distribution is contaminated with outliers, Type I error rates remain constant, but Type II error rates increase, decreasing power (Stine & Regeth, In preparation).

Methods for dealing with heavy-tailed distributions include using data transformations, using nonparametric statistics, or using a technique that includes a measure of central tendency that is not sensitive to outliers (i.e., the bisquare-weighted average). The problem with using data transformations is that the researcher must first detect the nonnormal distribution, which is quite difficult. A problem with using nonparametric techniques (e.g., the rank ANOVA) is that these procedures lose power when samples are drawn from moderately heavy-tailed distributions (Stine & Regeth, In preparation). The bisquare-weighted ANOVA retains power with heavy-tailed distributions and has nearly the power of the ANOVA with normal distributions. Additionally, the bANOVA can be used "blindly": the researcher does not need to determine whether or not the distribution of the underlying population is normal.

There are three basic steps used in calculating the bANOVA. The first step is to calculate the bisquare-weighted average of each group in the design. Next, the resulting weights are used to calculate a weighted ANOVA. Third, the F-ratio from the weighted ANOVA is transformed to account for the change in degrees of freedom that occurs when the weights are used. The bisquare-weighted average, in turn, is calculated using an iterative process.
The first iteration uses the median as the measure of central tendency (Eq. 1). Deviations from the median are found for each score in the group. After calculating 1.483 times the median of the absolute values of the deviations (the Median Absolute Deviation, or MAD; this product is a robust estimate of the standard deviation, Hampel et al., 1986, pp. 105 & 107), the weights for each score are calculated as a function of each score's deviation from the median (with this deviation being scaled by 1.483 MAD; Eqs. 2, 3, and 4). These weights are then used to calculate the weighted average (Eq. 5).

$$bw^{(0)}(X_{\cdot j}) = \operatorname{median}\{X_{ij} \mid i = 1, \ldots, n_j\} \tag{1}$$

$$\mathrm{MAD}_j = \operatorname{median}\{\, |X_{ij} - \operatorname{median}\{X_{ij}\}| \,\} \tag{2}$$

$$\varepsilon_{ij}^{(k)} = \frac{X_{ij} - bw^{(k-1)}(X_{\cdot j})}{1.483\,\mathrm{MAD}_j} \tag{3}$$

$$w_{ij}^{(k)} = \begin{cases} \left[1 - \left(\varepsilon_{ij}^{(k)} / r\right)^2\right]^2 & \text{if } |\varepsilon_{ij}^{(k)}| < r \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

$$bw^{(k)}(X_{\cdot j}) = \frac{\sum_{i=1}^{n_j} w_{ij}^{(k)} X_{ij}}{\sum_{i=1}^{n_j} w_{ij}^{(k)}} \tag{5}$$

NESUG '96 Proceedings

The bisquare weighting formula (Eq. 4) assigns weights near 1.0 to scores that are near the current measure of central tendency. Scores that are far from it get weights closer to zero. If a score's scaled deviation (Eq. 3) reaches the rejection point (r = 4), that is, if the score lies approximately four robust estimates of the standard deviation (1.483 MAD) or more from the center, it is given a weight of zero. This removes the score from the sample.

In the next iteration, the bisquare-weighted average from the first iteration, $bw^{(1)}(X_{\cdot j})$, is used in Eq. (3) instead of the median. Weights are assigned using Eq. (4) as before. Finally, a new bisquare-weighted average is found from the new weights (Eq. 5). When the new bisquare-weighted average is approximately equal to the old bisquare-weighted average, $bw^{(k)}(X_{\cdot j}) \approx bw^{(k-1)}(X_{\cdot j})$, the iterative process is stopped and the first step for calculating the bANOVA is finished.

The second step involves using the weights $w_{ij}^{(k)}$ from the final iteration to calculate a weighted ANOVA. One then transforms the original F-ratio using Eq. 6. The result of the transformation ($F_{bw}$) is compared to a tabled value using the degrees of freedom from the original design.

$$F_{bw} = (0.534 + 0.001206\, df_{Error})\, F \tag{6}$$

An example of the calculation of a bisquare-weighted average and a bisquare-weighted ANOVA (bANOVA)

Below is an example of a one-way ANOVA design. There are three groups and five subjects per group.

Group     Scores                Median   Mean
Group 1   21, 22, 23, 24, 25    23       23
Group 2   1, 2, 3, 4, 10        3        4
Group 3   1, 2, 3, 4, 100       3        22

To calculate the bisquare-weighted average we start with Eq. (1). The first iteration uses the median as the measure of central tendency. Scaled deviations from the median are found for each score in the group (MAD = 1.0). The first iteration for Group 1 is:

Score   Scaled deviation
21      -1.349
22      -0.674
23       0.000
24       0.674
25       1.349

Next, the weights are calculated. The bisquare formula assigns weights near 1.0 to scores that are near the measure of central tendency. Scores that are far from the measure get weights closer to zero. If a score exceeds approximately four standard deviations it is given a weight of zero, which removes the score from the sample.

The bisquare-weighted average is compared to the median. If the two are approximately equal, the procedure is finished for this group. In this case $bw^{(1)}(X_{\cdot 1}) = 102.559 / 4.459 = 23 = bw^{(0)}(X_{\cdot 1})$, so there is no need to continue to iterate.

Next we will calculate the weights for Group 2. The first iteration for Group 2 is presented below:

Score   Scaled deviation   Weight   Weight x Score
1       -1.349             0.786    0.786
2       -0.674             0.944    1.888
3        0.000             1.000    3.000
4        0.674             0.944    3.776
10       4.720             0.000    0.000
Total                      3.674    9.450

Notice that in the last row (a score of 10) the weight is zero, indicating that the score was beyond approximately four robust estimates of the standard deviation from the measure of central tendency and is thus considered too extreme to keep. It was removed from the sample at this point (i.e., $w_{5,2}^{(1)} = 0.0$). However, the weight may become nonzero in subsequent iterations.

In the next iteration, the bisquare-weighted average from the first iteration is used in Eq. (3) instead of the median. The deviations of the scores from the bisquare-weighted average are calculated. As in the previous iteration, the absolute value of the deviations is found. Weights are assigned using Eq. (4). Finally, a new bisquare-weighted average is found from the new weights (Eq. 5). The second iteration for Group 2 is below:

Score   Scaled deviation   Weight   Weight x Score
1       -1.060             .864     .864
2       -0.386             .981     1.963
3        0.288             .990     2.970
4        0.963             .888     3.550
10       5.009             .000     0.000
Total                      3.723    9.346

At this point, from the first iteration, $bw^{(1)}(X_{\cdot 2}) = 2.572$ while, from this iteration, $bw^{(2)}(X_{\cdot 2}) = 2.510$. The difference between these numbers is 0.062. The researcher needs to decide just how close these two numbers need to be. We used a difference of 0.001. Therefore, the procedure will be repeated until $|bw^{(k)}(X_{\cdot 2}) - bw^{(k-1)}(X_{\cdot 2})| \le 0.001$. At the 5th iteration, the difference was less than 0.001 with $bw^{(5)}(X_{\cdot 2}) = 2.500$. Here are the results for Group 2:

Score   Scaled deviation   Weight   Weight x Score
1       -1.012             .876     .876
2       -0.337             .986     1.972
3        0.337             .986     2.958
4        1.011             .876     3.505
10       5.057             .000     0.000
Total                      3.724    9.311

Group 3 took five iterations before the difference was small ($|bw^{(5)}(X_{\cdot 3}) - bw^{(4)}(X_{\cdot 3})| \le 0.001$). Here are the results for Group 3 ($bw^{(5)}(X_{\cdot 3}) = 2.500$):

Score   Scaled deviation   Weight   Weight x Score
1       -1.012             .876     .876
2       -0.337             .986     1.972
3        0.337             .986     2.958
4        1.011             .876     3.505
100     65.745             .000     0.000
Total                      3.724    9.311

To show the usefulness of the bANOVA with a heavy-tailed distribution, we have calculated an ANOVA on the original scores and a weighted ANOVA using the final weights $w_{ij}^{(k)}$ from each of the bisquare-weighted averages (which, of course, gives the bANOVA once the F is transformed into $F_{bw}$ using Eq. 6).

ANOVA summary table:

Source    SS        df   MS       F      p
Between   1143.33   2    571.67   .894   .434
Within    7670.00   12   639.17
Total     8813.33   14

bANOVA summary table:

Source    SS        df   MS       Fbw      p
Between   1172.18   2    586.09   187.77   .0001
Within    17.04     10   1.70
Total     1189.23   12

As you can see, the two extreme scores in groups 2 and 3 had a large impact on the ANOVA. However, these scores were given weights of zero and therefore are not reflected in the $F_{bw}$ and the bANOVA summary table.

Description of SAS Routines

The code presented in the appendix calculates a two-way ANOVA and a two-way bANOVA. The former is included for comparison to the latter. We will briefly describe only the calculation of the weights and bisquare-weighted averages for each cell of the two-way design (using the NLIN procedure) and, more briefly, the weighted ANOVA (with the GLM procedure) using the weights from the previous step.

Most of the code presented for the bANOVA merely creates a SAS data set that the procedure NLIN can use for calculating the bisquare averages. As no "interesting" routines are used in these procedures (indeed, they are quite tedious), we will avoid reviewing them in detail. The SAS data set that NLIN reads (TEST) contains columns for the variables IVA (naming the levels of independent variable A), IVB (naming the levels of independent variable B), DVA (the dependent variable), MEDKY (which holds the medians for each of the groups), DIFF (which simply equals zero), and MAD (containing the median absolute deviations for each of the groups).

NLIN calculates a bisquare-weighted average for each group of the two-way design (BY IVA IVB) and outputs the weights for each case of each cell. These weights, as mentioned above, will be used by GLM to calculate a weighted ANOVA. Procedure NLIN requires one to use the PARMS statement to supply an initial value for the parameter to be estimated (with the estimate calculated iteratively, starting with the initial value, in order to minimize the sum of squared errors). For the bisquare-weighted averages we wish to use the median of a particular group for the initial value. Unfortunately, one has to supply an explicit value (not a variable) in the PARMS statement. So, in the PARMS statement, we set B (which will be the bisquare-weighted average for the cell defined by the factorial combination of IVA with IVB) equal to 1. The next statement redefines B to the initial value that we will actually use. For iteration zero (i.e., no iterative fits have been calculated: _ITER_=0) and the first case (_N_=1) we define the initial value of B to be the median of the group (MEDKY).

In the MODEL statement we specify that we wish to choose B (the bisquare-weighted average) so that the difference between B and the dependent variable (DVA) is as small as possible (i.e., the difference is as close to zero, the value of DIFF, as possible). NLIN also needs an expression for the first derivative of the model with respect to the parameter to be estimated. This derivative is specified by the DER.B statement. R and SCALE are the rejection point and the robust measure of variability, respectively. RESID is the scaled residual, where MODEL.DIFF is the estimated model (the latter variable is created by the NLIN procedure as the value of the predicted score for the given case and iteration). The two sets of IF-THEN/ELSE statements define the bisquare-weighting function (where WS is the variable that contains the weights) by first creating an influence function (PSYS) and then generating the weights (see Hampel et al., 1986, Ch. 2). We then replace the weight used by the NLIN procedure (_WEIGHT_) with the weight we have calculated.
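As a cross-check on the weighting logic just described, here is a minimal Python sketch (not the authors' SAS code) of the two IF-THEN/ELSE steps: the influence function first, then the weight as the influence value divided by the residual. The function names `psi` and `weight` are hypothetical; the rejection point r = 4 and the scale 1.483 MAD follow the text, and the data are the Group 2 scores in the first iteration (median 3, MAD 1).

```python
def psi(resid, r=4.0):
    """Bisquare influence function: nonzero only inside the rejection point r."""
    if abs(resid) < r:
        return resid * (1.0 - (resid / r) ** 2) ** 2
    return 0.0


def weight(resid, r=4.0):
    """Weight = psi(resid) / resid; a residual of exactly zero gets weight 1."""
    if resid != 0:
        return psi(resid, r) / resid
    return 1.0


# Group 2, first iteration: deviations from the median, scaled by 1.483 * MAD.
scores = [1, 2, 3, 4, 10]
center = 3            # group median
scale = 1.483 * 1.0   # MAD = 1.0 for this group
resids = [(x - center) / scale for x in scores]
weights = [round(weight(e), 3) for e in resids]
print(weights)  # → [0.786, 0.944, 1.0, 0.944, 0.0]
```

Dividing the influence function by the residual gives back the bisquare weight of Eq. 4, and the separate branch for a zero residual avoids a division by zero. The sign convention of the residual does not matter here because the weight depends only on its square.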
Finally, the calculated weight is indicated for output to the SAS data set that we wish to pass on to the GLM procedure (by the ID statement) and this data set (named WEIGHTS) is created by the OUTPUT statement.

The GLM procedure calculates a weighted ANOVA using the weights that we calculated in the NLIN procedure (which were stored in the SAS data set WEIGHTS). Notice that this code is identical to that of the two-way ANOVA with the exception that a WEIGHT statement is used. The weights (WS), of course, are those from the NLIN procedure where the bisquare-weighted averages were calculated for each cell (or group) of the design. The procedures that follow the GLM print results and calculate transformed F-ratios. Again, the programming is rather straightforward and will not be reviewed.

To convert this code to calculate a bANOVA for a one-way design (a single independent variable), one simply removes IVB from the routines (hence, statements such as BY IVA IVB become BY IVA). The GLM procedure is then modified to calculate a one-way ANOVA and the single resulting F-ratio is transformed in the last procedure.

References

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (1986). Robust statistics: The approach based on influence functions. NY: Wiley.
Keppel, G. (1982). Design and analysis: A researcher's handbook, 2nd Ed. Englewood Cliffs, NJ: Prentice-Hall.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences, 3rd Ed. Belmont, CA: Brooks/Cole.
Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison perspective. Pacific Grove, CA: Brooks/Cole.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156-166.
Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression: A second course in statistics. Reading, MA: Addison-Wesley.
SAS Institute. (1990). SAS language: Reference.
Version 6, First Edition. Cary, NC: SAS Institute.
Stine, W. W., & Regeth, R. A. (In preparation). Parametric statistics and power in the face of heavy tails: I. Univariate omnibus tests for completely randomized designs.

Appendix: SAS Routines for Calculating the ANOVA and bANOVA for a two-way design.

Two SAS routines are presented for conducting the Analysis of Variance techniques that we described. They are appropriate for a Completely Randomized Factorial design with two independent variables that have p and q levels, respectively (CRF-pq). The ANOVA routine calculates an Analysis of Variance. The bANOVA routine calculates a Bisquare-Weighted Analysis of Variance. The bANOVA routines also include procedures that print the weights for individual cases to assist the analyst in identifying outliers, etc. This code has been tested on SAS Production Release 6.08 using a Digital Equipment Corporation VAX-8820 under the VMS Version 5.5-2 operating system.

Perhaps the most effective way to use the code presented below would be to optically scan the text into a microcomputer (e.g., a Macintosh) and then pass the code over to a machine with SAS. In most environments, only the file specifications would have to be altered for the code to work. The code can also be used as a model for the development of these analyses in other statistical packages.

ANOVA for CRF-pq

DATA RAWDATA;
/* Names the SAS data set that will be created RAWDATA. */
INFILE 'NUMBER2.DAT';
/* Tells SAS that the data are in a file named NUMBER2.DAT (VMS specific). */
INPUT IVA IVB DVA;
/* Tells SAS to use the first and second columns of values from NUMBER2.DAT as the values for the independent variables and the third column of values as the values for the dependent variable. Names these IVA, IVB, and DVA. */
OUTPUT;
/* Tells SAS to put these values into RAWDATA. */
RUN;
/* Forces the execution of the DATA step. */
OPTIONS LINESIZE = 80;
/* Displays the output as 80 characters wide. */
PROC GLM DATA = RAWDATA;
/* Uses the General Linear Model on the RAWDATA data set. */
CLASS IVA IVB;
/* Classifies IVA and IVB as the nominal level variables. */
MODEL DVA = IVA IVB IVA*IVB;
/* Defines DVA as the dependent variable, IVA and IVB as the independent variables, and IVA*IVB as the interaction. */
RUN;
/* Forces the execution of the GLM step. */

bANOVA for the CRF-pq

/* This block reads the data from NUMBER3.DAT into a SAS data file called RAWDATA and sets up a variable named DIFF. RAWDATA will contain 4 variables: IVA, IVB, DVA, and DIFF. */
DATA RAWDATA;
/* Names the SAS data set that will be created RAWDATA. */
INFILE 'NUMBER3.DAT';
/* Tells SAS that the data are in a file named NUMBER3.DAT. */
INPUT IVA IVB DVA;
/* Tells SAS to use the first and second columns of values from NUMBER3.DAT as the values for the independent variables and the third column of values as the values for the dependent variable. Names these IVA, IVB, and DVA. */
DIFF = 0;
/* DIFF will be the difference between the mean of the DVA and the bisquare weighted average (see below). It is now set to zero. */
OUTPUT;
/* SAS puts these values into RAWDATA. */
RUN;
/* Forces the execution of the DATA step. */
OPTIONS LINESIZE = 80;
/* Displays the output as 80 characters wide. */
PROC UNIVARIATE DATA = RAWDATA NOPRINT;
/* Provides descriptives (such as the mean and median) for the data set RAWDATA. */
VAR DVA;
/* Indicates that DVA is to be analyzed from the data set. */
BY IVA IVB;
/* SAS executes this step for each group defined by IVA and IVB. */
OUTPUT OUT = MEDIANS MEDIAN = MEDDVA;
/* Puts the results into a file called MEDIANS and names the median MEDDVA. */
RUN;
/* Forces the execution of the PROC step. */

DATA EXTEND;
/* Creates a SAS data file called EXTEND. */
MERGE RAWDATA MEDIANS;
/* EXTEND contains the data from RAWDATA and MEDIANS. */
BY IVA IVB;
/* Arranges the data by IVA and IVB. */
IF _N_ = 1 THEN DO;
/* This procedure extends the number of groups in MEDIANS to match the number of subjects in RAWDATA. */
IVAOLD = 0;
IVBOLD = 0;
END;
/* Ends the IF, THEN loop. */
IF IVAOLD ^= IVA AND IVBOLD ^= IVB THEN DO;
/* This is a continuation of the above procedure. For example, if MEDIANS has p groups and RAWDATA has 30 subjects then EXTEND will have 120 rows of data. */
IVAOLD = IVA;
IVBOLD = IVB;
MED = MEDDVA;
/* Names the median MEDDVA. */
END;
/* Ends the IF, THEN loop. */
ABDDVA = ABS(DVA - MED);
/* ABDDVA is the absolute value of the deviation of DVA from the median (MED). */
RETAIN MEDDVA;
/* Tells SAS to keep the new medians. */
DROP IVAOLD IVBOLD MED;
/* Tells SAS to drop the old IVAs, IVBs, and medians. */
OUTPUT;
/* Tells SAS to put these values into EXTEND. */
RUN;
/* Forces the execution of the DATA step. */
PROC DATASETS NOLIST;
/* Deletes the MEDIANS data file. */
DELETE MEDIANS;
RUN;
/* Forces the execution of the PROC step. */

/* This block sets up a data set called EXTEND from the data contained in RAWDATA and MEDIANS. MEDIANS contains the median values for each group. EXTEND will contain 6 variables: IVA, IVB, DVA, DIFF, the group medians (MEDDVA) and the absolute median deviation for each group (ABDDVA). */
PROC UNIVARIATE DATA = EXTEND NOPRINT;
/* Gives descriptives on EXTEND such as the mean and median for each group defined by the IVA and IVB. */
VAR ABDDVA DVA;
/* Indicates that ABDDVA and DVA will be analyzed from the data set EXTEND. */
BY IVA IVB;
/* SAS executes this step for each group defined by the IVA and IVB. */
OUTPUT OUT = MEDABDEV MEDIAN = MAD2DVA MED2DVA;
/* Puts the results into a file called MEDABDEV and names the median of ABDDVA MAD2DVA and the median of DVA MED2DVA. */
RUN;
/* Forces the execution of the PROC step. */
PROC DATASETS NOLIST;
/* Deletes the EXTEND data file. */
DELETE EXTEND;
RUN;
/* Forces the execution of the PROC step. */

DATA TEST;
/* Creates a SAS data file called TEST that contains the data from RAWDATA and MEDABDEV. */
MERGE RAWDATA MEDABDEV;
BY IVA IVB;
/* Arranges the data by the IVA and IVB. */
IF _N_ = 1 THEN DO;
/* This procedure extends the number of groups in MEDABDEV to match the number of subjects in RAWDATA. */
IVAOLD = 0;
IVBOLD = 0;
END;
/* Ends the IF, THEN loop. */
IF IVAOLD ^= IVA AND IVBOLD ^= IVB THEN DO;
/* This procedure renames several variables. */
IVAOLD = IVA;
/* Renames IVA to IVAOLD. */
IVBOLD = IVB;
/* Renames IVB to IVBOLD. */
MAD = MAD2DVA;
/* Renames MAD2DVA to MAD. */
MEDKY = MED2DVA;
/* Renames MED2DVA to MEDKY. */
END;
/* Ends the IF, THEN loop. */
RETAIN MAD MEDKY;
/* Tells SAS to keep MAD and MEDKY. */
DROP MAD2DVA MED2DVA IVAOLD IVBOLD;
/* Tells SAS to drop MAD2DVA, MED2DVA, IVAOLD, and IVBOLD. */
OUTPUT;
/* Tells SAS to put these values into TEST. */
RUN;
/* Forces the execution of the DATA step. */
PROC DATASETS NOLIST;
/* Deletes the RAWDATA and MEDABDEV files. */
DELETE RAWDATA MEDABDEV;
RUN;
/* Forces the execution of the PROC step. */

/* The following block calculates a bisquare weighted average for each cell and outputs the weights for each element in the cell into a file called WEIGHTS. WEIGHTS will contain 7 variables: IVA, IVB, DVA, DIFF, MAD, MEDKY, and WS. These weights will later be used to calculate the weighted ANOVA. */
PROC NLIN DATA = TEST NOHALVE;
/* Fits a nonlinear regression model using the least squares procedure. NOHALVE turns off the step-size search during iteration. (See Example 5; SAS Institute, 1990, p. 1165.) */
TITLE 'Tukey biweight';
/* Title for this section. */
BY IVA IVB;
/* Executes the procedure for each group. */
PARMS B = 1;
/* B represents the bisquare weighted average. It is nominally set to 1.0. */
IF _ITER_ = 0 AND _N_ = 1 THEN B = MEDKY;
/* For the first iteration, B is initialized to the median value. */
MODEL DIFF = (DVA - B);
/* The model is set as the difference (DIFF) between the DVA and B. */
DER.B = -1;
/* The derivative of the model with respect to B is -1. */
R = 4;
/* This defines the rejection point as 4 scale units beyond the middle of the distribution. */
SCALE = 1.483 * MAD;
/* SCALE is the unit of measurement for the residuals. MAD is the measure of scale. MAD times 1.483 provides an unbiased estimate of the standard deviation when sampling from a normal distribution (Hampel et al., 1986, Ch. 2). */
RESID = (DIFF - MODEL.DIFF)/SCALE;
/* MODEL.DIFF is the estimated deviation of B from DVA. RESID provides a scaled normalized estimate of the residuals. */
IF ABS(RESID) < R THEN PSYS = RESID*((1 - (RESID/R)**2)**2);
/* If the absolute value of the residual is within 4 units of the middle of the distribution (R) then the residual receives a non-zero weight using the following influence function (Mosteller & Tukey, 1977, p. 205). The influence function is called the psi function (Hampel et al., 1986, Ch. 2). */
ELSE PSYS = 0.;
/* If the absolute value of the residual is outside of 4 units of the middle of the distribution (R), then the residual is given a weight of zero. */
IF RESID ^= 0 THEN WS = PSYS / RESID;
/* If the value of the residual is not equal to zero, then the weight of the residual equals the influence function value divided by the residual. */
ELSE WS = 1.;
/* If the residual equals zero, then the weight for that case equals 1.0 (B at the center of the distribution). */
_WEIGHT_ = WS;
/* This replaces SAS's NLIN weight value with the weight value calculated from the biweight procedure. */
ID WS;
/* This specifies that the variable WS will be output to the SAS data set. */
OUTPUT OUT = WEIGHTS;
/* Puts results into a file called WEIGHTS. */
RUN;
/* Forces the execution of the PROC step. */
PROC DATASETS NOLIST;
/* Deletes the TEST data file. */
DELETE TEST;
RUN;
/* Forces the execution of the PROC step. */
PROC PRINT DATA = WEIGHTS;
/* Prints the data set with the weights for outlier identification. */
RUN;

/* The following block calculates the weighted ANOVA from the weights obtained from the previous procedure. The results are output into the file ANOVA. The input data set WEIGHTS contains 7 variables: IVA, IVB, DVA, DIFF, MAD, MEDKY, and WS. */
PROC GLM DATA = WEIGHTS OUTSTAT = ANOVA NOPRINT;
/* This uses the general linear model on the WEIGHTS data set and puts the results into ANOVA. */
CLASS IVA IVB;
/* Classifies IVA and IVB as the variables of interest. */
MODEL DVA = IVA IVB IVA*IVB;
/* This defines DVA as the dependent variable, IVA and IVB as the independent variables, and IVA*IVB as the interaction. */
WEIGHT WS;
/* This indicates that the DVA will be weighted using the weights (WS) obtained in the above procedure when calculating the ANOVA. */
RUN;
/* Forces the execution of the GLM step. */
PROC DATASETS NOLIST;
/* Deletes WEIGHTS. */
DELETE WEIGHTS;
RUN;
/* Forces the execution of the PROC step. */

/* This block calculates the F value for the weighted ANOVA. It also calculates approximate F values (APPFA, APPFB, and APPFAB) based on the regression equation in equation (6) and the corresponding probability values (PROBAFA, PROBAFB, and PROBAFAB). These values are listed in OUTF: NAME (DVA), SOURCE (error, IVA, IVB, IVAB), TYPE (error, SS1), DF, SS, F, PROB, DFERROR, SSERROR, SSA, FA, APPFA, PROBA, PROBAFA, SSB, FB, APPFB, PROBB, PROBAFB, SSAB, FAB, APPFAB, PROBAB, and PROBAFAB. */
DATA OUTF;
/* Creates SAS data set called OUTF. */
SET ANOVA;
/* Uses the ANOVA file. */
IF _TYPE_ = 'ERROR' THEN DO;
/* This labels the error term in the ANOVA file to DFERROR and the sum of squares term to SSERROR. */
DFERROR = DF;
SSERROR = SS;
END;
/* Ends the IF, THEN loop. */
RETAIN DFERROR SSERROR;
/* This tells SAS to keep the DFERROR and SSERROR terms. */
IF _TYPE_ = 'SS1' AND _SOURCE_ = 'IVA' THEN DO;
/* This labels the sum of squares for the A effect SSA and labels the F value for the A effect FA. */
SSA = SS;
FA = F;
APPFA = (.534 + (.001206 * DFERROR)) * FA;
/* This calculates an approximate F value (APPFA value) based on the regression equation (see equation (6)). */
PROBA = PROB;
/* This labels the probability value for the A effect PROBA. */
PROBAFA = PROBF(APPFA, DF, DFERROR, 0);
/* This finds the probability value for the APPFA value. PROBF returns the cumulative probability that F is less than or equal to APPFA, so the upper-tail p value is 1 - PROBAFA. */
END;
/* Ends the IF, THEN loop. */
RETAIN SSA FA APPFA PROBA PROBAFA;
/* This tells SAS to keep these variables. */
IF _TYPE_ = 'SS1' AND _SOURCE_ = 'IVB' THEN DO;
/* This labels the sum of squares for the B effect SSB. */
SSB = SS;
FB = F;
/* This labels the F value for the B effect FB. */
APPFB = (.534 + (.001206 * DFERROR)) * FB;
/* This calculates an F value (APPFB value) based on the regression equation (see equation (6)). */
PROBB = PROB;
/* This labels the probability value for the B effect PROBB. */
PROBAFB = PROBF(APPFB, DF, DFERROR, 0);
/* This finds the probability value for APPFB (cumulative, as for PROBAFA). */
END;
/* Ends the IF, THEN loop. */
RETAIN SSB FB APPFB PROBB PROBAFB;
/* This tells SAS to keep these variables. */
IF _TYPE_ = 'SS1' AND _SOURCE_ = 'IVA*IVB' THEN DO;
/* This labels the sum of squares for the AB effect SSAB. */
SSAB = SS;
FAB = F;
/* This labels the F value for the AB effect FAB. */
APPFAB = (.534 + (.001206 * DFERROR)) * FAB;
/* This calculates an F value (APPFAB value) based on the regression equation (see equation (6)). */
PROBAB = PROB;
/* This labels the probability value for the AB effect PROBAB. */
PROBAFAB = PROBF(APPFAB, DF, DFERROR, 0);
/* This finds the probability value for the APPFAB value (cumulative, as for PROBAFA). */
OUTPUT;
/* This puts the output into OUTF. */
END;
/* Ends the IF, THEN loop. */
RUN;
/* Forces the execution of the DATA step. */
PROC PRINT DATA = OUTF;
/* Prints OUTF. */
RUN;
/* Forces the execution of the PROC step. */
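
Since the routines above can serve as a model for other packages, the following is a minimal Python sketch of the same three steps for the one-way worked example: the iterated bisquare-weighted average (Eqs. 1-5), a weighted one-way ANOVA on the final weights, and the F transformation (Eq. 6). It is not a translation of the SAS code. The function names `bisquare_average` and `bisquare_anova`, the `max_iter` guard, and the rule that zero-weight scores do not count toward the error degrees of freedom are assumptions made here; the last one is inferred from the bANOVA summary table, which reports 10 within-group degrees of freedom for 15 scores in 3 groups.

```python
from statistics import median


def bisquare_average(scores, r=4.0, tol=0.001, max_iter=50):
    """Return the bisquare-weighted average of one group and its final weights."""
    center = median(scores)                               # Eq. 1: start at the median
    mad = median([abs(x - center) for x in scores])       # Eq. 2
    if mad == 0:
        raise ValueError("MAD is zero; scaled deviations are undefined")
    scale = 1.483 * mad
    weights = [1.0] * len(scores)
    for _ in range(max_iter):
        eps = [(x - center) / scale for x in scores]      # Eq. 3
        weights = [(1 - (e / r) ** 2) ** 2 if abs(e) < r else 0.0
                   for e in eps]                          # Eq. 4
        new_center = (sum(w * x for w, x in zip(weights, scores))
                      / sum(weights))                     # Eq. 5
        converged = abs(new_center - center) <= tol
        center = new_center
        if converged:
            break
    return center, weights


def bisquare_anova(groups):
    """One-way weighted ANOVA on the final bisquare weights, followed by Eq. 6."""
    fitted = [bisquare_average(g) for g in groups]
    means = [m for m, _ in fitted]
    weights = [w for _, w in fitted]
    totals = [sum(w) for w in weights]
    grand = sum(t * m for t, m in zip(totals, means)) / sum(totals)
    ss_between = sum(t * (m - grand) ** 2 for t, m in zip(totals, means))
    ss_within = sum(w * (x - m) ** 2
                    for g, ws, m in zip(groups, weights, means)
                    for w, x in zip(ws, g))
    # Assumption: scores with zero weight are dropped from the degrees of freedom.
    n_used = sum(1 for ws in weights for w in ws if w > 0)
    df_between = len(groups) - 1
    df_error = n_used - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_error)
    f_bw = (0.534 + 0.001206 * df_error) * f_ratio        # Eq. 6
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df_error": df_error, "f": f_ratio, "f_bw": f_bw}


groups = [[21, 22, 23, 24, 25], [1, 2, 3, 4, 10], [1, 2, 3, 4, 100]]
result = bisquare_anova(groups)
print({name: round(value, 2) for name, value in result.items()})
```

With the three groups from the worked example, the printed values should land close to the bANOVA summary table (between-groups SS near 1172, within-groups SS near 17, transformed F near 188), with small differences attributable to rounding and the stopping tolerance.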