SAS/STAT SOFTWARE ON THE IBM PC UNDER PC DOS

Gerhard Held, SAS Institute GmbH

Following base SAS software, SAS/STAT software is now available on the IBM PC under PC DOS. SAS/STAT software is a faithful implementation of the most prominent statistical procedures of the mainframe SAS System on the IBM PC. The following procedures are currently included in SAS/STAT software:

   ANOVA      analysis of variance for balanced data, including repeated measures designs
   CATMOD     fits linear models to functions of categorical data using linear modeling, logistic regression, and repeated measures analyses
   CANCORR    canonical correlation, partial canonical correlation, and canonical redundancy analysis
   DISCRIM    classification of observations assuming a multivariate normal distribution within each class
   FACTOR     several types of common factor analysis with orthogonal and oblique rotations
   FREQ       contingency tables, stratified analysis (Cochran-Mantel-Haenszel statistics), relative risk estimates
   GLM        analysis of variance for unbalanced designs, MANOVA, analysis of covariance, polynomial regression
   NPAR1WAY   nonparametric one-way analysis of rank scores
   ORTHOREG   orthogonal regression using the Gentleman-Givens method for ill-conditioned data
   REG        multiple regression with regression diagnostics, stepwise regression, and all subsets regression
   SCORE      constructs new variables which are linear combinations of coefficients (e.g., factor-scoring coefficients or parameter estimates from a linear model) and variables in another data set
   TTEST      computes t-tests for the means of two groups

As the major features of the SAS/STAT software procedures are already well known from the SAS User's Guide: Statistics, Version 5 Edition, we concentrate our discussion on the differences between the mainframe and the SAS/STAT software versions of these procedures.

In addition to the changes and enhancements described below, Version 6 of the ANOVA, GLM, and REG procedures can now run interactively.
For example, with the REG procedure you can now look at the printout of diagnostics for the model, decide to delete an observation, and re-fit the model without ever leaving the procedure. This saves computer time and, more importantly, allows you to fit a model to your data more easily and quickly. With the GLM procedure, you can inspect the ANOVA table or ask to see the expected mean squares before proceeding with the analysis. The following example (Output 1) shows the usefulness of the interactive features:

Output 1  Using PROC REG Interactively

   /*----------------------------------------------------*/
   /* Example of an interactive regression analysis      */
   /*----------------------------------------------------*/
   data class;
      title 'Interactive Analysis with PROC REG';
      input name $ height weight age;
      cards;
   alfred   69.0 112.5 14
   alice    56.5  84.0 13
   barbara  65.3  98.0 13
   carol    62.8 102.5 14
   henry    63.5 102.5 14
   james    57.3  83.0 12
   jane     59.8  84.5 12
   janet    62.5 112.5 15
   jeffrey  62.5  84.0 13
   john     59.0  99.5 12
   joyce    51.3  50.5 11
   judy     64.3  90.0 14
   louise   56.3  77.0 12
   mary     66.5 112.0 15
   philip   72.0 150.0 16
   robert   64.8 128.0 12
   ronald   67.0 133.0 15
   thomas   57.5  85.0 11
   william  66.5 112.0 15
   ;
   proc reg;
      model weight = age height;
      id name;
   run;
      print influence;     /* request influence diagnostics   */
   run;
      delobs 16;           /* exclude observation 16 (robert) */
   run;
      print;
   run;
      delete age;          /* drop AGE from the model         */
   run;
   endsas;

   Interactive Analysis with PROC REG

   Model: MODEL1
   Dep Variable: WEIGHT

                        Analysis of Variance

                          Sum of         Mean
   Source       DF       Squares       Square    F Value    Prob>F
   Model         2    7215.63710   3607.81855     27.228    0.0001
   Error        16    2120.09974    132.50623
   C Total      18    9335.73684

      Root MSE     11.51114     R-Square    0.7729
      Dep Mean    100.02632     Adj R-Sq    0.7445
      C.V.         11.50811

                        Parameter Estimates

                   Parameter      Standard    T for H0:
   Variable  DF     Estimate         Error  Parameter=0   Prob > |T|
   INTERCEP   1  -141.223763   33.38309350       -4.230       0.0006
   AGE        1     1.278393    3.11010374        0.411       0.6865
   HEIGHT     1     3.597027    0.90546072        3.973       0.0011

                                        Hat Diag      Cov            INTERCEP
   Obs  NAME      Residual  Rstudent           H    Ratio    Dffits   Dfbetas
     1  alfred    -12.3686   -1.2231    0.2043   1.1468   -0.6197    0.4335
     2  alice       5.3727    0.5119    0.2071   1.4532    0.2617    0.1846
     3  barbara   -12.2812   -1.1679    0.1465   1.0952   -0.4839    0.1928
     4  carol      -0.0670   -0.0059    0.0772   1.3152   -0.0017   -0.0001
     5  henry      -2.5849   -0.2256    0.0677   1.2886   -0.0608    0.0047
     6  james       2.7734    0.2474    0.1074   1.3432    0.0858    0.0635
     7  jane       -4.7191   -0.4218    0.1037   1.3070   -0.1435   -0.0481
     8  janet       9.7337    0.9752    0.2504   1.3464    0.5636    0.0779
     9  jeffrey   -16.2095   -1.5110    0.0619   0.8457   -0.3880    0.0029
    10  john       13.1585    1.2209    0.0965   1.0109    0.3989    0.1940
    11  joyce      -6.8660   -0.7101    0.3163   1.6074   -0.4830   -0.4550
    12  judy      -17.9625   -1.7069    0.0643   0.7615   -0.4476    0.1115
    13  louise      0.3705    0.0334    0.1305   1.3955    0.0129    0.0106
    14  mary       -5.1544   -0.4669    0.1251   1.3284   -0.1765    0.0729
    15  philip     11.7836    1.2084    0.2617   1.2441    0.7195   -0.5718
    16  robert     20.7957    2.5578    0.3283   0.6100    1.7883   -0.5029
    17  ronald     14.0471    1.3348    0.1234   0.9886    0.5007   -0.2466
    18  thomas      5.3324    0.5065    0.2025   1.4464    0.2553    0.1113
    19  william    -5.1544   -0.4669    0.1251   1.3284   -0.1765    0.0729

                       AGE    HEIGHT
   Obs  NAME       Dfbetas   Dfbetas
     1  alfred      0.3296   -0.5129
     2  alice       0.1651   -0.2242
     3  barbara     0.3470   -0.3822
     4  carol      -0.0009    0.0007
     5  henry      -0.0258    0.0137
     6  james      -0.0089   -0.0282
     7  jane        0.0862   -0.0396
     8  janet       0.5008   -0.4014
     9  jeffrey     0.1493   -0.1280
    10  john       -0.1831    0.0334
    11  joyce      -0.0676    0.3095
    12  judy       -0.1051   -0.0078
    13  louise      0.0011   -0.0067
    14  mary       -0.0945    0.0208
    15  philip      0.1525    0.2414
    16  robert     -1.6002    1.5050
    17  ronald      0.2244   -0.0035
    18  thomas     -0.1798    0.0722
    19  william    -0.0945    0.0208

   Sum of Residuals              -7.38964E-13
   Sum of Squared Residuals         2120.0997
   Predicted Resid SS (Press)       3272.7219

   Observation 16 was deleted.
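Why observation 16 is the natural candidate for deletion can be read off the influence statistics in the printout above. The following is a minimal sketch in Python (not SAS, and not part of SAS/STAT software), assuming the conventional Belsley, Kuh, and Welsch screening cutoffs, which the paper itself does not state: |Rstudent| > 2, Hat Diag H > 2p/n, and |Dffits| > 2*sqrt(p/n), with n = 19 observations and p = 3 parameters. The values are copied from the printout; the names ROWS and flag_influential are illustrative only.

```python
import math

# (name, Rstudent, Hat Diag H, Dffits) copied from the first influence printout.
ROWS = [
    ("alfred", -1.2231, 0.2043, -0.6197),
    ("alice", 0.5119, 0.2071, 0.2617),
    ("barbara", -1.1679, 0.1465, -0.4839),
    ("carol", -0.0059, 0.0772, -0.0017),
    ("henry", -0.2256, 0.0677, -0.0608),
    ("james", 0.2474, 0.1074, 0.0858),
    ("jane", -0.4218, 0.1037, -0.1435),
    ("janet", 0.9752, 0.2504, 0.5636),
    ("jeffrey", -1.5110, 0.0619, -0.3880),
    ("john", 1.2209, 0.0965, 0.3989),
    ("joyce", -0.7101, 0.3163, -0.4830),
    ("judy", -1.7069, 0.0643, -0.4476),
    ("louise", 0.0334, 0.1305, 0.0129),
    ("mary", -0.4669, 0.1251, -0.1765),
    ("philip", 1.2084, 0.2617, 0.7195),
    ("robert", 2.5578, 0.3283, 1.7883),
    ("ronald", 1.3348, 0.1234, 0.5007),
    ("thomas", 0.5065, 0.2025, 0.2553),
    ("william", -0.4669, 0.1251, -0.1765),
]


def flag_influential(rows, n_params):
    """Return {name: [criteria exceeded]} under the assumed rule-of-thumb cutoffs."""
    n_obs = len(rows)
    hat_cutoff = 2 * n_params / n_obs                # leverage cutoff 2p/n
    dffits_cutoff = 2 * math.sqrt(n_params / n_obs)  # DFFITS cutoff 2*sqrt(p/n)
    flagged = {}
    for name, rstudent, hat, dffits in rows:
        reasons = []
        if abs(rstudent) > 2:
            reasons.append("rstudent")
        if hat > hat_cutoff:
            reasons.append("leverage")
        if abs(dffits) > dffits_cutoff:
            reasons.append("dffits")
        if reasons:
            flagged[name] = reasons
    return flagged


if __name__ == "__main__":
    for name, reasons in flag_influential(ROWS, n_params=3).items():
        print(name, reasons)
```

Under these assumed cutoffs only robert (observation 16) exceeds all three criteria, which is consistent with the DELOBS 16 statement in the program; joyce is flagged on leverage alone.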
   Interactive Analysis with PROC REG

   Model: MODEL1
   Dep Variable: WEIGHT

                        Analysis of Variance

                          Sum of         Mean
   Source       DF       Squares       Square    F Value    Prob>F
   Model         2    7033.50277   3516.75138     35.734    0.0001
   Error        15    1476.23335     98.41556
   C Total      17    8509.73611

      Root MSE      9.92046     R-Square    0.8265
      Dep Mean     98.47222     Adj R-Sq    0.8034
      C.V.         10.07438

                        Parameter Estimates

                   Parameter      Standard    T for H0:
   Variable  DF     Estimate         Error  Parameter=0   Prob > |T|
   INTERCEP   1  -126.756480   29.32075393       -4.323       0.0006
   AGE        1     5.567408    3.16164072        1.761       0.0986
   HEIGHT     1     2.422625    0.90539702        2.676       0.0173

                                        Hat Diag      Cov            INTERCEP
   Obs  NAME      Residual  Rstudent           H    Ratio    Dffits   Dfbetas
     1  alfred     -5.8484   -0.6776    0.2703   1.5300   -0.4124    0.2854
     2  alice       1.5018    0.1669    0.2304   1.5887    0.0913    0.0655
     3  barbara    -5.8173   -0.6474    0.2114   1.4276   -0.3352    0.1449
     4  carol      -0.8281   -0.0840    0.0781   1.3321   -0.0245   -0.0020
     5  henry      -2.5239   -0.2552    0.0677   1.3011   -0.0688    0.0053
     6  james       4.1312    0.4293    0.1103   1.3292    0.1511    0.1036
     7  jane       -0.4254   -0.0445    0.1323   1.4170   -0.0174   -0.0035
     8  janet       4.3313    0.5072    0.2957   1.6535    0.3287    0.0658
     9  jeffrey   -13.0339   -1.4126    0.0775   0.8940   -0.4095    0.0382
    10  john       16.5127    1.9202    0.1139   0.6884    0.6886    0.2502
    11  joyce      -8.2657   -1.0106    0.3193   1.4629   -0.6922   -0.6499
    12  judy      -16.9620   -1.9212    0.0659   0.6525   -0.5102    0.1384
    13  louise      0.5538    0.0578    0.1306   1.4137    0.0224    0.0180
    14  mary       -5.8592   -0.6186    0.1259   1.2977   -0.2347    0.0913
    15  philip     13.2489    1.6439    0.2651   0.9856    0.9873   -0.7864
    16  robert                          0.4888
    17  ronald     13.9295    1.5714    0.1234   0.8619    0.5896   -0.2834
    18  thomas     11.2140    1.3457    0.2563   1.1481    0.7899    0.2307
    19  william    -5.8592   -0.6186    0.1259   1.2977   -0.2347    0.0913

                       AGE    HEIGHT
   Obs  NAME       Dfbetas   Dfbetas
     1  alfred      0.2698   -0.3591
     2  alice       0.0617   -0.0786
     3  barbara     0.2681   -0.2841
     4  carol      -0.0129    0.0099
     5  henry      -0.0244    0.0130
     6  james      -0.0260   -0.0299
     7  jane        0.0121   -0.0078
     8  janet       0.2961   -0.2509
     9  jeffrey     0.2170   -0.1974
    10  john       -0.3895    0.1824
    11  joyce      -0.1176    0.4148
    12  judy       -0.0588   -0.0473
    13  louise      0.0013   -0.0097
    14  mary       -0.1159    0.0331
    15  philip      0.1176    0.3398
    16  robert
    17  ronald      0.2281   -0.0075
    18  thomas     -0.6111    0.3545
    19  william    -0.1159    0.0331

   Sum of Residuals               -3.2685E-13
   Sum of Squared Residuals         1476.2333
   Predicted Resid SS (Press)       2109.4445

   Interactive Analysis with PROC REG

   Model: MODEL1
   Dep Variable: WEIGHT

                        Analysis of Variance

                          Sum of         Mean
   Source       DF       Squares       Square    F Value    Prob>F
   Model         1    6728.33067   6728.33067     60.432    0.0001
   Error        16    1781.40544    111.33784
   C Total      17    8509.73611

      Root MSE     10.55167     R-Square    0.7907
      Dep Mean     98.47222     Adj R-Sq    0.7776
      C.V.         10.71538

                        Parameter Estimates

                   Parameter      Standard    T for H0:
   Variable  DF     Estimate         Error  Parameter=0   Prob > |T|
   INTERCEP   1  -137.682825   30.48004126       -4.517       0.0004
   HEIGHT     1     3.796705    0.48839879        7.774       0.0001

The ANOVA Procedure

You can use the ANOVA procedure in interactive mode. After specifying and running a model, a variety of statements (such as MEANS, MANOVA, TEST, and REPEATED) can be executed without recalculating the model sum of squares.

The CLM option, specified in the MEANS statement, requests that the results of the BON, GABRIEL, SCHEFFE, SIDAK, SMM, T, and LSD options be presented as confidence intervals for the mean of each level of the variables specified in the MEANS statement.

The DISCRIM Procedure

The following statements are new in Version 6: FREQ, WEIGHT, TESTFREQ, and TESTWEIGHT.

With the FREQ statement you can name a variable which represents the frequency of occurrence for the other values in the observation. If you want to use relative weights for each observation in the input data set, place the weights in a variable in the data set and specify the name in a WEIGHT statement. If a variable in the TESTDATA= data set represents the frequency of occurrence for the other values in the observation, include the variable's name in a TESTFREQ statement. If you want to use relative weights for each observation in the TESTDATA= data set, specify the variable's name in a TESTWEIGHT statement.

The following new or enhanced options are now available in the PROC DISCRIM statement:

   TSSCP         prints the total-sample corrected SSCP matrix.
   WSSCP         prints the within-class corrected SSCP matrix for each class level.
   BSSCP         prints the between-class corrected SSCP matrix.
   TCOV          prints total-sample covariances.
   WCOV          prints within-class covariances for each class level.
   PCOV          prints pooled within-class covariances.
   BCOV          prints between-class covariances.
   TCORR         prints total-sample correlations.
   WCORR         prints within-class correlations for each class level.
   PCORR         prints pooled within-class correlations.
   BCORR         prints between-class correlations.
   ANOVA         prints univariate statistics for testing the hypothesis that the class means are equal in the population for each variable.
   MANOVA        prints multivariate statistics for testing the hypothesis that the class means are equal in the population.
   MAHALANOBIS   prints Mahalanobis distances between classes.
   ALL           prints all the statistics.
   NOPRINT       suppresses the printout.
   SINGULAR=     specifies the criterion for determining the singularity of a covariance matrix.
   OUTSTAT=      names an output SAS data set containing various statistics such as means, standard deviations, correlations, and coefficients of the discriminant functions.
   TESTOUT=      names an output SAS data set containing all the data from the TESTDATA= data set plus the posterior probabilities and the class into which each observation is classified.

The GLM Procedure

You can use the GLM procedure in interactive mode.
After specifying and running a model, a variety of statements (such as MEANS, MANOVA, TEST, and REPEATED) can be executed without recalculating the model sum of squares.

The CLM option, specified in the MEANS statement, requests that the results of the BON, GABRIEL, SCHEFFE, SIDAK, SMM, T, and LSD options be presented as confidence intervals for the mean of each level of the variables specified in the MEANS statement.

In addition to the options available in Version 5, the following option is available in the PROC GLM statement:

   OUTSTAT=SASdataset   names an output data set that will contain sums of squares, F statistics, and probability levels for each effect in the model, as well as for each CONTRAST statement used. If the CANONICAL option of the MANOVA statement is used and there is no M= specification, the data set also contains the results of the canonical analysis.

The following example (Output 2) illustrates the OUTSTAT= option of the GLM procedure and shows how to write a customised report of the analysis of variance results.
Output 2  PROC GLM Shows Use of Output Data Set

   /*--------------------------------------------------*/
   /* GLM - shows use of output dataset                */
   /*--------------------------------------------------*/
   options nodate nonumber;
   title1;
   data test;
      input a b y aa $ bb $;
      cards;
   1 3 18 level1 level3
   1 1 20 level1 level1
   1 1 22 level1 level1
   1 1 24 level1 level1
   1 4 18 level1 level4
   1 2 14 level1 level2
   1 2 19 level1 level2
   1 2 15 level1 level2
   1 3 19 level1 level3
   1 3 21 level1 level3
   2 4 14 level2 level4
   2 1 15 level2 level1
   2 1 16 level2 level1
   2 2 19 level2 level2
   2 4 12 level2 level4
   2 2 24 level2 level2
   2 3 10 level2 level3
   1 4 19 level1 level4
   2 3 18 level2 level3
   2 3 20 level2 level3
   run;
   proc glm data=test outstat=new;
      class aa bb;
      model y=aa|bb/ss4;
      means aa|bb/duncan;
   run;
   /* PRINT OUT THE STATISTICS */
   proc print data=new;
   run;
   data _null_;
      file print;
      set new;
      retain edf;
      /* remember the error degrees of freedom from the ERROR row */
      if _type_ = 'ERROR' then edf = df;
      else do;
         put @1 'F test for effect ' _source_ $ ' was ' f 8.2 ', with ' df 3.
             ' and ' edf 3. ' degrees of freedom.' /
             @1 'Probability of a larger F is ' prob 6.4 ', which is' @;
         if prob < .001 then put ' highly ' @;
         else if prob < .01 then put ' very ' @;
         else if prob < .05 then put ' ' @;
         else put ' not ' @;
         put 'significant.' /;
      end;
      put /;
   run;

   General Linear Models Procedure
   Class Level Information

   Class    Levels    Values
   AA            2    level1 level2
   BB            4    level1 level2 level3 level4

   Number of observations in data set = 20

   General Linear Models Procedure

   Dependent Variable: Y
   Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
   Model               7      164.38333333    23.48333333       2.87    0.0523
   Error              12       98.16666667     8.18055556
   Corrected Total    19      262.55000000

   R-Square         C.V.     Root MSE         Y Mean
   0.626103    16.023345    2.8601671    17.85000000

   Source    DF     Type IV SS    Mean Square    F Value    Pr > F
   AA         1     29.0083333     29.0083333       3.55    0.0841
   BB         3     25.5799320      8.5266440       1.04    0.4091
   AA*BB      3    105.9799320     35.3266440       4.32    0.0278

   General Linear Models Procedure

   Duncan's Multiple Range Test for variable: Y

   NOTE: This test controls the type I comparisonwise error rate,
         not the experimentwise error rate

   Alpha= 0.05  df= 12  MSE= 8.180556
   WARNING: Cell sizes are not equal.
   Harmonic Mean of cell sizes= 9.9

   Number of Means            2
   Critical Range     2.7956226

   Means with the same letter are not significantly different.

   Duncan Grouping      Mean     N    AA
                 A    19.000    11    level1
                 A
                 A    16.444     9    level2

   Duncan's Multiple Range Test for variable: Y

   NOTE: This test controls the type I comparisonwise error rate,
         not the experimentwise error rate

   Alpha= 0.05  df= 12  MSE= 8.180556
   WARNING: Cell sizes are not equal.
   Harmonic Mean of cell sizes= 4.897959

   Number of Means            2            3           4
   Critical Range     3.9745555    4.1634834    4.289752

   Means with the same letter are not significantly different.

   Duncan Grouping      Mean     N    BB
                 A    19.400     5    level1
                 A
                 A    18.200     5    level2
                 A
                 A    17.667     6    level3
                 A
                 A    15.750     4    level4

   Level of    Level of         ---------------Y---------------
   AA          BB          N          Mean               SD
   level1      level1      3    22.0000000       2.00000000
   level1      level2      3    16.0000000       2.64575131
   level1      level3      3    19.3333333       1.52752523
   level1      level4      2    18.5000000       0.70710678
   level2      level1      2    15.5000000       0.70710678
   level2      level2      2    21.5000000       3.53553391
   level2      level3      3    16.0000000       5.29150262
   level2      level4      2    13.0000000       1.41421356

   OBS    _NAME_    _SOURCE_    _TYPE_    DF         SS          F        PROB
     1    Y         ERROR       ERROR     12     98.167          .           .
     2    Y         AA          SS4        1     29.008    3.54601    0.084148
     3    Y         BB          SS4        3     25.580    1.04231    0.409132
     4    Y         AA*BB       SS4        3    105.980    4.31837    0.027764

   F test for effect AA was 3.55, with 1 and 12 degrees of freedom.
   Probability of a larger F is 0.0841, which is not significant.

   F test for effect BB was 1.04, with 3 and 12 degrees of freedom.
   Probability of a larger F is 0.4091, which is not significant.

   F test for effect AA*BB was 4.32, with 3 and 12 degrees of freedom.
   Probability of a larger F is 0.0278, which is significant.

The REG Procedure

The PRINT, ADD, DELETE, and DELOBS statements have been added to the REG procedure. The PRINT statement allows you to set many options interactively. The ADD statement adds independent variables to the regression model and the DELETE statement deletes independent variables from the regression model. The DELOBS statement deletes observations from the computation.

PROC REG in Version 6 includes the stepwise regression methods and the all subsets regression method implemented in Version 5 in the procedures STEPWISE and RSQUARE. The method used to select the model is now specified using the METHOD= option of the MODEL statement. All other options of the STEPWISE and RSQUARE procedures in Version 5 of the SAS System can also be used as options of the MODEL statement in the REG procedure of SAS/STAT software.

The SCORE Procedure

RESIDUAL is a new option that can appear in the PROC SCORE statement. It reverses the sign of each score.

Conclusion

SAS/STAT software offers the same reliable methods known from Version 5 of the SAS System. In addition, the interactivity of the ANOVA, GLM, and REG procedures makes it easier and quicker to fit a model to the data. The GLM and DISCRIM procedures can produce OUTSTAT= data sets for subsequent analysis or report writing. It is intended that an upcoming release of SAS/STAT software will add all the statistical procedures that are available in the base SAS System on the mainframe.
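The kind of subsequent analysis of an OUTSTAT= data set mentioned in the conclusion can also be sketched outside the SAS System. The following is a minimal illustration in Python, not part of SAS/STAT software, assuming rows shaped like the observations that PROC PRINT lists in Output 2; the tuple layout and the names f_statistics and significance_wording are hypothetical. It recomputes each effect's F statistic as the effect mean square (SS/DF) divided by the error mean square, and applies the same probability thresholds as the report step in Output 2.

```python
# Rows mirror the OUTSTAT= observations shown in Output 2:
# (source, degrees of freedom, Type IV sum of squares, PROB).
# Sums of squares use the precision of the GLM printout; the ERROR row
# carries no probability level.
ERROR_ROW = ("ERROR", 12, 98.16666667, None)
EFFECT_ROWS = [
    ("AA", 1, 29.0083333, 0.084148),
    ("BB", 3, 25.5799320, 0.409132),
    ("AA*BB", 3, 105.9799320, 0.027764),
]


def f_statistics(effect_rows, error_row):
    """Return {source: F}, where F = (SS / DF) / error mean square."""
    _, error_df, error_ss, _ = error_row
    error_ms = error_ss / error_df
    return {src: (ss / df) / error_ms for src, df, ss, _ in effect_rows}


def significance_wording(prob):
    """Apply the thresholds used by the report step in Output 2."""
    if prob < 0.001:
        return "highly significant"
    if prob < 0.01:
        return "very significant"
    if prob < 0.05:
        return "significant"
    return "not significant"


if __name__ == "__main__":
    f_values = f_statistics(EFFECT_ROWS, ERROR_ROW)
    error_df = ERROR_ROW[1]
    for src, df, _, prob in EFFECT_ROWS:
        print(f"{src}: F = {f_values[src]:.2f} with {df} and {error_df} df, "
              f"{significance_wording(prob)}")
```

The recomputed ratios agree with the F column of the listing in Output 2 to the printed precision, so only AA*BB is reported as significant at the 0.05 level.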