SAS/STAT SOFTWARE ON THE IBM PC UNDER PC DOS
Gerhard Held, SAS Institute GmbH
Following base SAS software, SAS/STAT software is now available on the IBM PC under
PC DOS. SAS/STAT software is a faithful implementation of the most prominent
statistical procedures of the mainframe SAS System on the IBM PC. The following
procedures are currently included in SAS/STAT software:
ANOVA      analysis of variance for balanced data, including repeated measures
           designs
CATMOD     fits linear models to functions of categorical data using linear
           modeling, logistic regression, and repeated measures analyses
CANCORR    canonical correlation, partial canonical correlation, and canonical
           redundancy analysis
DISCRIM    classification of observations assuming a multivariate normal
           distribution within each class
FACTOR     several types of common factor analysis with orthogonal and oblique
           rotations
FREQ       contingency tables, stratified analysis (Cochran-Mantel-Haenszel
           statistics), relative risk estimates
GLM        analysis of variance for unbalanced designs, MANOVA, analysis of
           covariance, polynomial regression
NPAR1WAY   nonparametric one-way analysis of rank scores
ORTHOREG   orthogonal regression using the Gentleman-Givens method for
           ill-conditioned data
REG        multiple regression with regression diagnostics, stepwise regression,
           and all subsets regression
SCORE      constructs new variables which are linear combinations of coefficients
           (e.g., factor-scoring coefficients or parameter estimates from a
           linear model) and variables in another data set
TTEST      computes t-tests for the means of two groups
As the major features of the SAS/STAT software procedures are already well known
from the SAS User's Guide: Statistics, Version 5 Edition, we concentrate our
discussion on the differences between the mainframe and the SAS/STAT software versions
of these procedures.
In addition to the changes and enhancements described below, Version 6 of the ANOVA,
GLM, and REG procedures can now run interactively. For example, with the REG
procedure you can now look at the printout of diagnostics for the model, decide to
delete an observation, and re-fit the model without ever leaving the procedure. This
saves computer time and, more importantly, allows you to fit a model to your data
more easily and quickly. With the GLM procedure, you can inspect the ANOVA table or
ask to see the expected mean squares before proceeding with the analysis. The following
example (Output 1) shows the usefulness of the interactive features:
Output 1
Using PROC REG Interactively

/*----------------------------------------------------*/
/* Example of an interactive regression analysis      */
/*----------------------------------------------------*/
data class;
   title 'Interactive Analysis with PROC REG';
   input name $ height weight age;
   cards;
alfred   69.0 112.5 14
alice    56.5  84.0 13
barbara  65.3  98.0 13
carol    62.8 102.5 14
henry    63.5 102.5 14
james    57.3  83.0 12
jane     59.8  84.5 12
janet    62.5 112.5 15
jeffrey  62.5  84.0 13
john     59.0  99.5 12
joyce    51.3  50.5 11
judy     64.3  90.0 14
louise   56.3  77.0 12
mary     66.5 112.0 15
philip   72.0 150.0 16
robert   64.8 128.0 12
ronald   67.0 133.0 15
thomas   57.5  85.0 11
william  66.5 112.0 15
proc reg;
   model weight = age height;
   id name;
run;
   print influence;
run;
   delobs 16;
run;
   print; run;
   delete age; run;
endsas;
Interactive Analysis with PROC REG

Model: MODEL1
Dep Variable: WEIGHT

                      Analysis of Variance

                      Sum of          Mean
Source      DF       Squares        Square    F Value    Prob>F

Model        2    7215.63710    3607.81855     27.228    0.0001
Error       16    2120.09974     132.50623
C Total     18    9335.73684

    Root MSE      11.51114     R-Square    0.7729
    Dep Mean     100.02632     Adj R-Sq    0.7445
    C.V.          11.50811

                      Parameter Estimates

                  Parameter       Standard     T for H0:
Variable    DF     Estimate          Error   Parameter=0    Prob > |T|

INTERCEP     1  -141.223763    33.38309350        -4.230        0.0006
AGE          1     1.278393     3.11010374         0.411        0.6865
HEIGHT       1     3.597027     0.90546072         3.973        0.0011
Interactive Analysis with PROC REG

                                    Hat Diag       Cov               INTERCEP
Obs  NAME      Residual  Rstudent          H     Ratio    Dffits      Dfbetas

  1  alfred    -12.3686   -1.2231     0.2043    1.1468   -0.6197       0.4335
  2  alice       5.3727    0.5119     0.2071    1.4532    0.2617       0.1846
  3  barbara   -12.2812   -1.1679     0.1465    1.0952   -0.4839       0.1928
  4  carol      -0.0670   -0.0059     0.0772    1.3152   -0.0017      -0.0001
  5  henry      -2.5849   -0.2256     0.0677    1.2886   -0.0608       0.0047
  6  james       2.7734    0.2474     0.1074    1.3432    0.0858       0.0635
  7  jane       -4.7191   -0.4218     0.1037    1.3070   -0.1435      -0.0481
  8  janet       9.7337    0.9752     0.2504    1.3464    0.5636       0.0779
  9  jeffrey   -16.2095   -1.5110     0.0619    0.8457   -0.3880       0.0029
 10  john       13.1585    1.2209     0.0965    1.0109    0.3989       0.1940
 11  joyce      -6.8660   -0.7101     0.3163    1.6074   -0.4830      -0.4550
 12  judy      -17.9625   -1.7069     0.0643    0.7615   -0.4476       0.1115
 13  louise      0.3705    0.0334     0.1305    1.3955    0.0129       0.0106
 14  mary       -5.1544   -0.4669     0.1251    1.3284   -0.1765       0.0729
 15  philip     11.7836    1.2084     0.2617    1.2441    0.7195      -0.5718
 16  robert     20.7957    2.5578     0.3283    0.6100    1.7883      -0.5029
 17  ronald     14.0471    1.3348     0.1234    0.9886    0.5007      -0.2466
 18  thomas      5.3324    0.5065     0.2025    1.4464    0.2553       0.1113
 19  william    -5.1544   -0.4669     0.1251    1.3284   -0.1765       0.0729

                    AGE     HEIGHT
Obs  NAME       Dfbetas    Dfbetas

  1  alfred      0.3296    -0.5129
  2  alice       0.1651    -0.2242
  3  barbara     0.3470    -0.3822
  4  carol      -0.0009     0.0007
  5  henry      -0.0258     0.0137
  6  james      -0.0089    -0.0282
  7  jane        0.0862    -0.0396
  8  janet       0.5008    -0.4014
  9  jeffrey     0.1493    -0.1280
 10  john       -0.1831     0.0334
 11  joyce      -0.0676     0.3095
 12  judy       -0.1051    -0.0078
 13  louise      0.0011    -0.0067
 14  mary       -0.0945     0.0208
 15  philip      0.1525     0.2414
 16  robert     -1.6002     1.5050
 17  ronald      0.2244    -0.0035
 18  thomas     -0.1798     0.0722
 19  william    -0.0945     0.0208

Sum of Residuals                 -7.38964E-13
Sum of Squared Residuals            2120.0997
Predicted Resid SS (Press)          3272.7219

Observation 16 was deleted.
Interactive Analysis with PROC REG

Model: MODEL1
Dep Variable: WEIGHT

                      Analysis of Variance

                      Sum of          Mean
Source      DF       Squares        Square    F Value    Prob>F

Model        2    7033.50277    3516.75138     35.734    0.0001
Error       15    1476.23335      98.41556
C Total     17    8509.73611

    Root MSE       9.92046     R-Square    0.8265
    Dep Mean      98.47222     Adj R-Sq    0.8034
    C.V.          10.07438

                      Parameter Estimates

                  Parameter       Standard     T for H0:
Variable    DF     Estimate          Error   Parameter=0    Prob > |T|

INTERCEP     1  -126.756480    29.32075393        -4.323        0.0006
AGE          1     5.567408     3.16164072         1.761        0.0986
HEIGHT       1     2.422625     0.90539702         2.676        0.0173
Interactive Analysis with PROC REG

                                    Hat Diag       Cov               INTERCEP
Obs  NAME      Residual  Rstudent          H     Ratio    Dffits      Dfbetas

  1  alfred     -5.8484   -0.6776     0.2703    1.5300   -0.4124       0.2854
  2  alice       1.5018    0.1669     0.2304    1.5887    0.0913       0.0655
  3  barbara    -5.8173   -0.6474     0.2114    1.4276   -0.3352       0.1449
  4  carol      -0.8281   -0.0840     0.0781    1.3321   -0.0245      -0.0020
  5  henry      -2.5239   -0.2552     0.0677    1.3011   -0.0688       0.0053
  6  james       4.1312    0.4293     0.1103    1.3292    0.1511       0.1036
  7  jane       -0.4254   -0.0445     0.1323    1.4170   -0.0174      -0.0035
  8  janet       4.3313    0.5072     0.2957    1.6535    0.3287       0.0658
  9  jeffrey   -13.0339   -1.4126     0.0775    0.8940   -0.4095       0.0382
 10  john       16.5127    1.9202     0.1139    0.6884    0.6886       0.2502
 11  joyce      -8.2657   -1.0106     0.3193    1.4629   -0.6922      -0.6499
 12  judy      -16.9620   -1.9212     0.0659    0.6525   -0.5102       0.1384
 13  louise      0.5538    0.0578     0.1306    1.4137    0.0224       0.0180
 14  mary       -5.8592   -0.6186     0.1259    1.2977   -0.2347       0.0913
 15  philip     13.2489    1.6439     0.2651    0.9856    0.9873      -0.7864
 16  robert                           0.4888
 17  ronald     13.9295    1.5714     0.1234    0.8619    0.5896      -0.2834
 18  thomas     11.2140    1.3457     0.2563    1.1481    0.7899       0.2307
 19  william    -5.8592   -0.6186     0.1259    1.2977   -0.2347       0.0913

                    AGE     HEIGHT
Obs  NAME       Dfbetas    Dfbetas

  1  alfred      0.2698    -0.3591
  2  alice       0.0617    -0.0786
  3  barbara     0.2681    -0.2841
  4  carol      -0.0129     0.0099
  5  henry      -0.0244     0.0130
  6  james      -0.0260    -0.0299
  7  jane        0.0121    -0.0078
  8  janet       0.2961    -0.2509
  9  jeffrey     0.2170    -0.1974
 10  john       -0.3895     0.1824
 11  joyce      -0.1176     0.4148
 12  judy       -0.0588    -0.0473
 13  louise      0.0013    -0.0097
 14  mary       -0.1159     0.0331
 15  philip      0.1176     0.3398
 16  robert
 17  ronald      0.2281    -0.0075
 18  thomas     -0.6111     0.3545
 19  william    -0.1159     0.0331

Sum of Residuals                  -3.2685E-13
Sum of Squared Residuals            1476.2333
Predicted Resid SS (Press)          2109.4445
Interactive Analysis with PROC REG

Model: MODEL1
Dep Variable: WEIGHT

                      Analysis of Variance

                      Sum of          Mean
Source      DF       Squares        Square    F Value    Prob>F

Model        1    6728.33067    6728.33067     60.432    0.0001
Error       16    1781.40544     111.33784
C Total     17    8509.73611

    Root MSE      10.55167     R-Square    0.7907
    Dep Mean      98.47222     Adj R-Sq    0.7776
    C.V.          10.71538

                      Parameter Estimates

                  Parameter       Standard     T for H0:
Variable    DF     Estimate          Error   Parameter=0    Prob > |T|

INTERCEP     1  -137.682825    30.48004126        -4.517        0.0004
HEIGHT       1     3.796705     0.48839879         7.774        0.0001
The ANOVA Procedure
You can use the ANOVA procedure in interactive mode. After specifying and running a
model, a variety of statements (such as MEANS, MANOVA, TEST, and REPEATED) can be
executed without recalculating the model sum of squares.
The CLM option, specified in the MEANS statement, requests that the results of the
BON, GABRIEL, SCHEFFE, SIDAK, SMM, T, and LSD options be presented as confidence
intervals for the mean of each level of the variables specified in the MEANS
statement.
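A minimal sketch of such an interactive session, using only the statements and options named above. The data set YIELD and its variables are hypothetical and serve only as an illustration, and the syntax is assumed to follow the Version 6 conventions described in this paper.

```sas
/* Hypothetical balanced one-way layout; names and values are illustrative only */
data yield;
   input variety $ y @@;
   cards;
a 12 a 14 a 13 b 17 b 16 b 18 c 11 c 10 c 12
;
run;

proc anova data=yield;
   class variety;
   model y = variety;
run;                          /* the model is fitted; PROC ANOVA stays active  */
   means variety / bon clm;   /* Bonferroni results shown as confidence limits */
run;                          /* the model sums of squares are not recomputed  */
```

The second RUN group reuses the fitted model, so further MEANS or TEST statements can be submitted one at a time in the same way.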
The DISCRIM Procedure
The following statements are new in Version 6: FREQ, WEIGHT, TESTFREQ, and
TESTWEIGHT.
With the FREQ statement you can name a variable which represents the frequency of
occurrence for the other values in the observation. If you want to use relative
weights for each observation in the input data set, place the weights in a variable
in the data set and specify the name in a WEIGHT statement. If a variable in the
TESTDATA= data set represents the frequency of occurrence for the other values in the
observation, include the variable's name in a TESTFREQ statement. If you want to use
relative weights for each observation in the TESTDATA= data set, then specify the
variable's name in a TESTWEIGHT statement.
The following new or enhanced options are now available in the PROC DISCRIM statement:
TSSCP
prints the total-sample corrected SSCP matrix.
WSSCP
prints the within-class corrected SSCP matrix
for each class level.
BSSCP
prints the between-class corrected SSCP matrix.
TCOV
prints total-sample covariances.
WCOV
prints within-class covariances for each class level.
PCOV
prints pooled within-class covariances.
BCOV
prints between-class covariances.
TCORR
prints total-sample correlations.
WCORR
prints within-class correlations for each class level.
PCORR
prints pooled within-class correlations.
BCORR
prints between-class correlations.
ANOVA
prints univariate statistics for testing the hypothesis
that the class means are equal in the population for each
variable.
MANOVA
prints multivariate statistics for testing the hypothesis
that the class means are equal in the population.
MAHALANOBIS
prints Mahalanobis distances between classes.
ALL
prints all the statistics.
NOPRINT
suppresses the printout.
SINGULAR
specifies the criterion for determining the singularity of
a covariance matrix.
OUTSTAT=
names an output SAS data set containing various statistics
such as means, standard deviations, correlations, and
coefficients of the discriminant functions.
TESTOUT=
names an output SAS data set containing all the data from
the TESTDATA= data set plus the posterior probabilities and
the class into which each observation is classified.
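As a hedged illustration of how these statements and options might be combined, the following sketch classifies a small test data set that carries frequency counts. The data sets TRAIN, NEWOBS, SCORED, and STATS and the variables GROUP, X1, X2, and COUNT are hypothetical; only the statement and option names come from the description above.

```sas
/* Hypothetical calibration data: COUNT is how often each row occurred */
data train;
   input group $ x1 x2 count;
   cards;
a 1.0 2.1 3
a 1.4 2.5 2
a 0.9 1.8 1
b 3.2 4.0 2
b 2.8 3.6 4
b 3.5 4.4 1
;
run;

/* Hypothetical observations to be classified */
data newobs;
   input x1 x2 count;
   cards;
1.2 2.0 1
3.0 3.9 2
;
run;

proc discrim data=train testdata=newobs testout=scored outstat=stats
             anova manova;
   class group;
   var x1 x2;
   freq count;       /* frequency of occurrence in the DATA= data set     */
   testfreq count;   /* frequency of occurrence in the TESTDATA= data set */
run;
```

Under these assumptions, SCORED would hold the NEWOBS rows with their posterior probabilities and assigned class, and STATS would hold the summary statistics for later steps.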
The GLM Procedure
You can use the GLM procedure in interactive mode. After specifying and running a
model, a variety of statements (such as MEANS, MANOVA, TEST, and REPEATED) can be
executed without recalculating the model sum of squares.
The CLM option, specified in the MEANS statement, requests that the results of the
BON, GABRIEL, SCHEFFE, SIDAK, SMM, T, and LSD options be presented as confidence
intervals for the mean of each level of the variables specified in the MEANS
statement.
In addition to the options available in Version 5, the following option is available
in the PROC GLM statement:
OUTSTAT=SASdataset
names an output data set that will contain sums of squares, F statistics,
and probability levels for each effect in the model, as well as for each
CONTRAST statement used. If the CANONICAL option of the MANOVA statement
is used and there is no M= specification, the data set also contains the
results of the canonical analysis.
The following example (Output 2) illustrates the OUTSTAT= option of the GLM procedure
and shows how to write a customised report of the analysis of variance results.
Output 2
PROC GLM Shows Use of Output Data Set

/*--------------------------------------------------*/
/* GLM - shows use of output dataset                */
/*--------------------------------------------------*/
options nodate nonumber;
title1;
data test;
   input a b y aa $ bb $;
   cards;
1 3 18 level1 level3
1 1 20 level1 level1
1 1 22 level1 level1
1 1 24 level1 level1
1 4 18 level1 level4
1 2 14 level1 level2
1 2 19 level1 level2
1 2 15 level1 level2
1 3 19 level1 level3
1 3 21 level1 level3
2 4 14 level2 level4
2 1 15 level2 level1
2 1 16 level2 level1
2 2 19 level2 level2
2 4 12 level2 level4
2 2 24 level2 level2
2 3 10 level2 level3
1 4 19 level1 level4
2 3 18 level2 level3
2 3 20 level2 level3
run;
proc glm data=test outstat=new;
   class aa bb;
   model y=aa|bb/ss4;
   means aa|bb/duncan;
run;
/* PRINT OUT THE STATISTICS */
proc print data=new;
run;
data _null_;
   file print;
   set new;
   retain edf;
   if _type_ = 'ERROR' then edf = df;
   else do;
      put @1 'F test for effect ' _source_ $ ' was ' f 8.2 ', with ' df 3.
          ' and ' edf 3. ' degrees of freedom.' /
          @1 'Probability of a larger F is ' prob 6.4 ', which is' @;
      if prob < .001 then put ' highly ' @;
      else if prob < .01 then put ' very ' @;
      else if prob < .05 then put ' ' @;
      else put ' not ' @;
      put 'significant.' /;
   end;
   put /;
run;
General Linear Models Procedure
Class Level Information

Class    Levels    Values
AA            2    level1 level2
BB            4    level1 level2 level3 level4

Number of observations in data set = 20

General Linear Models Procedure

Dependent Variable: Y
                                  Sum of          Mean
Source              DF           Squares        Square    F Value    Pr > F
Model                7      164.38333333   23.48333333       2.87    0.0523
Error               12       98.16666667    8.18055556
Corrected Total     19      262.55000000

        R-Square          C.V.      Root MSE          Y Mean
        0.626103     16.023345     2.8601671     17.85000000

Source              DF        Type IV SS   Mean Square    F Value    Pr > F
AA                   1        29.0083333    29.0083333       3.55    0.0841
BB                   3        25.5799320     8.5266440       1.04    0.4091
AA*BB                3       105.9799320    35.3266440       4.32    0.0278
General Linear Models Procedure

Duncan's Multiple Range Test for variable: Y

NOTE: This test controls the type I comparisonwise error rate,
      not the experimentwise error rate

Alpha= 0.05  df= 12  MSE= 8.180556
WARNING: Cell sizes are not equal.
Harmonic Mean of cell sizes= 9.9

Number of Means           2
Critical Range    2.7956226

Means with the same letter are not significantly different.

Duncan Grouping        Mean      N  AA

              A      19.000     11  level1
              A
              A      16.444      9  level2

General Linear Models Procedure

Duncan's Multiple Range Test for variable: Y

NOTE: This test controls the type I comparisonwise error rate,
      not the experimentwise error rate

Alpha= 0.05  df= 12  MSE= 8.180556
WARNING: Cell sizes are not equal.
Harmonic Mean of cell sizes= 4.897959

Number of Means           2          3          4
Critical Range    3.9745555  4.1634834   4.289752

Means with the same letter are not significantly different.

Duncan Grouping        Mean      N  BB

              A      19.400      5  level1
              A
              A      18.200      5  level2
              A
              A      17.667      6  level3
              A
              A      15.750      4  level4

General Linear Models Procedure

Level of   Level of         ---------------Y---------------
AA         BB          N          Mean               SD

level1     level1      3    22.0000000       2.00000000
level1     level2      3    16.0000000       2.64575131
level1     level3      3    19.3333333       1.52752523
level1     level4      2    18.5000000       0.70710678
level2     level1      2    15.5000000       0.70710678
level2     level2      2    21.5000000       3.53553391
level2     level3      3    16.0000000       5.29150262
level2     level4      2    13.0000000       1.41421356
OBS    _NAME_    _SOURCE_    _TYPE_    DF         SS          F        PROB

  1    Y         ERROR       ERROR     12     98.167
  2    Y         AA          SS4        1     29.008    3.54601    0.084148
  3    Y         BB          SS4        3     25.580    1.04231    0.409132
  4    Y         AA*BB       SS4        3    105.980    4.31837    0.027764
F test for effect AA was 3.55, with 1 and 12 degrees of freedom.
Probability of a larger F is 0.0841, which is not significant.
F test for effect BB was 1.04, with 3 and 12 degrees of freedom.
Probability of a larger F is 0.4091, which is not significant.
F test for effect AA*BB was 4.32, with 3 and 12 degrees of freedom.
Probability of a larger F is 0.0278, which is significant.
The REG Procedure
The PRINT, ADD, DELETE, and DELOBS statements have been added to the REG procedure.
The PRINT statement allows you to set many options interactively. The ADD statement
adds independent variables to the regression model, and the DELETE statement deletes
independent variables from the regression model. The DELOBS statement deletes
observations from the computation.
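A minimal sketch of the ADD statement, complementing the DELOBS and DELETE statements shown in Output 1. It assumes the same layout as the CLASS data set used there; only the first six observations are repeated to keep the sketch self-contained, so the results will differ from those in Output 1.

```sas
/* First six observations of the CLASS data from Output 1 */
data class;
   input name $ height weight age;
   cards;
alfred   69.0 112.5 14
alice    56.5  84.0 13
barbara  65.3  98.0 13
carol    62.8 102.5 14
henry    63.5 102.5 14
james    57.3  83.0 12
;
run;

proc reg data=class;
   model weight = height;   /* start with HEIGHT as the only regressor */
run;
   add age;                 /* add AGE to the current model            */
   print;                   /* print the refitted model                */
run;
```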
PROC REG in Version 6 includes the stepwise regression methods and the all subsets
regression method implemented in Version 5 in the procedures STEPWISE and RSQUARE.
The method used to select the model is now specified using the METHOD= option of the
MODEL statement. All other options of the STEPWISE and RSQUARE procedures in Version
5 of the SAS System can also be used as options of the MODEL statement in the
REG procedure of SAS/STAT software.
The SCORE Procedure
RESIDUAL is a new option that can appear in the PROC SCORE statement. It reverses the
sign of each score.
Conclusion
SAS/STAT software offers the same reliable methods known from Version 5 of the SAS
System. In addition, the interactivity of the ANOVA, GLM, and REG procedures makes it
easier and quicker to fit a model to the data. The GLM and DISCRIM procedures can
produce OUTSTAT= data sets for subsequent analysis or report writing.
It is intended that an upcoming release of SAS/STAT software will add all the
statistical procedures that are available in the base SAS System on the mainframe.