SAS EUROPEAN USER GROUP MEETING
APRIL 6-8 1983
GLMOUT
A SAS program to read PROC GLM output
D A Budgett & A Eastwood*
ICI Pharmaceuticals Division
Macclesfield, England
The Problem
In our work in the pharmaceutical industry, we spend a great deal of
time analysing the results of clinical and animal studies of new drugs
and veterinary products. Frequently the trialist has recorded many
variables at many time points. We need to provide tables of least
squares means and p-values. We use PROC GLM for the analysis but the
labour of working through perhaps 200 pages of output to copy means
and p-values and calculate least significant differences (LSDs) is
great.
SAS has very good facilities for preparing reports from a SAS data set
but unfortunately PROC GLM does not output means or least squares
means to a data set. It is possible to write SAS programs to read the
printed output but not really worthwhile to do so for a single job.
There was therefore a need for a general purpose program to read PROC
GLM printed output on to a SAS data set.
* Present address: Queens' College, Cambridge.
Our Solution
We already had a special program to do this for toxicity studies in
animals. We wrote this in 1978 using SAS. We chose SAS because it
provides a very powerful language for data capture and manipulation,
far more so than FORTRAN. In particular:
- Trailing @ symbol in INPUT
- Free format reading of words and numbers
- String processing and scanning
- Relational data base management
An additional feature introduced since then is the set of structured
programming statements IF, ELSE, DO and END. The program has
proved very serviceable and has saved hundreds of man-hours in the
preparation of routine tables. However, it is specifically designed
for toxicity trials. We do many other trials that require tables of
means but requirements are more variable. These are mostly
human/clinical trials and animal husbandry trials.
We defined, as generally as possible, what kind of program was
required. We decided this should:
- Read PROC GLM output, ignore anything else.
- Accept any number of BY and CLASS variables, with any names.
- Read all means, least squares means and p-values between means
  that were printed.
- Read relevant parts of analysis - residual sd, df, contrasts,
  estimates.
- Merge all these coherently.
We did not attempt to generalise the report writing since SAS already
does this. We did prepare a general-purpose routine that prints the
means in one particular way.
Use of Program
All the SAS output is diverted to disk before a utility copies it to
the printer in the usual way. The user supplies the names of BY and
CLASS variables beginning with the treatment, together with the source
names of any Type IV sums of squares required. The SAS code to read
the output may be followed by a report-writing section or the FINAL
and CONTREST data sets can be allocated to permanent storage.
After the program is run we normally screen-edit the resulting table
to get exactly the desired layout.
All the results are presented in two SAS data sets: FINAL and
CONTREST. The data set FINAL contains either one observation per mean
or one per p-value depending on whether the PDIFF option was used.
The only p-values included are those which compare treatments at a
constant level of other factors. The list of variables is shown in
Table 1. They are chosen to provide everything a report writer is
likely to need.
For example, INTR can be used to separate treatment by sex means from
treatment means. SOURS1 might be a covariate which is only used in
some of the analyses. T5P0, T2P5 and RSD can be used to calculate
least significant differences. CMEAN and MEAN (or CLSMEAN and LSMEAN)
can be used to calculate percentage changes.
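As a rough illustration of how a report writer might use these variables, the sketch below computes a least significant difference and a percentage change for one observation. It is written in Python rather than SAS. The formula t x RSD x sqrt(1/N + 1/CN) is the usual textbook one for two ordinary means and is an assumption here, since the paper does not state which formula was used. The function names and the figures are invented for the example.

```python
import math

def lsd(t_value, rsd, n, cn):
    """Least significant difference between two ordinary means.

    Assumes the textbook formula t * RSD * sqrt(1/N + 1/CN); t_value
    would be T5P0 or T2P5, and rsd, n, cn the variables RSD, N and CN.
    """
    return t_value * rsd * math.sqrt(1.0 / n + 1.0 / cn)

def percent_change(mean, cmean):
    """Percentage change of MEAN relative to the comparison mean CMEAN."""
    return 100.0 * (mean - cmean) / cmean

# Made-up values for illustration only.
example_lsd = lsd(t_value=2.0, rsd=3.0, n=9, cn=9)
example_change = percent_change(mean=55.0, cmean=50.0)
```

For least squares means the standard errors (LSSE) would be the more natural input than N and CN.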
The data set CONTREST has any CONTRASTS or ESTIMATES that are found.
These could be merged with the FINAL observations or used separately.
Table 2 contains a list of variables in CONTREST.
Program Design
The first section of the program accepts details of BY and CLASS
variables and source names of Type IV sums of squares required. These
are translated into SAS code - RETAIN and RENAME statements and
variable lists - all contained within MACROS. The code is written to
a temporary data set and concatenated back into the stream of SAS
code. In this way we avoid having too many or too few variables to
hold the information or asking the user to follow detailed
instructions for modifying code.
Separate data sets are used to hold analysis of variance, ordinary
means, least squares means, p-values and contrasts/estimates.
The program starts at the top of each page and works down line by
line. It looks first for BY variables (= on the line) and for the word
GENERAL in the page heading.
If it is a page produced by PROC GLM it looks further to identify a
page with class level information, analysis of variance, means or
least squares means.
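The page recognition just described might be sketched as follows. This is a Python illustration, not the authors' SAS code. The heading texts tested for, the number of heading lines examined and the sample page are all assumptions about the layout of a PROC GLM listing, and the function names are hypothetical.

```python
import re

def by_values(lines):
    """Collect NAME=VALUE pairs from the first few lines of a page."""
    pairs = {}
    for line in lines[:4]:
        for name, value in re.findall(r"(\w+)=(\S+)", line):
            pairs[name] = value
    return pairs

def classify_page(lines):
    """Return a rough page type for one page of listing output."""
    heading = " ".join(lines[:3]).upper()
    if "GENERAL" not in heading:
        return "OTHER"                 # not produced by PROC GLM
    body = " ".join(lines).upper()
    if "CLASS LEVEL INFORMATION" in body:
        return "CLASS"
    if "SUM OF SQUARES" in body:
        return "ANOVA"
    # Test the longer phrase first: it also contains the word MEANS.
    if "LEAST SQUARES MEANS" in body:
        return "LSMEANS"
    if "MEANS" in body:
        return "MEANS"
    return "OTHER"

# Hypothetical page, for illustration only.
page = [
    "GENERAL LINEAR MODELS PROCEDURE",
    "SEX=M WEEK=4",
    "",
    "LEAST SQUARES MEANS",
]
page_type = classify_page(page)
by = by_values(page)
```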
For a means page, the program firstly reads the heading of class
variables and dependent variables. Then each line is deciphered,
allocating class levels to their respective variables and reading each
mean in turn.
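A minimal sketch of this step, in Python rather than SAS, is given below. It assumes a simplified layout in which the class variable columns come first, followed by N and then one column per dependent variable. The function name, the argument names and the sample figures are invented for the illustration.

```python
def parse_means_page(heading, rows, class_names):
    """Turn a means table into one record per mean.

    heading     -- the column heading line, e.g. "TRT SEX N WT HT"
    rows        -- the table lines below the heading
    class_names -- CLASS variable names supplied by the user
    """
    columns = heading.split()
    n_class = sum(1 for name in columns if name in class_names)
    dependents = columns[n_class + 1:]      # the columns after N
    records = []
    for row in rows:
        fields = row.split()
        if len(fields) != len(columns):
            continue                        # skip blank or odd lines
        # Allocate class levels to their respective variables.
        levels = dict(zip(columns[:n_class], fields[:n_class]))
        n = int(fields[n_class])
        # Read each mean in turn, one record per dependent variable.
        for dep, value in zip(dependents, fields[n_class + 1:]):
            records.append({"DEP": dep, **levels, "N": n, "MEAN": float(value)})
    return records

# Hypothetical table, for illustration only.
records = parse_means_page(
    "TRT SEX N WT HT",
    ["A F 5 12.3 45.6", "", "B F 6 13.1 44.0"],
    {"TRT", "SEX"},
)
```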
For a least squares mean page without p-values, the logic is the same.
If there are p-values, the class levels are read in the same way and
each p-value goes into a different observation. If p-values run on to
several pages there is no problem because they are identified by the
I/J values printed by SAS.
For an analysis of variance page, several items can appear on a page.
After the analysis of variance there may be a solution, contrasts, or
estimates. Each of these is recognised by its heading and special
code is provided to read that section of the page. Then we return to
the same search point to look for the next heading or the next page,
whichever comes first.
When a class level page is recognised some initialising is done and a
key variable called CLASSNO is incremented.
The data set containing contrasts and estimates is not altered. The
others are all merged into a single data set. First, they are sorted
by dependent variable, class page number (CLASSNO), the effect that
defines a mean and its row number in the table printed by SAS. The
ordinary means, least squares means and p-values are merged. This
data set is re-sorted as above but with the p-value's column number
replacing the row number. The means are re-named: CMEAN, CLSMEAN
etc. This data set is merged with the means and least squares means
again so that each p-value is now matched with the two means that it
compares and their class levels. Finally, the analysis of variance
details are merged in. Then each observation contains: a p-value,
the means that it compares, the treatment and other levels of those
means, the residual standard deviation and degrees of freedom and
related t-values and any other sums of squares requested.
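The effect of this merging can be illustrated with the Python sketch below. It uses a dictionary lookup in place of the sort-and-merge steps described above, so it shows the result rather than the authors' method. The key names ROW, COL and EFFECT and all the data values are hypothetical.

```python
def merge_results(means, pvalues, anova):
    """Attach to each p-value the two means it compares.

    means   -- records identified by DEP, CLASSNO, EFFECT and ROW
    pvalues -- records with the same keys plus COL (the column number)
    anova   -- dict from (DEP, CLASSNO) to residual details
    """
    by_row = {(m["DEP"], m["CLASSNO"], m["EFFECT"], m["ROW"]): m for m in means}
    final = []
    for p in pvalues:
        group = (p["DEP"], p["CLASSNO"], p["EFFECT"])
        row_mean = by_row[group + (p["ROW"],)]   # the mean in row I
        col_mean = by_row[group + (p["COL"],)]   # the mean in column J
        record = {
            "DEP": p["DEP"], "CLASSNO": p["CLASSNO"], "P": p["P"],
            "VARV1": row_mean["VARV1"], "LSMEAN": row_mean["LSMEAN"],
            "CVARV1": col_mean["VARV1"], "CLSMEAN": col_mean["LSMEAN"],
        }
        record.update(anova[(p["DEP"], p["CLASSNO"])])
        final.append(record)
    return final

# Made-up records, for illustration only.
means = [
    {"DEP": "WT", "CLASSNO": 1, "EFFECT": "TRT", "ROW": 1,
     "VARV1": "CONTROL", "LSMEAN": 50.0},
    {"DEP": "WT", "CLASSNO": 1, "EFFECT": "TRT", "ROW": 2,
     "VARV1": "DRUG", "LSMEAN": 55.0},
]
pvalues = [{"DEP": "WT", "CLASSNO": 1, "EFFECT": "TRT",
            "ROW": 2, "COL": 1, "P": 0.03}]
anova = {("WT", 1): {"RSD": 3.0, "RDF": 16}}
final = merge_results(means, pvalues, anova)
```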
Table 1.
Variables in data set FINAL

Name       Description

DEP        The dependent variable.
VARV1      The value of the treatment variable.
VARV2      The values of other CLASS/BY variables
VARV3      in the order you specified.
etc.
CLASSNO    The serial number of the class page
           immediately preceding the mean.
PAGE       The page number of the p-value or of the
           analysis of variance.
N          The number of results shown on the
           ordinary means page.
MEAN       The ordinary mean.
LSMEAN     The least squares mean.
LSSE       The standard error of the least squares
           mean as printed by SAS.
INTR       A coded variable for the CLASS variables
           defining the means. The code is similar
           to that for BYV. May be a useful sort key.
BYV        A coded variable for the BY variables.
           Its value is the sum over i of a_i * 2^i,
           where a_i is 1 if the ith variable
                 appeared at the top of the page,
                 a_i is 0 otherwise.
P          The p-value comparing LSMEAN with CLSMEAN.
CVARV1     The value of the treatment variable for CN,
           CMEAN, CLSMEAN.
CN         The value of N at this level of treatment
           but with same values of VARV2, VARV3 etc.
CMEAN      The corresponding ordinary mean.
CLSMEAN    The corresponding least squares mean.
RSD        The residual standard deviation.
RDF        The degrees of freedom for this.
T5P0       Student's t (5% on RDF degrees of freedom).
T2P5       Student's t (2 1/2% on RDF degrees of
           freedom).
SOURS1     The Type IV sum of squares for the first
           source of variation named following the
           word SOURCE in the "USERS" dataset.
SOURD1     The degrees of freedom.
SOURP1     The p-value.
SOURS2  )
SOURD2  )  As above for second source of variation.
SOURP2  )
etc.
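The coding of BYV, a sum of powers of two over the BY variables that appear at the top of the page, might be sketched in Python as below. Numbering the variables from 1 is an assumption, since the table does not say where the index i starts. The function name and the variable names SEX, WEEK and CENTRE are invented for the example.

```python
def byv_code(by_names, names_on_page):
    """Sum of 2**i over the BY variables that appear on the page.

    by_names      -- BY variables in the order the user supplied them
    names_on_page -- names found at the top of the current page
    The index i is assumed to start at 1.
    """
    return sum(2 ** i
               for i, name in enumerate(by_names, start=1)
               if name in names_on_page)

# SEX is variable 1 and CENTRE is variable 3, so the code is 2**1 + 2**3 = 10.
code = byv_code(["SEX", "WEEK", "CENTRE"], {"SEX", "CENTRE"})
```

Because each BY variable contributes a distinct power of two, every combination of variables present gives a different code, which is what makes the value usable as a sort key.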
Table 2.
Variables in CONTREST

Name       Description

CE         Has value 'C' for a CONTRAST
           and 'E' for an ESTIMATE.
EFFECT     The name of the effect, as given in the
           CONTRAST or ESTIMATE statement.
ESTIMATE   The estimate (ESTIMATE statement only).
SE         The standard error (ESTIMATE only).
DF         The degrees of freedom (CONTRAST only).
SS         The sum of squares (CONTRAST only).
P          The p-value.
DEP        The dependent variable.
VARV1      The value of the treatment variable.
VARV2      The values of BY variables.
etc.
RDF        The degrees of freedom for this.
RSD        The residual standard deviation.
CLASSNO    The serial number of the class page
           immediately preceding the contrast/estimate.
BYV        A coded variable for the BY variables.
           Its value is the sum over i of a_i * 2^i,
           where a_i is 1 if the ith variable
                 appeared at the top of the page,
                 a_i is 0 otherwise.
PAGE       The page number of the p-value or of the
           analysis of variance.
T2P5       Student's t (2 1/2% on RDF degrees of
           freedom).
T5P0       Student's t (5% on RDF degrees of freedom).