SAS EUROPEAN USER GROUP MEETING, APRIL 6-8 1983

GLMOUT
A SAS program to read PROC GLM output

D A Budgett & A Eastwood*
ICI Pharmaceuticals Division
Macclesfield, England

* Present address: Queens' College, Cambridge.

The Problem

In our work in the pharmaceutical industry, we spend a great deal of time analysing the results of clinical and animal studies of new drugs and veterinary products. Frequently the trialist has recorded many variables at many time points. We need to provide tables of least squares means and p-values. We use PROC GLM for the analysis, but the labour of working through perhaps 200 pages of output to copy means and p-values and calculate least significant differences (LSDs) is great. SAS has very good facilities for preparing reports from a SAS data set, but unfortunately PROC GLM does not output means or least squares means to a data set. It is possible to write SAS programs to read the printed output, but it is not really worthwhile to do so for a single job. There was therefore a need for a general purpose program to read PROC GLM printed output on to a SAS data set.

Our Solution

We already had a special program to do this for toxicity studies in animals. We wrote this in 1978 using SAS. We chose SAS because it provides a very powerful language for data capture and manipulation, far more so than FORTRAN. In particular:

    Trailing @ symbol in INPUT
    Free format reading of words and numbers
    String processing and scanning
    Relational data base management

Additional features introduced since then are the structured programming concepts of IF, ELSE, DO, END. The program has proved very serviceable and has saved hundreds of man-hours in the preparation of routine tables. However, it is specifically designed for toxicity trials. We do many other trials that require tables of means, but their requirements are more variable. These are mostly human/clinical trials and animal husbandry trials.
We defined, as generally as possible, what kind of program was required. We decided this should:

    Read PROC GLM output and ignore anything else.
    Accept any number of BY and CLASS variables, with any names.
    Read all means, least squares means and p-values between means that were printed.
    Read relevant parts of the analysis - residual sd, df, contrasts, estimates.
    Merge all these coherently.

We did not attempt to generalise the report writing since SAS already does this. We did prepare a general-purpose routine that prints the means in one particular way.

Use of Program

All the SAS output is diverted to disk before a utility copies it to the printer in the usual way. The user supplies the names of BY and CLASS variables, beginning with the treatment, together with the source names of any Type IV sums of squares required. The SAS code to read the output may be followed by a report-writing section, or the FINAL and CONTREST data sets can be allocated to permanent storage. After the program is run we normally screen-edit the resulting table to get exactly the desired layout.

All the results are presented in two SAS data sets: FINAL and CONTREST.

The data set FINAL contains either one observation per mean or one per p-value, depending on whether the PDIFF option was used. The only p-values included are those which compare treatments at a constant level of other factors. The list of variables is shown in Table 1. They are chosen to provide everything a report writer is likely to need. For example, INTR can be used to separate treatment by sex means from treatment means. SOURS1 might be a covariate which is only used in some of the analyses. T5P0, T2P5 and RSD can be used to calculate least significant differences. CMEAN and MEAN (or CLSMEAN and LSMEAN) can be used to calculate percentage changes.

The data set CONTREST has any CONTRASTS or ESTIMATES that are found. These could be merged with the FINAL observations or used separately. Table 2 contains a list of variables in CONTREST.
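As an illustration of how a report writer might use these variables, the following Python sketch computes a least significant difference and a percentage change from one FINAL observation. It is not part of the SAS program described here. It assumes the usual formula LSD = t x RSD x sqrt(1/N + 1/CN), which is only approximate for least squares means, and it assumes CMEAN is the baseline for the percentage change. The function names and the numbers in the observation are invented for the example.

```python
import math


def least_significant_difference(t_value, rsd, n, cn):
    """LSD between two means: t * RSD * sqrt(1/N + 1/CN) (assumed formula)."""
    return t_value * rsd * math.sqrt(1.0 / n + 1.0 / cn)


def percentage_change(mean, cmean):
    """Change of MEAN relative to the comparison mean CMEAN, in percent."""
    return 100.0 * (mean - cmean) / cmean


# One hypothetical FINAL observation; all values are invented.
# T2P5 = 2.145 is Student's t at 2 1/2% on RDF = 14 degrees of freedom.
obs = {"MEAN": 12.0, "CMEAN": 10.0, "N": 8, "CN": 8, "RSD": 2.0, "T2P5": 2.145}

lsd = least_significant_difference(obs["T2P5"], obs["RSD"], obs["N"], obs["CN"])
change = percentage_change(obs["MEAN"], obs["CMEAN"])
# The two means differ significantly if their difference exceeds the LSD.
significant = abs(obs["MEAN"] - obs["CMEAN"]) > lsd
```

With these invented values the difference of 2.0 falls just short of the LSD, so the comparison would not be flagged even though the percentage change is 20%.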
Program Design

The first section of the program accepts details of BY and CLASS variables and source names of Type IV sums of squares required. These are translated into SAS code - RETAIN and RENAME statements and variable lists - all contained within MACROS. The code is written to a temporary data set and concatenated back into the stream of SAS code. In this way we avoid having too many or too few variables to hold the information, and we avoid asking the user to follow detailed instructions for modifying code.

Separate data sets are used to hold analysis of variance, ordinary means, least squares means, p-values and contrasts/estimates.

The program starts at the top of each page and works down line by line. It looks first for BY variables (an = sign on the line) and the word GENERAL in the page heading. If it is a page produced by PROC GLM, it looks further to identify a page with class level information, analysis of variance, means or least squares means.

For a means page, the program first reads the heading of class variables and dependent variables. Then each line is deciphered, allocating class levels to their respective variables and reading each mean in turn.

For a least squares mean page without p-values, the logic is the same. If there are p-values, the class levels are read in the same way and each p-value goes into a different observation. If p-values run on to several pages there is no problem, because they are identified by the I/J values printed by SAS.

For an analysis of variance page, several items can appear on a page. After the analysis of variance there may be a solution, contrasts, or estimates. Each of these is recognised by its heading, and special code is provided to read that section of the page. Then we return to the same search point to look for the next heading or the next page, whichever comes first.

When a class level page is recognised, some initialising is done and a key variable called CLASSNO is incremented.
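The page-by-page search can be pictured with a short Python sketch. It is an illustration only, not the SAS program itself. The heading fragments, the assumption that the BY line comes before the procedure heading, and the names `PAGE_KINDS`, `BY_PATTERN` and `classify_page` are all assumptions made for the example. Real PROC GLM output may be worded differently.

```python
import re

# Heading fragments assumed to identify each kind of PROC GLM page.
# "LEAST SQUARES MEANS" is tested before "MEANS" because it contains it.
PAGE_KINDS = [
    ("class_levels", "CLASS LEVEL INFORMATION"),
    ("lsmeans", "LEAST SQUARES MEANS"),
    ("anova", "DEPENDENT VARIABLE"),
    ("means", "MEANS"),
]

# A BY variable is assumed to be printed as NAME=value.
BY_PATTERN = re.compile(r"(\w+)=(\S+)")


def classify_page(lines):
    """Return (kind, by_values) for one printed page given as a list of lines."""
    by_values = {}
    is_glm = False
    for line in lines:
        text = line.upper()
        if not is_glm:
            if "GENERAL" in text:
                # The procedure heading marks this as a PROC GLM page.
                is_glm = True
            elif "=" in text:
                # Collect BY variables printed above the procedure heading.
                by_values.update(BY_PATTERN.findall(text))
            continue
        for kind, heading in PAGE_KINDS:
            if heading in text:
                return kind, by_values
    # Pages from other procedures, or with no known heading, are ignored.
    return "other", by_values


page = [
    "SEX=M DOSE=2",
    "GENERAL LINEAR MODELS PROCEDURE",
    "LEAST SQUARES MEANS",
    "TRT   WEIGHT LSMEAN",
]
kind, by_values = classify_page(page)
```

A full reader would go on to decipher the lines below the recognised heading. The sketch stops at the point where the kind of page is known.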
The data set containing contrasts and estimates is not altered. The others are all merged into a single data set. First, they are sorted by dependent variable, class page number (CLASSNO), the effect that defines a mean, and its row number in the table printed by SAS. The ordinary means, least squares means and p-values are merged. This data set is re-sorted as above but with the p-value's column number replacing the row number. The means are re-named: CMEAN, CLSMEAN etc. This data set is merged with the means and least squares means again, so that each p-value is now matched with the two means that it compares and their class levels. Finally, the analysis of variance details are merged in. Then each observation contains: a p-value, the means that it compares, the treatment and other levels of those means, the residual standard deviation and degrees of freedom and related t-values, and any other sums of squares requested.

Table 1. Variables in data set FINAL

Name      Description
DEP       The dependent variable.
VARV1     The value of the treatment variable.
VARV2     The values of other CLASS/BY variables
VARV3     in the order you specified.
etc.
CLASSNO   The serial number of the class page immediately preceding the mean.
PAGE      The page number of the p-value or of the analysis of variance.
N         The number of results shown on the ordinary means page.
MEAN      The ordinary mean.
LSMEAN    The least squares mean.
LSSE      The standard error of the least squares mean as printed by SAS.
INTR      A coded variable for the CLASS variables defining the means.
          The code is similar to that for BYV. May be a useful sort key.
BYV       A coded variable for the BY variables. Its value is
          $\sum_i a_i 2^i$, where $a_i$ is 1 if the ith variable appeared
          at the top of the page and 0 otherwise.
P         The p-value comparing LSMEAN with CLSMEAN.
CVARV1    The value of the treatment variable for CN, CMEAN, CLSMEAN.
CN        The value of N at this level of treatment but with the same
          values of VARV2, VARV3 etc.
CMEAN     The corresponding ordinary mean.
CLSMEAN   The corresponding least squares mean.
RSD       The residual standard deviation.
RDF       The degrees of freedom for this.
T5P0      Student's t (5% on RDF degrees of freedom).
T2P5      Student's t (2 1/2% on RDF degrees of freedom).
SOURS1    The Type IV sum of squares for the first source of variation
          named following the word SOURCE in the "USERS" dataset.
SOURD1    The degrees of freedom.
SOURP1    The p-value.
SOURS2    As above for the second source of variation.
SOURD2
SOURP2
etc.

Table 2. Variables in CONTREST

Name      Description
CE        Has value 'C' for a CONTRAST and 'E' for an ESTIMATE.
EFFECT    The name of the effect, as given in the CONTRAST or ESTIMATE
          statement.
ESTIMATE  The estimate (ESTIMATE statement only).
SE        The standard error (ESTIMATE only).
DF        The degrees of freedom (CONTRAST only).
SS        The sum of squares (CONTRAST only).
P         The p-value.
DEP       The dependent variable.
VARV1     The value of the treatment variable.
VARV2     The values of BY variables.
etc.
RDF       The degrees of freedom for the residual standard deviation.
RSD       The residual standard deviation.
CLASSNO   The serial number of the class page immediately preceding the
          contrast/estimate.
BYV       A coded variable for the BY variables. Its value is
          $\sum_i a_i 2^i$, where $a_i$ is 1 if the ith variable appeared
          at the top of the page and 0 otherwise.
PAGE      The page number of the p-value or of the analysis of variance.
T2P5      Student's t (2 1/2% on RDF degrees of freedom).
T5P0      Student's t (5% on RDF degrees of freedom).
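The step under Program Design that matches each p-value with the two means it compares can be sketched in Python using the variable names of Table 1. This is an illustration only. The program described here sorts and merges SAS data sets, whereas the sketch uses a dictionary lookup to get the same pairing. The keys EFFECT, ROW and COL and the function `match_pvalues` are hypothetical names for the effect and the I/J positions printed by SAS, and the data values are invented.

```python
def match_pvalues(lsmeans, pvalues):
    """Attach to each p-value the two least squares means it compares."""
    # Index the means by the key the merge is based on:
    # dependent variable, class page number, effect and row number.
    by_key = {
        (m["DEP"], m["CLASSNO"], m["EFFECT"], m["ROW"]): m for m in lsmeans
    }
    final = []
    for p in pvalues:
        base = (p["DEP"], p["CLASSNO"], p["EFFECT"])
        row_mean = by_key[base + (p["ROW"],)]  # first pass: match on row
        col_mean = by_key[base + (p["COL"],)]  # second pass: match on column
        final.append({
            "DEP": p["DEP"],
            "CLASSNO": p["CLASSNO"],
            "VARV1": row_mean["VARV1"],
            "LSMEAN": row_mean["LSMEAN"],
            "CVARV1": col_mean["VARV1"],    # renamed with the C prefix
            "CLSMEAN": col_mean["LSMEAN"],  # renamed with the C prefix
            "P": p["P"],
        })
    return final


# Invented example: two treatment means and the p-value comparing them.
lsmeans = [
    {"DEP": "WEIGHT", "CLASSNO": 1, "EFFECT": "TRT", "ROW": 1,
     "VARV1": "CONTROL", "LSMEAN": 10.0},
    {"DEP": "WEIGHT", "CLASSNO": 1, "EFFECT": "TRT", "ROW": 2,
     "VARV1": "DRUG", "LSMEAN": 12.5},
]
pvalues = [
    {"DEP": "WEIGHT", "CLASSNO": 1, "EFFECT": "TRT", "ROW": 2, "COL": 1,
     "P": 0.03},
]
final = match_pvalues(lsmeans, pvalues)
```

Each resulting observation then carries P together with LSMEAN and CLSMEAN, which is the shape a report writer needs before the analysis of variance details are merged in.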