GLIM macro library Release 1.0
------------------------------
January 1985
------------
Macro Library Description
-------------------------

1. Introduction
---------------

The GLIM macro library is provided as a standard facility in release 3.77 and future releases of GLIM. It provides convenient access to many commonly used GLIM macros, some of which have previously been published in the GLIM newsletter. The macros in the library were chosen for their usefulness: they either provide extra facilities or extend the range of data analysis and modelling that can be performed with the GLIM system.

The library is distributed free of charge with new releases of the GLIM system, starting with release 3.77. It uses many new features of GLIM 3.77 and is not suitable for use with earlier releases of GLIM.

Updates to the macro library will be produced periodically, and details of the updates will be published in the GLIM newsletter. The updates can be obtained on magnetic tape or disk from NAG for an additional fee.

Submissions to the macro library are welcome. Potential authors should contact the macro library editor to submit macros for refereeing and to obtain submission guidelines.

This document consists of a short description of the GLIM macro library and its use, followed by detailed documentation on each macro. The documentation for each macro contains an example of its use together with example output.

The document was written using the following symbols:

    Directive symbol          $
    Repetition symbol         :
    Substitution symbol       #
    End of record symbol      !
    Function symbol           %
    Quote symbol              '

2. Structure of the macro library
---------------------------------

The macro library consists of a number of subfiles, each of which (apart from the first) contains one or more GLIM macros. Each subfile holds one or more main macros and possibly several subsidiary macros that are called by the main macro(s).
The subfiles are arranged into eight sections, as follows:

    i)    Data description, exploration and display
    ii)   Statistical utilities
    iii)  Normal models
    iv)   Poisson models and contingency tables
    v)    Binomial models
    vi)   Gamma models
    vii)  Survival models and censored data
    viii) Other statistical models and techniques

The current contents of the macro library are listed in Table 1, which gives all subfile names and main macro names. This information can also be obtained interactively during a GLIM session by echoing the contents of the first subfile (INFO) as GLIM reads it in. The GLIM statements needed to do this are:

    $ECHO $INPUT %PLC 80 INFO $ECHO

TABLE 1  Current Contents of GLIM 3.77 Macro Library - Release 1.0
----------------------------------------------------------------------------
Subfile name    Macro name    Description
============    ==========    ===========

i) Data description, exploration and display
    SUMM            SUMM          Summary statistics of a variate
    STEM            STEM          Stem and leaf plots
    SMOO            SMOO          Tukey smoothing of a variate

ii) Statistical utilities
    CHIP            CHIP          Chi-squared probability

iii) Normal models
    QPLOT           QPLOT         Normal probability plotting
    QPLOT           STAN          Standardised residuals
    QPLOT           JACK          Jack-knife residuals
    TNOR            TNOR          Test for normality of variate by
                                  chi-squared goodness of fit test
    TNOR            WDASH         Shapiro-Francia W' test for normality
    NORMAC          RSQ           R-squared statistic
    NORMAC          TVAL          t-values of parameter estimates
    LEV             LEV           Leverage values
    BOXCOX          BOXCOX        Box-Cox transformation family on y-variate
    BOXCOX          BOXFIT        Box-Cox transformation for fixed lambda
    PRESS           PRESS         Prediction error sum of squares

iv) Poisson models and contingency tables
v) Binomial models
vi) Gamma models

vii) Survival models and censored data
    WEIB            WEIB          Fitting the exponential and Weibull
                                  distributions to censored data
    WEIB            RESP          Residual plotting after use of macro WEIB

viii) Other statistical models and techniques
    TUNI            TUNI          Test for distribution of variate against
                                  Uniform distribution by chi-squared
                                  goodness of fit test
----------------------------------------------------------------------------

3. Reading the contents of a macro library subfile into a GLIM session
----------------------------------------------------------------------

The GLIM macro library is automatically assigned to a specific FORTRAN channel. In general, this channel number varies between machine ranges and between installations of GLIM. You can find the macro library channel number for your installation of GLIM by issuing the GLIM statement:

    $ENV C$

A system scalar, %PLC (program library channel), is also available. It contains the channel number to which the macro library has been assigned.

The contents of a subfile can be read in by issuing the GLIM statement

    $INPUT %PLC 80 <subfile name> $

For example, the subfile containing the Box-Cox transformation macros is called BOXCOX. The macros in this subfile can be read into GLIM by using

    $INPUT %PLC 80 BOXCOX$

The contents of more than one subfile can be read in by issuing a series of $INPUT directives, e.g.

    $INPUT %PLC 80 NORMAC$
    $INPUT %PLC 80 BOXCOX$

or, more simply,

    $INPUT %PLC 80 NORMAC BOXCOX$

Note that GLIM reads the macro library sequentially. For greatest efficiency, therefore, specify the subfile names in the order in which the subfiles are stored (given in Table 1).

The contents of a subfile can be displayed as it is read by switching on the $ECHO facility before reading the subfile, as in

    $ECHO $INPUT %PLC 80 BOXCOX$ $ECHO

Each subfile starts with a short description of its contents, together with condensed documentation on the use of the main macros it contains. The $ECHO facility therefore provides a way of obtaining on-line documentation on the use of the macros in the library.

4. Passing data to a GLIM library macro
---------------------------------------

Your data can be passed to a GLIM library macro in several ways. Some of these are outlined below, and each has been used in at least one macro in the macro library. The documentation for each macro indicates the method or methods it requires.

a) Formal arguments

Formal arguments to a macro are specified either with the $ARGUMENT directive before the macro is used, or as part of the $USE directive. For example, suppose a macro M needs two formal arguments: the first a vector and the second a scalar. To use the macro with the first argument set to the vector V and the second set to the scalar %A, enter

    $ARG M V %A $USE M$

or

    $USE M V %A$

b) Macro arguments

Macro arguments are the most suitable way of passing text information to a library macro. The text might be a model formula, a variate name or a $CALCULATE expression. Macro arguments are simply macros with the required names, set up by the user, which hold the required information as text. The user must declare all macro arguments before calling the library macro. Examples of suitable declarations are:

    $MACRO MODEL AGE*REGION $ENDM
    $MACRO ARG1 V $ENDM
    $MACRO PRINT 0 $ENDM

c) Scalar arguments

Sometimes certain scalars must be set before the macro is called. These are referred to here as scalar arguments. An example is the macro TUNIFORM in subfile TUNIFORM, which checks and acts upon the values of the scalars %U and %V.

d) Macro prompting

Certain macros may prompt the user for information while they are being executed. The information requested is always numeric, never text. If a non-numeric answer is given, the macro will fail. For example, the macro BOXFIT prompts the user as follows:

    Value of lambda?
    $DIN?

5. Macro library conventions
----------------------------

a) Locally declared vectors

Many macros in the macro library need to declare vectors local to that macro, and some of these vectors may be useful to the user on exit from the macro. The GLIM macro library has adopted the convention that all vectors declared within a macro have names ending with the underline symbol (_). This convention minimises the chance that the names of these locally declared vectors will clash with the names of vectors set up by the user. The user should therefore avoid giving vectors names that end with the underline symbol (e.g. OBS_, EXP_).

All locally declared vectors are deleted on successful completion of the macro, unless the documentation specifies otherwise. Some macros have the option of keeping certain useful vectors in the workspace (see the documentation for each macro).

b) Scalars

Most macros in the GLIM macro library use the macro library scalars (%Z1, %Z2, ..., %Z9) for temporary storage of scalars. Some macros may also use some ordinary scalars (%A, %B, ..., %Z), and these may contain scalar values that are useful to the user. If a library macro uses any ordinary scalars, this is documented, together with their contents where useful.

c) Space recovery

Library macros occupy part of the data space, and it may be necessary to reclaim this space after a macro has been used. To make this easier, some subfiles contain an additional macro called DEL. The text of this macro is, of course, different for each subfile. After execution of the required library macro,

    $USE DEL$

will delete all identifiers from that subfile, with the exception of DEL itself. If a different subfile is then read in, DEL can be left, as it will be overwritten. Otherwise it can be deleted. The directive

    $PRINT DEL$

is a further way of obtaining the list of macro identifiers used in a subfile.

6. Error messages and reporting
-------------------------------

Many library macros produce their own error and information messages. These should be self-explanatory when read together with the documentation.

All macros may fail with a GLIM error message in certain circumstances. Common causes are:

  - failure to set, or to specify correctly, all the arguments necessary to a macro;
  - exceeding the maximum number of identifiers;
  - exceeding the size of the workspace.

Some library macros use the $OUTPUT directive to switch off unnecessary output (for instance, iterative macros using the $FIT directive). If such a library macro fails, or if the user breaks in to it, output to the primary output channel may still be switched off when control returns to the user. The user will then seemingly get no response when entering further GLIM statements. If this happens,

    $OUTPUT %POC $

will set the output back to its default setting.

If unexplained errors are encountered when using the library macros, and any available local support fails to locate the problem, send a copy of a transcript file which reproduces the error either to

    The GLIM coordinator
    NAG Central Office
    Mayfield House
    256 Banbury Road
    OXFORD OX2 7DE

or to

    Brian Francis
    GLIM macro library editor
    Centre for Applied Statistics
    Cartmel College
    University of Lancaster
    Lancaster LA1 4YL

The transcript file should contain the GLIM directive $ENV I$ together with its output. Details of the machine on which GLIM is installed and of the operating system being used should also be provided, e.g. IBM PC-XT 512K RAM running under PC/DOS, VAX 11/780 running under VAX/VMS 3.7, etc.

GLIM macro library Release 1.0
------------------------------
January 1985
------------
Macro Library Documentation
---------------------------

Macro SUMMARY in subfile SUMMARY

Author
    B. J. Francis, Centre for Applied Statistics, Cartmel College, University of Lancaster, U.K.
Purpose
    The macro SUMMARY evaluates and prints a number of simple descriptive statistics of a vector.

Formal Arguments
    %1  The name of a variate or factor.
    %2  (optional) A scalar. If the second argument is not set, or is set and equal to zero, the summary statistics are not saved. If it is set and not equal to zero, the summary statistics are saved.

Macro Arguments
    None.

Uses
    Within the macro, scalar %Z and variates MEA_, VAR_, SDV_, MIN_, MAX_, RAN_, P25_, MED_, P75_ and TOT (all of length 1) are used. The variates can optionally be saved.

Subsidiary macros
    Macro SUMMARY calls macros SUM1 and NOTS.

Output
    The macro evaluates and prints the statistics shown in the example output below.

    On exit, scalar %Z contains the length of the variate or factor. In addition, if the second formal argument has been set to a non-zero value, the following variates contain values:

        MEA_   Mean value
        VAR_   Variance
        SDV_   Standard deviation
        MIN_   Minimum value
        MAX_   Maximum value
        RAN_   Range
        P75_   75th percentile
        MED_   Median (50th percentile)
        P25_   25th percentile

    All statistics are evaluated using the $TABULATE directive. Consult the GLIM 3.77 update manual for exact definitions.

Example Input

    $UNIT 1000$
    $CAL X=%SR(0)$
    $INPUT %PLC 80 SUMM$
    $USE SUMM X$

Example Output

    Summary statistics for X

    Length          1000.
    Maximum         0.9975
    Minimum         0.0023
    Range           0.9952
    Mean            0.5003
    Total           500.3
    Variance        0.0842
    Std. Dev.       0.2902
    1st Quartile    0.2474
    Median          0.5083
    3rd Quartile    0.7466

Macro STEM in subfile STEM

Author
    R. A. Reese, Computer Centre, Hull University, U.K.

Purpose
    The macro STEM forms a "stem and leaf" plot of the values of a vector (ref. Exploratory Data Analysis, J. W. Tukey, 1977, Addison-Wesley). The interval size controls where each value is split between stem and leaf. This interval size, and the formats in which the stem and the leaf are printed, may be provided as optional arguments. Otherwise the default values are used.
Formal Arguments
    None.

Macro Arguments
    ARG1  (obligatory) This macro should contain the name of a variate or a vector expression. The vector may be of any length.

    ARG2  (optional) Contains a positive value that defines the interval between successive stems. The default value is 10, which means that the units digit and the fractional part of each value are split off as the leaf. Tukey uses values of 10, 100 and 5 in his examples. In general, the best plots (in the sense of being easiest to interpret) are obtained if the values of ARG1 have been scaled to be integers and a positive power of 10 is used for ARG2. If too many values appear on each line, they can be spread over more lines by setting ARG2 to the next lower power of 10 multiplied by 2 or 5.

    ARG3  (optional) Contains an integer defining the format for the stem values. The default is -1, meaning that they are printed as integers in the smallest possible number of characters.

    ARG4  (optional) Contains an integer defining the format for the leaf values. The default is 2. This avoids rounding leaves to integers, since a rounded leaf can look as if it is on the wrong line. If the leaf values are integers, -1 packs them closer together and allows more values to be displayed.

    Any or all of ARG2 to ARG4 may be reset before or between calls of STEM.

Uses
    Within the macro, vectors TMP_ and LIN_ are used.

Subsidiary macros
    Macro STEM calls or uses macros SL1, SL2, SL3, SL4, SL5, ARG1, ARG2, ARG3, ARG4 and PERR. ARG1 is initially defined as empty. ARG2 to ARG4 are given the default settings described above.

Output
    The macro outputs the stem and leaf plot only. If one or more successive stems would have no leaves, those stems are not printed. Instead, a single row of dots is output to indicate the jump in stem value. This departs slightly from Tukey, but is necessary to avoid plots of excessive height.
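The splitting rule described above can be illustrated outside GLIM. Each stem is an interval number, each leaf is the remainder below that stem, and a single dotted row stands in for any run of empty stems. The following Python sketch is not the STEM macro itself. The function name `stem_and_leaf` is hypothetical, the sketch assumes all values are non-negative (so it does not reproduce STEM's conventions for negative values), and it prints leaves in Python's default number format rather than through ARG3 and ARG4.

```python
import math


def stem_and_leaf(values, interval=10):
    """Return the lines of a basic stem-and-leaf plot as strings.

    Sketch only: assumes all values are non-negative; STEM's handling of
    negative values and its ARG3/ARG4 print formats are not reproduced.
    """
    groups = {}
    for v in sorted(values):
        stem = math.floor(v / interval)  # which interval the value falls in
        leaf = v - stem * interval       # part of the value split off as the leaf
        groups.setdefault(stem, []).append(leaf)

    lines = []
    previous = None
    for stem in sorted(groups):
        # One dotted row stands in for any run of stems that have no leaves.
        if previous is not None and stem > previous + 1:
            lines.append("  ...")
        leaves = " ".join(f"{leaf:g}" for leaf in groups[stem])
        lines.append(f"{stem:4d} | {leaves}")
        previous = stem
    return lines


if __name__ == "__main__":
    # Coal production figures from the STEM example (Tukey, p. 212).
    coal = [569, 416, 422, 565, 484, 520, 573, 518, 501, 505, 468, 382,
            310, 334, 359, 372, 439, 446, 349, 395, 461, 511, 583]
    for line in stem_and_leaf(coal):
        print(line)
```

With the coal production data and the default interval of 10, the sketch gives stems 31 to 58 with the same leaves as the STEM example. For instance, the row `  46 | 1 8` holds the values 461 and 468, and seven dotted rows mark the gaps.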
Further Information
    The stem and leaf plot is a good way of displaying values so as to show their distribution. When some of the numbers are negative, the method cannot be applied directly (on which line would 0 itself go?). The macro therefore adopts the following conventions:

    (1) If all the numbers are negative, the sign is ignored and the absolute values are plotted.
    (2) If the values straddle 0, a constant is added to each value to make them all greater than 0, and these shifted values are plotted.

    In either case, an explanatory message is output with the plot.

Example Input

    $UNIT 23 $DATA OBS $READ ! Tukey P 212 Coal Production.
    569 416 422 565 484 520 573 518 501 505 468 382
    310 334 359 372 439 446 349 395 461 511 583
    $INPUT %PLC 80 STEM $
    $M ARG1 OBS $E
    $USE STEM$ ! ARG2, ARG3 and ARG4 defaults used

Example Output

    Stem and leaf plot of OBS - in steps of 10.00

    31.   0.
      ...
    33.   4.0
    34.   9.0
    35.   9.0
      ...
    37.   2.0
    38.   2.0
    39.   5.0
      ...
    41.   6.0
    42.   2.0
    43.   9.0
    44.   6.0
      ...
    46.   1.0   8.0
      ...
    48.   4.0
      ...
    50.   1.0   5.0
    51.   1.0   8.0
    52.   0.
      ...
    56.   5.0   9.0
    57.   3.0
    58.   3.0

    Number of values: 23

Macro SMOOTH in subfile SMOOTH

Author
    R. A. Reese, Computer Centre, Hull University, U.K.

Purpose
    SMOOTH provides a number of methods for smoothing the values of a variate. It also makes it easy to add other methods or to refine the treatment of end values.

Macro arguments
    ARG1  (obligatory) The text of the macro ARG1 should contain the name of the variate to be smoothed. The macro overwrites this variate with the smoothed values. The vector may be of any length.

    ARG2  (optional) If set by the user, macro ARG2 must contain the name of a macro. Note that the substitution character must be present. By default it is set to #KMEAN, but the default may of course be changed locally. #KMEAN replaces the values of ARG1 by running means.
          The other methods provided as macros are the three-term running median (#MEDIAN), Tukey's 3R repeated median with end smoothing (#T3R) and Hanning (#HANN). #KMEAN, #MEDIAN and #HANN copy the end values; in fact, they never alter them. #T3R applies #MEDIAN until the values do not change, and then chooses values for the end points.

    ARG3  (optional) Macro ARG3 is used only if #KMEAN is the method. It is an integer that defines the number of terms included in the running mean. If ARG3 has the value K, then (2K+1)-term running means replace all except the first and last K values, which are unchanged. ARG3 has the default value 1, which gives three terms in each mean.

Uses
    Within the macro, ordinary scalars %Y and %Z and vectors IND_, TMP_ and CPY_ are used.

Subsidiary macros
    Macro SMOOTH calls or uses macros SMOOTH, KMEAN, HANN, MEDIAN, T3R, KME1, ARG1, ARG2 and ARG3. ARG1 is initially defined as empty. As described above, ARG2 and ARG3 are initially set to give three-term running mean smoothing. The user needs to redefine them only to choose another method. If another algorithm is provided, its name should be given as the text of ARG2.

Output
    The only output is a confirmatory message.

Example Input

    $UNIT 23 $DATA OBS $READ ! Tukey P 212 Coal Production.
    569 416 422 565 484 520 573 518 501 505 468 382
    310 334 359 372 439 446 349 395 461 511 583
    $INPUT %PLC 80 SMOOTH $
    $M ARG1 O $E ! We will use all possible smoothings
    $CALC O=OBS $USE SMOO $CALC S1=O
    $M ARG2 #MEDIAN $E
    $CALC O=OBS $USE SMOO $CALC S2=O
    $M ARG2 #T3R $E
    $CALC O=OBS $USE SMOO $CALC S3=O
    $M ARG2 #HANN $E
    $CALC O=OBS $USE SMOO$
    $PR ' OBS MEANS MEDIANS 3R HANNED';$
    $LOOK(S=-1) OBS S1 S2 S3 O $

Example Output

          OBS    MEANS  MEDIANS       3R   HANNED
        569.0    569.0    569.0    422.0    569.0
        416.0    469.0    422.0    422.0    455.7
        422.0    467.7    422.0    422.0    456.2
        565.0    490.3    484.0    484.0    509.0
        484.0    523.0    520.0    520.0    513.2
        520.0    525.7    520.0    520.0    524.2
        573.0    537.0    520.0    520.0    546.0
        518.0    530.7    518.0    518.0    527.5
        501.0    508.0    505.0    505.0    506.2
        505.0    491.3    501.0    501.0    494.7
        468.0    451.7    468.0    468.0    455.7
        382.0    386.7    382.0    382.0    385.5
        310.0    342.0    334.0    334.0    334.0
        334.0    334.3    334.0    334.0    334.2
        359.0    355.0    359.0    359.0    356.0
        372.0    390.0    372.0    372.0    385.5
        439.0    419.0    439.0    439.0    424.0
        446.0    411.3    439.0    439.0    420.0
        349.0    396.7    395.0    395.0    384.7
        395.0    401.7    395.0    395.0    400.0
        461.0    455.7    461.0    461.0    457.0
        511.0    518.3    511.0    511.0    516.5
        583.0    583.0    583.0    583.0    583.0

Macro CHIP in subfile CHIP

Author
    R. A. Reese, Computer Centre, Hull University, U.K.

Purpose
    The text of CHIP is an expression that evaluates the upper-tail probability of a chi-squared variate. This expression is incorporated into macros TUNIFORM and TNORMAL, but it can also be used on its own, as in the example below.

Formal Arguments
    %1  The value of the chi-squared statistic.
    %2  The appropriate number of degrees of freedom.

Macro Arguments
    None.

Uses
    This macro uses no named structures or variables.

Subsidiary macros
    Macro CHIP calls or uses no subsidiary macros.

Output
    The macro is an expression, so it must be used in a $CALCULATE directive. If the value is not assigned, the directive outputs it:

        $CAL #CHIP $

    Otherwise, you may store it:
        $CAL %P=#CHIP $

Example Input

    $CAL %A=3.84 : %B=1$
    $INPUT %PLC 80 CHIP$
    $ARG CHIP %A %B$
    $CAL #CHIP$

Example Output

    0.05004

Macro QPLOT in subfile QPLOT

Author
    B. J. Francis, Centre for Applied Statistics, Cartmel College, University of Lancaster, U.K.

Purpose
    The macro QPLOT produces a Normal quantile-quantile plot (Normal plot) of raw, standardised or jack-knife residuals. The macro can also produce a Normal quantile-quantile plot of a variate. The user may define alternative expressions for calculating residuals.

    The macro takes account of any prior weights used to weight out observations. If prior weights are set, only the values with a non-zero prior weight are used to produce the plot. For the purposes of this macro, any non-zero prior weight is treated as a prior weight of one.

    A general expression for the quantiles of the Normal distribution is allowed. For 0 <= k < 1, the quantiles are calculated using the formula

        %ND((i-k)/(n+1-2k))    for i = 1, ..., n.

    Common values of k are 0 and 0.5. By default, the macro uses the quantiles given by Filliben (1975). These set k to 0.3175, but adjust the first and last quantiles. If these quantiles are used, the macro also gives the Filliben correlation coefficient, which is the correlation between the sorted values and the Normal quantiles. The significance of this correlation can be assessed using a table provided by Filliben.

Formal Arguments
    %1  (optional) The name of a macro which gives the type of residuals required, or the values to be plotted. Valid settings are:
            RAW    raw residuals (default)
            STAN   standardised residuals
            JACK   jack-knife or cross-validatory residuals
        or the name of a macro containing the name of a variate of standard length, or the name of a macro containing an alternative expression for the calculation of residuals.

    %2  (optional) A scalar. If not set, or if set and its value lies outside the range [0,1), then the Normal quantiles given by Filliben are used.
        The Filliben correlation coefficient is then calculated and displayed. If %2 is set and its value lies within the range [0,1), this value is taken as k in the above expression, and the Filliben correlation coefficient is not calculated or displayed.

Macro Arguments
    None.

Uses
    Within the macro, vectors PW_, IND_, RES_, R_ and I_ are used.

Subsidiary macros
    Macro QPLOT calls or uses macros QP1, QP2, QP3, QP4, QP5, RAW, STAN, JACK and NOTSET.

Output
    The macro produces a Normal quantile-quantile plot of the sorted residuals, or of the sorted values of the vector, against their equivalent Normal deviates. If prior weights are set, a message tells the user how many units were used to produce the plot. If formal argument %2 is not set, or lies outside the range [0,1), the Filliben correlation coefficient for the plot is displayed and stored in scalar %C.

Further information
    1. Macros RAW, STAN and JACK can be used in a $CALCULATE statement to store the unsorted residuals from a fit. Macros STAN and JACK require the system vector %VL, so the GLIM directive $EXTRACT %VL$ must be used first. Macro JACK calls macros STAN and RAW, and macro STAN calls macro RAW. Examples:

            $CALC RRES=#RAW
            $EXT %VL $CALC SRES=#STAN

    2. The macro produces uninformative residual plots if the number of points weighted in is close to, equal to or less than the number of parameters being fitted in the model.

    3. The macros take account of any $OFFSET or $SCALE parameter settings in the calculation of residuals. Macro QPLOT takes account of points weighted out of a fit by prior weights in the manner described above.

    4. If prior weights are set and standardised or jack-knife residuals are being plotted, the macro calculates the residual expression for all observations, not just for those weighted in to the fit. In this case, GLIM may produce the warning message

            --- Invalid function/operator arguments
       This is caused by observations weighted out of the fit, and therefore does not affect the Normal plot.

    5. QPLOT deletes the system vector %RE.

Restrictions
    1. If the macro QPLOT is used to produce a Normal quantile-quantile plot of a variate, then that variate must be of standard length.

References
    Filliben (1975) 'The probability plot correlation coefficient test for Normality', Technometrics, 17, pp. 111-117.

Example Input

    $C BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS FOR MALE AND FEMALE BABIES (FROM DOBSON P.14)
    $UNITS 24 $DATA AGE WEIGHT $READ
    40 2968  38 2795  40 3163  35 2925  36 2625  37 2847
    41 3292  40 3473  37 2628  38 3176  40 3421  38 2975
    40 3317  36 2729  40 2935  38 2754  42 3210  39 2817
    40 3126  37 2539  36 2412  38 2991  39 2875  40 3231
    $CAL SEX=%GL(2,12)$FACT SEX 2$
    $C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE SECOND FOR FEMALES.
    $YVAR WEIGHT$FIT AGE+SEX$
    $INPUT %PLC 80 QPLOT$
    $USE QPLOT STAN$ ! Q-Q plot of standardised residuals
    $M VECT WEIGHT $E
    $CALC %A=0$
    $USE QPLOT VECT %A$ ! Q-Q plot for vector WEIGHT using k=0

Example Output

    deviance = 658771.  d.f. = 21

    [Figure: Normal Q-Q plot (STAN), standardised residuals against Normal quantiles; vertical axis -1.600 to 2.200, horizontal axis -3.00 to 3.00]

    Filliben correlation coefficient equals 0.9718

    [Figure: Normal Q-Q plot (VECT), values of WEIGHT against Normal quantiles; vertical axis 2400.0 to 3540.0, horizontal axis -3.00 to 3.00]

Macros TNORMAL and WDASH in subfile TNORMAL

Author
    R. A. Reese, Computer Centre, Hull University, U.K.
Purpose
    The macro TNORMAL tests the distribution of a variate against a theoretical Normal distribution. It uses both a Shapiro-Francia W' test and a chi-squared goodness of fit test based on dividing the range into equal-probability intervals. The parameters of the theoretical distribution may be supplied or estimated from the sample. In the latter case, the chi-squared statistic is biased. If only the W' test is required, call WDASH instead of TNORMAL.

    The number of intervals is chosen to give at least five expected observations in each interval, subject to a maximum of twenty intervals.

Formal Arguments
    None.

Macro Arguments
    ARG1   (obligatory) The text of the macro ARG1 must contain the name of a vector or a vector expression. The vector may be of any length.

    PRINT  (optional) Can contain a scalar or a number. If the scalar or number is not zero (the default), the results of the test are printed on the current output channel. If macro PRINT contains a number or scalar which evaluates to zero, the results are saved and not printed.

Scalar Arguments
    %X, %Y  For TNORMAL (but not WDASH), if the value of ordinary scalar %X is greater than zero, then %Y is used as the mean and %X as the variance of the theoretical Normal distribution. Otherwise, the sample mean and variance are used, with consequent loss of degrees of freedom.

Uses
    Within the macros, ordinary scalars %P, %T, %U, %V, %W, %X, %Y and %Z and variates OBS_, EXP_ and PCH_ are used.

Subsidiary macros
    Macros TNORMAL and WDASH call or use macros TN1, TN2, TN3, TN4 and PRINT. ARG1 is initially defined as empty.

Output
    If the number of observations is within the range 5 to 1000, the Shapiro-Francia W' test (Royston 1983) is applied first. The macro prints the test statistic and the approximate probability of obtaining that value when sampling from a Normal distribution.
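The chi-squared part of TNORMAL, together with the upper-tail probability that CHIP evaluates, can be sketched in Python. This is an illustration under stated assumptions, not the macros' code. The function names are hypothetical. The number of intervals is taken as the largest k with n/k >= 5, capped at 20, which is one reading of the rule above. Estimated parameters use the sample variance with the n-1 divisor, and the W' test is not reproduced. The tail probability uses the standard power series for the regularized incomplete gamma function.

```python
import math
from statistics import NormalDist, mean, stdev


def chi2_upper_tail(x, df):
    """P(X > x) for X ~ chi-squared(df), via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


def chi2_from_counts(observed):
    """Pearson chi-squared when every interval has the same expected count."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def equal_probability_test(values, mu=None, variance=None):
    """Chi-squared test of values against a Normal distribution using
    equal-probability intervals. Returns (observed, statistic, df, p)."""
    n = len(values)
    k = min(20, n // 5)  # assumed rule: at least 5 expected per interval, at most 20
    if k < 2:
        raise ValueError("too few observations")
    estimated = mu is None or variance is None
    if estimated:
        mu, variance = mean(values), stdev(values) ** 2  # n-1 divisor assumed
    dist = NormalDist(mu, math.sqrt(variance))
    observed = [0] * k
    for v in values:
        # Interval j holds values whose Normal cdf lies in [j/k, (j+1)/k).
        observed[min(int(dist.cdf(v) * k), k - 1)] += 1
    stat = chi2_from_counts(observed)
    df = k - 1 - (2 if estimated else 0)  # mean and variance estimated from sample
    return observed, stat, df, chi2_upper_tail(stat, df)
```

The TNORMAL example output shows counts of 8, 4, 5 and 6 (23 values in 4 intervals, 5.75 expected in each). For these counts, `chi2_from_counts` gives about 1.522. With 1 degree of freedom, `chi2_upper_tail` then gives about 0.217. Likewise, `chi2_upper_tail(3.84, 1)` gives about 0.05004, as in the CHIP example.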
    If the macro argument PRINT is set to a non-zero value, the macro then prints a table of the observed and expected frequencies within each interval of the range, titled with the name of the variable (see example). It also prints the parameters of the theoretical distribution and the chi-squared test statistic.

    If WDASH is called, the W' test statistic and probability are output, followed by an optional Normal plot.

    On exit, the following scalars contain values:

        %P   probability of W' or chi-squared statistic
        %U   overall chi-squared value
        %V   number of intervals
        %W   standard deviation of distribution
        %X   variance of distribution
        %Y   mean of distribution
        %Z   degrees of freedom for significance test

    If the macro argument PRINT is set to zero, then in addition the first %V values of the following vectors contain:

        OBS_   observed frequency in each interval
        EXP_   expected frequency in each interval
        PCH_   contribution to chi-squared from each interval

References
    Royston, J. P. (1983) 'A simple method for evaluating the Shapiro-Francia W' test', The Statistician, 32, No. 3, p. 297. (Note, however, that the example in the paper is not evaluated using Blom's normal order values.)

Example Input

    $UNIT 23$CAL X=%SR(0)$
    $INPUT %PLC 80 TNORMAL $
    $M ARG1 X $E
    $CALC %X=0$
    $USE TNOR$

Example Output

    Sample estimates used as parameters. - df adjusted.
    Shapiro and Francia's W' test (ref. Royston, STATISTICIAN 1983)
    small W and P values suggest non-normality
    test value (W')  0.9595    P=  0.3886

    Test of distribution of X against Normal.
       observed    expected    partitioned
         freq        freq      chi-squared
        8.000       5.750       0.88043
        4.000       5.750       0.53261
        5.000       5.750       0.09783
        6.000       5.750       0.01087

    mean (%y)          0.5189     st. dev. (%w)  0.2907
    chi-squared (%u)   1.522      with (%z) 1 df     P= (%P)  0.217

Macro RSQ in subfile NORMAC

Author
    M. Slater, Dept of Computer Science and Statistics, Queen Mary College, Mile End Road, LONDON E1

Purpose
    The macro RSQ displays the R-squared coefficient for the current fit.
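The documentation does not state the formula RSQ uses. A common definition for a Normal model with a constant term is R-squared = 1 - D/S. Here D is the residual deviance (the residual sum of squares) and S is the corrected total sum of squares of the y-variate. The Python sketch below assumes that definition, with no offset and no prior weights. The function name `r_squared` is hypothetical. With the birthweight data and the deviance of 658771 from the example, the sketch gives approximately 0.640, in line with the example output below.

```python
from statistics import fmean


def r_squared(y, deviance):
    """1 - deviance / corrected total sum of squares of y.

    Assumes a Normal model with a constant term, no offset and no prior weights.
    """
    ybar = fmean(y)
    total_ss = sum((v - ybar) ** 2 for v in y)
    return 1.0 - deviance / total_ss


# Birthweights (grams) from the Dobson example, and the deviance of the AGE+SEX fit.
weight = [2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975,
          3317, 2729, 2935, 2754, 3210, 2817, 3126, 2539, 2412, 2991, 2875, 3231]
print(f"R-squared equals {r_squared(weight, 658771.0):.4f}")
```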
Formal Arguments
    None.

Macro Arguments
    None.

Uses
    Within the macro, scalar %R is used.

Subsidiary macros
    None.

Output
    The macro prints the value of the R-squared coefficient, which is also stored in scalar %R.

Further information
    1. The macro takes account of any use of the $OFFSET directive but ignores any use of prior weights.

Example Input

    $C BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS FOR MALE AND FEMALE BABIES (FROM DOBSON P.14)
    $UNITS 24 $DATA AGE WEIGHT $READ
    40 2968  38 2795  40 3163  35 2925  36 2625  37 2847
    41 3292  40 3473  37 2628  38 3176  40 3421  38 2975
    40 3317  36 2729  40 2935  38 2754  42 3210  39 2817
    40 3126  37 2539  36 2412  38 2991  39 2875  40 3231
    $CAL SEX=%GL(2,12)$FACT SEX 2$
    $C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE SECOND FOR FEMALES.
    $YVAR WEIGHT$FIT AGE+SEX$
    $INPUT %PLC 80 NORMAC$
    $USE RSQ$

Example Output

    deviance = 658771.  d.f. = 21
    R-squared equals 0.6400

Macro TVAL in subfile NORMAC

Author
    M. Slater, Dept of Computer Science and Statistics, Queen Mary College, Mile End Road, LONDON E1

Purpose
    The macro TVAL computes, displays and optionally stores t-test values for a Normal model. It produces these by dividing the vector of parameter estimates by the corresponding vector of standard errors.

Formal Arguments
    None.

Macro Arguments
    None.

Uses
    Within the macro, vectors TV_ and IND_ are used.

Subsidiary macros
    None.

Output
    The macro displays the t-test values for the fitted Normal model. The values are not labelled, but they appear in the same order as the parameter estimates and standard errors in the output from the GLIM directive $DIS E$. The t-values are stored in the variate TV_, which has length %PL.

Further information
    1. The macro takes account of any $OFFSET, $SCALE or $WEIGHT settings.
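Each t-value is simply a parameter estimate divided by its standard error. The minimal Python sketch below performs that division; the function name `t_values` is hypothetical. It is applied to the rounded estimates and standard errors displayed by $DIS E in the example below.

```python
def t_values(estimates, standard_errors):
    """Divide each parameter estimate by its standard error."""
    if len(estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    return [b / se for b, se in zip(estimates, standard_errors)]


# Rounded estimates and standard errors as displayed by $DIS E in the example.
tv = t_values([-1447.0, 120.9, -163.0], [784.3, 20.46, 72.81])
for i, t in enumerate(tv, start=1):
    print(i, round(t, 3))
```

This gives about -1.845, 5.909 and -2.239. The second value differs from the 5.908 printed by TVAL in the last digit. This is presumably because the sketch divides the rounded values shown by $DIS E, not GLIM's full-precision estimates.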
Example Input
    $C BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS FOR MALE AND FEMALE
       BABIES (FROM DOBSON P.14)
    $UNITS 24 $DATA AGE WEIGHT
    $READ
    40 2968  38 2795  40 3163  35 2925  36 2625  37 2847
    41 3292  40 3473  37 2628  38 3176  40 3421  38 2975
    40 3317  36 2729  40 2935  38 2754  42 3210  39 2817
    40 3126  37 2539  36 2412  38 2991  39 2875  40 3231
    $CAL SEX=%GL(2,12)$FACT SEX 2$
    $C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE SECOND FOR FEMALES.
    $YVAR WEIGHT$FIT AGE+SEX$INPUT %PLC 80 NORMAC$
    $D E$
    $USE TVAL$

Example Output
    deviance = 658771.   d.f. = 21

         estimate      s.e.     parameter
    1     -1447.      784.3     1
    2      120.9      20.46     AGE
    3     -163.0      72.81     SEX
    scale parameter taken as 31370.

    T values
        +----------+
        |   TV_    |
    +---+----------+
    | 1 |   -1.845 |
    | 2 |    5.908 |
    | 3 |   -2.239 |
    +---+----------+


Macro LEV in subfile LEV

Author
    M. Slater, Dept of Computer Science and Statistics, Queen Mary College,
    Mile End Road, LONDON E1

Purpose
    The macro LEV calculates and stores the leverage values (or influence
    values) after a fit of a Normal model. The macro also displays a plot of
    the leverage values against observation number.

Formal Arguments
    -None-

Macro Arguments
    -None-

Uses
    Within the macro, vectors PW_, LEV_, LIM_, IND_ and LWT_ are used.

Subsidiary macros
    Macro LEV calls or uses macros LEV1 and LEV2.

Output
    Displays an index plot of the leverage values against observation
    number. Observations weighted in to the fit are plotted with a plus
    symbol (+), whereas observations weighted out of the fit are plotted with
    a dot (.).

    Hoaglin and Welsch's criterion for detecting influential points is 2p/n,
    where p is the number of parameters and n is the number of observations
    included in the fit. This criterion is also calculated and is displayed
    on the plot as a line of dashes (-). Points labelled with a plus symbol
    that lie above this line have high influence.

    On exit from the macro, scalar %Z contains the value 2p/n.
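For a straight-line model, the leverage values have a closed form: h_i = 1/n + (x_i - xbar)**2 / Sxx. This makes the Hoaglin and Welsch cut-off easy to illustrate. The Python sketch below is a simplified stand-in for LEV under that assumption: one explanatory variate plus a constant (so p = 2), with no prior weights or offset. The function names are hypothetical. The sketch also builds a 0/1 vector in the spirit of LWT_, with 0 marking points whose leverage exceeds 2p/n.

```python
def leverages_straight_line(x):
    """Hat-matrix diagonal for the model y = b0 + b1*x (p = 2 parameters).

    The leverages always sum to p, the trace of the hat matrix.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]


def high_leverage_weights(x, p=2):
    """Return (leverages, cut-off 2p/n, 0/1 weights that drop high-leverage points)."""
    h = leverages_straight_line(x)
    cutoff = 2.0 * p / len(x)
    weights = [0 if hi > cutoff else 1 for hi in h]
    return h, cutoff, weights


h, cutoff, lwt = high_leverage_weights([1, 2, 3, 4, 10])
print([round(v, 2) for v in h], cutoff, lwt)
```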
    Additionally, the following variates contain values:

    LEV_  contains the leverage values.
    LWT_  contains a vector which can be used to weight out of the fit the
          points of high influence (i.e. those points whose influence values
          are greater than %Z). LWT_ is set to zero if the observation was
          previously weighted out or if the point has high influence, and is
          set to 1 otherwise.

Further information
    1. The macros take account of any $OFFSET or $SCALE parameter settings
       in the calculation of leverage values. Macro LEV also takes account of
       points weighted out of a fit in the manner described above. Prior
       weights with non-zero values are taken to be one for the purposes of
       this macro.
    2. LEV deletes the system vector %RE.

References
    Hoaglin, D. and Welsch, R. (1978), 'The hat matrix in regression and
    ANOVA', American Statistician, 32, pp. 17-22 and 132.
    Cook, R. and Weisberg, S. (1982), 'Residuals and Influence in
    Regression', Chapman and Hall, London.

Example Input
    $C BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS FOR MALE AND FEMALE
       BABIES (FROM DOBSON P.14)
    $UNITS 24 $DATA AGE WEIGHT
    $READ
    40 2968  38 2795  40 3163  35 2925  36 2625  37 2847
    41 3292  40 3473  37 2628  38 3176  40 3421  38 2975
    40 3317  36 2729  40 2935  38 2754  42 3210  39 2817
    40 3126  37 2539  36 2412  38 2991  39 2875  40 3231
    $CAL SEX=%GL(2,12)$FACT SEX 2$
    $C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE SECOND FOR FEMALES.
    $YVAR WEIGHT$FIT AGE+SEX$
    $INPUT %PLC 80 QPLOT$
    $USE LEV$
    $CAL W=SEX-1$
    $WEI W$
    $FIT .$
    $USE LEV$

Example Output
    deviance = 658771.   d.f. = 21

    [Leverage plot: leverage value (vertical axis, 0.07 to 0.26) against
    observation number (horizontal axis, 0 to 30). All points are plotted
    with +; the 2p/n line of dashes is at 0.25 and no point lies above it.]

    -- model changed
    deviance = 248726.   d.f. = 10   from 12 observations

    [Leverage plot --- weight variate is W with 12 units weighted out (.):
    leverage value (vertical axis, 0.05 to 0.525) against observation number
    (horizontal axis, 0 to 30). The 2p/n line of dashes is at 0.50; the
    largest leverage for a point weighted in (+) is about 0.375.]


Macro BOXCOX in subfile BOXCOX

Author
    B. J. Francis, Centre for Applied Statistics, Cartmel College,
    University of Lancaster, U.K.

Purpose
    The macro BOXCOX fits the Box-Cox transformation family to a selected
    y-variate and model over a grid of values of lambda.

Formal Arguments
    %1 (optional)  A scalar. If the first formal argument is set and is not
        zero, then the variates containing the values of
        -2 log(maximised likelihood) (DEV_) and the corresponding values of
        lambda (LMB_) are saved in the workspace. If the argument is not set,
        or is set to a scalar with the value 0, then these variates are
        displayed but not saved.

Macro Arguments
    YVAR (obligatory)   The name of the untransformed y-variate must be
        stored in a macro called YVAR.
    MODEL (obligatory)  The model formula to be fitted must be stored in a
        macro called MODEL.

Macro Prompting
    The macro prompts for the required grid of values of lambda: the maximum
    value of lambda, the grid increment and the minimum value of lambda must
    be specified, in that order. Suitable starting values are 2 (0.4) -2,
    i.e. from 2 down to -2 in steps of 0.4.
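The Python sketch below illustrates the quantity BOXCOX tabulates, for the simplest case of a straight-line model. It uses the usual profile form -2 log L(lambda) = n*log(RSS(lambda)/n) - 2*(lambda - 1)*sum(log y), up to an additive constant. This is an assumption for illustration. The macro's own additive constant is not documented, so the values are not expected to match DEV_, only its shape across lambda. The sketch uses the scaled transform (y**lambda - 1)/lambda, which yields the same profile likelihood as y**lambda without needing a log|lambda| Jacobian term. The function names and the small data set are hypothetical.

```python
import math


def lambda_grid(lam_max, increment, lam_min):
    """Grid of lambda values from lam_min up to lam_max, e.g. 2 (0.4) -2."""
    steps = int(round((lam_max - lam_min) / increment))
    return [round(lam_min + i * increment, 10) for i in range(steps + 1)]


def boxcox_transform(y, lam):
    """(y**lam - 1)/lam, or log(y) when lam == 0; y must be strictly positive."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox needs a strictly positive y-variate")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def straight_line_rss(x, t):
    """Residual sum of squares from the least-squares fit t = b0 + b1*x."""
    n = len(x)
    xbar, tbar = sum(x) / n, sum(t) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - tbar) for a, b in zip(x, t)) / sxx
    b0 = tbar - b1 * xbar
    return sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, t))


def minus2_loglik(y, x, lam):
    """Profile -2 log L(lambda) for a Normal straight-line model, up to a constant."""
    n = len(y)
    rss = straight_line_rss(x, boxcox_transform(y, lam))
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return n * math.log(rss / n) - 2.0 * jacobian


y = [1.0, 2.0, 3.0, 5.0]
x = [0.0, 1.0, 2.0, 3.0]
for lam in lambda_grid(2, 0.4, -2):
    print(f"{lam:7.4f}  {minus2_loglik(y, x, lam):9.4f}")
```

Rescaling y by a constant c shifts every value by the same amount, 2*n*log(c). Only differences across the grid are meaningful, which is why a missing constant does not affect the choice of lambda.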
Uses
    Within the macro, scalars %D, %L, %N and %Z and vectors INP_, YTR_,
    IND_, SYV_, DEV_, LMB_ and PW_ are used.

Subsidiary macros
    Macro BOXCOX calls or uses macros YVAR, MODEL, BOX1, BOX2, BOX3, BOX4,
    BOX5, BOX6, BOX7, BOX8, BOX9 and BOXA.

Output
    Calculates and displays -2 log(maximised likelihood) for each value of
    lambda on the grid, and produces a 'deviance' plot. Additionally, if the
    first formal argument is set to a non-zero scalar, the following variates
    of length %Z are saved:

    DEV_  variate containing -2 log(maximised likelihood) for each value of
          lambda.
    LMB_  variate containing the corresponding grid values of lambda.

    The following scalars contain useful information:

    %N  number of points weighted in; equal to %NU if prior weights are not
        set.
    %Z  number of grid points for lambda; the length of DEV_ and LMB_.

Further information
    1. Macro BOXCOX can also be used when GLIM is run in batch rather than
       interactive mode. In true batch mode there is no interaction with the
       user, but the maximum value of lambda, the increment and the minimum
       value of lambda can still be passed to the macro. They should be
       placed in the input file in that order, one number per line,
       following the call to BOXCOX:

           $USE BOXCOX$
           2
           0.4
           -2

       The three numbers are then read from the primary input channel by the
       macro.
    2. Remember that a separate fit has to be carried out for each value of
       lambda on the grid. Choosing a small increment produces a large number
       of grid points, and the macro will then take a long time to produce
       any output.
    3. The macro takes account of any $OFFSET variate. BOXCOX also takes
       account of any prior weights being used to weight out observations:
       if prior weights are set, only those observations whose prior weight
       is non-zero are used in the calculation of the likelihood. Prior
       weights with non-zero values are taken to be one for the purposes of
       this macro.
       BOXCOX finds the m.l.e. of sigma for each fixed value of lambda on
       the grid, so any $SCALE setting is ignored.
    4. BOXCOX deletes the system vector %RE.
    5. As the macro carries out many $FITs, $OUTPUT is switched off for part
       of the execution. If the macro fails (for instance, because of an
       incorrectly specified model), or if break-in is used, the output may
       still be switched off when control returns to the user at the
       terminal. See section 6 of the macro library description for
       appropriate remedial action.
    6. On normal exit from the macro:
       The $TRANSCRIPT options are reset to the default.
       The GLIM $YVARiate is set to the untransformed y-variate specified in
       macro YVAR.
       The model formula is set to the model formula specified in macro
       MODEL.
    7. Macro BOXCOX produces more than one screenful of output. To examine
       the output one screenful at a time, the $PAGE directive may be issued
       before the call to BOXCOX.

Restrictions
    1. Macro prompting works only when free format is in force. If the
       $FORMAT directive has been used to read in formatted data, then the
       format setting should be reset by the GLIM directive:
           $FORMAT FREE$
    2. The macro works only for y-variates which have no negative or zero
       values (as transformations such as log or square root may be
       included).

References
    Box, G. and Cox, D. (1964), 'An analysis of transformations', JRSSB, 26,
    pp. 211-252.

Example Input
    $C BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS FOR MALE AND FEMALE
       BABIES (FROM DOBSON P.14)
    $UNITS 24 $DATA AGE WEIGHT
    $READ
    40 2968  38 2795  40 3163  35 2925  36 2625  37 2847
    41 3292  40 3473  37 2628  38 3176  40 3421  38 2975
    40 3317  36 2729  40 2935  38 2754  42 3210  39 2817
    40 3126  37 2539  36 2412  38 2991  39 2875  40 3231
    $CAL SEX=%GL(2,12)$FACT SEX 2$
    $C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE SECOND FOR FEMALES.
    $INPUT %PLC 80 BOXCOX$
    $MAC MODEL AGE+SEX $END
    $MAC YVAR WEIGHT $END
    $USE BOXCOX$
    Max. value of lambda?
    $DIN? 2
    Increment?
    $DIN? 0.4
    Min. value of lambda?
    $DIN? -2

Example Output
    --- Model is AGE+SEX
    --- Y-Variate is WEIGHT

    [Deviance plot: -2 log(maximised likelihood) (vertical axis, 321.44 to
    324.16) against lambda (horizontal axis, -2.4 to 2.4).]

         +----------------------+
         |   DEV_   |   LMB_    |
    +----+----------------------+
    |  1 |   324.0  |  -2.0000  |
    |  2 |   323.3  |  -1.6000  |
    |  3 |   322.7  |  -1.2000  |
    |  4 |   322.2  |  -0.8000  |
    |  5 |   321.9  |  -0.4000  |
    |  6 |   321.6  |   0.0000  |
    |  7 |   321.5  |   0.4000  |
    |  8 |   321.5  |   0.8000  |
    |  9 |   321.6  |   1.2000  |
    | 10 |   321.8  |   1.6000  |
    | 11 |   322.1  |   2.0000  |
    +----+----------------------+


Macro BOXFIT in subfile BOXCOX

Author
    B. J. Francis, Centre for Applied Statistics, Cartmel College,
    University of Lancaster, U.K.

Purpose
    The macro BOXFIT fits the chosen model for a selected value of lambda,
    using the transformed y-variate y**lambda (or log(y) if lambda is zero).
    It displays a quantile-quantile plot of the raw residuals from the fit,
    and calculates and displays -2 log(maximised likelihood) for the chosen
    value of lambda.

Formal Arguments
    -None-

Macro Arguments
    YVAR (obligatory)   The name of the untransformed y-variate must be
        stored in a macro called YVAR.
    MODEL (obligatory)  The model formula to be fitted must be stored in a
        macro called MODEL.

Macro Prompting
    The macro prompts for the required value of lambda.

Uses
    Within the macro, scalars %D, %L and %N and vectors INP_, YTR_, IND_,
    SYV_, RES_, ND_ and PW_ are used.

Subsidiary macros
    Macro BOXFIT calls or uses macros YVAR, MODEL, BOX1, BOX2, BOX3, BOX4,
    BOX5, BOX9 and BOXA.
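BOXFIT's Q-Q plot pairs the sorted raw residuals of the points weighted in with Normal deviates. The Python sketch below shows one way to form those pairs. It uses the plotting positions (i - 3/8)/(n + 1/4), Blom's approximation. That choice is an assumption, since the documentation does not say which positions BOXFIT uses. The function name `qq_pairs` and the residual values are hypothetical. The sketch relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def qq_pairs(residuals, prior_weights=None):
    """Pair sorted raw residuals with Normal deviates, skipping weighted-out points.

    Plotting positions (i - 3/8) / (n + 1/4) are Blom's approximation; this
    choice is an assumption, not taken from the macro.
    """
    if prior_weights is None:
        prior_weights = [1] * len(residuals)
    kept = sorted(r for r, w in zip(residuals, prior_weights) if w != 0)
    n = len(kept)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    deviates = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(deviates, kept))


# The last residual has prior weight 0, so it is left out of the plot.
pairs = qq_pairs([0.03, -0.01, 0.02, -0.04, 0.0, 0.5], [1, 1, 1, 1, 1, 0])
for d, r in pairs:
    print(f"{d:7.3f}  {r:7.3f}")
```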
Output
    Displays -2 log(maximised likelihood) for the chosen value of lambda.
    Displays the GLIM deviance and parameter estimates for the transformed
    y-variate:
        y to the power lambda, for lambda not equal to zero;
        log(y), for lambda equal to zero.
    Produces a quantile-quantile plot of the sorted raw residuals against
    their equivalent Normal deviates (for those points weighted in).

    The following scalars contain useful information:

    %D  -2 log(maximised likelihood) for the transformed y-variate.
    %N  number of points weighted in; equal to %NU if prior weights are not
        set.
    %L  chosen value of lambda.

Further information
    1. Macro BOXFIT can also be used when GLIM is run in batch rather than
       interactive mode. In true batch mode there is no interaction with the
       user, but the chosen value of lambda can still be passed to the
       macro. The value should be placed in the input file following the
       call to BOXFIT:
           $USE BOXFIT$
           0
       The number is then read by the macro from the primary input channel.
    2. The macro takes account of any $OFFSET variate. BOXFIT also takes
       account of any prior weights being used to weight out observations:
       if prior weights are set, only those observations whose prior weight
       is non-zero are used in the calculation of the likelihood and in the
       production of the plot. Prior weights with non-zero values are taken
       to be one for the purposes of this macro. BOXFIT finds the m.l.e. of
       sigma for the fixed, chosen value of lambda, so any $SCALE setting is
       ignored.
    3. BOXFIT deletes the system vector %RE.
    4. The macro produces uninformative residual plots if the number of
       points weighted in is close to, equal to or less than the number of
       parameters being fitted in the model.
    5. On normal exit from the macro:
       The GLIM $YVARiate is set to the untransformed y-variate specified in
       macro YVAR.
       The model formula is set to the model formula specified in macro
       MODEL.

Restrictions
    1. Macro prompting works only when free format is in force. If the
       $FORMAT directive has been used to read in formatted data, then the
       format setting should be reset by the GLIM directive:
           $FORMAT FREE$
    2. The macro works only for y-variates which have no negative or zero
       values (as transformations such as log or square root may be
       included).

References
    Box, G. and Cox, D. (1964), 'An analysis of transformations', JRSSB, 26,
    pp. 211-252.

Example Input
    $C BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS FOR MALE AND FEMALE
       BABIES (FROM DOBSON P.14)
    $UNITS 24 $DATA AGE WEIGHT
    $READ
    40 2968  38 2795  40 3163  35 2925  36 2625  37 2847
    41 3292  40 3473  37 2628  38 3176  40 3421  38 2975
    40 3317  36 2729  40 2935  38 2754  42 3210  39 2817
    40 3126  37 2539  36 2412  38 2991  39 2875  40 3231
    $CAL SEX=%GL(2,12)$FACT SEX 2$
    $C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE SECOND FOR FEMALES.
    $INPUT %PLC 80 BOXCOX$
    $MAC MODEL AGE+SEX $END
    $MAC YVAR WEIGHT $END
    $USE BOXFIT$
    Value of lambda?
    $DIN? 0

Example Output
    --- Model is AGE+SEX
    --- Y-Variate is WEIGHT
    -- $data list abolished
    --- Transformed Y-variate is log(WEIGHT )
    deviance = 0.075739   d.f. = 21

         estimate      s.e.       parameter
    1     6.487       0.2659      1
    2     0.04118     0.006938    AGE
    3    -0.05543     0.02469     SEX
    scale parameter taken as 0.003607

    -- model changed
    -2 log l = 321.6    lambda = 0.

    [Raw residual Q-Q plot: sorted raw residuals (vertical axis, -0.084 to
    0.120) against Normal deviates (horizontal axis, -3.00 to 3.00).]


Macro PRESS in subfile PRESS

Author
    M. A. Aitkin, Centre for Applied Statistics, Cartmel College,
    University of Lancaster, U.K.

Purpose
    After a fit of a Normal model, the macro PRESS calculates and displays
    the prediction sum of squares, and the cross-validation estimates of
    sigma squared and R-squared.

Formal Arguments
    -None-

Macro Arguments
    -None-

Uses
    Within the macro, scalars %P, %R and %S and vectors YMI_, EST_, RES_ and
    YVA_ are used.

Subsidiary macros
    Macro PRESS calls or uses macro PRE1.

Output
    The macro displays the prediction sum of squares and the
    cross-validation estimates of sigma squared and R-squared.

    The following scalars contain useful information:

    %P  prediction sum of squares
    %R  cross-validation estimate of R-squared
    %S  cross-validation estimate of sigma squared

Further information
    1. The macro takes account of any $OFFSET setting in the calculations.

Restrictions
    1. The macro cannot be used if prior weights are set. If this is
       attempted, an error message is printed and the macro is abandoned.

References
    1. Draper and Smith (1981), 'Applied Regression Analysis', 2nd edition,
       Wiley, New York.
    2. Copas, J. (1983), 'Regression, prediction and shrinkage', JRSSB, 45,
       p. 342.

Example Input
    $C BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS FOR MALE AND FEMALE
       BABIES (FROM DOBSON P.14)
    $UNITS 24 $DATA AGE WEIGHT
    $READ
    40 2968  38 2795  40 3163  35 2925  36 2625  37 2847
    41 3292  40 3473  37 2628  38 3176  40 3421  38 2975
    40 3317  36 2729  40 2935  38 2754  42 3210  39 2817
    40 3126  37 2539  36 2412  38 2991  39 2875  40 3231
    $CAL SEX=%GL(2,12)$FACT SEX 2$
    $C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE SECOND FOR FEMALES.
    $INPUT %PLC 80 PRESS$
    $MAC MODEL AGE+SEX $END
    $MAC YVAR WEIGHT $END
    $USE PRESS$

Example Output
    deviance = 658771.   d.f. = 21

    PRESS (Prediction Sum of Squares) = 885495.
    Cross Validation Estimates of :
        sigma squared = 36896.
        R-squared     = 0.5255


Macro WEIBULL in subfile WEIBULL

Author
    B. J. Francis, Centre for Applied Statistics, Cartmel College,
    University of Lancaster, U.K.

Purpose
    The macro WEIBULL allows a specified regression model to be fitted to
    survival data, using either the exponential or the Weibull distribution.
    The survival data may be partially right-censored, or entirely
    uncensored.

Estimation
    The likelihood for the exponential or Weibull distribution can be
    expressed as a Poisson likelihood, with a log-linear model for the
    Poisson mean corresponding to a log-linear model for the hazard function.
    If the shape parameter is fixed or known, this procedure corresponds to
    using the censor variate as the y-variate, specifying the log of the
    survival times (multiplied by the shape parameter) as an offset, and
    fitting a Poisson model with log link. Thus, for the exponential
    distribution, a single Poisson fit in GLIM is all that is needed. The
    Weibull distribution requires an iterative procedure: the shape parameter
    is fixed, the model is fitted and the shape parameter is re-estimated,
    until convergence is achieved. Full computational details are given in
    Aitkin and Clayton (1980).

    The hazard function at time t has the following form:

        Exponential:  h(t) = exp(B'X)
        Weibull:      h(t) = a*(t**(a-1))*exp(B'X)

    where
        a  is the shape parameter,
        B  is the vector of parameters for a specified model,
        X  is the vector of explanatory variables
    (and * and ** have their usual meanings as operators).

    Maximum likelihood estimates of a and B are given by the macro. The
    value of -2 log(maximised likelihood), or 'deviance', is also given.
    Standard errors of B are displayed with the estimate of B. For the
    exponential distribution these standard errors are correct, but for the
    Weibull distribution the standard errors of B are slightly
    underestimated, as they do not allow for the fact that a is an estimated
    parameter and not fixed. No standard error for a is given.

    Macro WEIBULL may be called as many times as necessary, changing the
    model formula specified in macro MODEL (and other arguments if necessary)
    as appropriate.
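The alternation described above (fit with the shape fixed, then re-estimate the shape) can be illustrated without explanatory variables, where the Poisson fit has a closed form. The Python sketch below is a simplification, not the macro's code. With only a constant in the model, the fitted Poisson means are mu_i = t_i**a * sum(d) / sum(t**a). The shape is then updated by a = sum(d) / sum((mu_i - d_i) * log t_i), whose fixed point satisfies the likelihood equation for a. Averaging the old and new values of a, to damp oscillation, is our own choice. The function name and arguments are hypothetical. The data are the placebo group from the WEIBULL example, all of which is uncensored.

```python
import math


def fit_weibull_no_covariates(t, d, shape=None, start=1.0, tol=1e-10, max_iter=200):
    """Fit the hazard h(t) = a * t**(a - 1) * exp(b0) to right-censored data.

    t: survival times (all > 0); d: censor indicators (1 = uncensored, 0 = censored).
    If shape is given (1.0 gives the exponential), a is held fixed; otherwise
    a is re-estimated by a damped fixed-point iteration. Returns (a, b0).
    """
    events = sum(d)
    if events == 0:
        raise ValueError("need at least one uncensored survival time")
    log_t = [math.log(v) for v in t]

    def constant_fit(a):
        # Poisson fit with offset a*log(t) and a constant only:
        # exp(b0) = events / sum(t**a).
        return math.log(events / sum(v ** a for v in t))

    if shape is not None:
        return shape, constant_fit(shape)

    a = start
    for _ in range(max_iter):
        b0 = constant_fit(a)
        mu = [math.exp(a * lt + b0) for lt in log_t]  # fitted Poisson means
        denom = sum((m - di) * lt for m, di, lt in zip(mu, d, log_t))
        if denom <= 0:
            raise ValueError("shape update undefined for these data")
        a_new = 0.5 * (a + events / denom)  # damped update (our choice)
        converged = abs(a_new - a) < tol
        a = a_new
        if converged:
            break
    return a, constant_fit(a)


placebo = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
uncensored = [1] * len(placebo)
print(fit_weibull_no_covariates(placebo, uncensored, shape=1.0))  # exponential fit
print(fit_weibull_no_covariates(placebo, uncensored))             # Weibull fit
```

With shape=1.0, exp(b0) is the familiar exponential rate estimate, the number of uncensored times divided by the total time at risk (21/182 here). The numbers are not expected to reproduce the macro's output, which fits the group factor G to both groups.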
Formal Arguments
    %1  The variate containing the survival times, some of which may be
        right-censored.
    %2  An indicator or censor variate. The elements of the variate should
        be 1 if the corresponding survival time is uncensored, and 0 if it is
        censored.
    %3 (optional)  A scalar, which determines whether the Weibull or the
        exponential distribution is fitted. If the argument is not set, or is
        set to a non-zero scalar, then the Weibull distribution is fitted
        using an iterative fitting procedure. The starting value of the shape
        parameter is taken from %A; if %A is zero or negative, a starting
        value of 1 is used. If the third argument is set to a scalar equal to
        zero, then the exponential distribution is fitted: the shape
        parameter is taken to be 1, and only one iteration of the iterative
        procedure is carried out.

Macro Arguments
    MODEL (obligatory)  The model formula to be fitted must be stored in a
        macro called MODEL.
    CONV (optional)  The convergence criterion for the change in the
        estimate of the shape parameter. The default is .001.
    CYCLE (optional)  Should be set to contain either the GLIM directive
        $CYCLE$ or $RECYCLE$. It determines whether or not the underlying
        GLIM IRLS procedure recycles from the previous estimates. Setting
        the contents of macro CYCLE to $RECYCLE$ should speed up the fitting
        of complex models, but persistent recycling in a macro which uses an
        iterative fitting procedure may occasionally cause divergence of the
        deviance. The default is $CYCLE$.
    DISP (optional)  Determines the $DISPLAY options to be used after
        convergence has occurred. DISP should contain valid display option
        letters. The default contents of DISP is E.

Scalar Arguments
    %A  The starting value for the estimate of the shape parameter. Used
        only when the Weibull distribution is being fitted. If %A is zero or
        negative, the default starting value of 1 is used.
    %W  Sets the maximum number of iterations carried out by the macro. If
        set to a scalar less than or equal to 0, the default setting of 15 is
        used.

Uses
    Within the macro, scalars %A, %D, %F and %W and vectors OFV_ and LGT_
    are used.

Subsidiary macros
    Macro WEIBULL calls or uses macros MODEL, MOD1, MOD2, WAR1, MESS, DISP,
    CONV and CYCLE. By default, DISP is set to E, CONV is set to .001, CYCLE
    is set to $CYCLE$ and MODEL is undefined.

Output
    At each iteration, the current estimate of the shape parameter, the
    deviance and the number of degrees of freedom are displayed. On
    convergence, or after %W (15) iterations, the parameter estimates are
    displayed together with their standard errors. Note that the standard
    errors may be slightly underestimated.

    On normal exit, the vector %FV contains the scaled residuals from the
    fit. These residuals are not independent, but they have approximately
    standard exponential distributions.

    The following scalars contain useful information:

    %A  estimate of the shape parameter
    %D  'deviance', or -2 log(maximised likelihood)
    %F  number of degrees of freedom
    %W  current setting of the maximum number of iterations

Further information
    1. The macro takes NO account of any $OFFSET variate or $SCALE parameter
       settings. If prior weights are set, the macro displays a warning
       message and unsets them.
    2. As the macro may carry out many $FITs, $OUTPUT is switched off for
       part of the execution. If the macro fails (for instance, because of
       an incorrectly specified model), or if break-in is used, the output
       may still be switched off when control returns to the user at the
       terminal. See section 6 of the macro library description for
       appropriate remedial action.
    3. On normal exit from the macro:
       The $ERROR setting is Poisson, with $LOG link.
       The GLIM $YVARiate is set to the censor variate specified in formal
       argument %2.
       The model formula is set to the model formula specified in macro
       MODEL.
       The $OFFSET is unset.
       The $WEIGHT is unset.
       Any settings of $ERROR, $LINK, $WEIGHT, $OFFSET and $CYCLE made
       before the call of macro WEIBULL are lost.
       The display is inhibited.

References
    Aitkin, M. and Clayton, D. (1980), 'The fitting of exponential, Weibull
    and extreme value distributions to complex censored survival data using
    GLIM', Appl. Statist., 29, pp. 156-163.

Example Input
    $C REMISSION TIMES IN ACUTE LEUKEMIA
    $C GEHAN'S DATA...JRSS B 1978,P.217 AND BIOMETRIKA,1965,52,P213
    $DATA T $READ
    1 1 2 2 3 4 4 5 5 8 8 8 8 11 11 12 12 15 17 22 23
    6 6 6 6 7 9 10 10 11 13 16 17 19 20 22 23 25 32 32 34 35
    $DATA C $READ
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 0
    ! (1=UNCENSORED OBSN, 0=CENSORED OBSN)
    $CAL G=%GL(2,21)$
    ! VARIABLES:
    ! T REMISSION TIMES IN WEEKS
    ! C CENSOR VARIATE
    ! G GROUP TREATMENT (1=PLACEBO, 2=6-MERCAPTOPURINE (6-MP))
    !
    $M MODEL G $E
    $CALC %B=1 : %W=0 : %A=0
    $USE WEIB T C %B$

Example Output
    -- Model is G

                       Deviance   shape parameter   df
    Exponential fit     217.05        1.0000        40
    Weibull fit         213.22        1.3156        39
                        213.16        1.3546        39
                        213.16        1.3632        39
                        213.16        1.3652        39

    ---Standard errors of estimates given below are underestimated
         estimate      s.e.      parameter
    1    -1.339       0.5491     1
    2    -1.731       0.3983     G
    scale parameter taken as 1.000
    -- model changed


Macro RESPLOT in subfile WEIBULL

Author
    M. A. Aitkin, Centre for Applied Statistics, Cartmel College,
    University of Lancaster, U.K.

Purpose
    After a fit of a Weibull model using macro WEIBULL, the macro RESPLOT
    displays two quantile-quantile residual plots. The macro uses the vector
    %FV, which on exit from macro WEIBULL contains the scaled residuals;
    these have approximately standard exponential distributions.

Formal Arguments
    %1  An indicator or censor variate. The elements of the variate should
        be 1 if the corresponding element of %FV is uncensored, and 0 if it
        is censored.

Macro Arguments
    -None-

Uses
    Within the macro, vectors WK1_, WK2_, WK3_ and WK4_ are used.

Subsidiary macros
    -None-

Output
    The macro displays two quantile-quantile residual plots. The first plot
    is uncorrected for heterogeneity of the variance of the scaled
    residuals; the second is variance stabilised. If the probability model
    holds, both plots should give straight lines with slopes of unity.

Further information
    1. Macro RESPLOT does not destroy the contents of either %1 or %FV.

References
    Aitkin, M. and Clayton, D. (1980), 'The fitting of exponential, Weibull
    and extreme value distributions to complex censored survival data using
    GLIM', Appl. Statist., 29, pp. 156-163.

Example Input
    $UNITS 42
    $C GEHAN'S DATA...JRSS B 1978,P.217 AND BIOMETRIKA,1965,52,P213
    $DATA T $READ
    1 1 2 2 3 4 4 5 5 8 8 8 8 11 11 12 12 15 17 22 23
    6 6 6 6 7 9 10 10 11 13 16 17 19 20 22 23 25 32 32 34 35
    $DATA C $READ
    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 0
    $CA G=%GL(2,21)$MAC MODEL G $E
    $INPUT %PLC 80 WEIB$
    $CA %A=%W=0$USE WEIB T C$
    $USE RESP C$

Example Output
    [Residual plot: vertical axis 0.0 to 3.8, horizontal axis 0.0 to 4.8.]

    [Variance stabilised residual plot: vertical axis 0.0 to 1.52,
    horizontal axis 0.125 to 1.625.]


Macro TUNIFORM in subfile TUNIFORM

Author
    R. A. Reese, Computer Centre, Hull University, U.K.

Purpose
    The macro TUNIFORM tests the distribution of a variate against a
    theoretical uniform distribution, using a chi-squared goodness-of-fit
    test after dividing the range into intervals.
    The parameters of the theoretical distribution may be supplied or
    evaluated from the sample. In the latter case, the chi-squared statistic
    is biased.

Formal Arguments
    -None-

Macro Arguments
    ARG1 (obligatory)  Macro ARG1 must contain the name of a vector or a
        vector expression. The vector may be of any length. (It is up to you
        to ensure that there are enough values to make the test worthwhile.)
    PRINT (optional)  Can contain a scalar or a number. If the scalar or
        number is not zero (the default), the result of the test is printed
        on the current output channel. If macro PRINT contains a number or
        scalar which evaluates to zero, the results are saved and not
        printed.

Scalar Arguments
    %U, %V  If the value of ordinary scalar %U is strictly greater than that
        of %V, those values are used as the maximum and minimum respectively
        of the theoretical rectangular distribution. Otherwise, the maximum
        and minimum values of the sample vector are used, with consequent
        loss of degrees of freedom.

Uses
    Within the macro, ordinary scalars %P, %T, %U, %V, %W, %X, %Y and %Z and
    variates OBS_, EXP_ and PCH_ are used.

Subsidiary macros
    Macro TUNIFORM calls or uses macros TU1, TU2, ARG1 and PRINT. ARG1 is
    initially defined as empty. PRINT initially contains the number 1.

Output
    If the macro argument PRINT is set to a non-zero value, the macro prints
    a table of the observed and expected frequencies within each interval of
    the range, titled with the name of the variable, together with the
    parameters of the theoretical distribution and the test statistic.

    The number of intervals is selected to give at least five expected
    observations in each interval, subject to a maximum of twenty intervals.

    On exit, the following system scalars contain values:

    %P  probability (significance) of the test statistic
    %U  maximum value for the distribution
    %V  minimum value for the distribution
    %W  range of the distribution
    %X  overall chi-squared value
    %Y  number of intervals
    %Z  degrees of freedom for the significance test

    If the macro argument PRINT is set to zero, then the first %Y values of
    the following variates contain:

    OBS_  observed frequency in each interval
    EXP_  expected frequency in each interval
    PCH_  contribution to chi-squared from each interval

Example Input
    $UNIT 23 $CA X=%SR(0)
    $INPUT %PLC 80 TUNIFORM $
    $M ARG1 X $E
    $CALC %U=%V=0$
    $USE TUNI$

Example Output
    Sample estimates used as parameters. df adjusted.

    Test of distribution of X against uniform.

       observed   expected   partitioned
         freq       freq     chi-squared
         7.000      5.750      0.27174
         5.000      5.750      0.09783
         5.000      5.750      0.09783
         6.000      5.750      0.01087

    minimum (%v) 0.0578          maximum (%u) 0.9879
    chi-squared (%x) 0.4783      with (%z) 1 df   P= (%p) 0.489

*** END OF DOCUMENTATION ***