GLIM macro library Release 1.0
------------------------------
January 1985
------------
Macro Library Description
------------------------
<PAGE>
1. Introduction
---------------
The GLIM macro library is provided as a standard facility in release 3.77
and future releases of GLIM. It provides a convenient method of access to
many commonly used GLIM macros, some of which have been previously
published in the GLIM newsletter. The macros which have been included in
the library have been chosen for their usefulness either in providing
extra facilities or by extending the range of data analysis and modelling
which can be performed on the GLIM system. The library is distributed
free of charge with new releases of the GLIM system, starting with
release 3.77. The library uses many new features of GLIM 3.77 and is not
suitable for use on previous releases of GLIM.

Updates to the macro library will be produced periodically and details of
the updates will be published in the GLIM newsletter. The updates can be
obtained on magnetic tape or disk from NAG for an additional fee.
Submissions to the macro library are welcome, and potential authors
should contact the macro library editor to submit macros for refereeing
and to obtain submission guidelines.

This document consists of a short description of the GLIM macro library
and its use, followed by detailed documentation on each macro. This
documentation contains an example of the use of the macro together with
example output. The document was written using the following symbols:

Directive symbol       $
Repetition symbol      :
Substitution symbol    #
End of record symbol   !
Function symbol        %
Quote symbol           '
2. Structure of the macro library
---------------------------------
The macro library consists of a number of subfiles, with each subfile
(apart from the first) containing one or more GLIM macros. Within each
subfile, there will be one or more main macros, and possibly many other
subsidiary macros which are called by the main macro(s). The subfiles
are arranged into eight sections, as follows:

i) Data description, exploration and display
ii) Statistical utilities
iii) Normal models
iv) Poisson models and contingency tables
v) Binomial models
vi) Gamma models
vii) Survival models and censored data
viii) Other statistical models and techniques

The current contents of the macro library are listed in Table 1. All
subfile names and main macro names are given. This information can also
be obtained interactively during a GLIM session by echoing the contents
of the first subfile (INFO) as it is read in by GLIM. The GLIM
statements needed to do this are:

$ECHO $INPUT %PLC 80 INFO $ECHO
<PAGE>
TABLE 1
Current Contents of GLIM 3.77 Macro Library - Release 1.0
-----------------------------------------------------------------------------
Subfile name  Macro name  Description
============  ==========  ===========
i) Data description, exploration and display
SUMM          SUMM        Summary statistics of a variate
STEM          STEM        Stem and Leaf plots
SMOO          SMOO        Tukey smoothing of a variate

ii) Statistical utilities
CHIP          CHIP        Chi-squared probability

iii) Normal models
QPLOT         QPLOT       Normal probability plotting
QPLOT         STAN        Standardised residuals
QPLOT         JACK        Jack-knife residuals
TNOR          TNOR        Test for normality of variate by
                          chi-squared goodness of fit test
TNOR          WDASH       Shapiro - Francia W' test for normality.
NORMAC        RSQ         R-squared statistic
NORMAC        TVAL        t-values of parameter estimates
LEV           LEV         Leverage values
BOXCOX        BOXCOX      Box-Cox transformation family on y-variate
BOXCOX        BOXFIT      Box-Cox transformation for fixed lambda
PRESS         PRESS       Prediction error sum of squares

iv) Poisson models and contingency tables

v) Binomial models

vi) Gamma models

vii) Survival models and censored data
WEIB          WEIB        Fitting the Exponential and Weibull
                          distributions to censored data.
WEIB          RESP        Residual plotting after use of macro WEIB

viii) Other statistical models and techniques
TUNI          TUNI        Test for distribution of variate against
                          Uniform distribution by chi-square goodness
                          of fit test.
-----------------------------------------------------------------------------
<PAGE>
3. Reading the contents of a macro library subfile into a GLIM session.
-----------------------------------------------------------------------
The GLIM macro library has been automatically assigned to a specific
FORTRAN channel. In general, this channel number will vary over
different machine ranges and installations of GLIM. The macro library
channel number for your particular installation of GLIM may be found by
issuing the GLIM statement:

$ENV C$

A system scalar, %PLC, (i.e. program library channel) is also available
and contains the channel number to which the macro library has been
assigned.

The contents of a subfile can be read in by issuing the GLIM statement

$INPUT %PLC 80 <subfile name> $

For example, the subfile containing the Box-Cox transformation macros is
called BOXCOX. The macros contained in this subfile can be read into
GLIM by using

$INPUT %PLC 80 BOXCOX$

The contents of more than one subfile can be read in by issuing a series
of $INPUT directives, e.g.

$INPUT %PLC 80 NORMAC$
$INPUT %PLC 80 BOXCOX$

or, more simply

$INPUT %PLC 80 NORMAC BOXCOX$

Note that GLIM reads the macro library sequentially. Therefore, for
greatest efficiency, the subfile names should be specified in the order
in which the subfiles are stored (given in Table 1).

The contents of a subfile may be displayed as it is being read by
ensuring that the $ECHO facility is switched on before reading the
subfile, as in

$ECHO
$INPUT %PLC 80 BOXCOX$
$ECHO

Each subfile contains a short description of its contents together with
condensed documentation on the use of the main macros contained within
the subfile. This information is stored at the start of each subfile.
The $ECHO facility therefore provides a way of obtaining on-line
documentation on the use of the various macros within the library.
<PAGE>
4. Passing data to a GLIM library macro.
----------------------------------------
Your data can be passed to a GLIM macro in many ways. Some ways of
passing data to a macro are outlined below; each of these methods has
been used in at least one macro in the macro library. The documentation
indicates the method or methods required for each macro.

a) Formal Arguments
Formal arguments to a macro are specified either by using the $ARGUMENT
directive before the macro is used or, alternatively, by specifying the
arguments as part of the $USE directive. For example, if a macro M needs
two formal arguments, the first a vector and the second a scalar, then
the macro is used with the first argument set to the vector V and the
second argument set to the scalar %A as follows:

$ARG M V %A $USE M$

or

$USE M V %A$

b) Macro Arguments
Macro arguments to a library macro are the most suitable way of passing
text information to the library macro. The text might be a model
formula, a variate name or a calculate expression. Macro arguments are
simply macros of the required names which have been set up by the user
and which contain the required information stored as text. The user
needs to declare all macro arguments before calling the library macro.
Examples of suitable declarations are:

$MACRO MODEL AGE*REGION $ENDM
$MACRO ARG1 V $ENDM
$MACRO PRINT 0 $ENDM

c) Scalar Arguments
Sometimes certain scalars may need to be set before the macro is called.
These are referred to here as scalar arguments. An example of this is
the macro TUNIFORM in subfile TUNIFORM. This macro checks and acts upon
the values of the scalars %U and %V.

d) Macro prompting
Certain macros may prompt the user for information while the macro is
being executed. The information asked for will always be numeric and
never textual. If any non-numeric answer is given, then the macro will
fail. For example, the macro BOXFIT prompts the user as follows:

Value of lambda?
$DIN?
<PAGE>
5. Macro Library conventions.
-----------------------------
a) Locally declared vectors
Many macros in the macro library need to declare vectors local to that
macro. Some of these vectors may be useful to the user on exit from the
macro. In the GLIM macro library, a convention has been adopted that all
vectors declared within a macro have names which end with the underline
symbol (_). This convention minimises the possibility of the names of
these locally declared vectors clashing with the names of any vectors
set up by the user. The user should therefore avoid choosing names for
vectors which end with the underline symbol (e.g. OBS_, EXP_).

All locally declared vectors are deleted on successful completion of the
macro, unless the documentation specifies otherwise. Some macros have
the option to keep certain useful vectors in the workspace (see the
documentation for each macro).

b) Scalars
Most macros in the GLIM macro library use the macro library scalars
(%Z1, %Z2,..., %Z9) for temporary storage of scalars. Some macros may
additionally use some ordinary scalars (%A, %B,..., %Z) and these may
contain scalar values potentially useful to the user. If any ordinary
scalars are used by a library macro, then this is documented, together
with their contents, if useful.

c) Space recovery.
Library macros occupy part of the data space and it may be necessary to
reclaim this space after the macro has been used. To facilitate this,
some subfiles contain an additional macro called DEL; the text of this
macro is, of course, different for each subfile. After execution of the
required library macro, $USE DEL$ will delete all identifiers from that
subfile, with the exception of DEL itself. If a different subfile were
then to be read in, DEL could be left, as it would then be overwritten;
otherwise it could be deleted. The directive $PRINT DEL$ is a further
way of obtaining the list of macro identifiers used in a subfile.
<PAGE>
6. Error messages and reporting.
--------------------------------
Many library macros produce their own error and information messages;
these are intended to be self-explanatory when read together with the
documentation. All macros may fail with a GLIM error message in certain
circumstances. Common causes are:

- failure to set or to correctly specify all necessary arguments to a
  macro,
- exceeding the maximum number of identifiers,
- exceeding the size of the workspace.

Some library macros use the $OUTPUT$ directive to switch off unnecessary
output (for instance, iterative macros using the $FIT directive). If
such a library macro should fail (or if the user should break in to it)
then when control returns to the user, output to the primary output
channel may still be switched off, and the user will seemingly get no
response when further GLIM statements are entered. If this should
happen, then

$OUTPUT %POC $

will set the output back to its default setting.

If unexplained errors are encountered when using the library macros and
any local support available fails to locate the problem, then a copy of
the transcript file which reproduces the error should be sent either to

The GLIM coordinator,
NAG Central Office,
Mayfield House,
256, Banbury Road,
OXFORD OX2 7DE

or to

Brian Francis
GLIM macro library editor
Centre for Applied Statistics
Cartmel College
University of Lancaster
Lancaster LA1 4YL

The transcript file should contain the GLIM directive $ENV I$ together
with its output. Details of the machine on which GLIM is installed and
the operating system being used should also be provided, e.g.

IBM PC-XT 512K RAM running under PC/DOS,
VAX 11/780 running under VAX/VMS 3.7 etc.
<PAGE>
GLIM macro library Release 1.0
------------------------------
January 1985
------------
Macro Library Documentation
--------------------------
<PAGE>
Macro SUMMARY in subfile SUMMARY
Author
B. J. Francis, Centre for Applied Statistics,
Cartmel College, University of Lancaster, U.K.
Purpose
The macro SUMMARY evaluates and prints a number of
simple descriptive statistics of a vector.
Formal Arguments
%1
The name of a variate or factor.
%2
(optional). A scalar.
If the second argument is not set or is set and equal
to zero, then the summary statistics are not saved.
If the second argument is set and not equal to zero then
the summary statistics are saved.
Macro Arguments
-None-
Uses
Within the macro, scalar %Z and variates MEA_, MAX_, MIN_,
VAR_, MED_, TOT, SDV_, P25_, P75_ and RAN_ (all of length 1)
are used. The variates can be optionally saved.
Subsidiary macros
Macro SUMMARY calls macros SUM1 and NOTS.
Output
The macro will evaluate and print the following statistics.
On exit, scalar %Z will contain the length of the variate or
factor. In addition, if the second formal argument has been
set to a non-zero value, then the following variables will
contain values:
MEA_  -  Mean value.
VAR_  -  Variance.
SDV_  -  Standard deviation.
MIN_  -  Minimum value.
MAX_  -  Maximum value.
RAN_  -  Range.
P75_  -  75% ile value.
MED_  -  Median (50% ile).
P25_  -  25% ile value.
All statistics are evaluated by using the $TABULATE directive
and users should consult the GLIM 3.77 update manual for exact
definitions.
<PAGE>
Example Input
$UNIT 1000$
$CAL X=%SR(0)$
$INPUT %PLC 80 SUMM$
$USE SUMM X$
Example Output
Summary statistics for X
Length         1000.
Maximum       0.9975    Minimum    0.0023    Range          0.9952
Mean          0.5003    Total       500.3
Variance      0.0842    Std. Dev.  0.2902
1st Quartile  0.2474    Median     0.5083    3rd Quartile   0.7466
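For readers without GLIM to hand, the quantities in the example output can be sketched in Python. This is only an illustration: the macro itself relies on $TABULATE, whose exact definitions are given in the GLIM manual, so the variance divisor (n - 1) and the quartile rule used below are assumptions, and the function name `summary` is hypothetical.

```python
from statistics import mean, median, quantiles, variance


def summary(values):
    """Descriptive statistics of the kind printed by the SUMMARY example."""
    # Assumption: Python's default quartile rule; $TABULATE may differ.
    q1, _, q3 = quantiles(values, n=4)
    # Assumption: sample variance with divisor n - 1.
    var = variance(values)
    return {
        "length": len(values),
        "maximum": max(values),
        "minimum": min(values),
        "range": max(values) - min(values),
        "mean": mean(values),
        "total": sum(values),
        "variance": var,
        "std_dev": var ** 0.5,
        "q1": q1,
        "median": median(values),
        "q3": q3,
    }
```

The relationships visible in the example output (range = maximum - minimum, mean = total / length, standard deviation = square root of the variance) hold in this sketch whatever the quartile convention.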
<PAGE>
Macro STEM in subfile STEM
Author
R. A. Reese, Computer Centre, Hull University, U.K.
Purpose
The macro STEM forms a "stem and leaf" plot of the values
of a vector (ref. Exploratory Data Analysis, J. W. Tukey
1977 Addison-Wesley). The interval size, which controls
the breakpoint in each value between stem and leaf, and the
formats using which the stem and the leaf are printed may
be provided as optional arguments or the default values
may be used.
Formal Arguments
-none-
Macro Arguments
ARG1
This macro is obligatory and should contain the name
of a variate or a vector expression. The vector may
be of any length.
ARG2
(optional) Contains a positive value that
defines the interval between successive stems. The default
value is 10. This means that the units digit and
fractional part of each value will be split off as the
leaf. Tukey shows the use of values of 10, 100 and 5 in
examples. In general, the best plots (in the sense of
being easiest to interpret) are obtained if the values (of
ARG1) have been scaled to be integers and a positive power
of 10 is used for ARG2 but, where the number of values
shown on each line is too great, the values can be split
over more than one line by setting ARG2 to the next lower
power of 10 multiplied by 2 or 5.
ARG3
(optional) Contains an integer to define the
format for the stem values. The default is -1, meaning
that they are to be printed as integers in the smallest
possible number of characters.
ARG4
(optional) Contains an integer to define the format for
the leaf values. The default is 2, to avoid rounding to
integers which look as if they are then on the wrong line
but, if the leaf values are integer, -1 will pack them
closer and allow more values to be displayed.
Any or all of ARG2 to ARG4 may be reset before or between
calls of STEM.
Uses
Within the macro, vectors TMP_ and LIN_ are used.
<PAGE>
Subsidiary macros
Macro STEM calls or uses macros SL1, SL2, SL3, SL4, SL5,
ARG1, ARG2, ARG3, ARG4 and PERR.
ARG1 is initially defined as empty.
ARG2 to ARG4 are given default settings as described above.
Output
The macro outputs the stem and leaf plot only.
Where one or more successive stem values would appear on a
line with no leaves, they are not printed but a single row
of dots is output to indicate the jump in stem value. This
is not precisely after Tukey but is necessary to avoid
plots of excessive height.
Further Information
The stem and leaf plot is a good way of displaying values
so as to show their distribution. When some of the numbers
are negative, the method cannot be immediately applied
(which line would 0 itself go on?), so the macro adopts the
conventions that
(1) If all the numbers are negative, the sign is ignored
and the absolute values are plotted
and
(2) If the values straddle 0, then a constant is added to
each, so as to make them all greater than 0, and these
shifted values are plotted.
In either case, an explanatory message will be output with
the plot.
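The basic split of each value into a stem and a leaf, for values that are already positive, can be sketched in Python as follows. This is an illustration of the idea rather than the macro's own code; the function name `stem_and_leaf` is hypothetical, the sign conventions above are not handled, and leaves are listed in ascending order as an assumption.

```python
import math
from collections import defaultdict

# Coal production data from the STEM example (Tukey, p. 212).
OBS = [569, 416, 422, 565, 484, 520, 573, 518,
       501, 505, 468, 382, 310, 334, 359, 372,
       439, 446, 349, 395, 461, 511, 583]


def stem_and_leaf(values, interval=10):
    """Group positive values by stem; the leaf is the remainder."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem = math.floor(value / interval)   # e.g. 569 -> stem 56
        rows[stem].append(value - stem * interval)  # e.g. 569 -> leaf 9
    return dict(rows)


if __name__ == "__main__":
    for stem, leaves in sorted(stem_and_leaf(OBS).items()):
        print(f"{stem:3d}. ", " ".join(f"{leaf:.1f}" for leaf in leaves))
```

With the default interval of 10 the units digit becomes the leaf; stems with no leaves (32, 36, 40 and so on for these data) simply do not appear, which is where the macro prints its row of dots.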
Example Input
$UNIT 23
$DATA OBS $READ ! Tukey P 212 Coal Production.
569 416 422 565 484 520 573 518
501 505 468 382 310 334 359 372
439 446 349 395 461 511 583
$INPUT %PLC 80 STEM $
$M ARG1 OBS $E
$USE STEM$
! ARG2, ARG3 and ARG4 defaults used
<PAGE>
Example Output
Stem and leaf plot of OBS - in steps of 10.00
31.   0.
...
33.   4.0
34.   9.0
35.   9.0
...
37.   2.0
38.   2.0
39.   5.0
...
41.   6.0
42.   2.0
43.   9.0
44.   6.0
...
46.   1.0  8.0
...
48.   4.0
...
50.   1.0  5.0
51.   1.0  8.0
52.   0.
...
56.   5.0  9.0
57.   3.0
58.   3.0
Number of values: 23
<PAGE>
Macro SMOOTH in subfile SMOOTH
Author
R. A. Reese, Computer Centre, Hull University, U.K.
Purpose
SMOOTH provides a number of methods for smoothing the
values of a variate and allows for the easy addition of
other methods or refinement in dealing with end values.
Macro arguments
ARG1
(obligatory) The text of the macro ARG1 should contain
the name of the variate to be smoothed. This variate will
be overwritten by the smoothed values calculated by the
macro. The vector may be of any length.
ARG2
(optional) If set by the user, macro ARG2 must contain
the name of a macro.
Note the substitution character must be present. By
default it is set as #KMEAN, but of course the default may
be changed locally. #KMEAN replaces the values of ARG1 by
running means. The other methods provided as macros are
three term running median ( #MEDIAN )
Tukey's 3R repeated median with end smoothing ( #T3R )
and Hanning ( #HANN ).
#KMEAN, #MEDIAN and #HANN copy the end values (actually,
they never alter them) and #T3R applies #MEDIAN until the
values do not change and then chooses values for the end
points.
ARG3
(optional) Macro ARG3 is used only if #KMEAN is the
method and is an integer to define the number of terms to
be included in the running mean. If ARG3 has the value K,
then 2K+1 term running means will be substituted for all
except the first and last K values, which will be
unchanged. ARG3 has the default value 1 to give three
terms in each mean.
Uses
Within the macro, ordinary scalars %Y and %Z and
vectors IND_, TMP_ and CPY_ are used.
Subsidiary macros
Macro SMOOTH calls or uses macros SMOOTH, KMEAN, HANN,
MEDIAN, T3R, KME1, ARG1, ARG2 and ARG3.
ARG1 is initially defined as empty. ARG2 and ARG3 are, as
described above, initially set to give three term running
mean smoothing and it is necessary for the user to redefine these only to choose another method. If another
algorithm is provided, its name would be given as the text
of ARG2.
Output
The only output is a confirmatory message.
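The smoothing methods described above can be sketched in Python. This is a reading of the descriptions, not the macro text: the function names are hypothetical, the Hanning weights (1/4, 1/2, 1/4) are the usual definition and are assumed here, and the end-point step of #T3R is omitted because the rule it uses is not stated.

```python
from statistics import median

# Coal production data used in the SMOOTH example (Tukey, p. 212).
OBS = [569, 416, 422, 565, 484, 520, 573, 518,
       501, 505, 468, 382, 310, 334, 359, 372,
       439, 446, 349, 395, 461, 511, 583]


def running_mean(values, k=1):
    """(2k+1)-term running mean; the first and last k values are copied."""
    out = list(values)
    for i in range(k, len(values) - k):
        window = values[i - k:i + k + 1]
        out[i] = sum(window) / len(window)
    return out


def running_median(values):
    """Three-term running median; the two end values are copied."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = median(values[i - 1:i + 2])
    return out


def repeated_median(values):
    """Apply the running median until nothing changes (no end-point rule)."""
    current = list(values)
    while True:
        smoothed = running_median(current)
        if smoothed == current:
            return current
        current = smoothed


def hanning(values):
    """Weighted three-term mean with assumed weights 1/4, 1/2, 1/4."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = 0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[i + 1]
    return out
```

For the coal data, the second value becomes 469.0 under the three-term running mean, 422 under the running median and 455.75 under Hanning.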
<PAGE>
Example Input
$UNIT 23
$DATA OBS $READ
! Tukey P 212 Coal Production.
569 416 422 565 484 520 573 518
501 505 468 382 310 334 359 372
439 446 349 395 461 511 583
$INPUT %PLC 80 SMOOTH $
$M ARG1 O $E
!We will use all possible smoothings
$CALC O=OBS $USE SMOO $CALC S1=O
$M ARG2 #MEDIAN $E
$CALC O=OBS $USE SMOO $CALC S2=O
$M ARG2 #T3R $E
$CALC O=OBS $USE SMOO $CALC S3=O
$M ARG2 #HANN $E
$CALC O=OBS $USE SMOO$
$PR '     OBS    MEANS  MEDIANS       3R   HANNED';$
$LOOK(S=-1) OBS S1 S2 S3 O $
Example Output
     OBS    MEANS  MEDIANS       3R   HANNED
   569.0    569.0    569.0    422.0    569.0
   416.0    469.0    422.0    422.0    455.7
   422.0    467.7    422.0    422.0    456.2
   565.0    490.3    484.0    484.0    509.0
   484.0    523.0    520.0    520.0    513.2
   520.0    525.7    520.0    520.0    524.2
   573.0    537.0    520.0    520.0    546.0
   518.0    530.7    518.0    518.0    527.5
   501.0    508.0    505.0    505.0    506.2
   505.0    491.3    501.0    501.0    494.7
   468.0    451.7    468.0    468.0    455.7
   382.0    386.7    382.0    382.0    385.5
   310.0    342.0    334.0    334.0    334.0
   334.0    334.3    334.0    334.0    334.2
   359.0    355.0    359.0    359.0    356.0
   372.0    390.0    372.0    372.0    385.5
   439.0    419.0    439.0    439.0    424.0
   446.0    411.3    439.0    439.0    420.0
   349.0    396.7    395.0    395.0    384.7
   395.0    401.7    395.0    395.0    400.0
   461.0    455.7    461.0    461.0    457.0
   511.0    518.3    511.0    511.0    516.5
   583.0    583.0    583.0    583.0    583.0
<PAGE>
Macro CHIP in subfile CHIP
Author
R. A. Reese, Computer Centre, Hull University, U.K.
Purpose
The text of CHIP is an expression that evaluates the upper-tail
probability of a chi-squared variate. This expression
is incorporated into macros TUNIFORM and TNORMAL but is not
separately usable from those macros.
Formal Arguments
%1
The value of the chi-squared statistic.
%2
The appropriate number of degrees of freedom.
Macro arguments
-None-
Uses
This macro uses no named structures or variables.
Subsidiary macros
Macro CHIP calls or uses no subsidiary macros.
Output
The macro is an expression, so must be used in a $CALCULATE
directive. If the value is not assigned, it will be output
by the directive, e.g.
$CAL #CHIP $
Otherwise, you may store it, e.g.
$CAL %P=#CHIP $
Example Input
$CAL %A=3.84 : %B=1$
$INPUT %PLC 80 CHIP$
$ARG CHIP %A %B$
$CAL #CHIP$
Example Output
0.05004
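The example value can be cross-checked outside GLIM. The sketch below computes the upper-tail chi-squared probability for integer degrees of freedom from the standard recurrence for the incomplete gamma function. It is not the expression stored in CHIP, which is not reproduced in this document, and the function name `chi2_upper_tail` is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-squared variate with integer df >= 1."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2:
        # One degree of freedom: twice the Normal upper tail at sqrt(x).
        prob, a = math.erfc(math.sqrt(half)), 0.5
    else:
        # Two degrees of freedom: exponential tail.
        prob, a = math.exp(-half), 1.0
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while 2 * a < df:
        prob += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return prob


if __name__ == "__main__":
    print(round(chi2_upper_tail(3.84, 1), 5))
```

For 3.84 on 1 degree of freedom this agrees with the 0.05004 shown in the example output to the digits printed.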
<PAGE>
Macro QPLOT in subfile QPLOT
Author
B. J. Francis, Centre for Applied Statistics,
Cartmel College, University of Lancaster, U.K.
Purpose
The macro QPLOT produces a Normal quantile-quantile plot
(normal plot) of raw, standardised or jack-knife residuals.
The macro can also be used to produce a Normal
quantile-quantile plot of a variate. Alternative expressions
for residual calculation may be defined by the user.
The macro takes account of any prior weights being used to
weight out observations. If prior weights are set, only those
values for which the prior weight is non-zero will be used in
the production of the plot. Prior weights with non-zero values
will be taken to have a prior weight of one for the purposes
of this macro.
A general expression for the quantiles of the Normal
distribution is allowed. For 0<=k<1, the quantiles are
calculated using the formula %ND((i-k)/(n+1-2k)) for i=1...n.
Common values of k are 0 and 0.5.
Quantiles given by Filliben (1975), (which set k to .3175,
but have adjusted first and last quantiles) are provided by
default. If these are used, the Filliben correlation
coefficient, which calculates the correlation between the
sorted values and the Normal quantiles, is given. The
significance of this correlation may be assessed by a table
provided in Filliben.
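The quantile formula above can be sketched in Python, reading %ND as the inverse of the standard Normal distribution function. The function names are hypothetical, and the adjustment Filliben makes to the first and last quantiles is not included, so the correlation below matches the macro's coefficient only in spirit.

```python
from statistics import NormalDist


def normal_quantiles(n, k):
    """Normal quantiles at probabilities (i - k) / (n + 1 - 2k), i = 1..n."""
    if not 0 <= k < 1:
        raise ValueError("k must lie in [0, 1)")
    std = NormalDist()
    return [std.inv_cdf((i - k) / (n + 1 - 2 * k)) for i in range(1, n + 1)]


def correlation(xs, ys):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def qq_correlation(values, k):
    """Correlation between the sorted values and their Normal quantiles."""
    return correlation(sorted(values), normal_quantiles(len(values), k))
```

For example, `normal_quantiles(3, 0.5)` uses the probabilities 1/6, 1/2 and 5/6, so the middle quantile is zero and the outer two are equal and opposite.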
Formal Arguments
%1
(optional) The name of a macro which gives the type of
residuals required. Valid macro names are:
RAW   raw residuals (default)
STAN  standardised residuals
JACK  jack-knife or cross-validatory residuals
or
The name of, or a macro containing the name of, a variate
of standard length
or
The name of a macro containing an alternative expression
for the calculation of residuals.
%2
(optional) a scalar.
If not set, or if set and its value lies outside the range
[0,1), then the Normal quantiles given by Filliben are used.
The Filliben correlation coefficient is calculated and
displayed.
If set, and the value of the scalar lies within the range
[0,1), then this is taken as the value of k in the above
expression. The Filliben correlation coefficient is not
calculated or displayed.
<PAGE>
Macro Arguments
-None-
Uses
Within the macro, vectors PW_, IND_, RES_, R_ and I_ are
used.
Subsidiary macros
Macro QPLOT calls or uses macros QP1, QP2, QP3, QP4, QP5,
RAW, STAN, JACK and NOTSET.
Output
The macro will produce a Normal quantile-quantile plot of the
sorted residuals or sorted values of the vector plotted
against their equivalent Normal deviate. If prior weights are
set, then a message will be output informing the user of the
number of units used in the production of the plot. If formal
argument %2 is not set, or lies outside the range [0,1), then
the Filliben correlation coefficient for the plot is
displayed and is stored in scalar %C.
Further information
1. Macros RAW, STAN and JACK can be used in a $CALCULATE
statement to enable the unsorted residuals from a fit to
be stored.
Macros STAN and JACK require the system vector %VL, so the
GLIM directive $EXTRACT %VL$ must be used.
Macro JACK calls macros STAN and RAW.
Macro STAN calls macro RAW.
Examples:
$CALC RRES=#RAW
$EXT %VL $CALC SRES=#STAN
2. The macro will produce uninformative residual plots if the
number of points weighted in is close to, equal to or less
than the number of parameters being fitted in the model.
3. The macros take account of any $OFFSET or $SCALE parameter
settings in the calculation of residuals. Macro QPLOT takes
account of points weighted out of a fit by use of prior
weights in the manner described above.
4. If prior weights are set and standardised or jack-knife
residuals are being plotted, the macro will calculate the
residual expression for all observations, and not just for
those observations weighted in to the fit. In this case,
the GLIM warning message
--- Invalid function/operator arguments
may be produced. This will be caused by observations
weighted out of the fit, and will not therefore affect the
Normal plot.
5. QPLOT deletes the system vector %RE.
<PAGE>
Restrictions
1. If the macro QPLOT is used to produce a Normal
quantile-quantile plot of a variate, then that variate must
be of standard length.
References
Filliben (1975) 'The probability plot correlation coefficient
test for Normality', Technometrics, 17, pages 111-117
Example Input
$C
BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS
FOR MALE AND FEMALE BABIES (FROM DOBSON P.14)
$UNITS 24 $DATA AGE WEIGHT $READ
40 2968 38 2795 40 3163 35 2925 36 2625 37 2847
41 3292 40 3473 37 2628 38 3176 40 3421 38 2975
40 3317 36 2729 40 2935 38 2754 42 3210 39 2817
40 3126 37 2539 36 2412 38 2991 39 2875 40 3231
$CAL SEX=%GL(2,12)$FACT SEX 2$
$C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE
SECOND FOR FEMALES.
$YVAR WEIGHT$FIT AGE+SEX$
$INPUT %PLC 80 QPLOT$
$USE QPLOT STAN$
! Q-Q plot of standardised residuals
$M VECT WEIGHT $E
$CALC %A=0$
$USE QPLOT VECT %A$ ! Q-Q plot for vector WEIGHT using k=0
<PAGE>
Example Output
deviance = 658771.
    d.f. = 21
Normal Q-Q plot (STAN)
[Plot: sorted standardised residuals (vertical axis, -1.600 to 2.200)
against Normal quantiles (horizontal axis, -3.00 to 3.00)]
Filliben correlation coefficient equals 0.9718
Normal Q-Q plot (VECT)
[Plot: sorted values of WEIGHT (vertical axis, 2400.0 to 3540.0)
against Normal quantiles (horizontal axis, -3.00 to 3.00)]
<PAGE>
Macros TNORMAL and WDASH in subfile TNORMAL
Author
R. A. Reese, Computer Centre, Hull University, U.K.
Purpose
The macro TNORMAL tests the distribution of a variate
against a theoretical normal distribution, using both a
Shapiro-Francia W' test and a chi-squared goodness of fit
test
based on dividing the range into equal probability
intervals. The parameters of the theoretical distribution
may be supplied or evaluated from the sample. In the
latter case, the chi-squared statistic is biased. If only
the W' test is required, WDASH should be called in place of
TNORMAL.
The number of intervals is selected to give at least five
expected observations in each partition, subject to a
maximum of twenty intervals.
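The interval rule and the partitioned chi-squared statistic can be sketched in Python. The function names are hypothetical, and the reading that the macro takes the largest number of intervals satisfying the rule (n divided by 5, rounded down, capped at 20) is an assumption; it is at least consistent with the 23-observation example for this macro, which uses four intervals.

```python
def interval_count(n, min_expected=5, max_intervals=20):
    """Largest number of equal-probability intervals with the required
    expected count in each, subject to the stated maximum."""
    return max(1, min(max_intervals, n // min_expected))


def chi_squared_gof(observed, estimated_parameters=2):
    """Chi-squared statistic for equal-probability intervals.

    Returns the statistic, its degrees of freedom and the contribution
    of each interval. Two degrees of freedom are removed by default, as
    when the mean and variance are estimated from the sample.
    """
    n = sum(observed)
    k = len(observed)
    expected = n / k  # equal-probability intervals share n evenly
    parts = [(o - expected) ** 2 / expected for o in observed]
    return sum(parts), k - 1 - estimated_parameters, parts


if __name__ == "__main__":
    stat, df, parts = chi_squared_gof([8, 4, 5, 6])
    print(interval_count(23), round(stat, 3), df)
```

With observed frequencies 8, 4, 5 and 6 each expected frequency is 5.75, the statistic is about 1.522 and one degree of freedom remains.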
Formal Arguments
-None-
Macro Arguments
ARG1
(obligatory) The text of the macro ARG1 must contain the name
of a vector or a vector expression. The vector may be of any
length.
PRINT
(Optional). Can contain a scalar or a number. If the scalar
or number is not zero (default), then the result of the test
will be printed on the current output channel. If macro PRINT
contains a number or scalar which evaluates to zero, then the
results will be saved and not printed.
Scalar Arguments
%X,%Y
For TNORMAL (but not WDASH), if the value of
ordinary scalar %X is greater than zero, then %Y will be
used as the mean and %X as the variance of the theoretical
normal distribution. Otherwise, the sample mean and
variance will be used, with consequent loss of degrees of
freedom.
Uses
Within the macros, ordinary scalars %P, %T, %U, %V, %W, %X,
%Y and %Z and variates OBS_, EXP_ and PCH_ are used.
Subsidiary macros
Macros TNORMAL and WDASH call or use macros TN1, TN2, TN3,
TN4 and PRINT.
ARG1 is initially defined as empty.
<PAGE>
Output
If the number of observations is within the range 5 to
1000, the Shapiro-Francia W' test (Royston 1983) will be
applied first and the macro will print the test statistic
and the approximate probability of obtaining that value if
sampling from a normal distribution.
If the macro argument PRINT is set to a non-zero value then
the macro will print a table of the observed and expected
frequencies within each interval of the range titled with the
name of the variable (see example), the parameters of the
theoretical distribution and the chi-squared test statistic.
If WDASH is called, the W' test statistic and probability
will be output followed by an optional normal plot.
On exit, the following system identifiers will contain
values:
%P  -  probability of W' or chi-squared statistic.
%U  -  overall chi-squared value.
%V  -  number of intervals.
%W  -  standard deviation of distribution.
%X  -  variance of distribution.
%Y  -  mean of distribution.
%Z  -  degrees of freedom for significance test.
If the macro argument PRINT is set to zero then additionally
the first %V values of the following vectors will contain:
OBS_  -  observed frequency in each interval.
EXP_  -  expected frequency in each interval.
PCH_  -  contribution to chi-squared from each interval.
References
J P Royston (1983) A Simple Method for Evaluating the
Shapiro-Francia W' Test.
The Statistician Vol 32, No 3, p 297.
(but note that the example in the paper is evaluated not
using Blom's normal order values.)
Example Input
$UNIT 23$CAL X=%SR(0)$
$INPUT %PLC 80 TNORMAL $
$M ARG1 X $E
$CALC %X=0$
$USE TNOR$
<PAGE>
Example Output
Sample estimates used as parameters. - df adjusted.
Shapiro and Francia's W' test (ref. Royston, STATISTICIAN 1983)
small W and P values suggest non-normality
test value (W') 0.9595
P= 0.3886
Test of distribution of X
against Normal.
observed   expected   partitioned
  freq       freq     chi-squared
  8.000      5.750      0.88043
  4.000      5.750      0.53261
  5.000      5.750      0.09783
  6.000      5.750      0.01087
mean (%y)          0.5189    st. dev. (%w)   0.2907
chi-squared (%u)   1.522 with (%z) 1 df      P= (%P) 0.217
<PAGE>
Macro RSQ in subfile NORMAC
Author
M. Slater, Dept of Computer Science and Statistics,
Queen Mary College, Mile End Road, LONDON E1
Purpose
The macro RSQ displays the R-squared coefficient for the
current fit.
Formal Arguments
-None-
Macro Arguments
-None-
Uses
Within the macro, scalar %R is used.
Subsidiary macros
-None-
Output
The macro prints the value of the R-squared coefficient,
which is also stored in scalar %R.
Further information
1. The macro takes account of any use of the $OFFSET
directive but ignores any use of prior weights.
Example Input
$C
BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS
FOR MALE AND FEMALE BABIES (FROM DOBSON P.14)
$UNITS 24 $DATA AGE WEIGHT $READ
40 2968 38 2795 40 3163 35 2925 36 2625 37 2847
41 3292 40 3473 37 2628 38 3176 40 3421 38 2975
40 3317 36 2729 40 2935 38 2754 42 3210 39 2817
40 3126 37 2539 36 2412 38 2991 39 2875 40 3231
$CAL SEX=%GL(2,12)$FACT SEX 2$
$C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE
SECOND
FOR FEMALES.
$YVAR WEIGHT$FIT AGE+SEX$
$INPUT %PLC 80 NORMAC$
$USE RSQ$
Example Output
deviance = 658771.
    d.f. = 21
R-squared equals 0.6400
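The figure in the example output can be checked by hand. The sketch below assumes the usual definition for a Normal model, R-squared = 1 - (residual sum of squares) / (total sum of squares about the mean), with the deviance reported by GLIM taken as the residual sum of squares. The macro text is not reproduced in this document, so this is a consistency check rather than the macro's own calculation, and the function name `r_squared` is hypothetical.

```python
# Birthweights in grams from the example input (Dobson, p. 14).
WEIGHT = [2968, 2795, 3163, 2925, 2625, 2847,
          3292, 3473, 2628, 3176, 3421, 2975,
          3317, 2729, 2935, 2754, 3210, 2817,
          3126, 2539, 2412, 2991, 2875, 3231]


def r_squared(residual_ss, y):
    """1 - RSS / TSS, with TSS taken about the mean of y."""
    mean_y = sum(y) / len(y)
    total_ss = sum((value - mean_y) ** 2 for value in y)
    return 1.0 - residual_ss / total_ss


if __name__ == "__main__":
    # Deviance as printed by $FIT AGE+SEX in the example output.
    print(round(r_squared(658771.0, WEIGHT), 4))
```

Under that assumption, the deviance of 658771 and the 24 birthweights give a value that rounds to the 0.6400 shown above.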
<PAGE>
Macro TVAL in subfile NORMAC
Author
M. Slater, Dept of Computer Science and Statistics,
Queen Mary College, Mile End Road, LONDON E1
Purpose
The macro TVAL computes, displays and optionally stores the
t-test values for a Normal model. These are produced by
dividing the vector of parameter estimates by the
corresponding vector of the standard errors of the parameter
estimates.
Formal Arguments
-None-
Macro Arguments
-None-
Uses
Within the macro, vectors TV_ and IND_ are used.
Subsidiary macros
-None-
Output
The macro will display the t-test values for the fitted
normal model. The t-test values are not labelled but they
are displayed in the same order as the parameter estimates
and standard errors are displayed in the output from the
GLIM directive $DIS E$.
The t-values are stored in the variate TV_, which has
length %PL.
Further information
1. The macros take account of any $OFFSET, $SCALE or $WEIGHT
settings.
<PAGE>
Example Input
$C
BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS
FOR MALE AND FEMALE BABIES (FROM DOBSON P.14)
$UNITS 24 $DATA AGE WEIGHT $READ
40 2968 38 2795 40 3163 35 2925 36 2625 37 2847
41 3292 40 3473 37 2628 38 3176 40 3421 38 2975
40 3317 36 2729 40 2935 38 2754 42 3210 39 2817
40 3126 37 2539 36 2412 38 2991 39 2875 40 3231
$CAL SEX=%GL(2,12)$FACT SEX 2$
$C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE
SECOND FOR FEMALES.
$YVAR WEIGHT$FIT AGE+SEX$INPUT %PLC 80 NORMAC$
$D E$
$USE TVAL$
Example Output
deviance = 658771.
d.f. = 21
      estimate     s.e.     parameter
1       -1447.    784.3     1
2        120.9    20.46     AGE
3       -163.0    72.81     SEX
scale parameter taken as 31370.
T values
    +----------+
    |   TV_    |
+---+----------+
| 1 |  -1.845  |
| 2 |   5.908  |
| 3 |  -2.239  |
+---+----------+
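The division described under Purpose can be repeated on the printed figures. This is a minimal sketch that uses the rounded estimates and standard errors from the $D E$ output above; the Python names are illustrative only.

```python
# Estimates and standard errors as printed by $D E$ in the example output.
estimates = [-1447.0, 120.9, -163.0]
std_errors = [784.3, 20.46, 72.81]

# Element-wise division of the estimates by their standard errors.
t_values = [b / se for b, se in zip(estimates, std_errors)]

for i, t in enumerate(t_values, start=1):
    print(f"{i} {t:8.3f}")
```

Because the printed estimates are rounded to four significant figures, the results agree with the values in TV_ only to about two decimal places.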
<PAGE>
Macro LEV in subfile LEV
Author
M. Slater, Dept of Computer Science and Statistics,
Queen Mary College, Mile End Road, LONDON E1
Purpose
The macro LEV calculates and stores the leverage values (or
influence values) after a fit of a Normal model. The macro
also displays a plot of the leverage values plotted against
observation number.
Formal Arguments
-None-
Macro Arguments
-None-
Uses
Within the macro, vectors PW_, LEV_, LIM_, IND_ and LWT_ are
used.
Subsidiary macros
Macro LEV calls or uses macros LEV1 and LEV2.
Output
Displays an index plot of the leverage values plotted against
observation number. Observations weighted in to the fit are
plotted with a plus symbol (+), whereas observations weighted
out of the fit are plotted with a dot (.).
Hoaglin and Welsch's criterion for detecting influential
points is 2p/n, where p is the number of parameters, and n is
the number of observations included in the fit. This
criterion is also calculated, and is displayed on the plot as
a line of dashes (-). Points labelled with a plus symbol lying
above this line have high influence.
On exit from the macro, scalar %Z contains the value 2p/n.
Additionally, the following variates contain values:
LEV_   contains the leverage values
LWT_   contains a vector which can be used to weight out
       from the fit points of high influence (i.e. those
       points whose influence values are greater than %Z).
       LWT_ is set to zero if the observation was
       previously weighted out or if the point has high
       influence, and set to 1 otherwise.
<PAGE>
Further information
1. The macros take account of any $OFFSET or $SCALE parameter
settings in the calculation of leverage values. Macro LEV
also takes account of points weighted out of a fit in the
manner described above. Prior weights with non-zero values
will be taken to have a prior weight of one for the
purposes of this macro.
2. LEV deletes the system vector %RE.
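The leverage calculation can be sketched outside GLIM for the AGE+SEX model used in the example below. The sketch assumes the standard definition of leverage as the diagonal of the hat matrix, and codes SEX as a 0/1 dummy for its second level; the helper invert and all other Python names are hypothetical and are not part of the macro library.

```python
def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(a)
    # Augment the matrix with the identity.
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

# AGE values from the example input (12 males first, then 12 females).
age = [40, 38, 40, 35, 36, 37, 41, 40, 37, 38, 40, 38,
       40, 36, 40, 38, 42, 39, 40, 37, 36, 38, 39, 40]

# Design matrix for AGE+SEX: constant, AGE, dummy for the second SEX level.
X = [[1.0, float(a), 0.0 if i < 12 else 1.0] for i, a in enumerate(age)]
p, n = len(X[0]), len(X)

xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xtx_inv = invert(xtx)

# Leverage of observation i is x_i' (X'X)^-1 x_i, the i-th hat matrix diagonal.
lev = [sum(x[i] * xtx_inv[i][j] * x[j] for i in range(p) for j in range(p))
       for x in X]

limit = 2.0 * p / n                          # Hoaglin and Welsch's 2p/n
lwt = [0 if h > limit else 1 for h in lev]   # 0 flags a point of high influence

print(f"2p/n = {limit:.4f}, largest leverage = {max(lev):.4f}")
```

With no observations weighted out, the leverages sum to the number of parameters, and for these data none exceeds 2p/n, so every element of the illustrative lwt list is 1.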
References
Hoaglin, D. and Welsch, R. (1978), 'The hat matrix in
regression and ANOVA', American Statistician, 32, pp 17-22
and 132
Cook, R. and Weisberg, S. (1982), 'Residuals and influence in
regression', Chapman and Hall, London
Example Input
$C
BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS
FOR MALE AND FEMALE BABIES (FROM DOBSON P.14)
$UNITS 24 $DATA AGE WEIGHT $READ
40 2968 38 2795 40 3163 35 2925 36 2625 37 2847
41 3292 40 3473 37 2628 38 3176 40 3421 38 2975
40 3317 36 2729 40 2935 38 2754 42 3210 39 2817
40 3126 37 2539 36 2412 38 2991 39 2875 40 3231
$CAL SEX=%GL(2,12)$FACT SEX 2$
$C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE
SECOND FOR FEMALES.
$YVAR WEIGHT$FIT AGE+SEX$
$INPUT %PLC 80 QPLOT$
$USE LEV$
$CAL W=SEX-1$
$WEI W$
$FIT .$
$USE LEV$
<PAGE>
Example Output
deviance = 658771.
d.f. = 21
Leverage plot
[Index plot of leverage (vertical axis, 0.0700 to 0.2600)
against observation number (horizontal axis, 0.00 to 30.00).
All 24 points are plotted with + and lie below the line of
dashes at 0.2500.]
-- model changed
deviance = 248726.
d.f. = 10 from 12 observations
Leverage plot
--- weight variate is W with 12 units weighted out (.)
[Index plot of leverage (vertical axis, 0.0500 to 0.5250)
against observation number (horizontal axis, 0.00 to 30.00).
The 12 points weighted in are plotted with + and the 12
points weighted out with a dot; all lie below the line of
dashes at 0.5000.]
<PAGE>
Macro BOXCOX in subfile BOXCOX
Author
B. J. Francis, Centre for Applied Statistics,
Cartmel College, University of Lancaster, U.K.
Purpose
The macro BOXCOX fits the Box-Cox transformation family to a
selected y-variate and model over a grid of values of lambda.
Formal Arguments
%1 (optional) A scalar. If the first formal argument is set
and not zero, then the variates containing the values of -2
log(maximised likelihood) (DEV_) and the corresponding values
of lambda (LMB_) are saved in the workspace. If the argument
is not set, or is set to a scalar which has the value 0, then
these variates are displayed but not saved.
Macro Arguments
YVAR (obligatory) The untransformed y-variate name needs to
be stored in a macro called YVAR.
MODEL (obligatory) The model formula to be fitted needs to
be stored in a macro called MODEL.
Macro Prompting
The macro prompts for the required grid values of lambda. The
maximum and minimum values of lambda, together with the grid
increment to be used, need to be specified. Suitable starting
values are 2 (0.4) -2 .
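The notation 2 (0.4) -2 gives the maximum, the increment and the minimum of the grid. The following sketch shows how many grid points, and hence how many fits, such a specification implies. The function name lambda_grid is hypothetical, and the ascending order of the returned values is an assumption based on the order of LMB_ in the example output.

```python
def lambda_grid(maximum, increment, minimum):
    """Return the grid of lambda values from minimum up to maximum."""
    steps = round((maximum - minimum) / increment)  # number of increments
    # Rounding removes floating-point noise such as -0.3999999999999999.
    return [round(minimum + k * increment, 10) for k in range(steps + 1)]

grid = lambda_grid(2.0, 0.4, -2.0)
print(len(grid), grid)
```

The suggested starting values give 11 grid points, one for each row of DEV_ and LMB_ in the example output. Halving the increment to 0.2 would give 21 points, and so 21 separate fits.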
Uses
Within the macro, scalars %D, %L, %N and %Z and vectors
INP_, YTR_, IND_, SYV_, DEV_, LMB_, and PW_ are used.
Subsidiary macros
Macro BOXCOX calls or uses macros YVAR, MODEL, BOX1, BOX2,
BOX3, BOX4, BOX5, BOX6, BOX7, BOX8, BOX9 and BOXA.
Output
Calculates and displays -2*log(maximized likelihood) for each
value of lambda on the grid and produces a 'deviance' plot.
Additionally, if the first formal parameter is set to a
nonzero scalar, the following variates of length %Z are saved:
DEV_   Variate containing -2 log(maximized likelihood)
       for each value of lambda.
LMB_   Variate containing corresponding grid values of
       lambda.
The following scalars contain useful information:
%N     Number of points weighted in. Equal to %NU if
       prior weights are not set.
%Z     Number of grid points for lambda. Length of DEV_
       and LMB_.
<PAGE>
Further information
1. Macro BOXCOX can also be used when GLIM is being used in
batch rather than interactive mode. In true batch mode,
there is no interaction with the user. However, the values
of the maximum value of lambda, increment and minimum
value of lambda can still be passed to the macro. They
should be placed in the input file in order, with one
number on a line, following the call to BOXCOX:
$USE BOXCOX$
2
0.4
-2
The three numbers will then be read from the primary input
channel by the macro.
2. Remember that a separate fit has to be carried out for
each value of lambda on the grid. Choosing a small increment
value will produce a large number of grid points and the
macro will take a long time to produce any output.
3. The macro takes account of any $OFFSET variate. BOXCOX
also takes account of any prior weights being used to
weight out observations. If prior weights are set, only
those values for which the prior weight is non-zero will be
used in the calculation of the likelihood. Prior weights
with non-zero values will be taken to have a prior weight
of one for the purposes of this macro.
BOXCOX finds the m.l.e of sigma for each fixed value of
lambda on the grid, so any $SCALE setting is ignored.
4. BOXCOX deletes the system vector %RE.
5. As the macro carries out many $FITs, the $OUTPUT is
switched off for part of the execution. If the macro fails,
for instance, because of an incorrectly specified model, or
if break-in is used, the output may still be switched off
when control returns to the user at the terminal. See
section 6 of the macro library description for appropriate
remedial action.
6. On normal exit from the macro:
The $TRANSCRIPT options will be reset to the default.
The GLIM $YVARiate is set to the untransformed
y-variate specified in macro YVAR.
The model formula is set to the model formula specified
in macro MODEL.
7. Macro BOXCOX produces more than one screenful of output.
To examine the output from the macro one screenful at a
time, the $PAGE directive may be issued before the call to
BOXCOX.
<PAGE>
Restrictions
1. Macro prompting works only when free format is in force.
If the $FORMAT directive has been used to read in formatted
data, then the format setting should be reset by the GLIM
directive:
$FORMAT FREE$
2. The macro will work only for y-variates which have no
negative or zero values (as transformations such as log or
square root may be included).
References
Box, G. & Cox, D. (1964), 'An analysis of transformations',
JRSSB, 26, 211-252
Example Input
$C
BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS
FOR MALE AND FEMALE BABIES (FROM DOBSON P.14)
$UNITS 24 $DATA AGE WEIGHT $READ
40 2968 38 2795 40 3163 35 2925 36 2625 37 2847
41 3292 40 3473 37 2628 38 3176 40 3421 38 2975
40 3317 36 2729 40 2935 38 2754 42 3210 39 2817
40 3126 37 2539 36 2412 38 2991 39 2875 40 3231
$CAL SEX=%GL(2,12)$FACT SEX 2$
$C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE
SECOND FOR FEMALES.
$INPUT %PLC 80 BOXCOX$
$MAC MODEL AGE+SEX $END
$MAC YVAR WEIGHT $END
$USE BOXCOX$
Max. value of lambda?
$DIN? 2
Increment?
$DIN? 0.4
Min. value of lambda?
$DIN? -2
<PAGE>
Example Output
--- Model is AGE+SEX
--- Y-Variate is WEIGHT
['Deviance' plot of -2 log(maximised likelihood) (vertical
axis, 321.440 to 324.160) against lambda (horizontal axis,
-2.400 to 2.400), with one * for each grid point.]
     +----------------------+
     |    DEV_   |   LMB_   |
+----+----------------------+
|  1 |    324.0  |  -2.0000 |
|  2 |    323.3  |  -1.6000 |
|  3 |    322.7  |  -1.2000 |
|  4 |    322.2  |  -0.8000 |
|  5 |    321.9  |  -0.4000 |
|  6 |    321.6  |   0.0000 |
|  7 |    321.5  |   0.4000 |
|  8 |    321.5  |   0.8000 |
|  9 |    321.6  |   1.2000 |
| 10 |    321.8  |   1.6000 |
| 11 |    322.1  |   2.0000 |
+----+----------------------+
<PAGE>
Macro BOXFIT in subfile BOXCOX
Author
B. J. Francis, Centre for Applied Statistics,
Cartmel College, University of Lancaster, U.K.
Purpose
The macro BOXFIT fits the chosen model for a selected value
of lambda using the transformed y-variate y**lambda (or
log(y) if lambda is zero). Displays a Quantile-quantile plot
of the raw residuals from the fit. Calculates and displays
-2*log (maximised likelihood) for the chosen value of
lambda.
Formal Arguments
-None-
Macro Arguments
YVAR (obligatory) The untransformed y-variate name needs to
be stored in a macro called YVAR.
MODEL (obligatory) The model formula to be fitted needs to
be stored in a macro called MODEL.
Macro Prompting
The macro prompts for the required value of lambda.
Uses
Within the macro, scalars %D, %L and %N and vectors
INP_, YTR_, IND_, SYV_, RES_, ND_ and PW_ are used.
Subsidiary macros
Macro BOXFIT calls or uses macros YVAR, MODEL, BOX1, BOX2,
BOX3, BOX4, BOX5, BOX9 and BOXA.
Output
Displays -2*log(maximized likelihood) for the chosen value of
lambda. Displays the GLIM deviance and parameter estimates
for the transformed y-variate:
y to the power lambda    for lambda not equal to zero
log(y)                   for lambda equal to zero
Produces a quantile-quantile plot of the sorted raw residuals
plotted against their equivalent Normal deviates (for those
points weighted in).
The following scalars contain useful information:
%D     -2*log(maximized likelihood) for transformed
       y-variate.
%N     Number of points weighted in. Equal to %NU if
       prior weights are not set.
%L     Chosen value of lambda.
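The quantity held in %D can be reproduced for the log transformation from the figures in the BOXFIT example. The sketch assumes the form n*log(RSS) plus twice the sum of log(y), that is, the Normal log-likelihood with additive constants dropped and the Jacobian of the log transformation included. This form is inferred from the example figures and is an assumption; it is not taken from the macro source.

```python
import math

# WEIGHT values from the example input (12 males first, then 12 females).
weight = [2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975,
          3317, 2729, 2935, 2754, 3210, 2817, 3126, 2539, 2412, 2991, 2875, 3231]
n = len(weight)

rss_log_fit = 0.075739  # deviance of the fit to log(WEIGHT) in the example

# -2 log(maximised likelihood) at lambda = 0, up to an additive constant:
# n*log(RSS) from the Normal likelihood, 2*sum(log y) from the Jacobian.
minus_2_log_l = n * math.log(rss_log_fit) + 2.0 * sum(math.log(w) for w in weight)
print(f"-2 log l = {minus_2_log_l:.1f}")
```

Under these assumptions the result agrees, to the one decimal place printed, with the value of -2 log l that BOXFIT reports for lambda = 0 in the example output.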
<PAGE>
Further information
1. Macro BOXFIT can also be used when GLIM is being used in
batch rather than interactive mode. In true batch mode,
there is no interaction with the user. However, the chosen
value of lambda can still be passed to the macro. The value
should be placed in the input file following the call to
BOXFIT:
$USE BOXFIT$
0
The number will then be read by the macro from the primary
input channel.
2. The macro takes account of any $OFFSET variate. BOXFIT
also takes account of any prior weights being used to
weight out observations. If prior weights are set, only
those values for which the prior weight is non-zero will be
used in the calculation of the likelihood and in the
production of the plot. Prior weights with non-zero values
will be taken to have a prior weight of one for the
purposes of this macro.
BOXFIT finds the m.l.e. of sigma for the fixed, chosen
value of lambda, so any $SCALE setting is ignored.
3. BOXFIT deletes the system vector %RE.
4. The macro will produce uninformative residual plots if
the number of points weighted in is close to, equal to or
less than the number of parameters being fitted in the model.
5. On normal exit from the macro:
The GLIM $YVARiate is set to the untransformed
y-variate specified in macro YVAR.
The model formula is set to the model formula specified
in macro MODEL.
Restrictions
1. Macro prompting works only when free format is in force.
If the $FORMAT directive has been used to read in formatted
data, then the format setting should be reset by the GLIM
directive:
$FORMAT FREE$
2. The macro will work only for y-variates which have no
negative or zero values (as transformations such as log or
square root may be included).
<PAGE>
References
Box, G. & Cox, D. (1964), 'An analysis of transformations',
JRSSB, 26, 211-252
Example Input
$C
BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS
FOR MALE AND FEMALE BABIES (FROM DOBSON P.14)
$UNITS 24 $DATA AGE WEIGHT $READ
40 2968 38 2795 40 3163 35 2925 36 2625 37 2847
41 3292 40 3473 37 2628 38 3176 40 3421 38 2975
40 3317 36 2729 40 2935 38 2754 42 3210 39 2817
40 3126 37 2539 36 2412 38 2991 39 2875 40 3231
$CAL SEX=%GL(2,12)$FACT SEX 2$
$C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE
SECOND FOR FEMALES.
$INPUT %PLC 80 BOXCOX$
$MAC MODEL AGE+SEX $END
$MAC YVAR WEIGHT $END
$USE BOXFIT$
Value of lambda?
$DIN? 0
Example Output
--- Model is AGE+SEX
--- Y-Variate is WEIGHT
-- $data list abolished
--- Transformed Y-variate is log(WEIGHT )
deviance = 0.075739
d.f. = 21
      estimate       s.e.     parameter
1        6.487     0.2659     1
2      0.04118   0.006938     AGE
3     -0.05543    0.02469     SEX
scale parameter taken as 0.003607
-- model changed
-2 log l = 321.6      lambda = 0.
<PAGE>
Raw residual Q-Q plot
[Quantile-quantile plot of the sorted raw residuals (vertical
axis, -0.0840 to 0.1200) against their equivalent Normal
deviates (horizontal axis, -3.00 to 3.00), with points
plotted as +.]
<PAGE>
Macro PRESS in subfile PRESS
Author
M. A. Aitkin, Centre for Applied Statistics,
Cartmel College, University of Lancaster, U.K.
Purpose
After a fit of a Normal model, the macro PRESS calculates and
displays the prediction sum of squares, and the
cross-validation estimates of sigma squared and R squared.
Formal Arguments
-None-
Macro Arguments
-None-
Uses
Within the macro, scalars %P, %R and %S and vectors YMI_,
EST_, RES_ and YVA_ are used.
Subsidiary macros
Macro PRESS calls or uses macro PRE1.
Output
The macro will display the prediction sum of squares, and
cross-validation estimates of sigma squared and R-squared.
The following scalars contain useful information:
%P     Prediction sum of squares
%R     Cross-validation estimate of R-squared
%S     Cross-validation estimate of sigma squared
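The prediction sum of squares can be sketched for the AGE+SEX model of the example without refitting the model 24 times. The sketch assumes the usual definition of PRESS as the sum of squared deleted residuals e/(1-h), where e is the ordinary residual and h the leverage, and assumes that sigma squared is PRESS divided by the number of observations, as the example figures suggest. The helper invert and the other Python names are hypothetical; how the macro itself performs the calculation is not stated in this document.

```python
def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

# AGE and WEIGHT from the example input (12 males first, then 12 females).
age = [40, 38, 40, 35, 36, 37, 41, 40, 37, 38, 40, 38,
       40, 36, 40, 38, 42, 39, 40, 37, 36, 38, 39, 40]
weight = [2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975,
          3317, 2729, 2935, 2754, 3210, 2817, 3126, 2539, 2412, 2991, 2875, 3231]

# Design matrix: constant, AGE, dummy for the second SEX level.
X = [[1.0, float(a), 0.0 if i < 12 else 1.0] for i, a in enumerate(age)]
p, n = len(X[0]), len(X)

xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * y for r, y in zip(X, weight)) for i in range(p)]
xtx_inv = invert(xtx)

beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
resid = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(X, weight)]
lev = [sum(x[i] * xtx_inv[i][j] * x[j] for i in range(p) for j in range(p))
       for x in X]

rss = sum(e * e for e in resid)                                # GLIM deviance
press = sum((e / (1.0 - h)) ** 2 for e, h in zip(resid, lev))  # deleted residuals
sigma2_cv = press / n                                          # assumed divisor n

print(f"deviance = {rss:.0f}  PRESS = {press:.0f}  sigma squared = {sigma2_cv:.0f}")
```

For comparison, the example output reports PRESS = 885495 and sigma squared = 36896, which equals 885495/24. No formula for the cross-validation R-squared is attempted here, since the document does not define it.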
Further information
1. The macros take account of any $OFFSET parameter settings
in the calculations.
Restrictions
1. The macro cannot be used if prior weights are set. If this
is attempted, an error message is printed and the macro is
abandoned.
References
1. Draper and Smith (1981), 'Applied Regression Analysis',
2nd edition, New York, Wiley
2. Copas, J. (1983), 'Regression, Prediction and Shrinkage',
JRSSB, 45, p342
<PAGE>
Example Input
$C
BIRTHWEIGHT IN GRAMS AND GESTATIONAL AGE IN WEEKS
FOR MALE AND FEMALE BABIES (FROM DOBSON P.14)
$UNITS 24 $DATA AGE WEIGHT $READ
40 2968 38 2795 40 3163 35 2925 36 2625 37 2847
41 3292 40 3473 37 2628 38 3176 40 3421 38 2975
40 3317 36 2729 40 2935 38 2754 42 3210 39 2817
40 3126 37 2539 36 2412 38 2991 39 2875 40 3231
$CAL SEX=%GL(2,12)$FACT SEX 2$
$C THE FIRST 12 PAIRS OF VALUES ARE FOR MALES AND THE
SECOND FOR FEMALES.
$INPUT %PLC 80 PRESS$
$MAC MODEL AGE+SEX $END
$MAC YVAR WEIGHT $END
$USE PRESS$
Example Output
deviance = 658771.
d.f. = 21
PRESS (Prediction Sum of Squares) = 885495.
Cross Validation Estimates of :
sigma squared = 36896.
R-squared = 0.5255
<PAGE>
Macro WEIBULL in subfile WEIBULL
Author
B. J. Francis, Centre for Applied Statistics,
Cartmel College, University of Lancaster, U.K.
Purpose
The macro WEIBULL allows the fitting of a specified
regression model to survival data using either the
exponential or Weibull distributions. The survival data may
be partially right-censored, or can be entirely uncensored.
Estimation
The likelihood for the exponential or Weibull distributions
can be expressed as a Poisson likelihood, with a log-linear
model for the Poisson mean corresponding to a log-linear
model for the hazard function.
If the shape parameter is fixed or known, then this procedure
corresponds to using the censor variate as y-variate,
specifying the log of the vector of survival times as an
offset, and fitting a Poisson model with log link. Thus, for
the exponential distribution, a single Poisson fit in GLIM is
all that is needed. The Weibull distribution requires an
iterative procedure - fixing the shape parameter, fitting the
model and reestimating the shape parameter - until
convergence is achieved. Full computational details are given
in Aitkin and Clayton (1980).
The hazard function in t has the following form:
Exponential    h(t)=exp(B'X)
Weibull        h(t)=a*(t**(a-1))*exp(B'X)
where a is the shape parameter
      B is the vector of parameters for a specified model
      X is the vector of explanatory variables.
(and * and ** have their usual meanings as operators)
Maximum likelihood estimates of a and B are given by the
macro. The value of -2 log(maximised likelihood) or the
'deviance' is also given.
Standard errors of B are displayed with the estimate of B.
For the exponential distribution, these standard errors are
correct, but for the Weibull, the standard errors of B are
slightly underestimated, as they do not allow for the fact
that a is an estimated parameter, and not fixed. No standard
error for a is given.
Macro WEIBULL may be called as many times as is necessary,
changing the model formula specified in macro MODEL (and
other arguments if necessary) as appropriate.
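For the exponential distribution with a single two-level factor, the maximum likelihood fit has a closed form, which makes the link with the Poisson likelihood easy to see. The sketch below uses the remission-time data of the example that follows; it is not the iterative procedure used by the macro, and the Python names are illustrative. The hazard in each group is estimated as the number of uncensored times divided by the total time at risk.

```python
import math

# Remission times (weeks) and censor indicators (1 = uncensored) from the
# example input; the first 21 values are group 1, the last 21 group 2.
t = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23,
     6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
c = [1] * 21 + [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
group = [1] * 21 + [2] * 21

log_lik = 0.0         # exponential log-likelihood: sum of c*log(h) - h*t
poisson_kernel = 0.0  # Poisson kernel with y = c and mean t*h (offset log t)
for level in (1, 2):
    rows = [(ti, ci) for ti, ci, gi in zip(t, c, group) if gi == level]
    events = sum(ci for _, ci in rows)
    exposure = sum(ti for ti, _ in rows)
    hazard = events / exposure  # ML estimate of exp(B'X) in this group
    for ti, ci in rows:
        log_lik += ci * math.log(hazard) - hazard * ti
        poisson_kernel += ci * math.log(hazard * ti) - hazard * ti

deviance = -2.0 * log_lik
# The two likelihoods differ only by a term that does not involve B.
offset_term = sum(ci * math.log(ti) for ti, ci in zip(t, c))
print(f"deviance = {deviance:.2f}")
```

The deviance obtained this way agrees with the value shown for the exponential fit in the example output. The Poisson kernel exceeds the exponential log-likelihood by the sum of c*log(t), which is free of B, so both are maximised by the same estimates.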
<PAGE>
Formal Arguments
%1 The variate containing the survival times, some of which
may be right-censored.
%2 An indicator or censor variate. The elements of the
variate should be 1 if the corresponding survival time is
uncensored, and 0 if the corresponding survival time is
censored.
%3 (optional) A scalar. Determines whether the Weibull or
exponential distribution is fitted.
If the argument is not set or set to a non-zero scalar, then
the Weibull distribution is fitted, using an iterative
fitting procedure. The starting value of the shape parameter
is taken from %A. If %A is zero or negative, then a starting
value of 1 is used.
If the third argument is set to a scalar equal to zero, then
the exponential distribution is fitted. The value of the
shape parameter is taken to be 1, and only one iteration of
the iterative procedure is carried out.
Macro Arguments
MODEL (obligatory) The model formula to be fitted needs to
be stored in a macro called MODEL.
CONV (optional) The convergence criterion for the change in
the estimate of the shape parameter. The default is .001
CYCLE (optional) Should be set to contain either the GLIM
directive $CYCLE$, or $RECYCLE$. Determines whether the
underlying GLIM IRLS procedure recycles from the previous
estimates or not. Setting the contents of macro CYCLE to
$RECYCLE$ should speed up the fitting of complex models, but
persistent recycling in a macro which uses an iterative
fitting procedure may occasionally cause divergence of the
deviance. The default is $CYCLE$.
DISP (optional) Determines the $DISPLAY options to be used
after convergence has occurred. DISP should contain valid
display option letters. The default contents of DISP is E.
Scalar arguments
%A The starting value for the estimate of the shape
parameter. Used only when the Weibull distribution is being
fitted. If %A is zero or negative, then the default starting
value of 1 is used.
%W Sets the maximum number of iterations carried out by the
macro. If set to a scalar less than or equal to 0, the
default setting of 15 is used.
<PAGE>
Uses
Within the macro, scalars %A, %D, %F, and %W and vectors
OFV_ and LGT_ are used.
Subsidiary macros
Macro WEIBULL calls or uses macros MODEL, MOD1, MOD2,
WAR1, MESS, DISP, CONV and CYCLE.
By default, DISP is set to E, CONV is set to .001, CYCLE is
set to $CYCLE$ and MODEL is undefined.
Output
At each iteration, the current deviance, the estimate of the
shape parameter and the number of degrees of freedom are
displayed.
On convergence, or after %W (15) iterations, the parameter
estimates together with their standard errors are displayed.
Note that the standard errors may be slightly underestimated.
On normal exit, the vector %FV contains the scaled residuals
from the fit. These are not independent but will have
approximately standard exponential distributions.
The following scalars contain useful information:
%A     Estimate of shape parameter
%D     'Deviance' or -2 log(maximised likelihood)
%F     Number of degrees of freedom
%W     Current setting of maximum number of iterations
Further information
1. The macro takes NO account of any $OFFSET variate or
$SCALE parameter settings. If prior weights are set, the
macro will display a warning message and unset them.
2. As the macro may carry out many $FITs, the $OUTPUT is
switched off for part of the execution. If the macro fails,
for instance, because of an incorrectly specified model, or
if break-in is used, the output may still be switched off
when control returns to the user at the terminal. See
section 6 of the macro library description for appropriate
remedial action.
3. On normal exit from the macro:
The $ERROR setting is Poisson, with $LOG link.
The GLIM $YVARiate is set to the censor variate
specified in formal argument %2.
The model formula is set to the model formula specified
in macro MODEL.
The $OFFSET is unset.
The $WEIGHT is unset.
Any settings of the $ERROR, $LINK, $WEIGHT, $OFFSET and
$CYCLE made before the call of macro WEIBULL are lost.
The display is inhibited.
<PAGE>
References
Aitkin, M. and Clayton, D. (1980), 'The fitting of
Exponential, Weibull and Extreme Value distributions to
complex censored survival data using GLIM', Appl. Statist,
29, pp156-163
Example Input
$C
REMISSION TIMES IN ACUTE LEUKEMIA
$C
GEHAN'S DATA...JRSS B 1978,P.217 AND BIOMETRIKA,1965,52,P213
$DATA T $READ
1 1 2 2 3 4 4 5 5 8 8 8 8 11 11 12 12 15 17 22 23
6 6 6 6 7 9 10 10 11 13 16 17 19 20 22 23 25 32 32 34 35
$DATA C $READ
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 0
! (1=UNCENSORED OBSN, 0=CENSORED OBSN)
$CAL G=%GL(2,21)$
! VARIABLES:
! T REMISSION TIMES IN WEEKS
! C CENSOR VARIATE
! G GROUP TREATMENT(1=PLACEBO,2=6-MERCAPTOPURINE(6-MP))
!
$M MODEL G $E
$CALC %B=1 : %W=0 :%A=0 $USE WEIB T C %B$
Example Output
-- Model is G
Exponential fit
Deviance    shape parameter    df
217.05      1.0000             40
Weibull fit
213.22      1.3156             39
213.16      1.3546             39
213.16      1.3632             39
213.16      1.3652             39
---Standard errors of estimates given below are underestimated
      estimate     s.e.     parameter
1       -1.339   0.5491     1
2       -1.731   0.3983     G
scale parameter taken as 1.000
-- model changed
<PAGE>
Macro RESPLOT in subfile WEIBULL
Author
M. A. Aitkin, Centre for Applied Statistics,
Cartmel College, University of Lancaster, U.K.
Purpose
After a fit of a Weibull model using macro WEIBULL, the macro
RESPLOT displays two quantile-quantile residual plots. The
macro uses the vector %FV, which on exit from macro WEIBULL
contains the scaled residuals, which will have approximately
standard exponential distributions.
Formal Arguments
%1 An indicator or censor variate. The elements of the
variate should be 1 if the corresponding element of %FV is
uncensored, and 0 if the corresponding element of %FV is
censored.
Macro Arguments
-None-
Uses
Within the macro, vectors WK1_, WK2_, WK3_ and WK4_ are used.
Subsidiary macros
-None-
Output
The macro displays two quantile-quantile residual plots. The
first plot is uncorrected for heterogeneity of the variance
of the scaled residuals. The second is variance stabilised.
If the probability model holds, both plots should give
straight lines with slopes of unity.
Further information
1. Macro RESPLOT does not destroy the contents of either %1
or %FV.
References
Aitkin, M. and Clayton, D. (1980), 'The fitting of
Exponential, Weibull and Extreme Value distributions to
complex censored survival data using GLIM', Appl. Statist,
29, pp156-163
Example Input
$UNITS 42
$C
GEHAN'S DATA...JRSS B 1978,P.217 AND BIOMETRIKA,1965,52,P213
$UNITS 42$DATA T $READ
1 1 2 2 3 4 4 5 5 8 8 8 8 11 11 12 12 15 17 22 23
6 6 6 6 7 9 10 10 11 13 16 17 19 20 22 23 25 32 32 34 35
$DATA C $READ
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 0 1 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 0
$CA G=%GL(2,21)$MAC MODEL G $E
$INPUT %PLC 80 WEIB$
$CA %A=%W=0$USE WEIB T C$
$USE RESP C$
<PAGE>
Example Output
Residual plot
[Quantile-quantile plot of the scaled residuals, uncorrected
for heterogeneity of variance. Vertical axis 0.000 to 3.800,
horizontal axis 0.000 to 4.800; points are plotted with + or
with a digit.]
Variance stabilised residual plot
[Variance stabilised quantile-quantile plot of the scaled
residuals. Vertical axis 0.0000 to 1.5200, horizontal axis
0.125 to 1.625; points are plotted with + or with a digit.]
<PAGE>
Macro TUNIFORM in subfile TUNIFORM
Author
R. A. Reese, Computer Centre, Hull University, U.K.
Purpose
The macro TUNIFORM tests the distribution of a variate
against a theoretical uniform distribution, using a
chi-squared goodness of fit test after dividing the range into
intervals. The parameters of the theoretical distribution
may be supplied or evaluated from the sample. In the
latter case, the chi-squared statistic is biased.
Formal Arguments
-None-
Macro Arguments
ARG1 (obligatory). Macro ARG1 must contain the name of a
vector or a vector expression. The vector may be of any
length. (It is up to you to ensure that there are enough
values to make the test worthwhile.)
PRINT (Optional). Can contain a scalar or a number. If the
scalar or number is not zero (default), then the result of
the test will be printed on the current output channel. If
macro PRINT contains a number or scalar which evaluates to
zero, then the results will be saved and not printed.
Scalar Arguments
%U,%V If the value of ordinary scalar %U is strictly greater
than that of %V, then those values will be used as the
maximum and minimum respectively of the theoretical
rectangular distribution; otherwise, the maximum and minimum
values of the sample vector will be used, with consequent
loss of degrees of freedom.
Uses
Within the macro, ordinary scalars %P, %T, %U, %V, %W, %X,
%Y and %Z and variates OBS_, EXP_ and PCH_ are used.
Subsidiary macros
Macro TUNIFORM calls or uses macros TU1, TU2, ARG1
and PRINT.
ARG1 is initially defined as empty.
PRINT initially contains the number 1.
Output
If the macro argument PRINT is set to a non-zero value, the
macro will optionally print a table of the observed and
expected frequencies within each interval of the range,
titled with the name of the variable, the parameters of the
theoretical distribution and the test statistic.
<PAGE>
The number of intervals is selected to give at least five
expected observations in each partition, subject to a
maximum of twenty intervals.
On exit, the following system scalars will contain
values:
%P - probability (significance) of test statistic.
%U - maximum value for distribution.
%V - minimum value for distribution.
%W - range of distribution.
%X - overall chi-squared value.
%Y - number of intervals.
%Z - degrees of freedom for significance test.
If macro argument PRINT is set to a zero value, then
the first %Y values of the following variables will
contain:
OBS_ - observed frequency in each interval.
EXP_ - expected frequency in each interval.
PCH_ - contribution to chi-squared from each interval.
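The test statistic can be sketched from the observed frequencies alone, since a uniform distribution gives the same expected frequency in every interval. The interval rule min(20, n // 5) and the degrees of freedom (intervals minus one, minus the two estimated parameters) are assumptions that happen to match the example below; they are not taken from the macro source. The function names are hypothetical.

```python
import math

def uniform_chi_squared(observed, estimated_parameters):
    """Chi-squared test of equal expected frequencies across intervals."""
    n = sum(observed)
    k = len(observed)
    expected = n / k  # uniform: the same expected count in every interval
    partitioned = [(o - expected) ** 2 / expected for o in observed]
    chi_squared = sum(partitioned)
    df = k - 1 - estimated_parameters  # assumed adjustment for estimation
    return expected, partitioned, chi_squared, df

def chi_squared_p_1df(x):
    """Upper tail probability of chi-squared on 1 df (valid for 1 df only)."""
    return math.erfc(math.sqrt(x / 2.0))

n_values = 23
intervals = min(20, n_values // 5)  # assumed rule: expected >= 5, at most 20
observed = [7, 5, 5, 6]             # observed frequencies from the example
expected, partitioned, chi_squared, df = uniform_chi_squared(observed, 2)
p_value = chi_squared_p_1df(chi_squared)
print(intervals, expected, round(chi_squared, 4), df, round(p_value, 3))
```

With 23 values the assumed rule gives four intervals with an expected frequency of 5.75 each, as in the example output, and the partitioned chi-squared values sum to the overall statistic held in %X.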
Example Input
$UNIT 23 $CA X=%SR(0)
$INPUT %PLC 80 TUNIFORM $
$M ARG1 X $E
$CALC %U=%V=0$
$USE TUNI$
Example Output
Sample estimates used as parameters.
df adjusted.
Test of distribution of X against uniform.
observed   expected   partitioned
freq       freq       chi-squared
7.000      5.750      0.27174
5.000      5.750      0.09783
5.000      5.750      0.09783
6.000      5.750      0.01087
minimum (%v) 0.0578        maximum (%u) 0.9879
chi-squared (%x) 0.4783 with (%z) 1 df        P= (%p) 0.489
*** END OF DOCUMENTATION ***
<PAGE>
$