SOME PRIOR AND POSTERIOR DISTRIBUTIONS
IN SURVIVAL ANALYSIS, AND THEIR APPLICATIONS*
by
Regina C. Elandt-Johnson
Department of Biostatistics
University of North Carolina at Chapel Hill
Chapel Hill, N.C. 27514, U.S.A.
Institute of Statistics Mimeo Series No. 1206
JANUARY 1979
SUMMARY
Concomitant variables ($\mathbf{z}$) introduced into survival models are often regarded as risk or aging factors which contribute to the mortality patterns in various study groups. Concomitant variables observed at the time point $t = 0$ (and having values $\mathbf{z}_0$) can also be regarded as random variables, $\mathbf{Z}_0$, with some prior joint distribution. If the $Z$'s do, indeed, contribute to mortality, their posterior distribution among the survivors to time $t$ will be different from that at $t = 0$, even when the values of the $z$'s do not depend on $t$.
General formulae for the posterior distributions are given in Section 2 (assuming $z$'s not to depend on $t$), and in Section 5 (for $z$'s varying with $t$). Our interest is especially focussed on models with linear additive hazard functions of the form (3.2). It is shown that in such cases the posterior distribution does not depend on the "underlying" hazard; its general form is given by (3.6), and by (3.7) where the concomitant variables are independent.
Of some interest is the case when a univariate variable $Z_0$ is a binary variable - an application of posterior probabilities in clinical trials is suggested in Section 4.
Of special interest is the case when $Z$ is a continuous function of $t$. Level of serum cholesterol, or blood pressure are examples. Posterior distributions and posterior regression of $Z$ on $t$ among the survivors to time $t$ are those which are usually observed. Assuming an additive model for the hazard function (6.2), one may infer, under certain conditions, the prior distribution of $Z$, without using repeated measurements (Section 6).
Key Words & Phrases: Concomitant variables; Prior and posterior distributions, and regression functions; Additive hazard rate function; Survival function.
*This work was supported by the National Heart and Lung Institute contract NIH-NHLI-712243 and by NIH research grant number 1 R01 CA17107 from the National Cancer Institute.
1. INTRODUCTION
Assessment of the role of concomitant variables as risk factors of mortality in various diseases is the subject of many current longitudinal studies and clinical trials.
Let $T > 0$ denote the survival time (age) and let $\mathbf{z}$ be a $k \times 1$ vector of concomitant variables which can be measured on each individual (unit) under consideration. Let $\mathbf{z}_0$ be a $k \times 1$ vector of concomitant variables at initial time $t = 0$, and
$$ S_T(t; \mathbf{z}_0) = \Pr\{T > t \mid \mathbf{z}_0\} $$    (1.1)
be the corresponding survival distribution function (SDF).
The usual approach to estimation of SDF defined in (1.1) is to assume a parametric or semiparametric form for the hazard function of (1.1), in which $\mathbf{z}_0$ is treated as a vector of parameters [e.g. Cox (1972), Prentice (1973), Byar and Corle (1977), Kay (1977), and many others].
However, in a given study, one may consider a random vector $\mathbf{Z}_0$ having a specified prior distribution (CDF), $F_{\mathbf{Z}_0}(\mathbf{z}_0)$. If the $Z_0$'s play a significant role in mortality (i.e. are risk or aging factors), then the posterior distribution among the survivors to age $t$, $F_{\mathbf{Z}_0|t}(\mathbf{z}_0|t)$, would be different from $F_{\mathbf{Z}_0}(\mathbf{z}_0)$ even though the $z_0$'s are not functions of $t$.
The purpose of this paper is to investigate the properties of certain models of prior and/or posterior distributions of $Z_0$'s, and their use in estimating the SDF's. Though the problem is of Bayesian nature, it should be pointed out that the forms of the prior distributions need not always be assumed 'a priori' - they can be estimated from actually observed distributions.
The cases when the $z_0$'s are independent of time $t$ (Sections 2, 3, and 4), and when they are functions of $t$ (Sections 5 and 6) are treated separately, for convenience.
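2. POSTERIOR DISTRIBUTIONS: CONCOMITANT VARIABLES INDEPENDENT OF TIME
We discuss here some models, in which the concomitant variables do not depend on time $t$.
Let $dF_{\mathbf{Z}_0}(\mathbf{z}_0)$ be the probability element of the random vector $\mathbf{Z}_0' = (Z_{01}, \ldots, Z_{0k})$, where $\mathbf{z}_0 \in \Omega_k$. We call $dF_{\mathbf{Z}_0}(\mathbf{z}_0)$ the prior probability element of $\mathbf{Z}_0$ in $\Omega_k$.
The posterior probability element among the survivors to age $t$ with the SDF defined in (1.1) is
$$ dF_{\mathbf{Z}_0|t}(\mathbf{z}_0|t) = \frac{S_T(t; \mathbf{z}_0)\, dF_{\mathbf{Z}_0}(\mathbf{z}_0)}{\int_{\Omega_k} S_T(t; \mathbf{z}_0)\, dF_{\mathbf{Z}_0}(\mathbf{z}_0)} . $$    (2.1)
Note that
$$ E_{\mathbf{Z}_0}[S_T(t; \mathbf{Z}_0)] = \int_{\Omega_k} S_T(t; \mathbf{z}_0)\, dF_{\mathbf{Z}_0}(\mathbf{z}_0) $$    (2.2)
is the average SDF over the whole set of $z_0$'s. It can be estimated from survival data.
Sometimes we might be able to estimate the prior as well as the posterior distributions of $\mathbf{Z}_0$ and then the SDF from the formula
$$ S_T(t; \mathbf{z}_0) = \frac{dF_{\mathbf{Z}_0|t}(\mathbf{z}_0|t)}{dF_{\mathbf{Z}_0}(\mathbf{z}_0)}\, E_{\mathbf{Z}_0}[S_T(t; \mathbf{Z}_0)] . $$    (2.3)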
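The relations (2.1) to (2.3) can be sketched numerically. The block below is only an illustration under stated assumptions: the three-point prior and the survival function `survival` are hypothetical choices made here, not models taken from the paper.

```python
import math

# Hypothetical discrete prior for a scalar concomitant variable z0 (illustrative values).
prior = {0.0: 0.5, 1.0: 0.3, 2.0: 0.2}


def survival(t, z0):
    """Illustrative SDF S_T(t; z0); an assumption made for this sketch."""
    return math.exp(-(0.1 + 0.05 * z0) * t)


def average_survival(t):
    """Equation (2.2): average SDF over the prior distribution of Z0."""
    return sum(survival(t, z) * p for z, p in prior.items())


def posterior(t):
    """Equation (2.1): distribution of Z0 among the survivors to time t."""
    denom = average_survival(t)
    return {z: survival(t, z) * p / denom for z, p in prior.items()}


def survival_from_ratio(t, z0):
    """Equation (2.3): SDF recovered from posterior/prior ratio and average SDF."""
    return posterior(t)[z0] / prior[z0] * average_survival(t)


if __name__ == "__main__":
    t = 10.0
    print(posterior(t))
    print(survival_from_ratio(t, 1.0), survival(t, 1.0))
```

In this sketch the posterior weight of the highest-risk value of `z0` falls below its prior weight as `t` grows, which is the selection effect the section describes.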
3. MODELS WITH ADDITIVE HAZARD RATE FUNCTIONS
We define the general additive model of hazard rate function as
$$ \lambda(t; \mathbf{z}_0) = \lambda(t) + \sum_{i=1}^{k} h_i(t)\, g_i(z_{0i}) , $$    (3.1)
where $\lambda(t)$ is the so called underlying hazard rate. Note that the $h_i(t)$'s are entirely functions of $t$, while the $g_i(z_{0i})$'s are not dependent on $t$.
For convenience, however, and without loss of generality (by appropriate definition of $z_0$'s) we confine our discussion to linear additive models of the form
$$ \lambda(t; \mathbf{z}_0) = \lambda(t) + \sum_{i=1}^{k} h_i(t)\, z_{0i} , $$    (3.2)
and in special cases
$$ \lambda(t; \mathbf{z}_0) = \lambda(t) + \sum_{i=1}^{k} \alpha_i z_{0i} , $$    (3.3)
where the $\alpha_i$'s are constants.
Further, define
$$ \Lambda(t) = \int_0^t \lambda(u)\,du , \qquad H_i(t) = \int_0^t h_i(u)\,du , $$
and denote by $M_{\mathbf{Z}_0}(\cdot)$ the joint moment generating function (MGF) of the distribution of concomitant variables $F_{\mathbf{Z}_0}(\mathbf{z}_0)$.
Then the cumulative hazard function (CHF) of (3.2) is
$$ \Lambda(t; \mathbf{z}_0) = \Lambda(t) + \sum_{i=1}^{k} H_i(t)\, z_{0i} , $$    (3.4)
the survival function is
$$ S_T(t; \mathbf{z}_0) = \exp[-\Lambda(t)]\, \exp\Big[-\sum_{i=1}^{k} H_i(t)\, z_{0i}\Big] , $$    (3.4a)
and (from (2.2))
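$$ E_{\mathbf{Z}_0}[S_T(t; \mathbf{Z}_0)] = \exp[-\Lambda(t)] \int \exp\Big[-\sum_{i=1}^{k} H_i(t)\, z_{0i}\Big]\, dF_{\mathbf{Z}_0}(\mathbf{z}_0) = \exp[-\Lambda(t)]\, M_{\mathbf{Z}_0}[-\mathbf{H}(t)] , $$    (3.5)
where $\mathbf{H}'(t) = (H_1(t), \ldots, H_k(t))$. (See Elandt-Johnson (1976).)
Hence, the posterior distribution of $z_0$'s among the survivors to time $t$ is (from (2.1))
$$ dF_{\mathbf{Z}_0|t}(\mathbf{z}_0|t) = \frac{\exp\big[-\sum_{i=1}^{k} H_i(t)\, z_{0i}\big]\, dF_{\mathbf{Z}_0}(\mathbf{z}_0)}{M_{\mathbf{Z}_0}[-\mathbf{H}(t)]} . $$    (3.6)
In particular, when the $Z_0$'s are mutually independent,
$$ dF_{\mathbf{Z}_0}(\mathbf{z}_0) = \prod_{i=1}^{k} dF_{Z_{0i}}(z_{0i}) , $$
and
$$ M_{\mathbf{Z}_0}[-\mathbf{H}(t)] = \prod_{i=1}^{k} M_{Z_{0i}}[-H_i(t)] , $$
so that (3.6) takes the form
$$ dF_{\mathbf{Z}_0|t}(\mathbf{z}_0|t) = \prod_{i=1}^{k} \frac{\exp[-H_i(t)\, z_{0i}]\, dF_{Z_{0i}}(z_{0i})}{M_{Z_{0i}}[-H_i(t)]} = \prod_{i=1}^{k} dF_{Z_{0i}|t}(z_{0i}|t) , $$    (3.7)
where
$$ dF_{Z_{0i}|t}(z_{0i}|t) = \frac{\exp[-H_i(t)\, z_{0i}]\, dF_{Z_{0i}}(z_{0i})}{M_{Z_{0i}}[-H_i(t)]} $$    (3.8)
is the posterior probability element with respect to variable $Z_{0i}$.
Summarizing these results: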
If the hazard rate function, $\lambda(t; \mathbf{z}_0)$, is of the linear additive form (3.2), and the $Z_0$'s are random variables with joint distribution $dF_{\mathbf{Z}_0}(\mathbf{z}_0)$, then the posterior distribution of the $Z_0$'s among the survivors to time $t$ does not depend on the underlying hazard $\lambda(t)$, and its explicit form is given by (3.6). If, additionally, the $Z_0$'s are mutually independent, then the $Z_0$'s, given $t$, are also mutually independent.
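The first part of this result can be checked numerically. The following is a minimal sketch, assuming a hypothetical three-point prior, a constant $h(t) = \alpha$ (named `ALPHA` here), and two arbitrary underlying cumulative hazards; none of these values come from the paper.

```python
import math

# Hypothetical discrete prior on a scalar Z0 (illustrative values only).
PRIOR = {0.0: 0.4, 1.0: 0.4, 3.0: 0.2}
ALPHA = 0.02  # assumed constant h(t) = ALPHA, so that H(t) = ALPHA * t


def posterior_additive(t, cum_baseline):
    """Posterior of Z0 among survivors under the hazard lambda(t) + ALPHA * z0.

    cum_baseline(t) is Lambda(t), the integrated underlying hazard.
    """
    weights = {z: math.exp(-cum_baseline(t) - ALPHA * t * z) * p
               for z, p in PRIOR.items()}
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}


def posterior_tilted(t):
    """Form (3.6): exp(-H(t) z0) dF(z0) / M(-H(t)); Lambda(t) does not appear."""
    mgf = sum(math.exp(-ALPHA * t * z) * p for z, p in PRIOR.items())
    return {z: math.exp(-ALPHA * t * z) * p / mgf for z, p in PRIOR.items()}


def constant_baseline(t):
    """Assumed Lambda(t) for a constant underlying hazard."""
    return 0.05 * t


def increasing_baseline(t):
    """Assumed Lambda(t) for an underlying hazard increasing linearly in t."""
    return 0.001 * t ** 2


if __name__ == "__main__":
    t = 20.0
    print(posterior_additive(t, constant_baseline))
    print(posterior_additive(t, increasing_baseline))
    print(posterior_tilted(t))
```

All three printed distributions coincide, because the factor $\exp[-\Lambda(t)]$ cancels between numerator and denominator.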
The general multiplicative model of hazard rate function can be defined as
$$ \lambda(t; \mathbf{z}_0) = \lambda(t)\, g(\mathbf{z}_0) , $$    (3.9)
or more specifically as
$$ \lambda(t; \mathbf{z}_0) = \lambda(t) \prod_{i=1}^{k} g_i(z_{0i}) . $$    (3.10)
A special case, the so called multiplicative exponential model
$$ \lambda(t; \mathbf{z}_0) = \lambda(t) \exp\Big(\sum_{i=1}^{k} \beta_i z_{0i}\Big) , $$    (3.10a)
is in common use (e.g. Cox (1972)).
General multiplicative models will not be discussed in detail, though special cases will occasionally be used for comparisons.
4. SOME APPLICATIONS IN CLINICAL TRIALS
Consider a simple clinical trial in which only two groups are distinguished: control and experimental. Let
$$ z_0 = \begin{cases} 0 & \text{if control,} \\ 1 & \text{if experimental.} \end{cases} $$
Suppose that the patients are assigned to these groups in the initial ratio
$$ \text{Control} : \text{Experimental} = p : (1 - p) = c \qquad (p,\ 1 - p > 0). $$
Then the prior distribution of $Z_0$ is
$$ \Pr\{Z_0 = 0\} = p = c(1 + c)^{-1} \quad \text{and} \quad \Pr\{Z_0 = 1\} = 1 - p = (1 + c)^{-1} . $$
4.1. Models with additive hazard function
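Suppose that the hazard rate has the additive form
$$ \lambda(t; z_0) = \lambda(t) + h(t)\, z_0 . $$    (4.1)
Thus, (from (3.4))
$$ S_T(t; z_0) = \exp[-\Lambda(t)]\, \exp[-H(t)\, z_0] , $$    (4.2)
and
$$ E_{Z_0}[S_T(t; Z_0)] = \sum_{z_0 = 0}^{1} S_T(t; z_0) \Pr\{Z_0 = z_0\} = (1 + c)^{-1} \exp[-\Lambda(t)] \{c + \exp[-H(t)]\} . $$    (4.3)
The posterior probabilities among the survivors to time $t$ are
$$ \Pr\{Z_0 = 0 \mid t\} = c\{c + \exp[-H(t)]\}^{-1} , $$
and
$$ \Pr\{Z_0 = 1 \mid t\} = \exp[-H(t)]\{c + \exp[-H(t)]\}^{-1} . $$
Their ratio is
$$ \frac{\Pr\{Z_0 = 0 \mid t\}}{\Pr\{Z_0 = 1 \mid t\}} = R(t) = c \exp[H(t)] , $$    (4.4)
or
$$ \log[R(t)/c] = H(t) . $$    (4.5)
In particular, when $h(t) = \alpha$,
$$ \log[R(t)/c] = \alpha t . $$    (4.6)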
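The two-group posterior probabilities and the relation (4.5) can be sketched as follows. The allocation ratio `c` and excess hazard `alpha` used in the demonstration are hypothetical values chosen only for illustration.

```python
import math


def posterior_probs(c, H):
    """Pr{Z0 = 0 | t} and Pr{Z0 = 1 | t} for allocation ratio c and H = H(t)."""
    denom = c + math.exp(-H)
    return c / denom, math.exp(-H) / denom


def log_ratio_over_c(c, H):
    """log[R(t)/c] computed from the posterior probabilities; (4.5) says it equals H(t)."""
    p0, p1 = posterior_probs(c, H)
    return math.log((p0 / p1) / c)


if __name__ == "__main__":
    c, alpha = 1.5, 0.03  # hypothetical allocation ratio and constant h(t) = alpha
    for t in (0.0, 5.0, 10.0):
        print(t, posterior_probs(c, alpha * t), log_ratio_over_c(c, alpha * t))
```

At $t = 0$ the posterior probabilities reduce to the prior ones, $c(1+c)^{-1}$ and $(1+c)^{-1}$, and the log ratio grows linearly in $t$ as in (4.6).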
Estimation and fitting. Suppose that at time $t = 0$, there are $N_{00}$ and $N_{10}$ individuals in the control and experimental groups, respectively. Clearly $c = N_{00}/N_{10}$. Suppose that there are no new entries or withdrawals during the observation period, so that we may consider the $N_{00}$ and $N_{10}$ individuals as two 'cohorts' observed over a certain period.
Let $N_{0t}$ and $N_{1t}$ be the numbers, and $\hat{P}_{0t} = N_{0t}/N_{00}$ and $\hat{P}_{1t} = N_{1t}/N_{10}$ the corresponding proportions, of survivors to time $t$ in the control and experimental groups, respectively. Then, for the estimated $R(t)$, clearly,
$$ \frac{\hat{R}(t)}{c} = \frac{N_{0t}}{N_{1t}} \Big/ \frac{N_{00}}{N_{10}} = \frac{\hat{P}_{0t}}{\hat{P}_{1t}} , $$    (4.7)
so that (from (4.5) and (4.7))
$$ \log(\hat{P}_{0t}/\hat{P}_{1t}) \doteq H(t) , $$    (4.8)
and in particular (for (4.6))
$$ \log(\hat{P}_{0t}/\hat{P}_{1t}) \doteq \alpha t . $$    (4.9)
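A small sketch of how (4.7) and (4.9) might be used with cohort counts is given below. The least-squares line through the origin in `fit_alpha` is one possible fitting choice assumed here, not a method prescribed by the paper, and the counts in the demonstration are invented for illustration.

```python
import math


def log_relative_survival(n00, n10, n0t, n1t):
    """log(P0t_hat / P1t_hat) from cohort sizes at t = 0 and survivor counts at time t."""
    p0t = n0t / n00  # proportion of survivors in the control group
    p1t = n1t / n10  # proportion of survivors in the experimental group
    return math.log(p0t / p1t)


def fit_alpha(times, log_ratios):
    """Least-squares slope through the origin for log(P0t/P1t) = alpha * t (assumed method)."""
    num = sum(t * y for t, y in zip(times, log_ratios))
    den = sum(t * t for t in times)
    return num / den


if __name__ == "__main__":
    # Hypothetical counts: 200 individuals per group, observed at t = 1, 2, 3.
    times = [1.0, 2.0, 3.0]
    controls = [180, 162, 146]
    experimentals = [170, 145, 123]
    ys = [log_relative_survival(200, 200, a, b)
          for a, b in zip(controls, experimentals)]
    print(ys, fit_alpha(times, ys))
```

Plotting `ys` against `times` and checking for a straight line through the origin is the graphical counterpart of this fit.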
Note that, by analogy with the estimated relative risk, the ratio $\hat{P}_{0t}/\hat{P}_{1t}$ might be thought of as "relative survival".
4.2. Models with proportional hazard rates
In a simple form of multiplicative model, we assume
$$ \lambda(t; z_0) = \lambda(t)\, e^{\beta z_0} , $$    (4.10)
so that
$$ \lambda(t; 1)/\lambda(t; 0) = e^{\beta} = \theta , \qquad \theta > 0 , $$    (4.11)
- the hazard rate in the experimental group is proportional to that in the control group. Then
$$ S_T(t; z_0) = \exp[-\Lambda(t)\, e^{\beta z_0}] $$    (4.12)
and
$$ E_{Z_0}[S_T(t; Z_0)] = (1 + c)^{-1} \exp[-\Lambda(t)] \{c + \exp[(1 - \theta)\Lambda(t)]\} . $$    (4.13)
The posterior probabilities among the survivors to time $t$ are
$$ \Pr\{Z_0 = 0 \mid t\} = c\{c + \exp[(1 - \theta)\Lambda(t)]\}^{-1} , $$
and
$$ \Pr\{Z_0 = 1 \mid t\} = \exp[(1 - \theta)\Lambda(t)]\{c + \exp[(1 - \theta)\Lambda(t)]\}^{-1} , $$
so that, with $R(t)$ the ratio of these probabilities as in (4.4),
$$ \log[R(t)/c] = (\theta - 1)\Lambda(t) . $$    (4.14)
Note that when $\lambda(t) = \lambda$, (4.14) takes the form
$$ \log[R(t)/c] = (\theta - 1)\lambda t = \alpha t , $$    (4.15)
which is essentially the same as (4.9).
Therefore, (4.9) cannot be used to infer about the appropriateness of the additive model $\lambda(t; z_0) = \lambda(t) + \alpha z_0$, unless it can be assumed that $\lambda(t)$ is not constant.
Inference about proportional hazard rate model. If the mortality data are complete, and no parametric function for $\lambda(t)$ is assumed, $\Lambda(t)$ can be estimated from the mortality data in the control group, using the formula
$$ \hat{\Lambda}(t) = \sum_{j=1}^{i} \frac{1}{N_{00} - j + 1} \qquad \text{for } t_i' \le t < t_{i+1}' , $$    (4.16)
where $t_i'$ is the $i$th ordered time at death (Nelson (1972)).
In view of (4.7) and (4.14), one may study (graphically) the relation
$$ \log(\hat{P}_{0t}/\hat{P}_{1t}) \doteq (\theta - 1)\hat{\Lambda}(t) , $$    (4.17)
to investigate whether a multiplicative model (4.10) is appropriate in a given clinical trial.
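The estimator (4.16) is a step function that is easy to compute from complete data. The sketch below assumes distinct death times and no censoring, as in the text; the function names are introduced here for illustration only.

```python
def cumulative_hazard_estimate(death_times, n00):
    """Step-function estimate (4.16) of Lambda(t) from complete control-group data.

    death_times: observed death times in the control group (assumed distinct).
    n00: number of individuals in the control group at t = 0.
    Returns a list of (t_i, estimate at t_i) pairs at the ordered death times.
    """
    steps = []
    total = 0.0
    for j, t in enumerate(sorted(death_times), start=1):
        total += 1.0 / (n00 - j + 1)  # j-th death occurs among n00 - j + 1 at risk
        steps.append((t, total))
    return steps


def evaluate(steps, t):
    """Value of the step function at time t (zero before the first death)."""
    value = 0.0
    for t_i, estimate in steps:
        if t_i <= t:
            value = estimate
        else:
            break
    return value


if __name__ == "__main__":
    steps = cumulative_hazard_estimate([3.0, 1.0, 2.0], n00=4)
    print(steps)
    print(evaluate(steps, 2.5))
```

Evaluating this estimate at the observation times and plotting $\log(\hat{P}_{0t}/\hat{P}_{1t})$ against it gives the graphical check suggested by (4.17).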
5. POSTERIOR DISTRIBUTIONS: TIME DEPENDENT CONCOMITANT VARIABLES
Continuous variables which might be thought of as risk or aging factors for mortality are often time dependent.
We will restrict ourselves to the case of a single concomitant variable $z$. Let $z_0$ denote the value of $z$ at time $t = 0$, and $z_t$ the value of $z$ which would be reached at time $t$, assuming survival to that time. For example,
(i) If $z_0$ denotes initial age (at $t = 0$), then at time $t$, the age of an individual is clearly $z_t = z_0 + t$, if he is alive at time $t$.
(ii) Suppose that $t$ represents age and $z$ the level of cholesterol (or blood pressure). It is often assumed that $z$ is linearly related to age, that is
$$ z_t = z_0 + \beta t , $$
where $z_0$ is the initial level of $z$ in an individual, and $z_t$ would be the value of $z$ at time $t$ in the same individual if he had not died.
(iii) It is often assumed that concentrations of certain cell constituents increase exponentially with time (e.g. Arley (1961)). If $z_0$ is the initial concentration of $z$, then at time $t$, we have (for the same individual),
$$ z_t = z_0 e^{\beta t} . $$
In general case, $z_t$ can be a function of $t$, $z_0$, and $m$ additional parameters, $\boldsymbol{\eta}' = (\eta_1, \ldots, \eta_m)$, say,
$$ z_t = \psi(t; z_0, \boldsymbol{\eta}) . $$    (5.1)
We may consider $Z_0$ and the $\eta$'s to be continuous random variables with the joint density $f_{Z_0, \boldsymbol{\eta}}(z_0, \boldsymbol{\eta})$. Further, the hazard rate function at time $t$ is a function of $z_t$
$$ \lambda(t; z_t) = \lambda[t; \psi(t; z_0, \boldsymbol{\eta})] = \lambda^*(t; z_0, \boldsymbol{\eta}) . $$    (5.2)
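In particular, the general additive model is of the form
$$ \lambda(t; z_t) = \lambda(t) + \alpha z_t . $$    (5.4)
Thus the survival function $S_T(t; z_t)$ is equal to
$$ S_T(t; z_t) = \exp[-\Lambda^*(t; z_0, \boldsymbol{\eta})] , $$    (5.5)
where
$$ \Lambda^*(t; z_0, \boldsymbol{\eta}) = \int_0^t \lambda^*(u; z_0, \boldsymbol{\eta})\,du . $$
The posterior joint density of $Z_0$ and $\boldsymbol{\eta}$ among the survivors to time $t$ is
$$ f_{Z_0, \boldsymbol{\eta}|t}(z_0, \boldsymbol{\eta} \mid t) = \frac{\exp[-\Lambda^*(t; z_0, \boldsymbol{\eta})]\, f_{Z_0, \boldsymbol{\eta}}(z_0, \boldsymbol{\eta})}{\int \cdots \int_{\Omega_{m+1}} \exp[-\Lambda^*(t; z_0, \boldsymbol{\eta})]\, f_{Z_0, \boldsymbol{\eta}}(z_0, \boldsymbol{\eta})\, dz_0\, d\boldsymbol{\eta}} , $$    (5.6)
where the $(m+1)$-tuple integral in the denominator of (5.6) over the parameter space $\Omega_{m+1}$, represents the average survival function, $E_{Z_0, \boldsymbol{\eta}}[S_T(t; Z_0, \boldsymbol{\eta})]$.
The posterior density of $Z_t$ among the survivors to age $t$ can be obtained by applying the transformation $z_t = \psi(t; z_0, \boldsymbol{\eta})$, $\boldsymbol{\eta} = \boldsymbol{\eta}$, with inverse $z_0 = \psi^{-1}(t; z_t, \boldsymbol{\eta})$, $\boldsymbol{\eta} = \boldsymbol{\eta}$, and integrating out the $\eta$'s, giving
$$ f_{Z_t|t}(z_t \mid t) = \int \cdots \int_{\Omega_m} f_{Z_0, \boldsymbol{\eta}|t}[\psi^{-1}(t; z_t, \boldsymbol{\eta}), \boldsymbol{\eta} \mid t]\, \left| \frac{\partial \psi^{-1}}{\partial z_t} \right| d\boldsymbol{\eta} . $$    (5.7)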
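As a worked illustration of (5.7), take example (ii) with the slope $\beta$ treated as a known constant, so that there are no random $\eta$'s ($m = 0$) and no integration is required. This special case is an assumption made here for illustration; $f_{Z_0|t}$ denotes the posterior density of $Z_0$ among the survivors to time $t$.

```latex
% Transformation and its inverse (m = 0, \beta a known constant):
z_t = \psi(t; z_0) = z_0 + \beta t, \qquad
z_0 = \psi^{-1}(t; z_t) = z_t - \beta t, \qquad
\left| \frac{\partial \psi^{-1}}{\partial z_t} \right| = 1 .
% Hence (5.7) reduces to a shift of the posterior density of Z_0:
f_{Z_t \mid t}(z_t \mid t) = f_{Z_0 \mid t}(z_t - \beta t \mid t) .
```

In this case the posterior distribution of $Z_t$ is simply that of $Z_0$ translated by $\beta t$.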
In further generalization, one may consider a random vector of concomitant variables, $\mathbf{z}$. This would be a natural multivariate extension of the model just discussed. In practice, however, the technical problems become rather difficult.
6. APPLICATIONS IN EPIDEMIOLOGY
It is often assumed (though not fully established) that there is a tendency for serum cholesterol level to increase with age for a normal (healthy) individual. Studies of such relations require repeated measurements on the same individuals, under specified conditions, and over a long period of time - they are difficult (and costly) to obtain on a mass scale.
The available data are usually cross-sectional population data. It has been shown (e.g. Lewis et al. (1957), Carlson and Lindstedt (1968)) that the distribution of serum cholesterol in each age group is approximately normal, and that a third order polynomial (or linear) regression function, of serum cholesterol on age, for females, and quadratic - for males, is not unreasonable to fit. We now show that these 'posterior' results are in agreement with certain simple 'prior' assumptions.
Let $Z_t$ denote the level of cholesterol at age $t$, and suppose that
$$ Z_t = Z_0 + B\psi(t) , $$    (6.1)
where $\psi(t)$ is a certain (specified) function of $t$. $Z_0$ (the initial level of cholesterol) and $B$ are independent normally distributed random variables: $Z_0 \sim N(\zeta_0, \sigma_0^2)$, $B \sim N(\beta, \sigma_1^2)$.
Suppose that the hazard rate function is of the additive form
$$ \lambda(t; z_t) = \lambda(t) + \alpha z_t = \lambda(t) + \alpha z_0 + \alpha\psi(t)\, b = \lambda^*(t; z_0, b) , $$    (6.2)
where $\alpha$ is a constant.
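Note that $\lambda^*(t; z_0, b)$ is of the same form as (3.2) for $k = 2$ with $z_{01} = z_0$, $z_{02} = b$, $h_1(t) = \alpha$, $h_2(t) = \alpha\psi(t)$, though it arises from a different biological situation. To evaluate the posterior distribution of $Z_0$ and $B$, we can apply the results obtained in Section 3.
We have $H_1(t) = \alpha t$, $H_2(t) = \alpha \int_0^t \psi(u)\,du$. Recall that the moment generating function of a normal variable $X$ with mean $\xi$ and variance $\sigma^2$ is
$$ M_X(s) = \exp(\xi s)\, \exp(\tfrac{1}{2}\sigma^2 s^2) . $$
Thus
$$ M_{Z_0}[-H_1(t)] = \exp(-\zeta_0 \alpha t)\, \exp(\tfrac{1}{2}\sigma_0^2 \alpha^2 t^2) $$
and
$$ M_B[-H_2(t)] = \exp[-\beta H_2(t)]\, \exp[\tfrac{1}{2}\sigma_1^2 (H_2(t))^2] . $$    (6.3)
From (3.8), the corresponding posterior densities of $Z_0|t$ and $B|t$ are
$$ f_{Z_0|t}(z_0 \mid t) = \frac{\exp(-\alpha t z_0)\, \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\big[-\frac{1}{2\sigma_0^2}(z_0 - \zeta_0)^2\big]}{\exp(-\zeta_0 \alpha t)\, \exp(\frac{1}{2}\sigma_0^2 \alpha^2 t^2)} = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\Big\{-\frac{1}{2\sigma_0^2}\big[z_0 - (\zeta_0 - \alpha\sigma_0^2 t)\big]^2\Big\} ; $$    (6.4)
- this is the PDF of a normal variate with mean $\zeta_0 - \alpha\sigma_0^2 t$ and variance $\sigma_0^2$;
$$ f_{B|t}(b \mid t) = \frac{\exp[-H_2(t)\, b]\, \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\big[-\frac{1}{2\sigma_1^2}(b - \beta)^2\big]}{\exp[-\beta H_2(t)]\, \exp[\frac{1}{2}\sigma_1^2 (H_2(t))^2]} = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\Big\{-\frac{1}{2\sigma_1^2}\big[b - (\beta - \sigma_1^2 H_2(t))\big]^2\Big\} $$    (6.5)
- this is also the PDF of a normal variate with mean $\beta - \sigma_1^2 H_2(t)$, and variance $\sigma_1^2$.
The joint posterior PDF $f_{Z_0, B|t}(z_0, b \mid t)$ is the product of (6.4) and (6.5).
But since $Z_t = Z_0 + B\psi(t)$, where $Z_0$ and $B$ are independent normal variates, then the posterior distribution of $Z_t$ among the survivors to time $t$, is normal with mean (posterior regression function)
$$ E(Z_t \mid t) = E(Z_0 \mid t) + \psi(t)\, E(B \mid t) , $$    (6.6)
and variance
$$ \mathrm{Var}(Z_t \mid t) = \mathrm{Var}(Z_0 \mid t) + [\psi(t)]^2\, \mathrm{Var}(B \mid t) . $$    (6.7)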
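The normal form in (6.4) can be checked numerically: weighting a normal density by $\exp(-hz)$ and renormalising should shift the mean by $-h\sigma^2$ and leave the variance unchanged. The sketch below uses a plain Riemann sum; the grid settings and the numerical values in the demonstration are arbitrary assumptions, not data from the paper.

```python
import math


def tilted_normal_moments(mean, sd, h, n=20001, width=12.0):
    """Mean and variance of the density proportional to exp(-h*z) * N(z; mean, sd^2).

    Computed by a Riemann sum on a uniform grid centred on mean - h*sd^2
    (grid size and width are arbitrary choices for this sketch).
    """
    centre = mean - h * sd ** 2
    lo = centre - width * sd
    step = 2.0 * width * sd / (n - 1)
    log_ref = -h * centre - 0.5 * ((centre - mean) / sd) ** 2
    s0 = s1 = s2 = 0.0
    for i in range(n):
        z = lo + i * step
        # Log-weight relative to its value at the centre, to avoid overflow.
        logw = -h * z - 0.5 * ((z - mean) / sd) ** 2 - log_ref
        w = math.exp(logw)
        s0 += w
        s1 += w * z
        s2 += w * z * z
    m = s1 / s0
    return m, s2 / s0 - m * m


if __name__ == "__main__":
    # Hypothetical values: zeta0 = 200, sigma0 = 30, and H_1(t) = alpha * t = 0.02.
    print(tilted_normal_moments(200.0, 30.0, 0.02))
```

With these illustrative inputs the computed mean is close to $200 - 0.02 \times 30^2 = 182$ and the variance close to $30^2 = 900$, in line with (6.4).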
Special cases
(a) Suppose that $Z_t = Z_0 + \beta\psi(t)$, where $\beta$ is a constant. Then (from (6.6)) the posterior regression function is
$$ E(Z_t \mid t) = (\zeta_0 - \alpha\sigma_0^2 t) + \beta\psi(t) . $$    (6.8)
In particular, when $\psi(t) = t$
$$ E(Z_t \mid t) = \zeta_0 - (\alpha\sigma_0^2 - \beta)\, t \quad \ldots \text{ linear;} $$    (6.9)
when $\psi(t) = t^2$
$$ E(Z_t \mid t) = \zeta_0 - \alpha\sigma_0^2 t + \beta t^2 \quad \ldots \text{ quadratic;} $$    (6.10)
when $\psi(t) = t^3$
$$ E(Z_t \mid t) = \zeta_0 - \alpha\sigma_0^2 t + \beta t^3 \quad \ldots \text{ cubic;} $$    (6.11)
etc.
(b) Suppose that $Z_t = Z_0 + B\psi(t)$, but both $Z_0$ and $B$ are independent random variables. Now, however, when $\psi(t) = t$ (or $h_2(t) = \alpha t$), we have $H_2(t) = \tfrac{1}{2}\alpha t^2$, and so
$$ E(Z_t \mid t) = \zeta_0 - \alpha\sigma_0^2 t + (\beta - \tfrac{1}{2}\alpha\sigma_1^2 t^2)\, t = \zeta_0 - (\alpha\sigma_0^2 - \beta)\, t - \tfrac{1}{2}\alpha\sigma_1^2 t^3 $$    (6.12)
- this is also cubic (as for $\psi(t) = t^3$ with $B$ constant), but with different coefficients.
Similarly, for $\psi(t) = t^2$ we have $H_2(t) = \tfrac{1}{3}\alpha t^3$, so that (from (6.5) and (6.6)) the posterior regression function is of the form
$$ E(Z_t \mid t) = A_0 + A_1 t + A_2 t^2 + A_5 t^5 , $$
etc.
The more general form of the relationship,
$$ Z_t = Z_0 \psi_1(t) + B\psi_2(t) , $$
allows for a variety of posterior regression functions - the mathematics is straightforward.
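The special cases above can be reproduced with a few lines of code. The sketch below assumes $\psi(t) = t^p$ for a non-negative integer $p$ and combines the posterior means from (6.4) and (6.5) as in (6.6); the function names and the numerical values in the demonstration are hypothetical.

```python
def posterior_regression(t, zeta0, sigma0, beta, sigma1, alpha, p):
    """E(Z_t | t) for psi(t) = t**p, built from the posterior means in (6.4) and (6.5)."""
    psi = t ** p
    h2_cum = alpha * t ** (p + 1) / (p + 1)      # H_2(t) = alpha * integral of u**p from 0 to t
    mean_z0 = zeta0 - alpha * sigma0 ** 2 * t    # posterior mean of Z0
    mean_b = beta - sigma1 ** 2 * h2_cum         # posterior mean of B
    return mean_z0 + psi * mean_b


def cubic_612(t, zeta0, sigma0, beta, sigma1, alpha):
    """Closed form (6.12), valid for psi(t) = t."""
    return (zeta0 - (alpha * sigma0 ** 2 - beta) * t
            - 0.5 * alpha * sigma1 ** 2 * t ** 3)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to exercise the formulas.
    args = (180.0, 30.0, 1.0, 0.5, 0.0004)
    for age in (10.0, 30.0, 50.0):
        print(age, posterior_regression(age, *args, p=1), cubic_612(age, *args))
```

Setting `sigma1 = 0` recovers case (a), where $B$ is the constant $\beta$, for any power `p`.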
As has already been mentioned, we can usually observe the posterior, but almost never the prior, distribution and regression function. However, assuming that the hazard rate function, $\lambda(t; z_t)$, is of additive form (6.2), the following information about 'prior' distribution of $Z$, and regression of $Z$ on $t$, can be deduced from the available information.
(i) If the posterior distribution of $Z_t|t$ is normal, and $Z_0|t$ and $B|t$ are independent, then the distributions of $Z_0|t$ and $B|t$ for those values of $t$, where observations are available, are also normal (by Cramér's Theorem - see, for example, Mathai and Pederzoli (1977), p. 6). By inversion of (3.7), $Z_0$ and $B$ are also independent normal variables, and so $Z_t$ is normal.
(ii) If the posterior regression function of $Z_t|t$ on $t$ is of polynomial form, the prior regression function is also of the polynomial form, but not necessarily of the same order; it depends on further assumptions about the stochastic nature of the prior regression coefficients.
To illustrate that our results can agree with observations, we present Fig. 1 (taken from Lewis et al. (1957), Circulation, 16, p. 236), based on cross-sectional data on cholesterol level as function of age. The authors fit a third order polynomial posterior regression to the female data, while male data are better fitted by a quadratic regression (solid lines).
[Figure: mean serum cholesterol level plotted against age in years, with separate symbols for males and females.]
Fig. 1. Mean serum cholesterol level by age and sex (From: Lewis et al. (1957), Circulation 16, p. 236).
ACKNOWLEDGMENTS
I would like to thank my husband, Dr. N.L. Johnson, for a helpful discussion.
REFERENCES
ARLEY, N. (1961). Theoretical analysis of carcinogenesis. Fourth Berkeley Symposium, Vol. 4, 7-17.
BYAR, D.P. and CORLE, D.K. (1977). Selecting optimal treatment in clinical trials using covariate information. J. Chron. Dis. 30, 445-449.
CARLSON, L.A. and LINDSTEDT, S. (1968). The Stockholm prospective study. 1. The initial values for plasma lipids. Acta Med. Scand., Suppl. 493, 1-135.
COX, D.R. (1972). Regression models and life tables (with discussion). J. Roy. Statist. Soc. Ser. B 34, 187-220.
ELANDT-JOHNSON, R.C. (1976). A class of distributions generated from distributions of exponential type. Nav. Res. Log. Quart. 23, 131-138.
KAY, R. (1977). Proportional hazard regression models and the analysis of censored survival data. Appl. Statist. 26, 227-237.
LEWIS, L.A., OLMSTED, F. et al. (1957). Serum lipids level in normal persons. Findings of a cooperative study of lipoproteins and arteriosclerosis. Circulation 16, 227-245.
MATHAI, A.M. and PEDERZOLI, G. (1977). Characterization of the Normal Probability Law. J. Wiley & Sons, New York.
NELSON, W.A. (1972). Theory and application of hazard plotting for censored failure data. Technometrics 14, 945-966.
PRENTICE, R.L. (1973). Exponential survivals with censoring and explanatory variables. Biometrika 60, 279-288.