SOME PRIOR AND POSTERIOR DISTRIBUTIONS IN SURVIVAL ANALYSIS, AND THEIR APPLICATIONS*

by

Regina C. Elandt-Johnson
Department of Biostatistics
University of North Carolina at Chapel Hill
Chapel Hill, N.C. 27514, U.S.A.

Institute of Statistics Mimeo Series No. 1206
January 1979

SUMMARY

Concomitant variables ($\mathbf{z}$) introduced into survival models are often regarded as risk or aging factors which contribute to the mortality patterns in various study groups. Concomitant variables observed at the time point $t = 0$ (and having values $\mathbf{z}_0$) can also be regarded as random variables, $\mathbf{Z}_0$, with some prior joint distribution. If the $Z$'s do, indeed, contribute to mortality, their posterior distribution among the survivors to time $t$ will be different from that at $t = 0$, even when the values of the $z$'s do not depend on $t$.

General formulae for the posterior distributions are given in Section 2 (assuming the $z$'s not to depend on $t$), and in Section 5 (for $z$'s varying with $t$). Our interest is especially focussed on models with linear additive hazard functions of the form (3.2). It is shown that in such cases the posterior distribution does not depend on the "underlying" hazard; its general form is given by (3.6), and by (3.7) where the concomitant variables are independent. Of some interest is the case when a univariate variable $Z_0$ is a binary variable - an application of posterior probabilities in clinical trials is suggested in Section 4.

Of special interest is the case when $Z_t$ is a continuous function of $t$. Level of serum cholesterol and blood pressure are examples. Posterior distributions and posterior regression of $Z$ on $t$ among the survivors to time $t$ are those which are usually observed. Assuming an additive model for the hazard function (6.2), one may infer, under certain conditions, the prior distribution of $Z$ without using repeated measurements (Section 6).

Key Words & Phrases: Concomitant variables; Prior and posterior distributions, and regression functions; Additive hazard rate function; Survival function.

*This work was supported by the National Heart and Lung Institute contract NIH-NHLI-712243 and by NIH research grant number 1 R01 CA17107 from the National Cancer Institute.

1. INTRODUCTION

Assessment of the role of concomitant variables as risk factors of mortality in various diseases is the subject of many current longitudinal studies and clinical trials.

Let $T > 0$ denote the survival time (age) and let $\mathbf{z}$ be a $k \times 1$ vector of concomitant variables which can be measured on each individual (unit) under consideration. Let $\mathbf{z}_0$ be the $k \times 1$ vector of concomitant variables at initial time $t = 0$, and

$$S_T(t; \mathbf{z}_0) = \Pr\{T > t \mid \mathbf{z}_0\} \tag{1.1}$$

be the corresponding survival distribution function (SDF).

The usual approach to estimation of the SDF defined in (1.1) is to assume a parametric or semiparametric form for the hazard function of (1.1), in which $\mathbf{z}_0$ is treated as a vector of parameters [e.g. Cox (1972), Prentice (1973), Byar and Corle (1977), Kay (1977), and many others].

However, in a given study, one may consider a random vector $\mathbf{Z}_0$ having a specified prior distribution (CDF), $F_{\mathbf{Z}_0}(\mathbf{z}_0)$. If the $Z_0$'s play a significant role in mortality (i.e. are risk or aging factors), then the posterior distribution among the survivors to age $t$, $F_{\mathbf{Z}_0|t}(\mathbf{z}_0|t)$, would be different from $F_{\mathbf{Z}_0}(\mathbf{z}_0)$, even though the $z_0$'s are not functions of $t$.

The purpose of this paper is to investigate the properties of certain models of prior and/or posterior distributions of the $Z_0$'s, and their use in estimating the SDF's.

Though the problem is of Bayesian nature, it should be pointed out that the forms of the prior distributions need not always be assumed 'a priori' - they can be estimated from actually observed distributions.

The cases when the $z_0$'s are independent of time $t$ (Sections 2, 3, and 4), and when they are functions of $t$ (Sections 5 and 6), are treated separately, for convenience.

2. POSTERIOR DISTRIBUTIONS: CONCOMITANT VARIABLES INDEPENDENT OF TIME

We discuss here some models in which the concomitant variables do not depend on time $t$.

Let $dF_{\mathbf{Z}_0}(\mathbf{z}_0)$ be the probability element of the random vector $\mathbf{Z}_0' = (Z_{01}, \ldots, Z_{0k})$, where $\mathbf{z}_0 \in \Omega$. We call $dF_{\mathbf{Z}_0}(\mathbf{z}_0)$ the prior probability element of $\mathbf{Z}_0$ in $\Omega$. The posterior probability element among the survivors to age $t$, with the SDF defined in (1.1), is

$$dF_{\mathbf{Z}_0|t}(\mathbf{z}_0|t) = \frac{S_T(t; \mathbf{z}_0)\, dF_{\mathbf{Z}_0}(\mathbf{z}_0)}{\int_{\Omega} S_T(t; \mathbf{z}_0)\, dF_{\mathbf{Z}_0}(\mathbf{z}_0)} = \frac{S_T(t; \mathbf{z}_0)\, dF_{\mathbf{Z}_0}(\mathbf{z}_0)}{E_{\mathbf{Z}_0}[S_T(t; \mathbf{Z}_0)]}. \tag{2.1}$$

Note that

$$E_{\mathbf{Z}_0}[S_T(t; \mathbf{Z}_0)] = \int_{\Omega} S_T(t; \mathbf{z}_0)\, dF_{\mathbf{Z}_0}(\mathbf{z}_0) \tag{2.2}$$

is the average SDF over the whole set of $z_0$'s. It can be estimated from survival data.

Sometimes we might be able to estimate the prior as well as the posterior distributions of $\mathbf{Z}_0$, and then the SDF from the formula

$$S_T(t; \mathbf{z}_0) = \frac{dF_{\mathbf{Z}_0|t}(\mathbf{z}_0|t)}{dF_{\mathbf{Z}_0}(\mathbf{z}_0)}\, E_{\mathbf{Z}_0}[S_T(t; \mathbf{Z}_0)]. \tag{2.3}$$

3. MODELS WITH ADDITIVE HAZARD RATE FUNCTIONS

We define the general additive model of hazard rate function as

$$\lambda(t; \mathbf{z}_0) = \lambda(t) + \sum_{i=1}^{k} h_i(t)\, g_i(z_{0i}), \tag{3.1}$$

where $\lambda(t)$ is the so-called underlying hazard rate. Note that the $h_i(t)$'s are entirely functions of $t$, while the $g_i(z_{0i})$'s are not dependent on $t$.

For convenience, however, and without loss of generality (by appropriate definition of the $z_0$'s), we confine our discussion to linear additive models of the form

$$\lambda(t; \mathbf{z}_0) = \lambda(t) + \sum_{i=1}^{k} h_i(t)\, z_{0i}, \tag{3.2}$$

and in special cases

$$\lambda(t; \mathbf{z}_0) = \lambda(t) + \sum_{i=1}^{k} \alpha_i z_{0i}, \tag{3.3}$$

where the $\alpha_i$'s are constants.

Further, define

$$\Lambda(t) = \int_0^t \lambda(u)\, du, \qquad H_i(t) = \int_0^t h_i(u)\, du,$$

and denote by $M_{\mathbf{Z}_0}(\mathbf{s})$ the joint moment generating function (MGF) of the distribution of concomitant variables, $F_{\mathbf{Z}_0}(\mathbf{z}_0)$.
Then the cumulative hazard function (CHF) of (3.2) is

$$\Lambda(t; \mathbf{z}_0) = \Lambda(t) + \sum_{i=1}^{k} H_i(t)\, z_{0i}, \tag{3.4}$$

the survival function is

$$S_T(t; \mathbf{z}_0) = \exp[-\Lambda(t)]\, \exp\Big[-\sum_{i=1}^{k} H_i(t)\, z_{0i}\Big], \tag{3.4a}$$

and (from (2.2))

$$E_{\mathbf{Z}_0}[S_T(t; \mathbf{Z}_0)] = \exp[-\Lambda(t)] \int_{\Omega} \exp\Big[-\sum_{i=1}^{k} H_i(t)\, z_{0i}\Big]\, dF_{\mathbf{Z}_0}(\mathbf{z}_0) = \exp[-\Lambda(t)]\, M_{\mathbf{Z}_0}[-\mathbf{H}(t)], \tag{3.5}$$

where $\mathbf{H}'(t) = (H_1(t), \ldots, H_k(t))$. (See Elandt-Johnson (1976).)

Hence, the posterior distribution of the $Z_0$'s among the survivors to time $t$ is (from (2.1))

$$dF_{\mathbf{Z}_0|t}(\mathbf{z}_0|t) = \frac{\exp\big[-\sum_{i=1}^{k} H_i(t)\, z_{0i}\big]\, dF_{\mathbf{Z}_0}(\mathbf{z}_0)}{M_{\mathbf{Z}_0}[-\mathbf{H}(t)]}. \tag{3.6}$$

In particular, when the $Z_0$'s are mutually independent,

$$dF_{\mathbf{Z}_0}(\mathbf{z}_0) = \prod_{i=1}^{k} dF_{Z_{0i}}(z_{0i}) \qquad \text{and} \qquad M_{\mathbf{Z}_0}[-\mathbf{H}(t)] = \prod_{i=1}^{k} M_{Z_{0i}}[-H_i(t)],$$

so that (3.6) takes the form

$$dF_{\mathbf{Z}_0|t}(\mathbf{z}_0|t) = \prod_{i=1}^{k} \frac{\exp[-H_i(t)\, z_{0i}]\, dF_{Z_{0i}}(z_{0i})}{M_{Z_{0i}}[-H_i(t)]} = \prod_{i=1}^{k} dF_{Z_{0i}|t}(z_{0i}|t), \tag{3.7}$$

where

$$dF_{Z_{0i}|t}(z_{0i}|t) = \frac{\exp[-H_i(t)\, z_{0i}]\, dF_{Z_{0i}}(z_{0i})}{M_{Z_{0i}}[-H_i(t)]} \tag{3.8}$$

is the posterior probability element with respect to variable $Z_{0i}$.

Summarizing these results:

If the hazard rate function, $\lambda(t; \mathbf{z}_0)$, is of the linear additive form (3.2), and the $Z_0$'s are random variables with joint distribution $dF_{\mathbf{Z}_0}(\mathbf{z}_0)$, then the posterior distribution of the $Z_0$'s among the survivors to time $t$ does not depend on the underlying hazard $\lambda(t)$, and its explicit form is given by (3.6). If, additionally, the $Z_0$'s are mutually independent, then the $Z_0$'s, given $t$, are also mutually independent.

The general multiplicative model of hazard rate function can be defined as

$$\lambda(t; \mathbf{z}_0) = \lambda(t)\, g(\mathbf{z}_0), \tag{3.9}$$

or more specifically as

$$\lambda(t; \mathbf{z}_0) = \lambda(t) \prod_{i=1}^{k} g_i(z_{0i}). \tag{3.10}$$

A special case, the so-called multiplicative exponential model

$$\lambda(t; \mathbf{z}_0) = \lambda(t) \exp\Big(\sum_{i=1}^{k} \beta_i z_{0i}\Big), \tag{3.10a}$$

is in common use (e.g. Cox (1972)).

General multiplicative models will not be discussed in detail, though special cases will occasionally be used for comparisons.

4. SOME APPLICATIONS IN CLINICAL TRIALS

Consider a simple clinical trial in which only two groups are distinguished: control and experimental.
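The summary result of Section 3 can be checked numerically before it is applied to the two-group trial. The Python sketch below is illustrative only: the three-point prior, the function `h` and both baseline hazards are assumed values chosen for the example, not quantities from this paper. It computes the posterior probabilities directly from (2.1), integrating the full hazard (3.2) for a single covariate, and compares them with (3.6), which involves only $H(t)$.

```python
import math


def integrate(f, t, n=2000):
    """Composite trapezoidal rule for the integral of f over [0, t]."""
    step = t / n
    total = 0.5 * (f(0.0) + f(t))
    for j in range(1, n):
        total += f(j * step)
    return total * step


def posterior_direct(t, baseline, h, support, prior):
    """Posterior from (2.1): prior weighted by the survival function, normalised."""
    weights = []
    for z, p in zip(support, prior):
        cum_hazard = integrate(lambda u: baseline(u) + h(u) * z, t)
        weights.append(math.exp(-cum_hazard) * p)
    norm = sum(weights)
    return [w / norm for w in weights]


def posterior_additive(t, h, support, prior):
    """Posterior from (3.6): only H(t) enters, not the underlying hazard."""
    H = integrate(h, t)
    weights = [math.exp(-H * z) * p for z, p in zip(support, prior)]
    norm = sum(weights)  # MGF of the prior evaluated at -H(t)
    return [w / norm for w in weights]


# Hypothetical inputs, chosen only for illustration.
support = [0.0, 1.0, 2.5]
prior = [0.5, 0.3, 0.2]
h = lambda u: 0.02 + 0.01 * u
constant_baseline = lambda u: 0.05
rising_baseline = lambda u: 0.01 * math.exp(0.08 * u)

t = 10.0
post_a = posterior_direct(t, constant_baseline, h, support, prior)
post_b = posterior_direct(t, rising_baseline, h, support, prior)
post_c = posterior_additive(t, h, support, prior)
print(post_a, post_b, post_c)
```

Under these assumptions the three posterior vectors agree up to rounding, whichever baseline hazard is used, and the posterior weight shifts toward the lowest covariate value because $h(t) > 0$.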
Let

$$z_0 = \begin{cases} 0 & \text{if control}, \\ 1 & \text{if experimental}. \end{cases}$$

Suppose that the patients are assigned to these groups in the initial ratio

$$\text{Control} : \text{Experimental} = p : (1 - p) = c \qquad (p,\ 1 - p > 0).$$

Then the prior distribution of $Z_0$ is

$$\Pr\{Z_0 = 0\} = p = c(1 + c)^{-1} \qquad \text{and} \qquad \Pr\{Z_0 = 1\} = 1 - p = (1 + c)^{-1}.$$

4.1. Models with additive hazard function

Suppose that the hazard rate has the additive form

$$\lambda(t; z_0) = \lambda(t) + h(t)\, z_0. \tag{4.1}$$

Thus (from (3.4))

$$S_T(t; z_0) = \exp[-\Lambda(t)]\, \exp[-H(t)\, z_0] \tag{4.2}$$

and

$$E_{Z_0}[S_T(t; Z_0)] = \sum_{z_0 = 0}^{1} S_T(t; z_0) \Pr\{Z_0 = z_0\} = (1 + c)^{-1} \exp[-\Lambda(t)]\, \{c + \exp[-H(t)]\}. \tag{4.3}$$

The posterior probabilities among the survivors to time $t$ are

$$\Pr\{Z_0 = 0 \mid t\} = c\, \{c + \exp[-H(t)]\}^{-1}$$

and

$$\Pr\{Z_0 = 1 \mid t\} = \exp[-H(t)]\, \{c + \exp[-H(t)]\}^{-1}.$$

Their ratio is

$$\frac{\Pr\{Z_0 = 0 \mid t\}}{\Pr\{Z_0 = 1 \mid t\}} = R(t) = c\, \exp[H(t)], \tag{4.4}$$

or

$$\log[R(t)/c] = H(t). \tag{4.5}$$

In particular, when $h(t) = \alpha$,

$$\log[R(t)/c] = \alpha t. \tag{4.6}$$

Estimation and fitting. Suppose that at time $t = 0$ there are $N_{00}$ and $N_{10}$ individuals in the control and experimental groups, respectively. Clearly $c = N_{00}/N_{10}$. Suppose that there are no new entries or withdrawals during the observation period, so that we may consider the $N_{00}$ and $N_{10}$ individuals as two 'cohorts' observed over a certain period. Let $N_{0t}$ and $N_{1t}$ be the numbers, and $\hat{P}_{0t} = N_{0t}/N_{00}$ and $\hat{P}_{1t} = N_{1t}/N_{10}$ the corresponding proportions, of survivors to time $t$ in the control and experimental groups, respectively. Then the estimated $R(t)$ is clearly $\hat{R}(t) = N_{0t}/N_{1t}$, and

$$\frac{\hat{R}(t)}{c} = \frac{N_{0t}}{N_{1t}} \Big/ \frac{N_{00}}{N_{10}} = \frac{\hat{P}_{0t}}{\hat{P}_{1t}}, \tag{4.7}$$

so that (from (4.5) and (4.7))

$$\log(\hat{P}_{0t}/\hat{P}_{1t}) = H(t), \tag{4.8}$$

and in particular (for (4.6))

$$\log(\hat{P}_{0t}/\hat{P}_{1t}) = \alpha t. \tag{4.9}$$

Note that the ratio $\hat{P}_{0t}/\hat{P}_{1t}$, the "relative survival", might be thought of as the estimated relative risk.

4.2. Models with proportional hazard rates

In a simple form of multiplicative model, we assume

$$\lambda(t; z_0) = \lambda(t)\, e^{\beta z_0}, \tag{4.10}$$

so that

$$\lambda(t; 1)/\lambda(t; 0) = e^{\beta} = \theta, \qquad \theta > 0, \tag{4.11}$$

that is, the hazard rate in the experimental group is proportional to that in the control group. Then

$$S_T(t; z_0) = \exp[-\Lambda(t)\, e^{\beta z_0}] \tag{4.12}$$

and

$$E_{Z_0}[S_T(t; Z_0)] = (1 + c)^{-1} \exp[-\Lambda(t)]\, \{c + \exp[(1 - \theta)\Lambda(t)]\}. \tag{4.13}$$

The posterior probabilities among the survivors to time $t$ are

$$\Pr\{Z_0 = 0 \mid t\} = c\, \{c + \exp[(1 - \theta)\Lambda(t)]\}^{-1}$$

and

$$\Pr\{Z_0 = 1 \mid t\} = \exp[(1 - \theta)\Lambda(t)]\, \{c + \exp[(1 - \theta)\Lambda(t)]\}^{-1},$$

so that the ratio $R(t) = \Pr\{Z_0 = 0 \mid t\}/\Pr\{Z_0 = 1 \mid t\} = c\, \exp[(\theta - 1)\Lambda(t)]$ satisfies

$$\log[R(t)/c] = (\theta - 1)\Lambda(t). \tag{4.14}$$

Note that when $\lambda(t) = \lambda$, (4.14) takes the form

$$\log[R(t)/c] = (\theta - 1)\lambda t = \alpha t, \tag{4.15}$$

which is essentially the same as (4.9). Therefore, (4.9) cannot be used to judge the appropriateness of the additive model $\lambda(t; z_0) = \lambda(t) + \alpha z_0$, unless it can be assumed that $\lambda(t)$ is not constant.

Inference about proportional hazard rate model. If the mortality data are complete, and no parametric function for $\lambda(t)$ is assumed, $\Lambda(t)$ can be estimated from the mortality data in the control group, using the formula

$$\hat{\Lambda}(t) = \sum_{j=1}^{i} \frac{1}{N_{00} - j + 1} \qquad \text{for } t_i' \le t < t_{i+1}', \tag{4.16}$$

where $t_i'$ is the $i$th ordered time at death (Nelson (1972)).

In view of (4.7) and (4.14), one may study (graphically) the relation

$$\log(\hat{P}_{0t}/\hat{P}_{1t}) \doteq (\theta - 1)\hat{\Lambda}(t), \tag{4.17}$$

to investigate whether a multiplicative model (4.10) is appropriate in a given clinical trial.

5. POSTERIOR DISTRIBUTIONS: TIME DEPENDENT CONCOMITANT VARIABLES

Continuous variables which might be thought of as risk or aging factors for mortality are often time dependent.

We will restrict ourselves to the case of a single concomitant variable $z$. Let $Z_0$ denote the value of $Z$ at time $t = 0$, and $Z_t$ the value of $Z$ which would be reached at time $t$, assuming survival to that time. For example,

(i) If $Z_0$ denotes initial age (at $t = 0$), then at time $t$ the age of an individual is clearly

$$Z_t = Z_0 + t$$

if he is alive at time $t$.

(ii)
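Suppose that $t$ represents age and $z$ the level of cholesterol (or blood pressure). It is often assumed that $z$ is linearly related to age, that is

$$Z_t = Z_0 + \beta t,$$

where $Z_0$ is the initial level of $z$ in an individual, and $Z_t$ would be the value of $z$ at time $t$ in the same individual if he had not died.

(iii) It is often assumed that concentrations of certain cell constituents increase exponentially with time (e.g. Arley (1961)). If $Z_0$ is the initial concentration of $z$, then at time $t$ we have (for the same individual)

$$Z_t = Z_0\, e^{\beta t}.$$

In the general case, $Z_t$ can be a function of $t$, $Z_0$, and $m$ additional parameters, $\boldsymbol{\eta}' = (\eta_1, \ldots, \eta_m)$, say,

$$Z_t = \Psi(t; z_0, \boldsymbol{\eta}). \tag{5.1}$$

We may consider $Z_0$ and the $\eta$'s to be continuous random variables with the joint density $f_{Z_0, \boldsymbol{\eta}}(z_0, \boldsymbol{\eta})$. Further, the hazard rate function at time $t$ is a function of $Z_t$,

$$\lambda(t; z_t) = \lambda[t; \Psi(t; z_0, \boldsymbol{\eta})] = \lambda^*(t; z_0, \boldsymbol{\eta}). \tag{5.2}$$

In particular, the additive model is of the form

$$\lambda(t; z_t) = \lambda(t) + \alpha z_t. \tag{5.4}$$

Thus the survival function $S_T(t; z_t)$ is equal to

$$S_T(t; z_t) = \exp[-\Lambda^*(t; z_0, \boldsymbol{\eta})], \tag{5.5}$$

where

$$\Lambda^*(t; z_0, \boldsymbol{\eta}) = \int_0^t \lambda^*(u; z_0, \boldsymbol{\eta})\, du.$$

The posterior joint density of $Z_0$ and $\boldsymbol{\eta}$ among the survivors to time $t$ is

$$f_{Z_0, \boldsymbol{\eta}|t}(z_0, \boldsymbol{\eta} \mid t) = \frac{\exp[-\Lambda^*(t; z_0, \boldsymbol{\eta})]\, f_{Z_0, \boldsymbol{\eta}}(z_0, \boldsymbol{\eta})}{\int \cdots \int_{\Omega_{m+1}} \exp[-\Lambda^*(t; z_0, \boldsymbol{\eta})]\, f_{Z_0, \boldsymbol{\eta}}(z_0, \boldsymbol{\eta})\, dz_0\, d\boldsymbol{\eta}}, \tag{5.6}$$

where the $(m+1)$-tuple integral in the denominator of (5.6), over the parameter space $\Omega_{m+1}$, represents the average survival function, $E_{Z_0, \boldsymbol{\eta}}[S_T(t; Z_0, \boldsymbol{\eta})]$.

The posterior density of $Z_t$ among the survivors to age $t$ can be obtained by applying the transformation $z_t = \Psi(t; z_0, \boldsymbol{\eta})$, $\boldsymbol{\eta} = \boldsymbol{\eta}$, with inverse $z_0 = \Psi^{-1}(t; z_t, \boldsymbol{\eta})$, $\boldsymbol{\eta} = \boldsymbol{\eta}$, and integrating out the $\eta$'s, giving

$$f_{Z_t|t}(z_t \mid t) = \int \cdots \int_{\Omega_m} f_{Z_0, \boldsymbol{\eta}|t}[\Psi^{-1}(t; z_t, \boldsymbol{\eta}), \boldsymbol{\eta} \mid t]\, \left|\frac{\partial \Psi^{-1}}{\partial z_t}\right|\, d\boldsymbol{\eta}. \tag{5.7}$$

In further generalization, one may consider a random vector of concomitant variables, $\mathbf{Z}_t$. This would be a natural multivariate extension of the model just discussed. In practice, however, the technical problems become rather difficult.

6. APPLICATIONS IN EPIDEMIOLOGY

It is often assumed (though not fully established) that there is a tendency for serum cholesterol level to increase with age for a normal (healthy) individual.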
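To see how survival selection alone can distort such an age trend, consider example (ii) of Section 5 together with the additive hazard (5.4). The Python sketch below is illustrative only: the normal prior for $Z_0$ and all numerical values (`alpha`, `beta`, `zeta0`, `sigma0`) are assumptions chosen for the example, not estimates from data. It evaluates the posterior (5.6) on a grid, with $\beta$ treated as a known constant, and returns the mean level among the survivors to age $t$.

```python
import math


def survivors_mean_level(t, alpha, beta, zeta0, sigma0, grid=4001, width=8.0):
    """Mean of Z_t = Z_0 + beta*t among survivors to t, from (5.6) on a grid.

    Assumes a normal prior N(zeta0, sigma0^2) for Z_0 and the additive hazard
    lambda(t) + alpha * z_t. The factor exp[-Lambda(t)] is common to the
    numerator and denominator of (5.6), so it cancels and is omitted.
    """
    lo = zeta0 - width * sigma0
    step = 2.0 * width * sigma0 / (grid - 1)
    num = 0.0
    den = 0.0
    for j in range(grid):
        z0 = lo + j * step
        prior = math.exp(-0.5 * ((z0 - zeta0) / sigma0) ** 2)
        # Cumulative hazard in excess of the underlying one:
        # alpha * integral over [0, t] of (z0 + beta * u) du.
        excess = alpha * (z0 * t + 0.5 * beta * t * t)
        weight = prior * math.exp(-excess)
        num += (z0 + beta * t) * weight
        den += weight
    return num / den


# Hypothetical values: initial mean level 200, SD 30, slope 1 unit per year.
age = 40.0
observed = survivors_mean_level(age, alpha=1e-4, beta=1.0, zeta0=200.0, sigma0=30.0)
unselected = 200.0 + 1.0 * age
print(observed, unselected)
```

Under these assumptions the mean among survivors lies below the mean that the whole initial cohort would have reached, so a cross-sectional curve understates the individual trend.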
Studies of such relations require repeated measurements on the same individuals, under specified conditions, and over a long period of time - they are difficult (and costly) to obtain on a mass scale. The available data are usually cross-sectional population data.

It has been shown (e.g. Lewis et al. (1957), Carlson and Lindstedt (1968)) that the distribution of serum cholesterol in each age group is approximately normal, and that it is not unreasonable to fit a third order polynomial (or linear) regression function of serum cholesterol on age for females, and a quadratic one for males.

We now show that these 'posterior' results are in agreement with certain simple 'prior' assumptions.

Let $Z_t$ denote the level of cholesterol at age $t$, and suppose that

$$Z_t = Z_0 + B\psi(t), \tag{6.1}$$

where $Z_0$ (the initial level of cholesterol) and $B$ are independent normally distributed random variables: $Z_0 \sim N(\zeta_0, \sigma_0^2)$, $B \sim N(\beta, \sigma_1^2)$, and $\psi(t)$ is a certain (specified) function of $t$.

Suppose that the hazard rate function is of the additive form

$$\lambda(t; z_t) = \lambda(t) + \alpha z_t = \lambda(t) + \alpha z_0 + \alpha\psi(t)\, b = \lambda^*(t; z_0, b), \tag{6.2}$$

where $\alpha$ is a constant.

Note that $\lambda^*(t; z_0, b)$ is of the same form as (3.2) for $k = 2$, with $z_{01} = z_0$, $z_{02} = b$, $h_1(t) = \alpha$, $h_2(t) = \alpha\psi(t)$, though it arises from a different biological situation. To evaluate the posterior distribution of $Z_0$ and $B$, we can apply the results obtained in Section 3.

We have $H_1(t) = \alpha t$ and $H_2(t) = \alpha \int_0^t \psi(u)\, du$. Recall that the moment generating function of a normal variable $X$ with mean $\mu$ and variance $\sigma^2$ is

$$M_X(s) = \exp(\mu s)\, \exp(\tfrac{1}{2}\sigma^2 s^2).$$

Thus

$$M_{Z_0}[-H_1(t)] = \exp(-\zeta_0 \alpha t)\, \exp(\tfrac{1}{2}\sigma_0^2 \alpha^2 t^2)$$

and

$$M_B[-H_2(t)] = \exp[-\beta H_2(t)]\, \exp[\tfrac{1}{2}\sigma_1^2 (H_2(t))^2]. \tag{6.3}$$

From (3.8), the corresponding posterior densities of $Z_0 \mid t$ and $B \mid t$ are

$$f_{Z_0|t}(z_0 \mid t) = \frac{\exp(-\alpha t z_0)\, \dfrac{1}{\sqrt{2\pi}\,\sigma_0} \exp\Big\{-\dfrac{1}{2\sigma_0^2}(z_0 - \zeta_0)^2\Big\}}{\exp(-\zeta_0 \alpha t)\, \exp(\tfrac{1}{2}\sigma_0^2 \alpha^2 t^2)} = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\Big\{-\frac{1}{2\sigma_0^2}\big[z_0 - (\zeta_0 - \alpha\sigma_0^2 t)\big]^2\Big\} \tag{6.4}$$

- this is the PDF of a normal variate with mean $\zeta_0 - \alpha\sigma_0^2 t$ and variance $\sigma_0^2$; and

$$f_{B|t}(b \mid t) = \frac{\exp[-H_2(t)\, b]\, \dfrac{1}{\sqrt{2\pi}\,\sigma_1} \exp\Big\{-\dfrac{1}{2\sigma_1^2}(b - \beta)^2\Big\}}{\exp[-\beta H_2(t)]\, \exp[\tfrac{1}{2}\sigma_1^2 (H_2(t))^2]} = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\Big\{-\frac{1}{2\sigma_1^2}\big[b - (\beta - \sigma_1^2 H_2(t))\big]^2\Big\} \tag{6.5}$$

- this is also the PDF of a normal variate, with mean $\beta - \sigma_1^2 H_2(t)$ and variance $\sigma_1^2$.

The joint posterior PDF, $f_{Z_0, B|t}(z_0, b \mid t)$, is the product of (6.4) and (6.5).

But since $Z_t = Z_0 + B\psi(t)$, where $Z_0$ and $B$ are independent normal variates, the posterior distribution of $Z_t$ among the survivors to time $t$, $Z_t \mid t$, is normal with mean (posterior regression function)

$$E(Z_t \mid t) = E(Z_0 \mid t) + \psi(t)\, E(B \mid t) = (\zeta_0 - \alpha\sigma_0^2 t) + \psi(t)\, [\beta - \sigma_1^2 H_2(t)], \tag{6.6}$$

and variance

$$\mathrm{Var}(Z_t \mid t) = \mathrm{Var}(Z_0 \mid t) + [\psi(t)]^2\, \mathrm{Var}(B \mid t). \tag{6.7}$$

Special cases

(a) Suppose that $Z_t = Z_0 + \beta\psi(t)$, where $\beta$ is a constant. Then (from (6.6) with $\sigma_1^2 = 0$) the posterior regression function is

$$E(Z_t \mid t) = (\zeta_0 - \alpha\sigma_0^2 t) + \beta\psi(t). \tag{6.8}$$

In particular, when $\psi(t) = t$,

$$E(Z_t \mid t) = \zeta_0 - (\alpha\sigma_0^2 - \beta)t \quad \ldots \text{ linear}; \tag{6.9}$$

when $\psi(t) = t^2$,

$$E(Z_t \mid t) = \zeta_0 - \alpha\sigma_0^2 t + \beta t^2 \quad \ldots \text{ quadratic}; \tag{6.10}$$

when $\psi(t) = t^3$,

$$E(Z_t \mid t) = \zeta_0 - \alpha\sigma_0^2 t + \beta t^3 \quad \ldots \text{ cubic}; \tag{6.11}$$

etc.

(b) Suppose that $Z_t = Z_0 + B\psi(t)$, but both $Z_0$ and $B$ are independent random variables. Now, however, when $\psi(t) = t$ (or $h_2(t) = \alpha t$), we have $H_2(t) = \tfrac{1}{2}\alpha t^2$, and so

$$E(Z_t \mid t) = \zeta_0 - \alpha\sigma_0^2 t + (\beta - \tfrac{1}{2}\alpha\sigma_1^2 t^2)\, t = \zeta_0 - (\alpha\sigma_0^2 - \beta)t - \tfrac{1}{2}\alpha\sigma_1^2 t^3 \tag{6.12}$$

- this is also cubic (as for $\psi(t) = t^3$ with $B$ constant), but with different coefficients. Similarly, for $\psi(t) = t^2$ we have $H_2(t) = \tfrac{1}{3}\alpha t^3$, and the posterior regression function is of the form

$$E(Z_t \mid t) = A_0 + A_1 t + A_2 t^2 + A_5 t^5,$$

etc.

The more general form of the relationship,

$$Z_t = Z_0 \psi_1(t) + B\psi_2(t),$$

allows for a variety of posterior regression functions - the mathematics is straightforward.

As has already been mentioned, we can usually observe the posterior, but almost never the prior, distribution and regression function. However, assuming that the hazard rate function, $\lambda(t; z_t)$, is of additive form (6.2), the following information about the 'prior' distribution of $Z$, and regression of $Z$ on $t$, can be deduced from the available information.
(i) If the posterior distribution of $Z_t \mid t$ is normal, and $Z_0 \mid t$ and $B \mid t$ are independent, then the distributions of $Z_0 \mid t$ and $B \mid t$, for those values of $t$ where observations are available, are also normal (by Cramér's Theorem - see, for example, Mathai and Pederzoli (1977), p. 6). By inversion of (3.7), $Z_0$ and $B$ are also independent normal variables, and so $Z_t$ is normal.

(ii) If the posterior regression function of $Z_t \mid t$ on $t$ is of polynomial form, the prior regression function is also of polynomial form, but not necessarily of the same order; it depends on further assumptions about the stochastic nature of the prior regression coefficients.

To illustrate that our results can agree with observations, we present Fig. 1 (taken from Lewis et al. (1957), Circulation 16, p. 236), based on cross-sectional data on cholesterol level as a function of age. The authors fit a third order polynomial posterior regression to the female data, while the male data are better fitted by a quadratic regression (solid lines).

[Figure 1: plot of mean serum cholesterol level (vertical axis, ticks from 160 to 280) against age in years (horizontal axis), with separate symbols for males and females and fitted regression curves as solid lines.]

Fig. 1. Mean serum cholesterol level by age and sex (From: Lewis et al. (1957), Circulation 16, p. 236).

ACKNOWLEDGMENTS

I would like to thank my husband, Dr. N.L. Johnson, for a helpful discussion.

REFERENCES

ARLEY, N. (1961). Theoretical analysis of carcinogenesis. Fourth Berkeley Symposium, Vol. 4, 7-17.

BYAR, D.P. and CORLE, D.K. (1977). Selecting optimal treatment in clinical trials using covariate information. J. Chron. Dis. 30, 445-449.

CARLSON, L.A. and LINDSTEDT, S. (1968). The Stockholm prospective study. 1. The initial values for plasma lipids. Acta Med. Scand., Suppl. 493, 1-135.

COX, D.R. (1972). Regression models and life-tables (with discussion). J. Roy. Statist. Soc. Ser. B 34, 187-220.

ELANDT-JOHNSON, R.C. (1976). A class of distributions generated from distributions of exponential type. Nav. Res. Log. Quart. 23, 131-138.

KAY, R. (1977). Proportional hazard regression models and the analysis of censored survival data. Appl. Statist. 26, 227-237.

LEWIS, L.A., OLMSTED, F. et al. (1957). Serum lipids level in normal persons. Findings of a cooperative study of lipoproteins and arteriosclerosis. Circulation 16, 227-245.

MATHAI, A.M. and PEDERZOLI, G. (1977). Characterization of the Normal Probability Law. J. Wiley & Sons, New York.

NELSON, W.A. (1972). Theory and application of hazard plotting for censored failure data. Technometrics 14, 945-966.

PRENTICE, R.L. (1973). Exponential survivals with censoring and explanatory variables. Biometrika 60, 279-288.