doi: 10.1111/j.1467-9469.2006.00544.x
© Board of the Foundation of the Scandinavian Journal of Statistics 2007. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA. Vol 34: 17–32, 2007

Direct Modelling of Regression Effects for Transition Probabilities in Multistate Models

THOMAS H. SCHEIKE
Department of Biostatistics, University of Copenhagen

MEI-JIE ZHANG
Division of Biostatistics, Medical College of Wisconsin

ABSTRACT. A simple and standard approach for analysing multistate model data is to model all transition intensities and then compute a summary measure such as the transition probabilities based on this. This approach is relatively simple to implement but it is difficult to see what the covariate effects are on the scale of interest. In this paper, we consider an alternative approach that directly models the covariate effects on transition probabilities in multistate models. Our new approach is based on binomial modelling and inverse probability of censoring weighting techniques and is very simple to implement by standard software. We show how to do flexible regression models with possibly time-varying covariate effects.

Key words: binomial modelling, inverse-censoring probability weighting, multistate modelling, regression effects, transition probability

1. Introduction

Multistate models are very useful for describing complicated event history data. The models have been widely used in medical research where they provide a framework for approximating complicated time-dependent outcomes and to describe the time of events in many medical settings. The graphical presentation of the models helps facilitate the interface between the statistician and subject matter researchers. Our key motivation for undertaking this work was to describe covariate effects for bone marrow transplant (BMT) patients with leukaemia.
For these patients the health status can be described by the nine-state model in Fig. 1. All patients start being alive and in complete remission (CR) after BMT (state 0). Patients can die before relapse (in CR) (state 1) or relapse (state 2). For relapsed patients, a salvage treatment of donor lymphocyte infusion (DLI) will be given, although patients may die while waiting for DLI (state 3). DLI is a highly effective treatment in restoring CR. Patients receiving DLI (state 4) may then die with DLI in relapse (state 5) or achieve CR again (state 6). Finally, patients can die in CR after DLI (state 7) or relapse after second remission (state 8).

Fig. 1. A nine-state model that describes the treatment of leukaemia patients receiving bone marrow transplants, see text for details.

The probability of being in state 0 at time t after BMT is the leukaemia-free survival (LFS) probability which is a standard measurement for the efficacy of an allogeneic BMT. LFS can be estimated by a Kaplan–Meier estimator, where the event is death in first CR (state 1) or first relapse (state 2) and patients who survive without disease (state 0) are censored. Existing regression methods for time-to-event data, such as Cox's proportional hazards model or Aalen's additive model, can be applied to model the covariate effects on LFS (Cox, 1972; Aalen, 1980, 1989). For our nine-state model it is of interest, as a way of summarizing the combined treatment effect, to estimate and model the probability of the current LFS (CLFS), that is, the probability of being in state 0 or 6 at time t, denoted as CLFS(t). Note that this probability differs from the prevalence of LFS, which is the probability of being in state 0 or 6 among those alive (Pepe et al., 1991). The standard approach, that we describe in the next section, was applied to the same data and problem in Klein et al. (2000a).
This approach models and estimates all transition intensities of the multistate model and then uses the product-limit estimator to estimate CLFS(t) (see e.g. Andersen et al., 1993). Klein et al. (2000b) also discuss some alternative estimators based on an extension of a result by Pepe (1991).

In biomedical studies, we often need to assess covariate effects on certain events of interest or on the transition probabilities. This can be achieved by fitting regression models for all related transition intensities as described above. Although the covariate effects can then be described by direct computation of the quantities of interest, they are difficult to summarize in the way that a direct regression method does. In this paper, we introduce a new simple approach to model the transition probabilities in a multistate model using an inverse censoring probability weighting technique.

A specific multistate model that has received considerable attention is the simpler competing risks model, where one often wishes to examine the covariate effects on the cumulative incidence probability of dying from one of the causes before time t. For this model, Cheng et al. (1998) studied the estimation of cumulative incidence probabilities based on Cox's regression model for each cause-specific hazard function. Shen & Cheng (1999) considered a special additive risk model and Scheike & Zhang (2003) used a flexible Cox–Aalen model. However, it is difficult to summarize covariate effects on the cumulative incidence probability when fitting and modelling the cause-specific hazard regression models. Fine & Gray (1999) developed a direct regression approach for the subdistribution hazard function based on a Cox model. Sun et al. (2006) considered this approach for a flexible model proposed by Martinussen & Scheike (2002). Unfortunately, it is not possible to generalize the subdistribution hazard approach to complex multistate models, where the aim is to estimate quantities such as CLFS(t).
A recent alternative approach by Scheike et al. (2006) considered a direct binomial regression technique for the competing risks model based on inverse censoring probability weighting, which we here extend to the multistate setting. The extension to multistate models raises many issues, and we here consider direct modelling of entering a transient state or an absorbing state. Recently, a 'pseudo-observation' approach has been proposed and studied by Andersen et al. (2003) and Klein & Andersen (2004) to model the transition probability directly, which assesses the covariate effects on the probability of interest. The proposed approach provides the regression effects on a grid of time points and it needs to be extended to all jump time points.

Let $P_{ij}(s, t; x)$ be the probability that a subject moves from state i to j from time s to t given a set of covariates x. We are interested in modelling the probability
$$S_L(t; x) = \sum_{j \in L} P_{0j}(0, t; x),$$
where L is a collection of states. Note that this implies that all subjects start out in state 0 at time 0. As a special case, $\mathrm{CLFS}(t; x) = S_{0,6}(t; x)$.

In this paper, we introduce a new simple direct modelling approach using binomial regression methods. We specify a semiparametric flexible model and for that purpose we sometimes partition the covariates into x, whose effects are modelled non-parametrically as time varying, and z, whose effects are modelled parametrically. This means that one should condition on both x and z when specifying semiparametric regression models, but we do not always make this precise, as above where we only conditioned on x.
Formally, the regression models we consider are of the form
$$h(S_L(t; x, z)) = g_1\big(x^T \beta(t),\; g(z, \gamma, t)\big), \tag{1}$$
where h is a known link function, x is a p-dimensional covariate vector, $\beta(t)$ is a p-dimensional time-varying regression coefficient function, g is a known function that models the q-dimensional covariate vector z, $\gamma$ is a set of regression coefficients related to z and $g_1$ is a known function that models the relationship between the regression effects of x and z. One example of such a model is
$$S_L(t; x, z) = 1 - \exp\big(-\{x^T \beta(t)\} \exp(z^T \gamma)\big),$$
which in the competing risks setting, with L giving one cause of death, is equivalent to having the subdistribution on Cox–Aalen form. Here some effects are allowed to vary freely with time and some are modelled parametrically by multiplicative effects on the subdistribution hazard.

The new method is very simple to carry out, and aims directly at assessing covariate effects on any transition probability of interest. One advantage of our approach is that the regression effects can be fitted by standard software. Standard software will lead to conservative estimates of the uncertainty, but in our experience, based on extensive simulations, it appears to estimate the uncertainty well. One drawback of our direct approach is that one needs to estimate the censoring distribution for each individual. This is often performed by using the Kaplan–Meier estimator for the censoring distribution. Robins & Rotnitzky (1992), building on the semiparametric efficiency theory developed by Bickel et al. (1993), showed that regression modelling of the censoring distribution improves efficiency of the inverse probability weighting technique, even if the censoring is independent of the covariates. There is information in each censored observation, no matter the relation of covariates and censoring.
We here consider only the simple case where the censoring probabilities are modelled without the use of covariates, but the method may be extended to deal with more flexible models and then the efficiency will be increased.

The paper is structured as follows. In section 2, we outline the standard approach based on modelling all transition intensities and the product-limit estimator. Section 3 presents the direct modelling of the transition probabilities based on binomial regression using inverse probability censoring weighting techniques. Section 4 contains worked examples that illustrate the models for $S_{0,6}(t; x)$ in the complex multistate model from Fig. 1. The appendices contain the derivations and explicit expressions for the variance estimators.

2. Modelling transition intensity approach – product-limit estimation

We consider the motivating example in Fig. 1 to illustrate this approach, and start by ignoring the covariates. This is a multistate model with nine states, and we let the transition intensities from state i to j be denoted as $\alpha_{ij}(t)$, for $i \neq j$ and $i, j = 1, \ldots, 9$, let $\alpha_{ii}(t) = -\sum_{j \neq i} \alpha_{ij}(t)$, and the cumulative intensity is denoted as $A_{ij}(t) = \int_0^t \alpha_{ij}(s)\, ds$. Let $A(t)$ be the 9 × 9 matrix with entries $A_{ij}(t)$. Note that some of these intensities will be 0 because such transitions are not possible. We let $\hat{A}(t)$ denote the estimator of $A(t)$ based on Nelson–Aalen estimates in the case of no covariates (see Nelson, 1969, 1972; Aalen, 1975, 1978).

Let $\{\eta_i(t), t \geq 0\}$ be the health history for the ith individual giving the state for the ith subject at time t, and let $P_{hj}(s, t) = P\{\eta(t) = j \mid \eta(s) = h\}$, for $s \leq t$ and $h, j = 1, \ldots, 9$, be the transition probabilities. It is clear that the CLFS probability is $S_{0,6}(t) = P_{00}(0, t) + P_{06}(0, t)$.
The transition probabilities from time s to t organized in the 9 × 9 matrix $P(s, t)$ can be written as a product integral
$$P(s, t) = \prod_{u \in (s, t]} \{I + dA(u)\} \tag{2}$$
that is estimated by
$$\hat{P}(s, t) = \prod_{u \in (s, t]} \{I + d\hat{A}(u)\}, \tag{3}$$
where I is the 9 × 9 identity matrix. An estimator of the covariance matrix of $\hat{P}(s, t)$ follows from the martingale decomposition as in Andersen et al. (1993) under the Markov model assumption, but is omitted here. Recently, Datta & Satten (2001, 2002) showed that the Aalen & Johansen (1978) estimator (3) of transition probabilities is valid for non-Markov models and under state-dependent censoring.

It has been pointed out that one can estimate $P_{06}(s, t)$ as the difference of two Kaplan–Meier estimators, $\hat{S}_1(t) - \hat{S}_2(t)$, where $S_1(t)$ and $S_2(t)$ are the probabilities of being in states {0, 2, 4, 6} and {0, 2, 4}, respectively (see Klein et al., 2000b for a result by Pepe, 1991). Pepe (1991) suggested variance estimators based on a moment approach. Pepe's model-free approach works for estimating the probability of being in any of the transient states, but it does not work for the absorbing states, and it relies on the particular structure of the model where each state is visited at most once.

When covariates affect the outcome events of interest, one may extend the transition probabilities (2) to a regression setting
$$P(s, t; x) = \prod_{u \in (s, t]} \{I + dA(u; x)\}.$$
Here $A(u; x)$ denotes the cumulative intensity matrix given the covariates x. Aalen et al. (2001) studied the product estimator based on an additive regression model. This approach should be based on well-fitting models for all transition intensities. It is therefore crucial to carefully investigate the goodness-of-fit of all suggested models. Note that multiple sets of covariates and multiple types of regression models could be fitted for different transition intensities.
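The product-limit construction in (2) and (3) can be sketched in a few lines of Python for the no-covariate case. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `aalen_johansen`, the input layout (lists of transition and censoring records) and the convention that censorings tied with transitions are handled after the transitions are all hypothetical choices, and every subject is assumed to start in state 0 at time 0.

```python
import numpy as np

def aalen_johansen(n, transitions, censorings, n_states):
    """Product-limit estimate of P(0, t), all n subjects in state 0 at time 0.

    transitions: list of (time, from_state, to_state) records (hypothetical layout)
    censorings:  list of (time, state_occupied_when_censored) records
    Returns (times, P_list); P_list[k] is the matrix P-hat(0, times[k]).
    """
    at_risk = np.zeros(n_states)
    at_risk[0] = n                      # everybody starts in state 0
    times = sorted({t for t, _, _ in transitions} | {t for t, _ in censorings})
    P = np.eye(n_states)
    P_list = []
    for u in times:
        # Nelson-Aalen increments dA_hj(u) = (# h->j moves at u) / (# at risk in h)
        dA = np.zeros((n_states, n_states))
        moves = [(h, j) for t, h, j in transitions if t == u]
        for h, j in moves:
            dA[h, j] += 1.0 / at_risk[h]
        # diagonal: dA_hh = -sum_{j != h} dA_hj, so rows of I + dA sum to one
        dA[np.diag_indices(n_states)] = -dA.sum(axis=1)
        P = P @ (np.eye(n_states) + dA)  # one factor of the product integral
        # update the risk sets: transitions first, then censorings at u
        for h, j in moves:
            at_risk[h] -= 1
            at_risk[j] += 1
        for t, h in censorings:
            if t == u:
                at_risk[h] -= 1
        P_list.append(P.copy())
    return times, P_list
```

For uncensored data the first row of the returned matrices is simply the empirical state-occupation distribution, which gives a quick sanity check. With state-specific records for the nine-state model, CLFS(t) would be read off as the sum of entries (0, 0) and (0, 6) of the estimated matrix.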
Even though estimates of the transition probabilities can be obtained by simply plugging in an estimator of $A(u; x)$, one problem is that it may be hard to summarize the effect of specific covariates on a specific transition probability.

3. Direct binomial regression for complex multistate models

3.1. Simple direct estimation

We here propose a simple procedure aimed at assessing the effect of covariates on the CLFS in the DLI study (Fig. 1). The proposed method can be applied to model any transition probability directly, $S_L(t; x)$, for complex multistate models, but for simplicity we start by considering $S_{0,6}(t; x)$ and the log-link function as well as the case without a parametric term.

Current LFS is the probability of being in state 0 or 6 at time t, i.e. $S_{0,6}(t; x) = P_{00}(0, t; x) + P_{06}(0, t; x)$. We consider a non-parametric regression model for $S_{0,6}(t; x)$ with log-link function,
$$S_{0,6}(t; x) = \exp\{-x^T \beta(t)\}, \tag{4}$$
where $\beta(t)$ is a p-dimensional vector of regression effects and $x = (1, x_1, \ldots, x_{p-1})$, where the first element corresponds to a baseline. The logit transformation is another commonly used link function that can be applied here as well. Note that both states in $S_{0,6}(t; x)$ are transient.

Recall that $\{\eta_i(t), t \geq 0\}$ is the health history for the ith individual giving the state for the ith subject at time t. Suppose further that $C_i$ are right-censoring times independent of $\{\eta_i(s) : s \geq 0\}$, with survival distribution $G_C(\cdot)$. This assumption may be relaxed to the censoring and state-transition process being independent given covariates, and in that case we need to model the censoring probability given the covariates. For simplicity we assume that the censoring is independent of the covariates. Assume that we have n independent and identically distributed (i.i.d.)
observations over the time period $[0, \tau]$, and that all subjects start in state 0 at time 0. To estimate $\beta(t)$ in the case with a known censoring distribution, we simply consider the score equations
$$\sum_{i=1}^n D_i(t) w_i(t) \left\{ \frac{I(\eta_i(t) \in \{0, 6\})\, I(C_i > t)}{G_C(t)} - S_{0,6}(t; x_i) \right\} = 0,$$
where
$$D_i(t) = \frac{\partial S_{0,6}(t; x_i)}{\partial \beta(t)} = -\exp\{-x_i^T \beta(t)\}\, x_i = -S_{0,6}(t; x_i)\, x_i,$$
and $w_i(t)$ are weights. The estimating equation is motivated by the fact that
$$E\left[ \frac{I(\eta_i(t) \in \{0, 6\})\, I(C_i > t)}{G_C(t)} \right] = E\left[ E\left\{ \frac{I(\eta_i(t) \in \{0, 6\})\, I(C_i > t)}{G_C(t)} \,\Big|\, \eta_i(s) : s \geq 0 \right\} \right] = S_{0,6}(t; x_i).$$
The weights are assumed to be independent of the parameters and deterministic, but one may extend the results to weights that depend on the covariates as well as the parameters just like for standard generalized linear models (GLM). The optimal weights are the inverse variance of the responses that can be expressed as a function of the parameters.

We estimate $\beta(t)$ by solving the score equation $U(t, \beta(t)) = 0$ for all $t \in [a, \tau]$, where
$$U(t, \beta(t)) = \sum_{i=1}^n D_i(t) w_i(t) \left\{ \frac{I(\eta_i(t) \in \{0, 6\})\, I(C_i > t)}{\hat{G}_C(t)} - S_{0,6}(t; x_i) \right\}, \tag{5}$$
where $\hat{G}_C(t)$ is the Kaplan–Meier estimate of the censoring distribution. The lower limit a in the interval where we estimate $\beta(t)$ is needed because the regression coefficients cannot be identified close to 0 because $S_{0,6}(0; x) = 1$. Note that transitions into absorbing states are considered as censored individuals when estimating the censoring distribution. We denote the observed response as $R_i(t) = I(\eta_i(t) \in \{0, 6\})\, I(C_i > t)$. Note that the score equations will lead to a piecewise constant estimator for $\beta(t)$ that only changes its values at censoring times and when the responses $R_i(t)$ change their values.

We outline the main arguments of deriving the variance estimation. We start by noting that the score for known censoring weights is an i.i.d. process
$$\tilde{U}(t, \beta(t)) = \sum_{i=1}^n D_i(t) w_i(t) \left\{ \frac{R_i(t)}{G_C(t)} - S_{0,6}(t; x_i) \right\} = \sum_{i=1}^n \epsilon_i(t).$$
Following a well-known martingale integral representation (Gill, 1980 or Andersen et al., 1993, formula 4.3.4), the difference between $U$ and $\tilde{U}$ is asymptotically equivalent to
$$\sum_{i=1}^n \left\{ \sum_{j=1}^n D_j(t) w_j(t) \frac{R_j(t)}{G_C(t)} \right\} \int_0^t \frac{dM_i^C(u)}{Y_\bullet(u)} = \sum_{i=1}^n \psi_i(t),$$
where $Y_i(u) = I(X_i \geq u)$, $Y_\bullet(u) = \sum_i Y_i(u)$ and $X_i$ is the observed time for the ith individual (the minimum of the censoring time and the time to entering an absorbing state),
$$M_i^C(u) = I(u \geq X_i, \delta_i^C = 1) - \int_0^u Y_i(s) \lambda_C(s)\, ds$$
are the martingales for the censoring distribution, $\delta_i^C = I(X_i = C_i)$ indicate the censored individuals and $\lambda_C(\cdot)$ is the hazard function for the censoring time. Here $\approx_P$ means equal up to $o_p(n^{-1/2})$ and uniformly in $t \in [a, \tau]$. Define
$$I(t, \beta(t)) = \sum_{i=1}^n w_i(t) \{D_i(t)\}^{\otimes 2},$$
where for a p × 1 vector a, $a^{\otimes 2} = a a^T$. By a Taylor series approximation
$$\hat{\beta}(t) - \beta(t) \approx_P \{I(t, \beta(t))\}^{-1} U(t, \beta(t)) \approx_P \sum_{i=1}^n \{I(t, \beta(t))\}^{-1} \{\epsilon_i(t) + \psi_i(t)\} = \sum_{i=1}^n W_i(t),$$
where $W_i(t)$ can be estimated by plug-in estimators, denoted as $\hat{W}_i(t)$. Under regularity conditions as in Scheike et al. (2006) and by similar arguments, it follows that the asymptotic variance of $\sqrt{n}\{\hat{\beta}(t) - \beta(t)\}$ can be estimated by
$$\hat{\Sigma}(t) = n \sum_{i=1}^n \{\hat{W}_i(t)\}^{\otimes 2}.$$
It further follows that $\sqrt{n}\{\hat{\beta}(t) - \beta(t)\}$ for $t \in [a, \tau]$ converges to a Gaussian process and that the resampling version of the residuals
$$\sqrt{n} \sum_{i=1}^n V_i \hat{W}_i(t)$$
has the same limiting distribution conditional on the data for $V_1, \ldots, V_n$ independent standard normals. Resampling techniques can thus be applied to get an approximation of the entire process as in Lin et al. (1994).

For a given x, the predicted CLFS probability can be estimated by $\hat{S}_{0,6}(t; x) = \exp\{-x^T \hat{\beta}(t)\}$. Then, it follows that
$$\hat{S}_{0,6}(t; x) - S_{0,6}(t; x) = -S_{0,6}(t; x)\, x^T \{\hat{\beta}(t) - \beta(t)\} + o_p(n^{-1/2}) = -S_{0,6}(t; x)\, x^T \sum_{i=1}^n W_i(t) + o_p(n^{-1/2}).$$
The asymptotic variance of $\sqrt{n}\{\hat{S}_{0,6}(t; x) - S_{0,6}(t; x)\}$ can be estimated by
$$\hat{\Sigma}_{S_{0,6}}(t; x) = n\, \hat{S}_{0,6}^2(t; x)\, x^T \left[ \sum_{i=1}^n \{\hat{W}_i(t)\}^{\otimes 2} \right] x.$$

Remark 1. An absorbing state is dealt with similarly. Let $T_{i,k}^*$ be the first time for entering state k for the ith subject (with the convention that this is ∞ if this never happens, and in the situation without censoring). Let state k be an absorbing state. We are interested in estimating and modelling the transition probability $P_{0k}(0, t)$. Note that for censored data, $I\{\eta_i(t) \in \{k\}\}\, I\{C_i > (T_{ik}^* \wedge t)\}$ is always computable for all time t. As
$$E\left[ \frac{I\{\eta_i(t) \in \{k\}\}\, I\{C_i > (T_{ik}^* \wedge t)\}}{G_C(T_{ik}^* \wedge t)} \right] = E\left[ E\left\{ \frac{I\{\eta_i(t) \in \{k\}\}\, I\{C_i > (T_{ik}^* \wedge t)\}}{G_C(T_{ik}^* \wedge t)} \,\Big|\, \eta_i(s) : s \geq 0 \right\} \right] = P_{0k}(0, t; x_i),$$
we consider the score equation
$$U(t, \beta(t)) = \sum_{i=1}^n D_i(t) w_i(t) \left\{ \frac{I\{\eta_i(t) \in \{k\}\}\, I\{C_i > (T_{ik}^* \wedge t)\}}{\hat{G}_C(T_{ik}^* \wedge t)} - P_{0k}(0, t; x_i) \right\} = 0, \tag{6}$$
where $D_i(t) = \partial P_{0k}(0, t; x_i) / \partial \beta(t)$. In the DLI data example without covariates, $P_{01}(0, t)$ is the cumulative incidence probability of dying in CR (without relapse), and can be estimated by solving an unweighted version of (6), thus giving
$$\hat{P}_{01}(0, t) = \frac{1}{n} \sum_{i=1}^n R_i^1(t)$$
with $R_i^k(t) = I\{\eta_i(t) \in \{k\}\}\, I\{C_i > (T_{ik}^* \wedge t)\} / \hat{G}_C(T_{ik}^* \wedge t)$. Let $\hat{G}_C$ be the Kaplan–Meier estimator where individuals who stay in state 0 at the end of the study are considered as censored individuals. Then, following Efron's (1967) redistribution to the right approach, $\hat{P}_{01}(0, t)$ is identical to the standard Aalen–Johansen estimator
$$\hat{P}_{01}^{AJ}(0, t) = \int_0^t \hat{P}_{00}(0, u-)\, d\hat{A}_{01}(u). \tag{7}$$

Remark 2. To combine transient and absorbing states we simply add up the contributions for different terms. Consider a transient state j and an absorbing state k; then we can use the modified responses $R_i^k(t)$ defined above and $R_i^j(t) = I\{\eta_i(t) \in \{j\}\}\, I\{C_i > t\} / \hat{G}_C(t)$.
A score equation for the parameters of $S_{j,k}(t; x) = P_{0j}(0, t; x) + P_{0k}(0, t; x) = \exp\{-x^T \beta(t)\}$ now becomes
$$\sum_{i=1}^n D_i(t) w_i(t) \left\{ R_i^j(t) + R_i^k(t) - S_{j,k}(t; x_i) \right\} = 0,$$
where $D_i(t) = \partial S_{j,k}(t; x_i) / \partial \beta(t)$.

3.2. Semiparametric modelling

We now consider the semiparametric model (1). For simplicity, we consider the motivating DLI study (Fig. 1) with the aim of making a semiparametric model for the CLFS, $S_{0,6}(t; x, z)$. Let (x, z) be the covariates, where $x = (1, x_1, \ldots, x_{p-1})$ is a p × 1 vector and z is a q × 1 vector. We consider two general semiparametric regression models
$$h\{S_{0,6}(t; x, z)\} = -\{x^T \beta(t)\} \exp(z^T \gamma), \tag{8}$$
$$h\{S_{0,6}(t; x, z)\} = -\{x^T \beta(t) + (z^T \gamma)\, t\}, \tag{9}$$
where h is a known link function. Both semiparametric models allow x to have time-varying effects and force z to have constant effects. The estimation procedures for fitting the two semiparametric models are similar. In this paper, we give the estimating procedure for model (8) with log-link function. Any other standard link function can be used.

The partial derivatives of $S_{0,6}(t; x_i, z_i)$ with respect to the parameters $\beta(t)$ and $\gamma$ are
$$D_{\beta,i}(t, \beta(t), \gamma) = \frac{\partial S_{0,6}(t; x_i, z_i)}{\partial \beta(t)} = -\{S_{0,6}(t; x_i, z_i)\}\, x_i \exp(z_i^T \gamma),$$
$$D_{\gamma,i}(t, \beta(t), \gamma) = \frac{\partial S_{0,6}(t; x_i, z_i)}{\partial \gamma} = -\{S_{0,6}(t; x_i, z_i)\}\, \{x_i^T \beta(t)\} \exp(z_i^T \gamma)\, z_i.$$
We organize the derivatives into an n × q matrix $D_\gamma(t, \beta(t), \gamma)$ and an n × p matrix $D_\beta(t, \beta(t), \gamma)$. The score equations for $\beta(t)$ at time t and $\gamma$ are
$$U_\beta(t, \beta(t), \gamma) = D_\beta^T(t, \beta(t), \gamma)\, W(t) \left\{ R(t) - S_{0,6}^{(n)}(t, \beta(t), \gamma) \right\} = 0, \tag{10}$$
$$U_\gamma(\tau, \beta(t), \gamma) = \int_0^\tau D_\gamma^T(t, \beta(t), \gamma)\, W(t) \left\{ R(t) - S_{0,6}^{(n)}(t, \beta(t), \gamma) \right\} dt = 0, \tag{11}$$
where $S_{0,6}^{(n)}(t, \beta(t), \gamma)$ is the n × 1 vector of $S_{0,6}(t; x_i, z_i)$, $R(t)$ is the n × 1 vector of adjusted responses $R_i(t) / \hat{G}_C(t)$ (with $R_i(t) = I(\eta_i(t) \in \{0, 6\})\, I(C_i > t)$), and $W(t)$ is an n × n diagonal matrix with elements $w_i(t)$.
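The adjusted responses $R_i(t)/\hat{G}_C(t)$ and the solution of a score equation of the form (5) at one fixed time point can be sketched as follows. This is a minimal illustration under several assumptions that are not in the text: the collection of states is simplified to L = {0}, so that the response is just I(X_i > t) for an observed time X_i; the function names `km_censoring` and `fit_log_link` are hypothetical; and the iteration is a plain Fisher-scoring (Gauss–Newton) update using $I(t, \beta) = \sum_i w_i D_i D_i^T$ started at zero, which is one natural way to solve the equation rather than the authors' stated algorithm.

```python
import numpy as np

def km_censoring(X, event, t):
    """Kaplan-Meier estimate of G_C(t) = P(C > t).

    X: observed times; event: 1 if X is an event time, 0 if censored.
    Censorings play the role of 'events' for the censoring distribution.
    """
    G = 1.0
    for u in np.unique(X[(event == 0) & (X <= t)]):
        at_risk = np.sum(X >= u)                       # Y(u)
        n_cens = np.sum((X == u) & (event == 0))       # dN^C(u)
        G *= 1.0 - n_cens / at_risk
    return G

def fit_log_link(x, R, G, w=None, n_iter=100, tol=1e-10):
    """Solve sum_i D_i w_i (R_i / G - S_i) = 0 with S_i = exp(-x_i' beta)."""
    n, p = x.shape
    w = np.ones(n) if w is None else w
    y = R / G                                          # IPCW-adjusted responses
    beta = np.zeros(p)
    for _ in range(n_iter):
        S = np.exp(-x @ beta)
        D = -S[:, None] * x                            # D_i = -S_i x_i
        U = D.T @ (w * (y - S))                        # score
        info = D.T @ (w[:, None] * D)                  # sum_i w_i D_i D_i'
        step = np.linalg.solve(info, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data (hypothetical): four subjects, the second one is censored at time 2.
X = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 0, 1, 1])
t = 2.5
G_hat = km_censoring(X, event, t)
R = (X > t).astype(float)            # still in state 0 and uncensored at t
beta_hat = fit_log_link(np.ones((4, 1)), R, G_hat)
S_hat = np.exp(-beta_hat[0])         # intercept-only estimate of S_0(t)
```

With an intercept only and unit weights the score equation reduces to the average of the adjusted responses, so in this toy data `S_hat` agrees with the Kaplan–Meier survival estimate at t = 2.5. Repeating the fit over the times at which the responses or the censoring estimate change would give the piecewise constant estimator described in section 3.1.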
We define $\hat{\beta}(t)$ and $\hat{\gamma}$ as the solutions to $U_\beta(t, \hat{\beta}(t), \hat{\gamma}) = 0$ and $U_\gamma(t, \hat{\beta}(t), \hat{\gamma}) = 0$ for all $t \in [a, \tau]$. These equations may be solved iteratively, and motivated as in Martinussen & Scheike (2006, p. 218). The (v + 1)th iteration step for $\gamma$ is
$$\hat{\gamma}^{v+1} = \hat{\gamma}^{v} + C_\gamma^{-1} B_\gamma, \tag{12}$$
where
$$C_\gamma = \int_a^\tau D_\gamma^T(t) W(t) H(t) D_\gamma(t)\, dt, \qquad B_\gamma = \int_a^\tau D_\gamma^T(t) W(t) H(t) \left\{ R(t) - S_{0,6}^{(n)}(t) \right\} dt,$$
$$H(t) = I - D_\beta(t) I_\beta^{-1}(t) D_\beta^T(t) W(t), \qquad I_\beta(t) = D_\beta^T(t) W(t) D_\beta(t).$$
The (v + 1)th iteration step for $\beta(t)$ is
$$\hat{\beta}^{v+1}(t) = \hat{\beta}^{v}(t) + I_\beta^{-1}(t) D_\beta^T(t) W(t) \left\{ R(t) - S_{0,6}^{(n)}(t) - D_\gamma(t) \left( \hat{\gamma}^{v+1} - \hat{\gamma}^{v} \right) \right\}$$
$$= \hat{\beta}^{v}(t) + I_\beta^{-1}(t) D_\beta^T(t) W(t) \left\{ R(t) - S_{0,6}^{(n)}(t) - D_\gamma(t) C_\gamma^{-1} B_\gamma \right\}. \tag{13}$$
All quantities are computed with respect to $(\hat{\gamma}^{v}, \hat{\beta}^{v}(t))$. This iterative procedure is equivalent to solving a standard GLM type problem.

Under regularity conditions as in Scheike et al. (2006) it follows that $\sqrt{n}\{\hat{\beta}(t) - \beta(t)\}$ and $\sqrt{n}(\hat{\gamma} - \gamma)$ are asymptotically normal and their asymptotic variances can be estimated by
$$\hat{\Sigma}_{\beta_2}(t) = n \sum_{i=1}^n \left\{ \hat{W}_i^{\beta_2}(t) \right\}^{\otimes 2}, \qquad \hat{\Sigma}_{\gamma} = n \sum_{i=1}^n \left\{ \hat{W}_i^{\gamma} \right\}^{\otimes 2},$$
respectively, where explicit expressions for $\hat{W}_i^{\beta_2}(t)$ and $\hat{W}_i^{\gamma}$ can be found in appendix A.1. Again, resampling techniques can be employed to obtain an approximation of the Gaussian process that $\sqrt{n}\{\hat{\beta}(t) - \beta(t)\}$ converges to, and it can be applied to construct confidence bands.

For a given set of covariate values (x, z), we estimate the predicted CLFS function for model (8) as
$$\hat{S}_{0,6}(t; x, z) = \exp\left[ -\{x^T \hat{\beta}(t)\} \exp(z^T \hat{\gamma}) \right].$$
Let
$$\hat{W}_i^{S_{0,6}}(t; x, z) = \exp(z^T \hat{\gamma}) \left[ x^T \hat{I}_\beta^{-1}(t) \hat{W}_i^{\beta_2}(t) + \{x^T \hat{\beta}(t)\}\, z^T \hat{C}_\gamma^{-1} \hat{W}_i^{\gamma} \right].$$
Then the asymptotic variance of $\sqrt{n}\{\hat{S}_{0,6}(t; x, z) - S_{0,6}(t; x, z)\}$ can be estimated by
$$\hat{\Sigma}_{S_{0,6}}(t; x, z) = n\, \hat{S}_{0,6}^2(t; x, z) \sum_{i=1}^n \left\{ \hat{W}_i^{S_{0,6}}(t; x, z) \right\}^2.$$

Remark 3. The semiparametric models for the transition probability of an absorbing state are dealt with similarly (see the remark in section 3.1).

3.3. Estimating the censoring distribution

To do the inverse probability censoring weighting, one needs to estimate the unknown survival distribution of the censoring times. In the previous sections, we have considered a simple Kaplan–Meier estimator, denoted as $\hat{G}_C$, where individuals in any one of the transient states at the end of the study were considered as censored individuals. Using $\hat{G}_C$, the following property holds in the no-covariate setting,
$$\sum_{k=0}^{8} \hat{P}_{0k}(0, t) = 1, \quad \text{for } t \in [0, \tau]. \tag{14}$$
However, in the DLI data example, $\hat{P}_{00}(0, t)$ and $\hat{P}_{01}(0, t)$ are not identical to the standard Kaplan–Meier estimator for $P_{00}(0, t)$ and the standard Aalen–Johansen estimator for $P_{01}(0, t)$ given in (7), where only individuals in state 0 at the end of the study were considered as censored individuals.

Now, we present an alternative estimating procedure for the censoring distribution so that the above property (14) holds and all transition probabilities equal the 'standard' estimators from the product-limit estimator. This is based on estimating the censoring distribution in a different fashion based on an extension of the redistribution to the right principle by Efron (1967). The proposed alternative censoring distribution estimation depends on the state we are interested in, and on the particular structure of the considered multistate model where all states are visited at most once.

We now present an estimation procedure for $S_{0,6}(t)$ as an example. As $I(\eta_i(t) \in \{0, 6\}) = I(\eta_i(t) \in \{0\}) + I(\eta_i(t) \in \{6\})$, we estimate the censoring distribution based on the states 0 and 6 as described below.

For $I(\eta_i(t) \in \{0\})$, one observes only $T_{i,0} = (T_{i,1}^* \wedge T_{i,2}^*) \wedge C_i$ and $\delta_i = I\{(T_{i,1}^* \wedge T_{i,2}^*) \leq C_i\}$. For censored individuals, $I(\eta_i(t) \in \{0\})$ is unknown for those $t > C_i$. Let $Y_{i,00}(t) = I(T_{i,0} \geq t)$ be the at-risk indicator of the ith patient being in state 0 at time t, then $R_i^0(t) = I(\eta_i(t) \in \{0\})\, Y_{i,00}(t)$ is always computable.
Note that, for observing whether individuals stay in state 0, $C_i$ is censored by $(T_{i,1}^* \wedge T_{i,2}^*)$; the corresponding censoring distribution can be estimated by a Kaplan–Meier estimator, denoted as $\hat{G}_{C00}(t)$.

For the binomial outcome $I(\eta_i(t) \in \{6\}) = I(T_{i,6}^* \leq t < T_{i,E}^*, \kappa_i \geq 6)$, where $\kappa_i$ indicates the final state where the ith individual stayed at the end of the study, $T_{i,E}^* = (T_{i,7}^* \wedge T_{i,8}^*)$. Let $T_{i,E} = (T_{i,E}^* \wedge C_i)$, which is of interest only if $(C_i > T_{i,6}^*, \kappa_i \geq 6)$. The corresponding risk set for observing $I(\eta_i(t) \in \{6\})$ is defined as $Y_{i,6}(t) = I\{T_{i,6}^* \leq t \leq T_{i,E}, \kappa_i \geq 6\}$. Then $R_i^6(t) = I(\eta_i(t) \in \{6\})\, Y_{i,6}(t)$ is computable for all time t.

First, we re-configure the nine-state model (Fig. 1) to a five-state model where we combine states 0, 2 and 4 to a new initial state $\tilde{0}$ and combine states 1, 3 and 5 to a new absorbing state $\tilde{1}$ with underlying time $\tilde{T}_{i,1}^*$. We estimate the censoring distribution $P(C_i > t)$ for $R_i^6(t)$ as $G_{C60}(T_{i,6}^*)\, G_{C66}(t, t)$, where $G_{C60}(t) = P(C_i > t, \kappa_i \geq 6)$, here the censoring process is censored by $(\tilde{T}_{i,1}^* \wedge T_{i,6}^*)$, and $G_{C66}(t, t) = P(C_i > t \mid T_{i,6}^* \leq t, \kappa_i \geq 6)$. $G_{C60}(t)$ can be simply estimated by a Kaplan–Meier estimator, denoted as $\hat{G}_{C60}(t)$. To estimate $G_{C66}(t, t)$, individuals with $I\{T_{i,6}^* \leq t, \kappa_i \geq 6\} = 1$ carry an initial weight of $q_i = \hat{G}_{C60}(T_{i,6}^*)$. These weights need to be redistributed to the right only to those individuals who entered state 6 by time t. We thus suggest using a weighted Kaplan–Meier estimator to estimate $G_{C66}(t^*, t)$,
$$\hat{G}_{C66}(t^*, t) = \prod_{u \leq t^*} \left\{ 1 - \frac{dN_\bullet^{C66}(u, t)}{Y_\bullet^{C66}(u, t)} \right\},$$
where for $u \leq t$,
$$N_\bullet^{C66}(u, t) = \sum_{i=1}^n N_i^{C66}(u, t) = \sum_{i=1}^n I(T_{i,E}^* \leq u, \kappa_i = 6)\, q_i,$$
$$Y_\bullet^{C66}(u, t) = \sum_{i=1}^n Y_i^{C66}(u, t) = \sum_{i=1}^n I(T_{i,E}^* \geq u, T_{i,6}^* \leq t, \kappa_i \geq 6)\, q_i.$$
Now, we have estimated scaled binomial outcomes
$$\frac{R_i^0(t)}{\hat{G}_{C00}(t)} + \frac{R_i^6(t)}{\hat{G}_{C60}(T_{i,6}^*)\, \hat{G}_{C66}(t, t)},$$
which can be plugged into score equation (5) for the non-parametric regression model, and (10, 11) for the semiparametric model. Similar asymptotic results can be derived (see details in Appendix A.2).

3.4. Testing for non-parametric regression effects

For simplicity, we consider model (4) and a simple hypothesis about significance. We consider the first l coefficients of the p non-parametric regression effects. To examine whether the risk factor, $\{x_1, \ldots, x_l\}$, had a significant effect on CLFS, we are interested in testing the null hypothesis
$$H_0 : \beta_1(t) = \cdots = \beta_l(t) = 0, \quad \text{for all } t \in [a, \tau].$$
Let $C = (c_1, \ldots, c_l)^T$ be an (l × p) contrast matrix where $c_l$ is a p-dimensional vector of zeros except 1 for the lth element. Consider the test statistic
$$T = \int_a^\tau C\, Q(t)\, \hat{\beta}(t)\, dt, \tag{15}$$
where $Q(t)$ is a weight function. Under $H_0$, $C\beta(t) = 0$, then
$$T \approx_P C \int_a^\tau Q(t) \{I(t, \hat{\beta}(t))\}^{-1} \sum_i \{\hat{\epsilon}_i(t) + \hat{\psi}_i(t)\}\, dt = C \sum_i \hat{W}_i^Q(L),$$
which has asymptotic zero mean and the variance can be estimated by
$$\hat{\Sigma}_T = C \left[ \sum_i \left\{ \hat{W}_i^Q(L) \right\}^{\otimes 2} \right] C^T.$$
Then, under $H_0$,
$$T^T \hat{\Sigma}_T^{-1} T \approx \chi_q^2.$$
Alternatively, one could consider a test based on the maximal deviation of the contrast process from the null value, and resampling techniques can be applied to simulate the limiting distribution (Lin et al., 1994).

Scheike & Zhang (2003) proposed two tests to examine whether a covariate had a time-varying effect. These tests can also be implemented here. Although these tests are useful, in practice, it will often suffice to simply visually inspect the estimated time-varying effects.

4. Worked example: DLI data

We now return to the BMT data where we followed 614 patients who received allogeneic stem cell transplantation (SCT) for chronic myeloid leukaemia (CML) at the Hammersmith Hospital in London between February 1981 and February 2002. All patients achieved CR. A total of 166 patients were alive in remission (state 0) at the end of the study period; 202 patients died in remission (state 1); 246 patients relapsed (entered state 2) and 30 patients were alive in relapse (state 2) by the end of the study. Ninety-two patients died in relapse (state 3); 124 patients received DLI from the original stem cell donor as a salvage treatment (state 4); 26 patients subsequently died without achieving remission (state 5) and 77 patients achieved remission (state 6). Finally, five patients died in remission (state 7) and eight patients relapsed (state 8). There were 64 patients in remission (state 6) by the end of the study.

There were 371 (60%) patients who received human leucocyte antigen (HLA)-identical sibling transplant and 243 (40%) patients were treated with HLA-matched unrelated donor SCT. A total of 453, 138 and 23 patients were transplanted with early (first chronic phase), intermediate (accelerated phase or greater than or equal to second chronic phase) and advanced (blast phase) disease stages, respectively. The mean (range) of patient age was 34 (4–60) years. A number of 254, 68, 208 and 86 patients received CsA + MTX + Other, CsA + Other, Campth/ATG + CsA + Other, and T-cell depletion for graft-versus-host disease (GVHD) prophylaxis. The mean duration of disease (time from diagnosis to transplant) was 2.0 (0.14–14.51) years. Three hundred and fifty-one patients were male.

To assess the possible covariate effect on CLFS, we first fit the non-parametric additive model (4) with time-varying effects, and performed tests (15) to examine which risk factors had a significant effect on CLFS.
We found that disease stage (intermediate/advanced versus early disease; $p < 0.0001$) and GVHD prophylaxis (three indicator variables, 3 d.f. test, $p < 0.0001$) had an effect on CLFS. The time-varying regression functions $\hat\eta_k(t)$ with 95% confidence intervals are shown in Fig. 2. As expected, we found that patients with intermediate/advanced disease did worse than patients transplanted in the early disease stage. For GVHD prophylaxis, patients receiving Campth had worse CLFS than those receiving CsA + ATG, with an effect that might vary slightly with time. Patients receiving CsA alone or T-cell depleted marrow had significantly worse CLFS during the first 5–6 yr after transplant, but improved long-term disease-free survival probabilities.

In the fully non-parametric model, age was found to be non-significant ($p = 0.13$). However, it turns out that the age effect is well described by a multiplicative effect as in the semiparametric model (8). Basically, the effect of age has similarities with the shape of the baseline effect and can thus be approximated well by a constant. We therefore fitted a semiparametric model (8) with age in the multiplicative part of the model, and disease stage and GVHD prophylaxis in the additive part of the model. This leads to age being significant, with $\hat\gamma_{\mathrm{age}} = 0.0102$, estimated standard error $\hat\sigma_{\mathrm{age}} = 0.0024$ and $p < 0.0001$, which indicated that older patients have lower CLFS probability. This is also consistent with Fig. 2. The semiparametric model gave similar regression functions for the other effects.

[Figure 2: six panels of estimated regression coefficients plotted against years since transplantation (0–12): Baseline; Age (Per Year); Disease Stage (Med/Adv vs Early); GVHD Prophylaxis (CsA vs CsA+ATG); GVHD Prophylaxis (Campth vs CsA+ATG); GVHD Prophylaxis (T-Dept vs CsA+ATG).]
Fig. 2. Regression coefficients of $S_{0,6}(t)$ with 95% pointwise confidence intervals for the donor lymphocyte infusion data.

Even though the effect of disease stage shows a time-varying behaviour, it may be relatively well approximated by a constant multiplicative effect. We therefore also fitted a model with age and disease stage in the multiplicative part and GVHD prophylaxis in the additive part. This leads to essentially unaltered effects for GVHD and the following multiplicative effects: $\hat\gamma_{\mathrm{age}} = 0.0138$ with $\hat\sigma_{\mathrm{age}} = 0.0020$, and $\hat\gamma_{\mathrm{stage}} = 0.6303$ with $\hat\sigma_{\mathrm{stage}} = 0.0427$. Both effects are clearly significant. To address formally whether or not model (9) provides a good approximation, one may consider extensions such as $x^T \eta(t) \exp(z^T \gamma(t))$ and make a formal test for constant effects of $\gamma(t)$.

Based on the semiparametric regression model we can estimate the predicted CLFS for given patient characteristics and GVHD prophylaxis, together with its confidence interval or band, which is extremely useful for physicians and patients.

5. Discussion

We have reviewed regression techniques for estimating probabilities in a multistate setting, and have suggested a new simple approach based on inverse-probability censoring weighting. The standard product-limit estimator that is based on modelling of the cause-specific hazards is very useful but does not give direct estimates of the regression effects on the transition probabilities. Modelling of all intensities will lead to efficient estimates of the transition probabilities, and then modelling of the censoring distribution is not needed. When basing estimation on the underlying intensities we also know well how to deal with left-truncation issues.
To get regression estimates that relate directly to the scale of interest, such as transition probabilities, we suggested a new simple direct approach. One advantage of the approach is that it is very easy to implement and makes it simple to estimate flexible regression effects. One drawback of the method is that the censoring distribution needs to be modelled, but in contrast to the intensity-based procedure one does not need to model other intensities. The use of this was illustrated with the BMT example, where several covariates showed important time-varying behaviour.

Even though the method is simple and has appeal, there are also various unsettled and unsatisfactory properties. The simple binomial regression technique is not efficient and will lead to estimates that need not satisfy natural constraints (such as $\sum_k \hat P_{0k}(0, t; x) = 1$ in the regression setting). The suggested modelling approach can in principle use time-dependent covariates, but one should be careful in doing so because of the difficulties in interpreting such regression models. Another issue that needs further study is how to include left-truncated survival data or, more generally, data where all subjects need not start in the initial state. A related issue is how to estimate general transition probabilities such as $P_{jk}(s, t; x)$.

Another issue that also needs further study is how to improve efficiency by modelling of the censoring distribution as well as by the inclusion of weights in the estimating equations. Even though theoretically there may be much to gain here, it is our impression from extensive simulations that the simple estimator based on the Kaplan–Meier estimator for the censoring distribution and without weights does quite well, even when compared with the fully efficient product-limit estimator.
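The role of the Kaplan–Meier estimator of the censoring distribution can be sketched in the simplest single-endpoint case. The code below is a minimal illustration under stated assumptions, not the weighting used for CLFS in this paper (which combines several censoring distributions): it assumes one event time per subject, independent right censoring, and that events precede censorings at tied times. The names `censoring_km` and `ipcw_responses` are hypothetical.

```python
def censoring_km(times, events):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t).

    times:  observed follow-up times, min(event time, censoring time)
    events: 1 if the event was observed, 0 if the subject was censored
    Censorings play the role of 'events' for the censoring distribution.
    Returns a step function g(t).
    """
    steps = []
    surv = 1.0
    censor_times = sorted({t for t, e in zip(times, events) if e == 0})
    for u in censor_times:
        # Assumed tie convention: a subject with an event at u is no
        # longer at risk of being censored at u.
        at_risk = sum(
            1 for t, e in zip(times, events) if t > u or (t == u and e == 0)
        )
        n_cens = sum(1 for t, e in zip(times, events) if t == u and e == 0)
        surv *= 1.0 - n_cens / at_risk
        steps.append((u, surv))

    def g(t):
        value = 1.0
        for u, s in steps:
            if u > t:
                break
            value = s
        return value

    return g


def ipcw_responses(times, events, t):
    """Scaled binary outcomes I(X_i > t) / G(t), with X_i the observed time."""
    g_t = censoring_km(times, events)(t)
    if g_t <= 0.0:
        raise ValueError("estimated censoring survival is zero at t")
    return [(1.0 if x > t else 0.0) / g_t for x in times]
```

Averaging the scaled outcomes gives an estimate of the probability of being event-free at time t; in a regression setting the same scaled outcomes would instead enter the binomial estimating equations as responses.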
Given the flexibility in the choice of link functions and alternative parameterizations, there should be room to approximate many transition probabilities well, but there is also a need to further develop goodness-of-fit methods that can borrow quite a bit from binomial regression techniques.

Acknowledgements

The research was supported by a National Cancer Institute grant. We thank Dr Richard Szydlo, Imperial College, for providing us with the DLI data. The first author did part of the work while visiting the Center for Advanced Study in Oslo. We would like to thank two referees and the editor for their important comments that improved this paper.

References

Aalen, O. O. (1975). Statistical inference for a family of counting processes. PhD Thesis, University of California, Berkeley, CA.
Aalen, O. O. (1978). Nonparametric inference for a family of counting processes. Ann. Statist. 6, 701–726.
Aalen, O. O. (1980). A model for non-parametric regression analysis of counting processes. In Mathematical statistics and probability theory (eds W. Klonecki, A. Kozek & J. Rosinski). Lecture Notes in Statistics, Vol. 2, 1–25. Springer-Verlag, New York.
Aalen, O. O. (1989). A linear regression model for the analysis of life times. Statist. Med. 8, 907–925.
Aalen, O. O. & Johansen, S. (1978). An empirical transition matrix for nonhomogeneous Markov chains based on censored observations. Scand. J. Statist. 5, 141–150.
Aalen, O. O., Borgan, Ø. & Fekjær, H. (2001). Covariate adjustment of event histories estimated from Markov chain: the additive approach. Biometrics 57, 993–1001.
Andersen, P. K., Borgan, Ø., Gill, R. D. & Keiding, N. (1993). Statistical models based on counting processes. Springer-Verlag, New York, NY.
Andersen, P. K., Klein, J. P. & Rosthøj, S. (2003). Generalized linear models for correlated pseudo-observations with applications to multi-state models. Biometrika 90, 15–27.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y. & Wellner, J. A. (1993). Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press, Baltimore.
Cheng, S. C., Fine, J. P. & Wei, L. J. (1998). Prediction of cumulative incidence function under the proportional hazards model. Biometrics 54, 219–228.
Cox, D. R. (1972). Regression models and life-tables (with discussion). J. Roy. Statist. Soc. Ser. B 34, 187–220.
Datta, S. & Satten, G. A. (2001). Validity of the Aalen–Johansen estimators of stage occupation probabilities and integrated transition hazards for non-Markov models. Statist. Probab. Lett. 55, 403–411.
Datta, S. & Satten, G. A. (2002). Estimation of integrated transition hazards and stage occupation probabilities for non-Markov system under stage dependent censoring. Statist. Probab. Lett. 58, 792–802.
Efron, B. (1967). The two sample problem with censored data. In Proceedings of the Fifth Berkeley symposium on mathematical statistics and probability (eds L. M. LeCam & J. Neyman), Vol. 4, 831–853. Prentice-Hall, New York.
Fine, J. P. & Gray, R. J. (1999). A proportional hazards model for the subdistribution of a competing risk. J. Am. Statist. Assoc. 94, 496–509.
Fine, J. P., Yan, J. & Kosorok, M. R. (2004). Temporal process regression. Biometrika 91, 683–703.
Gill, R. D. (1980). Censoring and stochastic integrals, Mathematical Centre Tracts 124. Mathematisch Centrum, Amsterdam.
Klein, J. P. & Andersen, P. K. (2004). Regression modeling of competing risk data based on pseudo-values of the cumulative incidence function. Biometrics 61, 223–229.
Klein, J. P., Szydlo, R. M., Craddock, C. & Goldman, J. M. (2000a). Estimation of current leukemia-free survival following donor lymphocyte infusion therapy for patients with leukemia who relapse after allografting: application of a multistate model. Statist. Med. 19, 3005–3016.
Klein, J. P., Keiding, N., Shu, Y., Szydlo, R. & Goldman, J. M. (2000b). Summary curves for patients transplanted for chronic myeloid leukemia salvage by a donor lymphocyte infusion: the current leukemia-free survival curve. Br. J. Haematol. 109, 148–152.
Lin, D. Y., Fleming, T. R. & Wei, L. J. (1994). Confidence bands for survival curves under the proportional hazards model. Biometrika 81, 73–81.
Martinussen, T. & Scheike, T. H. (2002). A flexible additive multiplicative hazard model. Biometrika 89, 283–298.
Martinussen, T. & Scheike, T. H. (2006). Dynamic regression models for survival data. Springer-Verlag, New York.
Nelson, W. (1969). Hazard plotting for incomplete failure data. J. Qual. Technol. 1, 25–27.
Nelson, W. (1972). Theory and applications of hazard plotting for censored failure data. Technometrics 14, 945–965.
Pepe, M. S. (1991). Inference for events with dependent risks in multiple endpoint studies. J. Amer. Statist. Assoc. 86, 770–778.
Pepe, M. S., Longton, G. & Thornquist, M. (1991). A qualifier Q for the survival function to describe the prevalence of a transient condition. Statist. Med. 10, 413–421.
Robins, J. M. & Rotnitzky, A. (1992). Recovery of information and adjustment of dependent censoring using surrogate markers. In AIDS epidemiology-methodological issues (eds N. Jewell, K. Dietz & V. Farewell), 24–33. Birkhäuser, Boston, MA.
Scheike, T. H. & Zhang, M. J. (2003). Extensions and applications of the Cox–Aalen survival model. Biometrics 59, 1036–1045.
Scheike, T. H., Zhang, M. J. & Gerds, T. (2006). Predicting cumulative incidence probability by direct binomial regression (submitted).
Shen, Y. & Cheng, S. C. (1999). Confidence bands for cumulative incidence curves under the additive risk model. Biometrics 55, 1093–1100.
Sun, L. Q., Liu, J. X., Sun, J. G. & Zhang, M. J. (2006). Modeling the subdistribution of a competing risk. Statist. Sinica 16, 1367–1385.

Received November 2005, in final form September 2006

Thomas H.
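Scheike, Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5 B, PO Box 2099, DK-1014 Copenhagen, Denmark. E-mail: [email protected]

Appendix A.1

Here we present the expressions for $\hat W_i$ and $\hat W_{i2}(t)$ from section 3.2. $\hat W_i$ is the estimator of $W_i$, obtained by plugging in estimates of the unknown quantities, defined by
\[ W_i = C_\gamma^{-1}\big(\epsilon_{\gamma,i} + \psi_{\gamma,i}\big), \]
\[ \epsilon_{\gamma,i} = \int_0^\tau \{D_{\gamma,i}(t) - K(t) D_{\eta,i}(t)\}\, w_i(t) \left\{ \frac{R_i(t)}{G_C(t)} - S_{0,6}(t, x_i, z_i) \right\} dt, \]
\[ \psi_{\gamma,i} = \int_0^\tau \frac{q_\gamma(t)}{G_C(t)} \int_0^t \frac{dM_i^C(s)}{Y_\bullet(s)}\, dt, \]
\[ q_\gamma(t) = \sum_i \{D_{\gamma,i}(t) - K(t) D_{\eta,i}(t)\}\, w_i(t) R_i(t), \]
where $K(t) = D_\gamma^T(t) W(t) D_\eta(t) I_\eta^{-1}(t)$, and where $Y_\bullet(s)$ is the number of subjects under risk. The second term $\psi_{\gamma,i}$ is due to the uncertainty related to the censoring distribution. Further,
\[ W_{i2}(t) = I_\eta^{-1}(t)\{\epsilon_{\eta,i}(t) + \psi_{\eta,i}(t)\} - I_\eta^{-1}(t) Q(t) W_i, \]
where $Q(t)$ is the limit in probability of $D_\eta^T(t) W(t) D_\gamma(t)$ and where
\[ \epsilon_{\eta,i}(t) = D_{\eta,i}(t)\, w_i(t) \left\{ \frac{R_i(t)}{G_C(t)} - S_{0,6}(t, x_i, z_i) \right\}, \]
\[ \psi_{\eta,i}(t) = \frac{q_\eta(t)}{G_C(t)} \int_0^t \frac{dM_i^C(s)}{Y_\bullet(s)}, \]
\[ q_\eta(t) = \sum_i D_{\eta,i}(t)\, w_i(t) R_i(t). \]

Appendix A.2

We now outline the main differences for the estimators using the censoring weights given in section 3.3. Let
\[ \tilde U(t, \eta(t)) = \sum_i D_i(t) w_i(t) \left\{ \frac{R_{i0}(t)}{G_{C_{00}}(t)} + \frac{R_{i6}(t)}{G_{C_{60}}(T^*_{i,6})\, G_{C_{66}}(t,t)} - S_{0,6}(t; x_i) \right\} = \sum_i \epsilon_i(t). \]
The difference between $U$ and $\tilde U$ is
\[ U(t,\eta(t)) - \tilde U(t,\eta(t)) = \sum_i D_i(t) w_i(t) R_{i0}(t) \left\{ \frac{1}{\hat G_{C_{00}}(t)} - \frac{1}{G_{C_{00}}(t)} \right\} \]
\[ \quad + \sum_i D_i(t) w_i(t) \frac{R_{i6}(t)}{\hat G_{C_{66}}(t,t)} \left\{ \frac{1}{\hat G_{C_{60}}(T^*_{i,6})} - \frac{1}{G_{C_{60}}(T^*_{i,6})} \right\} \]
\[ \quad + \sum_i D_i(t) w_i(t) \frac{R_{i6}(t)}{G_{C_{60}}(T^*_{i,6})} \left\{ \frac{1}{\hat G_{C_{66}}(t,t)} - \frac{1}{G_{C_{66}}(t,t)} \right\} \]
\[ \approx_P \sum_i \int_0^t \Big\{ \sum_j D_j(t) w_j(t) \frac{R_{j0}(t)}{G_{C_{00}}(t)} \Big\} \frac{dM_i^{C_{00}}(u)}{Y_\bullet^{C_{00}}(u)} \]
\[ \quad + \sum_i \int_0^t \Big\{ \sum_j D_j(t) w_j(t) \frac{R_{j6}(t)\, I(u \le T^*_{j,6} \le t)}{G_{C_{60}}(T^*_{j,6})\, G_{C_{66}}(t,t)} \Big\} \frac{dM_i^{C_{60}}(u)}{Y_\bullet^{C_{60}}(u)} \]
\[ \quad + \sum_i \int_0^t \Big\{ \sum_j D_j(t) w_j(t) \frac{R_{j6}(t)}{G_{C_{60}}(T^*_{j,6})\, G_{C_{66}}(t,t)} \Big\} \frac{dM_i^{C_{66}}(u,t)}{Y_\bullet^{C_{66}}(u,t)} \]
\[ = \sum_i \psi_i(t), \]
where $Y_\bullet^{C_{00}}(u) = \sum_i Y_i^{C_{00}}(u)$, $Y_\bullet^{C_{60}}(u) = \sum_i Y_i^{C_{60}}(u) = \sum_i I\{([\tilde T^*_{i,1} \wedge T^*_{i,6}] \wedge C_i) > u\}$, and $M_i^{C_{00}}(u)$, $M_i^{C_{60}}(u)$ and $M_i^{C_{66}}(u,t)$ are the corresponding martingales for the censoring processes $C_{00}$, $C_{60}$ and $C_{66}$, respectively. The remaining arguments parallel those based on the simple Kaplan–Meier estimator given in the previous section.

For the semiparametric regression models (8 and 9), similarly to Appendix A.1, we have
\[ W_i = \int_0^\tau \{D_{\gamma,i}(t) - K(t) D_{\eta,i}(t)\}\, w_i(t) \left\{ \frac{R_{i0}(t)}{G_{C_{00}}(t)} + \frac{R_{i6}(t)}{G_{C_{60}}(T^*_{i,6})\, G_{C_{66}}(t,t)} - S_{0,6}(t, x_i, z_i) \right\} dt \]
\[ \quad + \int_0^\tau\!\!\int_0^\tau \sum_j \{D_{\gamma,j}(t) - K(t) D_{\eta,j}(t)\}\, w_j(t) \frac{R_{j0}(t)}{G_{C_{00}}(t)}\, I(u \le t)\, \frac{dM_i^{C_{00}}(u)}{Y_\bullet^{C_{00}}(u)}\, dt \]
\[ \quad + \int_0^\tau\!\!\int_0^\tau \sum_j \{D_{\gamma,j}(t) - K(t) D_{\eta,j}(t)\}\, w_j(t) \frac{R_{j6}(t)\, I(u \le T^*_{j,6} \le t)}{G_{C_{60}}(T^*_{j,6})\, G_{C_{66}}(t,t)}\, \frac{dM_i^{C_{60}}(u)}{Y_\bullet^{C_{60}}(u)}\, dt \]
\[ \quad + \int_0^\tau\!\!\int_0^\tau \sum_j \{D_{\gamma,j}(t) - K(t) D_{\eta,j}(t)\}\, w_j(t) \frac{R_{j6}(t)\, I(u \le t)}{G_{C_{60}}(T^*_{j,6})\, G_{C_{66}}(t,t)}\, \frac{dM_i^{C_{66}}(u)}{Y_\bullet^{C_{66}}(u)}\, dt. \]
With plug-in estimators, the variance of $\sqrt{n}(\hat\gamma - \gamma)$ can be estimated by
\[ \hat\Sigma_\gamma = C_\gamma^{-1} \Big\{ \sum_i \hat W_i^{\otimes 2} \Big\} C_\gamma^{-1}. \]
Let
\[ W_{i2}(t) = D_{\eta,i}(t)\, w_i(t) \left\{ \frac{R_{i0}(t)}{G_{C_{00}}(t)} + \frac{R_{i6}(t)}{G_{C_{60}}(T^*_{i,6})\, G_{C_{66}}(t,t)} - S_{0,6}(t, x_i, z_i) \right\} \]
\[ \quad + \int_0^t \sum_j D_{\eta,j}(t)\, w_j(t) \frac{R_{j0}(t)}{G_{C_{00}}(t)}\, I(u \le t)\, \frac{dM_i^{C_{00}}(u)}{Y_\bullet^{C_{00}}(u)} \]
\[ \quad + \int_0^t \sum_j D_{\eta,j}(t)\, w_j(t) \frac{R_{j6}(t)}{G_{C_{60}}(T^*_{j,6})\, G_{C_{66}}(t,t)}\, I(u \le T^*_{j,6} \le t)\, \frac{dM_i^{C_{60}}(u)}{Y_\bullet^{C_{60}}(u)} \]
\[ \quad + \int_0^t \sum_j D_{\eta,j}(t)\, w_j(t) \frac{R_{j6}(t)}{G_{C_{60}}(T^*_{j,6})\, G_{C_{66}}(t,t)}\, \frac{dM_i^{C_{66}}(u)}{Y_\bullet^{C_{66}}(u)} \]
\[ \quad - D_\eta^T(t) D_\gamma(t)\, C_\gamma^{-1} W_i. \]
With plug-in estimators for $W_{i2}(t)$, the asymptotic variance of $\sqrt{n}\{\hat\eta(t) - \eta(t)\}$ can be estimated by
\[ \hat\Sigma_\eta(t) = \{I(t, \hat\eta(t))\}^{-1} \Big\{ \sum_i \hat W_{i2}^{\otimes 2}(t) \Big\} \{I(t, \hat\eta(t))\}^{-1}. \]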