Florida State University Libraries, Electronic Theses, Treatises and Dissertations, The Graduate School, 2007.

THE FLORIDA STATE UNIVERSITY
COLLEGE OF ARTS AND SCIENCES

A METHOD FOR FINDING THE NADIR OF NON-MONOTONIC RELATIONSHIPS

By
FEI TAN

A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Degree Awarded: Fall Semester, 2007.

The members of the Committee approve the Dissertation of Fei Tan defended on November 8, 2007: Daniel McGee, Professor Directing Dissertation; Donald Lloyd, Outside Committee Member; Fred Huffer, Committee Member; Xufeng Niu, Committee Member; Gareth Dutton, Committee Member. The Office of Graduate Studies has verified and approved the above named committee members.

This dissertation is dedicated to my family.

ACKNOWLEDGEMENTS

In 2006, Dr. Dan McGee provided me with the opportunity to work as a research assistant in the biostatistics group on the problem of non-monotonic relationships. It was during that period that the idea of estimating the nadir using free-knot polynomial spline functions took shape. I would like to express my sincere thanks to my advisor, Dr. Dan McGee, for giving me this opportunity, for stimulating my interest in this topic, and for his guidance, patience and constant support. I would like to thank Dr. Fred Huffer, who taught the Counting Processes class in Fall 2006; without taking this class, I could not have proved the asymptotic results in this dissertation. My gratitude also goes to Dr. Xufeng Niu for his co-advising in the biostatistics group and his insightful and constructive suggestions. I have worked with Dr. Gareth Dutton in the College of Medicine since Summer 2006.
I would like to thank him for giving me the chance to gain experience in applying statistics to solving medical research problems. I would like to thank both Dr. Dutton and my outside committee member Dr. Donald Lloyd for their interest in statistical methodology. I would also like to thank the departmental staff, Pam McGee, Jennifer Rivera, Evangelous Robinson, Virginia Hellman, and Megan Trautman, for their help. Especially, I would like to thank James Stricherz for his great work in maintaining a good computing environment and for being patient with all my computer-related questions.

TABLE OF CONTENTS

List of Tables
List of Figures
Abstract
1. MOTIVATION
2. BACKGROUND
   2.1 Basic Models
   2.2 Current Methods
3. PROPOSED METHOD
   3.1 Motivation
   3.2 Splines With Free Knots
4. ASYMPTOTIC PROPERTIES OF THE PROPOSED METHOD
   4.1 Asymptotic Normality Of The Score Process
   4.2 Consistency Of The Maximum Partial Likelihood Estimator
   4.3 Asymptotic Normality Of The MPLE
   4.4 Lemmas
   4.5 The Neighborhood Condition
5. SIMULATION STUDIES
   5.1 A Goodness Of Fit Test For Survival Models
   5.2 Transformation Model
   5.3 Free-Knot Spline Model
   5.4 Other J-shaped Function
   5.5 Summary
6. FUTURE WORK
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

3.1 Model Comparisons And Nadir Estimations, NHIS White Female
3.2 Model Comparisons And Nadir Estimations, NHIS White Male
3.3 Model Comparisons And Nadir Estimations, The Norwegian Counties Study (full sample)
3.4 Model Comparisons And Nadir Estimations, The Norwegian Counties Study (1 obs dropped)
3.5 Model Comparisons Using Likelihood Ratio Tests
3.6 Model Comparisons Using BIC
5.1 Simulation Parameters, NHANES I White Male
5.2 Simulation Results 15–50, NHANES I White Male
5.3 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–50, NHANES I White Male
5.4 Simulation Results 15–60, NHANES I White Male
5.5 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–60, NHANES I White Male
5.6 Simulation Results 15–70, NHANES I White Male
5.7 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–70, NHANES I White Male
5.8 Simulation Parameters, The NHIS White Male
5.9 Simulation Results 15–50, The NHIS White Male
5.10 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–50, The NHIS White Male
5.11 Simulation Results 15–60, The NHIS White Male
5.12 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–60, The NHIS White Male
5.13 Simulation Results 15–70, The NHIS White Male
5.14 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–70, The NHIS White Male
5.15 Simulation Parameters, NHIS White Male
5.16 Simulation Results 15–50, NHIS White Male
5.17 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–50, NHIS White Male
5.18 Simulation Results 15–60, NHIS White Male
5.19 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–60, NHIS White Male
5.20 Simulation Results 15–70, NHIS White Male
5.21 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–70, NHIS White Male

LIST OF FIGURES

3.1 Change Point Model Profile Likelihood, NHIS White Female
3.2 Quadratic Spline Profile Likelihood, NHIS White Female
3.3 Fitted Curves, NHIS White Female
3.4 Change Point Model Profile Likelihood, NHIS White Male
3.5 Spline Model Profile Likelihood, NHIS White Male
3.6 Fitted Curves, NHIS White Male
3.7 Change Point Model Profile Likelihood, The Norwegian Counties Study (full sample)
3.8 Spline Model Profile Likelihood, The Norwegian Counties Study (full sample)
3.9 Fitted Curves, The Norwegian Counties Study (full sample)
3.10 Change Point Model Profile Likelihood, The Norwegian Counties Study (1 obs dropped)
3.11 Spline Model Profile Likelihood, The Norwegian Counties Study (1 obs dropped)
3.12 Fitted Curves, The Norwegian Counties Study (1 obs dropped)
4.1 Gap 1: 27–28
4.2 Gap 2: 27.3–27.8
4.3 Gap 3: 27.5–27.7
4.4 No Gap
5.1 LBMI Histogram With Normal Density Curve, NHANES I White Male
5.2 Assumed Underlying Curve, NHANES I White Male
5.3 Assumed Underlying Curve, The NHIS White Male
5.4 Assumed Underlying Curve, NHIS White Male

ABSTRACT

Different methods have been proposed to model the J-shaped or U-shaped relationship between a risk factor and mortality so that the optimal risk-factor value (the nadir), associated with the lowest mortality, can be estimated. The basic model considered is the Cox Proportional Hazards model.
Current methods include a quadratic method, a method with transformation, fractional polynomials, a change point method and fixed-knot spline regression. The quadratic method includes both a linear and a quadratic term in the risk factor; it is simple, but it often generates unrealistic nadir estimates. The transformation method converts the original risk factor so that, after transformation, it has a Normal distribution, but this may not work when there is no good transformation to normality. Fractional polynomials are an extended class of regular polynomials that allow negative and fractional powers of the risk factor. Compared with the quadratic or transformation methods, they do not always have a good model interpretation, and inferences about them do not incorporate the uncertainty arising from pre-selection of powers and degree. A change point method models the prognostic index using two upward quadratic pieces that meet at their common nadir; this method assumes the knot and the nadir coincide, which is not always true. Fixed-knot spline regression has also been used to model non-linear prognostic indices, but its inference does not account for the variation arising from knot selection. Here we consider spline regressions with free knots, a natural generalization of the quadratic, change point and fixed-knot spline methods. They can be applied to risk factors that have no good transformation to normality, while retaining an intuitive model interpretation. Asymptotic normality and consistency of the maximum partial likelihood estimators are established under a certain condition; when the condition is not satisfied, simulations are used to explore asymptotic properties. The new method is motivated by and applied to nadir estimation in the non-monotonic relationship between BMI (body mass index) and all-cause mortality.
Its performance is compared with that of existing methods using the criteria of nadir estimation ability and goodness of fit.

CHAPTER 1
MOTIVATION

In studies where researchers are interested in the effects of risk factors on disease outcomes or mortality, J-shaped or U-shaped relationships have been reported. Such a non-monotonic relationship between a covariate and mortality means that excess mortality occurs at both very low and very high values of the covariate: beyond a certain point, increasing values of the covariate are associated with increased mortality, whereas below that point mortality is inversely related to the covariate. An example is the BMI–mortality relationship. Researchers have often examined the relationship between body weight and mortality and, more importantly, have tried to establish guidelines for optimal body weight. Various findings about the relationship have been reported in the literature, including a linearly increasing relationship, a decreasing association and no association between weight and death [1, 2, 3, 4, 5]. However, most observational studies show non-monotone curves with excess mortality associated with both very low and very high levels of BMI [6, 7, 8, 9, 10, 11]. Alcohol and mortality in [12] and the EURAMIC (EURopean study on Antioxidants, Myocardial Infarction, and breast Cancer) study of alcohol intake and risk of myocardial infarction in [13] provide two other examples of a non-monotone J-shaped relationship. During 1991 and 1992 a total of 1499 men, all under seventy years old, were recruited to the EURAMIC international case-control study from eight European countries and Israel, with the primary goal of examining the association between antioxidants and the risk of developing a first myocardial infarction. Later, 330 cases and 441 controls who reported some alcohol intake during the previous year were selected from the study.
The investigators reported that, compared to non-drinkers, the risk of myocardial infarction dropped when the level of the risk factor was low, and that the risk then kept increasing as alcohol intake rose; detrimental effects were observed at both very low and very high values of alcohol intake.

In another study, 686 middle-aged hypertensive men were followed for 12 years in the Primary Prevention Trial in Gothenburg, Sweden, to study the relationship between the blood pressure level achieved through anti-hypertensive treatment and the incidence of coronary heart disease (CHD). The incidence of CHD showed a J-shaped relationship to the achieved treated systolic and diastolic blood pressure levels [14]. The incidence of CHD, adjusted for age, serum cholesterol, blood pressure and smoking habits, decreased with increasing level of blood pressure achieved through treatment until a level of about 150/85 mmHg, and then increased as the treated blood pressure went up. In this study the J-shaped pattern was also observed when data from patients with pre-existing ischemic heart disease were excluded. Other risk factors that have been reported to show an upturn to the left in their relationship with mortality include cholesterol [15]. Nadirs in such non-monotone patterns give us information about the optimal value and range of a risk factor, and about whether it is dangerous to lower the risk factor excessively.

On the other hand, there are examples where the relationship between a risk factor and mortality is a "mirror image of the J-shape". During 1975 and 1976, scientists studied the association of coronary mortality with temperature and air pollution in Athens. Data analysis for that study demonstrated that over the two-year period the low-mortality point occurred at 27°C–30°C [16], showing the mirror image of a J-shaped curve. Such a mirror image was also observed in the association between birth weight and early neonatal mortality [17].
Very high mortality was seen at both the lowest and the heaviest birth weights. In such cases an inverse association between the risk factor and mortality is well known, and it is necessary to detect and model a less obvious upturn to the right: for these covariates, increasing the risk factor value beyond a certain point is associated with an elevated risk. Accurately estimating the nadir and constructing a confidence interval for it are important problems that statisticians face. To model such non-monotone data and estimate nadirs, various methods such as the quadratic model [14], a model with a transformation of the risk factor [18] and a change point model [19] have been proposed. Fractional polynomials [20] and spline regression with fixed knots [21] have also been used, with a focus on the overall fitting of non-monotonic relationships. We address the following questions:

• Are the current methods adequate under different conditions? If not, can we propose a new one?
• What can we say about the asymptotic properties of the Maximum Partial Likelihood Estimators of the new method?
• How do these methods behave in general, and is the new method better than existing ones?

We will address these questions in the rest of the dissertation.

CHAPTER 2
BACKGROUND

2.1 Basic Models

The semi-parametric Cox Proportional Hazards Model and the parametric logistic regression model are commonly used to describe the risk factor–mortality relationship. Here we focus on the Cox model because it is more appropriate for time-to-event data and is frequently used in epidemiological studies with long-term follow-up. A description of the logistic model will also be given, since the method with a transformation of the risk factor was originally derived and proposed for logistic models.

2.1.1 Logistic Regression Model

Logistic regression is one of the generalized linear models.
For every observation the response variable $Y_i$ comes from a Bernoulli distribution, with the probability of a success given the vector of covariates being $\pi(z_i) = P(Y_i = 1 \mid z_i)$, and the logistic regression model assumes

$$\pi(z_i) = \frac{1}{1 + \exp(-z_i^\top \gamma)} \qquad (2.1.1)$$

If information on $p$ predictors is collected, $z_i$ can be written as $z_i^\top = [z_{i0}, z_{i1}, \ldots, z_{ip}]$ and $\gamma^\top = [\gamma_0, \gamma_1, \ldots, \gamma_p]$, where $z_{i0} = 1$. The interpretation of the logistic model is that increasing the risk factor $z_{ij}$ by one unit multiplies the odds of a success by $\exp(\gamma_j)$ if all other risk factor values do not change, and this odds ratio is estimated by $\exp(\hat{\gamma}_j)$. To find the MLEs (maximum likelihood estimators) of the parameters in this model, we define the likelihood function as $L(\gamma) = \prod_{i=1}^{n} \pi(z_i)^{y_i} (1 - \pi(z_i))^{1 - y_i}$ and the score function as $U(\gamma) = \partial \log L(\gamma) / \partial \gamma$, where $y_i$ is a realization of the random variable $Y_i$ and $n$ is the sample size. The MLE is obtained by maximizing the log of the likelihood function; in other words, the MLE occurs at a zero of the score function. Applying the Newton–Raphson method to the score function leads to an iterative expression; the relationship between the $(m-1)$th and $m$th steps is

$$\gamma^{(m)} = \gamma^{(m-1)} + I\big(\gamma^{(m-1)}\big)^{-1} U\big(\gamma^{(m-1)}\big)$$

with $I(\gamma^{(m-1)})$ being the information matrix evaluated at $\gamma^{(m-1)}$. This expression makes it possible to find the parameter estimators iteratively if an initial value $\gamma^{(0)}$ is well chosen. It turns out that $\gamma^{(m)}$ can, at each step, be computed by a weighted least squares fit; for logistic models the Newton–Raphson method is equivalent to iteratively re-weighted least squares.

2.1.2 Proportional Hazards Model

Suppose $T_1, T_2, \cdots, T_n$ are the lifetimes, with sample size $n$, and that they are independently and identically distributed. $U_1, U_2, \cdots, U_n$ are the censoring times and $Z_1, Z_2, \cdots, Z_n$ are the covariate vectors. Let $X_i = \min\{T_i, U_i\}$ and $\delta_i = 1_{\{T_i \le U_i\}}$.
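Before moving on, the Newton–Raphson iteration of Section 2.1.1 can be illustrated concretely. The following is a minimal sketch on simulated data (the data-generating values and variable names are ours, not from the dissertation); it uses the standard facts that, for the logistic model, the score is $Z^\top(y - \pi)$ and the information is $Z^\top W Z$ with $W = \mathrm{diag}(\pi_i(1-\pi_i))$.

```python
import numpy as np

# Hedged sketch: Newton-Raphson for the logistic MLE. Simulated data;
# gamma_true and the sample size are arbitrary illustrative choices.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
Z = np.column_stack([np.ones(n), z])            # includes intercept z_i0 = 1
gamma_true = np.array([-0.5, 1.0])
pi_true = 1.0 / (1.0 + np.exp(-Z @ gamma_true))
y = rng.binomial(1, pi_true)

gamma = np.zeros(2)                              # initial value gamma^(0)
for _ in range(25):
    pi = 1.0 / (1.0 + np.exp(-Z @ gamma))
    U = Z.T @ (y - pi)                           # score function
    W = pi * (1.0 - pi)
    I = Z.T @ (Z * W[:, None])                   # information matrix
    step = np.linalg.solve(I, U)
    gamma = gamma + step                         # gamma^(m) = gamma^(m-1) + I^{-1} U
    if np.max(np.abs(step)) < 1e-10:
        break

print(gamma)   # close to gamma_true for a sample this large
```

Each update solves a weighted least squares problem, which is the iteratively re-weighted least squares view of the same algorithm.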
The hazard at time $t$ is defined to be the instantaneous probability that one dies at $t$ given that this person was alive right before $t$. The mathematical expression of the hazard function is

$$h(t) = \lim_{\varepsilon \to 0} \frac{P(t \le T < t + \varepsilon \mid T \ge t)}{\varepsilon}$$

The Cox Proportional Hazards Model was introduced by D. R. Cox in 1972 [22]. It assumes that every individual in the population has a unique hazard given the covariates, and that the hazard is a product of an arbitrary, unknown function of time and a function of the explanatory variables and the unknown regression coefficients; the arbitrary function of time is called the baseline hazard. If the $i$th observation has covariate vector $z_i$ and the model has a vector of regression coefficients $\gamma$, the hazard function at time $t$ of person $i$ is given by

$$h(t \mid z_i) = h_0(t) \times \exp(z_i^\top \gamma) \qquad (2.1.2)$$

where $h_0(t)$ is the baseline hazard function, $z_i^\top = [z_{i1}, \ldots, z_{ip}]$ and $\gamma^\top = [\gamma_1, \ldots, \gamma_p]$. Compared with the log odds ratio of logistic models, the log hazard ratio of the Cox model does not contain a constant term, because it is absorbed into the baseline hazard. The proportional hazards model can be used to find the relative hazard between two subjects. For instance, if subject $i$ has covariate vector $z_i$ and subject $j$ has covariates $z_j$, then their relative hazard, using subject $j$ as the reference, is $\exp((z_i - z_j)^\top \gamma)$. From this expression we see that the ratio between the hazards of two individuals is constant over time if their covariates do not change as time goes by; this is why the model is called "Proportional Hazards". The partial likelihood, involving only the parametric term, is used to generate parameter estimators for the Cox model. We observe $(X_i, \delta_i, Z_i)$, $i = 1, \cdots, n$, and let $T_1^0 < T_2^0 < \cdots < T_L^0$ be the distinct uncensored death times, assuming ties occur with probability zero. $(i)$ is the label of the single individual who dies at $T_i^0$. Let $R(t) = \{i : X_i \ge t\}$.
Define $R_i = R(T_i^0)$ to be the risk set containing everybody who is still at risk at the $i$th uncensored death time. Then the partial likelihood function is

$$L(\gamma) = \prod_{i=1}^{L} \frac{\exp\big(z_{(i)}^\top \gamma\big)}{\sum_{j \in R_i} \exp\big(z_j^\top \gamma\big)}$$

In practice, although we assume that ties occur with probability zero, simultaneous uncensored deaths can still happen by chance, or the time scale on which death is measured may produce apparent ties. To accommodate tied observations, various approximations to the partial likelihood function have been proposed, for example those suggested by Breslow [23], Peto [24], Efron [25] and Cox [22]. The maximization of the partial likelihood function is carried out numerically using the Newton–Raphson technique.

2.2 Current Methods

Categorical analyses can be used at an initial stage to detect a potential J-shape, but it is hard to draw formal statistical inferences from such descriptive analyses. Another drawback of categorical analyses is that the cut points of the groups are usually selected arbitrarily, by using percentiles or adopting suggestions from previous similar studies; this creates problems if subjects within one group are not homogeneous. We will therefore focus on models that treat the independent variables as continuous, and in this section we review the various methods that have been used in epidemiologic research to deal with a non-monotonic predictor–response relationship. The models considered are a quadratic model, a model with transformation, fractional polynomials, change point models and spline regression with fixed knots.

2.2.1 Quadratic Model

Quadratic models have been applied to describe the J-shaped pattern between DBP and coronary heart disease in the Swedish primary prevention trial [14] and the relationship between BMI and mortality in black and white women [26]. It is probably the first and most natural model one could think of when a U-shape or J-shape is observed.
A quadratic model under the proportional hazards assumption includes both a linear and a quadratic term of the risk factor of interest in the linear combination of all predictors. Let $x_i$ be the variable that we expect to have a non-monotonic relationship to the response and $z_i$ be the vector of the remaining risk factors; the hazard function of subject $i$ is written as

$$h(t \mid x_i, z_i) = h_0(t) \times \exp\big(\beta_1 x_i + \beta_2 x_i^2 + z_i^\top \gamma\big) \qquad (2.2.1)$$

The optimal predictor value is calculated by applying the formula for the nadir of a quadratic, that is,

$$X_{min} = \frac{-\beta_1}{2\beta_2} \qquad (2.2.2)$$

where $X$ is the risk factor with observed sample $[x_1, x_2, \ldots, x_n]^\top$. Beyond the simple quadratic functional form, the advantage of such models is that the existing parameter estimation algorithm for the Cox model can be used directly, since the prognostic index is still linear in the parameters. The nadir, however, is not a parameter in the model, so its confidence interval cannot be obtained from standard software. Methods that can be used to find it include the Delta method, Fieller's theorem and bootstrap estimation; these methods will be discussed in detail when the transformation model is introduced. A problem with the quadratic model is that in most real studies the covariate–mortality curve is not symmetric about the nadir, while the quadratic model forces the curve to be symmetric. Applying this model can therefore generate unrealistic nadir estimates, such as a nadir much higher than the empirical optimal predictor range based on categorical data analyses [18]. To surmount this problem, other methods have been proposed.

2.2.2 A Model with Transformation

Applying a transformation of the risk factor is one way to avoid the unrealistic nadir estimates given by the quadratic method. The idea was proposed for logistic models in 1997 to study the J-shaped relationship between BMI and mortality [18].
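Before turning to the details of the transformation method, the quadratic nadir formula (2.2.2) can be checked numerically; the coefficients below are arbitrary illustrative values, not estimates from any of the data sets in this dissertation.

```python
import numpy as np

# Hedged sketch: the prognostic index beta1*x + beta2*x^2 (beta2 > 0, U-shape)
# is minimized at x = -beta1 / (2 * beta2); compare against a grid search.
beta1, beta2 = -1.2, 0.02                 # hypothetical coefficients
nadir = -beta1 / (2.0 * beta2)            # closed-form nadir: 30.0

x = np.linspace(0.0, 60.0, 600001)        # dense grid around the nadir
prognostic_index = beta1 * x + beta2 * x**2
grid_nadir = x[np.argmin(prognostic_index)]

print(nadir, grid_nadir)                  # the two agree to grid resolution
```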
In that paper, although the analysis was in a multivariate setting, the authors focused on the marginal distribution of the risk factor BMI. The transformation method was motivated by a result, pointed out by Cornfield [27], which states that if the random variable $X$ is the predictor that we expect to have a J-shaped relationship to the response, and it has pdf (probability density function) $f_1(x)$ among cases and pdf $f_0(x)$ among non-cases, with $p = P(\text{case})$ and $q = 1 - p = P(\text{non-case})$, then

$$\pi(x_i) = \frac{1}{1 + \dfrac{q \times f_0(x_i)}{p \times f_1(x_i)}}$$

If the ratio between $f_0(x)$ and $f_1(x)$ can be further expressed as an exponential function of some functional form of $x$, say $g(x)$, then

$$\pi(x_i) = \frac{1}{1 + K \times \exp\big(g(x_i)\big)}$$

where $K$ is a constant, and it is appropriate to apply the logistic model to the risk factor $X$, given that $g(x)$ is linear in the parameters. The Normal distribution was given as an example. Suppose $f_0(x)$ is the pdf of $N(\mu_0, \sigma_0^2)$ and $f_1(x)$ is the density of $N(\mu_1, \sigma_1^2)$; then we have

$$\pi(x_i) = \frac{1}{1 + K \times \exp\left[\left(\dfrac{\mu_0}{\sigma_0^2} - \dfrac{\mu_1}{\sigma_1^2}\right) x_i + \left(\dfrac{1}{2\sigma_1^2} - \dfrac{1}{2\sigma_0^2}\right) x_i^2\right]} \qquad (2.2.3)$$

and the $x_i^2$ term remains in the model if $\sigma_0^2 \ne \sigma_1^2$. Equation (2.2.3) implies that the form of predictor $X$ in the logit function is quadratic. For risk factors whose densities $f_0(x)$ and $f_1(x)$ are not Normal, transformations can be applied to achieve Normality, and by the above illustration, applying the quadratic model to the transformed, Normally distributed predictor (with unequal variances among cases and non-cases) is appropriate. When the variances are the same, the functional form of the transformed variable in the logit function is linear. The authors who proposed the idea of transforming variables used the BMI–mortality relationship as an example and suggested the inverse transformation of BMI. Among others, this transformation was commonly selected, and the transformed BMI, 1/BMI, is called the lean body mass index, denoted by LBMI.
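The claim behind (2.2.3), that two Normal densities with unequal variances yield a logit that is exactly quadratic in $x$, can be verified numerically. In this sketch the means, variances and case probability are arbitrary illustrative values:

```python
import numpy as np

# Hedged sketch: with pi(x) built from two Normal densities as in Cornfield's
# result, the logit log(pi/(1-pi)) should be an exact quadratic in x.
mu0, s0 = 25.0, 4.0        # non-cases ~ N(mu0, s0^2)   (illustrative)
mu1, s1 = 27.0, 5.0        # cases     ~ N(mu1, s1^2)   (illustrative)
p = 0.3                    # P(case); K = q/p
K = (1.0 - p) / p

def norm_pdf(x, mu, s):
    return np.exp(-(x - mu) ** 2 / (2.0 * s**2)) / (s * np.sqrt(2.0 * np.pi))

x = np.linspace(15.0, 40.0, 201)
pi = 1.0 / (1.0 + K * norm_pdf(x, mu0, s0) / norm_pdf(x, mu1, s1))
logit = np.log(pi / (1.0 - pi))

# Fit a degree-2 polynomial: the residual is ~0 and the x^2 coefficient
# equals 1/(2*s0^2) - 1/(2*s1^2), matching the theory around (2.2.3).
coefs = np.polyfit(x, logit, 2)
resid = logit - np.polyval(coefs, x)
print(coefs[0], 1.0 / (2.0 * s0**2) - 1.0 / (2.0 * s1**2))
print(np.max(np.abs(resid)))
```

Note the sign: the logit is $-g(x) - \log K$, so its $x^2$ coefficient is the negative of the one inside the exponent of (2.2.3).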
The inverse transformation has been noted by Flegal [28] and by Nevill and Holder [29] to be not only a transformation to Normality but also an appropriate measure of the percentage of body fat in terms of its biological meaning. Using the transformation method in the Cox model is straightforward when the disease is rare. Under this assumption the odds ratio given by the logistic model is approximately equal to the relative risk, and the hazard under the Cox model can be thought of as the instantaneous risk, so the relative hazard is roughly the relative risk; in this sense the two models essentially measure the same quantity. Hence the idea of transforming the predictor, followed by a quadratic fit of the Normally distributed transformed predictor, can be adopted under the Cox model directly. The proportional hazards model with transformation is the same as (2.2.1), except that this time $x_i$ is a Normally distributed predictor obtained by transformation. In their paper, the BMI associated with the lowest risk is calculated by finding the nadir of LBMI and then transforming the LBMI value back to a BMI value. The formula for the BMI nadir is

$$X_{min} = \frac{-2\beta_2}{\beta_1}$$

If a reasonably good transformation to Normality can be found, the advantages of this method are its simplicity and the fact that the log hazard ratio function is linear in the parameters, in contrast with the change point model discussed later. Linearity in the parameters guarantees that we can use existing software, and the point estimators as well as the variance estimators in standard output are valid. Since our concern is the risk factor value associated with the lowest mortality, after obtaining the point estimator of the nadir we still need to find a confidence interval. Three methods were used: the Delta method, Fieller's theorem and bootstrap estimation; the results of Fieller's theorem in the paper turned out to be very close to those of the Delta method.
The Delta method is a very useful tool that can be employed to calculate the variance of a function of multivariate Normal statistics. In fact, not only the variance but also the asymptotic Normality of the function is given by the Delta method [30]. In our case the quantity of interest is the nadir estimator $-2\hat{\beta}_2 / \hat{\beta}_1$, which is a function of the multivariate Normal statistics $[\hat{\beta}_1, \hat{\beta}_2]$. According to the Delta method, the asymptotic distribution of $-2\hat{\beta}_2 / \hat{\beta}_1$ is Normal and the variance is given by

$$\mathrm{Var}\left(\frac{-2\hat{\beta}_2}{\hat{\beta}_1}\right) = \frac{4\hat{\beta}_2^2}{\hat{\beta}_1^2}\left(\frac{\mathrm{Var}(\hat{\beta}_1)}{\hat{\beta}_1^2} - 2\,\frac{\mathrm{Cov}(\hat{\beta}_1, \hat{\beta}_2)}{\hat{\beta}_1 \hat{\beta}_2} + \frac{\mathrm{Var}(\hat{\beta}_2)}{\hat{\beta}_2^2}\right)$$

To estimate the variance of the nadir, we replace the variances and covariance in the above equation by their estimates. The confidence interval is constructed based on the asymptotic Normality and the estimated variance.

Bootstrap estimation is a more computer-intensive method; it requires repeated sampling with replacement from the original sample [31]. If we can assume that the original sample represents the true population well, then the generated samples are also good representatives of the population. From each generated sample one nadir estimator can be obtained using the transformation method described above. Thus a sequence of nadirs can be generated from these bootstrap samples, and the empirical centiles are the limits of the bootstrap confidence interval; for example, the 95% bootstrap confidence interval contains all values between the 2.5th and the 97.5th empirical percentiles.

The transformation method works well if a transformation to Normality can be found. If no good transformation to Normality exists, however, it cannot be applied and we need to develop other methods. The remaining methods, including the fractional polynomial model, the change point model and the spline model with fixed knots, work without a Normality assumption on the predictor.
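The two interval constructions above can be sketched in code. For brevity this sketch fits the quadratic by ordinary least squares on simulated data rather than by a Cox partial likelihood; the Delta-method formula and the percentile-bootstrap steps are the same in either case, and all numerical values are illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_quadratic(x, y):
    # OLS fit of y ~ 1 + x + x^2; returns b1, b2 and their 2x2 covariance.
    X = np.column_stack([np.ones_like(x), x, x**2])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(x) - 3)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], beta[2], cov[1:, 1:]

def delta_ci(b1, b2, cov, z=1.96):
    # Delta-method variance of the back-transformed nadir -2*b2/b1.
    nadir = -2.0 * b2 / b1
    var = (4.0 * b2**2 / b1**2) * (cov[0, 0] / b1**2
                                   - 2.0 * cov[0, 1] / (b1 * b2)
                                   + cov[1, 1] / b2**2)
    half = z * np.sqrt(var)
    return nadir, (nadir - half, nadir + half)

# Simulated outcome on an inverse-BMI-like scale; with beta1 = -1, beta2 = 12
# the back-transformed nadir is -2*12/(-1) = 24 (illustrative values).
n = 2000
x = rng.uniform(0.02, 0.06, n)
y = -1.0 * x + 12.0 * x**2 + rng.normal(0.0, 0.001, n)

b1, b2, cov = fit_quadratic(x, y)
nadir, (lo, hi) = delta_ci(b1, b2, cov)

# Percentile bootstrap: refit on resamples, take empirical 2.5/97.5 centiles.
boots = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    bb1, bb2, _ = fit_quadratic(x[idx], y[idx])
    boots.append(-2.0 * bb2 / bb1)
blo, bhi = np.percentile(boots, [2.5, 97.5])

print(nadir, (lo, hi), (blo, bhi))
```

In a real application `fit_quadratic` would be replaced by a Cox partial likelihood fit, with the coefficient covariance taken from the inverse observed information.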
2.2.3 Fractional Polynomial

It is possible that other transformations are more suitable for describing the non-linear relationship between the response and the independent variable when the independent variable is not Normal. Naturally, one thinks of polynomials. In general, however, low-order polynomials do not always fit the data well, due to lack of curvature, while high-order polynomials behave badly at the ends of the observed range of a risk factor even though they follow the data more closely. Other transformations of the predictor, such as inverse polynomials, have therefore been proposed [32]; they exemplify a broader class of models, the fractional polynomials. Royston and Altman [20] extended the family of polynomials by including fractional and negative powers of the risk factor. Their method was originally proposed for overall model fitting rather than nadir estimation, but if a good overall model can be found one can always obtain the nadir by taking the first derivative of the log hazard ratio; thus the class of fractional polynomials is a candidate for nadir estimation. In our case we assume non-monotonicity in one variable and linearity in all other predictors. A full definition of a fractional polynomial of degree $m$ is

$$\phi_m(x, z; \beta_m, \gamma_m, p_m) = \sum_{j=0}^{m} \beta_{m,j} H_j(x) + z^\top \gamma_m \qquad (2.2.4)$$

with

$$H_j(x) = \begin{cases} 1 & \text{if } j = 0 \\ x^{(p_{m,j})} & \text{if } j \ne 0 \text{ and } p_{m,j} \ne p_{m,j-1} \\ H_{j-1}(x)\ln(x) & \text{if } j \ne 0 \text{ and } p_{m,j} = p_{m,j-1} \end{cases}$$

where $p_{m,0} = 0$, $x$ is the non-monotonic risk factor, $z$ is the vector of the remaining covariates, $\beta_m$ is the vector of regression coefficients associated with the transformations of $x$, $\gamma_m$ contains the coefficients of the remaining predictors, $p_{m,j}$ is the $j$th element of $p_m$, the power vector of the fractional polynomial of degree $m$, and $x^{(p_{m,j})}$ is the Box–Tidwell transformation, i.e.

$$x^{(p_{m,j})} = \begin{cases} x^{p_{m,j}} & \text{if } p_{m,j} \ne 0 \\ \ln(x) & \text{if } p_{m,j} = 0 \end{cases}$$
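The recursive basis $H_j$ in (2.2.4) can be sketched as follows; the function names are ours, not from Royston and Altman's software, and this is only a minimal illustration of the definition.

```python
import math

# Hedged sketch of the basis H_j(x) in (2.2.4): a repeated power introduces a
# ln(x) factor, and power 0 denotes ln(x) (the Box-Tidwell convention).

def box_tidwell(x, p):
    # x^(p): x**p for p != 0, ln(x) for p == 0; x must be positive.
    return x ** p if p != 0 else math.log(x)

def fp_basis(x, powers):
    # Returns [H_1(x), ..., H_m(x)] for the power vector (p_{m,1}, ..., p_{m,m});
    # H_0(x) = 1 is omitted, since in the Cox model the constant term is
    # absorbed into the baseline hazard (beta_{m,0} = 0).
    H, prev_p, prev_H = [], 0, 1.0   # p_{m,0} = 0, H_0(x) = 1
    for p in powers:
        h = box_tidwell(x, p) if p != prev_p else prev_H * math.log(x)
        H.append(h)
        prev_p, prev_H = p, h
    return H

# Degree-2 example with powers (-2, -2): basis is x^-2 and x^-2 * ln(x).
print(fp_basis(2.0, (-2, -2)))   # [0.25, 0.25 * ln(2)]
```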
The variable to which we apply the fractional polynomial transformations is assumed to be positive; if it takes non-positive values, a preliminary transformation is required to turn the non-positive values positive. When a fractional polynomial is incorporated into the Cox model the coefficient βm,0 of the constant term is always zero. First we need to select the degree m and power vector pm. Fractional polynomial models with degree higher than 2 are rarely required in practice, and those with degree lower than or equal to 2 fit better than classical polynomials by offering more flexibility and stability, so Royston and Altman suggest that m ≤ 2 suffices for most situations. As for the power vector pm, they suggest that one choose the best power vector, p̃m, from all the m-tuples of powers selected with replacement from the fixed set P = {−2, −1, −0.5, 0, 0.5, 1, 2, ..., max(3, m)}, as the one that maximizes the partial likelihood in model fitting when m is given. For a given m, they developed the confidence interval of pm. If the partial deviance is defined to be D(m, pm) = −2 × log PL(m, pm), suppressing all other parameters, and p̂m denotes the MPLE (maximum partial likelihood estimator) of pm, then for testing H0: pm = p̈m we theoretically have the likelihood ratio statistic D(m, p̈m) − D(m, p̂m), which is asymptotically a χ²m random variable. However, in practice we do not know p̂m but p̃m; we also know D(m, p̃m) ≥ D(m, p̂m). The statistic D(m, p̈m) − D(m, p̃m), which we actually use, is conservative for the above test. The corresponding confidence interval adopting the conservative test statistic consists of all p̈m that are not rejected in the test. For the comparison between degrees m and m + 1 the likelihood ratio test statistic, D(m, p̂m) − D(m + 1, p̂m+1), applies, again theoretically. It has an asymptotic χ²2 distribution.
But we can only calculate D(m, p̃m) − D(m + 1, p̃m+1), which is used as an approximation of D(m, p̂m) − D(m + 1, p̂m+1) to test H0: the degree is m vs. Ha: the degree is m + 1. Royston and Altman further defined the gain of a model against the baseline linear model as follows: G(m, pm) = D(1, 1) − D(m, pm), so that a larger gain indicates a better fit. This way D(m, p̃m) − D(m + 1, p̃m+1) in the above test equals G(m + 1, p̃m+1) − G(m, p̃m). In practice one could focus on, say, degrees 1 and 2, as Royston and Altman suggested, and find G(1, p̃1) and G(2, p̃2). The comparison between these two models helps us determine the best degree, and thus the final model. After the degree m and power vector pm are estimated they are treated as constants in model fitting; thus the log hazard ratio of the Cox model with a fractional polynomial is linear in the parameters, and the coefficients are again obtained by applying the usual estimation algorithm. However, this time the linearity is "artificial" and the variation arising from pre-estimation of the degree m and power vector pm is not considered when the regression coefficients βm and γm are estimated. Hence the statistical inference based on the "artificial" linear log hazard ratio is not reliable; specifically, the confidence intervals of the regression coefficients are not reliable, and if we apply the Delta method to calculate a confidence interval of the nadir the estimated variance will not be accurate.

2.2.4 Change Point Model

This method was suggested by Goetghebeur and Pocock and was motivated by the quadratic model originally proposed for modeling the relationship between DBP and coronary mortality [19]. To avoid confounding between evidence for a left and a right upturn, the authors used two independent pieces of quadratic functions with different regression coefficients to model the two sides of the nadir.
The full change point model can be described by the following equation:

h(t|xi, zi) = h0(t) × exp{ β1(xi − η)² 1{xi≤η} + β2(xi − η)² 1{xi>η} + ziᵀγ }    (2.2.5)

with η, the nadir, being a point between the lowest possible value of the nadir, ηl, and the highest possible nadir, ηu, zi being the covariate vector of the i-th observation excluding the variable with a change point, and xi being the variable with a J-shape relationship to the response. Equation (2.2.5) models the J-shape curve with two different pieces of quadratic functions: the first piece contributes to the left of η, the second describes the branch to the right, and the two quadratic branches meet at η. Their work consists of deriving the asymptotic distribution of the MPLE, a proposed parameter estimation algorithm and inference for the parameters. The change point model (2.2.5) involves an unknown nadir, thus the log hazard ratio is nonlinear in the parameters and the asymptotic distribution of the MPLE under the linearity assumption does not apply. In 1995 Goetghebeur and Pocock [19] showed the asymptotic Normality of the MPLE under model (2.2.5) if no subject's risk factor level is too close to the true nadir. Their proposed parameter estimation method is the following 2-step algorithm:

• Scan the range of possible values of η; each η value is used as a known change point in fitting model (2.2.5). η̂, the MPLE of η, is the one that generates the maximum of the profile log-partial-likelihood.

• Fix η = η̂ in model (2.2.5) so that the log hazard ratio is linear in the parameters. Apply the usual estimation algorithm to find [β̂1, β̂2, γ̂] = [β̂1(η̂), β̂2(η̂), γ̂(η̂)], the MPLEs of β1, β2 and γ.

The advantage of applying the profile likelihood search is that the non-linear log hazard ratio is turned linear in each step by fixing the change point. Therefore, one does not need to work directly on the partial likelihood function and can simply use existing statistical software to find the MPLE.
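The 2-step algorithm can be sketched as follows. Fitting a Cox model is beyond a short example, so ordinary least squares on a toy outcome stands in for partial-likelihood maximization (the function names are illustrative); the point being demonstrated is only that once η is fixed, the model is linear in (β1, β2) and any standard fitting routine applies.

```python
def fit_fixed_eta(xs, ys, eta):
    """With eta fixed, y ~ b1*(x-eta)^2*[x<=eta] + b2*(x-eta)^2*[x>eta] is linear
    in (b1, b2).  Return (-RSS, b1, b2); a larger -RSS plays the role of a
    larger profile log partial likelihood."""
    u = [(x - eta) ** 2 if x <= eta else 0.0 for x in xs]
    v = [(x - eta) ** 2 if x > eta else 0.0 for x in xs]
    # u and v have disjoint support, so the 2x2 normal equations decouple
    b1 = sum(a * y for a, y in zip(u, ys)) / max(sum(a * a for a in u), 1e-300)
    b2 = sum(a * y for a, y in zip(v, ys)) / max(sum(a * a for a in v), 1e-300)
    rss = sum((y - b1 * a - b2 * b) ** 2 for y, a, b in zip(ys, u, v))
    return -rss, b1, b2

def profile_search(xs, ys, etas):
    """Step 1: scan candidate change points and keep the best profile fit."""
    return max(etas, key=lambda e: fit_fixed_eta(xs, ys, e)[0])
```

Step 2 then refits once at the selected η̂ to read off the coefficient estimates.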
Inferences about the parameters are based on likelihood ratio statistics resulting from the asymptotic distribution of the MPLEs. The authors focused on the asymptotic p value of β1 and the confidence interval of η. To formally test if a J-shape exists, they considered H0: β1 = 0 vs Ha: β1 ≠ 0. The likelihood ratio test statistic is 2[log PL(η̂, β̂1, β̂2, γ̂) − log PL(η̃, 0, β̃2, γ̃)], which has an asymptotic χ²1 distribution, where [η̂, β̂1, β̂2, γ̂] are the MPLEs of [η, β1, β2, γ] and [η̃, β̃2, γ̃] are the MPLEs restricted to the parameter space with the requirement β1 = 0. The asymptotic confidence interval of the nadir consists of all η values such that 2 log PL(η, β̂1(η), β̂2(η), γ̂(η)) is above 2 log PL(η̂, β̂1, β̂2, γ̂) − χ²1,α. In fact, the authors suggest that in a screening step before the above full analysis one should check the existence of an upturn to the left. One way of doing so is to fit a sequence of models among observations with xi ≤ η:

h(t|xi, zi) = h0(t) × exp{ β1(xi − η)² 1{xi≤η} + ziᵀγ }

where η is a point that runs from ηl to ηu. The test is completed by calculating

Zmax = max{ zη = β̂1/se(β̂1) : ηl < η < ηu }

and the p value, p = P(Z > Zmax), with Z being a standard Normal random variable, for the null hypothesis H0: β1 = 0 (there is no upturn at the low end). The true p value for H0 is at least equal to the calculated p value, so p = P(Z > Zmax) is anti-conservative for H0. A non-significant observed p value indicates the true p value is also not significant and we do not have evidence to support an upturn at the low end; in this case, we stop here since the J-shape does not exist. On the other hand, if the observed p value is significantly small we do not know whether the true p value is significant, so further analysis is necessary. The sequential tests are used in the preliminary step to study the necessity of a further full analysis. The change point model is not linear in the parameters.
To overcome this problem the authors wisely adopted a profile partial likelihood search to convert the non-linear problem to a linear one in each step. But when the MPLE, η̂, of the change point is fixed in the Cox model to obtain the point estimators and variance estimators of the regression coefficients, the output from the standard Cox model cannot be used directly, since it assumes a fixed change point. In other words, the variance estimators of the regression coefficients do not incorporate the variation coming from change point estimation. Likelihood-based inference, resulting from the asymptotic distribution of the MPLEs, can produce an asymptotic confidence interval for the change point and asymptotic p values for the regression coefficients. Their asymptotic distribution result holds under the condition that no risk factor value is too close to the true change point, a restriction that is not satisfied by continuous predictors. Goetghebeur and Pocock studied, via type I error and power, the plausibility of the likelihood ratio test of β1 = 0 and the confidence interval of η when the sample size was taken to be moderate, 1000, and the covariate was simulated from a Normal distribution without any restriction on the neighborhood of the true change point. More research needs to be done to study the asymptotic behavior of the estimated parameters when the risk factor is continuous and the neighborhood assumption is not satisfied. At the same time, a question we ask ourselves is whether we can avoid the neighborhood assumption. Viewed as a sub-model nested in the spline model that will be discussed later, the change point model assumes the knot is the same as the nadir. However, this assumption will be shown, through examples, to not always hold. The authors proposed the change point method to model the upturn at the low end and estimate the nadir if a positive relationship is widely accepted. A simpler way is to fit a quadratic form in the Cox model and estimate the nadir using (2.2.2).
They claimed the change point method was better, but only one example was given, in which both methods fit the data with monotonically increasing curves; therefore the performance of the two nadir estimation methods was not compared. We will compare these methods in terms of overall fitting and nadir estimation ability.

2.2.5 Spline Regression with Fixed Knots

Sleeper and Harrington described how regression splines with fixed knots could be applied in the Cox model [21]. Relevant hypothesis testing problems were also addressed. Their method, again, was proposed for overall model fitting. As mentioned before, one can always obtain the nadir by taking the first order derivative of the spline function and selecting the value generating the lowest log hazard ratio to eliminate any local minimum if the degree of the spline function is higher than two. Before applying the spline function no assumption about the pattern of the curve needs to be made, and spline functions allow the data to speak for themselves. As described by Sleeper and Harrington, "A spline is a piecewise polynomial with continuity conditions on the function and its derivatives at the points where the pieces join". Specifically, a spline function is characterized by its order, its sequence of knots and the continuity conditions at the knots. The order of a spline is the highest degree plus one. A knot is a point where two adjacent polynomial pieces join. The last set of parameters is the class of numbers that determine the number of continuity conditions at each knot. Hence if a linear spline function is fit to the data the estimated relationship could be non-smooth, which is not appropriate for the smooth relationships usually seen in epidemiological research. Sleeper and Harrington mentioned that cubic splines with continuous first and second order derivatives were sufficient for most log hazard ratio functions. In their example both quadratic and cubic splines were adopted, and comparisons among them generated the best model.
Greenland [33] also pointed out that, compared with the more complex functional forms arising in engineering, relationships observed in epidemiology are simple, so that lower order splines such as quadratic and cubic suffice for model fitting in this area. He further suggested the use of relatively simple quadratic splines, since cubic splines can produce weird shapes in wide or open-ended intervals. A further minor disadvantage is that cubic splines result in poor interpretability of coefficients. Sleeper and Harrington suggested using 5 or fewer interior knots, as their experience indicated "they are typically needed to approximate the effect of a covariate on survival". An introduction to polynomial splines can be found in the book by Schumaker [34]. Let PPm,M,∆ be the linear space of all piecewise polynomials with order m, the highest degree plus one. In this notation,

m = the highest degree plus one
∆ = [c1, ..., ck], the ordered known knot sequence
M = [m1, ..., mk], the vector of multiplicities such that the j-th derivative of the spline function at knot ci exists, where j = 0, 1, ..., m − 1 − mi and i = 1, ..., k

If every mi, i = 1, ..., k, is taken to be m then the spline function allows discontinuities at the knots; when every mi, i = 1, ..., k, is taken to be one, the smoothest spline function of order m is obtained, since if any further smoothness condition is added the knots will disappear. The vector of multiplicities also controls the number of times every knot appears in the extended knot sequence that will be introduced later. The dimension of this linear space is d = m + K = m + Σ_{i=1}^{k} mi.
For instance, the space of all piecewise quadratic functions with continuous first order derivative and two fixed knots is denoted by PP3,M,∆, where ∆ = [c1, c2] is a pair of constants denoting the two fixed knots and M = [1, 1] specifies that at both knots there are two continuity conditions: continuity of the piecewise quadratic function itself and of its first order derivative. The dimension of this space is 3 + 1 + 1 = 5, meaning five parameters will need to be estimated if a quadratic spline with continuous first order derivative and two fixed knots is used. PP4,M,∆ represents cubic splines with continuous first and second order derivatives and one fixed knot if ∆ = c1 is a fixed constant, with M = 1 requiring the function and its first and second order derivatives to be continuous at c1. The dimension of this space is 4 + 1 = 5. Suppose the range of the predictor is [a, b]. The one-sided basis of PPm,M,∆ given in [34] is

ρi,j(x) = (x − a)^(m−j),     j = 1, ..., mi,  i = 0
ρi,j(x) = (x − ci)+^(m−j),   j = 1, ..., mi,  i = 1, ..., k

where m0 = m and (x)+^q = (x × 1{x>0})^q. Specifically, the basis of the linear space consisting of all quadratic splines with continuous first order derivative and two fixed knots c1 < c2 is {1, (x − a), (x − a)², (x − c1)+², (x − c2)+²}; the basis of cubic splines with continuous first two derivatives and one fixed knot is {1, (x − a), (x − a)², (x − a)³, (x − c1)+³}. Other authors, such as Sleeper and Harrington [21] or Gallant and Fuller [35], used a truncated power basis that is similar to the one-sided basis. The truncated power basis of quadratic splines with continuous first order derivative and two fixed knots is {1, x, x², (x − c1)+², (x − c2)+²}; that of cubic splines with continuous first two derivatives and one fixed knot is {1, x, x², x³, (x − c1)+³}.
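A minimal sketch of the dimension count and the truncated power construction, for the maximally smooth case in which every knot has multiplicity one (function names are illustrative):

```python
def spline_space_dim(order, multiplicities):
    """d = m + K, where K is the sum of the knot multiplicities."""
    return order + sum(multiplicities)

def truncated_power_basis(order, knots):
    """Truncated power basis {1, x, ..., x^(m-1), (x - c_i)_+^(m-1)} for
    maximally smooth splines of the given order m = degree + 1."""
    degree = order - 1
    basis = [lambda x, j=j: x ** j for j in range(order)]
    basis += [lambda x, c=c: max(x - c, 0.0) ** degree for c in knots]
    return basis
```

With order 3 and knots [c1, c2] this reproduces {1, x, x², (x − c1)+², (x − c2)+²}, and the basis length agrees with the dimension d = 3 + 1 + 1 = 5.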
The utilization of the above two truncated power bases under the Cox model leads to the log hazard ratio functions β1x + β2x² + β3(x − c1)²1{x>c1} + β4(x − c2)²1{x>c2} and β1x + β2x² + β3x³ + β4(x − c1)³1{x>c1}, respectively. The truncated power basis is very intuitive, especially in some simple cases where the model has a nice interpretation. For example, the Cox model using a quadratic spline with one fixed knot and continuous first derivative has the form:

h(t|xi, zi) = h0(t) × exp{ β1x + β2x² + β3(x − c1)² 1{x>c1} + ziᵀγ }    (2.2.6)

which can be viewed as a baseline parabola with an extra adjustment piece attached to it, the adjustment piece starting from the knot c1, with xi being the non-monotonic risk factor and zi being the vector of all other predictors. A special case of this function arises when the knot c1 is taken to be the center, −β1/(2β2), of the baseline parabola; then, after the adjustment piece is merged into the baseline parabola, the function becomes two parabolic branches that meet at their common center. If the center is a parameter, this is the change point model that we already discussed. Furthermore, if the two branches happen to share the same regression coefficient then a regular parabola is obtained. Let us start from the quadratic spline with one fixed knot and continuous first derivative again; this time, if the x² term is dropped, the function reduces to β1x + β3(x − c1)²1{x>c1}, which can be interpreted as a baseline linear model with an adjustment parabolic branch that exists on x > c1 and meets the baseline straight line at c1. We note that in order for two spline functions with fixed knots to be nested, the knot locations in the smaller model have to be a subset of those of the bigger model, since a knot is viewed as a constant, not a parameter.
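The nesting just described is easy to verify numerically: placing the knot at the center −β1/(2β2) of the baseline parabola turns model (2.2.6) into two parabolic branches meeting there, up to an additive constant. The coefficients below are made up purely for illustration.

```python
def quad_spline(x, b1, b2, b3, c):
    """Log hazard ratio of (2.2.6): baseline parabola plus adjustment piece."""
    return b1 * x + b2 * x ** 2 + b3 * max(x - c, 0.0) ** 2

def change_point_form(x, b_left, b_right, eta):
    """Two quadratic branches meeting at eta, as in the change point model (2.2.5)."""
    return (b_left if x <= eta else b_right) * (x - eta) ** 2

b1, b2, b3 = -0.8, 0.02, 0.03
eta = -b1 / (2 * b2)          # knot placed at the parabola's center (20.0 here)
for x in (10.0, 15.0, 20.0, 25.0, 30.0):
    # left coefficient b2, right coefficient b2 + b3, shifted by -b2*eta^2
    assert abs(quad_spline(x, b1, b2, b3, c=eta)
               - (change_point_form(x, b2, b2 + b3, eta) - b2 * eta ** 2)) < 1e-9
```

The identity holds for any coefficients, which is exactly why the change point model sits inside the free-knot spline model.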
In their paper, Sleeper and Harrington fit seven quadratic and seven cubic spline models, with three potential knot locations taken to be the three quartiles of uncensored death times. The models they tried were three quadratic-one-knot models, three quadratic-two-knot models with each model adopting one pair of the three quartiles of uncensored death times, one quadratic-three-knot model, and the cubic counterparts of the above seven models. The final model was selected using a statistic equivalent to AIC. They noted that the standard variance estimators were estimated under the assumption that the knots were fixed. When various knot locations are tried in order to find the best knot sequence, as they did, such standard inference methods do not generate the true variances, which are larger. Besides, the knot locations were arbitrary, so that they even mentioned "the shape of some estimates can depend heavily on the knots selected". All of these are problems that could affect the accuracy of the nadir estimation and the relevant statistical inference.

CHAPTER 3
PROPOSED METHOD

3.1 Motivation

When the predictor has a Normal distribution in both cases and non-cases with distinct variances, the quadratic method is appropriate for model fitting and nadir calculation. If a good transformation to Normality can be obtained, such as the inverse transformation of BMI, the quadratic model can be applied to the transformed predictor and the nadir can also be easily found. If no good transformation to Normality is known, fractional polynomials can then be used to fit the overall model, but the variance of each parameter estimator does not incorporate the variation arising from pre-selection of the degree and power vector, and this could affect the accuracy of the inference for the nadir. Regression splines with fixed knots could also be used for overall model fitting.
However, when various knot locations are tried to find the best knot sequence, standard inference does not generate the true variances, which are larger. Besides, the estimated parameters could depend heavily on the knots selected. The change point model does not require the predictor to be Normal and thus can be adopted when no good transformation to Normality exists. It is nested in model (2.2.6) by forcing c1 = −β1/(2β2) if the knot c1 is a parameter. In other words, the change point model is nested in the quadratic spline model with continuous first order derivative and one free knot (2.2.6). Compared with the more general free-knot spline function, the change point model implicitly assumes the nadir is the same as the knot c1. Is this assumption always true? The nadir is a predictor value associated with the lowest risk; it is usually in a small subset of the data range [a, b]. The knot is a point where the adjustment piece starts; it could be any point in the data range [a, b]. Intuitively it is not clear why they have to be the same. If they are forced to be the same, then the confidence interval of the nadir is forced to be the same as that of the knot; therefore the constructed nadir confidence interval might be affected by the knot. On the other hand, if the knot is forced to be the nadir, then the overall model fitting could be sacrificed. Will this assumption cause any problem? Does the change point method always work? We will answer these questions through the following examples. The examples being considered are the National Health Interview Survey white female cohort, the white male cohort, the Norwegian Counties Study and the Diverse Populations Collaboration. Detailed descriptions of these examples will be given later. The change point model is proposed to analyze non-monotone quadratic-looking data, so before data analysis the existence of a non-monotonic relationship was examined. For the quadratic model with BMI or LBMI, three models were considered:

1. A null model with linear terms of all other covariates
2. A linear model with a linear term of the main risk factor and linear confounders
3. A quadratic model involving linear and quadratic terms of the risk factor with confounders

These models were checked and likelihood ratio test statistics were calculated to compare these nested models. There is an indication of non-monotonicity if the quadratic model is selected by the likelihood ratio tests. Fractional polynomial models were fit using the fracpoly command with option compare in the statistical package STATA. The compare option does comparisons, via the deviance tests given in section 2.2, among four models: the null model, the linear model, the fractional polynomial model with degree one and the fractional polynomial with degree two. A first-degree fractional polynomial is always monotone when the risk factor is positive, so only when the best fractional polynomial model of a sample has degree two can the covariate-mortality relationship be non-monotonic. Before a change point model was fit, the screening test given by Goetghebeur and Pocock, which appeared in the previous section, was applied, and a p value less than 0.05 was used as an indication of an upturn on the low end. For each of the following examples, the four non-monotonicity detection tools were used and all of them suggested a non-monotonic relationship. To every dataset we applied the Cox model with main risk factor BMI, adjusted for age and smoking status. Other risk factors could also be used; the main reasons for using BMI are its non-Normal distribution and the fact that LBMI is Normally distributed, so that results from a quadratic model applied to LBMI could be used as references. For each example a quadratic form (2.2.1) was utilized for both BMI and LBMI, representing the quadratic model and the transformation model, respectively. A second order fractional polynomial model (2.2.4) was fit.
The change point model (2.2.5) and the quadratic spline with continuous first order derivative and one free knot (2.2.6) were adopted for every example as well. Likelihood-based model comparisons were employed. The transformation and fractional polynomial models are not nested in any of the other three models, nor do they contain any of them; thus they were not compared with the other three methods via likelihood ratio tests. Likelihood ratio comparisons involving the spline function are based on the asymptotic distribution of the MPLEs for the parameters in (2.2.6), which has not yet been proved; therefore BIC was also adopted as a guide for model selection. BIC, the Bayesian Information Criterion, is defined to be −2 ln(partial likelihood) + k ln(n), where k is the number of estimated parameters in the model and n is the sample size. We must point out that when BIC is used as the model selection guide, the fractional polynomial BIC is misleading, since it is calculated assuming both the degree m and the power vector pm are constants instead of parameters; k, the number of estimated parameters in the BIC formula, therefore does not reflect the fact that m and pm are estimated. As mentioned before, the change point model (2.2.5) is nested in the free-knot spline model (2.2.6) by taking the knot to be the center of the baseline parabola, hence the degrees of freedom for the likelihood-based χ² test is 1. The quadratic model can be obtained by forcing β1 = β2 in the change point model (2.2.5), hence the degrees of freedom for that model comparison is again 1. Estimated nadirs were obtained using the methods described in section 2.2. The nadir of the quadratic spline model was calculated by taking the first order derivative of the spline function and then comparing the function values at the two potential nadirs; any local maximum or minimum was eliminated by taking the nadir to be the one generating the lower function value.
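Two of the quantities above are simple enough to write out: the BIC formula, and the nadir of a fitted quadratic free-knot spline, found by solving f′(x) = 0 on each piece and keeping the admissible root with the lower fitted value. The function names and the test coefficients are illustrative, not fitted values from the data.

```python
import math

def bic(log_partial_likelihood, k, n):
    """BIC = -2 ln(partial likelihood) + k ln(n)."""
    return -2.0 * log_partial_likelihood + k * math.log(n)

def spline_nadir(b1, b2, b3, c):
    """Nadir of f(x) = b1*x + b2*x^2 + b3*(x - c)_+^2.
    f'(x) = b1 + 2*b2*x on x <= c and b1 + 2*b2*x + 2*b3*(x - c) on x > c;
    each root counts only if it lies on its own piece."""
    f = lambda x: b1 * x + b2 * x ** 2 + b3 * max(x - c, 0.0) ** 2
    candidates = []
    if b2 != 0:
        r = -b1 / (2 * b2)
        if r <= c:
            candidates.append(r)
    if b2 + b3 != 0:
        r = (2 * b3 * c - b1) / (2 * (b2 + b3))
        if r > c:
            candidates.append(r)
    # comparing fitted values discards any stationary point that is not the minimum
    return min(candidates, key=f)
```

For instance, with b1 = −0.8, b2 = 0.02, b3 = 0.03 and knot c = 15, the left-piece root 20 falls on the wrong side of the knot and only the right-piece root 17 survives.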
95% confidence intervals of the quadratic model or the model with transformation were found via the Delta method mentioned in section 2.2. Those of the change point model or the spline model were likelihood-based confidence intervals found through a profile likelihood search.

3.1.1 The National Health Interview Survey (NHIS) White Female

The first example is the National Health Interview Survey. The NHIS is a continuing nationwide survey of the U.S. civilian non-institutionalized population conducted through households. Each week a probability sample of households is interviewed by trained personnel from the U.S. Census Bureau to obtain information about the health and other characteristics of each member of the sample household. The average annual sample consists of 36000 to 47000 households, yielding 92000 to 125000 people. Completed questionnaires are sent from the U.S. Census Bureau field offices to the National Center for Health Statistics (NCHS) for coding and editing. Beginning with survey year 1986, linkage information has been collected on NHIS respondents to allow for matching with other data systems, including the National Death Index (NDI). Linkage of NHIS respondents with the NDI provides a longitudinal component to the NHIS, which allows for the ascertainment of vital status. (From the Florida State University Diverse Populations Collaboration website biostat.stat.fsu.edu.) We focused on the subgroup of 61521 white females with 5091 deaths, for whom smoking status is known. The shortest follow-up time is 3 days and the longest follow-up time is 3286 days. The distribution of BMI is right skewed, hence not Normal, whereas the LBMI distribution is approximately Normal. The results of model fitting and nadir estimations are given in Table 3.1.

Table 3.1: Model Comparisons And Nadir Estimations, NHIS White Female
         transformation   fractional polynomial   quadratic    change point   quadratic spline
p(χ²)    —                —                       —            0              0
BIC      100436.4         100421.9                100610.8     100474.9       100432.1
Nadir    23.7             24.2                    28.4         22.2           24.4
C.I.     23.1−24.3        23.7−24.8               24.0−32.7    21.7−22.7      23.2−25.0

This particular dataset shows that the p value of the likelihood ratio test between the quadratic model and the change point model is trivially small, that between the quadratic model and the quadratic spline model is also trivially small, and if the change point model is compared with the quadratic spline model the significant one is the spline model. BIC could also be used as a model selection guide. The advantage of using BIC is that it accounts for the number of parameters appearing in each model. One can improve the likelihood value by including a large number of unknown parameters that need to be estimated; BIC penalizes such excessive utilization of parameters. The comparisons among the simple quadratic model, the change point model and the spline model using BIC as the criterion suggest the quadratic spline model is the best, with the smallest BIC of 100432.1. Hence both the likelihood ratio test and the BIC value indicate the spline model should be selected. We notice the BIC value of the fractional polynomial method is the lowest; this does not indicate the fractional polynomial method is the best, for the reason mentioned before. As for nadir estimation, if we take the nadir, 23.7, calculated from the model with the LBMI transformation as the reference, the fractional polynomial nadir 24.2, the change point nadir 22.2 and the spline nadir 24.4 are close to the reference. The quadratic nadir 28.4 is too far from the reference point. As for confidence intervals, apparently the worst is the quadratic method, since its confidence interval does not cover the reference nadir estimator and is the widest among all the confidence intervals. The change point confidence interval is too narrow, so that it does not cover the reference point 23.7.
The reference nadir falls on the lower end of the fractional polynomial confidence interval. The quadratic spline confidence interval contains the reference nadir and is not too wide. Figure 3.1 and Figure 3.2 are the profile likelihood curves based on the change point model and the spline model, respectively, with the x-axis representing the BMI values and the y-axis representing the log partial likelihood. The profile curves achieve their maxima at the maximum partial likelihood estimators, and the horizontal lines are χ²1,α/2 units lower than the corresponding maximum log partial likelihood values. In other words, the log partial likelihood values that are above the horizontal line correspond to BMIs in the 95% likelihood-based confidence interval if α is taken to be 0.05. For example, in Figure 3.1 the horizontal axis represents BMI; 50 equally spaced points in the range of BMI were taken to be the change point value one by one, and the log partial likelihood was calculated for each change point. The solid curve is the log partial likelihood versus the change point. The 95% confidence interval of the true change point consists of every change point with a log partial likelihood above the horizontal line, that is 21.7−22.7. Similarly, by applying a grid search of the knot and a grid search of the nadir under the quadratic free-knot spline model, we obtained the profile curves of the knot and the nadir in Figure 3.2.

Figure 3.1: Change Point Model Profile Likelihood, NHIS White Female

Figure 3.2: Quadratic Spline Profile Likelihood, NHIS White Female

The knot profile curve is the solid line and the nadir profile curve is the dashed line.
The two ends of the likelihood-based confidence interval of the nadir are given by the two intersection points of the profile likelihood curve and the horizontal line. Actually, the method used to generate the knot profile curve is similar to the process of generating the change point profile curve: 50 equally spaced grid points were selected in the BMI range, and each time the knot was taken to be one grid point, so that the log partial likelihood value under the fixed knot could be obtained using a standard package. Plotting the log partial likelihood values against the corresponding grid points generates the knot profile curve. The nadir profile curve is a bit different. Since we only focus on "quadratic-looking" non-monotonic curves, the zero of the first order derivative of the quadratic spline function is the nadir. For each fixed nadir, using such a relationship, we can express one regression coefficient as a function of the other regression coefficients, the knot and the given nadir. This way the nadir enters the model as a parameter. Now for a fixed nadir a grid search within the BMI range needs to be applied to the knot, since the knot is non-linear in the model. This double grid search utilizing standard software created the nadir profile curve in Figure 3.2. In this example the profile curves of the knot and the nadir under the more general quadratic spline model show that the point estimators of the knot and the nadir are not equal, and their confidence intervals are very different. The nadir is the optimal risk factor value and the knot takes care of the overall model fitting; forcing them to be the same will either affect nadir estimation or sacrifice the overall model fitting. Another problem of the change point method that we notice is its narrow confidence interval compared to the other methods. If the confidence interval is too narrow the coverage probability might be affected.
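Reading a likelihood-based confidence interval off such a profile curve is mechanical: keep every grid value whose profile log partial likelihood lies within χ²1,α/2 of the maximum. A small sketch on a made-up quadratic profile (the names and numbers are illustrative):

```python
CHI2_1_95 = 3.8415   # chi-square (1 df) critical value at alpha = 0.05

def profile_ci(grid, log_pl, crit=CHI2_1_95):
    """Endpoints of {g : log PL(g) >= max log PL - crit/2} over the grid."""
    cutoff = max(log_pl) - crit / 2.0
    inside = [g for g, ll in zip(grid, log_pl) if ll >= cutoff]
    return min(inside), max(inside)
```

The interval endpoints correspond to the intersection of the profile curve with the horizontal cutoff line; a finer grid gives them more precisely.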
The adjusted fitted curves for this NHIS study, using the five different methods, are presented in Figure 3.3. For an easier visual comparison, their vertical locations are adjusted so that each fitted curve, evaluated at its nadir, is equal to zero. In this graph the fitted quadratic curve is very different from all the other curves in that it is almost flat, although all of its coefficients are significant, and its nadir is much higher than the nadirs given by the other methods. This problem is caused by the symmetry of the quadratic curve.

[Figure 3.3: Fitted Curves (Trans, Quad, Frac, Chgpt, Spline), NHIS White Female]

3.1.2 NHIS White Male

The NHIS White Male cohort contains 46264 males, with 4582 deaths, for whom smoking status was available. The shortest follow-up time is 1 day and the longest is 3286 days. According to both the likelihood ratio tests and the BIC values, the spline model is the best. As shown in Table 3.2, the quadratic model again generates an unrealistic nadir estimate; its confidence interval is wider than the other confidence intervals and does not cover the reference point estimator 26.2. The change point model and the quadratic spline model have nadirs that are equally close to the reference point, and both confidence intervals contain the reference nadir 26.2 as a borderline case. The fractional polynomial model generates a nadir that is close to the reference point, and its confidence interval covers the reference nadir. However, this method has its defects and its BIC value is misleading. Figure 3.4 shows the change point profile curve. In Figure 3.5 the two profile likelihood curves are different; in particular, the generated confidence intervals are completely non-overlapping, indicating that the knot and the nadir are not the same. Again, this example shows we should not force the knot and the nadir to be equal.
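The symmetry defect of the quadratic model is easy to reproduce outside the survival setting. As a toy illustration (ordinary least squares on a synthetic asymmetric risk curve, not the Cox fits reported in the text), the quadratic's nadir is pinned to its axis of symmetry −β₁/(2β₂), so a steep left arm combined with a shallow right arm drags the estimated nadir far to the right:

```python
import numpy as np

# Asymmetric "true" log-risk: steep upturn below BMI 22, shallow rise above it.
bmi = np.linspace(15, 45, 300)
risk = np.where(bmi < 22, 0.15 * (22 - bmi), 0.02 * (bmi - 22))

# Quadratic fit: its nadir is forced onto the axis of symmetry -b1/(2*b2).
b2, b1, b0 = np.polyfit(bmi, risk, 2)   # polyfit returns highest degree first
nadir_quad = -b1 / (2 * b2)
nadir_true = bmi[np.argmin(risk)]       # true nadir sits at the kink, BMI 22
```

Here `nadir_quad` lands around 30, far above the true value of 22, mirroring the pattern in Table 3.2 where the quadratic nadir (34.2) is well above the other estimates.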
Figure 3.6 shows the fitted curves; once again the symmetric curve obtained by fitting a quadratic model is very different from the other curves and gives a nadir estimator that is too high.

Table 3.2: Model Comparisons And Nadir Estimations, NHIS White Male

           transformation   fractional polynomial   quadratic    change point   quadratic spline
p(χ²)                                                            0              0
BIC        88299.8          88267.0                 88396.8      88294.2        88261.2
Nadir      26.2             26.5                    34.2         25.5           26.9
C.I.       25.1−27.2        25.8−27.2               31.6−36.9    24.7−26.3      26.2−27.4

[Figure 3.4: Change Point Model Profile Likelihood, NHIS White Male]

[Figure 3.5: Spline Model Profile Likelihood, NHIS White Male]

[Figure 3.6: Fitted Curves (Trans, Quad, Frac, Chgpt, Spline), NHIS White Male]

3.1.3 The Norwegian Counties Study

The Norwegian Counties Study is a population-based survey of counties in Norway. The cohort of 50,000 individuals was examined initially in 1974−78, with follow-up visits in 1978−83 and 1983−88. Information on mortality is complete through 1992 (from the Florida State University Diverse Populations Collaboration website, biostat.stat.fsu.edu). The part of the data that we adopted comprises 24631 males. There are 2434 deaths, with follow-up ranging from 29 days to 6866 days. The lowest BMI is 13.04 and the highest is 60.64. Figure 3.7 shows that as we move the change point to the left the model fit gets better and better, and any point to the left of the data range could serve as the change point. This means the change point model fit the data with a monotonically increasing curve, even though the likelihood ratio test based on BMI and LBMI, the fractional polynomial test, and the change point screening test all agreed that the relationship was non-monotonic.
The confidence interval of the nadir based on LBMI also suggests the existence of an upturn at the low end. Figure 3.8 shows the profile likelihood curves of the spline model. It is clear that the confidence interval of the nadir and that of the knot are completely separated, with a gap between them, indicating that they are different; hence the assumption that they are the same is sometimes inappropriate. The likelihood ratio tests in Table 3.3 suggest the change point model is not significantly different from the quadratic model, which is not surprising because the estimated change point is close to the lowest BMI value, so the change point model almost degenerates to a quadratic model. However, the comparison between the quadratic and the spline model shows the latter is significant, with a p value of 3.70 × 10⁻¹¹. Among the three models other than the transformation and fractional polynomial models, the best according to BIC is the spline model. By comparing estimated nadirs and confidence intervals we see that the quadratic model and the change point model both fit the data with monotonically increasing curves, whereas both the transformation model and the spline model detected the non-monotonicity. In this example the fractional polynomial fitting generated powers −1 and −2 for BMI, hence this model is the same as the transformation model. Figure 3.9 contains fitted curves using these methods. Since the fractional polynomial powers of BMI are −1 and −2, the transformation curve and the fractional polynomial curve coincide.

Table 3.3: Model Comparisons And Nadir Estimations, The Norwegian Counties Study (full sample)

           transformation   fractional polynomial   quadratic    change point   quadratic spline
p(χ²)                                                            0.2            1.2 × 10⁻¹¹
BIC        45862.5          45862.5                 45898.1      45906.2        45870.3
Nadir      22.8             22.8                    12.6         Failed         23.8
C.I.       22.1−23.4        22.1−23.4               0.9−24.2     Failed         23.5−24.4

A closer look at the dataset reveals that it is an extreme value that caused the quadratic and the change point models to fail.
[Figure 3.7: Change Point Model Profile Likelihood, The Norwegian Counties Study (full sample)]

[Figure 3.8: Spline Model Profile Likelihood, The Norwegian Counties Study (full sample)]

[Figure 3.9: Fitted Curves (Trans, Quad, Frac, Chgpt, Spline), The Norwegian Counties Study (full sample)]

After the observation with BMI 60.64 is dropped, all BMI values are less than 50. Figure 3.10 shows that this time the change point profile likelihood curve goes down at the low end, and a valid confidence interval for the nadir can be obtained. Figure 3.11 shows the spline model profile curves. After the extreme BMI value is dropped there is still no intersection between the confidence interval of the knot and that of the nadir. Table 3.4 contains the model comparison and nadir estimation results obtained after the extreme value is dropped. If results based on the transformation model are used as the reference, and the misleading BIC of the fractional polynomial model is set aside, then according to both BIC and likelihood ratio tests the best model is the quadratic spline model. The likelihood ratio p value between the change point model and the spline model is 3.0 × 10⁻⁴ (not shown). The estimated nadirs are around 22 and 23 and the confidence intervals are close. The transformation confidence interval and the spline confidence interval overlap. The fitted curves in Figure 3.12 were obtained after the extreme BMI value was dropped. This time, to the left of the nadir all curves are close except the quadratic model's; to the right of the nadir the change point curve and the quadratic curve are very similar, while the spline and the transformation model curves are close to each other.
[Figure 3.10: Change Point Model Profile Likelihood, The Norwegian Counties Study (1 obs dropped)]

[Figure 3.11: Spline Model Profile Likelihood, The Norwegian Counties Study (1 obs dropped)]

Table 3.4: Model Comparisons And Nadir Estimations, The Norwegian Counties Study (1 obs dropped)

           transformation   fractional polynomial   quadratic    change point   quadratic spline
p(χ²)                                                            3.5 × 10⁻³     2.1 × 10⁻⁵
BIC        45857.8          45854.1                 45869.7      45871.3        45868.3
Nadir      22.8             23.3                    22.9         22.4           23.8
C.I.       22.1−23.4        22.6−23.9               21.2−24.6    21.0−23.3      23.2−24.1

[Figure 3.12: Fitted Curves (Trans, Quad, Frac, Chgpt, Spline), The Norwegian Counties Study (1 obs dropped)]

This example is a special case in which the change point model and the quadratic model fail when there are extreme values in the data. It indicates that the quadratic and change point models might not be as stable as the transformation, fractional polynomial and spline models in the presence of extreme values. The effect of extreme values on nadir estimation will be explored later in the simulation part.

3.1.4 Diverse Populations Collaboration

The Diverse Populations Collaboration is a group of investigators who have pooled data from their studies into a single database in order to examine issues of heterogeneity of results in epidemiological studies. The database available to the collaboration currently includes person-level data from 27 studies providing 395,682 observations. Over 4,500,000 person-years of follow-up are available, documenting 60,374 deaths, 17,708 deaths from CHD and 15,523 deaths from cancer. Baseline information begins in 1950 and continues through 1990.
Data samples include both sexes, and white, black, hispanic, and other ethnic subgroups (from the Florida State University Diverse Populations Collaboration website, biostat.stat.fsu.edu). As mentioned before, four non-monotonicity detection tools were adopted: the two quadratic models, the second order fractional polynomial, and the change point model screening procedure. They were applied to all 78 cohorts in the Diverse Populations Collaboration, and for 31 of the 78 cohorts all four tests agreed that the relationship between BMI and mortality was non-monotonic. Three models, the simple quadratic model with BMI, the change point model and the spline model, were fit to each of these 31 cohorts and likelihood ratio test statistics were calculated. Table 3.5 contains the results of model comparisons based on the 31 cohorts: about one third favor the quadratic model, one third the change point model, and the remaining third the spline model. Table 3.6 gives model comparison results based on BIC values: about half favor the quadratic model, one fourth the change point model, and one fourth the spline model.

Table 3.5: Model Comparisons Using Likelihood Ratio Tests

model              frequency
Quadratic          9
Change Point       10
Quadratic Spline   12

Table 3.6: Model Comparisons Using BIC

model              frequency
Quadratic          15
Change Point       8
Quadratic Spline   8

3.2 Splines With Free Knots

Model (3.2.1), an example of a free-knot polynomial spline, is the quadratic spline with one free knot and a continuous first derivative:

    h(t | xᵢ, zᵢ) = h₀(t) exp[ β₁x + β₂x² + β₃(x − c₁)² 1{x > c₁} + zᵢᵀγ ]    (3.2.1)

The only difference between (3.2.1) and (2.2.6) is that in (3.2.1) c₁ is a parameter rather than a constant. The change point model assumes the knot and the nadir of the function are the same, which the examples in the previous section show is not always true.
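Under the "quadratic-looking" assumption, the nadir of the spline in (3.2.1) is the zero of its first derivative, which lies on one side of the knot or the other. A small sketch of evaluating the spline and locating its nadir follows; the coefficient values are hypothetical, chosen only for illustration, and `nadir` assumes β₂ > 0 and β₂ + β₃ > 0 so that both branches are convex:

```python
import numpy as np

def g(x, b1, b2, b3, knot):
    """One-free-knot quadratic spline g(x) = b1*x + b2*x**2 + b3*(x - knot)_+^2."""
    return b1 * x + b2 * x**2 + b3 * np.clip(x - knot, 0, None) ** 2

def nadir(b1, b2, b3, knot):
    """Zero of g' on each side of the knot; returns whichever root is admissible.
    Assumes b2 > 0 and b2 + b3 > 0 (convex branches)."""
    left = -b1 / (2 * b2)                           # g'(x) = b1 + 2*b2*x for x <= knot
    if left <= knot:
        return left
    return (2 * b3 * knot - b1) / (2 * (b2 + b3))   # g'(x) = b1 + 2*b2*x + 2*b3*(x-knot)

# Hypothetical coefficients (not fitted values from the text):
b1, b2, b3, knot = -2.0, 0.03, 0.05, 28.0
m = nadir(b1, b2, b3, knot)                         # analytic nadir
xs = np.linspace(15, 45, 601)
grid_min = xs[np.argmin(g(xs, b1, b2, b3, knot))]   # grid check of the minimum
```

Note that here the nadir (30.0) and the knot (28.0) differ, which is exactly the situation the free-knot model accommodates and the change point model rules out.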
Forcing the nadir and the knot to be equal causes the confidence interval of the nadir to be influenced by that of the knot, and possibly the point estimator of the nadir by that of the knot as well. On the other hand, pushing the knot to equal the nadir can sacrifice the overall model fit. A natural generalization of the change point model is to separate the two quantities and adopt the spline model with free knots. As a generalization of the change point model, the free-knot spline model can be applied to covariates that are not Normally distributed. Compared to fractional polynomials or splines with fixed knots, it provides more accurate inference if the asymptotic distribution of the maximum partial likelihood estimators (MPLEs) of the model parameters can be established. Therefore in the next chapter we derive the asymptotic distribution of the estimated parameters of (3.2.1).

CHAPTER 4

ASYMPTOTIC PROPERTIES OF THE PROPOSED METHOD

We first derive the asymptotic Normality of the score process, then prove the consistency of the maximum partial likelihood estimator, followed by the asymptotic Normality of the maximum partial likelihood estimator. All lemmas required for the proofs are presented in Section 4.4.

4.1 Asymptotic Normality Of The Score Process

Consider the Cox proportional hazards model in which the hazard rate at time t for an individual with q-variate covariate Z(t) is

    h(t) = h₀(t) exp(g_θ(Z(t))),    (4.1.1)

where θ ∈ Rᵖ, g_θ(z) is twice continuously differentiable with respect to θ with first derivative ġ_θ (a p-column vector) and second derivative g̈_θ (a p × p matrix), and h₀ is a baseline hazard rate. Our candidate for g_θ(Z) is the one-free-knot quadratic spline

    g_θ(z) = β₁z + β₂z² + β₃(z − k)² 1{z > k},

where θ = [β₁, β₂, β₃, k]ᵀ is an unknown column parameter vector, provided z is not close to the knot k. From now on we denote the true unknown parameter by θ₀ = [β₁,₀, β₂,₀, β₃,₀, k₀]ᵀ. Our focus will be on the estimation of θ₀ and its asymptotic behavior.
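Before the formal development, a quick finite-difference sanity check of the gradient of g_θ that the proofs below rely on; the parameter values are arbitrary illustration values, and z is kept away from the knot, as the theory requires:

```python
import numpy as np

def g(theta, z):
    """One-free-knot quadratic spline g_theta(z)."""
    b1, b2, b3, k = theta
    return b1 * z + b2 * z**2 + b3 * max(z - k, 0.0) ** 2

def grad_g(theta, z):
    """Analytic gradient [z, z^2, (z-k)_+^2, -2*b3*(z-k)_+] with respect to theta."""
    b1, b2, b3, k = theta
    ind = 1.0 if z > k else 0.0
    return np.array([z, z**2, (z - k)**2 * ind, -2.0 * b3 * (z - k) * ind])

theta = np.array([-1.5, 0.04, 0.06, 27.0])  # arbitrary (b1, b2, b3, knot)
z = 31.0                                     # away from the knot, so g is smooth here
eps = 1e-6
num = np.array([(g(theta + eps * e, z) - g(theta - eps * e, z)) / (2 * eps)
                for e in np.eye(4)])         # central differences, one per parameter
```

The central differences agree with the analytic gradient to within numerical error, confirming the form of the derivative used throughout the asymptotic arguments.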
The first derivative is the 4-dimensional column vector

    ġ_θ(Z) ≡ (∂g_θ/∂θ)(Z) = [Z, Z², (Z − k)² 1{Z > k}, −2β₃(Z − k) 1{Z > k}]ᵀ.

The second derivative is the 4 × 4 matrix g̈_θ(Z) given by

    (∂²g_θ/∂θ∂θᵀ)(Z) =
        | 0   0   0                      0                    |
        | 0   0   0                      0                    |
        | 0   0   0                      −2(Z − k)1{Z > k}    |
        | 0   0   −2(Z − k)1{Z > k}      2β₃ 1{Z > k}         |    (4.1.2)

Note that ġ_θ is a continuous function of θ, and under the assumption k ≠ Z, g̈_θ(Z) is also a continuous function of θ.

Let T and U be the failure time and censoring time of a person, and let Z be a covariate associated with the person, such as diastolic blood pressure, body mass index (BMI), etc. Suppose the data available are i.i.d. observations (Xᵢ, δᵢ, Zᵢ) for i = 1, ..., n, where Xᵢ ≡ min(Tᵢ, Uᵢ) is the observed time of person i, and δᵢ ≡ 1{Tᵢ ≤ Uᵢ} indicates that the observed time is a death time rather than a censoring time. Let Nᵢ(t) ≡ 1{Xᵢ ≤ t, δᵢ = 1}, which is one when person i dies before or at time t and zero otherwise. Let the at-risk process Yᵢ(t) ≡ 1{Xᵢ ≥ t} denote whether person i is still alive, or at risk, at time t. Throughout we assume the following hold.

MC. (N₁, ..., Nₙ) is a multivariate counting process, from which it follows in particular that no two component processes jump at the same time, i.e., for any t ≥ 0 and i ≠ j,

    P{ΔNᵢ(t) = ΔNⱼ(t) = 1} = 0,    (4.1.3)

where Δξ(t) ≡ ξ(t) − ξ(t−) for a process {ξ(t) : t ≥ 0} that is right-continuous with left-hand limits.

PD. Each of the at-risk processes Yᵢ and covariate processes Zᵢ is predictable with respect to a right-continuous filtration {F_t : t ≥ 0} representing the statistical information accruing over time.

We have the following result.

Proposition 1. In the Cox proportional hazards model, suppose the covariate Z enters through a smooth function g_θ(Z) with continuous first and second derivatives. Suppose that for i = 1, ..., n, the failure time Tᵢ and the censoring time Uᵢ are conditionally independent given the covariate Zᵢ; the covariates Zᵢ are bounded and constant in time; and P{Yᵢ(τ) > 0} > 0.
Then the following hold.

(I) The time τ is such that ∫₀^τ h₀(x)dx < ∞.

(II) Let

    Sₙ⁽⁰⁾(θ, t) ≡ (1/n) Σᵢ₌₁ⁿ Yᵢ(t) exp(g_θ(Zᵢ)),
    Sₙ⁽¹⁾(θ, t) ≡ (1/n) Σᵢ₌₁ⁿ ġ_θ(Zᵢ) Yᵢ(t) exp(g_θ(Zᵢ)),
    Sₙ⁽²⁾(θ, t) ≡ (1/n) Σᵢ₌₁ⁿ ġ_θ(Zᵢ)^⊗2 Yᵢ(t) exp(g_θ(Zᵢ)).

Then for any compact neighborhood Θ of θ₀ there exist on Θ × [0, τ] a scalar s⁽⁰⁾, a vector s⁽¹⁾ and a matrix s⁽²⁾ such that for j = 0, 1, 2,

    sup_{x ∈ [0,τ], θ ∈ Θ} ||Sₙ⁽ʲ⁾(θ, x) − s⁽ʲ⁾(θ, x)|| → 0 in probability as n → ∞,

where ġ_θ(Zᵢ)^⊗2 ≡ ġ_θ(Zᵢ) ġ_θ(Zᵢ)ᵀ and ||M|| ≡ max{|Mᵢⱼ| : ∀ i, j} is a norm of the matrix M.

(III) Using the definitions of Θ and s⁽ʲ⁾, j = 0, 1, 2, given above, define e ≡ s⁽¹⁾/s⁽⁰⁾ and v ≡ s⁽²⁾/s⁽⁰⁾ − e^⊗2. Then for any θ ∈ Θ and x ∈ [0, τ],

    (∂/∂θ) s⁽⁰⁾(θ, x) = s⁽¹⁾(θ, x),
    (∂/∂θ) s⁽¹⁾(θ, x) = s⁽²⁾(θ, x) + E[g̈_θ(Z) Y(x) exp(g_θ(Z))].

(IV) For j = 0, 1, 2, the functions s⁽ʲ⁾(θ, x) are bounded; the function families {s⁽ʲ⁾(·, x) : x ∈ [0, τ]} are equicontinuous at θ = θ₀; and s⁽⁰⁾(θ, x) is bounded away from zero on Θ × [0, τ].

Proof: Since Yᵢ(τ) = 1{Xᵢ ≥ τ} = 1{Tᵢ ≥ τ, Uᵢ ≥ τ}, it follows that P{Yᵢ(τ) > 0} = P{Tᵢ ≥ τ, Uᵢ ≥ τ}. By first taking the conditional expectation given Zᵢ, then taking the expectation over Zᵢ, and in view of the conditional independence assumption on Tᵢ and Uᵢ, we further have

    P{Yᵢ(τ) > 0} = P{Tᵢ ≥ τ, Uᵢ ≥ τ} = E( P{Tᵢ ≥ τ | Zᵢ} P{Uᵢ ≥ τ | Zᵢ} ),

where P{Tᵢ ≥ τ | Zᵢ} = exp( −exp(g_θ(Zᵢ)) ∫₀^τ h₀(x)dx ). Hence the condition P{Yᵢ(τ) > 0} > 0 implies that neither the non-negative random variable P{Tᵢ ≥ τ | Zᵢ} nor P{Uᵢ ≥ τ | Zᵢ} is zero almost surely, thus P{ exp[−exp(g_θ(Zᵢ)) ∫₀^τ h₀(x)dx] > 0 } > 0. Therefore the desired result (I), ∫₀^τ h₀(x)dx < ∞, follows.

We now show (II). By the Strong Law of Large Numbers, Sₙ⁽⁰⁾(θ, t) → E(Y(t) exp(g_θ(Z))) almost surely for each fixed (θ, t). Next we show this pointwise convergence is uniform over Θ × [0, τ], except on a set of measure zero, for some compact neighborhood Θ of θ₀.
The same argument applies to Sₙ⁽¹⁾(θ, t) and Sₙ⁽²⁾(θ, t). Since

    s⁽⁰⁾(θ, t) = E[Yᵢ(t) exp(g_θ(Zᵢ))] = E[exp(g_θ(Zᵢ)) E(Yᵢ(t) | Zᵢ)]
               = E[ exp(g_θ(Zᵢ)) exp{ −exp(g_θ(Zᵢ)) ∫₀^t h₀(x)dx } P(Uᵢ ≥ t | Zᵢ) ],

and P(Uᵢ ≥ t | Zᵢ) may be a discontinuous function of t, it follows that s⁽⁰⁾(θ, t) is not necessarily continuous in t. Following the idea used in the proof of the Glivenko-Cantelli Theorem 5.5.1 [36], we prove the uniform convergence of Sₙ⁽⁰⁾(θ, t) to s⁽⁰⁾(θ, t) over t ∈ [0, τ], i.e.,

    ξₙ(θ) ≡ sup_{x ∈ [0,τ]} ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)|| → 0, a.s., as n → ∞.

The details are given in Lemma 1. For the uniform convergence in θ, we need to show sup_{θ ∈ Θ} ξₙ(θ) → 0 on a set with probability one. Suppose the contrary: there exist ε > 0 and sequences {n_k : k = 1, 2, ...} and {θ_k} such that ξ_{n_k}(θ_k) ≥ ε for all k. Since Θ is a compact neighborhood of θ₀, there exists a convergent subsequence, still denoted θ_k without loss of generality, such that θ_k → θ ∈ Θ. A contradiction can then be derived as follows:

    ε ≤ ξ_{n_k}(θ_k) = sup_{x ∈ [0,τ]} ||S_{n_k}⁽⁰⁾(θ_k, x) − s⁽⁰⁾(θ_k, x)||
      ≤ sup_{x ∈ [0,τ]} ||S_{n_k}⁽⁰⁾(θ_k, x) − S_{n_k}⁽⁰⁾(θ, x)||
        + sup_{x ∈ [0,τ]} ||s⁽⁰⁾(θ_k, x) − s⁽⁰⁾(θ, x)||
        + sup_{x ∈ [0,τ]} ||S_{n_k}⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)||.

The last term tends to zero as k → ∞ by the uniform convergence in t already established for fixed θ. By the defining expression of Sₙ⁽⁰⁾(θ, x), the first term can be written as

    sup_{x ∈ [0,τ]} ||S_{n_k}⁽⁰⁾(θ_k, x) − S_{n_k}⁽⁰⁾(θ, x)||
    = sup_{x ∈ [0,τ]} || (1/n_k) Σᵢ₌₁^{n_k} Yᵢ(x) [exp(g_{θ_k}(Zᵢ)) − exp(g_θ(Zᵢ))] ||
    ≤ (1/n_k) Σᵢ₌₁^{n_k} sup_{x ∈ [0,τ]} || Yᵢ(x) [exp(g_{θ_k}(Zᵢ)) − exp(g_θ(Zᵢ))] ||
    ≤ (1/n_k) Σᵢ₌₁^{n_k} || exp(g_{θ_k}(Zᵢ)) − exp(g_θ(Zᵢ)) ||
    = (1/n_k) Σᵢ₌₁^{n_k} | exp(g_{θ*ᵢ,k}(Zᵢ)) ġ_{θ*ᵢ,k}(Zᵢ)ᵀ (θ_k − θ) |
    ≤ B₁ q (1/n_k) Σᵢ₌₁^{n_k} ||ġ_{θ*ᵢ,k}(Zᵢ)ᵀ|| · ||θ_k − θ||
    ≤ B₁ B₂ q ||θ_k − θ||,
where θ*ᵢ,k ∈ Θ is a point on the line segment between θ and θ_k (the subscripts i, k signify dependence on Zᵢ and θ_k), B₁ is a bound for |exp(g_θ(Zᵢ))| on Θ, B₂ is a bound for ||ġ_θ(Zᵢ)ᵀ|| on Θ, and q is the dimension of ġ_{θ*ᵢ,k}(Zᵢ). Hence when ||θ_k − θ|| → 0 the first term converges to zero as well. For the second term we have

    sup_{x ∈ [0,τ]} ||s⁽⁰⁾(θ_k, x) − s⁽⁰⁾(θ, x)||
    = sup_{x ∈ [0,τ]} || E{ Y(x) [exp(g_{θ_k}(Z)) − exp(g_θ(Z))] } ||
    ≤ E sup_{x ∈ [0,τ]} || Y(x) [exp(g_{θ_k}(Z)) − exp(g_θ(Z))] ||,

and it converges to zero by similar arguments. Therefore the contradiction 0 < ε ≤ 0 is reached, and we conclude

    sup_{x ∈ [0,τ], θ ∈ Θ} ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)|| → 0, a.s.    (4.1.4)

We can prove (III) by applying a classical result about the interchange of differentiation and expectation (see, e.g., Theorem 16.8 [37]) on a compact neighborhood Θ of θ₀, in view of the boundedness assumption on Zᵢ. In fact,

    (∂/∂θ) s⁽⁰⁾(θ, x) = (∂/∂θ) E[Y(x) exp(g_θ(Z))],

and the derivative of Y(x) exp(g_θ(Z)) with respect to θ is Y(x) ġ_θ(Z) exp(g_θ(Z)). By the boundedness of Z and compactness of Θ, this derivative is bounded by a constant independent of θ. The assumptions of Theorem 16.8 [37] are now satisfied, hence

    (∂/∂θ) s⁽⁰⁾(θ, x) = E (∂/∂θ) [Y(x) exp(g_θ(Z))] = E[Y(x) ġ_θ(Z) exp(g_θ(Z))] = s⁽¹⁾(θ, x).

To prove

    (∂/∂θ) s⁽¹⁾(θ, x) = s⁽²⁾(θ, x) + E[g̈_θ(Z) Y(x) exp(g_θ(Z))]

we use exactly the same idea. The derivative of Y(x) ġ_θ(Z) exp(g_θ(Z)) with respect to θ is Y(x) [ġ_θ(Z)^⊗2 + g̈_θ(Z)] exp(g_θ(Z)). Now g̈_θ(Z) is bounded by a constant, owing to the boundedness of Z and compactness of Θ, so this derivative is bounded by a constant as well. Again Theorem 16.8 [37] gives

    (∂/∂θ) s⁽¹⁾(θ, x) = E (∂/∂θ) [Y(x) ġ_θ(Z) exp(g_θ(Z))]
                      = E[Y(x) (ġ_θ(Z)^⊗2 + g̈_θ(Z)) exp(g_θ(Z))]
                      = s⁽²⁾(θ, x) + E[Y(x) g̈_θ(Z) exp(g_θ(Z))].

We are left with the last result, (IV).
Due to the boundedness of Z₁ and compactness of Θ, all s⁽ʲ⁾(θ, t), j = 0, 1, 2, are bounded on Θ × [0, τ]. Since Θ is compact and Z is bounded, there is a finite constant B such that |g_θ(Z)| ≤ B for all θ ∈ Θ and all Z. Hence for all points in Θ × [0, τ],

    s⁽⁰⁾(θ, t) = E[Y₁(t) exp(g_θ(Z₁))] ≥ exp(−B) E[Y₁(t)] ≥ exp(−B) E[Y₁(τ)] = exp(−B) P{Y₁(τ) > 0} > 0,

where the last inequality is due to the assumption P{Y₁(τ) > 0} > 0. Finally, we show the function families {s⁽ʲ⁾(·, x) : x ∈ [0, τ]} are equicontinuous at θ = θ₀ for j = 0, 1, 2. We demonstrate this with s⁽⁰⁾(·, x) as an example. Consider sup_{t ∈ [0,τ]} ||s⁽⁰⁾(θ_m, t) − s⁽⁰⁾(θ₀, t)|| for ||θ_m − θ₀|| → 0 as m → ∞:

    sup_{t ∈ [0,τ]} ||s⁽⁰⁾(θ_m, t) − s⁽⁰⁾(θ₀, t)||
    = sup_{t ∈ [0,τ]} || E[ Y(t) ( e^{g_{θ_m}(Z₁)} − e^{g_{θ₀}(Z₁)} ) ] ||
    ≤ sup_{t ∈ [0,τ]} E[ || e^{g_{θ_m}(Z₁)} − e^{g_{θ₀}(Z₁)} || · E(Y₁(t) | Z₁) ]
    ≤ E || e^{g_{θ_m}(Z₁)} − e^{g_{θ₀}(Z₁)} ||
    = E || ġ_{θ*}(Z₁)ᵀ e^{g_{θ*}(Z₁)} (θ_m − θ₀) ||
    ≤ q E || ġ_{θ*}(Z₁) e^{g_{θ*}(Z₁)} || · ||θ_m − θ₀||,

where ġ_{θ*}(Z₁) is the derivative of g_θ(Z₁) with respect to θ evaluated at some θ* ∈ Θ, and q is the dimension of ġ_{θ*}(Z₁). Letting m go to infinity, the last member of the above chain tends to zero by the boundedness of Zᵢ and compactness of Θ. Therefore {s⁽⁰⁾(·, x) : x ∈ [0, τ]} is a family of functions equicontinuous at θ = θ₀. This completes the proof.

The covariate vector Z = [z₁, z₂, ..., z_q]ᵀ is q-dimensional. In our case we focus on a single main risk factor, thus q = 1. The proportional hazards model with free-knot spline link function g_θ(Z) assumes the hazard function satisfies h(t) = h₀(t) exp[g_θ(Z)] with baseline hazard rate h₀. The model g_θ(Z) is taken to be a quadratic spline with one free knot, g_θ(Z) = β₁Z + β₂Z² + β₃(Z − k)²₊, where k is the knot parameter and (Z − k)²₊ denotes (Z − k)² 1{Z > k}. The column parameter vector θ = [β₁, β₂, β₃, k]ᵀ will be estimated by the maximum partial likelihood (MPL) method.
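The MPL objects about to be defined can be computed directly. The sketch below (simulated data; all helper names are ours) evaluates the Cox log partial likelihood for the spline covariate structure at a fixed knot and verifies that the score over (β₁, β₂, β₃), written in risk-set-average form with the sums Sₙ⁽⁰⁾ and Sₙ⁽¹⁾, matches the numerical gradient of the log partial likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
z = rng.uniform(18, 38, n)                     # covariate (e.g., BMI-like values)
t = rng.exponential(1.0, n)                    # continuous times, so no tied deaths
d = rng.integers(0, 2, n).astype(float)        # event indicators

def feats(z, k):
    """Spline design columns [z, z^2, (z - k)_+^2] at a fixed knot k."""
    return np.column_stack([z, z**2, np.clip(z - k, 0, None)**2])

def loglik(beta, k):
    eta = feats(z, k) @ beta
    # S_n^(0)(theta, t_i): average of exp(eta_j) over the risk set {t_j >= t_i}
    S0 = np.array([np.exp(eta[t >= ti]).sum() for ti in t])
    return np.sum(d * (eta - np.log(S0)))

def score_beta(beta, k):
    """Score over (b1, b2, b3): sum_i d_i * (x_i - S^(1)/S^(0) at t_i)."""
    X = feats(z, k)
    eta = X @ beta
    U = np.zeros(3)
    for i in range(n):
        R = t >= t[i]                          # risk set at the i-th observed time
        w = np.exp(eta[R])
        U += d[i] * (X[i] - (w @ X[R]) / w.sum())
    return U

beta, k = np.array([-0.5, 0.01, 0.02]), 27.0   # arbitrary illustration values
eps = 1e-6
num = np.array([(loglik(beta + eps * e, k) - loglik(beta - eps * e, k)) / (2 * eps)
                for e in np.eye(3)])
```

The knot itself is held fixed here; profiling over k, as in Section 3.1, handles the non-linear parameter.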
The partial likelihood function is

    PL(θ) = Π_{i=1}^{L} exp[g_θ(Z_(i))] / Σ_{j ∈ Rᵢ} exp[g_θ(Zⱼ)],

where T₁⁰ < T₂⁰ < ... < T_L⁰ are the ordered distinct death times, (i) is the label of the single individual who dies at Tᵢ⁰, and Rᵢ is the risk set of persons still alive at Tᵢ⁰, including (i). The log partial likelihood function can be expressed in martingale notation,

    l(θ) = log PL(θ) = Σᵢ₌₁ⁿ ∫₀^∞ [ g_θ(Zᵢ) − log( Σⱼ₌₁ⁿ Yⱼ(t) exp(g_θ(Zⱼ)) ) ] dNᵢ(t).

The score function U(θ) is

    U(θ) = ∂l(θ)/∂θ = Σᵢ₌₁ⁿ ∫₀^∞ [ ġ_θ(Zᵢ) − ( Σⱼ₌₁ⁿ Yⱼ(t) ġ_θ(Zⱼ) exp(g_θ(Zⱼ)) ) / ( Σⱼ₌₁ⁿ Yⱼ(t) exp(g_θ(Zⱼ)) ) ] dNᵢ(t),

where ġ_θ = ∂g_θ/∂θ = [Z, Z², (Z − k)²₊, −2β₃(Z − k)₊]ᵀ is a column vector. Let Aᵢ(t) be the compensator of the counting process Nᵢ(t); in other words, Mᵢ(t) = Nᵢ(t) − Aᵢ(t) is a martingale. Under the Cox proportional hazards model dAᵢ(t) = Yᵢ(t) exp(g_{θ₀}(Zᵢ)) h₀(t)dt, hence

    U(θ₀) = Σᵢ₌₁ⁿ ∫₀^∞ [ ġ_{θ₀}(Zᵢ) − ( Σⱼ Yⱼ(t) ġ_{θ₀}(Zⱼ) exp(g_{θ₀}(Zⱼ)) ) / ( Σⱼ Yⱼ(t) exp(g_{θ₀}(Zⱼ)) ) ] dMᵢ(t)
          + Σᵢ₌₁ⁿ ∫₀^∞ [ ġ_{θ₀}(Zᵢ) − ( Σⱼ Yⱼ(t) ġ_{θ₀}(Zⱼ) exp(g_{θ₀}(Zⱼ)) ) / ( Σⱼ Yⱼ(t) exp(g_{θ₀}(Zⱼ)) ) ] dAᵢ(t),

and the second term equals

    ∫₀^∞ n Sₙ⁽¹⁾(θ₀, t) h₀(t)dt − ∫₀^∞ ( Sₙ⁽¹⁾(θ₀, t) / Sₙ⁽⁰⁾(θ₀, t) ) n Sₙ⁽⁰⁾(θ₀, t) h₀(t)dt = 0.

Therefore

    U(θ₀) = Σᵢ₌₁ⁿ ∫₀^∞ [ ġ_{θ₀}(Zᵢ) − ( Σⱼ Yⱼ(t) ġ_{θ₀}(Zⱼ) exp(g_{θ₀}(Zⱼ)) ) / ( Σⱼ Yⱼ(t) exp(g_{θ₀}(Zⱼ)) ) ] dMᵢ(t)

is a martingale. Now let the score process be

    U(θ₀, t) = Σᵢ₌₁ⁿ ∫₀^t [ ġ_{θ₀}(Zᵢ) − ( Σⱼ Yⱼ(x) ġ_{θ₀}(Zⱼ) exp(g_{θ₀}(Zⱼ)) ) / ( Σⱼ Yⱼ(x) exp(g_{θ₀}(Zⱼ)) ) ] dMᵢ(x).

Based on the above proposition, we have the asymptotic Normality of the score process.

Theorem 1. (Asymptotic Normality of the score process) Consider the Cox proportional hazards model.
Suppose that the covariate structure of the main risk factor Zᵢ is expressed through the free-knot spline function g_θ(Zᵢ), with Zᵢ bounded and constant in time; P{Yᵢ(τ) > 0} > 0; k ≠ Zᵢ for all i; there is a neighborhood around the true knot k₀ in which no Zᵢ falls; and Σ(θ₀, τ) ≡ ∫₀^τ v(θ₀, x) s⁽⁰⁾(θ₀, x) h₀(x)dx is positive definite. Then, denoting the score process by U(θ₀, t), t ∈ [0, τ], the following hold.

(a) n^{−1/2} U(θ₀, t) converges in distribution to a Gaussian process whose components have independent increments; the mean of the limiting process is zero and its covariance matrix at time t is

    Σ(θ₀, t) = ∫₀^t v(θ₀, x) s⁽⁰⁾(θ₀, x) h₀(x)dx.

(b) If θ̂ₙ is a consistent estimator of θ₀, then the plug-in estimator of Σ(θ₀, t), Σ̂(θ₀, t) = (1/n) Σᵢ₌₁ⁿ ∫₀^t Vₙ(θ̂ₙ, x) dNᵢ(x), satisfies

    sup_{t ∈ [0,τ]} || (1/n) Σᵢ₌₁ⁿ ∫₀^t Vₙ(θ̂ₙ, x) dNᵢ(x) − Σ(θ₀, t) || → 0

in probability as n tends to infinity. Furthermore, 1/n times the observed information matrix, −(1/n) ∂²l(θ, t)/∂θ∂θᵀ, evaluated at θ̂ₙ, is a consistent estimator of Σ(θ₀, t) for all t ∈ [0, τ].

Proof: We first write

    U(θ₀, t) = Σᵢ₌₁ⁿ ∫₀^t [ ġ_{θ₀}(Zᵢ) − Σⱼ₌₁ⁿ ġ_{θ₀}(Zⱼ) pⱼ(θ₀, x) ] dMᵢ(x),

where

    pⱼ(θ₀, x) ≡ Yⱼ(x) exp(g_{θ₀}(Zⱼ)) / Σ_{k=1}^{n} Y_k(x) exp(g_{θ₀}(Z_k))

can be viewed as the probability that, at time point x when θ = θ₀, index j is selected from an urn containing all n indices. Defined this way, the selected index I is a random variable and ġ_{θ₀}(Z_I), as a vector function of I, is a random vector. The expectation of ġ_{θ₀}(Z_I) equals Σⱼ₌₁ⁿ ġ_{θ₀}(Zⱼ) pⱼ(θ₀, x); to stress the dependence on θ and x we write E^I_{θ,x} for the expectation calculated under the urn model with probabilities pᵢ(θ, x), i = 1, ..., n. Therefore

    U(θ₀, t) = Σᵢ₌₁ⁿ ∫₀^t [ ġ_{θ₀}(Zᵢ) − E^I_{θ₀,x}(ġ_{θ₀}(Z_I)) ] dMᵢ(x) = Σᵢ₌₁ⁿ ∫₀^t Hᵢ(θ₀, x) dMᵢ(x),

and the vector Hᵢ(θ₀, x) = ġ_{θ₀}(Zᵢ) − E^I_{θ₀,x}(ġ_{θ₀}(Z_I)) is bounded and predictable.
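The urn-model reading of pⱼ(θ, x) gives a concrete identity that the next step uses: Vₙ is exactly the variance-covariance matrix of ġ_θ(Z_I) when the index I is drawn with probabilities pⱼ. A numerical check with arbitrary stand-in values (random vectors in place of ġ_θ(Zⱼ) and random positive weights in place of Yⱼ(x) exp(g_θ(Zⱼ))):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
gdot = rng.normal(size=(n, 4))     # stand-in for g_dot_theta(Z_j), one 4-vector per j
w = rng.uniform(0.2, 1.0, n)       # stand-in for Y_j(x) * exp(g_theta(Z_j)) > 0

# S_n^(0), S_n^(1), S_n^(2) as sample averages, as in Proposition 1
S0 = w.mean()
S1 = (w[:, None] * gdot).mean(axis=0)
S2 = (w[:, None, None] * np.einsum('ij,ik->ijk', gdot, gdot)).mean(axis=0)
Vn = S2 / S0 - np.outer(S1 / S0, S1 / S0)

# Urn model: P(I = j) = p_j; Vn should equal Cov(g_dot(Z_I)) under these weights.
p = w / w.sum()
mean_urn = p @ gdot
c = gdot - mean_urn
cov_urn = (p[:, None, None] * np.einsum('ij,ik->ijk', c, c)).sum(axis=0)
```

Since Vₙ is a genuine covariance matrix at each (θ, x), it is automatically symmetric and positive semi-definite, which is what makes the limiting Σ(θ₀, t) a valid covariance function.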
We now use the Martingale Central Limit Theorem (see, e.g., Theorem 5.3.5 [38]) to prove (a), the asymptotic Normality of U⁽ⁿ⁾(θ₀, t) ≡ n^{−1/2} U(θ₀, t). Under our assumptions we need to show that the bracket process

    ⟨U⁽ⁿ⁾(θ₀, ·)⟩(t) = (1/n) Σᵢ₌₁ⁿ ∫₀^t Hᵢ^⊗2(θ₀, x) dAᵢ(x)

converges in probability, as n → ∞, to a limiting matrix that is a function of t. We also need to show that if, based on the l-th element of U(θ₀, t) and any ε > 0, we define

    U⁽ⁿ⁾_{l,ε}(θ₀, t) ≡ Σᵢ₌₁ⁿ ∫₀^t n^{−1/2} Hᵢ,ₗ(θ₀, x) 1{|n^{−1/2} Hᵢ,ₗ(θ₀, x)| ≥ ε} dMᵢ(x),

then

    ⟨U⁽ⁿ⁾_{l,ε}(θ₀, ·)⟩(t) = (1/n) Σᵢ₌₁ⁿ ∫₀^t Hᵢ,ₗ²(θ₀, x) 1{|n^{−1/2} Hᵢ,ₗ(θ₀, x)| ≥ ε} dAᵢ(x)

converges in probability to 0. Notice that

    ⟨U⁽ⁿ⁾(θ₀, ·)⟩(t) = (1/n) Σᵢ₌₁ⁿ ∫₀^t Hᵢ^⊗2(θ₀, x) dAᵢ(x)
    = (1/n) Σᵢ₌₁ⁿ ∫₀^t [ ġ_{θ₀}(Zᵢ) − E^I(θ₀, x) ]^⊗2 dAᵢ(x)
    = (1/n) ∫₀^t [ Sₙ⁽²⁾(θ₀, x)/Sₙ⁽⁰⁾(θ₀, x) − ( Sₙ⁽¹⁾(θ₀, x)/Sₙ⁽⁰⁾(θ₀, x) )^⊗2 ] dĀ(x)
    = (1/n) ∫₀^t [ E^I_{θ₀,x}(ġ_{θ₀}(Z_I)^⊗2) − E^I(θ₀, x)^⊗2 ] dĀ(x)
    = (1/n) ∫₀^t Vₙ(θ₀, x) dĀ(x)
    = ∫₀^t Vₙ(θ₀, x) Sₙ⁽⁰⁾(θ₀, x) h₀(x)dx.

Here

    Vₙ(θ₀, x) ≡ E^I_{θ₀,x}(ġ_{θ₀}(Z_I)^⊗2) − E^I(θ₀, x)^⊗2
              = Sₙ⁽²⁾(θ₀, x)/Sₙ⁽⁰⁾(θ₀, x) − [ Sₙ⁽¹⁾(θ₀, x)/Sₙ⁽⁰⁾(θ₀, x) ]^⊗2,    Ā(x) ≡ Σᵢ₌₁ⁿ Aᵢ(x).

In this notation, E^I_{θ₀,x}(ġ_{θ₀}(Z_I)^⊗2) is the urn-model expectation of the random matrix ġ_{θ₀}(Z_I)^⊗2, hence Vₙ(θ₀, x) is the urn-model variance-covariance matrix of ġ_{θ₀}(Z_I) at time point x when θ = θ₀.

By the boundedness of Sₙ⁽ʲ⁾(θ, x), j = 0, 1, 2, on Θ × [0, τ], and since s⁽⁰⁾(θ, x) is bounded away from zero on Θ × [0, τ] (results (IV) and (II) of Proposition 1), it can be shown that sup_{x ∈ [0,τ], θ ∈ Θ} ||Vₙ(θ, x) − v(θ, x)|| → 0 in probability as n tends to infinity. In fact, writing sup_{x,θ} for the supremum over x ∈ [0, τ] and θ ∈ Θ,

    sup_{x,θ} ||Vₙ(θ, x) − v(θ, x)||
    ≤ sup_{x,θ} || Sₙ⁽²⁾(θ, x)/Sₙ⁽⁰⁾(θ, x) − s⁽²⁾(θ, x)/s⁽⁰⁾(θ, x) ||
      + sup_{x,θ} || ( Sₙ⁽¹⁾(θ, x)/Sₙ⁽⁰⁾(θ, x) )^⊗2 − ( s⁽¹⁾(θ, x)/s⁽⁰⁾(θ, x) )^⊗2 ||.    (4.1.5)
For the first term we have, with sup_{x,θ} denoting the supremum over x ∈ [0, τ] and θ ∈ Θ,

    sup_{x,θ} || Sₙ⁽²⁾(θ, x)/Sₙ⁽⁰⁾(θ, x) − s⁽²⁾(θ, x)/s⁽⁰⁾(θ, x) ||
    ≤ sup_{x,θ} || ( Sₙ⁽²⁾(θ, x) − s⁽²⁾(θ, x) ) / Sₙ⁽⁰⁾(θ, x) ||
      + sup_{x,θ} || s⁽²⁾(θ, x) / ( s⁽⁰⁾(θ, x) Sₙ⁽⁰⁾(θ, x) ) || · sup_{x,θ} || Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x) ||,    (4.1.6)

where ||Sₙ⁽⁰⁾(θ, x)|| in the denominators can be bounded below as

    ||Sₙ⁽⁰⁾(θ, x)|| = ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x) + s⁽⁰⁾(θ, x)||
    ≥ ||s⁽⁰⁾(θ, x)|| − ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)||
    ≥ inf_{x,θ} ||s⁽⁰⁾(θ, x)|| − sup_{x,θ} ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)||.

Since s⁽⁰⁾(θ, x) is bounded away from zero on Θ × [0, τ], we have inf_{x,θ} ||s⁽⁰⁾(θ, x)|| ≥ η for some constant η > 0. From

    sup_{x,θ} ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)|| → 0 in probability,

it follows that for sufficiently large n, sup_{x,θ} ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)|| ≤ (1/2) inf_{x,θ} ||s⁽⁰⁾(θ, x)|| on an event with probability tending to one; hence ||Sₙ⁽⁰⁾(θ, x)|| ≥ η/2 on an event with probability tending to one. Therefore, on that event,

    sup_{x,θ} || Sₙ⁽²⁾(θ, x)/Sₙ⁽⁰⁾(θ, x) − s⁽²⁾(θ, x)/s⁽⁰⁾(θ, x) ||
    ≤ (2/η) sup_{x,θ} ||Sₙ⁽²⁾(θ, x) − s⁽²⁾(θ, x)|| + (2/η²) sup_{x,θ} ||s⁽²⁾(θ, x)|| sup_{x,θ} ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)||.

By the boundedness of s⁽²⁾(θ, x) on [0, τ] × Θ, and since s⁽⁰⁾(θ, x) is bounded away from zero on [0, τ] × Θ, both inf_{x,θ} ||s⁽⁰⁾(θ, x)|| and sup_{x,θ} ||s⁽²⁾(θ, x)|| are positive and finite.
From the results in Proposition 1, both sup_{x,θ} ||Sₙ⁽²⁾(θ, x) − s⁽²⁾(θ, x)|| and sup_{x,θ} ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)|| converge to zero in probability (suprema over x ∈ [0, τ] and θ ∈ Θ), so the first term in (4.1.5) goes to zero in probability. To show that the second term of (4.1.5) is negligible, first note that for vectors A and B,

    ||A^⊗2 − B^⊗2|| = ||AAᵀ − BBᵀ|| = ||(A − B)Aᵀ + B(A − B)ᵀ||
    ≤ ||(A − B)Aᵀ|| + ||B(A − B)ᵀ|| ≤ ||A − B|| · ||A|| + ||B|| · ||A − B||
    = ||A − B|| (||A|| + ||B||) ≤ ||A − B|| (||A − B|| + 2||B||).

Let the vector A be Sₙ⁽¹⁾(θ, x)/Sₙ⁽⁰⁾(θ, x) and B be s⁽¹⁾(θ, x)/s⁽⁰⁾(θ, x). The negligibility of sup_{x,θ} ||A − B|| then follows by the same argument used for the first term of (4.1.5), so we only need to show that 2 sup_{x,θ} ||B|| is bounded. This is easily seen from the boundedness of s⁽¹⁾(θ, x) and the fact that s⁽⁰⁾(θ, x) is bounded away from zero. Hence both terms in (4.1.5) converge to zero in probability and

    sup_{x ∈ [0,τ], θ ∈ Θ} ||Vₙ(θ, x) − v(θ, x)|| → 0 in probability.

Together with the conclusion sup_{x,θ} ||Sₙ⁽⁰⁾(θ, x) − s⁽⁰⁾(θ, x)|| → 0 in probability from (II), it can be shown that

    sup_{x ∈ [0,τ], θ ∈ Θ} ||Vₙ(θ, x) Sₙ⁽⁰⁾(θ, x) − v(θ, x) s⁽⁰⁾(θ, x)|| → 0 in probability;

therefore, with ∫₀^t h₀(x)dx < ∞ from Proposition 1,

    ⟨U⁽ⁿ⁾(θ₀, ·)⟩(t) = ∫₀^t Vₙ(θ₀, x) Sₙ⁽⁰⁾(θ₀, x) h₀(x)dx → ∫₀^t v(θ₀, x) s⁽⁰⁾(θ₀, x) h₀(x)dx in probability.

Next, we show the second condition required by the Martingale Central Limit Theorem is satisfied. Consider

    ⟨U⁽ⁿ⁾_{l,ε}(θ₀, ·)⟩(t) = (1/n) Σᵢ₌₁ⁿ ∫₀^t Hᵢ,ₗ²(θ₀, x) 1{|n^{−1/2} Hᵢ,ₗ(θ₀, x)| ≥ ε} dAᵢ(x)
                           ≤ (1/n) Σᵢ₌₁ⁿ ∫₀^t (2B)² 1{n^{−1/2}(2B) ≥ ε} dAᵢ(x),

where B in the last expression is a bound for ġ_{θ₀}(Z) on Θ × [0, τ], which exists by the boundedness of Zᵢ and compactness of Θ. The indicator in the integrand is zero once n is large enough, therefore ⟨U⁽ⁿ⁾_{l,ε}(θ₀, ·)⟩(t) → 0 in probability as n → ∞. Now, by the Martingale Central Limit Theorem, see, e.g.
(Theorem 5.3.5 [38]), the score process $\{n^{-1/2}U(\theta_0,t): t\in[0,\tau]\}$ converges in distribution to a mean-zero Gaussian process whose components have independent increments. The variance-covariance matrix of the limiting process at time $t$ is the limit of $\langle n^{-1/2}U(\theta_0,\cdot)\rangle(t)$,

\[
\Sigma(\theta_0,t) = \int_0^t v(\theta_0,x)s^{(0)}(\theta_0,x)h_0(x)\,dx.
\]

This finishes the proof of part (a) of the theorem.

To prove (b), we first have

\[
\langle U^{(n)}(\theta_0,\cdot)\rangle(t) = \frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar A(x)
= \frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar N(x) - \frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar M(x)
\xrightarrow{P} \Sigma(\theta_0,t),
\]

where $\bar N(x) \equiv \sum_{i=1}^n N_i(x)$ and $\bar M(x) \equiv \sum_{i=1}^n M_i(x)$. Since $\frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar M(x)$ is a mean-zero martingale, $\frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar N(x)$ is a reasonable estimator of $\Sigma(\theta_0,t)$. The plug-in estimator satisfies

\[
\Big\|\frac{1}{n}\int_0^t V_n(\hat\theta_n,x)\,d\bar N(x) - \int_0^t v(\theta_0,x)s^{(0)}(\theta_0,x)h_0(x)\,dx\Big\|
\le \Big\|\frac{1}{n}\int_0^t \big[V_n(\hat\theta_n,x)-v(\hat\theta_n,x)\big]\,d\bar N(x)\Big\|
\]
\[
+ \Big\|\frac{1}{n}\int_0^t \big[v(\hat\theta_n,x)-v(\theta_0,x)\big]\,d\bar N(x)\Big\|
+ \Big\|\int_0^t v(\theta_0,x)\,\frac{1}{n}\Big[d\bar N(x) - \sum_{i=1}^n Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx\Big]\Big\|
\]
\[
+ \Big\|\int_0^t v(\theta_0,x)\big[S_n^{(0)}(\theta_0,x)-s^{(0)}(\theta_0,x)\big]h_0(x)\,dx\Big\|. \tag{4.1.7}
\]

Applying Lemma 2, for any $c, \delta > 0$ we have

\[
P\Big\{\frac{1}{n}\bar N(t) > c\Big\} \le \frac{\delta}{c}
+ P\Big\{\int_0^t \frac{1}{n}\sum_{i=1}^n Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx > \delta\Big\}
= \frac{\delta}{c} + P\Big\{\int_0^t S_n^{(0)}(\theta_0,x)h_0(x)\,dx > \delta\Big\}.
\]

By the SLLN, $\frac{1}{n}\bar N(t)$ converges almost surely, hence the left-hand side of the above inequality has a limit $\lim_{n\to\infty} P\{\frac{1}{n}\bar N(t) > c\}$. As $n\to\infty$ it can be shown using the Bounded Convergence Theorem that

\[
P\Big\{\int_0^t S_n^{(0)}(\theta_0,x)h_0(x)\,dx > \delta\Big\} \to P\Big\{\int_0^t s^{(0)}(\theta_0,x)h_0(x)\,dx > \delta\Big\}.
\]

Due to results (I) and (IV) in Proposition 1, $\int_0^t s^{(0)}(\theta_0,x)h_0(x)\,dx$ is a bounded random variable, thus $\delta$ can be chosen such that

\[
\int_0^t s^{(0)}(\theta_0,x)h_0(x)\,dx \le \delta,
\]

and the right-hand side of the inequality reduces to $\delta/c$ when $n\to\infty$. Eventually letting $c\to\infty$, we obtain the result that $\frac{1}{n}\bar N(t)$ is bounded in probability, i.e.,

\[
\lim_{c\to\infty}\lim_{n\to\infty} P\Big\{\frac{1}{n}\bar N(t) > c\Big\} = 0.
\]

Now consider the first term in (4.1.7). It has been shown that

\[
\sup_{x\in[0,\tau],\theta\in\Theta}\|V_n(\theta,x)-v(\theta,x)\| \xrightarrow{P} 0.
\]
If $\hat\theta_n$ is a consistent estimator of $\theta_0$, then $\hat\theta_n \in \Theta$ except on an event with probability tending to zero for sufficiently large $n$. Hence

\[
\sup_{x\in[0,\tau]}\|V_n(\hat\theta_n,x)-v(\hat\theta_n,x)\| \xrightarrow{P} 0
\]

as $n\to\infty$. This, together with the boundedness in probability of $\frac{1}{n}\bar N(t)$, leads to convergence to zero in probability of the first term in (4.1.7).

From (IV) of Proposition 1, the functions $s^{(j)}(\theta,x)$, $j=0,1,2$, are bounded on $\Theta\times[0,\tau]$, the function families $s^{(j)}(\cdot,x)$, $j=0,1,2$, $x\in[0,\tau]$, are equicontinuous at $\theta=\theta_0$, and $s^{(0)}(\theta,x)$ is bounded away from zero on $\Theta\times[0,\tau]$. These can be used to show that $v(\cdot,x)$, $x\in[0,\tau]$, is a family of functions equicontinuous at $\theta=\theta_0$. Indeed, when $\|\theta_m-\theta_0\|\to 0$, consider

\[
\sup_{x\in[0,\tau]}\|v(\theta_m,x)-v(\theta_0,x)\|
= \sup_{x\in[0,\tau]}\Big\|\Big[\frac{s^{(2)}(\theta_m,x)}{s^{(0)}(\theta_m,x)}-\Big(\frac{s^{(1)}(\theta_m,x)}{s^{(0)}(\theta_m,x)}\Big)^{\otimes2}\Big]
-\Big[\frac{s^{(2)}(\theta_0,x)}{s^{(0)}(\theta_0,x)}-\Big(\frac{s^{(1)}(\theta_0,x)}{s^{(0)}(\theta_0,x)}\Big)^{\otimes2}\Big]\Big\|
\]
\[
\le \sup_{x\in[0,\tau]}\Big\|\frac{s^{(2)}(\theta_m,x)}{s^{(0)}(\theta_m,x)}-\frac{s^{(2)}(\theta_0,x)}{s^{(0)}(\theta_0,x)}\Big\|
+ \sup_{x\in[0,\tau]}\Big\|\Big(\frac{s^{(1)}(\theta_m,x)}{s^{(0)}(\theta_m,x)}\Big)^{\otimes2}-\Big(\frac{s^{(1)}(\theta_0,x)}{s^{(0)}(\theta_0,x)}\Big)^{\otimes2}\Big\|.
\]

Similar to the argument used in proving $\sup_{x\in[0,\tau]}\|V_n(\hat\theta_n,x)-v(\hat\theta_n,x)\| \xrightarrow{P} 0$, the boundedness of $s^{(j)}(\theta,x)$, $j=0,1,2$, the bounded-away-from-zero property of $s^{(0)}(\theta,x)$, and the equicontinuity of $s^{(j)}(\cdot,x)$, $x\in[0,\tau]$, $j=0,1,2$, at $\theta=\theta_0$ then imply that $v(\cdot,x)$, $x\in[0,\tau]$, is a family of functions equicontinuous at $\theta=\theta_0$, that is,

\[
\sup_{x\in[0,\tau]}\|v(\theta_m,x)-v(\theta_0,x)\| \to 0.
\]

Now the boundedness in probability of $\frac{1}{n}\bar N(t)$ and the equicontinuity of $v(\cdot,x)$, $x\in[0,\tau]$, guarantee that the second term in (4.1.7) converges to zero in probability if $\hat\theta_n$ is a consistent estimator of $\theta_0$. Because of results (I), (II) and (IV) in Proposition 1, the fourth term of (4.1.7) converges to zero in probability. The convergence of the third term uses the second part of Lemma 2.
Consider one element, the $(j,k)$th entry, of the matrix $\int_0^t v(\theta_0,x)\big[\frac{1}{n}\,d\bar N(x)-d\bar A(x)\big]$:

\[
P\Big\{\sup_{y\in[0,t]}\Big|\int_0^y v(\theta_0,x)_{j,k}\,\frac{1}{n}\,d\bar M(x)\Big| \ge \rho\Big\}
\le \frac{\delta}{\rho^2} + P\Big\{\int_0^t v(\theta_0,x)^2_{j,k}\,\frac{1}{n^2}\,d\bar A(x) \ge \delta\Big\}
\]
\[
= \frac{\delta}{\rho^2} + P\Big\{\int_0^t v(\theta_0,x)^2_{j,k}\,\frac{1}{n^2}\sum_{i=1}^n Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx \ge \delta\Big\}
= \frac{\delta}{\rho^2} + P\Big\{\frac{1}{n}\int_0^t v(\theta_0,x)^2_{j,k}\,S_n^{(0)}(\theta_0,x)h_0(x)\,dx \ge \delta\Big\}.
\]

By the boundedness condition, the second term in the last member of the inequality vanishes when $n$ is large enough. Because $\delta$ and $\rho$ are arbitrary, we choose $\delta = \rho^3$; the above probability is then bounded by $\rho$ for large $n$, so the third term of (4.1.7) goes to zero in probability. Thus

\[
\sup_{t\in[0,\tau]}\Big\|\frac{1}{n}\sum_{i=1}^n\int_0^t V_n(\hat\theta_n,x)\,dN_i(x) - \Sigma(\theta_0,t)\Big\| \xrightarrow{P} 0
\]

is shown.

The derivative of the score process with respect to $\theta$ is a matrix given by

\[
\frac{\partial^2 l(\theta)}{\partial\theta\,\partial\theta^{\top}}
= \sum_{i=1}^n\int_0^\infty\Big[\ddot g_\theta(Z_i)
+ \Big(\frac{\sum_{j=1}^n Y_j(t)\dot g_\theta(Z_j)\exp(g_\theta(Z_j))}{\sum_{j=1}^n Y_j(t)\exp(g_\theta(Z_j))}\Big)^{\otimes2}
- \frac{\sum_{j=1}^n Y_j(t)\exp(g_\theta(Z_j))\big[\dot g_\theta(Z_j)^{\otimes2}+\ddot g_\theta(Z_j)\big]}{\sum_{j=1}^n Y_j(t)\exp(g_\theta(Z_j))}\Big]\,dN_i(t).
\]

Note that $1/n$ times the observed information matrix is defined to be $-\frac{1}{n}\frac{\partial^2 l(\theta)}{\partial\theta\,\partial\theta^{\top}}$, which is equal to

\[
-\frac{1}{n}\sum_{i=1}^n\int_0^t\Big[\ddot g_\theta(Z_i)
- \frac{\sum_{j} Y_j(t)\exp(g_\theta(Z_j))\big[\dot g_\theta(Z_j)^{\otimes2}+\ddot g_\theta(Z_j)\big]}{\sum_{j} Y_j(t)\exp(g_\theta(Z_j))}
+ \Big(\frac{\sum_{j} Y_j(t)\dot g_\theta(Z_j)\exp(g_\theta(Z_j))}{\sum_{j} Y_j(t)\exp(g_\theta(Z_j))}\Big)^{\otimes2}\Big]\,dN_i(t)
\]
\[
= \frac{1}{n}\sum_{i=1}^n\int_0^t V_n(\theta,x)\,dN_i(x)
- \frac{1}{n}\sum_{i=1}^n\int_0^t\Big[\ddot g_\theta(Z_i)-E^{I}_{\theta,x}\,\ddot g_\theta(Z_I)\Big]\,dN_i(x),
\]

where $\ddot g_\theta$ is given in (4.1.2); it is a continuous function of $\theta$ under the assumption $k \ne Z_i$ for all $i$.
To show that $-\frac{1}{n}\frac{\partial^2 l(\theta,t)}{\partial\theta\,\partial\theta^{\top}}$ evaluated at $\hat\theta_n$ is a consistent estimator of $\Sigma(\theta_0,t)$, uniformly in $t\in[0,\tau]$, we need to show that, as $\Theta$ shrinks to $\theta_0$,

\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Big\|\frac{1}{n}\sum_{i=1}^n\int_0^t\big[\ddot g_\theta(Z_i)-E^{I}_{\theta,x}\ddot g_\theta(Z_I)\big]\,dN_i(x)\Big\|
\]
\[
\le \sup_{t\in[0,\tau],\theta\in\Theta}\Big\|\frac{1}{n}\sum_{i=1}^n\int_0^t\big[\ddot g_\theta(Z_i)-\ddot g_{\theta_0}(Z_i)
-E^{I}_{\theta,x}\ddot g_\theta(Z_I)+E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\big]\,dN_i(x)\Big\| \tag{4.1.8}
\]
\[
+ \sup_{t\in[0,\tau]}\Big\|\frac{1}{n}\sum_{i=1}^n\int_0^t\big[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\big]\,dN_i(x)\Big\| \tag{4.1.9}
\]
\[
\xrightarrow{P} 0.
\]

This can be done by proving that each of (4.1.8) and (4.1.9) goes to zero in probability.

The proof of (4.1.9) is quite simple. Using $dN = dM + dA$, it is implied by the following two limits:

\[
\sup_{t\in[0,\tau]}\Big\|\frac{1}{n}\sum_{i=1}^n\int_0^t\big[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\big]\,dM_i(x)\Big\| \xrightarrow{P} 0, \tag{4.1.10}
\]

and

\[
\sup_{t\in[0,\tau]}\Big\|\frac{1}{n}\sum_{i=1}^n\int_0^t\big[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\big]\,dA_i(x)\Big\| \xrightarrow{P} 0. \tag{4.1.11}
\]

(4.1.11) holds immediately, because the quantity inside the norm is equal to

\[
\int_0^t\Big[\sum_{i=1}^n \ddot g_{\theta_0}(Z_i)\,p_i(\theta_0,x) - E^{I}_{\theta_0,x}\big(\ddot g_{\theta_0}(Z_I)\big)\Big]S_n^{(0)}(\theta_0,x)h_0(x)\,dx = 0.
\]

Applying Lemma 3 to every entry of (4.1.10), we can show

\[
P\Big\{\sup_{t\in[0,\tau]}\Big\|\frac{1}{n}\sum_{i=1}^n\int_0^t\big[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\big]\,dM_i(x)\Big\|^2_{j,k} \ge \epsilon\Big\}
\]
\[
\le \frac{\eta}{\epsilon} + P\Big\{\frac{1}{n}\sum_{i=1}^n\frac{1}{n}\int_0^\tau\big[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\big]^2_{j,k}\,Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx \ge \eta\Big\}
\le \frac{\eta}{\epsilon} + P\Big\{\frac{1}{n}B^2\int_0^\tau S_n^{(0)}(\theta_0,x)h_0(x)\,dx \ge \eta\Big\},
\]

where $B$ is a bound for $\big[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\big]_{j,k}$ on $\Theta\times[0,\tau]$. Then results (I), (II) and the boundedness in (IV) guarantee that

\[
P\Big\{\frac{1}{n}B^2\int_0^\tau S_n^{(0)}(\theta_0,x)h_0(x)\,dx \ge \eta\Big\}
\]

is zero when $n$ is large enough, for any $\eta > 0$. Therefore, by taking $\eta = \epsilon^2$, (4.1.10) is proved.

Observe that $\ddot g_\cdot(Z_i)$ and $g_\cdot(Z_i)$, $Z\in\mathcal{Z}$, are both equicontinuous families of functions of $\theta$ at $\theta_0$, where $\mathcal{Z}$ is the sample space of the $Z_i$, because $\mathcal{Z}$ is bounded and there is a neighborhood of the true knot $k_0$ containing no $Z_i$. This fact is used below.
Break (4.1.8) into the following two parts:

\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Big\|\frac{1}{n}\sum_{i=1}^n\int_0^t\big[\ddot g_\theta(Z_i)-\ddot g_{\theta_0}(Z_i)\big]\,dN_i(x)\Big\|, \tag{4.1.12}
\]

and

\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Big\|\int_0^t\big[E^{I}_{\theta,x}\ddot g_\theta(Z_I)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\big]\,d\bar N(x)/n\Big\|. \tag{4.1.13}
\]

For (4.1.12), we have

\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Big\|\frac{1}{n}\sum_{i=1}^n\int_0^t\big[\ddot g_\theta(Z_i)-\ddot g_{\theta_0}(Z_i)\big]\,dN_i(x)\Big\|
\le \frac{1}{n}\sum_{i=1}^n\sup_{\theta\in\Theta}\|\ddot g_\theta(Z_i)-\ddot g_{\theta_0}(Z_i)\|\,N_i(\tau).
\]

Since $\ddot g_\cdot(Z)$ is an equicontinuous family of functions of $\theta$ at $\theta_0$ with respect to $Z$, it follows that, as $\Theta$ shrinks to $\theta_0$, $\sup_{\theta\in\Theta}\|\ddot g_\theta(Z_i)-\ddot g_{\theta_0}(Z_i)\|$ is negligible. Together with the fact that $\frac{1}{n}\bar N(\tau)$ is bounded in probability, it follows that (4.1.12) converges to zero in probability as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity.

To deal with (4.1.13), let

\[
T_n(\theta,t) \equiv \frac{1}{n}\sum_{i=1}^n \ddot g_\theta(Z_i)Y_i(t)\exp(g_\theta(Z_i)).
\]

It can be seen, from the definitions of $\ddot g_\theta(Z)$, $Z\in\mathcal{Z}$, and $\exp(g_\theta(Z))$, $Z\in\mathcal{Z}$, that both are equicontinuous families of functions of $\theta$ with respect to $Z$, and both are bounded on $\Theta\times\mathcal{Z}$. These imply that

\[
\sup_{x\in[0,\tau],\theta\in\Theta}\|\ddot g_\theta(Z_j)\exp(g_\theta(Z_j))-\ddot g_{\theta_0}(Z_j)\exp(g_{\theta_0}(Z_j))\|
\]

is negligible in probability as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity, so that

\[
\sup_{t\in[0,\tau],\theta\in\Theta}\|T_n(\theta,t)-T_n(\theta_0,t)\| \xrightarrow{P} 0. \tag{4.1.14}
\]

Now the integrand in (4.1.13) can be decomposed as

\[
E^{I}_{\theta,x}\ddot g_\theta(Z_I)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)
= \frac{T_n(\theta,x)-T_n(\theta_0,x)}{S_n^{(0)}(\theta,x)}
- \frac{T_n(\theta_0,x)\big[S_n^{(0)}(\theta,x)-S_n^{(0)}(\theta_0,x)\big]}{S_n^{(0)}(\theta_0,x)S_n^{(0)}(\theta,x)}
\equiv B_n(\theta,x)-C_n(\theta,x).
\]

Since $S_n^{(0)}(\theta,x)$ is bounded away from zero for large $n$ (below by $1/\eta > 0$, say), it follows that

\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Big\|\int_0^t B_n(\theta,x)\,\frac{d\bar N(x)}{n}\Big\|
\le \eta \times \sup_{x\in[0,\tau],\theta\in\Theta}\|T_n(\theta,x)-T_n(\theta_0,x)\| \times \frac{1}{n}\bar N(\tau).
\]

The boundedness in probability of $\frac{1}{n}\bar N(\tau)$ and (4.1.14) imply that the above is negligible in probability as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity. A similar argument verifies that

\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Big\|\int_0^t C_n(\theta,x)\,\frac{d\bar N(x)}{n}\Big\|
\]

also becomes negligible in probability as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity.
Combining the above shows that (4.1.13), and hence (4.1.8), is negligible in probability as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity. Thus we have shown the negligibility of both (4.1.8) and (4.1.9). Since a consistent estimator $\hat\theta_n$ of $\theta_0$ eventually lies inside the shrinking compact neighborhood $\Theta$ of $\theta_0$ as $n\to\infty$, on an event with probability tending to one, it follows that $-\frac{1}{n}\frac{\partial^2 l(\theta,t)}{\partial\theta\,\partial\theta^{\top}}$ evaluated at $\hat\theta_n$ is a consistent estimator of $\Sigma(\theta_0,t)$, uniformly in $t\in[0,\tau]$. The proof is complete.

4.2 Consistency Of The Maximum Partial Likelihood Estimator

We now give the consistency of the maximum partial likelihood estimator.

Theorem 2. (Consistency) In the Cox Proportional Hazards model, suppose the covariate structure of the main risk factor $Z_i$ is expressed as a free-knot spline function $g_\theta(Z_i)$ with $Z_i$ bounded and constant in time, $P\{Y_i(\tau) > 0\} > 0$, $k_0 \ne Z_i$ for all $i$, and $\Sigma(\theta_0,\tau) = \int_0^\tau v(\theta_0,x)s^{(0)}(\theta_0,x)h_0(x)\,dx$ is positive definite. Then the maximum partial likelihood estimator $\hat\theta_n$ is consistent, i.e., $\hat\theta_n \xrightarrow{P} \theta_0$ as $n\to\infty$.

Proof: The proof uses Lemma 4. Let $X_n(\theta,t) \equiv n^{-1}[\log PL(\theta,t) - \log PL(\theta_0,t)]$. Then

\[
X_n(\theta,t) = \frac{1}{n}\sum_{i=1}^n\int_0^t\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]\,dN_i(x)
- \frac{1}{n}\sum_{i=1}^n\int_0^t \log\Big[\frac{\sum_{j=1}^n Y_j(x)\exp(g_\theta(Z_j))}{\sum_{j=1}^n Y_j(x)\exp(g_{\theta_0}(Z_j))}\Big]\,dN_i(x).
\]

Accordingly, define $A_n(\theta,t)$ as follows:

\[
A_n(\theta,t) \equiv \frac{1}{n}\sum_{i=1}^n\int_0^t\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]\,dA_i(x)
- \frac{1}{n}\sum_{i=1}^n\int_0^t \log\Big[\frac{\sum_{j=1}^n Y_j(x)\exp(g_\theta(Z_j))}{\sum_{j=1}^n Y_j(x)\exp(g_{\theta_0}(Z_j))}\Big]\,dA_i(x).
\]

Then $X_n(\theta,t)-A_n(\theta,t)$ is a martingale and

\[
n\,\langle X_n(\theta,\cdot)-A_n(\theta,\cdot)\rangle(t)
= \int_0^t \frac{1}{n}\sum_{i=1}^n\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]^2\,Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx
\]
\[
- \frac{1}{n}\sum_{i=1}^n\int_0^t 2\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]\log\Big[\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}\Big]\,dA_i(x)
+ \int_0^t\Big[\log\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}\Big]^2 S_n^{(0)}(\theta_0,x)h_0(x)\,dx. \tag{4.2.1}
\]
where

\[
\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]^2
= \big[(\beta_1-\beta_{10})Z_i + (\beta_2-\beta_{20})Z_i^2 + \beta_3(Z_i-k)_+^2 - \beta_{30}(Z_i-k_0)_+^2\big]^2,
\]

with $\theta = [\beta_1,\beta_2,\beta_3,k]^{\top}$, $\theta_0 = [\beta_{10},\beta_{20},\beta_{30},k_0]^{\top}$, $(Z_i-k)_+^2 = (Z_i-k)^2\,1\{Z_i>k\}$ and $(Z_i-k_0)_+^2 = (Z_i-k_0)^2\,1\{Z_i>k_0\}$. Notice that

\[
\dot g_\theta = \frac{\partial g_\theta}{\partial\theta} = \big[Z,\; Z^2,\; (Z-k)^2\,1\{Z>k\},\; -2\beta_3(Z-k)\,1\{Z>k\}\big]^{\top},
\]

and

\[
S_n^{(1)}(\theta,x) = \frac{1}{n}\sum_{i=1}^n Y_i(x)\dot g_\theta(Z_i)\exp(g_\theta(Z_i)),
\qquad
S_n^{(2)}(\theta,x) = \frac{1}{n}\sum_{i=1}^n Y_i(x)\dot g_\theta(Z_i)^{\otimes2}\exp(g_\theta(Z_i)).
\]

Hence $\frac{1}{n}\sum_{i=1}^n\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]^2 Y_i(x)\exp(g_{\theta_0}(Z_i))$ in the first term of (4.2.1) can be expressed as a combination of entries of $S_n^{(2)}(\theta,x)$; therefore the fact that $\sup_{x,\theta}\|S_n^{(2)}(\theta,x)-s^{(2)}(\theta,x)\| \xrightarrow{P} 0$ indicates the convergence of $\frac{1}{n}\sum_{i=1}^n\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]^2 Y_i(x)\exp(g_{\theta_0}(Z_i))$, and in fact

\[
\sup_{x\in[0,\tau],\theta\in\Theta}\Big\|\frac{1}{n}\sum_{i=1}^n\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]^2 Y_i(x)\exp(g_{\theta_0}(Z_i))
- E\big[g_\theta(Z)-g_{\theta_0}(Z)\big]^2 Y(x)\exp(g_{\theta_0}(Z))\Big\| \xrightarrow{P} 0.
\]

The uniform convergence in probability of the first term in (4.2.1) is established.

Similarly, $\frac{1}{n}\sum_{i=1}^n 2\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]Y_i(x)\exp(g_{\theta_0}(Z_i))$ in the second term of (4.2.1) can be expressed as a combination of entries of $S_n^{(1)}(\theta,x)$, and $\sup_{x,\theta}\|S_n^{(1)}(\theta,x)-s^{(1)}(\theta,x)\| \xrightarrow{P} 0$ implies

\[
\sup_{x\in[0,\tau],\theta\in\Theta}\Big\|\frac{1}{n}\sum_{i=1}^n 2\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]Y_i(x)\exp(g_{\theta_0}(Z_i))
- 2E\big[g_\theta(Z)-g_{\theta_0}(Z)\big]Y(x)\exp(g_{\theta_0}(Z))\Big\| \xrightarrow{P} 0.
\]

In view of the convergence of $\log\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}$ to $\log\frac{s^{(0)}(\theta,x)}{s^{(0)}(\theta_0,x)}$, the uniform convergence in probability of the second term in (4.2.1) is verified. Uniform convergence in probability of the third term in (4.2.1) is confirmed by considering the convergence of $\log\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}$ and of $S_n^{(0)}(\theta_0,x)$, together with the condition that $\int_0^t h_0(x)\,dx < \infty$. Therefore it has been shown that $n\,\langle X_n(\theta,\cdot)-A_n(\theta,\cdot)\rangle(t)$ has a finite limit, and (2) of Lemma 2 implies

\[
X_n(\theta,t) - A_n(\theta,t) \xrightarrow{P} 0, \qquad n\to\infty.
\]
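The spline $g_\theta$ and its gradient $\dot g_\theta$ written above can be checked numerically with central finite differences. The sketch below is illustrative only (the function names are ours, and the parameter values are the ones quoted later in Section 4.5); away from the kink at $Z = k$ the analytic and numeric gradients must agree:

```python
import numpy as np

def g(theta, z):
    # g_theta(Z) = b1*Z + b2*Z^2 + b3*(Z - k)_+^2
    b1, b2, b3, k = theta
    return b1 * z + b2 * z ** 2 + b3 * max(z - k, 0.0) ** 2

def g_dot(theta, z):
    # analytic gradient [Z, Z^2, (Z-k)^2 1{Z>k}, -2*b3*(Z-k) 1{Z>k}]
    b1, b2, b3, k = theta
    ind = 1.0 if z > k else 0.0
    return np.array([z, z ** 2, (z - k) ** 2 * ind, -2.0 * b3 * (z - k) * ind])

theta = np.array([-0.66, 0.0146, -0.0209, 27.65])  # illustrative values (Section 4.5)
z, eps = 30.0, 1e-6                                # a point above (away from) the knot
num = np.array([(g(theta + eps * e, z) - g(theta - eps * e, z)) / (2 * eps)
                for e in np.eye(4)])
assert np.allclose(num, g_dot(theta, z), atol=1e-5)
```

The check fails only in a shrinking neighborhood of $Z = k$, which is exactly why the neighborhood condition $k_0 \ne Z_i$ matters for the continuity of $\ddot g_\theta$.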
Next consider $A_n(\theta,\tau)$:

\[
A_n(\theta,\tau) = \frac{1}{n}\sum_{i=1}^n\int_0^\tau\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx
- \int_0^\tau \log\Big[\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}\Big]S_n^{(0)}(\theta_0,x)h_0(x)\,dx.
\]

By the SLLN, the boundedness of $Z_i$, the compactness of $\Theta$, and $\int_0^\tau h_0(x)\,dx < \infty$, we have

\[
\frac{1}{n}\sum_{i=1}^n\int_0^\tau\big[g_\theta(Z_i)-g_{\theta_0}(Z_i)\big]Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx
\to E\int_0^\tau\big[g_\theta(Z)-g_{\theta_0}(Z)\big]Y(x)\exp(g_{\theta_0}(Z))h_0(x)\,dx
\]

almost surely. Result (II), the boundedness in (IV), and the bounded-away-from-zero property of $s^{(0)}(\theta,x)$ in Proposition 1 lead to the following convergence in probability:

\[
\int_0^\tau \log\Big[\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}\Big]S_n^{(0)}(\theta_0,x)h_0(x)\,dx
\xrightarrow{P} \int_0^\tau \log\Big[\frac{s^{(0)}(\theta,x)}{s^{(0)}(\theta_0,x)}\Big]s^{(0)}(\theta_0,x)h_0(x)\,dx.
\]

Hence

\[
A_n(\theta,\tau) \xrightarrow{P}
E\int_0^\tau\big[g_\theta(Z)-g_{\theta_0}(Z)\big]Y(x)\exp(g_{\theta_0}(Z))h_0(x)\,dx
- \int_0^\tau \log\Big[\frac{s^{(0)}(\theta,x)}{s^{(0)}(\theta_0,x)}\Big]s^{(0)}(\theta_0,x)h_0(x)\,dx,
\]

and $X_n(\theta,\tau)$ converges in probability to the same limit, which we denote by $A(\theta,\tau)$.

We have already derived the expression of $\frac{\partial^2 l(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}$; therefore

\[
\frac{\partial^2 X_n(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}
= \frac{1}{n}\frac{\partial^2 l(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}
= -\frac{1}{n}\sum_{i=1}^n\int_0^\tau V_n(\theta,x)\,dN_i(x)
+ \frac{1}{n}\sum_{i=1}^n\int_0^\tau\big[\ddot g_\theta(Z_i)-E^{I}_{\theta,x}\ddot g_\theta(Z_I)\big]\,dN_i(x).
\]

It has been shown that the second term of the last line converges to zero in probability for all $\theta\in\Theta$ when $n\to\infty$ and $\Theta$ shrinks to $\theta_0$. $V_n(\theta,x)$ is the urn-model variance-covariance matrix of $\dot g_\theta(Z_I)$ at time point $x$, thus it is positive definite and $-\frac{1}{n}\sum_{i=1}^n\int_0^\tau V_n(\theta,x)\,dN_i(x)$ is negative definite. Due to the assumption $k_0 \ne Z_i$ for all $i$, $\frac{\partial^2 X_n(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}$ is a continuous function of $\theta$. All these together imply that $X_n(\theta,\tau)$ is a concave function of $\theta$ when $n$ is large and $\theta$ lies in a small compact neighborhood $\Theta$ of $\theta_0$.

Next we show that $A(\theta,\tau)$ has a unique maximum at $\theta = \theta_0$.
The boundedness of $Z_i$, the compactness of $\Theta$, and result (III) in Proposition 1 imply that we can change the order of differentiation and expectation, due to Theorem 16.8 [37], and we have

\[
\frac{\partial A(\theta,\tau)}{\partial\theta}
= E\int_0^\tau \dot g_\theta(Z)Y(x)\exp(g_{\theta_0}(Z))h_0(x)\,dx
- \int_0^\tau \frac{s^{(1)}(\theta,x)}{s^{(0)}(\theta,x)}\,s^{(0)}(\theta_0,x)h_0(x)\,dx,
\]

and

\[
\frac{\partial A(\theta,\tau)}{\partial\theta}\Big|_{\theta=\theta_0} = 0.
\]

By the boundedness of $Z_i$, the compactness of $\Theta$, and result (III) in Proposition 1, Theorem 16.8 [37] is used again and

\[
\frac{\partial^2 A(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}
= E\int_0^\tau \ddot g_\theta(Z)Y(x)\exp(g_{\theta_0}(Z))h_0(x)\,dx
- \int_0^\tau \frac{E\,\ddot g_\theta(Z)Y(x)\exp(g_{\theta_0}(Z))}{s^{(0)}(\theta,x)}\,s^{(0)}(\theta_0,x)h_0(x)\,dx
- \int_0^\tau v(\theta,x)s^{(0)}(\theta_0,x)h_0(x)\,dx.
\]

All integrands in the above expression are continuous at $\theta=\theta_0$. Result (I) and the boundedness and bounded-away-from-zero properties in result (IV) of Proposition 1 show that in the neighborhood $\Theta$ all integrands are bounded by integrable functions. Then Theorem 16.8 [37] applies: $\frac{\partial^2 A(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}$ is a continuous function of $\theta$ at $\theta_0$ and

\[
\frac{\partial^2 A(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}} \to -\int_0^\tau v(\theta_0,x)s^{(0)}(\theta_0,x)h_0(x)\,dx,
\]

a negative definite matrix, when $\theta\to\theta_0$. It is thus verified that, in the neighborhood $\Theta$, $\frac{\partial^2 A(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}$ is negative definite, so $A(\theta,\tau)$ has a unique maximum at $\theta = \theta_0$. Now applying Lemma 4 shows that if $\hat\theta_n$ is the MPLE of $\theta_0$, then $\hat\theta_n \xrightarrow{P} \theta_0$. This completes the proof.

4.3 Asymptotic Normality Of The MPLE

We now prove the asymptotic Normality of the MPLE.

Theorem 3. (Asymptotic Normality of MPLE) Let $\Sigma(\theta_0,t)$ be defined as before. Then

\[
n^{1/2}(\hat\theta_n-\theta_0) \Longrightarrow N\big(0,\;\Sigma^{-1}(\theta_0,\tau)\big).
\]

Proof: Expand $U(\hat\theta_n,\tau)$ around $\theta_0$ in a Taylor series. We obtain

\[
U(\hat\theta_n,\tau) = U(\theta_0,\tau) + \frac{\partial^2 l(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}\Big|_{\theta=\theta^*}(\hat\theta_n-\theta_0),
\]

where $\theta^*$ is on the line segment between $\hat\theta_n$ and $\theta_0$. Because $\hat\theta_n$ is the MPLE of $\theta_0$, $U(\hat\theta_n,\tau) = 0$ and

\[
n^{1/2}(\hat\theta_n-\theta_0) = -\Big[n^{-1}\frac{\partial^2 l(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}\Big|_{\theta=\theta^*}\Big]^{-1}\times\Big[n^{-1/2}U(\theta_0,\tau)\Big].
\]

We already showed

\[
-n^{-1}\frac{\partial^2 l(\theta,\tau)}{\partial\theta\,\partial\theta^{\top}}\Big|_{\theta=\theta^*} \xrightarrow{P} \Sigma(\theta_0,\tau),
\qquad
n^{-1/2}U(\theta_0,\tau) \Longrightarrow N(0,\Sigma(\theta_0,\tau)).
\]

These and Slutsky's Theorem together imply $n^{1/2}(\hat\theta_n-\theta_0) \Longrightarrow N\big(0,\Sigma^{-1}(\theta_0,\tau)\big)$.

4.4 Lemmas

In this section we collect several lemmas.

Lemma 1.
$\sup_{x\in[0,\tau]}\|S_n^{(0)}(\theta,x)-s^{(0)}(\theta,x)\| \to 0$ almost surely as $n\to\infty$.

Proof: Both $S_n^{(0)}(\theta,x)$ and $s^{(0)}(\theta,x)$ are bounded, left-continuous, non-increasing functions of $x$. Hence $s^{(0)}(\theta,x)$ has at most countably many jumps on $[0,\tau]$. On $[0,\tau]$, let $Q$ denote the set of rational numbers and $J$ the set of jump points of $s^{(0)}(\theta,\cdot)$. Then for each $x\in Q$, $S_n^{(0)}(\theta,x)\to s^{(0)}(\theta,x)$ on a set of probability one by the Strong Law of Large Numbers. Due to the countability of $Q$, there is a set of probability one on which $S_n^{(0)}(\theta,x)\to s^{(0)}(\theta,x)$ for all $x\in Q$. On the other hand, there exists a set of probability one such that, for all $x\in J$, $S_n^{(0)}(\theta,x^+)-S_n^{(0)}(\theta,x^-) \to s^{(0)}(\theta,x^+)-s^{(0)}(\theta,x^-)$. The intersection of the above two sets has probability one, and we will prove $\sup_{x\in[0,\tau]}\|S_n^{(0)}(\theta,x)-s^{(0)}(\theta,x)\|\to 0$ on the intersection set by contradiction.

Suppose that, for some fixed $\epsilon > 0$, we can find a sequence of indices $n_k$ and a sequence $x_k\in[0,\tau]$ satisfying $\|S_{n_k}^{(0)}(\theta,x_k)-s^{(0)}(\theta,x_k)\| \ge \epsilon$ for all $k$. Since $\tau$ is finite, the sequence $x_k$ is bounded and has a convergent subsequence, denoted for simplicity by $x_k \to x$ as $k\to\infty$. Let rational numbers $r_1\in Q$ and $r_2\in Q$ be such that $r_1 < x < r_2$. When $k$ is large enough we have the following four cases:

1. $x_k \uparrow x$, $x_k \le x$, with $S_{n_k}^{(0)}(\theta,x_k)-s^{(0)}(\theta,x_k) \ge \epsilon$:
$\epsilon \le S_{n_k}^{(0)}(\theta,x_k)-s^{(0)}(\theta,x_k) \le S_{n_k}^{(0)}(\theta,r_1)-s^{(0)}(\theta,x) = [S_{n_k}^{(0)}(\theta,r_1)-s^{(0)}(\theta,r_1)] + [s^{(0)}(\theta,r_1)-s^{(0)}(\theta,x)]$.

2. $x_k \uparrow x$, $x_k \le x$, with $s^{(0)}(\theta,x_k)-S_{n_k}^{(0)}(\theta,x_k) \ge \epsilon$:
$\epsilon \le s^{(0)}(\theta,x_k)-S_{n_k}^{(0)}(\theta,x_k) \le s^{(0)}(\theta,r_1)-S_{n_k}^{(0)}(\theta,x) \le [s^{(0)}(\theta,r_1)-s^{(0)}(\theta,r_2)] + [s^{(0)}(\theta,r_2)-S_{n_k}^{(0)}(\theta,r_2)] + [S_{n_k}^{(0)}(\theta,x^+)-S_{n_k}^{(0)}(\theta,x)]$.

3. $x_k \downarrow x$, $x_k > x$, with $S_{n_k}^{(0)}(\theta,x_k)-s^{(0)}(\theta,x_k) \ge \epsilon$:
$\epsilon \le S_{n_k}^{(0)}(\theta,x_k)-s^{(0)}(\theta,x_k) \le S_{n_k}^{(0)}(\theta,x^+)-s^{(0)}(\theta,r_2) \le [S_{n_k}^{(0)}(\theta,x^+)-S_{n_k}^{(0)}(\theta,x)] + [S_{n_k}^{(0)}(\theta,r_1)-s^{(0)}(\theta,r_1)] + [s^{(0)}(\theta,r_1)-s^{(0)}(\theta,r_2)]$.

4.
$x_k \downarrow x$, $x_k > x$, with $s^{(0)}(\theta,x_k)-S_{n_k}^{(0)}(\theta,x_k) \ge \epsilon$:
$\epsilon \le s^{(0)}(\theta,x_k)-S_{n_k}^{(0)}(\theta,x_k) \le s^{(0)}(\theta,x^+)-S_{n_k}^{(0)}(\theta,r_2) = [s^{(0)}(\theta,x^+)-s^{(0)}(\theta,r_2)] + [s^{(0)}(\theta,r_2)-S_{n_k}^{(0)}(\theta,r_2)]$.

We first let $k$ go to infinity and then let $r_1$ and $r_2$ tend to $x$. Based upon the convergence results on the intersection set, all of the above inequalities lead to the contradiction $0 < \epsilon \le 0$. The proof is complete.

Lemma 2. Let $N$ be a univariate counting process with continuous compensator $A$, let $M = N - A$, and let $H$ be a locally bounded, predictable process. Then for all $\delta, \rho > 0$ and any $t \ge 0$,

(1) $P\{N(t) \ge \rho\} \le \dfrac{\delta}{\rho} + P\{A(t) \ge \delta\}$;

(2) $P\Big\{\sup_{y\in[0,t]}\Big|\displaystyle\int_0^y H(x)\,dM(x)\Big| \ge \rho\Big\} \le \dfrac{\delta}{\rho^2} + P\Big\{\displaystyle\int_0^t H^2(x)\,dA(x) \ge \delta\Big\}$.

Proof: See Lemma 8.2.1 [38].

Lemma 3. Suppose that $M$ is a square integrable martingale with $M(0) = 0$. Then for all $\eta, \epsilon > 0$,

\[
P\Big\{\sup_{t\in[0,\tau]} M^2(t) \ge \epsilon\Big\} \le \frac{\eta}{\epsilon} + P\{\langle M\rangle(\tau) \ge \eta\}.
\]

Proof: See Theorem 3.4.1 [38].

Lemma 4. Let $E$ be an open convex subset of $\mathbb{R}^p$, let $F_1, F_2, \ldots$ be a sequence of random concave functions on $E$, and let $f$ be a real-valued function on $E$ such that, for all $x\in E$, $\lim_{n\to\infty} F_n(x) = f(x)$ in probability. Then:

1. The function $f$ is concave.
2. For all compact subsets $A$ of $E$, $\sup_{x\in A} |F_n(x)-f(x)| \xrightarrow{P} 0$ as $n\to\infty$.
3. If $F_n$ has a unique maximum at $X_n$ and $f$ has one at $x$, then $X_n \xrightarrow{P} x$ as $n\to\infty$.

Proof: See Lemma 8.3.1 [38].

4.5 The Neighborhood Condition

In the previous sections a neighborhood condition was imposed on the model. This condition requires the existence of a neighborhood of the true knot $k_0$ into which the risk factor does not fall. Such a condition is satisfied if the risk factor is discrete, but not if it is continuous. When there is no such neighborhood, the asymptotic Normality of the MPLE is examined through simulations. The quadratic one free-knot spline model (2.2.6) was taken as an example. The data used for checking the asymptotic Normality of the MPLE were simulated based on the Glostrup Female cohort in the Diverse Populations Collaboration.
The Glostrup Study is a pool of seven observational cohorts from Glostrup, a city west of Copenhagen, Denmark. The female cohort consists of 5061 observations with 420 deaths. Since the inverse transformation of BMI is approximately Normal, it is easier to first simulate LBMI from a Normal distribution and then transform LBMI back to BMI. The hazard function $h_0(t)\times\exp(g_\theta(Z))$ was modeled parametrically using the Weibull distribution; the corresponding hazard can be expressed as

\[
h(t|Z) = p\,t^{p-1}\exp(\beta_0)\times\exp(g_\theta(Z)),
\]

where $p$ is the shape parameter of the Weibull distribution and $\beta_0$ is an intercept term serving to scale the baseline hazard. The survival time before death can then be generated from a Uniform random variable $Y \sim U[0,1]$ using the relationship

\[
T = \big[-\ln(Y)\times\exp\big(-(\beta_0+g_\theta(Z))\big)\big]^{1/p}, \tag{4.5.1}
\]

where $g_\theta(Z)$ is the spline expression and $Z$ is BMI. For the purpose of the simulation, the censoring time was assumed to be independent of the covariate effects and was generated similarly using the formula

\[
U = \big[-\ln(Y)\times\exp(-\delta_0)\big]^{1/q}, \tag{4.5.2}
\]

where $\delta_0$ and $q$ were both estimated by treating the censoring times as if they were the real death times and the death times as censoring times. The observed follow-up time was taken to be $X = \min\{T, U\}$. For (4.5.1), the mean LBMI used is $4.24\times10^{-2}$ and the standard deviation is $6.80\times10^{-3}$; the other parameters are $p = 1.34$, $\beta_0 = -6.14$, $\beta_1 = -6.60\times10^{-1}$, $\beta_2 = 1.46\times10^{-2}$, $\beta_3 = -2.09\times10^{-2}$, $k = 27.65$. For (4.5.2), $\delta_0 = -19.62$ and $q = 2.38$. These values are pre-estimated MPLEs of the parameters. After the BMI values were generated, values falling in a certain range were dropped to create a neighborhood containing no BMI. For each neighborhood width, 1000 simulated samples of size 20000 were used, and the neighborhood width was gradually shrunk to 0. This allows us to compare the asymptotic behavior of the estimated parameters with and without the neighborhood condition.
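Formulas (4.5.1) and (4.5.2) amount to inverse-transform sampling from Weibull-type survival and censoring distributions. A minimal sketch of this data-generating step, using the parameter values listed above (variable names are ours; LBMI is taken to be the inverse transformation $1/\mathrm{BMI}$ described earlier, which is an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-estimated parameter values quoted in the text (Glostrup simulation)
p, beta0 = 1.34, -6.14
b1, b2, b3, k = -6.60e-1, 1.46e-2, -2.09e-2, 27.65
delta0, q = -19.62, 2.38
mu_lbmi, sd_lbmi = 4.24e-2, 6.80e-3

def g_theta(z):
    # quadratic spline with one free knot: b1*Z + b2*Z^2 + b3*(Z - k)_+^2
    return b1 * z + b2 * z ** 2 + b3 * np.maximum(z - k, 0.0) ** 2

n = 20000
lbmi = rng.normal(mu_lbmi, sd_lbmi, n)
bmi = 1.0 / lbmi                       # transform LBMI back to BMI
y1, y2 = rng.uniform(size=n), rng.uniform(size=n)

T = (-np.log(y1) * np.exp(-(beta0 + g_theta(bmi)))) ** (1.0 / p)   # (4.5.1)
U = (-np.log(y2) * np.exp(-delta0)) ** (1.0 / q)                   # (4.5.2)

X = np.minimum(T, U)                   # observed follow-up time
death = (T <= U).astype(int)           # 1 = death observed, 0 = censored
```

Dropping BMI values inside a chosen gap (e.g. 27 to 28) before computing $T$ reproduces the artificial neighborhood condition.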
The MPLEs were obtained using the likelihood ratio method. The gap adopted in Figure 4.1 is 27–28. Subfigure 4.1(a) is the smoothed density curve of the coefficient β1 in model (2.2.6), and Subfigure 4.1(b) is the corresponding qq-norm plot. Subfigures 4.1(c) and 4.1(d) are for β2; β3's smoothed density curve is 4.1(e) and its qq-norm plot is 4.1(f). The two graphs 4.1(g) and 4.1(h) in the last row of Figure 4.1 are the density curve and qq-norm plot of the knot c1 in model (2.2.6), respectively. The graphs in Figures 4.2, 4.3 and 4.4 are arranged in the same manner, except that Figure 4.2 was generated using a gap of 27.3–27.8, Figure 4.3's gap is 27.5–27.7, and the last one, Figure 4.4, does not have any gap. These graphs show that as the width of the neighborhood shrinks to 0, the asymptotic behavior of the estimated MPLEs remains stable. The smoothed density curves are similar as the neighborhood width changes, all curves are roughly centered at their true parameter values, and the qq-norm plots are straight lines. Interestingly, when the sample size was taken to be 5000, the smoothed density curves did not show Normality and the qq-norm plots were not straight lines (pictures not shown); as the sample size increased to 20000, both the density curves and the qq-norm plots became much better. This suggests that the asymptotic results may not apply very quickly. The effect of sample size on nadir estimation will be examined in the simulation part.
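How straight a qq-norm plot is can be summarized numerically as the correlation between the ordered estimates and theoretical Normal quantiles. A small illustrative sketch (the simulated "estimates" below are a stand-in drawn for demonstration, not actual MPLE output):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
# Stand-in for 1000 simulated MPLEs of a coefficient (illustrative values only)
est = np.sort(rng.normal(loc=-0.66, scale=0.15, size=1000))

m = est.size
# Theoretical standard-normal quantiles at plotting positions (i - 0.5)/m
theo = np.array([NormalDist().inv_cdf((i - 0.5) / m) for i in range(1, m + 1)])

# Correlation between sample and theoretical quantiles:
# close to 1 when the qq-norm plot is a straight line
r = np.corrcoef(theo, est)[0, 1]
```

Applying this to the simulated MPLEs at sample sizes 5000 versus 20000 would quantify the visual impression described above.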
[Figure 4.1: Gap 1: 27—28. Panels (a)–(h): smoothed density curves and Normal Q–Q plots of the estimates of β1 (bmi), β2 (bmisq), β3 (right) and the knot, quadratic spline with one free knot.]
[Figure 4.2: Gap 2: 27.3—27.8. Same panel arrangement as Figure 4.1.]

[Figure 4.3: Gap 3: 27.5—27.7. Same panel arrangement as Figure 4.1.]
[Figure 4.4: No Gap. Same panel arrangement as Figure 4.1.]

CHAPTER 5

SIMULATION STUDIES

We have now proposed the free-knot spline method, which can be used as a nadir estimation tool when the data appear quadratic or J-shaped. Is the proposed method better than existing ones? In this chapter we compare the performance of the new method with that of the quadratic method, the transformation method, fractional polynomials and the change point method using simulations. The comparisons include nadir estimation ability and goodness of fit. We observed in the Norwegian Counties Study that, in the presence of extreme values, the quadratic and change point methods generated unrealistic nadir estimates as well as poor confidence intervals, even when all non-monotonicity detection tests agreed on the existence of a nadir and both the transformation method and the free-knot spline method detected it. Which methods, then, are generally more sensitive to extreme values and which are more robust? Another problem that was observed concerns sample size. The asymptotic results for all the methods apply when the sample size is large enough. But how large is large?
Will these methods produce reasonably good nadir estimates and confidence intervals when the sample size is moderate? We will examine the effects of extreme values and sample size on nadir estimation and compare the performance of the new and existing methods under different conditions. The model comparison criterion is introduced first.

5.1 A Goodness Of Fit Test For Survival Models

In 1996 Grønnesby and Borgan proposed an overall goodness of fit test based on martingale residuals for the Cox Proportional Hazards model [39]. May and Hosmer showed in 1998 that the proposed method is "algebraically identical to one obtained from adding group indicator variables to the model and testing the hypothesis that the coefficients of the group
If the model under consideration has been correctly specified then each of these statistics has a chi-square distribution with G − 1 degrees of freedom. While comparing different non-nested models the model with a smaller chi-square statistic is the better model [41]. After every method is applied to a simulated sample the goodness of fit test statistic is calculated and the χ29 values are compared. 5.2 Transformation Model In this section the true underlying non-monotonic relationship between prognostic index and BMI is assumed to be quadratic in LBMI. In other words this situation represents cases where a good Normal transformation of the main risk factor exists and it is appropriate to apply the transformation method. Under such assumptions we would expect the transformation method to perform very well and we would like to see how other methods behave. 67 The cohort selected to generate the simulated data is the First National Health and Nutrition Examination Survey Epidemiologic Follow-up Study (NHANES I). The white male cohort was used. This cohort was also used in [18] to compare the performance of the transformation and the change point methods. The NHANES I Epidemiologic Followup Study tracks morbidity and mortality for 14, 407 individuals, initially aged 25 − 74, who received complete medical examinations during the NHANES I survey conducted from 1971 − 1975. Follow-up surveys were conducted from 1982 − 1984, again in 1986 (for those age 55 or older at baseline), and again in 1987 and 1992 (From the Florida State University Diverse Populations Collaboration website biostat.stat.fsu.edu). The white male cohort consists of 4623 participants where 1900 are deaths by the end of the study. The average follow-up is 5615 days with a minimum of 16 days and a maximum of 7943 days. The mean BMI of this group is 25.7, and the BMI values range from 13.0 to 52.6. The histogram of LBMI is in Figure 5.1, we can see it is approximately Normal. 
The transformation method within a Weibull parametric survival model was first used to fit the survival times. Censoring was assumed to be independent of covariate effects, hence a null Weibull model was used to describe the time before a censoring event occurs. Then LBMI values were simulated from a Normal distribution with the cohort's mean and standard deviation. Next the survival times were simulated using the Weibull distribution and expression (4.5.1). In other words, they were simulated according to

    T = [ −ln(Y) × exp(−(β_0 + g_θ(Z))) ]^{1/p},

where Y is a uniform [0, 1] random variable, z is a simulated LBMI value and

    g_θ(Z) = β_1 z + β_2 z².

Parameters β_0, β_1, β_2 and p were estimated from the Weibull model and are given in Table 5.1. Similarly, the censoring times were simulated using formula (4.5.2), that is,

    U = [ −ln(Y) × exp(−δ_0) ]^{1/q},

where no covariate effect is included; the parameters δ_0 and q are given in Table 5.1. The assumed underlying prognostic index curve is given in Figure 5.2, where the curve has been vertically adjusted so that the prognostic index is zero when BMI is taken to be the nadir. The simulated survival times were compared with the generated censoring times: a death was recorded if the survival time was smaller than the corresponding censoring time, and the follow-up time was taken to be the shorter of the two. After data generation the five methods were applied to the data under the Cox Proportional Hazards model.

[Figure 5.1: LBMI Histogram With Normal Density Curve, NHANES I White Male]

Table 5.1: Simulation Parameters, NHANES I White Male

    Simulation Parameter          Value
    Mean (LBMI)                   3.98 × 10⁻²
    Standard Deviation (LBMI)     6.15 × 10⁻³
    β_0                           −9.08
    β_1                           −168.08
    β_2                           2097.07
    p                             1.32
    δ_0                           −132.16
    q                             14.88

First, the BMI values are restricted to 15−50. This represents a typical range of BMI in the NHANES I White Male cohort.
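The data-generation step above can be sketched as follows. This is an illustrative sketch, not the dissertation's code: the function names are hypothetical, the parameter values are the rounded ones from Table 5.1, and 1 − Y is used in place of Y (the two have the same distribution) so that ln(0) can never occur.

```python
import math
import random

# Rounded parameter values from Table 5.1 (NHANES I White Male).
BETA0, BETA1, BETA2, P = -9.08, -168.08, 2097.07, 1.32
DELTA0, Q = -132.16, 14.88
MU, SIGMA = 3.98e-2, 6.15e-3        # mean and SD of LBMI

def g_theta(z):
    """Quadratic prognostic index in LBMI."""
    return BETA1 * z + BETA2 * z * z

def simulate_subject(rng):
    """One subject: (follow-up time, death indicator), generated via
    the inverse-transform formulas (4.5.1) and (4.5.2)."""
    z = rng.gauss(MU, SIGMA)                      # simulated LBMI
    y1 = 1.0 - rng.random()                       # uniform on (0, 1]
    y2 = 1.0 - rng.random()
    t = (-math.log(y1) * math.exp(-(BETA0 + g_theta(z)))) ** (1.0 / P)
    u = (-math.log(y2) * math.exp(-DELTA0)) ** (1.0 / Q)
    return min(t, u), t < u                       # follow-up time, death
```

Repeating `simulate_subject` 5000 times yields one simulated sample of the kind analyzed below.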
[Figure 5.2: Assumed Underlying Curve, NHANES I White Male]

The five methods (quadratic, transformation, fractional polynomial, change point and free-knot spline) are applied to each of the 500 simulated samples of 5000 observations. Results are given in Table 5.2. We hope the estimated nadir values target the true nadir; the column "Nadir Mean" therefore reflects the central tendency of each method by averaging the 500 estimated nadirs. The column "Nadir MSE" is the mean squared error of the estimated nadirs; it represents the average squared distance between the true nadir 25.0 and the estimates. The column "95% C.I. Length" contains the averaged lengths of the estimated 95% confidence intervals, and the column "95% C.I. Coverage Probability" contains the proportion of 95% confidence intervals that cover the true nadir. A good estimation method should have a nadir mean that is approximately 25.0, a low nadir MSE, a coverage probability close to 95% and a confidence interval that is as short as possible.

Table 5.2: Simulation Results 15−50, NHANES I White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic               26.9         4.95        4.04              0.45
    Transformation          25.0         0.32        2.11              0.96
    Fractional Polynomial   25.1         0.64        2.13              0.83
    Change Point            23.4         4.08        4.57              0.73
    Free-Knot Spline        24.9         1.76        5.53              0.95

We see from Table 5.2 that the transformation nadir estimates average exactly 25.0; the means of the fractional polynomial and free-knot spline nadirs are also very close to the true value. The quadratic method over-estimates and the change point method under-estimates. The "Nadir MSE" column shows that the nadirs produced by the transformation method not only center at the true nadir but also stay very close to it. Hence the best point estimator is given by the transformation method, and the fractional polynomial and free-knot spline methods behave reasonably well. The worst point estimators are given by the quadratic and the change point methods, since they are biased and too far from the true nadir. An interesting finding is the difference between the performance of the change point nadirs and that of the free-knot spline nadirs: by separating the nadir and the knot in the change point model, the estimated nadirs are brought closer to the true nadir, and the mean of the estimated nadirs becomes closer to the true value.

There is a large variation in the coverage probabilities. Only the transformation method and the free-knot spline method attain the desired 95% probability. The fractional polynomial method, as expected, produces confidence intervals whose coverage probability is much smaller than 95%, because the variation incurred during power selection is ignored. The quadratic method gives an extremely low coverage probability of 45%, even though the width of its confidence interval is on average 4.04. Summary statistics of the quadratic method's confidence limits show that although all upper limits are higher than the true nadir, only 45% of the lower limits are lower than it. By forcing the non-monotonic curve to be symmetric about the nadir, the quadratic method pushes its minimum point to the right and therefore generates poor nadir estimators and confidence intervals. Surprisingly, the coverage probability of the change point method is as low as 73%; here it is some of the upper limits of the confidence intervals that are not high enough to enclose the true nadir. This indicates the change point confidence intervals need to be widened or shifted to the right. For this example, as expected, the transformation method gives the best results and the free-knot spline method performs reasonably well.
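The four summary columns used throughout the comparison tables can be computed as below. This is a sketch with a hypothetical interface: `estimates` is the list of estimated nadirs and `intervals` the list of (lower, upper) confidence limits from the simulated samples.

```python
def summarize_nadirs(estimates, intervals, true_nadir):
    """Compute the four summaries reported in the tables: mean of the
    estimated nadirs, their MSE about the true nadir, the average
    confidence-interval length, and the empirical coverage probability."""
    n = len(estimates)
    mean = sum(estimates) / n
    mse = sum((e - true_nadir) ** 2 for e in estimates) / n
    length = sum(hi - lo for lo, hi in intervals) / n
    coverage = sum(lo <= true_nadir <= hi for lo, hi in intervals) / n
    return mean, mse, length, coverage
```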
Goodness of fit results are shown in Table 5.3. The table contains the percentage of the 500 simulated samples in which the null hypothesis H_0: γ_1 = γ_2 = ... = γ_{G−1} = 0 is rejected in the goodness of fit test; in other words, a percentage represents how often the fit of the model is not good enough. According to the results, 21.8% of the 500 quadratic models did not produce a good fit. The transformation and fractional polynomial methods perform similarly in terms of goodness of fit. The free-knot spline generated the lowest percentage, 3.6%, indicating that the test here is slightly conservative; this may be due to the small simulation size of 500.

Table 5.3: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−50, NHANES I White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    21.8        5.4              5.0                     7.2            3.6

As the BMI range is extended from 15−50 to 15−60 and 15−70, more extreme values are included in the dataset, so we can study the stability of the methods in the presence of extreme values. These results are given in Table 5.4 and Table 5.6. The quadratic method and the change point method are more sensitive to the inclusion of extreme values; the transformation and fractional polynomial methods are almost unaffected, although the coverage probability of the fractional polynomial method stays at around 85%. The free-knot spline method is slightly affected. Table 5.5 and Table 5.7 give the goodness of fit results for the extended BMI ranges. Our conclusion stays the same: the quadratic method is the worst and the free-knot spline gives the best fit.

5.3 Free-Knot Spline Model

In this section we assume the underlying relationship is given by a one-free-knot polynomial spline function. This case represents situations where no good Normal transformation exists, so we have to work with the original risk factor directly, and the knot is not necessarily equal to the nadir.
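The rejection percentages in these tables can be obtained by comparing each sample's statistic with the 5% critical value of the χ²_9 distribution, which is approximately 16.92 (a standard tabulated constant). The function below is an illustrative sketch, not code from the dissertation.

```python
# 5% critical value of the chi-square distribution with G - 1 = 9
# degrees of freedom (standard table value, approximately 16.92).
CHI2_9_CRIT = 16.92

def rejection_percentage(statistics, critical=CHI2_9_CRIT):
    """Percentage of simulated samples whose goodness-of-fit statistic
    exceeds the critical value, as reported in Tables 5.3, 5.5, 5.7."""
    return 100.0 * sum(s > critical for s in statistics) / len(statistics)
```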
Table 5.4: Simulation Results 15−60, NHANES I White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic               26.8         5.37        4.78              0.55
    Transformation          25.0         0.32        2.09              0.95
    Fractional Polynomial   25.1         0.61        2.12              0.84
    Change Point            23.2         4.66        4.43              0.69
    Free-Knot Spline        24.8         1.97        5.20              0.93

Table 5.5: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−60, NHANES I White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    34.6        5.4              4.4                     9.4            4.2

Table 5.6: Simulation Results 15−70, NHANES I White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic               26.7         5.58        5.16              0.59
    Transformation          25.0         0.31        2.08              0.95
    Fractional Polynomial   25.1         0.60        2.12              0.85
    Change Point            23.2         4.93        4.38              0.67
    Free-Knot Spline        24.8         2.01        5.11              0.92

Table 5.7: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−70, NHANES I White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    37.2        5.2              4.8                     11.0           4.0

Similar to the previous section, we first estimate the non-monotonic relationship using the free-knot spline method, and the estimated parameters are saved for the simulation. Then LBMI is simulated based on the real data, and BMI is taken to be the inverse of the simulated LBMI. The survival times are generated from the estimated free-knot polynomial spline curve using the simulated BMI. The dataset used for this simulation study is the NHIS White Male cohort of 46,264 observations with 4582 deaths. The survival times are simulated according to (4.5.1), where

    g_θ(Z) = β_1 z + β_2 z² + β_3 (z − k)² 1{z > k},

and the censoring times are generated from (4.5.2). Simulation parameters are shown in Table 5.8. Figure 5.3 contains the curve from which survival and censoring times are simulated.
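The one-free-knot quadratic spline above, with the rounded parameter values reported in Table 5.8, can be evaluated and its nadir located numerically. This is only an illustrative check; the dissertation derives the nadir from the fitted spline rather than by grid search, and with the rounded coefficients the grid minimiser comes out near 27.0, consistent up to rounding with the true nadir of 26.9 reported below.

```python
def g_spline(z, b1=-0.61, b2=1.13e-2, b3=-1.27e-2, k=31.40):
    """One-free-knot quadratic spline prognostic index, with the
    rounded parameter values from Table 5.8 (NHIS White Male)."""
    return b1 * z + b2 * z**2 + (b3 * (z - k)**2 if z > k else 0.0)

def grid_nadir(g, lo=15.0, hi=50.0, steps=70000):
    """Locate the minimiser of g on [lo, hi] by a fine grid search.
    A sketch only, for checking the curve's shape."""
    zs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(zs, key=g)
```

Here the minimiser falls on the quadratic branch below the knot, at roughly −b1 / (2 b2).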
Table 5.8: Simulation Parameters, The NHIS White Male

    Simulation Parameter          Value
    Mean (LBMI)                   3.96 × 10⁻²
    Standard Deviation (LBMI)     5.64 × 10⁻³
    β_0                           −3.03
    β_1                           −0.61
    β_2                           1.13 × 10⁻²
    β_3                           −1.27 × 10⁻²
    k                             31.40
    p                             1.12
    δ_0                           −55.21
    q                             6.96

[Figure 5.3: Assumed Underlying Curve, The NHIS White Male]

The true nadir under the free-knot polynomial spline model is calculated to be 26.9. Table 5.9 gives the nadir estimation comparison results. The estimated nadirs and confidence intervals are restricted to be between 20 and 30: if an estimated nadir is larger than 30 it is set to 30, and if it is lower than 20 it is set to 20. Similarly, a confidence interval is truncated at 20 if its lower bound falls below 20 and truncated at 30 if its upper bound goes beyond 30. According to the nadir mean values, the most accurate estimated nadir is 26.8, given by the free-knot polynomial spline method. The change point method and the quadratic method perform equally badly. Although the transformation method generates very good results when the true model is quadratic in the Normally transformed variable, when the true model takes the knot and nadir to be unequal the transformation method is not the best. In terms of the precision of nadir estimation, the fractional polynomial method produces the smallest MSE, 1.06. This is not surprising, since the fractional polynomial powers are the pair, selected with replacement from the fixed set P = {−2, −1, −0.5, 0, 0.5, 1, 2, ..., max(3, m)}, that maximizes the partial likelihood. This amounts to selecting the best of 44 candidate models, so the fractional polynomial method fits the data closely and almost surely overfits. The free-knot polynomial spline MSE is 1.32, the second best. The worst two again are the quadratic and the change point methods.
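The count of 44 candidate models can be verified directly, assuming the usual accounting for the Royston and Altman family [20]: with the default eight-power set, the degree-2 models are the unordered pairs of powers chosen with replacement, and the eight single-power degree-1 models are counted as well.

```python
from itertools import combinations_with_replacement

# Fixed power set of Royston and Altman [20]; with the default set,
# max(3, m) contributes the single power 3, giving eight powers.
P = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]

# Degree-2 fractional polynomials: unordered pairs of powers chosen
# with replacement (repeated powers are allowed).
fp2 = list(combinations_with_replacement(P, 2))

# Adding the eight degree-1 models reproduces the "best of 44 models"
# figure quoted in the text.
total_models = len(fp2) + len(P)
```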
The last column of Table 5.9 contains the observed coverage probabilities. Again all confidence intervals are constructed to have 95% coverage probability; however, only the free-knot polynomial spline method truly attains the claimed 95%. The transformation method achieves 46%, the quadratic method 67% and the fractional polynomial method only 76%. Hence when the true model is the free-knot polynomial spline, which does not force the knot to equal the nadir, the best method in terms of both nadir estimation and coverage probability of confidence intervals is the free-knot polynomial spline.

Table 5.9: Simulation Results 15−50, The NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic               28.1         2.25        3.24              0.67
    Transformation          25.8         1.80        2.62              0.46
    Fractional Polynomial   26.6         1.06        2.51              0.76
    Change Point            25.7         3.46        5.32              0.87
    Free-Knot Spline        26.8         1.32        5.11              0.95

The goodness of fit comparison is given in Table 5.10. 19.6% of the time the quadratic method did not fit the data well, and 14.4% of the time the null hypothesis in the goodness of fit test involving the change point method was rejected; hence the quadratic and the change point methods are the worst in terms of model fitting. The transformation and fractional polynomial methods behave similarly in the sense that their rejection probabilities are both close to 10.0%. The method that produced the best model fit is the free-knot polynomial spline, with a rejection probability of 4.2%.

Table 5.10: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−50, The NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    19.6        10.6             9.2                     14.4           4.2

Tables 5.11 to 5.14 give the comparison results based on BMI ranges 15−60 and 15−70. By extending the BMI range from 15−50 to 15−70, extreme BMI values are included and their effect on nadir estimation and goodness of fit is examined.
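The truncation rule described above (restricting estimates and confidence limits to [20, 30]) amounts to a simple clamp. The function name below is illustrative, not from the dissertation.

```python
def truncate_nadir(nadir, ci, lo=20.0, hi=30.0):
    """Apply the truncation rule from the text: clamp the estimated
    nadir to [lo, hi] and clip both confidence limits to the same
    range.  ci is a (lower, upper) pair."""
    clamp = lambda x: max(lo, min(hi, x))
    return clamp(nadir), (clamp(ci[0]), clamp(ci[1]))
```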
Table 5.11: Simulation Results 15−60, The NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic               28.2         3.10        3.70              0.67
    Transformation          25.8         1.79        2.58              0.47
    Fractional Polynomial   26.5         1.14        2.53              0.73
    Change Point            25.5         4.09        5.30              0.84
    Free-Knot Spline        26.8         1.21        4.56              0.96

Table 5.12: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−60, The NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    28.6        11.0             8.6                     16.4           4.6

Table 5.13: Simulation Results 15−70, The NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic               28.2         3.33        3.85              0.67
    Transformation          25.8         1.78        2.58              0.47
    Fractional Polynomial   26.5         1.15        2.54              0.73
    Change Point            25.4         4.28        5.33              0.83
    Free-Knot Spline        26.8         1.20        4.48              0.96

Table 5.14: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−70, The NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    32.0        11.2             8.4                     17.6           4.4

Our results show that the nadir estimation performance of all methods is not sensitive to the BMI range, and the model fitting is also stable.

5.4 Other J-shaped Function

We have compared the performance of the proposed method with that of the existing methods under two assumptions. The first assumption is that there exists a good Normality transformation. The second assumes there is no Normal transformation and the underlying true relationship is given by a free-knot spline model in which the knot and the nadir differ. What happens, then, if the underlying curve is something other than a free-knot spline function and we must work with the original risk factor, BMI in our case? Here we compare the performance of the methods when some other non-monotonic relationship is assumed to be true. The adopted dataset is the NHIS White Male cohort.
The survival times T are simulated from

    T = [ −ln(Y) × exp(−(β_0 + g_θ(Z))) ]^{1/p},

where the assumed non-monotonic function is given by

    g_θ(Z) = β_1 × β_α × [ (z − r_1)/(r_2 − r_1) ]^{β_β} × [ 1 − (z − r_1)/(r_2 − r_1) ]^{β_γ},

and the censoring times are generated using

    U = [ −ln(Y) × exp(−δ_0) ]^{1/q}.

Simulation parameters are given in Table 5.15; the adopted curve is shown in Figure 5.4.

Table 5.15: Simulation Parameters, NHIS White Male

    Simulation Parameter          Value
    Mean (LBMI)                   3.96 × 10⁻²
    Standard Deviation (LBMI)     5.64 × 10⁻³
    β_0                           −7.74
    β_1                           1.86
    β_α                           −4.50
    β_β                           0.35
    β_γ                           1.43
    r_1                           15.00
    r_2                           70.00
    p                             1.12
    δ_0                           −55.21
    q                             6.96

[Figure 5.4: Assumed Underlying Curve, NHIS White Male]

Results based on the BMI range 15−50 are in Table 5.16. The true nadir calculated under the pre-selected non-monotonic function is 25.8. The mean values of the fractional polynomial and free-knot spline nadirs are 26.0, close to the true nadir of 25.8. The average nadir of the transformation method is 25.2, indicating this method is slightly biased. The change point method and the quadratic method are the worst in terms of central tendency. The nadir MSE measures how far the estimated values are from the true nadir. This simulation study shows the transformation nadirs are close to the true value; the largest MSE is given by the change point method. Again, by splitting the nadir and the knot, the free-knot spline method beats the change point method and generates better nadir estimators. The nadir mean describes the central tendency of the estimators and the nadir MSE reflects the variation of the estimated values about the true value. A comparison between the transformation and the free-knot spline methods suggests the free-knot spline method better targets the true parameter, but the sampling distribution of its estimator has a larger variance.
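As a check on this scaled beta-kernel form, it can be evaluated with the Table 5.15 parameters and its minimum located numerically: the kernel u^{β_β}(1 − u)^{β_γ} is maximized at u = β_β / (β_β + β_γ), i.e. at BMI ≈ r_1 + (r_2 − r_1) β_β / (β_β + β_γ) ≈ 25.8, which matches the true nadir reported below. The snippet is an illustrative sketch, not the dissertation's code.

```python
# Parameters from Table 5.15 (NHIS White Male).
B1, BA, BB, BG = 1.86, -4.50, 0.35, 1.43
R1, R2 = 15.0, 70.0

def g_beta(z):
    """Scaled beta-kernel prognostic index, as displayed above."""
    u = (z - R1) / (R2 - R1)
    return B1 * BA * u**BB * (1.0 - u)**BG

# Grid search for the nadir over the interior of the BMI range 15-50.
zs = [15.0 + 35.0 * i / 70000 for i in range(1, 70000)]
nadir = min(zs, key=g_beta)
```

Because the constant β_1 β_α is negative, minimizing g_θ is equivalent to maximizing the beta kernel, which is what makes the closed-form nadir available.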
On the other hand, the transformation method produces estimated values that are biased, but the values stay close to one another and are not far from the true parameter.

The confidence interval coverage probabilities of all methods are lower than 95% except for the free-knot spline. The lowest coverage probability, 74%, is given by the quadratic method; the fractional polynomial and transformation methods are very close to each other. The free-knot spline coverage probability is 98%, slightly higher than 95%. Hence the free-knot spline method generates confidence intervals that are slightly wide, while all other methods produce confidence intervals that are too short.

Table 5.16: Simulation Results 15−50, NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic               26.9         3.82        5.12              0.74
    Transformation          25.2         1.30        3.41              0.84
    Fractional Polynomial   26.0         1.85        3.59              0.83
    Change Point            24.6         6.98        7.06              0.90
    Free-Knot Spline        26.0         3.23        7.80              0.98

The goodness of fit test results are in Table 5.17. The highest rejection probability is associated with the quadratic method, showing that the worst fit is given by the quadratic method. All other methods are similar in terms of model fitting.

Table 5.17: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−50, NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    7.8         5.4              4.2                     6.0            4.0

As the range of BMI extends from 15−50 to 15−70, the nadir estimation results are given in Table 5.18 and Table 5.20; the conclusions stay the same. Table 5.19 and Table 5.21 give the goodness of fit results. As the BMI range widens, it becomes clearer that the quadratic method is worse than the other methods and that the free-knot spline and the fractional polynomial are slightly better than the transformation and the change point methods.

5.5 Summary

In the simulation studies we have considered three cases.
Table 5.18: Simulation Results 15−60, NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic               26.9         4.37        5.55              0.76
    Transformation          25.2         1.24        3.26              0.82
    Fractional Polynomial   25.9         1.80        3.54              0.84
    Change Point            24.5         6.82        6.88              0.87
    Free-Knot Spline        25.9         3.38        7.52              0.98

Table 5.19: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−60, NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    9.4         6.0              3.6                     5.8            2.6

Table 5.20: Simulation Results 15−70, NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic               26.8         4.57        5.69              0.77
    Transformation          25.2         1.24        3.24              0.82
    Fractional Polynomial   25.9         1.77        3.55              0.84
    Change Point            24.5         7.03        6.84              0.86
    Free-Knot Spline        25.9         3.44        7.44              0.97

Table 5.21: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−70, NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    10.0        6.0              3.4                     5.4            2.8

The first case represents situations where there exists a good Normal transformation, hence it is appropriate to apply the quadratic method to the Normally transformed variable. As expected, under this condition the best method in both nadir estimation and coverage probability is the transformation method; the free-knot polynomial spline also performs very well. The second case covers situations in which there is no good Normal transformation and the knot is not equal to the nadir, so we had to work with the original risk factor BMI directly. This time the best method is the free-knot spline, considering both nadir estimation and the coverage probability of the confidence interval: only the free-knot spline confidence interval achieved the claimed 95% coverage probability, while the transformation method generated a coverage probability of 46%.
The third case assumes the non-monotonic relationship is given by some function other than the free-knot spline curve, and that there is no good Normal transformation. The free-knot spline method produces the most accurate nadir estimator and the best confidence interval; therefore it performs best under this condition.

When a method has to be selected before data analysis, if there is a good Normal transformation the transformation method is the best choice. If there is no good Normal transformation, the selected method should perform well regardless of the true relationship, in the sense that it gives a good nadir estimator, a confidence interval whose coverage probability is close to the nominal 95%, and a good model fit. According to the comparisons based on simulations, the best choice would be the free-knot spline method.

CHAPTER 6

FUTURE WORK

For model comparison purposes, a more direct measure of goodness of fit is the difference between the assumed underlying relationship and the estimated curve. This measure may be adopted to further compare the goodness of fit of these methods. In the future we would like to generalize the one-free-knot spline to two- or three-knot spline functions. In such cases likelihood ratio based inference can no longer be used, but the Delta method can be utilized to construct the confidence interval. The quadratic spline can be replaced by the cubic spline with a continuous second order derivative; the advantage is that the neighborhood assumption in the proof can be avoided. Another very interesting question is how to detect the nadir, or test for the existence of the nadir. We would also like to see whether a better nadir detection test can be proposed based on the free-knot spline method.

REFERENCES

[1] Build and Blood Pressure Study, 1959. Technical report, Society of Actuaries, Chicago, 1959.

[2] Build Study 1979.
Technical report, Society of Actuaries and Association of Life Insurance Medical Directors of America, Chicago, 1980.

[3] Wilcosky T., Hyde J., Anderson J., Bangdiwala S., and Duncan B. Obesity and mortality in the Lipid Research Clinics Program Follow-up Study. J Clin Epidemiol, 43:743–752, 1990.

[4] Schroll M. A longitudinal epidemiological survey of relative weight at age 25, 50 and 60 in the Glostrup population of men and women born in 1914. Dan Med Bull, 28:106–116, 1981.

[5] Tuomilehto J., Salonen J., Marti B., et al. Body weight and risk of myocardial infarction and death in the adult population of eastern Finland. BMJ, 295:623–627, 1987.

[6] Allison D., Gallagher D., Heo M., Pi-Sunyer F., and Heymsfield S. Body mass index and all-cause mortality among people age 70 and over: the Longitudinal Study of Aging. Int J Obes Relat Metab Disord, 21:424–431, 1997.

[7] Diehr P., Bild D., Harris T., Duxbury A., Siscovick D., and Rossi M. Body mass index and mortality in nonsmoking older adults: the Cardiovascular Health Study. Am J Public Health, 88:623–629, 1998.

[8] Durazo-Arvizu R., Cooper R., Luke A., Prewitt T., Liao Y., and McGee D. Relative weight and mortality in U.S. blacks and whites: findings from representative national population samples. Ann Epidemiol, 7:383–395, 1997.

[9] Losonczy K., Harris T., Cornoni-Huntley J., et al. Does weight loss from middle age to old age explain the inverse weight mortality relation in old age? Am J Epidemiol, 141:312–321, 1995.

[10] Troiano R., Frongillo E. Jr, Sobal J., and Levitsky D. The relationship between body weight and mortality: a quantitative analysis of combined information from existing studies. Int J Obes Relat Metab Disord, 20:63–75, 1996.

[11] Folsom A., Kaye S., Sellers T., et al. Body fat distribution and 5-year risk of death in older women. JAMA, 269:483–487, 1993.

[12] Marmot M., Rose G., Shipley M., and Thomas B. Alcohol and mortality: a U-shaped curve. Lancet, i:580–583, 1981.

[13] Pastor R.
and Guallar E. Use of Two-segmented Logistic Regression to Estimate Change-points in Epidemiologic Studies. American Journal of Epidemiology, 148:631–642, 1998.

[14] Samuelsson O., Wilhelmsen L., Pennert K., Wedel H., and Berglund G. The J-shaped relationship between coronary heart disease and achieved blood pressure level in treated hypertension: further analysis of 12 years of follow-up of treated hypertensives in the Primary Prevention Trial in Gothenburg, Sweden. J Hypertens, 8:547–555, 1990.

[15] Frank J., Reed D., Grove J., and Benfante R. Will lowering population levels of serum cholesterol affect total mortality? J. Clin. Epidem., 45:333–346, 1992.

[16] Polichronaki H., Hatzakis A., Vatopoulos A., Katsouyanni K., Tzonou A., and Trichopoulos D. Association of coronary mortality with temperature and air pollution in Athens. Haemostasis, 12:133, 1982.

[17] Wilcox A. and Russell I. Birth weight and perinatal mortality: II, On Weight-specific mortality. Int. J. Epidem., 12:319–325, 1983.

[18] Durazo-Arvizu R., McGee D., Li Z., and Cooper R. Establishing the Nadir of the Body Mass Index-Mortality Relationship: a Case Study. Journal of the American Statistical Association, 92:1312–1319, 1997.

[19] Goetghebeur E. and Pocock S. Detection and Estimation of J-shaped Risk-Response Relationships. J. R. Statist. Soc. A, 158, Part 1:107–121, 1995.

[20] Royston P. and Altman D. Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling. Applied Statistics, 43:429–467, 1994.

[21] Sleeper L. and Harrington D. Regression Splines in the Cox Model With Application to Covariate Effects in Liver Disease. Journal of the American Statistical Association, 85:941–949, 1990.

[22] Cox D.R. Regression Models and Life-Tables. Journal of the Royal Statistical Society, Series B (Methodological), 34:187–220, 1972.

[23] Breslow N.E. Covariance analysis of censored survival data. Biometrics, 30:89–100, 1974.

[24] Peto R.
Contribution to the discussion of a paper by D.R. Cox. Journal of the Royal Statistical Society, Series B, 34:205–207, 1972.

[25] Efron B. The efficiency of Cox's likelihood function for censored data. Journal of the American Statistical Association, 72:557–565, 1977.

[26] Stevens J., Keil J., Rust P., Tyroler H., Davis C., and Gazes P. Body mass index and body girths as predictors of mortality in black and white women. Arch Intern Med, 152:1257–1262, 1992.

[27] Cornfield J., Gordon T., and Smith W. Quantal response curves for experimentally uncontrolled variables. Bulletin of the International Statistical Institute, XXXVIII:97–115, 1961.

[28] Flegal K. Anthropometric Evaluation of Obesity in Epidemiologic Research on Risk Factors: Blood Pressure and Obesity in the Health Examination Survey. 1982.

[29] Nevill A. and Holder R. Body Mass Index: A Measure of Fatness or Leanness. British Journal of Nutrition, 73:507–516, 1995.

[30] Casella G. and Berger R. Statistical Inference. Duxbury, Pacific Grove, CA, 2002.

[31] Efron B. The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia, 1982.

[32] McCullagh P. and Nelder J.A. Generalized Linear Models. Chapman & Hall/CRC, Boca Raton, 1999.

[33] Greenland S. Dose-response and trend analysis in epidemiology: alternatives to categorical analysis. Epidemiology, 9:356–365, 1995.

[34] Schumaker L.L. Spline Functions: Basic Theory. Wiley, New York, 1981.

[35] Gallant A.R. and Fuller W.A. Fitting Segmented Polynomial Regression Models Whose Join Points Have to Be Estimated. Journal of the American Statistical Association, 68:144–147, 1973.

[36] Chung K. A Course in Probability Theory. Academic Press, New York, 1974.

[37] Billingsley P. Probability and Measure. Wiley, New York, 1995.

[38] Fleming T. and Harrington D. Counting Processes and Survival Analysis. Wiley, Hoboken, NJ, 2005.

[39] Grønnesby J. and Borgan O.
A method for checking regression models in survival analysis based on the risk score. Lifetime Data Analysis, 2:315–328, 1996.

[40] May S. and Hosmer D. A Simplified Method of Calculating an Overall Goodness of Fit Test for the Cox Proportional Hazards Model. Lifetime Data Analysis, 4:109–120, 1998.

[41] Parzen M. and Lipsitz S. A Global Goodness of Fit Statistic for Cox Regression Models. Biometrics, 55:580–584, 1999.

BIOGRAPHICAL SKETCH

Fei Tan

Fei Tan was born on February 8, 1979, in Beijing, the People's Republic of China. She entered Nanjing University in China in the Fall of 1997 and completed her Bachelor's degree in Mathematics in the Summer of 2001. In the Fall of 2001 she was admitted to the University of Mississippi, where she obtained her Master's degree in Mathematics in the Spring of 2003. She came to Florida State University in the Fall of 2003 and finished her Master's degree in Statistics in the Fall of 2005. Her doctoral program started at FSU in Spring 2006. Fei Tan's research interests include survival analysis, non-monotonic regression, free-knot polynomial splines and the proportional hazards model.