Florida State University Libraries
Electronic Theses, Treatises and Dissertations
The Graduate School
2007
A Method for Finding the Nadir of NonMonotonic Relationships
Fei Tan
Follow this and additional works at the FSU Digital Library. For more information, please contact [email protected]
THE FLORIDA STATE UNIVERSITY
COLLEGE OF ARTS AND SCIENCES
A METHOD FOR FINDING THE NADIR OF NON-MONOTONIC
RELATIONSHIPS
By
FEI TAN
A Dissertation submitted to the
Department of Statistics
in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Degree Awarded:
Fall Semester, 2007
The members of the Committee approve the Dissertation of Fei Tan defended on
November 8, 2007.
Daniel McGee
Professor Directing Dissertation
Donald Lloyd
Outside Committee Member
Fred Huffer
Committee Member
Xufeng Niu
Committee Member
Gareth Dutton
Committee Member
The Office of Graduate Studies has verified and approved the above named committee members.
This dissertation is dedicated to my family
ACKNOWLEDGEMENTS
In 2006, Dr. Dan McGee provided me with the opportunity of working as a research
assistant in the biostatistics group on the problem of non-monotonic relationships. It was
during that period of time that the idea of estimating the nadir using free-knot polynomial
spline functions was motivated. I would like to express my sincere thanks to my advisor,
Dr. Dan McGee, for giving me this opportunity, stimulating my interest in this topic, his
guidance, patience and constant support.
I would like to thank Dr. Fred Huffer, who taught the Counting Processes class in Fall
2006. Without taking this class, I could not have proved the asymptotic results in this
dissertation.
My gratitude would like to go to Dr. Xufeng Niu for his co-advising in the biostatistics
group and his insightful and constructive suggestions.
I have worked with Dr. Gareth Dutton in the College of Medicine since Summer 2006. I
would like to thank him for giving me the chance to obtain experiences in applying statistics
to solving medical research problems.
I would like to thank both Dr. Dutton and my outside committee member Dr. Donald
Lloyd for their interest in statistical methodology.
Also, I would like to thank the departmental staff, Pam McGee, Jennifer Rivera,
Evangelous Robinson, Virginia Hellman, and Megan Trautman for their help. Especially, I
would like to thank James Stricherz for his great work in maintaining the good computing
environment and being patient to all my computer-related questions.
TABLE OF CONTENTS

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1. MOTIVATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. BACKGROUND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
   2.1 Basic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
   2.2 Current Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3. PROPOSED METHOD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
   3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
   3.2 Splines With Free Knots . . . . . . . . . . . . . . . . . . . . . . . 33
4. ASYMPTOTIC PROPERTIES OF THE PROPOSED METHOD . . . . . . . . . . . . . . 34
   4.1 Asymptotic Normality Of The Score Process . . . . . . . . . . . . . . 34
   4.2 Consistency Of The Maximum Partial Likelihood Estimator . . . . . . . 52
   4.3 Asymptotic Normality Of The MPLE . . . . . . . . . . . . . . . . . . . 56
   4.4 Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
   4.5 The Neighborhood Condition . . . . . . . . . . . . . . . . . . . . . . 59
5. SIMULATION STUDIES . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
   5.1 A Goodness Of Fit Test For Survival Models . . . . . . . . . . . . . 66
   5.2 Transformation Model . . . . . . . . . . . . . . . . . . . . . . . . 67
   5.3 Free-Knot Spline Model . . . . . . . . . . . . . . . . . . . . . . . 72
   5.4 Other J-shaped Function . . . . . . . . . . . . . . . . . . . . . . . 76
   5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6. FUTURE WORK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
LIST OF TABLES

3.1 Model Comparisons And Nadir Estimations, NHIS White Female . . . 22
3.2 Model Comparisons And Nadir Estimations, NHIS White Male . . . 26
3.3 Model Comparisons And Nadir Estimations, The Norwegian Counties Study (full sample) . . . 28
3.4 Model Comparisons And Nadir Estimations, The Norwegian Counties Study (1 obs dropped) . . . 31
3.5 Model Comparisons Using Likelihood Ratio Tests . . . 32
3.6 Model Comparisons Using BIC . . . 32
5.1 Simulation Parameters, NHANES I White Male . . . 69
5.2 Simulation Results 15–50, NHANES I White Male . . . 71
5.3 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–50, NHANES I White Male . . . 72
5.4 Simulation Results 15–60, NHANES I White Male . . . 73
5.5 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–60, NHANES I White Male . . . 73
5.6 Simulation Results 15–70, NHANES I White Male . . . 73
5.7 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–70, NHANES I White Male . . . 73
5.8 Simulation Parameters, The NHIS White Male . . . 74
5.9 Simulation Results 15–50, The NHIS White Male . . . 76
5.10 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–50, The NHIS White Male . . . 76
5.11 Simulation Results 15–60, The NHIS White Male . . . 77
5.12 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–60, The NHIS White Male . . . 77
5.13 Simulation Results 15–70, The NHIS White Male . . . 77
5.14 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–70, The NHIS White Male . . . 77
5.15 Simulation Parameters, NHIS White Male . . . 78
5.16 Simulation Results 15–50, NHIS White Male . . . 80
5.17 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–50, NHIS White Male . . . 80
5.18 Simulation Results 15–60, NHIS White Male . . . 81
5.19 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–60, NHIS White Male . . . 81
5.20 Simulation Results 15–70, NHIS White Male . . . 81
5.21 Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15–70, NHIS White Male . . . 81
LIST OF FIGURES

3.1 Change Point Model Profile Likelihood, NHIS White Female . . . 24
3.2 Quadratic Spline Profile Likelihood, NHIS White Female . . . 24
3.3 Fitted Curves, NHIS White Female . . . 25
3.4 Change Point Model Profile Likelihood, NHIS White Male . . . 27
3.5 Spline Model Profile Likelihood, NHIS White Male . . . 27
3.6 Fitted Curves, NHIS White Male . . . 27
3.7 Change Point Model Profile Likelihood, The Norwegian Counties Study (full sample) . . . 29
3.8 Spline Model Profile Likelihood, The Norwegian Counties Study (full sample) . . . 29
3.9 Fitted Curves, The Norwegian Counties Study (full sample) . . . 29
3.10 Change Point Model Profile Likelihood, The Norwegian Counties Study (1 obs dropped) . . . 30
3.11 Spline Model Profile Likelihood, The Norwegian Counties Study (1 obs dropped) . . . 30
3.12 Fitted Curves, The Norwegian Counties Study (1 obs dropped) . . . 31
4.1 Gap 1: 27–28 . . . 62
4.2 Gap 2: 27.3–27.8 . . . 63
4.3 Gap 3: 27.5–27.7 . . . 64
4.4 No Gap . . . 65
5.1 LBMI Histogram With Normal Density Curve, NHANES I White Male . . . 69
5.2 Assumed Underlying Curve, NHANES I White Male . . . 70
5.3 Assumed Underlying Curve, The NHIS White Male . . . 75
5.4 Assumed Underlying Curve, NHIS White Male . . . 79
ABSTRACT
Different methods have been proposed to model the J-shaped or U-shaped relationship
between a risk factor and mortality so that the optimal risk-factor value (nadir) associated
with the lowest mortality can be estimated.
The basic model considered is the Cox
Proportional Hazards model. Current methods include a quadratic method, a method
with transformation, fractional polynomials, a change point method and fixed-knot spline
regression. A quadratic method contains both the linear and the quadratic term of the risk
factor; it is simple, but it often generates unrealistic nadir estimates. The transformation
method converts the original risk factor so that after transformation it has a Normal
distribution, but this may not work when there is no good transformation to normality.
Fractional polynomials are an extended class of regular polynomials that apply negative
and fractional powers to the risk factor. Compared with the quadratic method or the
transformation method, they do not always have a good model interpretation, and inferences
about them do not incorporate the uncertainty coming from pre-selection of powers and degree.
A change point method models the prognostic index using two pieces of upward quadratic
functions that meet at their common nadir. This method assumes the knot and the nadir
are the same, which is not always true. Fixed-knot spline regression has also been used
to model non-linear prognostic indices. But its inference does not account for variation
arising from knot selections. Here we consider spline regressions with free knots, a natural
generalization of the quadratic, the change point and the fixed-knot spline method. They
can be applied to risk factors that do not have a good transformation to normality as well as
keep intuitive model interpretations. Asymptotic normality and consistency of the maximum
partial likelihood estimators are established under a certain condition. When the condition
is not satisfied, simulations are used to explore asymptotic properties. The new method is
motivated by and applied to the nadir estimation in non-monotonic relationships between
BMI (body mass index) and all-cause mortality. Its performance is compared with that of
existing methods, adopting criteria of nadir estimation ability and goodness of fit.
CHAPTER 1
MOTIVATION
In studies where researchers are interested in effects of risk factors on disease outcomes or
mortality, J-shaped or U-shaped relationships have been reported. Such a non-monotonic
relationship between a covariate and mortality has the interpretation that excess mortality
happens at both very low and very high values of the covariate and beyond a certain point
increasing values of the covariate are associated with increased mortality, whereas below
that point of the risk factor mortality is inversely related to the covariate. An example is
the BMI-mortality relationship. People have often examined the relationship between body
weight and mortality and more importantly, tried to establish guidelines about optimal body
weight. Various findings about the relationship have been reported in the literature, including
a linearly increasing relationship, a decreasing association and no association between weight
and death [1, 2, 3, 4, 5]. However, most observational studies show non-monotone curves with
excess mortality associated with both very low and very high levels of BMI [6, 7, 8, 9, 10, 11].
Alcohol and mortality in [12] and the EURAMIC (EURopean study on Antioxidants,
Myocardial Infarction, and breast Cancer) study involving alcohol intake and risk of
myocardial infarction in [13] present two other examples of a non-monotone J-shaped
relationship. In 1991 and 1992, a total of 1499 men, all less than seventy years
old, were recruited to the EURAMIC international case-control study from eight European
countries and Israel with the primary goal of examining the association between antioxidants
and the risk of developing a first myocardial infarction. Later 330 cases and 441 controls, who
reported some alcohol intake during the previous year, were selected from the study. The
investigators report that the risk of myocardial infarction dropped compared to non-drinkers
when the level of the risk factor was low and the risk kept increasing as alcohol intake rose.
Detrimental effects were observed at both very low and very high values of alcohol intake.
In another study, 686 middle-aged hypertensive men were followed for 12 years in the
Primary Prevention Trial in Gothenburg, Sweden, to study the relationship between the
blood pressure level achieved through anti-hypertensive treatment and the incidence of
coronary heart disease (CHD). The incidence of CHD showed a J-shaped relationship to
achieved treated systolic and diastolic blood pressure levels [14]. The incidence of CHD,
adjusted for age, serum cholesterol, blood pressure and smoking habits, decreased with
increasing level of blood pressure achieved through treatment, until a level of about 150/85
mmHg, and then increased as the treated blood pressure went up. In this study the J-shaped
pattern was also observed when data from patients with pre-existing ischemic heart disease
were excluded.
Other risk factors that have been reported to have an upturn to the left in the relationship
with mortality include cholesterol [15].
Nadirs in such non-monotone patterns give us
information about the optimal value and range of a risk factor and whether it is dangerous
to excessively lower the risk factor.
On the other hand, there are examples where the relationship between a risk factor
and mortality is a “mirror image of the J-shape”. During 1975 and 1976 scientists studied
the association of coronary mortality with temperature and air pollution in Athens. Data
analysis for that study demonstrated in the two-year period that the low mortality point
occurred at 27◦ C-30◦ C [16] showing the mirror image of a J-shape curve. Such a mirror image
was also observed in the association between birth weight and early neonatal mortality [17].
Very high mortality was seen at both the lowest and the heaviest birth weights. In such cases
an inverse association between the risk factor and mortality is well known and it is necessary
to detect and model a less obvious upturn to the right. For these covariates increasing the
risk factor value beyond a certain point is associated with an elevated risk.
Accurately estimating the nadir and constructing the confidence interval are very
important problems that statisticians face. To model such non-monotone data and estimate
nadirs various methods such as the quadratic model [14], a model with a transformation of the
risk factor [18] and a change point model [19] have been proposed. Fractional polynomials [20]
and spline regression with fixed knots [21] have also been used, with focus on overall fitting
of non-monotonic relationships. We address the following questions:
• Are the current methods adequate under different conditions? If not, can we propose
a new one?
• What can we say about the asymptotic properties of the Maximum Partial Likelihood
Estimators of the new method?
• How do these methods behave in general, is the new method better than existing ones?
We will address the above questions in the rest of the dissertation.
CHAPTER 2
BACKGROUND
2.1 Basic Models
The semi-parametric Cox Proportional Hazards Model and the parametric logistic regression
model are commonly used to describe the risk factor-mortality relationship. Here we focus
on the Cox model because it is more appropriate for time to event data and is frequently
used in epidemiological studies with long-term follow-up. A description of the logistic model
will also be given since the method with a transformation of the risk factor was originally
derived and proposed for logistic models.
2.1.1 Logistic Regression Model
Logistic regression is one of the generalized linear models. For every observation the response
variable Yi comes from a Bernoulli distribution with the probability of a success given the
vector of covariates being π(z_i) = P(Y_i = 1 | z_i), and the logistic regression model assumes:

π(z_i) = 1 / (1 + exp(−z_i^⊤ γ))   (2.1.1)

If information on p predictors is collected, z_i can be written as z_i^⊤ = [z_i0, z_i1, ..., z_ip] and
γ^⊤ = [γ_0, γ_1, ..., γ_p], where z_i0 is 1. The interpretation of the logistic model is that increasing
the risk factor z_ij by one unit multiplies the odds of a success by exp(γ_j) if all other risk
factor values do not change, and the odds ratio is estimated to be exp(γ̂_j).
In order to find MLEs (maximum likelihood estimators) of the parameters in this model we
define the likelihood function as L(γ) = ∏_{i=1}^{n} π(z_i)^{y_i} (1 − π(z_i))^{1−y_i} and the score function
as U(γ) = ∂ log(L(γ))/∂γ, where y_i is a realization of the random variable Y_i and n is the sample
size. The MLE is obtained by maximizing the log of the likelihood function; in other words,
the MLE occurs at a zero of the score function.

Applying the Newton-Raphson method to the score function leads to an iterative
expression; the relationship between the (m − 1)th and the mth step is

γ^(m) = γ^(m−1) + [I(γ^(m−1))]^{−1} U(γ^(m−1))

with I(γ^(m−1)) being the information matrix evaluated at γ^(m−1).
This expression makes it possible to find the estimators of the parameters iteratively if an initial
value γ^(0) is well chosen. It turns out that γ^(m) can, in each step, be estimated using a weighted
least squares method: for logistic models the Newton-Raphson method is equivalent to
iteratively re-weighted least squares.
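As an illustrative sketch of the update above (our own minimal implementation, not software used in the dissertation; the function name and convergence tolerance are choices made here), the Newton-Raphson/IRLS iteration for logistic regression can be written as:

```python
import numpy as np

def fit_logistic_newton(Z, y, n_iter=25, tol=1e-10):
    """Newton-Raphson (equivalently IRLS) for logistic regression.

    Z : (n, p+1) design matrix whose first column is 1 (the intercept z_i0).
    y : (n,) vector of 0/1 responses.
    """
    gamma = np.zeros(Z.shape[1])                 # starting value gamma^(0)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(Z @ gamma)))  # pi(z_i) at current gamma
        U = Z.T @ (y - pi)                       # score U(gamma)
        W = pi * (1.0 - pi)                      # Bernoulli variances = IRLS weights
        I = Z.T @ (W[:, None] * Z)               # information matrix I(gamma)
        step = np.linalg.solve(I, U)             # I(gamma)^{-1} U(gamma)
        gamma = gamma + step                     # gamma^(m) = gamma^(m-1) + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma
```

Each update solves a weighted least squares system with weights W, which is why Newton-Raphson and iteratively re-weighted least squares coincide for this model.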
2.1.2 Proportional Hazards Model
Suppose T_1, T_2, ..., T_n are the lifetimes of a sample of size n and that they are independently
and identically distributed. U_1, U_2, ..., U_n are the censoring times and Z_1, Z_2, ..., Z_n are the
covariate vectors. Let X_i = min{T_i, U_i} and δ_i = 1{T_i ≤ U_i}. The hazard at time t is defined
to be the instantaneous probability that one dies at t given that this person was alive right before
t. The mathematical expression of the hazard function is

h(t) = lim_{ε→0} P(t ≤ T < t + ε | T ≥ t) / ε.

The
Cox Proportional Hazards Model was introduced by D. R. Cox in 1972 [22]. It assumes
every individual in the population has a unique hazard given the covariates and the hazard
is a product between an arbitrary and unknown function of time and a function of the
explanatory variables and the unknown regression coefficients, where the arbitrary function
of time is called the baseline hazard. If the ith observation has a covariate vector z i and the
model has a vector of regression coefficients γ, the hazard function at time t of person i is
given by:
h(t|z_i) = h_0(t) × exp(z_i^⊤ γ)   (2.1.2)
where h0 (t) is the baseline hazard function, z i ⊤ = [zi1 , ..., zip ] and γ ⊤ = [γ1 , ..., γp ]. Compared
with the log odds ratio of logistic models, the log hazard ratio of the Cox model does not
contain the constant term because it is absorbed into the baseline hazard. The proportional
hazards model can be used to find the relative hazard between two subjects. For instance,
if subject i has covariate vector z i and subject j has covariates z j then their relative hazard
using subject j as the reference is exp((z i −z j )⊤ γ). We can see, from the previous expression,
the ratio between the hazards of two individuals is constant over time if their covariates do not
change as time goes by; this is why the model is called “Proportional Hazards”.
The partial likelihood involving only the parametric term is used to generate parameter
estimators for the Cox Model. We observe (X_i, δ_i, Z_i), i = 1, ..., n, and let T_1^0 < T_2^0 < ... <
T_L^0 be the distinct uncensored death times, assuming ties occur with probability zero.
(i) is the label of the single individual who dies at T_i^0. Let R(t) = {i : X_i ≥ t}. Define
R_i = R(T_i^0) to be the risk set containing everybody who is still at risk at the ith uncensored
death time. Then the partial likelihood function is:

PL(γ) = ∏_{i=1}^{L} [ exp(z_(i)^⊤ γ) / Σ_{j∈R_i} exp(z_j^⊤ γ) ]
However, in practice although we assume that ties occur with probability zero, simultaneous
uncensored deaths could still happen by chance, or the time scale on which death is measured
may result in apparent ties. To accommodate tied observations, various approximations to
the partial likelihood function have been proposed, examples are approximations suggested
by Breslow [23], Peto [24], Efron [25] and Cox [22]. The maximization of the partial likelihood
function is approximated using the Newton-Raphson technique.
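When there are no tied death times, the partial likelihood above is easy to evaluate directly; the following is a minimal sketch (our own illustration, with function and argument names chosen here):

```python
import numpy as np

def cox_partial_loglik(gamma, X, delta, Z):
    """Log partial likelihood for the Cox model, assuming no tied deaths.

    X     : (n,) observed times X_i = min(T_i, U_i)
    delta : (n,) event indicators delta_i (1 = uncensored death)
    Z     : (n, p) covariate matrix
    """
    eta = Z @ gamma                      # linear predictors z_i^T gamma
    loglik = 0.0
    for i in np.flatnonzero(delta):      # one factor per uncensored death time
        at_risk = X >= X[i]              # risk set R(X_i) = {j : X_j >= X_i}
        # log[ exp(z_(i)^T gamma) / sum_{j in R_i} exp(z_j^T gamma) ]
        loglik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return loglik
```

With γ = 0 each factor reduces to one over the size of the risk set, which makes a convenient sanity check.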
2.2 Current Methods
Categorical analyses could be used at an initial stage to detect a potential J-shape, but
it is hard to draw formal statistical inferences based on such descriptive analyses. Another
drawback of categorical analyses is that cut points of groups are usually selected arbitrarily by
using percentiles or adopting suggestions from previous similar studies. This creates problems
if subjects within one group are not homogeneous. So, we will focus on models that treat
independent variables as continuous and in this section we review the various methods that
have been used in epidemiologic research to deal with a non-monotonic predictor-response
relationship.
The models considered are a quadratic model, a model with transformation, fractional
polynomials, change point models and spline regression with fixed knots.
2.2.1 Quadratic Model
Quadratic models have been applied to describe the J-shape pattern between DBP and
coronary heart disease in the Swedish primary prevention trial [14] and the relationship
between BMI and mortality in black and white women [26]. It is probably the first and most
natural model one could think of when a U-shape or J-shape is observed. A quadratic model
under the assumption of proportional hazards assumes there is both a linear and a quadratic
term of the risk factor of our interest in the linear combination of all predictors. Let xi be
the variable that we expect to have a non-monotonic relationship to the response and z i be
the vector of the remaining risk factors; then the hazard function of subject i is written as
h(t|x_i, z_i) = h_0(t) × exp(β_1 x_i + β_2 x_i² + z_i^⊤ γ)   (2.2.1)

The optimal predictor value is calculated by applying the formula for the nadir of a quadratic;
that is, the nadir is

X_min = −β_1 / (2β_2)   (2.2.2)

where X is the risk factor with observed sample [x_1, x_2, ..., x_n]^⊤.
Other than the simple quadratic functional form, the advantage of such models is the
capability of directly adopting the existing parameter estimation algorithm under the Cox
model since the prognostic index is still linear in parameters. The nadir is not a parameter in
the model; thus the confidence interval cannot be obtained from standard software. Methods
that can be utilized to find it include the Delta method, Fieller’s theorem and bootstrap
estimation. These methods will be discussed in detail when the transformation model is
introduced.
A problem with the quadratic model is that in most real studies a covariate-mortality
curve is not symmetric about the nadir and quadratic models force the curve to be symmetric.
Applying this model can generate unrealistic nadir estimates such as a nadir much higher
than the empirical optimal predictor range based on categorical data analyses [18]. To
surmount the problem, other methods have been proposed.
2.2.2 A Model with Transformation
Applying a transformation of the risk factor is one way to avoid the unrealistic nadir estimate
given by the quadratic method. The idea was proposed for logistic models in 1997 to study
the J-shape relationship between BMI and mortality [18]. In this paper although the analysis
was under a multivariate setting, the authors focused on the marginal distribution of the
risk factor BMI. The transformation method was motivated by a result, pointed out by
Cornfield [27], which states if random variable X is the predictor that we expect to have
a J-shaped relationship to the response and it has pdf (probability density function) f1 (x)
among cases and pdf f0 (x) among non-cases with p = P (case) and q = 1 − p = P (non-case),
then

π(x_i) = 1 / (1 + (q × f0(x_i)) / (p × f1(x_i)))
If the ratio between f0 (x) and f1 (x) can be further expressed as an exponential function of
some functional form of x, say g(x), then
π(x_i) = 1 / (1 + K × exp(g(x_i)))
where K is a constant and it is appropriate to apply the logistic model to risk factor X,
given g(x) is linear in parameters.
The Normal distribution was given as an example. Suppose f0 (x) is the pdf of N(µ0 , σ0 2 )
and f1(x) is the density of N(μ_1, σ_1²); then we have

π(x_i) = 1 / (1 + K × exp[ (μ_0/σ_0² − μ_1/σ_1²) x_i + (1/(2σ_1²) − 1/(2σ_0²)) x_i² ])   (2.2.3)

and the x_i² term remains in the model if σ_0² ≠ σ_1². Equation (2.2.3) implies the form
of predictor X in the logit function is quadratic. For risk factors whose densities f0(x)
and f1(x) are not Normal, transformations could be applied to achieve Normality, and
according to the above illustration, applying the quadratic model to the transformed Normally
distributed predictor (with unequal variances among cases and non-cases) is appropriate.
When variances are the same, the functional form of the transformed variable is linear in
the logit function. The authors who proposed the idea of transforming variables used the
BMI-mortality relationship as an example and suggested the inverse transformation of BMI.
Among candidate transformations, this one was commonly selected, and the transformed BMI, 1/BMI,
is called the lean body mass index, denoted LBMI. It has been noted by Flegal [28] and
Nevill and Holder [29] to be not only a transformation to Normality, but also an appropriate
measure of percentage of body fat in terms of its biological meaning.
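The quadratic form in equation (2.2.3) follows from expanding the log of the Normal density ratio; as a sketch, with c collecting the terms free of x:

```latex
\log\frac{f_0(x)}{f_1(x)}
  = \log\frac{\sigma_1}{\sigma_0}
    - \frac{(x-\mu_0)^2}{2\sigma_0^2}
    + \frac{(x-\mu_1)^2}{2\sigma_1^2}
  = c + \left(\frac{\mu_0}{\sigma_0^2} - \frac{\mu_1}{\sigma_1^2}\right)x
      + \left(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}\right)x^2
```

The constant K in (2.2.3) then absorbs exp(c) together with the factor q/p from Cornfield's result.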
Using the transformation method in the Cox model is straightforward when the disease is
rare. Under this assumption the odds ratio given by the logistic model is approximately equal
to the relative risk and the hazard under Cox model can be thought of as the instantaneous
risk; thus the relative hazard is roughly the relative risk, which means the two models essentially
measure the same quantity. Hence the idea of transforming the predictor followed by a
quadratic fitting of the Normally distributed transformed predictor can be adopted under
Cox model directly. The proportional hazards model with transformation is the same as
(2.2.1), except this time xi is a Normally distributed predictor obtained by transformation.
In their paper, the BMI associated with the lowest risk is calculated by finding the nadir of
LBMI and then transforming the LBMI value back to the BMI value. Since the LBMI nadir is
−β_1/(2β_2) and BMI = 1/LBMI, the formula for the BMI nadir is:

X_min = −2β_2 / β_1
If a reasonably good transformation to Normality can be found, the advantage of this
method is its simplicity and the fact that the log hazard ratio function is linear in parameters
compared with the change point model that will be discussed later. Linearity in parameters
guarantees we can use existing software, and point estimators as well as variance estimators
in standard output are valid. Since our concern is the risk factor value associated with the
lowest mortality, after obtaining the point estimator of the nadir we still need to find the
confidence interval. Three methods were used: The Delta method, Fieller’s theorem and
bootstrap estimation, where results of Fieller’s theorem in the paper turned out to be very
close to those of Delta method.
The Delta method is a very useful tool which can be employed to calculate the variance
of a function of multivariate Normal statistics. In fact, not only the variance but also the
asymptotic Normality of the function is given by the Delta method [30]. In our case the quantity
of interest is the nadir estimator −2β̂_2/β̂_1, which is a function of the multivariate Normal
statistics [β̂_1, β̂_2]. According to the Delta method, the asymptotic distribution of −2β̂_2/β̂_1
is Normal and the variance is given by:

Var(−2β̂_2/β̂_1) = (−2β̂_2/β̂_1)² × [ Var(β̂_1)/β̂_1² − 2 Cov(β̂_1, β̂_2)/(β̂_1 β̂_2) + Var(β̂_2)/β̂_2² ]
To estimate the variance of the nadir, we need to replace variances and covariance in
the above equation by their estimates. The confidence interval is constructed based on
asymptotic Normality and estimated variance.
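A small numerical sketch of this interval (the function name and the 2x2 covariance input are our own conventions; in practice the coefficient estimates and their covariance come from the fitted Cox model):

```python
import numpy as np

def nadir_delta_ci(b1, b2, cov, z=1.96):
    """Delta-method confidence interval for the BMI nadir -2*b2/b1.

    b1, b2 : estimated coefficients of the linear and quadratic LBMI terms
    cov    : estimated 2x2 covariance matrix of (b1, b2)
    """
    nadir = -2.0 * b2 / b1
    # Var(-2*b2/b1) = nadir^2 * [Var(b1)/b1^2 - 2*Cov(b1,b2)/(b1*b2) + Var(b2)/b2^2]
    var = nadir**2 * (cov[0, 0] / b1**2
                      - 2.0 * cov[0, 1] / (b1 * b2)
                      + cov[1, 1] / b2**2)
    se = np.sqrt(var)
    return nadir, (nadir - z * se, nadir + z * se)
```

The variance expression here is the factored form of the usual gradient formula, with gradient (2b_2/b_1², −2/b_1).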
Bootstrap estimation is a more computer-intensive method; it requires repeated sampling
with replacement from the original sample [31]. If we can assume the original sample well
represents the true population then the generated samples are also good representatives from
the population. Based on each generated sample one nadir estimator can be obtained using
the transformation method described above. Thus a sequence of nadirs can be generated
from these bootstrap samples, and the empirical centiles are the limits of the bootstrap
confidence interval. For example the 95% bootstrap confidence interval contains all values
between the 2.5th and the 97.5th empirical percentiles.
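The percentile bootstrap just described can be sketched as follows; the `fit_nadir` callback, standing in for the transform-fit-invert pipeline, is a hypothetical placeholder:

```python
import numpy as np

def bootstrap_nadir_ci(data, fit_nadir, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the nadir.

    data      : array whose rows are subjects (the original sample)
    fit_nadir : function mapping a resampled data set to one nadir estimate
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    nadirs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample n subjects with replacement
        nadirs[b] = fit_nadir(data[idx])   # one nadir per bootstrap sample
    # e.g. alpha = 0.05 gives the 2.5th and 97.5th empirical percentiles
    lo, hi = np.quantile(nadirs, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lo, hi
```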
The transformation method works well if a transformation to Normality can be found.
If no good transformation to Normality exists, however, it cannot be applied and we need
to develop other methods. Additional methods, including the fractional polynomial model,
the change point model and the spline model with fixed knots, work without Normality
assumption on the predictor.
2.2.3 Fractional Polynomial
It is possible that other transformations are more suitable for describing the non-linear relationship between the response and the independent variable when the independent
variable is not Normal. Naturally, people think of polynomials. In general, however, low
order polynomials do not always fit the data well due to lack of curvature and high order
polynomials behave badly at the ends of the range of an observed risk factor although they
follow the data more closely. So other transformations of the predictor, such as inverse
polynomials, have been proposed [32]; they exemplify a broader class of models, the fractional
polynomials.
Royston and Altman [20] extended the family of polynomials by including fractional and
negative powers of the risk factor. Their method was originally proposed for overall model
fitting instead of nadir estimation, but if a good overall model can be found one can always
get the nadir by taking the first derivative of the log hazard ratio. Thus the class of fractional
polynomials is a candidate for nadir estimation. In our case we assume non-monotonicity of
one variable and linearity in all other predictors. A full definition of fractional polynomial
of degree m is:
φ_m(x, z; β_m, γ_m, p_m) = Σ_{j=0}^{m} β_{m,j} H_j(x) + z^T γ_m        (2.2.4)
with

  H_j(x) = 1                   if j = 0,
  H_j(x) = x^(p_{m,j})         if j ≠ 0 and p_{m,j} ≠ p_{m,j−1},
  H_j(x) = H_{j−1}(x) ln(x)    if j ≠ 0 and p_{m,j} = p_{m,j−1},
where p_{m,0} = 0, x is the non-monotonic risk factor, z is the vector of the remaining covariates,
β_m is the vector of regression coefficients associated with transformations of x, γ_m contains the
coefficients of the remaining predictors, p_{m,j} is the j-th element of p_m, the power vector of the
fractional polynomial with degree m, and x^(p_{m,j}) is the Box-Tidwell transformation, i.e.
  x^(p_{m,j}) = x^{p_{m,j}}    if p_{m,j} ≠ 0,
  x^(p_{m,j}) = ln(x)          if p_{m,j} = 0.
The variable to which we apply the fractional polynomial transformations is assumed to be
positive; if it takes non-positive values, a preliminary transformation is required to make
them positive. When a fractional polynomial is incorporated into the Cox
model, the coefficient β_{m,0} of the constant term is always zero.
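The recursive definition of H_j can be implemented directly. The following sketch (the function names are our own, not from Royston and Altman's software) builds the transformed columns for a given power vector, applying the Box-Tidwell convention and the ln(x) rule for repeated powers:

```python
import numpy as np

def box_tidwell(x, p):
    """Box-Tidwell transformation x^(p): x**p for p != 0, ln(x) for p == 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if p == 0 else x ** p

def fractional_polynomial_basis(x, powers):
    """Columns H_1(x), ..., H_m(x) for the given (sorted) power vector.
    H_0(x) = 1 is the constant term, dropped here as in the Cox model.
    A power equal to the previous one multiplies the previous column by
    ln(x), following the recursive definition of H_j."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("x must be positive; pre-transform non-positive values")
    cols, prev = [], None
    for p in powers:
        if prev is not None and p == prev:
            cols.append(cols[-1] * np.log(x))   # repeated power: H_{j-1}(x) ln(x)
        else:
            cols.append(box_tidwell(x, p))
        prev = p
    return np.column_stack(cols)
```

For example, the degree-2 power vector (0.5, 0.5) yields the columns √x and √x·ln(x).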
First we need to select the degree m and the power vector p_m. Fractional polynomial models
with degree higher than 2 are rarely required in practice, and those with degree at most 2
fit better than classical polynomials by offering more flexibility and stability; Royston and
Altman therefore suggest that m ≤ 2 suffices for most situations. As for the power vector
p_m, they suggest choosing the best power vector, p̃_m, from all the m-tuples of powers
selected with replacement from the fixed set P = {−2, −1, −0.5, 0, 0.5, 1, 2, ..., max(3, m)},
namely the tuple that maximizes the partial likelihood in model fitting when m is given.
For a given m, they developed the confidence interval of p_m. If the partial deviance is
defined to be D(m, p_m) = −2 × logPL(m, p_m), suppressing all other parameters, and
p̂_m denotes the MPLE (maximum partial likelihood estimator) of p_m, then for testing
H_0: p_m = p̈_m we theoretically have the likelihood ratio statistic D(m, p̈_m) − D(m, p̂_m),
which is asymptotically a χ²_m random variable. In practice, however, we do not know p̂_m
but only p̃_m, and we know D(m, p̃_m) ≥ D(m, p̂_m). The statistic D(m, p̈_m) − D(m, p̃_m),
which we actually use, is therefore conservative for the above test. The corresponding confidence
interval adopting the conservative test statistic consists of all p̈_m that are not rejected in the test.
For the comparison between degrees m and m + 1 the likelihood ratio test statistic,
D(m, p̂_m) − D(m+1, p̂_{m+1}), applies, again theoretically. It has an asymptotic χ²₂ distribution.
But we can only calculate D(m, p̃_m) − D(m+1, p̃_{m+1}), which is used as an approximation
of D(m, p̂_m) − D(m+1, p̂_{m+1}) to test H_0: the degree is m vs H_a: the degree is m + 1.
Royston and Altman further defined the gain of a model against the baseline linear model
as follows:
G(m, pm ) = D(1, 1) − D(m, pm ),
so that a larger gain indicates a better fit. This way, D(m, p̃_m) − D(m+1, p̃_{m+1}) in the above
test equals G(m+1, p̃_{m+1}) − G(m, p̃_m). In practice one could focus on, say, degrees 1 and
2, as Royston and Altman suggested, and find G(1, p̃_1) and G(2, p̃_2). The comparison between
these two models helps us determine the best degree, and thus the final model.
After the degree m and the power vector p_m are estimated they are treated as constants in
model fitting, thus the log hazard ratio of the Cox model with a fractional polynomial is
linear in the parameters and the coefficients are again obtained by applying the usual estimation
algorithm. However, this time the linearity is “artificial” and the variation arising from
pre-estimation of the degree m and the power vector p_m is not considered when the regression
coefficients β_m and γ_m are estimated. Hence statistical inference based on the “artificial” linear log
hazard ratio is not reliable; specifically, the confidence intervals of the regression coefficients are
not reliable, and if we apply the Delta method to calculate a confidence interval of the nadir,
the estimated variance will not be accurate.
2.2.4
Change Point Model
This method was suggested by Goetghebeur and Pocock and was motivated by the quadratic
model originally proposed for modeling the relationship between DBP and coronary mortality [19].
To avoid confounding between evidence for a left and a right upturn, the authors
used two independent pieces of quadratic functions with different regression coefficients to
model the two sides of the nadir. The full change point model can be described by the following
equation:
h(t|x_i, z_i) = h_0(t) × exp{β_1 (x_i − η)² 1{x_i ≤ η} + β_2 (x_i − η)² 1{x_i > η} + z_i^T γ}        (2.2.5)
with η, the nadir, being a point between the lowest possible value of the nadir, ηl , and the
highest possible nadir, ηu , z i being the covariate vector of the ith observation excluding the
variable with a change point and xi being the variable with a J-shape relationship to the
response. Equation (2.2.5) models the J-shape curve with two different pieces of quadratic
functions: the first piece contributes to the left of η, the second describes the branch to the
right, and the two quadratic branches meet at η.
Their work consists of deriving the asymptotic distribution of MPLE, a proposed parameter estimation algorithm and inference for parameters. The change point model (2.2.5)
involves an unknown nadir, thus the log hazard ratio is nonlinear in parameters and the
asymptotic distribution of MPLE under linearity assumption does not apply. In 1995 Goetghebeur and Pocock [19] showed the asymptotic Normality of the MPLE under model (2.2.5)
if no subject’s risk factor level is too close to the true nadir.
Their proposed parameter estimation method is the following 2-step algorithm:
• Scan the range of possible values of η; each η value is used as a known change point
in fitting model (2.2.5). η̂, the MPLE of η, is the one that generates the maximum of
the profile log-partial-likelihood.
• Fix η = η̂ in model (2.2.5) so that the log hazard ratio is linear in the parameters. Apply
the usual estimation algorithm to find [β̂_1, β̂_2, γ̂] = [β̂_1(η̂), β̂_2(η̂), γ̂(η̂)], the MPLEs of
β_1, β_2 and γ.
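A bare-bones version of this profile search can be written directly against the Cox partial likelihood. The sketch below only illustrates the two steps on simulated data: it omits the additional covariates z, assumes no tied event times, and bounds the coefficients for numerical stability; none of these simplifications are part of Goetghebeur and Pocock's method.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_partial_likelihood(beta, X, time, event):
    """Cox negative log partial likelihood (no tied event times assumed)."""
    order = np.argsort(time)
    eta = (X @ beta)[order]
    d = event[order].astype(float)
    # risk-set sums: for the i-th ordered subject, sum exp(eta_j) over t_j >= t_i
    risk = np.cumsum(np.exp(eta)[::-1])[::-1]
    return -np.sum(d * (eta - np.log(risk)))

def fit_change_point(x, time, event, grid):
    """Step 1: profile search over candidate change points eta.
    Step 2 is implicit: the coefficients refit at the best eta are returned."""
    best = (-np.inf, None, None)
    for eta in grid:
        X = np.column_stack([(x - eta) ** 2 * (x <= eta),
                             (x - eta) ** 2 * (x > eta)])
        res = minimize(neg_log_partial_likelihood, np.zeros(2),
                       args=(X, time, event), method="L-BFGS-B",
                       bounds=[(-0.2, 0.2)] * 2)   # keeps exp() stable in this sketch
        if -res.fun > best[0]:
            best = (-res.fun, eta, res.x)
    return best[1], best[2]
```

Because the change point is fixed inside each grid step, every inner fit is an ordinary (linear-in-parameters) Cox fit, which is exactly the point of the profile search.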
The advantage of applying the profile likelihood search is that the non-linear log hazard ratio
is turned linear in each step by fixing the change point. Therefore, one does not need
to work directly on the partial likelihood function and can simply use existing statistical
software to find the MPLE.
Inferences about parameters are based on likelihood ratio statistics resulting from the
asymptotic distribution of MPLEs. The authors focused on the asymptotic p value of β1
and the confidence interval of η.
To formally test if a J-shape exists, they considered H_0: β_1 = 0 vs H_a: β_1 ≠ 0.
The likelihood ratio test statistic is 2[logPL(η̂, β̂_1, β̂_2, γ̂) − logPL(η̃, 0, β̃_2, γ̃)], which has an
asymptotic χ²₁ distribution, where [η̂, β̂_1, β̂_2, γ̂] are the MPLEs of [η, β_1, β_2, γ] and [η̃, β̃_2, γ̃]
are the MPLEs restricted to the parameter space with the requirement β_1 = 0. The asymptotic
confidence interval of the nadir consists of all η values such that 2logPL(η, β̂_1(η), β̂_2(η), γ̂(η))
is above 2logPL(η̂, β̂_1, β̂_2, γ̂) − χ²_{1,α}.
In fact, the authors suggest that in a screening step before the above full analysis one
should check the existence of an upturn to the left. One way of doing so is to fit a sequence
of models among observations with xi ≤ η:
h(t|x_i, z_i) = h_0(t) × exp{β_1 (x_i − η)² 1{x_i ≤ η} + z_i^T γ}
where η is a point that runs from η_l to η_u. The test is completed by calculating

Z_max = max{ z_η = β̂_1 / se(β̂_1) : η_l < η < η_u }

and the p value, p = P(Z > Z_max), with Z being a standard Normal random variable, for the
null hypothesis H_0: β_1 = 0 (there is no upturn at the low end). This way the true p value
for H_0 is at least equal to the calculated p value, and p = P(Z > Z_max) is anti-conservative
for H_0.
A non-significant observed p value indicates the true p value is also not significant and we
do not have evidence to support an upturn at the low end. In this case we stop here, since
the J-shape does not exist. On the other hand, if the observed p value is significantly small
we do not know whether the true p value is significant, so further analysis is necessary.
These sequential tests are used in the preliminary step to decide whether a further full
analysis is needed.
The change point model is not linear in the parameters. To overcome this problem the authors
wisely adopted a profile partial likelihood search that converts the non-linear problem to a linear
one in each step. But when the MPLE, η̂, of the change point is fixed in the Cox model to obtain the
point estimators and variance estimators of the regression coefficients, the output from the standard
Cox model cannot be used, since it assumes a fixed change point. In other words, the variance
estimators of the regression coefficients do not incorporate the variation coming from change point
estimation. Likelihood-based inference, resulting from the asymptotic distribution of the
MPLEs, could produce asymptotic confidence interval for the change point and asymptotic
p values for regression coefficients. Their asymptotic distribution result is under the condition
that no risk factor value is too close to the true change point, which is a restriction that is
not satisfied by continuous predictors. Goetghebeur and Pocock studied, via type I error and
power, the plausibility of the likelihood ratio test of β1 = 0 and the confidence interval of η
when the sample size was taken to be moderate, 1000, and the covariate was simulated from
a Normal distribution without any restriction on the neighborhood of the true change point.
More research needs to be done to study the asymptotic behavior of estimated parameters
14
when the risk factor is continuous and the neighborhood assumption is not satisfied. At the
same time a question we ask ourselves is can we avoid the neighborhood assumption.
Viewed as a sub-model nested in the spline model that will be discussed later, the change
point model assumes the knot is the same as the nadir. However, this assumption will be
shown, through examples, not always to hold. The authors proposed the change point
method to model the upturn at the low end and estimate the nadir when a positive relationship
is widely accepted. A simpler alternative is to fit a quadratic form in the Cox model and estimate
the nadir using (2.2.2). They claimed the change point method was better, but only one
example was given, in which both methods fit the data with monotonically increasing curves,
so the nadir estimation performance of the two methods was not compared. We will
compare these methods in terms of overall fitting and nadir estimation ability.
2.2.5
Spline Regression with Fixed Knots
Sleeper and Harrington described how regression splines with fixed knots could be applied
in the Cox model [21]. Relevant hypothesis testing problems were also addressed. Their
method again, was proposed for overall model fitting. As mentioned before, one can always
obtain the nadir by taking first order derivative of the spline function and selecting the value
generating the lowest log hazard ratio to eliminate any local minimum if the degree of the
spline function is higher than two. Before applying the spline function no assumption about
the pattern of the curve needs to be made and spline functions allow the data to speak for
themselves. As described by Sleeper and Harrington, “A spline is a piecewise polynomial
with continuity conditions on the function and its derivatives at the points where the pieces
join”. Specifically, a spline function is characterized by its order, sequence of knots and
continuity conditions on the knots. The order of a spline is the highest degree plus one.
A knot is the point where two adjacent polynomial pieces join. The last set of parameters
is the class of numbers that determine the number of continuity conditions at each knot.
Hence if a linear spline function is fit to the data the estimated relationship could be nonsmooth, which is not appropriate for smooth relationships usually seen in epidemiological
research. Sleeper and Harrington mentioned cubic splines with continuous first and second
order derivatives were sufficient for most log hazard ratio functions. In their example both
quadratic and cubic splines were adopted and comparisons among them generated the best
model. Greenland [33] also pointed out that compared with more complex functional forms
arising from engineering, relationships observed in epidemiology were simple so that lower
order splines such as quadratic and cubic sufficed for model fitting in this area. He further
suggested the use of relatively simple quadratic splines, since cubic splines could produce weird
shapes in wide or open-ended intervals. A further minor disadvantage of cubic splines is the
poor interpretability of their coefficients. Sleeper and Harrington suggested using 5 or
fewer interior knots, as their experience indicated “they are typically needed to approximate
the effect of a covariate on survival”.
An introduction to polynomial splines can be found in a book by Schumaker [34]. Let
PP_{m,M,∆} be the linear space of all piecewise polynomials with order m, the highest degree
plus one. In this notation,

  m = the highest degree plus one,
  ∆ = [c_1, ..., c_k], the ordered known knot sequence,
  M = [m_1, ..., m_k], the vector of multiplicities such that the j-th derivative of
      the spline function at knot c_i exists, where j = 0, 1, ..., m − 1 − m_i
      and i = 1, ..., k.

If every m_i, i = 1, ..., k is taken to be m then the spline function allows discontinuities at
the knots; when every m_i, i = 1, ..., k is taken to be one, the smoothest spline function with order
m is obtained, since if any further smoothness condition were added the knots would disappear.
The vector of multiplicities also controls the number of times every knot appears in the
extended knot sequence that will be introduced later. The dimension of this linear space is

  d = m + K = m + Σ_{i=1}^{k} m_i.
For instance, the space of all piecewise quadratic functions with continuous first order
derivative and two fixed knots is denoted by PP_{3,M,∆}, where ∆ = [c_1, c_2] is a pair of constants
denoting the two fixed knots and M = [1, 1] specifies that at both knots there are two continuity
conditions: continuity of the piecewise quadratic function itself and of its first order derivative.
The dimension of this space is 3 + 1 + 1 = 5, meaning five parameters will need to be
estimated if a quadratic spline with continuous first order derivative and two fixed knots is
used. PP_{4,M,∆} represents cubic splines with continuous first and second order derivatives and
one fixed knot if ∆ = c_1 is a fixed constant, with M = 1 requiring the function and its first and
second order derivatives to be continuous at c_1. The dimension of this space is 4 + 1 = 5.
Suppose the range of the predictor is [a, b]. The one-sided basis of PP_{m,M,∆} given in [34]
is

  ρ_{i,j}(x) = (x − a)^{m−j},      j = 1, ..., m_i,  i = 0,

and

  ρ_{i,j}(x) = (x − c_i)_+^{m−j},  j = 1, ..., m_i,  i = 1, ..., k,

where m_0 = m and (x)_+^q = (x × 1{x>0})^q.
Specifically, the basis of the linear space consisting of all quadratic splines with continuous
first order derivative and two fixed knots c_1 < c_2 is {1, (x − a), (x − a)², (x − c_1)²₊, (x − c_2)²₊};
the basis of cubic splines with continuous first two derivatives and one fixed knot is
{1, (x − a), (x − a)², (x − a)³, (x − c_1)³₊}.
Other authors such as Sleeper and Harrington [21] or Gallant and Fuller [35] used
a truncated power basis that is similar to the one-sided basis.
The truncated power
basis of quadratic splines with continuous first order derivative and two fixed knots is
{1, x, x2 , (x − c1 )2+ , (x − c2 )2+ }, that of cubic spline with continuous first two derivatives
and one fixed knot is {1, x, x2 , x3 , (x − c1 )3+ }. The utilization of the above two truncated
power bases under the Cox model leads us to the log hazard ratio functions β1 x + β2 x2 +
β3 (x − c1 )2 1{x>c1 } + β4 (x − c2 )2 1{x>c2 } and β1 x + β2 x2 + β3 x3 + β4 (x − c1 )3 1{x>c1 } , respectively.
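As a concrete illustration, these truncated power bases are easy to generate. The helper below is our own sketch (with maximal smoothness at every knot); it reproduces the basis {1, x, x², (x − c1)²₊, (x − c2)²₊} for degree 2 with knots c1, c2:

```python
import numpy as np

def truncated_power_basis(x, degree, knots):
    """Columns 1, x, ..., x**degree, then (x - c)_+**degree for each knot c.
    Each knot contributes a single column, so the spline has continuous
    derivatives up to order degree - 1 at every knot."""
    x = np.asarray(x, dtype=float)
    cols = [x ** j for j in range(degree + 1)]
    cols += [np.maximum(x - c, 0.0) ** degree for c in knots]
    return np.column_stack(cols)
```

In a Cox model the constant column is dropped, leaving exactly the terms of the log hazard ratio β_1 x + β_2 x² + β_3 (x − c_1)² 1{x>c_1} + β_4 (x − c_2)² 1{x>c_2}.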
The truncated power basis is very intuitive, especially in some simple cases where the
model has a nice interpretation. For example the Cox model using a quadratic spline with
one fixed knot and continuous first derivative has form:
h(t|x_i, z_i) = h_0(t) × exp{β_1 x + β_2 x² + β_3 (x − c_1)² 1{x>c_1} + z_i^T γ}        (2.2.6)
which can be viewed as a baseline parabola with an extra adjustment piece attached to it
and the adjustment piece starts from knot c1 with xi being the non-monotonic risk factor,
z i being the vector of all other predictors.
A special case of this function occurs when the knot c_1 is taken to be the center, β_1/(−2β_2),
of the baseline parabola; after the adjustment piece is merged into the baseline parabola the
function becomes two parabolic branches that meet at their common center. If the center
is a parameter this is the change point model that we already discussed. Furthermore, if
the two branches happen to share the same regression coefficient then a regular parabola is
obtained.
Let us return to the quadratic spline with one fixed knot and continuous first derivative.
If the x² term is dropped, the function reduces to β_1 x + β_3 (x − c_1)² 1{x>c_1},
which can be interpreted as a baseline linear model with an adjustment parabolic branch
that exists on x > c_1 and meets the baseline straight line at c_1. We note that in order for
two spline functions with fixed knots to be nested, the knot locations in the smaller model
have to be a subset of those of the bigger model, since a knot is viewed as a constant, not a
parameter.
In their paper, Sleeper and Harrington fit seven quadratic and seven cubic spline models,
with the three potential knot locations taken to be the three quartiles of the uncensored death
times. The models they tried were three quadratic-one-knot models, three quadratic-two-knot
models with each model adopting one pair of the three quartiles of the uncensored death
times, one quadratic-three-knot model, and the cubic counterparts of these seven models.
The final model was selected using a statistic equivalent to AIC. They noted that the standard
variance estimators were computed under the assumption that the knots were fixed. When
various knot locations were tried in order to find the best knot sequence, as they did, such
standard inference methods did not generate the true variances, which were larger. Besides,
the knot locations were arbitrary, and they themselves mentioned that “the shape of some estimates
can depend heavily on the knots selected”. All of these are problems that could affect the
accuracy of the nadir estimation and relevant statistical inference.
CHAPTER 3
PROPOSED METHOD
3.1
Motivation
When the predictor has a Normal distribution in both cases and non-cases with distinct
variances the quadratic method is appropriate for model fitting and nadir calculation. If
a good transformation to Normality can be obtained, such as the inverse transformation of
BMI, the quadratic model can be applied to the transformed predictor and the nadir can also
be easily found. If no good transformation to Normality is known, fractional polynomials
can then be used to fit the overall model, but the variance of each parameter estimator
does not incorporate variation arising from pre-selection of the degree and power vector, and
this could affect the accuracy of the inference for the nadir. Regression splines with fixed
knots could also be used for overall model fitting. However when various knot locations are
tried to find the best knot sequence standard inference does not generate the true variances,
which are larger. Besides, estimated parameters could depend heavily on the knots selected.
The change point model does not require the predictor to be Normal thus can be adopted
when no good transformation to Normality exists. It is nested in model (2.2.6) by forcing
c_1 = β_1/(−2β_2) if the knot c_1 is a parameter. In other words, the change point model is nested in
the quadratic spline model with continuous first order derivative and one free knot (2.2.6).
Compared with the more general free-knot spline function, the change point model implicitly
assumes the nadir is the same as the knot c_1. Is this assumption always true? The nadir is
the predictor value associated with the lowest risk, and it usually lies in a small subset of the data
range [a, b]. The knot is the point where the adjustment piece starts, and it could be any point
in the data range [a, b]. Intuitively it is not clear why they have to be the same. If they
are forced to be the same then the confidence interval of the nadir is forced to be the same
as that of the knot; therefore the constructed nadir confidence interval might be affected by
the knot. On the other hand, if the knot is forced to be the nadir then the overall model
fitting could be sacrificed. Will this assumption cause any problem? Does the change point
method always work? We will answer these questions through the following examples.
The examples being considered are the National Health Interview Survey white female
cohort, white male cohort, the Norwegian Counties Study and the Diverse Populations
Collaboration. Detailed descriptions of these examples will be given later.
The change point model is proposed to analyze non-monotone quadratic-looking data.
So before data analysis the existence of a non-monotonic relationship was examined. For the
quadratic model with BMI or LBMI, three models were considered:
1. A null model with linear terms of all other covariates
2. A linear model with a linear term of the main risk factor and linear confounders
3. A quadratic model involving linear and quadratic terms of the risk factor with
confounders
These models were checked and likelihood ratio test statistics were calculated to compare
these nested models. There is an indication of non-monotonicity if the quadratic model is
selected by likelihood ratio tests.
Fractional polynomial models were fit using the fracpoly command with the option compare
in the statistical package STATA. The compare option performs comparisons, via the deviance
tests given in section 2.2, among four models: the null model, the linear model, the fractional
polynomial model with degree one and the fractional polynomial model with degree two. A first-degree
fractional polynomial is always monotone when the risk factor is positive. So only when
the best fractional polynomial model of a sample has degree two can the covariate-mortality
relationship be non-monotonic.
Before a change point model was fit, the screening test given by Goetghebeur and Pocock,
described in the previous section, was applied, and a p value less than 0.05 was used as an
indication of an upturn at the low end. For each of the following examples, the four
non-monotonicity detection tools were used and all of them suggested a non-monotonic
relationship.
To every dataset we applied the Cox model with main risk factor BMI, adjusted for age
and smoking status. Other risk factors could also be used; the main reasons for using BMI
are its non-Normal distribution and the fact that LBMI is Normally distributed, so that results
from a quadratic model applied to LBMI can be used as references.
For each example a quadratic form (2.2.1) was utilized for both BMI and LBMI,
representing the quadratic model and the transformation model, respectively. A second order
fractional polynomial model (2.2.4) was fit. The change point model (2.2.5) and quadratic
spline with continuous first order derivative and one free knot (2.2.6) were adopted for every
example as well. Likelihood-based model comparisons were employed. The transformation
and the fractional polynomial models are not nested in any of the other three models nor
do they contain any of them, thus they were not compared with the other three methods
via likelihood ratio tests. Likelihood ratio comparisons involving the spline function are
based on the asymptotic distribution of MPLEs for parameters in (2.2.6), which has not
been proved yet, therefore BIC was also adopted as a guide of model selection. BIC, or the
Bayesian Information Criterion, is defined to be −2ln(partial-likelihood) + kln(n), where k
is the number of estimated parameters in the model and n is the sample size. We must
point out that when BIC is used as the model selection guide, the fractional polynomial
BIC is misleading, since it is calculated assuming both the degree m and the power vector p_m are
constants instead of parameters; the k in the BIC formula therefore does not reflect the
fact that m and p_m are estimated. As mentioned before, the change point model (2.2.5) is nested in the
free-knot spline model (2.2.6) by taking the knot to be the center of the baseline parabola,
hence the degrees of freedom for the likelihood-based χ2 test is 1. The quadratic model can
be achieved by forcing β1 = β2 from the change point model (2.2.5) hence the degrees of
freedom for the model comparison is again 1. Estimated nadirs were obtained using methods
described in section 2.2. The nadir of the quadratic spline model was calculated by taking
the first order derivative of the spline function then comparing the function values at the
two potential nadirs. The local maximum or minimum was eliminated by taking the nadir
to be the one generating a lower function value. The 95% confidence intervals of the quadratic
model and of the model with the transformation were found via the Delta method mentioned in
section 2.2. Those of the change point model and the spline model were likelihood-based
confidence intervals found through a profile likelihood search.
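The nadir computation for the quadratic spline model can be made concrete. In the sketch below (our own helper functions, not part of any package) the derivative of β_1 x + β_2 x² + β_3 (x − c)² 1{x>c} is zeroed separately on each piece; candidates falling outside their own piece are discarded, the boundaries of the data range are included, and the candidate with the lowest function value is returned, eliminating any local maximum or local minimum:

```python
import numpy as np

def spline_value(x, b1, b2, b3, c):
    """Quadratic spline with one knot: b1*x + b2*x**2 + b3*(x - c)_+**2."""
    return b1 * x + b2 * x ** 2 + b3 * np.maximum(x - c, 0.0) ** 2

def spline_nadir(b1, b2, b3, c, lo, hi):
    """Zero the first derivative on each piece, keep candidates lying in the
    piece's own interval, and return the candidate (or boundary) with the
    lowest spline value over the data range [lo, hi]."""
    candidates = [lo, hi]                       # data-range boundaries
    if b2 != 0:
        left = -b1 / (2.0 * b2)                 # stationary point of left piece
        if lo <= left <= c:
            candidates.append(left)
    if b2 + b3 != 0:
        right = (2.0 * b3 * c - b1) / (2.0 * (b2 + b3))  # right piece
        if c < right <= hi:
            candidates.append(right)
    vals = [spline_value(x, b1, b2, b3, c) for x in candidates]
    return candidates[int(np.argmin(vals))]
```

The coefficients b1, b2, b3 and knot c would come from the fitted Cox model; comparing the function values at all candidates is what eliminates a spurious local extremum.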
3.1.1
The National Health Interview Survey (NHIS) White Female
The first example is the National Health Interview Survey. The NHIS is a continuing
nationwide survey of the U.S. civilian non-institutionalized population conducted through
households. Each week a probability sample of households is interviewed by trained personnel
from the U.S. Census Bureau to obtain information about the health and other characteristics
of each member of the sample household. The average annual sample consists of 36000 to
47000 households, yielding 92000 to 125000 people. Completed questionnaires are sent from
the U.S. Census Bureau field offices to the National Center for Health Statistics (NCHS)
for coding and editing. Beginning with survey year 1986, linkage information has been
collected on NHIS respondents to allow for matching with other data systems, including the
National Death Index (NDI). Linkage of NHIS respondents with NDI provides a longitudinal
component to NHIS, which allows for the ascertainment of vital status. (From the Florida
State University Diverse Populations Collaboration website biostat.stat.fsu.edu).
We focused on the subgroup of 61521 white females, with 5091 deaths, for whom smoking
status is known. The shortest follow-up time is 3 days and the longest follow-up time is
3286 days. The distribution of BMI is right skewed, hence not Normal, whereas the LBMI
distribution is approximately Normal. The results of the model fitting and nadir estimations are
given in Table 3.1.
Table 3.1: Model Comparisons And Nadir Estimations, NHIS White Female

            transformation   fractional     quadratic     change point   quadratic spline
                             polynomial
  p(χ2)                                     0             0
  BIC       100436.4         100421.9       100610.8      100474.9       100432.1
  Nadir     23.7             24.2           28.4          22.2           24.4
  C.I.      23.1 − 24.3      23.7 − 24.8    24.0 − 32.7   21.7 − 22.7    23.2 − 25.0
For this dataset the p value of the likelihood ratio test between the quadratic
model and the change point model is trivially small, as is that between the quadratic model and
the quadratic spline model, and when the change point model is compared
with the quadratic spline model the spline model is significantly better. BIC could also be
used as a model selection guide. The advantage of using BIC is that it accounts for the
number of parameters appearing in each model. One can improve the likelihood value by
including a large number of unknown parameters that need to be estimated; BIC penalizes
such excessive use of parameters. The comparisons among the simple quadratic model,
the change point model and the spline model using BIC as the criterion suggest the quadratic
spline model is the best, with the smallest BIC of 100432.1. Hence both the likelihood ratio
test and the BIC value indicate the spline model should be selected. We notice the BIC
value of the fractional polynomial method is the lowest; this does not indicate the fractional
polynomial method is the best, for the reason mentioned before. As for nadir estimation,
if we take the nadir, 23.7, calculated from the model with the LBMI transformation as the
reference, the fractional polynomial nadir 24.2, the change point nadir 22.2 and the spline
nadir 24.4 are all close to the reference, while the quadratic nadir 28.4 is too far from the reference
point. As for confidence intervals, the worst is apparently the quadratic method, since its
confidence interval does not cover the reference nadir estimator and is the widest among all the
confidence intervals. The change point confidence interval is so narrow that it does not
cover the reference point 23.7. The reference nadir falls on the lower end of the fractional
polynomial confidence interval. The quadratic spline confidence interval contains the reference nadir and
is not too wide.
Figure 3.1 and Figure 3.2 are the profile likelihood curves based on the change point
model and the spline model, respectively, with the x-axis representing the BMI values and
the y-axis representing the log partial likelihood. The profile curves achieve their maxima
at the maximum partial likelihood estimators, and the horizontal lines are ½χ²_{1,α} units lower
than the corresponding maximum log partial likelihood values. In other words, the log
partial likelihood values that are above the horizontal line correspond to BMIs in the 95%
likelihood-based confidence interval if α is taken to be 0.05. For example, in Figure 3.1 the
horizontal axis represents BMI; 50 equally spaced points in the range of BMI were taken to
be the change point value one by one and the log partial likelihood was calculated for each
change point. The solid curve is the log partial likelihood versus the change point. The 95%
confidence interval of the true change point consists of every change point with a log partial
likelihood above the horizontal line, that is, 21.7 − 22.7.
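Extracting the likelihood-based interval from such a grid is mechanical. The sketch below is our own illustration, with hypothetical inputs `grid` (the candidate change points) and `loglik` (the profile log partial likelihoods); it keeps every grid value above the cut line max − ½χ²_{1,α}:

```python
import numpy as np
from scipy.stats import chi2

def profile_likelihood_ci(grid, loglik, alpha=0.05):
    """All grid values whose profile log likelihood is within
    0.5 * chi2(1 - alpha, df=1) of the maximum, i.e. the points lying
    above the horizontal cut line in the profile plot."""
    grid = np.asarray(grid, dtype=float)
    loglik = np.asarray(loglik, dtype=float)
    cut = loglik.max() - 0.5 * chi2.ppf(1.0 - alpha, df=1)
    inside = grid[loglik >= cut]
    return inside.min(), inside.max()
```

Reporting only the minimum and maximum of the retained grid values assumes a unimodal profile curve, as in Figures 3.1 and 3.2; a finer grid gives a sharper interval.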
Similarly, by applying a grid search of the knot and a grid search of the nadir under the
quadratic free-knot spline model, we obtained the profile curves of the knot and the nadir in
Figure 3.2.

[Figure 3.1: Change Point Model Profile Likelihood, NHIS White Female. Log partial likelihood versus BMI, cohort 65.]

[Figure 3.2: Quadratic Spline Profile Likelihood, NHIS White Female. Log partial likelihood versus BMI, cohort 65.]

The knot profile curve is the solid line and the nadir profile curve is the dashed
line. The two ends of the likelihood-based confidence interval of the nadir are given by the
two intersection points of the profile likelihood curve and the horizontal line. Actually, the
method used to generate the knot profile curve is similar to the process of generating the
change point profile curve. 50 equally-spaced grid points were selected in the BMI range and
each time the knot is taken to be one grid point so that the log partial likelihood value under
the fixed knot can be obtained using a standard package. Plotting the log partial likelihood
values against corresponding grid points generates the knot profile curve. The nadir profile
curve is a bit different. Since we focus only on “quadratic-looking” non-monotonic curves,
the zero of the first derivative of the quadratic spline function is the nadir. For each
fixed nadir, this relationship lets us express one regression coefficient as a function
of the other regression coefficients, the knot, and the given nadir. This way the nadir enters
the model as a parameter. Now for a fixed nadir a grid search within the BMI range needs
to be applied to the knot since the knot is non-linear in the model. This double-grid-search
utilizing standard software created the nadir profile curve in Figure 3.2. In this example the
profile curves of the knot and nadir under the more general quadratic spline model show
that the point estimators of the knot and the nadir are not equal, and their confidence
intervals are very different. The nadir is the optimal risk factor value, while the knot
governs the overall model fit; forcing them to be the same will either distort nadir estimation
or sacrifice the overall fit. Another problem that we notice with the change point method is
its narrow confidence interval compared to those of the other methods. If the confidence
interval is too narrow, the coverage probability might be affected.
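The reparameterization that makes the nadir a model parameter can be sketched as follows, assuming the quadratic one-knot spline of this chapter; the function name and coefficient values are illustrative only. Imposing a zero derivative at a given nadir m determines β1 from β2, β3, the knot k, and m.

```python
import numpy as np

def eta_fixed_nadir(b2, b3, k, m, z):
    """Quadratic one-knot spline predictor with the nadir m treated as a
    parameter: imposing g'(m) = b1 + 2*b2*m + 2*b3*(m - k)_+ = 0
    determines b1 from the remaining parameters."""
    b1 = -2.0 * b2 * m - 2.0 * b3 * max(m - k, 0.0)
    return b1 * z + b2 * z**2 + b3 * np.maximum(z - k, 0.0)**2
```

For a fixed m, the partial likelihood can then be profiled over (b2, b3) with a grid search over the knot k, mirroring the double grid search described above; the (m − k)_+ term makes the constraint valid whether the nadir lies left or right of the knot.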
The adjusted fitted curves of this NHIS study, using the five different methods, are
presented in Figure 3.3. For an easier visual comparison, their vertical locations are adjusted
so that each fitted curve, evaluated at the nadir, is equal to zero. In this graph, the fitted
quadratic curve is very different from all other curves in that it is almost flat, although all
coefficients are significant, and its nadir is much higher than nadirs given by other methods.
This problem is caused by the symmetry of the quadratic curve.
[Figure 3.3: Fitted Curves, NHIS White Female]
3.1.2 NHIS White Male
The NHIS White Male cohort contains 46264 males with 4582 deaths for whom smoking
status was available. The shortest follow-up time is 1 day and the longest is 3286 days.
According to both the likelihood ratio tests and the BIC values, the spline model is the best.
As shown in Table 3.2, the quadratic model again generates an unrealistic nadir estimate, its
confidence interval is wider than other confidence intervals and does not cover the reference
point estimator 26.2. The change point model and the quadratic spline model have nadirs
that are equally close to the reference point, and both the confidence intervals contain the
reference nadir 26.2 as a borderline case. The fractional polynomial model generates a nadir
that is close to the reference point, and its confidence interval covers the reference nadir.
However, this method has its defects, and its BIC value is misleading. Figure 3.4 shows the
change point profile curve. In Figure 3.5 the two profile likelihood curves are different; in
particular, the resulting confidence intervals are completely non-overlapping, indicating that
the knot and the nadir are not the same. Again, this example shows that we should not force
the knot and the nadir to be equal. Figure 3.6 shows the fitted curves; once again the
symmetric curve obtained from fitting a quadratic model is very different from the other
curves and gives a nadir estimator that is too high.
Table 3.2: Model Comparisons And Nadir Estimations, NHIS White Male

         transformation   fractional     quadratic     change point   quadratic spline
                          polynomial
p(χ²)                                                  0              0
BIC      88299.8          88267.0        88396.8       88294.2        88261.2
Nadir    26.2             26.5           34.2          25.5           26.9
C.I.     25.1 − 27.2      25.8 − 27.2    31.6 − 36.9   24.7 − 26.3    26.2 − 27.4

3.1.3 The Norwegian Counties Study
The Norwegian Counties Study is a population-based survey of counties in Norway. The
cohort of 50,000 individuals was examined initially in 1974−78, with follow-up visits from
1978−83 and 1983−88. Information on mortality is complete through 1992 (From Florida
State University Diverse Populations Collaboration website biostat.stat.fsu.edu).
The part of the data that we adopted consists of 24631 males. There are 2434 deaths with
the follow-up period ranging from 29 days to 6866 days. Their lowest BMI is 13.04 and the
highest is 60.64.

[Figure 3.4: Change Point Model Profile Likelihood, NHIS White Male]
[Figure 3.5: Spline Model Profile Likelihood, NHIS White Male]
[Figure 3.6: Fitted Curves, NHIS White Male]

Figure 3.7 shows that as we move the change point to the left the model fit gets better and
better, and any point to the left of the data range could be used as the change point. This
means the change point model fit the data using a monotonically increasing curve,
although the likelihood ratio test based on BMI and LBMI, fractional polynomial test and
the change point screening test all agreed the relationship was non-monotone. The confidence
interval of the nadir based on LBMI also suggests the existence of an upturn at the low end.
Figure 3.8 shows the profile likelihood curves of the spline model. The confidence interval
of the nadir and that of the knot are completely separated, with a gap between them,
indicating that they are different; hence the assumption that they are the same is sometimes
inappropriate. The likelihood ratio tests in Table 3.3 suggest the change point model is not
significantly different from the quadratic model, which is not surprising because the
estimated change point is close to the lowest BMI value, so the change point model almost
degenerates to a quadratic model. However, the comparison between the quadratic and the
spline model shows the latter is significant, with a p value of 3.70 × 10−11. Among the three
models other than the transformation and fractional polynomial models, the best according
to BIC is the spline model. By comparing estimated nadirs and confidence intervals we
realize the quadratic model and the change point model both fit the data with monotonically
increasing curves, whereas both the transformation model and the spline model detected the
non-monotonicity. In this example the fractional polynomial fitting generated powers −1
and −2 for BMI, hence this model is the same as the transformation model. Figure 3.9
contains fitted curves using these methods. Since the fractional polynomial powers of BMI
are −1 and −2, the transformation model and the fractional polynomial curve coincide.
Table 3.3: Model Comparisons And Nadir Estimations, The Norwegian Counties Study (full sample)

         transformation   fractional     quadratic     change point   quadratic spline
                          polynomial
p(χ²)                                                  0.2            1.2 × 10−11
BIC      45862.5          45862.5        45898.1       45906.2        45870.3
Nadir    22.8             22.8           12.6          Failed         23.8
C.I.     22.1 − 23.4      22.1 − 23.4    0.9 − 24.2    Failed         23.5 − 24.4
A closer look at the dataset reveals that an extreme value is what failed the quadratic and
the change point model.

[Figure 3.7: Change Point Model Profile Likelihood, The Norwegian Counties Study (full sample)]
[Figure 3.8: Spline Model Profile Likelihood, The Norwegian Counties Study (full sample)]
[Figure 3.9: Fitted Curves, The Norwegian Counties Study (full sample)]

After the observation with BMI 60.64 is dropped, all BMI values are less than 50. Figure 3.10
shows that this time the change point profile likelihood curve
goes down at the low end and a valid confidence interval for the nadir can be obtained.
Figure 3.11 shows the spline model profile curves. After the extreme BMI value is dropped,
there is still no intersection between the confidence interval of the knot and that of the nadir.
Table 3.4 contains the model comparison and nadir estimation information obtained after
the extreme value is dropped. If results based on the transformation model are used as the
reference and the misleading BIC of the fractional polynomial model is not considered, then
according to both BIC and the likelihood ratio tests the best model is the quadratic spline model. The
likelihood ratio p value between the change point model and the spline model is 3.0 × 10−4
(not shown). The estimated nadirs are around 22 and 23 and the confidence intervals are
close. The transformation confidence interval and the spline confidence interval overlap.
The fitted curves in Figure 3.12 were obtained after the extreme BMI was dropped. This
time, to the left of the nadir, all curves are close except the quadratic one. To the right of
the nadir, the change point curve and the quadratic curve are very similar, and the spline
and transformation model curves are close.
[Figure 3.10: Change Point Model Profile Likelihood, The Norwegian Counties Study (1 obs dropped)]
[Figure 3.11: Spline Model Profile Likelihood, The Norwegian Counties Study (1 obs dropped)]
Table 3.4: Model Comparisons And Nadir Estimations, The Norwegian Counties Study (1 obs dropped)

         transformation   fractional     quadratic     change point   quadratic spline
                          polynomial
p(χ²)                                                  3.5 × 10−3     2.1 × 10−5
BIC      45857.8          45854.1        45869.7       45871.3        45868.3
Nadir    22.8             23.3           22.9          22.4           23.8
C.I.     22.1 − 23.4      22.6 − 23.9    21.2 − 24.6   21.0 − 23.3    23.2 − 24.1

[Figure 3.12: Fitted Curves, The Norwegian Counties Study (1 obs dropped)]

This example is a special case where the change point model and the quadratic model
fail when there are extreme values in the data. This is an indication that the quadratic and
change point models might not be as stable as the transformation, fractional polynomial and
spline models in the presence of extreme values. The effect of extreme values on nadir
estimation will be explored later in the simulation study.
3.1.4 Diverse Populations Collaboration
The Diverse Populations Collaboration is a group of investigators who have pooled data
from their studies into a single database in order to examine issues of heterogeneity of
results in epidemiological studies. The database available to the collaboration currently
includes person-level data from 27 studies providing 395,682 observations. Over 4,500,000
person-years of follow-up are available, documenting 60,374 deaths, 17,708 deaths from
CHD and 15,523 deaths from cancer. Baseline information begins in 1950 and continues
through 1990. Data samples include both sexes, and white, black, Hispanic, and other
ethnic subgroups (From Florida State University Diverse Populations Collaboration website
biostat.stat.fsu.edu).
As mentioned before, four non-monotonicity detection tools (two quadratic models, the
second order fractional polynomial, and the change point model screening procedure) were
adopted. Actually they were applied to all 78 cohorts in the Diverse Populations Collaboration, and on 31 of the 78 cohorts all the four tests agreed the relationship between BMI
and mortality was non-monotonic. Three models, the simple quadratic model with BMI, the
change point model and the spline model, were fit to each of these 31 cohorts and likelihood
ratio test statistics were calculated. Table 3.5 contains the results of model comparisons
based on the 31 cohorts: about one third of them favor the quadratic model, one third the
change point model, and the remaining third the spline model. Table 3.6 presents model
comparison results based on BIC values: half favor the quadratic model, one fourth the
change point model, and the remaining fourth the spline model.
Table 3.5: Model Comparisons Using Likelihood Ratio Tests

model               frequency
Quadratic               9
Change Point           10
Quadratic Spline       12
Table 3.6: Model Comparisons Using BIC

model               frequency
Quadratic              15
Change Point            8
Quadratic Spline        8
3.2 Splines With Free Knots

Model (3.2.1), as an example of free-knot polynomial splines, is the quadratic spline with
one free knot and a continuous first order derivative:

h(t|x_i, z_i) = h0(t) × exp[β1 x + β2 x² + β3 (x − c1)² 1{x>c1} + z_i⊤ γ]    (3.2.1)

The only difference between (3.2.1) and (2.2.6) is that in (3.2.1) c1 is a parameter instead of
a constant.
The change point model assumes the knot and the nadir of the function are the same,
which the examples in the previous section show is not always true. Forcing the nadir and
the knot to be equal causes the confidence interval of the nadir to be affected by that of the
knot, and perhaps by the assumed common location as well. On the other hand, pushing
the knot to equal the nadir could sacrifice the overall model fit. A natural generalization
of the change point model is to separate the two numbers and adopt the spline model with
free knots. As a generalization of the change point model, the free-knot spline model can be
applied to covariates that are not Normally distributed. Compared to fractional polynomials
or splines with fixed knots, it provides more accurate inference if the asymptotic distribution
of the MPLEs of the model parameters can be established. Therefore in the next chapter we
will derive the asymptotic distribution of the estimated parameters of (3.2.1).
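To make the knot/nadir distinction concrete, here is a small sketch of how the nadir of a fitted free-knot quadratic spline can be recovered from its coefficients, assuming a unique interior minimum (both pieces convex); the helper name is ours, not taken from the dissertation's software.

```python
def spline_nadir(b1, b2, b3, k):
    """Nadir of g(z) = b1*z + b2*z**2 + b3*max(z - k, 0)**2, assuming a
    unique interior minimum (b2 > 0 and b2 + b3 > 0, so both pieces are
    convex and g' is increasing)."""
    left = -b1 / (2.0 * b2)              # stationary point of the left piece
    if left <= k:                        # minimum occurs at or before the knot
        return left
    # otherwise solve b1 + 2*b2*z + 2*b3*(z - k) = 0 on the right piece
    return (2.0 * b3 * k - b1) / (2.0 * (b2 + b3))
```

For example, with b1 = −4, b2 = 1, b3 = 1, k = 1 the nadir is 1.5 while the knot is 1, illustrating that the two need not coincide.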
CHAPTER 4

ASYMPTOTIC PROPERTIES OF THE PROPOSED METHOD
We will first derive the asymptotic Normality of the score process, then prove the consistency
of the maximum partial likelihood estimator (MPLE), followed by the asymptotic Normality
of the MPLE. All lemmas required for the proofs are presented in Section 4.4.
4.1 Asymptotic Normality Of The Score Process
Consider the Cox proportional hazards model in which the hazard rate at time t for an
individual with q-variate covariate Z(t) is

h(t) = h0(t) exp(gθ(Z(t))),    (4.1.1)

where θ ∈ R^p, gθ(z) is twice continuously differentiable with respect to θ with first
derivative ġθ (a p-column vector) and second derivative g̈θ (a p × p matrix), and h0 is a
baseline hazard rate. Our candidate for gθ(Z) is the one-free-knot quadratic spline
gθ(z) = β1 z + β2 z² + β3 (z − k)² 1{z>k}, where θ = [β1, β2, β3, k]⊤ is an unknown column
parameter vector, when z is not close to the knot k. From now on we shall denote the true
unknown parameter by θ0 = [β1,0, β2,0, β3,0, k0]⊤. Our focus will be on the estimation of θ0
and its asymptotic behavior. The first derivative is the 4-dimensional column vector

ġθ(Z) ≡ ∂gθ(Z)/∂θ = [Z, Z², (Z − k)² 1{Z>k}, −2β3 (Z − k) 1{Z>k}]⊤.

The second derivative is the 4 × 4 matrix g̈θ(Z) given by

∂²gθ(Z)/∂θ∂θ⊤ =
[ 0   0   0                    0                 ]
[ 0   0   0                    0                 ]
[ 0   0   0                    −2(Z − k) 1{Z>k}  ]
[ 0   0   −2(Z − k) 1{Z>k}     2β3 1{Z>k}        ]    (4.1.2)

Note that ġθ is a continuous function of θ, and under the assumption k ≠ Z, g̈θ(Z) is also a
continuous function of θ.
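The matrix in (4.1.2) can be checked numerically away from the knot. The sketch below (function names are ours) compares the analytic second derivative with central finite differences at a point with z ≠ k.

```python
import numpy as np

def g(theta, z):
    """One-free-knot quadratic spline link g_theta(z)."""
    b1, b2, b3, k = theta
    return b1 * z + b2 * z**2 + b3 * max(z - k, 0.0)**2

def g_hess(theta, z):
    """Analytic second derivative (4.1.2): only the (beta3, k) block is
    nonzero, and it vanishes entirely when z <= k."""
    b1, b2, b3, k = theta
    ind = 1.0 if z > k else 0.0
    H = np.zeros((4, 4))
    H[2, 3] = H[3, 2] = -2.0 * (z - k) * ind
    H[3, 3] = 2.0 * b3 * ind
    return H
```

Away from the knot the matrix agrees with a central-difference Hessian of g; at z = k the second derivative does not exist, which is why the continuity statement above excludes that point.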
Let T and U be the failure time and censoring time of a person and Z be a covariate
associated with the person such as diastolic blood pressure, body mass index (BMI), etc.
Suppose that the data available are i.i.d. observations (Xi, δi, Zi) for i = 1, ..., n, where
Xi ≡ min(Ti, Ui) represents the observed time of person i, and δi ≡ 1{Ti≤Ui} indicates that
the observed time is a death time rather than a censoring time. Let Ni(t) ≡ 1{Xi≤t, δi=1},
which is one when person i dies before or at time t and zero otherwise. Let the at-risk
process Yi(t) ≡ 1{Xi≥t} denote whether person i is still alive, or at risk, at time t.
Throughout we assume the following hold.
MC. (N1 , ..., Nn ) is a multivariate counting process, from which it follows in particular that
no two component processes jump at the same time, i.e., for any t ≥ 0 and i ≠ j,

P{∆Ni(t) = ∆Nj(t) = 1} = 0,    (4.1.3)

where ∆ξ(t) ≡ ξ(t) − ξ(t−) for a process {ξ(t) : t ≥ 0} that is right-continuous with
left-hand limits.
PD. Each of the at-risk processes Yi and covariate processes Zi is predictable with respect to a
right-continuous filtration {Ft : t ≥ 0} which represents the statistical information accruing
over time. We have the following result.
Proposition 1. In the Cox Proportional Hazards model, the covariate Z enters through a
smooth function gθ(Z). Suppose that gθ has continuous first and second derivatives; that
for i = 1, ..., n the failure time Ti and the censoring time Ui are conditionally independent
given the covariate Zi; that the covariates Zi are bounded and constant in time; and that
P{Yi(τ) > 0} > 0. Then the following hold.

(I) The time τ is such that ∫₀^τ h0(x) dx < ∞.

(II) Let

Sn^(0)(θ, t) ≡ (1/n) Σ_{i=1}^n Yi(t) exp(gθ(Zi)),
Sn^(1)(θ, t) ≡ (1/n) Σ_{i=1}^n ġθ(Zi) Yi(t) exp(gθ(Zi)),
Sn^(2)(θ, t) ≡ (1/n) Σ_{i=1}^n ġθ(Zi)⊗2 Yi(t) exp(gθ(Zi)).

Then for any compact neighborhood Θ of θ0 there exist on Θ × [0, τ] a scalar s^(0), a vector
s^(1) and a matrix s^(2) such that for j = 0, 1, 2,

sup_{x∈[0,τ], θ∈Θ} ||Sn^(j)(θ, x) − s^(j)(θ, x)|| → 0 in probability as n → ∞,

where ġθ(Zi)⊗2 ≡ ġθ(Zi) × ġθ(Zi)⊤ and ||M|| ≡ max{|Mij| : ∀i, j} is a norm of the matrix M.

(III) Using the definitions of Θ and s^(j), j = 0, 1, 2, given above, define

e ≡ s^(1)/s^(0),    v ≡ s^(2)/s^(0) − e⊗2;

then for any θ ∈ Θ and x ∈ [0, τ],

∂s^(0)(θ, x)/∂θ = s^(1)(θ, x),
∂s^(1)(θ, x)/∂θ = s^(2)(θ, x) + E[g̈θ(Z) Y(x) exp(gθ(Z))].

(IV) For j = 0, 1, 2, the functions s^(j)(θ, x) are bounded; the function families
{s^(j)(·, x) : x ∈ [0, τ]} are equicontinuous at θ = θ0; and s^(0)(θ, x) is bounded away from
zero on Θ × [0, τ].
Proof: Since Yi(τ) = 1{Xi≥τ} = 1{Ti≥τ, Ui≥τ}, it follows that

P{Yi(τ) > 0} = P{Ti ≥ τ, Ui ≥ τ}.

By first taking the conditional expectation given Zi, then taking the expectation over Zi,
and in view of the conditional independence assumption on Ti and Ui, we further have

P{Yi(τ) > 0} = P{Ti ≥ τ, Ui ≥ τ} = E(P{Ti ≥ τ | Zi} P{Ui ≥ τ | Zi}),

where P{Ti ≥ τ | Zi} = exp(−exp(gθ(Zi)) ∫₀^τ h0(x) dx). Hence the condition
P{Yi(τ) > 0} > 0 implies that neither the non-negative random variable P{Ti ≥ τ | Zi} nor
P{Ui ≥ τ | Zi} is zero almost surely, and thus P{exp[−exp(gθ(Zi)) ∫₀^τ h0(x) dx] > 0} > 0.
Therefore the desired result (I), ∫₀^τ h0(x) dx < ∞, follows.
We are now about to show (II). By the Strong Law of Large Numbers, Sn^(0)(θ, t) →
E(Y(t) exp(gθ(Z))) almost surely for arbitrarily fixed (θ, t). Next we will show this pointwise
convergence is uniform on Θ × [0, τ], except on a set of measure zero, for some compact
neighborhood Θ of θ0. The same argument applies to Sn^(1)(θ, t) and Sn^(2)(θ, t).

Since P(Ui ≥ t | Zi) could be a discontinuous function of t and

s^(0)(θ, t) = E[Yi(t) exp(gθ(Zi))] = E[exp(gθ(Zi)) E(Yi(t) | Zi)]
            = E[exp(gθ(Zi)) exp{−exp(gθ(Zi)) ∫₀^t h0(x) dx} P(Ui ≥ t | Zi)],

it follows that s^(0)(θ, t) is not necessarily continuous in t. Following the idea used in the
proof of the Glivenko-Cantelli Theorem 5.5.1 [36], we will prove the uniform convergence of
Sn^(0)(θ, t) to s^(0)(θ, t) over t ∈ [0, τ], i.e.,

ξn(θ) ≡ sup_{x∈[0,τ]} ||Sn^(0)(θ, x) − s^(0)(θ, x)|| → 0,   a.s., as n → ∞.
The details are given in Lemma 1. For the uniform convergence in θ, we need to show that
sup_{θ∈Θ} ξn(θ) → 0 on a set with probability one. Suppose the contrary, that is, there are
an ε > 0 and sequences {nk : k = 1, 2, ...} and {θk} such that ξ_{nk}(θk) ≥ ε for all k. If Θ
is any compact neighborhood of θ0, then there exists a convergent subsequence, still denoted
by θk without loss of generality, such that θk → θ ∈ Θ. Then a contradiction can be derived
as follows:

ε ≤ ξ_{nk}(θk) = sup_{x∈[0,τ]} ||S_{nk}^(0)(θk, x) − s^(0)(θk, x)||
  ≤ sup_{x∈[0,τ]} ||S_{nk}^(0)(θk, x) − S_{nk}^(0)(θ, x)||
    + sup_{x∈[0,τ]} ||s^(0)(θk, x) − s^(0)(θ, x)||
    + sup_{x∈[0,τ]} ||S_{nk}^(0)(θ, x) − s^(0)(θ, x)||.
The last line of the inequality tends to zero as k → ∞, based on the established uniform
convergence result in t when θ is fixed. By applying the defining expression of Sn^(0)(θ, x),
the first term can be written as

sup_{x∈[0,τ]} ||S_{nk}^(0)(θk, x) − S_{nk}^(0)(θ, x)||
  = sup_{x∈[0,τ]} ||(1/nk) Σ_{i=1}^{nk} Yi(x)[exp(gθk(Zi)) − exp(gθ(Zi))]||
  ≤ (1/nk) Σ_{i=1}^{nk} sup_{x∈[0,τ]} ||Yi(x)[exp(gθk(Zi)) − exp(gθ(Zi))]||
  ≤ (1/nk) Σ_{i=1}^{nk} ||exp(gθk(Zi)) − exp(gθ(Zi))||
  = (1/nk) Σ_{i=1}^{nk} |exp(gθ*i,k(Zi)) ġθ*i,k(Zi)⊤ (θk − θ)|
  ≤ B1 (1/nk) Σ_{i=1}^{nk} q ||ġθ*i,k(Zi)⊤|| · ||θk − θ||
  ≤ B1 B2 q ||θk − θ||,

where θ*i,k ∈ Θ is a point on the line segment between θ and θk (the indices i, k signify
dependence on Zi and θk), B1 is the bound of |exp(gθ(Zi))| on Θ, B2 is the bound of
||ġθ(Zi)⊤|| on Θ, and q is the dimension of ġθ*i,k(Zi). Hence when ||θk − θ|| → 0 the first
term converges to zero also. For the second term we have

sup_{x∈[0,τ]} ||s^(0)(θk, x) − s^(0)(θ, x)||
  = sup_{x∈[0,τ]} ||E{Y(x)[exp(gθk(Z)) − exp(gθ(Z))]}||
  ≤ E sup_{x∈[0,τ]} ||Y(x)[exp(gθk(Z)) − exp(gθ(Z))]||,

and it converges to zero as well by similar arguments. Therefore the contradiction
0 < ε ≤ 0 is reached, and we conclude

sup_{x∈[0,τ], θ∈Θ} ||Sn^(0)(θ, x) − s^(0)(θ, x)|| → 0,   a.s.    (4.1.4)
We can prove (III) by applying a classical result about the exchangeability of differentiation
and expectation (see, e.g., Theorem 16.8 [37]) in a compact neighborhood Θ of θ0 and in
view of the boundedness assumption on Zi. In fact,

∂s^(0)(θ, x)/∂θ = (∂/∂θ) E[Y(x) exp(gθ(Z))],

and the derivative of Y(x) exp(gθ(Z)) with respect to θ is Y(x) ġθ(Z) exp(gθ(Z)). By the
boundedness of Z and compactness of Θ, this derivative is bounded by a constant that is
independent of θ. Now the assumptions of Theorem 16.8 [37] are satisfied and hence

∂s^(0)(θ, x)/∂θ = (∂/∂θ) E[Y(x) exp(gθ(Z))] = E[(∂/∂θ) Y(x) exp(gθ(Z))]
               = E[Y(x) ġθ(Z) exp(gθ(Z))] = s^(1)(θ, x).

To prove

∂s^(1)(θ, x)/∂θ = s^(2)(θ, x) + E[g̈θ(Z) Y(x) exp(gθ(Z))]

we use exactly the same idea. The derivative of Y(x) ġθ(Z) exp(gθ(Z)) with respect to θ is
Y(x)[ġθ(Z)⊗2 + g̈θ(Z)] exp(gθ(Z)). Now g̈θ(Z) is bounded by a constant due to the
boundedness of Z and compactness of Θ, so the derivative is bounded by a constant. Again
Theorem 16.8 [37] shows

∂s^(1)(θ, x)/∂θ = (∂/∂θ) E[Y(x) ġθ(Z) exp(gθ(Z))]
               = E[(∂/∂θ) Y(x) ġθ(Z) exp(gθ(Z))]
               = E[Y(x)(ġθ(Z)⊗2 + g̈θ(Z)) exp(gθ(Z))]
               = s^(2)(θ, x) + E[Y(x) g̈θ(Z) exp(gθ(Z))].
We are left with the last result (IV). Due to the boundedness of Z1 and the compactness of
Θ, all s^(j)(θ, t), j = 0, 1, 2, are bounded on Θ × [0, τ]. Since Θ is compact and Z is bounded,
there is a finite constant B such that |gθ(Z)| ≤ B for all θ ∈ Θ and Z. Hence for all points
in Θ × [0, τ],

s^(0)(θ, t) = E[Y1(t) exp(gθ(Z1))] ≥ exp(−B) E[Y1(t)] ≥ exp(−B) E[Y1(τ)]
            = exp(−B) P{Y1(τ) > 0} > 0,

where the last inequality is due to the assumption P{Y1(τ) > 0} > 0.

Finally, we show that the function families {s^(j)(·, x) : x ∈ [0, τ]} are equicontinuous at
θ = θ0 for j = 0, 1, 2. We demonstrate it with s^(0)(·, x) as an example. Consider
sup_{t∈[0,τ]} ||s^(0)(θm, t) − s^(0)(θ0, t)|| for ||θm − θ0|| → 0 as m → ∞:

sup_{t∈[0,τ]} ||s^(0)(θm, t) − s^(0)(θ0, t)||
  = sup_{t∈[0,τ]} ||E[Y(t)(e^{gθm(Z1)} − e^{gθ0(Z1)})]||
  ≤ sup_{t∈[0,τ]} E[||e^{gθm(Z1)} − e^{gθ0(Z1)}|| · E(Y1(t) | Z1)]
  ≤ E||e^{gθm(Z1)} − e^{gθ0(Z1)}||
  = E||ġθ*(Z1)⊤ e^{gθ*(Z1)} (θm − θ0)||
  ≤ q E||ġθ*(Z1) e^{gθ*(Z1)}|| · ||θm − θ0||,

where ġθ*(Z1) is the derivative of gθ(Z1) with respect to θ evaluated at some θ* ∈ Θ, and
q is the dimension of ġθ*(Z1). Letting m go to infinity, the last member of the above
inequality tends to zero due to the boundedness of Zi and the compactness of Θ. Therefore
{s^(0)(·, x) : x ∈ [0, τ]} is a family of functions equicontinuous at θ = θ0. This completes
the proof.
The covariate vector Z = [z1, z2, ..., zq]⊤ is q-dimensional. In our case we will focus on
the single main risk factor, thus q = 1. The Proportional Hazards Model with free-knot
spline link function gθ(Z) assumes the hazard function satisfies h(t) = h0(t) exp[gθ(Z)] with
baseline hazard rate h0. The link gθ(Z) is taken to be a quadratic spline with one free
knot, gθ(Z) = β1 Z + β2 Z² + β3 (Z − k)²₊, where k is the knot parameter and (Z − k)²₊
denotes (Z − k)² 1{Z>k}. The column parameter vector θ = [β1, β2, β3, k]⊤ will be estimated
by the maximum partial likelihood (MPL) method.
The partial likelihood function is

PL(θ) = Π_{i=1}^L exp[gθ(Z(i))] / Σ_{j∈Ri} exp[gθ(Zj)],

where T1⁰ < T2⁰ < ... < TL⁰ are the ordered distinct death times, (i) is the label of the single
individual who dies at Ti⁰, and Ri is the risk set of persons who are still alive at Ti⁰,
including (i). The log partial likelihood function can be expressed in martingale notation:

l(θ) = log PL(θ) = Σ_{i=1}^n ∫₀^∞ [gθ(Zi) − log Σ_{j=1}^n Yj(t) exp(gθ(Zj))] dNi(t).
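A minimal numerical sketch of these quantities, assuming distinct event times and the quadratic spline link (function names are ours, not the dissertation's software): it evaluates the log partial likelihood by summing over deaths, and also computes the score vector U(θ) that the text derives next, which can be checked against a finite-difference gradient of l(θ).

```python
import numpy as np

def g(theta, z):
    """Quadratic one-free-knot spline link g_theta(z) for a vector z."""
    b1, b2, b3, k = theta
    return b1 * z + b2 * z**2 + b3 * np.maximum(z - k, 0.0)**2

def g_dot(theta, z):
    """Gradient of g w.r.t. theta = (b1, b2, b3, k); a 4 x n array."""
    b1, b2, b3, k = theta
    ind = (z > k).astype(float)
    return np.stack([z, z**2, ind * (z - k)**2, -2.0 * b3 * ind * (z - k)])

def log_pl(theta, z, time, event):
    """log PL: sum over deaths of g(Z_i) - log(sum over risk set of exp(g))."""
    eta = g(theta, z)
    order = np.argsort(time)
    eta_s, ev_s = eta[order], event[order]
    log_risk = np.log(np.cumsum(np.exp(eta_s)[::-1])[::-1])
    return float(np.sum((eta_s - log_risk)[ev_s == 1]))

def score(theta, z, time, event):
    """Analytic score U(theta): for each death, g_dot(Z_i) minus the
    exp(g)-weighted average of g_dot over the risk set."""
    eta, gd = g(theta, z), g_dot(theta, z)
    U = np.zeros(4)
    for i in np.flatnonzero(event == 1):
        at_risk = time >= time[i]
        w = np.exp(eta[at_risk])
        U += gd[:, i] - (gd[:, at_risk] * w).sum(axis=1) / w.sum()
    return U
```

With all coefficients zero the log partial likelihood reduces to minus the sum of log risk-set sizes over the deaths, which gives a quick sanity check of the risk-set bookkeeping.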
The score function U(θ) is

U(θ) = ∂l(θ)/∂θ = Σ_{i=1}^n ∫₀^∞ [ġθ(Zi) − (Σ_{j=1}^n Yj(t) ġθ(Zj) exp(gθ(Zj))) / (Σ_{j=1}^n Yj(t) exp(gθ(Zj)))] dNi(t),

where ġθ = ∂gθ/∂θ = [Z, Z², (Z − k)²₊, −2β3 (Z − k)₊]⊤ is a column vector. Let Ai(t) be
the compensator of the counting process Ni(t); in other words, Mi(t) = Ni(t) − Ai(t) is a
martingale. Under the Cox Proportional Hazards Model, dAi(t) = Yi(t) exp(gθ0(Zi)) h0(t) dt,
hence
U(θ0) = Σ_{i=1}^n ∫₀^∞ [ġθ0(Zi) − (Σ_{j=1}^n Yj(t) ġθ0(Zj) exp(gθ0(Zj))) / (Σ_{j=1}^n Yj(t) exp(gθ0(Zj)))] dMi(t)
      + Σ_{i=1}^n ∫₀^∞ [ġθ0(Zi) − (Σ_{j=1}^n Yj(t) ġθ0(Zj) exp(gθ0(Zj))) / (Σ_{j=1}^n Yj(t) exp(gθ0(Zj)))] dAi(t),

and the second term

Σ_{i=1}^n ∫₀^∞ [ġθ0(Zi) − (Σ_{j=1}^n Yj(t) ġθ0(Zj) exp(gθ0(Zj))) / (Σ_{j=1}^n Yj(t) exp(gθ0(Zj)))] dAi(t)
  = ∫₀^∞ n Sn^(1)(θ0, t) h0(t) dt − ∫₀^∞ [Sn^(1)(θ0, t) / Sn^(0)(θ0, t)] n Sn^(0)(θ0, t) h0(t) dt = 0.

Therefore
U(θ0) = Σ_{i=1}^n ∫₀^∞ [ġθ0(Zi) − (Σ_{j=1}^n Yj(t) ġθ0(Zj) exp(gθ0(Zj))) / (Σ_{j=1}^n Yj(t) exp(gθ0(Zj)))] dMi(t)

is a martingale. Now let the score process be

U(θ0, t) = Σ_{i=1}^n ∫₀^t [ġθ0(Zi) − (Σ_{j=1}^n Yj(x) ġθ0(Zj) exp(gθ0(Zj))) / (Σ_{j=1}^n Yj(x) exp(gθ0(Zj)))] dMi(x).
Based on the above proposition, we have the asymptotic Normality of the score process.
Theorem 1. (Asymptotic Normality of the score process) Consider the Cox Proportional
Hazards model. Suppose that the covariate structure of the main risk factor Zi is expressed
as a free-knot spline function gθ(Zi) with Zi bounded and constant in time, that
P{Yi(τ) > 0} > 0, that k ≠ Zi for all i, that there is a neighborhood around the true knot
k0 in which no Zi falls, and that Σ(θ0, τ) ≡ ∫₀^τ v(θ0, x) s^(0)(θ0, x) h0(x) dx is positive
definite. Then, denoting the score process by U(θ0, t), t ∈ [0, τ], the following hold.

(a) n^{−1/2} U(θ0, t) converges in distribution to a Gaussian process, where each component
of the Gaussian process has independent increments, the mean of the limiting process is
zero, and the covariance matrix of the limiting process at time t is

Σ(θ0, t) = ∫₀^t v(θ0, x) s^(0)(θ0, x) h0(x) dx.

(b) If θ̂n is a consistent estimator of θ0, then the plug-in estimator of Σ(θ0, t),

Σ̂(θ0, t) = (1/n) Σ_{i=1}^n ∫₀^t Vn(θ̂n, x) dNi(x),

satisfies

sup_{t∈[0,τ]} ||(1/n) Σ_{i=1}^n ∫₀^t Vn(θ̂n, x) dNi(x) − Σ(θ0, t)|| → 0

in probability as n tends to infinity. Furthermore, 1/n times the observed information
matrix, −(1/n) ∂²l(θ, t)/∂θ∂θ⊤, evaluated at θ̂n, is a consistent estimator of Σ(θ0, t), for
all t ∈ [0, τ].
Proof: We first write

U(θ0, t) = Σ_{i=1}^n ∫₀^t [ġθ0(Zi) − Σ_{j=1}^n ġθ0(Zj) pj(θ0, x)] dMi(x),

where

pj(θ0, x) ≡ Yj(x) exp(gθ0(Zj)) / Σ_{k=1}^n Yk(x) exp(gθ0(Zk))

can be viewed as the probability that, at time point x when θ = θ0, index j is selected
from an urn containing all n indices. Defined this way, the selected index I is a random
variable and ġθ0(ZI), as a vector function of I, is a random vector. The expectation of
ġθ0(ZI) is equal to Σ_{j=1}^n ġθ0(Zj) pj(θ0, x), where, to stress the dependence on θ and x,
we write E^I_{θ,x} for the expectation calculated under the urn model with probabilities
pi(θ, x), i = 1, ..., n. Therefore

U(θ0, t) = Σ_{i=1}^n ∫₀^t [ġθ0(Zi) − E^I_{θ0,x}(ġθ0(ZI))] dMi(x) = Σ_{i=1}^n ∫₀^t Hi(θ0, x) dMi(x),

and the vector Hi(θ0, x) = ġθ0(Zi) − E^I_{θ0,x}(ġθ0(ZI)) is bounded and predictable.
We will now use the Martingale Central Limit Theorem (see, e.g., Theorem 5.3.5 [38]) to
prove (a), the asymptotic Normality of U^(n)(θ0, t) ≡ n^{−1/2} U(θ0, t). Under our
assumptions we need to show that the bracket process

⟨U^(n)(θ0, ·)⟩(t) = (1/n) Σ_{i=1}^n ∫₀^t Hi(θ0, x)⊗2 dAi(x)

converges in probability, as n → ∞, to a limiting matrix that is a function of t. We also
need to show that if we define, based on the l-th element of U(θ0, t) and any ε > 0,

U^(n)_{l,ε}(θ0, t) ≡ Σ_{i=1}^n ∫₀^t n^{−1/2} Hi,l(θ0, x) 1{|n^{−1/2} Hi,l(θ0, x)| ≥ ε} dMi(x),

then

⟨U^(n)_{l,ε}(θ0, ·)⟩(t) = (1/n) Σ_{i=1}^n ∫₀^t Hi,l(θ0, x)² 1{|n^{−1/2} Hi,l(θ0, x)| ≥ ε} dAi(x)

converges in probability to 0.
Notice that

⟨U^(n)(θ0, ·)⟩(t) = (1/n) Σ_{i=1}^n ∫₀^t Hi(θ0, x)⊗2 dAi(x)
  = (1/n) Σ_{i=1}^n ∫₀^t [ġθ0(Zi) − E_I(θ0, x)]⊗2 dAi(x)
  = (1/n) ∫₀^t [Sn^(2)(θ0, x)/Sn^(0)(θ0, x) − (Sn^(1)(θ0, x)/Sn^(0)(θ0, x))⊗2] dĀ(x)
  = (1/n) ∫₀^t [E^I_{θ0,x}(ġθ0(ZI)⊗2) − E_I(θ0, x)⊗2] dĀ(x)
  = (1/n) ∫₀^t Vn(θ0, x) dĀ(x)
  = ∫₀^t Vn(θ0, x) Sn^(0)(θ0, x) h0(x) dx.

Here

Vn(θ0, x) ≡ E^I_{θ0,x}(ġθ0(ZI)⊗2) − E_I(θ0, x)⊗2 = Sn^(2)(θ0, x)/Sn^(0)(θ0, x) − [Sn^(1)(θ0, x)/Sn^(0)(θ0, x)]⊗2

and Ā(x) ≡ Σ_{i=1}^n Ai(x). In the above notation, E^I_{θ0,x}(ġθ0(ZI)⊗2) is the urn model
expectation of the random matrix ġθ0(ZI)⊗2, hence Vn(θ0, x) is the urn model
variance-covariance matrix of ġθ0(ZI) at time point x when θ = θ0.

By the boundedness of Sn^(j)(θ, x), j = 0, 1, 2, on Θ × [0, τ] and the fact that s^(0)(θ, x) is
bounded away from zero on Θ × [0, τ] ((IV) and (II) of Proposition 1), it can be shown that
sup_{x∈[0,τ],θ∈Θ} ||Vn(θ, x) − v(θ, x)|| → 0 in probability as n tends to infinity. In fact,

sup_{x∈[0,τ],θ∈Θ} ||Vn(θ, x) − v(θ, x)||
  = sup_{x∈[0,τ],θ∈Θ} ||Sn^(2)(θ, x)/Sn^(0)(θ, x) − (Sn^(1)(θ, x)/Sn^(0)(θ, x))⊗2
                        − s^(2)(θ, x)/s^(0)(θ, x) + (s^(1)(θ, x)/s^(0)(θ, x))⊗2||
  ≤ sup_{x∈[0,τ],θ∈Θ} ||Sn^(2)(θ, x)/Sn^(0)(θ, x) − s^(2)(θ, x)/s^(0)(θ, x)||
    + sup_{x∈[0,τ],θ∈Θ} ||(Sn^(1)(θ, x)/Sn^(0)(θ, x))⊗2 − (s^(1)(θ, x)/s^(0)(θ, x))⊗2||.    (4.1.5)
For the first term, we have
(2)
sup
Sn (θ, x)
||
(0)
Sn (θ, x)
x∈[0,τ ],θ∈Θ
−
s(2) (θ, x)
||
s(0) (θ, x)
(2)
≤
sup
||
Sn (θ, x) − s(2) (θ, x)
(0)
Sn (θ, x)
x∈[0,τ ],θ∈Θ
+
sup
||
x∈[0,τ ],θ∈Θ
||
s(2) (θ, x)
(0)
s(0) (θ, x)Sn (θ, x)
|| · ||Sn(0) (θ, x) − s(0) (θ, x)||
!
(2)
supx∈[0,τ ],θ∈Θ ||Sn (θ, x) − s(2) (θ, x)||
≤
(0)
+
||Sn (θ, x)||
supx∈[0,τ ],θ∈Θ ||s(2) (θ, x)||
(0)
inf x∈[0,τ ],θ∈Θ ||s(0) (θ, x)|| · ||Sn (θ, x)||
×
sup
x∈[0,τ ],θ∈Θ
||Sn(0) (θ, x) − s(0) (θ, x)||,
(4.1.6)
(0)
where ||Sn (θ, x)|| in the denominators can be further expressed as
||Sn(0) (θ, x)|| = ||Sn(0) (θ, x) − s(0) (θ, x) + s(0) (θ, x)||
≥ ||s(0) (θ, x)|| − ||Sn(0) (θ, x) − s(0) (θ, x)||
≥
inf
x∈[0,τ ],θ∈Θ
||s(0) (θ, x)|| −
sup
x∈[0,τ ],θ∈Θ
||Sn(0) (θ, x) − s(0) (θ, x)||.
Since s(0) (θ, x) is bounded away from zero on Θ × [0, τ ], it follows inf x∈[0,τ ],θ∈Θ ||s(0) (θ, x)|| ≥
η > 0 for some constant η > 0. From
P
sup
x∈[0,τ ],θ∈Θ
||Sn(0) (θ, x) − s(0) (θ, x)|| −→ 0,
(0)
it follows that for sufficiently large n we have supx∈[0,τ ],θ∈Θ ||Sn (θ, x) − s(0) (θ, x)|| ≤
1
2
inf x∈[0,τ ],θ∈Θ ||s(0) (θ, x)|| ≤ η/2 on an event with probability tending to one.
(0)
||Sn (θ, x)|| ≥ η/2 on an event tending to one. Therefore,
\[
\sup_{x\in[0,\tau],\theta\in\Theta}\left\|\frac{S_n^{(2)}(\theta,x)}{S_n^{(0)}(\theta,x)}-\frac{s^{(2)}(\theta,x)}{s^{(0)}(\theta,x)}\right\|
\le\frac{2}{\eta}\sup_{x\in[0,\tau],\theta\in\Theta}\|S_n^{(2)}(\theta,x)-s^{(2)}(\theta,x)\|
+\frac{2}{\eta^{2}}\sup_{x\in[0,\tau],\theta\in\Theta}\|s^{(2)}(\theta,x)\|\,\sup_{x\in[0,\tau],\theta\in\Theta}\|S_n^{(0)}(\theta,x)-s^{(0)}(\theta,x)\|
\]
on that event.
By the boundedness of $s^{(2)}(\theta,x)$ on $[0,\tau]\times\Theta$ and the fact that $s^{(0)}(\theta,x)$ is bounded away from zero on $[0,\tau]\times\Theta$, both $\inf_{x\in[0,\tau],\theta\in\Theta}\|s^{(0)}(\theta,x)\|$ and $\sup_{x\in[0,\tau],\theta\in\Theta}\|s^{(2)}(\theta,x)\|$ are positive and finite. From the results in Proposition 1, both $\sup_{x\in[0,\tau],\theta\in\Theta}\|S_n^{(2)}(\theta,x)-s^{(2)}(\theta,x)\|$ and $\sup_{x\in[0,\tau],\theta\in\Theta}\|S_n^{(0)}(\theta,x)-s^{(0)}(\theta,x)\|$ converge to zero in probability, so the first term in (4.1.5) goes to zero in probability. To show that the second term of (4.1.5) is negligible, we first note that for vectors $A$ and $B$,
\[
\begin{aligned}
\|A^{\otimes2}-B^{\otimes2}\| &= \|AA^{\top}-BB^{\top}\| = \|(A-B)A^{\top}+B(A-B)^{\top}\|\\
&\le \|(A-B)A^{\top}\|+\|B(A-B)^{\top}\|\\
&\le \|A-B\|\cdot\|A^{\top}\|+\|B\|\cdot\|(A-B)^{\top}\|\\
&= \|A-B\|\,(\|A\|+\|B\|)\\
&\le \|A-B\|\,(\|A-B\|+2\|B\|).
\end{aligned}
\]
Let $A = S_n^{(1)}(\theta,x)/S_n^{(0)}(\theta,x)$ and $B = s^{(1)}(\theta,x)/s^{(0)}(\theta,x)$. The negligibility of $\sup_{x\in[0,\tau],\theta\in\Theta}\|A-B\|$ follows by the same argument used for the first term in (4.1.5), with $S_n^{(1)}$ and $s^{(1)}$ in place of $S_n^{(2)}$ and $s^{(2)}$; by the inequality above, we then only need to show that $2\sup_{x\in[0,\tau],\theta\in\Theta}\|B\|$ is bounded. This is easily seen from the boundedness of $s^{(1)}(\theta,x)$ and the fact that $s^{(0)}(\theta,x)$ is bounded away from zero. Hence both terms in (4.1.5) converge to zero in probability and
\[
\sup_{x\in[0,\tau],\theta\in\Theta}\|V_n(\theta,x)-v(\theta,x)\|\xrightarrow{P}0.
\]
Together with the conclusion $\sup_{x\in[0,\tau],\theta\in\Theta}\|S_n^{(0)}(\theta,x)-s^{(0)}(\theta,x)\|\to0$ in probability in (II), it can be shown that
\[
\sup_{x\in[0,\tau],\theta\in\Theta}\|V_n(\theta,x)S_n^{(0)}(\theta,x)-v(\theta,x)s^{(0)}(\theta,x)\|\xrightarrow{P}0;
\]
therefore $\int_0^t h_0(x)\,dx<\infty$ of Proposition 1 indicates
\[
\langle U^{(n)}(\theta_0,\cdot)\rangle(t)=\int_0^t V_n(\theta_0,x)S_n^{(0)}(\theta_0,x)h_0(x)\,dx
\xrightarrow{P}\int_0^t v(\theta_0,x)s^{(0)}(\theta_0,x)h_0(x)\,dx.
\]
Next, we show that the second condition required by the Martingale Central Limit Theorem is satisfied. Consider
\[
\begin{aligned}
\langle U^{(n)}_{l,\epsilon}(\theta_0,\cdot)\rangle(t)
&=\frac{1}{n}\sum_{i=1}^{n}\int_0^t H^2_{i,l}(\theta_0,x)\,1_{\{|n^{-1/2}H_{i,l}(\theta_0,x)|\ge\epsilon\}}\,dA_i(x)\\
&\le\frac{1}{n}\sum_{i=1}^{n}\int_0^t (2B)^2\,1_{\{n^{-1/2}(2B)\ge\epsilon\}}\,dA_i(x),
\end{aligned}
\]
where $B$ in the last term is the bound of $\dot g_{\theta_0}(Z)$ on $\Theta\times[0,\tau]$, based on the boundedness of $Z_i$ and the compactness of $\Theta$. The indicator function in the integrand is zero when $n$ is large enough; therefore $\langle U^{(n)}_{l,\epsilon}(\theta_0,\cdot)\rangle(t)\xrightarrow{P}0$ as $n\to\infty$. Now by the Martingale Central Limit Theorem (see, e.g., Theorem 5.3.5 [38]), the score process $\{n^{-1/2}U(\theta_0,t):t\in[0,\tau]\}$ converges in distribution to a Gaussian process with mean zero and independent increments in each component. The variance-covariance matrix of the limiting process at time $t$ is the limit of $\langle n^{-1/2}U(\theta_0,\cdot)\rangle(t)$, namely $\Sigma(\theta_0,t)=\int_0^t v(\theta_0,x)s^{(0)}(\theta_0,x)h_0(x)\,dx$. This finishes the proof of part (a) of the theorem.
To prove (b), we first have
\[
\langle U^{(n)}(\theta_0,\cdot)\rangle(t)=\frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar A(x)
=\frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar N(x)-\frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar M(x)
\xrightarrow{P}\Sigma(\theta_0,t),
\]
where $\bar N(x)\equiv\sum_{i=1}^n N_i(x)$ and $\bar M(x)\equiv\sum_{i=1}^n M_i(x)$. Since $\frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar M(x)$ is a mean zero martingale, $\frac{1}{n}\int_0^t V_n(\theta_0,x)\,d\bar N(x)$ is a reasonable estimator of $\Sigma(\theta_0,t)$. The plug-in estimator satisfies
\[
\begin{aligned}
&\left\|\frac{1}{n}\int_0^t V_n(\hat\theta_n,x)\,d\bar N(x)-\int_0^t v(\theta_0,x)s^{(0)}(\theta_0,x)h_0(x)\,dx\right\|\\
&\quad\le\left\|\int_0^t\bigl[V_n(\hat\theta_n,x)-v(\hat\theta_n,x)\bigr]\frac{1}{n}\,d\bar N(x)\right\|
+\left\|\int_0^t\bigl[v(\hat\theta_n,x)-v(\theta_0,x)\bigr]\frac{1}{n}\,d\bar N(x)\right\|\\
&\qquad+\left\|\int_0^t v(\theta_0,x)\Bigl[\frac{1}{n}\,d\bar N(x)-\frac{1}{n}\sum_{i=1}^n Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx\Bigr]\right\|\\
&\qquad+\left\|\int_0^t v(\theta_0,x)\bigl(S_n^{(0)}(\theta_0,x)-s^{(0)}(\theta_0,x)\bigr)h_0(x)\,dx\right\|.
\end{aligned}
\tag{4.1.7}
\]
Applying Lemma 2, for any $c,\delta>0$ we have
\[
\begin{aligned}
P\Bigl\{\frac{1}{n}\bar N(t)>c\Bigr\}
&\le\frac{\delta}{c}+P\Bigl\{\int_0^t\frac{1}{n}\sum_{i=1}^n Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx>\delta\Bigr\}\\
&=\frac{\delta}{c}+P\Bigl\{\int_0^t S_n^{(0)}(\theta_0,x)h_0(x)\,dx>\delta\Bigr\}.
\end{aligned}
\]
By the SLLN, $\frac{1}{n}\bar N(t)$ converges almost surely, hence the LHS of the above inequality has a limit $\lim_{n\to\infty}P\{\frac{1}{n}\bar N(t)>c\}$. As $n\to\infty$ it can be shown, using the Bounded Convergence Theorem, that
\[
P\Bigl\{\int_0^t S_n^{(0)}(\theta_0,x)h_0(x)\,dx>\delta\Bigr\}\to P\Bigl\{\int_0^t s^{(0)}(\theta_0,x)h_0(x)\,dx>\delta\Bigr\}.
\]
Due to results (I) and (IV) in Proposition 1, $\int_0^t s^{(0)}(\theta_0,x)h_0(x)\,dx$ is bounded, thus $\delta$ can be chosen such that
\[
\int_0^t s^{(0)}(\theta_0,x)h_0(x)\,dx\le\delta,
\]
so that the RHS of the inequality reduces to $\delta/c$ when $n\to\infty$. Eventually letting $c\to\infty$, we obtain that $\frac{1}{n}\bar N(t)$ is bounded in probability, i.e.,
\[
\lim_{c\to\infty}\lim_{n\to\infty}P\Bigl\{\frac{1}{n}\bar N(t)>c\Bigr\}=0.
\]
Now consider the first term in (4.1.7). It has been shown that
\[
\sup_{x\in[0,\tau],\theta\in\Theta}\|V_n(\theta,x)-v(\theta,x)\|\xrightarrow{P}0.
\]
If $\hat\theta_n$ is a consistent estimator of $\theta_0$, then $\hat\theta_n\in\Theta$ except on an event with probability tending to zero, for sufficiently large $n$. Hence
\[
\sup_{x\in[0,\tau]}\|V_n(\hat\theta_n,x)-v(\hat\theta_n,x)\|\xrightarrow{P}0
\]
as $n\to\infty$. This, together with the boundedness in probability of $\frac{1}{n}\bar N(t)$, leads to the convergence to zero in probability of the first term in (4.1.7).
From (IV) of Proposition 1, the functions $s^{(j)}(\theta,x)$, $j=0,1,2$, are bounded on $\Theta\times[0,\tau]$, the function families $s^{(j)}(\cdot,x)$, $j=0,1,2$, $x\in[0,\tau]$, are equicontinuous at $\theta=\theta_0$, and $s^{(0)}(\theta,x)$ is bounded away from zero on $\Theta\times[0,\tau]$. These can be used to show that $v(\cdot,x)$, $x\in[0,\tau]$, is a family of equicontinuous functions at $\theta=\theta_0$. Indeed, when $\|\theta_m-\theta_0\|\to0$, consider
\[
\begin{aligned}
\sup_{x\in[0,\tau]}\|v(\theta_m,x)-v(\theta_0,x)\|
&=\sup_{x\in[0,\tau]}\left\|\frac{s^{(2)}(\theta_m,x)}{s^{(0)}(\theta_m,x)}-\left(\frac{s^{(1)}(\theta_m,x)}{s^{(0)}(\theta_m,x)}\right)^{\otimes2}-\frac{s^{(2)}(\theta_0,x)}{s^{(0)}(\theta_0,x)}+\left(\frac{s^{(1)}(\theta_0,x)}{s^{(0)}(\theta_0,x)}\right)^{\otimes2}\right\|\\
&\le\sup_{x\in[0,\tau]}\left\|\frac{s^{(2)}(\theta_m,x)}{s^{(0)}(\theta_m,x)}-\frac{s^{(2)}(\theta_0,x)}{s^{(0)}(\theta_0,x)}\right\|
+\sup_{x\in[0,\tau]}\left\|\left(\frac{s^{(1)}(\theta_m,x)}{s^{(0)}(\theta_m,x)}\right)^{\otimes2}-\left(\frac{s^{(1)}(\theta_0,x)}{s^{(0)}(\theta_0,x)}\right)^{\otimes2}\right\|.
\end{aligned}
\]
By an argument similar to the one used in proving
\[
\sup_{x\in[0,\tau],\theta\in\Theta}\|V_n(\theta,x)-v(\theta,x)\|\xrightarrow{P}0,
\]
the boundedness of $s^{(j)}(\theta,x)$, $j=0,1,2$, the fact that $s^{(0)}(\theta,x)$ is bounded away from zero, and the equicontinuity of $s^{(j)}(\cdot,x)$, $x\in[0,\tau]$, $j=0,1,2$, at $\theta=\theta_0$ then imply that $v(\cdot,x)$, $x\in[0,\tau]$, is a family of equicontinuous functions at $\theta=\theta_0$, that is,
\[
\sup_{x\in[0,\tau]}\|v(\theta_m,x)-v(\theta_0,x)\|\to0.
\]
Now, the boundedness in probability result of
1
N̄(t)
n
and the equicontinuity of v(·, x), x ∈
[0, τ ] guarantees the second term in (4.1.7) converges to zero in probability if θ̂n is a consistent
estimator of θ0 .
Because of results (I), (II) and (IV) in Proposition 1, the fourth term of (4.1.7) converges
to zero in probability. The convergence of the third term uses the second part of Lemma 2.
Rt
Consider one element, the (j, k)th entry, in the matrix 0 v(θ0 , x) n1 dN̄(x) − dĀ(x) :
\[
\begin{aligned}
P\Bigl\{\sup_{y\in[0,t]}\Bigl|\int_0^y v(\theta_0,x)_{j,k}\frac{1}{n}\,d\bar M(x)\Bigr|\ge\rho\Bigr\}
&\le\frac{\delta}{\rho^2}+P\Bigl\{\int_0^t v(\theta_0,x)^2_{j,k}\frac{1}{n^2}\,d\bar A(x)\ge\delta\Bigr\}\\
&=\frac{\delta}{\rho^2}+P\Bigl\{\int_0^t v(\theta_0,x)^2_{j,k}\frac{1}{n^2}\sum_{i=1}^n Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx\ge\delta\Bigr\}\\
&=\frac{\delta}{\rho^2}+P\Bigl\{\frac{1}{n}\int_0^t v(\theta_0,x)^2_{j,k}S_n^{(0)}(\theta_0,x)h_0(x)\,dx\ge\delta\Bigr\}.
\end{aligned}
\]
By the boundedness condition, the second term in the last member of the inequality vanishes when $n$ is large enough. Because $\delta,\rho$ are arbitrary, we choose $\delta=\rho^3$, so that the above probability is eventually bounded by $\rho$; hence the third term goes to zero in probability. Thus
\[
\sup_{t\in[0,\tau]}\Bigl\|\frac{1}{n}\sum_{i=1}^n\int_0^t V_n(\hat\theta_n,x)\,dN_i(x)-\Sigma(\theta_0,t)\Bigr\|\xrightarrow{P}0
\]
is shown. The derivative of the score process w.r.t. $\theta$ is a matrix given by
\[
\frac{\partial^2 l(\theta)}{\partial\theta\partial\theta^{\top}}
=\sum_{i=1}^n\int_0^{\infty}\Biggl[\ddot g_{\theta}(Z_i)
+\Biggl(\frac{\sum_{j=1}^n Y_j(t)\dot g_{\theta}(Z_j)\exp(g_{\theta}(Z_j))}{\sum_{j=1}^n Y_j(t)\exp(g_{\theta}(Z_j))}\Biggr)^{\otimes2}
-\frac{\sum_{j=1}^n Y_j(t)\exp(g_{\theta}(Z_j))\bigl[\dot g_{\theta}(Z_j)^{\otimes2}+\ddot g_{\theta}(Z_j)\bigr]}{\sum_{j=1}^n Y_j(t)\exp(g_{\theta}(Z_j))}\Biggr]\,dN_i(t).
\]
Note that $1/n$ times the observed information matrix is defined to be
\[
-\frac{1}{n}\frac{\partial^2 l(\theta)}{\partial\theta\partial\theta^{\top}},
\]
which is equal to
\[
\begin{aligned}
&\frac{1}{n}\sum_{i=1}^n\int_0^t\Biggl[\frac{\sum_{j=1}^n Y_j(t)\exp(g_{\theta}(Z_j))\bigl[\dot g_{\theta}(Z_j)^{\otimes2}+\ddot g_{\theta}(Z_j)\bigr]}{\sum_{j=1}^n Y_j(t)\exp(g_{\theta}(Z_j))}
-\ddot g_{\theta}(Z_i)
-\Biggl(\frac{\sum_{j=1}^n Y_j(t)\dot g_{\theta}(Z_j)\exp(g_{\theta}(Z_j))}{\sum_{j=1}^n Y_j(t)\exp(g_{\theta}(Z_j))}\Biggr)^{\otimes2}\Biggr]\,dN_i(t)\\
&\quad=\frac{1}{n}\sum_{i=1}^n\int_0^t V_n(\theta,x)\,dN_i(x)
-\frac{1}{n}\sum_{i=1}^n\int_0^t\bigl[\ddot g_{\theta}(Z_i)-E^{I}_{\theta,x}\ddot g_{\theta}(Z_I)\bigr]\,dN_i(x),
\end{aligned}
\]
where $\ddot g_{\theta}$ is given in (4.1.2) and is a continuous function of $\theta$ under the assumption $k\neq Z_i$, $\forall i$. To show that
\[
-\frac{1}{n}\frac{\partial^2 l(\theta,t)}{\partial\theta\partial\theta^{\top}}
\]
evaluated at $\hat\theta_n$ is a consistent estimator of $\Sigma(\theta_0,t)$, uniformly in $t\in[0,\tau]$, we need to show
that as $\Theta$ shrinks to $\theta_0$,
\[
\begin{aligned}
&\sup_{t\in[0,\tau],\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n\int_0^t\bigl[\ddot g_{\theta}(Z_i)-E^{I}_{\theta,x}\ddot g_{\theta}(Z_I)\bigr]\,dN_i(x)\Bigr\|\\
&\quad\le\sup_{t\in[0,\tau],\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n\int_0^t\bigl[\ddot g_{\theta}(Z_i)-\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta,x}\ddot g_{\theta}(Z_I)+E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\bigr]\,dN_i(x)\Bigr\|\qquad(4.1.8)\\
&\qquad+\sup_{t\in[0,\tau]}\Bigl\|\frac{1}{n}\sum_{i=1}^n\int_0^t\bigl[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\bigr]\,dN_i(x)\Bigr\|\qquad(4.1.9)\\
&\quad\xrightarrow{P}0.
\end{aligned}
\]
of (4.1.9) is quite simple. Using dN = dM − dA, it is implied by the following two limits:
n
1X
sup ||
t∈[0,τ ] n i=1
and
Z th
i
P
g̈θ0 (Zi ) − EθI0 ,x g̈θ0 (ZI ) dMi (x)|| −→ 0,
(4.1.10)
Z th
i
P
g̈θ0 (Zi ) − EθI0 ,x g̈θ0 (ZI ) dAi (x)|| −→ 0.
(4.1.11)
n
1X
sup ||
t∈[0,τ ] n i=1
0
0
Immediately (4.1.11) holds because what’s in the norm is equal to
Z thX
n
0
i=1
i
g̈θ0 (Zi)pi (θ0 , x) − EθI0 ,x (g̈θ0 (ZI )) Sn(0) (θ0 , x)h0 (x)dx = 0.
Applying Lemma 3 to every entry of (4.1.10), we can show
\[
\begin{aligned}
&P\Bigl\{\sup_{t\in[0,\tau]}\Bigl[\frac{1}{n}\sum_{i=1}^n\int_0^t\bigl[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\bigr]\,dM_i(x)\Bigr]^2_{j,k}\ge\epsilon\Bigr\}\\
&\quad\le\frac{\eta}{\epsilon}+P\Bigl\{\frac{1}{n}\sum_{i=1}^n\frac{1}{n}\int_0^{\tau}\bigl[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\bigr]^2_{j,k}Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx\ge\eta\Bigr\}\\
&\quad\le\frac{\eta}{\epsilon}+P\Bigl\{\frac{1}{n}B^2\int_0^{\tau}S_n^{(0)}(\theta_0,x)h_0(x)\,dx\ge\eta\Bigr\},
\end{aligned}
\]
where $B$ is the bound of $\bigl[\ddot g_{\theta_0}(Z_i)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\bigr]^2_{j,k}$ on $\Theta\times[0,\tau]$. Then results (I), (II) and the boundedness in (IV) guarantee that
\[
P\Bigl\{\frac{1}{n}B^2\int_0^{\tau}S_n^{(0)}(\theta_0,x)h_0(x)\,dx\ge\eta\Bigr\}
\]
is zero when $n$ is large enough, for any $\eta>0$. Therefore, by taking $\eta=\epsilon^2$, (4.1.10) is proved.
Observe that $\ddot g_{\cdot}(Z_i)$ and $g_{\cdot}(Z_i)$, $Z_i\in\mathcal{Z}$, are both equicontinuous families of functions of $\theta$ at $\theta_0$, where $\mathcal{Z}$ is the sample space of all $Z_i$, because $\mathcal{Z}$ is bounded and there is a neighborhood of the true knot $k_0$ containing no $Z_i$. This fact is used below. Break (4.1.8) into the following two parts:
\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n\int_0^t\bigl[\ddot g_{\theta}(Z_i)-\ddot g_{\theta_0}(Z_i)\bigr]\,dN_i(x)\Bigr\|,
\tag{4.1.12}
\]
and
\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Bigl\|\int_0^t\bigl[E^{I}_{\theta,x}\ddot g_{\theta}(Z_I)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)\bigr]\,d\bar N(x)/n\Bigr\|.
\tag{4.1.13}
\]
For (4.1.12), we have
\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n\int_0^t\bigl[\ddot g_{\theta}(Z_i)-\ddot g_{\theta_0}(Z_i)\bigr]\,dN_i(x)\Bigr\|
\le\frac{1}{n}\sum_{i=1}^n\sup_{\theta\in\Theta}\|\ddot g_{\theta}(Z_i)-\ddot g_{\theta_0}(Z_i)\|\,N_i(\tau).
\]
Since $\ddot g_{\cdot}(Z)$ is an equicontinuous family of functions of $\theta$ at $\theta_0$ with respect to $Z$, it follows that as $\Theta$ shrinks to $\theta_0$, $\sup_{\theta\in\Theta}\|\ddot g_{\theta}(Z_i)-\ddot g_{\theta_0}(Z_i)\|$ is negligible. Together with the fact that $\frac{1}{n}\bar N(\tau)$ is bounded in probability, it follows that (4.1.12) converges to zero in probability as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity.
To deal with (4.1.13), let
\[
T_n(\theta,t)\equiv\frac{1}{n}\sum_{i=1}^n\ddot g_{\theta}(Z_i)Y_i(t)\exp(g_{\theta}(Z_i)).
\]
It can be seen, from the definitions of $\ddot g_{\theta}(Z)$, $Z\in\mathcal{Z}$, and $\exp(g_{\theta}(Z))$, $Z\in\mathcal{Z}$, that both are equicontinuous families of functions of $\theta$ w.r.t. $Z$, and both are bounded on $\Theta\times\mathcal{Z}$. These imply that
\[
\sup_{x\in[0,\tau],\theta\in\Theta}\|\ddot g_{\theta}(Z_j)\exp(g_{\theta}(Z_j))-\ddot g_{\theta_0}(Z_j)\exp(g_{\theta_0}(Z_j))\|
\]
is negligible as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity, so that
\[
\sup_{t\in[0,\tau],\theta\in\Theta}\|T_n(\theta,t)-T_n(\theta_0,t)\|\xrightarrow{P}0.
\tag{4.1.14}
\]
Now the integrand in (4.1.13) can be broken up as
\[
\begin{aligned}
E^{I}_{\theta,x}\ddot g_{\theta}(Z_I)-E^{I}_{\theta_0,x}\ddot g_{\theta_0}(Z_I)
&=\frac{T_n(\theta,x)-T_n(\theta_0,x)}{S_n^{(0)}(\theta,x)}
-\frac{T_n(\theta_0,x)\bigl[S_n^{(0)}(\theta,x)-S_n^{(0)}(\theta_0,x)\bigr]}{S_n^{(0)}(\theta_0,x)S_n^{(0)}(\theta,x)}\\
&\equiv B_n(\theta,x)-C_n(\theta,x).
\end{aligned}
\]
Since $S_n^{(0)}(\theta,x)$ is bounded away from zero (by $1/\eta>0$, say) for large $n$, it follows that
\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Bigl\|\int_0^t B_n(\theta,x)\,\frac{d\bar N(x)}{n}\Bigr\|
\le\eta\times\sup_{x\in[0,\tau],\theta\in\Theta}\|T_n(\theta,x)-T_n(\theta_0,x)\|\times\frac{1}{n}\bar N(\tau).
\]
The boundedness in probability of $\frac{1}{n}\bar N(\tau)$ and (4.1.14) imply that the above is negligible in probability as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity. A similar argument verifies that
\[
\sup_{t\in[0,\tau],\theta\in\Theta}\Bigl\|\int_0^t C_n(\theta,x)\,\frac{d\bar N(x)}{n}\Bigr\|
\]
also becomes negligible in probability as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity. Combining the above shows that (4.1.13), and hence (4.1.8), is negligible in probability, again as $\Theta$ shrinks to $\theta_0$ and $n$ tends to infinity. Thus we have shown the negligibility of both (4.1.8) and (4.1.9). Since a consistent estimator $\hat\theta_n$ of $\theta_0$ eventually lies inside the shrinking compact neighborhood $\Theta$ of $\theta_0$ as $n\to\infty$, on an event with probability tending to one, it follows that
\[
-\frac{1}{n}\frac{\partial^2 l(\theta,t)}{\partial\theta\partial\theta^{\top}}
\]
evaluated at $\hat\theta_n$ is a consistent estimator of $\Sigma(\theta_0,t)$, uniformly in $t\in[0,\tau]$. The proof is complete.
4.2 Consistency Of The Maximum Partial Likelihood Estimator

We now give the consistency of the maximum partial likelihood estimator.

Theorem 2. (Consistency) In the Cox Proportional Hazards model, if the covariate structure of the main risk factor $Z_i$ is expressed as a free-knot spline function $g_{\theta}(Z_i)$ with $Z_i$ bounded and constant in time, $P\{Y_i(\tau)>0\}>0$, $k_0\neq Z_i$, $\forall i$, and $\Sigma(\theta_0,\tau)=\int_0^{\tau}v(\theta_0,x)s^{(0)}(\theta_0,x)h_0(x)\,dx$ is positive definite, then the maximum partial likelihood estimator $\hat\theta_n$ is consistent, i.e., $\hat\theta_n\xrightarrow{P}\theta_0$ as $n\to\infty$.
Proof: The proof uses Lemma 4. Let $X_n(\theta,t)\equiv n^{-1}[\log PL(\theta,t)-\log PL(\theta_0,t)]$. Then
\[
X_n(\theta,t)=\frac{1}{n}\sum_{i=1}^n\int_0^t\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]\,dN_i(x)
-\frac{1}{n}\sum_{i=1}^n\int_0^t\log\Biggl[\frac{\sum_{j=1}^n Y_j(x)\exp(g_{\theta}(Z_j))}{\sum_{j=1}^n Y_j(x)\exp(g_{\theta_0}(Z_j))}\Biggr]\,dN_i(x).
\]
Accordingly, define $A_n(\theta,t)$ as follows:
\[
A_n(\theta,t)\equiv\frac{1}{n}\sum_{i=1}^n\int_0^t\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]\,dA_i(x)
-\frac{1}{n}\sum_{i=1}^n\int_0^t\log\Biggl[\frac{\sum_{j=1}^n Y_j(x)\exp(g_{\theta}(Z_j))}{\sum_{j=1}^n Y_j(x)\exp(g_{\theta_0}(Z_j))}\Biggr]\,dA_i(x).
\]
Then $X_n(\theta,t)-A_n(\theta,t)$ is a martingale and
\[
\begin{aligned}
n\langle X_n(\theta,\cdot)-A_n(\theta,\cdot)\rangle(t)
&=\int_0^t\frac{1}{n}\sum_{i=1}^n\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]^2 Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx\\
&\quad-\int_0^t\log\Bigl[\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}\Bigr]\frac{1}{n}\sum_{i=1}^n 2\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]\,dA_i(x)\\
&\quad+\int_0^t\Bigl[\log\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}\Bigr]^2 S_n^{(0)}(\theta_0,x)h_0(x)\,dx,
\end{aligned}
\tag{4.2.1}
\]
where
\[
\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]^2
=\bigl[(\beta_1-\beta_{10})Z_i+(\beta_2-\beta_{20})Z_i^2+\beta_3(Z_i-k)^2_{+}-\beta_{30}(Z_i-k_0)^2_{+}\bigr]^2,
\]
with $\theta=[\beta_1,\beta_2,\beta_3,k]^{\top}$, $\theta_0=[\beta_{10},\beta_{20},\beta_{30},k_0]^{\top}$, $(Z_i-k)^2_{+}=(Z_i-k)^2 1_{\{Z_i>k\}}$ and $(Z_i-k_0)^2_{+}=(Z_i-k_0)^2 1_{\{Z_i>k_0\}}$. Notice that
\[
\dot g_{\theta}=\frac{\partial g_{\theta}}{\partial\theta}=\bigl[Z,\;Z^2,\;(Z-k)^2 1_{\{Z>k\}},\;-2\beta_3(Z-k)1_{\{Z>k\}}\bigr]^{\top},
\]
and
\[
S_n^{(1)}(\theta,x)=\frac{1}{n}\sum_{i=1}^n Y_i(x)\dot g_{\theta}(Z_i)\exp(g_{\theta}(Z_i)),\qquad
S_n^{(2)}(\theta,x)=\frac{1}{n}\sum_{i=1}^n Y_i(x)\dot g_{\theta}(Z_i)^{\otimes2}\exp(g_{\theta}(Z_i)),
\]
hence $\frac{1}{n}\sum_{i=1}^n\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]^2 Y_i(x)\exp(g_{\theta_0}(Z_i))$ in the first term of (4.2.1) can be expressed as a combination of entries in $S_n^{(2)}(\theta,x)$; therefore the fact that $\sup_{x\in[0,\tau],\theta\in\Theta}\|S_n^{(2)}(\theta,x)-s^{(2)}(\theta,x)\|\xrightarrow{P}0$ indicates the convergence of $\frac{1}{n}\sum_{i=1}^n\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]^2 Y_i(x)\exp(g_{\theta_0}(Z_i))$, and in fact
\[
\sup_{x\in[0,\tau],\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]^2 Y_i(x)\exp(g_{\theta_0}(Z_i))
-E\bigl[g_{\theta}(Z)-g_{\theta_0}(Z)\bigr]^2 Y(x)\exp(g_{\theta_0}(Z))\Bigr\|\xrightarrow{P}0.
\]
The uniform convergence in probability of the first term in (4.2.1) is established.
Similarly, $\frac{1}{n}\sum_{i=1}^n 2\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]Y_i(x)\exp(g_{\theta_0}(Z_i))$ in the second term of (4.2.1) can be expressed as a combination of entries in $S_n^{(1)}(\theta,x)$, and $\sup_{x\in[0,\tau],\theta\in\Theta}\|S_n^{(1)}(\theta,x)-s^{(1)}(\theta,x)\|\xrightarrow{P}0$ implies
\[
\sup_{x\in[0,\tau],\theta\in\Theta}\Bigl\|\frac{1}{n}\sum_{i=1}^n 2\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]Y_i(x)\exp(g_{\theta_0}(Z_i))
-2E\bigl[g_{\theta}(Z)-g_{\theta_0}(Z)\bigr]Y(x)\exp(g_{\theta_0}(Z))\Bigr\|\xrightarrow{P}0.
\]
In view of the convergence of $\log\bigl[S_n^{(0)}(\theta,x)/S_n^{(0)}(\theta_0,x)\bigr]$ to $\log\bigl[s^{(0)}(\theta,x)/s^{(0)}(\theta_0,x)\bigr]$, the uniform convergence in probability of the second term in (4.2.1) is verified.
Uniform convergence in probability of the third term in (4.2.1) is confirmed by considering the convergence of $\log\bigl[S_n^{(0)}(\theta,x)/S_n^{(0)}(\theta_0,x)\bigr]$ and $S_n^{(0)}(\theta_0,x)$, and the condition that $\int_0^t h_0(x)\,dx<\infty$. Therefore, it has been shown that $n\langle X_n(\theta,\cdot)-A_n(\theta,\cdot)\rangle(t)$ has a finite limit, and (2) of Lemma 2 implies
\[
X_n(\theta,t)-A_n(\theta,t)\xrightarrow{P}0,\qquad n\to\infty.
\]
Next consider $A_n(\theta,\tau)$:
\[
A_n(\theta,\tau)=\frac{1}{n}\sum_{i=1}^n\int_0^{\tau}\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx
-\int_0^{\tau}\log\Bigl[\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}\Bigr]S_n^{(0)}(\theta_0,x)h_0(x)\,dx.
\]
By the SLLN, the boundedness of $Z_i$, the compactness of $\Theta$ and $\int_0^{\tau}h_0(x)\,dx<\infty$, we have
\[
\frac{1}{n}\sum_{i=1}^n\int_0^{\tau}\bigl[g_{\theta}(Z_i)-g_{\theta_0}(Z_i)\bigr]Y_i(x)\exp(g_{\theta_0}(Z_i))h_0(x)\,dx
\to E\int_0^{\tau}\bigl[g_{\theta}(Z)-g_{\theta_0}(Z)\bigr]Y(x)\exp(g_{\theta_0}(Z))h_0(x)\,dx
\]
almost surely. Result (II), the boundedness in (IV) and the fact that $s^{(0)}(\theta,x)$ is bounded away from zero in Proposition 1 lead to the following convergence in probability:
\[
\int_0^{\tau}\log\Bigl[\frac{S_n^{(0)}(\theta,x)}{S_n^{(0)}(\theta_0,x)}\Bigr]S_n^{(0)}(\theta_0,x)h_0(x)\,dx
\xrightarrow{P}\int_0^{\tau}\log\Bigl[\frac{s^{(0)}(\theta,x)}{s^{(0)}(\theta_0,x)}\Bigr]s^{(0)}(\theta_0,x)h_0(x)\,dx.
\]
Hence,
\[
A_n(\theta,\tau)\xrightarrow{P}
E\int_0^{\tau}\bigl[g_{\theta}(Z)-g_{\theta_0}(Z)\bigr]Y(x)\exp(g_{\theta_0}(Z))h_0(x)\,dx
-\int_0^{\tau}\log\Bigl[\frac{s^{(0)}(\theta,x)}{s^{(0)}(\theta_0,x)}\Bigr]s^{(0)}(\theta_0,x)h_0(x)\,dx
\]
and
\[
X_n(\theta,\tau)\xrightarrow{P}
E\int_0^{\tau}\bigl[g_{\theta}(Z)-g_{\theta_0}(Z)\bigr]Y(x)\exp(g_{\theta_0}(Z))h_0(x)\,dx
-\int_0^{\tau}\log\Bigl[\frac{s^{(0)}(\theta,x)}{s^{(0)}(\theta_0,x)}\Bigr]s^{(0)}(\theta_0,x)h_0(x)\,dx,
\]
and the common limit is denoted by $A(\theta,\tau)$.
We already derived the expression of $\frac{\partial^2 l(\theta,\tau)}{\partial\theta\partial\theta^{\top}}$; therefore
\[
\frac{\partial^2 X_n(\theta,\tau)}{\partial\theta\partial\theta^{\top}}
=\frac{1}{n}\frac{\partial^2 l(\theta,\tau)}{\partial\theta\partial\theta^{\top}}
=-\frac{1}{n}\sum_{i=1}^n\int_0^{\tau}V_n(\theta,x)\,dN_i(x)
+\frac{1}{n}\sum_{i=1}^n\int_0^{\tau}\bigl[\ddot g_{\theta}(Z_i)-E^{I}_{\theta,x}\ddot g_{\theta}(Z_I)\bigr]\,dN_i(x).
\]
It has been shown that the second term of the last expression converges to zero in probability for all $\theta\in\Theta$ when $n\to\infty$ and $\Theta$ shrinks to $\theta_0$. $V_n(\theta,x)$ is the urn-model variance-covariance matrix of $\dot g_{\theta}(Z_I)$ at time point $x$, thus it is positive definite and $-\frac{1}{n}\sum_{i=1}^n\int_0^{\tau}V_n(\theta,x)\,dN_i(x)$ is negative definite. Due to the assumption $k_0\neq Z_i$, $\forall i$, we know that $\frac{\partial^2 X_n(\theta,\tau)}{\partial\theta\partial\theta^{\top}}$ is a continuous function of $\theta$. All these together imply that $X_n(\theta,\tau)$ is a concave function of $\theta$ when $n$ is large and $\theta\in\Theta$, for a small compact neighborhood $\Theta$ of $\theta_0$.
Next we will show that $A(\theta,\tau)$ has a unique maximum at $\theta=\theta_0$. The boundedness of $Z_i$, the compactness of $\Theta$ and result (III) in Proposition 1 imply that we can interchange differentiation and expectation, due to Theorem 16.8 [37], and we have
\[
\frac{\partial A(\theta,\tau)}{\partial\theta}
=\int_0^{\tau}E\,\dot g_{\theta}(Z)Y(x)\exp(g_{\theta_0}(Z))h_0(x)\,dx
-\int_0^{\tau}\frac{s^{(1)}(\theta,x)}{s^{(0)}(\theta,x)}s^{(0)}(\theta_0,x)h_0(x)\,dx,
\]
and
\[
\frac{\partial A(\theta,\tau)}{\partial\theta}\Big|_{\theta=\theta_0}=0.
\]
By the boundedness of $Z_i$, the compactness of $\Theta$ and result (III) in Proposition 1, Theorem 16.8 [37] is used again and
\[
\frac{\partial^2 A(\theta,\tau)}{\partial\theta\partial\theta^{\top}}
=\int_0^{\tau}E\,\ddot g_{\theta}(Z)Y(x)\exp(g_{\theta_0}(Z))h_0(x)\,dx
-\int_0^{\tau}E\Bigl[\frac{\ddot g_{\theta}(Z)Y(x)\exp(g_{\theta}(Z))}{s^{(0)}(\theta,x)}\Bigr]s^{(0)}(\theta_0,x)h_0(x)\,dx
-\int_0^{\tau}v(\theta,x)s^{(0)}(\theta_0,x)h_0(x)\,dx.
\]
All integrands in the above expression are continuous at $\theta=\theta_0$. Result (I), together with the boundedness and the bounded-away-from-zero property in result (IV) of Proposition 1, shows that in the neighborhood $\Theta$ all integrands are bounded by integrable functions. Then Theorem 16.8 [37] applies, $\frac{\partial^2 A(\theta,\tau)}{\partial\theta\partial\theta^{\top}}$ is a continuous function of $\theta$ at $\theta_0$, and
\[
\frac{\partial^2 A(\theta,\tau)}{\partial\theta\partial\theta^{\top}}
\to-\int_0^{\tau}v(\theta_0,x)s^{(0)}(\theta_0,x)h_0(x)\,dx,
\]
a negative definite matrix, when $\theta\to\theta_0$. It is thus verified that in the neighborhood $\Theta$, $\frac{\partial^2 A(\theta,\tau)}{\partial\theta\partial\theta^{\top}}$ is negative definite, so that $A(\theta,\tau)$ has a unique maximum at $\theta=\theta_0$. Now applying Lemma 4 shows that if $\hat\theta_n$ is the MPLE of $\theta_0$, then
\[
\hat\theta_n\xrightarrow{P}\theta_0.
\]
This completes the proof.

4.3 Asymptotic Normality Of The MPLE

We now prove the asymptotic Normality of the MPLE.
Theorem 3. (Asymptotic Normality of the MPLE) Let $\Sigma(\theta_0,t)$ be defined as before. Then
\[
n^{1/2}(\hat\theta_n-\theta_0)\Longrightarrow N\bigl(0,\Sigma^{-1}(\theta_0,\tau)\bigr).
\]
Proof: Expand $U(\hat\theta_n,\tau)$ at $\theta_0$ using Taylor's series. We obtain
\[
U(\hat\theta_n,\tau)=U(\theta_0,\tau)+\frac{\partial^2 l(\theta,\tau)}{\partial\theta\partial\theta^{\top}}\Big|_{\theta=\theta^{*}}(\hat\theta_n-\theta_0),
\]
where $\theta^{*}$ is on a line segment between $\hat\theta_n$ and $\theta_0$. Because $\hat\theta_n$ is the MPLE of $\theta_0$, $U(\hat\theta_n,\tau)=0$ and
\[
n^{1/2}(\hat\theta_n-\theta_0)=-\Bigl[n^{-1}\frac{\partial^2 l(\theta,\tau)}{\partial\theta\partial\theta^{\top}}\Big|_{\theta=\theta^{*}}\Bigr]^{-1}\times\bigl[n^{-1/2}U(\theta_0,\tau)\bigr].
\]
We already showed
\[
-n^{-1}\frac{\partial^2 l(\theta,\tau)}{\partial\theta\partial\theta^{\top}}\Big|_{\theta=\theta^{*}}\xrightarrow{P}\Sigma(\theta_0,\tau),\qquad
n^{-1/2}U(\theta_0,\tau)\Longrightarrow N\bigl(0,\Sigma(\theta_0,\tau)\bigr).
\]
These and Slutsky's Theorem together imply
\[
n^{1/2}(\hat\theta_n-\theta_0)\Longrightarrow N\bigl(0,\Sigma^{-1}(\theta_0,\tau)\bigr).
\]
4.4 Lemmas

In this section, we collect several lemmas.

Lemma 1. $\sup_{x\in[0,\tau]}\|S_n^{(0)}(\theta,x)-s^{(0)}(\theta,x)\|\to0$ almost surely when $n\to\infty$.

Proof: Both $S_n^{(0)}(\theta,x)$ and $s^{(0)}(\theta,x)$ are bounded, left-continuous, non-increasing functions of $x$. Hence $s^{(0)}(\theta,x)$ has at most countably many jumps on $[0,\tau]$. On $[0,\tau]$, let $Q$ denote the set of rational numbers and $J$ the set of jumps of $s^{(0)}(\theta,x)$. Then for each $x$ in $Q$, $S_n^{(0)}(\theta,x)\to s^{(0)}(\theta,x)$ on a set of probability one by the Strong Law of Large Numbers. Due to the countability of $Q$, there is a set of probability one on which $S_n^{(0)}(\theta,x)\to s^{(0)}(\theta,x)$ for all $x\in Q$. On the other hand, there exists a set of probability one such that for all $x$ in $J$, $S_n^{(0)}(\theta,x^{+})-S_n^{(0)}(\theta,x^{-})\to s^{(0)}(\theta,x^{+})-s^{(0)}(\theta,x^{-})$. The intersection of the above two sets has probability one, and we will prove $\sup_{x\in[0,\tau]}\|S_n^{(0)}(\theta,x)-s^{(0)}(\theta,x)\|\to0$ on the intersection set by contradiction.
Suppose there is a fixed $\epsilon>0$ for which we can find a sequence of indices $n_k$ and a sequence $x_k\in[0,\tau]$ satisfying $\|S_{n_k}^{(0)}(\theta,x_k)-s^{(0)}(\theta,x_k)\|\ge\epsilon$ for all $k$. Since $\tau$ is finite, the sequence $x_k$ is bounded and has a convergent subsequence which is, for simplicity, denoted by $x_k\to x$ as $k\to\infty$. Let rational numbers $r_1\in Q$ and $r_2\in Q$ be such that $r_1<x<r_2$. When $k$ is large enough we have the following four cases:

1. $x_k\uparrow x$, $x_k\le x$: $\epsilon\le S_{n_k}^{(0)}(\theta,x_k)-s^{(0)}(\theta,x_k)\le S_{n_k}^{(0)}(\theta,r_1)-s^{(0)}(\theta,x)=S_{n_k}^{(0)}(\theta,r_1)-s^{(0)}(\theta,r_1)+s^{(0)}(\theta,r_1)-s^{(0)}(\theta,x)$.

2. $x_k\uparrow x$, $x_k\le x$: $\epsilon\le s^{(0)}(\theta,x_k)-S_{n_k}^{(0)}(\theta,x_k)\le s^{(0)}(\theta,r_1)-S_{n_k}^{(0)}(\theta,x)\le s^{(0)}(\theta,r_1)-s^{(0)}(\theta,r_2)+s^{(0)}(\theta,r_2)-S_{n_k}^{(0)}(\theta,r_2)+S_{n_k}^{(0)}(\theta,x^{+})-S_{n_k}^{(0)}(\theta,x)$.

3. $x_k\downarrow x$, $x_k>x$: $\epsilon\le S_{n_k}^{(0)}(\theta,x_k)-s^{(0)}(\theta,x_k)\le S_{n_k}^{(0)}(\theta,x^{+})-s^{(0)}(\theta,r_2)\le S_{n_k}^{(0)}(\theta,x^{+})-S_{n_k}^{(0)}(\theta,x)+S_{n_k}^{(0)}(\theta,r_1)-s^{(0)}(\theta,r_1)+s^{(0)}(\theta,r_1)-s^{(0)}(\theta,r_2)$.

4. $x_k\downarrow x$, $x_k>x$: $\epsilon\le s^{(0)}(\theta,x_k)-S_{n_k}^{(0)}(\theta,x_k)\le s^{(0)}(\theta,x^{+})-S_{n_k}^{(0)}(\theta,r_2)=s^{(0)}(\theta,x^{+})-s^{(0)}(\theta,r_2)+s^{(0)}(\theta,r_2)-S_{n_k}^{(0)}(\theta,r_2)$.

We first let $k$ go to infinity and then let $r_1$ and $r_2$ tend to $x$; based upon the convergence results on the intersection set, all of the above inequalities lead to the contradiction $0<\epsilon\le0$. The proof is complete.
Lemma 2. Let $N$ be a univariate counting process with continuous compensator $A$, let $M=N-A$, and let $H$ be a locally bounded, predictable process. Then for all $\delta,\rho>0$ and any $t\ge0$,
\[
(1)\quad P\{N(t)\ge\rho\}\le\frac{\delta}{\rho}+P\{A(t)\ge\delta\};
\]
\[
(2)\quad P\Bigl\{\sup_{y\in[0,t]}\Bigl|\int_0^y H(x)\,dM(x)\Bigr|\ge\rho\Bigr\}\le\frac{\delta}{\rho^2}+P\Bigl\{\int_0^t H^2(x)\,dA(x)\ge\delta\Bigr\}.
\]
Proof. See Lemma 8.2.1 [38].
Lemma 3. Suppose that $M$ is a square integrable martingale with $M(0)=0$. Then for all $\eta,\epsilon>0$,
\[
P\Bigl\{\sup_{t\in[0,\tau]}M^2(t)\ge\epsilon\Bigr\}\le\frac{\eta}{\epsilon}+P\{\langle M\rangle(\tau)\ge\eta\}.
\]
Proof. See Theorem 3.4.1 [38].

Lemma 4. Let $E$ be an open convex subset of $R^p$, and let $F_1,F_2,\ldots$ be a sequence of random concave functions on $E$ and $f$ a real-valued function on $E$ such that, for all $x\in E$,
\[
\lim_{n\to\infty}F_n(x)=f(x)
\]
in probability. Then

1. The function $f$ is concave.

2. For all compact subsets $A$ of $E$, $\sup_{x\in A}|F_n(x)-f(x)|\xrightarrow{P}0$ as $n\to\infty$.

3. If $F_n$ has a unique maximum at $X_n$ and $f$ has one at $x$, then $X_n\xrightarrow{P}x$ as $n\to\infty$.

Proof. See Lemma 8.3.1 [38].
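As a quick numerical illustration of part 3 of Lemma 4 (this sketch is ours, not part of the dissertation; all names are illustrative), take the random concave functions $F_n(x)=-(x-m_n)^2$, where $m_n$ is a sample mean converging in probability to $\mu$; their unique maximizers $m_n$ should then converge in probability to the maximizer $\mu$ of the limit $f(x)=-(x-\mu)^2$:

```python
import numpy as np

# Illustration of Lemma 4(3): F_n(x) = -(x - m_n)^2 is a random concave
# function whose pointwise limit in probability is f(x) = -(x - mu)^2,
# because the sample mean m_n ->P mu. The unique maximizer of F_n is m_n,
# and Lemma 4(3) says it must converge in probability to argmax f = mu.
rng = np.random.default_rng(0)
mu = 2.0

def argmax_Fn(n):
    """Maximizer of F_n, i.e. the sample mean of n draws from N(mu, 1)."""
    return rng.normal(mu, 1.0, size=n).mean()

gaps = [abs(argmax_Fn(n) - mu) for n in (10**2, 10**4, 10**6)]
```

As the sample size grows, the maximizers concentrate around $\mu$, mirroring the conclusion $X_n\xrightarrow{P}x$ of the lemma.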
4.5 The Neighborhood Condition

In the previous sections, a neighborhood condition was added to the model. This condition requires the existence of a neighborhood of the true knot $k_0$ into which the risk factor does not fall. Such a condition is satisfied if the risk factor is discrete, but not if it is continuous. When there is no such neighborhood, the asymptotic Normality of the MPLE is examined through simulations.
The quadratic one-free-knot spline model (2.2.6) was taken as an example. The data used for checking the asymptotic Normality of the MPLE were simulated based on the Glostrup Female cohort in the Diverse Populations Collaboration. The Glostrup Study is a pool of seven observational cohorts from Glostrup, a city west of Copenhagen, Denmark. The female cohort consists of 5061 observations with 420 deaths. Since the inverse transformation of BMI (LBMI) is approximately Normal, it is easier and better to first simulate LBMI from a Normal distribution and then transform LBMI back to BMI. The hazard function $h_0(t)\times\exp(g_{\theta}(Z))$ was modeled parametrically with a Weibull distribution; that is, the corresponding hazard can be expressed as
\[
h(t|Z)=pt^{p-1}\exp(\beta_0)\times\exp(g_{\theta}(Z)),
\]
where $p$ is the shape parameter of the Weibull distribution and $\beta_0$ is an intercept term serving to scale the baseline hazard. The survival time before death can then be generated from a Uniform random variable $Y\sim U[0,1]$, using the relationship
\[
T=\bigl[-\ln(Y)\times\exp\bigl(-(\beta_0+g_{\theta}(Z))\bigr)\bigr]^{1/p},
\tag{4.5.1}
\]
where $g_{\theta}(Z)$ is the spline expression and $Z$ is BMI. Indeed, the cumulative hazard is $H(t|Z)=t^{p}\exp(\beta_0+g_{\theta}(Z))$ and the survival function is $S(t|Z)=\exp(-H(t|Z))$, so setting $S(T|Z)=Y$ and solving for $T$ gives (4.5.1). For the purpose of simulation, the censoring time was assumed to be independent of the covariate effects and was generated similarly using the formula
\[
U=\bigl[-\ln(Y)\times\exp(-\delta_0)\bigr]^{1/q},
\tag{4.5.2}
\]
where $\delta_0$ and $q$ were both estimated by treating the censoring times as if they were the real death times and the death times as censoring times. The observed follow-up time was taken to be $X=\min\{T,U\}$.
For (4.5.1), the mean LBMI used is $4.24\times10^{-2}$ and the standard deviation is $6.80\times10^{-3}$. The other parameters are $p=1.34$, $\beta_0=-6.14$, $\beta_1=-6.60\times10^{-1}$, $\beta_2=1.46\times10^{-2}$, $\beta_3=-2.09\times10^{-2}$ and $k=27.65$. For (4.5.2), $\delta_0=-19.62$ and $q=2.38$. These values are MPLEs pre-estimated from the cohort data.
When BMI was generated, values falling in a certain range were dropped to create a neighborhood containing no BMI. For each neighborhood width, 1000 simulated samples of size 20000 were used, and the neighborhood width gradually shrank to 0. This allows us to compare the asymptotic behavior of the estimated parameters with and without the neighborhood condition. The MPLEs were obtained using the likelihood ratio method.
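The data-generating scheme described above might be sketched as follows. This is an illustrative reimplementation, not the dissertation's own code; the parameter values are the pre-estimated MPLEs quoted earlier, and all function names are ours:

```python
import numpy as np

rng = np.random.default_rng(2007)

# Pre-estimated parameter values quoted in the text (assumptions restated here).
p, beta0 = 1.34, -6.14                      # Weibull shape and intercept
beta1, beta2, beta3, knot = -0.660, 0.0146, -0.0209, 27.65
delta0, q = -19.62, 2.38                    # censoring distribution
mean_lbmi, sd_lbmi = 4.24e-2, 6.80e-3       # inverse BMI is approx. Normal

def g_theta(z):
    """Quadratic spline with one free knot: b1*z + b2*z^2 + b3*(z-k)^2_+ ."""
    return beta1 * z + beta2 * z**2 + beta3 * np.maximum(z - knot, 0.0) ** 2

def simulate(n, gap=None):
    """Return (follow-up time, death indicator, BMI); gap=(lo, hi) drops
    BMI values in the neighborhood, as in the gap experiments."""
    bmi = 1.0 / rng.normal(mean_lbmi, sd_lbmi, size=3 * n)  # LBMI -> BMI
    if gap is not None:
        bmi = bmi[(bmi <= gap[0]) | (bmi >= gap[1])]        # carve out the gap
    bmi = bmi[:n]
    y1 = rng.uniform(size=n)
    t = (-np.log(y1) * np.exp(-(beta0 + g_theta(bmi)))) ** (1 / p)   # (4.5.1)
    y2 = rng.uniform(size=n)
    u = (-np.log(y2) * np.exp(-delta0)) ** (1 / q)                   # (4.5.2)
    return np.minimum(t, u), (t <= u).astype(int), bmi

time, death, bmi = simulate(20000, gap=(27.0, 28.0))
```

Fitting the free-knot spline Cox model to each such sample and collecting the MPLEs over 1000 replications would then produce the density curves and qq-norm plots discussed below.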
The gap adopted in Figure 4.1 is 27–28. Subfigure 4.1(a) is the smoothed density curve of the coefficient $\beta_1$ in model (2.2.6), and Subfigure 4.1(b) is the corresponding qq-norm plot. Subfigures 4.1(c) and 4.1(d) are for $\beta_2$. $\beta_3$'s smoothed density curve is 4.1(e) and its qq-norm plot is 4.1(f). The two graphs 4.1(g) and 4.1(h) in the last row of Figure 4.1 are the density curve and qq-norm plot of the knot $c_1$ in model (2.2.6), respectively. The graphs in Figures 4.2, 4.3 and 4.4 are arranged in the same manner, except that Figure 4.2 was generated using a gap of 27.3–27.8, Figure 4.3's gap is 27.5–27.7, and the last one, Figure 4.4, does not have any gap. These graphs show that as the width of the neighborhood shrinks to 0, the asymptotic behavior of the estimated MPLEs remains stable. The smoothed density curves are similar as the neighborhood width changes, and all curves are roughly centered at their true parameter values. The qq-norm plots are straight lines. Interestingly, when the sample size was taken to be 5000, the smoothed density curves and qq-norm plots did not show Normality and the qq-norm plots were not straight lines (pictures not shown); as the sample size increased to 20000, both the density curves and the qq-norm plots became much better. This suggests that the asymptotic results may not apply very quickly. The effect of sample size on nadir estimation will be examined in the simulation part.
[Figure 4.1: Gap 1: 27–28. Panels (a)–(h): smoothed density curves and Normal Q–Q plots of the estimates of $\beta_1$ (bmi), $\beta_2$ (bmisq), $\beta_3$ (right) and the knot, for the quadratic spline with one free knot.]
[Figure 4.2: Gap 2: 27.3–27.8. Panels (a)–(h): smoothed density curves and Normal Q–Q plots of the estimates of $\beta_1$ (bmi), $\beta_2$ (bmisq), $\beta_3$ (right) and the knot, for the quadratic spline with one free knot.]
[Figure 4.3: Gap 3: 27.5–27.7. Panels (a)–(h): smoothed density curves and Normal Q–Q plots of the estimates of $\beta_1$ (bmi), $\beta_2$ (bmisq), $\beta_3$ (right) and the knot, for the quadratic spline with one free knot.]
[Figure 4.4: No Gap. Panels (a)–(h): smoothed density curves and Normal Q–Q plots of the estimates of $\beta_1$ (bmi), $\beta_2$ (bmisq), $\beta_3$ (right) and the knot, for the quadratic spline with one free knot.]
CHAPTER 5

SIMULATION STUDIES

We have now proposed the free-knot spline method, which can be used as a nadir estimation tool when the data are quadratic-looking or J-shaped. Is the proposed method better than existing ones? In this chapter we will compare the performance of the new method with that of the quadratic method, the transformation method, fractional polynomials and the change point method using simulations. The comparisons will cover both nadir estimation ability and goodness of fit. We observed in the Norwegian Counties Study that, in the presence of extreme values, the quadratic and change point methods generated unrealistic nadir estimates as well as bad confidence intervals, even when all non-monotonicity detection tests agreed on the existence of a nadir and both the transformation method and the free-knot spline method detected it. Which methods, then, are generally more sensitive to extreme values, and which are more robust? Another problem that was observed is sample size. The asymptotic results of all the methods apply when the sample size is large enough. But how large is large? Will these methods produce reasonably good nadir estimates and confidence intervals when the sample size is moderate? We will examine the effects of extreme values and sample size on nadir estimation and compare the performance of both the new and existing methods under different conditions. The model comparison criterion is introduced first.
A Goodness Of Fit Test For Survival Models
In 1996 Grønnesby and Borgan proposed an overall goodness of fit test based on martingale
residuals for the Cox Proportional Hazards model [39]. May and Hosmer showed in 1998
that the proposed method is “algebraically identical to one obtained from adding group
indicator variables to the model and testing the hypothesis the coefficients of the group
indicator variables are zero via the score test”[40]. Around the same period, in January
1996, independently of the previous authors, Parzen and Lipsitz submitted their paper in
which they defined the same goodness of fit test and compared two non-nested Cox models
via the proposed test [41].
The test is based on the notion of partitioning the subjects into groups according to the
covariate values. In the Cox regression model the hazard for subject i at a given time t with
covariate vector zi is

    h(t | zi) = h0(t) × exp(zi′β).    (5.1.1)
To form the goodness of fit statistic one first partitions the subjects, based on the percentiles
of the estimated risk ψ̂i = exp(zi′β̂), into 10 regions, with the first region containing the
lowest ten percent of estimated risks and the 10th region containing the highest ten percent.
Given this partition, G − 1 group indicators Iig are defined as

    Iig = 1 if ψ̂i is in region g, and 0 otherwise,

and the alternative Cox model

    h(t | zi) = h0(t) × exp( zi′β + Σ_{g=1}^{G−1} Iig γg )    (5.1.2)
is considered. If model (5.1.1) is correct then γ1 = γ2 = ... = γG−1 = 0. The null hypothesis
H0 : γ1 = γ2 = ... = γG−1 = 0 can be tested using a likelihood ratio, Wald or score statistic. If
the model under consideration has been correctly specified, each of these statistics has a
chi-square distribution with G − 1 degrees of freedom. When comparing non-nested models,
the model with the smaller chi-square statistic is the better one [41]. After each method
is applied to a simulated sample, the goodness of fit statistic is calculated and the resulting
χ²9 values are compared.
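The decile-grouping step above can be sketched as follows. This is a minimal illustration, not the dissertation's code; the simulated linear predictors and the function name are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def risk_group_indicators(lin_pred, G=10):
    """Partition subjects into G groups by percentiles of the estimated
    risk psi_hat = exp(z'beta_hat) and return the (n, G-1) indicator
    matrix used in the augmented model (5.1.2)."""
    risk = np.exp(np.asarray(lin_pred, dtype=float))
    # interior percentile cut points at 10%, 20%, ..., 90%
    edges = np.percentile(risk, np.linspace(0, 100, G + 1)[1:-1])
    groups = np.searchsorted(edges, risk, side="right")  # labels 0..G-1
    ind = np.zeros((len(risk), G - 1))
    for g in range(1, G):                 # group 0 serves as the reference
        ind[groups == g, g - 1] = 1.0
    return ind

rng = np.random.default_rng(0)
lp = rng.normal(size=500)                 # hypothetical z_i' beta_hat values
ind = risk_group_indicators(lp)
print(ind.shape)                          # (500, 9)
# the LR/Wald/score statistic on the gammas is referred to chi-square, G-1 df
print(round(chi2.ppf(0.95, df=9), 2))     # 16.92
```

Each subject falls in exactly one group, so every row of the indicator matrix sums to 0 (reference group) or 1.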
5.2
Transformation Model
In this section the true underlying non-monotonic relationship between prognostic index and
BMI is assumed to be quadratic in LBMI. In other words this situation represents cases where
a good Normal transformation of the main risk factor exists and it is appropriate to apply
the transformation method. Under such assumptions we would expect the transformation
method to perform very well and we would like to see how other methods behave.
The cohort selected to generate the simulated data is the First National Health and
Nutrition Examination Survey Epidemiologic Follow-up Study (NHANES I). The white
male cohort was used. This cohort was also used in [18] to compare the performance of
the transformation and the change point methods. The NHANES I Epidemiologic Follow-up
Study tracks morbidity and mortality for 14,407 individuals, initially aged 25–74, who
received complete medical examinations during the NHANES I survey conducted from 1971
to 1975. Follow-up surveys were conducted from 1982 to 1984, again in 1986 (for those aged
55 or older at baseline), and again in 1987 and 1992 (from the Florida State University
Diverse Populations Collaboration website, biostat.stat.fsu.edu). The white male cohort
consists of 4623 participants, of whom 1900 had died by the end of the study. The average
follow-up is 5615 days, with a minimum of 16 days and a maximum of 7943 days. The mean
BMI of this group is 25.7, and the BMI values range from 13.0 to 52.6.
The histogram of LBMI in Figure 5.1 shows that LBMI is approximately Normal. The
transformation method within a Weibull parametric survival model was first used to fit the
survival times. Censoring was assumed to be independent of covariate effects, so a null
Weibull model was used to describe the time to censoring. LBMI values were then simulated
from a Normal distribution with the observed mean and standard deviation. Next the
survival times were simulated using the Weibull distribution and expression (4.5.1); that is,
they were simulated according to

    T = [ −ln(Y) × exp( −(β0 + gθ(z)) ) ]^(1/p),

where Y is a uniform [0, 1] random variable, z is a simulated LBMI value and

    gθ(z) = β1 z + β2 z².

The parameters β0, β1, β2 and p were estimated from the Weibull model and are given in
Table 5.1. Similarly the censoring times were simulated using formula (4.5.2), that is,

    U = [ −ln(Y) × exp(−δ0) ]^(1/q),

where no covariate effect was included; the parameters δ0 and q are also given in Table 5.1.
The assumed underlying prognostic index curve is given in Figure 5.2, where the curve has
been vertically adjusted so that the prognostic index is zero when BMI equals the nadir.
The simulated survival times were compared with the generated censoring times. A death
was simulated if the survival time was smaller than the corresponding censoring time, and
the follow-up time was taken to be the shorter of the two. After data generation the five
methods were applied to the data under the Cox Proportional Hazards model.
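The inverse-transform simulation just described can be sketched as follows, using the parameter values of Table 5.1; this is an illustrative reimplementation, not the dissertation's actual code.

```python
import numpy as np

rng = np.random.default_rng(1)

# parameter values from Table 5.1 (NHANES I White Male)
mu, sd = 3.98e-2, 6.15e-3                  # mean and SD of LBMI
b0, b1, b2, p = -9.08, -168.08, 2097.07, 1.32
d0, q = -132.16, 14.88

n = 5000
z = rng.normal(mu, sd, size=n)             # simulated LBMI values
g = b1 * z + b2 * z**2                     # g_theta(z), quadratic in LBMI
# T = [-ln(Y) * exp(-(b0 + g(z)))]^(1/p), with Y ~ Uniform[0, 1]
T = (-np.log(rng.uniform(size=n)) * np.exp(-(b0 + g))) ** (1 / p)
# U = [-ln(Y) * exp(-d0)]^(1/q), a null model for the censoring times
U = (-np.log(rng.uniform(size=n)) * np.exp(-d0)) ** (1 / q)
event = T <= U                             # a death is observed if T <= U
follow_up = np.minimum(T, U)               # observed follow-up time
print(follow_up.shape, round(event.mean(), 2))
```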
[Figure: histogram of LBMI (roughly .02–.08) with Normal density curve; cohort 56, NHANES I White Male]
Figure 5.1: LBMI Histogram With Normal Density Curve, NHANES I White Male
Table 5.1: Simulation Parameters, NHANES I White Male

    Simulation Parameter        Value
    Mean (LBMI)                 3.98 × 10^−2
    Standard Deviation (LBMI)   6.15 × 10^−3
    β0                          −9.08
    β1                          −168.08
    β2                          2097.07
    p                           1.32
    δ0                          −132.16
    q                           14.88
First, the BMI values are restricted to 15−50, a typical range of BMI in the NHANES I
White Male cohort.

[Figure: assumed underlying prognostic index curve over BMI 10–50; cohort 56, NHANES I White Male]

Figure 5.2: Assumed Underlying Curve, NHANES I White Male

The five methods (quadratic, transformation, fractional polynomial, change point and
free-knot spline) are applied to each of the 500 simulated
samples of 5000 observations. Results are given in Table 5.2. Ideally the estimated nadirs
should be centered at the true nadir; the column "Nadir Mean" reflects the central tendency
of each method by averaging the 500 estimated nadirs. The column "Nadir MSE" reports
the mean squared error of the estimated nadirs, i.e. the average squared distance between
the true nadir 25.0 and the estimates. The "95% C.I. Length" column contains the average
length of the estimated 95% confidence intervals, and "95% C.I. Coverage Probability" gives
the proportion of the 95% confidence intervals that cover the true nadir. A good estimation
method should have a nadir mean of approximately 25.0, a low nadir MSE, a coverage
probability close to 95% and confidence intervals that are as short as possible.
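These four summary measures are straightforward to compute from the simulation output; a sketch with hypothetical estimates (the numbers below are illustrative, not the dissertation's results):

```python
import numpy as np

rng = np.random.default_rng(2)
true_nadir = 25.0

# hypothetical output from 500 simulated samples: point estimates plus
# symmetric 95% confidence limits with half-width 1.05
nadirs = rng.normal(true_nadir, 0.5, size=500)
lower, upper = nadirs - 1.05, nadirs + 1.05

summary = {
    "nadir_mean": nadirs.mean(),                               # central tendency
    "nadir_mse": np.mean((nadirs - true_nadir) ** 2),          # accuracy
    "ci_length": np.mean(upper - lower),                       # interval width
    "coverage": np.mean((lower <= true_nadir) & (true_nadir <= upper)),
}
print({k: round(v, 2) for k, v in summary.items()})
```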
We see from Table 5.2 that the transformation nadir estimates average exactly 25.0, and
the means of the fractional polynomial and free-knot spline nadirs are also very close to the
true value. The quadratic method overestimates and the change point method underestimates
the nadir. The "Nadir MSE" column shows that the nadirs produced by the transformation
method not only center at the true nadir but also stay very close to it.
Table 5.2: Simulation Results 15−50, NHANES I White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic                  26.9         4.95           4.04                  0.45
    Transformation             25.0         0.32           2.11                  0.96
    Fractional Polynomial      25.1         0.64           2.13                  0.83
    Change Point               23.4         4.08           4.57                  0.73
    Free-Knot Spline           24.9         1.76           5.53                  0.95
Hence the best point estimator is given by the transformation method, while the fractional
polynomial and free-knot spline methods behave reasonably well. The worst point estimators
are given by the quadratic and change point methods, since they are biased and too far
from the true nadir. An interesting finding is the difference between the performance of the
change point nadirs and that of the free-knot spline nadirs: by separating the nadir from the
knot in the change point model, the estimated nadirs are brought closer to the true nadir,
and so is their mean. There is a large variation in the coverage probabilities. Only the
transformation method and the free-knot spline method achieve the desired 95% probability.
The fractional polynomial method, as expected, produces confidence intervals whose coverage
probability is much smaller than 95%, because the variation introduced by power selection is
ignored. The quadratic method gives an extremely low coverage probability of 45% even
though its confidence intervals are on average 4.04 wide. Summary statistics of the quadratic
method confidence limits show that although all upper limits are higher than the true nadir,
only 45% of the lower limits fall below it. By forcing the non-monotonic curve to be symmetric
about the nadir, the quadratic method pushes its minimum point to the right and therefore
generates poor nadir estimators and confidence intervals. Surprisingly, the coverage
probability of the change point method is as low as 73%. It turns out that some of the upper
limits of the confidence intervals are not high enough to enclose the true nadir, indicating
that the change point confidence intervals need to be widened or shifted to the right. For
this example, as expected, the transformation method gives the best results and the free-knot
spline method performs reasonably well.
Goodness of fit results are shown in Table 5.3. The table contains the percentage of simulated
samples in which the null hypothesis H0 : γ1 = γ2 = ... = γG−1 = 0 is rejected in the goodness
of fit test. In other words, each percentage represents how often, among the 500 simulated
samples, the fit of the model was not good enough. According to the results, 21.8% of the
500 quadratic models did not fit well. The transformation and fractional polynomial methods
perform similarly with respect to goodness of fit. The free-knot spline generated the lowest
percentage, 3.6%, suggesting the test is slightly conservative here, perhaps because of the
simulation size of only 500.
Table 5.3: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−50,
NHANES I White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    21.8        5.4              5.0                     7.2            3.6
As the BMI range extends from 15−50 to 15−60 and 15−70, more extreme values are included
in the dataset, allowing us to study the stability of the methods in the presence of extreme
values. These results are given in Table 5.4 and Table 5.6. The quadratic and change point
methods are more sensitive to the inclusion of extreme values; the transformation and
fractional polynomial methods are largely unaffected, although the coverage probability
of the fractional polynomial method stays at around 85%. The free-knot spline method is
slightly affected.
Table 5.5 and Table 5.7 give the goodness of fit results for the extended BMI ranges. Our
conclusion stays the same: the quadratic method fits worst and the free-knot spline gives
the best fit.
5.3
Free-Knot Spline Model
In this section we assume the underlying relationship is given by a one-free-knot polynomial
spline function. This case represents situations where no good Normal transformation exists,
so we must work with the original risk factor directly, and the knot is not necessarily
equal to the nadir. As in the previous section, we first estimate the non-monotonic
Table 5.4: Simulation Results 15−60, NHANES I White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic                  26.8         5.37           4.78                  0.55
    Transformation             25.0         0.32           2.09                  0.95
    Fractional Polynomial      25.1         0.61           2.12                  0.84
    Change Point               23.2         4.66           4.43                  0.69
    Free-Knot Spline           24.8         1.97           5.20                  0.93
Table 5.5: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−60,
NHANES I White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    34.6        5.4              4.4                     9.4            4.2
Table 5.6: Simulation Results 15−70, NHANES I White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic                  26.7         5.58           5.16                  0.59
    Transformation             25.0         0.31           2.08                  0.95
    Fractional Polynomial      25.1         0.60           2.12                  0.85
    Change Point               23.2         4.93           4.38                  0.67
    Free-Knot Spline           24.8         2.01           5.11                  0.92
Table 5.7: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−70,
NHANES I White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    37.2        5.2              4.8                     11.0           4.0
relationship using the free-knot spline method, and the estimated parameters are saved for
the simulation. LBMI is then simulated based on the real data, and BMI is taken to be the
inverse of the simulated LBMI. The survival times are generated from the estimated free-knot
polynomial spline curve using the simulated BMI values. The dataset used for this simulation
study is the NHIS White Male cohort of 46,264 observations with 4582 deaths.
The survival times are simulated according to (4.5.1), where
    gθ(z) = β1 z + β2 z² + β3 (z − k)² · 1{z > k}.
The censoring times are generated from (4.5.2). Simulation parameters are shown in
Table 5.8, and Figure 5.3 shows the curve from which the survival and censoring times are
simulated.
Table 5.8: Simulation Parameters, The NHIS White Male

    Simulation Parameter        Value
    Mean (LBMI)                 3.96 × 10^−2
    Standard Deviation (LBMI)   5.64 × 10^−3
    β0                          −3.03
    β1                          −0.61
    β2                          1.13 × 10^−2
    β3                          −1.27 × 10^−2
    k                           31.40
    p                           1.12
    δ0                          −55.21
    q                           6.96
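With the rounded parameter values of Table 5.8, the spline and its nadir can be checked numerically (a sketch, not the dissertation's code; because the table's coefficients are rounded, the computed nadir is about 27.0 rather than the 26.9 obtained from the unrounded estimates):

```python
import numpy as np

# rounded parameter values from Table 5.8; note the knot k is not the nadir
b1, b2, b3, k = -0.61, 1.13e-2, -1.27e-2, 31.40

def g(z):
    """One-free-knot quadratic spline: b1 z + b2 z^2 + b3 (z-k)^2 1{z>k}."""
    z = np.asarray(z, dtype=float)
    return b1 * z + b2 * z**2 + b3 * np.where(z > k, (z - k) ** 2, 0.0)

grid = np.linspace(15, 50, 35001)          # BMI grid with step 0.001
nadir = grid[np.argmin(g(grid))]
print(round(nadir, 1))                     # 27.0 with these rounded coefficients
# below the knot the spline is a plain quadratic, so the nadir is -b1/(2 b2)
print(round(-b1 / (2 * b2), 2))            # 26.99
```

Since the minimizing BMI falls below k, the nadir and the knot indeed differ here.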
The true nadir under the free-knot polynomial spline model is calculated to be 26.9. Table
5.9 gives the nadir estimation comparison results. The estimated nadirs and confidence
intervals are restricted to lie between 20 and 30: if an estimated nadir is larger than 30 it is
set to 30, and if it is smaller than 20 it is set to 20. Similarly, a confidence interval is
truncated at 20 if its lower bound falls below 20 and at 30 if its upper bound exceeds 30.
According to the nadir mean values the most accurate estimated nadir is 26.8, given by the
free-knot polynomial spline method. The change point method and the quadratic method perform
[Figure: adjusted prognostic index vs BMI (10–50); cohort 66, The NHIS White Male]
Figure 5.3: Assumed Underlying Curve, The NHIS White Male
equally badly. Although the transformation method generates very good results when the
true model is quadratic in the Normally transformed variable, it is no longer the best when
the true model allows the knot and the nadir to differ. In terms of the precision of nadir
estimation, the fractional polynomial method produces the smallest MSE, 1.06. This is not
surprising, since the fractional polynomial powers are the pair, selected with replacement
from the fixed set P = {−2, −1, −0.5, 0, 0.5, 1, 2, ..., max(3, m)}, that maximizes the partial
likelihood. This amounts to selecting the best of 44 models, so the fractional polynomial
method fits the data closely and almost surely overfits. The free-knot polynomial spline MSE
is 1.32, the second best. The worst two again are the quadratic and change point methods.
The last column of Table 5.9 contains the observed coverage probabilities. Again all
confidence intervals are constructed to have 95% coverage probability, yet only the free-knot
polynomial spline method truly attains the claimed 95%: the transformation method achieves
46%, the quadratic method 67% and the fractional polynomial only 76%. Hence when the
true model is the free-knot polynomial spline, which does not force the knot to equal the
nadir, the best method in terms of both nadir estimation and coverage probability is the
free-knot polynomial spline.
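For a single covariate the power set has eight elements, so the count of 44 candidate fractional polynomial models (8 first-degree models plus 36 unordered pairs chosen with replacement) can be verified directly:

```python
from itertools import combinations_with_replacement

P = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]        # fractional polynomial powers
fp1 = len(P)                                # first-degree (one-power) models
fp2 = len(list(combinations_with_replacement(P, 2)))  # second-degree models
print(fp1, fp2, fp1 + fp2)                  # 8 36 44
```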
The goodness of fit comparison is given in Table 5.10. 19.6% of the time the quadratic
Table 5.9: Simulation Results 15−50, The NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic                  28.1         2.25           3.24                  0.67
    Transformation             25.8         1.80           2.62                  0.46
    Fractional Polynomial      26.6         1.06           2.51                  0.76
    Change Point               25.7         3.46           5.32                  0.87
    Free-Knot Spline           26.8         1.32           5.11                  0.95
method did not fit the data well, and 14.4% of the time the null hypothesis in the goodness
of fit test was rejected for the change point method; hence the quadratic and change point
methods are the worst in terms of model fitting. The transformation and fractional
polynomial methods behave similarly, with rejection probabilities both close to 10.0%. The
method that produced the best model fit is the free-knot polynomial spline, with a rejection
probability of 4.2%.
Table 5.10: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−50,
The NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    19.6        10.6             9.2                     14.4           4.2
Tables 5.11 to 5.14 give comparison results based on the BMI ranges 15−60 and 15−70.
Extending the BMI range from 15−50 to 15−70 includes more extreme BMI values, so the
effect of these extreme values on nadir estimation and goodness of fit can be examined. Our
results show that the nadir estimation performance of all methods is not sensitive to the
BMI range, and the model fitting is also stable.
5.4
Other J-shaped Function
We have compared the performance of the proposed method with that of the existing
methods under two assumptions. The first assumption is that there exists a good Normality
Table 5.11: Simulation Results 15−60, The NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic                  28.2         3.10           3.70                  0.67
    Transformation             25.8         1.79           2.58                  0.47
    Fractional Polynomial      26.5         1.14           2.53                  0.73
    Change Point               25.5         4.09           5.30                  0.84
    Free-Knot Spline           26.8         1.21           4.56                  0.96
Table 5.12: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−60,
The NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    28.6        11.0             8.6                     16.4           4.6
Table 5.13: Simulation Results 15−70, The NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic                  28.2         3.33           3.85                  0.67
    Transformation             25.8         1.78           2.58                  0.47
    Fractional Polynomial      26.5         1.15           2.54                  0.73
    Change Point               25.4         4.28           5.33                  0.83
    Free-Knot Spline           26.8         1.20           4.48                  0.96
Table 5.14: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−70,
The NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    32.0        11.2             8.4                     17.6           4.4
transformation. The second assumes there is no Normal transformation and the true
underlying relationship is given by a free-knot spline model in which the knot and the
nadir differ. What happens, then, if the underlying curve is something other than a free-knot
spline function and we must work with the original risk factor, BMI in our case? Here we
compare the performance of the methods when some other non-monotonic relationship is
assumed to be true.
The adopted dataset is the NHIS White Male. The survival times T are simulated from

    T = [ −ln(Y) × exp( −(β0 + gθ(z)) ) ]^(1/p),

where the assumed non-monotonic function is given by

    gθ(z) = β1 βα [ (z − r1)/(r2 − r1) ]^ββ [ 1 − (z − r1)/(r2 − r1) ]^βγ,

and the censoring times are generated using

    U = [ −ln(Y) × exp(−δ0) ]^(1/q).
Simulation parameters are given in Table 5.15; the adopted curve is shown in Figure 5.4.
Table 5.15: Simulation Parameters, NHIS White Male

    Simulation Parameter        Value
    Mean (LBMI)                 3.96 × 10^−2
    Standard Deviation (LBMI)   5.64 × 10^−3
    β0                          −7.74
    β1                          1.86
    βα                          −4.50
    ββ                          0.35
    βγ                          1.43
    r1                          15.00
    r2                          70.00
    p                           1.12
    δ0                          −55.21
    q                           6.96
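With the parameter values of Table 5.15, the nadir of this beta-shaped curve has a closed form, z* = r1 + (r2 − r1)·ββ/(ββ + βγ), since β1·βα < 0 places the minimum of gθ at the peak of the bump; a quick numerical check (a sketch, not the dissertation's code):

```python
import numpy as np

# parameter values from Table 5.15; b1*ba < 0, so g dips where the bump peaks
b1, ba, bb, bg = 1.86, -4.50, 0.35, 1.43
r1, r2 = 15.0, 70.0

def g(z):
    """g_theta(z) = b1 * ba * x^bb * (1-x)^bg with x = (z-r1)/(r2-r1)."""
    x = (np.asarray(z, dtype=float) - r1) / (r2 - r1)
    return b1 * ba * x**bb * (1.0 - x) ** bg

# the bump x^bb (1-x)^bg peaks at x* = bb/(bb+bg), so the nadir is
z_star = r1 + (r2 - r1) * bb / (bb + bg)
grid = np.linspace(15.0, 50.0, 350001)
print(round(z_star, 1), round(grid[np.argmin(g(grid))], 1))  # 25.8 25.8
```

Both the closed form and the grid search reproduce the true nadir of 25.8 stated in the text.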
Results based on BMI range 15−50 are in Table 5.16. The true nadir calculated under the
pre-selected non-monotonic function is 25.8. The mean values of the fractional polynomial
[Figure: adjusted prognostic index vs BMI (10–50); cohort 66, NHIS White Male]
Figure 5.4: Assumed Underlying Curve, NHIS White Male
and free-knot spline nadirs are 26.0, close to the true nadir of 25.8. The average nadir of
the transformation method is 25.2, indicating that this method is slightly biased. The change
point and quadratic methods are the worst in terms of central tendency. The nadir MSE
measures how far the estimated values lie from the true nadir; in this simulation the
transformation nadirs stay close to the truth, while the largest MSE is given by the change
point method. Again, by separating the nadir from the knot, the free-knot spline method
beats the change point method and produces better nadir estimators. The nadir mean
describes the central tendency of the estimators, while the nadir MSE combines their bias
and variance. A comparison between the transformation and the free-knot spline methods
suggests that the free-knot spline method better targets the true parameter but its estimator
has a larger sampling variance. The transformation method, on the other hand, produces
estimates that are biased but tightly clustered, so they remain not far from the true parameter.
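The relationship between these two summaries is the usual decomposition MSE = bias² + variance, which is easy to verify numerically (the numbers below are illustrative, not the dissertation's estimates):

```python
import numpy as np

rng = np.random.default_rng(3)
true_nadir = 25.8

# hypothetical estimates: slightly biased, with moderate spread
est = rng.normal(26.0, 1.3, size=500)
mse = np.mean((est - true_nadir) ** 2)
bias = est.mean() - true_nadir
var = est.var()                     # ddof=0, so the identity below holds exactly
print(round(mse, 3), round(bias**2 + var, 3))   # the two printed values agree
```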
The confidence interval coverage probabilities of all methods are lower than 95% except
for the free-knot spline. The lowest coverage probability, 74%, is given by the quadratic
method; the fractional polynomial and transformation methods are very close to each other.
The free-knot spline coverage probability is 98%, slightly higher than 95%. Hence the free-
Table 5.16: Simulation Results 15−50, NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic                  26.9         3.82           5.12                  0.74
    Transformation             25.2         1.30           3.41                  0.84
    Fractional Polynomial      26.0         1.85           3.59                  0.83
    Change Point               24.6         6.98           7.06                  0.90
    Free-Knot Spline           26.0         3.23           7.80                  0.98
knot spline method generates confidence intervals that are slightly wide, while all the other
methods produce confidence intervals that are too short.
The goodness of fit test results are in Table 5.17. The highest rejection probability is
associated with the quadratic method, indicating that it gives the worst fit. All other
methods are similar in terms of model fitting.
Table 5.17: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−50,
NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    7.8         5.4              4.2                     6.0            4.0
As the range of BMI extends from 15−50 to 15−70, the nadir estimation results are given in
Table 5.18 and Table 5.20; the conclusions stay the same. Table 5.19 and Table 5.21 give
the goodness of fit results. As the BMI range widens, it becomes clearer that the quadratic
method is worse than the other methods and that the free-knot spline and the fractional
polynomial are slightly better than the transformation and change point methods.
5.5
Summary
In the simulation studies we have considered three cases. The first case represents situations
where there exists a good Normal transformation, hence it is appropriate to apply the
quadratic method to the Normally transformed variable. As expected, under this condition
Table 5.18: Simulation Results 15−60, NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic                  26.9         4.37           5.55                  0.76
    Transformation             25.2         1.24           3.26                  0.82
    Fractional Polynomial      25.9         1.80           3.54                  0.84
    Change Point               24.5         6.82           6.88                  0.87
    Free-Knot Spline           25.9         3.38           7.52                  0.98
Table 5.19: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−60,
NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    9.4         6.0              3.6                     5.8            2.6
Table 5.20: Simulation Results 15−70, NHIS White Male

    Method                  Nadir Mean   Nadir MSE   95% C.I. Length   95% C.I. Coverage Probability
    Quadratic                  26.8         4.57           5.69                  0.77
    Transformation             25.2         1.24           3.24                  0.82
    Fractional Polynomial      25.9         1.77           3.55                  0.84
    Change Point               24.5         7.03           6.84                  0.86
    Free-Knot Spline           25.9         3.44           7.44                  0.97
Table 5.21: Percentage of Rejecting the Null Hypothesis in the Goodness of Fit Test 15−70,
NHIS White Male

    Quadratic   Transformation   Fractional Polynomial   Change Point   Free-Knot Spline
    10.0        6.0              3.4                     5.4            2.8
the best method in both nadir estimation and coverage probability is the transformation
method. The free-knot polynomial spline also performs very well.
The second case compared the performance of these methods when there is no good Normal
transformation and the knot is not equal to the nadir. Since there is no good Normal
transformation we had to work with the original risk factor, BMI, directly. This time the
best method is the free-knot spline, considering both nadir estimation and the coverage
probability of the confidence interval. Only the free-knot spline confidence interval achieved
the claimed 95% coverage probability; the transformation method generated a coverage
probability of just 46%.
The third case assumes the non-monotonic relationship is given by some other function
instead of the free-knot spline curve, and there is no good Normal transformation. The
free-knot spline method produces the most accurate nadir estimator and the best confidence
interval. Therefore it performs the best under this condition.
When a method must be selected before data analysis, the transformation method is the
best choice if a good Normal transformation exists. If no good Normal transformation exists,
the selected method should perform well regardless of the true relationship: it should give
a good nadir estimator, a confidence interval whose coverage probability is close to the
nominal 95%, and a good model fit. According to the simulation comparisons, the best such
choice is the free-knot spline method.
CHAPTER 6
FUTURE WORK
For model comparison purposes, a more direct measure of goodness of fit is the difference
between the assumed underlying relationship and the estimated curve. This measure may
be adopted to further compare the goodness of fit of these methods.
In the future we would like to generalize the one-free-knot spline to two- or three-knot
spline functions. In such cases likelihood-ratio-based inference can no longer be used, but
the Delta method can be utilized to construct the confidence interval. The quadratic spline
can also be replaced by a cubic spline with a continuous second-order derivative; the
advantage is that the neighborhood assumption in the proof can be avoided.
Another very interesting question is how to detect the nadir or test the existence of the
nadir. We would also like to see if any better nadir detection test can be proposed based on
the free-knot spline method.
REFERENCES
[1] Build and Blood Pressure Study, 1959. Technical report, Society of Actuaries, Chicago,
1959.
[2] Build Study 1979. Technical report, Society of Actuaries and Association of Life
Insurance Medical Directors of America, Chicago, 1980.
[3] Wilcosky T., Hyde J., Anderson J., Bangdiwala S., and Duncan B. Obesity and
mortality in the Lipid Research Clinics Program Follow-up Study. J Clin Epidemiol,
43:743–752, 1990.
[4] Schroll M. A longitudinal epidemiological survey of relative weight at age 25, 50 and 60
in the Glostrup population of men and women born in 1914. Dan Med Bull, 28:106–116,
1981.
[5] Tuomilehto J., Salonen J., Marti B., et al. Body weight and risk of myocardial infarction
and death in the adult population of eastern Finland. BMJ, 295:623–627, 1987.
[6] Allison D., Gallagher D., Heo M., PiSunyer F., and S. Heymsfield. Body mass index and
all-cause mortality among people age 70 and over: the Longitudinal Study of Aging.
Int J Obes Relat Metab Disord, 21:424–431, 1997.
[7] Diehr P., Bild D., Harris T., Duxbury A., Siscovick D., and Rossi M. Body mass index
and mortality in nonsmoking older adults: the Cardiovascular Health Study. Am J
Public Health, 88:623–629, 1998.
[8] Durazo-Arvizu R., Cooper R., Luke A., Prewitt T., Liao Y., and McGee D. Relative
weight and mortality in U.S. blacks and whites: findings from representative national
population samples. Ann Epidemiol, 7:383–395, 1997.
[9] Losonczy K., Harris T., Cornoni-Huntley J., et al. Does weight loss from middle age
to old age explain the inverse weight mortality relation in old age? Am J Epidemiol,
141:312–321, 1995.
[10] Troiano R., Frongillo E. Jr, Sobal J., and Levitsky D. The relationship between body
weight and mortality: a quantitative analysis of combined information from existing
studies. Int J Obes Relat Metab Disord, 20:63–75, 1996.
[11] Folsom A., Kaye S., Sellers T., et al. Body fat distribution and 5-year risk of death in
older women. JAMA, 269:483–487, 1993.
[12] Marmot M., Rose G., Shipley M., and Thomas B. Alcohol and mortality: a U-shaped
curve. Lancet, i:580–583, 1981.
[13] Pastor R. and Guallar E. Use of Two-segmented Logistic Regression to Estimate
Change-points in Epidemiologic Studies. American Journal of Epidemiology, 148:631–
642, 1998.
[14] Samuelsson O., Wilhelmsen L., Pennert K., Wedel H., and Berglund G. The J-shaped
relationship between coronary heart disease and achieved blood pressure level in treated
hypertension: further analysis of 12 years of follow-up of treated hypertensives in the
Primary Prevention Trial in Gothenburg, Sweden. Hyptensn, 8:547–555, 1990.
[15] Frank J., Dwayne M., Grove J., and Benfante R. Will lowering population levels of
serum cholesterol affect total mortality? J. Clin. Epidem., 45:333–346, 1992.
[16] Polichronaki H., Hatzakis A., Vatopoulos A., Katsouyanni K., Tzonou A., and Trichopoulos D. Association of coronary mortality with temperature and air pollution in
Athens. Haemostasis, 12:133, 1982.
[17] Wilcox A. and Russell I. Birth weight and perinatal mortality: II, On Weight-specific
mortality. Int. J. Epidem., 12:319–325, 1983.
[18] Durazo-Arvizu R., McGee D., Li Z., and Cooper R. Establishing the Nadir of the Body
Mass Index-Mortality Relationship-a Case study. Journal of the American Statistical
Association, 92:1312–1319, 1997.
[19] Goetghebeur E. and Pocock S. Detection and Estimation of J-shaped Risk-Response
Relationships. J. R. Statist. Soc. A, 158, Part 1:107–121, 1995.
[20] Royston P. and Altman D. Regression Using Fractional Polynomials of Continuous
Covariates: Parsimonious Parametric Modelling. Applied Statistics, 43:429–467, 1994.
[21] Sleeper L. and Harrington D. Regression Splines in the Cox Model With Application
to Covariate Effects in Liver Disease. Journal of the American Statistical Association,
85:941–949, 1990.
[22] D. R. Cox. Regression Models and Life-Tables. Journal of the Royal Statistical Society.
Series B (Methodological), 34:187–220, 1972.
[23] Breslow N.E. Covariance analysis of censored survival data. Biometrics, 30:89–100,
1974.
[24] Peto R. Contribution to the discussion of a paper by D.R. Cox. Journal of the Royal
Statistical Society, B, 34:205–207, 1972.
[25] Efron B. The efficiency of Cox's likelihood function for censored data. Journal of the
American Statistical Association, 72:557–565, 1977.
[26] Stevens J., Keil J., Rust P., Tyroler H., Davis C., and Gazes P. Body mass index and
body girths as predictors of mortality in black and white women. Arch Intern Med,
152:1257–1262, 1992.
[27] Cornfield J., Gordon T., and Smith W. Quantal response curves for experimentally
uncontrolled variables. Bulletin of the International Statistical Institute, XXXVIII:97–
115, 1961.
[28] Flegal K. Anthropometric Evaluation of Obesity in Epidemiologic Research on Risk
Factors: Blood Pressure and Obesity in the Health Examination Survey. 1982.
[29] Nevill A. and Holder R. Body Mass Index: A Measure of Fatness or Leanness. British
Journal of Nutrition, 73:507–516, 1995.
[30] Casella G. and Berger R. Statistical Inference. Duxbury, Pacific Grove, CA, 2002.
[31] Efron B. The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia,
1982.
[32] McCullagh P. and Nelder J.A. Generalized Linear Models. Chapman & Hall/CRC,
Boca Raton, London, New York, Washington, D.C., 1999.
[33] Greenland S. Dose-response and trend analysis in epidemiology: alternatives to
categorical analysis. Epidemiology, 6:356–365, 1995.
[34] Schumaker L.L. Spline Functions: Basic Theory. Wiley, New York, Chichester, Brisbane,
Toronto, 1981.
[35] Gallant A.R. and Fuller W.A. Fitting Segmented Polynomial Regression Models Whose
Join Points Have to Be Estimated. Journal of the American Statistical Association,
68:144–147, 1973.
[36] Chung K. A Course In Probability Theory. Academic Press, New York, New York,
1974.
[37] Billingsley P. Probability and Measure. Wiley, New York, New York, 1995.
[38] Fleming T. and Harrington D. Counting Processes and Survival Analysis. Wiley,
Hoboken, New Jersey, 2005.
[39] Grønnesby J. and Borgan O. A method for checking regression models in survival
analysis based on the risk score. Lifetime Data Analysis, 2:315–328, 1996.
[40] May S. and Hosmer D. A Simplified Method of Calculating an Overall Goodness of Fit
Test for the Cox Proportional Hazards Model. Lifetime Data Analysis, 4:109–120, 1998.
[41] Parzen M. and Lipsitz S. A Global Goodness of Fit Statistic for Cox Regression Models.
Biometrics, 55:580–584, 1999.
BIOGRAPHICAL SKETCH
Fei Tan
Fei Tan was born on February 8, 1979, in Beijing, the People’s Republic of China. She attended
Nanjing University in China in the Fall of 1997 and completed her Bachelor’s degree in
Mathematics in the Summer of 2001. In the Fall of 2001, she was admitted to the University
of Mississippi and obtained her Master’s degree in Mathematics in the Spring of 2003. She
went to Florida State University in the Fall of 2003 and finished her Master’s degree in
Statistics in the Fall of 2005. Her doctoral program started at FSU in Spring 2006.
Fei Tan’s research interests include survival analysis, non-monotonic regression, free-knot
polynomial splines and the proportional hazards model.