Survival Analysis
A Brief Introduction
1. Survival Function, Hazard Function
• In many medical studies, the primary endpoint is the time until an event occurs (e.g. death, remission).
• Data are typically subject to censoring (e.g. when a study ends before the event occurs).
• Survival Function: a function describing the proportion of individuals surviving to or beyond a given time. Notation:
◦ T: survival time of a randomly selected individual
◦ t: a specific point in time
◦ Survival Function:
$$S(t) = P(T > t) = \exp\left(-\int_0^t \lambda(u)\,du\right)$$
Hazard Function/Rate
Hazard function λ(t): the instantaneous failure rate at time t, given that the subject has survived up to time t. That is,
$$\lambda(t) = \lim_{\delta \to 0} \frac{P(t \le T < t+\delta \mid T \ge t)}{\delta} = \lim_{\delta \to 0} \frac{P(t \le T < t+\delta)}{P(T \ge t)\,\delta} = \frac{1}{S(t)} \lim_{\delta \to 0} \frac{S(t) - S(t+\delta)}{\delta} = \frac{f(t)}{S(t)}$$
Here f(t) is the probability density function of the survival time T, that is, $f(t) = \frac{d}{dt}F(t)$, where F(t) is the cumulative distribution function of T: $F(t) = 1 - S(t) = P(T \le t)$.
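The two relationships above, S(t) as the exponential of the negative integrated hazard and λ(t) as the limit of [S(t) − S(t+δ)] / (δ·S(t)), can be checked numerically. The following is a minimal sketch; the linearly increasing hazard and the function names are assumptions chosen only for illustration and do not come from the slides.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-integral_0^t hazard(u) du) with the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * h)
    return math.exp(-total * h)

def hazard_from_survival(survival, t, delta=1e-6):
    """Approximate lambda(t) = [S(t) - S(t + delta)] / (delta * S(t))."""
    return (survival(t) - survival(t + delta)) / (delta * survival(t))

# Hypothetical hazard used only for illustration: lambda(u) = 0.1 + 0.02 * u
def example_hazard(u):
    return 0.1 + 0.02 * u

# Closed-form survival function for that hazard: S(t) = exp(-(0.1 t + 0.01 t^2))
def example_survival(t):
    return math.exp(-(0.1 * t + 0.01 * t * t))

s_numeric = survival_from_hazard(example_hazard, 5.0)
lam_numeric = hazard_from_survival(example_survival, 5.0)
print("S(5):      numeric", s_numeric, "closed form", example_survival(5.0))
print("lambda(5): numeric", lam_numeric, "closed form", example_hazard(5.0))
```

Both printed pairs should agree closely, which illustrates that the hazard and the survival function carry the same information.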
2. The Key Word is ‘Censoring’
• Because of censoring, many common data analysis procedures cannot be adopted directly.
• For example, one could use the logistic regression model to model the relationship between survival probability and some relevant covariates.
◦ However, one should use customized logistic regression procedures designed to account for censoring.
Key Assumption: Independent Censoring
• Those still at risk at time t in the study are a random sample of the population at risk at time t, for all t.
• This assumption means that the hazard function, λ(t), can be estimated in a fair/unbiased/valid way.
3A. Kaplan-Meier (Product-Limit) Estimator of the Survival Curve
• The Kaplan–Meier estimator is the nonparametric maximum likelihood estimate of S(t). It is a product of the form
$$\hat{S}(t) = \frac{r_1 - d_1}{r_1} \times \frac{r_2 - d_2}{r_2} \times \cdots \times \frac{r_i - d_i}{r_i}$$
taken over the event times $t_1 < t_2 < \dots < t_i \le t$.
• $r_k$ is the number of subjects alive just before time $t_k$
• $d_k$ denotes the number who died at time $t_k$
Kaplan-Meier Curve, Example
Time t_i   # at risk   # events   Ŝ
0          20          0          1.00
5          20          2          [1-(2/20)]*1.00 = 0.90
6          18          0          [1-(0/18)]*0.90 = 0.90
10         15          1          [1-(1/15)]*0.90 = 0.84
13         14          2          [1-(2/14)]*0.84 = 0.72
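The product-limit arithmetic in the example can be reproduced in a few lines. This is a minimal sketch that takes the (time, number at risk, number of events) rows of the example as given; the function name `kaplan_meier` is an assumption made for illustration.

```python
def kaplan_meier(rows):
    """rows: iterable of (time, n_at_risk, n_events), sorted by time.

    Returns a list of (time, S_hat), where S_hat is the running product
    of (n_at_risk - n_events) / n_at_risk.
    """
    s_hat = 1.0
    curve = []
    for time, at_risk, events in rows:
        s_hat *= (at_risk - events) / at_risk
        curve.append((time, s_hat))
    return curve

# Rows taken from the example: time, number at risk, number of events
rows = [(0, 20, 0), (5, 20, 2), (6, 18, 0), (10, 15, 1), (13, 14, 2)]
km_curve = kaplan_meier(rows)
for time, s_hat in km_curve:
    print(f"t = {time:2d}  S_hat = {s_hat:.2f}")
```

Note that the number at risk falls from 18 to 15 between t = 6 and t = 10 although no events occur there, which is consistent with three subjects being censored in that interval. Censored subjects leave the risk set without changing the estimate at the moment they leave.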
[Figure: Kaplan Meier Curve. Y-axis: Proportion Surviving (95% Confidence), 0.6 to 1.0; X-axis: Survival Time, 0 to 20.]
Figure 1. Plot of survival distribution functions for the NCI and
the SCI Groups. The Y-axis is the probability of not declining to
GDS 3 or above. The X-axis is the time (in years) to decline.
(Barry Reisberg et al., 2010; Alzheimer & Dementia; in press.)
3B. Comparing Survival Functions
[Figure: survival distribution functions for the High, Medium, and Low groups. Y-axis: Survival Distribution Function, 0.00 to 1.00; X-axis: Time, 0 to 60.]
Log-Rank Test
The log-rank test
• tests whether the survival functions are
statistically equivalent
• is a large-sample chi-square test that uses the
observed and expected cell counts across the
event times
• has maximum power when the ratio of hazards
is constant over time.
Wilcoxon Test
The Wilcoxon test
• weights the observed number of events
minus the expected number of events by
the number at risk across the event times
• can be biased if the pattern of censoring is
different between the groups.
Log-rank versus Wilcoxon Test
Log-rank test
• is more sensitive than the Wilcoxon test to
differences between groups in later points in
time.
Wilcoxon test
• is more sensitive than the log-rank test to
differences between groups that occur in
early points in time.
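Both tests sum, over the event times, the observed minus expected number of events in one group; they differ only in the weight given to each event time. The sketch below is one way to write that for two groups. It assumes weight 1 for the log-rank test and the total number at risk for the Wilcoxon test (Gehan's version). The function name and the six-subject data set are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def weighted_logrank(times_a, events_a, times_b, events_b, weight="logrank"):
    """Two-group test with 1 degree of freedom.

    events_*: 1 = event observed, 0 = censored.
    weight="logrank" uses weight 1 at every event time;
    weight="wilcoxon" uses the total number at risk (Gehan's version).
    Returns (chi_square, p_value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    score, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        w = 1.0 if weight == "logrank" else float(n)
        expected_a = d * n_a / n          # expected events in group A
        score += w * (d_a - expected_a)   # weighted observed minus expected
        if n > 1:
            variance += w * w * n_a * n_b * d * (n - d) / (n * n * (n - 1))

    chi_square = score * score / variance
    p_value = math.erfc(math.sqrt(chi_square / 2.0))  # upper tail of chi-square(1)
    return chi_square, p_value

# Hypothetical data: every subject has an observed event
times_a, events_a = [1, 2, 3], [1, 1, 1]
times_b, events_b = [4, 5, 6], [1, 1, 1]

lr_chi, lr_p = weighted_logrank(times_a, events_a, times_b, events_b, "logrank")
wx_chi, wx_p = weighted_logrank(times_a, events_a, times_b, events_b, "wilcoxon")
print("log-rank:", lr_chi, lr_p)
print("Wilcoxon:", wx_chi, wx_p)
```

Because the Wilcoxon weight is the number still at risk, early event times (when many subjects remain) count for more, which is why that test is more sensitive to early differences between groups.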
4. Two Parametric Distributions
Here we present the two most notable models for the distribution of T.
• Exponential distribution: $\lambda(t) = \lambda$
• Weibull distribution: $\lambda(t) = \lambda p (\lambda t)^{p-1} = p \lambda^{p} t^{p-1}$
◦ Its survival function:
$$S(t) = \exp\left(-\int_0^t p \lambda^{p} u^{p-1}\,du\right) = \exp\left(-(\lambda t)^{p}\right)$$
◦ Thus: $\ln[-\ln S(t)] = p \ln(t) + p \ln(\lambda)$
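The log-log relationship is what makes the Weibull assumption easy to check graphically: plotting ln[−ln S(t)] against ln(t) should give a straight line with slope p. A minimal numerical sketch follows; the parameter values λ = 0.05 and p = 1.5 are hypothetical.

```python
import math

def weibull_hazard(t, lam, p):
    """Weibull hazard p * lam**p * t**(p - 1); p = 1 gives the exponential case."""
    return p * lam ** p * t ** (p - 1)

def weibull_survival(t, lam, p):
    """Weibull survival function exp(-(lam * t)**p)."""
    return math.exp(-((lam * t) ** p))

lam, p = 0.05, 1.5      # hypothetical parameter values
t1, t2 = 2.0, 20.0      # two arbitrary time points

# ln(-ln S(t)) evaluated at both time points
y1 = math.log(-math.log(weibull_survival(t1, lam, p)))
y2 = math.log(-math.log(weibull_survival(t2, lam, p)))

# Slope and intercept of the line through the two points on the ln(t) scale
slope = (y2 - y1) / (math.log(t2) - math.log(t1))
intercept = y1 - slope * math.log(t1)

print("slope:", slope, "expected p:", p)
print("intercept:", intercept, "expected p*ln(lambda):", p * math.log(lam))
print("exponential case, hazard at t=3 and t=30:",
      weibull_hazard(3.0, lam, 1.0), weibull_hazard(30.0, lam, 1.0))
```

With p = 1 the hazard no longer depends on t, which recovers the exponential distribution as a special case of the Weibull.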
Weibull Hazard Function, Plot
5. Regression Models

The Exponential and the Weibull distributions inspired two parametric regression approaches:
1. Parametric proportional hazard model – this model can be generalized to a semi-parametric model: the Cox proportional hazard model
2. Accelerated failure time model
Proportional Hazard Model

• In a regression model for survival analysis, one can try to model the dependence on the explanatory variables by taking the (new) hazard rate to be:
$$\lambda = \lambda_0 \cdot c(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$
• Since hazard rates are positive, it is natural to choose the function c such that c(β, x) is positive irrespective of the values of x.
Proportional Hazard Model
• Thus a good choice is: $c(\cdot) = \exp(\cdot)$
• The resulting proportional hazard model is:
$$\lambda = \lambda_0 \cdot \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$
• For the Weibull distribution we have:
$$\lambda = p \lambda_0^{p} t^{p-1} \cdot \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$
• For the Exponential distribution we have:
$$\lambda = \lambda_0 \cdot \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik})$$
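The name "proportional hazard" reflects that the ratio of two subjects' hazards depends only on their covariates, not on time. The sketch below illustrates this for the Weibull case. The coefficients, baseline parameters, and covariate vectors are hypothetical, and the intercept β0 is treated as absorbed into the baseline rate.

```python
import math

def weibull_ph_hazard(t, x, beta, lam0, p):
    """Hazard p * lam0**p * t**(p - 1) * exp(beta . x) for covariate vector x."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    return p * lam0 ** p * t ** (p - 1) * math.exp(linear)

beta = [0.7, -0.02]       # hypothetical coefficients for (group, age)
lam0, p = 0.05, 1.5       # hypothetical baseline Weibull parameters
x_group1 = [1, 60]        # group = 1, age = 60
x_group0 = [0, 60]        # group = 0, age = 60

# Hazard ratio between the two subjects at several time points
ratios = [
    weibull_ph_hazard(t, x_group1, beta, lam0, p)
    / weibull_ph_hazard(t, x_group0, beta, lam0, p)
    for t in (1.0, 5.0, 25.0)
]
print("hazard ratios over time:", ratios)
print("exp(beta_group):", math.exp(beta[0]))
```

Each ratio should equal exp(0.7), the exponentiated coefficient of the covariate that differs between the two subjects, regardless of the time point.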
Accelerated Failure Time Model

• For the Weibull distribution (including the Exponential distribution), the proportional hazard model is equivalent to a log-linear model in survival time T:
$$\ln T = \alpha_0 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \dots + \alpha_k x_{ik} + \varepsilon$$
• Here the error term ε can be shown to follow the 2-parameter Extreme Value distribution.
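The equivalence can be sketched in a few lines. The notation here is an assumption made for the sketch: $x^\top\beta$ denotes the linear predictor of the proportional hazard model (any intercept included), $\Lambda$ is the cumulative hazard, and $W$ is a standard (minimum) extreme value variable.

```latex
\begin{aligned}
\lambda(t \mid x) &= p\,\lambda_0^{p}\,t^{p-1}\,e^{x^\top\beta}
  && \text{(Weibull proportional hazard)} \\
\Lambda(t \mid x) &= \int_0^t \lambda(u \mid x)\,du = (\lambda_0 t)^{p}\,e^{x^\top\beta}
  && \text{(cumulative hazard)} \\
W &:= \ln \Lambda(T \mid x) = p\ln\lambda_0 + p\ln T + x^\top\beta
  && \text{(since } \Lambda(T \mid x) \sim \mathrm{Exp}(1)\text{, } P(W \le w) = 1 - e^{-e^{w}}\text{)} \\
\ln T &= -\ln\lambda_0 - \tfrac{1}{p}\,x^\top\beta + \tfrac{1}{p}\,W
  && \text{(log-linear form)}
\end{aligned}
```

Under these assumptions, each accelerated failure time coefficient equals the corresponding proportional hazard coefficient multiplied by $-1/p$, and the error term is $W/p$, an extreme value variable with scale $1/p$. A covariate that raises the hazard therefore shortens the expected log survival time.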
Apply Both Models Simultaneously
If the underlying distribution for T is Weibull or Exponential, one can apply both regression models simultaneously to reflect different aspects of the survival process. That is:
• Prediction of degree of decline using the Weibull proportional hazard model
• Prediction of time of decline using the accelerated failure time model
An Example

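• In a recent paper (Reisberg et al., 2010), we applied both regression models to a dementia study conducted at NYU:
$$\lambda(T) = \lambda_0(T)\,\exp(\beta_1 \cdot \text{Group} + \beta_2 \cdot \text{Age} + \beta_3 \cdot \text{Gender} + \beta_4 \cdot \text{Education} + \beta_5 \cdot \text{FollowUp})$$
$$\log T = \alpha_0 + \alpha_1 \cdot \text{Group} + \alpha_2 \cdot \text{Age} + \alpha_3 \cdot \text{Gender} + \alpha_4 \cdot \text{Education} + \alpha_5 \cdot \text{FollowUp} + \varepsilon$$
• The results are shown next.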
6. Cox Proportional Hazards Model
Parametric versus Nonparametric Models
Parametric models require that
• the distribution of survival time is known
• the hazard function is completely specified
except for the values of the unknown
parameters.
Examples include the Weibull model, the
exponential model, and the log-normal model.
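Parametric versus Nonparametric Models
Properties of nonparametric models are
• the distribution of survival time is unknown
• the hazard function is unspecified.
An example is the Cox proportional hazards model (strictly a semiparametric model: the baseline hazard is left unspecified, while the covariate effects are modeled parametrically).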
Cox Proportional Hazards Model
$$h_i(t) = h_0(t)\, e^{\beta_1 X_{i1} + \dots + \beta_k X_{ik}}$$
• $h_0(t)$: baseline hazard function; involves time but not predictor variables
• $\beta_1 X_{i1} + \dots + \beta_k X_{ik}$: linear function of a set of predictor variables; does not involve time
• β = 0 → hazard ratio = 1: the two groups have the same survival experience
Popularity of the Cox Model
The Cox proportional hazards model
• provides the primary information
desired from a survival analysis, hazard
ratios and adjusted survival curves, with
a minimum number of assumptions
• is a robust model where the regression
coefficients closely approximate the
results from the correct parametric
model.
Partial Likelihood
Partial likelihood differs from maximum
likelihood because
• it does not use the likelihoods for all subjects
• it only considers likelihoods for subjects that
experience the event
• it considers subjects as part of the risk set
until they are censored.
Partial Likelihood
Subject   Survival Time   Status
C         2.0             1
B         3.0             1
A         4.0             0
D         5.0             1
E         6.0             0
(Status: 1 = event observed, 0 = censored)
Partial Likelihood
$$L_c = \frac{h_c(2)}{h_c(2) + h_b(2) + h_a(2) + h_d(2) + h_e(2)}$$
$$L_b = \frac{h_b(3)}{h_b(3) + h_a(3) + h_d(3) + h_e(3)}$$
$$L_d = \frac{h_d(5)}{h_d(5) + h_e(5)}$$
Partial Likelihood
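$$\begin{aligned}
L_d &= \frac{h_d(5)}{h_d(5) + h_e(5)} \\
    &= \frac{h_0(5)\, e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \dots + \beta_k X_{dk}}}{h_0(5)\, e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \dots + \beta_k X_{dk}} + h_0(5)\, e^{\beta_1 X_{e1} + \beta_2 X_{e2} + \dots + \beta_k X_{ek}}} \\
    &= \frac{e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \dots + \beta_k X_{dk}}}{e^{\beta_1 X_{d1} + \beta_2 X_{d2} + \dots + \beta_k X_{dk}} + e^{\beta_1 X_{e1} + \beta_2 X_{e2} + \dots + \beta_k X_{ek}}}
\end{aligned}$$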
Partial Likelihood

• The overall likelihood is the product of the individual likelihoods. That is:
$$L = L_c \times L_b \times L_d$$
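The partial likelihood for the five subjects C, B, A, D, E can be evaluated directly once covariate values are supplied. The sketch below assumes a single covariate per subject with made-up values, because the example lists no covariates. It also assumes no tied event times. Since the baseline hazard cancels in every factor, only exp(β·x) is needed.

```python
import math

# (subject, survival_time, status, x); status 1 = event, 0 = censored.
# The covariate values x are hypothetical and used only for illustration.
subjects = [
    ("C", 2.0, 1, 1.0),
    ("B", 3.0, 1, 0.0),
    ("A", 4.0, 0, 1.0),
    ("D", 5.0, 1, 0.0),
    ("E", 6.0, 0, 1.0),
]

def partial_likelihood(beta, subjects):
    """Product over observed events of exp(beta*x_i) / sum over the risk set."""
    likelihood = 1.0
    for _, time_i, status_i, x_i in subjects:
        if status_i != 1:
            continue  # censored subjects contribute no factor of their own
        # Risk set: everyone still under observation at this event time
        risk_set = [x for _, t, _, x in subjects if t >= time_i]
        likelihood *= math.exp(beta * x_i) / sum(math.exp(beta * x) for x in risk_set)
    return likelihood

pl_null = partial_likelihood(0.0, subjects)
pl_alt = partial_likelihood(math.log(2.0), subjects)
print("partial likelihood at beta = 0:     ", pl_null)
print("partial likelihood at beta = ln(2): ", pl_alt)
```

At β = 0 every subject in a risk set is equally likely to fail, so the three factors are 1/5, 1/4, and 1/2. Subject A appears in the denominators for C and B but drops out before D's event, which is how censored subjects enter the calculation.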
7. SAS Programs for Survival Analysis




• There are three SAS procedures for analyzing survival data: LIFETEST, PHREG, and LIFEREG.
• PROC LIFETEST is a nonparametric procedure for estimating the survivor function, comparing the underlying survival curves of two or more samples, and testing the association of survival time with other variables.
• PROC PHREG is a semiparametric procedure that fits the Cox proportional hazards model and its extensions.
• PROC LIFEREG is a parametric regression procedure for modeling the distribution of survival time with a set of concomitant variables.
Proc LIFETEST

The Kaplan-Meier (K-M) survival curves and related tests (Log-Rank, Wilcoxon) can be generated using SAS PROC LIFETEST:
PROC LIFETEST DATA=SAS-data-set <options>;
   TIME variable <*censor(list)>;
   STRATA variable <(list)> <...variable <(list)>>;
   TEST variables;
RUN;
Proc PHREG

The Cox (proportional hazards) regression is performed using SAS PROC PHREG:
proc phreg data=rsmodel.colon;
   /* status values 0, 2 and 4 are treated as censored */
   model surv_mm*status(0,2,4) = sex yydx / risklimits;
run;
Proc LIFEREG

The accelerated failure time regression is performed using SAS PROC LIFEREG:
proc lifereg data=subset outest=OUTEST(keep=_scale_);
   /* (lower, hours) gives the lower and upper bounds of the response */
   model (lower, hours) = yrs_ed yrs_exp / d=normal;
   output out=OUT xbeta=Xbeta;
run;

Questions?