Introduction to Survival Analysis
Seminar in Statistics
Presented by: Stefan Bauer, Stephan Hemri
28.02.2011
1
Definition
• Survival analysis:
  - a method for analysing the timing of events;
  - a data-analytic approach to estimating the time until an event occurs.
• Historically, survival time refers to the time that an individual "survives" over some period until the event of death occurs.
• The event is also called a failure.
2
Areas of application
Survival analysis is used as a tool in many
different settings:
• proving or disproving the value of
medical treatments for diseases;
• evaluating reliability of technical
equipment;
• monitoring social phenomena like
divorce and unemployment.
3
Examples
Time from...
• marriage to divorce;
• birth to cancer diagnosis;
• entry into a study to relapse.
4
Censoring
The survival time is not known exactly! This
may occur due to the following reasons:
• a person does not experience the event
before the study ends;
• a person is lost to follow-up during the
study period;
• a person withdraws from the study
because of some other reason.
5
Right censored
6
Left censored
7
Outcome variable
• Time until an event occurs
• T = survival time, T ≥ 0
• T is a random variable
• t = specific value of interest for T
• Ask whether T > t if we are interested in whether an individual survives longer than t
8
Outcome variable
• Survival time ≠ calendar time (e.g. follow-up starts for each individual on the day of surgery)
• Correct starting and ending times may be
unknown due to censoring
9
Survivor function
• S(t) = P(T > t)
• Probability that the random variable T exceeds the specified time t
• Fundamental to survival analysis
t      S(t)
1      S(1) = P(T > 1)
2      S(2) = P(T > 2)
3      S(3) = P(T > 3)
...    ...
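A minimal sketch of the definition S(t) = P(T > t): if every survival time were observed exactly (no censoring, which is an assumption of this illustration only), S(t) could be estimated by the fraction of individuals whose time exceeds t. The helper name `empirical_survivor` and the sample times are made up for the example.

```python
def empirical_survivor(times, t):
    """Fraction of observed survival times strictly greater than t.

    Only meaningful for complete (uncensored) data; hypothetical helper.
    """
    if not times:
        raise ValueError("need at least one survival time")
    return sum(1 for x in times if x > t) / len(times)


# Made-up survival times in weeks, all assumed to be fully observed.
sample_times = [2, 3, 3, 5, 8]

for t in (0, 1, 2, 3, 8):
    print(f"S({t}) = {empirical_survivor(sample_times, t):.2f}")
```

With censored observations this simple fraction is no longer appropriate, because some of the times are only known to exceed a certain value.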
10
Survivor function
11
Survivor function
12
Hazard function
• $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
• $h(t) \ge 0$
• h(t) has no upper bound
• Often called the failure rate
13
Example: Hazard function
Assume a large follow-up study on heart attacks in which the same event rate is expressed in different time units:
• 600 heart attacks (events) per year;
• 50 events per month;
• 11.5 events per week;
• 0.0011 events per minute.
h(t) = rate of events occurring per unit of time
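The conversions on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes 12 months, 52 weeks and 365 days per year; these conventions are not stated on the slide, but they give the quoted figures after rounding.

```python
# One underlying rate, expressed per different units of time.
EVENTS_PER_YEAR = 600  # figure from the slide

# Assumed calendar conventions: 12 months, 52 weeks, 365 days per year.
UNITS_PER_YEAR = {
    "month": 12,
    "week": 52,
    "minute": 365 * 24 * 60,
}

rates = {unit: EVENTS_PER_YEAR / n for unit, n in UNITS_PER_YEAR.items()}

for unit, rate in rates.items():
    print(f"{rate:.4f} events per {unit}")
```

The numerical value of a hazard therefore depends on the chosen time unit, which is one reason why it has no upper bound.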
14
Relation between S(t) and h(t)
If T is continuous:
• $S(t) = \exp\left[-\int_0^t h(u)\,du\right]$
• $h(t) = -\frac{dS(t)/dt}{S(t)}$
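A numerical check of both directions of this relationship, as a sketch only. It assumes a constant hazard h(t) = 0.5 (a hypothetical value), for which the integral gives S(t) = exp(-0.5 t). The function names are illustrative; the integral uses the trapezoidal rule and the derivative a central difference.

```python
import math


def survivor_from_hazard(hazard, t, steps=10_000):
    """S(t) = exp(-integral_0^t h(u) du), integral by the trapezoidal rule."""
    width = t / steps
    area = 0.0
    for k in range(steps):
        left, right = k * width, (k + 1) * width
        area += 0.5 * (hazard(left) + hazard(right)) * width
    return math.exp(-area)


def hazard_from_survivor(survivor, t, eps=1e-6):
    """h(t) = -(dS/dt) / S(t), derivative by a central difference."""
    slope = (survivor(t + eps) - survivor(t - eps)) / (2 * eps)
    return -slope / survivor(t)


RATE = 0.5  # hypothetical constant hazard, chosen only for illustration


def constant_hazard(u):
    return RATE


def exponential_survivor(t):
    return math.exp(-RATE * t)


s_at_2 = survivor_from_hazard(constant_hazard, 2.0)
h_at_2 = hazard_from_survivor(exponential_survivor, 2.0)
print(s_at_2, exponential_survivor(2.0))  # both approximate exp(-1)
print(h_at_2, RATE)                       # both approximate 0.5
```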
15
Sketch of Proof
i) Find relationship between density f(t) and
S(t)
ii) Express relationship between h(t) and
S(t) as a function of density f(t)
Substituting i) into ii) gives h(t) as a function of S(t), and vice versa
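The steps of the sketch can be written out as follows. This is the standard derivation for a continuous T with density f(t); it additionally assumes S(0) = 1 (no events at time zero) and uses P(T ≥ t) = S(t), which holds for continuous T.

```latex
\begin{align*}
\text{i)}\quad & f(t) = \frac{d}{dt}\,P(T \le t) = \frac{d}{dt}\bigl(1 - S(t)\bigr) = -\frac{dS(t)}{dt} \\
\text{ii)}\quad & h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
  = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t \, P(T \ge t)}
  = \frac{f(t)}{S(t)} \\
\text{i) in ii)}\quad & h(t) = -\frac{dS(t)/dt}{S(t)} = -\frac{d}{dt}\log S(t) \\
\Rightarrow\quad & \log S(t) - \log S(0) = -\int_0^t h(u)\,du
  \;\Rightarrow\; S(t) = \exp\Bigl[-\int_0^t h(u)\,du\Bigr]
\end{align*}
```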
16
Example: Relationship
17
Types of hazard functions
18
Hazard ratio
Cox proportional hazards model:
$h(t, \mathbf{X}) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$
• $h_0(t)$: baseline hazard rate
• $\mathbf{X}$: vector of explanatory variables
• $e^{\beta_i}$: hazard ratio for the coefficient $\beta_i$
• Ratio between the predicted hazard rates of two individuals who differ by 1 unit in the variable $X_i$ and are equal in all other variables
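A small sketch of why $e^{\beta_i}$ is a ratio that does not depend on the baseline hazard. The coefficients, covariate values and baseline hazard values below are hypothetical, and `cox_hazard` is an illustrative helper that evaluates the model formula at one fixed time t; it does not estimate anything.

```python
import math


def cox_hazard(baseline_hazard, betas, x):
    """h(t, X) = h0(t) * exp(sum_i beta_i * X_i) at one fixed time t."""
    linear_predictor = sum(b * xi for b, xi in zip(betas, x))
    return baseline_hazard * math.exp(linear_predictor)


# Hypothetical coefficients and covariates, for illustration only.
betas = [-1.3, 1.6]
person_a = [1.0, 2.5]
person_b = [0.0, 2.5]  # differs from person_a by 1 unit in X_1 only

hazard_ratio_x1 = math.exp(betas[0])

for h0 in (0.02, 0.5):  # two arbitrary baseline hazard values
    ratio = cox_hazard(h0, betas, person_a) / cox_hazard(h0, betas, person_b)
    print(f"h0 = {h0}: ratio = {ratio:.4f}, exp(beta_1) = {hazard_ratio_x1:.4f}")
```

The baseline hazard cancels in the ratio, so the same value is printed for both choices of h0.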
19
Example: Hazard ratio
$h(t, \mathbf{X}) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$
20
Basic descriptive measures
• Group mean of the survival times (ignoring censoring)
• Median (the t for which S(t) = 0.5)
• Average hazard rate: $\bar{h} = \frac{\#\,\text{failures}}{\sum_{i=1}^{n} t_i}$
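As an illustration, the group mean and the average hazard rate can be computed for the six individuals of the "Computer layout" example in these slides. The sketch reads δ = 1 as "failed" and δ = 0 as "censored", an interpretation that agrees with the manual analysis layout of the same data.

```python
from statistics import mean

# (t in weeks, delta) with delta = 1 for a failure and 0 for a censored time.
observations = [(5, 1), (12, 0), (3.5, 0), (8, 0), (6, 0), (3.5, 1)]

times = [t for t, _ in observations]
n_failures = sum(delta for _, delta in observations)

group_mean = mean(times)                  # ignores censoring
average_hazard = n_failures / sum(times)  # failures per person-week

print(f"group mean     = {group_mean:.2f} weeks")
print(f"average hazard = {average_hazard:.4f} per week")
```

Here 2 failures are observed over 38 person-weeks of follow-up. The group mean treats the censored times as if they were failure times, so it tends to underestimate the true mean survival time.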
21
Goals (of survival analysis)
• to estimate and interpret survivor and/or hazard functions;
• to compare survivor and/or hazard functions;
• to assess the relationship of explanatory variables to survival times -> we need mathematical modelling (Cox model).
22
Computer layout
individual   t (in weeks)   δ (1 = failed, 0 = censored)
1            5              1
2            12             0
3            3.5            0
4            8              0
5            6              0
6            3.5            1
23
Computer layout
Layout for multivariate data with p
explanatory variables:
individual   t (in weeks)   δ (failed or censored)   X1    X2    ...   Xp
1            t1             δ1                       X11   X12   ...   X1p
2            t2             δ2                       X21   X22   ...   X2p
...          ...            ...                      ...   ...   ...   ...
n            tn             δn                       Xn1   Xn2   ...   Xnp
24
Notation & terminology
• Ordered failures: the unordered t's consist of censored t's and failed t's; the failed t's, sorted in ascending order, are denoted t(i)
• Frequency counts:
  • mi = # individuals who failed at t(i)
  • qi = # individuals censored in [t(i), t(i+1))
• Risk set R(t(i)): collection of individuals who have survived at least until time t(i)
25
Manual analysis layout
Ordered failure times   # of failures mi   # censored in [t(i), t(i+1))   Risk set R(t(i))
t(0) = 0                m0                 q0                             R(t(0))
t(1)                    m1                 q1                             R(t(1))
...                     ...                ...                            ...
t(n)                    mn                 qn                             R(t(n))
26
Manual analysis layout
Ordered failure times   # of failures mi   # censored in [t(i), t(i+1))   Risk set R(t(i))
t(0) = 0                0                  0                              6 persons survive ≥ 0 weeks
t(1) = 3.5              1                  1                              6 persons survive ≥ 3.5 weeks
t(2) = 5                1                  3                              4 persons survive ≥ 5 weeks
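The manual layout can be derived mechanically from the (t, δ) pairs of the computer layout. The following is a minimal sketch under the assumption that δ = 1 marks a failure, δ = 0 a censored time, and all times are positive; `manual_layout` is an illustrative name, not a library function.

```python
def manual_layout(observations):
    """Build rows (t_i, m_i, q_i, size of risk set) from (t, delta) pairs.

    delta = 1 marks a failure, delta = 0 a censored time (assumed coding).
    All times are assumed to be strictly positive.
    """
    failure_times = sorted({t for t, delta in observations if delta == 1})
    grid = [0] + failure_times  # t(0) = 0 by convention
    rows = []
    for i, t_i in enumerate(grid):
        t_next = grid[i + 1] if i + 1 < len(grid) else float("inf")
        # m_i: failures exactly at t(i)
        m_i = sum(1 for t, d in observations if d == 1 and t == t_i)
        # q_i: censored times in [t(i), t(i+1))
        q_i = sum(1 for t, d in observations if d == 0 and t_i <= t < t_next)
        # risk set: everyone who survived at least until t(i)
        at_risk = sum(1 for t, _ in observations if t >= t_i)
        rows.append((t_i, m_i, q_i, at_risk))
    return rows


observations = [(5, 1), (12, 0), (3.5, 0), (8, 0), (6, 0), (3.5, 1)]
layout = manual_layout(observations)
for t_i, m_i, q_i, at_risk in layout:
    print(f"t = {t_i:>4}  m = {m_i}  q = {q_i}  at risk = {at_risk}")
```

For the six individuals of the example this reproduces the three rows of the table: risk sets of 6, 6 and 4 persons.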
27
Example: Leukaemia remission
Extended Remission Data containing:
• two groups of leukaemia patients:
treatment & placebo;
• log WBC values of each individual;
(WBC: white blood cell count)
Expected behaviour:
The higher the WBC value, the lower the expected survival time.
28
Example: Analysis layout
• Analysis layout for treatment group:
t(i)        mi   qi   R(t(i))
t(0) = 0    0    0    21 persons
t(1) = 6    3    1    21 persons
t(2) = 7    1    1    17 persons
t(3) = 10   1    2    15 persons
t(4) = 13   1    0    12 persons
t(5) = 16   1    3    11 persons
t(6) = 22   1    0    7 persons
t(7) = 23   1    5    6 persons
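The risk set sizes in this layout follow from the failure and censoring counts: each risk set is the previous one minus the individuals who failed or were censored in the preceding interval. A short sketch of that bookkeeping, with the (t, m, q) values taken from the treatment-group layout and an illustrative function name:

```python
# (t_i, m_i, q_i) for the treatment group, taken from the analysis layout.
treatment_rows = [
    (0, 0, 0), (6, 3, 1), (7, 1, 1), (10, 1, 2),
    (13, 1, 0), (16, 1, 3), (22, 1, 0), (23, 1, 5),
]


def risk_set_sizes(rows, n_start):
    """Size of R(t_i) for each row, using n_{i+1} = n_i - m_i - q_i."""
    sizes = []
    at_risk = n_start
    for _, m_i, q_i in rows:
        sizes.append(at_risk)
        at_risk -= m_i + q_i
    return sizes, at_risk


sizes, remaining = risk_set_sizes(treatment_rows, n_start=21)
print(sizes)      # risk set size at each ordered failure time
print(remaining)  # individuals left after the last interval
```

Starting from 21 persons, the recurrence gives the sizes 21, 21, 17, 15, 12, 11, 7 and 6 listed in the layout, and nobody remains after the last interval.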
29
Example: Confounding
30
Example: Confounding
• $\log \mathrm{WBC}_{\text{Treat}} \ll \log \mathrm{WBC}_{\text{Placebo}}$
• Confounding of the treatment effect by log WBC
• The log WBC values suggest that the treatment group may survive longer simply because of its lower WBC values
• Controlling for log WBC is necessary
31
Example: Interaction
32
Example: Conclusion
• Need to consider confounding and
interaction;
• basic problem: comparing survival of the
two groups after adjusting for confounding
and interaction;
• problem can be extended to the
multivariate case by adding additional
explanatory variables.
33
Summary
• Survival analysis encompasses a variety of methods for analysing the timing of events;
• problem of censoring: exact survival time unknown;
  - mixture of complete and incomplete observations
  - this is the difference from other statistical data
34
Summary
• Relationship between S(t) and h(t):
$S(t) = \exp\left[-\int_0^t h(u)\,du\right] \;\leftrightarrow\; h(t) = -\frac{dS(t)/dt}{S(t)}$
• Goals:
• Estimation & Interpretation of S(t) and h(t)
• Comparison of different S(t) and h(t)
• Assessment of relationship of explanatory
variables to survival time
35