Introduction to Survival Analysis
Seminar in Statistics
Presented by: Stefan Bauer, Stephan Hemri
28.02.2011

Definition
• Survival analysis: a set of methods for analysing the timing of events; a data-analytic approach to estimating the time until an event occurs.
• Historically, survival time refers to the time that an individual "survives" over some period until the event of death occurs.
• The event is also called a failure.

Areas of application
Survival analysis is used as a tool in many different settings:
• proving or disproving the value of medical treatments for diseases;
• evaluating the reliability of technical equipment;
• monitoring social phenomena such as divorce and unemployment.

Examples
Time from...
• marriage to divorce;
• birth to cancer diagnosis;
• entry into a study to relapse.

Censoring
The survival time is not known exactly! This may occur for the following reasons:
• a person does not experience the event before the study ends;
• a person is lost to follow-up during the study period;
• a person withdraws from the study for some other reason.

Right censored (figure)

Left censored (figure)

Outcome variable
• Time until an event occurs
• T = survival time, T ≥ 0
• T is a random variable
• t = specific value of interest for T
• Ask whether T > t if we want to know whether an individual survives longer than t

Outcome variable
• Survival time ≠ calendar time (e.g. follow-up starts for each individual on the day of surgery)
• Correct starting and ending times may be unknown due to censoring

Survivor function
• $S(t) = P(T > t)$
• Probability that the random variable T exceeds the specified time t
• Fundamental to survival analysis

t      S(t)
1      S(1) = P(T > 1)
2      S(2) = P(T > 2)
3      S(3) = P(T > 3)
...    ...
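
To make the definition S(t) = P(T > t) concrete, here is a minimal sketch that estimates the survivor function as the fraction of observed times exceeding t. It assumes every observation is an exact, uncensored survival time, which the censoring slides show is often not the case in practice. The function name `empirical_survivor` and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def empirical_survivor(times: Sequence[float], t: float) -> float:
    """Fraction of observed survival times strictly greater than t.

    Assumption: every entry of `times` is an exact (uncensored) survival time.
    """
    if not times:
        raise ValueError("need at least one survival time")
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical survival times in weeks (illustrative values, not from the slides).
weeks = [2.0, 3.0, 3.0, 5.0, 8.0]

for t in (1, 2, 3):
    print(f"S({t}) = P(T > {t}) is estimated as {empirical_survivor(weeks, t):.2f}")
```

Because the comparison is strict (T > t), an individual who fails exactly at time t no longer counts as surviving beyond t.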
Survivor function (figure)

Survivor function (figure)

Hazard function
• $h(t) = \lim_{\Delta t \to 0} \dfrac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
• $h(t) \ge 0$
• h(t) has no upper bound
• Often called the failure rate

Example: Hazard function
Assume a large follow-up study on heart attacks that observes:
• 600 heart attacks (events) per year;
• 50 events per month;
• 11.5 events per week;
• 0.0011 events per minute.
h(t) = rate of events occurring per time unit

Relation between S(t) and h(t)
If T is continuous:
• $S(t) = \exp\left[-\int_0^t h(u)\,du\right]$
• $h(t) = -\dfrac{dS(t)/dt}{S(t)}$

Sketch of proof
i) Find the relationship between the density f(t) and S(t): $f(t) = -\dfrac{dS(t)}{dt}$.
ii) Express the relationship between h(t) and S(t) in terms of the density f(t): $h(t) = \dfrac{f(t)}{S(t)}$.
Substituting i) into ii) gives h(t) as a function of S(t), and vice versa.

Example: Relationship (figure)

Types of hazard functions (figure)

Hazard ratio
Cox proportional hazards model:
$h(t, \mathbf{X}) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$
• $h_0(t)$: baseline hazard rate
• $\mathbf{X}$: vector of explanatory variables
• $e^{\beta_i}$: hazard ratio for the coefficient $\beta_i$
• Ratio between the predicted hazard rates of two individuals who differ by 1 unit in the variable $X_i$ (all other variables being equal)

Example: Hazard ratio
$h(t, \mathbf{X}) = h_0(t)\, e^{\sum_{i=1}^{p} \beta_i X_i}$

Basic descriptive measures
• Group mean (ignoring censoring)
• Median (t for which S(t) = 0.5)
• Average hazard rate: $\bar{h} = \dfrac{\#\,\text{failures}}{\sum_{i=1}^{n} t_i}$

Goals (of survival analysis)
• to estimate and interpret the survivor and/or hazard function;
• to compare survivor and/or hazard functions;
• to assess the relationship of explanatory variables to survival times -> we need mathematical modelling (Cox model).

Computer layout

individual    t (in weeks)    δ (failed or censored)
1             5               1
2             12              0
3             3.5             0
4             8               0
5             6               0
6             3.5             1

Computer layout
Layout for multivariate data with p explanatory variables:

individual    t (in weeks)    δ (failed or censored)    X1     X2     ...    Xp
1             t1              δ1                        X11    X12    ...    X1p
2             t2              δ2                        X21    X22    ...    X2p
...           ...             ...                       ...    ...    ...    ...
n             tn              δn                        Xn1    Xn2    ...    Xnp

Notation & terminology
• Ordered failures: from the unordered observed times (censored t's and failed t's), the failure times are put in ascending order, t(i)
• Frequency counts:
  • mi = # individuals who failed at t(i)
  • qi = # individuals censored in [t(i), t(i+1))
• Risk set R(t(i)): collection of individuals who have survived at least until time t(i)

Manual analysis layout

Ordered failure times    # of failures mi    # censored in [t(i), t(i+1))    Risk set R(t(i))
t(0) = 0                 m0                  q0                              R(t(0))
t(1)                     m1                  q1                              R(t(1))
...                      ...                 ...                             ...
t(n)                     mn                  qn                              R(t(n))

Manual analysis layout

Ordered failure times    # of failures mi    # censored in [t(i), t(i+1))    Risk set R(t(i))
t(0) = 0                 0                   0                               6 persons survive ≥ 0 weeks
t(1) = 3.5               1                   1                               6 persons survive ≥ 3.5 weeks
t(2) = 5                 1                   3                               4 persons survive ≥ 5 weeks

Example: Leukaemia remission
Extended Remission Data containing:
• two groups of leukaemia patients: treatment & placebo;
• log WBC values of each individual (WBC: white blood cell count).
Expected behaviour: the higher the WBC value, the shorter the expected survival time.

Example: Analysis layout
• Analysis layout for the treatment group:

t(j)         mj    qj    R(t(j))
t(0) = 0     0     0     21 persons
t(1) = 6     3     1     21 persons
t(2) = 7     1     1     17 persons
t(3) = 10    1     2     15 persons
t(4) = 13    1     0     12 persons
t(5) = 16    1     3     11 persons
t(6) = 22    1     0     7 persons
t(7) = 23    1     5     6 persons

Example: Confounding (figure)

Example: Confounding
• $\log \mathrm{WBC}_{\text{Treat}} \ll \log \mathrm{WBC}_{\text{Placebo}}$
• Confounding of the treatment effect by log WBC
• Log WBC suggests that the treatment group may survive longer simply because of its lower WBC values
• Controlling for WBC is necessary

Example: Interaction (figure)

Example: Conclusion
• Need to consider confounding and interaction;
• basic problem: comparing survival of the two groups after adjusting for confounding and interaction;
• the problem can be extended to the multivariate case by adding further explanatory variables.
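
The step from the computer layout (one row of t and δ per individual) to the manual analysis layout (ordered failure times with mi, qi and the risk set) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `analysis_layout` and `average_hazard` are hypothetical, δ = 1 is assumed to mark a failure and δ = 0 a censored observation, and the last interval is assumed to extend to infinity. The data are the six individuals from the computer layout slide.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, int, int, int]  # (t_(i), m_i, q_i, size of risk set R(t_(i)))


def analysis_layout(times: Sequence[float], delta: Sequence[int]) -> List[Row]:
    """Build the manual analysis layout from per-individual (t, delta) pairs."""
    if len(times) != len(delta):
        raise ValueError("times and delta must have the same length")
    # Distinct failure times in ascending order, with t_(0) = 0 always included.
    grid = sorted({0.0} | {float(t) for t, d in zip(times, delta) if d == 1})
    rows: List[Row] = []
    for i, t_i in enumerate(grid):
        # Assumption: the interval after the last failure time is [t_(i), infinity).
        upper = grid[i + 1] if i + 1 < len(grid) else float("inf")
        m_i = sum(1 for t, d in zip(times, delta) if d == 1 and t == t_i)
        q_i = sum(1 for t, d in zip(times, delta) if d == 0 and t_i <= t < upper)
        risk = sum(1 for t in times if t >= t_i)  # survived at least until t_(i)
        rows.append((t_i, m_i, q_i, risk))
    return rows


def average_hazard(times: Sequence[float], delta: Sequence[int]) -> float:
    """Average hazard rate: number of failures divided by total observed time."""
    return sum(delta) / sum(times)


# Six individuals from the computer layout slide (t in weeks, delta = 1 if failed).
weeks_observed = [5.0, 12.0, 3.5, 8.0, 6.0, 3.5]
failed = [1, 0, 0, 0, 0, 1]

for t_i, m_i, q_i, risk in analysis_layout(weeks_observed, failed):
    print(f"t = {t_i:>4}  m = {m_i}  q = {q_i}  risk set size = {risk}")
print(f"average hazard rate = {average_hazard(weeks_observed, failed):.4f} per week")
```

For these six individuals the sketch yields the same three rows as the manual analysis layout slide: failure times 0, 3.5 and 5 weeks with risk sets of 6, 6 and 4 persons.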
Summary
• Survival analysis encompasses a variety of methods for analysing the timing of events;
• problem of censoring: the exact survival time is unknown; the mixture of complete and incomplete observations distinguishes survival data from other statistical data.

Summary
• Relationship between S(t) and h(t):
  $S(t) = \exp\left[-\int_0^t h(u)\,du\right] \;\leftrightarrow\; h(t) = -\dfrac{dS(t)/dt}{S(t)}$
• Goals:
  • Estimation & interpretation of S(t) and h(t)
  • Comparison of different S(t) and h(t)
  • Assessment of the relationship of explanatory variables to survival time

References
• Johnson, L.L., A Conceptual Approach to Survival Analysis, 2005. Downloaded from www.nihtraining.com on 19.02.2011.
• Hosmer, D.W. & Lemeshow, S., Applied Survival Analysis: Regression Modelling of Time to Event Data, Wiley Series in Probability and Statistics, 1999.
• The Pennsylvania State University, Lesson 6: Sample Size and Power - Part A, 2007. Downloaded from http://www.stat.psu.edu/online/courses/stat509/06_sample/09_sample_hazard.htm on 24.02.2011.
• Kleinbaum, D.G. & Klein, M., Survival Analysis: A Self-Learning Text, Springer, 2005.
• Aalen, O., Borgan, O. & Gjessing, H., Survival and Event History Analysis: A Process Point of View (Statistics for Biology and Health), Springer, 2010.