Statistics 262: Intermediate Biostatistics
April 20, 2004: Introduction to Survival Analysis
Jonathan Taylor and Kristin Cobb

What is survival analysis?
Statistical methods for analyzing longitudinal data on the occurrence of events.
Events may include death, injury, onset of illness, recovery from illness (binary variables) or transition above or below the clinical threshold of a meaningful continuous variable (e.g. CD4 counts).
Accommodates data from a randomized clinical trial or cohort study design.

Randomized Clinical Trial (RCT)
[Diagram: target population, disease-free at-risk cohort, random assignment to intervention or control; each arm followed over time to disease or disease-free.]

Randomized Clinical Trial (RCT)
[Diagram: target population, patient population, random assignment to treatment or control; each arm followed over time to cured or not cured.]

Randomized Clinical Trial (RCT)
[Diagram: target population, patient population, random assignment to treatment or control; each arm followed over time to dead or alive.]

Cohort study (prospective/retrospective)
[Diagram: target population, disease-free cohort, exposed vs. unexposed; each group followed over time to disease or disease-free.]

Objectives of survival analysis
Estimate time-to-event for a group of individuals, such as time until second heart attack for a group of MI patients.
Compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial.
Assess the relationship of co-variables to time-to-event, such as: does weight, insulin resistance, or cholesterol influence survival time of MI patients?
Note: expected time-to-event = 1/incidence rate (when the rate is constant)

Examples of survival analysis in medicine

RCT: Women’s Health Initiative (JAMA, 2001)
[Figure: cumulative incidence over time, on hormones vs. on placebo.]

Prospective cohort study: From April 15, 2004 NEJM: Use of Gene-Expression Profiling to Identify Prognostic Subclasses in Adult Acute Myeloid Leukemia

Retrospective cohort study: From December 2003 BMJ: Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study

Why use survival analysis?
1. Why not compare mean time-to-event between your groups using a t-test or linear regression? This ignores censoring.
2. Why not compare proportion of events in your groups using logistic regression? This ignores time.

Cox regression vs. logistic regression
Distinction between rate and proportion:
Incidence (hazard) rate: number of new cases of disease per population at-risk per unit time (or mortality rate, if outcome is death)
Cumulative incidence: proportion of new cases that develop in a given time period

Cox regression vs. logistic regression
Distinction between hazard/rate ratio and odds ratio/risk ratio:
Hazard/rate ratio: ratio of incidence rates
Odds/risk ratio: ratio of odds or of proportions
Logistic regression aims to estimate the odds ratio; Cox regression aims to estimate the hazard ratio.
By taking time into account, you are taking into account more information than just binary yes/no. Gain power/precision.

Rates vs. risks
Relationship between risk and rates:
$R(t) = 1 - e^{-ht}$
$h$ = constant hazard rate
$R(t)$ = probability of disease in time $t$

Rates vs. risks
For example, if the rate is 5 cases/1000 person-years, then the chance of developing disease over 10 years is:
$R(t) = 1 - e^{-(.005)(10)} = 1 - e^{-.05} = 1 - .951 = .0488$
Compare to .005(10) = 5%
The two values are close because the loss of persons at risk because they have developed disease within the period of observation is small relative to the size of the total group.

Rates vs. risks
If the rate is 50 cases/1000 person-years, then the chance of developing disease over 10 years is:
$R(t) = 1 - e^{-(.05)(10)} = 1 - e^{-.5} = 1 - .61 = .39$
Compare to .05(10) = 50%

Rates vs. risks
Relationship between risk and rates (derivation):
$r(t) = h e^{-ht}$ is the exponential density function for waiting time until the event (constant hazard rate)
$R(t) = \int_0^t h e^{-hu}\,du = \left[-e^{-hu}\right]_0^t = -e^{-ht} + e^{0} = 1 - e^{-ht}$
Preview: Waiting time distribution will change if the hazard rate changes as a function of time: $h(t)$

Survival Analysis: Terms
Time-to-event: The time from entry into a study until a subject has a particular outcome
Censoring: Subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study.
If dropout is related to both outcome and treatment, dropouts may bias the results.

Right Censoring (T>t)
Common examples:
Termination of the study
Death due to a cause that is not the event of interest
Loss to follow-up
We know that the subject survived at least to time t.

Left censoring (T<t)
The event time is known only to be less than some value. For example, if you are studying menarche and you begin following girls at age 12, you may find that some of them have already begun menstruating. Unless you can obtain information about the start date for those girls, the age of menarche is left-censored at age 12.*
*From: Allison, Paul. Survival Analysis. SAS Institute. 1995.
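The two worked examples in the Rates vs. risks slides (5 and 50 cases per 1000 person-years, over 10 years) can be reproduced numerically. This is a minimal Python sketch assuming a constant hazard rate; the function name risk_from_rate is illustrative and does not come from the lecture.

```python
import math

def risk_from_rate(h, t):
    """Cumulative risk R(t) = 1 - exp(-h*t) under a constant hazard rate h."""
    return 1.0 - math.exp(-h * t)

for h in (0.005, 0.05):       # 5 and 50 cases per 1000 person-years
    exact = risk_from_rate(h, 10)
    approx = h * 10           # simple rate x time approximation
    print(f"h={h}: R(10)={exact:.4f}, h*t={approx:.2f}")
# h=0.005: R(10)=0.0488, h*t=0.05
# h=0.05: R(10)=0.3935, h*t=0.50
```

The rate x time shortcut is close when the risk is small and drifts away as more of the cohort leaves the at-risk pool.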
Interval censoring (a<T<b)
When we know the event has occurred between two time points, but don’t know the exact dates. For example, if you’re screening subjects for HIV infection yearly, you may not be able to determine the exact date of infection.*
*From: Allison, Paul. Survival Analysis. SAS Institute. 1995.

Data Structure: survival analysis
Time variable: $t_i$ = time at last disease-free observation or time at event
Censoring variable: $c_i = 1$ if had the event; $c_i = 0$ if no event by time $t_i$

[Figure: Choice of origin]

Describing survival distributions
$T_i$, the event time for an individual, is a random variable having a probability distribution.
Different models for survival data are distinguished by different choice of distribution for $T_i$.

Survivor function (cumulative distribution function)
Cumulative failure function:
$F(t) = P(T \le t)$
Survival analysis typically uses the complement, or the survivor function:
$S(t) = 1 - P(T \le t) = 1 - F(t)$
Example: If t=100 years, S(t=100) = probability of surviving beyond 100 years.

Corresponding density function
$f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}$
The probability of the failure time occurring at exactly time t (out of the whole range of possible t’s).
Also written:
$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}$

Hazard function
$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
In words: the probability that if you survive to t, you will succumb to the event in the next instant.
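To make the limit in the hazard definition concrete, it can be evaluated with a small but finite interval. The sketch below is an illustration only: it assumes an exponential event time with constant hazard h, so that S(t) = e^{-ht}, and the function names are hypothetical rather than taken from the lecture.

```python
import math

def survivor(t, h):
    """S(t) = P(T > t) for an exponential event time with constant hazard h."""
    return math.exp(-h * t)

def approx_hazard(t, dt, h):
    """Finite-dt version of the hazard definition:
    P(t <= T < t + dt | T >= t) / dt."""
    p_interval = survivor(t, h) - survivor(t + dt, h)  # P(t <= T < t + dt)
    p_at_risk = survivor(t, h)                         # P(T >= t)
    return (p_interval / p_at_risk) / dt

h = 0.05  # assumed constant hazard, events per person-year
for dt in (1.0, 0.1, 0.001):
    print(dt, approx_hazard(t=10.0, dt=dt, h=h))
```

As dt shrinks, the printed values approach h, and under this assumed distribution they do not depend on the choice of t.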
Hazard from density and survival: $h(t) = \frac{f(t)}{S(t)}$
Derivation:
$h(t)\,dt = P(t \le T < t + dt \mid T \ge t) = \frac{P(t \le T < t + dt \;\&\; T \ge t)}{P(T \ge t)} = \frac{P(t \le T < t + dt)}{P(T \ge t)} = \frac{f(t)\,dt}{S(t)}$

Relating these functions:
Hazard from density and survival: $h(t) = \frac{f(t)}{S(t)}$
Survival from density: $S(t) = \int_t^{\infty} f(u)\,du$
Density from survival: $f(t) = -\frac{dS(t)}{dt}$
Density from hazard: $f(t) = h(t)\,e^{-\int_0^t h(u)\,du}$
Survival from hazard: $S(t) = e^{-\int_0^t h(u)\,du}$
Hazard from survival: $h(t) = -\frac{d}{dt}\ln S(t)$

Introduction to Kaplan-Meier
Non-parametric estimate of survivor function.
Commonly used to describe survivorship of study population/s.
Commonly used to compare two study populations.
Intuitive graphical presentation.

Survival Data (right-censored)
[Figure: follow-up timelines for Subjects A to E from beginning of study to end of study; x-axis: time in months.]
1. Subject E dies at 4 months.

Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months.]
Probability of surviving to just before 4 months is 100% = 5/5.
Subject E dies at 4 months. Fraction surviving this death = 4/5.

Survival Data
[Figure: follow-up timelines for Subjects A to E; x-axis: time in months.]
1. Subject E dies at 4 months.
2. Subject A drops out after 6 months.
3. Subject C dies at 7 months.

Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months.]
Subject C dies at 7 months. Fraction surviving this death = 2/3.

Survival Data
[Figure: follow-up timelines for Subjects A to E; x-axis: time in months.]
1. Subject E dies at 4 months.
2. Subject A drops out after 6 months.
3. Subject C dies at 7 months.
4. Subjects B and D survive for the whole year-long study period.

Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%; x-axis: time in months.]
Product limit estimate of survival = P(surviving/at-risk through failure 1) * P(surviving/at-risk through failure 2) = 4/5 * 2/3 = .5333

The product limit estimate
The probability of surviving the entire year, taking into account censoring = (4/5)(2/3) = 53%
NOTE: >40% (2/5) because the one drop-out survived at least a portion of the year, AND <60% (3/5) because we don’t know if the one drop-out would have survived until the end of the year.

KM estimator, formally
$k$ distinct event times $t_1 < \dots < t_j < \dots < t_k$
At each time $t_j$, there are $n_j$ individuals at risk.
$d_j$ is the number who have the event at time $t_j$.
$\hat{S}(t) = \prod_{j:\, t_j \le t} \left[ 1 - \frac{d_j}{n_j} \right]$

[Figure: Comparing 2 groups]

Caveats
Survival estimates can be unreliable toward the end of a study when there are small numbers of subjects at risk of having an event.
[Figure: WHI and breast cancer; annotation: small numbers left.]

Overview of SAS PROCS
LIFETEST: Produces life tables and Kaplan-Meier survival curves. Is primarily for univariate analysis of the timing of events.
LIFEREG: Estimates regression models with censored, continuous-time data under several alternative distributional assumptions. Does not allow for time-dependent covariates.
PHREG: Uses Cox’s partial likelihood method to estimate regression models with censored data. Handles both continuous-time and discrete-time data and allows for time-dependent covariables.
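The product limit calculation for Subjects A to E can be reproduced in a few lines. This is a minimal Python sketch of the estimator as defined in the KM slide, not the SAS procedures listed above. The function name kaplan_meier is illustrative, and the 12-month follow-up for Subjects B and D is an assumption read off the "year-long study period" in the example.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times  : follow-up time t_i for each subject
    events : c_i = 1 if the subject had the event at t_i, 0 if censored
    Returns (t_j, n_j, d_j, S_hat(t_j)) for each distinct event time t_j.
    """
    event_times = sorted({t for t, c in zip(times, events) if c == 1})
    surv = 1.0
    rows = []
    for tj in event_times:
        # n_j: subjects still under observation just before t_j
        n_j = sum(1 for t in times if t >= tj)
        # d_j: subjects who have the event exactly at t_j
        d_j = sum(1 for t, c in zip(times, events) if t == tj and c == 1)
        surv *= 1.0 - d_j / n_j
        rows.append((tj, n_j, d_j, surv))
    return rows

# Subjects A, B, C, D, E (months): A drops out at 6, B and D are censored
# at the end of the 12-month study, C dies at 7, E dies at 4.
times = [6, 12, 7, 12, 4]
events = [0, 0, 1, 0, 1]
for tj, n_j, d_j, s in kaplan_meier(times, events):
    print(f"t={tj}: at risk={n_j}, events={d_j}, S_hat={s:.4f}")
# t=4: at risk=5, events=1, S_hat=0.8000
# t=7: at risk=3, events=1, S_hat=0.5333
```

Subject A contributes to the at-risk count at 4 months but not at 7 months, which is why the second factor is 2/3 rather than 3/4.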