Survival Data
• Survival time examples:
  – time a cancer patient is in remission
  – time until a disease-free person has a heart attack
  – time until death of a healthy mouse
  – time until a computer component fails
  – time until a paroled prisoner is rearrested
  – time until death of a liver transplant patient
  – time until a cell phone customer switches carriers
  – time until recovery after surgery
• All are "time until some event occurs"; longer times are better in all but the last…
Three goals of survival analysis
• estimate the survival function
• compare survival functions (e.g., across levels of a categorical variable such as treatment vs. placebo)
• understand the relationship of the survival function to explanatory variables (e.g., is survival time different for various values of an explanatory variable?)
• The survival function $S(y) = P(Y > y)$ can be estimated by the empirical survival function, which is essentially the relative frequency of the $Y$'s that are $> y$…
• Look at Definition 1.3 on p.5: $Y_1, \dots, Y_n$ are i.i.d. (independent and identically distributed) survival variables. Then $S_n(y)$ = empirical survival function at $y$ = (# of the $Y_i$'s $> y$)$/n$ = estimate of $S(y)$.
• Note that $S_n(y) = \frac{1}{n}\sum_{i=1}^{n} I_A(Y_i)$, where $A = (y, \infty)$ and $I_A$ is the indicator function of $A$…
• Review of Bernoulli & Binomial rvs:
  – Show that the expected value of a Bernoulli rv $Z$ with parameter $p$ (i.e., $P(Z = 1) = p$) is $p$ and that the variance of $Z$ is $p(1-p)$.
  – Then, knowing that the sum of $n$ iid Bernoullis is a Binomial rv with parameters $n$ and $p$, show (next slide) that the empirical survival function $S_n(y)$ is an unbiased estimator of $S(y)$.
• Note that $nS_n(y) = \sum_{i=1}^{n} I_{(y,\infty)}(Y_i)$ is a sum of iid Bernoullis, and as such $nS_n(y) \sim B(n, p)$, where $p = P(Y > y) = S(y)$.
• Also note that for a fixed $y^*$, $E(nS_n(y^*)) = nS(y^*)$, so $nE(S_n(y^*)) = nS(y^*)$; hence $S_n(y^*)$ is an unbiased estimator of $S(y^*)$.
• What is $\mathrm{Var}(S_n(y))$?
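The unbiasedness argument can be checked by simulation. Below is a minimal Python sketch (the notes themselves use R; Python is used only for illustration). It assumes an exponential population with mean $\beta = 2$, so the true survival function is $S(y) = \exp(-y/2)$, and the helper names `empirical_survival` and `simulate_sn` are hypothetical. Because $nS_n(y) \sim B(n, S(y))$ has variance $nS(y)(1 - S(y))$, the simulated variance of $S_n(y)$ is compared against $S(y)(1 - S(y))/n$.

```python
import math
import random


def empirical_survival(sample, y):
    """S_n(y) = (# of observations > y) / n, the average of the indicators I_(y,inf)(Y_i)."""
    return sum(1 for obs in sample if obs > y) / len(sample)


def simulate_sn(n, y, beta, reps, seed=0):
    """Return `reps` independent values of S_n(y), each computed from a fresh
    sample of size n drawn from an exponential population with mean beta,
    so that the true survival function is S(y) = exp(-y / beta)."""
    rng = random.Random(seed)
    return [
        empirical_survival([rng.expovariate(1 / beta) for _ in range(n)], y)
        for _ in range(reps)
    ]


if __name__ == "__main__":
    n, y, beta = 25, 1.0, 2.0
    true_s = math.exp(-y / beta)
    values = simulate_sn(n, y, beta, reps=20_000)
    mean_sn = sum(values) / len(values)
    var_sn = sum((v - mean_sn) ** 2 for v in values) / (len(values) - 1)
    print(f"S(y)               = {true_s:.4f}")
    print(f"average of S_n(y)  = {mean_sn:.4f}")  # should be close to S(y): unbiasedness
    print(f"variance of S_n(y) = {var_sn:.5f}")
    print(f"S(y)(1 - S(y))/n   = {true_s * (1 - true_s) / n:.5f}")  # binomial variance divided by n^2
```

Averaging many replicates of $S_n(y)$ approximates $E(S_n(y))$, which is why the average lands near $S(y)$; the same replicates give an empirical variance to compare with the binomial formula.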
• See (1.6) and p.6, where the confidence interval is computed…
• Try this for Example 1.3, p.6.
• Example 1.4 on p.8 shows that it is sometimes difficult to compare survival curves, since they can cross each other… (what makes one survival curve "better" than another?)
• One way of comparing two survival curves is by comparing their MTTF (mean time until failure) values. Let's use R to draw the two curves given in Ex. 1.4, $S_1(y) = \exp(-y/2)$ and $S_2(y) = \exp(-y^2/4)$… see handout R#1.
• Note that the MTTF of a survival rv $Y$ is just its expected value $E(Y)$. We can also show (Theorem 1.2) that
  $$\mathrm{MTTF} = \int_0^\infty S(y)\,dy$$
  (Math & Stat majors: show this is true using integration by parts and l'Hospital's rule…!)
• So suppose we have an exponential survival function $S(y) = \exp(-y/\beta)$ (btw, can you show this satisfies the properties of a survival function?)
• Then the MTTF for this variable is $\beta$; show this…
• And for any two such survival functions, $S_1(y) = \exp(-y/\beta_1)$ and $S_2(y) = \exp(-y/\beta_2)$, one is "better" than the other if the corresponding $\beta$ is "better" (larger):
  $$S_1(y) \ge S_2(y) \text{ for all } y \iff \mathrm{MTTF}_1 \ge \mathrm{MTTF}_2 \iff \beta_1 \ge \beta_2$$
• HW: Use R to plot on the same axes at least two such survival functions with different values of $\beta$ and show this result.
The hazard function
• The hazard function gives the so-called "instantaneous" risk of death (or failure) at time $t$. Recall that for continuous rvs, the probability of occurrence at exactly time $t$ is 0 for every $t$. So we consider the probability of the event in a "small" interval beginning at $t$, given that we have survived to $t$, and then let the length of the interval go to zero (in the limit).
• The result is given on p.9 as the hazard rate or hazard function…
• Definition of the hazard function:
  $$h(y) = \lim_{\Delta y \to 0} \frac{P(y \le Y < y + \Delta y \mid Y \ge y)}{\Delta y}$$
• Notes:
  – the hazard function is conditional on the individual having already survived to time $y$
  – the numerator is a non-decreasing function of $\Delta y$ (it is more likely that the event will occur in a longer interval), so we divide by the length of the interval to compensate
  – we take the limit as the length of the interval shrinks to zero to get the risk at exactly $y$: the "instantaneous risk"
  – we can show (see p.9) that $h(y) = \dfrac{f(y)}{S(y)}$
  – use $f(y) = -\frac{d}{dy}S(y)$ and the above to show that $S(y) = \exp\left(-\int_0^y h(u)\,du\right)$
  – so $f$, $h$, and $S$ are three representations, each of which can be found from the others, and each is used in various situations…
• More notes on the hazard function:
  – the hazard is a rate, not a probability: it can be $> 1$, but it must be $\ge 0$; so the graph of $h(y)$ need not look at all like that of a survival function
  – in practice, the hazard function must be estimated (from data) before it can be understood and interpreted
  – think of the hazard $h(y)$ as the instantaneous risk that the event will occur per unit time, given that the event has not occurred up to time $y$
  – note that for a given $y$ and a given density value $f(y)$, a larger $S(y)$ corresponds to a smaller $h(y)$ and vice versa…
• Life expectancy at age $t$:
  – if $Y$ = survival time and we know that $Y > t$, then $Y - t$ is the residual lifetime at age $t$, and the mean residual lifetime at age $t$ is the conditional expectation $r(t) = E(Y - t \mid Y > t)$
  – it can be shown that $r(t) = \dfrac{\int_t^\infty S(y)\,dy}{S(t)}$
  – note that when $t = 0$, $r(0) = \mathrm{MTTF}$ (since $S(0) = 1$)
  – we define the mean life expectancy at age $t$ as $E(Y \mid Y > t) = t + r(t)$
  – go over Example 1.6 on p.11…
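The links between $S$, $f$, and $h$ can be checked numerically on the two curves of Ex. 1.4. The Python sketch below (an illustration, not the course's R handout) approximates the limit definition with one small $\Delta y$, using the fact that for a continuous $Y$, $P(y \le Y < y + \Delta y \mid Y \ge y) = (S(y) - S(y + \Delta y))/S(y)$, and then recovers $S$ from $h$ via $S(y) = \exp(-\int_0^y h(u)\,du)$ with the trapezoidal rule. The step size `dy=1e-6`, the step count, and the function names are hypothetical choices; the hazards `h1` and `h2` were worked out by hand from $h = f/S$.

```python
import math


def hazard_from_limit(S, y, dy=1e-6):
    """Approximate h(y) = lim_{dy->0} P(y <= Y < y + dy | Y >= y) / dy using one small dy.
    For a continuous Y, P(y <= Y < y + dy | Y >= y) = (S(y) - S(y + dy)) / S(y)."""
    return (S(y) - S(y + dy)) / (S(y) * dy)


def survival_from_hazard(h, y, steps=10_000):
    """Recover S(y) = exp(-integral_0^y h(u) du), integrating with the trapezoidal rule."""
    du = y / steps
    integral = sum((h(i * du) + h((i + 1) * du)) / 2 * du for i in range(steps))
    return math.exp(-integral)


# The two survival curves of Ex. 1.4
def S1(y):
    return math.exp(-y / 2)


def S2(y):
    return math.exp(-y ** 2 / 4)


# Hazards from h = f / S with f = -dS/dy, worked out by hand
def h1(y):
    return 0.5  # (1/2)e^{-y/2} / e^{-y/2}: constant hazard


def h2(y):
    return y / 2  # (y/2)e^{-y^2/4} / e^{-y^2/4}: hazard increases with y


if __name__ == "__main__":
    for y in (0.5, 1.0, 2.0, 3.0):
        print(f"y = {y}: h1 ~ {hazard_from_limit(S1, y):.4f} (exact {h1(y)}), "
              f"h2 ~ {hazard_from_limit(S2, y):.4f} (exact {h2(y)})")
        print(f"  S2 recovered from h2: {survival_from_hazard(h2, y):.6f}, S2 itself: {S2(y):.6f}")
```

The exponential curve has a constant hazard, while $S_2$ has a hazard that grows linearly in $y$, which illustrates that $h(y)$ can look quite different from $S(y)$.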
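The MTTF comparison from Ex. 1.4 and the mean residual lifetime $r(t)$ can both be computed from $S$ alone. This Python sketch uses composite Simpson's rule for $\mathrm{MTTF} = \int_0^\infty S(y)\,dy$ and $r(t) = \int_t^\infty S(y)\,dy / S(t)$. Truncating the infinite integrals at `upper=60` is an assumption that works for these two curves (both are negligible beyond it), not a general rule; the function names are hypothetical.

```python
import math


def integrate(g, a, b, steps=20_000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def mttf(S, upper=60.0):
    """MTTF = integral_0^inf S(y) dy, truncated at `upper` (assumes S is negligible beyond it)."""
    return integrate(S, 0.0, upper)


def mean_residual_life(S, t, upper=60.0):
    """r(t) = integral_t^inf S(y) dy / S(t), truncated at `upper`."""
    return integrate(S, t, upper) / S(t)


def S1(y):
    return math.exp(-y / 2)  # exponential with beta = 2


def S2(y):
    return math.exp(-y ** 2 / 4)  # crosses S1 at y = 2


if __name__ == "__main__":
    print(f"MTTF1 = {mttf(S1):.6f}, MTTF2 = {mttf(S2):.6f}, sqrt(pi) = {math.sqrt(math.pi):.6f}")
    for t in (0.0, 1.0, 2.0, 4.0):
        print(f"t = {t}: r1(t) = {mean_residual_life(S1, t):.4f}, r2(t) = {mean_residual_life(S2, t):.4f}")
```

For these curves the sketch gives $\mathrm{MTTF}_1 = 2 = \beta$ and $\mathrm{MTTF}_2 = \sqrt{\pi} \approx 1.77$, so $S_1$ comes out "better" by the MTTF criterion even though $S_2(y) > S_1(y)$ for $y < 2$. It also shows $r(0) = \mathrm{MTTF}$, a constant $r(t) = \beta$ for the exponential curve (its hazard is constant), and a decreasing $r(t)$ for $S_2$ (its hazard increases).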