Waiting Time Models (Duration Data, Longitudinal Data, Failure Time Data)
Waiting time analysis is mostly applied in areas such as search unemployment, job turnover, mortality, labor supply, and marital instability, for which data on the durations of occupancy of states are available from longitudinal surveys.
Problems:
(i) Waiting times are always positive. But this feature is not peculiar to waiting time models.
(ii) The complication of this model is that, because of the time involved, we sometimes cannot observe the dependent variable completely.
Eg. Unemployment periods:
(1), (4), (5): We can observe the complete unemployment period. => Complete spell.
(2), (3): We cannot observe the complete unemployment period. => Incomplete spell. => Censored data.
[Figure: five unemployment spells, (1) to (5), drawn relative to the survey's starting and ending dates.]
=> This need for methods of analysis that accommodate censoring is probably the most important reason for developing specialized models and procedures for failure time data.

1. Statistical Preliminaries
Failure Time Distributions
T: a nonnegative random variable representing the failure time of an individual from a homogeneous population.
The following three ways to specify the probability distribution of T are particularly useful in survival applications:
(i) Survivor function: the probability that T is at least as great as a value t,
$F(t) = \Pr(T \ge t), \quad 0 < t < \infty$ (also written $S(t)$).
F(t) is a monotone non-increasing, left-continuous function with F(0) = 1, since T is nonnegative (no failures occur at negative durations).
(ii) Probability density function (p.d.f.):
$f(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t)}{\Delta t} = -\frac{dF(t)}{dt}$
so that $F(t) = \int_t^\infty f(s)\,ds$, with $f(t) \ge 0$ and $\int_0^\infty f(t)\,dt = 1$.
(iii) Hazard function: the instantaneous rate of failure at T = t, conditional upon survival to time t,
$\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
=> $\lambda(t) = \frac{f(t)}{F(t)} = -\frac{d \log F(t)}{dt}$
By integrating and using F(0) = 1,
$F(t) = \exp\left[-\int_0^t \lambda(u)\,du\right]$
=> where λ(t) is a nonnegative function.
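The relations between hazard, survivor function and density can be checked numerically. The sketch below assumes a hypothetical linear hazard λ(t) = a + bt (not from the notes), integrates it with the trapezoidal rule to get F(t) = exp[-∫λ], and compares the result with the closed form and with f(t) = λ(t)F(t) = -dF/dt.

```python
import math


def survivor_from_hazard(hazard, t, steps=10_000):
    """F(t) = exp(-integral_0^t hazard(u) du), integral by the trapezoidal rule."""
    if t == 0:
        return 1.0
    h = t / steps
    area = 0.5 * (hazard(0.0) + hazard(t))
    area += sum(hazard(i * h) for i in range(1, steps))
    return math.exp(-area * h)


# Hypothetical hazard chosen for illustration: lambda(t) = a + b*t
a, b = 0.5, 0.2


def hazard(u):
    return a + b * u


t = 3.0
F_numeric = survivor_from_hazard(hazard, t)
F_exact = math.exp(-(a * t + b * t**2 / 2))  # closed-form integrated hazard
f_exact = hazard(t) * F_exact                # density via f(t) = lambda(t) F(t)

# Cross-check f(t) = -dF/dt with a central difference
eps = 1e-5
f_numeric = -(survivor_from_hazard(hazard, t + eps)
              - survivor_from_hazard(hazard, t - eps)) / (2 * eps)
print(F_numeric, F_exact, f_numeric, f_exact)
```

Because the trapezoidal rule is exact for a linear integrand, the numerical and closed-form survivor values agree to rounding error here; for a curved hazard the agreement would depend on the step count.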
and the density is
$f(t) = \lambda(t) \exp\left[-\int_0^t \lambda(u)\,du\right]$
Note: The hazard rate, the survivor function, and the probability density function are mathematically equivalent (dual to one another): any one of them may be used to represent the distribution of the random duration time, since f(t) = λ(t)F(t).
=> There is no requirement that $\lim_{t \to \infty} \int_0^t \lambda(u)\,du = \infty$, or equivalently that F(∞) = 0.
=> If F(∞) = 0, the duration distribution is termed non-defective. Otherwise, it is termed defective.
=> There is nothing wrong with defective distributions. In fact, they emerge naturally from many optimizing models. For example, Jovanovic (1979) derives an infinite-horizon worker-firm matching model with a defective job tenure distribution, i.e. F(∞) > 0, because some proportion of workers find that their current match is so successful that they never wish to leave their jobs.
Note: Duration dependence is said to exist if $\frac{d\lambda(t)}{dt} \ne 0$.
If $\frac{d\lambda(t)}{dt} > 0$ at t = t_0, there is said to be positive duration dependence at t_0.
If $\frac{d\lambda(t)}{dt} < 0$ at t = t_0, there is said to be negative duration dependence at t_0.
=> In job search models of unemployment, positive duration dependence arises in the case of a "declining reservation wage". In this case the exit rate from unemployment is monotonically increasing in t. In job turnover models, negative duration dependence (at least asymptotically) is associated with worker-firm matching models.

2. Estimation of the Survivor Function
(1) Some Commonly Encountered Densities for Duration (Parametric Failure Time Models)
(i) The Exponential Distribution
This is the simplest model: the hazard function is a constant and the duration is a continuous random variable.
=> Hazard: $h(t) = \lambda(t) = \lambda > 0, \quad 0 \le t < \infty$
p.d.f.: $f(t) = \lambda e^{-\lambda t}$
Survivor function: $S(t) = e^{-\lambda t}$
Note: Since h(t) = λ, the instantaneous failure rate is independent of t, so the conditional chance of failure in a time interval of specified length is the same regardless of how long the individual has been on trial.
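A defective distribution arises whenever the integrated hazard stays bounded. As an illustration (the hazard below is a hypothetical choice, not one derived in the notes), λ(t) = C·e^{-t} has integrated hazard C(1 - e^{-t}) ≤ C, so F(∞) = e^{-C} > 0; since dλ/dt = -C·e^{-t} < 0, it also exhibits negative duration dependence at every t.

```python
import math

C = 1.2  # hypothetical hazard scale


def hazard(t):
    """Declining hazard lambda(t) = C * exp(-t): negative duration dependence."""
    return C * math.exp(-t)


def integrated_hazard(t):
    """Closed form of integral_0^t C*exp(-u) du = C * (1 - exp(-t))."""
    return C * (1.0 - math.exp(-t))


def survivor(t):
    return math.exp(-integrated_hazard(t))


# The integrated hazard never exceeds C, so F(infinity) = exp(-C) > 0: a defective distribution.
for t in (1.0, 5.0, 20.0, 50.0):
    print(t, survivor(t))
print("F(inf) =", math.exp(-C))

# d lambda/dt = -C*exp(-t) < 0 at every t, checked here by forward differences
slopes = [(hazard(t + 1e-6) - hazard(t)) / 1e-6 for t in (0.5, 2.0, 10.0)]
```

In the spirit of the Jovanovic example, a fraction e^{-C} (about 0.30 for C = 1.2) of the population would never exit the state.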
=> Memoryless property of the exponential distribution
[Figure: exponential hazard λ(t) = λ (constant), survivor function F(t) and density f(t), plotted against t = 1/λ, 2/λ, 3/λ.]
The exponential distribution is closely related to the Poisson process. The Poisson assumptions are:
a. The numbers of events in non-overlapping intervals of time are independent random variables.
b. The probability of exactly one event in a sufficiently short interval of length Δt is approximately λΔt.
c. The probability of two or more events in a sufficiently short interval of time is approximately zero.
Divide an interval of time of length 1 into n equal parts. Then, from (b), the probability of exactly one event in any one of the n subintervals is approximately λ/n. Let x = number of events over the entire interval of length 1. Then
$\Pr(x = k) = \binom{n}{k}\left(\frac{\lambda}{n}\right)^k\left(1 - \frac{\lambda}{n}\right)^{n-k} \approx \frac{\lambda^k e^{-\lambda}}{k!}$ for large n.
Now if the interval has length t,
$\Pr(x = k) = \binom{n}{k}\left(\frac{\lambda t}{n}\right)^k\left(1 - \frac{\lambda t}{n}\right)^{n-k} \approx \frac{(\lambda t)^k e^{-\lambda t}}{k!}$
=> The probability that no failure occurs in any of the n subintervals of [0, t] is
$\Pr(x = 0) = e^{-\lambda t}$
=> This is the survivor function of the exponential model.
=> The exponential model is the dual of the Poisson process: the waiting time until the first event of a Poisson process with rate λ is exponentially distributed.
Note: An empirical check of the appropriateness of the exponential model for a set of survival data is provided by plotting the log of a survivor function estimate versus t. Such a plot should approximate a straight line through the origin (with slope -λ).
Let Y = log T. Then the p.d.f. of Y is
$f(y) = \exp\left[(y - \alpha) - e^{y - \alpha}\right], \quad -\infty < y < \infty$, where α = -log λ.
Let Y = α + W. Then the p.d.f. of W is
$f(w) = \exp\left(w - e^{w}\right), \quad -\infty < w < \infty$
=> This is an extreme value (minimum) distribution. This distribution is the limiting distribution of a standardized form of the minimum of a sample selected from a continuous distribution with support on (-∞, a), a ≤ ∞.
=> The exponential distribution arises as the limiting form of the distribution of a minimum of samples from some densities with support on (0, ∞).
=> This can be taken as a theoretical justification for its use in survival studies in which a complex mechanism fails when any one of its many components fails.
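The Poisson limit and the memoryless property can both be checked directly. The sketch below uses illustrative values λ = 2 and t = 1.5 (our own choices): it compares binomial probabilities for n subintervals with the Poisson limit, then uses simulated exponential draws to compare Pr(T ≥ s + t | T ≥ s) with Pr(T ≥ t) = e^{-λt}.

```python
import math
import random

lam, t = 2.0, 1.5  # illustrative rate and interval length


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    return mu**k * math.exp(-mu) / math.factorial(k)


# Split [0, t] into n subintervals, each with event probability lam*t/n.
for n in (10, 100, 10_000):
    print(n, binomial_pmf(0, n, lam * t / n), binomial_pmf(2, n, lam * t / n))
print("limit", poisson_pmf(0, lam * t), poisson_pmf(2, lam * t))

# Pr(no event in [0, t]) is the exponential survivor function exp(-lam*t).
# Memorylessness: Pr(T >= s + t | T >= s) = Pr(T >= t), checked by simulation.
rng = random.Random(0)
draws = [rng.expovariate(lam) for _ in range(200_000)]
s = 0.7
survivors = [x for x in draws if x >= s]
cond = sum(x >= s + t for x in survivors) / len(survivors)
uncond = sum(x >= t for x in draws) / len(draws)
print(cond, uncond, math.exp(-lam * t))
```

The simulated conditional and unconditional frequencies should both sit close to e^{-3} ≈ 0.0498, up to Monte Carlo noise.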
(ii) The Weibull Distribution
The hazard rate becomes a function of time, i.e.
$\lambda(t) = \lambda p (\lambda t)^{p-1}, \quad \lambda > 0,\ p > 0$
Note: If p = 1, then λ(t) = λ, so the exponential is a special case of the Weibull. The hazard is monotone decreasing for p < 1 and monotone increasing for p > 1.
[Figure: Weibull hazard λ(t) for p = 0.5, 1 and 1.5, plotted against t = 1/λ, 2/λ, 3/λ.]
The p.d.f. of the Weibull is
$f(t) = \lambda p (\lambda t)^{p-1} \exp\left[-\int_0^t \lambda p (\lambda u)^{p-1}\,du\right] = \lambda p (\lambda t)^{p-1} \exp\left[-(\lambda t)^p\right]$
The survivor function is
$S(t) = \exp\left[-(\lambda t)^p\right] \;\Rightarrow\; \log[-\log S(t)] = p(\log t + \log \lambda)$
=> An empirical check for the Weibull distribution is provided by a plot of $\log[-\log \hat{S}(t)]$ versus log t, where $\hat{S}(t)$ is a sample (Kaplan-Meier) estimate of the survivor function. The plot should give approximately a straight line, the slope of which provides a rough estimate of p and the log t intercept an estimate of -log λ.
[Figure: log[-log S(t)] plotted against log t: a straight line with slope ≈ p crossing the log t axis at -log λ.]
Let us consider the Weibull model for a duration w, with hazard $\lambda^* p (\lambda^* w)^{p-1}$, and suppose we transform our measure of time to be $t = (\lambda^* w)^p / \lambda$, so that $w = (\lambda t)^{1/p} / \lambda^*$ and
$\frac{dw}{dt} = \frac{\lambda}{p \lambda^*} (\lambda t)^{(1-p)/p}$
so the density of t is
$f(t) = \lambda^* p (\lambda t)^{(p-1)/p} \exp(-\lambda t) \cdot \frac{\lambda}{p \lambda^*} (\lambda t)^{(1-p)/p} = \lambda \exp(-\lambda t)$
=> The Weibull model is equivalent to a transformation of the measurement of elapsed time such that the "forgetfulness" property of the exponential model is present in transformed time units.
(iii) The Log-Normal Distribution
Let Y = log T and Y = α + σW, where W is a standard normal variate (i.e. Y ~ N(α, σ²)) with p.d.f.
$\phi(w) = \frac{1}{\sqrt{2\pi}} e^{-w^2/2}$
=> The p.d.f. of T is
$f(t) = (2\pi)^{-1/2} p t^{-1} \exp\left[-\frac{p^2 (\log \lambda t)^2}{2}\right]$
where α = -log λ and σ = p^{-1}.
The survivor function is $S(t) = 1 - \Phi(p \log \lambda t)$, where Φ is the standard normal distribution function.
The hazard function is f(t)/S(t).
=> The hazard function has value 0 at t = 0, increases to a maximum and then decreases, approaching zero as t becomes large.
[Figure: log-normal hazard λ(t) for p = 1, 2 and 3, plotted against t = 1/λ to 4/λ.]
Note: The log-normal model is particularly simple to apply if there is no censoring, but with censoring the computations quickly become formidable.
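The Weibull plot check can be sketched end to end. The notes name the Kaplan-Meier estimate without defining it, so the sketch below uses the standard product-limit form Ŝ = Π(1 - d_j/n_j) over distinct failure times. The true values λ = 0.5, p = 1.5, the sample size, and the independent exponential censoring are all our own illustrative assumptions. Python's `random.weibullvariate(alpha, beta)` has survivor function exp[-(t/alpha)^beta], so alpha = 1/λ and beta = p match the notes' parameterization.

```python
import math
import random


def kaplan_meier(times, events):
    """Product-limit estimate: (t_j, S_hat just after t_j) at each distinct failure time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        failures = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def ols(xs, ys):
    """Least-squares slope and intercept of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(42)
lam, p, n = 0.5, 1.5, 2000  # illustrative true values
times, events = [], []
for _ in range(n):
    t_fail = rng.weibullvariate(1 / lam, p)  # S(t) = exp(-(lam*t)**p)
    t_cens = rng.expovariate(0.1)            # hypothetical independent censoring
    times.append(min(t_fail, t_cens))
    events.append(1 if t_fail <= t_cens else 0)

curve = kaplan_meier(times, events)
# Keep the stable middle of the curve, where log(-log S_hat) is not dominated by noise.
pts = [(math.log(t), math.log(-math.log(s))) for t, s in curve if 0.05 <= s <= 0.95]
slope, intercept = ols([x for x, _ in pts], [y for _, y in pts])
p_hat = slope                         # slope estimates p
neg_log_lam_hat = -intercept / slope  # log t intercept estimates -log(lam)
print(p_hat, neg_log_lam_hat, -math.log(lam))
```

Trimming the curve to 0.05 ≤ Ŝ ≤ 0.95 is our choice: near Ŝ = 1 or Ŝ = 0 the transform log[-log Ŝ] magnifies sampling error, which would distort an unweighted fit.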
(iv) Log-Logistic Distribution
The log-logistic distribution provides a good approximation to the log-normal distribution and may frequently be a preferable survival time model.
Y = log T and Y = α + σW, where W has the logistic p.d.f., i.e.
$f(w) = \frac{e^{w}}{(1 + e^{w})^2}$
so that, with α = -log λ and σ = p^{-1},
$S(t) = \frac{1}{1 + (\lambda t)^p}, \qquad \lambda(t) = \frac{\lambda p (\lambda t)^{p-1}}{1 + (\lambda t)^p}$
Note: Like the Weibull and exponential models, this model has the advantage of simple algebraic expressions for the survivor and hazard functions. It is therefore more convenient in handling censored data than the log-normal distribution, while providing a good approximation to it except in the extreme tails.
Note: The hazard function is identical to the Weibull hazard aside from the denominator factor 1 + (λt)^p.
=> If p < 1, it is monotone decreasing from ∞.
=> If p = 1, it is monotone decreasing from λ.
=> If p > 1, the hazard resembles the log-normal hazard in that it increases from zero to a maximum at t = (p-1)^{1/p}/λ and decreases toward zero thereafter.
[Figure: log-logistic hazard λ(t) for p < 1, p = 1 and p = 2, plotted against t.]
(v) The Gamma Distribution
This is also a two-parameter generalization of the exponential model.
p.d.f.:
$f(t) = \frac{\lambda (\lambda t)^{k-1} e^{-\lambda t}}{\Gamma(k)}, \quad k, \lambda > 0$
where $\Gamma(k) = \int_0^\infty x^{k-1} e^{-x}\,dx$.
Let Y = log T and Y = α + W, where α = -log λ and W has the p.d.f.
$f(w) = \frac{\exp(k w - e^{w})}{\Gamma(k)}$
Note: When k = 1, the gamma distribution reduces to the exponential model, and W has the extreme value distribution.
The moment generating function of W is
$M(\theta) = \frac{\Gamma(k + \theta)}{\Gamma(k)}$
Note: W, with suitable standardization, has a limiting normal distribution as k → ∞, i.e. with
$W^* = \sqrt{k}\,(W - \log k)$
=> $\lim_{k \to \infty} M_{W^*}(\theta) = \exp(\theta^2 / 2)$
where $M_{W^*}(\theta)$ is the moment generating function of W*.
=> As k → ∞, the distribution of W* converges to that of a standard normal variate.
Defn. Incomplete Gamma Integral
$I_k(s) = \frac{1}{\Gamma(k)} \int_0^s x^{k-1} e^{-x}\,dx$
=> Survivor function: $F(t) = 1 - I_k(\lambda t)$
Hazard function: $\lambda(t) = \frac{\lambda (\lambda t)^{k-1} e^{-\lambda t}}{\Gamma(k)\,[1 - I_k(\lambda t)]}$
Note: The hazard function is monotone increasing from 0 if k > 1, monotone decreasing from ∞ if k < 1, and in either case approaches λ as t becomes large.
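The closed forms make the log-logistic easy to check numerically. The sketch below (parameter values λ = 0.5, p = 2 are illustrative) locates the hazard maximum on a fine grid and compares it with t = (p-1)^{1/p}/λ, confirms that S(t) = 1/[1 + (λt)^p] agrees with one minus the logistic distribution function of W = (log t - α)/σ, and checks that for p = 1 the hazard starts at λ.

```python
import math


def loglogistic_survivor(t, lam, p):
    return 1.0 / (1.0 + (lam * t) ** p)


def loglogistic_hazard(t, lam, p):
    """Weibull hazard lam*p*(lam*t)**(p-1) divided by the factor 1 + (lam*t)**p."""
    return lam * p * (lam * t) ** (p - 1) / (1.0 + (lam * t) ** p)


lam, p = 0.5, 2.0  # illustrative values with p > 1

# Locate the hazard maximum on a fine grid and compare with (p - 1)**(1/p) / lam.
grid = [i * 1e-4 for i in range(1, 200_001)]  # t in (0, 20]
t_peak_grid = max(grid, key=lambda t: loglogistic_hazard(t, lam, p))
t_peak_formula = (p - 1) ** (1 / p) / lam
print(t_peak_grid, t_peak_formula)

# Cross-check S(t) through Y = log T = alpha + sigma*W with logistic W.
alpha, sigma = -math.log(lam), 1 / p
t = 3.0
w = (math.log(t) - alpha) / sigma
s_via_w = 1.0 - 1.0 / (1.0 + math.exp(-w))  # 1 - logistic CDF at w
print(s_via_w, loglogistic_survivor(t, lam, p))
```

With these values the peak sits at t = 1/0.5 = 2, and both survivor computations give 1/3.25 at t = 3.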
[Figure: gamma hazard λ(t) for k < 1, k = 1 and k > 1, each approaching λ as t grows.]
(2) Regression Models
(i) Exponential and Weibull Regression Models
A. Exponential Regression Model
The exponential distribution can be generalized to obtain a regression model by allowing the failure rate to be a function of the covariates z.
=> The hazard at time t for an individual with covariates z is
$\lambda(t; z) = \lambda(z)$
i.e. the hazard for a given z is a constant characterizing an exponential failure time distribution, but the failure rate depends on z.
Usually:
$\lambda(t; z) = \lambda\, c(z\beta)$
where β is a vector of regression parameters, λ is a constant, and c is a specified functional form.
Three commonly used functional forms for c:
a. c(x) = 1 + x: λ(t; z) = λ(1 + zβ) = λ + λzβ => The failure rate is a linear function of z.
b. c(x) = (1 + x)^{-1} => The mean survival time, (1 + zβ)/λ, is a linear function of z.
c. c(x) = exp(x)
Note: Specifications a and b both suffer from the disadvantage that the set of β values considered must be restricted to guarantee c(zβ) > 0 for all possible z. Specification c is the most natural of the above forms since it takes only positive values.
=> Consider the model with hazard function
$\lambda(t; z) = \lambda e^{z\beta}$
Then the conditional density function of T given z is
$f(t; z) = \lambda e^{z\beta} \exp\left(-\lambda t e^{z\beta}\right)$
In terms of the log survival time, Y = log T, the model can be written as
Y = α - zβ + W
where α = -log λ and W has the extreme value distribution. It is a linear model for Y with the error variable W having a specified distribution.
B. Weibull Regression Model
If the conditional hazard is
$\lambda(t; z) = \lambda p (\lambda t)^{p-1} e^{z\beta}$
then the conditional density is
$f(t; z) = \lambda p (\lambda t)^{p-1} e^{z\beta} \exp\left[-(\lambda t)^p e^{z\beta}\right]$
The effect of the covariates is again to act multiplicatively on the Weibull hazard. Alternatively, in terms of Y = log T, the model is the linear model
Y = α + zβ* + σW
where α = -log λ, σ = p^{-1} and β* = -σβ.
=> The forms of the exponential and Weibull regression models suggest two distinct generalizations: the proportional hazards model and the accelerated failure time model.
=> Proportional hazards model: The covariates act multiplicatively on the hazard function.
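To make the exponential regression model λ(t; z) = λe^{zβ} concrete, here is a minimal maximum likelihood sketch under independent right censoring (a setting the notes motivate but do not work through). With a = log λ and censoring indicator d_i = 1 for a complete spell, the log-likelihood is Σ d_i(a + z_iβ) - Σ t_i e^{a + z_iβ}. Newton-Raphson, the single scalar covariate, the simulated data, and the censoring rate are all our own illustrative choices.

```python
import math
import random


def fit_exponential_regression(times, events, z, iters=50, tol=1e-10):
    """Newton-Raphson MLE for hazard lambda*exp(z*beta) with right censoring.

    Log-likelihood: sum(d_i*(a + z_i*beta)) - sum(t_i*exp(a + z_i*beta)), a = log(lambda).
    """
    a = math.log(sum(events) / sum(times))  # start from the no-covariate MLE
    beta = 0.0
    for _ in range(iters):
        mu = [t * math.exp(a + zi * beta) for t, zi in zip(times, z)]
        g_a = sum(events) - sum(mu)
        g_b = sum(d * zi for d, zi in zip(events, z)) - sum(m * zi for m, zi in zip(mu, z))
        h_aa = -sum(mu)
        h_ab = -sum(m * zi for m, zi in zip(mu, z))
        h_bb = -sum(m * zi * zi for m, zi in zip(mu, z))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step theta <- theta - H^{-1} g, with the 2x2 inverse written out
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, beta = a - step_a, beta - step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return math.exp(a), beta


rng = random.Random(1)
true_lam, true_beta, n = 0.5, 0.8, 5000  # illustrative true values
z = [rng.uniform(-1, 1) for _ in range(n)]
times, events = [], []
for zi in z:
    t_fail = rng.expovariate(true_lam * math.exp(zi * true_beta))
    t_cens = rng.expovariate(0.2)  # hypothetical independent censoring
    times.append(min(t_fail, t_cens))
    events.append(1 if t_fail <= t_cens else 0)

lam_hat, beta_hat = fit_exponential_regression(times, events, z)
print(lam_hat, beta_hat)
```

Because this log-likelihood is concave in (a, β), Newton-Raphson started from the covariate-free estimate converges quickly; censored spells contribute only their exposure term t_i e^{a + z_iβ}.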
=> Accelerated failure time model: The covariates act additively on Y = log T (equivalently, multiplicatively on the time scale).
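The Weibull regression model belongs to both families at once, which a quick numerical check can show. In the sketch below (parameter values are illustrative), the proportional hazards survivor exp[-(λt)^p e^{zβ}] is compared with the accelerated failure time survivor obtained by shifting log T by zβ* with β* = -β/p. For contrast, a log-logistic baseline is run through both constructions (under PH, S(t; z) = S_0(t)^{e^{zβ}}), where the two no longer coincide.

```python
import math

lam, p, beta = 0.5, 1.5, 0.8  # illustrative parameter values
sigma = 1 / p
beta_star = -sigma * beta     # beta* = -sigma*beta, as in the Weibull regression model


def s_ph(t, z):
    """Proportional hazards form: baseline Weibull hazard multiplied by exp(z*beta)."""
    return math.exp(-((lam * t) ** p) * math.exp(z * beta))


def s_aft(t, z):
    """Accelerated failure time form: log T = log T0 + z*beta*, so T0 = T*exp(-z*beta*)."""
    t0 = t * math.exp(-z * beta_star)
    return math.exp(-((lam * t0) ** p))


max_gap = max(abs(s_ph(t, z) - s_aft(t, z))
              for t in (0.1, 1.0, 2.5, 6.0) for z in (-1.0, 0.0, 2.0))


def s0_loglogistic(t):
    """Log-logistic baseline survivor with the same lam and p, for contrast."""
    return 1.0 / (1.0 + (lam * t) ** p)


# PH raises S0 to the power exp(z*beta); AFT rescales time by exp(z*beta/p). Here z = 1, t = 1.
ll_gap = abs(s0_loglogistic(1.0) ** math.exp(beta) - s0_loglogistic(math.exp(beta / p)))
print(max_gap, ll_gap)
```

The Weibull gap is zero up to rounding, while the log-logistic gap is roughly 0.05, illustrating why the two generalizations are genuinely distinct outside the Weibull (and exponential) case.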