Waiting Time Models
(Duration Data, Longitudinal Data, Failure Time Data)
Waiting time analysis is mostly applied in areas such as search unemployment, job
turnover, mortality, labor supply, and marital instability, for which data on the
durations of occupancy of states are available from longitudinal surveys.
Problems:
(i) Waiting time is always positive. But this is not peculiar to waiting time models.
(ii) The complication of this model is that, because time is involved, we sometimes
cannot observe the dependent variable.
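E.g. Unemployment period
(1), (4), (5): We can observe the complete unemployment period. => Complete spell.
(2), (3): We cannot observe the complete unemployment period. => Incomplete spell. => Censored data.
[Figure: five unemployment spells, (1) to (5), drawn on a time axis between "Starting Survey" and "Ending Survey", with "?" marking the unobserved part of the incomplete spells.]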
=> This necessity of obtaining methods of analysis that accommodate censoring is
probably the most important reason for developing specialized models and
procedures for failure time data.
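The spell example can be mimicked in a few lines of Python. This is only an illustrative sketch: the `Spell` record, the `observe` helper, and the spell dates and survey window are hypothetical and are not taken from any data set in these notes.

```python
from dataclasses import dataclass

@dataclass
class Spell:
    start: float  # calendar time at which the unemployment spell begins
    end: float    # calendar time at which it truly ends (unknown if censored)

def observe(spell, survey_start, survey_end):
    """Return (observed_duration, complete) for a spell seen through the survey window.

    A spell counts as complete only if both its start and its end fall
    inside the window; otherwise only part of it is observed.
    """
    seen_start = max(spell.start, survey_start)
    seen_end = min(spell.end, survey_end)
    complete = spell.start >= survey_start and spell.end <= survey_end
    return seen_end - seen_start, complete

# Hypothetical spells (1)-(5); the survey is assumed to run from time 0 to 10.
spells = [Spell(1, 4), Spell(-2, 3), Spell(6, 13), Spell(2, 9), Spell(5, 7)]
observed = [observe(s, 0.0, 10.0) for s in spells]
for number, (duration, complete) in enumerate(observed, start=1):
    print(number, duration, "complete" if complete else "censored")
```

With these made-up dates, spells (1), (4) and (5) are complete and spells (2) and (3) are censored, matching the pattern described in the example.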
1. Statistical Preliminaries
Failure Time Distributions
T: nonnegative r.v. representing the failure time of an individual from a
homogeneous population.
The following three ways to specify the probability distribution of T are particularly
useful in survival applications:
(i)
Survivor function: The probability that T is at least as great as a value t.
$$F(t) = \Pr(T \ge t), \qquad 0 < t < \infty$$
(also written S(t)).
F(t) is a monotone non-increasing, left-continuous function with F(0)=1, since the
probability of failure is zero for negative durations.
(ii)
Probability density function (p.d.f.):
$$f(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t)}{\Delta t} = -\frac{dF(t)}{dt}$$
so that
$$F(t) = \int_t^{\infty} f(s)\,ds$$
and f(t) ≥ 0 with
$$\int_0^{\infty} f(t)\,dt = 1$$
(iii)
Hazard function: The instantaneous rate of failure at T = t conditional upon
survival to time t.
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
=>
$$\lambda(t) = \frac{f(t)}{F(t)} = -\frac{d \log F(t)}{dt}$$
By integrating and using F(0)=1,
$$F(t) = \exp\left[-\int_0^t \lambda(u)\,du\right]$$
=>
$$f(t) = \lambda(t)\exp\left[-\int_0^t \lambda(u)\,du\right]$$
where λ(t) is a nonnegative function.
Note: The hazard rate, the survivor function, and the probability density function are
dual to one another. Any one of them may be used to represent the distribution of the
random duration time.
f(t) = λ(t)F(t)
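The duality can be checked numerically: starting from a hazard alone, the survivor function and the density follow by integration. The sketch below is a minimal illustration; the helper names and the hazard λ(t) = 2t (whose survivor function is exp(-t²) in closed form) are assumptions chosen for the example.

```python
import math

def survivor_from_hazard(hazard, t, steps=10_000):
    """F(t) = exp(-integral_0^t hazard(u) du), integral by the midpoint rule."""
    h = t / steps
    integral = sum(hazard((i + 0.5) * h) for i in range(steps)) * h
    return math.exp(-integral)

def density_from_hazard(hazard, t):
    """f(t) = hazard(t) * F(t)."""
    return hazard(t) * survivor_from_hazard(hazard, t)

# Assumed hazard for illustration: lambda(t) = 2t, so F(t) = exp(-t**2).
hazard = lambda u: 2.0 * u
t = 1.5
print(survivor_from_hazard(hazard, t), math.exp(-t ** 2))
print(density_from_hazard(hazard, t), 2.0 * t * math.exp(-t ** 2))
```

Each printed pair should agree up to numerical integration error.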
=> There is no requirement that
$$\lim_{t \to \infty} \int_0^t \lambda(u)\,du = \infty$$
or, equivalently, that F(∞)=0.
=> If F(∞)=0, the duration distribution is termed non-defective. Otherwise, it is
termed defective.
=> There is nothing wrong with defective distributions. In fact they emerge naturally
from many optimizing models. For example, Jovanovic(1979) derives an infinite
horizon worker-firm matching model with a defective job tenure distribution, i.e.
F(∞)>0, because some proportion of workers find that their current match is so
successful that they never wish to leave their jobs.
Note: Duration dependence is said to exist if
$$\frac{d\lambda(t)}{dt} \neq 0$$
If $d\lambda(t)/dt > 0$ at t = t0, there is said to be positive duration dependence at t0.
If $d\lambda(t)/dt < 0$ at t = t0, there is said to be negative duration dependence at t0.
=> In job search models of unemployment, positive duration dependence arises in the
case of a “declining reservation wage”. In this case the exit rate from
unemployment is monotonically increasing in t. In job turnover models, negative
duration dependence (at least asymptotically) is associated with worker-firm
matching models.
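The definition of duration dependence translates directly into a sign check on the slope of the hazard. The following sketch uses a central finite difference; the function name and the three example hazards are hypothetical and serve only to illustrate the three cases.

```python
def duration_dependence(hazard, t0, eps=1e-6):
    """Sign of d hazard / dt at t0, estimated by a central finite difference."""
    slope = (hazard(t0 + eps) - hazard(t0 - eps)) / (2.0 * eps)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "none"

# Illustrative hazards (assumed forms, not estimated from data).
rising = lambda t: 0.5 * t ** 0.5      # e.g. exit rate under a declining reservation wage
falling = lambda t: 0.5 * t ** -0.5    # e.g. separation rate in a matching model
constant = lambda t: 0.5               # no duration dependence

print(duration_dependence(rising, 2.0))    # positive
print(duration_dependence(falling, 2.0))   # negative
print(duration_dependence(constant, 2.0))  # none
```

A finite difference only detects an exactly zero slope for hazards that are exactly constant; with estimated hazards one would compare the slope against its sampling error instead.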
2. Estimation of the Survivor Function
(1) Some Commonly Encountered Densities for Duration
(Parametric Failure Time Models)
(i)
The Exponential Distribution
This is the simplest model, with the hazard function being a constant and
duration being a continuous random variable.
=> Hazard: h(t) or λ(t) = λ > 0, 0 ≤ t < ∞
p.d.f.: f(t) = λ exp[-λt]
Survivor function: S(t) = exp[-λt]
Note h(t)= λ: The instantaneous failure rate is independent of t so that the conditional
chance of failure in a time interval of specified length is the same regardless of how
long the individual has been on trial.
=> Memoryless property of the exponential distribution
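[Figure: for the exponential model, the constant hazard λ(t) = λ, the survivor function F(t) (starting at 1) and the density f(t) plotted against t, with ticks at 1/λ, 2/λ and 3/λ.]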
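The memoryless property can be verified directly from the survivor function: the chance of lasting a further t units, given survival to s, does not depend on s. A minimal sketch, with the value of λ assumed only for illustration:

```python
import math

def survivor_exp(t, lam):
    """S(t) = exp(-lam * t) for the exponential model."""
    return math.exp(-lam * t)

def conditional_survival(s, t, lam):
    """Pr(T >= s + t | T >= s) = S(s + t) / S(s)."""
    return survivor_exp(s + t, lam) / survivor_exp(s, lam)

lam = 0.3  # assumed hazard, chosen only for illustration
for s in (0.0, 1.0, 10.0):
    # The two printed probabilities should coincide for every s.
    print(s, conditional_survival(s, 2.0, lam), survivor_exp(2.0, lam))
```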
The exponential distribution is closely related to the Poisson process. The Poisson
assumptions are:
a. The numbers of events in non-overlapping intervals of time are independent
random variables.
b. The probability of exactly one event in a sufficiently short interval of time of
length t is approximately λt.
c. The probability of two or more events in a sufficiently short interval of time
is approximately zero.
Dividing an interval of time of length 1 into n equal parts, then from (b), the
probability of exactly one event happening in any one of our n subintervals is
λ/n. Let x = # of events over the entire interval of time of length 1. Then
$$\Pr(x = k) = \binom{n}{k}\left(\frac{\lambda}{n}\right)^{k}\left(1 - \frac{\lambda}{n}\right)^{n-k} \approx \frac{\lambda^{k} e^{-\lambda}}{k!}$$
for large n.
Now if our interval has length t,
$$\Pr(x = k) = \binom{n}{k}\left(\frac{\lambda t}{n}\right)^{k}\left(1 - \frac{\lambda t}{n}\right)^{n-k} \approx \frac{(\lambda t)^{k} e^{-\lambda t}}{k!}$$
=> The probability that no failure occurs by the end of all n subintervals of the
interval [0, t] is
$$\Pr(x = 0) = e^{-\lambda t}$$
=> This is the survivor function for the exponential model.
=> The exponential model is the dual of the Poisson process.
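The binomial-to-Poisson limit can be inspected numerically. In this sketch the function names, the rate and the interval length are assumptions made for the illustration; the case k = 0 reproduces the exponential survivor function.

```python
import math

def binomial_pmf(k, n, rate, t=1.0):
    """Pr(x = k) when [0, t] is cut into n subintervals, each with event probability rate*t/n."""
    p = rate * t / n
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, rate, t=1.0):
    """Limiting form (rate*t)**k * exp(-rate*t) / k!."""
    return (rate * t) ** k * math.exp(-rate * t) / math.factorial(k)

rate, t = 2.0, 1.5  # assumed values for the illustration
for n in (10, 100, 100_000):
    print(n, binomial_pmf(3, n, rate, t), poisson_pmf(3, rate, t))

# k = 0: no event in [0, t], i.e. the exponential survivor function exp(-rate * t).
print(binomial_pmf(0, 100_000, rate, t), math.exp(-rate * t))
```

The gap between the two columns should shrink as n grows.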
Note: An empirical check of the appropriateness of the exponential model for a set of
survivor data is provided by plotting the log of a survivor function estimate versus t.
Such a plot should approximate a straight line through the origin.
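As a rough illustration of this check, the sketch below simulates uncensored exponential durations, forms the empirical survivor function, and fits a line through the origin to log Ŝ(t) against t. The sample size, seed, value of λ and helper names are assumptions; with censored data a Kaplan-Meier estimate would replace the simple empirical survivor function used here.

```python
import math
import random

def empirical_survivor(sample):
    """Return (t, S_hat(t)) pairs, where S_hat(t) is the share of the sample with T >= t."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(t, (n - i) / n) for i, t in enumerate(ordered)]

def slope_through_origin(points):
    """Least-squares slope of log S_hat on t for a line forced through the origin."""
    num = sum(t * math.log(s) for t, s in points)
    den = sum(t * t for t, _ in points)
    return num / den

random.seed(0)
lam = 0.5  # assumed true hazard used to simulate the data
sample = [random.expovariate(lam) for _ in range(5000)]
slope = slope_through_origin(empirical_survivor(sample))
print(slope)  # for exponential data this should lie close to -lam
```

If the data are exponential, the fitted slope estimates -λ; visible curvature in the plot of log Ŝ(t) against t would speak against the exponential model.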
Let Y = log T; then the p.d.f. of Y is
$$f(y) = \exp\left(y - \alpha - e^{\,y - \alpha}\right), \qquad -\infty < y < \infty$$
where α = -log λ.
Let Y = α + W; the p.d.f. of W is
$$f(w) = \exp\left(w - e^{w}\right), \qquad -\infty < w < \infty$$
=> This is an extreme value (minimum) distribution.
This distribution is the limiting distribution of a standardized form of the minimum of
a sample selected from a continuous distribution with support on (-∞, a) a ≤ ∞.
=> The exponential distribution arises as the limiting form of the distribution of a
minimum of samples from some densities with support on (0, ∞)
=> This can be taken as theoretical justification for its use in survival studies in which
a complex mechanism fails when any one of its many components fails.
(ii)
The Weibull Distribution
The hazard rate becomes a function of time, i.e.
$$\lambda(t) = \lambda p (\lambda t)^{p-1}, \qquad \lambda, p > 0$$
Note: If p=1, then λ(t)= λ so that exponential is a special case of Weibull.
The hazard is monotone decreasing for p<1 and increasing for p>1
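[Figure: Weibull hazard λ(t) plotted against t for p = 0.5, p = 1 and p = 1.5, with λ and 2λ marked on the vertical axis and 1/λ, 2/λ, 3/λ on the horizontal axis.]
The p.d.f. of the Weibull is
$$f(t) = \lambda p (\lambda t)^{p-1}\exp\left[-\int_0^t \lambda p (\lambda s)^{p-1}\,ds\right] = \lambda p (\lambda t)^{p-1}\exp\left[-(\lambda t)^{p}\right]$$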
The survivor function is
$$S(t) = \exp\left[-(\lambda t)^{p}\right]$$
so that
$$\log[-\log S(t)] = p(\log t + \log \lambda)$$
=> An empirical check for the Weibull distribution is provided by a plot of
$\log[-\log \hat{S}(t)]$ versus log t, where $\hat{S}(t)$ is a sample (Kaplan-Meier)
estimate of the survivor function. The plot should give approximately a straight
line, the slope of which provides a rough estimate of p and the log t intercept an
estimate of -log λ.
[Figure: log[-log S(t)] plotted against log t; a straight line with slope ≈ p crossing the log t axis at -log λ.]
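The graphical check can be imitated numerically. The sketch below implements a plain Kaplan-Meier (product-limit) estimator and fits a least-squares line to log[-log Ŝ(t)] against log t on simulated, right-censored Weibull data. The function names, the simulated parameter values and the fixed censoring time are assumptions for illustration, not part of the notes.

```python
import math
import random

def kaplan_meier(durations, completed):
    """Product-limit estimate: list of (t, S_hat just after t) at each observed failure time."""
    data = sorted(zip(durations, completed))
    at_risk = len(data)
    s_hat, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        failures = ties = 0
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]   # True counts as 1, censored spells as 0
            ties += 1
            i += 1
        if failures:
            s_hat *= 1.0 - failures / at_risk
            curve.append((t, s_hat))
        at_risk -= ties
    return curve

def weibull_plot_fit(curve):
    """Least-squares line of log(-log S_hat) on log t; returns (slope, intercept)."""
    pts = [(math.log(t), math.log(-math.log(s))) for t, s in curve if 0.0 < s < 1.0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
    return slope, my - slope * mx

# Simulated data (assumed values): Weibull with lam = 0.5, p = 1.5, censored at c = 3.
random.seed(0)
lam, p, c = 0.5, 1.5, 3.0
true_t = [random.weibullvariate(1.0 / lam, p) for _ in range(2000)]
durations = [min(t, c) for t in true_t]
completed = [t <= c for t in true_t]

slope, intercept = weibull_plot_fit(kaplan_meier(durations, completed))
print(slope)               # rough estimate of p
print(-intercept / slope)  # log t intercept, rough estimate of -log(lam)
```

An unweighted least-squares fit is used only to summarize the plot; it is not an efficient estimator of p or λ.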
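Let us consider the Weibull model for a duration w with parameters λ* and p,
and suppose we transform our measure of time to $t = (\lambda^* w)^{p}/\lambda$, so that
$w = (\lambda t)^{1/p}/\lambda^*$.
=>
$$\frac{dw}{dt} = \frac{\lambda(\lambda t)^{(1/p)-1}}{p\lambda^*}$$
so
$$f(t) = \lambda^* p\left[(\lambda t)^{1/p}\right]^{p-1}\exp(-\lambda t)\cdot\frac{\lambda(\lambda t)^{(1-p)/p}}{p\lambda^*} = \lambda\exp(-\lambda t)$$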
=> The Weibull model is equivalent to a transformation of the measurement of
elapsed time such that the “forgetfulness” property of the exponential model is present
in transformed time units.
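A small simulation illustrates the change of clock: Weibull durations w, once mapped to t = (λ*w)^p/λ, should behave like exponential durations with rate λ. The parameter values, the seed and the helper name below are assumptions made for the sketch.

```python
import math
import random

def to_exponential_clock(w, lam_star, p, lam):
    """Map Weibull time w to transformed time t = (lam_star * w)**p / lam."""
    return (lam_star * w) ** p / lam

random.seed(1)
lam_star, p, lam = 2.0, 1.5, 0.5  # assumed values for the illustration
# random.weibullvariate(scale, shape) has survivor exp(-(w / scale)**shape).
w = [random.weibullvariate(1.0 / lam_star, p) for _ in range(20000)]
t = [to_exponential_clock(wi, lam_star, p, lam) for wi in w]

mean_t = sum(t) / len(t)                               # exponential mean is 1 / lam
share_beyond_2 = sum(ti >= 2.0 for ti in t) / len(t)   # exponential value is exp(-2 * lam)
print(mean_t, 1.0 / lam)
print(share_beyond_2, math.exp(-2.0 * lam))
```

Both printed pairs should agree up to simulation noise.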
(iii)
The Log-Normal Distribution
Let Y = log T and Y = α + σW, where W is a standard normal variate (i.e.
Y ~ N(α, σ²)) with p.d.f.
$$\phi(w) = \frac{1}{\sqrt{2\pi}}\,e^{-w^{2}/2}$$
=> The p.d.f. of T is
$$f(t) = (2\pi)^{-1/2}\,p\,t^{-1}\exp\left[-\frac{p^{2}(\log \lambda t)^{2}}{2}\right]$$
where α = -log λ and σ = p^{-1}.
The survivor function is
$$S(t) = 1 - \Phi(p\log \lambda t)$$
The hazard function is f(t)/S(t).
=> The hazard function has value 0 at t = 0, increases to a maximum and then
decreases, approaching zero as t becomes large.
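[Figure: log-normal hazard λ(t) plotted against t for p = 1, p = 2 and p = 3, with ticks at 0.2, 0.4, 0.6, 0.8 on the vertical axis and at 1/λ, 2/λ, 3/λ, 4/λ on the horizontal axis.]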
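The rise-then-fall shape of the log-normal hazard can be tabulated from f(t)/S(t). In this sketch 1 - Φ(z) is computed with the standard library's `math.erfc`; the function name and the values λ = 1, p = 1 are assumptions chosen for illustration.

```python
import math

def lognormal_hazard(t, lam, p):
    """Hazard f(t) / S(t) with alpha = -log(lam) and sigma = 1 / p."""
    z = p * math.log(lam * t)
    density = p / (t * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)
    survivor = 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)
    return density / survivor

lam, p = 1.0, 1.0  # assumed values for the illustration
for t in (0.01, 0.1, 0.5, 1.0, 5.0, 100.0):
    print(t, lognormal_hazard(t, lam, p))
```

The tabulated values start near zero, rise to an interior maximum and then decline slowly as t grows.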
Note: The log-normal model is particularly simple to apply if there is no
censoring, but with censoring the computations quickly become formidable.
(iv)
Log-Logistic Distribution
The log-logistic distribution provides a good approximation to the log-normal
distribution and may frequently be a preferable survival time model.
Let Y = log T and Y = α + σW, where W has the logistic p.d.f., i.e.
$$f(w) = \frac{e^{w}}{\left(1 + e^{w}\right)^{2}}$$
Note: This model has the advantage (like the Weibull and exponential models) of
having simple algebraic expression for the survivor and hazard functions. It is
therefore more convenient in handling censored data than the log-normal
distribution while providing a good approximation to it except in the extreme
tails.
Note: The hazard function is identical to the Weibull hazard aside from the
denominator factor $1+(\lambda t)^{p}$.
=> If p<1, then it is monotone decreasing from ∞.
=> If p=1, then it is monotone decreasing from λ.
=> If p>1, the hazard resembles the log-normal hazard in that it increases from
zero to a maximum at $t=(p-1)^{1/p}/\lambda$ and decreases toward zero thereafter.
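[Figure: log-logistic hazard plotted against t for p < 1, p = 1 and p = 2, with λ marked on the vertical axis and 1/λ on the horizontal axis.]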
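The description of the log-logistic hazard (the Weibull hazard divided by 1 + (λt)^p) can be turned into a short numerical check of the stated maximum. The survivor function used for the cross-check, S(t) = 1/(1 + (λt)^p), is the standard log-logistic form with α = -log λ and σ = 1/p; it is not written out in the notes and is stated here as an assumption, as are the function names and parameter values.

```python
import math

def loglogistic_survivor(t, lam, p):
    """Standard log-logistic survivor function (assumed form, see lead-in)."""
    return 1.0 / (1.0 + (lam * t) ** p)

def loglogistic_hazard(t, lam, p):
    """Weibull hazard divided by the factor 1 + (lam * t)**p."""
    return lam * p * (lam * t) ** (p - 1) / (1.0 + (lam * t) ** p)

def hazard_peak(lam, p):
    """Location of the maximum for p > 1: t = (p - 1)**(1/p) / lam."""
    return (p - 1.0) ** (1.0 / p) / lam

lam, p = 0.5, 2.0  # assumed values for the illustration
t_star = hazard_peak(lam, p)
for t in (0.5 * t_star, t_star, 2.0 * t_star):
    print(t, loglogistic_hazard(t, lam, p))

# Cross-check: the hazard should equal -d log S(t) / dt (central difference).
t, eps = 1.3, 1e-5
numeric = -(math.log(loglogistic_survivor(t + eps, lam, p))
            - math.log(loglogistic_survivor(t - eps, lam, p))) / (2.0 * eps)
print(numeric, loglogistic_hazard(t, lam, p))
```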
(v)
The Gamma Distribution
This is also a two-parameter generalization of the exponential model.
p.d.f.:
$$f(t) = \frac{\lambda(\lambda t)^{k-1}e^{-\lambda t}}{\Gamma(k)}, \qquad k, \lambda > 0$$
where
$$\Gamma(k) = \int_0^{\infty} x^{k-1}e^{-x}\,dx$$
Let Y = log T and Y = α + W, where W has the p.d.f.
$$f(w) = \frac{\exp\left(kw - e^{w}\right)}{\Gamma(k)}$$
Note: When k=1, the gamma distribution reduces to the exponential model, and W has
the extreme value distribution.
The moment generating function of W is
$$M(\theta) = \frac{\Gamma(\theta + k)}{\Gamma(k)}$$
Note: W, with suitable standardization, has a limiting normal distribution as k → ∞,
i.e.
$$W^* = \sqrt{k}\,(W - \log k)$$
=>
$$\lim_{k \to \infty} M_{W^*}(\theta) = \exp\left(\frac{\theta^{2}}{2}\right)$$
where $M_{W^*}(\theta)$ is the moment generating function of W*.
=> As k → ∞, the distribution of W* converges to that of a standard normal variate.
Defn. Incomplete Gamma Integral
$$I_k(s) = \frac{1}{\Gamma(k)}\int_0^{s} x^{k-1}e^{-x}\,dx$$
=> Survivor function
$$F(t) = 1 - I_k(\lambda t)$$
Hazard function
$$\lambda(t) = \frac{\lambda(\lambda t)^{k-1}\exp(-\lambda t)\,\Gamma(k)^{-1}}{1 - I_k(\lambda t)}$$
Note: The hazard function is monotone increasing from 0 if k>1, monotone
decreasing from ∞ if k<1, and in either case approaches λ as t becomes large.
[Figure: gamma hazard λ(t) plotted against t for k < 1, k = 1 and k > 1, with the level λ marked on the vertical axis.]
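The behavior of the gamma hazard can be tabulated with the standard library alone. The sketch below evaluates the incomplete gamma integral by its power series, which is adequate for moderate values of λt but loses accuracy in the far tail; the function names and parameter values are assumptions for illustration.

```python
import math

def incomplete_gamma(k, s, terms=500):
    """I_k(s) = (1/Gamma(k)) * integral_0^s x**(k-1) * exp(-x) dx, via its power series."""
    if s <= 0.0:
        return 0.0
    term = 1.0 / k            # n = 0 term of sum_n s**n / (k (k+1) ... (k+n))
    total = term
    for n in range(1, terms):
        term *= s / (k + n)
        total += term
    return total * math.exp(-s + k * math.log(s) - math.lgamma(k))

def gamma_hazard(t, lam, k):
    """Hazard f(t) / [1 - I_k(lam * t)] of the gamma model."""
    log_density = math.log(lam) + (k - 1.0) * math.log(lam * t) - lam * t - math.lgamma(k)
    return math.exp(log_density) / (1.0 - incomplete_gamma(k, lam * t))

lam = 1.0  # assumed value for the illustration
for k in (0.5, 1.0, 2.0):
    print(k, [round(gamma_hazard(t, lam, k), 4) for t in (0.5, 2.0, 5.0, 15.0)])
```

For k = 1 every entry equals λ; for k > 1 the entries rise toward λ and for k < 1 they fall toward λ.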
(2) Regression Models
(i)
Exponential and Weibull Regression Model
A: Exponential Regression Model
The exponential distribution can be generalized to obtain a regression model by
allowing the failure rate to be a function of the covariate z.
=> The hazard at time t for an individual with covariate z is
λ(t;z)= λ(z)
i.e. The hazard for a given z is a constant characterizing an exponential failure time
distribution, but the failure rate depends on z.
Usually: λ(t;z) = λc(zβ)
where β is a vector of regression parameters, λ is a constant, and c is a specified
functional form.
Three commonly used functional forms for c:
a. c(x) = 1+x: λ(t;z) = λ(1+zβ) = λ + λzβ
=> The failure rate is a linear function of z.
b. $c(x) = (1+x)^{-1}$
=> The mean survival time is a linear function of z.
c. c(x) = exp(x)
Note: Specifications a and b both suffer from the disadvantage that the set of β values
considered must be restricted to guarantee c(zβ)>0 for all possible z. Specification c is
the most natural of the above forms since it takes only positive values.
=> Consider the model with hazard function
$$\lambda(t;z) = \lambda e^{z\beta}$$
Then, the conditional density function of T given z is
$$f(t;z) = \lambda e^{z\beta}\exp\left(-\lambda t e^{z\beta}\right)$$
In terms of the log survival time, Y = log T, the model can be written as
$$Y = \alpha - z\beta + W$$
where α = -log λ and W has the extreme value distribution.
=> It is a linear model for Y with the error variable W having a specified
distribution.
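The log-linear form can be checked by simulation. Since W follows the (minimum) extreme value distribution, whose mean is minus Euler's constant, the average of Y = log T for a given z should be close to α - zβ - 0.5772. The sketch assumes a single scalar covariate; the parameter values, seed and function name are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # E[W] = -EULER_GAMMA for the minimum extreme value law

def simulate_log_durations(lam, beta, z, n, rng):
    """Draw n values of Y = log T when the hazard is lam * exp(z * beta), constant in t."""
    rate = lam * math.exp(z * beta)
    return [math.log(rng.expovariate(rate)) for _ in range(n)]

rng = random.Random(0)
lam, beta = 0.5, 0.8      # assumed parameter values
alpha = -math.log(lam)
means = {}
for z in (0.0, 1.0):      # a single scalar covariate, for simplicity
    y = simulate_log_durations(lam, beta, z, 20000, rng)
    means[z] = sum(y) / len(y)
    print(z, means[z], alpha - z * beta - EULER_GAMMA)
```

Raising z by one unit lowers the mean log duration by roughly β, as the minus sign in Y = α - zβ + W indicates.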
B. Weibull Regression Model
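If the conditional hazard is
$$\lambda(t;z) = \lambda p(\lambda t)^{p-1}e^{z\beta}$$
then the conditional density of T given z is
$$f(t;z) = \lambda p(\lambda t)^{p-1}e^{z\beta}\exp\left[-(\lambda t)^{p}e^{z\beta}\right]$$
The effect of the covariates is again to act multiplicatively on the Weibull hazard.
Alternatively, in terms of Y = log T, the model is the linear model
$$Y = \alpha + z\beta^* + \sigma W$$
where α = -log λ, σ = p^{-1} and β* = -σβ.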
=> The forms of the exponential and Weibull regression models suggest two distinct
generalizations: the proportional hazards model and the accelerated failure time model.
=> Proportional hazards model: The covariates act multiplicatively on the hazard
function.
=> Accelerated failure time model: The covariates act additively on Y.
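For the Weibull regression model both readings hold at once, which the following sketch verifies numerically: the hazard ratio between two covariate values is constant in t (proportional hazards), and the survivor function for covariate z equals the baseline survivor function evaluated at the rescaled time t·exp(zβ/p) (accelerated failure time). Function names and parameter values are assumptions made for illustration.

```python
import math

def weibull_reg_hazard(t, z_beta, lam, p):
    """lambda(t; z) = lam * p * (lam * t)**(p - 1) * exp(z * beta)."""
    return lam * p * (lam * t) ** (p - 1) * math.exp(z_beta)

def weibull_reg_survivor(t, z_beta, lam, p):
    """Survivor function exp(-(lam * t)**p * exp(z * beta))."""
    return math.exp(-((lam * t) ** p) * math.exp(z_beta))

lam, p, z_beta = 0.5, 1.5, 0.7  # assumed values for the illustration
for t in (0.5, 2.0, 6.0):
    # Proportional hazards: the ratio to the baseline hazard is exp(z_beta) at every t.
    ratio = weibull_reg_hazard(t, z_beta, lam, p) / weibull_reg_hazard(t, 0.0, lam, p)
    # Accelerated failure time: the covariate rescales time by exp(z_beta / p).
    rescaled = weibull_reg_survivor(t * math.exp(z_beta / p), 0.0, lam, p)
    print(t, ratio, weibull_reg_survivor(t, z_beta, lam, p), rescaled)
```

On the log scale the time rescaling is a shift of Y = log T by -zβ/p, which is the coefficient β* = -σβ of the linear model.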