Logit, Probit and Tobit:
Models for Categorical and Limited Dependent Variables
By Rajulton Fernando
Presented at PLCS/RDC Statistics and Data Series at Western
March 23, 2011
Introduction
• In social science research, categorical data are often
collected through surveys.
– Categorical Î Nominal and Ordinal variables
– They take only a few values that do NOT have a metric.
• A) Binary Case
• Many dependent variables of interest take only two
values (a dichotomous variable), denoting an event or
non-event, coded as 1 and 0 respectively. Some
examples:
– The labor force status of a person.
– Voting behavior of a person (in favor of a new policy).
– Whether a person got married or divorced.
– Whether a person was involved in criminal behaviour, etc.
Introduction
• With such variables, we can build models that
describe the response probabilities, say P(yi = 1), of
the dependent variable yi.
– For a sample of N independently and identically distributed
observations i = 1, ..., N and a (K+1)-dimensional vector x′i of
explanatory variables, the probability that y takes the value
1 is modeled as
P(yi = 1 | xi) = F(xi′β) = F(zi)
where β is a (K+1)-dimensional column vector of
parameters.
• The transformation function F is crucial. It maps the
linear combination into [0,1] and in general satisfies
F(−∞) = 0, F(+∞) = 1, and ∂F(z)/∂z > 0 [that is, it is a
cumulative distribution function].
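As an illustrative sketch (mine, not from the slides), the two most common choices of F can be checked numerically with only the Python standard library; both map any real index z into (0, 1) and are strictly increasing:

```python
import math

def logistic_cdf(z):
    # Logistic transformation function: F(z) = e^z / (1 + e^z)
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    # Standard normal CDF, written with the error function so that
    # no external libraries are needed
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Both satisfy F(-inf) -> 0, F(+inf) -> 1, and F is increasing in z
for z in (-5.0, 0.0, 5.0):
    print(z, round(logistic_cdf(z), 4), round(normal_cdf(z), 4))
```

These two functions are exactly the F used by the logit and probit models on the next slide.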
The Logit and Probit Models
• When the transformation function F is the logistic
function, the response probabilities are given by
P(yi = 1 | xi) = e^(xi′β) / (1 + e^(xi′β))
• And, when the transformation function F is the
cumulative distribution function (cdf) of the standard
normal distribution, the response probabilities are
given by
P(yi = 1 | xi) = Φ(xi′β) = ∫_{−∞}^{xi′β} (1/√(2π)) e^(−s²/2) ds
• The Logit and Probit models are almost identical (see
the Figure on the next slide) and the choice of model is
arbitrary, although the logit model has certain
advantages (simplicity and ease of interpretation).
Source: J.S. Long, 1997
The Logit and Probit Models
• However, the parameters of the two models are
scaled differently. The parameter estimates in a
logistic regression tend to be 1.6 to 1.8 times higher
than they are in a corresponding probit model.
• The probit and logit models are estimated by
maximum likelihood (ML), assuming independence
across observations. The ML estimator of β is
consistent and asymptotically normally distributed.
However, the estimation rests on the strong
assumption that the latent error term is normally
distributed and homoscedastic. If homoscedasticity is
violated, there is no easy solution.
The Logit and Probit Models
• Note: The response function (logistic or probit) is an
S-shaped function, which implies a fixed change in X
has a smaller impact
p
on the p
probability
y when it is
near zero than when it is near the middle. Thus, it is a
non-linear response function.
• How to interpret the coefficients : In both models,
If b > 0 Î p increases as X increases
If b < 0 Î p decreases as X increases
– As mentioned above, b cannot be interpreted as a simple
slope as in ordinary regression. Because the rate at which
the curve ascends or descends changes according to the
value of X.
– In other words,, it is not a constant change
g as in ordinaryy
regression. Î The greatest rate of change is at p = 0.5
The Logit and Probit Models
– In the logit model
model, we can interpret b as an effect
on the odds. That is, every unit increase in X
results in a multiplicative
p
effect of eb on the odds.
Example: If b = 0.25, then e.25 = 1.28. Thus, when X
changes by one unit, p increases by a factor of 1.28, or
changes by 28%.
- In the probit model, use the Z-score terminology.
F every unit
For
it increase
i
in
i X,
X the
th Z-score
Z
( the
(or
th
Probit of “success”) increases by b units. [Or, we
can also say that an increase in X changes Z by b
standard deviation units.]
- If yyou like,, yyou can convert the z-score to p
probabilities
using the normal table.
Models for Polytomous Data
• B) Polytomous Case
– Here we need to distinguish between purely
nominal variables and really ordinal variables.
– When the variable is purely nominal, we can
extend the dichotomous logit
g model,, usingg one of
the categories as reference and modeling the other
responses j=1,2,..m-1 compared to the reference.
• Example: In the case of 3 categories, using the 3rd category
as the reference, logit p1 = ln(p1/p3) and logit p2 = ln(p2/p3),
which will g
give two sets of parameter
p
estimates.
P ( y = 1) =
exp( β 1 x )
1 + exp( β 1 x ) + exp( β 2 x )
P ( y = 2) =
exp( β 2 x )
1 + exp( β 1 x ) + exp( β 2 x )
P ( y = 3) =
1
1 + exp( β 1 x ) + exp( β 2 x )
Polytomous Case
– When the variable is really ordinal,
ordinal we use cumulative
logits (or probits). The logits in this model are for
cumulative categories at each point, contrasting
categories above with categories below.
– Example: Suppose Y has 4 categories; then,
• logit (p1) = ln{p1 / (1-p
(1 p1)}
= a1 + bX
• logit (p1 + p2) = ln{(p1+ p2 )/(1-p1 – p2)}
= a2 + bX
• logit (p1+p2+p3) = ln{(p1+ p2 + p3 )/(1-p1–p2–p3)} = a3 + bX
– Since these are cumulative logits, the probabilities are
attached to being in category j and lower.
– Since the right side changes only in the intercepts,
and not in the slope coefficient, this model is known as
Proportional odds model.
model Thus,
Thus in ordered logistic,
logistic we
need to test the assumption of proportionality as well.
Ordinal Logistic
– a1, a2, a3 … are the “intercepts”
intercepts that satisfy the property
a1 < a2 < a3… interpreted as “thresholds” of the latent
variable.
– Interpretation of parameter estimates depends on the
software used! Check the software manual.
• If the RHS = a + bX,
bX a positive
positi e coefficient is associated
more with lower order categories and a negative
coefficient is associated more with higher order
categories.
• If the RHS = a – bX, a negative coefficient is more
associated with lower ordered categories
categories, and a positive
coefficient is more associated with higher ordered
categories.
Model for Limited Dependent Variable
• C) Tobit Model
• This model is for metric dependent variable and
when it is “limited”
limited in the sense we observe it only if
it is above or below some cut off level. For example,
– the wages
g mayy be limited from below by
y the minimum
wage
– The donation amount give to charity
– “Top coding” income at, say, at $300,000
– Time use and leisure activity of individuals
– Extramarital affairs
• It is also called censored regression model. Censoring
can be from below or from above, also called left and
right censoring. [Do not confuse the term “censoring”
with the one used in dynamic modeling.]
The Tobit Model
• The model is called Tobit because it was first proposed
by Tobin (1958), and involves aspects of Probit analysis –
a term coined by Goldberger for Tobin’s Probit.
• Reasoning behind:
– If we include the censored observations as y = 0, the
censored
d observations
b
i
on the
h lleft
f will
ill pull
ll down
d
the
h end
d off
the line, resulting in underestimates of the intercept and
p
overestimates of the slope.
– If we exclude the censored observations and just use the
observations for which y>0 (that is, truncating the sample),
it will overestimate the intercept and underestimate the
slope.
– The degree
g
of bias in both will increase as the number of
observations that take on the value of zero increases. (see
Figure next slide)
Source: J.S. Long
The Tobit Model
• The Tobit model uses all of the information,
information
including info on censoring and provides consistent
estimates.
• It is also a nonlinear model and similar to the probit
model. It is estimated usingg maximum likelihood
estimation techniques. The likelihood function for
the tobit model takes the form:
• This is an unusual function, it consists of two terms,
the first for non-censored observations (it is the pdf),
and
d th
the second
d ffor censored
d observations
b
ti
(it iis th
the cdf).
df)
The Tobit Model
• The estimated tobit coefficients are the marginal
effects of a change in xj on y*, the unobservable latent
variable and can be interpreted
p
in the same way
y as in a
linear regression model.
• But such an interpretation may not be useful since we
are interested in the effect of X on the observable y (or
change in the censored outcome).
– It can b
be shown
h
th
thatt change
h
iin y is
i found
f
d by
b multiplying
lti l i
the coefficient with Pr(a<y*<b), that is, the probability of
being uncensored. Since this probability is a fraction, the
marginal effect is actually attenuated.
– In the above, a and b denote lower and upper censoring
points For example,
points.
example in left censoring,
censoring the limits will be:
a =0, b=+∞.
Illustrations for logit, probit and tobit models, using womenwk.dta from Baum available at
http://www.stata-press.com/data/imeus/womenwk.dta
Descriptive Statistics
N
Minimum
Maximum
Mean
Std. Deviation
age
2000
20
59
36.21
8.287
education
2000
10
20
13.08
3.046
married
2000
0
1
.67
.470
children
2000
0
5
1.64
1.399
wagefull
2000
-1.68
45.81
21.3118
7.01204
wage
1343
5.88
45.81
23.6922
6.30537
lw
1343
1.77
3.82
3.1267
.28651
work
2000
0
1
.67
.470
lwf
2000
.00
3.82
2.0996
1.48752
Valid N (listwise)
1343
Binary Logistic Regression
Model Summary
Step
Cox & Snell R
Nagelkerke R
Square
Square
-2 Log likelihood
2055.829a
1
.212
.295
a. Estimation terminated at iteration number 5 because
parameter estimates changed by less than .001.
Hosmer and Lemeshow Test
Step
Chi-square
1
df
6.491
Sig.
8
.592
Variables in the Equation
B
a
Step 1
S.E.
Wald
df
Sig.
Exp(B)
age
.058
.007
64.359
1
.000
1.060
education
.098
.019
27.747
1
.000
1.103
married
.742
.126
34.401
1
.000
2.100
children
.764
.052
220.110
1
.000
2.148
-4.159
.332
156.909
1
.000
.016
Constant
a. Variable(s) entered on step 1: age, education, married, children.
Binary Probit Regression (in SPSS, use the ordinal regression menu and select probit
link function. Ignore the test of parallel lines, etc.)
Model Fitting Information
Model
-2 Log
Likelihood
Intercept Only
1645.024
Final
1166.702
Chi-Square
df
478.322
Sig.
4
.000
Link function: Probit.
Parameter Estimates
95% Confidence Interval
Estimate
Threshold
[work = 0]
Location
Std. Error
Wald
df
Sig.
Lower Bound
Upper Bound
2.037
.209
94.664
1
.000
1.626
2.447
age
.035
.004
67.301
1
.000
.026
.043
education
.058
.011
28.061
1
.000
.037
.080
children
.447
.029
243.907
1
.000
.391
.503
[married=0]
-.431
.074
33.618
1
.000
-.577
-.285
[married=1]
0a
.
.
0
.
.
.
Link function: Probit.
a. This parameter is set to zero because it is redundant.
Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands.
First, fit simple OLS Regression of the variable lwf (just to check)
. regress lwf age married children education
Source |
SS
df
MS
-------------+-----------------------------Model | 937.873188
4 234.468297
Residual | 3485.34135 1995 1.74703827
-------------+-----------------------------Total | 4423.21454 1999 2.21271363
Number of obs
F( 4, 1995)
Prob > F
R-squared
Adj R-squared
Root MSE
=
=
=
=
=
=
2000
134.21
0.0000
0.2120
0.2105
1.3218
-----------------------------------------------------------------------------lwf |
Coef.
Std. Err.
t
P>|t|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age |
.0363624
.003862
9.42
0.000
.0287885
.0439362
married |
.3188214
.0690834
4.62
0.000
.1833381
.4543046
children |
.3305009
.0213143
15.51
0.000
.2887004
.3723015
education |
.0843345
.0102295
8.24
0.000
.0642729
.1043961
_cons | -1.077738
.1703218
-6.33
0.000
-1.411765
-.7437105
------------------------------------------------------------------------------
. tobit lwf age married children education, ll(0)
Tobit regression
Log likelihood = -3349.9685
Number of obs
LR chi2(4)
Prob > chi2
Pseudo R2
=
=
=
=
2000
461.85
0.0000
0.0645
-----------------------------------------------------------------------------lwf |
Coef.
Std. Err.
t
P>|t|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age |
.052157
.0057457
9.08
0.000
.0408888
.0634252
married |
.4841801
.1035188
4.68
0.000
.2811639
.6871964
children |
.4860021
.0317054
15.33
0.000
.4238229
.5481812
education |
.1149492
.0150913
7.62
0.000
.0853529
.1445454
_cons | -2.807696
.2632565
-10.67
0.000
-3.323982
-2.291409
-------------+---------------------------------------------------------------/sigma |
1.872811
.040014
1.794337
1.951285
-----------------------------------------------------------------------------Obs. summary:
657 left-censored observations at lwf<=0
1343
uncensored observations
0 right-censored observations
. mfx compute, predict(pr(0,.))
Marginal effects after tobit
y = Pr(lwf>0) (predict, pr(0,.))
= .81920975
-----------------------------------------------------------------------------variable |
dy/dx
Std. Err.
z
P>|z| [
95% C.I.
]
X
---------+-------------------------------------------------------------------age |
.0073278
.00083
8.84
0.000
.005703 .008952
36.208
married*|
.0706994
.01576
4.48
0.000
.039803 .101596
.6705
children |
.0682813
.00479
14.26
0.000
.058899 .077663
1.6445
educat~n |
.0161499
.00216
7.48
0.000
.011918 .020382
13.084
-----------------------------------------------------------------------------(*) dy/dx is for discrete change of dummy variable from 0 to 1
. mfx compute, predict(e(0,.))
Marginal effects after tobit
y = E(lwf|lwf>0) (predict, e(0,.))
= 2.3102021
-----------------------------------------------------------------------------variable |
dy/dx
Std. Err.
z
P>|z| [
95% C.I.
]
X
---------+-------------------------------------------------------------------age |
.0314922
.00347
9.08
0.000
.024695
.03829
36.208
married*|
.2861047
.05982
4.78
0.000
.168855 .403354
.6705
children |
.2934463
.01908
15.38
0.000
.256041 .330852
1.6445
educat~n |
.0694059
.00912
7.61
0.000
.051531 .087281
13.084
-----------------------------------------------------------------------------(*) dy/dx is for discrete change of dummy variable from 0 to 1