Understanding and Interpreting Results from
Logistic, Multinomial, and Ordered Logistic
Regression Models: Using Post-Estimation
Commands in Stata
Raymond Sin-Kwok Wong
University of California-Santa Barbara
Model Estimation and Interpretation
• For OLS models, both model estimation and
interpretation are relatively easy, since the effects
are linear.
• For non-linear models, model estimation is simple
but the interpretation of results can be tricky,
especially for beginners who are not familiar with
the non-linear relationship between dependent and
independent variables.
What is my talk about?
– Not about the rationale for statistical modeling
or the mathematical and statistical derivation of
specific non-linear models
– But about a set of post-estimation tools that
aid understanding and interpretation, and the
presentation of complex relationships among
variables using graphical displays
Alternative Output Methods
• (a) Display odds ratios rather than logit coefficients
logit y x1 x2 x3 x4 x5
logit, or
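As a rough illustration of what the odds-ratio display amounts to, the odds ratio is the exponentiated logit coefficient, and the confidence limits are exponentiated in the same way. The sketch below is in Python rather than Stata; it borrows the drug coefficient and interval from the logit died studytime age drug output later in this talk, and the function name is made up for the example.

```python
import math

def odds_ratio(coef):
    """Convert a logit coefficient to an odds ratio."""
    return math.exp(coef)

# Coefficient and 95% interval for drug, copied from the logit output in this talk.
b_drug, lower, upper = -1.150009, -2.237697, -0.0623212

or_drug = odds_ratio(b_drug)
or_ci = (odds_ratio(lower), odds_ratio(upper))
pct_change = 100 * (or_drug - 1)  # percent change in the odds per unit increase

print(f"odds ratio = {or_drug:.3f}, 95% CI = ({or_ci[0]:.3f}, {or_ci[1]:.3f})")
print(f"percent change in odds = {pct_change:.1f}%")
```

An odds ratio below 1 means the odds fall as the variable increases; the percent change is the same information on a different scale.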
Alternative Output Methods
• (b) Use LISTCOEF
listcoef [varlist] [, pvalue(#) [factor|percent|std] constant help]
factor: factor changes in the odds or expected counts
percent: % change in the odds or expected counts
std: standardized coefficients
==============================================================
Option                              std       factor    percent
--------------------------------------------------------------
Type 1: regress, probit, cloglog,   Default   No        No
        oprobit, tobit, cnreg, intreg
Type 2: logit, logistic, ologit     Yes       Default   Yes
Type 3: clogit, mlogit, poisson,    No        Default   Yes
        nbreg, zip, zinb
==============================================================
Alternative Output Methods
• Different standardized coefficients
(s_x and s_y are the standard deviations of x_k and y)
– x-standardized (bStdX): β_k^Sx = s_x·β_k
• For a standard deviation increase in x_k, y is expected to change
by β_k^Sx units, holding everything else constant
– y-standardized (bStdY): β_k^Sy = β_k / s_y
• For a unit increase in x_k, y is expected to change by β_k^Sy
standard deviations, holding everything else constant
– Fully standardized (bStdXY): β_k^S = s_x·β_k / s_y
• For a standard deviation increase in x_k, y is expected to change
by β_k^S standard deviations, holding everything else constant
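The three standardizations can be sketched with a few lines of arithmetic, following the usual definitions: multiply the slope by the standard deviation of x, divide it by the standard deviation of y, or do both. The Python below is only an illustration; the slope and standard deviations are hypothetical, and the dictionary keys simply reuse the bStdX, bStdY, and bStdXY labels.

```python
def standardized(beta, sd_x, sd_y):
    """Return x-, y-, and fully standardized versions of a slope."""
    return {
        "bStdX": beta * sd_x,          # change in y (units) per SD of x
        "bStdY": beta / sd_y,          # change in y (SDs) per unit of x
        "bStdXY": beta * sd_x / sd_y,  # change in y (SDs) per SD of x
    }

# Hypothetical slope and standard deviations, chosen only for illustration.
coefs = standardized(beta=0.5, sd_x=4.0, sd_y=10.0)
for name, value in coefs.items():
    print(name, value)
```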
Post-Estimation Tests
regress y x1 x2 … xk
estimates store mod1
What is the use? Stored estimates can be recalled for post-estimation analysis.
Two kinds of tests are common:
(a) Wald tests, and
(b) LR tests
Wald Test
test varlist [, accumulate]
For example,
(a) regress y x1 x2 x3 x4 x5
test x1 x2 x3 x4 x5
This tests H0: β1 = β2 = β3 = β4 = β5 = 0
(b) test x1=2*x2
test x3=x4, accumulate
This tests H0: β1 = 2β2 and β3 = β4
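For a single coefficient, the Wald test reduces to the familiar z statistic: the coefficient divided by its standard error, with the squared value distributed as chi-square with 1 degree of freedom. The Python sketch below illustrates that special case only (joint tests need the full covariance matrix, which is not shown in the slides). It uses the drug coefficient and standard error from the logit output in this talk; the function name is an assumption made for the example.

```python
import math

def wald_single(coef, se):
    """Wald z, chi-square (1 df), and two-sided p-value for H0: beta = 0."""
    z = coef / se
    chi2 = z * z
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, chi2, p

# drug coefficient and standard error from the logit output in this talk
z, chi2, p = wald_single(-1.150009, 0.5549529)
print(f"z = {z:.2f}, chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

The z and p values agree with the z and P>|z| columns that Stata prints for drug.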
LR (Likelihood-Ratio) Tests
lrtest [, saving(name) using(name) model(name) df(#) ]
For example,
(a) logit chd age age2 sex     estimate saturated model
(b) lrtest, saving(0)          save results
(c) logit chd age sex          estimate simpler model
(d) lrtest                     obtain test
(e) lrtest, saving(1)          save results as 1
(f) logit chd sex              estimate simplest model
(g) lrtest                     compare to saturated model
(h) lrtest, using(1)           compare to model 1
(i) lrtest, model(1)           repeat earlier test
. logit died studytime age drug

Logit estimates                                   Number of obs   =         48
                                                  LR chi2(3)      =      13.67
                                                  Prob > chi2     =     0.0034
Log likelihood = -24.364293                       Pseudo R2       =     0.2191

------------------------------------------------------------------------------
        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   studytime |  -.0236468   .0457671    -0.52   0.605    -.1133487    .0660551
         age |   .0793438   .0699391     1.13   0.257    -.0577343    .2164219
        drug |  -1.150009   .5549529    -2.07   0.038    -2.237697   -.0623212
       _cons |  -1.113136   3.945369    -0.28   0.778    -8.845918    6.619645
------------------------------------------------------------------------------

. lrtest, saving(0)
. logit died studytime age

Iteration 0:   log likelihood = -31.199418
Iteration 1:   log likelihood =  -26.82757
Iteration 2:   log likelihood = -26.734502
Iteration 3:   log likelihood = -26.734061
Iteration 4:   log likelihood = -26.734061

Logit estimates                                   Number of obs   =         48
                                                  LR chi2(2)      =       8.93
                                                  Prob > chi2     =     0.0115
Log likelihood = -26.734061                       Pseudo R2       =     0.1431

------------------------------------------------------------------------------
        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   studytime |  -.0843475   .0353784    -2.38   0.017     -.153688    -.015007
         age |   .0518897   .0646409     0.80   0.422    -.0748042    .1785836
       _cons |    -.87332   3.729449    -0.23   0.815    -8.182906    6.436266
------------------------------------------------------------------------------

. lrtest
Logit: likelihood-ratio test                      chi2(1)     =       4.74
                                                  Prob > chi2 =     0.0295

. lrtest, saving(1)
. logit died age

Iteration 0:   log likelihood = -31.199418
Iteration 1:   log likelihood = -29.955649
Iteration 2:   log likelihood = -29.945382
Iteration 3:   log likelihood = -29.945379

Logit estimates                                   Number of obs   =         48
                                                  LR chi2(1)      =       2.51
                                                  Prob > chi2     =     0.1133
Log likelihood = -29.945379                       Pseudo R2       =     0.0402

------------------------------------------------------------------------------
        died |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0893535   .0585925     1.52   0.127    -.0254857    .2041928
       _cons |  -4.353928   3.238757    -1.34   0.179    -10.70177    1.993919
------------------------------------------------------------------------------

. lrtest
Logit: likelihood-ratio test                      chi2(2)     =      11.16
                                                  Prob > chi2 =     0.0038

. lrtest, using(1)
Logit: likelihood-ratio test                      chi2(1)     =       6.42
                                                  Prob > chi2 =     0.0113
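The three likelihood-ratio statistics in this output can be checked by hand: each is twice the difference between the log likelihoods of the larger and the smaller model, referred to a chi-square distribution with degrees of freedom equal to the number of dropped variables. The Python sketch below is an illustration, not Stata's implementation; the helper names are made up, and the tail-probability function only covers the 1 and 2 degree-of-freedom cases needed here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def lr_test(ll_full, ll_reduced, df):
    """Likelihood-ratio statistic and p-value for two nested models."""
    stat = 2 * (ll_full - ll_reduced)
    return stat, chi2_sf(stat, df)

# Log likelihoods copied from the three logit outputs in this talk.
ll_m0 = -24.364293  # died on studytime, age, drug
ll_m1 = -26.734061  # died on studytime, age
ll_m2 = -29.945379  # died on age

results = {
    "m1 vs m0": lr_test(ll_m0, ll_m1, df=1),
    "m2 vs m0": lr_test(ll_m0, ll_m2, df=2),
    "m2 vs m1": lr_test(ll_m1, ll_m2, df=1),
}
for name, (stat, p) in results.items():
    print(f"{name}: chi2 = {stat:.2f}, p = {p:.4f}")
```

The three comparisons correspond to lrtest after the second model, lrtest after the third model, and lrtest, using(1).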
Fit Statistics
• fitstat calculates a large number of fit statistics for many kinds of
regression models. It works after the following: clogit, cnreg, cloglog,
intreg, logistic, logit, mlogit, nbreg, ocratio, ologit, oprobit, poisson,
probit, regress, zinb, and zip. With the saving() and using() options, it
can also be used to compare fit measures for two different models.
fitstat [, saving(name) using(name) bic force save dif]
Examples:
(a) logit y x1 x2 … x10
fitstat                        fit statistics
(b) To compute, save, and compare with other models
logit y x1 x2 x3 x4 x5 age
quietly fitstat, saving(mod1)
generate age2=age*age
logit y x1 x2 x3 x4 x5 age age2
fitstat, using(mod1)
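Two of the simpler fit measures can be reproduced from log likelihoods alone. The Python sketch below computes McFadden's pseudo R2 (the Pseudo R2 that logit prints) and a textbook AIC for the three logit died models shown earlier in this talk. It assumes the iteration 0 log likelihood is the intercept-only log likelihood, and the AIC formula used here is the common -2lnL + 2k; fitstat's own output may scale or define its measures differently.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(intercept only)."""
    return 1 - ll_model / ll_null

def aic(ll_model, n_params):
    """Textbook AIC: -2 lnL + 2k (an assumption; other scalings exist)."""
    return -2 * ll_model + 2 * n_params

ll_null = -31.199418  # iteration 0 log likelihood from the logit output
models = {  # name: (log likelihood, number of parameters including the constant)
    "studytime age drug": (-24.364293, 4),
    "studytime age": (-26.734061, 3),
    "age": (-29.945379, 2),
}

fit = {name: (mcfadden_r2(ll, ll_null), aic(ll, k)) for name, (ll, k) in models.items()}
for name, (r2, a) in fit.items():
    print(f"{name:20s} pseudo R2 = {r2:.4f}  AIC = {a:.2f}")
```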
Post-Estimation Approach to Interpret Non-Linear Regression Models
• For non-linear regression models, individual coefficients
do not have a simple linear interpretation. For example,
a coefficient in a logistic regression model can be read
directly only as an effect on the logit (log-odds). If we
want to interpret the model in terms of predicted
probabilities, the effect of a change in a variable depends
on the values of all variables in the model. To put it
differently, it depends on where we evaluate the effect.
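To make the "where we evaluate" point concrete, the marginal effect of x on P(Y=1) in a logit model is β·p·(1-p), so it changes with the predicted probability p. The Python sketch below evaluates this at three ages using the coefficients from the logit died age output in this talk; the ages themselves are illustrative choices and the function names are made up.

```python
import math

def inv_logit(xb):
    """Logistic function: maps the linear predictor to a probability."""
    return 1 / (1 + math.exp(-xb))

def marginal_effect(beta, xb):
    """dP/dx for a logit model: beta * p * (1 - p)."""
    p = inv_logit(xb)
    return beta * p * (1 - p)

# Coefficients from the `logit died age` output in this talk.
b_age, b_cons = 0.0893535, -4.353928

effects = {}
for age in (40, 50, 60):  # illustrative ages, not taken from the slides
    xb = b_cons + b_age * age
    effects[age] = marginal_effect(b_age, xb)
    print(f"age {age}: P = {inv_logit(xb):.3f}, dP/d(age) = {effects[age]:.4f}")
```

The same logit coefficient produces a different change in probability at each age, with the largest effect where the predicted probability is closest to 0.5.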
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (1) Use predict command
regress y x1 x2 x3
predict                        generate predicted y
logit y x1 x2 x3
predict                        generate predicted P(Y=1)
ologit y x1 x2 x3
predict                        generate predicted P(Y=k)
mlogit y x1 x2 x3
predict                        generate predicted P(Y=k)
poisson y x1 x2 x3
predict                        generate predicted count
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (2) Use prchange command to compute discrete and marginal
changes in the predicted outcomes
prchange [varlist] [if exp] [in range]
[, x(variables_and_values) rest(stat) outcome(#)
fromto brief nobase nolabel help all uncentered
delta(#)]
Examples:
(a) prchange age, x(x1=20 x2=10) rest(mean) help
(b) prchange, help
(c) prchange x1 x2, fromto
This reports the change in the predicted outcome as x goes from its minimum
to its maximum, from 0 to 1, from x-.5 to x+.5, and from x-.5 sd to x+.5 sd,
plus the marginal effect.
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (3) Use prvalue command to calculate the change in probability
for a discrete change of any magnitude in an independent variable.
prvalue [if exp] [in range] [, x(variables_and_values) rest(stat)
level(#) save dif brief all maxcnt(#) nobase nolabel
ystar]
Examples:
(a) prvalue, rest(median)
(b) prvalue, x(age=30) save brief
prvalue, x(age=40) dif brief
(c) prchange age, x(age=30) uncentered delta(10)
rest(mean) brief
Both (b) and (c) give the change in probability (P(Y=1)) from age 30 to
age 40; (c) uses prchange, since uncentered and delta(#) are prchange options.
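The quantity being reported here is simply the difference between two predicted probabilities. As a hedged illustration in Python, the sketch below reuses the coefficients from the logit died age output earlier in this talk (not the model behind the slide's example, which is not shown) and computes the change in P(Y=1) as age moves from 30 to 40.

```python
import math

def predicted_prob(age, b_age=0.0893535, b_cons=-4.353928):
    """P(died = 1) from the `logit died age` coefficients shown earlier."""
    return 1 / (1 + math.exp(-(b_cons + b_age * age)))

p30 = predicted_prob(30)
p40 = predicted_prob(40)
change = p40 - p30  # discrete change in probability for a 10-year increase

print(f"P(age=30) = {p30:.3f}, P(age=40) = {p40:.3f}, change = {change:.3f}")
```

Whether ages 30 and 40 fall inside the range of that sample is not stated in the slides, so the numbers should be read as arithmetic only.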
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (4) Use prgen command to compute predicted values as one variable
changes over a range of values, which is useful for constructing plots.
The syntax is:
prgen varname, generate(newvar) [from(#) to(#) ncases(#)]
[x(variables_and_values)] [rest(stat)] [maxcnt(#)]
[brief] [nobase] [all]
Examples:
To compute predicted values from an ordered probit where warm has four
categories SD, D, A and SA:
. oprobit warm yr89 male white age ed prst
. prgen age, f(20) t(80) gen(mn)
. prgen age, x(male=0) rest(grmean) f(20) t(80) gen(fem)
. prgen age, x(male=1) rest(grmean) f(20) t(80) gen(mal)
To plot the predicted probabilities for average males:
. graph malp1 malp2 malp3 malp4 malX
Post-Estimation Approach to Interpret Non-Linear Regression Models
Models and Predictions (* is the prefix)
all models:
*X: value of X
logit & probit:
Predicted probability of each outcome: *p0, *p1
ologit, oprobit:
Predicted probabilities: *p#1,*p#2,... where #1,#2,... are values of the outcome
variable.
Cumulative probabilities: *s#1,*s#2,... where #1,#2,... are values of the outcome
variable. *s#k is the probability of all categories up to or equal to #k.
mlogit:
Predicted probabilities: *p#1,*p#2,... where #1,#2,... are values of the outcome
variable.
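The relationship between the cumulative (*s) and per-category (*p) probabilities for an ordered model can be sketched as follows. This Python example assumes an ordered logit in which P(Y <= k) is the logistic function of (cutpoint k minus the linear predictor); the linear predictor and cutpoints are hypothetical, and the function names are made up for the illustration.

```python
import math

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

def ordered_logit_probs(xb, cutpoints):
    """Cumulative and per-category probabilities for an ordered logit."""
    # P(Y <= k) for each cutpoint, plus 1.0 for the last category
    cumulative = [inv_logit(c - xb) for c in cutpoints] + [1.0]
    # P(Y = k) is the difference between adjacent cumulative probabilities
    probs = [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]
    return cumulative, probs

# Hypothetical linear predictor and cutpoints for a four-category outcome.
cum, probs = ordered_logit_probs(xb=0.4, cutpoints=[-1.0, 0.5, 2.0])
for k, (s, p) in enumerate(zip(cum, probs), start=1):
    print(f"category {k}: p = {p:.3f}, s = {s:.3f}")
```

Repeating the calculation over a grid of values for one variable gives the kind of series that can then be plotted against *X.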
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (5) Use prtab command to construct a table of predicted values for all
combinations of up to three variables. The syntax is:
prtab rowvar [colvar [supercolvar]] [if exp] [in range]
[, by(superrowvar) x(variables_and_values) rest(stat)
outcome(string) brief nobase nolabel novarlbl all]
Examples:
(a) probit faculty female fellow phd mcit3 mnas
prtab female fellow mnas
(b) ologit jobclass female fellow pub1 phd
prtab female fellow, x(phd=min)
(c) logit died female race age educ
prtab female race educ
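The idea behind such a table can be sketched in a few lines: compute the predicted probability for every combination of the row and column variables while holding the remaining variables at fixed values. The Python below is an illustration only; every coefficient and the held value for phd are hypothetical and do not come from the slides.

```python
import math
from itertools import product

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical logit coefficients; none of these numbers come from the slides.
coefs = {"_cons": -1.0, "female": 0.6, "fellow": 0.9, "phd": 0.3}
phd_value = 3.0  # value at which the remaining variable is held

table = {}
for female, fellow in product((0, 1), repeat=2):
    xb = (coefs["_cons"] + coefs["female"] * female
          + coefs["fellow"] * fellow + coefs["phd"] * phd_value)
    table[(female, fellow)] = inv_logit(xb)

print("female fellow   P(y=1)")
for (female, fellow), p in table.items():
    print(f"{female:>6} {fellow:>6} {p:>8.3f}")
```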
Post-Estimation Approach to Interpret Non-Linear Regression Models
• (6) Use the mfx compute command, which numerically calculates
the marginal effects or the elasticities and their standard errors after
estimation. Exactly what mfx can calculate is determined by the
previous estimation command and the predict() option. The points at
which the marginal effects or elasticities are evaluated are
determined by the at() option. By default, mfx calculates the marginal
effects or elasticities at the means of the independent variables, using
the default prediction option associated with the preceding estimation
command.
Post-Estimation Approach to Interpret Non-Linear Regression Models
Examples
(a) logit foreign mpg price
mfx compute
mfx, at(mpg = 20, price = 6000)
mfx compute, predict(xb)
mfx replay, level(90)
(b) mlogit rep78 mpg displ, nolog
mfx compute, predict(outcome(1))
(c) regress mpg length weight
mfx compute, eyex
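A numerical marginal effect can be approximated by nudging one variable up and down at the evaluation point and taking the slope of the prediction; an elasticity then rescales that slope by x/y. The Python sketch below uses a central finite difference as one plausible way to do this, which is not necessarily the algorithm mfx uses. The coefficients are hypothetical; only the evaluation point mirrors the at(mpg = 20, price = 6000) example.

```python
import math

def predict_prob(x, coefs):
    """P(y = 1) from a logit with coefficients keyed by variable name."""
    xb = coefs["_cons"] + sum(coefs[name] * value for name, value in x.items())
    return 1 / (1 + math.exp(-xb))

def numeric_dydx(var, at, coefs, h=1e-5):
    """Central finite-difference derivative of the prediction with respect to var."""
    up = dict(at, **{var: at[var] + h})
    down = dict(at, **{var: at[var] - h})
    return (predict_prob(up, coefs) - predict_prob(down, coefs)) / (2 * h)

# Hypothetical coefficients; the evaluation point mirrors at(mpg = 20, price = 6000).
coefs = {"_cons": -7.0, "mpg": 0.25, "price": 0.0003}
at = {"mpg": 20.0, "price": 6000.0}

p = predict_prob(at, coefs)
dydx = numeric_dydx("mpg", at, coefs)   # numerical marginal effect
analytic = coefs["mpg"] * p * (1 - p)   # closed form for a logit, as a check
eyex = dydx * at["mpg"] / p             # elasticity: d(ln y) / d(ln x)

print(f"P = {p:.4f}, dy/dx = {dydx:.5f} (analytic {analytic:.5f}), ey/ex = {eyex:.3f}")
```

For a logit the closed form is available, so the numerical result can be checked directly; the numerical approach matters more for predictions without a convenient derivative.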