Logit, Probit and Tobit: Models for Categorical and Limited Dependent Variables
By Rajulton Fernando
Presented at PLCS/RDC Statistics and Data Series at Western, March 23, 2011

Introduction
• In social science research, categorical data are often collected through surveys.
– Categorical → nominal and ordinal variables
– They take only a few values that do NOT have a metric.

A) Binary Case
• Many dependent variables of interest take only two values (a dichotomous variable), denoting an event or non-event and coded as 1 and 0 respectively. Some examples:
– The labor force status of a person.
– Voting behavior of a person (in favor of a new policy).
– Whether a person got married or divorced.
– Whether a person was involved in criminal behaviour, etc.

Introduction
• With such variables, we can build models that describe the response probabilities, say P(y_i = 1), of the dependent variable y_i.
– For a sample of N independently and identically distributed observations i = 1, ..., N and a (K+1)-dimensional vector x_i of explanatory variables, the probability that y takes value 1 is modeled as

    P(y_i = 1 \mid x_i) = F(x_i'\beta) = F(z_i)

  where \beta is a (K+1)-dimensional column vector of parameters.
• The transformation function F is crucial. It maps the linear combination into [0,1] and in general satisfies F(-\infty) = 0, F(+\infty) = 1, and \partial F(z)/\partial z > 0 [that is, F is a cumulative distribution function].

The Logit and Probit Models
• When the transformation function F is the logistic function, the response probabilities are given by

    P(y_i = 1 \mid x_i) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}

• And when the transformation function F is the cumulative distribution function (cdf) of the standard normal distribution, the response probabilities are given by

    P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) = \int_{-\infty}^{x_i'\beta} \phi(s)\,ds = \int_{-\infty}^{x_i'\beta} \frac{1}{\sqrt{2\pi}} e^{-s^2/2}\,ds

• The logit and probit models are almost identical (see the figure on the next slide), and the choice between them is arbitrary, although the logit model has certain advantages (simplicity and ease of interpretation).

[Figure: logistic and standard normal response curves plotted together; the two are nearly indistinguishable. Source: J.S. Long, 1997]

The Logit and Probit Models
• However, the parameters of the two models are scaled differently. The parameter estimates in a logistic regression tend to be 1.6 to 1.8 times higher than they are in a corresponding probit model.
• The probit and logit models are estimated by maximum likelihood (ML), assuming independence across observations. The ML estimator of \beta is consistent and asymptotically normally distributed. However, the estimation rests on the strong assumption that the latent error term is normally distributed and homoscedastic. If homoscedasticity is violated, there is no easy solution.

The Logit and Probit Models
• Note: The response function (logistic or probit) is an S-shaped function, which implies that a fixed change in X has a smaller impact on the probability when the probability is near zero or one than when it is near the middle. Thus, it is a non-linear response function.
• How to interpret the coefficients: In both models,
    If b > 0 → p increases as X increases
    If b < 0 → p decreases as X increases
– As mentioned above, b cannot be interpreted as a simple slope as in ordinary regression, because the rate at which the curve ascends or descends changes according to the value of X.
– In other words, it is not a constant change as in ordinary regression. → The greatest rate of change is at p = 0.5.
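Before turning to interpretation in detail, here is a minimal Stata sketch of fitting the same binary model under both links. It is an illustration, not code from the original slides; it assumes the womenwk.dta dataset used in the illustrations later on, where work is the 0/1 outcome and age, education, married and children are the predictors.

* Load Baum's womenwk.dta (also used in the illustrations below)
use http://www.stata-press.com/data/imeus/womenwk.dta, clear

* Logit: coefficients are effects on the log-odds of work = 1
logit work age education married children

* Same model in odds-ratio form (exponentiated coefficients, e^b)
logit work age education married children, or

* Probit: coefficients are effects on the z-score (probit) of work = 1
probit work age education married children

The logit coefficients from the first command should come out roughly 1.6 to 1.8 times the probit coefficients, as noted above.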
The Logit and Probit Models
– In the logit model, we can interpret b as an effect on the odds. That is, every unit increase in X results in a multiplicative effect of e^b on the odds. Example: If b = 0.25, then e^0.25 = 1.28. Thus, when X changes by one unit, the odds increase by a factor of 1.28, or by 28%.
– In the probit model, use the z-score terminology. For every unit increase in X, the z-score (or the probit of "success") increases by b units. [Or, we can also say that an increase in X changes Z by b standard deviation units.]
– If you like, you can convert the z-score to probabilities using the normal table.

Models for Polytomous Data
• B) Polytomous Case
– Here we need to distinguish between purely nominal variables and really ordinal variables.
– When the variable is purely nominal, we can extend the dichotomous logit model, using one of the categories as reference and modeling the other responses j = 1, 2, ..., m-1 against the reference.
• Example: In the case of 3 categories, using the 3rd category as the reference, logit p_1 = ln(p_1/p_3) and logit p_2 = ln(p_2/p_3), which will give two sets of parameter estimates:

    P(y = 1) = \frac{\exp(x'\beta_1)}{1 + \exp(x'\beta_1) + \exp(x'\beta_2)}

    P(y = 2) = \frac{\exp(x'\beta_2)}{1 + \exp(x'\beta_1) + \exp(x'\beta_2)}

    P(y = 3) = \frac{1}{1 + \exp(x'\beta_1) + \exp(x'\beta_2)}

Polytomous Case
– When the variable is really ordinal, we use cumulative logits (or probits). The logits in this model are for cumulative categories at each point, contrasting categories above with categories below.
– Example: Suppose Y has 4 categories; then

    logit(p_1) = ln{p_1 / (1 - p_1)} = a_1 + bX
    logit(p_1 + p_2) = ln{(p_1 + p_2) / (1 - p_1 - p_2)} = a_2 + bX
    logit(p_1 + p_2 + p_3) = ln{(p_1 + p_2 + p_3) / (1 - p_1 - p_2 - p_3)} = a_3 + bX

– Since these are cumulative logits, the probabilities are attached to being in category j or lower.
– Since the right-hand side changes only in the intercepts, not in the slope coefficient, this model is known as the proportional odds model. Thus, in ordered logistic regression, we need to test the assumption of proportionality as well.

Ordinal Logistic
– a_1, a_2, a_3, ... are the "intercepts" that satisfy the property a_1 < a_2 < a_3 < ...; they are interpreted as "thresholds" of the latent variable.
– Interpretation of parameter estimates depends on the software used! Check the software manual.
• If the RHS = a + bX, a positive coefficient is associated more with lower-ordered categories and a negative coefficient is associated more with higher-ordered categories.
• If the RHS = a − bX, a negative coefficient is associated more with lower-ordered categories, and a positive coefficient is associated more with higher-ordered categories.
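A minimal Stata sketch of both polytomous models follows. The outcome and predictor names here (y3cat, y4cat, x1, x2) are hypothetical placeholders, not variables from these slides; only the command forms are the point.

* Nominal outcome with 3 categories: multinomial logit, 3rd category as reference
mlogit y3cat x1 x2, baseoutcome(3)

* Ordinal outcome: proportional odds (cumulative logit) model
ologit y4cat x1 x2

* The proportional odds assumption can be checked with, e.g., the
* user-written -brant- command from Long and Freese's spost package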
Model for Limited Dependent Variable
• C) Tobit Model
• This model is for a metric dependent variable that is "limited" in the sense that we observe it only if it is above or below some cut-off level. For example:
– wages may be limited from below by the minimum wage
– the donation amount given to charity
– "top coding" of income at, say, $300,000
– time use and leisure activity of individuals
– extramarital affairs
• It is also called the censored regression model. Censoring can be from below or from above, also called left and right censoring. [Do not confuse the term "censoring" with the one used in dynamic modeling.]

The Tobit Model
• The model is called Tobit because it was first proposed by Tobin (1958) and involves aspects of probit analysis; the term was coined by Goldberger for "Tobin's probit".
• Reasoning behind it:
– If we include the censored observations as y = 0, the censored observations on the left will pull down the end of the line, resulting in underestimates of the intercept and overestimates of the slope.
– If we exclude the censored observations and just use the observations for which y > 0 (that is, truncating the sample), it will overestimate the intercept and underestimate the slope.
– The degree of bias in both will increase as the number of observations that take on the value of zero increases. (See the figure on the next slide.)

[Figure: OLS fits on censored and truncated samples versus the latent regression line. Source: J.S. Long]

The Tobit Model
• The Tobit model uses all of the information, including the information on censoring, and provides consistent estimates.
• It is also a nonlinear model, similar to the probit model, and it is estimated using maximum likelihood techniques. With left censoring at zero, the likelihood function for the tobit model takes the form

    L = \prod_{y_i > 0} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) \prod_{y_i = 0} \left[1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\right]

• This is an unusual function: it consists of two terms, the first for non-censored observations (it is the pdf) and the second for censored observations (it is the cdf).

The Tobit Model
• The estimated tobit coefficients are the marginal effects of a change in x_j on y*, the unobservable latent variable, and can be interpreted in the same way as in a linear regression model.
• But such an interpretation may not be useful, since we are interested in the effect of X on the observable y (or the change in the censored outcome).
– It can be shown that the change in y is found by multiplying the coefficient by Pr(a < y* < b), that is, the probability of being uncensored. Since this probability is a fraction, the marginal effect is actually attenuated.
– In the above, a and b denote the lower and upper censoring points. For example, in left censoring, the limits will be a = 0, b = +∞.
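To make the attenuation explicit, here is a short worked statement of that result for left censoring at zero (a standard tobit decomposition, often attributed to McDonald and Moffitt; the derivation is supplied here, not taken from the original slides):

    E[y \mid x] = \Phi\!\left(\frac{x'\beta}{\sigma}\right) x'\beta + \sigma\,\phi\!\left(\frac{x'\beta}{\sigma}\right)

    \frac{\partial E[y \mid x]}{\partial x_j} = \beta_j\,\Phi\!\left(\frac{x'\beta}{\sigma}\right) = \beta_j \Pr(y^* > 0 \mid x)

Since \Phi(\cdot) < 1, the marginal effect on the observed outcome is a shrunken version of \beta_j, which is exactly the attenuation described above.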
Illustrations for logit, probit and tobit models, using womenwk.dta from Baum, available at http://www.stata-press.com/data/imeus/womenwk.dta

Descriptive Statistics
                        N    Minimum   Maximum      Mean   Std. Deviation
age                  2000        20        59     36.21         8.287
education            2000        10        20     13.08         3.046
married              2000         0         1       .67          .470
children             2000         0         5      1.64         1.399
wagefull             2000     -1.68     45.81   21.3118       7.01204
wage                 1343      5.88     45.81   23.6922       6.30537
lw                   1343      1.77      3.82    3.1267        .28651
work                 2000         0         1       .67          .470
lwf                  2000       .00      3.82    2.0996       1.48752
Valid N (listwise)   1343

Binary Logistic Regression

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1           2055.829a              .212                  .295
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1         6.491      8   .592

Variables in the Equation
                     B     S.E.      Wald   df   Sig.   Exp(B)
Step 1a  age       .058    .007    64.359    1   .000    1.060
         education .098    .019    27.747    1   .000    1.103
         married   .742    .126    34.401    1   .000    2.100
         children  .764    .052   220.110    1   .000    2.148
         Constant -4.159   .332   156.909    1   .000     .016
a. Variable(s) entered on step 1: age, education, married, children.

Binary Probit Regression
(In SPSS, use the ordinal regression menu and select the probit link function. Ignore the test of parallel lines, etc.)

Model Fitting Information
Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only        1645.024
Final                 1166.702         478.322     4   .000
Link function: Probit.

Parameter Estimates
                        Estimate   Std. Error     Wald   df   Sig.   95% CI Lower   95% CI Upper
Threshold [work = 0]      2.037        .209     94.664    1   .000        1.626          2.447
Location  age              .035        .004     67.301    1   .000         .026           .043
          education        .058        .011     28.061    1   .000         .037           .080
          children         .447        .029    243.907    1   .000         .391           .503
          [married=0]     -.431        .074     33.618    1   .000        -.577          -.285
          [married=1]        0a          .           .    0
Link function: Probit.
a. This parameter is set to zero because it is redundant.

Tobit regression cannot be done in SPSS. Use Stata. Here are the Stata commands.

First, fit a simple OLS regression of the variable lwf (just to check):

. regress lwf age married children education

      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  4,  1995) =  134.21
       Model |  937.873188     4  234.468297           Prob > F      =  0.0000
    Residual |  3485.34135  1995  1.74703827           R-squared     =  0.2120
-------------+------------------------------           Adj R-squared =  0.2105
       Total |  4423.21454  1999  2.21271363           Root MSE      =  1.3218

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0363624    .003862     9.42   0.000     .0287885    .0439362
     married |   .3188214   .0690834     4.62   0.000     .1833381    .4543046
    children |   .3305009   .0213143    15.51   0.000     .2887004    .3723015
   education |   .0843345   .0102295     8.24   0.000     .0642729    .1043961
       _cons |  -1.077738   .1703218    -6.33   0.000    -1.411765   -.7437105
------------------------------------------------------------------------------

. tobit lwf age married children education, ll(0)

Tobit regression                                  Number of obs   =      2000
                                                  LR chi2(4)      =    461.85
                                                  Prob > chi2     =    0.0000
Log likelihood = -3349.9685                       Pseudo R2       =    0.0645

------------------------------------------------------------------------------
         lwf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .052157   .0057457     9.08   0.000     .0408888    .0634252
     married |   .4841801   .1035188     4.68   0.000     .2811639    .6871964
    children |   .4860021   .0317054    15.33   0.000     .4238229    .5481812
   education |   .1149492   .0150913     7.62   0.000     .0853529    .1445454
       _cons |  -2.807696   .2632565   -10.67   0.000    -3.323982   -2.291409
-------------+----------------------------------------------------------------
      /sigma |   1.872811    .040014                      1.794337    1.951285
------------------------------------------------------------------------------
  Obs. summary:        657  left-censored observations at lwf<=0
                      1343     uncensored observations
                         0  right-censored observations
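The marginal effects below are computed with the older mfx command. In Stata 11 and later, the same quantities would normally be obtained with margins; here is a minimal sketch of the equivalent calls (an assumption about the modern syntax, not part of the original slides):

* After the tobit fit above: marginal effects on Pr(lwf > 0)
margins, dydx(*) predict(pr(0, .))

* Marginal effects on the conditional expectation E(lwf | lwf > 0)
margins, dydx(*) predict(e(0, .))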
. mfx compute, predict(pr(0,.))

Marginal effects after tobit
      y  = Pr(lwf>0) (predict, pr(0,.))
         =  .81920975

------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0073278      .00083    8.84   0.000   .005703  .008952    36.208
married* |   .0706994      .01576    4.48   0.000   .039803  .101596     .6705
children |   .0682813      .00479   14.26   0.000   .058899  .077663    1.6445
educat~n |   .0161499      .00216    7.48   0.000   .011918  .020382    13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

. mfx compute, predict(e(0,.))

Marginal effects after tobit
      y  = E(lwf|lwf>0) (predict, e(0,.))
         =  2.3102021

------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
     age |   .0314922      .00347    9.08   0.000   .024695   .03829    36.208
married* |   .2861047      .05982    4.78   0.000   .168855  .403354     .6705
children |   .2934463      .01908   15.38   0.000   .256041  .330852    1.6445
educat~n |   .0694059      .00912    7.61   0.000   .051531  .087281    13.084
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
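As a closing check, the decomposition given earlier ties these outputs together: the marginal effect on the censored outcome itself is the tobit coefficient times the probability of being uncensored. For age, that is .052157 × .8192 ≈ .0427, using Pr(lwf>0) = .81920975 from the first mfx output above. A sketch of the corresponding command, assuming the ystar() predict option for the censored outcome:

* Marginal effects on the unconditional expectation of the censored outcome,
* E(lwf) with censoring at zero; each dy/dx should equal the tobit
* coefficient times Pr(lwf > 0) evaluated at the means (about .8192)
mfx compute, predict(ystar(0,.))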