Adnan Kasman
Dept. of Economics,
Faculty of Business,
Dokuz Eylul University
Course: Econometrics II
Dummy Dependent Variables Models
In this chapter we introduce models that are designed to deal with situations in which our
dependent variable is a dummy variable. That is, it assumes either the value 0 or the value 1. Such
models are very useful in that they allow us to address questions for which there is a “yes or no”
answer.
1. Linear Probability Model
In the case of a dummy dependent variable model we have:
$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$$
where $y_i = 0$ or $1$ and $E(\varepsilon_i) = 0$.
What would happen if we simply estimated the slope coefficients of this model using
OLS? What would the coefficients mean? Would they be unbiased? Are they efficient?
A regression model in which the dependent variable takes on the two values 0 and 1 is called a linear probability model. To see its properties, note the following.
a) Since the mean error is zero, we know that $E(y_i) = \beta_1 + \beta_2 x_i$.
b) Now, if we define $p_i = \text{prob}(y_i = 1)$ and $1 - p_i = \text{prob}(y_i = 0)$, then $E(y_i) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i$. Therefore, our model is $p_i = \beta_1 + \beta_2 x_i$, and the estimated slope coefficients tell us the impact of a unit change in the explanatory variable on the probability that $y_i = 1$.
c) The predicted values from the regression model, $\hat{p}_i = b_1 + b_2 x_i$, provide predictions, based on chosen values of the explanatory variables, of the probability that $y_i = 1$. There is, however, nothing in the estimation strategy to constrain these predictions from being negative or larger than 1, which is clearly an unfortunate characteristic of the approach.
d) Since $E(\varepsilon_i) = 0$ and the errors are uncorrelated with the explanatory variables (by assumption), it is easy to show that the OLS estimators are unbiased. The errors, however, are heteroscedastic. A simple way to see this is to consider an example. Suppose that the dependent variable takes the value 1 if the individual buys a Rolex watch and 0 otherwise. Also, suppose the explanatory variable is income. For low levels of income it is likely that all of the observations are zeros; in this case, there would be no scatter around the line. For higher levels of income there would be some zeros and some ones, that is, some scatter around the line. Thus, the errors are heteroscedastic. This suggests two empirical strategies. First, since the OLS estimators are unbiased but yield incorrect standard errors, we might simply use OLS and then apply the White correction to produce correct standard errors. Second, since the error variance is $p_i(1 - p_i)$, we could instead use weighted least squares, weighting each observation by $1/\sqrt{\hat{p}_i(1 - \hat{p}_i)}$. A sketch of the first strategy follows.
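A minimal sketch of the first strategy, using simulated data (the variables and values are illustrative assumptions, not from the text):

```python
import numpy as np
import statsmodels.api as sm

# Simulated example: a purchase dummy whose probability rises with income
rng = np.random.default_rng(42)
income = rng.uniform(10, 100, 500)         # income in $1000s (assumed)
p_true = np.clip(0.01 * income - 0.1, 0, 1)
buy = rng.binomial(1, p_true)              # dummy dependent variable

X = sm.add_constant(income)
# OLS with White (heteroscedasticity-robust) standard errors
lpm = sm.OLS(buy, X).fit(cov_type="HC1")
print(lpm.summary())

# Fitted values are estimated probabilities; note they may leave [0, 1]
print("fitted range:", lpm.fittedvalues.min(), lpm.fittedvalues.max())
```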
2. Logit and Probit Models
One potential criticism of the linear probability model (beyond those mentioned above) is that the model assumes that the probability that $y_i = 1$ is linearly related to the explanatory variable(s). We might, however, expect the relation to be nonlinear. For example, increasing the income of the very poor or the very rich will probably have little effect on whether they buy an automobile. It could, however, have a nonzero effect on other income groups.
Two models that are nonlinear, yet provide predicted probabilities between 0 and 1, are the logit and probit models. The difference between the linear probability model and the nonlinear logit and probit models can be explained using an example. To motivate these models, suppose that our underlying dummy dependent variable depends on an unobserved ("latent") utility index $y^*$. For example, if the variable $y$ is discrete, taking the value 1 if someone buys a car and 0 otherwise, then we can imagine a continuous variable $y^*$ that reflects the person's desire to buy the car. It seems reasonable that $y^*$ would vary continuously with some explanatory variable like income.
More formally, suppose
$$y_i^* = \beta_1 + \beta_2 x_i + \varepsilon_i$$
and
$y_i = 1$ if $y_i^* > 0$ (i.e., the utility index is "high enough"),
$y_i = 0$ if $y_i^* \le 0$ (i.e., the utility index is not "high enough").
Then:
$$\begin{aligned}
p_i &= \text{prob}(y_i = 1) \\
&= \text{prob}(y_i^* > 0) \\
&= \text{prob}(\beta_1 + \beta_2 x_i + \varepsilon_i > 0) \\
&= \text{prob}(\varepsilon_i > -\beta_1 - \beta_2 x_i) \\
&= 1 - F(-\beta_1 - \beta_2 x_i) \quad \text{where } F \text{ is the c.d.f. of } \varepsilon \\
&= F(\beta_1 + \beta_2 x_i) \quad \text{if } F \text{ is symmetric}
\end{aligned}$$
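To see the latent-index logic at work, a small simulation (parameter values assumed purely for illustration) confirms that the frequency of $y_i = 1$ matches $F(\beta_1 + \beta_2 x)$ when the error is standard normal:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
beta1, beta2 = -2.0, 0.5         # assumed true parameters
x = 4.0                          # evaluate at a single x value
eps = rng.normal(size=100_000)   # symmetric error, so the F(.) form applies

y_star = beta1 + beta2 * x + eps # latent utility index
y = (y_star > 0).astype(int)     # observed dummy

print("empirical prob(y=1):", y.mean())
print("F(beta1+beta2*x):   ", norm.cdf(beta1 + beta2 * x))  # ~ same number
```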
Given this, our basic problem is selecting $F$, the cumulative distribution function for the error term. It is here that the logit and probit models differ. As a practical matter, we are likely interested in estimating the $\beta$'s in the model. This is typically done using a Maximum Likelihood Estimator (MLE). To outline the MLE in this context, recognize that each outcome $y_i$ has the density function $f(y_i) = p_i^{y_i}(1 - p_i)^{1 - y_i}$. That is, each $y_i$ takes on either the value 0 or 1, with probability $f(0) = 1 - p_i$ and $f(1) = p_i$. Then the likelihood function is:
$$\begin{aligned}
L &= f(y_1, y_2, \ldots, y_n) \\
&= f(y_1) f(y_2) \cdots f(y_n) \\
&= [p_1^{y_1}(1-p_1)^{1-y_1}][p_2^{y_2}(1-p_2)^{1-y_2}] \cdots [p_n^{y_n}(1-p_n)^{1-y_n}] \\
&= \prod_{i=1}^{n} p_i^{y_i}(1-p_i)^{1-y_i}
\end{aligned}$$
and
$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$$
which, given $p_i = F(\beta_1 + \beta_2 x_i)$, becomes
$$\ln L = \sum_{i=1}^{n} \left[ y_i \ln F(\beta_1 + \beta_2 x_i) + (1 - y_i) \ln\bigl(1 - F(\beta_1 + \beta_2 x_i)\bigr) \right]$$
Analytically, the next step would be to take the partial derivatives of the likelihood function with respect to the $\beta$'s, set them equal to zero, and solve for the MLEs. This could be a very messy calculation depending on the functional form of $F$. In practice, the computer solves this problem for us; a sketch is given below.
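To illustrate what the computer does, the following sketch maximizes the logit log-likelihood numerically on simulated data (the setup and values are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic c.d.f., 1/(1+exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
true_b = np.array([0.5, 1.5])                      # assumed true betas
y = rng.binomial(1, expit(true_b[0] + true_b[1] * x))

def neg_log_lik(b):
    p = expit(b[0] + b[1] * x)
    # ln L = sum[ y ln p + (1-y) ln(1-p) ]; we minimize its negative
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
print("MLEs:", res.x)   # should be close to the true (0.5, 1.5)
```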
2.1. Logit Model
For the logit model we specify
$$p(y_i = 1) = F(\beta_1 + \beta_2 x_i) = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x_i)}}$$
It can be seen that $p(y_i = 1) \to 0$ as $\beta_1 + \beta_2 x_i \to -\infty$, and $p(y_i = 1) \to 1$ as $\beta_1 + \beta_2 x_i \to +\infty$. Thus, unlike the linear probability model, probabilities from the logit will be between 0 and 1. A complication arises in interpreting the estimated $\beta$'s. In the case of a linear probability model, a coefficient $b$ measures the ceteris paribus effect of a change in the explanatory variable on the probability that $y$ equals 1. In the logit model we can see that
$$\frac{\partial\, \text{prob}(y_i = 1)}{\partial x_i} = \frac{\partial F(b_1 + b_2 x_i)}{\partial x_i} = \frac{b_2\, e^{-(b_1 + b_2 x_i)}}{[1 + e^{-(b_1 + b_2 x_i)}]^2}$$
Notice that the derivative is nonlinear and depends on the value of x. It is common to evaluate the
derivative at the mean of x so that a single derivative can be presented.
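A minimal sketch of this calculation, with hypothetical estimates $b_1$, $b_2$ and a hypothetical mean of $x$ (values illustrative only):

```python
import numpy as np

def logit_marginal_effect(b1, b2, x):
    """d prob(y=1)/dx = b2 * exp(-(b1+b2*x)) / (1 + exp(-(b1+b2*x)))**2"""
    z = b1 + b2 * x
    return b2 * np.exp(-z) / (1 + np.exp(-z)) ** 2

# hypothetical estimates, evaluated at the sample mean of x
b1, b2, x_mean = -1.0, 0.05, 10.0
print(logit_marginal_effect(b1, b2, x_mean))  # effect of a one-unit change in x
```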
Odds Ratio
$$p(y_i = 1) = F(\beta_1 + \beta_2 x_i) = \frac{1}{1 + e^{-(\beta_1 + \beta_2 x_i)}}$$
For ease of exposition, we write the above equation as
$$p_i = \frac{1}{1 + e^{-z_i}} = \frac{e^{z_i}}{1 + e^{z_i}}, \quad \text{where } z_i = \beta_1 + \beta_2 x_i.$$
To avoid the possibility that the predicted values might fall outside the probability interval of 0 to 1, we model the ratio $\frac{p_i}{1 - p_i}$. This ratio is the likelihood, or odds, of obtaining a successful outcome (the ratio of the probability that a family will own a car to the probability that it will not own a car).¹
$$\frac{p_i}{1 - p_i} = \frac{1 + e^{z_i}}{1 + e^{-z_i}} = e^{z_i}$$
¹ Odds refer to the ratio of the number of times a choice will be made to the number of times it will not. In today's world, odds are used most frequently with respect to sporting events, such as horse races, on which bets are made.
If we take the natural log of the above equation, we obtain
$$L_i = \ln\!\left(\frac{p_i}{1 - p_i}\right) = z_i = \beta_1 + \beta_2 x_i$$
That is, $L$, the log of the odds ratio, is not only linear in $x$ but also linear in the parameters. $L$ is called the logit, hence the name logit model.
The logit model cannot be estimated using OLS. Instead, we use the MLE discussed in the previous section, an iterative estimation technique that is especially useful for equations that are nonlinear in the coefficients. MLE is inherently different from least squares in that it chooses the coefficient estimates that maximize the likelihood of the sample data set being observed. Interestingly, OLS and MLE are not necessarily different: for a linear equation that meets the classical assumptions (including the normality assumption), the MLEs are identical to the OLS estimates.
Once the logit has been estimated, hypothesis testing and econometric analysis can be undertaken in much the same way as for linear equations. When interpreting coefficients, however, be careful to recall that they represent the impact of a one-unit increase in the independent variable in question, holding the other explanatory variables constant, on the log of the odds of a given choice, not on the probability itself. But we can always compute the probability at a given level of the variable in question, as sketched below.
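A sketch of that last step, converting the fitted log-odds at a chosen $x$ into a probability (the estimates below are hypothetical, not from any output in this chapter):

```python
import numpy as np

b1, b2 = -4.0, 0.08     # hypothetical logit estimates
x = 60.0                # level of the variable in question

log_odds = b1 + b2 * x          # L = ln(p/(1-p)) = b1 + b2*x
odds = np.exp(log_odds)         # p/(1-p)
p = odds / (1 + odds)           # invert the logit to get the probability
print(f"log-odds={log_odds:.3f}, odds={odds:.3f}, p={p:.3f}")
```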
2.2. Probit Model
In the case of the probit model, we assume that $\varepsilon_i \sim N(0, \sigma^2)$. That is, we assume the error in the utility index model is normally distributed. In this case,
$$p(y_i = 1) = F\!\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right)$$
where $F$ is the standard normal cumulative distribution function. That is,
$$p(y_i = 1) = F\!\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) = \int_{-\infty}^{(\beta_1 + \beta_2 x_i)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$
In practice, the c.d.f.s of the logit and the probit look quite similar to one another. Once again, calculating the derivative is moderately complicated. In this case,
$$\frac{\partial\, \text{prob}(y_i = 1)}{\partial x_i} = \frac{\partial}{\partial x_i} F\!\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) = f\!\left(\frac{\beta_1 + \beta_2 x_i}{\sigma}\right) \frac{\beta_2}{\sigma}$$
where $f$ is the density function of the normal distribution. As in the logit case, the derivative is nonlinear and is often evaluated at the mean of the explanatory variables. In the case of dummy explanatory variables, it is common to estimate the derivative as the probability that $y_i = 1$ when the dummy variable is 1 (other variables set to their means) minus the probability that $y_i = 1$ when the dummy variable is 0 (other variables set to their means). That is, you simply calculate how the predicted probability changes when the dummy variable of interest switches from 0 to 1; both calculations are sketched below.
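A sketch of both calculations with hypothetical probit estimates (all names and values below are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

# hypothetical probit estimates: intercept, continuous x, dummy d
b0, bx, bd = -1.2, 0.04, 0.6
x_mean, d_mean = 25.0, 0.4

# continuous variable: derivative f(b'x) * bx, evaluated at the means
z = b0 + bx * x_mean + bd * d_mean
print("marginal effect of x:", norm.pdf(z) * bx)

# dummy variable: prob(y=1 | d=1) - prob(y=1 | d=0), other variables at means
p1 = norm.cdf(b0 + bx * x_mean + bd * 1)
p0 = norm.cdf(b0 + bx * x_mean + bd * 0)
print("effect of dummy 0 -> 1:", p1 - p0)
```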
Which Is Better? Logit or Probit
Fortunately, from an empirical standpoint, logits and probits typically yield very similar
estimates of the relevant derivatives. This is because the cumulative distribution functions for the
logit and probit are similar, differing slightly only in the tails of their respective distributions.
Thus, the derivatives differ only if there are enough observations in the tails of the distribution. While the derivatives are usually similar, it is important to remember that the parameter estimates associated with logit and probit models are not. A simple approximation suggests that multiplying the logit estimates by 0.625 makes them comparable to the probit estimates.
Example: We estimate the relationship between the openness of a country ($Y$) and the country's per capita income in dollars ($X$) in 1992. We hypothesize that higher per capita income should be associated with free trade, and test this at the 5% significance level. The variable $Y$ takes the value 1 for free trade and 0 otherwise.
Since the dependent variable is a binary variable, we set up the index function
$$Y_i^* = \beta_1 + \beta_2 X_i + \varepsilon_i$$
If $Y^* > 0$, $Y = 1$ (open); if $Y^* \le 0$, $Y = 0$ (not open).
Probit estimation gives the following results:
Dependent Variable: Y
Method: ML - Binary Probit (Quadratic hill climbing)
Date: 05/27/04 Time: 13:54
Sample(adjusted): 1 20
Included observations: 20 after adjusting endpoints
Convergence achieved after 7 iterations
Covariance matrix computed using second derivatives
Variable                Coefficient     Std. Error      z-Statistic     Prob.
C                       -1.994184       0.824708        -2.418048       0.0156
X                        0.001003       0.000471         2.129488       0.0332

Mean dependent var        0.500000      S.D. dependent var        0.512989
S.E. of regression        0.337280      Akaike info criterion     0.886471
Sum squared resid         2.047636      Schwarz criterion         0.986045
Log likelihood           -6.864713      Hannan-Quinn criter.      0.905909
Restr. log likelihood    -13.86294      Avg. log likelihood      -0.343236
LR statistic (1 df)       13.99646      McFadden R-squared        0.504816
Probability(LR stat)      0.000183
The slope is significant at the 5% level.
The interpretation of $b_2$ changes in a probit model: $b_2$ is the effect of $X$ on $Y^*$. The marginal effect of $X$ on $p(Y_i = 1)$ is easier to interpret and is given by $f(b_1 + b_2 X) \cdot b_2$. Evaluated at the mean income of 3469.5:
$$f(-1.9942 + 0.001(3469.5)) \times (0.001) \approx 0.0001$$
To test the fit of the model (analogous to $R^2$), the maximized log-likelihood value ($\ln L$) can be compared to the maximized log-likelihood of a model with only a constant ($\ln L_0$) in the likelihood ratio index:
$$LRI = 1 - \frac{\ln L}{\ln L_0} = 1 - \frac{-6.8647}{-13.8629} \approx 0.50$$
Logit estimation gives the following results:
Dependent Variable: Y
Method: ML - Binary Logit (Quadratic hill climbing)
Date: 05/27/04 Time: 14:12
Sample(adjusted): 1 20
Included observations: 20 after adjusting endpoints
Convergence achieved after 7 iterations
Covariance matrix computed using second derivatives
Variable                Coefficient     Std. Error      z-Statistic     Prob.
C                       -3.604995       1.681068        -2.144467       0.0320
X                        0.001796       0.000900         1.995415       0.0460

Mean dependent var        0.500000      S.D. dependent var        0.512989
S.E. of regression        0.333745      Akaike info criterion     0.876647
Sum squared resid         2.004939      Schwarz criterion         0.976220
Log likelihood           -6.766465      Hannan-Quinn criter.      0.896084
Restr. log likelihood    -13.86294      Avg. log likelihood      -0.338323
LR statistic (1 df)       14.19296      McFadden R-squared        0.511903
Probability(LR stat)      0.000165
As you can see from the output, the slope coefficient is significant at the 5% level. The coefficients are proportionally larger in absolute value than in the probit model, but the marginal effects and significance should be similar.
$$\frac{\partial\, \text{prob}(y_i = 1)}{\partial x_i} = \frac{\partial F(b_1 + b_2 x_i)}{\partial x_i} = \frac{b_2\, e^{-(b_1 + b_2 x_i)}}{[1 + e^{-(b_1 + b_2 x_i)}]^2}$$
Evaluated at the mean income $\bar{X} = 3469.5$:
$$\frac{e^{3.605 - 0.0018(3469.5)}}{\bigl(1 + e^{3.605 - 0.0018(3469.5)}\bigr)^2}\,(0.0018) \approx 0.0001$$
This can be interpreted as the marginal effect of per capita income on the expected value of $Y$.
$$LRI = 1 - \frac{\ln L}{\ln L_0} = 1 - \frac{-6.7664}{-13.8629} \approx 0.51$$
Example: From the household budget survey of 1980 of the Dutch Central Bureau of Statistics, J.S. Cramer obtained the following logit model based on a sample of 2820 households. (The results given here are based on the method of maximum likelihood and are after the third iteration.) The purpose of the logit model was to determine car ownership as a function of the (logarithm of) income. Car ownership was a binary variable: $Y = 1$ if a household owns a car, zero otherwise.
$$\hat{L}_i = -2.77231 + 0.347582\, \ln(\text{Income})$$
$$t = (-3.35) \qquad (4.05)$$
$$\chi^2 \text{ (1 df)} = 16.681 \quad (p\text{ value} = 0.0000)$$
where $\hat{L}_i$ is the estimated logit and $\ln(\text{Income})$ is the logarithm of income. The $\chi^2$ measures the goodness of fit of the model.
a) Interpret the estimated logit model.
b) From the estimated logit model, how would you obtain the expression for the probability of car ownership?
c) What is the probability that a household with an income of 20,000 will own a car? And at an income level of 25,000? What is the rate of change of the probability at the income level of 20,000?
d) Comment on the statistical significance of the estimated logit model.