Lecture 12
Qualitative Dependent Variables
12.1
Qualitative Dependent Variable (Dummy Dependent Variable)
• Qualitative dependent variable – a binary variable with values of 0 and 1
Examples:
• To model a household's purchasing decision of whether to buy a car ⇒ a
typical household either bought a car or it did not: the dependent variable
takes the value 1 if the household purchased a car and the value 0 if the
household did not.
– the interpretation of the dependent variable is that it is a probability measure for which the realized value is 0 or 1
– discrete choice model or qualitative response model: involving
binary (dummy) dependent variables
• Decision to study for MBA degree:
– a function of unemployment rate, average wage rate, family income, etc.
– takes two values: 1 if the person is in an MBA program and 0 if
he/she is not.
• Union membership of a worker
• Own a house
When we handle models involving dichotomous response variables, the four
most commonly used approaches to estimating such models are:
1. The linear probability model (LPM)
2. The logit model
3. The probit model
4. The tobit (censored regression) model
12.2
Linear Probability (or Binary Choice) Models [LPM]
Consider the following simple model:
Yi = α + βXi + ui
where Xi = household income
Yi = 1 if the household (ith observation) buys a car in a given year
Yi = 0 if the household does not buy a car
In this model, the dichotomous Yi is represented as a linear function of
X. This model is called linear probability model since E[yi |Xi ] can be
interpreted as the conditional probability that the event will occur given Xi ;
that is, Pr(Yi = 1|Xi ).
Given observed values of Yi = 0 or 1, let us define Pi = Pr(Yi = 1|Xi ). It
follows that
E[yi |Xi ] = 1 × Pi + 0 × (1 − Pi ) = Pi
Pr(yi = 1|Xi ) = α + βXi
More importantly, the linear probability model violates the assumption of
homoskedasticity. When y is a binary variable, we have
Var(yi |Xi ) = Pi [1 − Pi ]
where Pi denotes the probability of success: Pi (Xi ) = α + βXi . This indicates
that there is heteroskedasticity in the LPM, which implies that the OLS
estimators are inefficient. Hence we have to correct for heteroskedasticity
when estimating the LPM if we want an estimator that is more efficient than
OLS.
ui = 1 − α − βXi    if Yi = 1
ui = −α − βXi       if Yi = 0
0 = E(ui |Xi ) = Pi × (1 − α − βXi ) + (1 − Pi ) × (−α − βXi )
σi² = E[(ui − E(ui ))² |Xi ] = E(ui²)   since E(ui |Xi ) = 0
By definition,
σi² = Pi (1 − α − βXi )² + (1 − Pi )(−α − βXi )²
    = Pi (1 − Pi )² + (1 − Pi )Pi² = Pi (1 − Pi ),
which makes use of the fact that α + βXi = Pi . Hence σi² = (1 − α − βXi )(α + βXi ),
which varies with i, thus establishing the heteroskedasticity of the residuals ui .
Procedures of Estimating the LPM:
1. Obtain the OLS estimators of the LPM at first.
2. Determine whether all of the OLS fitted values, ŷi , satisfy 0 < ŷi < 1.
If so, proceed to step (3). If not, some adjustment is needed to bring
all fitted values into the unit interval.
3. Construct the estimator of σi²: σ̂i² = ŷi (1 − ŷi ).
4. Apply WLS to estimate the equation (a sketch of the whole procedure
follows below).
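A minimal sketch of this four-step procedure in Python with statsmodels, assuming simulated car-purchase data; the variable names, sample size, data-generating process, and the clipping rule used in step 2 are illustrative assumptions rather than part of the lecture:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
income = rng.uniform(20, 100, size=n)          # X: household income (hypothetical units)
p_true = np.clip(0.01 * income - 0.2, 0, 1)    # assumed true purchase probability
y = rng.binomial(1, p_true)                    # Y: 1 if the household buys a car, 0 otherwise

X = sm.add_constant(income)

# Step 1: OLS estimates of the LPM
ols_res = sm.OLS(y, X).fit()
yhat = ols_res.fittedvalues

# Step 2: bring fitted values into the unit interval
# (clipping is one common ad hoc adjustment; the lecture only says "some adjustment is needed")
yhat = np.clip(yhat, 0.01, 0.99)

# Step 3: construct sigma_i^2-hat = yhat_i * (1 - yhat_i)
sigma2_hat = yhat * (1 - yhat)

# Step 4: WLS with weights 1 / sigma_i^2-hat
wls_res = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()
print(wls_res.params)   # alpha-hat, beta-hat
print(wls_res.bse)      # standard errors
```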
REMARKS: Even if the normality assumption on ui is violated, the OLS estimates
of α and β are unbiased and consistent, but they are inefficient because of the heteroskedasticity.
Why not estimate α and β by regressing Y against a constant and X?
Does this cause any problem?
Reason: In the case of a dummy dependent variable, the residual will be heteroskedastic, and hence the application of OLS will yield inefficient estimates.
DISCUSSIONS: The LPM is plagued by several problems, such as
1. nonnormality of ui
2. heteroskedasticity of ui
3. possibility of Ŷi lying outside the 0-1 range
4. the generally lower R² values
5. it is not logically a very attractive model because it assumes that Pi =
E[Y = 1|X] increases linearly with X.
We need a (probability) model that has two features:
1. As Xi increases, Pi = E[Yi = 1|Xi ] increases but never steps outside
the 0-1 interval
2. the relationship between Pi and Xi is nonlinear, that is, “one which
approaches zero at slower and slower rates as Xi gets small and approaches one at slower and slower rates as Xi gets very large.”
Pi = f (Xi )
f (·) is S-shaped and resembles the cumulative distribution function (CDF)
of a random variable. Practically, the CDFs commonly chosen to represent
the 0-1 response models are
1. the logistic distribution (logit model)
2. the normal distribution (probit model)
12.2.1
Maximum Likelihood Estimation (MLE)
Use the MLE method to estimate β (the collection of β1 , · · · , βk ), σ², and
Ω(θ), the variance matrix of the errors, at the same time. Let Γ denote Ω⁻¹.
Then the log likelihood can be written as:
log L = −(n/2)[ log(2π) + log σ² ] − (1/(2σ²)) ε′Γε + (1/2) log |Γ|,
where ε = Y − Xβ.
First Order Condition (FOC):
∂ log L/∂β = (1/σ²) X′Γ(Y − Xβ)
∂ log L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) (Y − Xβ)′Γ(Y − Xβ)
∂ log L/∂Γ = (1/2)[ Γ⁻¹ − (1/σ²) εε′ ] = (1/(2σ²)) (σ²Ω − εε′)
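As a numerical check, the short sketch below evaluates the log likelihood above for a given β, σ², and Γ = Ω⁻¹ using numpy; the toy data and parameter values are arbitrary assumptions for illustration only:

```python
import numpy as np

def gls_loglike(beta, sigma2, Gamma, y, X):
    """log L = -(n/2)[log(2*pi) + log(sigma^2)] - eps' Gamma eps / (2*sigma^2) + (1/2) log|Gamma|."""
    n = len(y)
    eps = y - X @ beta
    _, logdet = np.linalg.slogdet(Gamma)   # log|Gamma|, assuming Gamma is positive definite
    return (-0.5 * n * (np.log(2 * np.pi) + np.log(sigma2))
            - eps @ Gamma @ eps / (2 * sigma2)
            + 0.5 * logdet)

# toy example: four observations, an intercept and one regressor, Gamma = identity
y = np.array([1.0, 2.0, 1.5, 3.0])
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
print(gls_loglike(np.array([1.0, 0.5]), 1.0, np.eye(4), y, X))
```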
12.3
Heteroskedasticity-Robust Inference After OLS Estimation
Since hypothesis tests and confidence intervals based on the usual OLS standard
errors are invalid in the presence of heteroskedasticity, we must decide whether
to abandon OLS entirely or to reformulate the corresponding test statistics and
confidence intervals. For the latter option, we have to adjust the standard errors
and the t, F , and LM statistics so that they are valid in the presence of
heteroskedasticity of unknown form. Such a procedure is called heteroskedasticity-robust
inference, and it is valid in large samples.
12.3.1
How to estimate the variance, Var(β̂j ), in the presence of heteroskedasticity
Consider the simple regression model,
yi = β1 + β2 xi + ui
Assume assumptions A1-A4 are satisfied. If the errors are heteroskedastic,
then
Var(ui |xi ) = σi2
The OLS estimator of the slope can be written as
β̂2 = β2 + Σ_{i=1}^n (xi − x̄) ui / Σ_{i=1}^n (xi − x̄)²
and we have
Var(β̂2 ) = Σ_{i=1}^n (xi − x̄)² σi² / SSTx²
where SSTx = Σ_{i=1}^n (xi − x̄)² is the total sum of squares of the xi . Note:
when σi² = σ² for all i, Var(β̂2 ) reduces to the usual form, σ²/SSTx .
Regarding the way to estimate Var(β̂2 ) in the presence of heteroskedasticity, White (1980) proposed a procedure that is valid in large samples.
Let ûi denote the OLS residuals from the initial regression of y on x. White
(1980) suggested that a valid estimator of Var(β̂2 ), under heteroskedasticity of
any form (including homoskedasticity), is
Var̂(β̂2 ) = Σ_{i=1}^n (xi − x̄)² ûi² / SSTx²
Brief proof (for the complete proof, please refer to White (1980)):
n · Σ_{i=1}^n (xi − x̄)² ûi² / SSTx²  →p  E[(xi − µx )² ui²] / (σx²)²
n · Var(β̂2 ) = n · Σ_{i=1}^n (xi − x̄)² σi² / SSTx²  →p  E[(xi − µx )² ui²] / (σx²)²
so the feasible quantity on the left of the first line has the same probability
limit as n · Var(β̂2 ). Therefore, by the law of large numbers and the central
limit theorem, we can use this estimator,
Var̂(β̂2 ) = Σ_{i=1}^n (xi − x̄)² ûi² / SSTx² ,
to construct confidence intervals and t tests.
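A short numpy sketch of this estimator for the simple regression case, using a simulated heteroskedastic data set; the data-generating process and sample size are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(1, 10, size=n)
u = rng.normal(0, x)                # heteroskedastic errors: sd grows with x (assumed DGP)
y = 1.0 + 2.0 * x + u

# OLS slope, intercept, and residuals
xd = x - x.mean()
sst_x = np.sum(xd ** 2)
b2 = np.sum(xd * y) / sst_x
b1 = y.mean() - b2 * x.mean()
uhat = y - b1 - b2 * x

# White (1980) heteroskedasticity-robust variance of the slope estimator
var_b2_robust = np.sum(xd ** 2 * uhat ** 2) / sst_x ** 2
print("robust standard error of the slope:", np.sqrt(var_b2_robust))
```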
For the multiple regression model,
yi = β1 + β2 xi2 + · · · + βk xik + ui
under assumptions A1-A4, the valid estimator of Var(β̂j ) is
Var̂(β̂j ) = Σ_{i=1}^n r̂ij² ûi² / SSRj²
where r̂ij denotes the ith residual from regressing xj on all other independent
variables (including an intercept), and SSRj = Σ_{i=1}^n r̂ij² is the sum of
squared residuals from that regression.
REMARKS:
• Under homoskedasticity, the variance of the usual OLS estimator β̂j is
Var(β̂j ) = σ² / [ SSTj (1 − Rj²) ],
for j = 1, . . . , k, where SSTj = Σ_{i=1}^n (xij − x̄j )² and Rj² is the R² from
regressing xj on all other independent variables (including an intercept).
• The square root of Var̂(β̂j ) is called the heteroskedasticity-robust standard error for β̂j . Once the heteroskedasticity-robust standard errors
are obtained, we can then construct a heteroskedasticity-robust t statistic:
t = (estimate − hypothesized value) / standard error
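In practice these robust standard errors are available directly from regression software; a sketch with statsmodels for a multiple regression follows (cov_type="HC0" corresponds to White's original estimator; the simulated data are an assumption for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 800
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(scale=np.exp(0.5 * x1), size=n)   # heteroskedastic errors (assumed DGP)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + u

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit(cov_type="HC0")           # OLS point estimates, White-robust covariance

print(res.bse)                                   # heteroskedasticity-robust standard errors
# heteroskedasticity-robust t statistic for H0: coefficient on x1 equals 0
t_robust = (res.params[1] - 0.0) / res.bse[1]
print(t_robust)                                  # matches res.tvalues[1]
```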
12.4
The Probit Model
Assume there is a response function of the form Ii = α + βXi , where Xi is
observable but where Ii is an unobservable variable. What we observe in
practice is Yi , which takes the value 1 if Ii > I ∗ and 0 otherwise. I ∗ is a
critical or threshold level of the index.
For example, we can assume that the decision of the ith household to own a
house or not depends on an unobservable utility index that is determined by
X. We thus have
Yi = 1 if α + βXi > I ∗
Yi = 0 if α + βXi ≤ I ∗
If we denote by F (z) the cumulative distribution function of the standard
normal distribution, that is, F (z) = P (Z ≤ z), then
Pi = P (Yi = 1) = P (Ii > I∗ ) = F (Ii )
   = (1/√(2π)) ∫_{−∞}^{Ii} e^{−t²/2} dt
   = (1/√(2π)) ∫_{−∞}^{α+βXi} e^{−t²/2} dt
where t ∼ N (0, 1). Equivalently,
Ii = F⁻¹(Pi ) = α + βXi
The joint probability density of the sample of observations (called the
likelihood function) is therefore given by
L = ∏_{Yi =0} F( (−α − βXi )/σ ) × ∏_{Yi =1} [ 1 − F( (−α − βXi )/σ ) ]
• The parameters α and β are estimated by maximizing L, which is highly
nonlinear in the parameters and cannot be estimated by conventional regression
programs; it requires specialized nonlinear optimization procedures, such as
BHHH and BFGS.
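A sketch of probit estimation by maximum likelihood, written out with scipy's BFGS optimizer to mirror the point about nonlinear optimization; the simulated house-ownership data, the starting values, and the usual normalization σ = 1, I∗ = 0 are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
y = (0.5 + 1.2 * x + rng.normal(size=n) > 0).astype(int)   # Y=1 when the latent index exceeds 0

def neg_loglike(params):
    alpha, beta = params
    p = norm.cdf(alpha + beta * x)            # Pr(Y=1 | X) under the probit model
    p = np.clip(p, 1e-12, 1 - 1e-12)          # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglike, x0=np.array([0.0, 0.0]), method="BFGS")
print(res.x)   # MLE of (alpha, beta)
```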
12.5
The Logit Model
The logit model (aka the logistic model) has the following functional form:
Pi = E[Yi = 1|Xi ] = 1 / (1 + e^{−(α+βXi )})
We see that if βX → +∞, P → 1, and when βX → −∞, P → 0. Thus, P
can never be outside the range [0, 1].
Logistic distribution function: the CDF of a logistic random variable is
Pr(Z ≤ z) = F (z) = 1 / (1 + e^{−z})
with F (z) → 1 as z → ∞ and F (z) → 0 as z → −∞.
Since Pi is defined as the probability of Yi = 1, then 1 − Pi is the probability of Yi = 0:
1 − Pi = 1 / (1 + e^{zi }),   where zi = α + βXi .
Hence, we have
Pi / (1 − Pi ) = (1 + e^{zi }) / (1 + e^{−zi }) = e^{zi }
By taking the natural log, we obtain
Li = ln[ Pi /(1 − Pi ) ] = zi = α + βXi
12.5.1
Estimation of the Logit Model
Li = ln[ Pi /(1 − Pi ) ] = α + βXi + ui
If we have data on individual households, Pi = 1 if a household owns
a house and Pi = 0 if it does not. In that case it is meaningless to
calculate the logarithm of Pi /(1 − Pi ), which is undefined when Pi is either
0 or 1. Therefore we cannot estimate this model by standard OLS; we
need to use MLE to estimate the parameters.
If we have data in which P is strictly between 0 and 1, then we can
simply transform P to obtain Yi = ln[ Pi /(1 − Pi ) ] and then regress Yi against
a constant and Xi .
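A minimal sketch of this transform-then-regress approach, assuming grouped data in which each row records a group's income level and its observed ownership proportion P strictly inside (0, 1); the numbers are invented for illustration:

```python
import numpy as np
import statsmodels.api as sm

# hypothetical grouped data: income level X and observed proportion P for each group
X_grp = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
P_grp = np.array([0.10, 0.25, 0.45, 0.60, 0.75, 0.88])   # strictly between 0 and 1

L = np.log(P_grp / (1 - P_grp))                  # logit transform: L_i = ln[P_i / (1 - P_i)]
res = sm.OLS(L, sm.add_constant(X_grp)).fit()    # regress L on a constant and X
print(res.params)                                # estimates of (alpha, beta)
```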
The marginal effect of X on P:
ΔP̂ / ΔX = β̂ e^{−(α̂+β̂X)} / [ 1 + e^{−(α̂+β̂X)} ]² = β̂ P̂ (1 − P̂ )
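A sketch of logit estimation by MLE together with the marginal effect formula above, using statsmodels; the simulated data and the choice to evaluate the effect at the sample mean of X are assumptions for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(0.3 + 1.5 * x)))   # true logistic probabilities (assumed DGP)
y = rng.binomial(1, p)

X = sm.add_constant(x)
res = sm.Logit(y, X).fit(disp=0)             # MLE of (alpha, beta)
alpha_hat, beta_hat = res.params

# marginal effect dP/dX = beta * P * (1 - P), evaluated at the sample mean of X
p_hat = 1.0 / (1.0 + np.exp(-(alpha_hat + beta_hat * x.mean())))
print(beta_hat * p_hat * (1 - p_hat))
```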
12.6
The Tobit Model (or Censored Regressions)
The observed values of a dependent variable sometimes have a discrete jump
at zero, that is, some of the values may be zero while others are positive. We
therefore never observe negative values.
• What are the consequences of disregarding this fact and regressing Y
against a constant and X? In this situation, the residual will not
satisfy the condition E(ui ) = 0, which is required for the unbiasedness
of estimates.
There is an asymmetry between observations with positive values of Y and those with negative
values. The model becomes
Yi = α + βXi + ui    if Yi > 0, i.e., ui > −α − βXi
Yi = 0               if Yi ≤ 0, i.e., ui ≤ −α − βXi
The basic assumption behind this model is that there exists an index function
Ii = α + βXi + ui for each economic agent being studied. If Ii ≤ 0, the value
of the dependent variable is set to zero. If Ii > 0, the value of the dependent
variable is set to Ii . Suppose u has the normal distribution with mean zero
and variance σ 2 . We note that Z = u/σ is a standard normal random
variable. Denote by f (z) the probability density of the standard normal
variable Z, and by F (z) its cumulative distribution function – that is, P [Z ≤ z]. Then
the joint probability density for those observations for which Yi is positive is
given by the following:
P1 = ∏_{i=1}^m (1/σ) f( (Yi − α − βXi )/σ )
where m is the number of observations in the subsample for which Y is positive. For the second subsample (of size n), for which the observed Y is zero,
the random variable satisfies u ≤ −α − βX. The probability of this event is
P2 = ∏_{j=1}^n P[ uj ≤ −α − βXj ] = ∏_{j=1}^n F( (−α − βXj )/σ )
The joint probability for the entire sample is therefore given by L = P1 · P2 .
Because this likelihood is nonlinear in the parameters, OLS is not applicable here.
We have to employ a maximum likelihood procedure to obtain estimates of α and β,
i.e., to maximize L with respect to the parameters.
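A sketch of the tobit MLE written directly with scipy (rather than a canned routine), since the likelihood must be maximized numerically; the censored simulated data, the starting values, and the reparameterization in terms of log σ are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
y_star = 0.5 + 1.0 * x + rng.normal(size=n)   # latent index I_i = alpha + beta*X_i + u_i
y = np.maximum(y_star, 0.0)                   # observed Y is censored at zero

def neg_loglike(params):
    alpha, beta, log_sigma = params
    sigma = np.exp(log_sigma)                 # keeps sigma > 0 during optimization
    xb = alpha + beta * x
    pos = y > 0
    # log P1: density terms (1/sigma) f((Y_i - alpha - beta*X_i)/sigma) for uncensored observations
    ll_pos = norm.logpdf((y[pos] - xb[pos]) / sigma) - np.log(sigma)
    # log P2: probability terms F((-alpha - beta*X_j)/sigma) for the censored observations
    ll_zero = norm.logcdf(-xb[~pos] / sigma)
    return -(ll_pos.sum() + ll_zero.sum())

res = minimize(neg_loglike, x0=np.array([0.0, 0.0, 0.0]), method="BFGS")
alpha_hat, beta_hat, sigma_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(alpha_hat, beta_hat, sigma_hat)
```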
References
Greene, W. H., 2003, Econometric Analysis, 5th ed., Prentice Hall. Chapter 21.
Gujarati, D. N., 2003, Basic Econometrics, 4th ed., McGraw-Hill. Chapter 15.