Download Results on the Bias and Inconsistency of Ordinary Least

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Regression analysis wikipedia , lookup

Data assimilation wikipedia , lookup

German tank problem wikipedia , lookup

Discrete choice wikipedia , lookup

Linear regression wikipedia , lookup

Coefficient of determination wikipedia , lookup

Transcript
Syracuse University
SURFACE
Economics Faculty Scholarship
Maxwell School of Citizenship and Public Affairs
11-28-2005
Results on the Bias and Inconsistency of Ordinary
Least Squares for the Linear Probability Model
William C. Horrace
Syracuse University, [email protected]
Ronald L. Oaxaca
University of Arizona
Follow this and additional works at: http://surface.syr.edu/ecn
Part of the Economics Commons
Recommended Citation
Horrace, William C. and Oaxaca, Ronald L., "Results on the Bias and Inconsistency of Ordinary Least Squares for the Linear
Probability Model" (2005). Economics Faculty Scholarship. Paper 10.
http://surface.syr.edu/ecn/10
This Article is brought to you for free and open access by the Maxwell School of Citizenship and Public Affairs at SURFACE. It has been accepted for
inclusion in Economics Faculty Scholarship by an authorized administrator of SURFACE. For more information, please contact [email protected].
Economics Letters 90 (2006) 321 – 327
www.elsevier.com/locate/econbase
Results on the bias and inconsistency of ordinary least squares for
the linear probability model
William C. Horrace a,T, Ronald L. Oaxaca b
a
Department of Economics, Syracuse University, Syracuse, NY 13244, USA and NBER, United States
Department of Economics, University of Arizona, Tucson, AZ 85721, USA and IZA, United States
b
Received 10 January 2005; received in revised form 28 June 2005; accepted 30 August 2005
Available online 28 November 2005
Abstract
This note formalizes bias and inconsistency results for ordinary least squares (OLS) on the linear probability
model and provides sufficient conditions for unbiasedness and consistency to hold. The conditions suggest that a
btrimming estimatorQ may reduce OLS bias.
D 2005 Elsevier B.V. All rights reserved.
Keywords: Consistency; Unbiased; LPM; OLS
JEL classification: C25
1. Introduction
Limitations of the Linear Probability Model (LPM) are well-known. OLS estimated probabilities are
not bounded on the unit interval, and OLS estimation implies that heteroscedasticity exists. Conventional
advice points to probit or logit as the standard remedy, which bound the maximum likelihood estimated
probabilities on the unit interval. However, the fact that consistent estimation of the LPM may be
difficult does not imply that either probit or logit is the correct specification of the probability model; it
may be reasonable to assume that probabilities are generated from bounded linear decision rules.
Theoretical rationalizations for the LPM are in Rosenthal (1989) and Heckman and Snyder (1977).
T Corresponding author. Tel.: +1 315 443 9061; fax: +1 315 443 1081.
E-mail address: [email protected] (W.C. Horrace).
0165-1765/$ - see front matter D 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.econlet.2005.08.024
322
W.C. Horrace, R.L. Oaxaca / Economics Letters 90 (2006) 321–327
Despite the attractiveness of logit and probit for estimating binary dependent variable models, OLS
on the LPM is still used. Recent applications include Klaassen and Magnus (2001), Bettis and Fairlie
(2001), Lukashin (2000), McGarry (2000), Fairlie and Sundstrom (1999), Reiley (2005), and Currie
and Gruber (1996). Empirical rationales for the LPM specification are plentiful. McGarry appeals to
ease of interpretation of estimated marginal effects, while Reiley cites a perfect correlation problem
associated with the probit model. Fairlie and Sundstrom prefer LPM because it implies a simple
expression for the change in unemployment rate between two censuses. Bettis and Farlie choose LPM
because of an extremely large sample size and other simplifications implied by it. Lukashin uses the
LPM, because it lends itself to a model selection algorithm based on an adaptive gradient criterion.
Currie and Gruber state that logit, probit, and OLS are similar for their data and only report LPM
results.
Other rationales for the OLS on the LPM are complications of probit/logit models in certain contexts.
Klaassen and Magnus cite panel data complications in their tennis example and select OLS. OLS is
perhaps justified in simultaneous equations/instrumental variable methods. The presence of dummy
endogenous regressors is problematic if the DGP is assumed to be probit or logit; these problems were
first considered by Heckman (1978). While perhaps less popular than logit and probit, OLS on the LPM
model still finds its way into the literature for various reasons.
Some well-known LPM theorems are provided in Amemiya (1977). Econometrics textbooks (e.g.,
Greene, 2000), acknowledge complications leading to biased and inconsistent OLS estimates.
Nevertheless, the literature is not clear on the precise conditions when OLS is problematic. This note
rigorously lays out these conditions, derives the finite-sample and asymptotic biases of OLS, and
provides additional results that highlight the appropriateness or inappropriateness of OLS estimation of
the LPM. Finally, we suggest a trimmed sample estimator that could reduce OLS bias.
2. Results
Let y i be a discrete random variable, taking on the values 0 or 1. Let x i be a 1 k vector of
explanatory variables on Rk , b be a k 1 vector of coefficients, and e i be a random error. Define
probabilities over the random variable x i b a R.
Prðxi bN1Þ ¼ p;
Prðxi ba½0; 1Þ ¼ c
Prðxi bb0Þ ¼ q;
where p + c + q = 1. Consider a random sample of data: ( y i , x i ); i a N; N = {1, . . . , n}. Define the data
partition:
jc ¼ fijxi ba½0; 1g;
jp ¼ fijxi bN1g;
ð1Þ
implying
PrðiajpÞ ¼ p;
Priajc ¼ c;
Pr igjc [ jp ¼ q:
ð2Þ
W.C. Horrace, R.L. Oaxaca / Economics Letters 90 (2006) 321–327
323
The LPM DGP is:
yi ¼ 1 for iajp ; ¼ xi b þ ei for iajc ; ¼ 0 otherwise:
ð3Þ
The conditional probability of y i is:
Prðyi ¼ 1jxi ; iajpÞ ¼ 1;
Pryi ¼ 1jxi ; iajc ¼ xi b;
Pryi ¼ 0jxi ; iajc ¼ 1 xi b;
Pr yi ¼ 0jxi ; igjc [ jp ¼ 1:
ð4Þ
Therefore, y i traces the familiar ramp function on x i b with error process:
ei ¼ 0 for iajp ; ¼ yi xi b; i a jc ; ¼ 0 for igjc [ jp ;
and probabilities
Prðei ¼ 0jxi ; iajp Þ ¼ 1;
Prei ¼ 1 xi bjxi ; iajc ¼ xi b;
Prei ¼ xi bjxi ; iajc ¼ 1 xi b;
Pr ei ¼ 0jxi ; igjc [ jp ¼ 1:
ð5Þ
OLS proceeds as:
yi ¼ xi b þ ui ; iaN ;
where u i is a zero-mean random variable, independent of the x i . Notice that the OLS error term, u i ,
differs from e i :
ui ¼ 1 xi b for iajp ; ¼ yi xi b for iajc ; ¼ xi b for igjc [ jp ;
with probability function:
Prðui ¼ 1 xi bjxi ; iajpÞ ¼ 1;
Prui ¼ 1 xi bjxi ; iajc ¼ xi b;
Prui ¼ xi bjxi ; iajc ¼ 1 xi b;
Pr ui ¼ xi bjxi ; igjc [ jp ¼ 1:
ð6Þ
The distinction between u i and e i induces problems in OLS.
Theorem 1. If c b 1, then Ordinary Least Squares estimation of the Linear Probability Model is
generally biased and inconsistent.
Proof. Eq. (6) implies:
E ðui jxi ; iajpÞ ¼ 1 xi b;
E ui jxi ; iajc ¼ 0;
E ui jxi ; igjc [ jp ¼ xi b:
Therefore, the conditional expectation of the OLS error, u i , is a function of x i with probability (1 c).
Hence, OLS is biased and inconsistent, if c b 1.
5
324
W.C. Horrace, R.L. Oaxaca / Economics Letters 90 (2006) 321–327
Hence, only observations i a j g possess mean-zero errors, so OLS with i g j g is problematic.
Remark 2. If nc p N, then OLS estimation is biased and inconsistent. That is, if the sample used to
estimate b contains any i g j g , then c b 1, so OLS is problematic.
Also:
Remark 3. If c = 1, then OLS is unbiased and consistent, because p = q = 0, E(u i | x i ) = 0 for all i a N,
and:
Eðyi jxi Þ ¼ Prðyi ¼ 1jxi Þ ¼ xi b; iaN :
Define random variables z i and w i :
zi ¼ 1 for iajc ;
¼ 0 otherwise:
wi ¼ 1 for iajp ;
¼ 0 otherwise:
Hence, Pr(z i = 1) = c and Pr(w i = 1) = p. Alternative representation of Eq. (3) is:
yi ¼ wi þ zi xi b þ ui zi ;
ð7Þ
iaN ;
making explicit that u i is not the correct OLS error. Notice,
ui zi ¼ 0 for igjc ; ¼ 1 xi b for yi ¼ 1; iajc ; ¼ xi b for yi ¼ 0; iajc ;
so the conditional probability function of u i z i is the same as that of e i . Therefore, E(u i z i | x i ) = 0, and Eq.
(7) has a zero-mean error, independent of x i . Taking the unconditional mean of Eq. (7):
Eðyi Þ ¼ p þ E ðzi xi Þb þ Eðui zi Þ ¼ p þ cEðzi xi jzi ¼ 1Þb þ cE ðzi ui jzi ¼ 1Þ ¼ p þ clxc b;
ð8Þ
where l xg = E(x i | z i = 1). Eq. (8) will be used in the sequel. The OLS estimator is:
"
#1
X
X
bn ¼
b̂
xi Vxi
xi Vyi :
iaN
iaN
Substituting Eq. (7):
"
#1
X
X
xi Vxi
xi Vðwi þ zi xi b þ ui zi Þ:
bn ¼
b̂
iaN
ð9Þ
iaN
Partitioning the data by j g and j k, and taking into consideration z i and w i in each regime:
"
#1 "
#
X
X
X
X
xi Vxi
xi Vð0Þ þ
xi Vðxi b þ ui Þ þ
xi Vð1Þ
bn ¼
b̂
"
¼
iaN
X
iaN
#1 "
xi Vxi
igjc [jp
X
iajc
xi Vxi b þ
iajc
X
iajc
xi uV i þ
X
iajp
#
xi V :
iajp
W.C. Horrace, R.L. Oaxaca / Economics Letters 90 (2006) 321–327
325
Hence:
E b̂
b n jxi ¼
"
X
#1 "
xi Vxi
iaN
"
¼
X
#1 "
xi Vxi
¼
xi xV i b þ
X
#1
xi Vxi
iajc
X
X
xi xV i b þ
X
"
xi Vxi b þ
X
iaN
xi VE ui jxi ; iajc þ
X
#
xi V
iajp
xi Vð0Þ þ
iajc
iajc
iaN
X
iajc
iajc
iaN
"
X
X
#
xi V
iajp
#1
xi Vxi
X
xi Vpb;
ð10Þ
iajp
which is generally biased and asymptotically biased, because c b 1. When c = 1, j g = N, the first term on
the RHS is b, the second term is 0, and b̂n is unbiased.
The inconsistency of bˆn follows in a similar fashion. Letting C denote the cardinality operator, define
denote the probability limit operator as n Y l.
n k = C(j k), n g = C(jP
g) and n U = n n k n g. Let plim
1 P
1
Q and Q g are finite, (nonAssume plim [n
iaN xVx
i i ] = Q and plim [nP
g
iang xVx
i i ] = Q g where
1
1
1 P
x
V]
=
l
V
,
plim
[n
singular)
positive
definite.
Assume
plim
[n
k
ian
i
xk
iaN xV]
i = l xV and plim [n g
k
P
1
1
V and l xV are finite vectors. Assume plim [n n k] = p and plim [n gn ] = c.
iang x Viu i ] = 0, where l xk
Then:
plim b̂
b n ¼ Q1 Qc bc þ plVxp pb:
Even if c and p were known, b̂n could not be bias corrected, yet Eq. (8) seems to imply that if c and p
were known, an OLS regression of ( y i p) on (cx i ) might be unbiased. Define transformed OLS
estimator:
b̂ n* ¼
b
"
X
#1
c2 xi Vxi
iaN
X
cxi Vðyi pÞ:
ð11Þ
iaN
Theorem 4. b̂n* is biased and inconsistent for b.
Proof. Eq. (11) implies
"
#1
"
#1
"
#1
X
X
X
1 X
1 X
1
p X
*
xi Vxi
xi Vyi xi Vxi
xi p
V ¼ b̂
xi xV i
xi V:
bn ¼
b̂
bn c iaN
c iaN
c
c iaN
iaN
iaN
iaN
Hence,
"
#1
1 p X
X
*
b n jxi xi xV i
xi Vpb:
E b̂
b n jxi ¼ E b̂
c
c iaN
iaN
5
326
W.C. Horrace, R.L. Oaxaca / Economics Letters 90 (2006) 321–327
Thus, knowledge of p and c does not ensure an unbiased OLS estimator of b, and the bias will persist
asymptotically. Moreover, it does not facilitate consistent estimation. The problem with b̂n and b̂*n is not
that c and p are unknown but that j g is unknown. If we knew j g, we could perform OLS only on
observations i a j g. Therefore:
Remark 5. Sufficient information for unbiased and consistent OLS estimation is knowledge of j g.
Also, if j g = N, then:
X
xi Vxi ¼
iajc
X
xi xV i ; and
X
xi V¼ 0:
iajp
iaN
Therefore, Eq. (10) becomes:
E b̂
b n jxi ¼
"
X
#1
xi xV i
iaN
X
iaN
"
xi Vxi b þ
X
#1
xi xV i
ð0Þ ¼ b;
iaN
unbiased for j g = N. A similar argument can be made for consistency. If c = 1, then j g = N. Therefore:
Remark 6. Without knowledge of j g and j k, a sufficient condition for unbiased OLS when c b 1 is
j c = N.
j g = N is a weaker sufficient condition than c = 1, but probably unlikely in large samples. For any
given random sample, Pr[j g = N] = c n , so
lim Pr jc pN ¼ lim ð1 cn Þ ¼ 1:
nYl
nYl
Remark 7. Without knowledge of j g and j k, if c b 1 and j g = N, then as n Y l, j g p N with probability
approaching 1, and b̂n is asymptotically biased and inconsistent.
Therefore, as N grows, once the first observation x i b g [0, 1] appears, then j g p N and unbiasedness is
lost. Oddly, the estimator b̂n could be reliable in small samples yet unreliable in large samples.
3. Conclusions
Although it is theoretically possible for OLS on the LPM to yield unbiased estimation, this generally
would require fortuitous circumstances. Furthermore, consistency seems to be an exceedingly rare
occurrence as one would have to accept extraordinary restrictions on the joint distribution of the
regressors. Therefore, OLS is frequently a biased estimator and almost always an inconsistent estimator
of the LPM. If we had knowledge of the sets j g and j k, then a consistent estimate of b could be based
on the sub-sample i a j g. This is tantamount to removing observations i g j g, suggesting that trimming
observations violating the rule ŷ i = x i b̂n a [0, 1] and re-estimating the OLS model (based on the trimmed
sample) may reduce finite sample bias. This seems to hold in simulations, but formal proof of this result
is left for future research.
W.C. Horrace, R.L. Oaxaca / Economics Letters 90 (2006) 321–327
327
Acknowledgements
We gratefully acknowledge valuable comments by Seung Ahn, Badi Baltagi, Gordon Dahl, Dan
Houser, Price Fishback, Art Gonzalez, Shawn Kantor, Alan Ker, Paul Ruud and Peter Schmidt. Capable
research assistance was provided by Nidhi Thakur.
References
Amemiya, T., 1977. Some theorems in the linear probability model. International Economic Review 18, 645 – 650.
Bettis, J.R., Fairlie, R.W., 2001. Explaining ethnic, racial, and immigrant differences in private school attendance. Journal of
Urban Economics 50, 26 – 51.
Currie, J., Gruber, J., 1996. Health insurance eligibility, utilization of medical care, and child health. Quarterly Journal of
Economics 111, 431 – 466.
Fairlie, R.W., Sundstrom, W.A., 1999. The emergence, persistence, and recent widening of the racial unemployment gap.
Industrial and Labor Relations Review 52, 252 – 270.
Greene, W.H., 2000. Econometric Analysis. Prentice-Hall, Upper Saddle River, NJ.
Heckman, J.J., 1978. Dummy endogenous variables in a simultaneous equation system. Econometrica 46, 931 – 959.
Heckman, J.J., Snyder Jr., J.M., 1977. Linear probability models of the demand for attributes with an empirical application to
estimating the preferences of legislators. Rand Journal of Economics 28, S142 – S189.
Klaassen, F.J.G.M., Magnus, J.R., 2001. Are points in tennis independent and identically distributed? Evidence from a dynamic
binary panel data model. Journal of the American Statistical Association 96, 500 – 509.
Lukashin, Y.P., 2000. Econometric analysis of managers’ judgements on the determinants of the financial situation in Russia.
Economics of Planning 33, 85 – 101.
McGarry, K. 2000, Testing parental altruism: Implications of a dynamic model,Q NBER Working Paper 7593.
Reiley, D.H., 2005, Field experiments on the effects of reserve prices in auctions: More magic on the internet, mimeo,
University of Arizona.
Rosenthal, R.W., 1989. A bounded-rationality approach to the study of noncooperative games. International Journal of Game
Theory 18, 273 – 292.