Download Threshold Regression Without Distribution Assumption when the

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Data assimilation wikipedia , lookup

Expectation–maximization algorithm wikipedia , lookup

German tank problem wikipedia , lookup

Choice modelling wikipedia , lookup

Regression analysis wikipedia , lookup

Bias of an estimator wikipedia , lookup

Linear regression wikipedia , lookup

Coefficient of determination wikipedia , lookup

Transcript
Threshold Regression Without Distribution Assumption when
the Threshold Variable Is Endogenous
Hung-Pin Lai∗
Chang-Ching Lin†
This version: March 15, 2009
National Chung Cheng University
Academia Sinica
Abstract
In this paper, we propose a semi-parametric approach to estimate threshold models
without the reliance of distribution assumption when the threshold variable is endogenous. Early threshold models based on the exogeneity assumption on the threshold
variables are widely applied in economics studies. The invalidation of this assumption,
however, may bias the estimation and inference. To control for the effects caused by an
endogenous threshold variable, we propose a concentrated two-stage least squares method
using instrumental variables. This method is based on the local linear smoothers by Fan
(1992) and can be simplified as a concentrated kernel-weighted least squares estimation.
Particularly, our approach can consistently estimate the threshold model without relying on any distribution assumption and allows us to handle nonlinear effects caused by
the endogenous threshold variable in a more flexible setting. We establish consistency
of the threshold parameter and the asymptotic properties of the slope coefficients in the
proposed method. Monte Carlo simulations are performed to evaluate its finite sample
properties. The new method is applied to re-examine the effects of political institutions
on economic growth.
Keywords: Threshold Regression, Endogeneity, Normality, Semi-parametric Method, Instrumental Variables, Local Linear Approximation.
JEL classification: C13, C14, C51.
∗
Department of Economics, National Chung Cheng University, 168 University Rd., Min-Hsiung Chia-Yi,
Taiwan, tel: 886-5-272-0411 ext. 34110 Email: [email protected].
†
Institute of Economics, Academia Sinica, 128 Academia Road, Sec 2, Taipei 115, Taiwan. Tel: 886-227822791 ext. 301. E-mail: [email protected].
1
Introduction
In this paper, we propose an approach for estimating threshold models without distribution
assumption when the threshold variable is endogenous. As in Kourtellos, Stengos, and Tan
(2008), we turn the problem of an endogenous threshold variable into the problem of omitting
selection correction term. Our approach is a semi-parametric method with following features.
Firstly, it can consistently estimate the threshold model without any distribution assumption.
Secondly, our semi-parametric method, which is based on the local linear (LL) smoothers
by Fan (1992), can be simplified as a concentrated kernel-weighted two-stage least squares
estimation and allows us to handle the effects caused by the endogenous threshold variable.
Early threshold models based on the exogeneity assumption of the threshold variables
are widely applied in economics studies. The model internally sorts the data by an observed
threshold variable and split the sample into groups at some threshold. The model is parsimonious and easy for researchers to draw inference about the fundamental determinants of the
data generating process. Under the assumption that regressors and the threshold variable are
exogenous, Hansen (2000) derives the theoretical properties of the concentrated least squares
estimators of the threshold parameter and slope coefficients. Caner and Hansen (2004) further relax the assumption of exogeneity in the slope coefficients and develop an instrumental
variable estimation and a theory of inference. These papers, however, do not explicitly count
for the presence of the endogenous threshold variable. Evidence of the endogeneity in the
threshold variable is not uncommon. For example, in the literature of economic growth,
many empirical studies deal with parameter heterogeneity by utilizing the threshold model
with many fundamental determinants as threshold variables. In particular, Tan (2005) finds
the quality of institutions and ethnic fractionalization are the most important determinants
of economic growth, while these institution measures might be endogenous.
Kourtellos, Stengos, and Tan (2008) is the first study that proposes to treat the problem
of an endogenous threshold variable as the problem of omitting selection correction term.
To relax exogeneity of the threshold variable, they decompose the error term into two components: one is correlated with the threshold variable and the other is not. Then, they
rewrite the threshold model by these two components and find that the threshold model
with endogenous threshold variable is generally inconsistent as in the sample selection bias
correction method. But one of the major differences, as they pointed out, is that the whole
sample can be observed in the threshold model, but only a limited sample is observed in the
sample selection model. Consequently, one can explicitly control for the endogenous threshold variable by utilizing the estimated results from instrumental variable estimation. The
problem of an endogenous threshold variable then becomes the problem of a sample selection
bias correction model (or more precisely a Type-3 Tobit model). Adopting the two-stage
1
procedure of Heckman (1979), they use inverse Mills ratios to correct for the bias caused by
the endogeneity of the threshold variable, which is straightforward to implement but under
the (joint) normality of errors.
A crucial criticism of Heckman’s two-stage procedure is that the estimation results might
be sensitive to the normality/distribution assumptions. To avoid the imposition of normality,
Vella (1992, 1998) and Wooldridge (1998) suggest alternative two-stage estimation methods
by utilizing the estimated residuals from the selection equation by imposing other moment
conditions. Moreover, the selection correction term might not be linear, numerous econometricians have also suggested semi-parametric estimation of the type-3 Tobit model without
normality. For example, under the assumption that errors are independent of regressors,
Lee (1994), Carroll, Fan, Gijbels, and Wand (1997), and Li and Wooldridge (2002) advance
two-stage least squares methods involving nonparametric estimation of selection correction
term1 . Due to the sensitivity of these estimates to the choice of bandwidth in nonparametric
estimation, Chen (1997) proposed two stage estimations using trimmed sub-sample. Honore,
Kyriazidou, and Udry (1997) consider an alternative method under the conditional symmetry assumption of errors. See Vella (1998) and Christofides, Li, Liu, and Min (2003) for
comprehensive survey.
Notice that in the threshold model it is assumed that we can observe the whole sample but
the threshold parameter is unknown in advance. The methods used in the type-3 Tobin model
to relax the distribution assumption may not appropriate in the problem of the endogenous
threshold variable. However, with the advantage of considerable semi-parametric approaches
in the type-3 Tobin model, we can adopt the methods for the threshold model and investigate
their small and large sample properties. In particular, we propose a semi-parameter method
when the effects of endogeneity in the main equation are of unknown form. Specifically, we
apply the local linear approximation of Fan (1992, 1993) and Ruppert and Wand (1994) to
remove the effects of this nonlinear sample correction term.
Our proposed approach has the following advantages. Firstly, it can consistently estimate
the threshold model without relying on any distribution assumption and allows to capture the
effects of unknown form caused by the endogenous transition variable. Secondly, our method
inherits the desirable properties of Fan’s local linear approximation. The main idea of Fan’s
approach is that a nonlinear function can be nonparametrically approximated around each
observation by a linear function. Fan (1992) shows that this local linear smoother has high
minimax efficiency among all linear smoothers2 , and its bias and variance near the boundary
1
More precisely, Lee (1994) and Carroll, Fan, Gijbels, and Wand (1997) propose semi-parametric maximum likelihood estimations, and Li and Wooldridge (2002) propose a semi-parametric two-stage least squares
method for the Type-3 Tobit model.
2
As pointed by Fan (1992):”The minimax is an efficiency measure, playing a role similar to the Fisher
2
of data support are of the same order of magnitude as in the interior of support. These are
very appealing properties when we use sub-sample sets to estimate parameters in distinct
regimes for a given threshold. Moreover, the procedure can be simplified as a two-step
concentrated kernel-weighted least squares estimation using sub-sample observations.
Before closing this introduction, it should be pointed out that the kernel-based method
has been recently utilized in threshold model by Seo and Linton (2007). The focus of their
paper, however, is quite different from this paper in that they consider the case in which
the threshold is determined by a linear combination of several observed exogenous variables.
Consequently, the kernel-based method is used to approximate the threshold effects, while our
nonparametric method is specifically used to remove the effects of endogeneity in the threshold
variable. Moreover, our approach also has the flexibility to handle endogenous regressors by
incorporating the instrumental-variable threshold estimation developed by Caner and Hansen
(2004) before applying our approach to deal with the endogeneity of the threshold variables.
The remainder of this paper is organized as follows. In Section 2, the model of interest
and related studies are briefly introduced. In Section 3, we propose an approach to estimate
the threshold model without distribution assumption when the threshold variable is endogenous. Additionally, we briefly address the limitations of the proposed method. In Section 4,
asymptotic properties are established. Monte Carlo simulations are performed to evaluate
our method in Section 5. Section 6 re-examines the effects of (political) institutions on economic growth as an application and Section 7 concludes this paper. Mathematical proofs are
collected in Appendix.
2
The Model
Consider the following threshold regression model:
yi = (η1 + xi β1 )1(qi ≤ γ) + (η2 + xi β2 )1(qi > γ) + ei ,
qi = µ + zi θ + ui ,
(2.1)
where η1 , η2 , and µ are intercepts, β1 and β2 are slope parameters, 1(A) denotes the indicator
function such that 1(A) = 1 if A is true and 0 otherwise, xi denotes a 1×k vector of covariates,
qi denotes an observed threshold variables, γ ∈ Γ = [γ, γ] is the unknown threshold parameter
with real value, zi is a 1 × l vector of instruments, E [ui |zi ] = 0. To illustrate the main idea,
information in the parametric inference” For example, 80% efficient nonparametric estimator means that with
the sample size 100, the estimator works as well as the optimal estimator with sample size 80. Fan (1992)
further show that this smoother is can asymptotically be 100% efficient with an optimal choice of kernel and
bandwidth. Moreover, Ruppert and Wand (1994) extend the results to multivariate case and Hamilton and
Truong (1997) study the asymptotic properties of linear slope coefficients in a partial linear regression using
this local linear approximation.
3
we first assume that xi is exogenous and qi is endogenous in the sense that E [ei ui |xi , zi ] 6= 0.
Following Kourtellos, Stengos, and Ten (2008), suppose that
ei σe2
ρi σe σu
x
,
z
∼
0,
, i = 1, . . . , n,
ui i i
ρi σe σu
σu2
where ρi is the correlation coefficient between ei and ui . If we further assume that ρi = ρ1
when qi ≤ γ, and ρi = ρ2 when qi > γ, then it follows from Theorem 2.4.1 of Hogg, McKean
and Craig (2005) that
E (ei |ui , xi , zi , qi ) = [ρ1 σe ui 1(qi ≤ γ) + ρ2 σe ui 1(qi > γ)] .
(2.2)
Therefore, we may decompose ei as the sum of two orthogonal components, that is
ei = [ei − E (ei |ui , xi , zi , qi )] + E (ei |ui , xi , zi , qi ) .
(2.3)
Based on the results, the effect caused by the threshold variable, ρj σe ui .
Substituting (2.2) and (2.3) into (2.1) suggests the model can be expressed as
yi = (η1 + xi β1 + ρ1 σe ui )1(qi ≤ γ)
+(η2 + xi β2 + ρ2 σe ui )1(qi > γ) + εi ,
qi = µ + zi θ + ui ,
(2.4)
where εi ≡ [ei − E (ei |ui , xi , zi , qi )] . A special case is that if there exists no correlation between
ei and ui , i.e., ρ1 = ρ2 = 0, then there will be no endogeneity effects caused by the threshold
variable and (2.1) will degenerate to the model of Seo and Linton (2000).
In the above case, we treat ui as an observed variable and assume that ui is in the
information set. However, if ui is unobserved, then the condition expectation of ei becomes
E [ei |xi , zi , qi ] = E [ei |xi , zi , qi ≤ γ] 1(qi ≤ γ) + E [ei |xi , zi , qi > γ] 1(qi > γ);
therefore, the two regimes can be fully described by the level of qi . In particular, by defining
E [ei |xi , zi , qi ≤ γ] ≡ f1 (γ − µ − zi θ) and E [ei |xi , zi , qi > γ] ≡ f2 (γ − µ − zi θ) we may rewrite
(2.1) as
yi = [η1 + xi β1 + f1 (γ − µ − zi θ)] 1(qi ≤ γ)
+ [η2 + xi β2 + f2 (γ − µ − zi θ)] 1(qi > γ) + εi ,
qi = µ + zi θ + ui ,
(2.5)
where
εi = ei − E [ei |xi , zi , qi ≤ γ] 1(qi ≤ γ) − E [ei |xi , zi , qi > γ] .
(2.6)
An special case of (2.5) is considered by Kourtellos, Stengos, and Ten (2008), where
they assume that (ei , ui )T follows a bivariate normal normal distribution and propose a
4
concentrated two stage least squares (C2SLS) method to estimate the parameters of interest
based on (2.5) and show consistency of their C2SLS estimator of γ. Under the bivariate
normal assumption, f1 and f2 can be replaced by the inverse Mills ratios and turn the problem
into one of a sample selection type model. More precisely, f1 (γ −µ−zi θ) = κ1 λ1 (qi , zi ; µ, θ, γ)
and f2 (γ −µ−zi θ) = κ2 λ2 (qi , zi ; µ, θ, γ), where κ1 , κ2 are some constants and λ1 (qi , zi ; µ, θ, γ)
and λ2 (qi , zi ; µ, θ, γ) are the so called Inverse Mills ratios, defined as
λ1 (qi , zi ; µ, θ, γ) =
φ(γ − µ − zi θ)
−φ(γ − µ − zi θ)
, and λ2 (qi , zi ; µ, θ, γ) =
,
Φ(γ − µ − zi θ)
1 − Φ(γ − µ − zi θ)
with φ and Φ being, respectively, the density and distribution function of a standard normal
variable.
As in the sample selection bias correction method, the threshold estimation is generally
inconsistent if the inverse Mills bias correction term λj (qi , zi ; µ, θ, γ), where j = 1, 2, is omitted. But a crucial difference between them is that only a partial sample observed in a regime
in the sample selection model, but the variable that drives this assignment is latent; instead,
the whole sample and threshold variable are observed but do not know the group membership
in the threshold model. Thus, in (2.5), (µ̂, θ̂) can be directly estimated from the regression
of entire sample of qi on zi using least squares method. We can replace (µ, θ) by (µ̂, θ̂) and
then estimate the threshold model by plugging in λ1 (qi , zi ; µ̂, θ̂, γ) and λ2 (qi , zi ; µ̂, θ̂, γ) as a
regressor for a given γ. However, the assumption of the normality of the joint distribution
of (ei , ui )T might be too restrictive. We might also relax the restriction that ρi = ρ1 for all
i ∈ I1 = {i : qi ≤ γ} and ρi = ρ2 for all i ∈ I2 = {i : qi > γ}.
In the next section we shall consider the general case and propose a distribution-free, semiparametric approach to estimate the model given in (2.5). In our model the joint distribution
of (ei , ui )T is unspecified and thus the functional forms of f1 and f2 are unknown.
3
The local linear semi-parametric approach
Our estimation approach applies the local linear (LL) regression proposed by Fan (1992) to
control the threshold effect or approximate true functions gj (γ −µ−zi θ) ≡ ηj +fj (γ −µ−zi θ),
where j = 1, 2. In a manner similar to Hansen (2000), we define δn = β1 − β2 , β = β2 , and
xi (γ) = xi · di (γ), where di (r) = 1 (qi ≤ γ) ,then Model (2.5) can be represented as
yi = xi β + g2 (wi ) + xi (γ) δn + di (r) [g1 (wi ) − g2 (wi )] + εi ,
(3.1)
where wi = γ − µ − zi θ. Since the nonlinear functions g1 (wi ) and g2 (wi ) are of unknown
forms, the LL estimators of g1 (·) and g2 (·) are then used to replace by the true functions in
estimating β1 , β2 , and γ.
5
The LL estimator suggests approximating the true nonlinear function ‘locally’ by a linear
function. According to the first order Taylor’s expansion, gj (wl ), j = 1, 2, can be locally
approximated by a linear function around wi :
j
j
gj (wl ) ≈ δi0
+ δi1
(wl − wi ), j = 1, 2,
(3.2)
j
j
where δi0
= gj (wi ), and δi1
= ∂gj (wi )/∂wi , depend on zi , µ, θ, and γ. In particular,
j
gj (wi ) = δi0
when wl = wi ,
(3.3)
which suggests that the function value at wi is exactly the intercept term of the local linear
function.
Below, we will briefly introduce how to use (3.2) and (3.3) in estimating β1 , β2 and γ. To
illustrate the main idea, suppose that a candidate value of γ is given and we first consider
estimating β1 (or β2 ) by using the observations with qi ≤ γ (or qi > γ). In the first stage, we
j
j
estimate (δi0
, δi1
) for each j around wi by the LL method. Then, in second stage, we replace
j
gj (wi ) by its estimate δbi0
and directly estimate βj , j = 1, 2, respectively. Finally, we obtain
the least squares estimator of γ that delivers the least total sum of squared residuals among
all potential values of γ.
For a given γ, we can divide the whole sample can into two groups such that we have
1 (qi ≤ γ) = 1, for i = 1, ..., n1 , the observations contained in the first group, and 1 (qi ≤ γ) =
0, for i = n1 + 1, ..., n, the observations in the second group. Assume that the second group
has n2 = n − n1 observations. For simplicity, we explain the procedure by using observations
contained in the first group.
(a) Estimation of the unknown function g( · ): Given β (and γ, µ, θ), the LLR
estimator of gj (w) at w = γ − µ − zi θ can be obtained by the following weighted least
squares regression:
min
2
1
1
− δl1
(wl − wi ) Kh (wl − wi ),
(yl − xl β1 ) − δl0
X
1 ,δ 1
{δl0
l1 } {l:q ≤γ
l
}
where h is a bandwidth and Kh (ξ) = h−1 K(h−1 ξ) is a kernel function. Let






y1
x1
1 w1 − wi






..
Y1 =  ...  , X1 =  ...  , Mi1 =  ...
,
.
1 wn 1 − wi
yn1
xn1
Wi1
= diag (Kh (w1 − wi ), ..., Kh (wn1 − wi )) and
6
δi1
=
1
δi0
1
δi1
,
(3.4)
then the the problem given in (3.4) can be further represented in a matrix form
T
min Y1 − X1 β − Mi1 δi1 Wi1 Y1 − X1 β − Mi1 δi1 .
(3.5)
δi1
The LL estimator of the 2 × 1 vector δl1 is
δbi1 (β1 ) = (Mi1T Wi1 Mi1 )−1 Mi1T Wi1 (Y1 − X1 β1 ) .
(3.6)
Thus according to (3.2) and (3.3)
gb1 (wi ) = Si1 (Y1 − X1 β1 ) ,
(3.7)
where Si1 = v(Mi1T Wi1 Mi1 )−1 Mi1T Wi1 is an 1 × n1 vector and v = (1, 0)T . In a similar
way, we can define Y2 , X2 , Mi2 , Wi2 and Si2 , and obtain the LL estimator of g2 (wi ),
gb2 (wi ) = Si2 (Y2 − X2 β2 ) .
(3.8)
In this step, we have solve gb1 (wi ) and gb2 (wi ) as a functions of β1 and β2 , respectively.
(b) Estimation of β1 and β2 : Let’s first define S1 as an n1 ×n1 matrix with Si1 as the ith
row. By substituting (3.7) into (3.5), the objective function for the first group becomes
T
minβ1 Y1 − X1 β − Mi1 δbi1 (β1 ) Wi1 Y1 − X1 β − Mi1 δbi1 (β1 )
T e 1 β1
e1 β1
= minβ1 Ye1 − X
Ye1 − X
(3.9)
e1 = (In − S1 )X1 . Then the least squares (LS) estimator of
where Ye1 = (In1 − S1 )Y1 , X
1
β1 for a given γ is
−1 e1T X
e1
e1T Ye1 .
β̂1LL (γ) = X
X
(3.10)
Remark 1: The n1 × n1 matrix S1 plays a role of the projection matrix. SX projects
X onto the space generated by w;in other words, SX is an approximation of E(X|w).
The main difference between S and the least squares projection matrix is that S is a
nonlinear projection. Moreover, (In1 − S1 )X1 is the error obtained from the nonlinear
projection.
Remark 2: Here we estimate β1 for a given a γ. Our estimator β̂1LL (γ) is obtained
by (3.9) without explicitly estimating gb1 (·) using the observations satisying qi ≤ γ.
Analogously, using the observations with qi > γ, we can further obtain β̂2LL (γ) from
T e2 β2
e2 β2 .
min Ye2 − X
Ye2 − X
β2
Finally, wi is unknown even when the potential value of γ is provided. Practically, we
can replace wi by ŵi = γ − µ̂ + zi θ̂ in the above procedure, where µ̂ and θ̂ are the least
squares estimates from the regression of qi on an intercept and zi , respectively.
7
(c) Estimation of γ: The specification of (3.1) can be written as
Y = Xβ + G + Xγ δn − DG + ε,
(3.11)
where
X1
Y =
,X =
, Xγ =
0n2×k




g2 (w1 )
g1 (w1 )
β1


 1 
..
..
β=
,G = 
,
,G = 
.
.
β2
g2 (wn )
g1 (wn1 )


g2 (w1 )
e 1 − G1
G


.
1
e
..
G =
.
 , and DG =
0n2×1
g2 (wn1 )
Y1
Y2
X1
X2
Xγ is a n × k matrix with xi di (γ) as the ith row such that the first n1 rows satisfying
that di (r) = 1 (qi ≤ γ) = 1 and the remaining n2 rows vectors are all zero vectors.
Furthermore, let’s define
1
Wi1
On1 ×n2
Mi1
S
Wi =
, Mi =
, and S =
2
2
On2 ×n1 Wi
Mi
On2 ×n1
On1 ×n2
S2
,
then (3.7) and (3.8) suggest the LL estimators of G1 , G2 , and G are
b j = S j (Yj − Xj βj ), for j = 1, 2
G
(3.12)
b = S(Y − Xβ),
G
(3.13)
and
b = S(Y − Xβ) is a n × 1 vector. Subsituting (3.12) and (3.13) into (3.11) gives
where G
b + Xγ δn − SXγ δn + ξ,
Y = Xβ + G
(3.14)
where ξ is a vector of the residuals. After rearranging (3.14) we get
(In − S)Y = (In − S)Xβ + (In − S)Xγ δn + ξ.
(3.15)
Base on (3.15) the local-linear semi-parametric estimator (LL estimator, hereafter) of
γ can be defined as
γ
eLL = arg min SLL
n (γ),
(3.16)
γ∈Γ
T
where SLL
n (γ) = ξ ξ is the sum of the sum of squared least squared residuals of (3.15).
8
It is easy to extend our method to the model with endogenous regressors and the endogenous threshold variable. Combining the Caner-Hansen method, we can use instrumental
variables to purge out the endogeneity in the regressors and then apply the propose method
to remove the effects of the endogenous threshold variable.
The advantage of this method is that it is not required to assume the functional forms
of E [f1 (ui )|xi , zi , ui ≤ γ − µ − zi θ] and E [f2 (ui )|xi , zi , ui > γ − µ − zi θ] in advance, which
makes the model much more flexible and easy to implement. The major disadvantage is
that we cannot directly estimate the intercepts in both regimes without extra assumptions
as in most of semi-parametric methods. If the only threshold effect is between the intercepts,
the proposed semi-parametric method is expected to perform poorly. Nevertheless, the finite
sample results of this semi-parametric method depend on choosing of a bandwidth. To resolve
this problem, for a given threshold, we can take the value as a bandwidth that delivers
minimum sum of squared residuals among all candidate values of bandwidth; or following
Fan and Gijbels (1992) to find a data-dependent bandwidth for a given kernel. Similar to
IV/GMM estimation, it is also required that at least one (exogenous) variable in zi not
included in xi .
4
The asymptotic properties
For the following analysis, we temporarily ignore the problem that wi is in fact replaced
by w
bi = γ − µ
b − zi θb and suppose that the true parameters µ and θ are both known. The
bandwidth h is also given and not discussed here.
Let the 1 × k vector ϕi = ϕ(wi ) = xi − E(xi |wi ), where i = 1, ..., n and E(xi |wi ) denotes
the conditional function of xi given wi ,and w = (w1 , ..., wn )T is a n × 1 vector. Moreover, we
use ϕij denote the j th element of ϕi . We define the following moment functionals.
T
M (γ) = E ϕT
i ϕi di (γ) , M = E ϕi ϕi ,
T
D(γ) = E ϕT
i ϕi |qi = γ , D = E ϕi ϕi |qi = γ0 ,
and
2
T
2
V (γ) = E ϕT
i ϕi εi |qi = γ , V (γ) = E ϕi ϕi εi |qi = γ0 ,
where γ0 represents the true value of γ. Furthermore, let fq (q) denote the density function
of qi .
In order to establish the asymptotic distribution of the proposes estimator, we impose
the a similar set of assumptions as that used in Hansen (2000).
9
[A1] (xi , qi , zi , εi , ui ) is strictly stationary, ergodic and ρ-mixing, with ρ-mixing coefficients
P
1/2
satisfying ∞
m=1 ρm < ∞.
[A2] E(εi |Fi−1 ) = 0 and E(εi |wi ) = 0.
[A3] E|ϕi |4 < ∞ and E|ϕi εi |4 < ∞.
[A4] For all γ ∈ Γ, E |ϕi |4 ε4i qi = γ ≤ C and E |ϕi |4 qi = γ ≤ C for some C < ∞, and
fq (γ) ≤ f < ∞.
[A5] fq (γ), D(γ), V (γ) are continuous at γ = γ0 .
[A6] δn = cn−α with c 6= 0 and 0 < α < 1/2.
[A7] cT Dc < 0, cT V c < 0, and fq > 0.
[A8] M > M (γ) > 0 for all γ ∈ Γ.
Following (3.11), define Sn (γ) = (Y − Xβ − G − Xγ δn + DG )T (Y − Xβ − G − Xγ δn + DG )
and denote limn→∞ n1 Sn (γ) = S(γ).
[A9] Assume γ0 = arg minγ∈Γ S(γ) is unique.
Assumptions [A1]-[A8] are quite similar to the conditions used by Hansen (2000), except
that the role of xi has been replaced by ϕi . [A9] assures that the true threshold value is
unique. Let fw (w) denote the density function of wi . Define ψi = ψi (wi ) = yi − E(yi |wi ) as
the conditional expectation of yi given wi . The following assumptions are required for our
analysis for the local linear estimator.
[B1] The conditional expectation of yi given wi can be presented as yi = ψi (wi ) + σ1 (wi )1i ,
i.i.d.
i = 1, ..., n, j = 1, ..., k and 1i ∼ (0, 1). The function ϕij has bounded and continuous
second partial derivatives.
[B2] The conditional expectation of xij given wi , is ϕij (wi ), and can be written as the
i.i.d.
following form xij = ϕij (wi ) + σ2j (wi )2ij , i = 1, ..., n, j = 1, ..., k, and and 2ij ∼ (0, 1).
The function ϕij has bounded and continuous second partial derivatives.
[B3] The random variable wi has a continuous density function fw (·) on a compact support,
and is bounded away from zero and infinity.
R
[B4] The kernel K is a compactly supported, bounded kernel such that K(u)du = 1,
R
R
K(u)2 du = R(K), and u2 K(u)du = µ2 (K) > 0, where µ2 is a scalar. All odd-order
moments of K(·) vanish.
10
[B5] The bandwidth hn satisfies that h → 0, nh → ∞.
The asymptotic properties of the LL estimator are summarized in Theorem 1 and 2.
p
Theorem 4.1: The LL estimator of γ is a consistent estimator of γ0 . That is γ
eLL → γ0 .
Theorem 4.2: Denote βδ = (β T , δnT )T the slope parameter. Then the LL estimator of βδ
are such that
−1 ∗T
E βbδ = βδ + Xγ∗T Xγ∗
Xγ (I − S)(G − DG )
and
−1
V ar βbδ = Xγ∗T Xγ∗
Xγ∗T (I − S) · Vε (I − S)T Xγ∗ (Xγ∗T Xγ∗ )−1 ,
1
where Vε = E(εεT ). Moreover, Bias of βbδ = Op (n− 2 h2 ) and V ar βbδ = Op (n−1 ).
√
Theorem 2 suggests that our LL estimator of the slope estimator is still n-consistent. The
1
bias vanishes at the rate n− 2 h2 . The above results are based on the assumption that wi is
b Let
observed. Empirically, wi is replaced by its estimator w
bi = γ − µ
b − zi θ.


1 w
b1 − wi

. .
ci = 
M
 .. ..
,
1 w
bn − wi
cT Wi M
ci )−1 (M
cT Wi ),
Sbi = v(M
i
i
and
b
be
be
T b
b
e be
e be
SLL
n (γ) = (Y − Xβ − X r δn ) (Y − Xβ − X r δn ).
Define γ
bLL = arg minγ∈Γ b
SLL
n (γ), then it remains to show
bLL
p
LL
supr∈Γ Sn (γ) − Sn (γ) → 0
to establish the consistency of γ
bLL .
5
Monte Carlo Simulation
In this section, Monte Carlo experiments are constructed to examine the finite sample
properties of the proposed estimators under several scenarios, including with and without
normality under various combinations of sample sizes. In particular, it is crucial to compare
our methods with the method by Kourtellos, Stengos, and Ten (2008). We investigate the
finite sample performance, based on the mean squared errors (MSE) of the estimates of the
threshold parameter, and the averaged MSE of the slope coefficients of exogenous regressors.
11
The data generating processes considered are described as follows. For i = 1, . . . , n,
yi = β0 + β1 xi 1(qi ≤ γ0 ) + β2 xi 1(qi > γ0 ) + ei ,
qi = α0 + α1 z1i + α2 z2i + ui ,
where xi , z1i
q, z2i ∼ iid N (0, 1) and are mutually independent, ui ∼ iid(0, 1), ei = (ρi ui +
(1 − ρi )εi )/ ρ2i + (1 − ρi )2 , εi ∼ iid(0, 1) and independent of ui , the true threshold γ0 = 1,
β0 = β1 = α0 = 1, α1 = α2 = 3, β2 = β1 + δ. To demonstrate the impact of nonnormally distributed errors on estimation, the cases are refereed as Normal, χ2 , and Mixed
Normal if both vi and i are generated from normal, χ2 , and mixed normal distributions,
respectively. There are two types in each case: in Type I, ρi = 0.6 when qi ≤ γ0 and ρi = 0.8
when qi > γ0 ; in Type II we allow ρi to vary across units and ρi = {0.8, 0.4, 0.85, 0.7} as
1 ≤ i ≤ n/4, n/4 < i ≤ n/2, n/2 < i ≤ 2n/3, and 2n/3 < i ≤ n, respectively. The mean
of ρi in each group is the same as that in Type 1. For each case in each type, we consider
δ = {0.05, 0.10, 0.25} and n = {100, 200, 400}. The number of Monte Carlo replications is
1000 in each of the experiment.
We compare our method with Kourtellos, Stengos, and Ten’s (2008) method, referred as
the HECKIT method, and the threshold estimation without controlling for the endogeneity
in transition variable, referred as the Naive method. The proposed method is referred as the
LL method. To reduce the boundary effects, which might bias the the threshold estimates
in the proposed method, we also consider the to obtain the residuals from trimmed OLS
using 95% lowest observations (in terms of Euclidean distance) in each split subsample to
find the estimated threshold, and then given this estimated threshold we then use the regular
LL estimation to obtain the slope estimates. We refer this hybrid estimation strategy as
the MIX model. To evaluate the finite sample accuracy of these methods, we investigate
the efficiency of the Naive, LL, and MIX estimators relative to the HECKIT estimator. In
particular, the efficiency is defined as the ratio of the mean squared errors (MSE) of the
Naive or LL estimator to that of the HECKIT estimator. The MSE of the estimates of the
threshold parameter and the averaged MSE of the slope coefficients of exogenous regressors
defined as:
M SE(γ̂) =
R
R
r=1
r=1
1 X
1 X
(γ̂r − γ0 )2 , M SE(β̂1 , β̂2 ) =
[(β̂1,r − β1 )2 + (β̂2,r − β2 )2 ],
R
2R
where γ̂r , β̂1,r , and β̂2,r denote the estimates of γ0 , β1 , and β2 in the rth trial, respectively,
and R is the total number of replications.
First, we compare the finite sample performance in terms of the accuracy of threshold estimation. Table 1 reports the efficiency of the Naive, LL, and MIX estimators of the threshold
12
parameter relative to the HECKIT. It is obvious that the HECKIT, LL, and MIX estimators
tend to outperform the Naive method over all experiments sometime by a large margin. This
result indicates that it is crucial to control for the endogeneity in the transition variable. As
expected, the LL and MIX estimators are more efficient than the HECKIT method when
the errors are generated from non-normal distribution. Even in the cases with normally distributed errors, the LL and MIX estimator is still comparable to the HECKIT. With normally
distributed errors, the HECKIT tends to be more efficient than the proposed methods when
sample size is 400. However, when sample size is 100 or 200, the LL and MIX estimators
can estimate the threshold more accurately. Increasing the sample size tends to reduce the
advantage of using the LL and MIX methods over the HECKIT method. While increasing
δ reduces the gap between the Naive method and the other methods controlling for the endogeneity in transition variable, its effect on the difference between the LL method and the
HECKIT methods is ambiguous. Finally, the MIX method tends to generate more accurate
threshold estimates than the LL method. From the distribution of threshold estimates not
reported here, we observe that the MIX method tends to choose smaller threshold estimates
than the LL method when large values of threshold are suggested by the LL method, while
in at least 80% of replications the MIX and the LL methods will suggest the same threshold
values.
Next, we consider the efficiency on estimating the slope parameters. Table 2 reports
the efficiency of the Naive, LL, and MIX estimators of the slope parameters relative to the
HECKIT method. Similar to Table 1, the HECKIT, LL, and MIX methods are uniformly
more reliable in estimating the slope parameters, and increasing δ will reduce the benefit of
controlling for the endogeneity in transition variable. There is a slight difference from Table
1. When errors are normally distributed, the HECKIT method will uniformly deliver more
accurate slope estimates than the others. With non-normally distributed errors, while the LL
method cannot always deliver more accurate slope estimates than the HECKIT method, the
range of ratio is still between 0.98 and 1.04 when sample size is 400. On the other hand, the
MIX method tends to generate more accurate slope estimates than the others when residual
distributions are not normal and perform slightly better than the LL method due to its
slightly more accuracy of choosing threshold value.
Overall, our simulation results show the importance of controlling the endogeneity in the
transition variable. The proposed method can deliver more accurate threshold estimate than
Kourtellos, Stengos, and Ten’s (2008) method when errors are non-normally distributed. The
advantage of using more flexible method is ambiguous in terms of the accuracy of the slope
estimators. But removing the outliers during estimation procedure can improve the accuracy
of determining threshold value and thus reduce the MSE of slope coefficient estimates.
13
Tables 1 and 2 around here
6
Empirical Application
We re-visit the work of Minier (2007) to investigate the effects of institutions on growth
by the proposed methods. Because institutions might affect the relationship between growth
and its other determinants, it can act as a potential threshold variable. Particularly, we
will study the indirect effects by taking institutions as threshold variables. Minier (2007)
uses the regression tree procedure, allowing for any number of variables to be considered as
“split variables” that separate the full sample into subsamples. She finds that “executive
constraints” significantly affected the other growth determinants. Instead, we will take this
variable as a threshold variable and then control for potential endogeneity with the proposed
methods in the context of threshold models. Using the data of Glaeser, Porta, Lopez-de
Silanes, and Shleifer (2004), we consider the empirical regression of growth in per capita
GDP over 1960 to 2000 on initial GDP per capita, initial investment/GDP ratio, initial
schooling, and the percentage of the population that lives in temperate climate zones, and
use a measure of the initial level of constraints on the executive branch as the transition
variables. Due to the potential endogeneity of “executive constraints”, it is expected that
the results might be different from Minier (2007). Moreover, we use the averaged degree of
autocracy and the percentage of years as instrumental variables, which are also available from
Glaeser et al. (2004). There are 70 countries in the sample. Along the value of transition
variable, countries are grouped into two groups: Group 1 – low executive constraint , and
Group 2 – high executive constraints.
The empirical estimation results are reported in Table 3. The result from the naive
OLS without controlling for the endogeneity in transition variable is just identified with that
of Minier (2007, Table 2). If we control this endogeneity by using the HECKIT method
suggested by Kourtellos, Stengos, and Ten’s (2008), we can find the inverse Mill’s ratio is
statistically significant in Group 1 but not in Group 2, which indicates that the “executive
constraints” is endogenous for Group 1 countries and the estimates results from the naive
OLS method might be distorted for these countries. Moreover, if we further relaxed the
assumptions imposed in the HECKIT model and use the MIX method to estimate the parameters, the estimated threshold is 5.561 and 27 countries are clustered into Group 1. Also,
notice that the LL method suggests the same partition as the naive OLS method. However,
after correcting the potential bias caused by the endogeneity in transition variable, the coefficients are different from the Naive method in Group 1 countries; but seem similar in Group
2. Our method suggests that schooling is the only significant explanatory variable in Group 1
countries. It can also be found that the instrumental variables chosen are valid from Table 4,
14
which reports the first stage regression of transition variable on the exogenous variables and
the exogeneity regression-based test over all countries.
Tables 3 and 4 around here.
To investigate how the effects caused by the endogeneity in transition variable vary across
countries, we plots the estimated effects from different methods in Figure 1. Particularly, in
the LL and MIX methods, the nonlinear estimates are obtained by Ŝ j (Yj − Xj β̂j ), j = 1, 2;
and the HECKIT method, the estimates are captured by ρ̂j λj (ŵi ), where β̂j and ρ̂j are the
estimates from different regimes j = 1, 2. For comparison, the nonlinear estimates plotted
are the deviations from the group-specific mean in each different regime. While the nonlinear
estimates from all methods for Group 2 countries seem trivial, these estimated nonlinear
effects are quite visually significant for Group 1 countries. Moreover, the shapes of the
nonlinear effects from the LL and MIX methods are different from that of the HECKIT
method, which indicates that more flexible setting might be appropriate to capture the effects
caused by the endogenous variable in the case of interest.
Figure 1 around here.
We further regress the nonlinear components capture by the LL and MIX methods on a
constant, the level, square, and cube of ŵi and report the results in Table 5. The coefficients
over all are quite significant, which indicates that the effects captured by the proposed method
is not constant across observations and therefore extra bias might be generated if we do not
control for the endogeneity in the transition variable.
Tables 5 and 6 around here.
Overall, in this application we demonstrate the importance of controlling the endogeneity
in the transition variable. We also show the advantage of using the proposed method on
capturing the effects caused by the endogenous variable. The detailed country list and the
estimated group membership can be found in Table 6. The policy implication and other
subtle analysis will be left for the future study.
7
Conclusion
We propose a concentrated two-stage least squares method using instrumental variables,
based on the local linear smoothers by Fan (1992), to handle the effects caused by the
15
endogenous threshold variable. Our proposed estimator is consistent without relying on any
distribution assumption and allows us to handle nonlinear effects in a more flexible setting.
Our simulation results show that the proposed method can deliver more accurate threshold
estimate when errors are non-normally distributed.
Based on this paper, it might be interesting to compare our proposed method with the
other semi-parametric methods. It would be also helpful to further develop feasible test
methods for potential endogeneity in the threshold variable for semi-parametric methods,
regardless of whether some regressors are endogenous or not. We will leave these as future
studies.
REFERENCES
Caner, M., and B. E. Hansen (2004): “Instrumental Variable Estimation of a Threshold
Model,” Econometric Theory, 20, 813–843.
Carroll, R. J., J. Fan, I. Gijbels, and M. P. Wand (1997): “Generalized Partially
Linear Single-Index Models,” Journal of the Americian Statistical Association, 92, 477–
489.
Chen, S. (1997): “Semiparametric Estimation of the Type-3 Tobit Model,” Journal of
Econometrics, 80, 1–34.
Christofides, L. N., Q. Li, Z. Liu, and I. Min (2003): “Recent Two-Stage Sample
Selection Procedures With an Application to the Gender Wage Gap,” Journal of Business
and Economic Statistics, 21, 396–405.
Fan, J. (1992): “Design-adaptive Nonparametric Regression,” Journal of the Americian
Statistical Association, 87, 998–1004.
(1993): “Local Linear Regression Smoothers and Their Minimax Efficiency,” The
Annals of Statistics, 21, 196–216.
Hamilton, S. A., and Y. K. Truong (1997): “Local Linear Estimation in Partly Linear
Models,” Journal of Multivariate Analysis, 60, 1–19.
Hansen, B. E. (2000): “Sample Splitting and Threshold Estimation,” Econometrica, 68,
575–603.
Heckman, J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica, 47,
153–161.
16
Honore, B. E., E. Kyriazidou, and C. Udry (1997): “Estimation of Type-3 Tobit
Models Using Symmetric Trimming and Pairwise Comparisons,” Journal of Econometrics,
76, 107–128.
Kourtellos, A., T. Stengos, and C. M. Tan (2008): “Threshold Regression with Endogenous Threshold Variables,” Unpusblished.
Lee, L.-F. (1994): “Simulated Maximum Likelihood Estimation of Dynamic Discrete Choice
Statistical Models Some Monte Carlo Results,” Journal of Econometrics, 82, 1–35.
Li, Q., and J. M. Wooldridge (2002): “Semiparametric Estimation of Partially Linear
Models for Dependent Data With Generated Regressors,” Econometric Theoy, 18, 625–645.
Minier, J. (2007): “Institutions and Parameter Heterogeneity,” Journal of Macroeconomics,
29, 595–611.
Ruppert, D., and M. P. Wand (1994): “Multivariate Locally Weighted Least Squares
Regression,” The Annals of Statistics, 22, 1346–1370.
Seo, M. H., and O. Linton (2007): “A Smoothed Least Squares Estimator for Threshold
Tegression Models,” Journal of Econometrics, 141, 704–735.
Tan, C. M. (2005): “No One True Path: Uncovering the Interplay Between Geography,Institutions, and Fractionalization in Economic Development,” Series 2005-12, Dept.
of Economics, Tufts University.
Vella, F. (1992): “Simple Tests for Sample Selection Bias in Censored and Discrete Choice
Models,” Journal of Applied Econometrics, 7, 413–421.
(1998): “Estimation Models with Sample Selection Bias: A Survey,” Journal of
Human Resources, 33, 127–169.
Wooldridge, J. M. (1998): “Selection Corrections With a Censored Selection Variable,”
Unpusblished, Michigan State University.
17
Appendix
e and (In − S)Xγ = X
eγ . Then average sum of
Denote (In − S)Y = Ye , (In − S)X = X,
squared error function can be represented as
T e
e
e
e
e
e
Y
−
Xβ
−
X
δ
Y
−
Xβ
−
X
δ
SLL
(β,
δ,
γ;
h)
=
γ n
γ n .
n
Conditional on γ and given
choice bandwidth h, (β, δ) can be obtained by the LS
a proper
∗
e
e
estimation. Denote Xγ = X, Xγ , then our LL estimator of (β, δ) is
!
−1
βb (γ; h)
∗T ∗
(A.1)
=
X
X
Xγ∗T Ye .
γ
γ
δb (γ; h)
The concentrated sum of squared error function is then a function of the threshold value γ
b (γ; h) , δb (γ; h) , γ; h
SLL
β
n
−1
Xγ∗T Ye ,
= Ye T Ye − Ye T Xγ∗ Xγ∗T Xγ∗
= Ye T (In − Pγ )Ye ,
where Pγ = Xγ∗ Xγ∗T Xγ∗
−1
(A.2)
Xγ∗T . Therefore,
γ
eLL = arg minγ∈Γ SLL
n (γ),
LL β
b (γ; h) , δb (γ; h) , γ; h .
where SLLR
(γ)
is
in
fact
the
abbreviation
of
S
n
n
Define x
ei = (xi − Si X) as the ith row of X, which is also the LL estimator of ϕi . Similarly,
LL
denote yei = (yi − Si Y ) as the LL estimator of ψi Let S (γ) = limn→∞ n1 SLL
n (γ). Recall that
00
∂ϕi1
∂ϕik
∂ 2 ϕi1
∂ 2 ϕik
0
ϕi = (ϕi1 , ..., ϕik ) so ϕi = ( ∂wi , ..., ∂wi ) and ϕi = ( ∂w2 , ..., ∂w2 ). Lemma 1 directly follow
i
i
from Theorem 1 of Fan (1992).
Lemma 1: Under assumption B,
1
E (e
xij − ϕij |wi ) = h2 ϕ00ij µ2 (K) + op (h2 ),
2
and
1
E (e
yi − ψi |wi ) = h2 ψi00 µ2 (K) + op (h2 ),
2
where ϕ00ij denote the j th element of ϕ00i .
1 R(K)
1
V ar (e
xij |wi ) =
σ2j (wi ) + op ( ),
nh fw (wi )
nh
1 R(K)
1
V ar (e
yi |wi ) =
σ1 (wi ) + op ( ).
nh fw (wi )
nh
18
Let lk = (1, ..., 1) be a 1 × k vector and lk×k be a k × k mtrix with all elements being
ones. Lemma 2 follows from Lemma 1.
Lemma 2: Under Assumption B
n
n
1X T
1
p 1 X T
x
ei x
ei →
ϕi ϕi + h2 µ2 (K)
n
n
2
i=1
i=1
#
" n
1 X T 00
00T
ϕi ϕi + ϕi ϕi + op (h2 lk×k ),
×
n
(A.3)
i=1
= M + op (lk×k )
as n → ∞ and h → 0.
n
n
1 X e iT e i p 1 X T
1
Xγ Xγ →
ϕi ϕi di (γ) + h2 µ2 (K)
n
n
2
i=1
i=1i
" n
#
1 X T 00
2
×
ϕi ϕi + ϕ00T
i ϕi di (γ) + op (h lk×k ),
n
(A.4)
i=1
= M (γ) + op (lk×k )
as n → ∞ and h → 0.
LL
p
Lemma 3: Under assumption A and B, supγ∈Γ S (γ) − S(γ) → 0.
Proof: Using (3.1), (3.14), and Lemma 1 and 2, we have
∆i = ξi − εi
1 2
=
h µ2 (K) −ψi00 + ϕ00i β − ϕ00i di (γ)δn + op (h2 ),
2
1 2
h µ2 (K)C3 [lk β − 1 + |lk δn |] + op (h2 ),
<
2
(A.5)
(A.6)
for all i and γ, where C3 is a finite constant.
For instance, one may define C1 =
00 00
max1≤i≤n |ψi | , C2 = max1≤i≤n, 1≤j≤k ϕij , and C3 = max{C1 , C2 }. It implies that
∆2i = Op (h4 ).Denote ∆ = (∆1 , ..., ∆n )T . Therefore,
1 LL
Sn (γ) − Sn (γ)
n
1
= limn→∞ supγ∈Γ ξ T ξ − εT ε ,
n
1
= limn→∞ supγ∈Γ εT ∆ + ∆T ∆ ,
n
p
→ 0 + Op (h4 ).
limn→∞ supγ∈Γ
19
The uniform convergence of the first term is due to E(εi |wi , xi ) = 0 and also that ψi00 (·)
p
and ϕ00i (·) are both functions of wi , so limn→∞ supγ∈Γ n1 εT ∆ → 0.
Therefore, [A9] together with Lemma 3 suggests
LL
γ0 = arg minγ∈Γ S
(γ).
(A.7)
Proof of Theorem 1: Using (A.2) we obtain
LL
SLL
n (γ) − Sn (γ0 )
= Ye T (In − Pγ )Ye − ξ0T ξ0 ,
eγT (In − Pγ )X
eγ δn ,
eγT (In − Pγ )ξ0 + δ T X
= −ξ0T Pγ ξ0 + 2δnT X
0
n
0
0
(A.8)
where
e −X
eγ δn .
ξ0 = Ye − Xβ
0
(A.9)
Using the assumption that δn = cn−α and substituting it into (A.8) gives
T
n−1+2α SLL
n (γ) − ξ0 ξ0
e T (In − Pγ )ξ0 + 1 cT X
e T (In − Pγ )X
eγ c
= −n−1+2α ξ0T Pγ ξ0 + 2n−1+α cT X
γ0
γ0
0
n
1 T eT
eγ c + op (1),
c Xγ0 (In − Pγ )X
=
0
n
1 eT
1 T eT e
eγ c + op (1).
c Xγ0 Xγ0 c − cT X
P
X
=
(A.10)
γ
γ0
0
n
n
The second eauality of (A.10) is due to (A.9). Since ξ0 is defined on the true threshold value
γ0 , it implies that
e =0
E ξ0 |X
and thus we have
e ξ0 = 0
E h X
e of X.
e Therefore, the first two terms in the second line of (A.10) vanish
for any function h X
in the limit.
eγ , Z
eγ ) such that
The projection matrix Pγ can also be written as the projection onto (X
eγ (β + δn ) + Z
eγ β + ξ, where Zeγ = X
e −X
eγ is a n × k matrix with the ith row being
Ye = X
eγ is
x
ei (1 − di (γ)). The ith row of X
ei = x
X
ei di (γ).
γ
20
e i is the LL estimator of ϕi di (γ). Obviously, the matrices also satisfy X
e TZ
e
It follows that X
γ
γ γ =
0k×k .
e = 0.
eγT Z
eγ and X
eγ = X
eγT X
eγT X
Let’s first consider the case when γ ≥ γ0 . Then we have X
0
0 γ
0
0
Then it follows from (A.10) that
LL
n−1+2α SLL
n (γ) − Sn (γ0 )
p
→ cT M (γ0 ) c − cT M (γ0 )M (γ)−1 M (γ0 )c,
≡ b1 (γ).
(A.11)
Similar to Hansen’s (2000), we have
dM (γ)
= D(γ)f (γ)
dγ
and thus
db1 (γ)
= −cT M (γ0 )M (γ)−1 D(γ)f (γ)M (γ)−1 M (γ0 )c ≥ 0,
dγ
which suggests that b1 (γ) is increasing on the interval [γ0 , γ]. Moreover,
db1 (γ0 )
= −cT D(γ0 )f (γ0 )c ≥ 0,
dγ
which suggests that γ0 is the minimum on [γ0 , γ]. For the case γ < γ0 , we can show that
p
LL
n−1+2α SLL
n (γ) − Sn (γ0 ) → b2 (γ), where b2 (γ) is a weakly decreasing continuous function
and has a unique minimum at γ0 on the interval [γ, γ0 ]. Therefore,
p
LL
n−1+2α SLL
n (γ) − Sn (γ0 ) → b1 (γ)1{γ ≥ γ0 } + b2 (γ)1{γ < γ0 },
LL
it follows that the LL estimator defined by γ
eLL = arg minγ∈Γ n−1+2α SLL
n (γ) − Sn (γ0 )
p
satisfies γ
eLL → γ0 .
To show the results given in Theorem 2, define the Euclidean norm of a vector a as
!1/2
1
P 2
kak =
aj
and the Euclidean norm of a matrix A = (aij ) as kAk = tr(AT A) 2 =
j
!1/2
P
ij
a2ij
. The results of Lemma 5 and equation (4.12) of Hamilton and Troung (1997)
2
suggest that (i) k(I − S)(G − DG )k = Op (h2 ) and (ii) (I − S)T Xγ∗ = Op (n).
Proof of Theorem 2: Recall that E(Y |X, w) = (X, Xγ )βδ +(G − DG ) and βbδ = Xγ∗T Xγ∗
21
−1
Xγ∗T Ye .
So
−1 ∗T
E βbδ = βδ + Xγ∗T Xγ∗
Xγ (I − S)(G − DG )
(A.12)
−1 ∗T
−1
V ar βbδ = Xγ∗T Xγ∗
Xγ (I − S)Vε (I − S)T Xγ∗ Xγ∗T Xγ∗
,
(A.13)
and
where Vε = E(εεT ). It follows from (A.12) that
−1 ∗T
Xγ (I − S)(G − DG ) ,
Bias of βbδ = Xγ∗T Xγ∗
−1 ∗ Xγ · k(I − S)(G − DG )k ,
≤ Xγ∗T Xγ∗
√
1
= Op (n) · Op ( n) · Op (h2 ) = Op (n− 2 h2 ),
1
√
where the last line follows from (i) and also that Xγ∗ = tr Xγ∗T Xγ∗ 2 = Op ( n).
For (A.13), notice that Lemma 1 suggest
1 ∗T ∗ p
Xγ Xγ → (M, M (γ)) + op (l2k×2k ),
n
(A.14)
−1
= Op n−1 · l2k×2k . Moreover, given that ε has bounded
which implies that Xγ∗T Xγ∗
variance, [A1] and [B2] together suggest that there exist a finite number C5 such that
Xγ∗T (I − S) · Vε (I − S)T Xγ∗
2
≤ C5 · (I − S)T Xγ∗ · l2k×2k ,
(A.15)
= C5 · Op (n) · l2k×2k ,
where the last equality follows from (ii). Therefore, (A.14) and (A.15) implies that V ar βbδ ≤
C5 · O(n−1 ) · l2k×2k .
22
Table 1
The efficiency of the Naive, LL, and MIX estimators of γ0 relative to the HECKIT estimator.
Case
type
n
δ
0.05
I
0.10
0.25
0.05
II
0.10
0.25
100
200
400
100
200
400
100
200
400
100
200
400
100
200
400
100
200
400
Normal
Naive LL
2.36
0.91
2.44
0.92
2.77
1.04
2.17
0.91
2.27
0.93
2.31
1.03
1.64
0.95
1.50
0.99
1.17
1.05
2.25
0.88
2.34
0.93
3.02
1.00
2.08
0.88
2.10
0.91
2.54
1.01
1.57
0.90
1.47
0.96
1.26
1.08
MIX
0.92
0.95
1.04
0.92
0.96
1.02
0.94
0.99
1.04
0.91
0.94
0.97
0.90
0.91
1.00
0.87
0.98
1.12
Naive
1.59
1.71
1.93
1.55
1.59
1.69
1.41
1.16
1.05
1.57
1.65
1.72
1.53
1.51
1.56
1.25
1.07
0.84
23
χ2
LL
0.82
0.89
0.94
0.83
0.89
0.94
0.85
0.90
0.95
0.85
0.88
0.93
0.85
0.90
0.94
0.85
0.90
0.97
MIX
0.80
0.88
0.92
0.78
0.87
0.93
0.84
0.88
0.93
0.83
0.87
0.91
0.84
0.88
0.91
0.83
0.88
0.95
Mixed Normal
Naive LL
MIX
2.22
0.82 0.77
2.62
0.93 0.92
2.88
0.97 0.95
2.11
0.82 0.79
2.35
0.93 0.91
2.35
0.92 0.91
1.78
0.89 0.89
1.55
0.95 0.91
1.20
0.95 0.97
2.17
0.84 0.90
2.60
0.94 0.90
2.72
0.94 0.94
2.03
0.82 0.86
2.31
0.89 0.87
2.30
0.94 0.95
1.56
0.91 0.93
1.45
0.90 0.89
1.06
0.95 0.94
Table 2
The efficiency of the Naive, LL, and MIX estimators of slope parameters relative to the HECKIT
estimator.
Case
type
n
δ
0.05
I
0.10
0.25
0.05
II
0.10
0.25
100
200
400
100
200
400
100
200
400
100
200
400
100
200
400
100
200
400
Normal
Naive LL
3.34
1.19
3.00
0.99
1.98
1.05
2.97
1.16
2.76
1.00
1.71
1.06
2.03
1.20
1.70
1.05
1.16
1.05
3.41
1.04
3.11
1.01
2.10
1.04
3.69
1.09
2.63
1.00
1.71
1.04
2.29
1.08
1.70
1.03
1.20
1.04
Mix
1.16
0.98
1.04
1.13
0.99
1.05
1.01
1.03
1.03
1.00
1.02
1.03
1.09
0.97
1.03
1.00
1.00
1.04
Naive
3.82
4.65
5.63
3.89
4.38
4.99
3.57
3.55
3.41
3.54
4.12
5.26
3.74
3.84
4.75
3.39
2.79
2.93
24
χ2
LL
1.10
1.06
1.00
1.14
1.06
1.00
1.15
0.98
0.98
1.15
1.10
1.04
1.23
1.10
1.00
1.24
1.02
1.00
Mix
0.81
0.97
0.95
0.84
0.95
0.96
0.87
0.83
0.92
0.77
1.01
0.97
0.83
0.99
0.91
0.85
0.91
0.93
Mixed Normal
Naive LL
Mix
3.77
0.96 0.84
3.00
1.00 0.96
2.76
1.00 1.00
3.22
0.91 0.85
2.49
1.02 1.00
2.16
0.98 0.97
2.52
0.99 0.94
1.77
1.04 1.01
1.31
1.01 0.99
3.90
0.92 0.88
2.84
1.01 0.96
2.55
1.02 1.01
3.51
0.91 0.88
2.33
1.01 0.95
2.10
1.00 1.00
2.33
0.95 0.94
1.48
0.99 0.99
1.21
1.01 0.98
Table 3
Empirical Example.
Con
ln(Y60)
ln(INV)
ln(SCH)
TEM
IMR
Con2
ln(Y60)2
ln(INV)2
ln(SCH)2
TEM2
IMR2
Naive
γ̂
n1
3.2368
20
Coef.
t
0.0426
(1.594)
-0.0068 (-1.805)
0.0064
(2.042)
0.0044
(1.674)
0.0140
(1.856)
0.1243
-0.0159
0.0043
0.0049
0.0224
(5.578)
(-5.345)
(1.222)
(1.841)
(5.405)
HECKIT
γ̂
n1
2.8462
15
Coef.
t
0.0381
(1.411)
-0.0055 (-1.433)
0.0065
(2.085)
0.0064
(2.099)
0.0121
(1.493)
0.1255
(5.585)
-0.0160 (-5.360)
0.0044
(1.239)
0.0045
(1.596)
0.0224
(5.412)
0.0065
(1.375)
-0.0020 (-0.415)
MIX
γ̂
3.561
Coef.
LL
n1
27
t
γ̂
3.2368
Coef.
n1
20
t
-0.0063
0.0039
0.0084
0.0075
(-1.053)
(1.068)
(1.788)
(0.621)
-0.0035
0.0052
0.0100
0.0020
(-0.735)
(1.593)
(2.348)
(0.193)
-0.0164
0.0051
0.0056
0.0207
(-5.466)
(1.591)
(1.815)
(5.350)
-0.0177
0.0037
0.0048
0.0219
(-5.895)
(1.174)
(1.531)
(5.739)
(a) Number of observations is 70. Dependent variable: Growth of GDP per capita, 1960∼2000.
(b) Regressors: lnGDP60 denotes Log GDP per capita in 1960, ln(SCH) denotes Log of years of
schooling of population over age 25, ln(INV) denotes Log of average investment/GDP from
1960 to 1965, TEM denotes the percentage of the population living in temperate climate zones
in 1995, and IMR denotes the inverse Mill’s ratio in the HECKIT model. Transition variable
(qi ): EXE denotes the extent of institutionalized constraints on decision-making powers of chief
executive: 1 = unlimited executive authority through 7 = executive parity or subordination. Instrumental variables: the averaged degree of autocracy over 1960-1990: 0 = democracy through
2 = autocracy, and the percentage of years 1975V2000 in which legislators elected under winnertake-all rule. All data except ln(INV) (taken directly from Penn World Tables) are taken from
Glaeser et al. (2004). See Minier (2007) for the details.
(c) The regression based Hausman test using whole sample indicates that these two variables are
valid instruments. γ̂ denotes the threshold estimate from each method and n1 denote the
number of observations with qi ≤ γ̂. The subscription 2 denotes the coefficients in the regime
with qi > γ̂.
25
Table 4
First Stage Estimation and Regression-based Exogeneity Test over the whole sample
Con
ln(Y60)
ln(INV)
ln(SCH)
ln(TEM)
AUT
PLU
R2
F -statistic
First Stage
(dep =EXE)
3.9686
(2.278)
0.2242
(1.014)
0.1024
(0.493)
0.2230
(1.300)
0.1752
(0.529)
-2.2289 -(8.097)
-0.2306 -(0.940)
0.81
49.04
*
**
Exogeneity Test
(dep = ê)
0.0156
(0.797)
-0.0015 -(0.587)
-0.0003 -(0.130)
-0.0008 -(0.419)
0.0004
(0.114)
-0.0055 -(1.769)
0.0016
(0.580)
0.05
0.59
(a) ê denotes the OLS residuals over all countries without threshold effects. The other variables
are defined in the note of Table 3.
(b) ** Significant at 1% level. * Significant at 5% level.
Table 5
Nonlinearity for the observations in Regime One (qi < γ̂)
Con
ŵi
ŵi2
ŵi2
MIX
(dep = φ̂i )
0.0434 (155.972)
0.0092
(19.890)
-0.0013
-(7.148)
-0.0028 -(19.166)
**
**
**
**
LL
(dep = φ̂i )
-0.0885 -(304.834)
0.0128
(24.947)
0.0002
(1.455)
-0.0032
-(20.665)
**
**
**
(a) φ̂i denotes the estimates of nonlinear effects from the trimmed-LL method or LL method ŵi =
γ̂ − Z θ̂i . Here, we only consider the the observations with (qi < γ̂).
(b) *** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level.
26
0.04
0.03
0.02
0.01
0
−0.01
−0.02
HECKIT
MIX
LL
−0.03
−0.04
−0.05
−0.06
−6
−5
−4
−3
−2
γ−Zθ
27
−1
0
1
2
Table 6
Country List and Group Estimates
Country
Syrian
Algeria
Togo
Malawi
Jordan
Ghana
Mali
Cameroon
Indonesia
Zambia
Paraguay
Niger
Uganda
Nicaragua
Mozambique
Guatemala
Senegal
Lesotho
Iran
Kenya
Nepal
Brazil
Thailand
Mexico
Zimbabwe
Korea
Panama
Argentina
Peru
Pakistan
Chile
Philippines
Honduras
Bolivia
El Salvador
Dominican Republic
Spain
Ecuador
Portugal
France
Uruguay
Malaysia
Greece
Venezuela
Sri Lanka
Turkey
Colombia
India
Australia
Austria
Belgium
Canada
Costa Rica
Denmark
Finland
Iceland
Ireland
Israel
Italy
Jamaica
Japan
Netherlands
New Zealand
Norway
South Africa
Sweden
Switzerland
Trinidad and Tobago
United Kingdom
United States
GR
0.027
0.015
-0.001
0.015
0.013
0.012
-0.001
0.005
0.033
-0.007
0.016
-0.015
0.013
-0.013
-0.010
0.013
-0.003
0.020
0.021
0.012
0.015
0.027
0.045
0.020
0.019
0.058
0.024
0.010
0.010
0.029
0.024
0.013
0.005
0.004
0.007
0.028
0.034
0.014
0.038
0.026
0.012
0.038
0.031
-0.005
0.022
0.023
0.019
0.027
0.022
0.029
0.028
0.024
0.013
0.022
0.029
0.028
0.041
0.029
0.029
0.008
0.042
0.024
0.012
0.030
0.011
0.021
0.014
0.024
0.021
0.025
EXE
1.250
1.538
1.564
1.757
2.049
2.054
2.300
2.317
2.390
2.405
2.439
2.700
2.750
2.821
2.846
3.075
3.075
3.094
3.211
3.237
3.244
3.300
3.368
3.463
3.476
3.525
3.561
3.707
3.789
3.919
3.951
4.050
4.051
4.171
4.306
4.432
4.632
4.683
4.692
4.927
5.077
5.195
5.400
5.756
5.878
6.050
6.098
6.951
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
7.000
28
Naive
HECKIT
MIX
LL
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2