Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Threshold Regression Without Distribution Assumption when the Threshold Variable Is Endogenous Hung-Pin Lai∗ Chang-Ching Lin† This version: March 15, 2009 National Chung Cheng University Academia Sinica Abstract In this paper, we propose a semi-parametric approach to estimate threshold models without the reliance of distribution assumption when the threshold variable is endogenous. Early threshold models based on the exogeneity assumption on the threshold variables are widely applied in economics studies. The invalidation of this assumption, however, may bias the estimation and inference. To control for the effects caused by an endogenous threshold variable, we propose a concentrated two-stage least squares method using instrumental variables. This method is based on the local linear smoothers by Fan (1992) and can be simplified as a concentrated kernel-weighted least squares estimation. Particularly, our approach can consistently estimate the threshold model without relying on any distribution assumption and allows us to handle nonlinear effects caused by the endogenous threshold variable in a more flexible setting. We establish consistency of the threshold parameter and the asymptotic properties of the slope coefficients in the proposed method. Monte Carlo simulations are performed to evaluate its finite sample properties. The new method is applied to re-examine the effects of political institutions on economic growth. Keywords: Threshold Regression, Endogeneity, Normality, Semi-parametric Method, Instrumental Variables, Local Linear Approximation. JEL classification: C13, C14, C51. ∗ Department of Economics, National Chung Cheng University, 168 University Rd., Min-Hsiung Chia-Yi, Taiwan, tel: 886-5-272-0411 ext. 34110 Email: [email protected]. † Institute of Economics, Academia Sinica, 128 Academia Road, Sec 2, Taipei 115, Taiwan. Tel: 886-227822791 ext. 301. E-mail: [email protected]. 1 Introduction In this paper, we propose an approach for estimating threshold models without distribution assumption when the threshold variable is endogenous. As in Kourtellos, Stengos, and Tan (2008), we turn the problem of an endogenous threshold variable into the problem of omitting selection correction term. Our approach is a semi-parametric method with following features. Firstly, it can consistently estimate the threshold model without any distribution assumption. Secondly, our semi-parametric method, which is based on the local linear (LL) smoothers by Fan (1992), can be simplified as a concentrated kernel-weighted two-stage least squares estimation and allows us to handle the effects caused by the endogenous threshold variable. Early threshold models based on the exogeneity assumption of the threshold variables are widely applied in economics studies. The model internally sorts the data by an observed threshold variable and split the sample into groups at some threshold. The model is parsimonious and easy for researchers to draw inference about the fundamental determinants of the data generating process. Under the assumption that regressors and the threshold variable are exogenous, Hansen (2000) derives the theoretical properties of the concentrated least squares estimators of the threshold parameter and slope coefficients. Caner and Hansen (2004) further relax the assumption of exogeneity in the slope coefficients and develop an instrumental variable estimation and a theory of inference. These papers, however, do not explicitly count for the presence of the endogenous threshold variable. Evidence of the endogeneity in the threshold variable is not uncommon. For example, in the literature of economic growth, many empirical studies deal with parameter heterogeneity by utilizing the threshold model with many fundamental determinants as threshold variables. In particular, Tan (2005) finds the quality of institutions and ethnic fractionalization are the most important determinants of economic growth, while these institution measures might be endogenous. Kourtellos, Stengos, and Tan (2008) is the first study that proposes to treat the problem of an endogenous threshold variable as the problem of omitting selection correction term. To relax exogeneity of the threshold variable, they decompose the error term into two components: one is correlated with the threshold variable and the other is not. Then, they rewrite the threshold model by these two components and find that the threshold model with endogenous threshold variable is generally inconsistent as in the sample selection bias correction method. But one of the major differences, as they pointed out, is that the whole sample can be observed in the threshold model, but only a limited sample is observed in the sample selection model. Consequently, one can explicitly control for the endogenous threshold variable by utilizing the estimated results from instrumental variable estimation. The problem of an endogenous threshold variable then becomes the problem of a sample selection bias correction model (or more precisely a Type-3 Tobit model). Adopting the two-stage 1 procedure of Heckman (1979), they use inverse Mills ratios to correct for the bias caused by the endogeneity of the threshold variable, which is straightforward to implement but under the (joint) normality of errors. A crucial criticism of Heckman’s two-stage procedure is that the estimation results might be sensitive to the normality/distribution assumptions. To avoid the imposition of normality, Vella (1992, 1998) and Wooldridge (1998) suggest alternative two-stage estimation methods by utilizing the estimated residuals from the selection equation by imposing other moment conditions. Moreover, the selection correction term might not be linear, numerous econometricians have also suggested semi-parametric estimation of the type-3 Tobit model without normality. For example, under the assumption that errors are independent of regressors, Lee (1994), Carroll, Fan, Gijbels, and Wand (1997), and Li and Wooldridge (2002) advance two-stage least squares methods involving nonparametric estimation of selection correction term1 . Due to the sensitivity of these estimates to the choice of bandwidth in nonparametric estimation, Chen (1997) proposed two stage estimations using trimmed sub-sample. Honore, Kyriazidou, and Udry (1997) consider an alternative method under the conditional symmetry assumption of errors. See Vella (1998) and Christofides, Li, Liu, and Min (2003) for comprehensive survey. Notice that in the threshold model it is assumed that we can observe the whole sample but the threshold parameter is unknown in advance. The methods used in the type-3 Tobin model to relax the distribution assumption may not appropriate in the problem of the endogenous threshold variable. However, with the advantage of considerable semi-parametric approaches in the type-3 Tobin model, we can adopt the methods for the threshold model and investigate their small and large sample properties. In particular, we propose a semi-parameter method when the effects of endogeneity in the main equation are of unknown form. Specifically, we apply the local linear approximation of Fan (1992, 1993) and Ruppert and Wand (1994) to remove the effects of this nonlinear sample correction term. Our proposed approach has the following advantages. Firstly, it can consistently estimate the threshold model without relying on any distribution assumption and allows to capture the effects of unknown form caused by the endogenous transition variable. Secondly, our method inherits the desirable properties of Fan’s local linear approximation. The main idea of Fan’s approach is that a nonlinear function can be nonparametrically approximated around each observation by a linear function. Fan (1992) shows that this local linear smoother has high minimax efficiency among all linear smoothers2 , and its bias and variance near the boundary 1 More precisely, Lee (1994) and Carroll, Fan, Gijbels, and Wand (1997) propose semi-parametric maximum likelihood estimations, and Li and Wooldridge (2002) propose a semi-parametric two-stage least squares method for the Type-3 Tobit model. 2 As pointed by Fan (1992):”The minimax is an efficiency measure, playing a role similar to the Fisher 2 of data support are of the same order of magnitude as in the interior of support. These are very appealing properties when we use sub-sample sets to estimate parameters in distinct regimes for a given threshold. Moreover, the procedure can be simplified as a two-step concentrated kernel-weighted least squares estimation using sub-sample observations. Before closing this introduction, it should be pointed out that the kernel-based method has been recently utilized in threshold model by Seo and Linton (2007). The focus of their paper, however, is quite different from this paper in that they consider the case in which the threshold is determined by a linear combination of several observed exogenous variables. Consequently, the kernel-based method is used to approximate the threshold effects, while our nonparametric method is specifically used to remove the effects of endogeneity in the threshold variable. Moreover, our approach also has the flexibility to handle endogenous regressors by incorporating the instrumental-variable threshold estimation developed by Caner and Hansen (2004) before applying our approach to deal with the endogeneity of the threshold variables. The remainder of this paper is organized as follows. In Section 2, the model of interest and related studies are briefly introduced. In Section 3, we propose an approach to estimate the threshold model without distribution assumption when the threshold variable is endogenous. Additionally, we briefly address the limitations of the proposed method. In Section 4, asymptotic properties are established. Monte Carlo simulations are performed to evaluate our method in Section 5. Section 6 re-examines the effects of (political) institutions on economic growth as an application and Section 7 concludes this paper. Mathematical proofs are collected in Appendix. 2 The Model Consider the following threshold regression model: yi = (η1 + xi β1 )1(qi ≤ γ) + (η2 + xi β2 )1(qi > γ) + ei , qi = µ + zi θ + ui , (2.1) where η1 , η2 , and µ are intercepts, β1 and β2 are slope parameters, 1(A) denotes the indicator function such that 1(A) = 1 if A is true and 0 otherwise, xi denotes a 1×k vector of covariates, qi denotes an observed threshold variables, γ ∈ Γ = [γ, γ] is the unknown threshold parameter with real value, zi is a 1 × l vector of instruments, E [ui |zi ] = 0. To illustrate the main idea, information in the parametric inference” For example, 80% efficient nonparametric estimator means that with the sample size 100, the estimator works as well as the optimal estimator with sample size 80. Fan (1992) further show that this smoother is can asymptotically be 100% efficient with an optimal choice of kernel and bandwidth. Moreover, Ruppert and Wand (1994) extend the results to multivariate case and Hamilton and Truong (1997) study the asymptotic properties of linear slope coefficients in a partial linear regression using this local linear approximation. 3 we first assume that xi is exogenous and qi is endogenous in the sense that E [ei ui |xi , zi ] 6= 0. Following Kourtellos, Stengos, and Ten (2008), suppose that ei σe2 ρi σe σu x , z ∼ 0, , i = 1, . . . , n, ui i i ρi σe σu σu2 where ρi is the correlation coefficient between ei and ui . If we further assume that ρi = ρ1 when qi ≤ γ, and ρi = ρ2 when qi > γ, then it follows from Theorem 2.4.1 of Hogg, McKean and Craig (2005) that E (ei |ui , xi , zi , qi ) = [ρ1 σe ui 1(qi ≤ γ) + ρ2 σe ui 1(qi > γ)] . (2.2) Therefore, we may decompose ei as the sum of two orthogonal components, that is ei = [ei − E (ei |ui , xi , zi , qi )] + E (ei |ui , xi , zi , qi ) . (2.3) Based on the results, the effect caused by the threshold variable, ρj σe ui . Substituting (2.2) and (2.3) into (2.1) suggests the model can be expressed as yi = (η1 + xi β1 + ρ1 σe ui )1(qi ≤ γ) +(η2 + xi β2 + ρ2 σe ui )1(qi > γ) + εi , qi = µ + zi θ + ui , (2.4) where εi ≡ [ei − E (ei |ui , xi , zi , qi )] . A special case is that if there exists no correlation between ei and ui , i.e., ρ1 = ρ2 = 0, then there will be no endogeneity effects caused by the threshold variable and (2.1) will degenerate to the model of Seo and Linton (2000). In the above case, we treat ui as an observed variable and assume that ui is in the information set. However, if ui is unobserved, then the condition expectation of ei becomes E [ei |xi , zi , qi ] = E [ei |xi , zi , qi ≤ γ] 1(qi ≤ γ) + E [ei |xi , zi , qi > γ] 1(qi > γ); therefore, the two regimes can be fully described by the level of qi . In particular, by defining E [ei |xi , zi , qi ≤ γ] ≡ f1 (γ − µ − zi θ) and E [ei |xi , zi , qi > γ] ≡ f2 (γ − µ − zi θ) we may rewrite (2.1) as yi = [η1 + xi β1 + f1 (γ − µ − zi θ)] 1(qi ≤ γ) + [η2 + xi β2 + f2 (γ − µ − zi θ)] 1(qi > γ) + εi , qi = µ + zi θ + ui , (2.5) where εi = ei − E [ei |xi , zi , qi ≤ γ] 1(qi ≤ γ) − E [ei |xi , zi , qi > γ] . (2.6) An special case of (2.5) is considered by Kourtellos, Stengos, and Ten (2008), where they assume that (ei , ui )T follows a bivariate normal normal distribution and propose a 4 concentrated two stage least squares (C2SLS) method to estimate the parameters of interest based on (2.5) and show consistency of their C2SLS estimator of γ. Under the bivariate normal assumption, f1 and f2 can be replaced by the inverse Mills ratios and turn the problem into one of a sample selection type model. More precisely, f1 (γ −µ−zi θ) = κ1 λ1 (qi , zi ; µ, θ, γ) and f2 (γ −µ−zi θ) = κ2 λ2 (qi , zi ; µ, θ, γ), where κ1 , κ2 are some constants and λ1 (qi , zi ; µ, θ, γ) and λ2 (qi , zi ; µ, θ, γ) are the so called Inverse Mills ratios, defined as λ1 (qi , zi ; µ, θ, γ) = φ(γ − µ − zi θ) −φ(γ − µ − zi θ) , and λ2 (qi , zi ; µ, θ, γ) = , Φ(γ − µ − zi θ) 1 − Φ(γ − µ − zi θ) with φ and Φ being, respectively, the density and distribution function of a standard normal variable. As in the sample selection bias correction method, the threshold estimation is generally inconsistent if the inverse Mills bias correction term λj (qi , zi ; µ, θ, γ), where j = 1, 2, is omitted. But a crucial difference between them is that only a partial sample observed in a regime in the sample selection model, but the variable that drives this assignment is latent; instead, the whole sample and threshold variable are observed but do not know the group membership in the threshold model. Thus, in (2.5), (µ̂, θ̂) can be directly estimated from the regression of entire sample of qi on zi using least squares method. We can replace (µ, θ) by (µ̂, θ̂) and then estimate the threshold model by plugging in λ1 (qi , zi ; µ̂, θ̂, γ) and λ2 (qi , zi ; µ̂, θ̂, γ) as a regressor for a given γ. However, the assumption of the normality of the joint distribution of (ei , ui )T might be too restrictive. We might also relax the restriction that ρi = ρ1 for all i ∈ I1 = {i : qi ≤ γ} and ρi = ρ2 for all i ∈ I2 = {i : qi > γ}. In the next section we shall consider the general case and propose a distribution-free, semiparametric approach to estimate the model given in (2.5). In our model the joint distribution of (ei , ui )T is unspecified and thus the functional forms of f1 and f2 are unknown. 3 The local linear semi-parametric approach Our estimation approach applies the local linear (LL) regression proposed by Fan (1992) to control the threshold effect or approximate true functions gj (γ −µ−zi θ) ≡ ηj +fj (γ −µ−zi θ), where j = 1, 2. In a manner similar to Hansen (2000), we define δn = β1 − β2 , β = β2 , and xi (γ) = xi · di (γ), where di (r) = 1 (qi ≤ γ) ,then Model (2.5) can be represented as yi = xi β + g2 (wi ) + xi (γ) δn + di (r) [g1 (wi ) − g2 (wi )] + εi , (3.1) where wi = γ − µ − zi θ. Since the nonlinear functions g1 (wi ) and g2 (wi ) are of unknown forms, the LL estimators of g1 (·) and g2 (·) are then used to replace by the true functions in estimating β1 , β2 , and γ. 5 The LL estimator suggests approximating the true nonlinear function ‘locally’ by a linear function. According to the first order Taylor’s expansion, gj (wl ), j = 1, 2, can be locally approximated by a linear function around wi : j j gj (wl ) ≈ δi0 + δi1 (wl − wi ), j = 1, 2, (3.2) j j where δi0 = gj (wi ), and δi1 = ∂gj (wi )/∂wi , depend on zi , µ, θ, and γ. In particular, j gj (wi ) = δi0 when wl = wi , (3.3) which suggests that the function value at wi is exactly the intercept term of the local linear function. Below, we will briefly introduce how to use (3.2) and (3.3) in estimating β1 , β2 and γ. To illustrate the main idea, suppose that a candidate value of γ is given and we first consider estimating β1 (or β2 ) by using the observations with qi ≤ γ (or qi > γ). In the first stage, we j j estimate (δi0 , δi1 ) for each j around wi by the LL method. Then, in second stage, we replace j gj (wi ) by its estimate δbi0 and directly estimate βj , j = 1, 2, respectively. Finally, we obtain the least squares estimator of γ that delivers the least total sum of squared residuals among all potential values of γ. For a given γ, we can divide the whole sample can into two groups such that we have 1 (qi ≤ γ) = 1, for i = 1, ..., n1 , the observations contained in the first group, and 1 (qi ≤ γ) = 0, for i = n1 + 1, ..., n, the observations in the second group. Assume that the second group has n2 = n − n1 observations. For simplicity, we explain the procedure by using observations contained in the first group. (a) Estimation of the unknown function g( · ): Given β (and γ, µ, θ), the LLR estimator of gj (w) at w = γ − µ − zi θ can be obtained by the following weighted least squares regression: min 2 1 1 − δl1 (wl − wi ) Kh (wl − wi ), (yl − xl β1 ) − δl0 X 1 ,δ 1 {δl0 l1 } {l:q ≤γ l } where h is a bandwidth and Kh (ξ) = h−1 K(h−1 ξ) is a kernel function. Let y1 x1 1 w1 − wi .. Y1 = ... , X1 = ... , Mi1 = ... , . 1 wn 1 − wi yn1 xn1 Wi1 = diag (Kh (w1 − wi ), ..., Kh (wn1 − wi )) and 6 δi1 = 1 δi0 1 δi1 , (3.4) then the the problem given in (3.4) can be further represented in a matrix form T min Y1 − X1 β − Mi1 δi1 Wi1 Y1 − X1 β − Mi1 δi1 . (3.5) δi1 The LL estimator of the 2 × 1 vector δl1 is δbi1 (β1 ) = (Mi1T Wi1 Mi1 )−1 Mi1T Wi1 (Y1 − X1 β1 ) . (3.6) Thus according to (3.2) and (3.3) gb1 (wi ) = Si1 (Y1 − X1 β1 ) , (3.7) where Si1 = v(Mi1T Wi1 Mi1 )−1 Mi1T Wi1 is an 1 × n1 vector and v = (1, 0)T . In a similar way, we can define Y2 , X2 , Mi2 , Wi2 and Si2 , and obtain the LL estimator of g2 (wi ), gb2 (wi ) = Si2 (Y2 − X2 β2 ) . (3.8) In this step, we have solve gb1 (wi ) and gb2 (wi ) as a functions of β1 and β2 , respectively. (b) Estimation of β1 and β2 : Let’s first define S1 as an n1 ×n1 matrix with Si1 as the ith row. By substituting (3.7) into (3.5), the objective function for the first group becomes T minβ1 Y1 − X1 β − Mi1 δbi1 (β1 ) Wi1 Y1 − X1 β − Mi1 δbi1 (β1 ) T e 1 β1 e1 β1 = minβ1 Ye1 − X Ye1 − X (3.9) e1 = (In − S1 )X1 . Then the least squares (LS) estimator of where Ye1 = (In1 − S1 )Y1 , X 1 β1 for a given γ is −1 e1T X e1 e1T Ye1 . β̂1LL (γ) = X X (3.10) Remark 1: The n1 × n1 matrix S1 plays a role of the projection matrix. SX projects X onto the space generated by w;in other words, SX is an approximation of E(X|w). The main difference between S and the least squares projection matrix is that S is a nonlinear projection. Moreover, (In1 − S1 )X1 is the error obtained from the nonlinear projection. Remark 2: Here we estimate β1 for a given a γ. Our estimator β̂1LL (γ) is obtained by (3.9) without explicitly estimating gb1 (·) using the observations satisying qi ≤ γ. Analogously, using the observations with qi > γ, we can further obtain β̂2LL (γ) from T e2 β2 e2 β2 . min Ye2 − X Ye2 − X β2 Finally, wi is unknown even when the potential value of γ is provided. Practically, we can replace wi by ŵi = γ − µ̂ + zi θ̂ in the above procedure, where µ̂ and θ̂ are the least squares estimates from the regression of qi on an intercept and zi , respectively. 7 (c) Estimation of γ: The specification of (3.1) can be written as Y = Xβ + G + Xγ δn − DG + ε, (3.11) where X1 Y = ,X = , Xγ = 0n2×k g2 (w1 ) g1 (w1 ) β1 1 .. .. β= ,G = , ,G = . . β2 g2 (wn ) g1 (wn1 ) g2 (w1 ) e 1 − G1 G . 1 e .. G = . , and DG = 0n2×1 g2 (wn1 ) Y1 Y2 X1 X2 Xγ is a n × k matrix with xi di (γ) as the ith row such that the first n1 rows satisfying that di (r) = 1 (qi ≤ γ) = 1 and the remaining n2 rows vectors are all zero vectors. Furthermore, let’s define 1 Wi1 On1 ×n2 Mi1 S Wi = , Mi = , and S = 2 2 On2 ×n1 Wi Mi On2 ×n1 On1 ×n2 S2 , then (3.7) and (3.8) suggest the LL estimators of G1 , G2 , and G are b j = S j (Yj − Xj βj ), for j = 1, 2 G (3.12) b = S(Y − Xβ), G (3.13) and b = S(Y − Xβ) is a n × 1 vector. Subsituting (3.12) and (3.13) into (3.11) gives where G b + Xγ δn − SXγ δn + ξ, Y = Xβ + G (3.14) where ξ is a vector of the residuals. After rearranging (3.14) we get (In − S)Y = (In − S)Xβ + (In − S)Xγ δn + ξ. (3.15) Base on (3.15) the local-linear semi-parametric estimator (LL estimator, hereafter) of γ can be defined as γ eLL = arg min SLL n (γ), (3.16) γ∈Γ T where SLL n (γ) = ξ ξ is the sum of the sum of squared least squared residuals of (3.15). 8 It is easy to extend our method to the model with endogenous regressors and the endogenous threshold variable. Combining the Caner-Hansen method, we can use instrumental variables to purge out the endogeneity in the regressors and then apply the propose method to remove the effects of the endogenous threshold variable. The advantage of this method is that it is not required to assume the functional forms of E [f1 (ui )|xi , zi , ui ≤ γ − µ − zi θ] and E [f2 (ui )|xi , zi , ui > γ − µ − zi θ] in advance, which makes the model much more flexible and easy to implement. The major disadvantage is that we cannot directly estimate the intercepts in both regimes without extra assumptions as in most of semi-parametric methods. If the only threshold effect is between the intercepts, the proposed semi-parametric method is expected to perform poorly. Nevertheless, the finite sample results of this semi-parametric method depend on choosing of a bandwidth. To resolve this problem, for a given threshold, we can take the value as a bandwidth that delivers minimum sum of squared residuals among all candidate values of bandwidth; or following Fan and Gijbels (1992) to find a data-dependent bandwidth for a given kernel. Similar to IV/GMM estimation, it is also required that at least one (exogenous) variable in zi not included in xi . 4 The asymptotic properties For the following analysis, we temporarily ignore the problem that wi is in fact replaced by w bi = γ − µ b − zi θb and suppose that the true parameters µ and θ are both known. The bandwidth h is also given and not discussed here. Let the 1 × k vector ϕi = ϕ(wi ) = xi − E(xi |wi ), where i = 1, ..., n and E(xi |wi ) denotes the conditional function of xi given wi ,and w = (w1 , ..., wn )T is a n × 1 vector. Moreover, we use ϕij denote the j th element of ϕi . We define the following moment functionals. T M (γ) = E ϕT i ϕi di (γ) , M = E ϕi ϕi , T D(γ) = E ϕT i ϕi |qi = γ , D = E ϕi ϕi |qi = γ0 , and 2 T 2 V (γ) = E ϕT i ϕi εi |qi = γ , V (γ) = E ϕi ϕi εi |qi = γ0 , where γ0 represents the true value of γ. Furthermore, let fq (q) denote the density function of qi . In order to establish the asymptotic distribution of the proposes estimator, we impose the a similar set of assumptions as that used in Hansen (2000). 9 [A1] (xi , qi , zi , εi , ui ) is strictly stationary, ergodic and ρ-mixing, with ρ-mixing coefficients P 1/2 satisfying ∞ m=1 ρm < ∞. [A2] E(εi |Fi−1 ) = 0 and E(εi |wi ) = 0. [A3] E|ϕi |4 < ∞ and E|ϕi εi |4 < ∞. [A4] For all γ ∈ Γ, E |ϕi |4 ε4i qi = γ ≤ C and E |ϕi |4 qi = γ ≤ C for some C < ∞, and fq (γ) ≤ f < ∞. [A5] fq (γ), D(γ), V (γ) are continuous at γ = γ0 . [A6] δn = cn−α with c 6= 0 and 0 < α < 1/2. [A7] cT Dc < 0, cT V c < 0, and fq > 0. [A8] M > M (γ) > 0 for all γ ∈ Γ. Following (3.11), define Sn (γ) = (Y − Xβ − G − Xγ δn + DG )T (Y − Xβ − G − Xγ δn + DG ) and denote limn→∞ n1 Sn (γ) = S(γ). [A9] Assume γ0 = arg minγ∈Γ S(γ) is unique. Assumptions [A1]-[A8] are quite similar to the conditions used by Hansen (2000), except that the role of xi has been replaced by ϕi . [A9] assures that the true threshold value is unique. Let fw (w) denote the density function of wi . Define ψi = ψi (wi ) = yi − E(yi |wi ) as the conditional expectation of yi given wi . The following assumptions are required for our analysis for the local linear estimator. [B1] The conditional expectation of yi given wi can be presented as yi = ψi (wi ) + σ1 (wi )1i , i.i.d. i = 1, ..., n, j = 1, ..., k and 1i ∼ (0, 1). The function ϕij has bounded and continuous second partial derivatives. [B2] The conditional expectation of xij given wi , is ϕij (wi ), and can be written as the i.i.d. following form xij = ϕij (wi ) + σ2j (wi )2ij , i = 1, ..., n, j = 1, ..., k, and and 2ij ∼ (0, 1). The function ϕij has bounded and continuous second partial derivatives. [B3] The random variable wi has a continuous density function fw (·) on a compact support, and is bounded away from zero and infinity. R [B4] The kernel K is a compactly supported, bounded kernel such that K(u)du = 1, R R K(u)2 du = R(K), and u2 K(u)du = µ2 (K) > 0, where µ2 is a scalar. All odd-order moments of K(·) vanish. 10 [B5] The bandwidth hn satisfies that h → 0, nh → ∞. The asymptotic properties of the LL estimator are summarized in Theorem 1 and 2. p Theorem 4.1: The LL estimator of γ is a consistent estimator of γ0 . That is γ eLL → γ0 . Theorem 4.2: Denote βδ = (β T , δnT )T the slope parameter. Then the LL estimator of βδ are such that −1 ∗T E βbδ = βδ + Xγ∗T Xγ∗ Xγ (I − S)(G − DG ) and −1 V ar βbδ = Xγ∗T Xγ∗ Xγ∗T (I − S) · Vε (I − S)T Xγ∗ (Xγ∗T Xγ∗ )−1 , 1 where Vε = E(εεT ). Moreover, Bias of βbδ = Op (n− 2 h2 ) and V ar βbδ = Op (n−1 ). √ Theorem 2 suggests that our LL estimator of the slope estimator is still n-consistent. The 1 bias vanishes at the rate n− 2 h2 . The above results are based on the assumption that wi is b Let observed. Empirically, wi is replaced by its estimator w bi = γ − µ b − zi θ. 1 w b1 − wi . . ci = M .. .. , 1 w bn − wi cT Wi M ci )−1 (M cT Wi ), Sbi = v(M i i and b be be T b b e be e be SLL n (γ) = (Y − Xβ − X r δn ) (Y − Xβ − X r δn ). Define γ bLL = arg minγ∈Γ b SLL n (γ), then it remains to show bLL p LL supr∈Γ Sn (γ) − Sn (γ) → 0 to establish the consistency of γ bLL . 5 Monte Carlo Simulation In this section, Monte Carlo experiments are constructed to examine the finite sample properties of the proposed estimators under several scenarios, including with and without normality under various combinations of sample sizes. In particular, it is crucial to compare our methods with the method by Kourtellos, Stengos, and Ten (2008). We investigate the finite sample performance, based on the mean squared errors (MSE) of the estimates of the threshold parameter, and the averaged MSE of the slope coefficients of exogenous regressors. 11 The data generating processes considered are described as follows. For i = 1, . . . , n, yi = β0 + β1 xi 1(qi ≤ γ0 ) + β2 xi 1(qi > γ0 ) + ei , qi = α0 + α1 z1i + α2 z2i + ui , where xi , z1i q, z2i ∼ iid N (0, 1) and are mutually independent, ui ∼ iid(0, 1), ei = (ρi ui + (1 − ρi )εi )/ ρ2i + (1 − ρi )2 , εi ∼ iid(0, 1) and independent of ui , the true threshold γ0 = 1, β0 = β1 = α0 = 1, α1 = α2 = 3, β2 = β1 + δ. To demonstrate the impact of nonnormally distributed errors on estimation, the cases are refereed as Normal, χ2 , and Mixed Normal if both vi and i are generated from normal, χ2 , and mixed normal distributions, respectively. There are two types in each case: in Type I, ρi = 0.6 when qi ≤ γ0 and ρi = 0.8 when qi > γ0 ; in Type II we allow ρi to vary across units and ρi = {0.8, 0.4, 0.85, 0.7} as 1 ≤ i ≤ n/4, n/4 < i ≤ n/2, n/2 < i ≤ 2n/3, and 2n/3 < i ≤ n, respectively. The mean of ρi in each group is the same as that in Type 1. For each case in each type, we consider δ = {0.05, 0.10, 0.25} and n = {100, 200, 400}. The number of Monte Carlo replications is 1000 in each of the experiment. We compare our method with Kourtellos, Stengos, and Ten’s (2008) method, referred as the HECKIT method, and the threshold estimation without controlling for the endogeneity in transition variable, referred as the Naive method. The proposed method is referred as the LL method. To reduce the boundary effects, which might bias the the threshold estimates in the proposed method, we also consider the to obtain the residuals from trimmed OLS using 95% lowest observations (in terms of Euclidean distance) in each split subsample to find the estimated threshold, and then given this estimated threshold we then use the regular LL estimation to obtain the slope estimates. We refer this hybrid estimation strategy as the MIX model. To evaluate the finite sample accuracy of these methods, we investigate the efficiency of the Naive, LL, and MIX estimators relative to the HECKIT estimator. In particular, the efficiency is defined as the ratio of the mean squared errors (MSE) of the Naive or LL estimator to that of the HECKIT estimator. The MSE of the estimates of the threshold parameter and the averaged MSE of the slope coefficients of exogenous regressors defined as: M SE(γ̂) = R R r=1 r=1 1 X 1 X (γ̂r − γ0 )2 , M SE(β̂1 , β̂2 ) = [(β̂1,r − β1 )2 + (β̂2,r − β2 )2 ], R 2R where γ̂r , β̂1,r , and β̂2,r denote the estimates of γ0 , β1 , and β2 in the rth trial, respectively, and R is the total number of replications. First, we compare the finite sample performance in terms of the accuracy of threshold estimation. Table 1 reports the efficiency of the Naive, LL, and MIX estimators of the threshold 12 parameter relative to the HECKIT. It is obvious that the HECKIT, LL, and MIX estimators tend to outperform the Naive method over all experiments sometime by a large margin. This result indicates that it is crucial to control for the endogeneity in the transition variable. As expected, the LL and MIX estimators are more efficient than the HECKIT method when the errors are generated from non-normal distribution. Even in the cases with normally distributed errors, the LL and MIX estimator is still comparable to the HECKIT. With normally distributed errors, the HECKIT tends to be more efficient than the proposed methods when sample size is 400. However, when sample size is 100 or 200, the LL and MIX estimators can estimate the threshold more accurately. Increasing the sample size tends to reduce the advantage of using the LL and MIX methods over the HECKIT method. While increasing δ reduces the gap between the Naive method and the other methods controlling for the endogeneity in transition variable, its effect on the difference between the LL method and the HECKIT methods is ambiguous. Finally, the MIX method tends to generate more accurate threshold estimates than the LL method. From the distribution of threshold estimates not reported here, we observe that the MIX method tends to choose smaller threshold estimates than the LL method when large values of threshold are suggested by the LL method, while in at least 80% of replications the MIX and the LL methods will suggest the same threshold values. Next, we consider the efficiency on estimating the slope parameters. Table 2 reports the efficiency of the Naive, LL, and MIX estimators of the slope parameters relative to the HECKIT method. Similar to Table 1, the HECKIT, LL, and MIX methods are uniformly more reliable in estimating the slope parameters, and increasing δ will reduce the benefit of controlling for the endogeneity in transition variable. There is a slight difference from Table 1. When errors are normally distributed, the HECKIT method will uniformly deliver more accurate slope estimates than the others. With non-normally distributed errors, while the LL method cannot always deliver more accurate slope estimates than the HECKIT method, the range of ratio is still between 0.98 and 1.04 when sample size is 400. On the other hand, the MIX method tends to generate more accurate slope estimates than the others when residual distributions are not normal and perform slightly better than the LL method due to its slightly more accuracy of choosing threshold value. Overall, our simulation results show the importance of controlling the endogeneity in the transition variable. The proposed method can deliver more accurate threshold estimate than Kourtellos, Stengos, and Ten’s (2008) method when errors are non-normally distributed. The advantage of using more flexible method is ambiguous in terms of the accuracy of the slope estimators. But removing the outliers during estimation procedure can improve the accuracy of determining threshold value and thus reduce the MSE of slope coefficient estimates. 13 Tables 1 and 2 around here 6 Empirical Application We re-visit the work of Minier (2007) to investigate the effects of institutions on growth by the proposed methods. Because institutions might affect the relationship between growth and its other determinants, it can act as a potential threshold variable. Particularly, we will study the indirect effects by taking institutions as threshold variables. Minier (2007) uses the regression tree procedure, allowing for any number of variables to be considered as “split variables” that separate the full sample into subsamples. She finds that “executive constraints” significantly affected the other growth determinants. Instead, we will take this variable as a threshold variable and then control for potential endogeneity with the proposed methods in the context of threshold models. Using the data of Glaeser, Porta, Lopez-de Silanes, and Shleifer (2004), we consider the empirical regression of growth in per capita GDP over 1960 to 2000 on initial GDP per capita, initial investment/GDP ratio, initial schooling, and the percentage of the population that lives in temperate climate zones, and use a measure of the initial level of constraints on the executive branch as the transition variables. Due to the potential endogeneity of “executive constraints”, it is expected that the results might be different from Minier (2007). Moreover, we use the averaged degree of autocracy and the percentage of years as instrumental variables, which are also available from Glaeser et al. (2004). There are 70 countries in the sample. Along the value of transition variable, countries are grouped into two groups: Group 1 – low executive constraint , and Group 2 – high executive constraints. The empirical estimation results are reported in Table 3. The result from the naive OLS without controlling for the endogeneity in transition variable is just identified with that of Minier (2007, Table 2). If we control this endogeneity by using the HECKIT method suggested by Kourtellos, Stengos, and Ten’s (2008), we can find the inverse Mill’s ratio is statistically significant in Group 1 but not in Group 2, which indicates that the “executive constraints” is endogenous for Group 1 countries and the estimates results from the naive OLS method might be distorted for these countries. Moreover, if we further relaxed the assumptions imposed in the HECKIT model and use the MIX method to estimate the parameters, the estimated threshold is 5.561 and 27 countries are clustered into Group 1. Also, notice that the LL method suggests the same partition as the naive OLS method. However, after correcting the potential bias caused by the endogeneity in transition variable, the coefficients are different from the Naive method in Group 1 countries; but seem similar in Group 2. Our method suggests that schooling is the only significant explanatory variable in Group 1 countries. It can also be found that the instrumental variables chosen are valid from Table 4, 14 which reports the first stage regression of transition variable on the exogenous variables and the exogeneity regression-based test over all countries. Tables 3 and 4 around here. To investigate how the effects caused by the endogeneity in transition variable vary across countries, we plots the estimated effects from different methods in Figure 1. Particularly, in the LL and MIX methods, the nonlinear estimates are obtained by Ŝ j (Yj − Xj β̂j ), j = 1, 2; and the HECKIT method, the estimates are captured by ρ̂j λj (ŵi ), where β̂j and ρ̂j are the estimates from different regimes j = 1, 2. For comparison, the nonlinear estimates plotted are the deviations from the group-specific mean in each different regime. While the nonlinear estimates from all methods for Group 2 countries seem trivial, these estimated nonlinear effects are quite visually significant for Group 1 countries. Moreover, the shapes of the nonlinear effects from the LL and MIX methods are different from that of the HECKIT method, which indicates that more flexible setting might be appropriate to capture the effects caused by the endogenous variable in the case of interest. Figure 1 around here. We further regress the nonlinear components capture by the LL and MIX methods on a constant, the level, square, and cube of ŵi and report the results in Table 5. The coefficients over all are quite significant, which indicates that the effects captured by the proposed method is not constant across observations and therefore extra bias might be generated if we do not control for the endogeneity in the transition variable. Tables 5 and 6 around here. Overall, in this application we demonstrate the importance of controlling the endogeneity in the transition variable. We also show the advantage of using the proposed method on capturing the effects caused by the endogenous variable. The detailed country list and the estimated group membership can be found in Table 6. The policy implication and other subtle analysis will be left for the future study. 7 Conclusion We propose a concentrated two-stage least squares method using instrumental variables, based on the local linear smoothers by Fan (1992), to handle the effects caused by the 15 endogenous threshold variable. Our proposed estimator is consistent without relying on any distribution assumption and allows us to handle nonlinear effects in a more flexible setting. Our simulation results show that the proposed method can deliver more accurate threshold estimate when errors are non-normally distributed. Based on this paper, it might be interesting to compare our proposed method with the other semi-parametric methods. It would be also helpful to further develop feasible test methods for potential endogeneity in the threshold variable for semi-parametric methods, regardless of whether some regressors are endogenous or not. We will leave these as future studies. REFERENCES Caner, M., and B. E. Hansen (2004): “Instrumental Variable Estimation of a Threshold Model,” Econometric Theory, 20, 813–843. Carroll, R. J., J. Fan, I. Gijbels, and M. P. Wand (1997): “Generalized Partially Linear Single-Index Models,” Journal of the Americian Statistical Association, 92, 477– 489. Chen, S. (1997): “Semiparametric Estimation of the Type-3 Tobit Model,” Journal of Econometrics, 80, 1–34. Christofides, L. N., Q. Li, Z. Liu, and I. Min (2003): “Recent Two-Stage Sample Selection Procedures With an Application to the Gender Wage Gap,” Journal of Business and Economic Statistics, 21, 396–405. Fan, J. (1992): “Design-adaptive Nonparametric Regression,” Journal of the Americian Statistical Association, 87, 998–1004. (1993): “Local Linear Regression Smoothers and Their Minimax Efficiency,” The Annals of Statistics, 21, 196–216. Hamilton, S. A., and Y. K. Truong (1997): “Local Linear Estimation in Partly Linear Models,” Journal of Multivariate Analysis, 60, 1–19. Hansen, B. E. (2000): “Sample Splitting and Threshold Estimation,” Econometrica, 68, 575–603. Heckman, J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica, 47, 153–161. 16 Honore, B. E., E. Kyriazidou, and C. Udry (1997): “Estimation of Type-3 Tobit Models Using Symmetric Trimming and Pairwise Comparisons,” Journal of Econometrics, 76, 107–128. Kourtellos, A., T. Stengos, and C. M. Tan (2008): “Threshold Regression with Endogenous Threshold Variables,” Unpusblished. Lee, L.-F. (1994): “Simulated Maximum Likelihood Estimation of Dynamic Discrete Choice Statistical Models Some Monte Carlo Results,” Journal of Econometrics, 82, 1–35. Li, Q., and J. M. Wooldridge (2002): “Semiparametric Estimation of Partially Linear Models for Dependent Data With Generated Regressors,” Econometric Theoy, 18, 625–645. Minier, J. (2007): “Institutions and Parameter Heterogeneity,” Journal of Macroeconomics, 29, 595–611. Ruppert, D., and M. P. Wand (1994): “Multivariate Locally Weighted Least Squares Regression,” The Annals of Statistics, 22, 1346–1370. Seo, M. H., and O. Linton (2007): “A Smoothed Least Squares Estimator for Threshold Tegression Models,” Journal of Econometrics, 141, 704–735. Tan, C. M. (2005): “No One True Path: Uncovering the Interplay Between Geography,Institutions, and Fractionalization in Economic Development,” Series 2005-12, Dept. of Economics, Tufts University. Vella, F. (1992): “Simple Tests for Sample Selection Bias in Censored and Discrete Choice Models,” Journal of Applied Econometrics, 7, 413–421. (1998): “Estimation Models with Sample Selection Bias: A Survey,” Journal of Human Resources, 33, 127–169. Wooldridge, J. M. (1998): “Selection Corrections With a Censored Selection Variable,” Unpusblished, Michigan State University. 17 Appendix e and (In − S)Xγ = X eγ . Then average sum of Denote (In − S)Y = Ye , (In − S)X = X, squared error function can be represented as T e e e e e e Y − Xβ − X δ Y − Xβ − X δ SLL (β, δ, γ; h) = γ n γ n . n Conditional on γ and given choice bandwidth h, (β, δ) can be obtained by the LS a proper ∗ e e estimation. Denote Xγ = X, Xγ , then our LL estimator of (β, δ) is ! −1 βb (γ; h) ∗T ∗ (A.1) = X X Xγ∗T Ye . γ γ δb (γ; h) The concentrated sum of squared error function is then a function of the threshold value γ b (γ; h) , δb (γ; h) , γ; h SLL β n −1 Xγ∗T Ye , = Ye T Ye − Ye T Xγ∗ Xγ∗T Xγ∗ = Ye T (In − Pγ )Ye , where Pγ = Xγ∗ Xγ∗T Xγ∗ −1 (A.2) Xγ∗T . Therefore, γ eLL = arg minγ∈Γ SLL n (γ), LL β b (γ; h) , δb (γ; h) , γ; h . where SLLR (γ) is in fact the abbreviation of S n n Define x ei = (xi − Si X) as the ith row of X, which is also the LL estimator of ϕi . Similarly, LL denote yei = (yi − Si Y ) as the LL estimator of ψi Let S (γ) = limn→∞ n1 SLL n (γ). Recall that 00 ∂ϕi1 ∂ϕik ∂ 2 ϕi1 ∂ 2 ϕik 0 ϕi = (ϕi1 , ..., ϕik ) so ϕi = ( ∂wi , ..., ∂wi ) and ϕi = ( ∂w2 , ..., ∂w2 ). Lemma 1 directly follow i i from Theorem 1 of Fan (1992). Lemma 1: Under assumption B, 1 E (e xij − ϕij |wi ) = h2 ϕ00ij µ2 (K) + op (h2 ), 2 and 1 E (e yi − ψi |wi ) = h2 ψi00 µ2 (K) + op (h2 ), 2 where ϕ00ij denote the j th element of ϕ00i . 1 R(K) 1 V ar (e xij |wi ) = σ2j (wi ) + op ( ), nh fw (wi ) nh 1 R(K) 1 V ar (e yi |wi ) = σ1 (wi ) + op ( ). nh fw (wi ) nh 18 Let lk = (1, ..., 1) be a 1 × k vector and lk×k be a k × k mtrix with all elements being ones. Lemma 2 follows from Lemma 1. Lemma 2: Under Assumption B n n 1X T 1 p 1 X T x ei x ei → ϕi ϕi + h2 µ2 (K) n n 2 i=1 i=1 # " n 1 X T 00 00T ϕi ϕi + ϕi ϕi + op (h2 lk×k ), × n (A.3) i=1 = M + op (lk×k ) as n → ∞ and h → 0. n n 1 X e iT e i p 1 X T 1 Xγ Xγ → ϕi ϕi di (γ) + h2 µ2 (K) n n 2 i=1 i=1i " n # 1 X T 00 2 × ϕi ϕi + ϕ00T i ϕi di (γ) + op (h lk×k ), n (A.4) i=1 = M (γ) + op (lk×k ) as n → ∞ and h → 0. LL p Lemma 3: Under assumption A and B, supγ∈Γ S (γ) − S(γ) → 0. Proof: Using (3.1), (3.14), and Lemma 1 and 2, we have ∆i = ξi − εi 1 2 = h µ2 (K) −ψi00 + ϕ00i β − ϕ00i di (γ)δn + op (h2 ), 2 1 2 h µ2 (K)C3 [lk β − 1 + |lk δn |] + op (h2 ), < 2 (A.5) (A.6) for all i and γ, where C3 is a finite constant. For instance, one may define C1 = 00 00 max1≤i≤n |ψi | , C2 = max1≤i≤n, 1≤j≤k ϕij , and C3 = max{C1 , C2 }. It implies that ∆2i = Op (h4 ).Denote ∆ = (∆1 , ..., ∆n )T . Therefore, 1 LL Sn (γ) − Sn (γ) n 1 = limn→∞ supγ∈Γ ξ T ξ − εT ε , n 1 = limn→∞ supγ∈Γ εT ∆ + ∆T ∆ , n p → 0 + Op (h4 ). limn→∞ supγ∈Γ 19 The uniform convergence of the first term is due to E(εi |wi , xi ) = 0 and also that ψi00 (·) p and ϕ00i (·) are both functions of wi , so limn→∞ supγ∈Γ n1 εT ∆ → 0. Therefore, [A9] together with Lemma 3 suggests LL γ0 = arg minγ∈Γ S (γ). (A.7) Proof of Theorem 1: Using (A.2) we obtain LL SLL n (γ) − Sn (γ0 ) = Ye T (In − Pγ )Ye − ξ0T ξ0 , eγT (In − Pγ )X eγ δn , eγT (In − Pγ )ξ0 + δ T X = −ξ0T Pγ ξ0 + 2δnT X 0 n 0 0 (A.8) where e −X eγ δn . ξ0 = Ye − Xβ 0 (A.9) Using the assumption that δn = cn−α and substituting it into (A.8) gives T n−1+2α SLL n (γ) − ξ0 ξ0 e T (In − Pγ )ξ0 + 1 cT X e T (In − Pγ )X eγ c = −n−1+2α ξ0T Pγ ξ0 + 2n−1+α cT X γ0 γ0 0 n 1 T eT eγ c + op (1), c Xγ0 (In − Pγ )X = 0 n 1 eT 1 T eT e eγ c + op (1). c Xγ0 Xγ0 c − cT X P X = (A.10) γ γ0 0 n n The second eauality of (A.10) is due to (A.9). Since ξ0 is defined on the true threshold value γ0 , it implies that e =0 E ξ0 |X and thus we have e ξ0 = 0 E h X e of X. e Therefore, the first two terms in the second line of (A.10) vanish for any function h X in the limit. eγ , Z eγ ) such that The projection matrix Pγ can also be written as the projection onto (X eγ (β + δn ) + Z eγ β + ξ, where Zeγ = X e −X eγ is a n × k matrix with the ith row being Ye = X eγ is x ei (1 − di (γ)). The ith row of X ei = x X ei di (γ). γ 20 e i is the LL estimator of ϕi di (γ). Obviously, the matrices also satisfy X e TZ e It follows that X γ γ γ = 0k×k . e = 0. eγT Z eγ and X eγ = X eγT X eγT X Let’s first consider the case when γ ≥ γ0 . Then we have X 0 0 γ 0 0 Then it follows from (A.10) that LL n−1+2α SLL n (γ) − Sn (γ0 ) p → cT M (γ0 ) c − cT M (γ0 )M (γ)−1 M (γ0 )c, ≡ b1 (γ). (A.11) Similar to Hansen’s (2000), we have dM (γ) = D(γ)f (γ) dγ and thus db1 (γ) = −cT M (γ0 )M (γ)−1 D(γ)f (γ)M (γ)−1 M (γ0 )c ≥ 0, dγ which suggests that b1 (γ) is increasing on the interval [γ0 , γ]. Moreover, db1 (γ0 ) = −cT D(γ0 )f (γ0 )c ≥ 0, dγ which suggests that γ0 is the minimum on [γ0 , γ]. For the case γ < γ0 , we can show that p LL n−1+2α SLL n (γ) − Sn (γ0 ) → b2 (γ), where b2 (γ) is a weakly decreasing continuous function and has a unique minimum at γ0 on the interval [γ, γ0 ]. Therefore, p LL n−1+2α SLL n (γ) − Sn (γ0 ) → b1 (γ)1{γ ≥ γ0 } + b2 (γ)1{γ < γ0 }, LL it follows that the LL estimator defined by γ eLL = arg minγ∈Γ n−1+2α SLL n (γ) − Sn (γ0 ) p satisfies γ eLL → γ0 . To show the results given in Theorem 2, define the Euclidean norm of a vector a as !1/2 1 P 2 kak = aj and the Euclidean norm of a matrix A = (aij ) as kAk = tr(AT A) 2 = j !1/2 P ij a2ij . The results of Lemma 5 and equation (4.12) of Hamilton and Troung (1997) 2 suggest that (i) k(I − S)(G − DG )k = Op (h2 ) and (ii) (I − S)T Xγ∗ = Op (n). Proof of Theorem 2: Recall that E(Y |X, w) = (X, Xγ )βδ +(G − DG ) and βbδ = Xγ∗T Xγ∗ 21 −1 Xγ∗T Ye . So −1 ∗T E βbδ = βδ + Xγ∗T Xγ∗ Xγ (I − S)(G − DG ) (A.12) −1 ∗T −1 V ar βbδ = Xγ∗T Xγ∗ Xγ (I − S)Vε (I − S)T Xγ∗ Xγ∗T Xγ∗ , (A.13) and where Vε = E(εεT ). It follows from (A.12) that −1 ∗T Xγ (I − S)(G − DG ) , Bias of βbδ = Xγ∗T Xγ∗ −1 ∗ Xγ · k(I − S)(G − DG )k , ≤ Xγ∗T Xγ∗ √ 1 = Op (n) · Op ( n) · Op (h2 ) = Op (n− 2 h2 ), 1 √ where the last line follows from (i) and also that Xγ∗ = tr Xγ∗T Xγ∗ 2 = Op ( n). For (A.13), notice that Lemma 1 suggest 1 ∗T ∗ p Xγ Xγ → (M, M (γ)) + op (l2k×2k ), n (A.14) −1 = Op n−1 · l2k×2k . Moreover, given that ε has bounded which implies that Xγ∗T Xγ∗ variance, [A1] and [B2] together suggest that there exist a finite number C5 such that Xγ∗T (I − S) · Vε (I − S)T Xγ∗ 2 ≤ C5 · (I − S)T Xγ∗ · l2k×2k , (A.15) = C5 · Op (n) · l2k×2k , where the last equality follows from (ii). Therefore, (A.14) and (A.15) implies that V ar βbδ ≤ C5 · O(n−1 ) · l2k×2k . 22 Table 1 The efficiency of the Naive, LL, and MIX estimators of γ0 relative to the HECKIT estimator. Case type n δ 0.05 I 0.10 0.25 0.05 II 0.10 0.25 100 200 400 100 200 400 100 200 400 100 200 400 100 200 400 100 200 400 Normal Naive LL 2.36 0.91 2.44 0.92 2.77 1.04 2.17 0.91 2.27 0.93 2.31 1.03 1.64 0.95 1.50 0.99 1.17 1.05 2.25 0.88 2.34 0.93 3.02 1.00 2.08 0.88 2.10 0.91 2.54 1.01 1.57 0.90 1.47 0.96 1.26 1.08 MIX 0.92 0.95 1.04 0.92 0.96 1.02 0.94 0.99 1.04 0.91 0.94 0.97 0.90 0.91 1.00 0.87 0.98 1.12 Naive 1.59 1.71 1.93 1.55 1.59 1.69 1.41 1.16 1.05 1.57 1.65 1.72 1.53 1.51 1.56 1.25 1.07 0.84 23 χ2 LL 0.82 0.89 0.94 0.83 0.89 0.94 0.85 0.90 0.95 0.85 0.88 0.93 0.85 0.90 0.94 0.85 0.90 0.97 MIX 0.80 0.88 0.92 0.78 0.87 0.93 0.84 0.88 0.93 0.83 0.87 0.91 0.84 0.88 0.91 0.83 0.88 0.95 Mixed Normal Naive LL MIX 2.22 0.82 0.77 2.62 0.93 0.92 2.88 0.97 0.95 2.11 0.82 0.79 2.35 0.93 0.91 2.35 0.92 0.91 1.78 0.89 0.89 1.55 0.95 0.91 1.20 0.95 0.97 2.17 0.84 0.90 2.60 0.94 0.90 2.72 0.94 0.94 2.03 0.82 0.86 2.31 0.89 0.87 2.30 0.94 0.95 1.56 0.91 0.93 1.45 0.90 0.89 1.06 0.95 0.94 Table 2 The efficiency of the Naive, LL, and MIX estimators of slope parameters relative to the HECKIT estimator. Case type n δ 0.05 I 0.10 0.25 0.05 II 0.10 0.25 100 200 400 100 200 400 100 200 400 100 200 400 100 200 400 100 200 400 Normal Naive LL 3.34 1.19 3.00 0.99 1.98 1.05 2.97 1.16 2.76 1.00 1.71 1.06 2.03 1.20 1.70 1.05 1.16 1.05 3.41 1.04 3.11 1.01 2.10 1.04 3.69 1.09 2.63 1.00 1.71 1.04 2.29 1.08 1.70 1.03 1.20 1.04 Mix 1.16 0.98 1.04 1.13 0.99 1.05 1.01 1.03 1.03 1.00 1.02 1.03 1.09 0.97 1.03 1.00 1.00 1.04 Naive 3.82 4.65 5.63 3.89 4.38 4.99 3.57 3.55 3.41 3.54 4.12 5.26 3.74 3.84 4.75 3.39 2.79 2.93 24 χ2 LL 1.10 1.06 1.00 1.14 1.06 1.00 1.15 0.98 0.98 1.15 1.10 1.04 1.23 1.10 1.00 1.24 1.02 1.00 Mix 0.81 0.97 0.95 0.84 0.95 0.96 0.87 0.83 0.92 0.77 1.01 0.97 0.83 0.99 0.91 0.85 0.91 0.93 Mixed Normal Naive LL Mix 3.77 0.96 0.84 3.00 1.00 0.96 2.76 1.00 1.00 3.22 0.91 0.85 2.49 1.02 1.00 2.16 0.98 0.97 2.52 0.99 0.94 1.77 1.04 1.01 1.31 1.01 0.99 3.90 0.92 0.88 2.84 1.01 0.96 2.55 1.02 1.01 3.51 0.91 0.88 2.33 1.01 0.95 2.10 1.00 1.00 2.33 0.95 0.94 1.48 0.99 0.99 1.21 1.01 0.98 Table 3 Empirical Example. Con ln(Y60) ln(INV) ln(SCH) TEM IMR Con2 ln(Y60)2 ln(INV)2 ln(SCH)2 TEM2 IMR2 Naive γ̂ n1 3.2368 20 Coef. t 0.0426 (1.594) -0.0068 (-1.805) 0.0064 (2.042) 0.0044 (1.674) 0.0140 (1.856) 0.1243 -0.0159 0.0043 0.0049 0.0224 (5.578) (-5.345) (1.222) (1.841) (5.405) HECKIT γ̂ n1 2.8462 15 Coef. t 0.0381 (1.411) -0.0055 (-1.433) 0.0065 (2.085) 0.0064 (2.099) 0.0121 (1.493) 0.1255 (5.585) -0.0160 (-5.360) 0.0044 (1.239) 0.0045 (1.596) 0.0224 (5.412) 0.0065 (1.375) -0.0020 (-0.415) MIX γ̂ 3.561 Coef. LL n1 27 t γ̂ 3.2368 Coef. n1 20 t -0.0063 0.0039 0.0084 0.0075 (-1.053) (1.068) (1.788) (0.621) -0.0035 0.0052 0.0100 0.0020 (-0.735) (1.593) (2.348) (0.193) -0.0164 0.0051 0.0056 0.0207 (-5.466) (1.591) (1.815) (5.350) -0.0177 0.0037 0.0048 0.0219 (-5.895) (1.174) (1.531) (5.739) (a) Number of observations is 70. Dependent variable: Growth of GDP per capita, 1960∼2000. (b) Regressors: lnGDP60 denotes Log GDP per capita in 1960, ln(SCH) denotes Log of years of schooling of population over age 25, ln(INV) denotes Log of average investment/GDP from 1960 to 1965, TEM denotes the percentage of the population living in temperate climate zones in 1995, and IMR denotes the inverse Mill’s ratio in the HECKIT model. Transition variable (qi ): EXE denotes the extent of institutionalized constraints on decision-making powers of chief executive: 1 = unlimited executive authority through 7 = executive parity or subordination. Instrumental variables: the averaged degree of autocracy over 1960-1990: 0 = democracy through 2 = autocracy, and the percentage of years 1975V2000 in which legislators elected under winnertake-all rule. All data except ln(INV) (taken directly from Penn World Tables) are taken from Glaeser et al. (2004). See Minier (2007) for the details. (c) The regression based Hausman test using whole sample indicates that these two variables are valid instruments. γ̂ denotes the threshold estimate from each method and n1 denote the number of observations with qi ≤ γ̂. The subscription 2 denotes the coefficients in the regime with qi > γ̂. 25 Table 4 First Stage Estimation and Regression-based Exogeneity Test over the whole sample Con ln(Y60) ln(INV) ln(SCH) ln(TEM) AUT PLU R2 F -statistic First Stage (dep =EXE) 3.9686 (2.278) 0.2242 (1.014) 0.1024 (0.493) 0.2230 (1.300) 0.1752 (0.529) -2.2289 -(8.097) -0.2306 -(0.940) 0.81 49.04 * ** Exogeneity Test (dep = ê) 0.0156 (0.797) -0.0015 -(0.587) -0.0003 -(0.130) -0.0008 -(0.419) 0.0004 (0.114) -0.0055 -(1.769) 0.0016 (0.580) 0.05 0.59 (a) ê denotes the OLS residuals over all countries without threshold effects. The other variables are defined in the note of Table 3. (b) ** Significant at 1% level. * Significant at 5% level. Table 5 Nonlinearity for the observations in Regime One (qi < γ̂) Con ŵi ŵi2 ŵi2 MIX (dep = φ̂i ) 0.0434 (155.972) 0.0092 (19.890) -0.0013 -(7.148) -0.0028 -(19.166) ** ** ** ** LL (dep = φ̂i ) -0.0885 -(304.834) 0.0128 (24.947) 0.0002 (1.455) -0.0032 -(20.665) ** ** ** (a) φ̂i denotes the estimates of nonlinear effects from the trimmed-LL method or LL method ŵi = γ̂ − Z θ̂i . Here, we only consider the the observations with (qi < γ̂). (b) *** Significant at 1% level. ** Significant at 5% level. * Significant at 10% level. 26 0.04 0.03 0.02 0.01 0 −0.01 −0.02 HECKIT MIX LL −0.03 −0.04 −0.05 −0.06 −6 −5 −4 −3 −2 γ−Zθ 27 −1 0 1 2 Table 6 Country List and Group Estimates Country Syrian Algeria Togo Malawi Jordan Ghana Mali Cameroon Indonesia Zambia Paraguay Niger Uganda Nicaragua Mozambique Guatemala Senegal Lesotho Iran Kenya Nepal Brazil Thailand Mexico Zimbabwe Korea Panama Argentina Peru Pakistan Chile Philippines Honduras Bolivia El Salvador Dominican Republic Spain Ecuador Portugal France Uruguay Malaysia Greece Venezuela Sri Lanka Turkey Colombia India Australia Austria Belgium Canada Costa Rica Denmark Finland Iceland Ireland Israel Italy Jamaica Japan Netherlands New Zealand Norway South Africa Sweden Switzerland Trinidad and Tobago United Kingdom United States GR 0.027 0.015 -0.001 0.015 0.013 0.012 -0.001 0.005 0.033 -0.007 0.016 -0.015 0.013 -0.013 -0.010 0.013 -0.003 0.020 0.021 0.012 0.015 0.027 0.045 0.020 0.019 0.058 0.024 0.010 0.010 0.029 0.024 0.013 0.005 0.004 0.007 0.028 0.034 0.014 0.038 0.026 0.012 0.038 0.031 -0.005 0.022 0.023 0.019 0.027 0.022 0.029 0.028 0.024 0.013 0.022 0.029 0.028 0.041 0.029 0.029 0.008 0.042 0.024 0.012 0.030 0.011 0.021 0.014 0.024 0.021 0.025 EXE 1.250 1.538 1.564 1.757 2.049 2.054 2.300 2.317 2.390 2.405 2.439 2.700 2.750 2.821 2.846 3.075 3.075 3.094 3.211 3.237 3.244 3.300 3.368 3.463 3.476 3.525 3.561 3.707 3.789 3.919 3.951 4.050 4.051 4.171 4.306 4.432 4.632 4.683 4.692 4.927 5.077 5.195 5.400 5.756 5.878 6.050 6.098 6.951 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 7.000 28 Naive HECKIT MIX LL 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2