Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Determining the Number of Groups in Latent Panel Structures with an Application to Income and Democracy∗ Xun Lu and Liangjun Su Department of Economics, Hong Kong University of Science & Technology School of Economics, Singapore Management University, Singapore February 1, 2017 Abstract We consider a latent group panel structure as recently studied by Su, Shi, and Phillips (2016), where the number of groups is unknown and has to be determined empirically. We propose a testing procedure to determine the number of groups. Our test is a residual-based LM-type test. We show that after being appropriately standardized, our test is asymptotically normally distributed under the null hypothesis of a given number of groups and has power to detect deviations from the null. Monte Carlo simulations show that our test performs remarkably well in finite samples. We apply our method to study the effect of income on democracy and find strong evidence of heterogeneity in the slope coefficients. Our testing procedure determines three latent groups among seventy-four countries. Key words: Classifier Lasso; Dynamic panel; Latent structure; Penalized least square; Number of groups; Test JEL Classification: C12, C23, C33, C38, C52. ∗ The authors express their sincere appreciation to the co-editor, Frank Schorfheide, and three anonymous referees for their many constructive comments on the previous versions of the paper. They also thank Stéphane Bonhomme, Xiaohong Chen, Han Hong, Cheng Hsiao, Hidehiko Ichimura, Peter C. B. Phillips, and Katsumi Shimotsu for discussions on the subject matter and valuable comments on the paper. Lu acknowledges support from the Hong Kong Research Grants Council (RGC) under grant number 699513. Su gratefully acknowledges the Singapore Ministry of Education for Academic Research Fund under Grant MOE2012-T2-2-021 and the funding support provided by the Lee Kong Chian Fund for Excellence. All errors are the authors’ sole responsibilities. Address correspondence to: Liangjun Su, School of Economics, Singapore Management University, 90 Stamford Road, Singapore, 178903; Phone: +65 6828 0386; E-mail: [email protected]. 1 1 Introduction Recently latent group structures have received much attention in the panel data literature; see, e.g., Sun (2005), Lin and Ng (2012), Deb and Trivedi (2013), Bonhomme and Manresa (2015, BM hereafter), Sarafidis and Weber (2015), Ando and Bai (2016), Bester and Hansen (2016), Su, Shi, and Phillips (2016, SSP hereafter), and Su and Ju (2017). In comparison with some other popular approaches to model unobserved heterogeneity in panel data models such as random coefficient models (see, e.g., Hsiao (2014, chapter 6)), one important advantage of the latent group structure is that it allows flexible forms of unobservable heterogeneity while remaining parsimonious. In addition, the group structure has sound theoretical foundations from game theory or macroeconomic models where multiplicity of Nash equilibria is expected (c.f. Hahn and Moon (2010)). The key question in latent group structures is how to identify each individual’s group membership. Bester and Hansen (2016) assume that membership is known and determined by external information, say, external classification or geographic location, while others assume that it is unrestricted and unknown and propose statistical methods to achieve classification. Sun (2005) uses a parametric multinomial logit regression to model membership. Lin and Ng (2012), BM, Sarafidis and Weber (2015), and Ando and Bai (2016) extend K-means classification algorithms to the panel regression framework. Deb and Trivedi (2013) propose EM algorithms to estimate finite mixture panel data models with fixed effects. Motivated by the sparse feature of the individual regression coefficients under latent group structures, SSP propose a novel variant of the Lasso procedure, i.e., classifier Lasso (C-Lasso), to achieve classification. While these methods make important contributions by empirically grouping individuals, to implement these methods, we often need to determine the number of groups first. Some information criteria have been proposed to achieve this goal (see, e.g., BM and SSP), which often rely on certain tuning parameters. This paper provides a hypothesis-testing-based solution to determine the number of groups. Specifically, we consider the panel data structure as in SSP: = 00 + + = 1 and = 1 (1.1) where and are the vector of regressors, individual fixed effect, and idiosyncratic error term, respectively, and 0 is the slope coefficient that can depend on individual . We assume that the individuals belong to groups and all individuals in the same group share the same slope coefficients That is, 0 ’s are homogeneous within each of the groups but heterogeneous across the groups. For a given we can apply SSP’s C-Lasso procedure or the K-means algorithm to determine the group membership and estimate 0 ’s. However, in practice, is unknown and has to be determined from data. This motivates us to test the following hypothesis: H0 (0 ) : = 0 versus H1 (0 ) : 0 ≤ max where 0 and max are pre-specified by researchers. We can sequentially test the hypotheses H0 (1) H0 (2) until we fail to reject H0 ( ∗ ) for some ∗ ≤ max and conclude that the 2 number of groups is ∗ Onatski (2009) applies a similar procedure to determine the number of latent factors in panel factor structures. In addition to helping to determine the number of groups, testing H0 (0 ) itself is also useful for empirical research. When 0 = 1 the test becomes a test for homogeneity in the slope coefficients, which is often assumed in empirical applications. When 0 is some integer greater than 1, we test whether the group structure is correctly specified. Although the group structure is flexible in terms of modeling unobserved slope heterogeneity, it could still be misspecified. Inferences based on misspecified models are often misleading. Thus conducting a formal specification test is highly desirable. Although we work in the same framework as SSP, the questions of interest are different. SSP study the classification and estimation problem, while we consider the specification testing. To avoid overlapping with SSP’s paper, we omit the detailed discussions on the estimation in this paper. Our test is a residual-based LM-type test. We estimate the model under the null hypothesis H0 (0 ) to obtain the restricted residuals, and the test statistic is based on whether the regressors have predictive power for the restricted residuals. Under the null of the correct number of latent groups, the regressors should not contain any useful information about the restricted residuals. We show that after being appropriately standardized, our test statistic is asymptotically normal under the null. The p-values can be obtained based on the standard normal approximation, and thus the test is easy to implement. Our test is related to the literature on testing slope homogeneity and poolability for panel data models in which case 0 = 1. See, e.g., Pesaran, Smith, and Im (1996), Phillips and Sul (2003), Pesaran and Yamagata (2008), and Su and Chen (2013), among others. Nevertheless, none of the existing tests can be directly applied to test = 0 where 0 1 Also, the test proposed in this paper substantially differs from the existing tests in technical details, as we need to apply the C-Lasso method or K-means algorithm to estimate the model under the null. When there are no time fixed effects in the model, both SSP’s C-Lasso method and the K-means algorithm can be used to estimate the model under H0 (0 ). So the residuals that are used to construct our LM statistic can be obtained from either method. When time fixed effects are present in the model, we extend SSP’s method to allow for time fixed effects and show that the LM statistic behaves asymptotically equivalent to that in the absence of time fixed effects. We conduct Monte Carlo simulations to show the excellent finite sample performance of our test. Both the levels and powers of our test perform well in finite samples. For the data generating processes (DGPs) considered, when both and are large, our method can determine the number of groups correctly with a high probability. Our method is applicable to a wide range of empirical studies. In this paper, we provide detailed empirical analysis on the relationship between income and democracy. Specifically, is a measure of democracy for country in period . includes its income (the logarithm of its real GDP per capita) and lagged The main parameter of interest is the effect of income on democracy. In an influential paper, Acemoglu, Johnson, Robinson, and 3 Yared (2008, AJRY hereafter) use the standard panel data model with common slope coefficients and find that the effect of income on democracy is insignificant. We find that the slope coefficients (the effects of income and lagged democracy on democracy) are actually heterogeneous. The hypothesis of homogeneous slope coefficients is strongly rejected with p-values being less than 0.001. This suggests that the common coefficient model is likely to be misspecified. Further, we determine the number of heterogeneous groups to be three and find that the slope coefficients of the three groups differ substantially. In particular, for one group, the effect of income on democracy is positive and significant, and for the other two groups, the income effects are also significant, but negative with different magnitudes. Therefore, our method provides the new insight that the effect of income on democracy is heterogeneous and significant, in sharp contrast to the existing finding that the income effect is insignificant. Our classification of groups is completely data-driven and the result does not show any apparent pattern, like in geographic locations. We further investigate the determinants of the group pattern by running a cross-country multinomial logit regression on some country-specific covariates. We find that two variables, namely, the constraints on the executive at independence and the long-run economic growth, are important determinants. There are numerous potential applications of our method. For example, SSP study the determinants of saving rates across countries. That is, is the ratio of saving to GDP, and includes the lagged saving rates, inflation rates, real interest rates, and per capita GDP growth rates. Using an information criterion, they find a two-group structure in the slope coefficients. Our method is perfectly applicable to their setting. Another example is Hsiao and Tahmiscioglu (1997, HT) who study the effect of firms’ liquidity constraints on their investment expenditure. Specifically, their is the ratio of firm ’s capital investment to its capital stock at time and includes its liquidity (defined as cash flow minus dividends), sales and Tobin’s HT classify firms into two groups, less-capital-intensive firms and morecapital-intensive firms, using the capital intensity ratio of 0.55 as a cutoff point. Our method can be applied to their setting by classifying the firms into heterogeneous groups in a datadriven way, which allows unobservable heterogeneity in the slope coefficients. In general, our method provides a powerful tool to detect unobserved heterogeneity, which is an important phenomenon in panel data analysis. It can be applied to any linear model for panel data where the time dimension is relatively long. There are two limitations of our approach. First, we treat the unknown parameters are as fixed, thus our asymptotic analysis is only pointwise. We do not consider the uniform inference here. Second, our method requires that both and be large. In particular, the finite-sample results may not be reliable when is small, and the poor performance can occur when the true number of groups is large relative to or the differences between the values of the parameters in different groups are small. The remainder of the paper is organized as follows. In Section 2, we introduce the hypotheses and the test statistic for panel data models with individual fixed effects only. In Section 3 we derive the asymptotic distribution of our test statistic under the null and 4 study the global power of our test. In Section 4, we extend the analysis to allow for both individual and time fixed effects in the models. We conduct Monte Carlo experiments to evaluate the finite sample performance of our test in Section 5 and apply it to the incomedemocracy dataset in Section 6. Section 7 concludes. All proofs are relegated to the online supplementary appendix. Notation. For an × real matrix we denote its transpose as 0 and its Frobenius norm as kk (≡ [tr(0 )]12 ) where ≡ means “is defined as”. Let ≡ (0 )−1 0 and ≡ − where denotes an × identity matrix. When = { } is symmetric, we use max () and min () to denote its maximum and minimum eigenvalues, respectively, and denote diag() as a diagonal matrix whose ( )th diagonal element is given by . Let 0 ≡ −1 i i0 and 0 ≡ − −1 i i0 where i is a × 1 vector of ones. Moreover, the operator −→ denotes convergence in probability, and −→ convergence in distribution. We use ( ) → ∞ to denote the joint convergence of and when and pass to infinity simultaneously. We abbreviate “positive semidefinite”, “with probability approaching 1”, and “without loss of generality” to “p.s.d.”, “w.p.a.1”, and “wlog”, respectively. 2 Hypotheses and test statistic In this section we introduce the hypotheses and test statistic. 2.1 Hypotheses We consider the panel structure model = 00 + + = 1 and = 1 (2.1) where is a × 1 vector of strictly exogenous or predetermined regressors, an individual fixed effect, and the idiosyncratic error term. The model with both individual and time fixed effects will be studied in Section 4. We assume that 0 has the following group structure: 0 = 0 X =1 ª © 0 1 ∈ 0 (2.2) where 1 {·} is the indicator function, is an integer, and {01 0 } forms a partition of 0 0 0 0 0 {1 } such that ∪ =1 = {1 } and ∩ = ∅ for any 6= Further, 6= for any 6= Let = #0 denote the cardinality of the set 0 We assume that G 0 ≡ {01 0 } α0 ≡ (01 0 ) and β 0 ≡ ( 01 0 ) are all unknown. One key step in estimating all these parameters is to first determine as once is determined, we can readily apply SSP’s C-Lasso method or the K-means algorithm. This motivates us to test the following hypothesis: H0 (0 ) : = 0 versus H1 (0 ) : 0 ≤ max 5 (2.3) Here we assume that max and are fixed, i.e., they do not increase with the sample size or The testing procedure developed below can be used to determine Suppose that we have a priori information such that min ≤ ≤ max where min is typically 1. Then we can first test: H0 (min ) against H1 (min ) If we fail to reject the null, then we conclude that = min Otherwise, we continue to test H0 (min + 1) against H1 (min + 1) We repeat this procedure until we fail to reject the null H0 ( ∗ ) and conclude that = ∗ If we reject = max then we can use the random coefficient model to estimate the model, i.e., = . This procedure is also used in other contexts, e.g., to determine the number of lags in autoregressive (AR) models, the cointegration rank, the rank of a matrix, and the number of latent factors in panel factor structures (Onatski, 2009). In theory, max can be any finite number. However, in practice, we do not suggest choosing too large a value for max Our method is powerful when the number of groups is small. When the number of groups is large, the classification errors can be large. Therefore, if we reject = max for a reasonably large max , then we can simply adopt the random coefficient model to avoid classification errors. 2.2 Estimation under the null and test statistic Our test is a residual-based test and so we only need to estimate the model under H0 (0 ) : = 0 In the special case where 0 = 1 the panel structure model reduces to a homogeneous panel data model so that 0 = 0 for all = 1 and we can estimate the homogeneous slope coefficient using the usual within-group estimator ̂ In the general case where 0 1 we can consider two popular ways to estimate the unknown group structure and the group-specific parameters. For example, we can apply P SSP’s C-Lasso procedure. Let ̃ = − ̄· where ̄· = −1 =1 Define ̃ and ̄· analogously. Let β̃ ≡ (̃ 1 ̃ ) and α̃0 ≡ (̃1 ̃0 ) be the C-Lasso estimators proposed in SSP, which are defined as the minimizer of the following criterion function: (0 ) 1 (β α0 ) X 0 = 1 (β) + Π || − || =1 =1 (2.4) where ≡ is a tuning parameter and ´2 1 XX³ 1 (β) = ̃ − 0 ̃ =1 =1 Let ̂ = n o 0 ∈ {1 2 } : ̃ = ̃ for = 1 0 Let ̂0 = {1 2 } \(∪ =1 ̂ ) Although SSP demonstrate that the number of elements in ̂0 shrinks to zero as → ∞ in finite samples, ̂0 may not be empty. To fully impose the null hypothesis H0 (0 ) we can force all the estimates of the slope coefficients to be grouped into 0 groups. Specifically, 6 n o we classify member in ̂0 to group ∗ where ∗ ≡ arg min ||̃ − ̃ || = 1 0 Our final estimators of 00 s are the post-Lasso estimators. Specifically, for each of the classified groups, we re-estimate the homogeneous slope coefficient using the usual withingroup estimator. Let ̂ ( = 1 0 ) be the estimator for group . Then the final estimators of 00 s are β̂ ≡ (̂ 1 ̂ ) where ̂ = ̂ for ∈ ̂ Alternatively, one can apply the K-means algorithm as advocated by Lin and Ng (2012), BM, Sarafidis and Weber (2015), and Ando and Bai (2016). Let g = {1 } denote the group membership such that ∈ {1 0 } The K-means algorithm estimates of α0 and g can be obtained as the minimizer of the following objective function ( ) 0 0 X X ³ ´2 1 X ̃ − 0 ̃ (α0 g) = =1 : = =1 (2.5) Let ĝ ≡ {̂1 ̂ } denote the K-means algorithm estimate of g With a little abuse of notation, we also use α̂0 ≡ (̂1 ̂0 ) and β̂ ≡ (̂ 1 ̂ ) to denote the K-means algorithm estimate of α00 and β 0 where ̂ = ̂̂ Define ̂ = { ∈ {1 2 } : ̂ = } for = 1 0 Note that the K-means algorithm forces all individuals to be classified into one of the 0 groups automatically so that ̂0 becomes an empty set in this case. P 0 Given {̂ } we can estimate the individual fixed effects using ̂ = 1 =1 ( − ̂ )1 The residuals are obtained by 0 ̂ ≡ − ̂ − ̂ (2.6) It is easy to show that ¡ ¢0 ̂ = ( − ̄· ) − − ̄· ̂ ¡ ¢0 = − ̄· + − ̄· ( 0 − ̂ ) (2.7) P where ̄· = −1 =1 Under the null hypothesis, ̂ is a consistent estimator of 0 2 Hence, ̂ should be close to By the assumption that the regressors are predetermined, should not have any predictive power for Specifically, we assume that the error term is a martingale difference sequence (m.d.s.) (Assumption A.1(v) below), which implies that ( | ) = 0. If we write in the linear regression form, we have = + 0 + = 1 = 1 where and are the intercept and slope coefficients, respectively, and is the regression error. Then the m.d.s. assumption implies that = 0 for all ’s. This motivates us to run the following auxiliary regression model ̂ = + 0 + = 1 = 1 1 (2.8) If 0 = 1 we set ̂ = ̂ the within-group estimator of the homogeneous slope coefficient. Note that we also suppress the dependence of ̂ on 0 2 Strictly speaking, ̃ is a consistent estimator of 0 under the null. But because the cardinality of the set ̂0 shrinks to zero under the null as → ∞ the difference between ̃ and ̂ is asymptotically negligible. 7 and test the null hypothesis H∗0 : = 0 for all = 1 We construct an LM-type test statistic by concentrating the intercept out in (2.8). Consider the Gaussian quasi-likelihood function for ̂ : (φ) = X =1 (̂ − 0 )0 (̂ − 0 ) where φ ≡ (1 )0 ̂ ≡ (̂1 ̂ )0 ≡ (1 )0 0 ≡ − −1 i i0 and i is a × 1 vector of ones. Define the LM statistic as ¶−1 µ ¶0 µ ¶ µ 2 −12 (0) −1 (0) −12 (0) 1 (0 ) = − (2.9) φ φ φ φ0 where we make the dependence of 1 (0 ) on 0 explicit. We can verify that 1 (0 ) = X −1 ̂0 0 (0 0 ) 0 0 ̂ (2.10) =1 where the dependence of 1 (0 ) on 0 is through that of ̂ on 0 We will show that after being appropriately scaled and centered, 1 (0 ) is asymptotically normally distributed under H0 (0 ) and diverges to infinity under H1 (0 ) Remark 2.1. We have included a constant term in the regression in (2.8). Under the assumption that ( ) = 0 and and pass to infinity jointly, one can also omit the constant term and obtain the following LM test statistic: 1 (0 ) = X −1 ̂0 (0 ) 0 ̂ (2.11) =1 The asymptotic distribution of (0 ) can be similarly studied with little modification. In case is not very large as in our empirical applications, we recommend including a constant term in the auxiliary regression in (2.8) and thus only focus on the study of 1 (0 ) below. 3 Asymptotic properties In this section we first present a set of assumptions that are necessary for asymptotic analyses, and then study the asymptotic distributions of 1 (0 ) under both H0 (0 ) and H1 (0 ). 8 3.1 Assumptions Let kk ≡ [(kk )]1 for ≥ 1. Let Ω̂ ≡ −1 0 0 and Ω ≡ (Ω̂ ) Define F ≡ ({+1 −1 −1 } =1 ) Let ∞ be a generic constant that may vary across lines. Following SSP, we define the following two types of classification errors: o o n n (3.1) ̂ | ∈ 0 and ̂ = ∈ 0 | ∈ ̂ ̂ = ∈ where = 1 and = 1 0 Let ̂ = ∪∈0 ̂ and ̂ = ∪∈̂ ̂ We make the following assumptions. Assumption A.1. (i) max1≤≤ max1≤≤ k k8+4 ≤ for some 0 for = and (ii) There exist positive constants Ω and ̄Ω such that Ω ≤ min1≤≤ min (Ω ) ≤ max1≤≤ max (Ω ) ≤ ̄Ω (iii) For each = 1 {( ) : = 1 2 } is a strong mixing process with mixing coefficients { (·)}. (·) ≡ (·) ≡ max1≤≤ (·) satisfies () = (− ) where = 3(2 + ) + for some 0. In addition, there exist integers 0 ∗ ∈ (1 ) such that ( 0 ) = (1) ( + 12 ) ( ∗ )(1+)(2+) = (1) and 12 −1 2∗ = (1) (iv) Let ≡ (1 )0 ( ) = 1 are mutually independent of each other. (v) For each = 1 ( |F−1 ) = 0 a.s. Assumption A.2. Under H0 (0 ) we have (i) → ∈ (0 1) for each = 1 0 as → ∞ ¡ ¢ P 0 −1 0 2 −2 (ii) k̂ − k = + ( ) ´ =1 ´ ³ ³ 0 (iii) ∪ =1 ̂ 0 = (1) and ∪ =1 ̂ = (1) Assumption A.3. There exist finite nonnegative numbers 1 and 2 such that lim sup( )→∞ log( ) 2 = 1 and lim sup( )→∞ log( ) (3+)(4+2) −(5+3)(4+2) = 2 A.1(i) imposes moment conditions on and . A.1(ii) requires that Ω be positive definite uniformly in A.1(iii) requires that each individual time series {( ) : = 1 2 } be strong mixing. This condition can be verified if does not contain lagged dependent variables no matter whether one treats the fixed effects ’s as random or fixed. In the case of dynamic panel data models, Hahn and Kuersteiner (2011) assume that ’s are nonrandom and uniformly bounded, in which case the strong mixing condition can also be verified. In the case of random fixed effects, they suggest adopting the concept of conditional strong mixing where the mixing coefficient is defined by conditioning on the fixed effects. The dependence of the mixing rate on defined in A.1(i) reflects the trade-off between the degree of dependence and the moment bounds of the process {( ) ≥ 1} The last set of conditions in A.1(iii) can easily be met. In particular, if the process is strong mixing with a geometric mixing rate, the conditions on (·) can be met simply by specifying 0 = ∗ = b log c for some sufficiently large where bc denotes the integer part of . A.1(iv) rules out 9 cross sectional dependence among ( ) and greatly facilitates our asymptotic analysis. A.1(v) requires that the error term be a martingale difference sequence (m.d.s.) with respect to the filter F which allows for lagged dependent variables in and conditional heteroskedasticity, skewness, or kurtosis of unknown form in 3 A.2(i) is typically imposed in the literature on panel data models with latent group structure; see BM, Ando and Bai (2016) and SSP who have rigorous asymptotic analysis for clustering estimators in different contexts. It implies that each group has asymptotically non-negligible members as → ∞ A.2(ii)-(iii) can be verified for either the SSP’s CLasso estimators or the K-means algorithm estimators under suitable conditions. Please note neither BM nor Ando and Bai (2016) study the same model as ours. But it is not difficult to extend their analysis to our model and verify A.2(ii)-(iii). Our model is a special case considered by SSP. We can readily verify A.2(ii)-(iii) under Assumption A.1 and the following additional conditions: (1) 12 −1 (ln )9 → 0 and 2 −(3+2) → ∈ [0 ∞) as ( ) → ∞ and (2) 2 (ln )6+2 → ∞ and (ln ) → 0 for some 0 as ( ) → ∞ Note that we allow to include the lagged dependent variables, in which case, both the SSP’s post Lasso-estimators ̂ ’s or the K-means algorithm estimators have asymptotic bias of order ( −1 ). In the absence of dynamics in the model, one would expect that ̂ − 0 = (( )−12 ). A.3 imposes conditions on the rates at which and pass to infinity, and the interaction between ( ) and . Note that we allow and to pass to infinity at either identical or suitably restricted different rates. The appearance of the logarithm terms is due to the use of a Bernstein inequality for strong mixing processes. If the mixing process {( ) ≥ 1} has a geometric decay rate, one can take an arbitrarily small in A.1(i) . In this case, A.3 puts the most stringent restrictions on ( ) by passing → 0 : 35 → 0 as ( ) → ∞ ignoring the logarithm term. On the other hand, if ≥ 1 in A.1(i), then the second condition in A.3 becomes redundant given the first condition. In the case of conventional panel data models with regressors only, Pesaran and Yamagata (2008) √ strictly exogenous √ 2 require that either → 0 or → 0 for two of their tests; but for stationary dynamic panel data models, they prove the asymptotic validity of their test only under the condition that → ∈ [0 ∞) 3 If the error terms are serially correlated in a static panel, it is well known that by adding lagged dependent and independent variables in the model, one can potentially ameliorate problems caused by such serial correlation. See Su and Chen (2013, p.1090). For dynamic panel data models (e.g., panel AR(1) model), we cannot allow for serial correlation in the error terms (e.g., AR(1) errors); otherwise, the error terms will be correlated with the lagged dependent variables, causing the endogeneity issue which we do not address in this paper. 10 3.2 Asymptotic null distribution Let denote the ( )’th element of ≡ 0 (0 0 )−1 0 0 Let † ≡ − P −12 −1 =1 ( ) and ̄ ≡ Ω † . Define #2 " X X −1 X X X 2 and ≡ 4 −2 −1 ̄0 (3.2) ̄ ≡ −12 =1 =1 =1 =2 =1 The following theorem studies the asymptotic null distribution of the infeasible statistic 1 Theorem 3.1 Suppose Assumptions A.1-A.3 hold. Then under H0 (0 ) ¡ ¢ p 1 (0 ) ≡ −12 1 (0 ) − −→ (0 1) as ( ) → ∞ The proof of the above theorem is tedious and relegated to the appendix. To imple−12 ment the test, we need consistent estimates of both and . Let ̂ = Ω̂ ( − P −12 −1 =1 ) and ̂ ≡ (̂1 ̂ )0 Note that ̂ = 0 Ω̂ We propose to estimate by X X −12 ̂ (0 ) = ̂2 (3.3) =1 =1 and by ̂ (0 ) = 4 −2 −1 X X =1 =2 Define " ̂ ̂0 −1 X =1 ̂ ̂ #2 ³ ´ q −12 ˆ 1 (0 ) − ̂ (0 ) ̂ (0 ) 1 (0 ) ≡ (3.4) (3.5) The following theorem establishes the consistency of ̂ (0 ) and ̂ (0 ) and the asymptotic distribution of ˆ1 (0 ) under H0 (0 ) Theorem 3.2 Suppose Assumptions A.1-A.3 hold. Then under H0 (0 ) ̂ (0 ) = + (1) ̂ (0 ) = + (1) and ˆ1 (0 ) −→ (0 1) as ( ) → ∞ Theorem 3.2 implies that the test statistic ˆ1 (0 ) is asymptotically pivotal under H0 (0 ) and we reject H0 (0 ) for a sufficiently large value of ˆ1 (0 ). We obtain the distributional results in Theorems 3.1 and 3.2 despite the fact that the individual effects ’s can only be estimated at the slower rate −12 than the rate ( )−12 or ( )−12 + −1 at which the group-specific parameter estimates {̂ = 1 0 } converge to their true values under H0 (0 ). The slow convergence rate of these individual effect estimates does not have adverse asymptotic effects on the estimation of the bias term the variance term and the asymptotic distribution of ˆ1 (0 ) Nevertheless, they can play an important role in finite samples, which we verify through Monte Carlo simulations. 11 3.3 Consistency Let G = {(1 ) : ∪ =1 = {1 } and ∩ = ∅ for any 6= } That is, G denotes the class of all possible -group partitions of {1 }. To study the consistency of our test, we add the following assumption. ° P ° ° 0 °2 = (1) Assumption A.4. (i) −1 =1 ° 0 °2 P 0 P ° ° → 0 as → ∞ (ii) inf (1 0 )∈G0 min(1 0 ) −1 =1 ∈ − 0 A.4(i) is trivially satisfied if 0 ’s are uniformly bounded or random with finite second moments. A.4(ii) essentially says that one cannot group the parameter vectors © 0 ª 1 ≤ ≤ into 0 groups by leaving out an insignificant number of unclassified individuals. It is satisfied for a variety of global alternatives: 1. The number of groups is = 0 + for some positive integer such that → ∈ (0 1) for each = 1 0 + © ª 2. There is no grouped pattern among 0 1 ≤ ≤ such that we have a completely heterogeneous population of individuals. 3. The regression model is actually a random coefficient model: 0 = 0 + where 0 is a fixed parameter vector, and ’s are independent and identical draws from a continuous distribution with zero mean and finite variance. P 0 4. The regression model is a hierarchical random coefficient model: 0 = =1 ( + ) ×1 { ∈ 0 } where 01 0 are defined as before, ’s (for = 1 ) are independent and identical draws from a continuous distribution with zero mean and finite variance, and may be different from 0 The following theorem establishes the consistency of ˆ1 . Theorem 3.3 Suppose Assumptions A.1 and A.3-A.4 hold. Then under H1 (0 ) with possible diverging max and random coefficients, (ˆ1 (0 ) ≥ ) → 1 as ( ) → ∞ for ¡ ¢ any non-stochastic sequence = 12 The above theorem indicates that our test statistic ˆ1 (0 ) is divergent at 12 -rate under H1 (0 ) and thus has power to detect any alternatives such that A.4 is satisfied. Remark 3.1. Our asymptotic theories here are “pointwise”, as the unknown parameters are treated as fixed. We do not consider the uniformity issue, and so our procedures may suffer from the same problem as the other post-selection inference (see, e.g., Leeb and Pöscher, 2005, 2008, 2009 and Schneider and Pöscher, 2009). This seems to be a well-known challenge in the literature of model selection or pretest estimation. Although it is a very important question, developing a thorough theory on uniform inference is beyond the scope of this paper. 12 4 Time fixed effects In applications, we may also consider the model with both individual and time fixed effects: = 00 + + + = 1 and = 1 (4.1) where is the time fixed effects, the other variables are defined as above, and we assume that 0 ’s have the unknown group structure defined in (2.2). We treat and as unknown fixed parameters that are not separately identifiable. Despite this, it is possible to estimate the group-specific parameters 0 ’s and identify each individual’s group membership consistently. 4.1 Estimation under the null and test statistic When 0 = 0 for all = 1 one can follow Hsiao (2014, Chapter 3.6) and sweep out both the individual and time effects from (4.1) via suitable transformation. We do a similar thing in the presence of heterogeneous slope coefficients 0 ’s. As before, we first eliminate the individual effects in (4.1) via the within-group transformation: ̃ = 0 ̃ + ̃ + ̃ (4.2) P where ̃ = − ̄· ̃ = − ̄ and ̄ = −1 =1 Then we eliminate ̃ from the above model to obtain 1 X 0 ̈ = 0 ̃ − ̃ + ̈ (4.3) =1 P P P 1 where ̈ = − ̄· − ̄· + ̄ ̄· = 1 =1 ̄ = =1 =1 and ̈ ̄· and ̄ are similarly defined. Under H0 (0 ) : = 0 we can follow SSP and estimate β ≡( 01 0 ) and α0 ≡ (01 00 ) by minimizing the following criterion function: (0 ) 2 (β α0 ) where X 0 = 2 (β) + Π || − || =1 =1 (4.4) Ã !2 X X X 1 1 2 (β) = ̈ − 0 ̃ + 0 ̃ =1 =1 =1 Let β̃≡(̃ 1 ̃ ) and α̃0 ≡ (̃1 ̃0 ) denote the estimates of β and α0 respectively. P ẽ = 1 (̃ − ̃ 0 ̃ ) for = 1 Given β̃ and α̃0 , If necessary, we can estimate ̃ by =1 we classify the individuals into 0 groups as in Section 2. As before, we use ̂1 ̂0 to denote the 0 estimated groups and ̂ to denote the cardinality of ̂ The post-Lasso estimates can be obtained in two ways. One is to pool all the observations within each estimated group and estimate the group-specific parameters for each group separately after we demean over time and across individuals within the group. In this way, 13 one can easily work out the standard error for each group-specific estimate as usual. Alternatively, we consider estimation based on (4.3) by using the estimated group membership. Assuming all individuals are classified into one of the 0 groups, we obtain the post-Lasso estimates {̂ } of {0 } as the solution to the following minimization problem: ⎛ ⎞2 0 X X 0 X X X 1 ⎝̈ − 0 ̃ + 1 0 ̃ ⎠ min { } =1 =1 =1 ∈̂ (4.5) ∈̂ Let ̂ = ̂ for ∈ ̂ and = 1 0 Let α̂0 = (̂1 ̂0 ) Given the above post-Lasso estimates, we define the restricted residual as 0 1 X 0 b̈ = ̈ − ̂ ̃ + ̂ ̃ =1 (4.6) Following the analysis in Section 2.2, we propose to test H0 (0 ) by using the following LM-type test statistic 2 (0 ) = X =1 0 −1 b̈ 0 (0 0 ) 0 0 b̈ 1 b̈ )0 We will show that after being appropriately centered, −12 2 (0 ) where b̈ = (b̈ follows the same asymptotic distribution as −12 1 (0 ) under the null. 4.2 Asymptotic properties of the C-Lasso estimates Since SSP do not allow the fixed time effects in their model, their asymptotic theory does not apply to our C-Lasso estimates of the parameters and group identity. Fortunately, we can study the preliminary convergence rates of various estimates under Assumptions A.1, A.2(i) and A.3. Theorem 4.1 °Suppose°that Assumptions A.1, A.2(i), and A.3 hold. Then under H0 (0 ) 2 P ° 0° (i) 1 ̃ − = ( −1 ) ° ° ¢ ¡ −12 ¢ ¡ =1 0 0 ) = − ( (ii) ̃(1) ̃( ) 0 1 0 ° ° ° 0° (iii) max1≤≤ °̃ − ° = ( + ) where = max{( )1(4+2) log ( ) (log ( ) )12 } and (̃(1) ̃(0 ) ) is some suitable permutation of (̃1 ̃0 ) Theorems 4.1(i)-(iii) establish the mean square convergence rate of {̃ } the convergence rate of the group-specific estimates {̃ } and the uniform convergence rate of ̃ ’s, respectively. The findings are similar to those in SSP. In particular, the estimates of the 14 group-specific parameters do not depend on Note that = ((log ( ) )12 ) under the additional restriction that lim sup( )→∞ −(1+) = ∈ [0 ∞) For notational simplicity, hereafter we simply write ̃ for ̃() as the consistent estimator of 0 ’s. Define the two types of classification errors ̂ and ̂ as before. To study the consistency of our classification method, we add the following assumption. Assumption A.5. (i) 12 −1 (ln )9 → 0 and lim sup( )→∞ −(1+) = ∈ [0 ∞) (ii) 2 (ln )6+2 → ∞ and (ln ) → 0 for some 0 as ( ) → ∞ Assumptions A.5(i)-(ii) strengthen the conditions on and . Note that A.5(ii) implies that it suffices to choose ∝ − for some ∈ (14 12) The following theorem establishes the uniform consistency for our classification method. Theorem 4.2 Suppose that Assumptions A.1, A.2(i), A.3 and A.5 hold. Then under H0 (0 ) ³ ´ P ´ ³ 0 0 ≤ → 0 as ( ) → ∞ ̂ (i) ∪ ̂ ³ =1 ´ P=1 ³ ´ 0 0 (ii) ∪ ≤ → 0 as ( ) → ∞ =1 ̂ =1 ̂ Given the results in Theorems 4.1 and 4.2, we will show that the post-Lasso estimators {̂ } is asymptotically oracally efficient in the sense that they are as efficient as the infeasible estimators {̄ } under some regularity conditions: ⎛ ⎞2 0 X X 0 X X X 1 ⎝̈ − 0 ̃ + 1 {̄ } ≡ arg min 0 ̃ ⎠ { } 0 =1 0 =1 =1 ∈ (4.7) ∈ P P P Let ᾱ0 = (̄1 ̄0 ) Let Q = 1 ∈0 X̃0 ̃ Q = 1 ∈0 ∈0 X̃0 ̃ P P and V = 1 ∈0 X̃0 ̈ for = 1 0 where X̃ = ̃ − −1 ∈0 ̃ and ̈ = (̈1 ̈ )0 Define ⎞ ⎞ ⎛ ⎛ Q1 − Q11 V1 −Q12 ··· −Q10 ⎟ ⎜ ⎜ V2 ⎟ −Q21 Q2 − Q22 · · · −Q20 ⎟ ⎟ ⎜ ⎜ and V = Q = ⎜ ⎟ ⎟ ⎜ .. .. .. .. ... ⎠ ⎠ ⎝ ⎝ . . . . −Q0 1 −Q0 2 · · · Q0 − Q0 0 Let Q̂ and V̂ be analogously defined as Q and V with ̂ ’s and ̂ ’s. We can verify that V0 0 ’s (4.8) and ’ replaced by −1 vec(ᾱ0 ) = Q−1 V and vec(α̂0 ) = Q̂ V̂ P P P P P Let B ≡ 1 2 ∈0 =1 =1 U ≡ 1 ∈0 =1 ⊥ and Ω = ¡ ⊥ ⊥0 2 ¢ P P () ⊥ () where ) for = 1 0 0 = − (̄· − ̄ =1 ∈ 2 15 ¡ ¢0 P P () () ̄· = 1 ∈0 and ̄ () = 1 =1 ̄· 4 Let B = B01 B00 U = ¢ ¡ 0 0 U1 U00 and Ω =diag(Ω1 Ω0 ) We add the following assumption. Assumption A.6. (i) Q → Q0 0 as ( ) → ∞ √ (ii) U → (0 Ω0 ) as ( ) → ∞ where Ω0 = lim( )→∞ Ω Assumption A.6 imposes conditions to ensure the asymptotic normality of the post-Lasso estimators of the group-specific parameters. When 0 = 1 Q reduces to Q1 − Q11 which is required to be asymptotically nonsingular for the usual fixed effects estimator in a two-way error component model to be asymptotically normal. Theorem 4.3 Suppose Assumptions A.1, A.2(i), A.3, A.5 and A.6 hold. Then under H0 (0 ) ³ ´ (i) ̂ − 0 = ̄ − 0 + (( )−12 ) = ( )−12 + −1 for = 1 0 √ £ ¡ ¡ ¢ ¤ ¢ −1 −1 B → 0 Q Ω Q (ii) vec α̂0 −α00 + Q−1 0 0 0 estimaTheorem 4.3(i) indicates that ̂ is asymptotically equivalent to the infeasible ° °2 P ° 0° 1 tor ̄ It also reports the convergence rate of ̂ and implies that =1 °̂ − ° = (( )−1 + −2 ) Theorem 4.3(ii) reports the asymptotic distribution of α̂0 . By the −1 Davydov inequality, we can readily show that Q−1 ) which suggests that the B is ( −1 order of the asymptotic bias of vec(α̂0 ) is given by ( ) Note that this bias term has the exact form as the bias term in the linear panel data model with individual fixed effects alone considered by SSP. As in SSP, the bias is asymptotically vanishing in the case where only contains strictly exogenous regressors or = ( ) For inference, one needs to estimate Q0 Ω0 and Q−1 B When necessary, one can −1 follow SSP to estimate Q B A consistent estimate of Q0 is given by Q̂ Similarly, we can consistently estimate Ω0 by a sample analogue of Ω For brevity, we do not give the details. 4.3 Asymptotic null distribution of 2 (0 ) Despite the presence of time fixed effects, we can show that 2 (0 ) shares the same asymptotic null distribution as 1 (0 ) This is summarized in the following theorem. Theorem 4.4 Suppose Assumptions A.1, A.2(i), A.3, A.5 and A.6 hold. Then under H0 (0 ) 4 ¡ ¢ p 2 (0 ) ≡ −12 2 (0 ) − −→ (0 1) as ( ) → ∞ ⊥ If { ≥ 1} is mean-stationary, then = 16 That is, −12 2 (0 ) share the same asymptotic bias ( ) and asymptotic variance ( ) as −12 1 (0 ) As before, we can consistently estimate both and by ̂ (0 ) and ̂ (0 ). The major difference is that we now need to use the residuals b̈ to replace ̂ throughout. Given ̂ (0 ) and ̂ (0 ) we can obtain a feasible version of 2 (0 ) : ³ ´ q −12 ˆ 2 (0 ) − ̂ (0 ) ̂ (0 ) 2 (0 ) ≡ Following the proofs of Theorems 3.2 and 3.3, we can readily show that ˆ2 (0 ) is asymptotically (0 1) under H0 (0 ) and divergent at 12 -rate under under H1 (0 ) That is, results analogous to those in Theorems 3.2 and 3.3 also hold. For brevity we omit the details. 5 Monte Carlo simulations In this section, we conduct Monte Carlo simulations to examine the finite sample performance of our proposed testing method. 5.1 Data generating processes and implementation We consider five generating processes (DGPs). The first four DGPs only have individual fixed effects: DGP 1: = 01 1 + 02 2 + + DGPs 2-4: = 01 1 + 02 −1 + + where = + = 1 2 and 1 2 and are IID (0 1) variables, and mutually independent of each other. DGP 5 has both individual and time fixed effects: DGP 5: = 01 1 + 02 −1 + 05 + 05 + where 1 = 1 + + and 1 and are mutually independent IID (0 1) variables. DGP 1 is a static panel structure, while DGPs 2-5 are dynamic panel structures. In ¡ ¢ DGPs 1, 2 and 5, 01 02 has a group structure: ⎧ ¡ 0 0 ¢ ⎨ (05 −05) with probability 0.3 1 2 = (−05 05) with probability 0.3 ⎩ (0 0) with probability 0.4 Therefore in DGPs 1, 2 and 5, the true number of groups is 3. In DGP 3, we consider a completely heterogeneous (random coefficient) panel structure where 01 and 02 follow (05 1) and U(−05 05) respectively. In principle, the true number of groups is the cross-sectional dimension in this case. In DGP 4, ( 01 02 ) is similar to that in DGPs 1, 17 2 and 5 except that it has some additional small disturbance. ⎧ ¡ 0 0 ¢ ⎨ (05 + 01 1 −05 + 01 2 ) with 1 2 = (−05 + 01 1 05 + 01 2 ) with ⎩ (01 1 01 2 ) with Specifically, probability 0.3 probability 0.3 probability 0.4 where 1 and 2 are each IID (0 1) mutually independent, and independent of , and DGP 4 can be thought of as a small deviation from a group structure. For each DGP, we first test the null hypotheses: H0 (1) H0 (2) and H0 (3) to examine the level and power of our test. We then use our tests to determine the number of groups as described in Section 2.1. We set max = 8 and let the nominal size decrease with the time series dimension to ensure that the type I error decreases with Specifically, we let the nominal size be 1 which equals 0.10 and 0.025 for = 10 and 40, respectively.5 If all eight hypotheses, H0 (1) and H0 (8) are rejected, then we stop and conclude that the number of groups is greater than 8 and we can adopt the random coefficient model. For the combination of and , we consider the typical case in empirical applications that is smaller than or comparable to and let ( ) = (40 10) (80 10) and (40 40) The number of replications in the simulations is 1000. One important step in implementing our testing procedure is to choose the tuning parameter Following the theory in SSP, we let = · 2 · −13 where is the sample standard deviation of and is some constant. We use three different values of (025 05 and 075) to examine the sensitivity of our results to (thus ). As a comparison, we also consider the information criterion (IC) proposed in SSP to determine the number of groups. Specifically, we choose to minimize the following IC: ¡ ¢ ( ) = ln ̂2( ) + 1 = 1 max (5.1) P P 2 1 0 where ̂ 2( ) = =1 =1 ̂ and ̂ s are the residuals under the specification of groups with the tuning parameter . ̂0 s are defined in (2.6) or (4.6) in the presence of both individual and time fixed effects. 1 is a tuning parameter and we set 1 = 23 ( )−12 as in SSP. Note that SSP’s IC can also be used to choose the tuning parameter and jointly by minimizing ( ) over both and 5.2 Simulation results Table 1 shows the level and power behavior of our test statistics for testing the three null hypotheses: H0 (1) H0 (2) and H0 (3) We choose three conventional nominal levels: 0.01, 0.05 and 0.10. For DGPs 1, 2 and 5, the true number of groups is 3. For H0 (1) the rejection frequencies are all 1 for all combinations of and and all three DGPs at all three nominal 5 We also try fixing the nominal level at 0.05. The results are similar and available upon request. 18 levels. For H0 (2) the power of the test increases rapidly with both and For example, when = 40 and = 40 the rejection frequencies are all 1 at all three nominal levels. For H0 (3) we examine the level of our test and find that the rejection frequencies are fairly close to the nominal levels. For the heterogeneous DGP 3, our test rejects all three hypotheses with the frequencies being 1 or nearly 1 at all three nominal levels. This reflects the power of our test against global alternatives. For DGP 4 which represents a small deviation from the group structure, our test shows reasonable power for large , though it rejects = 3 with a small frequency when is small, as expected. Also note that all the testing results are quite robust to the values of (thus ). Table 2A shows how our method performs in terms of determining the number of groups. Specifically, the numbers in the main entries are the proportions of the replications in which the number of groups determined by our method is equal to, less than (5 in DGP 3), or greater than (5 in DGPs 1, 2, 4, and 5, 8 in DGP 3) a number. For DGPs 1, 2 and 5, our method determines the correct number of groups (3) with a large probability. For example, when = 40 and = 40 for DGP 5, the number of groups determined by our testing procedure equals the true number (3) with a probability of 0.99. For DGP 3 where the true number of groups is our method determines a large number of groups (greater than 8) with a probability of 1 when = 40 and = 40. DGP 4 represents a small deviation from a 3-group structure. When the sample size is small ( = 40 and = 10) with a high probability, the number of groups determined by our method is 2 or 3. When the sample size increases to = 40 and = 40 our method determines 4 groups with a probability larger than 0.25. These results are reasonable. Intuitively, if and are small, the data can only provide limited information on the underlying DGP, and it is reasonable to apply a small group structure to serve as a good approximation to the true model. As and become large, more information on the underlying DGP is revealed, and it is sensible to adopt a larger number of groups to approximate the true model more accurately. Table 2B shows the performance of SSP’s information criterion (IC). Comparing Tables 2A and 2B, we find that for DGPs 1, 2 and 5, the performances of our method and SSP’s IC are comparable. However, for DGPs 3 and 4, our method dominates SSP’s. Specifically, for DGP 3 where the number of groups is , SSP tend to determine a small number of groups (less than 6 even when = 40 and = 40), while our method determines that the number of groups is greater than 8 for most cases. For DGP 4, in large samples ( = 40 and = 40), SSP determine 3 groups with a high probability, while our method determines a relatively large number of groups. To examine the performance of the C-Lasso estimator following our method to determine the number of groups, we compare the following four estimators: () the common coefficient estimator, () the random coefficient estimator, () the post C-Lasso estimator with the number of groups determined by our method, and () the infeasible estimator where the true number of groups and group classification are known. We report the mean squared 19 errors (MSE), calculated as the average values of 1 °2 P ° ° 0° =1 °̂ − ° over 1000 replications, where ̂ denotes the estimator. Table 3 presents the results for DGP 1.6 It is clear that our estimator dominates the common coefficient estimator and the random coefficient estimator. However it performs worse than the infeasible estimator, especially when the sample size is small. This reflects the effect of classification errors in finite samples. 6 Empirical application: income and democracy The relationship between income and democracy has attracted much attention in empirical research; see, e.g., Lipset (1959), Barro (1999), AJRY, and BM. To the best of our knowledge, none of the existing studies allows for heterogeneity in the slope coefficients in their model specifications. As discussed in AJRY, “societies may embark on divergent political-economic development paths”. Thus, ignoring the heterogeneity in the slope coefficients may result in model misspecification and invalidate subsequent inferences. Hence, it is important to know whether the data support the assumption of homogeneous slope coefficients. If not, then we need to determine the number of heterogeneous groups and classify the countries using statistic methods. We apply our new method to study this important question. 6.1 Data and implementation We let be a measure of democracy for country in period and be the logarithm of its real GDP per capita. The measure of democracy and real GDP per capita are from the Freedom House and Penn World Tables, respectively.7 Note that the Freedom House measures of democracy ( ) are normalized to be between 0 and 1. We consider the specification with both individual and time fixed effects = 1 −1 + 2 −1 + + + and assume that ( 1 2 ) has a group structure to account for possible heterogeneity. In a closely related paper, BM consider a group structure in the interactive fixed effects and assume ( 1 2 ) is constant for all ’s.8 6 The results for other DGPs are available upon request. All the data are directly from AJRY: http://economics.mit.edu/faculty/acemoglu/data/ajry2008. 8 In BM’s supplementary material, they also consider the slope heterogeneity. Specifically, they consider the following two specifications (using our notation) 7 = 0 + + and = 0 + + + where is the group-specific slope coefficient, is the group-specific (time-varying) fixed effect, and 20 =1 (alternative) DGP 1 =2 (alternative) =3 (null) =1 (alternative) DGP 2 =2 (alternative) =3 (null) =1 (alternative) DGP 3 =2 (alternative) =3 (alternative) =1 (alternative) DGP 4 =2 (alternative) =3 (alternative) =1 (alternative) DGP 5 =2 (alternative) =3 (null) 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 40 80 40 Table 1: Empirical rejection frequency = 025 = 05 0.01 0.05 0.10 0.01 0.05 10 1 1 1 1 1 10 1 1 1 1 1 40 1 1 1 1 1 10 .13 .31 .43 .12 .30 10 .28 .53 .67 .28 .52 40 1 1 1 1 1 10 .000 .01 .04 .01 .03 10 .001 .01 .04 .003 .03 40 .003 .02 .04 .01 .03 10 1 1 1 1 1 10 1 1 1 1 1 40 1 1 1 1 1 10 .06 .21 .32 .06 .18 10 .16 .38 .54 .14 .36 40 1 1 1 1 1 10 .001 .01 .01 .001 .01 10 .000 .003 .01 .002 .01 40 .001 .01 .03 .01 .02 10 1 1 1 1 1 10 1 1 1 1 1 40 1 1 1 1 1 10 1 1 1 1 1 10 1 1 1 1 1 40 1 1 1 1 1 10 .98 1 1 .98 1 10 1 1 1 1 1 40 1 1 1 1 1 10 1 1 1 1 1 10 1 1 1 1 1 40 1 1 1 1 1 10 .12 .30 .44 .12 .28 10 .34 .60 .72 .32 .57 40 1 1 1 1 1 10 .001 .02 .04 .004 .02 10 .01 .05 .08 .02 .06 40 .26 .49 .62 .32 .54 10 1 1 1 1 1 10 1 1 1 1 1 40 1 1 1 1 1 10 .38 .61 .75 .36 .60 10 .72 .89 .94 .71 .89 40 1 1 1 1 1 10 .01 .04 .06 .01 .03 10 .01 .04 .07 .01 .05 40 .002 .02 .05 .003 .02 0.10 1 1 1 .41 .66 1 .06 .07 .07 1 1 1 .30 .51 1 .02 .02 .04 1 1 1 1 1 1 1 1 1 1 1 1 .42 .70 1 .05 .10 .67 1 1 1 .74 .94 1 .06 .09 .06 0.01 1 1 1 .12 .26 1 .01 .02 .02 1 1 1 .05 .14 1 .001 .001 .01 1 1 1 1 1 1 .99 1 1 1 1 1 .10 .31 1 .004 .02 .38 1 1 1 .37 .71 1 .01 .03 .01 = 075 0.05 0.10 1 1 1 1 1 1 .29 .41 .52 .65 1 1 .06 .10 .06 .12 .06 .12 1 1 1 1 1 1 .17 .29 .35 .50 1 1 .02 .04 .02 .05 .05 .09 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 .27 .41 .53 .69 1 1 .03 .08 .08 .15 .62 .74 1 1 1 1 1 1 .61 .74 .89 .94 1 1 .04 .07 .07 .13 .03 .07 Note: Numbers in the main entries are the rejection frequencies under the null or the alternative for three nominal levels: 0.01, 0.05 and 0.10. We consider three values of the constant in the tuning parameter : 0.25, 0.5 and 0.75. 21 DGP 1 DGP 2 DGP 3 DGP 4 DGP 5 Table 2A: Frequency of number of groups determined by our method =1 =2 =3 =4 =5 40 10 0 .57 .40 .03 0 = 025 80 10 0 .34 .63 .03 0 40 40 0 0 .99 .01 0 40 10 0 .59 .35 .05 .01 = 05 80 10 0 .34 .60 .05 .01 40 40 0 0 .98 .02 0 40 10 0 .59 .31 .07 .02 = 075 80 10 0 .35 .53 .08 .04 40 40 0 0 .96 .03 .01 =1 =2 =3 =4 =5 40 10 0 .68 .30 .01 0 = 025 80 10 0 .46 .52 .01 0 40 40 0 0 1 0 0 40 10 0 .71 .28 .02 0 = 05 80 10 0 .49 .49 .01 .01 40 40 0 0 .99 .01 0 40 10 0 .72 .25 .03 .01 = 075 80 10 0 .51 .45 .04 .01 40 40 0 0 .98 .02 0 5 =5 =6 =7 =8 40 10 .02 .03 .04 .05 .01 = 025 80 10 0 0 0 0 0 40 40 0 0 0 0 0 40 10 .01 .01 .01 .02 .01 = 05 80 10 0 0 0 0 0 40 40 0 0 0 0 0 40 10 0 0 .01 .01 .01 = 075 80 10 0 0 0 0 0 40 40 0 0 0 0 0 =1 =2 =3 =4 =5 40 10 0 .56 .39 .04 .01 = 025 80 10 0 .28 .64 .07 .01 40 40 0 0 .62 .26 .11 40 10 0 .58 .37 .04 .01 = 05 80 10 0 .30 .60 .09 .02 40 40 0 0 .58 .28 .13 40 10 0 .59 .33 .06 .02 = 075 80 10 0 .31 .54 .11 .03 40 40 0 0 .49 .34 .16 =1 =2 =3 =4 =5 40 10 0 .25 .69 .05 0 = 025 80 10 0 .06 .87 .06 .01 40 40 0 0 .99 .01 0 40 10 0 .26 .68 .05 0 = 05 80 10 0 .06 .85 .08 0 40 40 0 0 .99 .01 0 40 10 0 .27 .67 .06 .01 = 075 80 10 0 .06 .81 .12 .01 40 40 0 0 .99 .01 0 5 0 0 0 0 0 0 .01 .01 0 5 0 0 0 0 0 0 0 0 0 8 .86 1 1 .93 1 1 .97 1 1 5 0 0 .01 0 0 .01 0 .01 .01 5 0 0 0 0 0 0 0 0 0 Note: Numbers in the main entries are the proportions of the replications in which the determined number of groups is less than, equal to, or greater than a number. We consider three values of the constant in the tuning parameter : 0.25, 0.5 and 0.75. 22 Table 2B: Frequency of 40 10 = 025 80 10 40 40 40 10 DGP 1 = 05 80 10 40 40 40 10 = 075 80 10 40 40 40 10 = 025 80 10 40 40 40 10 DGP 2 = 05 80 10 40 40 40 10 = 075 80 10 40 40 40 10 = 025 80 10 40 40 40 10 DGP 3 = 05 80 10 40 40 40 10 = 075 80 10 40 40 40 10 = 025 80 10 40 40 40 10 DGP 4 = 05 80 10 40 40 40 10 = 075 80 10 40 40 40 10 = 025 80 10 40 40 40 10 DGP 5 = 05 80 10 40 40 40 10 = 075 80 10 40 40 number of groups determined by =1 =2 =3 0 .49 .50 0 .13 .81 0 0 1 0 .67 .33 0 .28 .68 0 0 1 0 .80 .20 0 .46 .52 0 0 1 =1 =2 =3 0 .55 .44 0 .17 .79 0 0 1 0 .70 .30 0 .31 .66 0 0 1 0 .81 .19 0 .51 .46 0 0 1 5 =5 =6 .96 .03 .01 .95 .05 0 .67 .21 .10 .99 .01 0 .98 .02 0 .81 .13 .05 .99 .01 0 .99 .01 0 .85 .11 .02 =1 =2 =3 0 .51 .46 0 .16 .74 0 0 1 0 .65 .34 0 .30 .65 0 0 1 0 .77 .22 0 .50 .45 0 0 .99 =1 =2 =3 0 .16 .79 0 .01 .89 0 0 1 0 .18 .75 0 .02 .83 0 0 1 0 .22 .70 0 .03 .70 0 0 1 the information criterion =4 =5 5 .01 0 0 .06 .01 0 0 0 0 .01 0 0 .04 0 0 0 0 0 0 0 0 .02 0 0 0 0 0 =4 =5 5 .01 0 0 .04 0 0 0 0 0 0 0 0 .02 .01 0 0 0 0 0 0 0 .03 0 0 0 0 0 =7 =8 8 0 0 0 0 0 0 .02 0 0 0 0 0 0 0 0 .01 0 0 0 0 0 0.00 0 0 .01 0 0 =4 =5 5 .03 0 0 .10 .01 0 0 0 0 .02 0 0 .05 .01 0 0 0 0 .01 0 0 .04 .01 0 .01 0 0 =4 =5 5 .06 0 0 .10 .01 0 0 0 0 .07 0 0 .16 0 0 0 0 0 .08 0 0 .27 0 0 0 0 0 Note: Numbers in the main entries are the proportions of the replications in which the determined number of groups is less than, equal to, or greater than a number. We consider three values of the constant in the tuning parameter : 0.25, 0.5 and 0.75. 23 40 80 40 10 10 40 Table 3: Comparison of MSE of various estimators, DGP 1 Common Random Post C-Lasso coefficient coefficient = 025 = 05 = 075 Infeasible 0.31 0.33 0.18 0.18 0.18 0.02 0.30 0.33 0.16 0.17 0.17 0.01 0.30 0.06 0.02 0.02 0.03 0.004 Note: Numbers in the main entries are the MSE of various estimators. For the Post C-Lasso estimators, we consider three values of the constant in the tuning parameter : 0.25, 0.5, and 0.75. Time period 1 2 3 4 5 6 7 8 Years 1961 1966 1971 1976 1981 1986 1991 1996 - 1965 1970 1975 1980 1985 1990 1995 2000 Table 4: Summary statistics ( = 74) : democracy : logarithm of real GDP per capita mean median s.d. mean median s.d. 0.56 0.54 0.26 7.69 7.66 0.81 0.37 0.33 0.33 7.83 7.76 0.85 0.34 0.33 0.31 7.93 7.96 0.89 0.42 0.33 0.32 8.03 8.02 0.93 0.46 0.33 0.35 8.06 8.07 0.94 0.51 0.50 0.34 8.10 8.16 1.01 0.54 0.50 0.33 8.14 8.23 1.10 0.60 0.67 0.33 - We use a balanced panel dataset similar to that in BM. The number of countries () is 74. The time index is = 1 8 where each period corresponds to a five-year interval over the period 1961-2000. For example, = 1 refers to years 1961-1965. Because the lagged is used as a regressor, the number of time periods ( ) is 7 i.e., = 2 8 The choice of countries is determined by data availability. In addition, we exclude the countries whose measures of democracy remain constant over the seven periods ( = 2 8). A list of the 74 countries can be found in Table 8. Table 4 presents the summary statistics. The implementation details of our method are the same as in the simulations. 6.2 Testing and estimation results We first test the hypothesis H0 (1) i.e., we test whether ( 1 2 ) is constant for all We soundly reject this hypothesis with a p-value being less than 0.001. This provides strong evidence that the slope coefficients are not homogeneous. We then test the hypothesis H0 (2) is the country-specific (time-invariant) fixed effect. Their model specification is more general than ours, as they allow time-varying group-specific fixed effects They report the results for the number of groups being 4 For their first specification, they find that income effect is statistically significant, while for their second specification, they find that income effect is not statistically significant. Our empirical results are somehow different from theirs due to the different model specifications adopted. 24 for three values of the tuning parameter = · 2 · −13 ( = 025 05 and 075). We reject this hypothesis at the 5% level with p-values being 0.03, 0.03 and 0.02 for = 025 0.5 and 0.75, respectively. We continue to test H0 (3) and find that the p-values are 0.42, 0.20 and 0.14 for = 025 0.5 and 0.75, respectively. Considering that the p-values are above or close to the recommended nominal level 1 (0.14), we stop the testing procedure and conclude that the number of groups is 3. Table 5 presents all the testing results and Table 8 shows the country membership of the three groups. To take into account of the issue of multiple testing, we also report Holm’s (1979) adjusted -values in the last row in Table 5,9 which also lend strong support to the conclusion of three groups in the data. We also consider SSP’s IC (see equation (5.1) above) to determine the number of groups. Again, we try three choices of as above with = 025 05 and 075 When = 025 and 0.5, the IC determines 3 groups, as shown in Figure 1. When = 075 the IC determines 2 groups.10 Given that the majority of our results determine 3 groups, we conclude that = 3 Table 6 presents the estimation results. We report both C-Lasso estimates and post C-Lasso estimates. C-Lasso estimates are defined in Section 4.1. The post C-Lasso is implemented as in (4.5). Both estimates are bias-corrected and the standard errors are obtained using the asymptotic theory developed in SSP. The C-Lasso and post C-Lasso estimation results are similar for different values of = 025 0.5 and 0.75. In the following discussion, we focus on the post C-Lasso estimates with = 05 It is clear that the estimated slope coefficients exhibit substantial heterogeneity. For 1 the estimates for the three groups are 0.26, -0.16 and -0.24. All of them are significant at the 1% level. It is interesting to note that not only the magnitudes but also the signs of estimates are different among the three groups. For Group 1, income has a positive effect on democracy, while for Groups 2 and 3, the effects are negative with different magnitudes. For 2 the three group estimates are 0.10, -0.24 and 0.41. The first estimate is not significant, while the second and third estimates are significant at the 1% level. We also present the point estimates of the cumulative income effect (CIE), which is defined as 1 (1 − 2 ) The estimates of CIE for the three groups are 0.28, -0.13 and -0.40, which imply that a 10% increase in income per capita is associated with increases of 0.028, -0.013 and -0.040 in the measures of democracy, respectively. Note that if we assume that 1 and 2 are homogeneous across then the common estimates of 1 and 2 are -0.02 and 0.27 with t-statistics being -0.45 and 5.95, respectively. 9 Suppose that we have tested ∗ individual hypotheses, and we order the individual p-values from the smallest to the largest as (1) ≤ (2) ≤ ≤ ( ∗ ) with their corresponding null hypotheses labeled accordingly as H0(1) H0(2) H0( ∗ ) We calculate the step-down Holm adjusted -values for testing H0() as ¡ ¢ adjusted-() = min ( ∗ − + 1) () 1 10 The values of IC for 2 groups and 3 groups are -3.4078 and -3.4031, respectively, which are actually quite close. 25 Null hypothesis Statistics p-values Holm adjusted p-values Table 5: Test statistics = 025 = 05 =1 =2 =3 =1 =2 =3 3.57 1.93 0.20 3.57 1.93 0.83 0.0002 0.03 0.42 0.0002 0.03 0.20 0.0006 0.05 0.42 0.0006 0.05 0.20 =1 3.57 0.0002 0.0006 = 075 =2 =3 2.06 1.07 0.02 0.14 0.04 0.14 Note: Numbers in the main entries are the test statistics, p-values, and Holm adjusted p-values for three values of the constant in the tuning parameter : 0.25, 0.5 and 0.75. This suggests that the effect of income on democracy is not statistically significant. This finding is consistent with that in AJRY. Nevertheless, the insignificance could be due to the model misspecification which ignores slope heterogeneity. Note that the common estimate falls in the middle of the group estimates. Here all group estimates are significant, but with different signs. Therefore the common estimate, which can be thought of as a certain weighted average of the group estimates, could be close to zero and statistically insignificant. To understand the heterogeneity in the data intuitively, we select three countries (Malaysia, Indonesia, and Nepal) and show their time-series data in Table 7. We simply calculate the correlations between the dependent variable and the key explanatory variable −1 Even the simple correlations exhibit substantial heterogeneity with the values being -0.86, 0.07 and 0.66. This suggests that it is implausible for the slope coefficients to be the same for all countries even without performing any sophisticated analysis. This application shows that ignoring the heterogeneity in the slope coefficients can mask the true underlying relationship among economic variables. 6.3 Explaining the group pattern As discussed above, the group estimates of 1 and 2 show substantial heterogeneity, though most of them are statistically significant at the 5% level. So far, our classification of the groups is completely statistical and does not use any a priori information. One natural question is how to explain the group membership. Apparently, the group membership shown in Table 8 does not match the countries’ geographic locations. We further investigate the group pattern using a cross-sectional multinomial logit model. We let the dependent variable be group membership, which takes one of three values: 1, 2 or 3.11 The explanatory variables include () a measure of constraints on the executive at independence, () the 500-year change in income per capita over 1500-2000, () the 500-year change in democracy over 1500-2000, () independence year/100, () initial education level 11 We only report the results for = 05. The results for = 025 and 075 are similar and available upon request. 26 Information Criteria (IC) for different numbers of groups −3.2 −3.25 −3.3 −3.35 IC with c=0.25 IC with c=0.5 IC with c=0.75 −3.4 −3.45 1 2 3 4 5 Number of groups 6 7 8 Figure 1: Information criteria Note: Values in the figure are the information criteria for different number of groups for three values of the constant in the tuning parameter : 0.25, 0.5 and 0.75. common estimation Group C-Lasso Group = 025 Group Group Post Group C-Lasso Group Group C-Lasso Group = 05 Group Group Post Group C-Lasso Group Group C-Lasso Group = 075 Group Group Post Group C-Lasso Group 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 Table 6: Estimation results 1 2 estimates s.e. t-stat estimates s.e. -0.02 0.038 -0.45 0.27 0.046 0.23 0.050 4.62 0.11 0.058 -0.21 0.032 -6.65 -0.31 0.055 -0.24 0.102 -2.36 0.53 0.052 0.25 0.046 5.38 0.09 0.058 -0.19 0.033 -5.67 -0.24 0.041 -0.28 0.095 -2.97 0.49 0.049 0.27 0.042 6.38 0.11 0.065 -0.19 0.031 -5.98 -0.33 0.067 -0.21 0.084 -2.56 0.47 0.064 0.26 0.045 5.66 0.10 0.064 -0.16 0.033 -4.80 -0.24 0.047 -0.24 0.078 -3.02 0.41 0.061 0.29 0.043 6.84 0.03 0.067 -0.19 0.032 -5.90 -0.29 0.056 -0.17 0.076 -2.25 0.49 0.058 0.27 0.047 5.71 0.04 0.067 -0.16 0.033 -4.76 -0.24 0.047 -0.19 0.070 -2.72 0.43 0.054 t-stat 5.95 1.99 -5.67 10.28 1.59 -5.90 9.99 1.71 -4.91 7.40 1.51 -5.05 6.78 0.44 -5.21 8.52 0.55 -5.00 8.06 CIE estimates -0.02 0.26 -0.16 -0.52 0.27 -0.15 -0.55 0.30 -0.14 -0.40 0.28 -0.13 -0.40 0.30 -0.14 -0.34 0.28 -0.13 -0.34 Note: Common estimation assumes one group. C-Lasso and Post C-Lasso estimates are based on three groups. We consider three values of the constant in the tuning parameter : 0.25, 0.5 and 0.75. CIE stands for cumulative income effect, which is defined as ( 1 (1 − 2 )) 27 Table 7: Correlation between and Malaysia Time period 1 0.80 7.82 2 0.83 7.97 3 0.67 8.19 4 0.67 8.49 5 0.67 8.60 6 0.33 8.78 7 0.50 9.07 8 0.33 9.20 Correlation between -0.86 and −1 −1 for selected countries Indonesia Nepal 0.10 6.80 0.29 6.65 0.33 6.99 0.17 6.70 0.33 7.26 0.17 6.79 0.33 7.55 0.67 6.76 0.33 7.73 0.67 6.91 0.17 7.96 0.50 6.99 0 8.20 0.67 7.13 0.67 8.20 0.67 7.29 0.07 0.66 in 1965, () initial income level in 1965, and () initial democracy level in 1965. Acemoglu, Johnson and Robinson (2005) argue that () is an important determinant of democracy. () and () present long-run changes in income and democracy levels, respectively. () measures how recently a country became independent. () () and () are the initial key economic variables. All the data are taken directly from AJRY. Table 9 provides summary statistics for each of the three groups. The sample averages of () the constraints on the executive at independence are clearly substantially different among the three groups. For this variable, the average value of Group 1 is only about half of that of Group 3. The variable of () the 500-year change in income per capita differs noticeably among the three groups. Group 2 has the largest sample average of 2.26, while Group 3 has the smallest value of 1.50. Table 10 presents the multinomial logit regression results for various model specifications. We choose Group 3 as the base group. Compared with Group 3, at the 5% level, a higher value of () the constraints on the executive at independence leads to a reduced likelihood of being in Group 1. On the other hand, a higher value of () the 500-year change in income per capita leads to a higher likelihood of being in Group 2. For the case = 05 reported here, the variable () (independence year/100) is also significant at the 5% level. However, this result is not robust to other choices of In summary, we find that the constraints on the executive at independence and the long-run economic growth are important determinants of our group pattern. 7 Conclusion We develop a data-driven testing procedure to determine the number of groups in a latent group panel structure proposed in SSP. The procedure is based on conducting hypothesis 28 Table 8: Classification of countries Group 1 (“positive 1 and insignificant 2 ”) (1 Cyprus Algeria Ecuador Greece Jordan Korea Rep. Panama Philippines Portugal Taiwan Uruguay Venezuela RB Brazil Ghana Nepal Thailand Zambia Group 2 (“negative 1 and negative 2 ”) (2 = 16) Congo Rep. Dominican Republic Finland Gabon Japan Luxembourg Mexico Nigeria El Salvador Togo Tunisia Turkey Argentina Indonesia Singapore Uganda Burundi Chile Guinea Iran Morocco Niger Sierra Leone Tanzania Variables constraint growth democ indcent edu65 inc65 dem65 = 21) Spain Malawi Rwanda Congo Dem. Rep Group Benin China Guatemala Israel Madagascar Nicaragua Sweden South Africa 3 (“negative 1 and positive 2 ”) (3 = 37) Burkina Faso Bolivia Central African Republic Cameroon Colombia Egypt Arab Rep Guyana Honduras India Jamaica Kenya Sri Lanka Mali Mauritania Malaysia Peru Paraguay Romania Syrian Arab Republic Chad Trinidad and Tobago Table 9: Summary statistics by group Group 1 Group 2 Variable description mean s.d. mean s.d. constraints on the executive 0.15 0.24 0.33 0.30 at independence 500 year income per 2.00 1.18 2.26 1.09 capita change 500 year democracy change 0.80 0.23 0.66 0.30 year of independence/100 18.91 0.73 18.95 0.71 education level in 1965 2.78 1.52 2.81 2.17 logarithm of real GDP 7.72 0.82 7.93 0.97 per capita in 1965 measure of democracy in 1965 0.51 0.27 0.63 0.25 29 Group 3 mean s.d. 0.39 0.36 1.50 0.95 0.65 19.10 2.62 7.58 0.29 0.67 1.93 0.73 0.55 0.26 constraints growth demco indcent edu65 inc65 Table 10: Determinants of the group pattern Group 1 -3.01** -3.26*** -3.04*** -5.20*** -5.55*** -5.92*** (1.20) (1.18) (1.13) (1.53) (1.68) (1.81) 0.60** 0.58* 0.96** 1.16* 1.67** (0.29) (0.35) (0.41) (0.61) (0.72) 1.65 3.94** 4.95* 4.69* (1.40) (1.85) (2.66) (2.70) 1.77** 2.12** 1.92** (0.77) (0.89) (0.94) -0.34 -0.14 (0.34) (0.37) -1.41 (0.93) dem65 constraints growth demco indcent -0.50 (0.91) -1.03 (0.95) 0.72** (0.32) Group 2 -0.99 -1.39 (0.98) (1.29) 0.89** 1.02** (0.37) (0.41) -0.98 -0.71 (1.22) (1.42) 0.35 (0.70) edu65 inc65 dem65 -1.58 (1.62) 1.19** (0.60) -0.67 (2.27) 0.24 (0.92) -0.29 (0.35) -1.99 (1.74) 1.76** (0.77) -0.73 (2.39) 0.06 (0.99) -0.09 (0.38) -1.40 (0.99) -5.99*** (1.89) 1.69** (0.74) 4.48* (2.66) 1.97** (0.96) -0.08 (0.41) -1.36 (0.96) -0.53 (2.02) -2.44 (1.83) 1.92** (0.80) -0.56 (2.32) 0.10 (1.00) -0.25 (0.42) -1.62 (1.05) 1.97 (2.17) Note: *, ** and *** denote significance at the 10%, 5% and 1% levels, respectively. The results are based on a multinomial logit regression where Group 3 is taken as the reference group. The dependent variable is the group membership. The standard errors are calculated without taking into account the fact that the dependent variables are estimated. 30 testing on the model specifications. The test is a residual-based LM-type test and is asymptotically normally distributed under the null. We apply our new method to study the relationship between income and democracy and find strong evidence that the slope coefficients are heterogeneous and form three distinct groups. Further, we find that the constraints on the executive at independence and long-run economic growth are important determinants of the group pattern. There are several interesting topics for further research. Here we apply our testing procedure to determine the number of groups for slope coefficients. The same idea can be applied to other group structures, such as those considered in BM where fixed effects have a grouped pattern. We may also extend our methods to non-linear panel data models such as discrete choice models. REFERENCES Acemoglu, D., S. Johnson, and J. Robinson (2005), “The rise of Europe: Atlantic trade, institutional change, and economic growth.” American Economic Review, 95, 546-79. Acemoglu, D., S. Johnson, J. Robinson, and P. Yared (2008), “Income and democracy.” American Economic Review, 98, 808-842. Ando, T. and J. Bai (2016), “Panel data models with grouped factor structure under unknown group membership. Journal of Applied Econometrics, 31, 163-191. Barro, R. J. (1999), “Determinants of democracy.” Journal of Political Economy, 107, 158-183. Bester, C. A. and C. B. Hansen (2016). “Grouped effects estimators in fixed effects models.” Journal of Econometrics, 190, 197-208. Bonhomme, S. and E. Manresa (2015), “Grouped patterns of heterogeneity in panel data.” Econometrica, 83, 1147-1184. Deb, P. and P. K. Trivedi (2013), “Finite mixture for panels with fixed effects.” Journal of Econometric Methods, 2, 35-51. Hahn, J. and G. Kuersteiner (2011), “Reduction for dynamic nonlinear panel models with fixed effects.” Econometric Theory, 27, 1152-1191. Hahn, J. and H. R. Moon (2010), “Panel data models with finite number of multiple equilibria.” Econometric Theory, 26, 863-881. Holm, S. (1979), “A simple sequentially rejective multiple test procedure.” Scandinavian Journal of Statistics, 6, 65-70. Hsiao, C. (2014), Analysis of Panel Data. Cambridge University Press, Cambridge. 31 Hsiao, C. and A. K. Tahmiscioglu (1997), “A panel analysis of liquidity constraints and firm investment.” Journal of the American Statistical Association, 92, 455-465. Leeb, H. and P. M. Pötscher (2005), “Model selection and inference: facts and fiction.” Econometric Theory, 21, 21-59. Leeb, H. and P. M. Pötscher (2008), “Sparse estimators and the oracle property, or the return of Hodges’ estimator.” Journal of Econometrics, 142, 201-211. Leeb, H. and P. M. Pötscher (2009), “On the distribution of penalized maximum likelihood estimators: the LASSO, SCAD, and thresholding.” Journal of Multivariate Analysis, 100, 2065-2082. Lin, C-C. and S. Ng (2012), “Estimation of panel data models with parameter heterogeneity when group membership is unknown.” Journal of Econometric Methods, 1, 42-55. Lipset, S. M. (1959), “Some social requisites of democracy: economic development and political legitimacy.” American Political Science Review, 53, 69-105. Onatski, A. (2009), “Testing hypotheses about the number of factors in large factor models.” Econometrica, 77, 1447-1479. Pesaran, M. H., R. Smith, R., and K. S. Im (1996), “Dynamic linear models for heterogeneous panels.” In: L. Matyas and P. Sevestre (Eds.), The Econometrics of Panel Data: A Handbook of the Theory with Applications, second edition, pp.145-195, Kluwer Academic Publishers, Dordrecht. Pesaran, M. H. and T. Yamagata (2008), “Testing slope homogeneity in large panels.” Journal of Econometrics, 142, 50-93. Phillips, P. C. B., and D. Sul (2003), “Dynamic panel estimation and homogeneity testing under cross section dependence.” Econometrics Journal, 6, 217-259. Sarafidis, V. and N. Weber (2015), “A partially heterogeneous framework for analyzing panel data.” Oxford Bulletin of Economics and Statistics, 77, 274-296. Schneider, U. and P. M. Pötscher (2009), “On the distribution of the adaptive LASSO estimator.” Journal of Statistical Planning and Inference, 139, 2775-2790. Su, L. and Q. Chen(2013), “Testing homogeneity in panel data models with interactive fixed effects.” Econometric Theory, 29, 1079-1135. Su, L. and G. Ju (2017), “Identifying latent grouped patterns in panel data models with interactive fixed effects.” Journal of Econometrics, forthcoming. 32 Su, L., Z. Shi, and P. C. B. Phillips (2016), “Identifying latent structures in panel.” Econometrica, 84, 2215-2264. Sun, Y. (2005), “Estimation and inference in panel structure models.” Working Paper, Department of Economics, UCSD. 33