A Wavelet Test to Identify Structural Changes at Unknown Locations in Independently Distributed Processes

June 2011. This version: December 2012

Abstract

We propose a powerful wavelet method for detecting structural breaks in the mean of a process. The wavelet transformation decomposes the variance of a process into its additive low- and high-frequency components. If an independently distributed process is decomposed through a one-scale wavelet transformation, the variances of the wavelet (high-frequency) and scaling (low-frequency) coefficients are assigned equal weights. If there is a structural change in the mean, the sum of the squared scaling coefficients absorbs more variation, leading to unequal weights for the variances of the wavelet and scaling coefficients. We use this feature of the wavelet decomposition to design a statistical test for changes in the mean of an independently distributed process. We establish the limiting null distribution of our test and demonstrate that it has good empirical size and substantive power relative to the existing alternatives.

JEL Classification: C12, C22

Keywords: Structural change tests, structural break tests, wavelets, maximum overlap discrete wavelet transformation.

1. Introduction

The primary goal of this paper is to present a test that can identify a structural break in the mean of an independently distributed process at an unknown location. A wavelet decomposition additively splits data into local weighted averages (scaling coefficients) and local weighted differences (wavelet coefficients). If the process has a constant mean, the variances of the wavelet and scaling coefficients have equal magnitude. If, however, there is a change in the mean of the process, the variances of the wavelet and scaling coefficients diverge, with a greater allocation to the variance of the scaling coefficients.
We use this feature of the wavelet decomposition to design a statistical test to identify changes in the mean of an independently distributed process.[1] It is through these weighted local differences in moving windows that we construct our statistical test of no structural break under the null hypothesis. We derive the test's null distribution and demonstrate that it is asymptotically normally distributed.

The wavelet transformation operates in local time windows, and the length of the moving window is determined by the length of the wavelet filter. If the filter length is two, as with the Haar filter, the localized differences amount to differences between consecutive pairs of observations. If the filter length is four, the localized difference is the weighted difference between the last two and first two observations in a window of four observations. A longer filter captures the local structural features of the data more accurately, but boundary issues may arise. In our test, we use the Haar filter, which has length two and offers a good compromise between localization in a local time window and cost in terms of boundary treatments.

Structural breaks can be permanent or temporary in nature. If a structural break is permanent, there is a change of indefinite duration in the mean or variance. In a temporary break, the mean or variance shifts from its null value but eventually reverts to it. Whether temporary or permanent, such breaks may occur abruptly or gradually. To capture these possibilities, we use Monte Carlo simulations in which break locations are modeled through sinusoidals, allowing for abrupt as well as gradual structural breaks. We primarily focus on smooth, multiple structural breaks for two reasons. First, most economic and financial data exhibit gradual structural changes in a time window; abrupt structural changes are the exception rather than the rule.
Second, our framework is all-encompassing and allows for abrupt changes.[2] Our Monte Carlo simulations indicate minimal empirical size distortions relative to nominal sizes and significant power improvements over the existing structural break tests.

[1] Although we primarily focus on structural breaks in the mean of an independently distributed time series, our framework can be generalized to structural breaks in the variance and to structural breaks in stationary and nonstationary time series.
[2] The usefulness of modeling structural breaks with this framework was emphasized previously by Ludlow and Enders (2000), Becker et al. (2004), Becker et al. (2006), Ashley and Patterson (2010) and Stengos and Yazgan (2012a,b).

The literature on structural change tests is extensive. Chow (1960) derived F-tests for structural breaks with known break points. Brown et al. (1975) developed cumulative sum (CUSUM) and CUSUM-squared tests that are also applicable when the time of the break is unknown. More recently, contributions by Ploberger et al. (1989), Hansen (2002), Andrews (1993), Inclan and Tiao (1994), Andrews et al. (1996) and Chu et al. (1996) have extended tests for the presence of breaks to account for heteroskedasticity. Methods for estimating the number and timing of multiple break points, as in Bai and Perron (1998, 2003a) and Altissimo and Corradi (2003), have also been developed.[3] These tests are formulated in the context of linear regression models, and they use the estimated residuals to detect departures of parameters from constancy.

The mechanism of our wavelet framework can be described as follows. The wavelet decomposition yields localized weighted differences (wavelet coefficients) that are equal in number to the number of data points.[4] We square each wavelet coefficient to obtain its magnitude and calculate the sample average of the squared coefficients. Under the null of no structural change, we expect this sample average to equal one-half of the variance of the process.[5] When there are one or more structural breaks, however, the sample average of the squared wavelet coefficients should be significantly smaller than its null value. We center and standardize the sample average of the squared wavelet coefficients to obtain the null distribution under no structural change.

[3] Bai and Perron (2003a,b) provide versions of these tests that correct for heteroskedasticity and serial correlation.
[4] In this paper, the maximum overlap discrete wavelet transformation (MODWT) is used.
[5] The average of the squared wavelet coefficients and the average of the squared scaling coefficients sum to the overall variance in a one-level wavelet decomposition.

Alternative tests, such as the CUSUM and moving sum (MOSUM) tests, approach the problem through cumulative sums of either recursive residuals (i.e., one-step-ahead prediction errors) or OLS residuals. They start from a given window, calculate the sequence of cumulated sums by increasing the window length one step at a time,[6] and reject the null of no structural change when the properly standardized supremum of these cumulated sums crosses the critical lines or when the maximum cumulated sum is sufficiently large.[7] In a Sup-F type test,[8] one splits the data into two or more blocks and calculates the unrestricted and restricted residual sums of squares for all possible sub-samples of a given length in order to find the supremum of an F-test. In the presence of multiple structural breaks, MOSUM- and CUSUM-type tests yield smaller maximum values of their cumulated sums of residuals (CSR). A similar argument applies to the Sup-F test, where the difference between the sums of squared residuals (SSRs) of the restricted and unrestricted models becomes small in the presence of multiple structural breaks. These tests rely on large CSRs (or large differences between restricted and unrestricted SSRs) to detect structural change, and small CSRs do not yield high power. Small CSRs occur because the OLS estimation passes through the average of multiple structural breaks (in particular, when such breaks are reverting), which yields smaller residuals that obscure the structural change locations. Our test, conversely, always operates in a local time window with excellent frequency localization, where the imbalance between squared wavelet and scaling coefficients is preserved, so the power of our test is not compromised in the presence of multiple structural breaks.

[6] The MOSUM test uses moving windows but keeps their size constant.
[7] While the CUSUMs of the recursive residuals, properly standardized, have a distribution that tends to a standard Wiener process (Sen (1982) and Kramer et al. (1988)), Ploberger et al. (1989) show that the OLS-based CUSUMs have a distribution that tends to a Brownian bridge. Because the OLS residuals sum to zero when there is an intercept in the regression, their cumulated sum cannot be expected to drift off after a structural change, as often occurs with recursive residuals, which provides the rationale for the standard CUSUM test. No matter how large a structural shift has occurred, the cumulated OLS residuals will eventually return to the origin; Ploberger et al. (1989) therefore provide critical lines that are parallel to the horizontal axis of the plot of cumulated residuals, unlike the positively and negatively sloped critical lines of the conventional CUSUM test.
[8] The Sup-F test for structural breaks was proposed by Andrews (1993) and generalized by Bai and Perron (1998).

Several recent papers have successfully demonstrated the usefulness of wavelets in an econometric hypothesis-testing framework. Fan and Gençay (2010) proposed a unified wavelet spectral approach to unit root testing by providing a spectral interpretation of the existing Von Neumann unit root tests. Xue et al. (2010) proposed wavelet-based jump tests to detect jump arrival times in high-frequency financial time series. These wavelet-based unit root, cointegration and jump tests have desirable empirical sizes and higher power relative to the existing tests. Gençay and Gradojevic (2011) utilized wavelets for errors-in-variables estimation.

The outline of this paper is as follows. Section 2 introduces the wavelet-based structural change test and its limiting null distribution. The Monte Carlo simulations are described in Section 3. Our conclusions are presented at the end of the paper.

2. The Wavelet test for structural change

Let a time series {y_t}, t = 1, ..., T, evolve according to the following data generation process (DGP):

    y_t = µ_t + ε_t                                                        (1)

where the ε_t are normally, identically and independently distributed with mean zero and variance σ² for t = 1, ..., T. The structural changes in the mean of y_t occurring at (unknown) dates t_1, t_2, ..., t_{z-1} can be formulated as follows:

    µ_t = µ_1  for 1 ≤ t ≤ t_1,
          µ_2  for t_1 < t ≤ t_2,
          ...
          µ_z  for t_{z-1} < t ≤ T.                                        (2)

Under the null hypothesis, there is no structural change in the mean: H_0: µ_1 = µ_2 = ... = µ_z = µ. Under the alternative hypothesis, H_1, there may be one or more breaks in the mean[9] at unknown locations, so that there exists at least one i ∈ {1, ..., z} with µ_i ≠ µ.

2.1. Wavelet and scaling coefficients

Consider the unit-scale Haar maximum overlap discrete wavelet transformation (MODWT) of {y_t}, where T is the number of observations. The wavelet and scaling coefficients for this transformation are given by

    W_t = (1/2)(y_t − y_{t−1}),   (t = 1, 2, ..., T; mod T)                (3)
    V_t = (1/2)(y_t + y_{t−1}),   (t = 1, 2, ..., T; mod T)                (4)

where we assume a circular boundary condition: as we approach the end of the sample, if not enough data are available for a given filter length, we use data from the beginning of the sample to complete the analysis. This type of boundary treatment is innocuous here, as we use a Haar filter with two coefficients and the data are assumed to be stationary.[10]

[9] We use the terminology 'structural change in the mean' and 'break in the mean' interchangeably.
[10] The circular boundary treatment may be problematic for trending or nonstationary data processes: using data from the beginning of the sample to complete the analysis may introduce a spurious structural change due to the change in level between the beginning and the end of the sample. A longer filter requires more boundary coefficients, and such an analysis is more likely to be affected by them. However, longer filters may have better localization properties and may be more powerful for identifying local structural variations in the data.

The wavelet coefficients, {W_t}, capture the behavior of {y_t} in the high-frequency band [1/2, 1], while the scaling coefficients, {V_t}, capture the behavior of {y_t} in the low-frequency band [0, 1/2]. Accordingly, the variance (energy) of {y_t} is given by the sum of the energies of {W_t} and {V_t}:

    Σ_{t=1}^{T} y_t² = Σ_{t=1}^{T} W_t² + Σ_{t=1}^{T} V_t².

2.2. Statistical properties of wavelet tests

We propose a test statistic based on the average of the squared wavelet coefficients, namely,

    δ²_m = (1/m) Σ_{t=j}^{j+m−1} W_t²                                      (5)

for j ∈ {1, 2, ..., T − m + 1}, where m is an arbitrary window length used to test for structural changes. In Equation (5), δ²_m is the average of the squared wavelet coefficients over an interval of length m. When m = T, δ²_T is the average of the squared wavelet coefficients over the whole sample. In the remainder of this paper, we explore the statistical properties of δ²_T under the null hypothesis µ_1 = µ_2 = ... = µ_z = µ of Equation (2). The following proposition establishes the expected value and variance of δ²_T under the null hypothesis of no structural change.

Proposition 1. Under H_0, E{δ²_T} = σ²/2 and Var(δ²_T) ≅ 3σ⁴/(4T) for large T.

Proof. The proof is in the Appendix.
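The decomposition in Equations (3)-(4) and the null expectation of Proposition 1 can be checked numerically. The following Python sketch (a minimal illustration, not the authors' code; the function name `haar_modwt` is ours) computes the unit-scale Haar MODWT with a circular boundary, verifies the energy identity, and shows that the average squared wavelet coefficient is close to σ²/2 under a constant mean.

```python
import numpy as np

def haar_modwt(y):
    """Unit-scale Haar MODWT, Equations (3)-(4), with a circular boundary:
    W_t = (y_t - y_{t-1})/2, V_t = (y_t + y_{t-1})/2, indices taken mod T."""
    y_lag = np.roll(y, 1)                    # y_{t-1}, circularly wrapped
    return 0.5 * (y - y_lag), 0.5 * (y + y_lag)

rng = np.random.default_rng(0)
sigma = 2.0
y = sigma * rng.standard_normal(100_000)     # H0: constant (zero) mean

W, V = haar_modwt(y)

# Energy decomposition: sum y_t^2 = sum W_t^2 + sum V_t^2
print(np.isclose(np.sum(y**2), np.sum(W**2) + np.sum(V**2)))   # True

# delta_T^2, the average squared wavelet coefficient, should be near
# sigma^2 / 2 = 2.0 under H0 (Proposition 1)
print(np.mean(W**2))
```

The energy identity holds exactly (up to floating-point error) for any series, while the σ²/2 result is a large-sample expectation.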
Let s2 = (1/T ) PT t=1 (yt − ȳ)2 be a consistent estimator of σ 2 . Accordingly, we center δT2 and work with δT2 − 21 s2 , which has zero expectation under H0 . In the following proposition, we derive the variance of δT2 − 12 s2 . s2 σ4 2 Proposition 2. Under H0 , Var δT − = e for large T . 2 4T Proof. The proof is in the Appendix. to the change in level between the beginning and the end of the sample. A longer filter will require more coefficients, and it is more likely that such an analysis will be affected by the boundary coefficients. However, longer filters may have better localization properties and may be more powerful for identifying the local structural variations in the data. 5 s2 by dividing it by its standard deviation, we propose the test statistic GY OW . 2 s2 2 2 δT − √ δ 2 = T 2 T2 − 1 (6) = p s s4 /4T Normalizing δT2 − GY OW The asymptotic distribution of GY OW is given in the following proposition: (d) (d) Proposition 3. Under H0 , GY OW −→ N (0, 1) as T → ∞ where −→ denotes convergence in the distribution. Proof. The proof is in the Appendix. The following proposition states the expected value of the test statistics under the alternative hypothesis. Proposition 4. Under H1 , assuming a single break, let t1 be the last point after which the process yt assumes the new mean µ2 , and let q = t1 /T be the fraction of the data with µ1 where q ∈ { T1 , T2 , . . . T T−1 }. Then, |µ1 − µ2 |2 + √ 2T E {GY OW } = T − 1 . 2 + q(1 − q) |µ − µ |2 σ 1 2 σ2 Proof. The proof is in the Appendix. Corollary 4.1. For a given break location q and sample size T , E {GY OW } → 0 as |µ1 − µ2 | → 0. As the break disappears, the test statistics return its value under the null as expected. For all q ∈ { T1 , T2 , . . . , T T−1 } and T ∈ {3, 4, . . .}, q(1 − q) > 1/(2T ), the quotient on the right-hand-side of Proposition 4 is always smaller than the denominator, and, therefore, E {GY OW } < 0. Hence, Corollary 4.2. 
For a given break with location q and size |µ_1 − µ_2|, E{GYOW} → −∞ as T → ∞ and E{GYOW} ∈ (−∞, 0) for T > 2.

Corollary 4.3. For a given break with location q and sample size T, as |µ_1 − µ_2| → ∞, E{GYOW} decreases and approaches −√T [1 − 1/(2T q(1 − q))].

Therefore, our test operates in (−∞, 0) and its power increases with T for a given location and break size. In Figure 1, we illustrate the behavior of the expected value of the test statistic under the alternative for given break sizes. Panel (a) illustrates the case of a small break, whereas panel (b) refers to a larger break. Given the size of the break, an increase in q or T carries E{GYOW} dramatically away from zero into the negative region. As T and q increase, a larger break size corresponds to a higher rate of decrease in the value of the test statistic, indicating that power is higher when the break size is larger. The behavior of the test statistic is symmetrical around q = 0.5, where the power is maximal for given values of T and |µ_1 − µ_2|.

[Insert Figure 1 here]

In Figure 2, we depict the test statistic for a given location of the break. Panel (a) shows the case of small q, i.e., a break at the beginning of the data, whereas panel (b) illustrates a break at the center of the data. As T and |µ_1 − µ_2| increase, E{GYOW} decreases and approaches −√T for both values of q shown in Figure 2. However, the expected value of our test statistic decreases faster when q = 0.5 than when q = 0.05, for given magnitudes of |µ_1 − µ_2| and T, as shown in panel (b). The behavior of the test statistic is symmetrical around q = 0.5 in this case too, as established previously (Figure 1).

[Insert Figure 2 here]

In the following proposition, we generalize Proposition 4 to multiple breaks.

Proposition 5.
Under H_1, suppose that there are z mean breaks. Let t_i, i ∈ {0, 1, ..., z}, with t_0 = 0 and t_z = T, be the last points after which the process y_t assumes the new mean µ_{i+1} (for i < z), and let q_i = (t_i − t_{i−1})/T, i ∈ {1, 2, ..., z}, be the fractions of the data with constant means. Then,

    E{GYOW} = √T [ (σ² + (2T)^{−1} Σ_{i=0}^{z−1} |µ_{i+1} − µ_i|²) / (σ² + Σ_{i=1}^{z} q_i µ_i² − Σ_{i=1}^{z} Σ_{j=1}^{z} q_i q_j µ_i µ_j) − 1 ].

Proof. The proof is in the Appendix.

As in the case of a single break, the break locations t_i and the distribution of the µ_i determine the expected value of GYOW under the alternative. Suppose that one mean, µ_j, differs from the rest, so that µ_l = µ for l ≠ j and µ_j = µ + Δ for a non-zero Δ. Then

    Σ_{i=1}^{z} q_i µ_i² = µ² Σ_{i=1}^{z} q_i + 2 q_j µ Δ + q_j Δ²
                         = µ² + 2 q_j µ Δ + q_j Δ²
                         > µ² + 2 q_j µ Δ + q_j² Δ²
                         = (µ + q_j Δ)²
                         = (Σ_{i=1}^{z} µ q_i + q_j Δ)²
                         = (Σ_{i=1}^{z} µ_i q_i)²
                         = Σ_{i=1}^{z} Σ_{j=1}^{z} q_i µ_i q_j µ_j

because 0 < q_j < 1 by assumption. Hence, the test statistic approaches −∞ as T → ∞.

3. Monte Carlo simulations

Our framework allows for single and multiple structural breaks in the Monte Carlo simulations. Following Becker et al. (2004), Becker et al. (2006) and Ludlow and Enders (2000),[11] we allow smooth and abrupt structural breaks. Because a Fourier expansion is capable of approximating absolutely integrable functions to any desired degree of accuracy, smooth or abrupt breaks can be approximated by Fourier sinusoidals with an appropriate frequency mix. Therefore, the structural breaks in the mean of Equation (1) are specified via the following function:[12]

    µ_t ≅ ρ + α Σ_{i=1}^{n} (2i − 1)^{−1} sin(2π(2i − 1)kt/T)              (7)

where n is the number of frequencies included in the approximation, k represents a particular frequency and α indicates the size (amplitude) of the breaks. The frequency parameter, k, alone determines the number of breaks and whether they are temporary or permanent.

[11] See also Ashley and Patterson (2010) and Stengos and Yazgan (2012a,b).
[12] We approximate the discrete square function in Equation (2) using the continuous function in Equation (7).
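The break-generating function of Equation (7) can be sketched in a few lines of Python; the helper name `mu_break` is ours and ρ = 0 is assumed by default.

```python
import numpy as np

def mu_break(T, k, alpha, n=1, rho=0.0):
    """Equation (7): rho + alpha * sum_{i=1}^{n} sin(2*pi*(2i-1)*k*t/T) / (2i-1).
    n = 1 yields smooth transitions; large n approximates abrupt (square-wave) breaks."""
    t = np.arange(1, T + 1)
    i = np.arange(1, n + 1)[:, None]              # odd harmonics (2i - 1)
    terms = np.sin(2 * np.pi * (2 * i - 1) * k * t / T) / (2 * i - 1)
    return rho + alpha * terms.sum(axis=0)

# k = 0.9 gives a single permanent shift and k = 1.4 a temporary one (cf. Figure 3);
# raising n from 1 to 128 makes the same break pattern abrupt.
smooth = mu_break(200, k=0.9, alpha=1.0, n=1)
abrupt = mu_break(200, k=0.9, alpha=1.0, n=128)
```

Adding iid noise to such a path, y_t = µ_t + ε_t, reproduces the sample paths displayed in Figure 3.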
Moreover, if a single frequency is used (n = 1), the transitions tend to be smooth, whereas higher values of n facilitate the analysis of abrupt temporary or permanent breaks. In Figure 3, sample paths of y_t are displayed for permanent and temporary breaks. In the top panel, where k = 0.9, we observe a single permanent structural break, permanent in the sense that the series does not return to its initial mean level. In the second panel, where k = 1.4, the series is subject to a temporary single break. The remaining panels depict multiple breaks that are both temporary and abrupt in nature. Figure 3 also illustrates sample paths of y_t for higher values of n, where the breaks are more abrupt. When n is as high as 4 (the third column), the breaks become significantly more abrupt, and for even higher values of n, such as 128, the structural breaks occur almost instantaneously.

[Insert Figure 3 here]

We compare the performance of the GYOW test against three well-known structural break tests: the Sup-F, OLS-based CUSUM, and MOSUM tests. The Sup-F test was first proposed by Andrews (1993) and generalized by Bai and Perron (1998). It is obtained by calculating a series of F-statistics over all potential change points in the data and evaluating their supremum. The Sup-F is a regression-based test: it tests the null hypothesis of no break in the regression equation at time i against the alternative of two different regressions over the intervals (1, i) and (i + 1, T), where i is a point in (T_0, T − T_0). The parameter T_0 is generally set to h × T, where h ∈ (0, 1) is a bandwidth parameter. Hansen (1997) provided an approximation for the asymptotic p-values.
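For a constant-only model, the Sup-F scan described above reduces to comparing the restricted sum of squared residuals (one global mean) with the unrestricted one (separate means before and after each candidate point). The sketch below is our simplified mean-break version, not a full regression implementation; `sup_f` is a hypothetical helper name.

```python
import numpy as np

def sup_f(y, h=0.15):
    """Sup-F scan for a single mean break: for each candidate i in the trimmed
    range (T0, T - T0), T0 = floor(h*T), compute the F-statistic comparing one
    global mean (restricted) with separate means on (1, i) and (i+1, T)."""
    T = len(y)
    T0 = int(np.floor(h * T))
    ssr_r = np.sum((y - y.mean()) ** 2)          # restricted: single mean
    stats = []
    for i in range(T0, T - T0):
        ssr_u = (np.sum((y[:i] - y[:i].mean()) ** 2)
                 + np.sum((y[i:] - y[i:].mean()) ** 2))
        stats.append((ssr_r - ssr_u) / (ssr_u / (T - 2)))  # F with 1 restriction
    return max(stats)

rng = np.random.default_rng(0)
y_break = np.concatenate([np.zeros(50), 5.0 * np.ones(50)]) + rng.standard_normal(100)
```

A large mid-sample mean shift produces a very large supremum, whereas pure noise yields a modest one; the actual test compares the supremum against the Andrews (1993) critical values.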
The OLS-based CUSUM test statistic of Ploberger and Kramer (1992) is given by[13]

    S_C(t) = (1/(σ̂√T)) Σ_{i=1}^{⌊Tt⌋} û_i,   0 ≤ t ≤ 1,                   (8)

where the û_i are the OLS residuals from the model under the null and σ̂ is the standard deviation of the estimated residuals. Instead of using cumulative sums of the residuals, another way to detect a structural change is to analyze moving sums of the residuals. The OLS-based MOSUM test of Chu et al. (1995) considers this possibility. This test does not sum all of the residuals up to a given time t; instead, it sums a fixed number of residuals in a data window whose size is determined by the bandwidth parameter h. The test statistic is computed as

    S_M(t) = (1/(σ̂√T)) Σ_{i=⌊N_T t⌋+1}^{⌊N_T t⌋+⌊Th⌋} û_i,   0 ≤ t ≤ 1 − h,   (9)

where N_T = (T − ⌊Th⌋)/(1 − h). We reject the "no structural break" null hypothesis when S_C(t) exceeds the critical boundary curve λ_CUSUM(t, α) for at least one t. The same rejection criterion applies to the MOSUM test with the statistic S_M(t) and the boundary λ_MOSUM(t, h, α). The boundaries λ_CUSUM(t, α) and λ_MOSUM(t, h, α) are calculated according to Zeileis (2001, 2005).

[13] ⌊Tt⌋ denotes the largest integer smaller than or equal to Tt.

The wavelet test deposits the structural breaks into the scaling coefficient variance in a local window. As the number of structural breaks increases, a larger share of the variance is allocated to the scaling coefficients, and the contribution of the wavelet variance to the overall variance becomes marginal. Because the sum of the variances of the wavelet and scaling coefficients equals the overall variance, the scaling coefficient variance becomes the driver of the overall variance in the presence of multiple structural breaks, and we use the marginalization of the wavelet coefficient variance as a test for the presence of structural breaks. The MOSUM and wavelet tests both operate in local windows.
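The empirical processes of Equations (8) and (9) can be sketched as follows for the constant-only model, where the OLS residuals are simply the demeaned series. This is our minimal sketch (function names are ours); the rejection step against the Zeileis (2001, 2005) boundaries is omitted.

```python
import numpy as np

def ols_cusum(y):
    """Equation (8): cumulative sums of the OLS residuals of a constant-only
    model (the demeaned series), scaled by sigma_hat * sqrt(T)."""
    u = y - y.mean()
    return np.cumsum(u) / (u.std(ddof=1) * np.sqrt(len(y)))

def ols_mosum(y, h=0.15):
    """Equation (9): moving sums of the same residuals over windows of
    floor(h*T) observations, with the same scaling."""
    T = len(y)
    u = y - y.mean()
    w = int(np.floor(h * T))
    c = np.concatenate(([0.0], np.cumsum(u)))    # prefix sums for fast windows
    return (c[w:] - c[:-w]) / (u.std(ddof=1) * np.sqrt(T))

# A mid-sample mean shift produces a large excursion in both processes.
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(100), 3.0 * np.ones(100)]) + rng.standard_normal(200)
```

Under the null, the CUSUM path behaves like a Brownian bridge, so its excursions stay small; a mean shift makes the path drift away before returning to zero at the end of the sample, which is why flat critical lines are used for the OLS-based version.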
One possible drawback of the MOSUM test is that it assigns the same weight to each observation in its local window; the wavelet test instead uses time-frequency optimized weights to sum the information.

The empirical size and power properties of the wavelet test statistic, GYOW, are calculated for Equation (7) under various parameter combinations of α, k, and T, with ρ = 0. The experiment is run first for smooth structural breaks by setting n = 1, and it is then extended to abrupt breaks for higher values of n, up to 1024. Because the GYOW test is left-tailed, we use one-sided p-values in our Monte Carlo simulations. The α values vary over 0.2, 0.4, ..., 2. The parameter k varies between 0 and 5, with increments of ∆k = 0.05 between 0.0 and 2.0 and ∆k = 0.1 between 2.1 and 5.0. We consider data lengths T of 50, 100, and 200. For each combination of α, T, and k, we run 10,000 replications and use the same pseudo-random identically and independently distributed standard normal series to simulate ε_t. The rejection frequencies are calculated at the 1, 5, and 10 percent levels of significance. We report size-corrected empirical power calculations.[14]

[14] For brevity, we report a subset of our Monte Carlo simulation results. Complete results are available upon request.

[Insert Table 1 and Figure 4 here]

For the case of smooth breaks (n = 1), the results of the Monte Carlo experiments are presented in Table 1 and Figure 4. In Table 1, we provide the size-adjusted rejection frequencies of the Sup-F, OLS-based CUSUM, OLS-based MOSUM, and GYOW test statistics for T = 50. The values in the first panel, where k = 0 corresponds to a pure white-noise process, are the empirical sizes of the tests. The values reported in the subsequent panels are the size-corrected powers of the test statistics. Each panel corresponds to a selected α value for different values of k.
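The Monte Carlo design above can be sketched as follows. This is a reduced-scale illustration (2,000 rather than 10,000 replications); `gyow` and `rejection_rate` are our names, and the asymptotic one-sided 5% normal critical value −1.645 stands in for an exact finite-sample value.

```python
import numpy as np

def gyow(y):
    """GYOW = sqrt(T) * (2 * delta_T^2 / s^2 - 1), Equation (6), using the
    unit-scale Haar MODWT wavelet coefficients of Equation (3)."""
    T = len(y)
    W = 0.5 * (y - np.roll(y, 1))                 # circular boundary
    return np.sqrt(T) * (2.0 * np.mean(W**2) / np.mean((y - y.mean())**2) - 1.0)

def rejection_rate(T=50, k=0.0, alpha=0.0, n=1, reps=2000, crit=-1.645, seed=0):
    """Fraction of replications in which the left-tailed GYOW test rejects at
    the (asymptotic) 5% level; k = 0 gives empirical size, k > 0 raw power."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    mu = alpha * sum(np.sin(2*np.pi*(2*i - 1)*k*t/T) / (2*i - 1)
                     for i in range(1, n + 1))    # Equation (7) with rho = 0
    return np.mean([gyow(mu + rng.standard_normal(T)) < crit
                    for _ in range(reps)])

size = rejection_rate()                           # k = 0: empirical size
power = rejection_rate(k=2.0, alpha=2.0)          # multiple smooth breaks
```

With a sizeable break (α = 2, k = 2) the rejection frequency rises far above the nominal level, consistent with the power pattern reported for GYOW in Table 1.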
In Figure 4, we plot the size-corrected rejection frequencies of the test statistics over a continuum of k values when T = 50.[15] As in Table 1, each panel corresponds to a selected α value. The values at the intersection with the y-axis are the empirical sizes of the tests; as k varies along the x-axis, each line traces the size-adjusted power of the corresponding test statistic. The table and figure clearly show that, for all parameter values in the region k > 1, GYOW has more empirical power than the other tests. Conversely, for small values of α and in the small region where k is smaller than or only slightly larger than 1, the Sup-F and, to a lesser extent, the MOSUM tests perform better than GYOW, although they lose power quickly as k increases. This implies that the alternative tests do relatively better than our test in terms of power when there is a single break. Otherwise, our test has higher relative power for larger values of k, i.e., when there are multiple structural breaks. As the number of smooth structural breaks in the data increases, GYOW performs much better as a structural break test.

[15] The size correction is for a 5 percent level of significance.

To evaluate the performance of our test over a longer time horizon, we set T = 200, as in Figure 5, and obtain results that are similar to those reported above. The wavelet test performs similarly to the Sup-F and MOSUM tests, especially for higher values of α, e.g., in the bottom panel where k is close to zero. These results indicate that the power properties of our test are similar to those of the other tests even in the case of single breaks. Conversely, when multiple breaks span a longer time horizon, the other tests, especially the MOSUM test, do not lose their power as quickly as our test does when k increases. The MOSUM test keeps its high power up to very large values of k, i.e., for reasonably large numbers of breaks. In conclusion, when T is large, the MOSUM test and our test perform similarly in general, which makes sense because both tests use local information.

[Insert Figure 5 here]

We also consider the size and power properties of the tests when the breaks are abrupt. In Figures 6 and 7, we illustrate the results for n = 128.[16] For abrupt breaks, our test not only preserves its empirical size but also gains power compared to the smooth case. In the case of a single break, except for small values of α and small T, our test becomes competitive with the Sup-F and other tests in terms of power. Even for small T and multiple breaks, our test appears to be the best performer. As in the smooth break case, the other tests also become more powerful when T gets larger, and our test then ceases to be the best performer except in cases with large numbers of structural breaks.

[16] As mentioned above, we increased n to 1024 and obtained qualitatively similar results when the breaks were smooth.

[Insert Figure 6 and Figure 7 here]

4. Conclusions

We develop a structural change test based on the maximum overlap wavelet transformation, derive its limiting null distribution, and compare its empirical size and power properties with those of a number of alternative tests. Our results indicate that the wavelet-based test has higher power than the alternatives when there are multiple structural breaks at unknown locations. In general, the power of the competing tests starts to deteriorate after a single break, regardless of whether the break is abrupt or smooth. In contrast, the wavelet test maintains uniformly high power at all numbers of breaks. However, the MOSUM test attains the same power as the wavelet test for relatively large data sets. We attribute this result to the fact that both the MOSUM and wavelet tests use local information in a given window.
12 Table 1: Size corrected powers of the GY OW , Sup-F, CUSUM, and MOSUM tests for smooth breaks 13 GY OW Sup-F T 50 100 200 50 100 200 50 Significance 1% 5% 1% 5% 1% 5% 1% 5% 1% 5% 1% 5% 1% 5% (Size) k=0.0 0.01 0.04 0.01 0.06 0.01 0.05 0.01 0.06 0.01 0.06 0.01 0.05 0.00 0.03 α = 0.8 k=0.6 0.71 0.88 0.94 0.99 1.00 1.00 0.99 1.00 1.00 1.00 1.00 1.00 0.99 1.00 k=1.0 0.73 0.89 0.95 0.99 1.00 1.00 0.83 0.95 1.00 1.00 1.00 1.00 0.83 0.95 k=1.2 0.79 0.92 0.97 0.99 1.00 1.00 0.39 0.79 0.98 1.00 1.00 1.00 0.18 0.68 k=1.6 0.71 0.87 0.95 0.98 1.00 1.00 0.18 0.45 0.71 0.89 0.99 1.00 0.03 0.22 k=2.0 0.73 0.91 0.95 0.98 1.00 1.00 0.13 0.34 0.63 0.81 0.98 1.00 0.04 0.20 k=4.0 0.65 0.84 0.95 0.99 1.00 1.00 0.00 0.02 0.03 0.10 0.13 0.36 0.00 0.02 k=5.0 0.65 0.84 0.95 0.99 1.00 1.00 0.00 0.02 0.03 0.10 0.13 0.36 0.00 0.02 α = 1.2 k=0.6 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 k=1.0 1.00 1.00 1.00 1.00 1.00 1.00 0.99 1.00 1.00 1.00 1.00 1.00 0.99 1.00 k=1.2 1.00 1.00 1.00 1.00 1.00 1.00 0.76 0.99 1.00 1.00 1.00 1.00 0.41 0.98 k=1.6 0.99 1.00 1.00 1.00 1.00 1.00 0.35 0.73 0.97 1.00 1.00 1.00 0.06 0.39 k=2.0 1.00 1.00 1.00 1.00 1.00 1.00 0.23 0.58 0.93 0.99 1.00 1.00 0.05 0.30 k=4.0 0.99 1.00 1.00 1.00 1.00 1.00 0.00 0.01 0.03 0.11 0.27 0.59 0.00 0.01 k=5.0 0.65 0.84 0.95 0.99 1.00 1.00 0.00 0.02 0.03 0.10 0.13 0.36 0.00 0.02 α = 1.6 k=0.6 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 k=1.0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 k=1.2 1.00 1.00 1.00 1.00 1.00 1.00 0.96 1.00 1.00 1.00 1.00 1.00 0.63 1.00 k=1.6 1.00 1.00 1.00 1.00 1.00 1.00 0.48 0.87 1.00 1.00 1.00 1.00 0.08 0.52 k=2.0 1.00 1.00 1.00 1.00 1.00 1.00 0.34 0.77 0.99 1.00 1.00 1.00 0.05 0.39 k=4.0 1.00 1.00 1.00 1.00 1.00 1.00 0.00 0.00 0.02 0.10 0.38 0.76 0.00 0.00 k=5.0 0.65 0.84 0.95 0.99 1.00 1.00 0.00 0.02 0.03 0.10 0.13 0.36 0.00 0.02 α = 2.0 k=0.6 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 k=1.0 1.00 1.00 
[Table: size-corrected empirical powers of the competing tests. The surviving column headers are CUSUM and MOSUM, with T = 50, 100, 200 at the 1% and 5% levels, and rows for k = 1.2, 1.6, 2.0, 4.0, and 5.0; the numeric entries are not recoverable from this extraction.]
Notes: The numbers represent the fraction of the cases that H0 is rejected in 10,000 replications, for each test (size-corrected powers). The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0, n = 1, and εt is iid N(0, 1). The bandwidth parameter, h, is taken as 0.15 for the Sup-F and MOSUM tests. The Sup-F test, due to Andrews (1993), does not use the heteroskedasticity and autocorrelation consistent (HAC) kernel.

Figure 1: Behavior of E{GYO_W} for a given size. Panels: (a) |µ1 − µ2| = 1; (b) |µ1 − µ2| = 3.

Figure 2: Behavior of E{GYO_W} for a given location. Panels: (a) q = 0.05; (b) q = 0.5.

Figure 3: Sample paths of smooth/abrupt and permanent/temporary breaks. Panels for k ∈ {0.9, 1.4, 1.9, 2.4, 4.9} and n ∈ {1, 2, 4, 128}; axis residue omitted.
Notes: The dashed line is µt, as given in Equation (7) with α = 1 and ρ = 0, and the solid line is yt = µt + εt, where εt is iid N(0, 0.25), and T = 200.

Figure 4: Size-corrected powers of the GYO_W, Sup-F, CUSUM, and MOSUM tests for smooth breaks (n = 1, T = 50). Panels for α ∈ {0.8, 1.2, 1.6, 2}; axis residue omitted.
Notes: On the x-axis are the k values in Equation (7) and on the y-axis are the size-corrected (at the 5% level) empirical powers. The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0, and εt is iid N(0, 1). The bandwidth parameter, h, is taken as 0.15 for the Sup-F and MOSUM tests.
The Sup-F test, due to Andrews (1993), does not use the heteroskedasticity and autocorrelation consistent (HAC) kernel.

Figure 5: Size-corrected powers of the GYO_W, Sup-F, CUSUM, and MOSUM tests for smooth breaks (n = 1, T = 200). Panels for α ∈ {0.8, 1.2, 1.6, 2}; axis residue omitted.
Notes: On the x-axis are the k values in Equation (7) and on the y-axis are the size-corrected (at the 5% level) empirical powers. The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0, and εt is iid N(0, 1). The bandwidth parameter, h, is taken as 0.15 for the Sup-F and MOSUM tests. The Sup-F test, due to Andrews (1993), does not use the heteroskedasticity and autocorrelation consistent (HAC) kernel.

Figure 6: Size-corrected powers of the GYO_W, Sup-F, CUSUM, and MOSUM tests for abrupt breaks (n = 128, T = 50). Panels for α ∈ {0.8, 1.2, 1.6, 2}; axis residue omitted.
Notes: On the x-axis are the k values in Equation (7) and on the y-axis are the size-corrected (at the 5% level) empirical powers. The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0, and εt is iid N(0, 1). The bandwidth parameter, h, is taken as 0.15 for the Sup-F and MOSUM tests. The Sup-F test, due to Andrews (1993), does not use the heteroskedasticity and autocorrelation consistent (HAC) kernel.

Figure 7: Size-corrected powers of the GYO_W, Sup-F, CUSUM, and MOSUM tests for abrupt breaks (n = 128, T = 200). Panels for α ∈ {0.8, 1.2, 1.6, 2}; axis residue omitted.
Notes: On the x-axis are the k values in Equation (7) and on the y-axis are the size-corrected (at the 5% level) empirical powers. The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0, and εt is iid N(0, 1). The bandwidth parameter, h, is taken as 0.15 for the Sup-F and MOSUM tests. The Sup-F test, due to Andrews (1993), does not use the heteroskedasticity and autocorrelation consistent (HAC) kernel.

References

Altissimo, F. and V. Corradi (2003). Strong rules for detecting the number of breaks in a time series. Journal of Econometrics 117(2), 207–244.
Andrews, D. W. K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica 61(4), 821–856.
Andrews, D. W. K., I. Lee, and W. Ploberger (1996). Optimal change-point tests for normal linear regression. Journal of Econometrics 70(1), 9–38.
Ashley, R. A. and D. M. Patterson (2010). Apparent long memory in time series as an artifact of a time-varying mean: Considering alternatives to the fractionally integrated model. Macroeconomic Dynamics 14(S1), 59–87.
Bai, J. and P. Perron (1998). Estimating and testing linear models with multiple structural changes. Econometrica 66(1), 47–78.
Bai, J. and P. Perron (2003a). Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18(2), 1–22.
Bai, J. and P. Perron (2003b). Critical values for multiple structural change tests. The Econometrics Journal 6(3), 72–78.
Becker, R., W. Enders, and S. Hurn (2004). A general test for time dependence in parameters. Journal of Applied Econometrics 19(7), 899–906.
Becker, R., W. Enders, and J. Lee (2006). A stationarity test in the presence of an unknown number of smooth breaks. Journal of Time Series Analysis 27(3), 381–409.
Brown, R. L., J.
Durbin, and J. M. Evans (1975). Techniques for testing the constancy of regression relationships over time. Journal of the Royal Statistical Society, Series B (Methodological) 37(2), 149–192.
Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28(3), 591–605.
Chu, C.-S. J., K. Hornik, and C.-M. Kuan (1995). MOSUM tests for parameter constancy. Biometrika 82(3), 603–617.
Chu, C.-S. J., M. Stinchcombe, and H. White (1996). Monitoring structural change. Econometrica 64(5), 1045–1065.
Conniffe, D. and J. E. Spencer (2001). When moments of ratios are ratios of moments. Journal of the Royal Statistical Society, Series D (The Statistician) 50(2), 161–168.
Fan, Y. and R. Gençay (2010). Unit root tests with wavelets. Econometric Theory 26, 1305–1331.
Gençay, R. and N. Gradojevic (2011). Errors-in-variables estimation with wavelets. Journal of Statistical Computation and Simulation 11(8), 1545–1564.
Hansen, B. E. (1997). Approximate asymptotic p values for structural-change tests. Journal of Business & Economic Statistics 15(1), 60–67.
Hansen, B. E. (2002). Tests for parameter instability in regressions with I(1) processes. Journal of Business & Economic Statistics 20(1), 45–59.
Inclan, C. and G. C. Tiao (1994). Use of cumulative sums of squares for retrospective detection of changes of variance. Journal of the American Statistical Association 89(427), 913–923.
Kramer, W., W. Ploberger, and R. Alt (1988). Testing for structural change in dynamic models. Econometrica 56, 1355–1369.
Ludlow, J. and W. Enders (2000). Estimating non-linear ARMA models using Fourier coefficients. International Journal of Forecasting 16(3), 333–347.
Ploberger, W. and W. Kramer (1992). The CUSUM test with OLS residuals. Econometrica 60(2), 271–285.
Ploberger, W., W. Krämer, and K. Kontrus (1989). A new test for structural stability in the linear regression model. Journal of Econometrics 40(2), 307–318.
Sen, P. K. (1982).
Invariance principles for recursive residuals. The Annals of Statistics 10, 307–312.
Stengos, T. and M. E. Yazgan (2012a). Persistence in convergence. Macroeconomic Dynamics, forthcoming.
Stengos, T. and M. E. Yazgan (2012b). Persistence in real exchange rate convergence. Studies in Nonlinear Dynamics and Econometrics, forthcoming.
White, H. (2001). Asymptotic Theory for Econometricians. Academic Press, San Diego.
Williams, J. D. (1941). Moments of the ratio of the mean square successive difference to the mean square difference in samples from a normal universe. The Annals of Mathematical Statistics 12(2), 239–241.
Xue, Y., R. Gençay, and S. Fagan (2010). Testing for jump arrivals in financial time series. Technical report, Department of Economics, Simon Fraser University.
Zeileis, A. (2001). strucchange: Testing for structural change in linear regression relationships. R News 1(3), 8–11.
Zeileis, A. (2005). A unified approach to structural change tests based on ML scores, F statistics, and OLS residuals. Econometric Reviews 24(4), 445–466.

Appendix

Proposition 1. Under $H_0$, $E\{\delta_T^2\} = \frac{\sigma^2}{2}$ and $\mathrm{Var}(\delta_T^2) \cong \frac{3\sigma^4}{4T}$ for large $T$.

Proof. Without any loss of generality, we set $\mu = 0$ under $H_0$, because $\sigma^2$ and $s^2$ are invariant under a mean change. For the remainder of the proof, we assume that $t \in \{1, 2, \ldots, T; \bmod T\}$. By expanding $\delta_T^2$ as the sum of the $W_t^2$, we have
\[
E\delta_T^2 = E\Big\{\frac{1}{T}\sum_{t=1}^{T} W_t^2\Big\} = \frac{1}{T}\sum_{t=1}^{T} E W_t^2 = \frac{1}{T}\, T\, E(W_1^2) = E(W_1^2)
= E\Big\{\Big(\frac{y_1 - y_0}{2}\Big)^2\Big\} = \frac{1}{4} E\{y_1^2 + y_0^2 - 2 y_1 y_0\}
= \frac{1}{4} E(y_1^2) + \frac{1}{4} E(y_0^2) - \frac{1}{2} E(y_1 y_0) = \frac{1}{4}\, 2\sigma^2 = \frac{\sigma^2}{2}.
\]
As for the variance,
\[
\mathrm{Var}(\delta_T^2) = \mathrm{Var}\Big( T^{-1} \sum_{t=j}^{T+j-1} W_t^2 \Big)
= T^{-2} \sum_{t=j}^{T+j-1} \mathrm{Var}(W_t^2) + 2 T^{-2} \sum_{t=j}^{T+j-1} \sum_{s=t+1}^{T+j-1} \mathrm{Cov}(W_t^2, W_s^2).
\]
We also have that $\mathrm{Cov}(W_t^2, W_{t+k}^2) = 0$ for $k \geq 2$ (see Lemma 1.1). Then,
\[
\mathrm{Var}(\delta_T^2) = T^{-2} \sum_{t=j}^{T+j-1} \mathrm{Var}(W_t^2) + 2 T^{-2} \sum_{t=j+1}^{T+j-1} \mathrm{Cov}(W_{t-1}^2, W_t^2).
\]
Writing the probabilistic expression for $\mathrm{Cov}(W_{t-1}^2, W_t^2)$:
\[
\mathrm{Cov}(W_{t-1}^2, W_t^2) = E\{(W_{t-1}^2 - E W_{t-1}^2)(W_t^2 - E W_t^2)\} = E\{W_{t-1}^2 W_t^2\} - E W_{t-1}^2\, E W_t^2,
\]
where $E W_t^2$, the unconditional mean of $W_t^2$, is $\sigma^2/2$ for all $t$. Then, expanding $E\{W_{t-1}^2 W_t^2\}$:
\[
E\{W_{t-1}^2 W_t^2\} = E\Big\{\Big(\frac{y_{t-1} - y_{t-2}}{2}\Big)^2 \Big(\frac{y_t - y_{t-1}}{2}\Big)^2\Big\}
= 2^{-4} E\{(y_{t-1}^2 - 2 y_{t-1} y_{t-2} + y_{t-2}^2)(y_t^2 - 2 y_t y_{t-1} + y_{t-1}^2)\}
\]
\[
= 2^{-4} E\{y_{t-1}^2 y_t^2 - 2 y_{t-1}^3 y_t + y_{t-1}^4 - 2 y_{t-1} y_{t-2} y_t^2 + 4 y_{t-1}^2 y_t y_{t-2} - 2 y_{t-1}^3 y_{t-2} + y_{t-2}^2 y_t^2 - 2 y_{t-1} y_{t-2}^2 y_t + y_{t-2}^2 y_{t-1}^2\},
\]
because $y_t$ and $y_{t-1}$ are independent for all $t$. We also know that under $H_0$ the first four raw moments of $y_t$ are $E\{y_t\} = 0$, $E\{y_t^2\} = \sigma^2$, $E\{y_t^3\} = 0$, and $E\{y_t^4\} = 3\sigma^4$ for all $t$. Then,
\[
2^4\, E\{W_{t-1}^2 W_t^2\} = 3(\sigma^2)^2 + 3\sigma^4 = 6\sigma^4 \;\Rightarrow\; E\{W_{t-1}^2 W_t^2\} = \frac{3}{8}\sigma^4.
\]
Henceforth,
\[
\mathrm{Cov}(W_{t-1}^2, W_t^2) = \frac{3}{8}\sigma^4 - \frac{1}{4}\sigma^4 = \frac{1}{8}\sigma^4,
\]
and
\[
\mathrm{Var}(\delta_T^2) = T^{-2}\, T\, \mathrm{Var}(W_t^2) + 2 T^{-2} (T-1)\, \mathrm{Cov}(W_{t-1}^2, W_t^2)
= \frac{\sigma^4}{T}\Big(\frac{1}{2} + \frac{T-1}{4T}\Big) \longrightarrow \frac{3\sigma^4}{4T} \text{ as } T \text{ gets larger},
\]
because $\mathrm{Var}(W_t^2) = \sigma^4/2$ (see Lemma 1.2).

Lemma 1.1. Under $H_0$, $\mathrm{Cov}(W_t^2, W_{t+k}^2) = 0$ for $k \geq 2$.

Proof.
\[
\mathrm{Cov}(W_t^2, W_{t+k}^2) = E\{(W_t^2 - E W_t^2)(W_{t+k}^2 - E W_{t+k}^2)\} = E\{W_t^2 W_{t+k}^2\} - E W_t^2\, E W_{t+k}^2
\]
\[
= E\Big\{\Big(\frac{y_t - y_{t-1}}{2}\Big)^2 \Big(\frac{y_{t+k} - y_{t+k-1}}{2}\Big)^2\Big\} - \Big(\frac{\sigma^2}{2}\Big)\Big(\frac{\sigma^2}{2}\Big)
\]
\[
= 2^{-4} E\{(y_t^2 - 2 y_t y_{t-1} + y_{t-1}^2)(y_{t+k}^2 - 2 y_{t+k} y_{t+k-1} + y_{t+k-1}^2)\} - \frac{\sigma^4}{4}
\]
\[
= 2^{-4} E\{y_t^2 y_{t+k}^2 - 2 y_t^2 y_{t+k} y_{t+k-1} + y_t^2 y_{t+k-1}^2 - 2 y_t y_{t-1} y_{t+k}^2 + 4 y_t y_{t-1} y_{t+k} y_{t+k-1} - 2 y_t y_{t-1} y_{t+k-1}^2 + y_{t-1}^2 y_{t+k}^2 - 2 y_{t-1}^2 y_{t+k} y_{t+k-1} + y_{t-1}^2 y_{t+k-1}^2\} - \frac{\sigma^4}{4}.
\]
Because $y_s$ and $y_r$ are independent whenever $s \neq r$, we have $E\{y_s^m y_r^n\} = E\{y_s^m\} E\{y_r^n\}$ for all $m, n \in \mathbb{R}$. We know that $E\{y_s\} = 0$ and $E\{y_s^2\} = \sigma^2$ for all $s \in \{1, \ldots, T\}$, and it follows that
\[
2^4\, \mathrm{Cov}(W_t^2, W_{t+k}^2) = \sigma^2\sigma^2 + \sigma^2\sigma^2 + \sigma^2\sigma^2 + \sigma^2\sigma^2 - 4\sigma^4 = 0.
\]

Lemma 1.2. Under $H_0$, $\mathrm{Var}(W_t^2) = \frac{\sigma^4}{2}$.

Proof. Because $y_t - y_{t-1} = u_t - u_{t-1}$, the mean $\mu$ is eliminated and $W_t = (u_t - u_{t-1})/2$.
So,
\[
\mathrm{Var}(W_t^2) = \mathrm{Var}\Big(\Big(\frac{u_t - u_{t-1}}{2}\Big)^2\Big) \;\Rightarrow\; 2^4\, \mathrm{Var}(W_t^2) = \mathrm{Var}(u_t^2 - 2 u_t u_{t-1} + u_{t-1}^2)
\]
\[
= \mathrm{Var}(u_t^2) + \mathrm{Var}(u_{t-1}^2) + \mathrm{Var}(-2 u_t u_{t-1}) + 2\,\mathrm{Cov}(u_t^2, -2 u_t u_{t-1}) + 2\,\mathrm{Cov}(u_{t-1}^2, -2 u_t u_{t-1}) + 2\,\mathrm{Cov}(u_t^2, u_{t-1}^2) \qquad (10)
\]
\[
= 2\,\mathrm{Var}(u_t^2) + 4\,\mathrm{Var}(u_t u_{t-1}) - 8\,\mathrm{Cov}(u_t^2, u_t u_{t-1}) + 2\,\mathrm{Cov}(u_t^2, u_{t-1}^2),
\]
because $\mathrm{Var}(u_t^2) = \mathrm{Var}(u_{t-1}^2)$ and $\mathrm{Cov}(u_t^2, u_t u_{t-1}) = \mathrm{Cov}(u_{t-1}^2, u_t u_{t-1})$. Additionally, we have
\[
\mathrm{Var}(u_t^2) = \mathrm{Var}(u_{t-1}^2) = E\{(u_t^2)^2\} - E\{u_t^2\} E\{u_t^2\} = 3\sigma^4 - (\sigma^2)^2 = 2\sigma^4,
\]
\[
\mathrm{Var}(u_t u_{t-1}) = E\{u_t^2 u_{t-1}^2\} - E\{u_t u_{t-1}\} E\{u_t u_{t-1}\} = E\{u_t^2\} E\{u_{t-1}^2\} - 0 = \sigma^4,
\]
\[
\mathrm{Cov}(u_t^2, u_t u_{t-1}) = E\{u_t^3 u_{t-1}\} - E\{u_t^2\} E\{u_t u_{t-1}\} = E\{u_{t-1}\}\big(E\{u_t^3\} - E\{u_t^2\} E\{u_t\}\big) = 0 \times (0 - \sigma^2 \times 0) = 0,
\]
\[
\mathrm{Cov}(u_t^2, u_{t-1}^2) = E\{u_t^2 u_{t-1}^2\} - E\{u_t^2\} E\{u_{t-1}^2\} = E\{u_t^2\} E\{u_{t-1}^2\} - E\{u_t^2\} E\{u_{t-1}^2\} = 0,
\]
because $E\{u_t\} = E\{u_{t-1}\} = 0$ and $E\{u_t^3\} = 0$. Then, by making the necessary substitutions in Equation (10),
\[
2^4\, \mathrm{Var}(W_t^2) = 2 \times 2\sigma^4 + 4 \times \sigma^4 - 8 \times 0 + 2 \times 0 = 8\sigma^4 \;\Rightarrow\; \mathrm{Var}(W_t^2) = \frac{8\sigma^4}{16} = \frac{\sigma^4}{2}.
\]

Proposition 2. Under $H_0$, $\mathrm{Var}\big(\delta_T^2 - \frac{s^2}{2}\big) \cong \frac{\sigma^4}{4T}$ for large $T$.

Proof. By rewriting $\delta_T^2 - \frac{s^2}{2}$ we obtain a sum of first-order serially correlated variables:
\[
\delta_T^2 - \frac{s^2}{2} = \frac{1}{4T}\sum_{t=1}^{T}(y_t - y_{t-1})^2 - \frac{1}{2T}\sum_{t=1}^{T}(y_t - \bar{y})^2
= \frac{1}{4T}\sum_{t=1}^{T}\{y_t^2 + y_{t-1}^2 - 2 y_t y_{t-1} - 2 y_t^2 + 4\bar{y} y_t - 2\bar{y}^2\}
\]
\[
= \frac{1}{4T}\Big(\sum_{t=1}^{T} y_t^2 + \sum_{t=1}^{T} y_{t-1}^2 - 2\sum_{t=1}^{T} y_t y_{t-1} - 2\sum_{t=1}^{T} y_t^2 + 4\sum_{t=1}^{T} \bar{y} y_t - 2\sum_{t=1}^{T} \bar{y}^2\Big)
\]
\[
= \frac{1}{4T}\Big(2\sum_{t=1}^{T} y_t^2 - 2\sum_{t=1}^{T} y_t y_{t-1} - 2\sum_{t=1}^{T} y_t^2 + 4\sum_{t=1}^{T} \bar{y} y_t - 2\sum_{t=1}^{T} \bar{y}^2\Big)
\]
\[
= -\frac{1}{2T}\Big(\sum_{t=1}^{T} y_t y_{t-1} - 2\sum_{t=1}^{T} \bar{y} y_t + T\bar{y}^2\Big)
= -\frac{1}{2T}\Big(\sum_{t=1}^{T} y_t y_{t-1} - 2T\bar{y}^2 + T\bar{y}^2\Big)
\]
\[
= -\frac{1}{2T}\Big(\sum_{t=1}^{T} y_t y_{t-1} - \sum_{t=1}^{T} y_t \bar{y}\Big)
= -\frac{1}{2T}\sum_{t=1}^{T} y_t (y_{t-1} - \bar{y}). \qquad (11)
\]
Hence,
\[
\mathrm{Var}\Big(\delta_T^2 - \frac{s^2}{2}\Big) = \frac{1}{4T^2}\,\mathrm{Var}\Big(\sum_{t=1}^{T} y_t y_{t-1} - T\bar{y}^2\Big)
= \frac{1}{4T^2}\Big\{\mathrm{Var}\Big(\sum_{t=1}^{T} y_t y_{t-1}\Big) + \mathrm{Var}(T\bar{y}^2) - 2\,\mathrm{Cov}\Big(T\bar{y}^2, \sum_{t=1}^{T} y_t y_{t-1}\Big)\Big\}. \qquad (12)
\]
Below, we expand the three terms in the parentheses in Equation (12) as (a), (b), and (c).
The parameters $\delta_T^2$, $\frac{s^2}{2}$, and $\delta_T^2 - \frac{s^2}{2}$ are invariant under $\mu$, and we set $\mu = 0$ without loss of generality.

(a) We have $\mathrm{Var}\big(\sum_{t=1}^{T} y_t y_{t-1}\big) = \sum_{t=1}^{T} \mathrm{Var}(y_t y_{t-1}) + \sum_{t=1}^{T}\sum_{s \neq t} \mathrm{Cov}\big(y_t y_{t-1}, y_s y_{s-1}\big)$. However, the covariance terms in the summation are zero, as we set $\mu$ to zero above, and
\[
\mathrm{Var}\Big(\sum_{t=1}^{T} y_t y_{t-1}\Big) = \sum_{t=1}^{T} \mathrm{Var}(y_t y_{t-1}) = T\sigma^4,
\]
because $\mathrm{Cov}(y_t y_{t-1}, y_s y_{s-1}) = 0$ whenever $s \neq t$.

(b) We have $\mathrm{Var}(T\bar{y}^2) = \frac{1}{T^2}\,\mathrm{Var}\big(\big(\sum_{t=1}^{T} y_t\big)^2\big)$. By substituting $\big(\sum_{t=1}^{T} y_t\big)^2 = \sum_{t=1}^{T} y_t^2 + \sum_{t=1}^{T}\sum_{s \neq t} y_s y_t$,
\[
\mathrm{Var}(T\bar{y}^2) = \frac{1}{T^2}\Big\{\mathrm{Var}\Big(\sum_{t=1}^{T} y_t^2\Big) + \mathrm{Var}\Big(\sum_{t=1}^{T}\sum_{s \neq t} y_s y_t\Big) + 2\,\mathrm{Cov}\Big(\sum_{t=1}^{T} y_t^2, \sum_{t=1}^{T}\sum_{s \neq t} y_s y_t\Big)\Big\}. \qquad (13)
\]
$\mathrm{Cov}\big(\sum_{t=1}^{T} y_t^2, \sum_{t=1}^{T}\sum_{s \neq t} y_s y_t\big)$ is zero because $E\big\{\sum_{t=1}^{T} y_t^2 \times \sum_{t=1}^{T}\sum_{s \neq t} y_s y_t\big\}$ and $E\big\{\sum_{t=1}^{T}\sum_{s \neq t} y_s y_t\big\}$ are both zero. We need to derive the $\mathrm{Var}\big(\sum_{t=1}^{T} y_t^2\big)$ term in Equation (13). Because the $y_t$ are independent and identically distributed,
\[
\mathrm{Var}\Big(\sum_{t=1}^{T} y_t^2\Big) = \sum_{t=1}^{T} \mathrm{Var}(y_t^2) = T \times \mathrm{Var}(y_t^2), \quad t \in \{1, 2, \ldots, T\}.
\]
$y_t^2$ is square integrable because $y_t \in L^4$, i.e., the fourth (central) moment of normal random variables exists. Writing the probabilistic expression for $\mathrm{Var}(y_t^2)$,
\[
\mathrm{Var}(y_t^2) = E\big\{\big(y_t^2 - E\{y_t^2\}\big)^2\big\} = E\{y_t^4\} - \big(E\{y_t^2\}\big)^2.
\]
Substituting the second and fourth moments of a normal random variate with zero mean and variance parameter $\sigma^2$, such that $E\{y_t^2\} = \sigma^2$ and $E\{y_t^4\} = 3\sigma^4$, we have $\mathrm{Var}(y_t^2) = 3\sigma^4 - (\sigma^2)^2 = 2\sigma^4$. Then
\[
\mathrm{Var}\Big(\sum_{t=1}^{T} y_t^2\Big) = 2T\sigma^4.
\]
We also need to obtain the variance of $\sum_{t=1}^{T}\sum_{s \neq t} y_s y_t$ in terms of $\sigma$:
\[
\mathrm{Var}\Big(\sum_{t=1}^{T}\sum_{s \neq t} y_s y_t\Big) = \sum_{t=1}^{T}\sum_{s \neq t} \mathrm{Var}(y_s y_t) + \mathrm{Cov}\Big(\sum_{t=1}^{T}\sum_{s \neq t} y_s y_t,\; \sum_{u=1}^{T}\sum_{\substack{v \neq u \\ v \neq s}} y_u y_v\Big). \qquad (14)
\]
Regardless of the independence of the pairs $(y_s, y_t)$ and $(y_u, y_v)$, we have
\[
\mathrm{Cov}\Big(\sum_{t=1}^{T}\sum_{s \neq t} y_s y_t,\; \sum_{u=1}^{T}\sum_{\substack{v \neq u \\ v \neq s}} y_u y_v\Big) = \sum_{t=1}^{T}\sum_{s \neq t}\sum_{u=1}^{T}\sum_{\substack{v \neq u \\ v \neq s}} \mathrm{Cov}(y_s y_t, y_u y_v),
\]
where $\mathrm{Cov}(y_t y_s, y_u y_v)$ is non-zero if and only if $\{t, s\} = \{u, v\}$.
Otherwise, if $\{t, s\} \neq \{u, v\}$, at least one of the indices $t$, $s$, $u$, or $v$ is different from the three others, by assumption. Without loss of generality, we may suppose that $t$ is different from the others, so that
\[
\mathrm{Cov}(y_t y_s, y_u y_v) = E(y_t y_s y_u y_v) - E(y_t y_s) E(y_u y_v) = E(y_t) E(y_s y_u y_v) - E(y_t) E(y_s) E(y_u y_v) = 0.
\]
However, the equality $\{t, s\} = \{u, v\}$ holds if and only if $t = v$ and $s = u$. The case $t = s = u = v$ is ruled out by the inequalities $t \neq s$ and $u \neq v$, and the case $t = u$, $s = v$ contradicts $v \neq s$. The sum of the covariances over all of the pairs for which $\{t, s\} = \{u, v\}$ holds is $\sum_{t=1}^{T}\sum_{s \neq t} \mathrm{Cov}(y_s y_t, y_t y_s)$, which is equivalent to $\sum_{t=1}^{T}\sum_{s \neq t} \mathrm{Var}(y_s y_t)$. Then,
\[
\mathrm{Cov}\Big(\sum_{t=1}^{T}\sum_{s \neq t} y_s y_t,\; \sum_{u=1}^{T}\sum_{\substack{v \neq u \\ v \neq s}} y_u y_v\Big) = \sum_{t=1}^{T}\sum_{s \neq t} \mathrm{Var}(y_s y_t),
\]
and, by Equation (14),
\[
\mathrm{Var}\Big(\sum_{t=1}^{T}\sum_{s \neq t} y_s y_t\Big) = \sum_{t=1}^{T}\sum_{s \neq t} \mathrm{Var}(y_s y_t) + \sum_{t=1}^{T}\sum_{s \neq t} \mathrm{Var}(y_s y_t) = T(T-1)\sigma^4 + T(T-1)\sigma^4 = (2T^2 - 2T)\sigma^4. \qquad (15)
\]
Hence, by Equations (13) and (15) we have
\[
\mathrm{Var}(T\bar{y}^2) = \frac{1}{T^2}\big\{(2T^2 - 2T)\sigma^4 + 2T\sigma^4\big\} = 2\sigma^4.
\]
(c) Rewriting,
\[
\mathrm{Cov}\Big(T\bar{y}^2, \sum_{t=1}^{T} y_t y_{t-1}\Big) = \mathrm{Cov}\Big(\frac{1}{T}\Big(\sum_{t=1}^{T} y_t\Big)^2, \sum_{t=1}^{T} y_t y_{t-1}\Big), \qquad (16)
\]
and, since $\big(\sum_{t=1}^{T} y_t\big)^2 = \sum_{t=1}^{T} y_t^2 + \sum_{t=1}^{T}\sum_{s \neq t} y_t y_s$, we have
\[
\mathrm{Cov}\Big(T\bar{y}^2, \sum_{t=1}^{T} y_t y_{t-1}\Big) = \frac{1}{T}\,\mathrm{Cov}\Big(\sum_{t=1}^{T} y_t^2, \sum_{t=1}^{T} y_t y_{t-1}\Big) + \frac{1}{T}\,\mathrm{Cov}\Big(\sum_{t=1}^{T}\sum_{s \neq t} y_t y_s, \sum_{t=1}^{T} y_t y_{t-1}\Big).
\]
$\mathrm{Cov}\big(\sum_{t=1}^{T} y_t^2, \sum_{t=1}^{T} y_t y_{t-1}\big)$ is zero because the third raw moment of $y_t$ is zero, as we have set $\mu$ to zero above. However,
\[
\mathrm{Cov}\Big(\sum_{t=1}^{T}\sum_{s \neq t} y_t y_s, \sum_{t=1}^{T} y_t y_{t-1}\Big) = \mathrm{Var}\Big(\sum_{t=1}^{T} y_t y_{t-1}\Big) = T\sigma^4,
\]
because the $y_t y_{t-1}$ terms appear in both arguments of the covariance operator, while $\mathrm{Cov}(y_t y_{t-1}, y_s y_r)$ is zero unless the condition $s = t$ and $r = t - 1$ is satisfied. Then, from Equation (16) it follows that $\mathrm{Cov}\big(T\bar{y}^2, \sum_{t=1}^{T} y_t y_{t-1}\big) = \sigma^4$.

In conclusion, based on (a), (b), and (c), we have
\[
\mathrm{Var}\Big(\delta_T^2 - \frac{s^2}{2}\Big) = \frac{1}{4T^2}\big(T\sigma^4 + 2\sigma^4 - 2\sigma^4\big) = \frac{\sigma^4}{4T}.
\]

Corollary 2.4.
Let $\{X_t\}$ be a martingale difference sequence such that $E\{|X_t|^{2+\gamma}\} < \infty$ for some $\gamma > 0$ and all $t$. If $\mathrm{Var}(X_t) > 0$ and $T^{-1}\sum_{t=1}^{T} X_t^2 - \mathrm{Var}(X_t) \stackrel{p}{\longrightarrow} 0$, then
\[
\frac{\sum_{t=1}^{T} X_t}{\sqrt{T\,\mathrm{Var}(X_t)}} \stackrel{d}{\longrightarrow} N(0, 1) \text{ as } T \to \infty.
\]
Proof. See White (2001).

Proposition 3. Under $H_0$, $GYO_W \stackrel{d}{\longrightarrow} N(0, 1)$ as $T \to \infty$.

Proof. In Equation (11), we obtained the identity
\[
\delta_T^2 - \frac{s^2}{2} = -\frac{1}{2T}\sum_{t=1}^{T} y_{t-1}(y_t - \bar{y}).
\]
Define a new series $Z_T = T\big(\delta_T^2 - \frac{s^2}{2}\big)$. Hence,
\[
Z_T = \frac{1}{2}\sum_{t=1}^{T} y_{t-1}(\bar{y} - y_t), \qquad (17)
\]
and define the increments of $Z_T$ as $\Delta Z_T := Z_T - Z_{T-1} = \frac{1}{2}\, y_{T-1}(\bar{y} - y_T)$. Let $(\Omega, P, \mathcal{F})$ be the complete probability space on which $\Delta Z_T$ lives. Further, let $\mathcal{F}_T$ be the $\sigma$-field generated by $(\Delta Z_1, \Delta Z_2, \ldots, \Delta Z_T)$, i.e., $\Delta Z_T$ is adapted to $\mathcal{F}_T$. Then, $\Delta Z_T$ is an $\mathcal{F}_T$-martingale difference sequence:
\[
E\{\Delta Z_T \mid \mathcal{F}_{T-1}\} = \frac{1}{2}\, E\{y_{T-1}(\bar{y} - y_T) \mid \mathcal{F}_{T-1}\} = \frac{1}{2}\, y_{T-1}\big(\bar{y} - E\{y_T \mid \mathcal{F}_{T-1}\}\big) = \frac{1}{2}\, y_{T-1}(\mu - \mu) = 0. \qquad (18)
\]
Moreover, $\Delta Z_T$ has finite moments of order greater than 2:
\[
E\{|\Delta Z_T|^{2+\gamma}\} \leq E\{|y_{T-1}|^{2+\gamma}\, |y_T - \bar{y}|^{2+\gamma}\} = E\{|y_{T-1}|^{2+\gamma}\}\, E\{|y_T - \bar{y}|^{2+\gamma}\}.
\]
The terms inside the expectation operators are normal random variates; therefore, because the moments of $y_{T-1}$ and $(y_T - \bar{y})$ exist, the moments of all orders of $y_{T-1}(y_T - \bar{y})$ exist as well, provided that $\mathrm{Var}(\Delta Z_T)$ is finite.

Last, using Chebyshev's inequality,
\[
P\Big\{\omega : \Big|T^{-1}\sum_{t=1}^{T} \Delta Z_t^2 - \mathrm{Var}(\Delta Z_t)\Big| > \epsilon\Big\} \leq \frac{E\big\{\big|T^{-1}\sum_{t=1}^{T} \Delta Z_t^2 - \mathrm{Var}(\Delta Z_t)\big|\big\}}{\epsilon}
\]
for all $\epsilon > 0$. But, for large $T$, $E\big\{T^{-1}\sum_{t=1}^{T} \Delta Z_t^2\big\} = E\{\mathrm{Var}(\Delta Z_t)\}$, and we have
\[
P\Big\{\omega : \Big|T^{-1}\sum_{t=1}^{T} \Delta Z_t^2 - \mathrm{Var}(\Delta Z_t)\Big| > \epsilon\Big\} = 0 \qquad (19)
\]
for all $\epsilon > 0$. Combining these results with Corollary 2.4, we have
\[
\sqrt{T}\,\overline{\Delta Z}_t = \frac{1}{\sqrt{T}}\sum_{t=1}^{T} \Delta Z_t = \frac{1}{\sqrt{T}}\, Z_T \stackrel{d}{\longrightarrow} N(0, \sigma^4/4).
\]
Then, because $Z_T/\sqrt{T} = \sqrt{T}\big(\delta_T^2 - \frac{s^2}{2}\big)$, we have
\[
\sqrt{T}\;\frac{\delta_T^2 - s^2/2}{\sigma^2/2} \stackrel{d}{\longrightarrow} N(0, 1). \qquad (20)
\]
With $s^2/2$ in the denominator of Equation (20) instead of $\sigma^2/2$,
\[
\sqrt{T}\;\frac{\delta_T^2 - s^2/2}{s^2/2} = \sqrt{T}\;\frac{\delta_T^2 - s^2/2}{\sigma^2/2} \times \frac{\sigma^2}{s^2}. \qquad (21)
\]
Conniffe and Spencer (2001) show that $\dfrac{\delta_T^2 - s^2/2}{\sigma^2/2}$ is independent of $s^2$. In addition, $\dfrac{\sigma^2}{s^2} = 1 + o_P(1)$. Together, these results give
\[
\sqrt{T}\;\frac{\delta_T^2 - s^2/2}{s^2/2} = \sqrt{T}\;\frac{\delta_T^2 - s^2/2}{\sigma^2/2} + o_P(1), \qquad (22)
\]
with a distribution that converges to $N(0, 1)$ by Equation (20).

Proposition 4. Under $H_1$, assuming a single break, let $t_1$ be the last point after which the process $y_t$ assumes the new mean $\mu_2$, and let $q = t_1/T$ be the fraction of the data with mean $\mu_1$, where $q \in \{\frac{1}{T}, \frac{2}{T}, \ldots, \frac{T-1}{T}\}$. Then,
\[
E\{GYO_W\} = \sqrt{T}\left(\frac{\sigma^2 + \dfrac{|\mu_1 - \mu_2|^2}{2T}}{\sigma^2 + q(1-q)\,|\mu_1 - \mu_2|^2} - 1\right).
\]
Proof. The proof is a special case of the next proposition.

Proposition 5. Under $H_1$, suppose there are $z$ mean breaks, and let $t_i$ be the last points after which the process $y_t$ assumes the new mean $\mu_{i+1}$, with $i \in \{0, 1, \ldots, z\}$ and $t_0 = 0$. Let $q_i = (t_i - t_{i-1})/T$, $i \in \{1, 2, \ldots, z\}$, be the fractions of the data with constant means. Then,
\[
E\{GYO_W\} = \sqrt{T}\left(\frac{\sigma^2 + (2T)^{-1}\sum_{i=0}^{z-1}|\mu_{i+1} - \mu_i|^2}{\sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2 - \sum_{i=1}^{z}\sum_{j=1}^{z} q_i q_j \mu_i \mu_j} - 1\right).
\]
Proof. The proof consists of three parts. Assuming that $y_t$ follows the dynamics defined in Equation (2), we first derive the expectation of the unconditional variance of $y_t$, i.e., $E(s^2)$; second, we derive $E(\delta_T^2)$. Finally, we use Williams (1941)'s result that any moment of the ratio of $\delta_T^2$ to $s^2$ is the ratio of the moments of $\delta_T^2$ to $s^2$ to obtain $E\big\{\frac{\delta_T^2}{s^2}\big\}$.

(a) We have, by definition,
\[
s^2 = E\{(y_t - E y_t)^2\} = E\{y_t^2\} - \big(E(y_t)\big)^2. \qquad (23)
\]
The unconditional expectations of $y_t$ and $y_t^2$ are sums of conditional expectations weighted by probabilities:
\[
E(y_t) = \sum_{i=1}^{z} P\{t \in (t_{i-1}, t_i]\} \times E\{y_t \mid t \in (t_{i-1}, t_i]\} = \sum_{i=1}^{z} q_i \mu_i
\]
and
\[
E\{y_t^2\} = \sum_{i=1}^{z} P\{t \in (t_{i-1}, t_i]\} \times E\{y_t^2 \mid t \in (t_{i-1}, t_i]\} = \sum_{i=1}^{z} q_i (\mu_i^2 + \sigma^2) = \sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2.
\]
Then, substituting $E(y_t)$ and $E\{y_t^2\}$ into Equation (23):
\[
s^2 = \sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2 - \Big(\sum_{i=1}^{z} q_i \mu_i\Big)^2 = \sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2 - \sum_{i=1}^{z}\sum_{j=1}^{z} q_i q_j \mu_i \mu_j.
\]
(b) In this part we obtain $E\delta_T^2$ under $H_1$:
\[
E\delta_T^2 = E\Big\{(4T)^{-1}\sum_{j=1}^{T}(y_j - y_{j-1})^2\Big\}
= (4T)^{-1}\, E\Big\{\sum_{i=0}^{z-1}\sum_{t=t_i+2}^{t_{i+1}}(y_t - y_{t-1})^2 + \sum_{i=0}^{z-1}(y_{t_i+1} - y_{t_i})^2\Big\}.
\]
Because the expectation is a linear operator, we can distribute the expectation inside the sums, and we have $E\{(y_t - y_{t-1})^2\} = E\{(\varepsilon_t - \varepsilon_{t-1})^2\} = 2\sigma^2$ whenever $t \notin \{t_0 + 1, t_1 + 1, \ldots, t_{z-1} + 1\}$. However, when $t \in \{t_0 + 1, t_1 + 1, \ldots, t_{z-1} + 1\}$,
\[
E\{(y_{t_i+1} - y_{t_i})^2\} = E\{(|\mu_{i+1} - \mu_i| + \varepsilon_{t_i+1} - \varepsilon_{t_i})^2\} = |\mu_{i+1} - \mu_i|^2 + E\{(\varepsilon_{t_i+1} - \varepsilon_{t_i})^2\} = |\mu_{i+1} - \mu_i|^2 + 2\sigma^2,
\]
using $E\varepsilon_j = 0$. Then,
\[
E\delta_T^2 = (4T)^{-1}\Big(2\sigma^2(T - z) + 2\sum_{i=0}^{z-1}|\mu_{i+1} - \mu_i|^2\Big)
= \frac{\sigma^2(T - z) + \sum_{i=0}^{z-1}|\mu_{i+1} - \mu_i|^2}{2T}
= \frac{\sigma^2}{2} + \frac{\sum_{i=0}^{z-1}|\mu_{i+1} - \mu_i|^2 - z\sigma^2}{2T}
\cong \frac{\sigma^2}{2}
\]
for large $T$.

(c) In the final part of the proof, we combine the results from (a) and (b) using a particular conclusion that is a result of Williams (1941), such that
\[
E\Big\{\frac{\delta_T^2}{s^2/2}\Big\} = \frac{E\{\delta_T^2\}}{E\{s^2/2\}}. \qquad (24)
\]
Then,
\[
E\Big\{\frac{\delta_T^2}{s^2/2}\Big\} - 1 = \frac{\sigma^2}{\sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2 - \sum_{i=1}^{z}\sum_{j=1}^{z} q_i q_j \mu_i \mu_j} - 1
\]
as $T \to \infty$.
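The moment and distribution results of Propositions 1–3 can be checked by simulation. The sketch below is an illustration, not the authors' code: it uses the circular convention $y_0 = y_T$ from the appendix, computes $\delta_T^2 = (4T)^{-1}\sum_t (y_t - y_{t-1})^2$, takes $s^2$ as the sample variance with divisor $T$ (an assumed normalization), and forms the studentized statistic $\sqrt{T}(\delta_T^2 - s^2/2)/(s^2/2)$ from Equation (22); the helper names `delta2` and `gyo_stat` are hypothetical.

```python
import numpy as np

def delta2(y):
    """delta_T^2 = (1/4T) * sum_t (y_t - y_{t-1})^2, with the circular
    convention y_0 = y_T used in the appendix (np.roll wraps around)."""
    T = y.size
    return np.sum((y - np.roll(y, 1)) ** 2) / (4 * T)

def gyo_stat(y):
    """Studentized statistic sqrt(T) * (delta_T^2 - s^2/2) / (s^2/2),
    cf. Equation (22); hypothetical name for the GYO_W-type statistic."""
    T = y.size
    s2 = y.var()  # sample variance with divisor T (assumed normalization)
    return np.sqrt(T) * (delta2(y) - s2 / 2) / (s2 / 2)

rng = np.random.default_rng(0)
T, R, sigma = 512, 4000, 1.0

d2 = np.empty(R)
g = np.empty(R)
for r in range(R):
    y = rng.normal(0.0, sigma, T)  # iid process under H0
    d2[r] = delta2(y)
    g[r] = gyo_stat(y)

# Proposition 1: E{delta_T^2} = sigma^2/2 and Var(delta_T^2) ~ 3*sigma^4/(4T)
print(d2.mean(), sigma**2 / 2)
print(d2.var(), 3 * sigma**4 / (4 * T))
# Proposition 3: the statistic is approximately standard normal under H0
print(g.mean(), g.std())

# Under a single mean break (Propositions 4-5), s^2 inflates relative to
# delta_T^2 and the statistic drifts away from zero:
yb = rng.normal(0.0, sigma, T)
yb[T // 2 :] += 2.0  # |mu1 - mu2| = 2 at q = 1/2
print(gyo_stat(yb))
```

With a break, the successive differences (and hence $\delta_T^2$) are barely affected, while $s^2$ absorbs roughly $q(1-q)|\mu_1 - \mu_2|^2$ of extra variation, so the statistic becomes large and negative, which is the divergence of the scaling-coefficient variance that the test exploits.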