A Wavelet Test to Identify Structural Changes at Unknown
Locations in Independently Distributed Processes
June 2011, This version: December 2012
Abstract
We propose a powerful wavelet method for detecting structural breaks in the mean of a
process. The wavelet transformation decomposes the variance of a process into its additive
low and high frequency components. If an independently distributed process is decomposed
through a one-scale wavelet transformation, the variances of the wavelet (high-frequency) and
scaling (low-frequency) coefficients are assigned equal weights. If there is a structural change in
the mean, then the sum of the squared scaling coefficients absorbs more variation, leading to
unequal weights for the variances of the wavelet and scaling coefficients. We use this feature of
the wavelet decomposition to design a statistical test for changes in the mean of an independently
distributed process. We establish the limiting null distribution of our test and demonstrate that
our test has good empirical size and substantive power relative to the existing alternatives.
JEL Classification: C12, C22
Keywords: Structural change tests, structural break tests, wavelets, maximum overlap discrete
wavelet transformation.
1. Introduction
The primary goal of this paper is to present a test that can be used to identify a structural
break in the mean of an independently distributed process at an unknown location. A wavelet
decomposition additively splits data into their local weighted averages (scaling coefficients) and
local weighted differences (wavelet coefficients). If the process has a constant mean, then the
variances of the wavelet and scaling coefficients have equal magnitude. If, however, there is a change
in the mean of the process, the variances of the wavelet and scaling coefficients diverge, with a
greater share allocated to the variance of the scaling coefficients. We use this feature of the wavelet
decomposition to design a statistical test to identify changes in the mean of an independently
distributed process.1 It is through these weighted local differences in moving windows that we
construct our statistical test for no structural break under the null hypothesis. We derive the test’s
null distribution and demonstrate that it is asymptotically normally distributed.
The wavelet transformation operates in local time windows. The length of the moving window
is determined by the length of the wavelet filter. If the filter length is two, as is the Haar filter,
then the localized differences amount to differences between two consecutive observations. If the
filter length is four, the localized difference is the weighted difference between the last two and
the first two observations in a moving window of four observations. A longer filter more accurately
captures the local structural features of the data, but boundary issues may arise. In our test, we
use a Haar filter that has a length equal to two and is a good compromise between localization in
a local time window and cost in terms of boundary treatments.
Structural breaks can be permanent or temporary in nature. If a structural break is permanent,
then there is a permanent change of indefinite duration in the mean or variance. In a temporary
break, the mean or the variance shifts from its null value but eventually reverts to its null value.
Whether such breaks are temporary or permanent in nature, they may occur abruptly or gradually.
To capture such possibilities, we use Monte Carlo simulations to model break locations through
sinusoidals and to allow for abrupt as well as gradual structural breaks. We primarily focus on
smooth, multiple structural breaks for two reasons. First, most economic and financial data exhibit
gradual structural changes in a time window, and the most abrupt structural changes are exceptions
rather than the rule. Second, our framework is all-encompassing and allows for abrupt changes.2
1 Although we primarily focus on structural breaks in the mean for an independently distributed time series, our framework can be generalized to structural breaks in the variance and to structural breaks in stationary and nonstationary time series.
2 The usefulness of modeling structural breaks with this framework was emphasized previously by Ludlow
Our Monte Carlo simulations indicate minimal empirical size distortions relative to nominal sizes
and significant power improvements compared to the existing structural break tests.
The literature on structural change tests is extensive. Chow (1960) derived F-tests for structural breaks with known break
points. Brown et al. (1975) developed cumulative sum (CUSUM) and CUSUM-squared tests that
are also applicable to cases where the time of the break is unknown. More recently, contributions
by Ploberger et al. (1989), Hansen (2002), Andrews (1993), Inclan and Tiao (1994), Andrews
et al. (1996) and Chu et al. (1996) have extended tests for the presence of breaks to account for
heteroskedasticity. Methods for estimating the number and timing of multiple break points, as in
Bai and Perron (1998, 2003a) and Altissimo and Corradi (2003), have also been developed.3 These
tests are formulated in the context of linear regression models, and they use the estimated residuals
to detect the departure of parameters from constancy.
The mechanism of our wavelet framework can be described in the following way. The wavelet
decomposition yields localized weighted differences (wavelet coefficients) that are equal in number
to the number of data points.4 We square each wavelet coefficient to obtain its magnitude and
calculate the sample average of the squared coefficients. We expect this sample average to be
equal to one-half of the variance of the process under the null when there is no structural change.5
However, the sample average of the squared wavelet coefficients should be significantly smaller
than the null average when there are one or more structural breaks. We center and standardize the
sample average of the squared wavelet coefficients to obtain the null distribution with no structural
change.
Alternative tests, such as the CUSUM and moving sum (MOSUM), approach the problem
through cumulative sums of either recursive residuals (i.e., one-step-ahead prediction errors) or
OLS residuals. They start from a given window to calculate the sequence of cumulated sums by
increasing the window length one step at a time6 and reject the null of no structural change when
the properly standardized supremum of these cumulated sums crosses the critical lines or when the
and Enders (2000), Becker et al. (2004), Becker et al. (2006), Ashley and Patterson (2010) and Stengos and Yazgan
(2012a,b).
3 Bai and Perron (2003a,b) provide versions of these tests that correct for heteroskedasticity and serial correlation.
4 In this paper, the maximum overlap discrete wavelet transformation (MODWT) is used.
5 The average of the squared wavelet coefficients and the average of the squared scaling coefficients sum to the overall variance in a one-level wavelet decomposition.
6 The MOSUM test uses moving windows but keeps their size constant.
maximum cumulated sum is sufficiently large.7 A Sup-F type test8 splits the data into two or
more blocks and calculates the restricted and unrestricted residual sums of squares for all admissible
sub-samples of a given length in order to find the supremum of the F-statistics.
In the presence of multiple structural breaks, MOSUM- and CUSUM-type tests yield smaller
maximum values of their cumulated sums of residuals (CSR). A similar argument applies to the
Sup-F test, where the difference between the sums of squared residuals (SSRs) of the restricted
and unrestricted models becomes small in the presence of multiple structural breaks. These tests
rely on larger CSRs (or larger differences between restricted and unrestricted SSRs) to detect
structural change, and smaller CSRs do not yield high power. Smaller CSRs occur because the
OLS estimation goes through the average of multiple structural breaks (in particular, when such
breaks are reverting), which yields smaller residuals that underestimate structural change locations.
Our test, conversely, always operates in a local time window with excellent frequency localization
features, where the imbalance between squared wavelet and scaling coefficients is preserved and the
power of our test is not compromised in the presence of multiple structural breaks.
Several recent papers have demonstrated the usefulness of wavelets in an econometric hypothesis-testing framework. Fan and Gençay (2010) proposed a unified wavelet
spectral approach to unit root testing by providing a spectral interpretation of the existing Von
Neumann unit root tests. Xue et al. (2010) proposed using wavelet-based jump tests to detect jump
arrival times in high-frequency financial time series data. These wavelet-based unit root, cointegration and jump tests have desirable empirical sizes and higher power relative to the existing tests.
Gençay and Gradojevic (2011) utilized wavelets for errors-in-variables estimation.
The outline of this paper is as follows. Section 2 introduces the wavelet-based structural change
test and its limiting null distribution. The Monte Carlo simulations are described in Section 3.
Section 4 concludes.
7 While the CUSUMs of the recursive residuals, properly standardized, have a distribution that tends to a standard Wiener process (Sen (1982) and Kramer et al. (1988)), Ploberger et al. (1989) show that the OLS-based CUSUMs have a distribution that tends to a Brownian bridge. Because the OLS residuals sum to zero when there is an intercept in the regression, their cumulated sum cannot be expected to drift off after a structural change, as often occurs with recursive residuals, which provides the rationale for the standard CUSUM test. No matter how large a structural shift has occurred, the cumulated OLS residuals will eventually return to the origin; Ploberger et al. (1989) provide critical lines that are parallel to the horizontal axis of the plot of cumulated residuals, unlike the positively and negatively sloped critical lines of the conventional CUSUM test.
8 The Sup-F test for structural breaks was proposed by Andrews (1993) and generalized by Bai and Perron (1998).
2. The Wavelet test for structural change
Let a time series {yt}_{t=1}^{T} evolve according to the following data generation process (DGP):

yt = µt + εt   (1)

where the εt are normally, identically and independently distributed with mean zero and variance σ²
for t = 1, . . . , T. The structural changes in the mean of yt occurring at (unknown) dates t1, t2, . . . , t_{z−1}
can be formulated as follows:

µt = µ1 for 1 ≤ t ≤ t1,
     µ2 for t1 < t ≤ t2,
     . . .
     µz for t_{z−1} < t ≤ T.   (2)
Under the null hypothesis, it is assumed that there is no structural change in the mean, H0:
µ1 = µ2 = . . . = µz = µ. Under the alternative hypothesis, H1, there may be one or more
breaks in the mean9 at unknown locations, such that there exists at least one i ∈ {1, . . . , z}
with µi ≠ µ.
2.1. Wavelet and scaling coefficients
Consider the unit scale Haar maximum overlap discrete wavelet transformation (MODWT)
of {yt}_{t=1}^{T}, where T is the number of observations. The wavelet and scaling coefficients for this
transformation are given by

Wt = (1/2)(yt − y_{t−1}),   (t = 1, 2, . . . , T; mod T)   (3)
Vt = (1/2)(yt + y_{t−1}),   (t = 1, 2, . . . , T; mod T)   (4)

where we assume that the boundary condition is circular. As we approach the end of the sample,
if enough data are not available for a given filter length, we use data from the beginning of the
sample to complete the analysis. This type of boundary treatment is innocuous, as we use a Haar
filter that has two coefficients and the data are assumed to be stationary.10
9 We use the terminology ‘structural change in the mean’ and ‘break in the mean’ interchangeably.
10 The circular boundary treatment may be problematic for trending or nonstationary data processes; therefore, using data from the beginning of the sample to complete the analysis may introduce a superficial structural change due
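As a concrete illustration, the transformation in Equations (3) and (4) can be sketched in a few lines of Python (our own sketch, not the authors' code; the helper name haar_modwt is ours):

```python
import numpy as np

def haar_modwt(y):
    """Unit-scale Haar MODWT with a circular boundary, as in Equations (3)-(4):
    W_t = (y_t - y_{t-1}) / 2 and V_t = (y_t + y_{t-1}) / 2."""
    y_lag = np.roll(y, 1)        # y_{t-1}, wrapping the first observation to y_T
    W = 0.5 * (y - y_lag)        # wavelet coefficients (local differences)
    V = 0.5 * (y + y_lag)        # scaling coefficients (local averages)
    return W, V

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 1000)
W, V = haar_modwt(y)

# Energy decomposition: sum(y^2) = sum(W^2) + sum(V^2).
print(np.allclose(np.sum(y**2), np.sum(W**2) + np.sum(V**2)))

# For a zero-mean iid process, the average squared wavelet coefficient is
# close to one-half of the variance of the process.
print(abs(np.mean(W**2) - 0.5) < 0.1)
```

The circular `np.roll` mirrors the boundary treatment described above: the first difference wraps around to the last observation.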
The wavelet coefficients, {Wt}_{t=1}^{T}, capture the behavior of {yt} in the high-frequency band
[1/2, 1], while the scaling coefficients, {Vt}_{t=1}^{T}, capture the behavior of {yt} in the low-frequency band
[0, 1/2]. Accordingly, the variance (energy) of {yt} is given by the sum of the energies of {Wt}_{t=1}^{T}
and {Vt}_{t=1}^{T}, where

Σ_{t=1}^{T} yt² = Σ_{t=1}^{T} Wt² + Σ_{t=1}^{T} Vt².

2.2. Statistical properties of wavelet tests
We propose a test statistic based on the average of the squared wavelet coefficients, namely,

δm² = (1/m) Σ_{t=j}^{m+j} Wt²   (5)

for j ∈ {1, 2, 3, . . . , T − m}, where m is an arbitrary length of the window that is used to test for
structural changes. In Equation (5), δm² is defined as the average of the squared wavelet coefficients
over an interval m. When m = T, δT² is the average of the squared wavelet coefficients for
the whole sample. In the remainder of this paper, we explore the statistical properties of δT² under
the null hypothesis of µ1 = µ2 = . . . = µz = µ in Equation (2). The following proposition
establishes the expected value and variance of δT² under the null hypothesis of no structural change.
Proposition 1. Under H0, E{δT²} = σ²/2 and Var(δT²) ≈ 3σ⁴/(4T) for large T.

Proof. The proof is in the Appendix.
Let s² = (1/T) Σ_{t=1}^{T} (yt − ȳ)² be a consistent estimator of σ². Accordingly, we center δT² and work
with δT² − s²/2, which has zero expectation under H0. In the following proposition, we derive the
variance of δT² − s²/2.
Proposition 2. Under H0, Var(δT² − s²/2) ≈ σ⁴/(4T) for large T.

Proof. The proof is in the Appendix.
to the change in level between the beginning and the end of the sample. A longer filter will require more coefficients, and it is more likely that such an analysis will be affected by the boundary coefficients. However, longer filters may have better localization properties and may be more powerful for identifying the local structural variations in the data.
Normalizing δT² − s²/2 by dividing it by its standard deviation, we propose the test statistic GYOW:

GYOW = (δT² − s²/2) / √(s⁴/(4T)) = √T (2δT²/s² − 1).   (6)
The asymptotic distribution of GYOW is given in the following proposition.

Proposition 3. Under H0, GYOW →d N(0, 1) as T → ∞, where →d denotes convergence in distribution.

Proof. The proof is in the Appendix.
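The statistic in Equation (6) and its limiting behavior can be illustrated numerically (a sketch of our own; the function name gyow_statistic is ours):

```python
import numpy as np

def gyow_statistic(y):
    """GYOW of Equation (6): sqrt(T) * (2*delta_T^2 / s^2 - 1), where delta_T^2
    is the mean squared Haar MODWT wavelet coefficient (Equation (5) with m = T)
    and s^2 is the sample variance; an illustrative sketch, not the authors' code."""
    T = y.size
    W = 0.5 * (y - np.roll(y, 1))          # circular Haar wavelet coefficients
    delta2 = np.mean(W**2)
    s2 = np.mean((y - y.mean())**2)
    return np.sqrt(T) * (2.0 * delta2 / s2 - 1.0)

rng = np.random.default_rng(1)

# Under H0 (constant mean), GYOW is approximately N(0, 1) for large T (Proposition 3).
null_stats = np.array([gyow_statistic(rng.normal(0, 1, 500)) for _ in range(2000)])
print(abs(null_stats.mean()) < 0.2 and abs(null_stats.std() - 1.0) < 0.2)

# A break in the mean pushes the statistic into the left tail (the test is left-tailed).
y_break = np.concatenate([rng.normal(0, 1, 250), rng.normal(3, 1, 250)])
print(gyow_statistic(y_break) < -3.0)
```

The break deflates the share of variance carried by the wavelet coefficients relative to s², which is exactly the imbalance the test exploits.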
The following proposition states the expected value of the test statistic under the alternative
hypothesis.

Proposition 4. Under H1, assuming a single break, let t1 be the last point after which the process
yt assumes the new mean µ2, and let q = t1/T be the fraction of the data with mean µ1, where
q ∈ {1/T, 2/T, . . . , (T−1)/T}. Then,

E{GYOW} = √T [ (σ² + |µ1 − µ2|²/(2T)) / (σ² + q(1 − q)|µ1 − µ2|²) − 1 ].

Proof. The proof is in the Appendix.
Corollary 4.1. For a given break location q and sample size T, E{GYOW} → 0 as |µ1 − µ2| → 0.

As the break disappears, the test statistic returns to its value under the null, as expected. For all
q ∈ {1/T, 2/T, . . . , (T−1)/T} and T ∈ {3, 4, . . .}, q(1 − q) > 1/(2T), so the numerator of the quotient in
Proposition 4 is always smaller than the denominator and, therefore, E{GYOW} < 0. Hence,

Corollary 4.2. For a given break with location q and size |µ1 − µ2|, E{GYOW} → −∞ as
T → ∞, and E{GYOW} ∈ (−∞, 0) for T > 2.

Corollary 4.3. For a given break with location q and sample size T, as |µ1 − µ2| → ∞,
E{GYOW} decreases and approaches −√T [1 − 1/(2T q(1 − q))].

Therefore, our test operates in (−∞, 0), and its power increases with T for a given break location
and size.
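The corollaries can be checked numerically by evaluating the expression in Proposition 4 (an illustrative sketch; expected_gyow is our hypothetical helper name):

```python
import numpy as np

def expected_gyow(mu1, mu2, q, T, sigma2=1.0):
    """E{GYOW} under a single break (Proposition 4):
    sqrt(T) * [(sigma^2 + d/(2T)) / (sigma^2 + q*(1-q)*d) - 1], d = |mu1 - mu2|^2."""
    d = abs(mu1 - mu2)**2
    return np.sqrt(T) * ((sigma2 + d / (2*T)) / (sigma2 + q*(1 - q)*d) - 1.0)

# Corollary 4.1: the expectation returns to its null value as the break vanishes.
print(expected_gyow(0.0, 0.0, q=0.5, T=200))   # 0.0

# Corollary 4.2: for a fixed break, the expectation decreases as T grows.
print(expected_gyow(0.0, 1.0, 0.5, 400) < expected_gyow(0.0, 1.0, 0.5, 100))

# Symmetry around q = 0.5, as in Figure 1.
print(np.isclose(expected_gyow(0, 1, 0.25, 200), expected_gyow(0, 1, 0.75, 200)))
```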
In Figure 1, we illustrate the behavior of the expected value of the test statistic under the
alternative for given break sizes. The graph in panel (a) illustrates the case of a small break,
whereas panel (b) refers to a larger break. For a given break size, it is apparent that an
increase in q or T carries E{GYOW} away from zero into the negative region. As T
and q increase, a larger break size corresponds to a faster decrease in the values of the test
statistic, which indicates that power is higher when the break size is larger. The behavior of
the test statistic is symmetrical around q = 0.5, where the power is at its maximum for given
values of T and the break size.
[Insert Figure 1 here]
In Figure 2, we depict the test statistic for given locations of the break. Panel (a) shows the
case of a small q, i.e., a break at the beginning of the data, whereas panel (b) illustrates
the case of a break at the center of the data. As T and |µ1 − µ2| increase, E{GYOW} decreases
and approaches −√T for both values of q shown in Figure 2. However, the expected value of our
test statistic decreases faster when q = 0.5 than when q = 0.05, for given magnitudes of
|µ1 − µ2| and T, as shown in panel (b). The behavior of the test statistic is symmetrical
around q = 0.5 in this case, too, as established previously (Figure 1).
[Insert Figure 2 here]
In the following proposition, we generalize Proposition 4 to multiple breaks.

Proposition 5. Under H1, suppose that there are z mean breaks, let ti be the last point after
which the process yt assumes the new mean µ_{i+1}, i ∈ {0, 1, . . . , z}, with t0 = 0, and let
qi = (ti − t_{i−1})/T, i ∈ {1, 2, . . . , z}, be the fractions of the data with constant means. Then,

E{GYOW} = √T [ (σ² + (2T)^{−1} Σ_{i=0}^{z−1} |µ_{i+1} − µi|²) / (σ² + Σ_{i=1}^{z} qi µi² − Σ_{i=1}^{z} Σ_{j=1}^{z} qi qj µi µj) − 1 ].

Proof. The proof is in the Appendix.
As in the case of a single break, the break locations ti and the distribution of the µi determine the
expected value of GYOW under the alternative. Suppose that µj differs from the rest, such
that µl = µ for l ≠ j and µj = µ + ε for a non-zero ε. Then,

Σ_{i=1}^{z} qi µi² = µ² Σ_{i=1}^{z} qi + 2 qj µ ε + qj ε² = µ² + 2 qj µ ε + qj ε² > µ² + 2 qj µ ε + qj² ε² = (µ + qj ε)²
= (Σ_{i=1}^{z} µ qi + qj ε)² = (Σ_{i=1}^{z} µi qi)² = Σ_{i=1}^{z} Σ_{j=1}^{z} qi qj µi µj

because 0 < qj < 1 by assumption. Hence, the test statistic approaches −∞ as T → ∞.
3. Monte Carlo simulations
Our framework allows for single and multiple structural breaks in the Monte Carlo simulations. Following Becker et al. (2004), Becker et al. (2006) and Ludlow and Enders (2000),11 we allow for smooth
and abrupt structural breaks. Because a Fourier expansion is capable of approximating absolutely
integrable functions to any desired degree of accuracy, smooth or abrupt breaks can be approximated by Fourier sinusoidals with an appropriate frequency mix. Therefore, the structural breaks
in the mean of Equation (1) are generated via the following function:12
µt ≅ ρ + α Σ_{i=1}^{n} (2i − 1)^{−1} sin(2π(2i − 1)kt / T)   (7)
where n is the number of frequencies included in the approximation, k represents a particular
frequency, and α indicates their size (amplitude). The frequency coefficient, k, alone determines the
number of breaks and whether they are temporary or permanent. Moreover, if a single frequency is
used (n = 1), the transitions tend to be smooth, whereas higher values of n facilitate the analysis
of abrupt temporary or permanent breaks.
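The break function in Equation (7) can be sketched as follows (our own illustration; the helper name mean_path and the specific parameter values are ours):

```python
import numpy as np

def mean_path(T, alpha, k, n=1, rho=0.0):
    """Break-in-mean function of Equation (7): rho + alpha * sum over i of
    sin(2*pi*(2i-1)*k*t/T) / (2i-1), a sketch of the simulated DGP."""
    t = np.arange(1, T + 1)
    mu = np.full(T, rho)
    for i in range(1, n + 1):
        mu = mu + alpha * np.sin(2 * np.pi * (2*i - 1) * k * t / T) / (2*i - 1)
    return mu

T = 200
smooth = mean_path(T, alpha=1.0, k=0.9, n=1)    # smooth transitions (n = 1)
abrupt = mean_path(T, alpha=1.0, k=0.9, n=128)  # same break locations, sharper transitions

# With many Fourier terms the partial sum approximates a square wave, so the
# abrupt path jumps between two plateaus instead of drifting smoothly.
print(abrupt.max() > 0.5 and abrupt.min() < -0.5)
```

Varying k moves and multiplies the breaks, while increasing n sharpens them, matching the sample paths described below for Figure 3.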
In Figure 3, the sample paths of yt are displayed for permanent and temporary breaks. In
the top panel, where k = 0.9, we observe a single permanent structural break, which is permanent
in the sense that the series do not return to their initial mean levels. In the second panel, where
k = 1.4, the series are subject to temporary single breaks. The remaining panels depict multiple
breaks that are both temporary and abrupt in nature. Moreover, Figure 3 illustrates sample paths
of yt for values of n greater than one, where the breaks are more abrupt. When n assumes a value as high
as 4 (the third column), the breaks become significantly more abrupt. For higher values of n, such
as 128, the structural breaks occur more suddenly.
11 See also Ashley and Patterson (2010) and Stengos and Yazgan (2012a,b).
12 We approximate the discrete square function in Equation (2) using the continuous function in Equation (7).
[Insert Figure 3 here]
We compare the performance of the GYOW test against three well-known structural break test
statistics: the Sup-F, OLS-based CUSUM, and OLS-based MOSUM tests. The Sup-F test was first
proposed by Andrews (1993) and was generalized by Bai and Perron (1998). It can be obtained by
calculating a series of F-statistics over all potential change points in the data and evaluating their
supremum. The Sup-F is a regression-based test. It tests the null hypothesis of no break in the
regression equation at time i against the alternative of two different regressions over the intervals
(1, i) and (i + 1, T), where i is a point in (T0, T − T0). The parameter T0 is generally set to h × T,
where h ∈ (0, 1) is a bandwidth parameter. Hansen (1997) provided an approximation for the
asymptotic p-values.
The OLS-based CUSUM test statistic of Ploberger and Kramer (1992) is given by13

SC(t) = (1/(σ̂√T)) Σ_{i=1}^{⌊Tt⌋} ûi,   0 ≤ t ≤ 1,   (8)
(8)
where the ûi are the OLS residuals from the model under the null and σ̂ is the standard deviation of the
estimated residuals. Instead of using cumulative sums of the same residuals, another method to
detect a structural change is to analyze moving sums of the residuals. The OLS-based MOSUM
test of Chu et al. (1995) considers this possibility. This test does not cumulate all of the
residuals up to a certain time t; instead, it uses the sum of a fixed number of residuals in a data
window whose size is determined by the bandwidth parameter, h. The test statistic is computed
as
SM(t) = (1/(σ̂√T)) Σ_{i=⌊NT t⌋+1}^{⌊NT t⌋+⌊Th⌋} ûi,   (0 ≤ t ≤ 1 − h)   (9)

where NT = (T − ⌊Th⌋)/(1 − h).
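For a mean-only model, the two residual processes can be sketched as follows (our own simplified illustration: the MOSUM path below is a plain moving sum of ⌊Th⌋ residuals, omitting the NT time rescaling of Equation (9)):

```python
import numpy as np

def ols_cusum_path(y):
    """OLS-based CUSUM process of Equation (8) for a mean-only model:
    S_C(t) = (1 / (sigma_hat * sqrt(T))) * sum of the first floor(T*t) residuals."""
    u = y - y.mean()                      # OLS residuals under the null
    sigma = u.std()                       # estimated residual standard deviation
    return np.cumsum(u) / (sigma * np.sqrt(y.size))

def ols_mosum_path(y, h=0.15):
    """Moving sums of floor(T*h) OLS residuals in the spirit of Equation (9)
    (simplification: the N_T rescaling is omitted)."""
    T = y.size
    u = y - y.mean()
    sigma = u.std()
    w = int(np.floor(T * h))              # window size floor(T*h)
    c = np.concatenate([[0.0], np.cumsum(u)])
    return (c[w:] - c[:-w]) / (sigma * np.sqrt(T))

rng = np.random.default_rng(2)
y0 = rng.normal(0, 1, 400)                                            # no break
y1 = np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 200)])   # mean break

# A break inflates the maximum excursion of both residual processes.
print(np.abs(ols_cusum_path(y1)).max() > np.abs(ols_cusum_path(y0)).max())
print(np.abs(ols_mosum_path(y1)).max() > np.abs(ols_mosum_path(y0)).max())
```

In practice these paths are compared against the boundary curves λCUSUM(t, α) and λMOSUM(t, h, α) discussed below.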
We reject the “no structural break” null hypothesis when SC(t) exceeds the critical
boundary curve, λCUSUM(t, α), for at least one t. The same rejection criterion applies to the
MOSUM test for the statistic SM(t) and the boundary λMOSUM(t, h, α). The boundaries λCUSUM(t, α)
and λMOSUM(t, h, α) are calculated according to Zeileis (2001, 2005).
13 ⌊Tt⌋ denotes the largest integer smaller than or equal to Tt.
The wavelet test deposits the structural breaks into the scaling coefficient variance in a local
window. As the number of structural breaks increases, a larger percentage of the variance is
allocated to the scaling coefficients and the contribution of the wavelet variance to the overall
variance becomes marginal. The sum of the variances of the wavelet and scaling coefficients is
equal to the overall variance. Because the scaling coefficient variance becomes the driver of the
overall variance in the presence of multiple structural breaks, we use the marginalization of the
wavelet coefficient variance as a test for the presence of structural breaks.
The MOSUM and wavelet tests both operate in local windows. One possible drawback of the
MOSUM test is that it assigns the same weight to each observation in its local window. The wavelet
test, however, uses time-frequency optimized weights to sum the information.
The empirical size and power properties of the wavelet test statistic, GYOW, are
calculated for Equation (7) under various parameter sets of α, k, and T, assuming ρ = 0. The
experiment is run first for smooth structural breaks by setting n = 1, and it is then extended to
abrupt breaks for higher values of n up to 1024. Because the GYOW test is a left-tailed test, we use
one-sided p-values in our Monte Carlo simulations. The α values are allowed to vary over the range
0.2, 0.4, . . . , 2. The parameter k varies between 0 and 5, in increments of ∆k = 0.05
between 0.0 and 2.0 and ∆k = 0.1 between 2.1 and 5.0. We consider cases where the data length,
T, is 50, 100, and 200. For each combination of α, T, and k, we run 10,000 replications and use the
same pseudo-randomly generated identically and independently distributed standard normal series to
simulate εt. The rejection frequencies are calculated at the 1, 5, and 10 percent levels of significance.
We report size-corrected empirical power calculations.14
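A scaled-down version of this experiment can be sketched as follows (our own illustration with fewer replications, rejecting when the statistic falls below the one-sided 5% critical value of the limiting N(0, 1) distribution; the helper names are ours):

```python
import numpy as np

def gyow(y):
    """GYOW statistic of Equation (6), left-tailed."""
    W = 0.5 * (y - np.roll(y, 1))            # circular Haar wavelet coefficients
    s2 = np.mean((y - y.mean())**2)          # sample variance
    return np.sqrt(y.size) * (2.0 * np.mean(W**2) / s2 - 1.0)

def rejection_rate(amp, k, T, n=1, reps=2000, seed=3):
    """Fraction of replications rejecting H0 at the one-sided 5% level,
    mirroring (on a smaller scale) the paper's experiment with rho = 0."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    mu = sum(amp * np.sin(2*np.pi*(2*i - 1)*k*t/T) / (2*i - 1)
             for i in range(1, n + 1))
    crit = -1.645                            # one-sided 5% N(0, 1) critical value
    return np.mean([gyow(mu + rng.normal(0, 1, T)) < crit for _ in range(reps)])

size = rejection_rate(amp=0.0, k=0.0, T=200)    # empirical size, near the 5% nominal level
power = rejection_rate(amp=1.2, k=2.0, T=200)   # power under multiple smooth breaks
print(size < 0.10, power > 0.90)
```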
[Insert Table 1 and Figure 4 here]
For the case of smooth breaks (n = 1), the results of the Monte Carlo experiments are illustrated
in Table 1 and Figure 4. In Table 1, we provide the size-adjusted rejection frequencies of the Sup-F,
OLS-based CUSUM, OLS-based MOSUM, and GYOW test statistics for T = 50. The values in the
first panel, where k = 0 corresponds to the case of a pure white-noise process, are the size values
of the tests. The remaining values reported in the subsequent panels are the size-corrected powers of
the test statistics. Each panel corresponds to selected α values for different values of k.
14 For brevity, we report a subset of our Monte Carlo simulation results. Complete results are available upon request.
In Figure 4, we plot the size-corrected rejection frequencies of the test statistics for a continuum
of k values when T = 50.15 As in Table 1, each panel corresponds to selected α values.
The values at the intersection with the y-axis are the size values of the tests. As k varies along
the x-axis, each line traces the size-adjusted power of that test statistic.
The information contained in the table and figure clearly shows that, for all parameters in the
region where k > 1, GYOW has more empirical power than the other tests. Conversely, for small
values of α and in a small region in which k is smaller than or slightly larger than 1, the Sup-F and, to
a lesser extent, the MOSUM tests perform better than GYOW, although they lose power quickly
as k increases. This implies that the alternative tests do relatively better than our test in terms of
power when there is a single break. Otherwise, our test has higher relative power than the other
tests for higher k values, i.e., when there are multiple structural breaks. As the number of smooth
structural breaks in the data increases, GYOW performs much better as a structural break test.
To evaluate the performance of our test over a longer time horizon, we set T = 200, as in Figure
5, and obtain results that are similar to those reported above. We observe similar performance for
the wavelet test as for the Sup-F and MOSUM tests, especially for higher values of α, e.g., in the
bottom panel where k is close to zero. These results indicate that the power properties of our test
are similar to those of the other tests, even in the case of single breaks. Conversely, when multiple
breaks span a longer time horizon, the other tests, especially the MOSUM test, do not lose their
power as quickly as k increases. The MOSUM test keeps its high power up to very large values of
k, i.e., for reasonably large numbers of breaks. In conclusion, when T is large, the MOSUM test
and our test perform similarly in general, which makes sense because both tests use local
information.
[Insert Figure 5 here]
We also consider the size and power properties of the tests when the breaks are abrupt in nature.
In Figures 6 and 7, we illustrate the results when n = 128.16 For abrupt breaks, our test not only
preserves its empirical size but also gains power relative to the smooth case. In the case of
a single break, except for small values of α and small T, our test becomes competitive with
the Sup-F test and the other tests in terms of power. Even for small T with multiple breaks, our test
appears to be the best performer. As in the smooth break case, the other tests also become
more powerful when T gets larger, and our test ceases to be the best performer except in cases
with large numbers of structural breaks.
15 The size correction is for a 5 percent level of significance.
16 As mentioned above, we increased n to 1024 and obtained qualitatively similar results when the breaks were smooth.
[Insert Figure 6 and Figure 7 here]
4. Conclusions
We develop a structural change test based on a maximum overlap wavelet transformation, derive
its limiting null distribution, and compare its empirical size and power properties to those of a
number of alternative tests. Our results indicate that the wavelet-based test has higher power than
the alternative tests when there are multiple structural breaks at unknown locations. In general,
the power of the competing tests starts to deteriorate after a single break, regardless of whether the
break is abrupt or smooth. In contrast, the wavelet test maintains uniformly high power across
all numbers of breaks. However, the MOSUM test attains the same power as the wavelet test for
relatively large data sets. We attribute this result to the fact that both the MOSUM and wavelet
tests use local information in a given window.
Table 1: Size corrected powers of the GYOW, Sup-F, CUSUM, and MOSUM tests for smooth breaks

              GYOW                            Sup-F                           CUSUM                           MOSUM
T             50        100       200         50        100       200         50        100       200         50        100       200
Significance  1%   5%   1%   5%   1%   5%     1%   5%   1%   5%   1%   5%     1%   5%   1%   5%   1%   5%     1%   5%   1%   5%   1%   5%
(Size) k=0.0  0.01 0.04 0.01 0.06 0.01 0.05   0.01 0.06 0.01 0.06 0.01 0.05   0.00 0.03 0.00 0.03 0.01 0.04   0.00 0.01 0.00 0.01 0.00 0.02
α = 0.8
k=0.6         0.71 0.88 0.94 0.99 1.00 1.00   0.99 1.00 1.00 1.00 1.00 1.00   0.99 1.00 1.00 1.00 1.00 1.00   0.57 0.83 1.00 1.00 1.00 1.00
k=1.0         0.73 0.89 0.95 0.99 1.00 1.00   0.83 0.95 1.00 1.00 1.00 1.00   0.83 0.95 1.00 1.00 1.00 1.00   0.57 0.84 0.99 1.00 1.00 1.00
k=1.2         0.79 0.92 0.97 0.99 1.00 1.00   0.39 0.79 0.98 1.00 1.00 1.00   0.18 0.68 0.94 1.00 1.00 1.00   0.61 0.86 1.00 1.00 1.00 1.00
k=1.6         0.71 0.87 0.95 0.98 1.00 1.00   0.18 0.45 0.71 0.89 0.99 1.00   0.03 0.22 0.41 0.77 0.93 1.00   0.44 0.72 0.97 1.00 1.00 1.00
k=2.0         0.73 0.91 0.95 0.98 1.00 1.00   0.13 0.34 0.63 0.81 0.98 1.00   0.04 0.20 0.36 0.68 0.88 0.99   0.41 0.74 0.97 1.00 1.00 1.00
k=4.0         0.65 0.84 0.95 0.99 1.00 1.00   0.00 0.02 0.03 0.10 0.13 0.36   0.00 0.02 0.01 0.08 0.05 0.24   0.05 0.15 0.21 0.51 0.75 0.98
k=5.0         0.65 0.84 0.95 0.99 1.00 1.00   0.00 0.02 0.03 0.10 0.13 0.36   0.00 0.02 0.01 0.08 0.05 0.24   0.05 0.15 0.21 0.51 0.75 0.98
α = 1.2
k=0.6         0.99 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   0.91 0.99 1.00 1.00 1.00 1.00
k=1.0         1.00 1.00 1.00 1.00 1.00 1.00   0.99 1.00 1.00 1.00 1.00 1.00   0.99 1.00 1.00 1.00 1.00 1.00   0.90 0.99 1.00 1.00 1.00 1.00
k=1.2         1.00 1.00 1.00 1.00 1.00 1.00   0.76 0.99 1.00 1.00 1.00 1.00   0.41 0.98 1.00 1.00 1.00 1.00   0.91 0.99 1.00 1.00 1.00 1.00
k=1.6         0.99 1.00 1.00 1.00 1.00 1.00   0.35 0.73 0.97 1.00 1.00 1.00   0.06 0.39 0.79 0.98 1.00 1.00   0.73 0.97 1.00 1.00 1.00 1.00
k=2.0         1.00 1.00 1.00 1.00 1.00 1.00   0.23 0.58 0.93 0.99 1.00 1.00   0.05 0.30 0.72 0.93 1.00 1.00   0.72 0.96 1.00 1.00 1.00 1.00
k=4.0         0.99 1.00 1.00 1.00 1.00 1.00   0.00 0.01 0.03 0.11 0.27 0.59   0.00 0.01 0.01 0.07 0.07 0.39   0.04 0.16 0.31 0.71 0.99 1.00
k=5.0         0.65 0.84 0.95 0.99 1.00 1.00   0.00 0.02 0.03 0.10 0.13 0.36   0.00 0.02 0.01 0.08 0.05 0.24   0.05 0.15 0.21 0.51 0.75 0.98
α = 1.6
k=0.6         1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   0.99 1.00 1.00 1.00 1.00 1.00
k=1.0         1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   0.98 1.00 1.00 1.00 1.00 1.00
k=1.2         1.00 1.00 1.00 1.00 1.00 1.00   0.96 1.00 1.00 1.00 1.00 1.00   0.63 1.00 1.00 1.00 1.00 1.00   0.99 1.00 1.00 1.00 1.00 1.00
k=1.6         1.00 1.00 1.00 1.00 1.00 1.00   0.48 0.87 1.00 1.00 1.00 1.00   0.08 0.52 0.96 1.00 1.00 1.00   0.90 1.00 1.00 1.00 1.00 1.00
k=2.0         1.00 1.00 1.00 1.00 1.00 1.00   0.34 0.77 0.99 1.00 1.00 1.00   0.05 0.39 0.90 1.00 1.00 1.00   0.91 1.00 1.00 1.00 1.00 1.00
k=4.0         1.00 1.00 1.00 1.00 1.00 1.00   0.00 0.00 0.02 0.10 0.38 0.76   0.00 0.00 0.00 0.05 0.07 0.51   0.02 0.12 0.35 0.82 1.00 1.00
k=5.0         0.65 0.84 0.95 0.99 1.00 1.00   0.00 0.02 0.03 0.10 0.13 0.36   0.00 0.02 0.01 0.08 0.05 0.24   0.05 0.15 0.21 0.51 0.75 0.98
α = 2.0
k=0.6         1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00
k=1.0         1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00
k=1.2         1.00 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00   0.79 1.00 1.00 1.00 1.00 1.00   1.00 1.00 1.00 1.00 1.00 1.00
k=1.6         1.00 1.00 1.00 1.00 1.00 1.00   0.62 0.94 1.00 1.00 1.00 1.00   0.09 0.65 0.99 1.00 1.00 1.00   0.96 1.00 1.00 1.00 1.00 1.00
k=2.0         1.00 1.00 1.00 1.00 1.00 1.00   0.44 0.87 1.00 1.00 1.00 1.00   0.04 0.46 0.98 1.00 1.00 1.00   0.96 1.00 1.00 1.00 1.00 1.00
k=4.0         1.00 1.00 1.00 1.00 1.00 1.00   0.00 0.00 0.02 0.08 0.48 0.88   0.00 0.00 0.00 0.03 0.05 0.58   0.01 0.07 0.33 0.88 1.00 1.00
k=5.0         0.65 0.84 0.95 0.99 1.00 1.00   0.00 0.02 0.03 0.10 0.13 0.36   0.00 0.02 0.01 0.08 0.05 0.24   0.05 0.15 0.21 0.51 0.75 0.98
Notes: The numbers represent the fraction of cases in which H0 is rejected in 10,000 replications for each test (size-corrected powers). The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0 and n = 1, and εt is iid N(0, 1). The bandwidth parameter h is taken as 0.15 for the Sup-F and MOSUM tests. The Sup-F test, due to Andrews (1993), does not use a heteroskedasticity and autocorrelation consistent (HAC) kernel.
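Rejection frequencies of this kind can be reproduced with a short Monte Carlo exercise. The sketch below is illustrative only: Equation (7) and the paper's exact critical values are not reproduced in this excerpt, so it computes the empirical size of the GYO_W statistic under H0 (no break), using the circular (mod T) differencing and divisor-T variance from the Appendix, and, as a stand-in, the asymptotic N(0,1) left-tail 5% critical value (a mean break drags the statistic toward the left tail, per Propositions 4 and 5).

```python
import numpy as np

def gyo_w(y):
    """GYO_W = sqrt(T) * (delta_T^2 - s^2/2) / (s^2/2), where delta_T^2 is
    (4T)^{-1} * sum_t (y_t - y_{t-1})^2 with circular indexing (y_0 = y_T)
    and s^2 is the sample variance with divisor T, as in the Appendix."""
    y = np.asarray(y, dtype=float)
    T = y.size
    delta2 = np.sum((y - np.roll(y, 1)) ** 2) / (4.0 * T)
    s2 = np.mean((y - y.mean()) ** 2)
    return np.sqrt(T) * (delta2 - s2 / 2.0) / (s2 / 2.0)

# Empirical size under H0: iid N(0,1) data, no break. Reject when
# GYO_W < -1.645 (one-sided 5% level, an assumption for this sketch).
rng = np.random.default_rng(0)
T, reps = 200, 2000
stats = np.array([gyo_w(rng.normal(0.0, 1.0, T)) for _ in range(reps)])
size = np.mean(stats < -1.645)
print(round(size, 3))  # roughly the nominal 0.05
```

With a structural break in the mean, s^2 inflates while delta_T^2 does not, so the statistic becomes large and negative and the rejection frequency rises toward the power figures in the table.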
(a) |µ1 − µ2| = 1 (b) |µ1 − µ2| = 3
Figure 1: Behavior of E{GYO_W} for a given break size.
(a) q = 0.05 (b) q = 0.5
Figure 2: Behavior of E{GYO_W} for a given break location.
[Figure 3 panels: sample paths for k ∈ {0.9, 1.4, 1.9, 2.4, 4.9} and n ∈ {1, 2, 4, 128}, plotted against the index t = 0, …, 200 with the y-axis running from −1.5 to 1.5.]
Figure 3: Sample paths of smooth/abrupt and permanent/temporary breaks
Notes: The dashed line is µt , as given in Equation (7) with α = 1 and ρ = 0, and the solid line is yt = µt + εt , where εt
is iid ∼ N (0, 0.25), and T = 200.
[Figure 4 panels: rejection frequencies against k ∈ [0, 5] for α ∈ {0.8, 1.2, 1.6, 2} at T = 50, with curves for the GYO_W, Sup-F, CUSUM, and MOSUM tests.]
Figure 4: Size-corrected powers of the GYO_W, Sup-F, CUSUM, and MOSUM tests for smooth breaks (n = 1, T = 50).
Notes: The x-axis shows the k values in Equation (7), and the y-axis shows size-corrected (at the 5% level) empirical powers. The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0, and εt is iid N(0, 1). The bandwidth parameter h is taken as 0.15 for the Sup-F and MOSUM tests. The Sup-F test, due to Andrews (1993), does not use a heteroskedasticity and autocorrelation consistent (HAC) kernel.
[Figure 5 panels: rejection frequencies against k ∈ [0, 5] for α ∈ {0.8, 1.2, 1.6, 2} at T = 200, with curves for the GYO_W, Sup-F, CUSUM, and MOSUM tests.]
Figure 5: Size-corrected powers of the GYO_W, Sup-F, CUSUM, and MOSUM tests for smooth breaks (n = 1, T = 200).
Notes: The x-axis shows the k values in Equation (7), and the y-axis shows size-corrected (at the 5% level) empirical powers. The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0, and εt is iid N(0, 1). The bandwidth parameter h is taken as 0.15 for the Sup-F and MOSUM tests. The Sup-F test, due to Andrews (1993), does not use a heteroskedasticity and autocorrelation consistent (HAC) kernel.
[Figure 6 panels: rejection frequencies against k ∈ [0, 5] for α ∈ {0.8, 1.2, 1.6, 2} at T = 50, with curves for the GYO_W, Sup-F, CUSUM, and MOSUM tests.]
Figure 6: Size-corrected powers of the GYO_W, Sup-F, CUSUM, and MOSUM tests for abrupt breaks (n = 128, T = 50).
Notes: The x-axis shows the k values in Equation (7), and the y-axis shows size-corrected (at the 5% level) empirical powers. The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0, and εt is iid N(0, 1). The bandwidth parameter h is taken as 0.15 for the Sup-F and MOSUM tests. The Sup-F test, due to Andrews (1993), does not use a heteroskedasticity and autocorrelation consistent (HAC) kernel.
[Figure 7 panels: rejection frequencies against k ∈ [0, 5] for α ∈ {0.8, 1.2, 1.6, 2} at T = 200, with curves for the GYO_W, Sup-F, CUSUM, and MOSUM tests.]
Figure 7: Size-corrected powers of the GYO_W, Sup-F, CUSUM, and MOSUM tests for abrupt breaks (n = 128, T = 200).
Notes: The x-axis shows the k values in Equation (7), and the y-axis shows size-corrected (at the 5% level) empirical powers. The DGP is yt = µt + εt, where µt is given in Equation (7) with ρ = 0, and εt is iid N(0, 1). The bandwidth parameter h is taken as 0.15 for the Sup-F and MOSUM tests. The Sup-F test, due to Andrews (1993), does not use a heteroskedasticity and autocorrelation consistent (HAC) kernel.
References
Altissimo, F. and V. Corradi (2003). Strong rules for detecting the number of breaks in a time
series. Journal of Econometrics 117 (2), 207–244.
Andrews, D. W. K. (1993). Tests for parameter instability and structural change with unknown
change point. Econometrica 61 (4), 821–856.
Andrews, D. W. K., I. Lee, and W. Ploberger (1996). Optimal change-point tests for normal linear
regression. Journal of Econometrics 70 (1), 9–38.
Ashley, R. A. and D. M. Patterson (2010). Apparent long memory in time series as an artifact of a
time-varying mean: Considering alternatives to the fractionally integrated model. Macroeconomic
Dynamics 14 (S1), 59–87.
Bai, J. and P. Perron (1998). Estimating and testing linear models with multiple structural changes.
Econometrica 66 (1), 47–78.
Bai, J. and P. Perron (2003a). Computation and analysis of multiple structural change models.
Journal of Applied Econometrics 18 (2), 1–22.
Bai, J. and P. Perron (2003b). Critical values for multiple structural change tests. The Econometrics
Journal 6 (3), 72–78.
Becker, R., W. Enders, and S. Hurn (2004). A general test for time dependence in parameters.
Journal of Applied Econometrics 19 (7), 899–906.
Becker, R., W. Enders, and J. Lee (2006). A stationarity test in the presence of an unknown number
of smooth breaks. Journal of Time Series Analysis 27 (3), 381–409.
Brown, R. L., J. Durbin, and J. M. Evans (1975). Techniques for testing the constancy of regression
relationships over time. Journal of the Royal Statistical Society. Series B (Methodological) 37 (2),
149–192.
Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28 (3), 591–605.
Chu, C.-S. J., K. Hornik, and C.-M. Kuan (1995). MOSUM tests for parameter constancy. Biometrika 82 (3), 603–617.
Chu, C.-S. J., M. Stinchcombe, and H. White (1996). Monitoring structural change. Econometrica 64 (5), 1045–1065.
Conniffe, D. and J. E. Spencer (2001). When moments of ratios are ratios of moments. Journal of
the Royal Statistical Society. Series D (The Statistician) 50 (2), 161–168.
Fan, Y. and R. Gençay (2010). Unit root tests with wavelets. Econometric Theory 26, 1305–1331.
Gençay, R. and N. Gradojevic (2011). Errors-in-variables estimation with wavelets. Journal of
Statistical Computation and Simulation 11 (8), 1545–1564.
Hansen, B. E. (1997). Approximate asymptotic p values for structural-change tests. Journal of
Business & Economic Statistics 15 (1), 60–67.
Hansen, B. E. (2002). Tests for parameter instability in regressions with I(1) processes. Journal of
Business & Economic Statistics 20 (1), 45–59.
Inclan, C. and G. C. Tiao (1994). Use of cumulative sums of squares for retrospective detection of
changes of variance. Journal of the American Statistical Association 89 (427), 913–923.
Kramer, W., W. Ploberger, and R. Alt (1988). Testing for structural change in dynamic models.
Econometrica 56, 1355–1369.
Ludlow, J. and W. Enders (2000). Estimating non-linear ARMA models using Fourier coefficients.
International Journal of Forecasting 16 (3), 333–347.
Ploberger, W. and W. Krämer (1992). The CUSUM test with OLS residuals. Econometrica 60 (2), 271–285.
Ploberger, W., W. Krämer, and K. Kontrus (1989). A new test for structural stability in the linear
regression model. Journal of Econometrics 40 (2), 307–318.
Sen, P. K. (1982). Invariance principles for recursive residuals. The Annals of Statistics 10, 307–312.
Stengos, T. and M. E. Yazgan (2012a). Persistence in convergence. Macroeconomic Dynamics,
Forthcoming.
Stengos, T. and M. E. Yazgan (2012b). Persistence in real exchange rate convergence. Studies in
Nonlinear Dynamics and Econometrics, Forthcoming.
White, H. (2001). Asymptotic Theory for Econometricians. Academic Press, San Diego.
Williams, J. D. (1941). Moments of the ratio of the mean square successive difference to the mean
square difference in samples from a normal universe. The Annals of Mathematical Statistics 12 (2),
239–241.
Xue, Y., R. Gençay, and S. Fagan (2010). Testing for jump arrivals in financial time series. Technical
report, Department of Economics, Simon Fraser University.
Zeileis, A. (2001). strucchange: Testing for structural change in linear regression relationships. R
News 1 (3), 8–11.
Zeileis, A. (2005). A unified approach to structural change tests based on ML scores, F statistics,
and OLS residuals. Econometric Reviews 24 (4), 445–466.
Appendix

Proposition 1. Under H0, $E\{\delta_T^2\} = \dfrac{\sigma^2}{2}$ and $\mathrm{Var}(\delta_T^2) = \dfrac{3\sigma^4}{4T}$ for large $T$.
Proof. Without any loss of generality, we set $\mu = 0$ under H0 because $\sigma^2$ and $s^2$ are invariant under a mean change. For the remaining part of the proof, we assume that $t \in \{1, 2, \ldots, T;\ \mathrm{mod}\ T\}$. By expanding $\delta_T^2$ as the sum of the $W_t^2$, we have
\[
E\delta_T^2 = E\Big\{\frac{1}{T}\sum_{t=1}^{T} W_t^2\Big\} = \frac{1}{T}\sum_{t=1}^{T} E W_t^2 = \frac{1}{T}\, T\, E(W_1^2) = E(W_1^2)
= E\Big\{\Big(\frac{y_1 - y_0}{2}\Big)^2\Big\} = \frac{1}{4} E\{y_1^2 + y_0^2 - 2 y_1 y_0\}
\]
\[
= \frac{1}{4}\big[E(y_1^2) + E(y_0^2) - 2E(y_1 y_0)\big] = \frac{1}{4}\, 2\sigma^2 = \frac{1}{2}\sigma^2.
\]
As for the variance,
\[
\mathrm{Var}(\delta_T^2) = \mathrm{Var}\Big(T^{-1}\sum_{t=j}^{T+j-1} W_t^2\Big)
= T^{-2}\sum_{t=j}^{T+j-1} \mathrm{Var}(W_t^2) + 2T^{-2}\sum_{t=j}^{T+j-1}\sum_{s=t+1}^{T+j-1} \mathrm{Cov}(W_t^2, W_s^2).
\]
We also have that $\mathrm{Cov}(W_t^2, W_{t+k}^2) = 0$ for $k \geq 2$ (see Lemma 1.1). Then,
\[
\mathrm{Var}(\delta_T^2) = T^{-2}\sum_{t=j}^{T+j-1} \mathrm{Var}(W_t^2) + 2T^{-2}\sum_{t=j+1}^{T+j-1} \mathrm{Cov}(W_{t-1}^2, W_t^2).
\]
Writing the probabilistic expression for $\mathrm{Cov}(W_{t-1}^2, W_t^2)$:
\[
\mathrm{Cov}(W_{t-1}^2, W_t^2) = E\{(W_{t-1}^2 - EW_{t-1}^2)(W_t^2 - EW_t^2)\} = E\{W_{t-1}^2 W_t^2\} - EW_{t-1}^2\, EW_t^2,
\]
where $EW_t^2$, the unconditional mean of $W_t^2$, is $\sigma^2/2$, $\forall t$. Then, expanding $E\{W_{t-1}^2 W_t^2\}$:
\[
E\{W_{t-1}^2 W_t^2\} = E\Big\{\Big(\frac{y_{t-1}-y_{t-2}}{2}\Big)^2 \Big(\frac{y_t - y_{t-1}}{2}\Big)^2\Big\}
= 2^{-4} E\{(y_{t-1}^2 - 2y_{t-1}y_{t-2} + y_{t-2}^2)(y_t^2 - 2y_t y_{t-1} + y_{t-1}^2)\}
\]
\[
= 2^{-4} E\{y_{t-1}^2 y_t^2 - 2y_{t-1}^3 y_t + y_{t-1}^4 - 2y_{t-1}y_{t-2}y_t^2 + 4y_{t-1}^2 y_t y_{t-2} - 2y_{t-1}^3 y_{t-2} + y_{t-2}^2 y_t^2 - 2y_{t-1}y_{t-2}^2 y_t + y_{t-2}^2 y_{t-1}^2\},
\]
because $y_t$ and $y_{t-1}$ are independent $\forall t$. We also know that under H0, the first four raw moments of $y_t$ are
\[
E\{y_t\} = 0, \quad E\{y_t^2\} = \sigma^2, \quad E\{y_t^3\} = 0, \quad E\{y_t^4\} = 3\sigma^4, \quad \forall t.
\]
Then,
\[
2^4\, E\{W_{t-1}^2 W_t^2\} = 3(\sigma^2)^2 + 3\sigma^4 = 6\sigma^4 \;\Rightarrow\; E\{W_{t-1}^2 W_t^2\} = \frac{3}{8}\sigma^4.
\]
Henceforth,
\[
\mathrm{Cov}(W_{t-1}^2, W_t^2) = \frac{3}{8}\sigma^4 - \frac{1}{4}\sigma^4 = \frac{1}{8}\sigma^4,
\]
and
\[
\mathrm{Var}(\delta_T^2) = T^{-2}\, T\, \mathrm{Var}(W_t^2) + 2T^{-2}(T-1)\, \mathrm{Cov}(W_{t-1}^2, W_t^2)
= \frac{\sigma^4}{T}\Big(\frac{1}{2} + \frac{T-1}{4T}\Big) \longrightarrow \frac{3\sigma^4}{4T} \quad \text{as } T \text{ gets larger},
\]
because $\mathrm{Var}(W_t^2) = \sigma^4/2$ (see Lemma 1.2).
Lemma 1.1. Under H0, $\mathrm{Cov}(W_t^2, W_{t+k}^2) = 0$ for $k \geq 2$.

Proof.
\[
\mathrm{Cov}(W_t^2, W_{t+k}^2) = E\{(W_t^2 - EW_t^2)(W_{t+k}^2 - EW_{t+k}^2)\} = E\{W_t^2 W_{t+k}^2\} - EW_t^2\, EW_{t+k}^2
\]
\[
= E\Big\{\Big(\frac{y_t - y_{t-1}}{2}\Big)^2 \Big(\frac{y_{t+k} - y_{t+k-1}}{2}\Big)^2\Big\} - \Big(\frac{\sigma^2}{2}\Big)\Big(\frac{\sigma^2}{2}\Big)
\]
\[
= 2^{-4} E\{(y_t^2 - 2y_t y_{t-1} + y_{t-1}^2)(y_{t+k}^2 - 2y_{t+k}y_{t+k-1} + y_{t+k-1}^2)\} - \frac{\sigma^4}{4}
\]
\[
= 2^{-4} E\{y_t^2 y_{t+k}^2 - 2y_t^2 y_{t+k}y_{t+k-1} + y_t^2 y_{t+k-1}^2 - 2y_t y_{t-1} y_{t+k}^2 + 4y_t y_{t-1} y_{t+k} y_{t+k-1} - 2y_t y_{t-1} y_{t+k-1}^2 + y_{t-1}^2 y_{t+k}^2 - 2y_{t-1}^2 y_{t+k}y_{t+k-1} + y_{t-1}^2 y_{t+k-1}^2\} - \frac{\sigma^4}{4}.
\]
Because $y_s$ and $y_r$ are independent whenever $s \neq r$, we have $E\{y_s^m y_r^n\} = E\{y_s^m\}E\{y_r^n\}$ for all positive integers $m$ and $n$. We know that $E\{y_s\} = 0$ and $E\{y_s^2\} = \sigma^2$, $\forall s \in \{1, \ldots, T\}$, and it follows that
\[
2^4\, \mathrm{Cov}(W_t^2, W_{t+k}^2) = \sigma^2\sigma^2 + \sigma^2\sigma^2 + \sigma^2\sigma^2 + \sigma^2\sigma^2 - 4\sigma^4 = 0.
\]
Lemma 1.2. Under H0, $\mathrm{Var}(W_t^2) = \dfrac{\sigma^4}{2}$.

Proof. Because $y_t - y_{t-1} = u_t - u_{t-1}$, $\mu$ is eliminated in $W_t = \dfrac{u_t - u_{t-1}}{2}$. So,
\[
\mathrm{Var}(W_t^2) = \mathrm{Var}\Big(\Big(\frac{u_t - u_{t-1}}{2}\Big)^2\Big),
\]
\[
2^4\, \mathrm{Var}(W_t^2) = \mathrm{Var}(u_t^2 - 2u_t u_{t-1} + u_{t-1}^2)
= \mathrm{Var}(u_t^2) + \mathrm{Var}(u_{t-1}^2) + \mathrm{Var}(-2u_t u_{t-1}) + 2\,\mathrm{Cov}(u_t^2, -2u_t u_{t-1}) + 2\,\mathrm{Cov}(u_{t-1}^2, -2u_t u_{t-1}) + 2\,\mathrm{Cov}(u_t^2, u_{t-1}^2) \tag{10}
\]
\[
= 2\,\mathrm{Var}(u_t^2) + 4\,\mathrm{Var}(u_t u_{t-1}) - 8\,\mathrm{Cov}(u_t^2, u_t u_{t-1}) + 2\,\mathrm{Cov}(u_t^2, u_{t-1}^2),
\]
because $\mathrm{Var}(u_t^2) = \mathrm{Var}(u_{t-1}^2)$ and $\mathrm{Cov}(u_t^2, u_t u_{t-1}) = \mathrm{Cov}(u_{t-1}^2, u_t u_{t-1})$. Additionally, we have
\[
\mathrm{Var}(u_t^2) = \mathrm{Var}(u_{t-1}^2) = E\{(u_t^2)^2\} - E\{u_t^2\}E\{u_t^2\} = 3\sigma^4 - (\sigma^2)^2 = 2\sigma^4
\]
and
\[
\mathrm{Var}(u_t u_{t-1}) = E\{u_t^2 u_{t-1}^2\} - E\{u_t u_{t-1}\}E\{u_t u_{t-1}\} = E\{u_t^2\}E\{u_{t-1}^2\} - 0 = \sigma^4
\]
and
\[
\mathrm{Cov}(u_t^2, u_t u_{t-1}) = E\{u_t^3 u_{t-1}\} - E\{u_t^2\}E\{u_t u_{t-1}\} = E\{u_{t-1}\}\big(E\{u_t^3\} - E\{u_t^2\}E\{u_t\}\big) = 0 \times (0 - \sigma^2 \times 0) = 0,
\]
\[
\mathrm{Cov}(u_t^2, u_{t-1}^2) = E\{u_t^2 u_{t-1}^2\} - E\{u_t^2\}E\{u_{t-1}^2\} = E\{u_t^2\}E\{u_{t-1}^2\} - E\{u_t^2\}E\{u_{t-1}^2\} = 0,
\]
because $E\{u_t\} = E\{u_{t-1}\} = 0$ and $E\{u_t^3\} = 0$. Then, by making the necessary substitutions in Equation (10),
\[
2^4\, \mathrm{Var}(W_t^2) = 2 \times 2\sigma^4 + 4 \times \sigma^4 - 8 \times 0 + 2 \times 0 \;\Rightarrow\; \mathrm{Var}(W_t^2) = \frac{8\sigma^4}{16} = \frac{\sigma^4}{2}.
\]
Proposition 2. Under H0, $\mathrm{Var}\Big(\delta_T^2 - \dfrac{s^2}{2}\Big) = \dfrac{\sigma^4}{4T}$ for large $T$.
Proof. By rewriting $\delta_T^2 - \frac{s^2}{2}$, we have the sum of first-order serially correlated variables:
\[
\delta_T^2 - \frac{s^2}{2} = \frac{1}{4T}\sum_{t=1}^{T} (y_t - y_{t-1})^2 - \frac{1}{2T}\sum_{t=1}^{T} (y_t - \bar{y})^2
= \frac{1}{4T}\sum_{t=1}^{T} \big\{y_t^2 + y_{t-1}^2 - 2y_t y_{t-1} - 2y_t^2 + 4\bar{y}y_t - 2\bar{y}^2\big\}
\]
\[
= \frac{1}{4T}\Big(\sum_{t=1}^{T} y_t^2 + \sum_{t=1}^{T} y_{t-1}^2 - 2\sum_{t=1}^{T} y_t y_{t-1} - 2\sum_{t=1}^{T} y_t^2 + 4\sum_{t=1}^{T} \bar{y}y_t - 2\sum_{t=1}^{T} \bar{y}^2\Big)
= \frac{1}{4T}\Big(2\sum_{t=1}^{T} y_t^2 - 2\sum_{t=1}^{T} y_t y_{t-1} - 2\sum_{t=1}^{T} y_t^2 + 4\sum_{t=1}^{T} \bar{y}y_t - 2\sum_{t=1}^{T} \bar{y}^2\Big)
\]
\[
= \frac{1}{2T}\Big(-\sum_{t=1}^{T} y_t y_{t-1} + 2\sum_{t=1}^{T} \bar{y}y_t - T\bar{y}^2\Big)
= \frac{1}{2T}\Big(-\sum_{t=1}^{T} y_t y_{t-1} + 2T\bar{y}^2 - T\bar{y}^2\Big)
= -\frac{1}{2T}\Big(\sum_{t=1}^{T} y_t y_{t-1} - \sum_{t=1}^{T} y_t \bar{y}\Big)
= -\frac{1}{2T}\sum_{t=1}^{T} y_t (y_{t-1} - \bar{y}). \tag{11}
\]
Hence,
\[
\mathrm{Var}\Big(\delta_T^2 - \frac{s^2}{2}\Big) = \frac{1}{4T^2}\,\mathrm{Var}\Big(\sum_{t=1}^{T} y_t y_{t-1} - T\bar{y}^2\Big)
= \frac{1}{4T^2}\bigg(\mathrm{Var}\Big(\sum_{t=1}^{T} y_t y_{t-1}\Big) + \mathrm{Var}\big(T\bar{y}^2\big) - 2\,\mathrm{Cov}\Big(T\bar{y}^2,\ \sum_{t=1}^{T} y_t y_{t-1}\Big)\bigg). \tag{12}
\]
Below, we expand the three terms in the parentheses in Equation (12) as (a), (b), and (c). The quantities $\delta_T^2$, $\frac{s^2}{2}$, and $\delta_T^2 - \frac{s^2}{2}$ are invariant under $\mu$, and we set $\mu = 0$ without loss of generality.

(a) We have $\mathrm{Var}\big(\sum_{t=1}^{T} y_t y_{t-1}\big) = \sum_{t=1}^{T} \mathrm{Var}(y_t y_{t-1}) + \sum_{t=1}^{T} \sum_{s \neq t} \mathrm{Cov}\big(y_t y_{t-1},\ y_s y_{s-1}\big)$. However, the covariance terms in the summation operator are zero, as we set $\mu$ to zero above, and
\[
\mathrm{Var}\Big(\sum_{t=1}^{T} y_t y_{t-1}\Big) = \sum_{t=1}^{T} \mathrm{Var}(y_t y_{t-1}) = T\sigma^4,
\]
because $\mathrm{Cov}(y_t y_{t-1}, y_s y_{s-1}) = 0$ whenever $s \neq t$.

(b) We have $\mathrm{Var}(T\bar{y}^2) = \frac{1}{T^2}\mathrm{Var}\big(\big(\sum_{t=1}^{T} y_t\big)^2\big)$. By substituting $\big(\sum_{t=1}^{T} y_t\big)^2 = \sum_{t=1}^{T} y_t^2 + \sum_{t=1}^{T}\sum_{s \neq t} y_s y_t$,
\[
\mathrm{Var}(T\bar{y}^2) = \frac{1}{T^2}\bigg(\mathrm{Var}\Big(\sum_{t=1}^{T} y_t^2\Big) + \mathrm{Var}\Big(\sum_{t=1}^{T}\sum_{s\neq t} y_s y_t\Big) + 2\,\mathrm{Cov}\Big(\sum_{t=1}^{T} y_t^2,\ \sum_{t=1}^{T}\sum_{s\neq t} y_s y_t\Big)\bigg). \tag{13}
\]
$\mathrm{Cov}\big(\sum_{t} y_t^2,\ \sum_{t}\sum_{s\neq t} y_s y_t\big)$ is zero because $E\big\{\sum_{t} y_t^2 \times \sum_{t}\sum_{s\neq t} y_s y_t\big\}$ and $E\big\{\sum_{t}\sum_{s\neq t} y_s y_t\big\}$ are both zero.

We need to derive the $\mathrm{Var}\big(\sum_{t=1}^{T} y_t^2\big)$ term in Equation (13). Because the $y_t$ are independent and identically distributed,
\[
\mathrm{Var}\Big(\sum_{t=1}^{T} y_t^2\Big) = \sum_{t=1}^{T} \mathrm{Var}(y_t^2) = T \times \mathrm{Var}(y_t^2), \quad t \in \{1, 2, \ldots, T\}.
\]
$y_t^2$ is square integrable because $y_t \in L^4$, i.e., the fourth (central) moment of normal random variables exists. Writing the probabilistic expression for $\mathrm{Var}(y_t^2)$,
\[
\mathrm{Var}(y_t^2) = E\big\{\big(y_t^2\big)^2\big\} - \big(E\{y_t^2\}\big)^2 = E\{y_t^4\} - \big(E\{y_t^2\}\big)^2.
\]
Substituting the second and fourth moments of a normal random variate with zero mean and variance parameter $\sigma^2$, such that $E\{y_t^2\} = \sigma^2$ and $E\{y_t^4\} = 3\sigma^4$, we have $\mathrm{Var}(y_t^2) = 3\sigma^4 - (\sigma^2)^2 = 2\sigma^4$. Then
\[
\mathrm{Var}\Big(\sum_{t=1}^{T} y_t^2\Big) = 2T\sigma^4.
\]
We also need to obtain the variance of $\sum_{t=1}^{T}\sum_{s\neq t} y_s y_t$ in terms of $\sigma$:
\[
\mathrm{Var}\Big(\sum_{t=1}^{T}\sum_{s\neq t} y_s y_t\Big) = \sum_{t=1}^{T}\sum_{s\neq t} \mathrm{Var}(y_s y_t) + \mathrm{Cov}\Big(\sum_{t=1}^{T}\sum_{s\neq t} y_s y_t,\ \sum_{u=1}^{T}\sum_{\substack{v\neq u\\ v\neq s}} y_u y_v\Big). \tag{14}
\]
Regardless of the independence of the pairs $(y_s, y_t)$ and $(y_u, y_v)$, we have
\[
\mathrm{Cov}\Big(\sum_{t=1}^{T}\sum_{s\neq t} y_s y_t,\ \sum_{u=1}^{T}\sum_{\substack{v\neq u\\ v\neq s}} y_u y_v\Big) = \sum_{t=1}^{T}\sum_{u=1}^{T}\sum_{s\neq t}\sum_{\substack{v\neq u\\ v\neq s}} \mathrm{Cov}(y_s y_t,\ y_u y_v),
\]
where $\mathrm{Cov}(y_t y_s, y_u y_v)$ is non-zero if and only if $\{t, s\} = \{u, v\}$. Otherwise, if $\{t, s\} \neq \{u, v\}$, at least one of the variables $t$, $s$, $u$, or $v$ is different from the three others, by assumption. Without loss of generality, we may suppose that $t$ is different from the others, and
\[
\mathrm{Cov}(y_t y_s,\ y_u y_v) = E(y_t y_s y_u y_v) - E(y_t y_s)E(y_u y_v) = E(y_t)E(y_s y_u y_v) - E(y_t)E(y_s)E(y_u y_v) = 0.
\]
However, the equality $\{t, s\} = \{u, v\}$ holds if and only if $t = v$ and $s = u$. The $t = s = u = v$ case is ruled out by the inequalities $t \neq s$ or $u \neq v$. Furthermore, the $t = u$ and $s = v$ case contradicts $v \neq s$. The sum of the covariances over all of the pairs for which $\{t, s\} = \{u, v\}$ holds is $\sum_{t=1}^{T}\sum_{s\neq t} \mathrm{Cov}(y_s y_t, y_t y_s)$, which is equivalent to $\sum_{t=1}^{T}\sum_{s\neq t} \mathrm{Var}(y_s y_t)$. Then,
\[
\mathrm{Cov}\Big(\sum_{t=1}^{T}\sum_{s\neq t} y_s y_t,\ \sum_{u=1}^{T}\sum_{\substack{v\neq u\\ v\neq s}} y_u y_v\Big) = \sum_{t=1}^{T}\sum_{s\neq t} \mathrm{Var}(y_s y_t),
\]
and, by Equation (14),
\[
\mathrm{Var}\Big(\sum_{t=1}^{T}\sum_{s\neq t} y_s y_t\Big) = \sum_{t=1}^{T}\sum_{s\neq t} \mathrm{Var}(y_s y_t) + \sum_{t=1}^{T}\sum_{s\neq t} \mathrm{Var}(y_s y_t) = T(T-1)\sigma^4 + T(T-1)\sigma^4 = (2T^2 - 2T)\sigma^4. \tag{15}
\]
Hence, by Equations (13) and (15), we have
\[
\mathrm{Var}\big(T\bar{y}^2\big) = \frac{1}{T^2}\big((2T^2 - 2T)\sigma^4 + 2T\sigma^4\big) = 2\sigma^4.
\]
(c) Rewriting,
\[
\mathrm{Cov}\Big(T\bar{y}^2,\ \sum_{t=1}^{T} y_t y_{t-1}\Big) = \frac{1}{T}\,\mathrm{Cov}\Big(\Big(\sum_{t=1}^{T} y_t\Big)^2,\ \sum_{t=1}^{T} y_t y_{t-1}\Big), \tag{16}
\]
and we have $\big(\sum_{t=1}^{T} y_t\big)^2 = \sum_{t=1}^{T} y_t^2 + \sum_{t=1}^{T} y_t \big(\sum_{s\neq t} y_s\big)$. Then,
\[
\mathrm{Cov}\Big(T\bar{y}^2,\ \sum_{t=1}^{T} y_t y_{t-1}\Big) = \frac{1}{T}\,\mathrm{Cov}\Big(\sum_{t=1}^{T} y_t^2,\ \sum_{t=1}^{T} y_t y_{t-1}\Big) + \frac{1}{T}\,\mathrm{Cov}\Big(\sum_{t=1}^{T} y_t \sum_{s\neq t} y_s,\ \sum_{t=1}^{T} y_t y_{t-1}\Big).
\]
$\mathrm{Cov}\big(\sum_{t} y_t^2,\ \sum_{t} y_t y_{t-1}\big)$ is zero because the third raw moment of $y_t$ is zero, as we have set $\mu$ to zero above. However,
\[
\mathrm{Cov}\Big(\sum_{t=1}^{T} y_t \sum_{s\neq t} y_s,\ \sum_{t=1}^{T} y_t y_{t-1}\Big) = \mathrm{Var}\Big(\sum_{t=1}^{T} y_t y_{t-1}\Big) = T\sigma^4,
\]
because the $y_t y_{t-1}$ terms appear in both terms inside the covariance operator, while $\mathrm{Cov}(y_t y_{t-1}, y_s y_r)$ is zero unless $s = t$ and $r = t - 1$. Then, from Equation (16), it follows that $\mathrm{Cov}\big(T\bar{y}^2,\ \sum_{t=1}^{T} y_t y_{t-1}\big) = \sigma^4$.

In conclusion, based on (a), (b), and (c), we have
\[
\mathrm{Var}\Big(\delta_T^2 - \frac{s^2}{2}\Big) = \frac{1}{4T^2}\big(T\sigma^4 + 2\sigma^4 - 2\sigma^4\big) = \frac{\sigma^4}{4T}.
\]
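Proposition 2 is also easy to check numerically. This sketch is illustrative only; it assumes iid normal data, circular (mod T) differencing, and the divisor-T sample variance used throughout the proofs:

```python
import numpy as np

# Monte Carlo check of Proposition 2: Var(delta_T^2 - s^2/2) ~ sigma^4/(4T).
rng = np.random.default_rng(7)
T, sigma2, reps = 128, 1.5, 20000
diffs = np.empty(reps)
for r in range(reps):
    y = rng.normal(0.0, np.sqrt(sigma2), T)
    delta2 = np.sum((y - np.roll(y, 1)) ** 2) / (4.0 * T)  # circular diffs
    s2 = np.mean((y - y.mean()) ** 2)                       # divisor-T variance
    diffs[r] = delta2 - s2 / 2.0

print(4 * T * diffs.var())  # near sigma2**2 = 2.25
```

The scaled variance approaches $\sigma^4$ only for large $T$, in line with the statement of the proposition.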
Corollary 2.4. Let $\{X_t\}$ be a martingale difference sequence such that $E\{|X_t|^{2+\gamma}\} < \infty$ for some $\gamma > 0$ and all $t$. If $\mathrm{Var}(X_t) > 0$ and $T^{-1}\sum_{t=1}^{T} X_t^2 - \mathrm{Var}(X_t) \xrightarrow{p} 0$, then
\[
\frac{\sum_{t=1}^{T} X_t}{\sqrt{T\,\mathrm{Var}(X_t)}} \xrightarrow{d} N(0, 1) \quad \text{as } T \to \infty.
\]
Proof. For the proof, see White (2001).
Proposition 3. Under H0, $GYO_W \xrightarrow{d} N(0, 1)$ as $T \to \infty$.

Proof. In Equation (11), we obtained the identity
\[
\delta_T^2 - \frac{s^2}{2} = -\frac{1}{2T}\sum_{t=1}^{T} y_{t-1}(y_t - \bar{y}).
\]
Define a new series $Z_T = T\big(\delta_T^2 - \frac{s^2}{2}\big)$. Hence,
\[
Z_T = \frac{1}{2}\sum_{t=1}^{T} y_{t-1}(\bar{y} - y_t), \tag{17}
\]
where $Z_T$ is a sum, and define the increments of $Z_T$ as $\Delta Z_T := Z_T - Z_{T-1} = \frac{1}{2}\, y_{T-1}(\bar{y} - y_T)$. Let $(\Omega, \mathcal{F}, P)$ be the complete probability space on which $\Delta Z_T$ lives. Further, let $\mathcal{F}_T$ be the $\sigma$-field generated by $(\Delta Z_1, \Delta Z_2, \ldots, \Delta Z_T)$, i.e., $\Delta Z_T$ is adapted to $\mathcal{F}_T$. Then, $\Delta Z_T$ is an $\mathcal{F}_T$-martingale difference sequence:
\[
E\{\Delta Z_T \mid \mathcal{F}_{T-1}\} = \frac{1}{2}\, E\{y_{T-1}(\bar{y} - y_T) \mid \mathcal{F}_{T-1}\} = \frac{1}{2}\, y_{T-1}\big(\bar{y} - E\{y_T \mid \mathcal{F}_{T-1}\}\big) = \frac{1}{2}\, y_{T-1}(\mu - \mu) = 0. \tag{18}
\]
Moreover, $\Delta Z_T$ has finite moments of order greater than 2:
\[
E\{|\Delta Z_T|^{2+\gamma}\} \leq E\{|y_{T-1}|^{2+\gamma}\, |y_T - \bar{y}|^{2+\gamma}\} = E\{|y_{T-1}|^{2+\gamma}\}\, E\{|y_T - \bar{y}|^{2+\gamma}\}.
\]
The two terms inside the above expectation operators are moments of normal random variates. Therefore, because the moments of $y_{T-1}$ and $(y_T - \bar{y})$ exist, provided that $\mathrm{Var}(\Delta Z_T)$ is finite, the moments of all orders of $y_{T-1}(y_T - \bar{y})$ exist as well.

Last, using Chebyshev's inequality,
\[
P\Big\{\omega : \Big|T^{-1}\sum_{t=1}^{T} \Delta Z_t^2 - \mathrm{Var}(\Delta Z_t)\Big| > \epsilon\Big\} \leq \frac{1}{\epsilon}\, E\Big\{\Big|T^{-1}\sum_{t=1}^{T} \Delta Z_t^2 - \mathrm{Var}(\Delta Z_t)\Big|\Big\}
\]
for all $\epsilon > 0$. But, for large $T$, $E\big\{T^{-1}\sum_{t=1}^{T} \Delta Z_t^2\big\} = E\{\mathrm{Var}(\Delta Z_t)\}$, and we have
\[
P\Big\{\omega : \Big|T^{-1}\sum_{t=1}^{T} \Delta Z_t^2 - \mathrm{Var}(\Delta Z_t)\Big| > \epsilon\Big\} = 0 \tag{19}
\]
for all $\epsilon > 0$. Combining these results with Corollary 2.4, we have
\[
\sqrt{T}\,\overline{\Delta Z}_T = \sqrt{T}\,\frac{1}{T}\sum_{t=1}^{T} \Delta Z_t = \frac{1}{\sqrt{T}}\, Z_T \xrightarrow{d} N(0, \sigma^4/4).
\]
Then, by Equation (18) and Proposition 2, we have
\[
\sqrt{T}\, \frac{\delta_T^2 - s^2/2}{\sigma^2/2} \xrightarrow{d} N(0, 1). \tag{20}
\]
With $s^2/2$ in the denominator of Equation (20) instead of $\sigma^2/2$,
\[
\sqrt{T}\, \frac{\delta_T^2 - s^2/2}{s^2/2} = \sqrt{T}\, \frac{\delta_T^2 - s^2/2}{\sigma^2/2} \times \frac{\sigma^2}{s^2}. \tag{21}
\]
Conniffe and Spencer (2001) show that $\dfrac{\delta_T^2 - s^2/2}{\sigma^2/2}$ is independent of $s^2$. In addition, $\dfrac{\sigma^2}{s^2} = 1 + o_P(1)$. Together, these results give
\[
\sqrt{T}\, \frac{\delta_T^2 - s^2/2}{s^2/2} = \sqrt{T}\, \frac{\delta_T^2 - s^2/2}{\sigma^2/2} + o_P(1), \tag{22}
\]
with a distribution that converges to $N(0, 1)$ by Equation (20).
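The asymptotic normality in Proposition 3 can be eyeballed by simulation: under the null, the studentized statistic should have mean near zero and variance near one. A minimal illustrative sketch, assuming iid N(0,1) data, circular differencing, and the divisor-T sample variance:

```python
import numpy as np

def gyo_w(y):
    """sqrt(T) * (delta_T^2 - s^2/2) / (s^2/2) with circular differencing."""
    T = y.size
    delta2 = np.sum((y - np.roll(y, 1)) ** 2) / (4.0 * T)
    s2 = np.mean((y - y.mean()) ** 2)
    return np.sqrt(T) * (delta2 - s2 / 2.0) / (s2 / 2.0)

rng = np.random.default_rng(1)
T, reps = 256, 10000
stats = np.array([gyo_w(rng.normal(0.0, 1.0, T)) for _ in range(reps)])
print(stats.mean(), stats.var())  # mean near 0, variance near 1
```

The small positive mean of order $1/\sqrt{T}$ reflects the finite-sample bias of the ratio in Equation (21) and vanishes as $T$ grows.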
Proposition 4. Under H1, assuming a single break, let $t_1$ be the last point after which the process $y_t$ assumes the new mean $\mu_2$, and let $q = t_1/T$ be the fraction of the data with mean $\mu_1$, where $q \in \{\frac{1}{T}, \frac{2}{T}, \ldots, \frac{T-1}{T}\}$. Then,
\[
E\{GYO_W\} = \sqrt{T}\left\{\frac{\sigma^2 + \dfrac{|\mu_1 - \mu_2|^2}{2T}}{\sigma^2 + q(1-q)\,|\mu_1 - \mu_2|^2} - 1\right\}.
\]
Proof. The proof is a special case of the next proposition.
Proposition 5. Under H1, suppose there are $z$ mean breaks, and let $t_i$ be the last points after which the process $y_t$ assumes the new mean $\mu_{i+1}$, such that $i \in \{0, 1, \ldots, z\}$ and $t_0 = 0$. Let $q_i = (t_i - t_{i-1})/T$, $i \in \{1, 2, \ldots, z\}$, be the fractions of the data with constant means. Then,
\[
E\{GYO_W\} = \sqrt{T}\left\{\frac{\sigma^2 + (2T)^{-1}\sum_{i=0}^{z-1} |\mu_{i+1} - \mu_i|^2}{\sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2 - \sum_{i=1}^{z}\sum_{j=1}^{z} q_i q_j \mu_i \mu_j} - 1\right\}.
\]
Proof. The proof consists of three parts. Assuming that $y_t$ follows the dynamics defined in Equation (2), we first derive the expectation of the unconditional variance of $y_t$, i.e., $E(s^2)$, and second, $E(\delta_T^2)$. Finally, we use Williams (1941)'s result that any moment of the ratio of $\delta_T^2$ to $s^2$ is the ratio of the moments of $\delta_T^2$ to $s^2$ to obtain $E\big\{\frac{\delta_T^2}{s^2}\big\}$.

(a) We have
\[
s^2 = E\{(y_t - Ey_t)^2\} = E\{y_t^2\} - (E(y_t))^2. \tag{23}
\]
By definition, the unconditional expectations of $y_t$ and $y_t^2$ are sums of conditional expectations weighted with probabilities:
\[
E(y_t) = \sum_{i=1}^{z} P\{t \in (t_{i-1}, t_i]\} \times E\{y_t \mid t \in (t_{i-1}, t_i]\} = \sum_{i=1}^{z} q_i \mu_i
\]
and
\[
E\{y_t^2\} = \sum_{i=1}^{z} P\{t \in (t_{i-1}, t_i]\} \times E\{y_t^2 \mid t \in (t_{i-1}, t_i]\} = \sum_{i=1}^{z} q_i(\mu_i^2 + \sigma^2) = \sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2.
\]
Then, substituting $E(y_t)$ and $E\{y_t^2\}$ into Equation (23):
\[
s^2 = \sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2 - \Big(\sum_{i=1}^{z} q_i \mu_i\Big)^2 = \sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2 - \sum_{i=1}^{z}\sum_{j=1}^{z} q_i q_j \mu_i \mu_j.
\]
(b) In this part we obtain $E\delta_T^2$ under H1:
\[
E\delta_T^2 = E\Big\{(4T)^{-1}\sum_{t=1}^{T} (y_t - y_{t-1})^2\Big\}
= (4T)^{-1}\, E\Big\{\sum_{i=0}^{z-1}\sum_{t=t_i+2}^{t_{i+1}} (y_t - y_{t-1})^2 + \sum_{i=0}^{z-1} (y_{t_i+1} - y_{t_i})^2\Big\}.
\]
Because the expectation is a linear operator, we can distribute the expectations inside the sums, and we have that $E\{(y_t - y_{t-1})^2\} = E\{(\varepsilon_t - \varepsilon_{t-1})^2\} = 2\sigma^2$ whenever $t \notin \{t_0 + 1, t_1 + 1, \ldots, t_{z-1} + 1\}$. However, when $t \in \{t_0 + 1, t_1 + 1, \ldots, t_{z-1} + 1\}$,
\[
E\{(y_{t_i+1} - y_{t_i})^2\} = E\{(|\mu_{i+1} - \mu_i| + \varepsilon_{t_i+1} - \varepsilon_{t_i})^2\}
= |\mu_{i+1} - \mu_i|^2 + E\{(\varepsilon_{t_i+1} - \varepsilon_{t_i})^2\}
= |\mu_{i+1} - \mu_i|^2 + 2\sigma^2,
\]
using $E\varepsilon_t = 0$. Then,
\[
E\delta_T^2 = (4T)^{-1}\Big(2\sigma^2(T - z) + \sum_{i=0}^{z-1}\big(|\mu_{i+1} - \mu_i|^2 + 2\sigma^2\big)\Big)
= (4T)^{-1}\Big(2\sigma^2 T + \sum_{i=0}^{z-1}|\mu_{i+1} - \mu_i|^2\Big)
= \frac{\sigma^2}{2} + \frac{\sum_{i=0}^{z-1}|\mu_{i+1} - \mu_i|^2}{4T}
\cong \frac{\sigma^2}{2}, \quad \text{for large } T.
\]
(c) In the final part of the proof, we combine the results from (a) and (b), using a particular conclusion that is a result of Williams (1941), namely that
\[
E\Big\{\frac{\delta_T^2}{s^2/2}\Big\} = \frac{E\{\delta_T^2\}}{E\{s^2/2\}}. \tag{24}
\]
Then,
\[
E\Big\{\frac{\delta_T^2}{s^2/2}\Big\} - 1
= \frac{\sigma^2 + (2T)^{-1}\sum_{i=0}^{z-1}|\mu_{i+1} - \mu_i|^2}{\sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2 - \sum_{i=1}^{z}\sum_{j=1}^{z} q_i q_j \mu_i \mu_j} - 1
\longrightarrow \frac{\sigma^2}{\sigma^2 + \sum_{i=1}^{z} q_i \mu_i^2 - \sum_{i=1}^{z}\sum_{j=1}^{z} q_i q_j \mu_i \mu_j} - 1 \quad \text{as } T \to \infty.
\]
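The closed form in Proposition 4 can be compared against simulation for a single abrupt break. This sketch is illustrative only: it assumes the GYO_W statistic with circular differencing and the divisor-T variance, and the match is approximate because the Williams (1941) ratio-of-moments step is itself an approximation at finite T:

```python
import numpy as np

def gyo_w(y):
    T = y.size
    delta2 = np.sum((y - np.roll(y, 1)) ** 2) / (4.0 * T)
    s2 = np.mean((y - y.mean()) ** 2)
    return np.sqrt(T) * (delta2 - s2 / 2.0) / (s2 / 2.0)

# Single break at fraction q: mean mu1 before t1, mu2 after.
rng = np.random.default_rng(3)
T, q, mu1, mu2, sigma = 200, 0.5, 0.0, 1.0, 1.0
t1 = int(q * T)
mu = np.concatenate([np.full(t1, mu1), np.full(T - t1, mu2)])

reps = 5000
stats = np.array([gyo_w(mu + rng.normal(0.0, sigma, T)) for _ in range(reps)])

# Proposition 4 prediction for E{GYO_W} under a single break.
d2 = abs(mu1 - mu2) ** 2
pred = np.sqrt(T) * ((sigma**2 + d2 / (2 * T)) / (sigma**2 + q * (1 - q) * d2) - 1.0)
print(stats.mean(), pred)  # both around -2.8 for these settings
```

The large negative expectation is what gives the test its power: the break inflates $s^2$ but leaves $\delta_T^2$ essentially unchanged.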