* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Detection of change in the spatiotemporal mean function
Survey
Document related concepts
Transcript
J. R. Statist. Soc. B (2016) Detection of change in the spatiotemporal mean function Oleksandr Gromenko, Tulane University, New Orleans, USA Piotr Kokoszka Colorado State University, Fort Collins, USA and Matthew Reimherr Pennsylvania State University, University Park, USA [Received December 2013. Final revision October 2015] Summary. The paper develops inferential methodology for detecting a change in the annual pattern of an environmental variable measured at fixed locations in a spatial region. Using a framework built on functional data analysis, we model observations as a collection of functionvalued time sequences available at many sites. Each sequence is modelled as an annual mean function, which may change, plus a sequence of error functions, which are spatially correlated. The tests statistics extend the cumulative sum paradigm to this more complex setting. Their asymptotic distributions are not parameter free because of the spatial dependence but can be effectively approximated by Monte Carlo simulations. The new methodology is applied to precipitation data. Its finite sample performance is assessed by a simulation study. Keywords: Change point; Functional data; Mean function; Spatiotemporal data 1. Introduction Motivated by the problem of detecting a change in the annual pattern of a climatic variable over a spatial region, we develop change point detection methodology that is applicable to spatially indexed panels of functions. Climatic variables can be measured by using a variety of techniques and sources, but motivating the present work are data collected at a number of terrestrial observatories. These observatories often have the capacity to generate data at a very high temporal frequency. It is this mechanism which functional data analysis attempts to exploit: the data can be viewed as curves or functions, not just vectors of unrelated co-ordinates. For example, at any specific location, the record of 365 daily precipitation amounts is naturally viewed as a noisy curve with some underlying annual pattern. Data at many locations obtained over several decades form a collection of spatially dependent sequences. Owing to the annual periodicity, these time series are usually not stationary on small timescales but can be naturally viewed as stationary time series of annual functions. The present work thus views years as the fundamental statistical unit. We denote a measurement made in year n, at spatial location s ∈ S, Address for correspondence: Matthew Reimherr, Department of Statistics, Pennsylvania State University, 411 Thomas Building, University Park, PA 16802, USA. E-mail: [email protected] © 2016 Royal Statistical Society 1369–7412/16/79000 2 O. Gromenko, P. Kokoszka and M. Reimherr and time (of year) t ∈ T , as Xn .s, t/. The spatiotemporal field may have complicated and highly non-stationary patterns, but, in indexing by n, we exploit the repetition of those patterns from year to year. The goal of the present work is to develop statistical methods for inferring whether the pattern of a spatiotemporal process has changed over a specific time period, which is several decades for the data that motivated this research, e.g. the industrial revolution, the introduction of hybrid cars or international agreements which place restrictions on greenhouse gases. Such a perspective motivates us to use a change point detection framework. An alternative and complementary approach is to consider trend detection and estimation. Change point analysis has been recognized as a useful paradigm, which continues to benefit from new approaches, e.g. Fryzlewicz (2014) and Cho and Fryzlewicz (2015). Developing change point tools for functional data has been undertaken only in the last few years; Horváth et al. (2010), Berkes et al. (2009), Hörmann and Kokoszka (2010), Zhang et al. (2011) and Aston and Kirch (2012a,b). Several chapters of Horváth and Kokoszka (2012) provide an account of this research. At present, we are unaware of any research which has explored the change point problem for spatially indexed functional data. Our methodology uses the proven cumulative sum approach as a starting point. Although many competing methods are now available for scalar data, the fundamental idea of comparing the value of a parameter (the mean function in our case) before and after all potential change points remains attractive. In the absence of any change point methodology for the data structure that we study, and because of the lack of a parametric model on which likelihood methods could be based, this seems to be a productive approach. The main challenge in this project turned out to be the estimation of the spatiotemporal structure of a functional random field in the presence of potential change points in the temporal mean structure. This problem can be approached from many directions; we present a solution which has worked best for the data that motivate this research. To estimate the spatiotemporal covariance structure, we exploit the replications (one per year) of the spatial error field. In this, our methodology differs from purely spatial analysis, where only a single replication is available. The derivation of the tests statistics requires computation of spatiotemporal moments. The practical implementation of the test relies on a new calibration approach which utilizes feasible approximations of these moments under the assumption of Gaussianity. To illustrate the new methodology, we examine daily precipitation patterns at 59 locations in 12 states in the midwestern USA, over the course of 60 years. Since we view each year as a statistical unit, we have a sample size of N ≈ 60, with 59 spatial locations, and 365 time points. The dimension of the problem is, therefore, substantially larger than the sample size. Our methodology exploits the underlying periodic structure of data to make the problem tractable. Such a data structure is common to many weather, environmental and ecological data sets, and it is hoped that our methodology will find applications in these fields. The remainder of the paper is organized as follows. Section 2 sets up the statistical framework by introducing the requisite concepts and assumptions. The test statistics and their asymptotic distributions are presented in Section 3. Section 4 contains the details of the implementation of the tests, which are further illustrated in Section 5 by application to midwest precipitation data. In Section 5, we also study the finite sample properties of the tests. Proofs of the results of Section 3, technical calculations and details of the estimation of the covariance structure are collected respectively in Appendices A, B and C. Our methods are implemented in an R package, scpt, which, along with example code, is available to download from http://scpt.r-forge.r-project.org, or through R by using the command install.packages("scpt",rep="http://R-Forge.R-project.org"). 16 O. Gromenko, P. Kokoszka and M. Reimherr (a) (b) (c) Fig. 8. Empirical power for the shift as a function of parameter a for the tests based on Λ1 ( ) and Λ2 ( ) for three levels of significance (the horizontal lines): (a) α D 0.01; (b) α D 0.05; (c) α D 0.01 size of the change and is comparable with that seen for separable covariances. However, for β = 1, the largest admissible value of the space–time interaction, the power becomes unacceptably small. A conclusion of this experiment is that our method is robust to mild violations of the separability assumption but may fail to detect a change point if the space–time interaction is very strong. Appendix A: Proofs of theorems 1 and 2 We begin with a simple lemma which specifies the covariances of the scores. 4 O. Gromenko, P. Kokoszka and M. Reimherr see Haas (1995), Genton (2007), Hoff (2011) and Paul and Peng (2011). Also, separability plays a key role in modelling very large data sets since it significantly reduces the computational time that is required for inverting space–time covariance matrices; Sun et al. (2012). In our setting, it also leads to asymptotic distributions which are functionals of standard Gaussian processes and so can be readily simulated. Our implementation of the test procedure allows the variances to change between locations but assumes that the correlation function is stationary and isotropic. We do not state here stationarity and isotropy as explicit assumptions, as our methods are designed for a variety of assumptions and techniques for estimating σ.·, ·/. This is discussed further in Appendix C. Since C.t, t / and σ.sk , sl / are determined up to only multiplicative constants, we impose the identifiability condition C.t, t/dt = 1: .2:2/ T .s/ ∈ L2 .T By assumption 1, Xn / for almost all s ∈ S, so without any loss of generality we assume that each function Xn .s/ is an element of L2 .T /. By the spectral theorem, the covariance function C admits the representation ∞ C.t, t / = λi vi .t/vi .t /, i=1 where λ1 λ2 : : : 0 are the eigenvalues of C and the vi are the functional principal components; see for example chapter 2 of Horváth and Kokoszka (2012). Each function Xn .s/ admits the corresponding Karhunen–Loève expansion ∞ Xn .s; t/ = μn .s; t/ + ξni .s/vi .t/, .2:3/ i=1 where ξni .s/ are the scores defined by ξni .s/ = {Xn .s; t/ − μn .s; t/}vi .t/dt: We make one last assumption for technical convenience. We discuss the estimation of the spatiotemporal covariance structure in Appendix C. However, large sample justification can be derived for any estimators which are consistent. Assumption 4 refers to the K × K spatial covariance matrix Σ = [σ.sk , sl /, 1 k, l K]: Assumption 4. The estimators Ĉ and Σ̂ are such that, as N → ∞, P {Ĉ.t, s/ − C.t, s/}2 dt ds → 0, P Σ̂ − Σ → 0: Furthermore, assume that the eigenvalues of C are distinct: λ1 > λ2 > λ3 >: : :. Assumption 4 is very weak as most estimators are root N consistent under mild conditions; see for example chapter 2 of Horváth and Kokoszka (2012). However, it allows us to establish the asymptotic results of our testing procedure for a variety of estimators. The distinctness of the eigenvalues ensures that the empirical eigenvalues and eigenfunctions of Ĉ are consistent, and it is a common assumption in functional data analysis. Detection of Change in the Spatiotemporal Mean Function 5 3. Tests statistics and their asymptotic distribution By assumptions 2 and 3, the vi and the λi do not depend on n or sk . Denote by v̂i and by λ̂i their estimates. To detect a change point, we propose the test statistics r 2 p K N N 1 r −1 Λ̂1 = 2 ŵ.k/ Xn .sk / − Xn .sk /, v̂i , .3:1/ λ̂i N n=1 N k=1 i=1 r=1 n=1 and p K N 1 Λ̂2 = 2 ŵ.k/ N k=1 i=1 r=1 r N r Xn .sk / − Xn .sk /, v̂i N n=1 n=1 2 : .3:2/ In the case of a single location (K = 1; ŵ.1/ = 1), statistic (3.1) reduces to the test statistic of Berkes et al. (2009). In the spatial setting, we introduce the random weights ŵ.k/ which reflect the intuition that spatially close records contribute some redundant information and so should be given smaller weights, whereas records at isolated locations contribute more information and should be given larger weights. We allow the weights to be random to reflect that they are usually estimated from the data. Statistics Λ̂1 and Λ̂2 are weighted sums of Cramér–von Mises type of functionals of the functional cumulative sum process. Kolmogorov–Smirnov type of statistics can be defined analogously, but it is well known that such tests generally have poor finite sample properties owing to the slow convergence of maximally selected statistics to the double-exponential distribution; see for example Horváth et al. (1999) and Bugni et al. (2009), among many other contributions. We therefore focus on the Cramér–von Mises type of statistics Λ̂1 and Λ̂2 . The selection of the weights is discussed at the end of this section. Observe that for a p which explains a large fraction of the variance 2 r 2 r p N N r r Xn .sk / − Xn .sk /, v̂i ≈ Xn .sk / − Xn .sk / , N n=1 N n=1 i=1 n=1 n=1 by Parceval’s identity. Using this fact we introduce one more test statistic based directly on the L2 -norm: 2 K N r N 1 r ∞ Λ̂2 = 2 ŵ.k/ Xn .sk / − .3:3/ Xn .sk / : N n=1 N k=1 r=1 n=1 The inner products in Λ̂1 are normalized with the estimated variances of the scores λ̂i , which leads to an asymptotic distribution that is free of the λi . Only the largest p eigenvalues λ̂i are used ∞ so as not to inflate the variability. The asymptotic distribution of Λ̂2 depends on the λi , but the statistic does not require the selection of p. In the spatial setting, both limit distributions depend on the spatial covariances; unlike the limit in Berkes et al. (2009), none of them is parameter free. The incorporation and estimation of the spatial covariance structure in the presence of a possible change point in the temporal functional structure has not been addressed in existing research. The asymptotic null distributions of the test statistics are presented in theorem 1. Theorem 1. Suppose that assumptions 1–4 hold. Furthermore, assume that the weights ŵ.k/ are chosen such that ŵ.k/ →P w.k/ where w.k/ are deterministic real-valued weights. Then, under hypothesis H0 , p 1 K D 2 Λ̂1 → Λ1 = w.k/ Bik .x/dx, .3:4/ k=1 i=1 0 6 O. Gromenko, P. Kokoszka and M. Reimherr D Λ̂2 → Λ2 = K w.k/ k=1 ∞ D Λ̂2 → Λ∞ 2 = K p λi i=1 w.k/ k=1 ∞ 1 0 λi i=1 0 2 Bik .x/dx, 1 2 Bik .x/dx, .3:5/ .3:6/ where Bik are Brownian bridges, independent across i. For each i, the vector of Brownian bridges Bi = .Bi1 , : : : , BiK /T has covariance matrix Σ, i.e. Σ−1=2 Bi is a vector of independent standard Brownian bridges. Theorem 2 describes the behaviour of the test statistics under the alternative of a single change point. Extensions to multiple change points or epidemic alternatives are possible but are more technical and are not pursued here. In the context of a single functional time series, the issues that are related to the behaviour under HA of similar test statistics have been well explained in Aston and Kirch (2012a,b). Theorem 2. Suppose that assumptions 1–4 hold. Furthermore, assume that the weights ŵ.k/ are chosen such that ŵ.k/ →P w.k/ where w.k/ are deterministic real-valued weights. Abusing the notation slightly, let μnÆ = μ1 , μnÆ +1 = μ2 and Δ = μ1 − μ2 = 0. Under HA , if nÅ =N → θ ∈ .0, 1/ and Δ.sk /, vi = 0 for some 1 i p and 1 k K, then Λ̂1 = N K w.k/ k=1 p i=1 2 λ−1 i Δ.sk /, vi .1 − θ/2 θ2 P + oP .N/ → ∞ 3 and Λ̂2 = N K k=1 w.k/ p Δ.sk /, vi 2 i=1 .1 − θ/2 θ2 P + oP .N/ → ∞, 3 ∞ with an analogous statement for Λ̂2 . The proofs of theorems 1 and 2 are provided in Appendix A. If a change point has been detected, its location can be estimated by r 2 N K r w.k/ Xn .sk / − r̂ = arg max .3:7/ Xn .sk / : N n=1 1<r<N k=1 n=1 We have thus far assumed arbitrary weights ŵ.k/. We shall now explore the choices that are in some sense optimal. In spatial statistics, the weights are typically selected to minimize the expected squared distance to an unknown object of interest, be it the value at an unknown location or a parameter of the model. In a testing setting, the most natural approach is to minimize the variance of the test statistic under hypothesis H0 . The asymptotic variances of the limits in theorem 1 can be shown to be proportional to wT Σ2 w = K w.k/w.l/σ 2 .sk , sl /, k,l=1 where Σ2 = [σ 2 .sk , sl /, 1 k, l N]: .3:8/ (Note that the entries of Σ are σ.sk , sl / and those of Σ2 are their squares.) Using the method of Detection of Change in the Spatiotemporal Mean Function 7 Lagrange multipliers, it is not difficult to verify that the weights w minimizing wT Σ2 w subject to wT 1 = ΣK k=1 w.k/ = 1 are given by w= .Σ2 /−1 1 : T 1 .Σ2 /−1 1 .3:9/ The estimation of the covariances σ.sk , sl / is addressed in Appendix C. 4. Description of the test procedures In this section, we provide algorithmic descriptions of the tests based on the asymptotic results of Section 3. An R implementation is available in the package scpt. To implement the tests, we must estimate the spatial covariance matrix Σ and the temporal eigenelements λi and vi . The latter can be calculated once an estimate Ĉ of the temporal covariance kernel C is available. There are several ways to compute Σ̂ and Ĉ. The key difficulty is to obtain estimates which are valid, at least approximately, under both H0 and HA . We explored several approaches and used the method that is described in Appendix C for the final analysis. It uses B-splines for estimating σ.·, ·/, where we assume that, whereas the variance from location to location might be different, the correlation function is stationary and isotropic. This allows us to pool across locations to obtain a very good estimator. We also performed many simulations aimed at comparing the performance of the tests based ∞ ∞ on statistics Λ̂1 , Λ̂2 and Λ̂2 . The test based on Λ̂2 (which does not require the selection of the optimal number p of functional principal components) tends to overreject. If it does not detect a change point, we can be fairly sure that there is none. If it does detect a change point, a more careful analysis based on the test statistics Λ̂1 or Λ̂2 which require a careful selection of ∞ p (calibration) is recommended. We thus first describe the test based on Λ̂2 , which does not require calibration, and then proceed with the description of the calibration method. ∞ Algorithm 1 (test based on the statistic Λ̂2 ). Step 1: calculate the differenced series (C.1). Step 2: estimate the spatial covariances by using formulae (C.7)–(C.9) followed by parametric or non-parametric smoothing. Step 3: calculate the weights by using expressions (C.6) and (C.11). Step 4: rescale (if necessary) T to the unit interval [0, 1] and estimate the temporal covariance function by using expression (C.10). Step 5: calculate the eigenfunctions and the eigenvalues of the covariance function (C.10). Step 6: using the estimates that were obtained in the previous steps, calculate the Monte Carlo distribution of 1 K T N 2 Λ2 = ŵ.k/ λ̂i Bik .x/ dx: k=1 i=1 0 1=2 0 , B0 , : : : , B0 /T , and the B0 are independent standard For each i, Bi = Σ̂ Bi0 , where Bi0 = .Bi1 iK ik i2 0 Brownian bridges. The vectors Bi are independent. ∞ Step 7: calculate the statistic Λ̂2 given by equation (3.3). Find the P-value by using the Monte Carlo distribution that was found in the previous step. We now proceed with the description of the tests which adaptively choose the value of p leading to the correct empirical size. They require that the scores are approximately multivariate normal. Justification of expression (4.1) is based on lemma 1 in Appendix A. J. R. Statist. Soc. B (2016) Detection of change in the spatiotemporal mean function Oleksandr Gromenko, Tulane University, New Orleans, USA Piotr Kokoszka Colorado State University, Fort Collins, USA and Matthew Reimherr Pennsylvania State University, University Park, USA [Received December 2013. Final revision October 2015] Summary. The paper develops inferential methodology for detecting a change in the annual pattern of an environmental variable measured at fixed locations in a spatial region. Using a framework built on functional data analysis, we model observations as a collection of functionvalued time sequences available at many sites. Each sequence is modelled as an annual mean function, which may change, plus a sequence of error functions, which are spatially correlated. The tests statistics extend the cumulative sum paradigm to this more complex setting. Their asymptotic distributions are not parameter free because of the spatial dependence but can be effectively approximated by Monte Carlo simulations. The new methodology is applied to precipitation data. Its finite sample performance is assessed by a simulation study. Keywords: Change point; Functional data; Mean function; Spatiotemporal data 1. Introduction Motivated by the problem of detecting a change in the annual pattern of a climatic variable over a spatial region, we develop change point detection methodology that is applicable to spatially indexed panels of functions. Climatic variables can be measured by using a variety of techniques and sources, but motivating the present work are data collected at a number of terrestrial observatories. These observatories often have the capacity to generate data at a very high temporal frequency. It is this mechanism which functional data analysis attempts to exploit: the data can be viewed as curves or functions, not just vectors of unrelated co-ordinates. For example, at any specific location, the record of 365 daily precipitation amounts is naturally viewed as a noisy curve with some underlying annual pattern. Data at many locations obtained over several decades form a collection of spatially dependent sequences. Owing to the annual periodicity, these time series are usually not stationary on small timescales but can be naturally viewed as stationary time series of annual functions. The present work thus views years as the fundamental statistical unit. We denote a measurement made in year n, at spatial location s ∈ S, Address for correspondence: Matthew Reimherr, Department of Statistics, Pennsylvania State University, 411 Thomas Building, University Park, PA 16802, USA. E-mail: [email protected] © 2016 Royal Statistical Society 1369–7412/16/79000 Detection of Change in the Spatiotemporal Mean Function Fig. 1. 9 Locations of the 59 stations selected observations. To remove the effects due to the heavy tail distribution, we apply the transformation Xn .s; t/ = log10 {Yn .s; t/ + 1}, where Yn .s; t/ are original records in tenths of a millimetre. After the transformation, we presmooth data by using the cubic splines function smooth.spline in R; Fig. 2. For our precipitation data set and simulated data, the conclusions of the test were not affected by this presmoothing step; they were the same for several degrees of smoothing which visually preserved the general shape of the curves. The application of the test of Gabrys and Kokoszka (2007) to residuals obtained by subtracting the sample mean functions before and after the estimated change point shows that the curves Xn can be treated as independent. Time series plots of the scores of the residuals confirm that the assumption of constant variance is reasonable. Detection of a change point in the amount of precipitation has been recognized as an important problem in climatology and environmental science. Gallagher et al. (2012) have reviewed Fig. 2. Transformed raw precipitation record for a single year n and a single location s ( ) and smoothed precipitation Xn .sI t/ ( ) 10 O. Gromenko, P. Kokoszka and M. Reimherr recent research and pointed out that the usual approach is to work with aggregated annual precipitation, which results in one scalar data point per year. In our approach, we work with one function per year. We are concerned not only with a change in the total annual precipitation, but also with a potential change in the timing of the precipitation, reflected in a change of the mean precipitation pattern. Moreover, we are concerned with a change taking place over a region, not just at a single location. Such a perspective can be relevant as shifts in a precipitation pattern over an agriculturally important region have profound economic consequences. We note that the annual curves X.sk / that we consider are different from the Canadian precipitation curves that were extensively used in Ramsay and Silverman (2005); we work with a time series of curves at each location, whereas Ramsay and Silverman (2005) used one curve at every location, the average over several decades. By construction, our approach has the maximum change point detection resolution of 1 year; Gallagher et al. (2012) proposed a method for daily data at a single location. The results of the application of the procedures described in Section 4 are presented in Tables 1, 2 and 3. All tests lead to the same conclusion: one change point in the second half of the 1960s, though the test based on Λ̂1 is not quite significant at a 5% level. The pattern of the change is shown in Fig. 3. The biggest changes are in the area around Michigan Lake and south-west of the lake. To visualize the spatial distribution of change displayed in Fig. 3, we used the spatial field φ̂.s/ = μˆ1 .s/ − μ̂2 .s/, Table 1. Iteration 1 2 3 Table 2. Iteration 1 2 3 Table 3. Iteration 1 2 3 .5:1/ Results of the test described in algorithm 1 ∞ Segment (years) Decision Λ̂2 1941–2000 1941–1965 1966–2000 Reject Accept Accept 0:011719 0:010337 0:009997 P-value Estimated change point year 0:0013 0:0987 0:1354 1966 — — Results of the test described in algorithm 2, based on statistics Λ̂1 Segment (years) Decision Cumulative variance (%) p P-value Estimated change point year 1941–2000 1941–1967 1968–2000 Accept Accept Accept 77.10 76.75 77.88 23 20 22 0.0606 0.8931 0.8747 1968 — — Results of the test described in algorithm 2, based on statistics Λ̂2 Segment (years) Decision Cumulative variance (%) p P-value Estimated change point year 1941–2000 1941–1967 1968–2000 Reject Accept Accept 85.01 86.00 85.44 28 28 27 0.0189 0.8964 0.8350 1968 — — Detection of Change in the Spatiotemporal Mean Function 11 Fig. 3. Spatial field showing the L2 -distance between the mean log-precipitation before and after 1966: there is an increase in precipitation throughout the year in the area around location 4 and a decrease in the first half of the year in the area around location 1; locations close to 2 and 3 do not show a large change nor a consistent pattern where μ̂1 .s; t/ = r̂ −1 rˆ Xn .s; t/, n=1 μ̂2 .s; t/ = .N − r̂/−1 N n=r+1 ˆ Xn .s; t/: We performed ordinary kriging with the exponential covariance model to obtain the heat map that is shown in Fig. 3. The application of our tests confirms that the heat map shows a statistically significant change over a region, not a variation in the magnitude of change which may be due to chance. Fig. 4 illustrates the fit of the model and the typical mean precipitation curves before and after the change point. Our tests show that the spatially indexed vectors of the mean functions before and after around 1966 are different. When the test of Berkes et al. (2009) is applied to records at individual locations, no change points are detected. If the curves are smoothed by using a penalty, change points at two locations are detected but are not significant after a multiple-testing correction. Our test combines many weak individual signals to detect a change over a region with enhanced power. We now discuss some aspects of the implementation of the tests. The distribution of the scores is normal to a reasonable approximation, which is a point that is exploited in the simulation study that is described in what follows. To take into account the curvature of the Earth we use chordal distance which is the three-dimensional Euclidean distance, so any covariance function that is valid in the three-dimensional Euclidean space remains valid in our application. Fig. 5 shows the behaviour of the statistic T2 .r/ that is defined by r 2 K N r ŵ.k/ Xn .sk / − Xn .sk / .5:2/ T2 .r/ = , N n=1 k=1 n=1 which is used to identify the change point via expression (3.7). 1.0 0.0 0.5 Xn (s;t) 1.0 0.0 0.5 Xn (s;t) 1.5 O. Gromenko, P. Kokoszka and M. Reimherr 1.5 12 0 100 200 300 0 100 (a) 200 300 (b) 0.010 0.000 0.005 T2(r) 0.015 0.020 Fig. 4. Examples of functional observations and mean curves at a single location in southern Illinois ( , functional observations Xn .tI s/; , curves modelled by using the mean and the functional principal components; these curves practically overlap; p D 60) ( , estimated sampled mean functions): (a) observation in 1942; (b) observation in 1999 1940 Fig. 5. 1966) Function T2 .r/ ( 1950 1960 1970 1980 1990 ), which is used to compute the test statistic Λ̂1 2 2000 ( , estimated change point, We conclude this section by reporting the results of a simulation study that was motivated by our data. We prefer to use a data-generating process which resembles real data rather than some generic Gaussian models because we would also like to validate our conclusions by a relevant simulation study. To resemble the original precipitation data, we generated synthetic data by using the model Xn .sk ; t/ = T √ λ̂i ξni .sk / v̂i .t/ T = 365, 1 n 60, 1 k 59, i=1 where ξni ∼ NK .0, Σ̂/, K = 59, are independent vectors in an array indexed by .n, i/. All quantities with circumflexes are estimates obtained for the original precipitation data. The locations are the same as for the precipitation records. We generated 104 replications, and for each repli- Detection of Change in the Spatiotemporal Mean Function 13 cation we applied the estimation procedures that were described in Section 4. The empirical sizes for the test that is described in algorithm 1 are 1.5–2 times larger than the nominal size, depending on the length of the series N: for N = 60, this factor is about 1.5; for N = 30 it is about 2. For larger N, it becomes close to 1, but real precipitation records are not much longer that N = 60 years. For the methods that were described in algorithm 2, the empirical sizes are practically equal to the nominal sizes, essentially by construction. Calibration of the test based on the test statistic Λ1 leads to the value of p which corresponds to about 77% of the cumulative variance (CV). The CV methodology is standard in functional data analysis research; see for example Horváth and Kokoszka (2012), page 41. The test based on the test statistic Λ2 requires 85% of the CV. This number is usually recommended in the functional data analysis literature. We emphasize that these percentages are obtained for our specific data set and may be different for other data sets. The empirical sizes as a function of CV are presented in Figs 6 and 7. The 95% pointwise confidence intervals are calculated by using the normal approximation to the binomial proportion. These graphs show that the test based on Λ2 is more robust to the selection of CV, and the usual 85% rule works remarkably well for it. To determine the empirical power, we generate data by using the model p √ ⎧ ⎪ n < nÅ , λ̂i ξni .sk / v̂i .t/ ⎨ .5:3/ Xn .sk ; t/ = i=1 p ⎪ ⎩ η.sk ; t/ + √λ̂i ξni .sk / v̂i .t/ n nÅ : i=1 We report simulation results for a uniform shift η.s; t/ = a. We also tried several shapes and obtained very similar results. The empirical power, as a function of the parameter a, is shown in Fig. 8 for the tests that are based on calibrated statistics Λ1 and Λ2 . The test based on algorithm 1 has higher power because the empirical size is inflated. Comparing the calibrated tests based on Λ1 and Λ2 , we see that the test based on Λ2 has uniformly higher power. Combined with its robustness to the selection of p, we conclude that the test based on Λ2 has better finite sample properties than the test based on Λ1 , at least for the data that we considered. Regarding the estimation of the change point, if it exists, when the change point is close to N=2, its position is detected very precisely, exactly for most replications. However, when nÅ is near the edges, the estimated position is usually shifted towards N=2. This phenomenon is well known in change point literature. In our case, this bias is about 2–3 years if the true change point is in the lower or upper quartile of the time interval. Our detected change point is close to the middle of the sample, so this shift does not affect the conclusion of a change in the second half of the 1960s. Our final comment relates to the robustness of the tests to the assumption of the separability of the spatiotemporal covariance function. We conducted a small simulation study in which the errors "n .sk ; t/ were generated to follow the non-separable covariance of Gneiting (2002): σ2 ch2γ C.h; t|β/ = .h; t/ ∈ R2 × R, exp − .a|t|2α + 1/τ .a|t|2α + 1/βγ with σ 2 = 1, a = 1, c = 20, α = 0:7, γ = 0:7 and τ = 2:5: The parameter β ∈ [0, 1] controls the strength of the space–time interaction, with β = 0 corresponding to separable covariances. The other aspects of the spatiotemporal setting imitated the real precipitation data. The calibrated methods based on algorithm 2 suffer from a small size distortion of about 1 percentage point for β = 0:5 and β = 1. For β = 0:5, the power increases monotonically with the Fig. 6. (a) (b) (c) Estimated empirical size as a function of the captured CV for the test based on Λ1 : (a) α D 0.01; (b) α D 0.05; (c) α D 0.1 14 O. Gromenko, P. Kokoszka and M. Reimherr Fig. 7. (a) (b) (c) Estimated empirical size as a function of the captured CV for the test based on Λ2 : (a) α D 0.01; (b) α D 0.05; (c) α D 0.1 Detection of Change in the Spatiotemporal Mean Function 15 16 O. Gromenko, P. Kokoszka and M. Reimherr (a) (b) (c) Fig. 8. Empirical power for the shift as a function of parameter a for the tests based on Λ1 ( ) and Λ2 ( ) for three levels of significance (the horizontal lines): (a) α D 0.01; (b) α D 0.05; (c) α D 0.01 size of the change and is comparable with that seen for separable covariances. However, for β = 1, the largest admissible value of the space–time interaction, the power becomes unacceptably small. A conclusion of this experiment is that our method is robust to mild violations of the separability assumption but may fail to detect a change point if the space–time interaction is very strong. Appendix A: Proofs of theorems 1 and 2 We begin with a simple lemma which specifies the covariances of the scores. Detection of Change in the Spatiotemporal Mean Function 17 Lemma 1. Under assumptions 1 and 3, E[ξni .sk /ξmj .sl /] = δnm δij λi σ.sk , sl /: Proof. The covariances are given by E[ξni .sk / ξmj .sl /] = E {Xn .sk ; t/ − μn .sk ; t/}vi .t/dt {Xm .sl ; t / − μm .sl ; t /}vj .t /dt = cov{Xn .sk ; t/, Xm .sl ; t /}vi .t/vj .t /dt dt = δnm σ.sk , sl / C.t, t /vi .t/vj .t /dt dt = δnm δij λi σ.sk , sl /: A.1. Proof of theorem 1 Asymptotic properties for Λ̂1 and Λ̂2 are proved in much the same fashion as in Berkes et al. (2009), and ∞ so we shall only outline the major details here. The properties for Λ̂2 do not follow from previous results and thus a complete proof is presented. The inclusion of ∞ random weights ŵ.k/ is handled the same in all three cases and thus we shall provide the details for the Λ̂2 -setting only. We begin with a reminder of the functional central limit theorem for our setting. Theorem 3. Suppose that assumptions 1–3 hold. Then we have the functional limit theorem (with respect to the Skorohod space DpK [0, 1]) ⎞ ⎛ Xn .s1 / − μn .s1 /, v1 : : : Xn .s1 / − μn .s1 /, vp [Nx] ⎟D ⎜ :: :: :: N −1=2 vec⎝ ⎠ → W.x/, : : : n=1 Xn .sK / − μn .sK /, v1 : : : Xn .sK / − μn .sK /, vp where W.x/ is a pK-dimensional vector of Brownian motions with covariance matrix diag.λ1 , : : : , λp / ⊗ Σ, where Σ is the spatial covariance matrix with elements σ.sk , sl /. The claim follows from the functional central limit theorem for random vectors. We simply need to check that the covariance matrix is correct. By assumption 4, the eigenelements can be consistently estimated and thus theorem 3 holds with the estimated eigenfunctions as well. Now the asymptotic results for Λ̂1 and Λ̂2 follow from the continuous mapping theorem. ∞ We now need only to establish the desired results for Λ̂2 . By Parseval’s identity 2 K N r N 1 r ∞ ŵ.k/ Xn .sk / Λ̂2 = 2 Xn .sk / − N N k=1 r=1 n=1 n=1 r 2 N N ∞ K 1 r = 2 ŵ.k/ Xn .sk / − Xn .sk /, vi : N k=1 N n=1 r=1 i=1 n=1 Now define Λ̃2 = p N K 1 ŵ.k/ 2 N k=1 r=1 i=1 r Xn .sk / − n=1 N r Xn .sk /, vi N n=1 2 , p which is nearly the same as Λ̂2 , but with the estimated eigenfunctions replaced by their population counterparts. Combining theorem 3, Slutsky’s lemma and the continuous mapping theorem we have Λ̃2 →D Λ2 ∞ D ∞ and, as p → ∞, Λ2 → Λ2 : We shall then have that Λ̂2 →D Λ∞ 2 if we can establish that, for any > 0, ∞ lim lim sup P.|Λ̂2 − Λ̃2 | / = 0; p→∞ .A:1/ N→∞ see Billingsley (1999) for details. Note that ∞ |Λ̂2 − Λ̃2 | K k=1 |ŵ.k/| N ∞ 1 N 2 r=1 i=p+1 r n=1 Xn .sk / − N r Xn .sk /, vi N n=1 2 : 18 O. Gromenko, P. Kokoszka and M. Reimherr We then have that, for each k, r 2 N N ∞ r 1 E Xn .sk / − Xn .sk /, vi N 2 r=1 i=p+1 N n=1 n=1 N ∞ 1 r = 2 λi σ.sk , sk /r 1 − N r=1 i=p+1 N ∞ N r 1 = σ.sk , sk / λi →0 r 1 − N 2 r=1 N i=p+1 as p → ∞; furthermore, this convergence happens uniformly in N. Since ŵ.sk / →P w.sk / and K is finite, it follows that sup 1nN;1kK ŵ.k/ = OP .1/: and we can therefore conclude that ∞ P.|Λ̂2 − Λ̃2 | / → 0, as p → ∞, and, since this convergence occurs uniformly in N, we have that theorem 1 holds. A.2. Proof of theorem 2. We can express r Xn − n=1 N r N r N r r r Xn = "n − "n + μn − μn : N n=1 N n=1 N n=1 n=1 n=1 An application of Markov’s inequality gives r n=1 "n − N r "n = OP .N 1=2 /: N n=1 Examining the second term we have ⎧ r ⎨ rμ1 − {nÅ μ1 + .N − nÅ /μ2 }, r N r N μn − μn = ⎩ nÅ μ1 + .r − nÅ /μ2 − r {nÅ μ1 + .N − nÅ /μ2 }, N n=1 n=1 N Then we have, for 1 r nÅ , r N − nÅ .N − nÅ / rμ1 − rμ2 rμ1 − {nÅ μ1 + .N − nÅ /μ2 } = N N N N − nÅ rΔ, = N and, for nÅ < r N, 1 r nÅ , nÅ < r N: r N −r Å .r − nÅ /N − r.N − nÅ / n μ1 + μ2 nÅ μ1 + .r − nÅ /μ2 − {nÅ μ1 + .N − nÅ /μ2 }nÅ = N N N N −r Å N −r Å = n μ1 − n μ2 N N N −r Å n Δ: = N If r = [Nx] then, as N → ∞, r = Nx + O.1/. Combined with the assumption that nÅ =N = θ + o.1/, we have N −1 .N − nÅ /rΔ = N{.1 − θ/xΔ + o.1/}, and N −1 .N − r/nÅ Δ = N{θ.1 − x/Δ + o.1/}: Therefore, under the alternative, θ 1 p K −1 λ̂i Δ.sk /, vi 2 .1 − θ/2 w.k/ x2 dx + θ2 .1 − x/2 dx + oP .N/ Λ̂1 = N k=1 i=1 0 θ Detection of Change in the Spatiotemporal Mean Function .1 − θ/2 θ3 θ2 .1 − θ/3 −1 2 λ̂i Δ.sk /, vi + + oP .N/ w.k/ =N 3 3 k=1 i=1 p K .1 − θ/2 θ2 −1 λ̂i + oP .N/: w.k/ Δ.sk /, vi 2 =N 3 k=1 i=1 K 19 p By Slutsky’s lemma we can conclude that Λ̂1 = N K w.k/ k=1 2 2 2 .1 − θ/ θ + oP .N/: λ−1 /, v Δ.s k i i 3 i=1 p Similar arguments yield Λ̂2 = N K w.k/Δ.sk /2 k=1 .1 − θ/2 θ2 + oP .N/: 3 Appendix B: Integrated covariances We derive approximations to integrated covariances that are used in Appendix C. We assume that the X.sk / are Gaussian elements of L2 .T /. We shall refer to the following two formulae: 2 2 cov{Ĉk .t, t /, Ĉl .t, t /} dt dt = σ 2 .sk , sl / λi ; .B:1/ N −1 .Z/ .Z/ cov{Ĉk .t, t /, Ĉl .t, t /} dt dt = σ 2 .sk , sl / 2 − 3.N − 2/ 2 λi : .N − 1/2 .B:2/ We shall verify relationships (B.1) and (B.2). Let X.sk ; t/ = .X1 .sk ; t/, : : : , XN .sk ; t//T and A1 = 1 − I=N, ⎛ 1 −1 0 2 −1 ⎜−1 ⎜ 0 −1 2 A2 = ⎜ ⎜ : :: :: ⎝ :: : : 0 0 0 ··· ··· ··· :: : ··· ⎞ 0 0⎟ 0⎟ ⎟, :: ⎟ :⎠ 1 where 1 = diag.1, : : : , 1/ and I is a matrix with all entries equal to 1s. Now, the classical estimator of the temporal structure at location sk is Ĉk .t, t / = 1 X.sk ; t/T A1 X.sk ; t /: N −1 Then cov{Ĉk .t, t /, Ĉl .t, t /} can be expressed as a covariance of bilinear forms: 1 cov{X.sk ; t/T A1 X.sk ; t /, X.sl ; t/T A1 X.sl ; t /}: cov{Ĉk .t, t /, Ĉl .t, t /} = .N − 1/2 Next, applying standard results (see for example chapter 2, equation (58), in Searle (1971)), we obtain 2 σ 2 .sk , sl / C2 .t, t / tr.A1 A1 / .N − 1/2 2 = σ 2 .sk , sl / C2 .t, t /: N −1 cov{Ĉk .t, t /, Ĉl .t, t /} = After integration we obtain the final formula. Similarly, 1 .Z/ X.sk ; t/T A2 X.sk ; t /, Ĉk .t, t / = 2.N − 1/ and thus 20 O. Gromenko, P. Kokoszka and M. Reimherr .Z/ 1 σ 2 .sk , sl / C2 .t, t / tr.A2 A2 / 2.N − 1/2 2 − 3.N − 1/ 2 σ .sk , sl / C2 .t, t /: = .N − 1/2 .Z/ cov{Ĉk .t, t /, Ĉl .t, t /} = After integration we obtain the final formula. Appendix C: Estimation of the covariance structure There are several potentially useful methods of estimating the matrix Σ and the function C.·, ·/. We experimented with several of them and describe the one that worked best for the data that we considered: annual temperature and log-precipitation profiles over a region with an area of about 20% of the area of the USA. To motivate the estimator of Σ that we propose, observe that, by assumption 3 and by equation (2.2), σ.sk , sl / = σ.sk , sl / C.t, t/dt = C.t, t/σ.sk , sl / dt = E "n .sk ; t/ "n .sl ; t/ dt : By assumption 2, the integrals "n .sk ; t/ "n .sl ; t/, 1 n N, are independently and identically distributed replications, so a method-of-moments estimator of σ̂.sk , sl / would be N −1 ΣN "ˆn .sk ; t/ "ˆn .sl ; t/dt, where n=1 the functions "n .sk / are suitably defined residuals. These residuals must be computed by removing change points whose number or even existence is unknown. We experimented with several schemes but did not obtain satisfactory results. We therefore turned to estimators based on differencing. The justification of the consistency of the estimators that is proposed below relies on the assumption that there are only a few change points. In the derivations, we thus ignore the effect of a few possible change points. Simulations have confirmed the validity of this approach. Consider the differenced series Zn .sk ; t/ = Xn+1 .sk ; t/ − Xn .sk ; t/ 1 n N − 1: .C:1/ By expression (2.1) (if n is not a change point), Zn .sk ; t/ = "n+1 .sk ; t/ − "n .sk ; t/: .C:2/ This implies that the Zn are not independent; they form a 1-dependent, mean 0, stationary (in n) sequence. Using equation (C.2) and assumption 3, we see that E[Zn .sk ; t/ Zn .sl , t/] = 2 C.t, t/ σ.sk , sl /: Since C.t, t/ dt = 1, a preliminary estimator of σ.sk , sl / is N−1 1 Zn .sk ; t/Zn .sl , t/dt: .C:3/ 2.N − 1/ n=1 Estimator (C.3) is noisy and not necessarily positive definite. Following the usual practice, a parametric covariance model can be used to obtain the final estimate σ̂ .Z/ .sk , sl / = Γ̂.sk − sl /, where Γ is a parametric spatial covariance function. Alternatively, Γ can be estimated non-parametrically as in Gromenko and Kokoszka (2013), who extended the ideas of Hall et al. (1994) and Hall and Patil (1994) to the spatial setting. We followed this non-parametric approach. C.1. Temporal covariance For a fixed k, .Z/ Ĉk .t, t / := N−1 1 Zn .sk ; t/Zn .sk , t / → C.t, t /σ.sk , sk / 2.N − 1/ n=1 almost surely, .C:4/ .Z/ so an estimator of C.t, t / can be obtained by calculating a weighted average of the Ĉk : Ĉ.t, t / = K k=1 .Z/ w̃.k/ Ĉk .t, t / K k=1 w̃.k/ = 1: .C:5/ Detection of Change in the Spatiotemporal Mean Function 21 A natural way to find the weights is by minimizing 2 K .Z/ E C.t, t / − w̃.k/ Ĉk .t, t / dt dt k=1 subject to ΣK k=1 w̃.k/ = 1. Using the method of Lagrange multipliers, this is equivalent to solving the system of K + 1 linear equations K K .Z/ .Z/ w̃.k/ cov{Ĉk .t, t /, Ĉl .t, t /} dt dt − r = 0 l = 1, : : : , K, w̃.k/ = 1: k=1 k=1 Setting .Z/ .Z/ cov{Ĉk .t, t /, Ĉl .t, t /} dt dt , Ckl = C = .Ckl , 1 k, l K/, the solution w̃ = .w̃1 , : : : , w̃.K//T is given by w̃ = aC−1 1, where the constant a is chosen so that a1T C−1 1 = 1. The main difficulty lies in finding the integrated covariances Ckl . A usable formula can be derived assuming that the functions X.sk / are Gaussian. Formula (B.2) shows that Ckl = c σ 2 .sk , sl /, where c is a constant. The weights that we use are thus given by −1 −1 ŵ = .1T Σ̂2 1/−1 Σ̂2 1, .C:6/ where Σ̂2 = .σ̂ .sk , sl /, 1 k, l K/, and the σ̂.sk , sl / are the final estimates of the spatial covariances. Note that, under our simplifying assumptions, the weights ŵ are estimates of the weights w that are given by expression (3.9). Even though both sets of weights arise from different considerations, in the final implementation it is enough to use a single set of spatial weights. 2 C.2. Spatially dependent variances We have so far assumed stationarity and isotropy of the spatial covariance, i.e. we postulated that σ.sk , sl / = Γ.sk − sl / depends only on the distance sk − sl . This assumption can be relaxed by allowing different variability at each spatial location. Such an approach is motivated by the data set that we consider; the graphs of the log-precipitation curves exhibit different variability at different locations. This is also the approach that is implemented in the R package scpt and applied to all our data analyses. Specifically, we postulate a non-stationary covariance function of the form σNS .sk , sl / = c.sk / c.sl / H.sk − sl /, where H is a correlation function on the line. Since we work with the differenced series, we thus assume that non-stationary covariances are given by .Z/ σNS .sk , sl / = c.Z/ .sk /c.Z/ .sl /H .Z/ .sk − sl /: .C:7/ The method-of-moments estimator of c.Z/ .s/ is ĉ.Z/ .s/ = N−1 1 2.N − 1/ n=1 1=2 Zn2 .s; t/dt : .C:8/ The preliminary estimator of function H .Z/ .·/ thus is N−1 1 1 .Z/ .Z/ 2.N − 1/ ĉ .sk / ĉ .sl / n=1 Zn .sk ; t/Zn .sl ; t/dt: .C:9/ Estimator (C.9) is noisy and not necessarily positive definite. Denote the final estimator that is obtained .Z/ by a parametric model fit by Ĥ .sk − sl /. The temporal covariances C.t, t / are estimated by Ĉ .Z/ .t, t / = K k=1 .Z/ w.k/ Ĉk .t, t /, .C:10/ 22 where O. Gromenko, P. Kokoszka and M. Reimherr .Z/ Ĉk .t, t / is given by expression (C.1) and the weights w.k/ are given by equation (C.6) with 2 Σ̂2 = .σ̂ .Z/ NS .sk , sl / , 1 k, l N/: .C:11/ The covariance function (C.10) is represented numerically as a T × T matrix, so we can compute the maximum of T eigenvalues and eigenfunctions. In the case of the precipitation data that were used in Section 5, we obtain 365 eigenvalues. References Aston, J. A. D. and Kirch, C. (2012a) Detecting and estimating epidemic changes in dependent functional data. J. Multiv. Anal., 109, 204–220. Aston, J. A. D. and Kirch, C. (2012b) Evaluating stationarity via change-point alternatives with applications to fMRI data. Ann. Appl. Statist., 6, 1906–1948. Berkes, I., Gabrys, R., Horváth, L. and Kokoszka, P. (2009) Detecting changes in the mean of functional observations. J. R. Statist. Soc. B, 71, 927–946. Billingsley, P. (1999) Convergence of Probability Measures, 2nd edn. New York: Wiley. Bugni, F. A., Hall, P., Horowitz, J. L. and Neuman, G. R. (2009) Goodness-of-fit tests for functional data. Econmetr. J., 12, 1–18. Cho, H. and Fryzlewicz, P. (2015) Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. J. R. Statist. Soc. B, 77, 475–507. Eddelbuettel, D. and François, R. (2011) Seamless R and C++ Integration with Rcpp. New York: Springer. Fryzlewicz, P. (2014) Wild binary segmentation for multiple change-point detection. Ann. Statist., 42, 2243–2281. Gabrys, R. and Kokoszka, P. (2007) Portmanteau test of independence for functional observations. J. Am. Statist. Ass., 102, 1338–1348. Gallagher, C., Lund, R. and Robbins, M. (2012) Changepoint detection in daily precipitation-data. Environmetrics, 23, 407–419. Genton, M. G. (2007) Separable approximations of space-time covariance matrices. Environmetrics, 18, 681–695. Gneiting, T. (2002) Nonseparable, stationary covariance functions for space–time data. J. Am. Statist. Ass., 97, 590–600. Gromenko, O. and Kokoszka, P. (2013) Nonparametric inference in small data sets of spatially indexed curves with application to ionospheric trend determination. Computnl Statist. Data Anal., 59, 82–94. Haas, T. C. (1995) Local prediction of a spatio-temporal process with an application to wet sulfate deposition. J. Am. Statist. Ass., 90, 1189–1199. Hall, P., Fisher, N. I. and Hoffmann, B. (1994) On the nonparametric estimation of covariance functions. Ann. Statist., 22, 2115–2134. Hall, P. and Patil, P. (1994) Properties of nonparametric estimators of autocovariance for stationary random fields. Probab. Theor. Reltd Flds, 99, 399–424. Hoff, P. D. (2011) Separable covariance arrays via the Tucker product, with applications to multivariate relational data. Baysn Anal., 6, 179–196. Hörmann, S. and Kokoszka, P. (2010) Weakly dependent functional data. Ann. Statist., 38, 1845–1884. Horváth, L. (1993) The maximum likelihood method for testing changes in the parameters of normal observations. Ann. Statist., 21, 671–680. Horváth, L., Hušková, M. and Kokoszka, P. (2010) Testing the stability of the functional autoregressive process. J. Multiv. Anal., 101, 352–367. Horváth, L. and Kokoszka, P. (2012) Inference for Functional Data with Applications. New York: Springer. Horváth, L., Kokoszka, P. S. and Steinebach, J. (1999) Testing for changes in multivariate dependent observations with applications to temperature changes. J. Multiv. Anal., 68, 96–119. Paul, D. and Peng, J. (2011) Principal components analysis for sparsely observed correlated functional data using a kernel smoothing approach. Electron. J. Statist., 5, 1960–2003. Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis. New York: Springer. Searle, S. R. (1971) Linear Models. New York: Wiley. Stein, M. L. (2005) Space–time covariance functions. J. Am. Statist. Ass., 100, 310–321. Sun, Y., Li, B. and Genton, M. G. (2012) Geostatistics for large datasets. In Advances and Challenges in Space-time Modelling of Natural Events (eds E. Porcu, J. M. Montero and M. Schlather), ch. 3, pp. 55–77. Berlin: Springer. Zhang, X., Shao, X., Hayhoe, K. and Wuebbles, D. J. (2011) Testing the structural stability of temporally dependent functional observations and application to climate projections. Electron. J. Statist., 5, 1765–1796.