Statistical inference for nonparametric GARCH models
Alexander Meister∗
Jens-Peter Krei߆
May 15, 2015
Abstract
We consider extensions of the famous GARCH(1, 1) model where the recursive equation for the volatilities is not specified by a parametric link but by a smooth autoregression
function. Our goal is to estimate this function under nonparametric constraints when the
volatilities are observed with multiplicative innovation errors. We construct an estimation
procedure whose risk attains the usual convergence rates for bivariate nonparametric regression estimation. Furthermore, those rates are shown to be optimal in the minimax sense.
Numerical simulations are provided for a parametric submodel.
Keywords: autoregression; financial time series; inference for stochastic processes; minimax rates; nonparametric regression.
AMS subject classification 2010: 62M10; 62G08.
1 Introduction
During the last decades GARCH time series have become a famous and widely studied model
for the analysis of financial data, e.g. for the investigation of stock market indices. Since the
development of the basic ARCH processes by Engle (1982) and their generalization to GARCH
time series in Bollerslev (1986) a considerable amount of literature deals with statistical inference – in particular parameter estimation – for those models. Early interest goes back to Lee
and Hansen (1994), who consider a quasi-maximum likelihood estimator for the GARCH(1, 1)
parameters, and to Lumsdaine (1996) who proves consistency and asymptotic normality of such
methods. In a related setting, Berkes et al. (2003) and Berkes and Horvath (2004) establish
∗ Institut für Mathematik, Universität Rostock, D-18051 Rostock, Germany, email: [email protected]
† Institut für Mathematische Stochastik, TU Braunschweig, D-38092 Braunschweig, Germany, email: [email protected]
n^{−1/2}-consistency, asymptotic normality and asymptotic efficiency for the quasi-maximum likelihood estimator of the GARCH parameters. Hall and Yao (2003) study GARCH models under heavy-tailed errors. More recent approaches to parameter estimation in the GARCH setting include Robinson and Zaffaroni (2006), Francq et al. (2011) and the manuscript of Fan et al. (2012). Also we refer to the books of Fan and Yao (2003) and Francq and Zakoïan (2010) for comprehensive overviews of estimation procedures in general nonlinear time series and GARCH models, respectively. Moreover, Buchmann and Müller (2012) prove asymptotic equivalence of the GARCH model and a time-continuous COGARCH experiment in Le Cam's sense.
Besides the standard GARCH time series many related models have been derived such as
e.g. AGARCH (Ding et al. (1993)) and EGARCH (Nelson (1991)). Straumann and Mikosch
(2006) consider quasi maximum likelihood estimation in those settings among others. Linton
and Mammen (2005) investigate kernel methods in semiparametric ARCH(∞) models. Hörmann
(2008) deduces asymptotic properties of augmented GARCH processes. Čížek et al. (2009) study various parametric GARCH-related models where the parameters may vary nonparametrically
in time and, thus, the data are non-stationary. Other extensions concern the GARCH-in-mean
models. Recently, Christensen et al. (2012) consider semiparametric estimation in that model
in which the recursive equation for the volatilities is still parametric while an additional function
occurs in the recursion formula for the observations. Conrad and Mammen (2009) study testing
for parametric specifications for the mean function in those models and introduce an iterative
kernel estimator for the mean curve.
In the current note we study a general extension of the GARCH(1, 1) model where the recursive equation for the volatilities is specified by a link function which is not uniquely determined by finitely many real-valued parameters but follows only nonparametric constraints. More specifically, we introduce the stochastic processes (Xn)n≥1 and (Yn)n≥1 driven by the recursive formula

Yn = Xn εn ,   Xn+1 = m(Xn, Yn) ,   (1.1)

with i.i.d. nonnegative random variables {εn}n≥0, which satisfy Eεn = 1 and Rε := ess sup εn < ∞, and a measurable autoregression function m : [0, ∞)² → [0, ∞). The data Yj, j = 1, . . . , n, are observed while the Xn are not accessible. Therein, we assume that (Xn)n is strictly stationary and causal, i.e. for any fixed finite T ⊆ N0, the marginal distributions of (Xt)t−s∈T coincide for any integer shift s ≥ 0; Xn is measurable in the σ-field generated by X1, ε1, . . . , εn−1; and X1, ε1, . . . , εn−1 are independent. The random variables Xn are non-negative as well. Our goal is to estimate the link function m without any parametric pre-specification. The GARCH(1, 1) setting is still included by putting
m(x, y) = α0 + β1 x + α1 y ,
where Xn represent the squared volatilities and Yn represent the squared returns of a financial
asset. This simplification reduces, on the one hand, the technical effort needed to establish our main result, namely to show rigorously that in nonparametric GARCH settings the usual convergence rates for nonparametric estimation of smooth bivariate regression functions can be achieved. On the other hand, relevant extensions of the standard GARCH model, which are designed to include asymmetry in the volatility function with respect to the dependence on the lagged return, are not covered by our model. Threshold GARCH (TGARCH) models have also been introduced in order to capture asymmetry (via a threshold at zero). If one instead thinks of a positive threshold, at which the GARCH parameters change depending on whether the lagged squared return is less than or larger than the threshold, then (1.1) covers rather similar situations with smooth transitions (depending on squared or absolute values of lagged returns) between different GARCH models.
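For concreteness, the data-generating mechanism (1.1) can be simulated as follows (a sketch; the innovation law, here uniform on [0, 2] so that Eεt = 1 and Rε = 2, the starting value and the burn-in length are illustrative assumptions of ours, not part of the model):

```python
import numpy as np

def simulate_model(m, n, n_burn=500, seed=0):
    """Simulate (X_t, Y_t) from model (1.1): Y_t = X_t * eps_t and
    X_{t+1} = m(X_t, Y_t) with i.i.d. nonnegative innovations, E eps_t = 1."""
    rng = np.random.default_rng(seed)
    eps = rng.uniform(0.0, 2.0, size=n + n_burn)   # illustrative innovation law
    x = np.empty(n + n_burn)
    y = np.empty(n + n_burn)
    x[0] = 0.5                    # arbitrary start; the burn-in washes it out
    for t in range(n + n_burn):
        y[t] = x[t] * eps[t]
        if t + 1 < n + n_burn:
            x[t + 1] = m(x[t], y[t])
    return x[n_burn:], y[n_burn:]   # in practice only the Y_t are observed

# The standard GARCH(1,1) link as a special case (alpha0=0.2, beta1=0.3, alpha1=0.2):
def m_garch(x, y):
    return 0.2 + 0.3 * x + 0.2 * y

X, Y = simulate_model(m_garch, n=5000)
```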
Such general models have been mentioned e.g. in Bühlmann and McNeil (2002) or Audrino and Bühlmann (2001, 2009). In those papers, some iterative estimation and forecasting algorithms are proposed. However, to the best of our knowledge, a rigorous asymptotic minimax approach to nonparametric estimation of the autoregression function m has not been studied so far. The current paper intends to fill that gap by introducing a nonparametric minimum contrast estimator of m (Section 3), deducing the convergence rates of its risk and showing their minimax optimality (Section 4). Numerical simulations are provided in Section 5. The proofs are deferred to Section 6 while, in Section 2, some important properties of stationary distributions of the volatilities are established.
2 Properties of stationary distributions
We introduce the function m̃(x, y) := m(x, xy) so that the second equation of (1.1) can be represented by Xn+1 = m̃(Xn, εn). We impose

Condition A

(A1) We assume that m is continuously differentiable, maps the subdomain [0, ∞)² into [0, ∞) and satisfies ‖mx‖∞ + ‖my‖∞ Rε ≤ c1 < 1, where mx, my denote the corresponding partial derivatives of m and the supremum norm is taken over [0, ∞)². Moreover, Rε = ess sup ε1.
(A2) We impose that the function m is bounded by linear functions from above and below, i.e.

α0′ + α1′ x + β1′ y ≤ m(x, y) ≤ α0″ + α1″ x + β1″ y ,   ∀(x, y) ∈ [0, ∞)² ,

for some positive finite constants α0′, α0″, α1′, α1″, β1′, β1″ with α1″ < 1.

(A3) Moreover, the function m increases monotonically with respect to both components; concretely we have

m(x, y) ≤ m(x′, y′) ,   ∀ x′ ≥ x ≥ 0, y′ ≥ y ≥ 0 .

(A4) Furthermore, we impose that ε1 has a bounded Lebesgue density fε which decreases monotonically on (0, ∞) and satisfies rε := inf_{x∈[0,Rε/2]} fε(x) > 0. We have Eε1 = 1.
The existence of stationary solutions in modified GARCH models has been shown in the papers
of Bougerol (1993) and Straumann and Mikosch (2006). Also Diaconis and Freedman (1999)
and Wu and Shao (2004) study the existence and properties of a stationary solution for similar
iterated random functions. Still, for the purpose of our work, some additional properties of
the stationary distribution are required such as its support constraints. Therefore, we modify
the proof of Bougerol (1993) in order to obtain part (a) of the following proposition. Part (b)
provides a fixed point equation for the left endpoint of stationary distributions. Part (c) gives
some lower bound on the small ball probabilities of stationary distributions. As a remarkable
side result, a lower bound on the tail of this distribution at its left endpoint is deduced.
Proposition 2.1 (a) (modification of the result of Bougerol (1993)) We assume that Condition A holds true. Then there exists a strictly stationary and causal process (Xn)n with Xn ≥ 0 a.s. and

RX := ess sup Xn ≤ α0″/(1 − c1) < ∞ ,

which represents a solution to (1.1) for a given sequence (εn)n. The distribution of X1 is unique for any given m and fε under the assumptions of strict stationarity and causality.

(b) We impose that (Xn)n satisfies all properties mentioned in part (a). We denote the left endpoint of the stationary distribution of X1 by

LX := sup{x ≥ 0 : X1 ≥ x a.s.} .

Then, LX is the only fixed point of the mapping x ↦ m(x, 0), x ∈ [0, ∞).
(c) We impose that (Xn)n satisfies all properties mentioned in part (a). Also, we assume that

(1 − α1′)/β1′ − α0′(1 − α1″)/(α0″β1′) ≤ Rε/8 .   (2.1)

Then, for any z ∈ (LX, LX + Rεβ1′α0′/4) and δ ∈ (0, α0′/8), we have

P[|X1 − z| ≤ δ] ≥ δ · µ1 (z − LX)^µ2 exp(µ3 log²(z − LX)) ,

for some real-valued constants µ1, µ2, µ3, which only depend on c1, rε, Rε, α0′, α0″, β1′, where µ1 > 0, µ2 > 0 and µ3 < 0.
It immediately follows from Proposition 2.1(a) and Condition A that (Yn )n with Yn := Xn εn
is a strictly stationary and causal series as well and consists of nonnegative and essentially
bounded random variables. Therein note that Xn and εn are independent. Throughout this
work we assume that the series (Xn )n is strictly stationary and causal and satisfies all properties
mentioned in Proposition 2.1(a).
3 Methodology
In this section we construct a nonparametric estimator for the function m and suggest how to
choose some approximation space used for its definition.
3.1 Construction of the estimation procedure
Here and elsewhere, we utilize the following notation: given a measurable basic function f : [a, ∞)² → [a, ∞) or R² → R, we define the functions f^[k] : [a, ∞)^{k+1} → [a, ∞) or R^{k+1} → R, respectively, a ∈ R, integer k ≥ 1, via the recursive scheme

f^[1](x, y1) := f(x, y1) ,   f^[k+1](x, y1, . . . , yk+1) := f(f^[k](x, y1, . . . , yk), yk+1) .   (3.1)
In the sequel we impose that j ≥ k are arbitrary. Iterating the second equation of (1.1) k-fold, we obtain that

Xj+1 = m^[k](Xj−k+1, Yj−k+1, . . . , Yj) .   (3.2)
Considering the causality of the solution (Xn)n we establish independence of εj+1 on the one hand and of Xj−k+1, Yj−k+1, . . . , Yj on the other hand so that

E[Yj+1 | Xj−k+1, Yj−k+1, . . . , Yj] = Xj+1 ,

almost surely; and taking the conditional expectation with respect to the smaller σ-field generated by Yj−k+1, . . . , Yj leads to

E[Yj+1 | Yj−k+1, . . . , Yj] = E[Xj+1 | Yj−k+1, . . . , Yj] = E[m^[k](Xj−k+1, Yj−k+1, . . . , Yj) | Yj−k+1, . . . , Yj]   a.s.   (3.3)

The result (3.3) can be viewed as a pseudo estimation equation since the left hand side is accessible by the data while the right hand side contains the target curve m in its representation. Unfortunately the right hand side also depends on the unobserved Xj−k+1. However, we show in the following lemma that this dependence is of minor importance for large k.
Lemma 3.1 We assume Condition A and that j ≥ k > 1. Let f : [0, ∞)² → [0, ∞) or R² → R be a continuously differentiable function with bounded partial derivative with respect to x. Fixing an arbitrary deterministic x0 ≥ 0, we have

|f^[k](Xj−k+1, Yj−k+1, . . . , Yj) − f^[k](x0, Yj−k+1, . . . , Yj)| ≤ ‖fx‖∞^k |Xj−k+1 − x0| .
We recall Condition (A1) which guarantees that the assumptions on f in Lemma 3.1 are satisfied by m; and which provides in addition that ‖mx‖∞ ≤ c1 < 1. Combining (3.2) and Lemma 3.1, we realize that m^[k](x0, Yj−k+1, . . . , Yj) can be employed as a proxy of Xj+1. Concretely we have

E|Xj+1 − m^[k](x0, Yj−k+1, . . . , Yj)|² ≤ ‖mx‖∞^{2k} E|X1 − x0|² ,   (3.4)

where the strict stationarity of the sequence (Xn)n has been used in the last step. Furthermore, as an immediate consequence of (3.2) and (3.4), we deduce that

E|E[m^[k](Xj−k+1, Yj−k+1, . . . , Yj) | Yj−k+1, . . . , Yj] − m^[k](x0, Yj−k+1, . . . , Yj)|²
= E|E[m^[k](Xj−k+1, Yj−k+1, . . . , Yj) − m^[k](x0, Yj−k+1, . . . , Yj) | Yj−k+1, . . . , Yj]|²
≤ E|m^[k](Xj−k+1, Yj−k+1, . . . , Yj) − m^[k](x0, Yj−k+1, . . . , Yj)|²
≤ ‖mx‖∞^{2k} E|X1 − x0|² .   (3.5)
Thus, suggesting m^[k](x0, Yj−k+1, . . . , Yj) for an arbitrary fixed deterministic x0 ≥ 0 as a proxy of the right hand side of (3.3) seems reasonable. The left hand side of this equality represents the best least squares approximation of Yj+1 among all measurable functions based on Yj−k+1, . . . , Yj. That motivates us to consider that g ∈ G which minimizes the (nonlinear) contrast functional

Φn(g) := (1/(n − K − 1)) Σ_{j=K}^{n−2} [ |Yj+1 − g^[K](x0, Yj−K+1, . . . , Yj)|² + |Yj+2 − g^[K+1](x0, Yj−K+1, . . . , Yj+1)|² ] ,
as the estimator m̂ = m̂(Y1, . . . , Yn) of m, where we fix some finite collection G of candidates for the true regression function m. Moreover the parameter K ≤ n − 2 remains to be chosen. We use a two-step adaptation of g^[j] to Yj+1, i.e. for j = K and j = K + 1, in order to reconstruct g from g^[j], which will be made precise in Section 4. In particular, Lemma 4.1 will clarify the merit of this double least squares procedure.
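A minimal sketch of this minimum contrast estimator for a finite candidate set G, reusing iterate_link from above (the 0-based index bookkeeping and function names are ours):

```python
def phi_n(g, Y, K, x0=0.0):
    """Contrast Phi_n(g): for paper indices j = K, ..., n-2 accumulate
    |Y_{j+1} - g^[K](x0, Y_{j-K+1}, ..., Y_j)|^2
    + |Y_{j+2} - g^[K+1](x0, Y_{j-K+1}, ..., Y_{j+1})|^2."""
    n = len(Y)
    total = 0.0
    for j in range(K, n - 1):              # paper's j = K, ..., n-2
        pred1 = iterate_link(g, K, x0, Y[j - K : j])  # uses Y_{j-K+1..j}
        pred2 = g(pred1, Y[j])             # g^[K+1] appends Y_{j+1}
        total += (Y[j] - pred1) ** 2 + (Y[j + 1] - pred2) ** 2
    return total / (n - K - 1)

def fit(G, Y, K=5, x0=0.0):
    """Minimum contrast estimator: the candidate minimizing Phi_n over G."""
    return min(G, key=lambda g: phi_n(g, Y, K, x0))
```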
3.2 Selection of the approximation set G
Now we focus on the problem of how to choose the approximation set G. In addition to Condition A, we impose

Condition B

The restriction of the true regression function m to the domain I := [0, R″] × [0, R″Rε] can be continued to a function on [−1, R″ + 1] × [−1, R″Rε + 1] which is ⌊β⌋-fold continuously differentiable, and all partial derivatives of m with order ≤ ⌊β⌋ are bounded by a uniform constant cM on the enlarged domain. Therein we write R″ := α0″/(1 − c1) with c1 as in Condition (A1). Moreover, for non-integer β, the Hölder condition

| ∂^⌊β⌋m/((∂x)^k(∂y)^{⌊β⌋−k}) (x1, y1) − ∂^⌊β⌋m/((∂x)^k(∂y)^{⌊β⌋−k}) (x2, y2) | ≤ cM |(x1, y1) − (x2, y2)|^{β−⌊β⌋} ,

is satisfied for all k = 0, . . . , ⌊β⌋ and all (x1, y1), (x2, y2) which are located in [−1, R″ + 1] × [−1, R″Rε + 1]. We assume that β > 2.
Note that, by Condition A and Proposition 2.1(a), we have RX ≤ R″ so that the smoothness region of m contains the entire support of the distribution of (X1, Y1). Condition B represents classical smoothness assumptions on m where β describes the smoothness level of the function m. All admitted regression functions, i.e. those functions which satisfy all constraints imposed on m in the Conditions A and B, are collected in the function class

M = M(c1, α0′, α1′, β1′, α0″, α1″, β1″, β, cM) .
In contrast, the distribution of the error ε1 is assumed to be unknown but fixed. By ‖·‖2 we denote the Hilbert space norm

‖m‖2² := ∫ m²(x, y) dPX1,Y1(x, y) ,

for any measurable function m : [0, ∞)² → R for which the above expression is finite, with respect to the stationary distribution PX1,Y1 of (X1, Y1) when m is the true regression function.
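In simulation experiments the ‖·‖2-norm can be approximated empirically, since it integrates against the stationary law of (X1, Y1); the following one-liner is a sketch under the assumption that stationary draws (Xt, Yt) are available (e.g. from the simulation sketch in the introduction):

```python
import numpy as np

def norm2_squared(f, X, Y):
    """Monte Carlo proxy for ||f||_2^2 = E f(X_1, Y_1)^2 based on
    stationary draws (X_t, Y_t); f must accept NumPy arrays."""
    return float(np.mean(f(X, Y) ** 2))
```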
Now let us formulate our stipulations on the approximation space G which are required to show the asymptotic results in Section 4.

Condition C

We assume that G consists of finitely many measurable mappings g : [0, ∞)² → [0, ∞) or R² → R. In addition, we assume that g is continuously differentiable and satisfies ‖gx‖∞ ≤ c1 with the constant c1 as in Condition A and ‖gy‖∞ ≤ cG, as well as sup_{(x,y)∈[0,R″]×[0,R″Rε]} |g(x, y)| ≤ cG for some constant cG > cM. Furthermore, we impose that any m ∈ M is well-approximable in G; more specifically

sup_{m∈M} min_{g∈G} ‖m − g‖2² = O(n^{−β/(1+β)}) .   (3.6)

On the other hand, the cardinality of G is restricted as follows,

#G ≤ exp(cN · n^{1/(1+β)}) ,

for some finite constant cN. Therein β and R″ are as in Condition B.
We provide
Lemma 3.2 For cG sufficiently large, there always exists some set G such that Condition C is
satisfied.
The cover provided in Lemma 3.2 depends on the smoothness level β in Condition C so that the estimator m̂ depends on β. Therefore the estimator is denoted by m̂β while the index is suppressed elsewhere. That motivates us to consider a cross-validation selector for β as used in the construction of our estimator. The integrated squared error of our estimator is given by

‖m̂β − m‖2² = ‖m̂β‖2² − 2 ∫∫ m̂β(x, y) m(x, y) dP(X,Y)(x, y) + ‖m‖2² .   (3.7)

While the last term in (3.7) does not depend on the estimator, we have to mimic the first two expressions. For that purpose we introduce 0 < n1 < n2 < n and base our estimator only on the dataset Y1, . . . , Yn1. Then ‖m̂β‖2² can be estimated by

(1/(n − n2)) Σ_{j=n2+1}^{n} |m̂β^[j−n1](x0, Yn1+1, . . . , Yj)|² ,

in the notation (3.1), and the second term in (3.7) is estimated by

(2/(n − n2 − 1)) Σ_{j=n2+1}^{n−1} Yj+1 · m̂β^[j−n1](x0, Yn1+1, . . . , Yj) .

Replacing the first two terms in (3.7) by these empirical proxies and omitting the third one provides an empirically accessible version of (3.7). This version can be minimized over some discrete grid with respect to β instead of the true integrated squared error. Then the minimizing value is suggested as β̂. Still we have to leave the question of adaptivity open for future research.
Finally we mention that the estimator derived in this section can also be applied to parametric
approaches although we do not focus on such models in the framework of the current paper. In
the simulation section we will consider such a parametric subclass.
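The cross-validation criterion just described can be sketched as follows; the function name cv_criterion and the 0-based index handling are ours, m_hat_beta stands for the estimator m̂β fitted on Y1, . . . , Yn1, and iterate_link is the sketch from Subsection 3.1:

```python
def cv_criterion(m_hat_beta, Y, n1, n2, x0=0.0):
    """Empirical version of (3.7) without the constant ||m||_2^2 term:
    an estimate of ||m_hat_beta||_2^2 minus twice the estimated cross term."""
    n = len(Y)
    sq_term, cross_term = 0.0, 0.0
    for j in range(n2 + 1, n + 1):         # paper's j = n2+1, ..., n (1-based)
        # m_hat_beta^[j-n1](x0, Y_{n1+1}, ..., Y_j) in the notation (3.1)
        pred = iterate_link(m_hat_beta, j - n1, x0, Y[n1:j])
        sq_term += pred ** 2
        if j <= n - 1:                     # the cross-term sum stops at j = n-1
            cross_term += Y[j] * pred      # Y[j] stores Y_{j+1} (0-based)
    return sq_term / (n - n2) - 2.0 * cross_term / (n - n2 - 1)
```

The selected smoothness level β̂ is then the minimizer of this criterion over a discrete grid of β values, each with its own fitted m̂β.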
4 Asymptotic properties
In this section we investigate the asymptotic quality of our estimator m̂ as defined in Section 3. In order to evaluate this quality we consider the mean integrated squared error E‖m̂ − m‖2² (MISE). In the following Subsection 4.1, we deduce upper bounds on the convergence rates of the MISE when the sample size n tends to infinity; while, in Subsection 4.2, we will show that those rates are optimal with respect to any estimator based on the given observations in a minimax sense. Note that we consider the MISE uniformly over the function class M while the error density fε is viewed as unknown but fixed; in particular, it does not change in n.
4.1 Convergence rates – upper bounds
We fix some arbitrary g ∈ G. Then we study the expectation of the functional Φn(g). Exploiting (3.3) and the strict stationarity of (Yn)n, we obtain that

EΦn(g) = E|YK+1 − g^[K](x0, Y1, . . . , YK)|² + E|YK+2 − g(g^[K](x0, Y1, . . . , YK), YK+1)|²
= E var(YK+1 | Y1, . . . , YK) + E var(YK+2 | Y1, . . . , YK+1) + E|WK − GK(g)|² + E|WK+1 − GK+1(g)|² ,

where we define

Wj := E[m^[j](X1, Y1, . . . , Yj) | Y1, . . . , Yj] ,   Gj(g) := g^[j](x0, Y1, . . . , Yj) ,   g ∈ G ,
for j = K, K + 1. For any g1, g2 ∈ G we deduce that

EΦn(g1) − EΦn(g2) = E|GK(g1)|² + E|GK+1(g1)|² − E|GK(g2)|² − E|GK+1(g2)|² − 2[ E WK GK(g1) + E WK+1 GK+1(g1) − E WK GK(g2) − E WK+1 GK+1(g2) ] .

For l = 1, 2 and j = K, K + 1, we combine (3.4), (3.5), the Cauchy-Schwarz inequality and the triangle inequality in L²(Ω, A, P) to show that

|E Wj Gj(gl) − E Xj+1 Gj(gl)| ≤ 2‖mx‖∞^j (E|X1 − x0|²)^{1/2} {E|Gj(gl)|²}^{1/2} .

Then elementary calculation yields that

|EΦn(g1) − EΦn(g2) − ϑK(g1) + ϑK(g2)| ≤ 2‖mx‖∞^K (E|X1 − x0|²)^{1/2} Σ_{l=1}^{2} [ {E|GK(gl)|²}^{1/2} + ‖mx‖∞ {E|GK+1(gl)|²}^{1/2} ] ,   (4.1)
where ϑK(g) := E|XK+1 − GK(g)|² + E|XK+2 − GK+1(g)|², g ∈ G. This functional will be studied in the following lemma. In particular, we will provide both an upper and a lower bound of ϑK(g) where the right hand sides of the corresponding inequalities both contain the distance between g and the true target function m with respect to the ‖·‖2-norm.
Lemma 4.1 We assume the Conditions A, B and C. Then, for any g ∈ G, we have

ϑK(g)^{1/2} ≤ 2‖m − g‖2/(1 − ‖gx‖∞) + ‖gx‖∞^K {1 + ‖gx‖∞} (E|X1 − x0|²)^{1/2} ,

as well as

ϑK(g) ≥ ‖m − g‖2²/2 .
As a side result of Lemma 4.1, we deduce that under the Conditions A and C

(E|GK(g)|²)^{1/2} ≤ (EX1²)^{1/2} + ϑK(g)^{1/2}
≤ (EX1²)^{1/2} + 2(E|X1 − x0|²)^{1/2} + 2‖m − g‖2/(1 − ‖gx‖∞)
≤ (EX1²)^{1/2} + 2(E|X1 − x0|²)^{1/2} + 2{(EX1²)^{1/2} + ‖g‖2}/(1 − ‖gx‖∞)
≤ 3RX + x0 + 2{RX + cG}/(1 − c1) ,

using the equality X2 = m(X1, Y1) and the strict stationarity of (Xn)n. Applying that to (4.1) we have

|EΦn(g1) − EΦn(g2) − ϑK(g1) + ϑK(g2)| ≤ SK ,

with

SK := 4c1^K (RX + x0) · { 6RX + 4x0 + 4(RX + cG)/(1 − c1) } .
Now we introduce g0 := arg min_{g∈G} ‖m − g‖2. Then, for any g ∈ G, we obtain under the Conditions A and C by Lemma 4.1 that

EΦn(g) − EΦn(g0) ≥ ϑK(g) − ϑK(g0) − SK ≥ (1/2)‖m − g‖2² − (4/(1 − c1)²) · ‖m − g0‖2² − RK ,   (4.2)

where

RK := (8(cG + α0″ + α1″RX + β1″RXRε)/(1 − c1)) · (RX + x0) c1^K + 4c1^{2K}(RX + x0)² + SK ≤ Λ1 · c1^K ,   (4.3)

where the positive finite constant Λ1 only depends on α0″, α1″, β1″, Rε, x0, cG, c1 by Proposition 2.1(a). The equation (4.2) shows that the distance between the expectation of Φn at g and at g0, i.e. the best-approximating element of m within G, increases with the order ‖m − g‖2² if this distance is large enough, so that the expected value of Φn(g) really contrasts g from g0 with respect to their ‖·‖2-distance to the target function m. That is essential to prove the following theorem.
Theorem 4.1 We consider the model (1.1) under the Conditions A, B. The series (Xn)n is assumed to be strictly stationary and causal. We choose K ≍ n^ι for some ι ∈ (0, 1) and the approximation set G such that Condition C is satisfied. Then the estimator m̂ from Section 3 satisfies

sup_{m∈M} E‖m̂ − m‖2² = O(n^{−β/(β+1)}) .

We realize that our estimator attains the usual convergence rates for the nonparametric estimation of smooth bivariate regression functions.
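For orientation (our own remark, using only the classical theory): with smoothness level β and d = 2 covariates, the standard minimax regression rate is

```latex
n^{-2\beta/(2\beta+d)}\Big|_{d=2} \;=\; n^{-2\beta/(2\beta+2)} \;=\; n^{-\beta/(\beta+1)} ,
```

which is exactly the rate attained in Theorem 4.1.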
Remark 1 An inspection of the proof of Theorem 4.1 shows that the result remains valid when
M is changed into one of its subsets in the theorem as well as in Condition C.
4.2 Convergence rates – lower bounds
In this subsection we focus on the question whether or not the rates established in Theorem 4.1 are optimal, in a minimax sense, with respect to an arbitrary estimator sequence (m̂n)n where m̂n is based on the data Y1, . . . , Yn. Preparatory to this investigation we provide

Condition C′

The partial derivative my satisfies inf_{(x,y)∈[0,∞)²} my(x, y) ≥ cM′ for some uniform constant cM′ > 0. All m ∈ M which satisfy this constraint are collected in the set M′.
and

Lemma 4.2 We consider the model (1.1) under the Conditions A, B and C′. Then the stationary distribution of X1 is absolutely continuous and has a Lebesgue density fX which is bounded from above by ‖fε‖∞/(cM′α0′). By ‖·‖λ we denote the norm

‖f‖λ := ( ∫_{x=α0′}^{R″} ∫_{y=0}^{R″Rε} f²(x, y) dy dx )^{1/2} ,

of the Hilbert space of all measurable and square-integrable functions from [α0′, R″] × [0, R″Rε] to C with respect to the Lebesgue measure λ. For all functions f in this space, we have

‖f‖2² ≤ ‖f‖λ² · ‖fε‖∞²/{cM′(α0′)²} .
We consider the model (1.1) under the Conditions A and B where, in addition, we stipulate

Condition D

We have 0 < α0′ = α0″ =: α0 and α1″ > α1′ as well as β1″ > β1′. Also α1″ + β1″Rε < c1. Also we impose that

α1″ − α1′ ≤ Rεβ1′/8 .

For some given Rε > 1, the parameters α0, α1′, α1″, β1′ and β1″ can be selected such that Condition D is granted while these constraints are consistent with Condition A. For any m ∈ M under the additional Condition D, we consider the interval J(m) := (LX + Rεβ1′α0/16, LX + Rεβ1′α0/8]. Since Rεβ1′ < 1 by Condition D, it follows from Proposition 2.1(c) that

P[X1 ∈ I] ≥ (µ1/2) · (Rεβ1′α0/16)^µ2 · exp{ µ3 [ log²(Rεβ1′α0/16) + log²(Rεβ1′α0/8) ] } · λ(I) ≥ µ6 · λ(I) ,   (4.4)

for all closed intervals I ⊆ J(m), where the constant µ6 > 0 only depends on c1, rε, Rε, α0, β1′ and λ stands for the one-dimensional Lebesgue measure. We give the following lemma.
Lemma 4.3 Let I = (i1, i2] be a left-open non-void bounded interval and c > 0 some constant. Let Q be some fixed probability measure on the Borel σ-field of I. We impose that for all [a, b] ⊆ I we have

Q([a, b]) ≥ c · λ([a, b]) .

Then, for all non-negative measurable functions ϕ on the domain I we have

∫ ϕ(x) dQ(x) ≥ c · ∫ ϕ(x) dλ(x) .

If, in addition, Q has a Lebesgue-Borel density q on I, then q(x) ≥ c holds true for Lebesgue almost all x ∈ I.
Combining Lemma 4.3 and (4.4), we conclude by Fubini's theorem that

‖m‖2² ≥ ∫ ( ∫ m²(x, xt) · 1_{J(m)}(x) dPX1(x) ) fε(t) dt ≥ µ6 · ∫ ( ∫_{J(m)} m²(x, xt) dx ) fε(t) dt ,   (4.5)

for any measurable function m : [0, ∞)² → R with Em²(X1, Y1) < ∞. Note that the ‖·‖2-norm is constructed with respect to the stationary distribution of (X1, Y1) when m is the true autoregression function. In particular PX1 denotes the marginal distribution of X1.
We define mG(x, y) = α0 + α1x + β1y for α1 ∈ (α1′, α1″) and β1 ∈ (β1′, β1″). Thus the function m = mG leads to a standard GARCH(1, 1) time series. Considering that (mG)x ≡ α1 and (mG)y ≡ β1, we realize that m = mG fulfills Condition A. Choosing cM sufficiently large with respect to any β > 2, Condition B is satisfied as well, since all derivatives of mG of order ≥ 2 vanish. Then mG lies in M. By Proposition 2.1(b) we easily deduce that the left endpoint LX of the distribution of X1 under m = mG equals LX = α0/(1 − α1). We introduce the points

ξj,n := α0/(1 − α1) + (Rεβ1′α0/16) · (1 + j/bn) ,   j = 0, . . . , bn ,

for some sequence (bn)n ↑ ∞ of even integers. Furthermore we define the grid points

ij,k := ( (ξj,n + ξj+1,n)/2 , (k + 1/2)Rε/(2bn) ) ,   j, k = 0, . . . , bn − 1 .

The function H(x, y) := exp(1/(x² − 1) + 1/(y² − 1)) for max{|x|, |y|} < 1 and H(x, y) := 0 otherwise is supported on [−1, 1]² and differentiable infinitely often on R². We specify the following parameterized family of functions

m̃θ(x, y) := m̃G(x, y) + cH bn^{−β} Σ_{j,k=0}^{bn−1} θj,k · H( 64bn(x − ij,k,1)/(Rεβ1′α0) , 8bn(y − ij,k,2)/Rε ) ,

where θ = {θj,k}_{j,k=0,...,bn−1} denotes some binary matrix, cH > 0 denotes a constant, which is viewed as sufficiently small compared to all other constants, and β is as in Condition B. Accordingly we define

mθ(x, y) := m̃θ(x, y/x) .   (4.6)

We have provided the family of competing regression functions which will be used to prove the minimax lower bound in the following theorem.
Theorem 4.2 We consider the model (1.1) under the Conditions A, B and D with cM sufficiently large. Furthermore, we assume that fε is continuously differentiable and that the Fisher information of the density f̄ε of log ε1 satisfies

E| f̄ε′(log ε1)/f̄ε(log ε1) |² ≤ Fε ,   (4.7)

for some Fε < ∞. Let {m̂n}n be an arbitrary estimator sequence where m̂n is based on the data Y1, . . . , Yn. Then we have

lim inf_{n→∞} n^{β/(β+1)} sup_{m∈M} E‖m̂n − m‖2² > 0 .

As f̄ε(x) = fε(exp(x)) · exp(x), the condition (4.7) holds true for some finite Fε whenever

E[ ε1² |fε′(ε1)/fε(ε1)|² ] < ∞ .

Thus, Theorem 4.2 ensures optimality of the convergence rates established for our estimator in Theorem 4.1.
5 Implementation and numerical simulations
We consider the implementation of our estimator and the perspectives for its use in practice. The main difficulty is the selection of the approximation set G. Nevertheless we will show that our method can be used beyond the framework of the Conditions A–C, whose main purpose is to guarantee the optimal nonparametric convergence rates as considered in the previous section. As mentioned in Subsection 3.2 we concentrate on a parametric type of G for the numerical simulation. Concretely, we consider the functions

mα0,α1,β1,βp(x, y) = α0 + β1x + α1y + βp (sin(2πx) + cos(2πy)) ,

which represent the GARCH model with some trigonometric perturbation. Figures 1 and 2 show the graphs of these functions for the values α0 = 0.2, α1 = 0.2, β1 = 0.3, βp = −0.03 and α0 = 0.3, α1 = 0.2, β1 = 0.3, βp = 0, respectively.
Table 1: results corr. to Fig. 1 (estimated parameters in 10 independent runs; true values α0 = 0.2, α1 = 0.2, β1 = 0.3, βp = −0.03)

α0       α1       β1       βp
0.2000   0.2000   0.3000   −0.0300
0.2000   0.2500   0.2500   −0.0300
0.2000   0.2000   0.3000   −0.0300
0.2000   0.2000   0.3000   −0.0300
0.2000   0.2000   0.3000   −0.0300
0.2000   0.2000   0.3000   −0.0300
0.2000   0.2000   0.3000   −0.0300
0.2000   0.2000   0.3000   −0.0300
0.2000   0.2000   0.3000   −0.0300
0.2000   0.2000   0.3000   −0.0300

Table 2: results corr. to Fig. 2 (estimated parameters in 10 independent runs; true values α0 = 0.3, α1 = 0.2, β1 = 0.3, βp = 0)

α0       α1       β1       βp
0.3000   0.2000   0.3000    0
0.3000   0.2000   0.3000    0
0.3000   0.2500   0.2500    0.0100
0.3000   0.2000   0.3000    0
0.3000   0.2000   0.3000   −0.0100
0.3000   0.2000   0.3000    0
0.3000   0.2000   0.3000    0.0100
0.3000   0.2000   0.3000    0
0.3000   0.2000   0.3000   −0.0100
0.3000   0.2000   0.3000   −0.0100
As huge sample sizes are usually available in financial econometrics we base our simulations on 5,000 observations Y1, . . . , Y5000. We choose

G = { mα0,α1,β1,βp : α0, α1, β1 ∈ {0.1 : 0.05 : 0.3}, βp ∈ {−0.03 : 0.01 : 0.03} } ,

for our estimator under the true m from Figure 1 as well as from Figure 2. Note that the existence of a stationary solution is ensured for each g ∈ G by the results of Bougerol (1993). We have executed 10 independent simulation runs for each setting. We select the parameters K = 5 and x0 = 0. Moreover the error variables εj have the Lebesgue density

fε(x) = (3/4) (1 − x/4)² 1_{[0,4]}(x)

in our simulations.
Figure 1: trigonometric perturbation
Figure 2: pure GARCH surface
The results of all simulations for m as in Figure 1 and Figure 2 are given in Tables 1 and 2, respectively. We realize that our procedure performs well for the considered estimation problem. However, refining the equidistant grid from the step width 0.05 to e.g. 0.01 would increase the computational effort considerably and also deteriorate the estimation results due to the increasing variance of the estimator under the enlarged G. Critically, we remark that the appropriate selection of G remains a challenging problem in practice, in order to avoid a curse of dimensionality if many coefficients are involved. The MATLAB code is available from the authors upon request.
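For readers who want to reproduce a run along these lines, here is a compact sketch (our own code, reusing iterate_link and fit from Section 3; the inverse-CDF sampler follows from the elementary computation Fε(x) = 1 − (1 − x/4)³, and the burn-in and starting value are illustrative choices not specified in the text):

```python
import itertools
import numpy as np

def sample_eps(size, rng):
    """Draw from f_eps(x) = (3/4)(1 - x/4)^2 on [0, 4] by inversion:
    F(x) = 1 - (1 - x/4)^3, so F^{-1}(u) = 4(1 - (1 - u)^(1/3))."""
    return 4.0 * (1.0 - (1.0 - rng.uniform(size=size)) ** (1.0 / 3.0))

def make_candidate(a0, a1, b1, bp):
    """Perturbed GARCH link m_{alpha0,alpha1,beta1,betap} of this section."""
    return lambda x, y: (a0 + b1 * x + a1 * y
                         + bp * (np.sin(2 * np.pi * x) + np.cos(2 * np.pi * y)))

# The grid G: alpha0, alpha1, beta1 in 0.1:0.05:0.3, betap in -0.03:0.01:0.03
# (5 * 5 * 5 * 7 = 875 candidate functions).
vals = np.arange(0.10, 0.301, 0.05)
perts = np.arange(-0.03, 0.0301, 0.01)
G = [make_candidate(a0, a1, b1, bp)
     for a0, a1, b1, bp in itertools.product(vals, vals, vals, perts)]

# One simulation run under the truth of Figure 1, with a burn-in of 500.
rng = np.random.default_rng(0)
truth = make_candidate(0.2, 0.2, 0.3, -0.03)
eps, x, Y = sample_eps(5500, rng), 0.5, []
for t in range(5500):
    y = x * eps[t]
    Y.append(y)
    x = truth(x, y)
Y = np.array(Y[500:])              # keep 5000 observations, as in the text
m_hat = fit(G, Y, K=5, x0=0.0)     # minimum contrast estimate over G
```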
6 Proofs
Proof of Proposition 2.1: (a) Inspired by the proof of Bougerol (1993) we introduce random variables εn with integer n < 0 such that εn, integer n, are i.i.d. random variables on a joint probability space (Ω, A, P). Furthermore, we define the random variables Zt,0 := 0 for all integer t ≥ 0 and recursively Zt,s+1 := m̃(Zt,s, εs−t). We deduce that

|Zt+1,s+1 − Zt,s| = |m̃(Zt+1,s, εs−t−1) − m̃(Zt,s−1, εs−t−1)| ≤ ‖m̃x(·, εs−t−1)‖∞ |Zt+1,s − Zt,s−1| ,

for any t ≥ 0, s ≥ 1 by the mean value theorem. Repeated application of this inequality yields that

|Zt+1,s+1 − Zt,s| ≤ α0″ · ∏_{j=0}^{s−1} ‖m̃x(·, εj−t)‖∞ .
As the random variables ‖m̃x(·, εj−t)‖∞, j = 0, . . . , s − 1, are bounded from above by ‖mx‖∞ + ‖my‖∞Rε almost surely, we obtain that

ess sup |Zt+1,s+1 − Zt,s| ≤ (‖mx‖∞ + ‖my‖∞Rε)^s α0″ ≤ c1^s α0″ .   (6.1)
As a side result of this inequality we derive by induction that all random variables Zt+n,n, t, n ≥ 0, are located in L∞(Ω, A, P), i.e. the Banach space consisting of all random variables X on (Ω, A, P) with ess sup |X| < ∞. Moreover, Condition (A1) and the nonnegativity of the εn provide that Zt+n,n ≥ 0 almost surely for all t, n ≥ 0, also by induction. Since ‖mx‖∞ + ‖my‖∞Rε ≤ c1 < 1 and (6.1) holds true, we easily conclude by the triangle inequality that (Zt+n,n)n≥0 are Cauchy sequences in L∞(Ω, A, P) for all integer t ≥ 0. By the geometric summation formula, we also deduce that

ess sup Zt+n,n ≤ α0″/(1 − c1) ,   (6.2)
for all t, n ≥ 0. The completeness of L∞(Ω, A, P) implies convergence of each (Zt+n,n)n≥0 to some Zt* ∈ L∞(Ω, A, P) with respect to the underlying essential supremum norm. The nonnegative random variables on (Ω, A, P) which are bounded from above by α0″/(1 − c1) almost surely form a closed subset of L∞(Ω, A, P) as L∞(Ω, A, P)-convergence implies convergence in distribution. We conclude that (6.2) can be taken over to the limit variables Zt* so that Zt* ∈ [0, α0″/(1 − c1)] almost surely for all t ≥ 0. Since L∞(Ω, A, P)-convergence also implies convergence in probability, the inequality (6.1) also yields that

P[|Zt+n,n − Zt*| > δ] ≤ lim sup_{m→∞} P[|Zt+n,n − Zt+m,m| > δ/2] ≤ 2δ^{−1} Σ_{m=n}^{∞} ess sup |Zt+m,m − Zt+m+1,m+1| ≤ 2δ^{−1} α0″ c1^n/(1 − c1) ,
by Markov's inequality for any δ > 0, so that the sum of the above probabilities taken over all n ∈ N is finite. Hence, (Zt+n,n)n converges to Zt* almost surely as well. By the recursive definition of the Zt,s and the joint initial values Zt,0 = 0 we realize that, for any fixed n, the random variables Zt+n,n, t ≥ 0, are identically distributed. We conclude that all limit variables Zt* possess the same distribution. As m̃ is continuous and we have Zt+n+1,n+1 = m̃(Zt+n+1,n, ε−t−1), the established almost sure convergence implies that

Zt* = m̃(Zt+1*, ε−t−1) = m(Zt+1*, Zt+1* ε−t−1) ,
almost surely for all t ≥ 0. The variable Zt+1+n,n is measurable in the σ-algebra generated by Zt+1+n,0, ε−t−2, ε−t−3, . . . , ε−t−n−1. Hence, an appropriate version of the almost sure limit Zt+1* is measurable in the σ-algebra generated by all εs, s ≤ −t − 2, and therefore independent of ε−t−1. We conclude that, by putting X0 := Z0*, the induced sequence (Xn)n, which is defined according to the recursive relation (1.1), is strictly stationary and causal since X0, ε0, ε1, . . . are independent, so that all marginal distributions of (Xn)n−s∈T, with any finite T ⊆ N0, do not depend on the integer shift s ≥ 0. Also all Xn have the same distribution as the Zt* so that Xn ∈ [0, α0″/(1 − c1)] holds true almost surely. Uniqueness of the stationary distribution can be seen as follows: we have Zt+n,n = m̃^[n](0, ε−n−t, . . . , ε−1−t) for all t and n in the notation (3.1). Now let (Xn)n be a strictly stationary and causal time series where X1 is independent of all εs, integer s. Again, by the mean value theorem and induction we obtain that

|Zt+n,n − m̃^[n](X1, ε−n−t, . . . , ε−1−t)| ≤ α0″ c1^n ,

so that the stochastic process (m̃^[n](X1, ε−n−t, . . . , ε−1−t))n converges to the random variable Zt* almost surely and, hence, in distribution. As we claim that {Xn}n is a strictly stationary and causal solution of (1.1), all the random variables m̃^[n](X1, ε−n−t, . . . , ε−1−t) have the same distribution as X1. Therefore the distributions of X1 and Zt* coincide, so that the distribution of Zt* is indeed the unique strictly stationary distribution when m is the autoregression function and fε is the error density.
(b) Since {X1 ≥ LX} = ∩_{n∈N} {X1 ≥ LX − 1/n}, we have X1 ≥ LX almost surely. The monotonicity constraints in Condition (A3) imply that X2 = m̃(X1, ε1) ≥ m̃(LX, 0) = m(LX, 0) a.s. The strict stationarity of the process (Xn)n provides that X1 ≥ m(LX, 0) a.s. holds true as well. By the definition of LX it follows that LX ≥ m(LX, 0). Now we assume that LX > m(LX, 0) = m̃(LX, 0). As m̃ is continuous there exists some ρ > 0 with LX > m̃(LX + ρ, ρ). Thus,

P[X1 ≤ LX + ρ, ε1 ≤ ρ] ≤ P[X2 = m̃(X1, ε1) ≤ m̃(LX + ρ, ρ)] ≤ P[X1 < LX] = 0 ,

due to Condition (A3) and the definition of LX. On the other hand we have

P[X1 ≤ LX + ρ, ε1 ≤ ρ] = P[X1 ≤ LX + ρ] · P[ε1 ≤ ρ] = 0 ,

so that we obtain a contradiction to Condition (A4) and the definition of LX. We conclude that LX = m(LX, 0). If another solution x ≥ 0 existed, we would have

|x − LX| = |m(x, 0) − m(LX, 0)| ≤ ‖mx‖∞ |x − LX| ≤ c1 |x − LX| ,

leading to a contradiction since c1 < 1.
(c) The constraints in Conditions (A1) and (A2) give us that LX ≥ α0′ > 0. Writing FX for the distribution function of X1 we deduce for any x > LX that

FX(x) = P[m̃(X1, ε1) ≤ x] = ∫_{t≥LX} P[m̃(t, ε1) ≤ x] dFX(t) ,   (6.3)

where we exploit the independence of X1 and ε1. Let us consider the probability P[m̃(t, ε1) ≤ x] for any fixed t ≥ LX. We have

P[m̃(t, ε1) ≤ x] = P[m̃(t, ε1) − m̃(t, 0) ≤ x − m(t, 0)] ≥ P[‖my‖∞ t ε1 ≤ x − m(t, 0)] ≥ P[ε1 ≤ (x − m(t, 0))/(c1 t)] ,

by Condition (A1), which provides via Rε ≥ Eε1 = 1 that

1 > c1 ≥ ‖my‖∞ Rε ≥ ‖my‖∞ ,

and the mean value theorem. This theorem also yields that

x − m(t, 0) = x − LX + m(LX, 0) − m(t, 0) ≥ x − LX − ‖mx‖∞(t − LX) ≥ x − LX − c1(t − LX) ,   (6.4)

so that

P[m̃(t, ε1) ≤ x] ≥ Fε( (x − LX − c1(t − LX))/(c1 t) ) ,

where Fε denotes the distribution function of ε1. Also, if t − LX ≤ c1^{−1}(x − LX) holds true then x ≥ m(t, 0) and also c1t ≤ x. Writing GX(y) := FX(y + LX), y ∈ R, we insert these results into (6.3), which yields that

GX(x − LX) ≥ ∫ 1_{[0, c1^{−1}(x−LX)]}(s) Fε( (x − LX − c1s)/x ) dGX(s) .

We define c1* := (1 + c1)/2 and conclude that

GX(y) ≥ ∫ 1_{[0, y/c1*]}(s) Fε( y(1 − c1/c1*)/(y + LX) ) dGX(s) ≥ Fε( (1 − c1)/(1 + c1) · y/(y + LX) ) · GX(y/c1*) ,
for all y > 0. Iterating this inequality we obtain by induction that

GX(x − LX) ≥ GX( (x − LX)/(c1*)^n ) · ∏_{j=0}^{n−1} Fε( (1 − c1)/(1 + c1) · (x − LX)/((c1*)^j LX + x − LX) )
≥ GX( (x − LX)/(c1*)^n ) · Fε^n( (1 − c1)/(1 + c1) · (x − LX)/x1 ) ,

for all integer n ≥ 2 and x ∈ (LX, x1) where we put x1 := 2α0″/(1 − c1). By the upper bound on RX in part (a) we derive that

GX(x1 − LX) = FX(x1) = 1 .
This also provides that x1 > LX. Thus, for all integer n ≥ −log((x1 − LX)/(x − LX))/log c1* and x ∈ (LX, x1), we have

GX(x − LX) ≥ Fε^n( (1 − c1)/(1 + c1) · (x − LX)/x1 ) .

Specifying n := ⌈−log((x1 − LX)/(x − LX))/log c1*⌉ we obtain that

GX(x − LX) ≥ exp{ −[ (log(x1 − LX) − log(x − LX))/log c1* − 1 ] · log Fε( (1 − c1)/(1 + c1) · (x − LX)/x1 ) } .

The lower bound imposed on fε in Condition (A4) implies that Fε(t) ≥ rε t for all t ∈ [0, Rε/2], so that by elementary calculation and bounding we obtain

FX(x) ≥ µ5 (x − LX)^µ3 exp(µ4 log²(x − LX)) ,   ∀ x ∈ ( LX , min{ x1 , LX + (1 + c1)Rεx1/(2 − 2c1) } ) ,   (6.5)

where µ3, µ4, µ5 are real-valued constants which only depend on c1, rε, Rε, α0′, α0″. In particular we have µ4 = 1/log c1* < 0 and µ3, µ5 > 0.
Now we fix some arbitrary z > LX and δ > 0. Using the arguments from above (see eq. (6.3)) we deduce that

P[|X1 − z| ≤ δ] ≥ ∫_{t=LX}^{x2} P[|m̃(t, ε1) − z| ≤ δ] dFX(t) ,   (6.6)

for some x2 ≥ LX which remains to be specified. We learn from Conditions (A2) and (A3) that the continuous functions s ↦ m̃(t, s), s ∈ [0, ∞), which increase monotonically, tend to +∞ as s → +∞ and, therefore, range from m(t, 0) to +∞. We focus on the case where z ≥ m(t, 0). Then there exists some s(t, z) ≥ 0 such that m̃(t, s(t, z)) = z. The mean value theorem provides that for all s ≥ 0 with |s − s(t, z)| ≤ Rεδ/x2 we have |m̃(t, s) − z| ≤ δ since ‖my‖∞ ≤ c1/Rε < 1/Rε. We conclude that

P[|m̃(t, ε1) − z| ≤ δ] ≥ P[|ε1 − s(t, z)| ≤ Rεδ/x2] ≥ Rεδ x2^{−1} fε( s(t, z) + Rεδ/x2 ) ,
by Condition (A4). The lower bound on m in Condition (A2) yields that s(t, z) ≤ (z − α0′ − α1′t)/(β1′t). Therein note that z ≥ m(t, 0) ≥ α0′ + α1′t. We obtain the inequality

P[|m̃(t, ε1) − z| ≤ δ] ≥ Rεδ x2^{−1} fε( (z − α0′)/(β1′LX) − α1′/β1′ + Rεδ x2^{−1} ) .
Inserting this lower bound, which does not depend on t, into (6.6) for all t ∈ [LX, x2] with m(t, 0) ≤ z, we deduce that

P[|X1 − z| ≤ δ] ≥ P[X1 ≤ x2, m(X1, 0) ≤ z] · Rεδ x2^{−1} fε( (z − α0′)/(β1′α0′) + Rεδ/α0′ )
≥ FX(x2) · 1_{[0,z]}(m(x2, 0)) · Rεδ x2^{−1} fε( (z − LX)/(β1′α0′) − (α1″ − α1′)/β1′ + Rεδ/α0′ ) .
Also we have used that x2 ≥ LX ≥ α0′; and that LX = m(LX, 0) ≤ α0″ + α1″LX, due to part (b) of the proposition, leading to LX ≤ α0″/(1 − α1″). Using the arguments from (6.4) when replacing t by x2 and x by z, we derive that selecting x2 = LX + (z − LX)/ċ1 implies m(x2, 0) ≤ z for any ċ1 > max{1, β1′(1 + Rε(1 − c1))/4}, which finally leads to

P[|X1 − z| ≤ δ] ≥ z^{−1} δ (ċ1)^{−µ3} µ5 exp(µ4 log² ċ1) (z − LX)^{µ3 − 2µ4 log ċ1} exp(µ4 log²(z − LX)) · fε( (z − LX)/(β1′α0′) − (α1″ − α1′)/β1′ + Rεδ/α0′ ) ,

when combining these findings with (6.5) and x2 ≤ z. The assumptions on z and δ along with (2.1) yield that

(z − LX)/(β1′α0′) − (α1″ − α1′)/β1′ + Rεδ/α0′ < Rε/2 ,

so that Condition (A4) completes the proof of the proposition.
Proof of Lemma 3.1: From the recursive definition of the f^[k] we learn that

|f^[k](Xj−k+1, Yj−k+1, . . . , Yj) − f^[k](x0, Yj−k+1, . . . , Yj)|
≤ |f(f^[k−1](Xj−k+1, Yj−k+1, . . . , Yj−1), Yj) − f(f^[k−1](x0, Yj−k+1, . . . , Yj−1), Yj)|
≤ ‖fx‖∞ |f^[k−1](Xj−k+1, Yj−k+1, . . . , Yj−1) − f^[k−1](x0, Yj−k+1, . . . , Yj−1)| ,

by the mean value theorem. Repeated use of this inequality (by reducing j and k by 1 in each step) proves the claim of the lemma.
Proof of Lemma 3.2: We define M0 as the set of the restrictions of m to the domain I := [0, R″] × [0, R″Rε] for all m ∈ M. We learn from Theorem 2.7.1, p. 155, in van der Vaart and Wellner (1996) that the κ-covering number N(κ) of M0 with respect to the uniform metric, i.e. the minimal number of closed balls in C0(I) with radius κ which cover M0, is bounded from above by exp(cN · κ^{−2/β}) for all κ > 0 where the finite constant cN only depends on R″, Rε, cM and β. Related results on covering numbers can be found in the book of van de Geer (2000). Moreover C0(I) denotes the Banach space of all continuous functions on I, equipped with the supremum norm. If, in addition, we stipulate that the centers of the covering balls in the definition of N(κ) are located in M0, we refer to the minimal number of such balls as the intrinsic covering number N(M0, κ). Now assume that M1 represents a κ/2-cover of M0 with #M1 = N(κ/2). Hence, for any m1 ∈ M1 there exists some m0 ∈ M0 such that ‖m1 − m0‖∞ ≤ κ/2. Otherwise, M1\{m1} would be a κ/2-cover of M0 with cardinality N(κ/2) − 1. Therefore we can define a mapping Ψ from the finite set M1 to M0 which maps m1 ∈ M1 to some fixed m0 ∈ M0 with ‖m1 − m0‖∞ ≤ κ/2. Obviously, {Ψ(m1) : m1 ∈ M1} represents an intrinsic κ-cover of M0 so that

N(M0, κ) ≤ exp( cN 4^{1/β} · κ^{−2/β} ) .

Now we select the set G such that G0 := {g|I : g ∈ G} is an intrinsic κn-cover of M0 with #G ≤ exp(cN 4^{1/β} · κn^{−2/β}). Note that ‖gx‖∞ ≤ c1 follows from Condition A for all g ∈ G ⊆ M. When choosing the constant cG sufficiently large with respect to Rε, R″, c1, α0″, α1″, β1″, the other constraints contained in Condition C are also fulfilled. When choosing the sequence (κn)n such that κn ≍ n^{−β/(2β+2)}, Condition C is granted since

‖f‖2² ≤ sup_{(x,y)∈I} f²(x, y) ,

for all measurable functions f : [0, ∞)² → R with Ef²(X1, Y1) < ∞.
Proof of Lemma 4.1: We deduce by (3.2) that

( E|Xj+1 − g^[j](X1, Y1, . . . , Yj)|² )^{1/2}
≤ ( E|m(Xj, Yj) − g(Xj, Yj)|² )^{1/2} + ( E|g(Xj, Yj) − g(g^[j−1](X1, Y1, . . . , Yj−1), Yj)|² )^{1/2}
≤ ‖m − g‖2 + ‖gx‖∞ ( E|Xj − g^[j−1](X1, Y1, . . . , Yj−1)|² )^{1/2} ,

by the strict stationarity of the sequences (Xn)n and (Yn)n. Iterating this inequality by replacing j by j − 1 in the next step, we conclude that

( E|Xj+1 − g^[j](X1, Y1, . . . , Yj)|² )^{1/2} ≤ ‖m − g‖2 Σ_{k=0}^{j−1} ‖gx‖∞^k ≤ ‖m − g‖2/(1 − ‖gx‖∞) ,

for all integer j ≥ 1. Moreover we obtain that

( E|Xj+1 − Gj(g)|² )^{1/2} ≤ ‖m − g‖2/(1 − ‖gx‖∞) + ( E|g^[j](X1, Y1, . . . , Yj) − g^[j](x0, Y1, . . . , Yj)|² )^{1/2}
≤ ‖m − g‖2/(1 − ‖gx‖∞) + ‖gx‖∞^j (E|X1 − x0|²)^{1/2} ,

so that the proposed upper bound on ϑK(g) has been established.
In order to prove the lower bound, we consider that

|XK+1 − GK(g)|² + |m(XK+1, YK+1) − g(GK(g), YK+1)|²
≥ |XK+1 − GK(g)|² + |m(XK+1, YK+1) − g(XK+1, YK+1)|²
  − 2|gx(ξK, YK+1)| · |XK+1 − GK(g)| · |m(XK+1, YK+1) − g(XK+1, YK+1)|
  + |gx(ξK, YK+1)|² · |XK+1 − GK(g)|²
≥ min_{λ∈[0,1]} [ |XK+1 − GK(g)|² + |m(XK+1, YK+1) − g(XK+1, YK+1)|²
  − 2λ|XK+1 − GK(g)| · |m(XK+1, YK+1) − g(XK+1, YK+1)| + λ²|XK+1 − GK(g)|² ] ,

by the mean value theorem, where ξK denotes some specific value between XK+1 and GK(g). In order to determine that minimum, only the values λ ∈ {0, 1, |m(XK+1, YK+1) − g(XK+1, YK+1)|/|XK+1 − GK(g)|} have to be considered, where the last point has to be taken into account only if |m(XK+1, YK+1) − g(XK+1, YK+1)| ≤ |XK+1 − GK(g)|. In all of those cases we derive that the above term is bounded from below by |m(XK+1, YK+1) − g(XK+1, YK+1)|²/2. Then, taking the expectation of these random variables yields the desired result.
Proof of Theorem 4.1: By Condition C we have ‖g‖2² ≤ cG² for all g ∈ G. As m̂ ∈ G holds true by the definition of m̂, we have ‖m̂‖2² ≤ cG² almost surely. Condition B yields that ‖m‖2² ≤ cM² for the true regression function m since |m| is bounded by cM on the entire support of (X1, Y1). Therefore, we have ‖m̂ − m‖2 ≤ cM + cG almost surely and

E‖m̂ − m‖2² = ∫_{δ=0}^{∞} P[‖m̂ − m‖2² > δ] dδ ≤ δn + ∫_{δ=δn}^{(cM+cG)²} P[‖m̂ − m‖2² > δ] dδ ,   (6.7)
where δn := cδ · n^{−β/(β+1)} for some constant cδ > 1 still to be chosen. Clearly we have 0 < δn < (cM + cG)² for n sufficiently large. By the definition of m̂ we derive that Φn(m̂) ≤ Φn(g0) holds true almost surely. Writing ∆n(g) := Φn(g) − EΦn(g), equation (4.2) provides that ‖m − m̂‖2² > δ implies the existence of some g ∈ G with ‖m − g‖2² > δ such that

∆n(g) − ∆n(g0) ≤ −(1/4)‖m − g‖2² − (1/4)δ + (4/(1 − c1)²)‖m − g0‖2² + RK .

Condition C yields that

sup_{m∈M} ‖m − g0‖2² = O(n^{−β/(1+β)}) .

Therefore cδ can be selected sufficiently large (uniformly for all m ∈ M) such that

|∆n(g) − ∆n(g0)| ≥ (1/4)‖m − g‖2² ,

for all δ > δn. Therein, note that, by (4.3) and the imposed choice of K, the term RK is asymptotically negligible uniformly with respect to m. Hence, we have shown that

P[‖m̂ − m‖2² > δ] ≤ P[∃ g ∈ G : ‖m − g‖2² > δ , |∆n(g) − ∆n(g0)| ≥ (1/4)‖m − g‖2²]
≤ P[∃ g ∈ G : |∆n(g) − ∆n(g0)| ≥ (1/4) max{δ, ‖m − g‖2²}] ,

holds true for all δ ∈ (δn, (cG + cM)²) and for all n > N with some N which does not depend
on m but only on the function class M. By the representation

∆n(g) − ∆n(g0) = (1/(n − K − 1)) Σ_{j=K}^{n−2} [ Vj,K(g) + Vj+1,K+1(g) − Vj,K(g0) − Vj+1,K+1(g0) − 2Uj,K(g) + 2Uj,K(g0) − 2Uj+1,K+1(g) + 2Uj+1,K+1(g0) ] ,

where

Vj,K(g) := |m^[K](Xj−K+1, Yj−K+1, . . . , Yj) − g^[K](x0, Yj−K+1, . . . , Yj)|² − E|m^[K](Xj−K+1, Yj−K+1, . . . , Yj) − g^[K](x0, Yj−K+1, . . . , Yj)|² ,

Uj,K(g) := Xj+1 · ( m^[K](Xj−K+1, Yj−K+1, . . . , Yj) − g^[K](x0, Yj−K+1, . . . , Yj) ) · (εj+1 − 1) ,
we deduce that

P[‖m̂ − m‖2² > δ]
≤ Σ_{l=0}^{1} P[ sup_{g∈G} | (1/(n − K − 1)) Σ_{j=K}^{n−2} Vj+l,K+l(g)/max{δ, ‖m − g‖2²} | ≥ 1/32 ]
  + Σ_{l=0}^{1} P[ sup_{g∈G} | (1/(n − K − 1)) Σ_{j=K}^{n−2} Uj+l,K+l(g)/max{δ, ‖m − g‖2²} | ≥ 1/64 ]
≤ Σ_{g∈G} Σ_{l=0}^{1} P[ | Σ_{j=K}^{n−2} Vj+l,K+l(g)/max{δ, ‖m − g‖2²} | ≥ (n − K − 1)/32 ]
  + Σ_{g∈G} Σ_{l=0}^{1} P[ | Σ_{j=K}^{n−2} Uj+l,K+l(g)/max{δ, ‖m − g‖2²} | ≥ (n − K − 1)/64 ] ,   (6.8)
for all δ ∈ (δn, (cG + cM)²) and n > N. In order to continue the bounding of the probability P[‖m̂ − m‖2² > δ] an exponential inequality for weakly dependent and identically distributed random variables is required. In the following we will utilize a result from Merlevède et al. (2011). For that purpose, we introduce the stochastic processes (Xj*)j and (Yj*)j by

Xj* := x0 for j ≤ p + 2 ,   Xj* := m̃^[j−p−2](x0, εp+2, . . . , εj−1) otherwise,

and

Yj* := x0 for j ≤ p + 2 ,   Yj* := Xj* εj otherwise.
Accordingly we define the random variables V*j,K(g) and U*j,K(g) by

V*j,K(g) := |m^[K](X*j−K+1, Y*j−K+1, . . . , Yj*) − g^[K](x0, Y*j−K+1, . . . , Yj*)|² − E|m^[K](Xj−K+1, Yj−K+1, . . . , Yj) − g^[K](x0, Yj−K+1, . . . , Yj)|² ,

U*j,K(g) := X*j+1 · ( m^[K](X*j−K+1, Y*j−K+1, . . . , Yj*) − g^[K](x0, Y*j−K+1, . . . , Yj*) ) · (εj+1 − 1) .
All variables Xj* and Yj* are measurable in the σ-field generated by εp+2, εp+3, . . . and so are V*j,K(g) and U*j,K(g) for all j ≥ p + 1. Moreover, by Mp,s, s = 0, 1, we denote the σ-fields induced by the Vj+l,K+l(g), j ≤ p, and Uj+l,K+l(g), j ≤ p, respectively, which are both contained in σ({(Xj+l, εj+l) : j ≤ p}) as a subset. Hence, all V*j,K(g) and U*j,K(g), j ≥ p + 1, are independent of Mp,s, s = 0, 1, respectively, so that

E[f(fL(V*j1+l,K+l, . . . , V*jL+l,K+l)) | Mp,0] = E f(fL(V*j1+l,K+l, . . . , V*jL+l,K+l)) ,
E[f(fL(U*j1+l,K+l, . . . , U*jL+l,K+l)) | Mp,1] = E f(fL(U*j1+l,K+l, . . . , U*jL+l,K+l)) ,

hold true almost surely for p + 1 ≤ j1 < · · · < jL. Therein f and fL denote some arbitrary 1-Lipschitz functions from R and R^L, respectively, to R. From there we learn that

| E[f(fL(Vj1+l,K+l, . . . , VjL+l,K+l)) | Mp,0] − E f(fL(Vj1+l,K+l, . . . , VjL+l,K+l)) |
≤ E[ |f(fL(Vj1+l,K+l, . . . , VjL+l,K+l)) − f(fL(V*j1+l,K+l, . . . , V*jL+l,K+l))| | Mp,0 ]
  + E|f(fL(Vj1+l,K+l, . . . , VjL+l,K+l)) − f(fL(V*j1+l,K+l, . . . , V*jL+l,K+l))|
≤ Σ_{r=1}^{L} E|Vjr+l,K+l − V*jr+l,K+l| + Σ_{r=1}^{L} E[ |Vjr+l,K+l − V*jr+l,K+l| | Mp,0 ] .   (6.9)
Analogously we deduce that

| E[f(fL(Uj1+l,K+l, . . . , UjL+l,K+l)) | Mp,1] − E f(fL(Uj1+l,K+l, . . . , UjL+l,K+l)) |
≤ Σ_{r=1}^{L} E|Ujr+l,K+l − U*jr+l,K+l| + Σ_{r=1}^{L} E[ |Ujr+l,K+l − U*jr+l,K+l| | Mp,1 ] .   (6.10)
Now we consider that

|Vjr+l,K+l − V*jr+l,K+l| ≤ ∆² + 2(R″ + cG)∆ ,
|Ujr+l,K+l − U*jr+l,K+l| ≤ (Rε + 1) · ( R″∆ + (R″ + cG)|X*jr+l+1 − Xjr+l+1| + ∆|X*jr+l+1 − Xjr+l+1| ) ,   (6.11)

where R″ ≥ RX is as in Condition B and

∆ := |m^[K+l](X*jr−K+1, Y*jr−K+1, . . . , Y*jr+l) − m^[K+l](Xjr−K+1, Yjr−K+1, . . . , Yjr+l)| + |g^[K+l](x0, Y*jr−K+1, . . . , Y*jr+l) − g^[K+l](x0, Yjr−K+1, . . . , Yjr+l)| .
We have

|m^[K+l](X*jr−K+1, Y*jr−K+1, . . . , Y*jr+l) − m^[K+l](Xjr−K+1, Yjr−K+1, . . . , Yjr+l)|
≤ ‖mx‖∞ · |m^[K+l−1](X*jr−K+1, Y*jr−K+1, . . . , Y*jr+l−1) − m^[K+l−1](Xjr−K+1, Yjr−K+1, . . . , Yjr+l−1)| + ‖my‖∞ · |Y*jr+l − Yjr+l|
≤ . . . ≤ ‖my‖∞ · Σ_{q=0}^{K+l−1} ‖mx‖∞^q |Y*jr+l−q − Yjr+l−q| + ‖mx‖∞^{K+l} · |X*jr−K+1 − Xjr−K+1| ,   (6.12)

by iterating the first inequality driven by the mean value theorem. We obtain the corresponding inequality

|g^[K+l](x0, Y*jr−K+1, . . . , Y*jr+l) − g^[K+l](x0, Yjr−K+1, . . . , Yjr+l)| ≤ ‖gy‖∞ · Σ_{q=0}^{K+l−1} ‖gx‖∞^q |Y*jr+l−q − Yjr+l−q| .   (6.13)
For the distances between the variables and their proxies we deduce that

|X*jr−K+1 − Xjr−K+1| ≤ (x0 + R″) c1^{(jr−K−p−1)+} ,

since ‖m̃x(·, εj)‖∞ ≤ c1 for all integer j and Xjr−K+1 = m̃^[jr−K−p−1](Xp+2, εp+2, . . . , εjr−K) for jr − K − p ≥ 2. Also we have

|Y*jr+l−q − Yjr+l−q| ≤ x0 + RεR″ for q ≥ jr + l − p − 2 ,   |Y*jr+l−q − Yjr+l−q| ≤ Rε(x0 + R″) c1^{jr+l−q−p−2} otherwise.
Inserting these inequalities into (6.12) we obtain that

|m^[K+l](X*jr−K+1, Y*jr−K+1, . . . , Y*jr+l) − m^[K+l](Xjr−K+1, Yjr−K+1, . . . , Yjr+l)|
≤ c1(x0 + R″) c1^{jr+l−p−2} min{K + l, (jr + l − p − 2)+} + (c1(x0 + RεR″)/Rε) Σ_{q=jr+l−p−2}^{K+l−1} c1^q + (x0 + R″) c1^{K+l+(jr−K−p−1)+}
≤ (x0 + R″)(jr + l − p − 2)+ c1^{jr+l−p−1} + ((x0 + RεR″)/(Rε(1 − c1))) c1^{jr+l−p−1} + (x0 + R″) c1^{jr+l−p−1}
≤ const. · (jr − p) c1^{jr−p} ,

where we arrange that the constant "const." only depends on c1, x0, R″, Rε, cG. Note that the deterministic constant const. may change its value from line to line as long as its sole dependence on the above quantities is not affected. With respect to (6.13) we analogously derive that
|g^[K+l](x0, Y*jr−K+1, . . . , Y*jr+l) − g^[K+l](x0, Yjr−K+1, . . . , Yjr+l)|
≤ cGRε(x0 + R″)(jr + l − p − 2)+ c1^{jr+l−p−2} + (cG(x0 + RεR″)/(1 − c1)) c1^{jr+l−p−1}
≤ const. · (jr − p) c1^{jr−p} .
Thus, ∆ also admits the upper bound const. · (jr − p)c1^{jr−p}. As above we deduce that

|X*jr+l+1 − Xjr+l+1| ≤ (x0 + R″) c1^{(jr+l−p−1)+} .

Combining these findings with the inequalities (6.11) we provide that

max{ |Vjr+l,K+l − V*jr+l,K+l| , |Ujr+l,K+l − U*jr+l,K+l| } ≤ const. · (jr − p) c1^{jr−p} .
The equations (6.9) and (6.10) yield that

| E[f(fL(Vj1+l,K+l, . . . , VjL+l,K+l)) | Mp,0] − E f(fL(Vj1+l,K+l, . . . , VjL+l,K+l)) |
≤ const. · Σ_{r=1}^{L} (jr − p) c1^{jr−p} ≤ const. · Σ_{i=j1−p}^{∞} i c1^i
= const. · c1 (d/dc1)[ c1^{j1−p}/(1 − c1) ] = const. · c1^{j1−p} · ( (j1 − p)(1 − c1) + c1 ) ,   (6.14)
while

| E[f(fL(Uj1+l,K+l, . . . , UjL+l,K+l)) | Mp,1] − E f(fL(Uj1+l,K+l, . . . , UjL+l,K+l)) | ,

admits the same upper bound. Note that this upper bound does not depend on L. In the notation of Merlevède et al. (2011), in particular eq. (2.3), we have established that

τ(x) ≤ const. · c1^⌊x⌋ · ( ⌊x⌋(1 − c1) + c1 ) ,

for all x ≥ 1. Thus,

τ(x) ≤ const. · exp( x · (log c1 + 1 − c1) ) .

Since c1 ∈ (0, 1) we have log c1 + 1 − c1 < 0, as the function h(x) := log x + 1 − x, x ∈ (0, 1], is continuous, increases strictly monotonically and satisfies h(1) = 0. Therefore, the condition (2.6) in Merlevède et al. (2011) is fulfilled when putting c := −log c1 − 1 + c1, γ1 := 1 and choosing a sufficiently large (only depending on c1, x0, R″, Rε). Moreover the condition (2.7) of that paper is also satisfied for any γ2 > 0 and an appropriate constant b with respect to γ2 since the supports of the distributions of the random variables Vj,K(g) and Uj,K(g) are included in the intervals [−2(cG + R″)², 2(cG + R″)²] and [−R″(cG + R″)(Rε + 1), R″(cG + R″)(Rε + 1)], respectively, as subsets. Then, the constraint (2.8) of Merlevède et al. (2011) is also satisfied for γ := γ2/(1 + γ2) ∈ (0, 1).
Now let us consider the covariance of Vj+l,K+l(g) and Vi+l,K+l(g) for j > i. We have

cov(Vj+l,K+l(g), Vi+l,K+l(g)) = E Vj+l,K+l(g) Vi+l,K+l(g) = E[ Vi+l,K+l(g) E[Vj+l,K+l(g) | Mi,0] ] .

We obtain that

|cov(Vj+l,K+l(g), Vi+l,K+l(g))| ≤ E|Vi+l,K+l(g)| · ess sup |E[Vj+l,K+l(g) | Mi,0]| .

Let us put f(x) := x and f1(x) := x with L := 1 in (6.14). Then this inequality yields that

ess sup |E[Vj+l,K+l(g) | Mi,0]| ≤ const. · c1^{j−i} · ( (j − i)(1 − c1) + c1 ) .

Therein we have used that Vj+l,K+l(g) is centered. The quantity V in eq. (1.12) in Merlevède et al. (2011) – with respect to the sum of the Vj+l,K+l(g)/max{δ, ‖m − g‖2²} – turns out to be bounded from above by

E V²K+l,K+l(g)/max{δ², ‖m − g‖2⁴} + const. · Σ_{j>K} E|VK+l,K+l(g)| c1^{j−K} · ( (j − K)(1 − c1) + c1 )/max{δ², ‖m − g‖2⁴}
≤ const. · E|VK+l,K+l(g)|/max{δ², ‖m − g‖2⁴} .
Note that the VK+l,K+l are identically distributed for all l and that |VK+l,K+l| is essentially bounded by 2(cG + R″)². We consider that

E|VK+l,K+l(g)| ≤ 2E|m^[K+l](X1, Y1, . . . , YK+l) − g^[K+l](x0, Y1, . . . , YK+l)|²
≤ 2E( ‖gx‖∞ · |m^[K+l−1](X1, Y1, . . . , YK−1+l) − g^[K+l−1](x0, Y1, . . . , YK−1+l)| + |m(XK+l, YK+l) − g(XK+l, YK+l)| )²
≤ . . . ≤ 2E( Σ_{i=0}^{K+l−1} c1^i |m(XK+l−i, YK+l−i) − g(XK+l−i, YK+l−i)| + c1^{K+l}|X1 − x0| )²
≤ (4/(1 − c1)) Σ_{i=0}^{K+l−1} c1^i E|m(XK+l−i, YK+l−i) − g(XK+l−i, YK+l−i)|² + 4(R″ + x0)² c1^{2(K+l)}
≤ (4/(1 − c1)²) · ‖m − g‖2² + 4(R″ + x0)² c1^{2(K+l)} ,   (6.15)

by the Cauchy-Schwarz inequality and the strict stationarity of the process (Xi, Yi), integer i.
Focussing on the covariance of Uj+l,K+l(g) and Ui+l,K+l(g) for j > i we realize that

E Uj+l,K+l Ui+l,K+l = E[ Ui+l,K+l Xj+1+l · ( m^[K+l](Xj−K+1, Yj−K+1, . . . , Yj+l) − g^[K+l](x0, Yj−K+1, . . . , Yj+l) ) ] · E(εj+l+1 − 1) = 0 ,

for j > i. The Uj+l,K+l(g) are centered so that cov(Uj+l,K+l, Ui+l,K+l) vanishes for all j > i. Hence, the quantity V in eq. (1.12) in Merlevède et al. (2011) – with respect to the sum of the Uj+l,K+l(g)/max{δ, ‖m − g‖2²} – equals E U²K+l,K+l(g)/max{δ², ‖m − g‖2⁴} since the |Uj+l,K+l| are essentially bounded by R″(Rε + 1)(cG + R″). We consider that

E U²K+l,K+l(g) ≤ (R″)²(Rε + 1)² · E|m^[K+l](X1, Y1, . . . , YK+l) − g^[K+l](x0, Y1, . . . , YK+l)|²
≤ (R″)²(Rε + 1)² · (2/(1 − c1)²) · ‖m − g‖2² + 2(R″)²(Rε + 1)²(R″ + x0)² c1^{2(K+l)} ,   (6.16)

where we have used the upper bound from (6.15).
Thus the quantity V in the notation of Merlevède et al. (2011) is bounded from above by

d1/max{δ, ‖m − g‖2²} + d2 c1^{2K}/max{δ², ‖m − g‖2⁴} ≤ d1/δ + d2 c1^{2K}/δ² ,

where the finite positive constants d1 and d2 only depend on c1, R″ and Rε. Applying Theorem 1 from Merlevède et al. (2011) to (6.8) yields that
1 from Merlevède et al. (2011) to (6.8) yields that
2
P km
b − mk2 > δ ≤ 4 #G · (n − K − 1) exp − D1 (n − K + 1)γ
+ exp − D2 (n − K − 1)2 /3
n D
o
2
(n − K − 1)δ
+ exp −
3d1
n D
o
2
2
+ exp −
(n − K − 1)c−2K
δ
1
3d2
n
(n − K − 1)γ(1−γ) o
+ exp − D3 (n − K − 1) · exp D4
,
logγ (n − K − 1)
(6.17)
28
for all δ ∈ (δn , (cG + cM )2 ) and n > max{4, N } where the positive finite constants D1 , . . . , D4
only depend on c1 , x0 , R00 , Rε , cG and γ2 . Considering the upper bound on the cardinality of G
provided by Condition C, we apply the inequality (6.17) to (6.7) and obtain that
\[
\sup_{m \in \mathcal M} E\, \|\widehat m - m\|_2^2 \ \le\ \delta_n
+ 4 (c_M+c_G)^2\, (n-K-1)\, \exp\bigl( c_N n^{1/(1+\beta)} - D_1 (n-K+1)^{\gamma} \bigr)
+ 4 (c_M+c_G)^2 \cdot \exp\bigl( c_N n^{1/(1+\beta)} - D_2 (n-K+1)^2/3 \bigr)
+ 4 (c_M+c_G)^2 \cdot \exp\Bigl\{ c_N n^{1/(1+\beta)} - D_3 (n-K-1) \cdot \exp\Bigl( D_4\, \frac{(n-K-1)^{\gamma(1-\gamma)}}{\log^{\gamma}(n-K-1)} \Bigr) \Bigr\}
+ 4 (c_M+c_G)^2 \cdot \exp\Bigl\{ c_N n^{1/(1+\beta)} - \frac{D_2}{3 d_2}\, (n-K-1)\, c_1^{-2K}\, \delta_n^2 \Bigr\}
+ \frac{12\, d_1}{D_2 (n-K-1)} \cdot \exp\Bigl\{ c_N n^{1/(1+\beta)} - \frac{D_2}{3 d_1}\, (n-K-1)\, \delta_n \Bigr\},
\]
for all $n > \max\{4, N\}$. Due to $\delta_n = c_\delta \cdot n^{-\beta/(1+\beta)}$ with $c_\delta$ sufficiently large, the selection of $K$, and the fact that $\gamma$ can be chosen arbitrarily close to 1 (select $\gamma_2$ large enough) while $\beta > 1$ by Condition C, the first term $\delta_n$ asymptotically dominates the right-hand side of the above inequality, which finally leads to the desired rate.
Proof of Lemma 4.2: The derivative of $\widetilde m_{(x)}(y) := \widetilde m(x, y)$ with respect to $y$ is bounded from below by $c_M' \alpha_0'$ for any $m \in \mathcal M_0$ and $x \ge \alpha_0'$, so that the functions $\widetilde m_{(x)} : [0, \infty) \to [\widetilde m(x, 0), \infty)$, $x \ge \alpha_0'$, increase strictly monotonically and, hence, are invertible. For any $c \ge 0$ we have $\{ (x, z) : \widetilde m_{(x)}^{-1}(z) \le c \} = \{ (x, z) : \widetilde m(x, c) - z \ge 0 \}$, so that Borel measurability of the function $(x, z) \mapsto \widetilde m_{(x)}^{-1}(z)$ follows from that of $\widetilde m$. By Fubini's theorem, it follows from there that
\[
P[X_1 \le z] = \int P\bigl[ \widetilde m(x, \varepsilon_1) \le z \bigr]\, dP_{X_1}(x) = \int P\bigl[ \varepsilon \le \widetilde m_{(x)}^{-1}(z) \bigr]\, dP_{X_1}(x)
= \int \int_{t=0}^{\widetilde m_{(x)}^{-1}(z)} f_\varepsilon(t)\, dt\, dP_{X_1}(x)
= \int_{s=0}^{z} \int \frac{ f_\varepsilon\bigl( \widetilde m_{(x)}^{-1}(s) \bigr) }{ \widetilde m_y\bigl( x, \widetilde m_{(x)}^{-1}(s) \bigr) }\, dP_{X_1}(x)\, ds,
\]
for all $z \ge 0$, so that $P_{X_1}$ is absolutely continuous and has the Lebesgue density
\[
f_X(z) = E\, \frac{ f_\varepsilon\bigl( \widetilde m_{(X_1)}^{-1}(z) \bigr) }{ \widetilde m_y\bigl( X_1, \widetilde m_{(X_1)}^{-1}(z) \bigr) },
\]
which is bounded from above by $\|f_\varepsilon\|_\infty / (c_M' \alpha_0')$ as $X_1 \ge \alpha_0'$ holds true almost surely. Moreover, the support of $f_X$ is included in $[\alpha_0', R_0']$ with $R_0'$ as in Condition B. Therefore we have
\[
\|f\|_2^2 = \iint f^2(x, xt)\, f_X(x)\, f_\varepsilon(t)\, dx\, dt
\ \le\ \iint_{[\alpha_0', R_0'] \times [0, R_\varepsilon]} f^2(x, xt)\, dx\, dt \cdot \|f_\varepsilon\|_\infty^2 / (c_M' \alpha_0')
\ \le\ \iint_{[\alpha_0', R_0'] \times [0, R_0' R_\varepsilon]} f^2(x, y)\, dx\, dy \cdot \|f_\varepsilon\|_\infty^2 / \{ c_M' (\alpha_0')^2 \},
\]
so that $\|\cdot\|_2$ is dominated by $\|\cdot\|_\lambda$ multiplied by a uniform constant.
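For convenience, the substitution behind the last inequality of the preceding display reads, for fixed $x$,
\[
\int_0^{R_\varepsilon} f^2(x, xt)\, dt = \frac{1}{x} \int_0^{x R_\varepsilon} f^2(x, y)\, dy \ \le\ \frac{1}{\alpha_0'} \int_0^{R_0' R_\varepsilon} f^2(x, y)\, dy, \qquad x \in [\alpha_0', R_0'],
\]
which accounts for the additional factor $1/\alpha_0'$.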
Proof of Lemma 4.3: It follows immediately that
\[
Q\bigl( (a, b] \bigr) \ \ge\ c \cdot \lambda\bigl( (a, b] \bigr),
\]
for all left-open intervals $(a, b] \subseteq I$, since
\[
Q\bigl( (a, b] \bigr) \ \ge\ Q\bigl( [a + 1/m,\, b] \bigr) \ \ge\ c \cdot \lambda\bigl( [a + 1/m,\, b] \bigr) \ \to\ c\,(b-a) = c\,\lambda\bigl( (a, b] \bigr),
\]
as $m \to \infty$, for all $(a, b] \subseteq I$. Let $\mathcal F$ denote the collection of all unions of finitely many intervals $(a, b] \subseteq I$. One can verify that $\mathcal F$ forms a ring of subsets of $I$ which is closed with respect to the relative complement in $I$. Moreover, $\mathcal F$ can equivalently be defined as the collection of all unions of finitely many pairwise disjoint intervals $(a, b] \subseteq I$. For all $f \in \mathcal F$ we conclude that
\[
Q(f) = Q\Bigl( \bigcup_{k=1}^n (a_k, b_k] \Bigr) = \sum_{k=1}^n Q\bigl( (a_k, b_k] \bigr) \ \ge\ c \cdot \sum_{k=1}^n \lambda\bigl( (a_k, b_k] \bigr) = c \cdot \lambda(f),
\]
where the intervals $(a_k, b_k]$ are assumed to be pairwise disjoint. The $\sigma$-field generated by the ring $\mathcal F$ equals the Borel $\sigma$-field of $I$, as $\mathcal F$ contains all left-open intervals of $I$. As $Q$ and $\lambda$ are finite measures on that Borel $\sigma$-field, their restrictions to the domain $\mathcal F$ are finite pre-measures. By Carathéodory's extension theorem, $Q$ and $\lambda$ fulfil
\[
Q(g) = \inf \Bigl\{ \sum_{j=1}^{\infty} Q(f_j) \,:\, g \subseteq \bigcup_{j \in \mathbb N} f_j,\ f_j \in \mathcal F \Bigr\}, \qquad
\lambda(g) = \inf \Bigl\{ \sum_{j=1}^{\infty} \lambda(f_j) \,:\, g \subseteq \bigcup_{j \in \mathbb N} f_j,\ f_j \in \mathcal F \Bigr\}.
\]
Since $\sum_{j=1}^{\infty} Q(f_j) \ge c \cdot \sum_{j=1}^{\infty} \lambda(f_j)$, the outer measure representation of $Q$ and $\lambda$ yields that $Q(g) \ge c \cdot \lambda(g)$ for all Borel subsets $g$ of $I$. Clearly, for all elementary non-negative measurable mappings $\varphi$ on $I$, $\varphi = \sum_{k=1}^n \varphi_k \cdot \mathbf 1_{g_k}$, for some real numbers $\varphi_k \ge 0$ and pairwise disjoint Borel subsets $g_k$, $k = 1, \dots, n$, of $I$, it follows that
\[
\int \varphi(x)\, dQ(x) \ \ge\ c \cdot \int \varphi(x)\, d\lambda(x).
\]
This inequality extends to all non-negative measurable functions $\varphi$ by the monotone convergence theorem, as any such function $\varphi$ is the pointwise limit of a monotonically increasing sequence of elementary non-negative measurable mappings on the domain $I$.

Now we assume that the Lebesgue density $q$ of $Q$ on $I$ exists. Then we put $\varphi$ equal to the indicator function of the preimage $q^{-1}\bigl( (-\infty, c) \bigr) \subseteq I$, so that
\[
\int \bigl( q(x) - c \bigr) \varphi(x)\, d\lambda(x) = \int \varphi(x)\, dQ(x) - c \cdot \int \varphi(x)\, d\lambda(x) \ \ge\ 0.
\]
As the function $(q - c)\varphi$ is non-positive, it vanishes Lebesgue-almost everywhere on $I$; since $q(x) < c$ whenever $\varphi(x) = 1$, this forces $\varphi(x) = 0$ and, hence, $q(x) \ge c$ for Lebesgue-almost all $x \in I$.
Proof of Theorem 4.2: The statistical experiment in which one observes the data $\overline X_1, \overline Y_1, \dots, \overline Y_n$ with $\overline X_1 = \log X_1$ and $\overline Y_j = \log Y_j$ from model (1.1) is more informative than the model under consideration in which only $Y_1, \dots, Y_n$ are accessible. Hence, as we are proving a lower bound, we may assume that $\log X_1$ is additionally observed and that the $\overline Y_j$ are recorded instead of the $Y_j$.

First let us verify that, for all binary matrices $\theta$, the functions $m_\theta$ from (4.6) lie in $\mathcal M$. The support of the function $m_\theta - m_G$ is included in
\[
\Bigl\{ (x, y) \in \mathbb R^2 \,:\, x \in \alpha_0/(1-\alpha_1) + \tfrac{1}{16}\, R_\varepsilon \beta_1' \alpha_0 \cdot [1, 2],\ 0 \le y \le R_\varepsilon x/2 \Bigr\},
\]
as a subset. Thus we derive that
\[
\|(m_\theta)_x - (m_G)_x\|_\infty \le 8 c_H b_n^{1-\beta} \cdot \Bigl\{ \frac{8}{R_\varepsilon \beta_1' \alpha_0}\, \|H_x\|_\infty + \frac{4}{\alpha_0}\, \|H_y\|_\infty \Bigr\},
\qquad
\|(m_\theta)_y - (m_G)_y\|_\infty \le \frac{8 c_H}{R_\varepsilon \alpha_0}\, \|H_y\|_\infty \cdot b_n^{1-\beta},
\tag{6.18}
\]
while $(m_G)_x \equiv \alpha_1$ and $(m_G)_y \equiv \beta_1$. By Condition D we have $\alpha_1 + \beta_1 R_\varepsilon < c_1$, so that Condition (A1) holds true for all $m_\theta$ whenever $c_H > 0$ is sufficiently small. All functions $m_\theta$ coincide with $m_G$ on their restriction to
\[
\bigl[ 0,\ \alpha_0/(1-\alpha_1) + \tfrac{1}{16}\, R_\varepsilon \beta_1' \alpha_0 \bigr] \times [0, \infty),
\tag{6.19}
\]
while, for $x > \alpha_0/(1-\alpha_1) + R_\varepsilon \beta_1' \alpha_0/16$, the upper bound
\[
\|m_\theta - m_G\|_\infty \le c_H\, b_n^{-\beta}\, \|H\|_\infty,
\]
for $c_H$ small enough, suffices to verify the envelope Condition (A2). By (6.18) and the smallness of $c_H$ we also establish that
\[
\inf_{x \ge 0,\, y \ge 0} (m_\theta)_x \ge \alpha_1/2 > 0, \qquad \inf_{x \ge 0,\, y \ge 0} (m_\theta)_y \ge \beta_1/2 > 0,
\]
for all $\theta$, which guarantees the monotonicity Condition (A3). Moreover, the above inequalities guarantee Condition C' so that the stationary distribution of $(X_n)_n$ under any $m_\theta$ has a bounded Lebesgue density $f_{X,\theta}$ by Lemma 4.2. Also, as $c_H$ is sufficiently small with respect to all other constants, we can verify the smoothness constraint in Condition B for all admitted values of $\theta$.

As we have shown that
\[
\mathcal M_P := \{ m_\theta \,:\, \theta \in \Theta_n \} \subseteq \mathcal M,
\]
where $\Theta_n$ denotes the set of all binary $(b_n \times b_n)$-matrices, we follow a usual strategy of minimax theory and put an a-priori distribution on $\mathcal M_P$. Concretely, we assume that all components of the binary matrix $\theta$ are independent and satisfy $P[\theta_{j,k} = 1] = 1/2$, and we bound the minimax risk from below by the corresponding Bayes risk. However, we face the problem that the loss function depends on the stationary distribution of the data and, hence, on $m_\theta$. As all $m_\theta$ equal $m_G$ on the set (6.19), we have $\alpha_0/(1-\alpha_1) = m_\theta\bigl( \alpha_0/(1-\alpha_1), 0 \bigr)$. By Proposition 2.1(b), the left endpoint of the stationary distribution under the regression function $m_\theta$ equals $\alpha_0/(1-\alpha_1)$ for all $\theta \in \Theta_n$, so that
\[
\mathcal J := \mathcal J(m_\theta) = \bigl( \alpha_0/(1-\alpha_1) + R_\varepsilon \beta_1' \alpha_0/16,\ \alpha_0/(1-\alpha_1) + R_\varepsilon \beta_1' \alpha_0/8 \bigr],
\]
for all $\theta \in \Theta_n$. The closure of this interval contains the support of $m_\theta - m_G$ in the first argument for all $\theta \in \Theta_n$.
Then, (4.5) yields that
\[
\|\widehat m_n - m_\theta\|_2^2 \ \ge\ \mu_6 r_\varepsilon \cdot \int_0^{R_\varepsilon/2} \int_{\mathcal J} \bigl( \widehat m_n(x, xt) - \widetilde m_\theta(x, t) \bigr)^2\, dx\, dt,
\]
where the $\|\cdot\|_2$-norm depends on $\theta$. We introduce the intervals
\[
I_{j,k} := i_{j,k} + \bigl[ -R_\varepsilon \beta_1' \alpha_0/(64 b_n),\ R_\varepsilon \beta_1' \alpha_0/(64 b_n) \bigr] \times \bigl[ -R_\varepsilon/(8 b_n),\ R_\varepsilon/(8 b_n) \bigr], \qquad j, k = 0, \dots, b_n - 1,
\]
which are pairwise disjoint and subsets of $[0, R_\varepsilon/2] \times \mathcal J$. We deduce that
\[
\sup_{m \in \mathcal M} E_m\, \|\widehat m_n - m\|_2^2
\ \ge\ \mu_6 r_\varepsilon\; E_\theta E_{m_\theta} \int_0^{R_\varepsilon/2} \int_{\mathcal J} \bigl( \widehat m_n(x, xt) - \widetilde m_\theta(x, t) \bigr)^2\, dx\, dt
\]
\[
\ge\ \frac12\, \mu_6 r_\varepsilon \sum_{j,k=0}^{b_n - 1} E_{\theta,j,k} \Bigl\{ E_{m_{\theta,j,k,0}} \iint_{I_{j,k}} \bigl( \widehat m_n(x, xt) - \widetilde m_{\theta,j,k,0}(x, t) \bigr)^2\, dx\, dt + E_{m_{\theta,j,k,1}} \iint_{I_{j,k}} \bigl( \widehat m_n(x, xt) - \widetilde m_{\theta,j,k,1}(x, t) \bigr)^2\, dx\, dt \Bigr\}
\]
\[
\ge\ \frac14\, \mu_6 r_\varepsilon \sum_{j,k=0}^{b_n - 1} E_{\theta,j,k} \int \iint_{I_{j,k}} \bigl( \widetilde m_{\theta,j,k,1}(x, t) - \widetilde m_{\theta,j,k,0}(x, t) \bigr)^2\, dx\, dt\; \min\bigl\{ L_{n,\theta,j,k,0}(y),\, L_{n,\theta,j,k,1}(y) \bigr\}\, dy
\]
\[
\ge\ \frac{\mu_6 r_\varepsilon R_\varepsilon^2 \beta_1' c_H^2 \alpha_0}{2048} \cdot b_n^{-2\beta-2} \cdot \iint H^2(x, y)\, dx\, dy \cdot \sum_{j,k=0}^{b_n - 1} E_{\theta,j,k} \Bigl( 1 - \sqrt{ 1 - \mathcal A^2\bigl( L_{n,\theta,j,k,0}, L_{n,\theta,j,k,1} \bigr) } \Bigr),
\]
where $E_m$, $E_{m_\theta}$, $E_{m_{\theta,j,k,b}}$, $b \in \{0, 1\}$, denote the expectation with respect to the data when $m$, $m_\theta$, $m_{\theta,j,k,b}$, respectively, is the true autoregression function, and we write $m_{\theta,j,k,b}$ for the function $m_\theta$ when $\theta_{j,k} = b$ and all other components of $\theta$ are random; $E_\theta$ stands for the expectation with respect to the a-priori distribution of $\theta$; $E_{\theta,j,k}$ denotes that expectation with respect to all components of $\theta$ except $\theta_{j,k}$; $L_{n,\theta,j,k,b}$, $b \in \{0, 1\}$, denotes the Lebesgue density of the data under the true autoregression function $m_{\theta,j,k,b}$; finally,
\[
\mathcal A(f_1, f_2) := \int \sqrt{ f_1(x)\, f_2(x) }\; dx
\]
stands for the Hellinger affinity between two densities $f_1$ and $f_2$. We have used LeCam's inequality in the last step. Therefore, we obtain the lower bound $b_n^{-2\beta}$ on the convergence rate if we verify that
\[
\inf_{n \ge N}\ \inf_{\theta \in \Theta_n}\ \inf_{j,k = 0, \dots, b_n - 1} \mathcal A\bigl( L_{n,\theta,j,k,0}, L_{n,\theta,j,k,1} \bigr) \ >\ 0,
\tag{6.20}
\]
for some fixed integer $N$.
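For the reader's convenience, we recall LeCam's inequality in the form employed in the last step above: for any two probability densities $f_0$ and $f_1$,
\[
\int \min\{ f_0(y), f_1(y) \}\, dy \ \ge\ 1 - \sqrt{ 1 - \mathcal A^2(f_0, f_1) },
\]
which follows from the total variation bound $\mathrm{TV}(f_0, f_1) \le \sqrt{1 - \mathcal A^2(f_0, f_1)}$. In particular, once (6.20) is verified, the two-point Bayes risks above remain bounded away from zero.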
For the likelihood function, i.e. the data density with respect to the $(n+1)$-dimensional Lebesgue measure, we have the recursive equations
\[
L_{0,\theta,j,k,b}(x_1) = \overline f_{X,\theta,j,k,b}(x_1), \qquad
L_{l+1,\theta,j,k,b}(x_1, y_1, \dots, y_{l+1}) = \overline f_\varepsilon\bigl( y_{l+1} - \overline m_{\theta,j,k,b}(x_l, y_l) \bigr) \cdot L_{l,\theta,j,k,b}(x_1, y_1, \dots, y_l),
\]
for $l = 0, \dots, n-1$. Therein we have $\overline m_{\theta,j,k,b}(x, y) = \log m_{\theta,j,k,b}(\exp(x), \exp(y))$, and $\overline f_\varepsilon$, $\overline f_{X,\theta,j,k,b}$ denote the density of $\log \varepsilon_1$ and the stationary density of $(\log X_n)_n$ under $m = m_{\theta,j,k,b}$, respectively. Also, the $x_2, \dots, x_l$ are recursively defined by $x_{r+1} = \overline m_{\theta,j,k,b}(x_r, y_r)$. Writing
\[
H(f_1, f_2) := \Bigl( \int \bigl( \sqrt{f_1(x)} - \sqrt{f_2(x)} \bigr)^2\, dx \Bigr)^{1/2}
\]
for the Hellinger distance between densities $f_1$ and $f_2$, we deduce that
\[
\mathcal A_{l+1} := \mathcal A\bigl( L_{l+1,\theta,j,k,0}, L_{l+1,\theta,j,k,1} \bigr)
= \int \cdots \int \Bigl\{ \int \sqrt{ \overline f_\varepsilon\bigl( y_{l+1} - \overline m_{\theta,j,k,0}(x_l, y_l) \bigr)\; \overline f_\varepsilon\bigl( y_{l+1} - \overline m_{\theta,j,k,1}(x_l, y_l) \bigr) }\; dy_{l+1} \Bigr\}
\cdot \sqrt{ L_{l,\theta,j,k,0}(x_1, y_1, \dots, y_l) \cdot L_{l,\theta,j,k,1}(x_1, y_1, \dots, y_l) }\; dx_1\, dy_1 \cdots dy_l
\]
\[
= \int \cdots \int \Bigl\{ 1 - \tfrac12\, H^2\bigl( \overline f_\varepsilon\bigl( \cdot - \overline m_{\theta,j,k,0}(x_l, y_l) \bigr),\ \overline f_\varepsilon\bigl( \cdot - \overline m_{\theta,j,k,1}(x_l, y_l) \bigr) \bigr) \Bigr\}
\cdot \sqrt{ L_{l,\theta,j,k,0}(x_1, y_1, \dots, y_l) \cdot L_{l,\theta,j,k,1}(x_1, y_1, \dots, y_l) }\; dx_1\, dy_1 \cdots dy_l
= \mathcal A_l - \frac12\, \Delta_{n,\theta,j,k},
\tag{6.21}
\]
with
\[
\Delta_{n,\theta,j,k} \le \Bigl[ E_{m_{\theta,j,k,0}}\, H^2\bigl( \overline f_\varepsilon\bigl( \cdot - \overline m_{\theta,j,k,0}(\overline X_l, \overline Y_l) \bigr),\ \overline f_\varepsilon\bigl( \cdot - \overline m_{\theta,j,k,1}(\overline X_l, \overline Y_l) \bigr) \bigr)
\cdot E_{m_{\theta,j,k,1}}\, H^2\bigl( \overline f_\varepsilon\bigl( \cdot - \overline m_{\theta,j,k,0}(\overline X_l, \overline Y_l) \bigr),\ \overline f_\varepsilon\bigl( \cdot - \overline m_{\theta,j,k,1}(\overline X_l, \overline Y_l) \bigr) \bigr) \Bigr]^{1/2},
\]
by the Cauchy–Schwarz inequality. For any $a, b \in \mathbb R$ the regularity constraint (4.7) implies that
\[
H^2\bigl( \overline f_\varepsilon(\cdot - a),\ \overline f_\varepsilon(\cdot - b) \bigr) \ \le\ F_\varepsilon \cdot (a - b)^2,
\]
so that
\[
\Delta_{n,\theta,j,k} \ \le\ F_\varepsilon \cdot \max_{b=0,1}\, \bigl\| \log m_{\theta,j,k,0} - \log m_{\theta,j,k,1} \bigr\|_{2,b}^2,
\]
where we write $\|\cdot\|_{2,b}$ for the $\|\cdot\|_2$-norm under the true autoregression function $m_{\theta,j,k,b}$. According to Lemma 4.2, we have
\[
\bigl\| \log m_{\theta,j,k,0} - \log m_{\theta,j,k,1} \bigr\|_{2,b}^2
= \iint \bigl( \log \widetilde m_{\theta,j,k,0}(x, t) - \log \widetilde m_{\theta,j,k,1}(x, t) \bigr)^2\, f_{X,\theta,j,k,b}(x)\, f_\varepsilon(t)\, dx\, dt
\]
\[
\le\ \frac{2\, \|f_\varepsilon\|_\infty^2\, c_H^2}{\beta_1 \alpha_0^2} \cdot b_n^{-2\beta} \cdot \iint_{I_{j,k}} H^2\bigl( 64 b_n (x - i_{j,k,1})/(R_\varepsilon \beta_1' \alpha_0),\ 8 b_n (t - i_{j,k,2})/R_\varepsilon \bigr)\, dx\, dt
\]
\[
\le\ \frac{\|f_\varepsilon\|_\infty^2\, R_\varepsilon^2\, \beta_1'\, c_H^2}{256\, \beta_1 \alpha_0} \cdot \iint H^2(x, y)\, dx\, dy \cdot b_n^{-2\beta-2},
\]
which provides an upper bound on $\Delta_{n,\theta,j,k}$. Then it follows from (6.21) by induction that
\[
\mathcal A_n \ \ge\ \mathcal A_0 - F_\varepsilon\, \frac{\|f_\varepsilon\|_\infty^2\, R_\varepsilon^2\, \beta_1'\, c_H^2}{512\, \beta_1 \alpha_0} \cdot \iint H^2(x, y)\, dx\, dy \cdot n\, b_n^{-2\beta-2}.
\tag{6.22}
\]
Then we consider that
\[
\mathcal A_0 = \int \sqrt{ \overline f_{X,\theta,j,k,0}(x)\, \overline f_{X,\theta,j,k,1}(x) }\; dx
= \int \sqrt{ f_{X,\theta,j,k,0}(\exp(x))\, f_{X,\theta,j,k,1}(\exp(x)) }\; \exp(x)\, dx
= \int \sqrt{ f_{X,\theta,j,k,0}(x)\, f_{X,\theta,j,k,1}(x) }\; dx
\ \ge\ \int_{\mathcal J} \sqrt{ f_{X,\theta,j,k,0}(x)\, f_{X,\theta,j,k,1}(x) }\; dx
\ \ge\ \mu_6\, R_\varepsilon \beta_1' \alpha_0/16 \ >\ 0,
\]
using (4.4) and Lemma 4.3, where the above lower bound does not depend on $c_H$. Viewing $c_H$ as sufficiently small and choosing $b_n = n^{1/(2\beta+2)}$, the equation (6.22) finally proves (6.20), which establishes $b_n^{-2\beta} = n^{-\beta/(\beta+1)}$ as a lower bound on the convergence rate.
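For completeness, the rate arithmetic behind this choice of $b_n$ reads
\[
n\, b_n^{-2\beta-2} = n \cdot n^{-(2\beta+2)/(2\beta+2)} = 1, \qquad b_n^{-2\beta} = n^{-2\beta/(2\beta+2)} = n^{-\beta/(\beta+1)},
\]
so the subtracted term in (6.22) remains of constant order – and becomes arbitrarily small for small $c_H$ – while the separation of the hypotheses delivers the rate $n^{-\beta/(\beta+1)}$.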
References
[1] Audrino, F. and Bühlmann, P. (2001). Tree-structured generalized autoregressive conditional heteroscedastic models. J. Roy. Statist. Soc., Ser. B 63, 727–744.
[2] Audrino, F. and Bühlmann, P. (2009). Splines for financial volatility. J. Roy. Statist. Soc., Ser. B
71, 655–670.
[3] Berkes, I., Horváth, L. and Kokoszka, P.S. (2003). GARCH processes: structure and estimation.
Bernoulli 9, 201–227.
[4] Berkes, I. and Horváth, L. (2004). The efficiency of the estimators of the parameters in GARCH
processes. Ann. Statist. 32, 633–655.
[5] Bollerslev, T. (1986). Generalized autoregressive conditional heteroscedasticity. J. Econometrics 31,
307–327.
[6] Bougerol, P. and Picard, N. (1992). Stationarity of GARCH processes and of some non-negative
time series. J. Econometrics 52, 115–127.
[7] Bougerol, P. (1993). Kalman filtering with random coefficients and contractions. SIAM J. Control.
Optim. 31, 942–959.
[8] Buchmann, B. and Müller, G. (2012). Limit experiments of GARCH. Bernoulli 18, 64–99.
[9] Bühlmann, P. and McNeil, A.J. (2002). An algorithm for nonparametric GARCH modelling. Comp.
Statist. Data Anal. 40, 665–683.
[10] Čížek, P., Härdle, W. and Spokoiny, V. (2009). Adaptive pointwise estimation in time-inhomogeneous conditional heteroscedasticity models. Econom. J. 12, 248–271.
[11] Conrad, C. and Mammen, E. (2009). Nonparametric regression on latent covariates with an application to semiparametric GARCH-in-Mean models. Discussion Paper No. 473, University of
Heidelberg, Department of Economics.
[12] Christensen, B.J., Dahl, C.M. and Iglesias, E. (2012). Semiparametric inference in a GARCH-in-mean model. J. Econometrics 167, 458–472.
[13] Diaconis, P. and Freedman, D. (1999). Iterated random functions. SIAM Review 41, 45–76.
[14] Ding, Z., Granger, C.W.J. and Engle, R. (1993). A long memory property of stock market returns
and a new model. J. Empirical Finance 1, 83–106.
[15] Engle, R. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of the
United Kingdom inflation. Econometrica 50, 987–1007.
[16] Fan, J., Qi, L. and Xiu, D. (2014). Quasi Maximum Likelihood Estimation of GARCH Models with Heavy-Tailed Likelihoods. J. Bus. Econ. Statist. 32, 178–205.
[17] Fan, J. and Yao, Q., Nonlinear time series: nonparametric and parametric methods, 2003, Springer.
[18] Francq, C. and Zakoïan, J.-M., GARCH models: structure, statistical inference and financial applications, 2010, Wiley.
[19] Francq, C., Lepage, G. and Zakoïan, J.-M. (2011). Two-stage non-Gaussian QML estimation of GARCH models and testing the efficiency of the Gaussian QMLE. J. Econometrics 165, 246–257.
[20] Hall, P. and Yao, Q. (2003). Inference in ARCH and GARCH models with heavy-tailed errors.
Econometrica 71, 285–317.
[21] Hörmann, S. (2008). Augmented GARCH sequences: dependence structure and asymptotics.
Bernoulli 14, 543–561.
[22] Lee, S.-W. and Hansen, B.E. (1994). Asymptotic theory for the GARCH(1, 1) quasi-maximum likelihood estimator. Econometric Theory 10, 29–52.
[23] Linton, O. and Mammen, E. (2005). Estimating semiparametric ARCH(∞) models by kernel
smoothing methods. Econometrica 73, 771–836.
[24] Lumsdaine, R.L. (1996). Consistency and asymptotic normality of the quasi-maximum likelihood
estimator in IGARCH(1, 1) and covariance stationary GARCH(1, 1) models. Econometrica 64, 575–
596.
[25] Merlevède, F., Peligrad, M. and Rio, E. (2011). A Bernstein type inequality and moderate deviations
for weakly dependent sequences. Prob. Theo. Rel. Fields 151, 435–474.
[26] Nelson, D. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica
59, 347–370.
[27] Robinson, P.M. and Zaffaroni, P. (2006). Pseudo-maximum-likelihood estimation of ARCH(∞) models. Ann. Statist. 34, 1049–1074.
[28] Straumann, D. and Mikosch, Th. (2006). Quasi-maximum-likelihood estimation in conditionally
heteroscedastic time series: A stochastic recurrence equations approach. Ann. Statist. 34, 2449–
2495.
[29] van de Geer, Sara A., Empirical Processes in M-estimation: Applications of empirical process theory,
2000, Cambridge Series in Statistical and Probabilistic Mathematics 6, Cambridge University Press.
[30] van der Vaart, A.W. and Wellner, J.A., Weak Convergence and Empirical Processes with Applications to Statistics, 1996, Springer.
[31] Wu, W.B. and Shao, X. (2004). Limit theorems for iterated random functions. J. Appl. Prob. 41,
425–436.
A MATLAB code – Supplement not to be published
In this section we provide the MATLAB code for the estimators from the simulation Section 5 of the main article.
A.1 Computation of the function $m_{\alpha_0, \alpha_1, \beta_1, \beta_p}$

function erg = linsin(alpha_0, alpha_1, beta, beta_p, x, y)
% here "beta" = beta_1
% evaluates m(x, y) = alpha_0 + beta_1*x + alpha_1*y
%                       + beta_p*(sin(2*pi*x) + cos(2*pi*y))
% for row vectors x and y; the result is a length(x)-by-length(y) matrix
stoerung = beta_p * sin(2*pi*x') * ones(1, length(y)) ...
         + beta_p * ones(1, length(x))' * cos(2*pi*y);
erg = alpha_0 * ones(length(x), length(y)) + beta * x' * ones(1, length(y)) ...
    + alpha_1 * ones(1, length(x))' * y + stoerung;
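As a quick sanity check of this implementation, the surface can be evaluated and plotted on a grid; the parameter values below are illustrative assumptions only, not the settings of Section 5:

x = 0:0.05:1;                               % grid of volatility values
y = 0:0.05:2;                               % grid of observation values
M = linsin(0.1, 0.2, 0.15, 0.01, x, y);     % length(x)-by-length(y) matrix
surf(x, y, M');                             % visual check of the regression surface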
A.2 Computation of the contrast functional $\Phi$

function erg = Phi(alpha_0, alpha_1, beta, beta_p, K, x0, Daten)
% insert the observed data as "Daten"
% iterates the volatility recursion over a window of K lags, starting
% from the pilot value x0, and averages the squared one- and
% two-step-ahead prediction errors
n = length(Daten);
summe = 0;
for j = K:1:n-2
    % rebuild the volatility proxy from the K observations up to index j
    g1 = linsin(alpha_0, alpha_1, beta, beta_p, x0, Daten(j-K+1));
    for l = 2:1:K
        g1 = linsin(alpha_0, alpha_1, beta, beta_p, g1, Daten(j-K+l));
    end
    % two-step-ahead prediction
    g2 = linsin(alpha_0, alpha_1, beta, beta_p, g1, Daten(j+1));
    delta_1 = (Daten(j+1) - g1)^2;
    delta_2 = (Daten(j+2) - g2)^2;
    summe = summe + delta_1 + delta_2;
end
erg = summe / (n-K-1);
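Note that the contrast recomputes the volatility proxy from scratch for every $j$, which keeps the code simple at the price of order $nK$ calls of linsin. A single evaluation might look as follows (all argument values are illustrative assumptions, and Daten is assumed to hold the observed series):

Wert = Phi(0.2, 0.15, 0.2, 0.01, 5, 0.5, Daten);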
A.3 Computation of the final coefficient estimators

function erg = mSchaetzer(K, x0, Daten)
% insert the observed data as "Daten"
% exhaustive grid search minimising Phi over the four coefficients;
% the first row of "Minimum" stores the current minimiser and the
% entry Minimum(2,4) the corresponding contrast value
Minimum = [0.1 0.1 0.1 0; 0 0 0 Phi(0.1, 0.1, 0.1, 0, K, x0, Daten)];
for beta = 0.1:0.05:0.3
    for alpha_1 = 0.1:0.05:0.3
        for alpha_0 = 0.1:0.05:0.3
            for beta_p = -0.03:0.01:0.03
                Wert = Phi(alpha_0, alpha_1, beta, beta_p, K, x0, Daten);
                if Wert < Minimum(2, 4)
                    Minimum = [alpha_0 alpha_1 beta beta_p; 0 0 0 Wert];
                end
            end
        end
    end
end
erg = Minimum(1, :);
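A minimal end-to-end sketch of a simulation run is given below; every concrete value (sample size, error distribution, coefficients, window length K and pilot value x0) is an illustrative assumption and not the simulation design of Section 5:

n = 500;
eps_j = 2 * rand(1, n);                       % illustrative errors with mean 1 on [0, 2]
X = zeros(1, n+1);                            % (unobserved) volatility process
Y = zeros(1, n);                              % observed series
X(1) = 0.5;                                   % illustrative starting value
for j = 1:n
    Y(j) = X(j) * eps_j(j);                   % multiplicative observation equation
    X(j+1) = linsin(0.15, 0.2, 0.15, 0.01, X(j), Y(j));  % volatility recursion
end
Daten = Y;
erg = mSchaetzer(5, 0.5, Daten)               % returns [alpha_0 alpha_1 beta_1 beta_p]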