Large Sample Theory
In statistics, we are interested in the properties of particular random variables (or
“estimators”), which are functions of our data. In asymptotic analysis, we focus on
describing the properties of estimators when the sample size becomes arbitrarily large.
The idea is that given a reasonably large dataset, the properties of an estimator even
when the sample size is finite are similar to the properties of an estimator when the
sample size is arbitrarily large.
In these notes we focus on the large sample properties of sample averages formed from
i.i.d. data. That is, assume that $X_i \sim$ i.i.d. $F$, for $i = 1, \ldots, n, \ldots$. Assume $EX_i = \mu$ for all $i$. The sample average after $n$ draws is $\bar X_n \equiv \frac{1}{n}\sum_i X_i$.
We focus on two important sets of large sample results:
(1) Law of large numbers: $\bar X_n \to EX$ as $n \to \infty$.
(2) Central limit theorem: $\sqrt{n}(\bar X_n - EX) \to N(0, \cdot)$. That is, $\sqrt{n}$ times a sample average looks like (in a precise sense to be defined later) a normal random variable as $n$ gets large.
An important endeavor of asymptotic statistics is to show (1) and (2) under various
assumptions on the data sampling process.
Consider a sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$.
Convergence in probability:
$$Z_n \xrightarrow{p} Z \iff \text{for all } \epsilon > 0, \ \lim_{n\to\infty} \text{Prob}\left(|Z_n - Z| < \epsilon\right) = 1.$$
More formally: for all $\epsilon > 0$ and $\delta > 0$, there exists $n_0(\epsilon, \delta)$ such that for all $n > n_0(\epsilon, \delta)$:
$$\text{Prob}\left(|Z_n - Z| < \epsilon\right) > 1 - \delta.$$
Note that the limiting variable Z can also be a random variable. Furthermore, for
convergence in probability, the random variables Z1 , Z2 , . . . and Z should be defined
on the same probability space: if this common sample space is Ω with elements ω,
then the statement of convergence in probability is that:
$$\text{Prob}\left(\omega \in \Omega : |Z_n(\omega) - Z(\omega)| < \epsilon\right) > 1 - \delta.$$
Corresponding to this convergence concept, we have the Weak Law of Large Numbers (WLLN), which is the result that $\bar X_n \xrightarrow{p} \mu$.
Earlier, we had used Chebyshev's inequality to prove a version of the WLLN, under the assumption that the $X_i$ are i.i.d. with mean $\mu$ and variance $\sigma^2$.
Khinchine's law of large numbers only requires that the mean exists (i.e., is finite), but does not require existence of variances.
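As a quick illustration of the WLLN, here is a minimal simulation sketch (Python with numpy; the exponential parent distribution and the sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0  # true mean of the Exponential parent (illustrative choice)

for n in [10, 100, 10_000, 1_000_000]:
    x = rng.exponential(scale=mu, size=n)   # i.i.d. draws with mean mu
    print(f"n = {n:>9,d}   sample average = {x.mean():.4f}   "
          f"|error| = {abs(x.mean() - mu):.4f}")
```

The absolute error of the sample average shrinks as $n$ grows, as the WLLN suggests.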
Recall Markov's inequality: for positive random variables $Z$, and $p > 0$, $\epsilon > 0$, we have $P(Z > \epsilon) \le E(Z^p)/\epsilon^p$. Take $Z = |X_n - X^*|$. Then if we have
$$E\left[|X_n - X^*|^p\right] \to 0,$$
that is, $X_n$ converges in $p$-th mean to $X^*$, then by Markov's inequality, we also have $P(|X_n - X^*| \ge \epsilon) \to 0$. That is, convergence in $p$-th mean implies convergence in probability.
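A minimal numerical check of this implication (Python sketch; the sequence $X_n = X^* + N(0, 1/n)$ noise, the value of $p$, and $\epsilon$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
eps, p = 0.1, 2          # epsilon and the order of the mean (illustrative)
x_star = 1.0             # limiting value X*

for n in [10, 100, 1000, 10000]:
    xn = x_star + rng.normal(scale=1 / np.sqrt(n), size=200_000)  # draws of X_n
    pth_mean = np.mean(np.abs(xn - x_star) ** p)                  # E|X_n - X*|^p
    prob = np.mean(np.abs(xn - x_star) >= eps)                    # P(|X_n - X*| >= eps)
    print(f"n={n:>6d}  E|Xn-X*|^p={pth_mean:.5f}  "
          f"Markov bound={pth_mean / eps**p:.5f}  P(|Xn-X*|>=eps)={prob:.5f}")
```

Both the $p$-th mean and the exceedance probability shrink, and the probability stays below the Markov bound.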
Some definitions and results:
• Define: the probability limit of a random sequence $Z_n$, denoted $\text{plim}_{n\to\infty} Z_n$, is a non-random quantity $\alpha$ such that $Z_n \xrightarrow{p} \alpha$.
• Stochastic orders of magnitude:
  – "big-$O_p$" (bounded in probability): $Z_n = O_p(n^\lambda)$ $\iff$ for every $\delta > 0$, there exist a finite $\Delta(\delta)$ and $n^*(\delta)$ such that $\text{Prob}\left(\left|\frac{Z_n}{n^\lambda}\right| > \Delta\right) < \delta$ for all $n \ge n^*(\delta)$.
  – "little-$o_p$": $Z_n = o_p(n^\lambda)$ $\iff$ $\frac{Z_n}{n^\lambda} \xrightarrow{p} 0$. In particular, $Z_n = o_p(1)$ $\iff$ $Z_n \xrightarrow{p} 0$.
• Plim operator theorem: Let $Z_n$ be a $k$-dimensional random vector, and $g(\cdot)$ be a function which is continuous at a constant $k$-vector point $\alpha$. Then
$$Z_n \xrightarrow{p} \alpha \implies g(Z_n) \xrightarrow{p} g(\alpha).$$
In other words: $g(\text{plim } Z_n) = \text{plim } g(Z_n)$.
Proof: See Serfling, Approximation Theorems of Mathematical Statistics, p. 24.
Don't confuse this: $\text{plim } \frac{1}{n}\sum_i g(Z_i) = Eg(Z_i)$, from the LLN, but this is distinct from $\text{plim } g\left(\frac{1}{n}\sum_i Z_i\right) = g(EZ_i)$, using the plim operator result. The two are generally not the same; if $g(\cdot)$ is convex, then $\text{plim } \frac{1}{n}\sum_i g(Z_i) \ge \text{plim } g\left(\frac{1}{n}\sum_i Z_i\right)$ (see the simulation sketch after this list).
• A useful intermediate result ("squeezing"): if $Z_n \xrightarrow{p} \alpha$ (a constant) and $Z_n^*$ lies between $Z_n$ and $\alpha$ with probability 1, then $Z_n^* \xrightarrow{p} \alpha$.
(Quick proof: for all $\epsilon > 0$, $|Z_n - \alpha| < \epsilon \Rightarrow |Z_n^* - \alpha| < \epsilon$. Hence $\text{Prob}(|Z_n^* - \alpha| < \epsilon) \ge \text{Prob}(|Z_n - \alpha| < \epsilon) \to 1$.)
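Referring back to the "Don't confuse this" remark in the list above, here is a minimal simulation sketch of that distinction (Python; the convex choice $g(z) = z^2$ and the Exponential parent are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
z = rng.exponential(scale=1.0, size=n)   # EZ = 1, E g(Z) = E Z^2 = 2 for g(z) = z^2

g = lambda v: v ** 2                     # a convex function
mean_of_g = g(z).mean()                  # (1/n) sum_i g(Z_i)  ->  E g(Z_i) = 2
g_of_mean = g(z.mean())                  # g((1/n) sum_i Z_i)  ->  g(E Z_i) = 1

print(f"(1/n) sum g(Z_i) ~ {mean_of_g:.3f}   g((1/n) sum Z_i) ~ {g_of_mean:.3f}")
# By Jensen's inequality the first limit (2) exceeds the second (1).
```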
Convergence almost surely
$$Z_n \xrightarrow{as} Z \iff \text{Prob}\left(\omega : \lim_{n\to\infty} Z_n(\omega) = Z(\omega)\right) = 1.$$
As with convergence in probability, almost sure convergence makes sense when the
random variables Z1 , . . . , Zn , . . . and Z are all defined on the same sample space.
Hence, for each element in the sample space ω ∈ Ω, we can associate the sequence
Z1 (ω), . . . , Zn (ω), . . . as well as the limit point Z(ω).
To understand the probability statement more clearly, assume that all these RVs are
defined on the sample space (Ω, B(Ω), P ).
Define some set-theoretic concepts. Consider a sequence of sets $S_1, S_2, \ldots$, all $\in \mathcal B(\Omega)$.
Unless this sequence is monotonic, it’s difficult to talk about a “limit” of this sequence.
So we define the “liminf” and “limsup” of this set sequence.
$$\liminf_{n\to\infty} S_n \equiv \lim_{n\to\infty}\bigcap_{m=n}^{\infty} S_m = \bigcup_{n=1}^{\infty}\bigcap_{m=n}^{\infty} S_m = \left\{\omega \in \Omega : \omega \in S_n, \ \forall n \ge n_0(\omega)\right\}.$$
Note that the sequence of sets $\bigcap_{m=n}^{\infty} S_m$, for $n = 1, 2, 3, \ldots$ is a non-decreasing sequence of sets. By taking the union, the liminf is thus the limit of this monotone sequence of sets. That is, for all $\omega \in \liminf S_n$, there exists some number $n_0(\omega)$ such that $\omega \in S_n$ for all $n \ge n_0(\omega)$. Hence we can say that $\liminf S_n$ is the set of outcomes which occur "eventually".
$$\limsup_{n\to\infty} S_n \equiv \bigcap_{n=1}^{\infty}\bigcup_{m=n}^{\infty} S_m = \lim_{n\to\infty}\bigcup_{m=n}^{\infty} S_m = \left\{\omega \in \Omega : \text{for every } m,\ \omega \in S_{n_0(\omega,m)} \text{ for some } n_0(\omega,m) \ge m\right\}.$$
Note that the sequence of sets $\bigcup_{m=n}^{\infty} S_m$, for $n = 1, 2, 3, \ldots$ is a non-increasing sequence of sets. Hence, the limsup is the limit of this monotone sequence of sets. The $\limsup S_n$ is the set of outcomes $\omega$ which occur within every tail of the set sequence $S_n$. Hence we say that an outcome $\omega \in \limsup S_n$ occurs "infinitely often". (Note, however, that all outcomes $\omega \in \liminf S_n$ also occur an infinite number of times along the infinite sequence $S_n$.)
Note that
$$\liminf S_n \subset \limsup S_n.$$
Hence
$$P(\limsup S_n) \ge P(\liminf S_n)$$
and
$$P(\limsup S_n) = 0 \implies P(\liminf S_n) = 0, \qquad P(\liminf S_n) = 1 \implies P(\limsup S_n) = 1.$$
Borel-Cantelli Lemma: if $\sum_{i=1}^{\infty} P(S_i) < \infty$ then $P(\limsup_{n\to\infty} S_n) = 0$.
Proof: we have
$$P\left(\limsup_{n\to\infty} S_n\right) = P\left(\lim_{n\to\infty}\bigcup_{m=n}^{\infty} S_m\right) = \lim_{n\to\infty} P\left(\bigcup_{m=n}^{\infty} S_m\right) \le \lim_{n\to\infty}\sum_{m\ge n} P(S_m),$$
which equals zero (by the assumption that $\sum_{i=1}^{\infty} P(S_i) < \infty$).
The equality interchanging the "limit" and "Prob" operations is an application of the Monotone Convergence Theorem, which holds only because the sequence $\bigcup_{m=n}^{\infty} S_m$ is monotone.
To apply this to almost-sure convergence, consider the sequence of sets
$$S_n \equiv \left\{\omega : |Z_n(\omega) - Z(\omega)| > \epsilon\right\},$$
for $\epsilon > 0$. Almost sure convergence is the statement that $P(\limsup S_n) = 0$:
• Then $\bigcup_{m=n}^{\infty} S_m$ denotes the $\omega$'s such that the sequence $|Z_m(\omega) - Z(\omega)|$ exceeds $\epsilon$ somewhere in the tail beyond $n$.
• Then $\lim_{n\to\infty}\bigcup_{m=n}^{\infty} S_m$ denotes all $\omega$'s such that the sequence $|Z_n(\omega) - Z(\omega)|$ exceeds $\epsilon$ in every tail: those $\omega$ for which $Z_n(\omega)$ "escapes" the $\epsilon$-ball around $Z(\omega)$ infinitely often. For these $\omega$'s, $\lim_{n\to\infty} Z_n(\omega) \ne Z(\omega)$.
• For almost-sure convergence, you require the probability of this set to equal zero, i.e. $\Pr\left(\lim_{n\to\infty}\bigcup_{m=n}^{\infty} S_m\right) = 0$.
Corresponding to this convergence concept, we have the Strong Law of Large Numbers (SLLN), which is the result that $\bar X_n \xrightarrow{as} \mu$. Consider that the sample averages are random variables $\bar X_1(\omega), \bar X_2(\omega), \ldots$ defined on the same probability space, say $(\Omega, \mathcal B, P)$, where each $\omega$ indexes a sequence $\bar X_1(\omega), \bar X_2(\omega), \bar X_3(\omega), \ldots$. Consider the set sequence
$$S_n = \left\{\omega : |\bar X_n(\omega) - \mu| > \epsilon\right\}.$$
The SLLN is the statement that $P(\limsup S_n) = 0$.
Proof (sketch; Davidson, pg. 296): For convenience, consider $\mu = 0$. From the above discussion, we see that the two statements $\bar X_n \xrightarrow{as} 0$ and $P(|\bar X_n| > \epsilon, \text{ i.o.}) = 0$ for any $\epsilon > 0$ are equivalent. Therefore, one way of proving a SLLN is to verify that $\sum_{n=1}^{\infty} P(|\bar X_n| > \epsilon) < \infty$ for all $\epsilon > 0$, and then apply the Borel-Cantelli Lemma. As a starting point, we see that Chebyshev's inequality is not good enough by itself; it tells us
$$P(|\bar X_n| > \epsilon) \le \frac{\sigma^2}{n\epsilon^2}; \quad \text{but} \quad \sum_{n=1}^{\infty}\frac{1}{n} = \infty.$$
However, consider the subsequence with indices $n^2$: $\{1, 4, 9, 16, \ldots\}$. Again using Chebyshev's inequality, we have
$$P(|\bar X_{n^2}| > \epsilon) \le \frac{\sigma^2}{n^2\epsilon^2}; \quad \text{and} \quad \sum_{n=1}^{\infty}\frac{1}{n^2\epsilon^2} = \frac{1}{\epsilon^2}(\pi^2/6) \approx \frac{1.64}{\epsilon^2} < \infty.$$
(Another of Euler's greatest hits: $\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}$, i.e. the value $\zeta(2)$ of the Riemann zeta function.)
Hence, by the BC Lemma, we have that $\bar X_{n^2} \xrightarrow{as} 0$. Next, examine the "omitted" terms from the sum $\sum_{n=1}^{\infty} P(|\bar X_n| > \epsilon)$. Define $D_{n^2} = \max_{n^2 \le k < (n+1)^2} |\bar X_k - \bar X_{n^2}|$. Note that:
$$\bar X_k - \bar X_{n^2} = \left(\frac{n^2}{k} - 1\right)\bar X_{n^2} + \frac{1}{k}\sum_{t=n^2+1}^{k} X_t$$
and, by independence of the $X_i$, the two terms on the RHS are uncorrelated. Hence
$$\mathrm{Var}(\bar X_k - \bar X_{n^2}) = \left(1 - \frac{n^2}{k}\right)^2\frac{\sigma^2}{n^2} + \frac{k - n^2}{k^2}\sigma^2 = \sigma^2\left(\frac{1}{n^2} - \frac{1}{k}\right) \le \sigma^2\left(\frac{1}{n^2} - \frac{1}{(n+1)^2}\right).$$
By Chebyshev's inequality, then, we have
$$P(D_{n^2} > \epsilon) \le \frac{\sigma^2}{\epsilon^2}\left(\frac{1}{n^2} - \frac{1}{(n+1)^2}\right); \quad \text{so} \quad \sum_n P(D_{n^2} > \epsilon) \le \frac{\sigma^2}{\epsilon^2}\sum_n\left(\frac{1}{n^2} - \frac{1}{(n+1)^2}\right) < \frac{\sigma^2}{\epsilon^2}\sum_n\frac{1}{n^2} < \infty,$$
implying, using the BC Lemma, that $D_{n^2} \xrightarrow{as} 0$.
Now, consider, for $n^2 \le l < (n+1)^2$,
$$|\bar X_l| = |\bar X_l - \bar X_{n^2} + \bar X_{n^2}| \le |\bar X_l - \bar X_{n^2}| + |\bar X_{n^2}| \le D_{n^2} + |\bar X_{n^2}|.$$
By the above discussion, the RHS converges a.s. to 0. Hence, for all integers $l$, we have that $|\bar X_l|$ is bounded by a random variable which converges a.s. to 0. Thus, $\bar X_l \xrightarrow{as} 0$.
Example: $X_i \sim$ i.i.d. $U[0,1]$. Show that $X_{(1:n)} \equiv \min_{i=1,\ldots,n} X_i \xrightarrow{as} 0$.
Take $S_n \equiv \{|X_{(1:n)}| > \epsilon\}$. For all $\epsilon > 0$, $P(|X_{(1:n)}| > \epsilon) = P(X_{(1:n)} > \epsilon) = P(X_i > \epsilon,\ i = 1, \ldots, n) = (1 - \epsilon)^n$.
Hence, $\sum_{n=1}^{\infty}(1 - \epsilon)^n = \frac{1-\epsilon}{\epsilon} < \infty$. So the conclusion follows by the BC Lemma.
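A minimal simulation sketch of this example (Python; the number of paths, path length, and cutoff $\epsilon$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.05

# Simulate several sample paths of the running minimum X_(1:n) of U[0,1] draws.
for path in range(3):
    x = rng.uniform(size=10_000)
    running_min = np.minimum.accumulate(x)
    print(f"path {path}: min after n=100: {running_min[99]:.4f}, "
          f"after n=10000: {running_min[-1]:.6f}")

# The Borel-Cantelli bound: sum_n (1-eps)^n is finite.
print("sum_n (1-eps)^n =", (1 - eps) / eps)
```

Each path's running minimum is driven toward 0, consistent with almost-sure convergence.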
Theorem: $Z_n \xrightarrow{as} Z \implies Z_n \xrightarrow{p} Z$.
Proof:
$$0 = P\left(\lim_{n\to\infty}\bigcup_{m\ge n}\{\omega : |Z_m(\omega) - Z(\omega)| > \epsilon\}\right) = \lim_{n\to\infty} P\left(\bigcup_{m\ge n}\{\omega : |Z_m(\omega) - Z(\omega)| > \epsilon\}\right) \ge \lim_{n\to\infty} P\left(\{\omega : |Z_n(\omega) - Z(\omega)| > \epsilon\}\right).$$
Convergence in Distribution: $Z_n \xrightarrow{d} Z$.
A sequence of real-valued random variables $Z_1, Z_2, Z_3, \ldots$ converges in distribution to a random variable $Z$ if
$$\lim_{n\to\infty} F_{Z_n}(z) = F_Z(z) \quad \forall z \text{ s.t. } F_Z(z) \text{ is continuous.} \tag{1}$$
This is a statement about the CDFs of the random variables Z, and Z1 , Z2 , . . .. These
random variables do not need to be defined on the same probability space. FZ is also
called the “limiting distribution” of the random sequence Zn .
Alternative definitions of convergence in distribution (“Portmanteau theorem”):
• Letting $F([a,b]) \equiv F(b) - F(a)$ for $a < b$, convergence (1) is equivalent to
$$F_{Z_n}([a,b]) \to F_Z([a,b]) \quad \forall [a,b] \text{ s.t. } F_Z \text{ continuous at } a, b.$$
• For all bounded, continuous functions $g(\cdot)$,
$$\int g(z)\,dF_{Z_n}(z) \to \int g(z)\,dF_Z(z) \iff Eg(Z_n) \to Eg(Z). \tag{2}$$
This definition of distributional convergence is more useful in advanced settings, because it is extendable to settings where $Z_n$ and $Z$ are general random elements taking values in a metric space.
• Levy's continuity theorem: A sequence $\{X_n\}$ of random variables converges in distribution to a random variable $X$ if and only if the sequence of characteristic functions $\{\varphi_{X_n}(t)\}$ converges pointwise (in $t$) to a function $\varphi_X(t)$ which is continuous at the origin. Then $\varphi_X$ is the characteristic function of $X$.
Hence, this ties together convergence of characteristic functions and convergence in distribution. This theorem will be used to prove the CLT later.
• For proofs of most results here, see Serfling, Approximation Theorems of Mathematical Statistics, ch. 1.
Distributional convergence, as defined above, is called “weak convergence”. This is
because there are stronger notions of distributional convergence. One such notion is:
$$\sup_{A \in \mathcal B(\mathbb R)} |P_{Z_n}(A) - P_Z(A)| \to 0,$$
which is convergence in total variation norm.
Example: Multinomial distribution on $[0,1]$. $X_n \in \left\{\frac{1}{n}, \frac{2}{n}, \frac{3}{n}, \ldots, \frac{n}{n} = 1\right\}$, each with probability $\frac{1}{n}$. Consider $A = \mathbb Q$ (the set of rational numbers in $[0,1]$). Here $X_n \xrightarrow{d} U[0,1]$, but $P_{X_n}(\mathbb Q) = 1$ for every $n$ while $P_{U[0,1]}(\mathbb Q) = 0$, so convergence in total variation fails.
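A minimal numerical illustration of this example (Python sketch; the grid sizes are arbitrary). The CDF of the discrete uniform $X_n$ gets uniformly close to the $U[0,1]$ CDF, even though the probability $X_n$ assigns to its (rational) support points never shrinks:

```python
import numpy as np

for n in [10, 100, 1000]:
    support = np.arange(1, n + 1) / n          # {1/n, 2/n, ..., 1}, each with prob 1/n
    z = np.linspace(0, 1, 501)
    cdf_xn = np.searchsorted(support, z, side="right") / n   # CDF of X_n
    cdf_u = z                                                 # CDF of U[0,1]
    print(f"n={n:>5d}  max CDF gap = {np.max(np.abs(cdf_xn - cdf_u)):.4f}  "
          f"P_Xn(support, rationals) = 1.0   P_U(rationals) = 0.0")
```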
Some definitions and results:
• Slutsky Theorem: If $Z_n \xrightarrow{d} Z$, and $Y_n \xrightarrow{p} \alpha$ (a constant), then
(a) $Y_n Z_n \xrightarrow{d} \alpha Z$
(b) $Z_n + Y_n \xrightarrow{d} Z + \alpha$.
• Theorem *:
(a) $Z_n \xrightarrow{p} Z \implies Z_n \xrightarrow{d} Z$
(b) $Z_n \xrightarrow{d} Z \implies Z_n \xrightarrow{p} Z$ if $Z$ is a constant.
Note: convergence in probability implies (roughly speaking) that the random variables $Z_n$ (for $n$ large enough) and $Z$ take values close to each other with high probability. Convergence in distribution need not imply this, only that the CDFs of $Z_n$ and $Z$ are similar.
• $Z_n \xrightarrow{d} Z \implies Z_n = O_p(1)$. Use $\Delta = \max\left(|F_Z^{-1}(1-\epsilon)|, |F_Z^{-1}(\epsilon)|\right)$ in the definition of $O_p(1)$.
Note that the LLN tells us that $\bar X_n \xrightarrow{p,\,as} \mu$, which implies (trivially) that $\bar X_n \xrightarrow{d} \mu$, a degenerate limiting distribution.
This is not very useful for our purposes, because we are interested in knowing (say)
how far X̄n is from µ, which is unknown.
How do we “fix” X̄n , so that it has a non-degenerate limiting distribution?
Central Limit Theorem (Lindeberg-Levy): Let $X_i$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Then
$$\frac{\sqrt n(\bar X_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1).$$
$\sqrt n$ is also called the "rate of convergence" of the sequence $\bar X_n$. By definition, the rate of convergence is the power $p$ (equivalently, the factor $n^p$) for which $n^p(\bar X_n - \mu)$ converges in distribution to a nondegenerate distribution.
The rate of convergence $\sqrt n$ makes sense: if you blow up $\bar X_n$ by a constant (no matter how big), you still get a degenerate limiting distribution. If you blow up by $n$, then the sequence $S_n \equiv n\bar X_n = \sum_{i=1}^{n} X_i$ will diverge.
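A minimal simulation sketch of the CLT statement (Python; the chi-square parent distribution, the replication count, and the sample sizes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 3.0, np.sqrt(6.0)      # mean and sd of a chi-square(3) parent
reps = 20_000

for n in [5, 50, 500]:
    x = rng.chisquare(df=3, size=(reps, n))
    z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma     # sqrt(n)(Xbar - mu)/sigma
    # Compare a few tail probabilities with the standard normal values.
    print(f"n={n:>4d}  P(Z<=-1.645)={np.mean(z <= -1.645):.3f} (normal: 0.050)  "
          f"P(Z<=1.96)={np.mean(z <= 1.96):.3f} (normal: 0.975)")
```

Even for a skewed parent distribution, the standardized sample mean's tail probabilities approach the standard normal values as $n$ grows.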
Proof via characteristic functions: Consider $X_i \sim$ i.i.d. with mean $\mu$ and variance $\sigma^2$. Define:
$$Z_n \equiv \frac{\sqrt n(\bar X_n - \mu)}{\sigma} = \sum_i \frac{X_i - \mu}{\sqrt n\,\sigma} \equiv \frac{1}{\sqrt n}\sum_i W_i.$$
Note that the $W_i \equiv (X_i - \mu)/\sigma$ are i.i.d. with mean $EW_i = 0$ and variance $VW_i = EW_i^2 = 1$.
Let $\varphi(t)$ denote the characteristic function. Then
$$\varphi_{Z_n}(t) = \left[\varphi_{W/\sqrt n}(t)\right]^n = \left[\varphi_W(t/\sqrt n)\right]^n.$$
Using the second-order Taylor expansion for $\varphi_W(t/\sqrt n)$ around 0, we have:
$$\varphi_W(t/\sqrt n) = \varphi_W(0) + \frac{t}{\sqrt n}\,i\,EW + \frac{t^2}{2n}\,i^2\,EW^2 + o\!\left(\frac{t^2}{n}\right) = \varphi_W(0) + 0 - \frac{t^2}{2n}\cdot 1 + o\!\left(\frac{t^2}{n}\right).$$
Now we have
$$\log \varphi_{Z_n}(t) = n\log\left[\varphi_W(0) + 0 + \frac{(it)^2}{2n}\cdot 1 + o\!\left(\frac{t^2}{n}\right)\right] = n\log\left[1 + \frac{(it)^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right]$$
$$\approx n\left[\log(1) + \left(\frac{(it)^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right)\cdot\log'(1)\right] = 0 - \frac{t^2}{2} + n\cdot o\!\left(\frac{t^2}{n}\right) \;\to_{n\to\infty}\; -\frac{t^2}{2}.$$
The approximation step comes from Taylor-expanding $\log\{\cdots\}$ around 1. Hence
$$\varphi_{Z_n}(t) \;\to_{n\to\infty}\; \exp\left(\frac{-t^2}{2}\right) = \varphi_{N(0,1)}(t).$$
Since $\varphi_{N(0,1)}(t)$ is continuous at $t = 0$, we have $Z_n \xrightarrow{d} N(0,1)$.
Asymptotic approximations for X̄n
The CLT tells us that $\sqrt n(\bar X_n - \mu)/\sigma$ has a limiting standard normal distribution. We can use this "true" result to say something about the distribution of $\bar X_n$, even when $n$ is finite. That is, we use the CLT to derive an asymptotic approximation for the finite-sample distribution of $\bar X_n$.
The approximation is as follows: starting with the result of the CLT:
$$\sqrt n(\bar X_n - \mu)/\sigma \xrightarrow{d} N(0, 1)$$
we "flip over" the result of the CLT:
$$\bar X_n \overset{a}{\sim} \frac{\sigma}{\sqrt n}N(0,1) + \mu \quad\Rightarrow\quad \bar X_n \overset{a}{\sim} N\!\left(\mu, \frac{1}{n}\sigma^2\right).$$
The notation "$\overset{a}{\sim}$" makes explicit that what is on the RHS is an approximation. Note that $\bar X_n \xrightarrow{d} N(\mu, \frac{1}{n}\sigma^2)$ is definitely not true!
This approximation intuitively makes sense: under the assumptions of the LLCLT, we know that $E\bar X_n = \mu$ and $\mathrm{Var}(\bar X_n) = \sigma^2/n$. What the asymptotic approximation tells us is that the distribution of $\bar X_n$ is approximately normal. (Note that the approximation is exactly right if we assume that $X_i \sim N(\mu, \sigma^2)$ for all $i$.)
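A minimal check of the quality of this approximation (Python sketch; requires numpy and scipy; the exponential parent, cutoff point, and sample sizes are illustrative choices). It compares $P(\bar X_n \le x)$ estimated by simulation with the normal approximation $\Phi\!\left(\frac{x - \mu}{\sigma/\sqrt n}\right)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
mu = sigma = 1.0                     # Exponential(1): mean 1, sd 1
x_cut = 1.1                          # evaluate P(Xbar_n <= 1.1)

for n in [5, 25, 200]:
    xbar = rng.exponential(scale=mu, size=(50_000, n)).mean(axis=1)
    mc = np.mean(xbar <= x_cut)                              # Monte Carlo estimate
    approx = norm.cdf((x_cut - mu) / (sigma / np.sqrt(n)))   # N(mu, sigma^2/n) approx
    print(f"n={n:>4d}  simulated P = {mc:.4f}   normal approx = {approx:.4f}")
```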
Asymptotic approximations for functions of X̄n
Oftentimes, we are not interested per se in approximating the finite-sample distribution of the sample mean $\bar X_n$, but rather in functions of a sample mean. (Later, you will see that the asymptotic approximations for many statistics and estimators that you run across are derived by expressing them as sample averages.)
Continuous mapping theorem: Let $g(\cdot)$ be a continuous function. Then
$$Z_n \xrightarrow{d} Z \implies g(Z_n) \xrightarrow{d} g(Z).$$
Proof: Serfling, Approximation Theorems of Mathematical Statistics, p. 25.
(Note: you still have to figure out what the limiting distribution of $g(Z)$ is. But if you know $F_Z$, then you can get $F_{g(Z)}$ by the change-of-variables formulas.)
Note that for any linear function $g(\bar X_n) = a\bar X_n + b$, deriving the limiting distribution of $g(\bar X_n)$ is no problem (just use Slutsky's Theorem to get $a\bar X_n + b \overset{a}{\sim} N(a\mu + b, \frac{1}{n}a^2\sigma^2)$). The "problem" in deriving the distribution of $g(\bar X_n)$ arises when $g(\cdot)$ is nonlinear so that the $\bar X_n$ is "inside" the $g$ function:
1. Use the Mean Value Theorem: Let $g$ be a continuous function on $[a,b]$ that is differentiable on $(a,b)$. Then, there exists (at least one) $\lambda \in (a,b)$ such that
$$g'(\lambda) = \frac{g(b) - g(a)}{b - a} \iff g(b) - g(a) = g'(\lambda)(b - a).$$
Using the MVT, we can write
$$g(\bar X_n) - g(\mu) = g'(X_n^*)(\bar X_n - \mu) \tag{3}$$
where $X_n^*$ is an RV strictly between $\bar X_n$ and $\mu$.
2. On the RHS of Eq. (3):
(a) $g'(X_n^*) \xrightarrow{p} g'(\mu)$ by the "squeezing" result, and the plim operator theorem.
(b) If we multiply by $\sqrt n$ and divide by $\sigma$, we can apply the CLT to get $\sqrt n(\bar X_n - \mu)/\sigma \xrightarrow{d} N(0,1)$.
(c) Hence,
$$\sqrt n \cdot \text{RHS} = \sqrt n\left[g(\bar X_n) - g(\mu)\right] \xrightarrow{d} N\left(0, g'(\mu)^2\sigma^2\right), \tag{4}$$
using Slutsky's theorem.
3. Now, in order to get the asymptotic approximation for the distribution of $g(\bar X_n)$, we "flip over" to get
$$g(\bar X_n) \overset{a}{\sim} N\!\left(g(\mu), \frac{1}{n}g'(\mu)^2\sigma^2\right).$$
4. Check that $g$ satisfies the assumptions for these results to go through: continuity and differentiability for the MVT, and continuity of $g'$ at $\mu$ for the plim operator theorem.
Examples: 1/X̄n , exp(X̄n ), (X̄n )2 , etc.
Eq. (4) is a general result, known as the Delta method. For the purposes of
this class, I want you to derive the approximate distributions for g(X̄n ) from first
principles (as we did above).
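A minimal simulation sketch of this approximation for a nonlinear $g$ (Python; the choice $g(x) = e^x$, the Poisson parent, the sample size, and replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
mu = var = 2.0                       # Poisson(2): mean 2, variance 2
n, reps = 200, 20_000

xbar = rng.poisson(lam=mu, size=(reps, n)).mean(axis=1)
g_xbar = np.exp(xbar)                                 # g(Xbar_n) with g(x) = e^x

# Delta-method approximation: g(Xbar) ~a N(g(mu), g'(mu)^2 * var / n), g'(mu) = e^mu
approx_sd = np.exp(mu) * np.sqrt(var / n)
print(f"simulated mean {g_xbar.mean():.3f} vs g(mu) = {np.exp(mu):.3f}")
print(f"simulated sd   {g_xbar.std():.3f} vs delta-method sd = {approx_sd:.3f}")
```

The simulated spread of $g(\bar X_n)$ is close to the delta-method standard deviation $e^{\mu}\sqrt{\sigma^2/n}$.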
1 Some extra topics (CAN SKIP)
1.1 Triangular array CLT
The LLCLT assumes independence across observations as well as identically distributed observations. We extend this to the "independent, non-identically distributed" setting.
Lindeberg-Feller CLT: For each $n$, let $Y_{n,1}, \ldots, Y_{n,n}$ be independent random variables with finite (possibly non-identical) means and variances $\sigma^2_{n,i}$, such that
$$(1/C_n^2)\cdot\sum_{i=1}^{n} E\left[|Y_{n,i} - EY_{n,i}|^2\,\mathbf 1\{|Y_{n,i} - EY_{n,i}|/C_n > \epsilon\}\right] \to 0 \quad \text{for every } \epsilon > 0,$$
where $C_n \equiv \left[\sum_{i=1}^{n}\sigma^2_{n,i}\right]^{1/2}$. Then
$$\frac{\sum_{i=1}^{n}(Y_{n,i} - EY_{n,i})}{C_n} \xrightarrow{d} N(0,1).$$
• The sampling framework is known as a "triangular array". Note that the "(n−1)-th observation in the $n$-th sample" ($Y_{n,n-1}$) need not coincide with $Y_{n-1,n-1}$, the "(n−1)-th observation in the $(n-1)$-th sample".
• $C_n$ is just the standard deviation of $\sum_{i=1}^{n} Y_{n,i}$. Hence $\frac{\sum_{i=1}^{n}(Y_{n,i} - EY_{n,i})}{C_n}$ is a "standardized sum".
This is useful for showing asymptotic normality of Least-Squares regression. Let $y = X\beta^0 + \epsilon$, with the $\epsilon_i$ i.i.d. with mean zero and variance $\sigma^2$, and $X$ being an $n \times 1$ vector of covariates (here we have just 1 RHS variable).
The OLS estimator is $\hat\beta = (X'X)^{-1}X'Y$. To apply the LFCLT, we consider the normalized difference between the estimated and true $\beta$'s:
$$\frac{(X'X)^{1/2}}{\sigma}\left(\hat\beta - \beta^0\right) = (1/\sigma)\cdot(X'X)^{-1/2}X'\epsilon \equiv \sum_{i=1}^{n} a_{n,i}\epsilon_i,$$
where $a_{n,i}$ corresponds to the $i$-th component of the $1\times n$ vector $(1/\sigma)\cdot(X'X)^{-1/2}X'$.
So we just need to show that the Lindeberg condition applies to $Y_{n,i} = a_{n,i}\epsilon_i$. Note that $\sum_{i=1}^{n} V(a_{n,i}\epsilon_i) = V\left(\sum_{i=1}^{n} a_{n,i}\epsilon_i\right) = (1/\sigma^2)\cdot V\left((X'X)^{-1/2}X'\epsilon\right) = \sigma^2/\sigma^2 \cdot 1 = 1 = C_n^2$.
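A minimal simulation sketch of this application (Python; the regressor design, error distribution, sample size, and replication count are arbitrary illustrative choices). It checks that the standardized quantity $(X'X)^{1/2}(\hat\beta - \beta^0)/\sigma$ looks standard normal across replications, even with non-normal errors:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, beta0, sigma = 200, 20_000, 1.5, 2.0

z = np.empty(reps)
for r in range(reps):
    x = rng.uniform(1, 3, size=n)                 # a single regressor
    eps = sigma * (rng.exponential(size=n) - 1)   # non-normal, mean-zero errors
    y = beta0 * x + eps
    beta_hat = (x @ y) / (x @ x)                  # OLS with one regressor, no intercept
    z[r] = np.sqrt(x @ x) * (beta_hat - beta0) / sigma

print(f"mean {z.mean():.3f} (target 0), sd {z.std():.3f} (target 1), "
      f"P(Z<=1.645) = {np.mean(z <= 1.645):.3f} (normal: 0.950)")
```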
1.2 Convergence in distribution vs. a.s.
Combining two results above, we have
$$Z_n \xrightarrow{as} Z \implies Z_n \xrightarrow{d} Z.$$
The converse is not generally true. (Indeed, $\{Z_1, Z_2, Z_3, \ldots\}$ need not be defined on the same probability space, in which case a.s. convergence makes no sense.)
However, consider
$$Z_n^* = F_{Z_n}^{-1}(U), \qquad Z^* = F_Z^{-1}(U); \qquad U \sim U[0,1].$$
Here $F_Z^{-1}(U)$ denotes the quantile function corresponding to the CDF $F(z)$:
$$F^{-1}(\tau) = \inf\{z : F(z) > \tau\}, \qquad \tau \in [0,1].$$
We have $F(F^{-1}(\tau)) = \tau$. (Note that the quantile function is also right-continuous; discontinuity points of the quantile function arise where the CDF function is "flat".)
Then
$$P(Z_n^* \le z) = P(F_{Z_n}^{-1}(U) \le z) = P(U \le F_{Z_n}(z)) = F_{Z_n}(z)$$
so that $Z_n^* \overset{d}{=} Z_n$. (Even though their domains are different!) Similarly, $Z^* \overset{d}{=} Z$. The notation "$\overset{d}{=}$" means "identically distributed".
Moreover, it turns out (quite intuitively) that the convergence $F_{Z_n}^{-1}(U) \to F_Z^{-1}(U)$ fails only at points where $F_Z^{-1}(U)$ is discontinuous (corresponding to flat portions of $F_Z(z)$). Since these points of discontinuity are a countable set, their probability (under $U[0,1]$) is equal to zero, so that $F_{Z_n}^{-1}(U) \to F_Z^{-1}(U)$ for almost all $U$.
So what we have here is a result that for real-valued random variables $Z_n \xrightarrow{d} Z$, we can construct identically distributed variables such that both $Z_n^* \xrightarrow{d} Z^*$ and $Z_n^* \xrightarrow{as} Z^*$. This is called the "Skorokhod construction".
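A minimal numerical sketch of this construction (Python; requires numpy and scipy; the distributions are arbitrary illustrative choices): take $Z_n \sim N(0, 1 + 1/n)$, which converges in distribution to $Z \sim N(0,1)$, and couple them through the same uniform draw $U$ via $Z_n^* = F_{Z_n}^{-1}(U)$, $Z^* = F_Z^{-1}(U)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
u = rng.uniform(size=5)                # a few common uniform draws U

for n in [2, 10, 100, 10_000]:
    zn_star = norm.ppf(u, scale=np.sqrt(1 + 1 / n))   # F_{Z_n}^{-1}(U)
    z_star = norm.ppf(u)                               # F_Z^{-1}(U)
    print(f"n={n:>6d}  max |Zn* - Z*| = {np.max(np.abs(zn_star - z_star)):.5f}")
```

The coupled draws agree draw-by-draw in the limit, illustrating the pointwise (here, sure) convergence of the constructed variables.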
Skorokhod representation: Let $Z_n$ $(n = 1, 2, \ldots)$ be random elements defined on probability spaces $(\Omega_n, \mathcal B(\Omega_n), P_n)$ with $Z_n \xrightarrow{d} Z$, where $Z$ is defined on $(\Omega, \mathcal B(\Omega), P)$. Then there exist random variables $Z_n^*$ $(n = 1, 2, \ldots)$ and $Z^*$ defined on a common probability space $(\tilde\Omega, \mathcal B(\tilde\Omega), \tilde P)$ such that
$$Z_n^* \overset{d}{=} Z_n \ (n = 1, 2, \ldots); \qquad Z^* \overset{d}{=} Z; \qquad Z_n^* \xrightarrow{as} Z^*.$$
Applications:
• Continuous mapping theorem: $Z_n \xrightarrow{d} Z \Rightarrow Z_n^* \xrightarrow{as} Z^*$; for $h(\cdot)$ continuous, we have $h(Z_n^*) \xrightarrow{as} h(Z^*) \Rightarrow h(Z_n^*) \xrightarrow{d} h(Z^*)$, which implies $h(Z_n) \xrightarrow{d} h(Z)$.
• Building on the above, if $h$ is bounded, then we get
$$Eh(Z_n) = Eh(Z_n^*) \to Eh(Z^*) = Eh(Z)$$
under the bounded convergence theorem, which shows one direction of the "Portmanteau" theorem, Eq. (2).
1.3 Functional Central Limit Theorems
There is a set of distributional convergence results, known as functional CLTs (or Donsker theorems), because they deal with convergence of random functions (or, interchangeably, processes). These are indispensable tools in finance.
One of the simplest random functions is the Wiener process W(t), which is viewed
as a random function on the unit interval t ∈ [0, 1]. This is also known as a Brownian
motion process.
Features of the Wiener process:
1. $W(0) = 0$.
2. Gaussian marginals: $W(t) \overset{d}{=} N(0, t)$; that is,
$$P(W(t) \le a) = \frac{1}{\sqrt{2\pi t}}\int_{-\infty}^{a}\exp\left(\frac{-u^2}{2t}\right)du.$$
3. Independent increments: Define $x_t \equiv W(t)$. For any set $0 \le t_0 \le t_1 \le \ldots \le 1$, the differences
$$x_{t_1} - x_{t_0},\ x_{t_2} - x_{t_1},\ \ldots$$
are independent.
4. Given the two above features, we have that the increments are themselves normally distributed:
$$x_{t_i} - x_{t_{i-1}} \overset{d}{=} N(0, t_i - t_{i-1}).$$
Moreover, from
$$t_1 - t_0 = V(x_{t_1} - x_{t_0}) = E[(x_{t_1} - x_{t_0})^2] = Ex_{t_1}^2 - 2Ex_{t_1}x_{t_0} + Ex_{t_0}^2 = t_1 + t_0 - 2Ex_{t_1}x_{t_0},$$
implying
$$Ex_{t_1}x_{t_0} = \mathrm{Cov}(x_{t_1}, x_{t_0}) = t_0.$$
5. Furthermore, we know that any finite collection $(x_{t_1}, x_{t_2}, \ldots)$ is jointly distributed multivariate normal, with mean 0 and variance matrix $\Sigma$ as described above.
Take the conditions of the LLCLT: $X_1, X_2, \ldots$ are i.i.d. with mean zero and finite variance $\sigma^2$. For a given $n$, define the "partial sum" $S_k = X_1 + X_2 + \cdots + X_k$ for $k \le n$. Now, for $t \in [0,1]$, define the normalized "partial sum process"
$$\tilde S_n(t) = \frac{S_{[tn]}}{\sigma\sqrt n} + (tn - [tn])\cdot\frac{1}{\sigma\sqrt n}X_{[tn]+1}$$
where $[a]$ denotes the largest integer $\le a$. $\tilde S_n(t)$ is a random function on $[0,1]$. (Graph)
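A minimal simulation sketch of a sample path of $\tilde S_n(t)$ (Python; the $\pm 1$ increments, path length, and grid are illustrative choices). For large $n$ such paths behave like Brownian motion paths:

```python
import numpy as np

rng = np.random.default_rng(9)
n, sigma = 5_000, 1.0
t_grid = np.linspace(0, 1, 201)

x = rng.choice([-1.0, 1.0], size=n)          # mean-zero, variance-one increments
s = np.concatenate(([0.0], np.cumsum(x)))    # partial sums S_0, S_1, ..., S_n

k = np.floor(t_grid * n).astype(int)         # [tn]
frac = t_grid * n - k                        # tn - [tn]
x_next = np.where(k < n, x[np.minimum(k, n - 1)], 0.0)   # X_{[tn]+1} (0 past the end)
s_tilde = (s[k] + frac * x_next) / (sigma * np.sqrt(n))  # the interpolated process

print("S~_n(1) =", s_tilde[-1], " (approximately N(0,1) for large n)")
```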
Then the functional CLT is the following result:
Functional CLT: $\tilde S_n \xrightarrow{d} W$.
Note that $\tilde S_n$ is a random element taking values in $C[0,1]$, the space of continuous functions on $[0,1]$. Proving this result is beyond our focus in this class. (See, for instance, Billingsley, Convergence of Probability Measures, ch. 2, section 10.)
Note that the functional CLT implies more than the LLCLT. For example, take $t = 1$. Then $\tilde S_n(1) = \frac{\sum_{i=1}^{n} X_i}{\sigma\sqrt n} = \frac{\sqrt n\,\bar X_n}{\sigma}$. By the FCLT, we have
$$P(\tilde S_n(1) \le a) \to P(W(1) \le a) = P(N(0,1) \le a),$$
which is the same result as the LLCLT.
Similarly, for $k_n < n$ but $\frac{k_n}{n} \to \tau \in [0,1]$, we have
$$\tilde S_n\!\left(\frac{k_n}{n}\right) = \frac{\sum_{i=1}^{k_n} X_i}{\sigma\sqrt n} = \frac{k_n\,\bar X_{k_n}}{\sigma\sqrt n} \xrightarrow{d} N(0, \tau).$$
Also, it turns out, by a functional version of the continuous mapping theorem, we can get convergence results for functionals of the partial-sum process. For instance, $\sup_t \tilde S_n(t) \xrightarrow{d} \sup_t W(t)$, and it turns out
$$P\left(\sup_t W(t) \le a\right) = \frac{2}{\sqrt{2\pi}}\int_{0}^{a}\exp(-u^2/2)\,du; \qquad a \ge 0.$$
(Recall that $W(0) = 0$, so $a < 0$ makes no sense.)
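A minimal simulation check of this formula (Python sketch; requires numpy and scipy; the discretization and cutoff are arbitrary choices). Note that the displayed probability equals $2\Phi(a) - 1$; the sketch compares it with the simulated distribution of $\sup_t \tilde S_n(t)$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)
n, reps, a = 1_000, 5_000, 1.0

x = rng.choice([-1.0, 1.0], size=(reps, n))           # variance-one increments
paths = np.cumsum(x, axis=1) / np.sqrt(n)             # discretized S~_n(k/n)
sup_vals = np.maximum(paths.max(axis=1), 0.0)         # sup over the grid (and t=0)

print(f"simulated P(sup <= {a}) = {np.mean(sup_vals <= a):.4f}  "
      f"theory 2*Phi({a})-1 = {2 * norm.cdf(a) - 1:.4f}")
```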