7. Introduction to Large Sample Theory
Hayashi p. 88-97/109-133
Advanced Econometrics I, Autumn 2010, Large-Sample Theory
Introduction
We looked at finite-sample properties of the OLS estimator and its
associated test statistics.
These results rest on assumptions that are often violated in practice.
The finite-sample theory breaks down if any of the following three
assumptions is violated:
- the exogeneity of the regressors,
- the normality of the error term, and
- the linearity of the regression equation.
Introduction (cont’d)
Asymptotic or large-sample theory provides an alternative approach
when these assumptions are violated
It derives an approximation to the distribution of the estimator and its
associated statistics assuming that the sample size is sufficiently large
Rather than making assumptions on the sample of a given size, large-sample theory makes assumptions on the stochastic process that generates the sample.
Introduction (cont’d)
The two main concepts in asymptotics relate to consistency and asymptotic
normality.
Some intuition:
Consistency: the more data we get, the closer we get to knowing the
truth (or we eventually know the truth)
Asymptotic normality: as we get more and more data, averages of random
variables behave like normally distributed random variables.
Example: Establishing consistency and asymptotic normality of the sample mean of an i.i.d.
random sample X1, . . . , XN with E(Xi) = µ and Var(Xi) = σ².
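The example can be checked numerically. A minimal Monte Carlo sketch (numpy only; the values µ = 2, σ = 3, the seed, and the sample sizes are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0

# Consistency: the sample mean drifts toward mu as N grows.
for n in (10, 1_000, 100_000):
    x = rng.normal(mu, sigma, size=n)
    print(n, x.mean())

# Asymptotic normality: across many samples of size n,
# sqrt(n) * (mean - mu) has variance close to sigma^2.
n, reps = 500, 20_000
draws = rng.normal(mu, sigma, size=(reps, n))
z = np.sqrt(n) * (draws.mean(axis=1) - mu)
print(z.var())  # close to sigma**2
```

The second half previews the CLT: the scaled deviation of the sample mean stabilizes at the population variance instead of collapsing to zero.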
Introduction (cont’d)
The main probability theory tools for asymptotics:
The tools for establishing consistency of estimators are:
• Laws of Large Numbers (LLNs)
– A LLN is a result that states the conditions under which a sample
average of random variables converges to a population expectation.
– LLNs give conditions under which the sequence of sample means
converges either in probability or almost surely
– There are many LLN results (e.g. Chebychev’s LLN, Kolmogorov’s/Khinchine’s LLN, Markov’s LLN)
Introduction (cont’d)
The probability tools for establishing asymptotic normality are:
• Central Limit Theorems (CLTs)
– CLTs are about the limiting behaviour of the difference between a
sample mean and its expected value
– There are many CLTs (e.g. Lindeberg-Lévy CLT, Lindeberg-Feller CLT,
Lyapunov’s CLT)
Basic concepts of large sample theory
Using large sample theory, we can dispense with basic assumptions from
finite sample theory
1.2 E(εi|X) = 0: strict exogeneity
1.4 Var(ε|X) = σ²Iₙ: homoskedasticity
1.5 ε|X ∼ N(0, σ²Iₙ): normality of the error term
Approximate/asymptotic distributions of b and of the t- and F-statistics can
be obtained.
Modes of convergence - Convergence in probability
{zn}: sequence of random scalars (or random vectors)
Convergence in probability:
A sequence {zn} converges in probability to a constant α if for any ε > 0

    lim_{n→∞} P(|zn − α| > ε) = 0

Short-hand we write: plim_{n→∞} zn = α, or zn →p α, or zn − α →p 0
Extends to random vectors:
If lim_{n→∞} P(|zkn − αk| > ε) = 0 for all k = 1, 2, ..., K, then zn →p α,
where zkn is the k-th element of zn and αk the k-th element of α
Modes of convergence - Almost Sure Convergence
Almost Sure Convergence: A sequence of random scalars {zn} converges
almost surely to a constant α if:

    P( lim_{n→∞} zn = α ) = 1

We write this as “zn →a.s. α”. The extension to random vectors is analogous
to that for convergence in probability.
Note: This concept of convergence is stronger than convergence in probability ⇒ if a sequence converges a.s., then it also converges in probability.
Modes of convergence - Convergence in mean square
Convergence in mean square:

    lim_{n→∞} E[(zn − α)²] = 0,  written zn →m.s. α

The extension to random vectors is analogous to that for convergence in
probability:
zn →m.s. α if each element of zn converges in mean square to the
corresponding element of α
Modes of convergence - Convergence to a Random Variable
In the above definitions of convergence, the limit is a constant. However,
the limit can also be a random variable.
We say that a sequence of K-dimensional random vectors {zn} converges in
probability to a K-dimensional random vector z, and write zn →p z, if
{zn − z} converges to 0:

    “zn →p z”  is the same as  “zn − z →p 0.”

Similarly,

    “zn →a.s. z”  is the same as  “zn − z →a.s. 0,”
    “zn →m.s. z”  is the same as  “zn − z →m.s. 0.”
Modes of convergence - Convergence in distribution
Convergence in distribution:
Let {zn} be a sequence of random scalars and Fn be the cumulative
distribution function (c.d.f.) of zn.
We say that {zn} converges in distribution to a random scalar z if the
c.d.f. Fn of zn converges to the c.d.f. F of z at every continuity point of F .
We write “zn →d z” or “zn →L z” and call F the asymptotic (or limit or
limiting) distribution of zn.
Modes of convergence - Convergence in distribution
Convergence in probability is stronger than convergence in distribution, i.e.,
“zn →p z”  ⇒  “zn →d z.”
A special case of convergence in distribution is that z is a constant (a trivial
random variable).
The extension to a sequence of random vectors is immediate: zn →d z if
the joint c.d.f. Fn of the random vector zn converges to the joint c.d.f. F
of z at every continuity point of F .
Note: For convergence in distribution, unlike the other concepts of convergence, element-by-element convergence does not necessarily mean convergence for the vector sequence.
Weak Law of Large Numbers (WLLN) according to Khinchine
If {zi} is i.i.d. with E(zi) = µ, then for z̄n = (1/n) Σ_{i=1}^n zi we have:

    z̄n →p µ,  i.e.  lim_{n→∞} P(|z̄n − µ| > ε) = 0,  i.e.  plim_{n→∞} z̄n = µ
Extensions of the Weak Law of Large Numbers (WLLN)
The WLLN holds for:
Extension (1): Multivariate Extension (sequence of random vectors {zi})
Extension (2): Relaxation of independence
Extension (3): Functions of random variables h(zi)
Extension (4): Vector valued functions f (zi)
Central Limit Theorems (Lindeberg-Lévy)
Let {zi} be i.i.d. with E(zi) = µ and Var(zi) = σ². Then for z̄n = (1/n) Σ_{i=1}^n zi:

    √n (z̄n − µ) →d N(0, σ²)

or

    z̄n − µ ∼a N(0, σ²/n)   or   z̄n ∼a N(µ, σ²/n)

Remark: Read ∼a as ’approximately distributed as’
The CLT also holds for the multivariate extension: a sequence of random vectors {zi}
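The theorem is easy to see in simulation with a deliberately non-normal distribution. A minimal sketch (the Exp(1) choice, seed, and sample sizes are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20_000
mu, sigma = 1.0, 1.0                 # Exp(1) has mean 1 and variance 1

# The draws are heavily skewed, yet sqrt(n)*(zbar - mu) behaves like N(0, sigma^2).
zbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
t = np.sqrt(n) * (zbar - mu)

print(t.mean(), t.std())             # roughly 0 and 1
print(np.mean(t <= 1.645))           # roughly the normal probability 0.95
```

Only the mean and variance of the underlying distribution enter the limit; its skewness washes out at rate 1/√n.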
Useful lemmas of large sample theory
Lemma 1:
If zn →p α and a(·) is a continuous function that does not depend on n, then:

    a(zn) →p a(α),  i.e.  plim_{n→∞} a(zn) = a( plim_{n→∞} zn )

Examples:

    xn →p α  ⇒  ln(xn) →p ln(α)
    xn →p β and yn →p γ  ⇒  xn + yn →p β + γ
    Yn →p Γ  ⇒  Yn⁻¹ →p Γ⁻¹
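Lemma 1 can be illustrated numerically for both the scalar and the matrix example. A small sketch (the Exp(2) distribution, the 2×2 mixing matrix, and all constants are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def xbar(n):
    return rng.exponential(2.0, size=n).mean()     # plim = 2

# ln(.) is continuous, so ln(xbar_n) ->p ln(2)
for n in (100, 10_000, 1_000_000):
    print(n, np.log(xbar(n)))

# Matrix version: Yn = (1/n) sum z_i z_i' ->p Gamma implies Yn^{-1} ->p Gamma^{-1}
n = 200_000
z = rng.normal(size=(n, 2)) @ np.array([[1.0, 0.5], [0.0, 1.0]])
Yn = z.T @ z / n        # Gamma = [[1, 0.5], [0.5, 1.25]] here
print(np.linalg.inv(Yn))  # close to Gamma^{-1} = [[1.25, -0.5], [-0.5, 1.0]]
```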
Useful lemmas of large sample theory (continued)
Lemma 2:
If zn →d z and a(·) is continuous, then:

    a(zn) →d a(z)

Example:

    zn →d z with z ∼ N(0, 1)  ⇒  zn² →d z² ∼ χ²(1),
    i.e.  zn →d N(0, 1)  ⇒  zn² →d χ²(1)
Useful lemmas of large sample theory (continued)
Lemma 3:
If xn →d x and yn →p α, then:

    xn + yn →d x + α

Examples:

    xn →d N(0, 1), yn →p α  ⇒  xn + yn →d N(α, 1)
    xn →d x, yn →p 0  ⇒  xn + yn →d x

Lemma 4:
If xn →d x and yn →p 0, then:

    xn · yn →p 0
Useful lemmas of large sample theory (continued)
Lemma 5:
If xn →d x and An →p A, then:

    An xn →d A x

Example:

    xn →d MVN(0, Σ)  ⇒  An xn →d MVN(0, A Σ A′)

Lemma 6:
If xn →d x and An →p A, then:

    xn′ An⁻¹ xn →d x′ A⁻¹ x
8. Time Series Basics
(Stationarity and Ergodicity)
Hayashi p. 97-107
Dependence in the data
In time series analysis the data exhibit a certain degree of dependence, and
only one realization of the data-generating process is given.
The WLLN and CLT above rely on i.i.d. data, but real-world data are dependent.
Examples:
Inflation rate
Stock market returns
Stochastic process: a sequence of random variables indexed by time, {z1, z2, z3, ...} or {zi} with i = 1, 2, ...
A realization/sample path: One possible outcome of the process
Dependence in the data - theoretical consideration
If we were able to ’run the world’ several times, we would have different
realizations of the process at one point in time
⇒ We could compute ensemble means and apply the WLLN
As the described repetition is not possible, we take the mean over the one
realization of the process
Key question: Does (1/T) Σ_{t=1}^T xt →p E(xt) hold?
Condition: Stationarity of the process
Definition of stationarity
Strict stationarity:
The joint distribution of (zi, zi1, zi2, ..., zir) depends only on the relative
positions i1 − i, i2 − i, ..., ir − i, but not on i itself.
In other words: the joint distribution of (zi, zir) is the same as the joint
distribution of (zj, zjr) if i − ir = j − jr.
Weak stationarity:
- E(zi) does not depend on i
- Cov(zi, zi−j ) depends on j (distance), but not on i (absolute position)
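Weak stationarity can be checked on a simulated stationary AR(1) process, a standard example (the AR(1) model, the coefficient φ = 0.7, and the sample size are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n, phi = 200_000, 0.7

# Stationary AR(1): z_t = phi*z_{t-1} + e_t, started from its stationary distribution.
# Theory: Var(z_i) = 1/(1-phi^2) and Cov(z_i, z_{i-j}) = phi^j * Var(z_i),
# i.e. autocovariances depend on the lag j, not on the position i.
z = np.empty(n)
z[0] = rng.normal(scale=1 / np.sqrt(1 - phi**2))
e = rng.normal(size=n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + e[t]

var = 1 / (1 - phi**2)
for j in (1, 2, 3):
    cov_j = np.cov(z[j:], z[:-j])[0, 1]
    print(j, cov_j, phi**j * var)   # sample vs. theoretical autocovariance
```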
Ergodicity
A stationary process {zi} is also called ergodic if

    lim_{n→∞} E[f(zi, zi+1, ..., zi+k) · g(zi+n, zi+n+1, ..., zi+n+l)]
             = E[f(zi, zi+1, ..., zi+k)] · E[g(zi+n, zi+n+1, ..., zi+n+l)]

Ergodic Theorem:
If the sequence {zi} is stationary and ergodic with E(zi) = µ, then

    z̄n ≡ (1/n) Σ_{i=1}^n zi →a.s. µ
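The Ergodic Theorem can be illustrated on one single realization of a dependent process. A minimal sketch using a mean-µ AR(1), which is stationary and ergodic (the constants µ = 5, φ = 0.9, and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, phi, n = 5.0, 0.9, 300_000

# One realization of z_t = mu + phi*(z_{t-1} - mu) + e_t: the observations are
# serially dependent, yet the time average converges to the ensemble mean mu.
e = rng.normal(size=n)
z = np.empty(n)
z[0] = mu
for t in range(1, n):
    z[t] = mu + phi * (z[t - 1] - mu) + e[t]

for m in (100, 10_000, n):
    print(m, z[:m].mean())   # drifts toward mu
```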
Martingale difference sequence
Stationarity and ergodicity are not enough for applying the CLT. To derive
the CAN property of the OLS estimator we assume that {gi} = {xi εi} is a
stationary and ergodic martingale difference sequence (m.d.s.):

    E(gi | gi−1, gi−2, ...) = 0  ⇒  E(gi) = 0
Implications of m.d.s. when 1 ∈ xi:
εi and εi−j are uncorrelated, i.e. Cov(εi, εi−j ) = 0
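A classic textbook example of an m.d.s. that is uncorrelated but not independent is gi = εi · εi−1 with i.i.d. εi. A quick simulation sketch (the normal draws, seed, and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000
eps = rng.normal(size=n + 1)

# g_i = eps_i * eps_{i-1}: E(g_i | past) = 0, so the g_i are mean zero and
# serially uncorrelated -- yet not independent (g_i^2 is autocorrelated).
g = eps[1:] * eps[:-1]

print(g.mean())                                # near 0
print(np.corrcoef(g[1:], g[:-1])[0, 1])        # near 0: uncorrelated
print(np.corrcoef(g[1:]**2, g[:-1]**2)[0, 1])  # clearly positive: dependence
```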
Large sample assumptions for the OLS estimator
(2.1) Linearity: yi = xi′β + εi for all i = 1, 2, ..., n
(2.2) Ergodic stationarity: the (K + 1)-dimensional vector stochastic
process {yi, xi} is jointly stationary and ergodic
(2.3) Orthogonality/predetermined regressors: E(xik · εi) = 0 for all k
If xik = 1 ⇒ E(εi) = 0 ⇒ Cov(xik, εi) = 0
This can be written as E[xi · (yi − xi′β)] = 0 or E(gi) = 0, where
gi ≡ xi · εi
(2.4) Rank condition: the K × K matrix E(xi xi′) ≡ Σxx is nonsingular
Large sample assumptions for the OLS estimator (cont’d)
(2.5) Martingale difference sequence (m.d.s.): {gi} is a martingale
difference sequence with finite second moments. It follows that:
i. E(gi) = 0,
ii. the K × K matrix of cross moments E(gi gi′) is nonsingular,
iii. S ≡ Avar(ḡ) = E(gi gi′), where ḡ ≡ (1/n) Σ_{i=1}^n gi.
(Avar(ḡ) is the variance of the asymptotic distribution of √n ḡ.)
See Hayashi pp. 109-113
Large sample distribution of the OLS estimator
We get for b = (X′X)⁻¹X′y:

    bn = [ (1/n) Σ_{i=1}^n xi xi′ ]⁻¹ (1/n) Σ_{i=1}^n xi yi

(the subscript n indicates the dependence on the sample size)
Under the WLLN and lemma 1:

    bn →p β

    √n (bn − β) →d MVN(0, Avar(b))   or   b ∼a MVN(β, Avar(b)/n)

⇒ bn is consistent and asymptotically normal (CAN)
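The CAN property does not require normal or homoskedastic errors. A minimal sketch with heteroskedastic t-distributed errors (β = (1, 2), the t(5) errors, the variance function, and the seed are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)

def ols(n):
    x = np.column_stack([np.ones(n), rng.normal(size=n)])
    # Errors: heavy-tailed and heteroskedastic, but mean zero given x,
    # so the orthogonality condition E(x_i * eps_i) = 0 holds.
    eps = rng.standard_t(df=5, size=n) * np.sqrt(1 + x[:, 1] ** 2)
    y = x @ np.array([1.0, 2.0]) + eps
    return np.linalg.solve(x.T @ x, x.T @ y)

# Consistency: b_n approaches beta = (1, 2) as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, ols(n))
```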
How to estimate Avar(b)

    Avar(b) = Σxx⁻¹ E(gi gi′) Σxx⁻¹   with   gi = xi εi

    (1/n) Σ_{i=1}^n xi xi′ →p E(xi xi′)

Estimation of E(gi gi′):

    Ŝ = (1/n) Σ_{i=1}^n ei² xi xi′ →p E(gi gi′)

⇒

    Avar̂(b) = [ (1/n) Σ_{i=1}^n xi xi′ ]⁻¹ Ŝ [ (1/n) Σ_{i=1}^n xi xi′ ]⁻¹
             →p Avar(b) = E(xi xi′)⁻¹ E(gi gi′) E(xi xi′)⁻¹
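The sandwich formula translates directly into a few lines of linear algebra. A sketch of the estimator (the data-generating process, β = (1, 2), and the heteroskedasticity pattern are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(7)
beta = np.array([1.0, 2.0])
n = 2_000

x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = x @ beta + rng.normal(size=n) * (0.5 + np.abs(x[:, 1]))  # heteroskedastic
b = np.linalg.solve(x.T @ x, x.T @ y)
e = y - x @ b

sxx = x.T @ x / n                        # (1/n) sum x_i x_i'
s_hat = (x * e[:, None] ** 2).T @ x / n  # S_hat = (1/n) sum e_i^2 x_i x_i'
avar_hat = np.linalg.inv(sxx) @ s_hat @ np.linalg.inv(sxx)

se = np.sqrt(np.diag(avar_hat) / n)      # heteroskedasticity-robust std. errors
print(b, se)
```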
Developing a test statistic under the assumption of conditional homoskedasticity
Assumption: E(εi²|xi) = σ²

    Avar̂(b) = [ (1/n) Σ_{i=1}^n xi xi′ ]⁻¹ σ̂² ( (1/n) Σ_{i=1}^n xi xi′ ) [ (1/n) Σ_{i=1}^n xi xi′ ]⁻¹
             = σ̂² [ (1/n) Σ_{i=1}^n xi xi′ ]⁻¹

with Ŝ = ( (1/n) Σ_{i=1}^n ei² ) · ( (1/n) Σ_{i=1}^n xi xi′ )

Note: (1/n) Σ_{i=1}^n ei² is a biased estimate of σ²
White standard errors
Adjusting the test statistics to make them robust against violations of
conditional homoskedasticity
t-ratio:

    tk = (bk − β̄k) / sqrt( ( [ (1/n) Σ xi xi′ ]⁻¹ ( (1/n) Σ ei² xi xi′ ) [ (1/n) Σ xi xi′ ]⁻¹ / n )_kk ) ∼a N(0, 1)

Holds under H0 : βk = β̄k
F-ratio (Wald statistic):

    W = (Rb − r)′ [ R ( Avar̂(b)/n ) R′ ]⁻¹ (Rb − r) ∼a χ²(#r)

Holds under H0 : Rβ − r = 0; the Wald approach also extends to nonlinear restrictions on β
We show that bn = (X′X)⁻¹X′y is consistent

    bn = [ (1/n) Σ_{i=1}^n xi xi′ ]⁻¹ (1/n) Σ_{i=1}^n xi yi

    ⇒ bn − β = [ (1/n) Σ xi xi′ ]⁻¹ (1/n) Σ xi εi      (sampling error)

We show: bn →p β
When the sequence {yi, xi} allows application of the WLLN:

    (1/n) Σ_{i=1}^n xi xi′ →p E(xi xi′)

    (1/n) Σ_{i=1}^n xi εi →p E(xi εi) = 0
We show that bn = (X′X)⁻¹X′y is consistent (continued)
Lemma 1 implies:

    bn − β = [ (1/n) Σ xi xi′ ]⁻¹ (1/n) Σ xi εi
           →p E(xi xi′)⁻¹ E(xi εi)
           = E(xi xi′)⁻¹ · 0 = 0

⇒ bn = (X′X)⁻¹X′y is consistent
We show that bn = (X′X)⁻¹X′y is asymptotically normal
The sequence {gi} = {xi εi} allows applying the CLT to ḡ = (1/n) Σ xi εi:

    √n (ḡ − E(gi)) →d MVN(0, E(gi gi′))

    √n (bn − β) = [ (1/n) Σ xi xi′ ]⁻¹ √n ḡ

Applying lemma 5 with

    An = [ (1/n) Σ xi xi′ ]⁻¹ →p A = Σxx⁻¹
    xn = √n ḡ →d x ∼ MVN(0, E(gi gi′))

gives

    √n (bn − β) →d MVN(0, Σxx⁻¹ E(gi gi′) Σxx⁻¹)

⇒ bn is CAN
9. Generalized Least Squares
Hayashi p. 54-59
Assumptions of GLS
Linearity: yi = xi′β + εi
Full rank: rank(X) = K
Strict exogeneity: E(εi|X) = 0
⇒ E(εi) = 0 and Cov(εi, xik) = E(εi xik) = 0
NOT assumed: Var(ε|X) = σ²Iₙ
Instead:

    Var(ε|X) = E(εε′|X) =

    [ Var(ε1|X)       Cov(ε1, ε2|X)   Cov(ε1, ε3|X)   ...   Cov(ε1, εn|X) ]
    [ Cov(ε1, ε2|X)   Var(ε2|X)       Cov(ε2, ε3|X)   ...        ...       ]
    [      ...             ...        Var(ε3|X)       ...        ...       ]
    [ Cov(ε1, εn|X)        ...             ...        ...   Var(εn|X)      ]

⇒ Var(ε|X) = E(εε′|X) = σ² V(X)
Deriving the GLS estimator
Derived under the assumption that V(X) is known, symmetric and positive
definite:

    ⇒ V(X)⁻¹ = C′C

Transformation: ỹ = Cy, X̃ = CX

    y = Xβ + ε
    Cy = CXβ + Cε
    ỹ = X̃β + ε̃
Least squares estimation of β using transformed data

    β̂GLS = (X̃′X̃)⁻¹ X̃′ỹ
         = (X′C′CX)⁻¹ X′C′Cy
         = ( X′ (1/σ²) V⁻¹ X )⁻¹ X′ (1/σ²) V⁻¹ y
         = ( X′ [Var(ε|X)]⁻¹ X )⁻¹ X′ [Var(ε|X)]⁻¹ y
The GLS estimator is the best linear unbiased estimator (BLUE)
Problems:
- It is difficult to work out the asymptotic properties of β̂GLS
- In real-world applications Var(ε|X) is not known
- If Var(ε|X) is estimated, the BLUE property of β̂GLS is lost
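When the form of V(X) is taken as known, the transformation is a one-liner. A minimal sketch (assuming, for illustration only, Var(εi|X) ∝ xi²; β = (1, 2), the uniform regressor, and the seed are also illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
beta = np.array([1.0, 2.0])
n = 50_000

x = np.column_stack([np.ones(n), rng.uniform(1, 5, size=n)])
v = x[:, 1] ** 2                      # assumed known form of Var(eps_i | X)
y = x @ beta + rng.normal(size=n) * np.sqrt(v)

# OLS on the untransformed data
b_ols = np.linalg.solve(x.T @ x, x.T @ y)

# GLS/WLS: divide each row by s_i = sqrt(V_i), then run OLS
s = np.sqrt(v)
xt, yt = x / s[:, None], y / s
b_gls = np.linalg.solve(xt.T @ xt, xt.T @ yt)
print(b_ols, b_gls)   # both near (1, 2); GLS is the more efficient estimator
```

Both estimators are consistent here; the gain from GLS shows up in the sampling variance, not the point estimate.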
Special case of GLS - weighted least squares

    E(εε′|X) = Var(ε|X) = σ² V(X)   with   V(X) = diag( V1(X), V2(X), ..., Vn(X) )

As V(X)⁻¹ = C′C:

    C = diag( 1/√V1(X), 1/√V2(X), ..., 1/√Vn(X) )

Writing si = √Vi(X), the transformed least squares problem is

    argmin Σ_{i=1}^n ( yi/si − β̂1 (1/si) − β̂2 (xi2/si) − ... − β̂K (xiK/si) )²

Observations are weighted by the inverse of their (conditional) standard deviation
10. Multicollinearity
Exact multicollinearity
One regressor can be expressed as a linear combination of (an)other regressor(s):
rank(X) ≠ K: no full rank
⇒ Assumption 1.3 or 2.4 is violated; (X′X)⁻¹ does not exist
Near multicollinearity: often economic variables are correlated to some degree
- The BLUE result is not affected
- Large sample results are not affected
- Relative results are not affected, but Var(b|X) is affected in absolute terms
Effects of Multicollinearity and solutions to the problem
Effects:
- Coefficients may have high standard errors and low significance levels
- Estimates may have the wrong sign
- Small changes in the data produce wide swings in the parameter
estimates
Solutions:
- Increasing precision by collecting more data. (Costly!)
- Building a better fitting model that leaves less unexplained.
- Excluding some regressors. (Dangerous! Omitted variable bias!)
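The variance inflation is easy to demonstrate: with a nearly collinear regressor, the diagonal of (X′X)⁻¹, and hence Var(b|X) = σ²(X′X)⁻¹, explodes. A minimal sketch (the 0.99 mixing weight, sample size, and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200
x1 = rng.normal(size=n)
x2 = 0.99 * x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

# Var(b|X) = sigma^2 (X'X)^{-1}: near-collinearity blows up the diagonal
xtx_inv = np.linalg.inv(X.T @ X)
print(np.diag(xtx_inv))          # huge entries for the x1 and x2 coefficients
print(np.linalg.cond(X.T @ X))   # very large condition number
```

Dropping x2 (or collecting more data) shrinks these diagonal entries back to the usual 1/n order, which is exactly the trade-off the slide describes.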