Notes on Asymptotic Theory: Convergence in Probability and Distribution

Introduction to Econometric Theory, Econ. 770
Jonathan B. Hill
Dept. of Economics, University of North Carolina - Chapel Hill
November 19, 2011

1 Introduction

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Throughout $\theta$ is a parameter of interest like the mean, variance, correlation, or distribution parameters like the Poisson $\lambda$, Binomial $p$, or exponential $\lambda$. Throughout $\{\hat\theta_n\}_{n \ge 1}$ is a sequence of estimators of $\theta$ based on a sample of data $\{X_i\}_{i=1}^n$ with sample size $n \ge 1$. Assume $\hat\theta_n$ is $\mathcal{F}$-measurable for any $n$. Unless otherwise noted, assume the $X_i$ have the same mean and variance: $X_i \sim (\mu, \sigma^2)$. If appropriate, we may have a bivariate sample $\{X_i, Y_i\}_{i=1}^n$ where $X_i \sim (\mu_x, \sigma_x^2)$ and $Y_i \sim (\mu_y, \sigma_y^2)$.

Examples include the sample mean, variance, or correlation:

Sample Mean: $\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$

Sample Variance #1: $s_n^2 := \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X_n)^2$

Sample Variance #2: $\hat\sigma_n^2 := \frac{1}{n} \sum_{i=1}^n (X_i - \bar X_n)^2$

Sample Correlation: $\hat\rho_n := \frac{\frac{1}{n} \sum_{i=1}^n (X_i - \bar X_n)(Y_i - \bar Y_n)}{\hat\sigma_{x,n} \hat\sigma_{y,n}}$

Similarly, we may estimate a probability by using a sample relative frequency:

$\hat P_n(x) := \frac{1}{n} \sum_{i=1}^n I(X_i \le x)$, the sample percentage of $X_i \le x$.

Notice $\hat P_n(x)$ estimates $P(X_i \le x)$.

We will look at estimator properties: what $\hat\theta_n$ is on average for any sample size $n$, and what $\hat\theta_n$ becomes as the sample size grows. In every case above the estimator is a variant of a straight average (e.g. $\frac{1}{n} \sum_{i=1}^n (X_i - \bar X_n)(Y_i - \bar Y_n)$ is a straight average of $(X_i - \bar X_n)(Y_i - \bar Y_n)$), or a function of a straight average (e.g. $\hat\sigma_n := (\frac{1}{n} \sum_{i=1}^n (X_i - \bar X_n)^2)^{1/2}$, the square root of the average $(X_i - \bar X_n)^2$). We therefore pay particular attention to the sample mean.

2 Unbiasedness

Defn. We say $\hat\theta_n$ is an unbiased estimator of $\theta$ if $E[\hat\theta_n] = \theta$. Define bias as

$B(\hat\theta_n) := E[\hat\theta_n] - \theta$

An unbiased estimator has zero bias: $B(\hat\theta_n) = 0$. If we had an infinite number of samples of size $n$, then the average estimate $\hat\theta_n$ across all samples would be $\theta$. An asymptotically unbiased estimator satisfies $B(\hat\theta_n) \to 0$ as $n \to \infty$.

Claim (Weighted Average): Let $X_i$ have a common mean $\mu := E[X_i]$. Then the weighted average $\hat\mu_n := \sum_{i=1}^n \omega_i X_i$ is an unbiased estimator of $\mu$ if $\sum_{i=1}^n \omega_i = 1$.

Proof:

$E\left[\sum_{i=1}^n \omega_i X_i\right] = \sum_{i=1}^n \omega_i E[X_i] = \mu \sum_{i=1}^n \omega_i = \mu$  QED.

Corollary (Straight Average): The sample mean $\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$ is a weighted average with flat or uniform weights $\omega_i = 1/n$, hence trivially $\sum_{i=1}^n \omega_i = 1$, hence $E[\bar X_n] = E[X_i] = \mu$.

The problem then arises as to which weighted average $\sum_{i=1}^n \omega_i X_i$ may be preferred in practice, since any with unit summed weights is unbiased. We will discuss the concept of efficiency below, but the minimum mean-squared-error unbiased estimator has uniform weights if $X_i \sim iid(\mu, \sigma^2)$. That is:

Claim (Sample Mean is Best): Let $X_i \sim iid(\mu, \sigma^2)$. Then $\bar X_n$ is the best linear unbiased estimator of $\mu$ (i.e. it is BLUE).

Proof: We want to solve

$\min_{\omega} E\left(\sum_{i=1}^n \omega_i (X_i - \mu)\right)^2$ subject to $\sum_{i=1}^n \omega_i = 1$

The Lagrangian is

$\mathcal{L}(\omega, \lambda) := E\left(\sum_{i=1}^n \omega_i (X_i - \mu)\right)^2 + \lambda\left(1 - \sum_{i=1}^n \omega_i\right)$

where by independence $E(\sum_{i=1}^n \omega_i (X_i - \mu))^2 = \sigma^2 \sum_{i=1}^n \omega_i^2$, hence

$\mathcal{L}(\omega, \lambda) := \sigma^2 \sum_{i=1}^n \omega_i^2 + \lambda\left(1 - \sum_{i=1}^n \omega_i\right)$

The first order conditions are

$\frac{\partial}{\partial \omega_i} \mathcal{L}(\omega, \lambda) = 2\sigma^2 \omega_i - \lambda = 0$ and $\frac{\partial}{\partial \lambda} \mathcal{L}(\omega, \lambda) = 1 - \sum_{i=1}^n \omega_i = 0$

Therefore $\omega_i = \lambda/(2\sigma^2)$ is a constant that sums to $\sum_{i=1}^n \omega_i = 1$. Write $\omega_i = \lambda/(2\sigma^2) =: c$. Since $\sum_{i=1}^n \omega_i = \sum_{i=1}^n c = nc = 1$ it follows $\omega_i = c = 1/n$. QED.

Remark: As in many cases here and below, independence can be substituted with uncorrelatedness since the same proof applies: $E[X_i X_j] = E[X_i]E[X_j]$ for all $i \ne j$. We can also substitute uncorrelatedness with a condition that restricts the total correlation across all $X_i$ and $X_j$ for $i \ne j$, but such generality is typically only exploited in time series settings (where $X_j$ is at a different time period).

Claim (Sample Variance): Let $X_i \sim iid(\mu, \sigma^2)$. The estimator $s_n^2$ is unbiased and $\hat\sigma_n^2$ is negatively biased but asymptotically unbiased.

Proof: Notice

$\frac{n-1}{n} s_n^2 = \hat\sigma_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n} \sum_{i=1}^n \left((X_i - \mu) - (\bar X_n - \mu)\right)^2$
$= \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 + (\bar X_n - \mu)^2 - 2(\bar X_n - \mu) \frac{1}{n} \sum_{i=1}^n (X_i - \mu)$
$= \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 + (\bar X_n - \mu)^2 - 2(\bar X_n - \mu)^2$
$= \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 - (\bar X_n - \mu)^2$

By the iid assumption and the fact that $\bar X_n$ is unbiased,

$E(\bar X_n - \mu)^2 = Var[\bar X_n] = \frac{1}{n^2} \sum_{i=1}^n Var[X_i] = \frac{1}{n^2} n \sigma^2 = \frac{\sigma^2}{n}$

Further, by definition $\sigma^2 := E[(X_i - \mu)^2]$, hence

$E\left[\frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2\right] = \frac{1}{n} \sum_{i=1}^n E[(X_i - \mu)^2] = \frac{1}{n} \sum_{i=1}^n \sigma^2 = \sigma^2$

Therefore

$E[\hat\sigma_n^2] = \sigma^2 - \frac{1}{n}\sigma^2 = \frac{n-1}{n}\sigma^2$

This implies each claim: $E[s_n^2] = \frac{n}{n-1} E[\hat\sigma_n^2] = \sigma^2$ ($s_n^2$ is unbiased), $E[\hat\sigma_n^2] = \frac{n-1}{n}\sigma^2 < \sigma^2$ ($\hat\sigma_n^2$ is negatively biased), and $E[\hat\sigma_n^2] = \frac{n-1}{n}\sigma^2 \to \sigma^2$ ($\hat\sigma_n^2$ is asymptotically unbiased). QED.

Example: We simulate 100 samples of $X_i \sim N(75, 4)$ with sample size $n = 20$. In Figure 1 we plot $\bar X_n$ for each sample. The simulation average of all $\bar X_n$ is 74.983941 and the simulation variance of all $\bar X_n$ is .21615195. In Figure 2 we plot $\hat\mu_n = \sum_{i=1}^n \omega_i X_i$ for each sample, with weights $\omega_i = i / \sum_{j=1}^n j$. The simulation average of all $\hat\mu_n$ is 74.982795 and the simulation variance of all $\hat\mu_n$ is .30940776. Thus, both display the same property of unbiasedness, but $\bar X_n$ exhibits less dispersion across samples.

[Figure 1: $\bar X_n$ across samples]  [Figure 2: $\hat\mu_n$ across samples]

3 Convergence in Mean-Square or $L_p$-Convergence

Defn. We say $\hat\theta_n \in \mathbb{R}$ converges to $\theta$ in mean-square if

$MSE(\hat\theta_n) := E(\hat\theta_n - \theta)^2 \to 0$

We also write $\hat\theta_n \overset{ms}{\to} \theta$ and say $\hat\theta_n \to \theta$ in mean-square.

If $\hat\theta_n$ is unbiased for $\theta$ then

$MSE(\hat\theta_n) = E\left(\hat\theta_n - E[\hat\theta_n]\right)^2 = Var[\hat\theta_n]$

Convergence in mean-square certainly does not require unbiasedness. In general, MSE is

$MSE(\hat\theta_n) = E(\hat\theta_n - \theta)^2 = E\left(\hat\theta_n - E[\hat\theta_n] + E[\hat\theta_n] - \theta\right)^2$
$= E\left(\hat\theta_n - E[\hat\theta_n]\right)^2 + \left(E[\hat\theta_n] - \theta\right)^2 + 2E\left(\hat\theta_n - E[\hat\theta_n]\right)\left(E[\hat\theta_n] - \theta\right)$
$= E\left(\hat\theta_n - E[\hat\theta_n]\right)^2 + \left(E[\hat\theta_n] - \theta\right)^2$

since $E[\hat\theta_n] - \theta$ is just a constant and $E(\hat\theta_n - E[\hat\theta_n]) = E[\hat\theta_n] - E[\hat\theta_n] = 0$. Hence MSE is the variance plus bias squared:

$MSE(\hat\theta_n) = Var[\hat\theta_n] + \left(B(\hat\theta_n)\right)^2$

If $\hat\theta_n \in \mathbb{R}^k$ then we write

$MSE(\hat\theta_n) := E\left[(\hat\theta_n - \theta)(\hat\theta_n - \theta)'\right] \to 0$

hence component-wise convergence. We may similarly write convergence in $L_2$-norm

$E\left\|(\hat\theta_n - \theta)(\hat\theta_n - \theta)'\right\|_2 \to 0$ where $\|A\|_2 := \left(\sum_{i=1}^k \sum_{j=1}^k a_{ij}^2\right)^{1/2}$

or convergence in matrix (spectral) norm:

$E\left\|(\hat\theta_n - \theta)(\hat\theta_n - \theta)'\right\| \to 0$ where $\|A\|$ is the largest eigenvalue of $A$.

Both imply convergence with respect to each element: $E(\hat\theta_{i,n} - \theta_i)^2 \to 0$.

Defn. We say $\hat\theta_n \in \mathbb{R}$ has the property of $L_p$-convergence, or convergence in $L_p$-norm, to $\theta$ if for $p > 0$

$E\left|\hat\theta_n - \theta\right|^p \to 0$
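The bias and MSE algebra above can be checked by simulation in the spirit of the Figure 1-2 experiment. The sketch below (an illustration, not part of the notes; the replication count R is an arbitrary choice) estimates the bias of $\hat\sigma_n^2$ under $X_i \sim N(75,4)$, $n = 20$, and verifies the decomposition MSE = variance + bias squared:

```python
import random
import statistics

# Monte Carlo check that sigma_hat^2 has bias about -sigma^2/n = -0.2
# under the N(75, 4), n = 20 design, and that MSE = variance + bias^2.
random.seed(0)
mu, sigma2, n, R = 75.0, 4.0, 20, 20000

sig_hats = []
for _ in range(R):
    x = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(x) / n
    sig_hats.append(sum((xi - xbar) ** 2 for xi in x) / n)  # sigma_hat^2

bias = statistics.fmean(sig_hats) - sigma2                  # approx -0.2
var = statistics.pvariance(sig_hats)
mse = statistics.fmean((s - sigma2) ** 2 for s in sig_hats)
# mse equals var + bias**2 up to floating-point rounding, since the
# decomposition is an algebraic identity for the simulated sample.
```

Multiplying each $\hat\sigma_n^2$ draw by $n/(n-1)$ gives the corresponding $s_n^2$ draws, whose average bias is near zero.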
Clearly $L_2$-convergence and mean-square convergence are equivalent.

Claim (Sample Mean): Let $X_i \sim iid(\mu, \sigma^2)$. Then $\bar X_n \to \mu$ in mean square.

Proof: $E(\bar X_n - \mu)^2 = Var[\bar X_n] = \sigma^2/n \to 0$. QED.

We only require uncorrelatedness since $Var[\bar X_n] = \sigma^2/n$ still holds.

Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ be uncorrelated. Then $\bar X_n \to \mu$ in mean square.

Proof: $E(\bar X_n - \mu)^2 = Var[\bar X_n] = \sigma^2/n \to 0$. QED.

In fact, we only need all cross covariances to not be too large as the sample size grows.

Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ satisfy $\frac{2}{n^2} \sum_{i < j} Cov(X_i, X_j) \to 0$. Then $\bar X_n \to \mu$ in mean square.

Proof: $E(\bar X_n - \mu)^2 = Var[\bar X_n] = \frac{\sigma^2}{n} + \frac{2}{n^2} \sum_{i < j} Cov(X_i, X_j) \to 0$. QED.

Remark: In micro-economic contexts involving cross-sectional data this type of correlatedness is evidently rarely or never entertained. Typically we assume the $X_i$ are uncorrelated. It is, however, profoundly popular in macroeconomic and finance contexts where data are time series. A very large class of time series random variables satisfies both $Cov(X_i, X_j) \ne 0$ $\forall i \ne j$ and $\frac{2}{n^2} \sum_{i < j} Cov(X_i, X_j) \to 0$, and $\bar X_n$ therefore exhibits $\bar X_n \to \mu$ in mean square.

If $X_i \sim iid(\mu, \sigma^2)$ then $\bar X_n \to \mu$ in $L_p$-norm for any $p \in (1, 2]$, but proving the result for non-integer $p \in (1, 2)$ is quite a bit more difficult. There are many types of "maximal inequalities", however, that can be used to prove

$E\left|\sum_{i=1}^n (X_i - \mu)\right|^p \le Kn$ for $p \in (1, 2)$

where $K > 0$ is a finite constant.

Claim (Sample Mean): Let $X_i \sim iid(\mu, \sigma^2)$. Then $\bar X_n \to \mu$ in $L_p$-norm for any $p \in (1, 2)$.

Proof:

$E\left|\frac{1}{n}\sum_{i=1}^n X_i - \mu\right|^p = \frac{1}{n^p} E\left|\sum_{i=1}^n (X_i - \mu)\right|^p \le \frac{Kn}{n^p} = K n^{1-p} \to 0$

since $p > 1$. QED.

Example: We simulate $X_i \sim N(7, 400)$ with sample sizes $n = 5, 15, 25, \dots, 1000$. In Figure 3 we plot $\bar X_n$ and $Var[\bar X_n] = 400/n$ over sample size $n$. Notice the high volatility for small $n$.

[Figure 3: $\bar X_n$ and $Var[\bar X_n]$ over sample size $n$]

4 Convergence in Probability : WLLN

Defn. We say $\hat\theta_n$ converges in probability to $\theta$ if

$\lim_{n \to \infty} P\left(\left|\hat\theta_n - \theta\right| > \varepsilon\right) = 0$ $\forall \varepsilon > 0$  (1)

We variously write $\hat\theta_n \overset{p}{\to} \theta$ and $plim\, \hat\theta_n = \theta$, and we say $\hat\theta_n$ is a consistent estimator of $\theta$.

Since probability convergence is convergence in the number sequence $\{P(|\hat\theta_n - \theta| > \varepsilon)\}_{n=1}^\infty$, by the definition of a limit it follows that for every $\delta > 0$ there exists $N \ge 1$ such that

$P\left(\left|\hat\theta_n - \theta\right| \le \varepsilon\right) > 1 - \delta$ $\forall n \ge N$

That is, for a large enough sample size $\hat\theta_n$ is guaranteed to be as close to $\theta$ as we choose (i.e. the $\varepsilon$) with as great a probability as we choose (i.e. $1 - \delta$).

Claim (Law of Large Numbers = LLN): If $X_i \sim iid(\mu, \sigma^2)$ then $\bar X_n \overset{p}{\to} \mu$.

Proof: By Chebyshev's inequality and independence, for any $\varepsilon > 0$

$P\left(\left|\bar X_n - \mu\right| > \varepsilon\right) \le \varepsilon^{-2} E\left(\bar X_n - \mu\right)^2 = \varepsilon^{-2} \frac{\sigma^2}{n} \to 0$  QED

Remark 1: We call this a Weak Law of Large Numbers [WLLN] since convergence is in probability. A Strong LLN based on a stronger form of convergence is given below.

Remark 2: We only need uncorrelatedness to get $E(\bar X_n - \mu)^2 = \sigma^2/n \to 0$. The WLLN, however, extends to many forms of dependent random variables.

Remark 3: In the iid case we only need $E|X_i| < \infty$, although the proof is substantially more complicated. Even for non-iid data we typically only need $E|X_i|^{1+\iota} < \infty$ for infinitesimal $\iota > 0$ (pay close attention to scholarly articles you read, and to your own assumptions: usually far stronger assumptions are imposed than are actually required).

The weighted average $\sum_{i=1}^n \omega_{i,n} X_i$ is also consistent as long as the weights decay with the sample size. Thus we write the weight as $\omega_{i,n}$.

Claim: If $X_i \sim iid(\mu, \sigma^2)$ then $\sum_{i=1}^n \omega_{i,n} X_i \overset{p}{\to} \mu$ if $\sum_{i=1}^n \omega_{i,n} = 1$ and $\sum_{i=1}^n \omega_{i,n}^2 \to 0$.

Proof: By Chebyshev's inequality, independence and $\sum_{i=1}^n \omega_{i,n} = 1$, for any $\varepsilon > 0$

$P\left(\left|\sum_{i=1}^n \omega_{i,n} X_i - \mu\right| > \varepsilon\right) \le \varepsilon^{-2} E\left(\sum_{i=1}^n \omega_{i,n} X_i - \mu\right)^2 = \varepsilon^{-2} E\left(\sum_{i=1}^n \omega_{i,n} \{X_i - \mu\}\right)^2$
$= \varepsilon^{-2} \sum_{i=1}^n \omega_{i,n}^2 E\left[(X_i - \mu)^2\right] = \varepsilon^{-2} \sigma^2 \sum_{i=1}^n \omega_{i,n}^2 \to 0$

which proves the claim. QED.

An example is $\bar X_n$ with $\omega_{i,n} = 1/n$, but also the weights $\omega_{i,n} = i / \sum_{j=1}^n j$ used in Figure 2.

Example: We simulate $X_i \sim N(75, 20)$ with sample sizes $n = 5, 15, 25, \dots, 10000$. In Figures 4 and 5 we plot $\bar X_n$ and $\hat\mu_n = \sum_{i=1}^n \omega_{i,n} X_i$ over sample size $n$. Notice the high volatility for small $n$.
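The Chebyshev argument behind the WLLN can be made concrete: the tail probability $P(|\bar X_n - \mu| > \varepsilon)$ shrinks with $n$ and sits below the bound $\sigma^2/(n\varepsilon^2)$ once that bound is non-trivial. The sketch below (an illustration; $\varepsilon$ and the replication count R are arbitrary choices) uses the $N(75, 20)$ design from the example above:

```python
import random

# Monte Carlo estimate of P(|Xbar_n - mu| > eps) for growing n,
# compared with the Chebyshev bound sigma^2 / (n eps^2).
random.seed(1)
mu, sigma2, eps, R = 75.0, 20.0, 1.0, 2000

def tail_prob(n):
    hits = 0
    for _ in range(R):
        xbar = sum(random.gauss(mu, sigma2 ** 0.5) for _ in range(n)) / n
        hits += abs(xbar - mu) > eps
    return hits / R

probs = {n: tail_prob(n) for n in (5, 25, 100)}          # shrinks in n
bounds = {n: sigma2 / (n * eps ** 2) for n in (5, 25, 100)}
```

For small $n$ the Chebyshev bound can exceed 1 and is uninformative; the estimated probabilities themselves still decrease monotonically toward 0.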
[Figure 4: $\bar X_n$ over sample size $n$]  [Figure 5: $\hat\mu_n$ over sample size $n$]

Claim (Slutsky Theorem): Let $\hat\theta_n \in \mathbb{R}^k$. If $\hat\theta_n \overset{p}{\to} \theta$ and $g : \mathbb{R}^k \to \mathbb{R}$ is continuous (except possibly with countably many discontinuity points) then $g(\hat\theta_n) \overset{p}{\to} g(\theta)$.

Corollary: Let $\hat\theta_{i,n} \overset{p}{\to} \theta_i$, $i = 1, 2$. Then $\hat\theta_{1,n} \pm \hat\theta_{2,n} \overset{p}{\to} \theta_1 \pm \theta_2$, $\hat\theta_{1,n} \times \hat\theta_{2,n} \overset{p}{\to} \theta_1 \times \theta_2$, and if $\theta_2 \ne 0$ and $\liminf_{n \to \infty} |\hat\theta_{2,n}| > 0$ then $\hat\theta_{1,n}/\hat\theta_{2,n} \overset{p}{\to} \theta_1/\theta_2$.

Claim: If $X_i \sim iid(\mu, \sigma^2)$ and $E[X_i^4] < \infty$ then $s_n^2 \overset{p}{\to} \sigma^2$.

Proof: Note

$\frac{n-1}{n} s_n^2 = \hat\sigma_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 - (\bar X_n - \mu)^2$

By the LLN $\bar X_n \overset{p}{\to} \mu$, therefore by the Slutsky Theorem $(\bar X_n - \mu)^2 \overset{p}{\to} 0$. By $E[X_i^4] < \infty$ it follows $(X_i - \mu)^2$ is iid with a finite variance, hence it satisfies the LLN: $\frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 \overset{p}{\to} E[(X_i - \mu)^2] = \sigma^2$. QED.

Claim: If $(X_i, Y_i) \sim iid(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2)$ and $E[X_i^2 Y_i^2] < \infty$ then the sample correlation $\hat\rho_n \overset{p}{\to} \rho$, the population correlation.

Example: We simulate $X_i \sim N(7, 400)$ and $\epsilon_i \sim N(0, 900)$ and construct $Y_i = -4.3 + 2X_i + \epsilon_i$. The true correlation is

$\rho = \frac{E[X_i Y_i] - E[X_i]E[Y_i]}{\sigma_x \sigma_y} = \frac{-4.3 E[X_i] + 2E[X_i^2] - 7 \times (-4.3 + 2 \times 7)}{20 \times \sqrt{4 \times 400 + 900}}$
$= \frac{-4.3 \times 7 + 2(400 + 7^2) - 7(-4.3 + 2 \times 7)}{20 \times \sqrt{4 \times 400 + 900}} = \frac{800}{1000} = .8$

We estimate the correlation for samples with size $n = 5, 15, 25, \dots, 10000$. Figure 6 demonstrates consistency and therefore the Slutsky Theorem.

[Figure 6: sample correlation $\hat\rho_n$ over sample size $n$]

5 Almost Sure Convergence : SLLN

Defn. We say $\hat\theta_n$ converges almost surely to $\theta$ if

$P\left(\lim_{n \to \infty} \hat\theta_n = \theta\right) = 1$

This is identical to

$\lim_{n \to \infty} P\left(\sup_{m \ge n} \left|\hat\theta_m - \theta\right| > \varepsilon\right) = 0$ $\forall \varepsilon > 0$

We variously write $\hat\theta_n \overset{a.s.}{\to} \theta$ and $\hat\theta_n \to \theta$ a.s., and we say $\hat\theta_n$ is strongly consistent for $\theta$.

We have the following relationships.

Claim: $\hat\theta_n \overset{ms}{\to} \theta$ implies $\hat\theta_n \overset{p}{\to} \theta$; $\hat\theta_n \overset{a.s.}{\to} \theta$ implies $\hat\theta_n \overset{p}{\to} \theta$.

Proof: $P(|\hat\theta_n - \theta| > \varepsilon) \le \varepsilon^{-2} E(\hat\theta_n - \theta)^2$ by Chebyshev's inequality. If $E(\hat\theta_n - \theta)^2 \to 0$ (i.e. $\hat\theta_n \overset{ms}{\to} \theta$) then $P(|\hat\theta_n - \theta| > \varepsilon) \to 0$ where $\varepsilon > 0$ is arbitrary. Therefore $\hat\theta_n \overset{p}{\to} \theta$. Next, $P(|\hat\theta_n - \theta| > \varepsilon) \le P(\sup_{m \ge n} |\hat\theta_m - \theta| > \varepsilon)$ since $\sup_{m \ge n} |\hat\theta_m - \theta| \ge |\hat\theta_n - \theta|$. Therefore if $P(\sup_{m \ge n} |\hat\theta_m - \theta| > \varepsilon) \to 0$ $\forall \varepsilon > 0$ (i.e. $\hat\theta_n \overset{a.s.}{\to} \theta$) then $P(|\hat\theta_n - \theta| > \varepsilon) \to 0$ $\forall \varepsilon > 0$ (i.e. $\hat\theta_n \overset{p}{\to} \theta$). QED.

If $\hat\theta_n$ is bounded wp1 then $\hat\theta_n \overset{p}{\to} \theta$ if and only if $E[\hat\theta_n] \to \theta$, which is asymptotic unbiasedness (see Bierens). By the Slutsky Theorem $\hat\theta_n \overset{p}{\to} \theta$ implies $(\hat\theta_n - \theta)^2 \overset{p}{\to} 0$, hence, again using boundedness, $E[(\hat\theta_n - \theta)^2] \to 0$: convergence in probability implies convergence in mean-square. This proves the following (and gives almost sure convergence as the "strongest" form: the one that implies all the rest).

Claim (a.s. ⇒ i.p. ⇒ m.s.): Let $\hat\theta_n$ be bounded wp1: $P(|\hat\theta_n| \le K) = 1$ for finite $K > 0$. Then $\hat\theta_n \overset{a.s.}{\to} \theta$ implies $\hat\theta_n \overset{p}{\to} \theta$, which implies asymptotic unbiasedness and $\hat\theta_n \overset{ms}{\to} \theta$.

Claim (Strong Law of Large Numbers = SLLN): If $X_i \sim iid(\mu, \sigma^2)$ then $\bar X_n \overset{a.s.}{\to} \mu$.

Remark: The Slutsky Theorem carries over to strong convergence.

Example: Let $X_i \sim iid(\mu, \sigma^2)$ and define

$\hat\theta_n := \frac{1}{1 + \bar X_n^2}$

Then $P(|\hat\theta_n| \le 1) = 1$. Moreover, under the iid assumption $\bar X_n \overset{a.s.}{\to} \mu$ by the SLLN, hence by the Slutsky Theorem

$\hat\theta_n \overset{a.s.}{\to} \frac{1}{1 + \mu^2}$

Therefore

$\hat\theta_n \overset{p}{\to} \frac{1}{1 + \mu^2}$ and $E[\hat\theta_n] \to \theta = \frac{1}{1 + \mu^2}$ and $E\left[\left(\hat\theta_n - \theta\right)^2\right] \to 0$

6 Convergence in Distribution : CLT

Defn. We say $\hat\theta_n$ converges in distribution to a distribution $F$, or to a random variable $Z$ with distribution $F$, if

$\lim_{n \to \infty} P\left(\hat\theta_n \le z\right) = F(z)$ for every $z$ on the support of $F$.

Thus, while $\hat\theta_n$ may itself not be distributed $F$, asymptotically it is. We write $\hat\theta_n \overset{d}{\to} F$ or $\hat\theta_n \overset{d}{\to} Z$ where $Z \sim F$. The notation $\hat\theta_n \overset{d}{\to} Z$ is a bit awkward, because $F$ characterizes infinitely many random variables. We are therefore saying there is some random draw $Z$ from $F$ that $\hat\theta_n$ is becoming. Which random draw is not specified.

6.1 Central Limit Theorem

By far the most famous result concerns the sample mean $\bar X_n$. Convergence of some estimator $\hat\theta_n$ in a monumentally large number of cases reduces to convergence of a sample mean of something, call it $\frac{1}{n}\sum_{i=1}^n Y_i$. This carries over to the sample correlation, regression model estimation methods like Ordinary Least Squares, GMM, and Maximum Likelihood, as well as non-parametric estimation, and on and on. As usual, we limit ourselves to the iid case.
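The bounded-estimator example above lends itself to a quick numerical check. The sketch below (an illustration; the values of $\mu$, $\sigma$ and the grid of sample sizes are arbitrary choices) simulates $\hat\theta_n = 1/(1 + \bar X_n^2)$ and watches it settle near $\theta = 1/(1 + \mu^2)$, as the SLLN plus the Slutsky Theorem predict:

```python
import random

# Strong consistency via Slutsky: theta_hat = 1/(1 + Xbar^2) should
# approach 1/(1 + mu^2) as n grows, and it is bounded in [0, 1] wp1.
random.seed(2)
mu, sigma = 3.0, 2.0
theta = 1.0 / (1.0 + mu ** 2)            # limit value, 0.1 here

def theta_hat(n):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    return 1.0 / (1.0 + xbar ** 2)

errors = {n: abs(theta_hat(n) - theta) for n in (10, 1000, 100000)}
```

Because $\hat\theta_n$ is bounded by 1, the convergence in probability seen here also delivers asymptotic unbiasedness and mean-square convergence, per the claim above.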
The following substantially carries over to non-iid data, and based on a rarely cited obscure fact does not even require a finite variance (I challenge you to find a proof of this, or to ever discover any econometrics textbook that accurately states this).

Claim (Central Limit Theorem = CLT): If $X_i \sim iid(\mu, \sigma^2)$ then

$Z_n := \frac{\sqrt{n}\left(\bar X_n - \mu\right)}{\sigma} \overset{d}{\to} N(0, 1)$

Remark 1: This is famously cited as the Lindeberg-Lévy CLT. Historically, however, the proof arose in different camps sometime between 1910-1930 (covering Lindeberg, Lévy, Chebyshev, Markov and Lyapunov).

Remark 2: Notice by construction $Z_n := \sqrt{n}(\bar X_n - \mu)/\sigma$ is a standardized sample mean, because $E[\bar X_n] = \mu$ by identical distributedness and $Var[\bar X_n] = \sigma^2/n$ by independence and identical distributedness. Thus

$Z_n := \frac{\sqrt{n}\left(\bar X_n - \mu\right)}{\sigma} = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} = \frac{\bar X_n - E[\bar X_n]}{\sqrt{Var[\bar X_n]}}$

Therefore $Z_n = \sqrt{n}(\bar X_n - \mu)/\sigma$ has mean 0 and variance 1:

$E[Z_n] = \frac{\sqrt{n}}{\sigma}\left(E[\bar X_n] - \mu\right) = 0$
$Var[Z_n] = \frac{n}{\sigma^2} Var[\bar X_n] = \frac{n}{\sigma^2} \frac{\sigma^2}{n} = 1$

Thus, even as $n \to \infty$ the random variable $Z_n \sim (0, 1)$. Although this is a long way from proving $Z_n$ has a definable distribution, even in the limit, it does help to point out that the term $\sqrt{n} \to \infty$ is necessary to stabilize $\bar X_n - \mu$, for otherwise we simply have $\bar X_n - \mu \overset{p}{\to} 0$.

Remark 3: Asymptotically $Z_n := \sqrt{n}(\bar X_n - \mu)/\sigma$ has a standard normal density $(2\pi)^{-1/2} \exp\{-z^2/2\}$.

Proof: Define $Z_i := (X_i - \mu)/\sigma$, hence

$Z_n := \frac{\sqrt{n}\left(\bar X_n - \mu\right)}{\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i$  (2)

We will show the characteristic function $E[e^{\mathrm{i}tZ_n}] \to e^{-t^2/2}$, where $\mathrm{i} = \sqrt{-1}$. The latter is the characteristic function of a standard normal, while characteristic functions and distributions have a unique correspondence: only standard normals have a characteristic function like $e^{-t^2/2}$.

By independence and identical distributedness

$E\left[e^{\mathrm{i}tZ_n}\right] = E\left[\prod_{j=1}^n e^{\mathrm{i}tn^{-1/2}Z_j}\right] = \prod_{j=1}^n E\left[e^{\mathrm{i}tn^{-1/2}Z_j}\right] = \left(E\left[e^{\mathrm{i}tn^{-1/2}Z_1}\right]\right)^n$

Now expand $e^{\mathrm{i}tn^{-1/2}Z_1}$ by a second order Taylor expansion around $tn^{-1/2} = 0$:

$e^{\mathrm{i}tn^{-1/2}Z_1} = 1 + \mathrm{i}\frac{tn^{-1/2}}{1!}Z_1 + \mathrm{i}^2 \frac{t^2 n^{-1}}{2!}Z_1^2 + r_n = 1 + \mathrm{i}\frac{tn^{-1/2}}{1!}Z_1 - \frac{t^2 n^{-1}}{2!}Z_1^2 + r_n$

where $r_n$ is a remainder term that is a function of $tn^{-1/2}$. Now take the expectations as in (2), and note $E[Z_1] = E[(X_1 - \mu)]/\sigma = 0$ and $E[Z_1^2] = E[(X_1 - \mu)^2]/\sigma^2 = \sigma^2/\sigma^2 = 1$:

$E\left[e^{\mathrm{i}tn^{-1/2}Z_1}\right] = 1 + \mathrm{i}\frac{tn^{-1/2}}{1!}E[Z_1] - \frac{1}{n}\frac{t^2}{2!} + E[r_n] = 1 - \frac{1}{n}\frac{t^2}{2} + \frac{1}{n}\delta_n$ where $\delta_n := nE[r_n]$

It is easy to prove $e^{\mathrm{i}tn^{-1/2}Z_1}$ is a bounded random variable, in particular $|e^{\mathrm{i}tn^{-1/2}Z_1}| \le 1$ wp1 (see Bierens), so even if $Z_1$ does not have higher moments we know $|E[r_n]| < \infty$. Further $\delta_n \to 0$ because $E[e^{\mathrm{i}tn^{-1/2}Z_1}] \to 1$.

Now take the $n$-power in (2): by the Binomial expansion

$\left(E\left[e^{\mathrm{i}tn^{-1/2}Z_1}\right]\right)^n = \left(1 - \frac{1}{n}\frac{t^2}{2} + \frac{\delta_n}{n}\right)^n = \left(1 - \frac{1}{n}\frac{t^2}{2}\right)^n + \sum_{k=1}^n \binom{n}{k}\left(\frac{\delta_n}{n}\right)^k \left(1 - \frac{1}{n}\frac{t^2}{2}\right)^{n-k}$

The first term satisfies

$\left(1 - \frac{1}{n}\frac{t^2}{2}\right)^n \to e^{-t^2/2}$

because the sequence $\{(1 + a/n)^n\}_{n \ge 1}$ converges: $(1 + a/n)^n \to e^a$ (simply put $a = -t^2/2$). For the second term notice for large enough $n$ we have $|1 - n^{-1}t^2/2|^{n-k} \le 1$, hence

$\left|\sum_{k=1}^n \binom{n}{k}\left(\frac{\delta_n}{n}\right)^k \left(1 - \frac{1}{n}\frac{t^2}{2}\right)^{n-k}\right| \le \sum_{k=1}^n \binom{n}{k}\left|\frac{\delta_n}{n}\right|^k = \left(1 + \left|\frac{\delta_n}{n}\right|\right)^n - 1$

See Bierens for details that verify $(1 + |\delta_n/n|)^n - 1 \to 0$. QED.

Example (Bernoulli): The most striking way to demonstrate the CLT is to begin with the least normal of data, a Bernoulli random variable, which is discrete and takes only two finite values, and show $\sqrt{n}(\bar X_n - \mu)/\sigma \overset{d}{\to} N(0, 1)$, a continuous random variable with infinite support. We simulate $X_i \sim Bernoulli(.2)$ for $n = 5, 50, 500, 10000$ and compute

$Z_n := \frac{\sqrt{n}\left(\bar X_n - \mu\right)}{\sigma} = \frac{\sqrt{n}\left(\bar X_n - .2\right)}{\sqrt{.2 \times .8}} = \frac{\sqrt{n}\left(\bar X_n - .2\right)}{.4}$

In order to show the small sample distribution of $Z_n$ we need a sample of $Z_n$'s, so we repeat the simulation 1000 times. We plot the relative frequencies of the sample of $Z_n$'s for each $n$. Let $\{Z_{n,j}\}_{j=1}^{1000}$ be the simulated sample of $Z_n$'s. The relative frequencies are the percentages $\frac{1}{1000}\sum_{j=1}^{1000} I(c_k \le Z_{n,j} < c_{k+1})$ for interval endpoints $c_k = [-5, -4.9, -4.8, \dots, 4.9, 5.0]$. See Figure 7. For the sake of comparison, in Figure 8 we plot the relative frequencies for one sample of 1000 iid standard normal random variables $Z_i \sim N(0, 1)$.

Another way to see how $Z_n$ becomes a standard normal random variable is to compute the quantile $\hat q_n$ such that $\hat P(Z_n \le \hat q_n) = .975$. A standard normal satisfies $P(Z \le 1.96) = .975$. We call $\hat q_n$ an empirical quantile since it is based on a simulated set of samples. We simulate 10,000 samples for each size $n = 5, 105, 205, \dots, 5005$ and compute $\hat q_n$. See Figure 9. As $n$ increases $\hat q_n \to 1.96$.
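The empirical-quantile experiment can be reproduced in a few lines. The sketch below (an illustration; the replication count R and the two sample sizes are smaller than the 10,000 replications used in the notes, and the order-statistic quantile is a simple choice among several conventions) simulates standardized Bernoulli(.2) means and checks that the empirical 97.5% quantile moves toward the standard normal value 1.96:

```python
import random

# Empirical 97.5% quantile of Z_n = sqrt(n)(Xbar_n - p)/sqrt(p(1-p))
# for Bernoulli(p) data: far from 1.96 for tiny n, close for large n.
random.seed(3)
p, R = 0.2, 2000
sd = (p * (1 - p)) ** 0.5                # = .4

def empirical_q975(n):
    zs = []
    for _ in range(R):
        xbar = sum(random.random() < p for _ in range(n)) / n
        zs.append(n ** 0.5 * (xbar - p) / sd)
    zs.sort()
    return zs[int(0.975 * R) - 1]        # simple order-statistic quantile

q_small, q_large = empirical_q975(5), empirical_q975(1000)
```

At $n = 5$ the statistic $Z_n$ sits on a coarse six-point lattice and its .975 quantile overshoots 1.96 considerably; at $n = 1000$ the lattice is fine and the empirical quantile lands near 1.96, up to Monte Carlo noise.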
[Figure 7: standardized means $Z_n$ for Bernoulli data, relative frequencies of 1000 draws of $Z_n$, with panels $n = 5$, $n = 50$, $n = 500$, $n = 5000$]

[Figure 8: relative frequencies of 1000 standard normal draws]

[Figure 9: empirical quantiles $\hat q_n$ over sample size $n = 5, 505, 1005, \dots, 5005$]