Econ 322 Section 2: Econometrics
An Estimator, An Estimate, A Statistic; An Unbiased Estimator, A Consistent Estimator, A Sampling Distribution, and the Central Limit Theorem
Hiroki Tsurumi
January 31, 2005

Estimator, Estimate: The definitions of an estimator and an estimate are given on p.57 of the text. In the field of statistics, the word "statistic" means an estimator or an estimate. In econometrics, on the other hand, "statistic" usually refers to a test statistic such as the t-test statistic, the F-test statistic, or the chi-square statistic.

Unbiased Estimator and Consistent Estimator: The definitions are given on p.57 of the text. I will come back to unbiasedness and consistency later.

A Sampling Distribution: Try to find the definition of a sampling distribution in the text. Does the text give a clear definition? Now read "Sampling Distribution" on my website. That write-up is a "cut & paste" from a website I found by googling the words "definition sampling distribution."

Question: Do you understand the difference between a sample distribution and a sampling distribution?

Unbiased Estimator

Definition: Let θ̂ be an estimator of the (population) parameter θ. Then θ̂ is said to be an unbiased estimator of θ if the expected value of θ̂ equals θ:

    E(θ̂) = θ.

Examples of Unbiased Estimators:

1. Let x_1, x_2, …, x_n be a random sample of size n drawn from a distribution having E(x_i) = µ for all i. Then the sample average

    x̄ = (1/n) Σ_{i=1}^n x_i

is an unbiased estimator of µ: E(x̄) = µ.

Proof: We take the expected value of x̄:

    E(x̄) = E( (1/n) Σ_{i=1}^n x_i )
          = (1/n) E( Σ_{i=1}^n x_i )
          = (1/n) Σ_{i=1}^n E(x_i)    (since the expected value of the sum is the sum of the expected values)
          = (1/n) Σ_{i=1}^n µ         (since E(x_i) = µ for all i)
          = (1/n)(µ + µ + ⋯ + µ) = (1/n) × nµ = µ.

Remark: An unbiased estimator is not unique. For example, µ̃ ≡ (1/2)(x_3 + x_4) is also an unbiased estimator of µ. (Prove this.) We need to choose an unbiased estimator out of infinitely many unbiased estimators.
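The programs in these notes are written in GAUSS; as a quick sanity check of the unbiasedness claims above, here is a small Monte Carlo sketch in Python. The uniform population, the value µ = 3.0, and the sample size are illustrative assumptions, not from the notes.

```python
import random

random.seed(42)
mu, n, nrept = 3.0, 10, 20000   # hypothetical population mean, sample size, replications

xbar_vals, mutilde_vals = [], []
for _ in range(nrept):
    # a random sample from Uniform(0, 2*mu), which has E(x_i) = mu
    x = [random.uniform(0.0, 2.0 * mu) for _ in range(n)]
    xbar_vals.append(sum(x) / n)               # the sample average x-bar
    mutilde_vals.append(0.5 * (x[2] + x[3]))   # mu-tilde = (x_3 + x_4)/2

# both Monte Carlo averages settle near mu, illustrating that
# E(x-bar) = E(mu-tilde) = mu, i.e. both estimators are unbiased
print(sum(xbar_vals) / nrept, sum(mutilde_vals) / nrept)
```

Averaged over many replications, both estimators center on µ, even though they behave very differently sample by sample.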
To do so, we need to introduce a criterion of choice (i.e. a religion). One popular religion (criterion) is efficiency, which is defined as follows: an unbiased estimator x̄ is said to be more efficient than another unbiased estimator µ̃ if Var(x̄) < Var(µ̃).

Remark: To compare the variances of estimators, we need the assumption that the second moments exist. So, in addition to the assumption E(X_i) = µ for all i = 1, …, n, we need the assumption Var(X_i) = σ_x² < ∞ for all i = 1, …, n.

2. Let x_1, x_2, …, x_n be a random sample of size n drawn from the Bernoulli distribution with Pr(X_i = 1) = p. (The Bernoulli distribution is on pp.19–20 of the text.) Then the sample mean

    x̄ = (1/n) Σ_{i=1}^n x_i

is an unbiased estimator of p. Show that E(x̄) = p.

Consistency

Consistency is a property of convergence of a random variable to a constant.

Definition: An estimator θ̂ is said to be a consistent estimator of θ if the random variable θ̂ converges in probability to θ:

    θ̂ →p θ,

where →p denotes convergence in probability.

To understand a consistent estimator, we need to discuss the (weak) law of large numbers.¹ The weak law of large numbers is discussed on pp.43–44 and pp.577–580 of the text. The "large numbers" are in general averages of random variables. Hence, rather than X_1, X_2, …, X_n, …, we often use S_1, S_2, …, S_n, …, where, for example,

    S_1 = X_1,  S_2 = (1/2) Σ_{i=1}^2 X_i,  S_3 = (1/3) Σ_{i=1}^3 X_i,  …,  S_n = (1/n) Σ_{i=1}^n X_i.

An estimator is a "large number" (i.e. it is an average).

Definition: Weak Law of Large Numbers (cf. p.578 of the text). A sequence of random variables S_1, S_2, …, S_n obeys the weak law of large numbers if, for every δ > 0,

    Pr[|S_n − µ| ≥ δ] → 0   as n → ∞.

We denote this by S_n →p µ. We may use the Chebyshev inequality to prove the weak law of large numbers.

¹ There are in general weak and strong laws of large numbers. The text presents the weak law of large numbers without attaching the adjective "weak."
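The weak law of large numbers can be seen directly by simulation. The sketch below, in Python rather than GAUSS, estimates the tail probability Pr[|S_n − p| ≥ δ] for the Bernoulli example above; the values of p, δ, the sample sizes, and the replication count are illustrative assumptions.

```python
import random

random.seed(1)
p, delta, nrept = 0.3, 0.05, 4000   # illustrative Bernoulli p, tolerance, replications

def tail_prob(n):
    """Monte Carlo estimate of Pr[|S_n - p| >= delta] for Bernoulli(p) sample means."""
    hits = 0
    for _ in range(nrept):
        s_n = sum(random.random() < p for _ in range(n)) / n   # sample mean of n Bernoulli draws
        hits += abs(s_n - p) >= delta
    return hits / nrept

probs = [tail_prob(n) for n in (20, 100, 500)]
print(probs)   # the tail probability shrinks toward 0 as n grows
```

The estimated probabilities fall monotonically toward zero as n increases, which is exactly the statement Pr[|S_n − µ| ≥ δ] → 0.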
Central Limit Theorem

The central limit theorem is discussed on pp.44–49 and pp.580–581 of the text. The key concept of the central limit theorem is given on p.49 of the text as Key Concept 2.7, using the sample mean Ȳ. The central limit theorem applies not only to the sample mean but also to a large number of estimators, including, for example, estimators of the standard deviation, the skewness, or the kurtosis. Key Concept 2.7 is the central limit theorem for i.i.d. (independently and identically distributed) random variables. If we rephrase Key Concept 2.7, we may succinctly put:

    (Ȳ − µ_Y)/σ_Ȳ →d z ∼ N(0, 1)                                  (1)

where µ_Y = E(Y_i), and σ_Ȳ² is the variance of the sample mean Ȳ. The notation →d denotes convergence in distribution. (Convergence in distribution is defined on p.580 of the text.) Given that the Y_i are i.i.d. random variables, the central limit theorem (1) holds if and only if (i) E(Y_i) = µ_Y exists and (ii) Var(Y_i) = σ² < ∞ for all i. ("If and only if" means that (i) and (ii) are the necessary and sufficient conditions.)

Remarks:

1. What is the difference between the law of large numbers and the central limit theorem? The law of large numbers is the convergence of a sequence of random variables to a constant, while the central limit theorem is the convergence of a sequence of random variables to a random variable having a normal distribution. Putting it in symbols, we have:

Convergence of a sequence of random variables to a constant:

    Ȳ →p µ.

We see that Ȳ is a random variable, whereas µ is a constant.

Convergence of a sequence of random variables to a random variable having a normal distribution:

    (Ȳ − µ_Y)/σ_Ȳ →d z ∼ N(0, 1).

2. Can you explain why Ȳ converges to a constant, while (Ȳ − µ_Y)/σ_Ȳ converges to a random variable z ∼ N(0, 1)? To answer this question, find Var(Ȳ) and Var((Ȳ − µ_Y)/σ_Ȳ).

Example of the Weak Law of Large Numbers (Consistency) and the Central Limit Theorem

Let us illustrate the weak law of large numbers and the central limit theorem using the exponential distribution.
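The question in Remark 2 can be answered numerically. The Python sketch below (the sample sizes and replication count are illustrative choices) draws exponential samples with mean b and compares Var(Ȳ), which shrinks like b²/n, with the variance of the standardized mean, which stays near 1:

```python
import math, random, statistics

random.seed(7)
b, nrept = 0.5, 4000   # exponential with mean b and variance b^2, as in the notes

def sample_means(n):
    """nrept sample means of n exponential draws with mean b."""
    return [sum(random.expovariate(1.0 / b) for _ in range(n)) / n for _ in range(nrept)]

for n in (5, 50, 500):
    ybars = sample_means(n)
    sigma_ybar = b / math.sqrt(n)
    z = [(y - b) / sigma_ybar for y in ybars]
    # Var(Ybar) = b^2/n shrinks toward 0, so Ybar collapses onto the constant mu;
    # the standardized variable keeps variance ~1, so it remains a non-degenerate r.v.
    print(n, statistics.pvariance(ybars), statistics.pvariance(z))
```

Because Var(Ȳ) → 0, Ȳ degenerates to the constant µ; dividing by σ_Ȳ rescales it so the limit is a genuine random variable, and the CLT says that limit is N(0, 1).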
The probability density function is given by

    f(x) = (1/b) exp(−x/b),   x > 0, b > 0,

where b is the parameter. The moment generating function, M(t), is

    M(t) = 1/(1 − bt),   t < 1/b.

The r-th moment about the origin can be computed from the moment generating function:

    E(X^r) = [dʳM(t)/dtʳ]_{t=0} = r! bʳ.

The mean and variance are

    E(X) = b   and   Var(X) = b².

[Figure 1: Probability Density Function of Exponential Variate, b = .5]

Let us set b = .5. The pdf is given in Figure 1. Let us obtain a random sample of size n from the exponential distribution by the inverse-CDF method, y = −b ln(u) with u uniform on (0, 1), implemented by the random number generator

    y = -b*ln(rndus(n,1,seed));

and as the estimator of b we use the sample mean

    b̂ = (1/n) Σ_{i=1}^n y_i,

and we plot n against b̂. The sample size and b̂ are tabulated in Table 1 for some selected sample sizes, and the estimates b̂ are plotted against the sample sizes in Figure 2.

Table 1: sample size and b̂, exponential distribution

    sample size n        b̂               difference
    1                0.00290479       NA
    20               0.52805296        0.52514816
    40               0.51759626       −0.010456696
    60               0.43947914       −0.078117123
    80               0.51216424        0.072685096
    100              0.46183617       −0.050328071
    ...              ...               ...
    9920             0.50246047       −0.00074344187
    9940             0.49560024       −0.0068602281
    9960             0.51294362        0.017343383
    9980             0.49627085       −0.016672775
    10000            0.51144123        0.015170379

You see from Table 1 and Figure 2 that as the sample size increases, the random variable b̂ converges to the population mean b. However large the sample size becomes, b̂ never becomes exactly b.
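The same inverse-CDF experiment is easy to replicate in Python (a sketch, not the GAUSS program; the sample sizes below are illustrative):

```python
import math, random

random.seed(1357)
b = 0.5   # the parameter used in the notes

def bhat(n):
    """Sample mean of n exponential draws generated by the inverse CDF: y = -b*ln(u)."""
    # 1 - random() lies in (0, 1], so the logarithm is always defined
    return sum(-b * math.log(1.0 - random.random()) for _ in range(n)) / n

for n in (20, 1000, 100000):
    print(n, bhat(n))   # the estimates settle near b = 0.5 as n grows, but never equal it
```

This mirrors Table 1: the fluctuation of b̂ around b = .5 shrinks like b/√n, yet b̂ remains a random variable for every finite n.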
The GAUSS program for Table 1 and Figure 2 is given below:

@=== convergence of a random variable to a constant
     exponential distribution
     program: wlln.pro ===@
new;
library pgraph;
seed=1357;

/*=== pdf of exponential distribution with parameter b ===*/
b=.5;
x=seqa(0,.1,50);
fx=(1/b)*exp(-x/b);
xy(x,fx);

/*=== convergence of sample mean to population mean ===*/
nn=seqa(0,20,501);
nn[1]=1;
i=1;
m={};
do while i <= rows(nn);
    n=nn[i];
    y=-b*ln(rndus(n,1,seed));
    m=m|meanc(y);
    i=i+1;
endo;
mdif=m[2:rows(nn)]-m[1:rows(nn)-1];
mdif=0|mdif;
graphset;
xy(nn,m);
end;

[Figure 2: Convergence of b̂ to b = .5]

In Figure 2.8 on p.48 of the text, the central limit theorem is demonstrated: the sampling distribution of the standardized sample average

    (Ȳ − µ)/σ_Ȳ

converges to the standard normal distribution N(0, 1). In Figure 3 we present the kernel densities of the standardized sample means for sample sizes of n = 2, 5, 25, and 100. These are random samples from the exponential distribution with b = .5. The summary statistics of mean, median, standard deviation, skewness, and kurtosis are presented in Table 2.

Table 2: summary statistics for the standardized sample means

                 n = 2      n = 5      n = 25     n = 100
    mean        −0.0161    −0.0541    −0.0132     0.0196
    median      −0.2301    −0.1889    −0.0515    −0.0059
    std          0.9713     0.9960     0.9999     1.0025
    skewness     1.2531     1.0176     0.3450     0.1700
    kurtosis     4.8666     4.7606     3.3828     3.0892

We see from Table 2 that, as the sample size increases,

• the medians approach the means;
• the standard deviations approach unity;
• the skewness approaches zero;
• the kurtosis approaches 3 (mesokurtic).

A GAUSS program for the demonstration of the central limit theorem is attached below.
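Table 2 can be reproduced, in miniature, with the following Python sketch. It uses the same moment formulas and the same b and number of replications as the GAUSS program; the two sample sizes shown are a subset chosen for brevity.

```python
import math, random

random.seed(12345)
b, nrept = 0.5, 2000   # same parameter and number of replications as the GAUSS program

def moments(z):
    """Mean, std, skewness, kurtosis using the same moment formulas as the GAUSS code."""
    m = sum(z) / len(z)
    sd = math.sqrt(sum((v - m) ** 2 for v in z) / len(z))
    s3 = sum((v - m) ** 3 for v in z) / len(z) / sd ** 3
    s4 = sum((v - m) ** 4 for v in z) / len(z) / sd ** 4
    return m, sd, s3, s4

def standardized_means(n):
    """nrept draws of (Ybar - b)/(b/sqrt(n)) from the exponential with parameter b."""
    sigma = b / math.sqrt(n)
    return [(sum(-b * math.log(1.0 - random.random()) for _ in range(n)) / n - b) / sigma
            for _ in range(nrept)]

results = {n: moments(standardized_means(n)) for n in (2, 100)}
for n, (m, sd, s3, s4) in results.items():
    print(n, sd, s3, s4)   # skewness moves toward 0 and kurtosis toward 3 as n grows
```

As in Table 2, the n = 2 distribution is strongly right-skewed and leptokurtic, while by n = 100 the moments are close to the N(0, 1) values of 0 and 3.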
[Figure 3: Convergence of Standardized Sample Mean to N(0, 1), b = .5 (panels: n = 2, 5, 25, 100)]

@=== generating Figure 2.8 on p.48 of the text as well as
     showing convergence of the standardized sample mean to N(0,1)
     exponential distribution
     program: clt.pro ===@
new;
library pgraph;
seed=12345;
nn={2, 5, 25, 100};

/*=== central limit theorem ===*/
nrept=2000;
x={};
b=.5;
i=1;
do while i <= rows(nn);
    n=nn[i];
    xx={};
    j=1;
    do while j <= nrept;
        y=-b*ln(rndus(n,1,seed));
        sd=b/sqrt(n);
        z=(meanc(y)-b)/sd;
        xx=xx|z;
        j=j+1;
    endo;
    x=x~xx;
    i=i+1;
endo;

format /m1 /rd 8,4;
skew={};
kurtos={};
i=1;
do while i <= cols(x);
    x0=x[.,i];
    sd=stdc(x0);
    s3=meanc((x0-meanc(x0))^3)/sd^3;
    s4=meanc((x0-meanc(x0))^4)/sd^4;
    skew=skew|s3;
    kurtos=kurtos|s4;
    i=i+1;
endo;
result=meanc(x)~median(x)~stdc(x)~skew~kurtos;
print "mean, median, std, skew, kurtosis ";
print result;

{x1,den1}=kden(x[.,1]);
{x2,den2}=kden(x[.,2]);
{x3,den3}=kden(x[.,3]);
{x4,den4}=kden(x[.,4]);

/*=== plotting kernel density estimates ===*/
xy(x1,den1);
xy(x2,den2);
xy(x3,den3);
xy(x4,den4);
xy(x1~x2~x3~x4,den1~den2~den3~den4);
{c1,m1,freq1}=hist(x[.,1],40);
{c2,m2,freq2}=hist(x[.,2],40);
{c3,m3,freq3}=hist(x[.,3],40);
{c4,m4,freq4}=hist(x[.,4],40);
end;

/*=== kernel density estimation ===*/
proc(2)=kden(v);
    local g,h,j,nn,res;
    nn=rows(v);
    h=1.06*stdc(v)/nn^.2;
    g=0;
    j=1;
    do while j <= nn;
        g=g|meanc(pdfn((v-v[j])/h))/h;
        j=j+1;
    endo;
    res=sortc(v~g[2:nn+1],1);
    retp(res[.,1],res[.,2]);
endp;
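For readers without GAUSS, the kden procedure can be ported to Python as follows. This sketch keeps the same Gaussian kernel and the same bandwidth rule, h = 1.06·sd·n^(−1/5); the simulated N(0, 1) data and the evaluation grid are illustrative choices.

```python
import math, random

def kden(v, grid):
    """Gaussian kernel density estimate with bandwidth h = 1.06*sd*n^(-1/5),
    the same rule used in the GAUSS kden procedure."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in v) / (n - 1))
    h = 1.06 * sd / n ** 0.2
    phi = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    # average of Gaussian kernels centered at the data points, divided by h
    return [sum(phi((g - x) / h) for x in v) / (n * h) for g in grid]

random.seed(3)
v = [random.gauss(0.0, 1.0) for _ in range(500)]   # simulated N(0,1) data
grid = [i * 0.1 - 4.0 for i in range(81)]          # evaluation points from -4 to 4
den = kden(v, grid)
area = sum(den) * 0.1   # Riemann sum: a density estimate should integrate to ~1
print(area)
```

The estimated density integrates to approximately one, a quick check that the kernel weights and the 1/(nh) normalization are consistent.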