Econ 322 Section 2: Econometrics
An Estimator, An Estimate, A Statistic
An Unbiased Estimator, A Consistent Estimator,
A Sampling Distribution and A Central Limit Theorem
Hiroki Tsurumi
January 31 2005
Estimator, Estimate: The definitions of an estimator and an estimate are given
on p.57 of the text. The word statistic is used in the field of statistics to
mean an estimator or an estimate. On the other hand, in econometrics the word
statistic is often used to mean a test statistic such as the t-test statistic,
F-test statistic, or chi-square statistic.
Unbiased Estimator and Consistent Estimator: The definitions are given on p.57
of the text. I will come back to unbiasedness and consistency later.
A sampling distribution: Try to find the definition of a sampling distribution in the text. Does the text give a clear definition?
Now read “Sampling Distribution” on my website. This write-up is
a “cut & paste” from a website which I found by googling the words “
definition sampling distribution.”
Question: Do you understand the difference between a sample distribution and a sampling distribution?
Unbiased Estimator
Definition: Let θ̂ be an estimator of the (population) parameter θ. Then
θ̂ is said to be an unbiased estimator of θ if the expected value of θ̂ equals
θ:
$$E\hat{\theta} = \theta.$$
Examples of Unbiased Estimators:
1. Let x1 , x2 , · · · , xn be a random sample of size n drawn from a distribution having Exi = µ for all i. Then the sample average
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
is an unbiased estimator of µ:
$$E\bar{x} = \mu.$$
Proof: We take the expected value of x̄:
$$
\begin{aligned}
E\bar{x} &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\,E\!\left(\sum_{i=1}^{n} x_i\right)\\
&= \frac{1}{n}\sum_{i=1}^{n} Ex_i \qquad \text{(since the expected value of the sum is the sum of the expected values)}\\
&= \frac{1}{n}\sum_{i=1}^{n} \mu \qquad \text{(since } Ex_i = \mu \text{ for all } i\text{)}\\
&= \frac{1}{n}(\mu + \mu + \cdots + \mu) = \frac{1}{n}\times n\,\mu = \mu.
\end{aligned}
$$
Remark: An unbiased estimator is not unique. For example, µ̃ ≡ (1/2)(x3 + x4) is
also an unbiased estimator of µ. (Prove this.) We need to choose an unbiased
estimator out of infinitely many unbiased estimators. To do so, we need to
introduce a criterion of choice (i.e. a religion). One popular religion
(criterion) is efficiency. Efficiency is defined as follows:
An unbiased estimator x̄ is said to be more efficient than another unbiased
estimator µ̃ if
$$\mathrm{Var}(\bar{x}) < \mathrm{Var}(\tilde{\mu}).$$
Remark: To compare the variances of estimators, we need an assumption that the
second moments exist. So, in addition to the assumption EXi = µ for all
i = 1, · · · , n we need an assumption
$$\mathrm{Var}(X_i) = \sigma_x^2 < \infty \quad \text{for all } i = 1, \cdots, n.$$
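One way to verify both remarks (a brief sketch, not in the original notes, assuming the xi are independent draws with common mean µ and variance σx2):
$$E\tilde{\mu} = \tfrac{1}{2}(Ex_3 + Ex_4) = \mu, \qquad \mathrm{Var}(\tilde{\mu}) = \frac{\sigma_x^2}{2}, \qquad \mathrm{Var}(\bar{x}) = \frac{\sigma_x^2}{n},$$
so µ̃ is indeed unbiased, and for n > 2 the sample average x̄ is the more efficient of the two unbiased estimators.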
2. Let x1 , x2 , · · · , xn be a random sample of size n drawn from the Bernoulli
distribution with Pr(Xi = 1) = p. (The Bernoulli distribution is on pp.19-20 of
the text.) Then the sample mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
is an unbiased estimator of p.
Show that E(x̄) = p.
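For reference (a sketch of the argument, which mirrors the proof above): since each xi takes the value 1 with probability p and 0 with probability 1 − p,
$$Ex_i = 1\cdot p + 0\cdot(1-p) = p, \qquad E\bar{x} = \frac{1}{n}\sum_{i=1}^{n} Ex_i = \frac{1}{n}\,np = p.$$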
Consistency
Consistency is a property of an estimator: it refers to the convergence of a
random variable (the estimator) to a constant.
Definition: An estimator θ̂ is said to be a consistent estimator of θ if the
random variable θ̂ converges in probability to θ:
$$\hat{\theta} \xrightarrow{\;p\;} \theta,$$
where $\xrightarrow{\;p\;}$ denotes convergence in probability.
To understand a consistent estimator, we need to discuss the (weak)
law of large numbers.¹ The weak law of large numbers is discussed on
pp.43–44 and pp.577–580 of the text. The large numbers are in general
averages of random variables. Hence, rather than X1 , X2 , · · · , Xn , · · ·, often
we use S1 , S2 , · · · , Sn , · · · where, for example,
$$S_1 = X_1,\quad S_2 = \frac{1}{2}\sum_{i=1}^{2} X_i,\quad S_3 = \frac{1}{3}\sum_{i=1}^{3} X_i,\quad \cdots,\quad S_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
An estimator is a large number (i.e. it is an average).
Definition: Weak Law of Large Numbers (cf. p.578 of the text) A sequence of
random variables S1 , S2 , · · · , Sn obeys the weak law of large numbers if, for
every δ > 0,
$$\Pr\left[\,|S_n - \mu| \ge \delta\,\right] \longrightarrow 0 \quad \text{as } n \to \infty.$$
We denote this by $S_n \xrightarrow{\;p\;} \mu$. We may use the Chebyshev
inequality to prove the weak law of large numbers, as sketched below.

¹ There are in general weak and strong laws of large numbers. The text presents
the weak law of large numbers without attaching the adjective "weak."
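Here is a brief sketch of that Chebyshev argument (not in the original notes), assuming the Xi are i.i.d. with mean µ and finite variance σ²:
$$\Pr\left[\,|S_n - \mu| \ge \delta\,\right] \le \frac{\mathrm{Var}(S_n)}{\delta^2} = \frac{\sigma^2}{n\,\delta^2} \longrightarrow 0 \quad \text{as } n \to \infty,$$
since $\mathrm{Var}(S_n) = \mathrm{Var}\!\left(\tfrac{1}{n}\sum_{i=1}^{n} X_i\right) = \sigma^2/n$ for independent draws. Hence $S_n \xrightarrow{\;p\;} \mu$.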
Central Limit Theorem
The central limit theorem is discussed on pp.44–49 and pp.580–581 of the text.
The key concept of the central limit theorem is given on p.49 of the text as
Key Concept 2.7 using the sample mean Ȳ . The central limit theorem applies not
only to the sample mean but also to a large number of estimators, including,
for example, an estimator for the standard deviation or for the skewness or
the kurtosis.
Key Concept 2.7 is the central limit theorem for i.i.d. (independently and
identically distributed) random variables. If we rephrase Key Concept 2.7, we
may succinctly put:
$$\frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \xrightarrow{\;d\;} z \sim N(0, 1) \tag{1}$$
where µY = EYi and σ²Ȳ is the variance of the sample mean Ȳ . The notation
$\xrightarrow{\;d\;}$ denotes convergence in distribution. (Convergence in
distribution is defined on p.580 of the text.) Given that the Yi are i.i.d.
random variables, the central limit theorem (1) holds if and only if
(i) EYi = µY , and (ii) Var(Yi ) = σ² < ∞ for all i.
("if and only if" means that (i) and (ii) are the necessary and sufficient
conditions.)
Remarks:
1. What is the difference between the law of large numbers and the central
limit theorem? The law of large numbers is the convergence of a
sequence of random variables to a constant, while the central limit
theorem is the convergence of a sequence of random variables to a
random variable having a normal distribution. Putting it in symbols
we have
Convergence of a sequence of random variables to a constant:
$$\bar{Y} \xrightarrow{\;p\;} \mu.$$
We see that Ȳ is a random variable, whereas µ is a constant.
Convergence of a sequence of random variables to a random variable having a
normal distribution:
$$\frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}} \xrightarrow{\;d\;} z \sim N(0, 1).$$
2. Can you explain why Ȳ converges to a constant, while (Ȳ − µY )/σȲ converges
to a random variable z ∼ N(0, 1)? To answer this question, find Var(Ȳ ) and
Var((Ȳ − µY )/σȲ ). (A short sketch of the answer is given below.)
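A sketch of the answer (not in the original notes, using Var(Yi ) = σ²Y and independence):
$$\mathrm{Var}(\bar{Y}) = \frac{\sigma_Y^2}{n} \longrightarrow 0, \qquad \mathrm{Var}\!\left(\frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}}\right) = \frac{\mathrm{Var}(\bar{Y})}{\sigma_{\bar{Y}}^2} = 1 \quad \text{for every } n.$$
A random variable whose variance shrinks to zero collapses onto its mean, a constant; the standardized mean keeps variance one, so it remains a nondegenerate random variable, and the central limit theorem says its limiting distribution is N(0, 1).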
Example of the Weak Law of Large Numbers (consistency) and the Central Limit Theorem
Let us illustrate the weak law of large numbers and the central limit theorem
using the exponential distribution. The probability density function is given by
$$f(x) = \frac{1}{b}\exp\left(-\frac{x}{b}\right), \qquad x > 0,\; b > 0,$$
where b is the parameter. The moment generating function, M(t), is
$$M(t) = \frac{1}{1 - bt}, \qquad t < \frac{1}{b}.$$
The r-th moment about the origin can be computed from the moment generating function:
$$E(X^r) = \left.\frac{d^r M(t)}{dt^r}\right|_{t=0} = r!\, b^r.$$
The mean and variance are
$$\text{mean: } E(X) = b, \qquad \text{and} \qquad \mathrm{Var}(X) = b^2.$$
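As a quick verification (not spelled out in the original notes), differentiating the moment generating function gives
$$M'(t) = \frac{b}{(1-bt)^2}, \qquad M''(t) = \frac{2b^2}{(1-bt)^3},$$
so E(X) = M'(0) = b, E(X²) = M''(0) = 2b², and hence
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 2b^2 - b^2 = b^2.$$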
[Figure 1: Probability Density Function of Exponential Variate, b = .5 (probability density plotted against x)]
Let us set b=.5. The pdf is given in Figure 1.
Figure 1 Here.
Let us obtain a random sample of size n from the exponential distribution by
the random number generator
y = −b*ln(rndus(n,1,seed)),
where rndus(n,1,seed) draws an n×1 vector of uniform(0,1) random numbers (this
is the form used in the GAUSS program wlln.pro below).
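A brief justification of this generator (not spelled out in the original notes): if U is uniform on (0, 1), then for x > 0,
$$\Pr(-b\ln U > x) = \Pr\!\left(U < e^{-x/b}\right) = e^{-x/b},$$
which is exactly the survival function of the exponential distribution with parameter b, so y = −b ln U is an exponential draw (the inverse-transform method).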
As the estimator of b we use the sample mean
$$\hat{b} = \frac{1}{n}\sum_{i=1}^{n} y_i,$$
and we plot b̂ against n. The sample size and b̂ are tabulated in Table 1 for
some selected sample sizes, and the estimates b̂ are plotted against the sample
sizes in Figure 2.
Table 1: sample size and b̂, exponential distribution

sample size n        b̂                difference
    1             0.00290479          NA
   20             0.52805296          0.52514816
   40             0.51759626         −0.010456696
   60             0.43947914         −0.078117123
   80             0.51216424          0.072685096
  100             0.46183617         −0.050328071
  ...                ...                 ...
 9920             0.50246047         −0.00074344187
 9940             0.49560024         −0.0068602281
 9960             0.51294362          0.017343383
 9980             0.49627085         −0.016672775
10000             0.51144123          0.015170379
You see from Table 1 and Figure 2 that as the sample size increases, the random
variable b̂ converges to the population mean b. However large the sample size
becomes, b̂ never becomes exactly equal to b.
Table 1 and Figure 2 Here
The GAUSS program for Table 1 and Figure 2 is given below:
@=== convergence of a random variable to a constant
exponential distribution
program: wlln.pro ==========================@
new;
library pgraph;
seed=1357;
/*=== pdf of exponential distribution with parameter b ===*/
b=.5;
x=seqa(0,.1,50);
fx=(1/b)*exp(-x/b);
xy(x,fx);
/*===convergence of sample mean to population mean====*/
nn=seqa(0,20,501);
nn[1]=1;
i=1;
m={};
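/* loop over the sample sizes in nn: draw n exponential variates by the
   inverse-transform method and store each sample mean in m */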
do while i <= rows(nn);
n=nn[i];
y=-b*ln(rndus(n,1,seed));
m=m|meanc(y);
i=i+1;
endo;
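/* mdif: successive differences of the sample means (the difference column in Table 1) */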
mdif=m[2:rows(nn)]-m[1:rows(nn)-1];
mdif=0|mdif;
graphset;
xy(nn,m);
end;
In Figure 2.8 on p.48 of the text, the central limit theorem is demonstrated:
the sampling distribution of the standardized sample average
$$\frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}}$$
converges to the standardized normal distribution N(0, 1). In Figure 3 we
present the kernel densities of the standardized sample means for the sample
sizes n = 2, 5, 25 and 100. These samples are random samples from the
exponential distribution with b = .5. The summary statistics of mean, median,
standard deviation, skewness, and kurtosis are presented in Table 2. We see
from Table 2 that, as the sample size increases,
• the medians approach the means.
• standard deviation approaches unity.
• skewness approaches zero.
• kurtosis approaches 3 (mesokurtic).

[Figure 2: Convergence of b̂ to b = .5; the estimates b̂ and their successive differences are plotted against the sample size n]

Table 2: summary statistics for the standardized sample means

sample size      mean        median       std        skewness     kurtosis
n = 2          −0.0161      −0.2301      0.9713      1.2531       4.8666
n = 5          −0.0541      −0.1889      0.9960      1.0176       4.7606
n = 25         −0.0132      −0.0515      0.9999      0.3450       3.3828
n = 100         0.0196      −0.0059      1.0025      0.1700       3.0892
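For reference (standard definitions, not restated in the original notes), the skewness and kurtosis reported in Table 2 are
$$\text{skewness} = \frac{E[(X - \mu)^3]}{\sigma^3}, \qquad \text{kurtosis} = \frac{E[(X - \mu)^4]}{\sigma^4},$$
and for the N(0, 1) distribution the skewness is 0 and the kurtosis is 3, which is why the last two columns of Table 2 approach 0 and 3.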
Table 2 and Figure 3 Here.
A GAUSS program for the demonstration of the central limit theorem
is attached here.
[Figure 3: Convergence of the Standardized Sample Mean to N(0,1), b = .5; kernel density estimates for n = 2, 5, 25, and 100]
@=== generating Figure 2.8 on p.48 of the text as well as showing
convergence of a random variable to a constant
exponential distribution
program: clt.pro ==========================@
new;
library pgraph;
seed=12345;
nn={2, 5, 25, 100};
/*===central limit theorem====*/
nrept=2000;
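/* nrept Monte Carlo replications for each sample size: draw n exponential
   variates, standardize the sample mean by (meanc(y)-b)/(b/sqrt(n)), save it */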
x={};
b=.5;
i=1;
do while i <= rows(nn);
n=nn[i];
xx={};
j=1;
do while j <= nrept;
y=-b*ln(rndus(n,1,seed));
sd=b/sqrt(n);
z=(meanc(y)-b)/sd;
xx=xx|z;
j=j+1;
endo;
x=x~xx;
i=i+1;
endo;
format /m1 /rd 8,4;
skew={};
kurtos={};
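/* sample skewness and kurtosis for each column of x (one column per sample size) */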
i=1;
do while i <=cols(x);
x0=x[.,i];
sd=stdc(x0);
s3=meanc((x0-meanc(x0))^3)/sd^3;
s4=meanc((x0-meanc(x0))^4)/sd^4;
skew=skew|s3;
kurtos=kurtos|s4;
i=i+1;
endo;
result=meanc(x)~median(x)~stdc(x)~skew~kurtos;
print "mean, median, std, skew, kurtosis ";
print result;
{x1,den1}=kden(x[.,1]);
{x2,den2}=kden(x[.,2]);
{x3,den3}=kden(x[.,3]);
{x4,den4}=kden(x[.,4]);
/*===plotting kernel density estimates====*/
xy(x1,den1);
xy(x2,den2);
xy(x3,den3);
xy(x4,den4);
xy(x1~x2~x3~x4,den1~den2~den3~den4);
{c1,m1,freq1}=hist(x[.,1],40);
{c2,m2,freq2}=hist(x[.,2],40);
{c3,m3,freq3}=hist(x[.,3],40);
{c4,m4,freq4}=hist(x[.,4],40);
end;
/*=== kernel density estimation ===*/
proc(2)=kden(v);
local g,h,j,nn,res;
nn=rows(v);
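/* rule-of-thumb (Silverman) bandwidth: h = 1.06*std(v)*n^(-1/5) */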
h=1.06*stdc(v)/nn^.2;
g=0;
j=1;
do while j <= nn;
g=g|meanc(pdfn((v-v[j])/h))/h;
j=j+1;
endo;
res=sortc(v~g[2:nn+1],1);
retp(res[.,1],res[.,2]);
endp;