Lecture 4: Properties of Ordinary Least Squares
Regression Coefficients
What we know now
How to obtain estimates by OLS
$\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$

$\hat{b}_1 = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$
Goodness of fit measure, R2:

$R^2 = \dfrac{ESS}{TSS} = \dfrac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$
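As a quick numerical check, here is a minimal Python sketch of these formulas; the x and y values are made-up illustrative data, not from the lecture. Note that np.cov(..., bias=True) and np.var use the same 1/N definitions as the formulas above:

```python
import numpy as np

# Made-up illustrative data (an assumption, not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope and intercept: b1 = Cov(X,Y)/Var(X), b0 = mean(Y) - b1*mean(X)
b1_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0_hat = y.mean() - b1_hat * x.mean()

# R^2 = ESS/TSS: explained sum of squares over total sum of squares
y_hat = b0_hat + b1_hat * x
ess = np.sum((y_hat - y.mean()) ** 2)
tss = np.sum((y - y.mean()) ** 2)
print(b0_hat, b1_hat, ess / tss)
```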
In addition to the overall fit of the model, we now need to ask how
accurate each individual estimated OLS coefficient is
• Bias & efficiency of OLS
• Hypothesis testing – standard errors, t values
To do this we need to make some assumptions about the
behaviour of the (true) residual term that underlies our
view of the world
These are based on a set of beliefs called the Gauss-Markov
assumptions
→ Gauss-Markov Theorem
which underpins the usefulness of OLS as an estimation
technique
Why make assumptions about the residuals and not the
betas?
- The behaviour/properties of the betas are derived from
the assumptions we make about the residuals
Gauss-Markov Theorem
(attributed to Carl Friedrich Gauss, 1777–1855,
& Andrey Markov, 1856–1922)
Actual versus Estimated Residuals
Remember the true state of the world

$Y = b_0 + b_1 X + u$

is never observed, only an estimate of it:

$\hat{y} = \hat{b}_0 + \hat{b}_1 X + \hat{u}$

This means that unlike the estimated residual $\hat{u}$, the true residual $u$ is never observed
So all we can ever do is make some assumptions about the behaviour of u,
and the 1st assumption is that
1. E(ui) = 0
The expected (average or mean) value of the true residual is
assumed to be zero
(NOT proved to be equal to zero, unlike the OLS residual)
- sometimes positive, sometimes negative, but there is never any
systematic behaviour in this random variable, so that on average its
value is zero
The 2nd assumption about the unknown true residuals is that
2. Var(ui | Xi) = σ2u = constant
ie the spread of residual values is constant for all X values in the data
set
(homoskedasticity)
- think of a value of the X variable and look at the different values of the
residual at this value of X. The distribution of these residual values
around this point should be no different than the distribution of
residual values at any other value of X
- this is a useful assumption since it implies that no particular value of X
carries any more information about the behaviour of Y than any
other
3. Cov(ui, uj) = 0 for all i ≠ j
(no autocorrelation)
there should be no systematic association between values of the
residuals so knowledge of the value of one residual imparts no
information about the value of any other residual – residual values
should be independent of one another
4. Cov(X, ui) = 0
- there is zero covariance (association) between the residual and any
value of X – ie X and the residual are independent – so the level of
X says nothing about the level of u and vice versa
- This means that we can distinguish the individual contributions of X
and u in explaining Y
(Note that this assumption is automatically satisfied if X is
non-stochastic, ie non-random, so it can be treated like a constant
measured with certainty)
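To make the four assumptions concrete, here is a small simulation sketch in which they hold by construction; the parameter values and the choice of normal errors are illustrative assumptions, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, b0, b1, sigma_u = 200, 1.0, 0.5, 2.0   # illustrative values (assumptions)

x = np.linspace(0, 10, N)              # non-stochastic X, so assumption 4 holds automatically
u = rng.normal(0.0, sigma_u, size=N)   # mean zero, constant variance, independent draws (assumptions 1-3)
y = b0 + b1 * x + u                    # in real data u is never observed; we see it here only because we simulated it
```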
Given these 4 assumptions we can proceed to establish the properties
of OLS estimates
The 1st desirable feature of any estimate of any coefficient is that it
should, on average, be as accurate an estimate of the true
coefficient as possible.
Accuracy in this context is measured by the "bias"
This means that we would like the expected, or average, value of the
estimator to equal the true (unknown) value for the population of
interest

$E(\hat{\beta}) = \beta$
ie if we continually re-sampled and re-estimated the same model and
plotted the distribution of estimates, then we would expect the mean
value of these estimates to equal the true value (which would only
be obtained if we sampled everyone in the relevant population)
In this case the estimator is said to be "unbiased"
Are OLS estimates biased?
Given a model Y=b0+b1X+u
We now know the OLS estimate of the slope is calculated as
$\hat{b}_1 = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$

Sub. in for Y = b0 + b1X + u:

$\hat{b}_1 = \dfrac{\mathrm{Cov}(X, b_0 + b_1 X + u)}{\mathrm{Var}(X)}$
and using the rules of covariance (from problem set 0) we can write the
numerator as
Cov(X, b0 + b1X + u) = Cov(X, b0) + Cov(X, b1X) + Cov(X, u)
Consider the 1st term, Cov(X, b0)
– since b0 is a constant it cannot co-vary with anything,
so Cov(X, b0) = 0
So Cov(X,Y) = 0 + Cov(X, b1X) + Cov(X, u)
Consider the 2nd term, Cov(X, b1X)
– since b1 is a constant we can take it outside the bracket,
so Cov(X, b1X) = b1Cov(X, X) = b1Var(X)
(see problem set 0 – Cov(X, X) is just another way to write Var(X))
Hence
Cov(X, Y) = 0 + b1Var(X) + Cov(X, u)
Sub. this into the OLS slope formula

$\hat{b}_1 = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \dfrac{b_1 \mathrm{Var}(X) + \mathrm{Cov}(X, u)}{\mathrm{Var}(X)}$

Using the rules for fractions

$\hat{b}_1 = \dfrac{b_1 \mathrm{Var}(X)}{\mathrm{Var}(X)} + \dfrac{\mathrm{Cov}(X, u)}{\mathrm{Var}(X)}$

Since Cov(X, u) is assumed = 0 (one of the Gauss-Markov assumptions), this implies that

$\hat{b}_1 = \dfrac{b_1 \mathrm{Var}(X)}{\mathrm{Var}(X)} = b_1$
Now we need expected values to establish the extent of any bias.
It follows that, taking expectations,

$E(\hat{b}_1) = E(b_1)$

and since the term on the right-hand side is a constant,

$E(\hat{b}_1) = b_1$

so that, on average, the OLS estimate of the slope will be equal to the true
(unknown) value
ie OLS estimates are unbiased
So we don't need to sample the entire population, since OLS on a sub-sample will
give an unbiased estimate of the truth
(Can show the unbiased property also holds for the OLS estimate of the constant – see
problem set 2)
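A simple way to see the unbiasedness result is a Monte Carlo sketch: repeatedly draw a new sample from the same model, re-estimate the slope each time, and average the estimates. The parameter values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, b0, b1, sigma_u, reps = 50, 1.0, 0.5, 2.0, 10_000

x = np.linspace(0, 10, N)                 # fixed (non-stochastic) regressor
estimates = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, sigma_u, size=N)  # each fresh residual draw is a new random sample
    y = b0 + b1 * x + u
    estimates[r] = np.cov(x, y, bias=True)[0, 1] / np.var(x)

print(estimates.mean())                   # close to the true b1 = 0.5
```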
Precision of OLS Estimates
• So this result means that if OLS were done on 100 different (random)
samples we would not expect to get the same result every time – but the
average of those estimates would equal the true value
• Given 2 (unbiased) estimates, we will prefer the one whose range of
estimates is more concentrated around the true value
• We measure the efficiency of any estimate by its dispersion – based on the
variance (or more usually its square root
– the standard error)
• Can show (see Gujarati Chap. 3 for proof) that the variances of the OLS
estimates of the slope and the intercept are

$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2_u}{N \cdot \mathrm{Var}(X)} \qquad \mathrm{Var}(\hat{\beta}_0) = \dfrac{\sigma^2_u}{N}\left(1 + \dfrac{\bar{X}^2}{\mathrm{Var}(X)}\right)$
(where σ2u = Var(u) = variance of true (not estimated) residuals)
$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sigma^2_u}{N \cdot \mathrm{Var}(X)}$
This formula makes intuitive sense (see the numeric sketch after this list) since:
1) the variance of the OLS estimate of the slope is
proportional to the variance of the residuals, σ2u
– the more random, unexplained behaviour there is
in the population, the less precise the estimates
2) the larger the sample size, N, the lower (the more
efficient) the variance of the OLS estimate
– more information means the estimates are likely to be
more precise
3) the larger the variance in the X variable, the more precise
(efficient) the OLS estimates
– the more variation in X, the more likely it is to
capture any variation in the Y variable
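Plugging illustrative numbers into the formula makes points 2) and 3) concrete (the value of σ2u below is assumed purely for illustration):

```python
sigma_u2 = 4.0   # Var(u), treated as known here purely for illustration

def var_b1(n, var_x):
    # Var(b1_hat) = sigma_u^2 / (N * Var(X)), from the formula above
    return sigma_u2 / (n * var_x)

print(var_b1(50, 2.0))    # 0.04
print(var_b1(500, 2.0))   # 0.004 - ten times the sample, a tenth the variance
print(var_b1(50, 20.0))   # 0.004 - ten times the spread in X has the same effect
```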
Estimating σ2u
In practice we never know the variation of the true residuals used in the formula
Can show, however, that an unbiased estimate of σ2u is given by

$s^2 = \dfrac{N}{N-k}\,\widehat{\mathrm{Var}}(\hat{u})$

which, since

$\widehat{\mathrm{Var}}(\hat{u}) = \dfrac{1}{N}\sum_{i=1}^{N} \hat{u}_i^2$

means

$s^2 = \dfrac{\sum_{i=1}^{N} \hat{u}_i^2}{N-k}$

or equivalently

$s^2 = \dfrac{RSS}{N-k}$

(*** learn this equation)
Substituting

$s^2 = \dfrac{RSS}{N-k} = \dfrac{\sum_{i=1}^{N} \hat{u}_i^2}{N-k}$

into the variance equations above gives the working formulas needed to calculate the
precision of the OLS estimates:

$\mathrm{Var}(\hat{\beta}_1) = \dfrac{s^2}{N \cdot \mathrm{Var}(X)} \qquad \mathrm{Var}(\hat{\beta}_0) = \dfrac{s^2}{N}\left(1 + \dfrac{\bar{X}^2}{\mathrm{Var}(X)}\right)$
At the same time it is usual to work with the square roots to give the standard errors of the
estimates
(standard deviation refers to the known variance, standard error refers to the
estimated variance)

$s.e.(\hat{\beta}_1) = \sqrt{\dfrac{s^2}{N \cdot \mathrm{Var}(X)}} \qquad s.e.(\hat{\beta}_0) = \sqrt{\dfrac{s^2}{N}\left(1 + \dfrac{\bar{X}^2}{\mathrm{Var}(X)}\right)}$

(learn this)
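Putting the pieces together, here is a minimal sketch of these working formulas, reusing the made-up x, y data from the earlier sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # same illustrative data as before
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
N, k = len(x), 2                          # k = 2 estimated coefficients (intercept + slope)

b1_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0_hat = y.mean() - b1_hat * x.mean()

u_hat = y - (b0_hat + b1_hat * x)         # estimated residuals
s2 = np.sum(u_hat ** 2) / (N - k)         # s^2 = RSS / (N - k)

se_b1 = np.sqrt(s2 / (N * np.var(x)))
se_b0 = np.sqrt(s2 / N * (1 + x.mean() ** 2 / np.var(x)))
print(se_b0, se_b1)
```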
Gauss-Markov Theorem
IF Gauss-Markov assumptions 1-4 hold
1. E(ui) = 0
2. Var(ui | Xi) = σ2u = constant
3. Cov(ui, uj) = 0 for all i ≠ j
4. Cov(X, ui) = 0
then we can prove that OLS estimates are unbiased
and will have the smallest variance of all (linear)
unbiased estimators
- there may be other ways of obtaining unbiased estimates,
but OLS estimates will have the smallest standard errors
of any linear unbiased estimation technique, regardless of
the sample size - so OLS is a good thing
Hypothesis Testing
We now know how to estimate and interpret OLS regression
coefficients
We know also that OLS gives us unbiased and efficient (smallest
variance) estimates
But just because a variable has a large coefficient does not
necessarily mean its contribution to the model is significant. This
means we need to understand the ideas behind standard errors,
the t value and how to use t values in applied work
Hypothesis Testing
If we wish to make inferences about how close an estimated value is
to a hypothesised value, or even to say whether the influence of
a variable is not simply the result of statistical chance, then we need
to make one additional assumption about the behaviour of the
(true, unobserved) residuals in the model
We know already that
ui ~ (0, σ2u)
- true residuals assumed to have a mean of zero and variance σ2u
Now assume additionally that the residuals follow a Normal
distribution
ui ~ N(0, σ2u)
(Since the residuals capture the influence of many unobserved (random)
variables, we can appeal to the Central Limit Theorem, which says that the
sum of a large number of random variables will be approximately normally
distributed)
If a variable is normally distributed we know that it is
symmetric and centred on its mean, and that:
68% of values lie within the mean ± 1 standard deviation
95% of values lie within the mean ± 1.96 standard deviations
99% of values lie within the mean ± 2.58 standard deviations
and if u is normal, then it is easy to show that the OLS coefficients
(which are a linear function of u) are also normally distributed
with the means and variances that we derived earlier. So

$\hat{\beta}_0 \sim N(\beta_0, \mathrm{Var}(\hat{\beta}_0))$ and $\hat{\beta}_1 \sim N(\beta_1, \mathrm{Var}(\hat{\beta}_1))$
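A simulation sketch can illustrate this: the spread of repeatedly re-estimated slopes should match the theoretical standard deviation, and about 95% of them should lie within 1.96 standard deviations of the true b1 (parameter values are again illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N, b0, b1, sigma_u, reps = 50, 1.0, 0.5, 2.0, 10_000

x = np.linspace(0, 10, N)
est = np.empty(reps)
for r in range(reps):
    y = b0 + b1 * x + rng.normal(0.0, sigma_u, size=N)
    est[r] = np.cov(x, y, bias=True)[0, 1] / np.var(x)

theory_sd = np.sqrt(sigma_u**2 / (N * np.var(x)))      # sd from Var(b1_hat) = sigma_u^2/(N*Var(X))
coverage = np.mean(np.abs(est - b1) < 1.96 * theory_sd)
print(est.std(), theory_sd, coverage)                  # coverage should be close to 0.95
```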