6. Simple Regression and
OLS Estimation
Chapter 6 will expand on concepts introduced in
Chapter 5 to cover the following:
1) Estimating parameters using Ordinary Least
Squares (OLS) Estimation
2) Hypothesis tests of OLS coefficients
3) Confidence intervals of OLS coefficients
4) Excel Regressions
6. Regression & OLS
6.1 The OLS Estimator and its Properties
6.2 OLS Estimators and Goodness of Fit
6.3 Confidence Intervals for Simple Regression Models
6.4 Hypothesis Testing in a Simple Regression Context
6.6 Examples of Simple Regression Models
6.7 Conclusion
6.1 The OLS Estimator and its Properties
We’ve seen the true economic relationship:

Yi = β1 + β2Xi + єi

- where єi, and therefore Yi, are random and the other terms are non-random

When this relationship is unknown, we’ve seen how to estimate the relationship:

Yhat_i = β1hat + β2hat·Xi

using:

β2hat = ∑(Xi-Xbar)(Yi-Ybar) / ∑(Xi-Xbar)² = Cov(X, Y) / Sx²

β1hat = Ybar - β2hat·Xbar

(sums run from i = 1 to N)
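To make these formulas concrete, here is a minimal Python sketch (numpy assumed available; the helper name `ols` and the variable names are ours, not from the chapter):

```python
import numpy as np

def ols(x, y):
    """Simple-regression OLS: returns (b1hat, b2hat)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    b2hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b1hat = ybar - b2hat * xbar
    return b1hat, b2hat

# The price/quantity data used in the examples below:
b1hat, b2hat = ols([4, 3, 3, 6], [10, 15, 20, 15])
print(b1hat, b2hat)  # roughly 18.33 and -0.83
```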
6.1 Properties of the OLS Estimator
There exist a variety of methods to estimate the coefficients of our model (β1 and β2).

Why use Ordinary Least Squares (OLS)?
1) OLS minimizes the sum of squared errors, creating a line that fits best with the observations.
2) With certain assumptions, OLS exhibits beneficial statistical properties. In particular, OLS is BLUE.
6.1 The OLS Estimator
These OLS estimates create a straight line going through the “middle” of the data points:
[Figure: “Studying and Marks” scatter plot; studying (0 to 8) on the vertical axis, marks (0 to 120) on the horizontal axis.]
6.1.1 Fitted or Predicted Values
From the above we see that often the actual data
points lie above or below the estimated line.
Points on the line give us ESTIMATED y values for
each given x.
The predicted or fitted y values are found using our x
data and our estimated β’s:
Yhat_i = β1hat + β2hat·Xi
6.1.1 Estimators Example
Price   Quantity
4       10
3       15
3       20
6       15
Pbar = 4   Qbar = 15

OLS estimation (from Chapter 5):
Qhat_i = 18.3 - 0.833·Pi

Qhat_1 = 18.3 - 0.833(4) = 14.9
Qhat_2 = 18.3 - 0.833(3) = 15.8
Qhat_3 = 18.3 - 0.833(3) = 15.8
Qhat_4 = 18.3 - 0.833(6) = 13.3
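As a quick check, the fitted values in a minimal Python sketch (numpy assumed; the rounded coefficients are the Chapter 5 estimates above):

```python
import numpy as np

P = np.array([4.0, 3.0, 3.0, 6.0])
Q_hat = 18.3 - 0.833 * P   # Qhat_i = b1hat + b2hat * Pi
print(Q_hat)               # roughly [14.97, 15.80, 15.80, 13.30]
```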
6.1.1 Estimating Errors or Residuals
The estimated y values (yhat) are rarely equal to their
actual values (y).
The difference is the estimated error term:
Ehat_i = Yi - Yhat_i

Since we are indifferent whether our estimates are above or below the actual, we can square these estimated errors. A higher squared error means an estimate farther from the actual.
6.1.1 Estimators Example
Price   Quantity
4       10
3       15
3       20
6       15
Pbar = 4   Qbar = 15

Ehat_i = Qi - Qhat_i

Ehat_1 = 10 - 14.9 = -4.9
Ehat_2 = 15 - 15.8 = -0.8
Ehat_3 = 20 - 15.8 = 4.2
Ehat_4 = 15 - 13.3 = 1.7
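The residuals in the same sketch style (numpy assumed; data and rounded coefficients as above):

```python
import numpy as np

P = np.array([4.0, 3.0, 3.0, 6.0])
Q = np.array([10.0, 15.0, 20.0, 15.0])
Q_hat = 18.3 - 0.833 * P     # fitted values

E_hat = Q - Q_hat            # residuals: actual minus fitted
print(E_hat)                 # roughly [-4.97, -0.80, 4.20, 1.70]
print(np.sum(E_hat ** 2))    # sum of squared residuals, which OLS minimizes
```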
6.1.2 Statistical Properties of OLS
In our model, Y, the dependent variable, is made up of two components:
a) β1 + β2Xi – a non-random component that indicates the effect of X on Y. In this course, X is non-random.
b) єi – a random error term representing other influences on Y.
6.1.2 Statistical Properties of OLS
Error Assumptions:
a) E(єi) = 0; we expect no error; we assume the model
is complete
b) Var(єi) = σ2; the error term has a constant variance
c) Cov(єi, єj) = 0; error terms from two different
observations are uncorrelated. If the last error was
positive, the next error need not be negative.
6.1.2 Statistical Properties of OLS
OLS Estimators are Random Variables:
a) Y depends on є and is thus random.
b) β1hat and β2hat depend on Y…
c) Therefore they are random
d) All random variables have probability distributions,
expected values, and variances
e) These characteristics give rise to certain OLS
estimator properties.
6.1.2 OLS is BLUE
We use Ordinary Least Squares estimation because,
given certain assumptions, it is BLUE:
B est
L inear
U nbiased
E stimator
6.1.2 Unbiased
An estimator is unbiased if it expects the true value: E(dhat) = d

β2hat = ∑(Xi-Xbar)(Yi-Ybar) / ∑(Xi-Xbar)²

β2hat = ∑(Xi-Xbar)Yi / ∑(Xi-Xbar)Xi

By a mathematical property.
6.1.2 Unbiased
β2hat = ∑(Xi-Xbar)Yi / ∑(Xi-Xbar)Xi

E(β2hat) = ∑(Xi-Xbar)E(Yi) / ∑(Xi-Xbar)Xi

Since only Yi is variable.
6.1.2 Unbiased
E(β2hat) = ∑(Xi-Xbar)E(β1 + β2Xi + єi) / ∑(Xi-Xbar)Xi

Since Yi = β1 + β2Xi + єi.

E(β2hat) = ∑(Xi-Xbar)(β1 + β2Xi + 0) / ∑(Xi-Xbar)Xi

Since β1, β2, and Xi are non-random and E(єi) = 0.
6.1.2 Unbiased
E(β2hat) = [β1∑(Xi-Xbar) + β2∑(Xi-Xbar)Xi] / ∑(Xi-Xbar)Xi

By simple algebra.

E(β2hat) = β1∑(Xi-Xbar)/∑(Xi-Xbar)Xi + β2∑(Xi-Xbar)Xi/∑(Xi-Xbar)Xi

Since there exists a common denominator.
6.1.2 Unbiased
E(β2hat) = β1(0)/∑(Xi-Xbar)Xi + β2

Since the sum of the differences between observations and their mean is zero, by definition,

E(β2hat) = 0 + β2 = β2

The proof that E(β1hat) = β1 is similar.
6.1.2 Unbiased
E(β2hat) = β2
This means that on average, OLS estimation will
estimate the correct coefficients.
Definition: If the expected value of an estimator is
equal to the parameter that it is being used to
estimate, the estimator is unbiased.
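Unbiasedness can also be seen numerically with a small Monte Carlo sketch (not from the chapter): simulate many samples from a known model and average the OLS slope estimates. The true values β1 = 2, β2 = 0.5, σ = 1 and the fixed X grid are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
beta1, beta2, sigma = 2.0, 0.5, 1.0    # true (arbitrary) parameters
X = np.linspace(0.0, 10.0, 25)         # non-random X, fixed across samples

slopes = []
for _ in range(10_000):
    Y = beta1 + beta2 * X + rng.normal(0.0, sigma, size=X.size)
    b2hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    slopes.append(b2hat)

print(np.mean(slopes))  # close to 0.5, i.e. E(b2hat) = beta2
```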
6.1.2 Linear
The OLS estimators are linear in the dependent variable
(Y):
-Y’s are never raised to a power other than 1
-no non-linear operations are performed on the Y’s
Note: Since X’s are squared in the β1hat and β2hat
formulae, OLS is not linear in the X’s (which
doesn’t matter for BLUE)
6.1.2 Best
Of all linear unbiased estimators, OLS has the smallest
variance.
-there is a greater likelihood of obtaining an estimate
close to the actual parameter
Large variance => High probability of obtaining an
estimate far from the center
Small variance => Low probability of obtaining an
estimate far from the center
6.1.2 Estimator
By definition, the OLS estimator is an estimator; it estimates values for β1 and β2.
6.1.2 Normality of Y
In order to conduct hypothesis tests and construct
confidence intervals from OLS, we need to
know the exact distributions of β1hat and β2hat
(Otherwise, we can’t use statistical tables.)
We will see that if
1) The error term is normally distributed
Then
2) Y is normally distributed
Then
3) β1hat and β2hat are normally distributed
6.1.2 Normality of Y
So far, we have assumed that the error term, єi, is random with:
- E(єi) = 0; no expected error
- Var(єi) = σ²; constant variance
- Cov(єi, єj) = 0; no covariance between errors

Now we add the assumption that the error term is normally distributed. Therefore:

єi ~ iid N(0, σ²)

(iid means identically and independently distributed)
6.1.2 Normality of Y
If the error is normally distributed, so will be the Y
term (since the randomness of Y depends on
the randomness of the error term). Therefore:
E(Yi) = E(β1 + β2Xi + єi) = β1 + β2Xi
Var(Yi) = Var(β1 + β2Xi + єi) = Var(єi) = σ²

(Given all our previous assumptions.)

Therefore:

Yi ~ N(β1 + β2Xi, σ²)

(Y is normally distributed with mean β1 + β2Xi and variance σ².)
6.1.2 Normality of OLS
Since β1hat and β2hat are linear functions of Y:

β1hat ~ N( β1, σ²∑Xi² / [N∑(Xi-Xbar)²] )
β2hat ~ N( β2, σ² / ∑(Xi-Xbar)² )
6.1.2 Normality of OLS
If we know σ, we can construct standard normal variables (z = (x - μ)/σ):

(β1hat - β1) / √( σ²∑Xi² / [N∑(Xi-Xbar)²] ) ~ N(0, 1)
(β2hat - β2) / √( σ² / ∑(Xi-Xbar)² ) ~ N(0, 1)
6.1.2 Normality of OLS
Since we don’t know σ², we can estimate it:

σhat² = ∑e² / (N - 2)

This gives us estimates of the variance of our coefficients:

var(β1hat) = σhat²∑Xi² / [N∑(Xi-Xbar)²]
var(β2hat) = σhat² / ∑(Xi-Xbar)²
6.1.2 Normality of OLS
The square root of the estimated variance is
referred to as the standard error (se) (as
opposed to standard deviation)
Using our assumptions:
- (β1hat - β1)/se(β1hat) has a t distribution with N-2 degrees of freedom
- (β2hat - β2)/se(β2hat) has a t distribution with N-2 degrees of freedom
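These variance and standard error formulas in a short Python sketch (numpy assumed; the helper name is ours):

```python
import numpy as np

def ols_with_se(x, y):
    """Simple-regression OLS plus standard errors (a sketch)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b2hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b1hat = y.mean() - b2hat * x.mean()
    e = y - (b1hat + b2hat * x)             # residuals
    sigma2hat = np.sum(e ** 2) / (n - 2)    # estimated error variance
    se_b1 = np.sqrt(sigma2hat * np.sum(x ** 2) / (n * sxx))
    se_b2 = np.sqrt(sigma2hat / sxx)
    return b1hat, b2hat, se_b1, se_b2
```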
6.2 OLS Estimators and Goodness of Fit
On average, OLS works well:
- The average of the estimated errors is zero:
  ∑ei = 0
- The average of the estimated Y’s is always the average of the observed Y’s:
  ∑Yhat_i / N = ∑Yi / N
Note: The proof of the above is covered in
Appendix A.
6.2 Measuring Goodness of Fit
These conditions hold regardless of the quality of
the model.
E.g.: you could estimate average grades as a function of the locust population in Mexico. OLS would be a good estimator even though the model is useless.
“Goodness of Fit” measures how well the
economic model fits the data.
R2 is the most common measure of goodness of
fit.
R2 CANNOT be compared across models with different dependent variables.
6.2 Measuring Goodness of Fit
R² is constructed by dividing the variation of Y into two parts:
1) Variation in the fitted Yhat terms. This is explained by the model.
2) Variation in the estimated errors. This is NOT explained by the model.

R² = ∑(Yhat_i - Ybar)² / ∑(Yi - Ybar)²
   = 1 - ∑ei² / ∑(Yi - Ybar)²

(sums run from i = 1 to N)
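Both expressions for R² in a short Python sketch (numpy assumed); for an OLS fit with an intercept the two versions agree:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 two ways: explained/total, and 1 - unexplained/total."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    tss = np.sum((y - y.mean()) ** 2)       # total variation in Y
    ess = np.sum((y_hat - y.mean()) ** 2)   # variation explained by the model
    rss = np.sum((y - y_hat) ** 2)          # unexplained (residual) variation
    return ess / tss, 1.0 - rss / tss       # equal for OLS with an intercept
```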
6.2 Measuring Goodness of Fit
R2 is the proportion of variation explained by the
model. It is expressed as:
a) The ratio of explained variation to total
variation in Y
Or
b) 1 minus the ratio of unexplained variation to
total variation in Y
0 ≤ R² ≤ 1
R² = 0: the model has no explanatory power.
R² = 1: the model completely explains variations in Y (and generally that means you did something wrong).
6.3 Confidence Intervals for Simple Economic Models
As covered previously, ordinary least squares
estimation derives POINT ESTIMATES for
our coefficients (β1 and β2).
-These are unlikely to be perfectly accurate.
Alternatively, confidence intervals provide an estimated range for our coefficients.
-We are reasonably certain that the true value lies within that range.
6.3.1 Deriving a Confidence Interval
Step 1: Recall Distribution
We know that:
(β1hat - β1)/se(β1hat) has a t distribution with N-2 degrees of freedom
(β2hat - β2)/se(β2hat) has a t distribution with N-2 degrees of freedom
6.3.1 Deriving a Confidence Interval
Step 2: Establish Probability:
Using t-tables with N-2 degrees of freedom, we
find t* such that:
P(-t*<t<t*)=1-α
Note that ±t* cuts off α/2 of each tail.
E.g.: if N = 25 and α = 0.10, then t* = 1.71
6.3.1 Deriving a Confidence Interval
Step 3: Combine
Steps 1 and 2 combine to give us:
P{ -t* ≤ (β1hat - β1)/se(β1hat) ≤ t* } = 1 - α
P{ -t* ≤ (β2hat - β2)/se(β2hat) ≤ t* } = 1 - α

[Figure: t distribution with the central (1-α) area lying between -t* and t*.]
6.3.1 Deriving a Confidence Interval
Step 4: Rearrange for CI:
P{ β1hat - t*·se(β1hat) ≤ β1 ≤ β1hat + t*·se(β1hat) } = 1 - α
P{ β2hat - t*·se(β2hat) ≤ β2 ≤ β2hat + t*·se(β2hat) } = 1 - α

or: CI(β1) = β1hat ± t*·se(β1hat), CI(β2) = β2hat ± t*·se(β2hat)

By repeatedly calculating confidence intervals using OLS, 100(1-α)% of these CIs will contain the true value of the parameter (β1 or β2).
6.3.1 Confidence Interval Example
Suppose OLS gives us the output:

Yhat_i = 5.78 + 1.5·Xi
         (1.34)  (0.37)

If N = 400, construct a 95% CI for β1:
To cut off 2.5% of each tail with df = infinity, t* = 1.96

CI(β1) = [β1hat - t*·se(β1hat), β1hat + t*·se(β1hat)]
       = [5.78 - 1.96(1.34), 5.78 + 1.96(1.34)]
       = [3.15, 8.41]
6.3.1 Confidence Interval Example
Suppose OLS gives us the output:

Yhat_i = 5.78 + 1.5·Xi
         (1.34)  (0.37)

If N = 25, construct a 90% CI for β2:
To cut off 5% of each tail with df = 23, t* = 1.71

CI(β2) = [β2hat - t*·se(β2hat), β2hat + t*·se(β2hat)]
       = [1.5 - 1.71(0.37), 1.5 + 1.71(0.37)]
       = [0.867, 2.13]
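Both intervals can be reproduced in a short Python sketch (scipy assumed for the t critical value; the inputs are the estimates above):

```python
import scipy.stats as st

def conf_int(bhat, se, n, alpha):
    """CI = bhat +/- t* x se, with t* from a t distribution with n-2 df."""
    t_star = st.t.ppf(1.0 - alpha / 2.0, df=n - 2)
    return bhat - t_star * se, bhat + t_star * se

print(conf_int(5.78, 1.34, 400, 0.05))  # about (3.15, 8.41)
print(conf_int(1.50, 0.37, 25, 0.10))   # about (0.87, 2.13)
```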
6.3.1 Confidence Interval Example
Suppose OLS gives us the output:

Yhat_i = 5.78 + 1.5·Xi
         (1.34)  (0.37)

CI(β1) = [3.15, 8.41]
CI(β2) = [0.867, 2.13]
In repeated samples, 95% (90%) of such confidence
intervals will contain the true parameter β1 (β2).
β1 = The value of Y when X is zero.
β2 = The change in Y when X increases by 1.
Here, we are confident that X has a positive effect on Y.
We are confident that when X=0, Y is positive.
6.4 Hypothesis Testing in a Simple
Regression Context
As econometricians, we have questions.
-Do intelligent baby toys affect baby
intellect?
-Do scarves have an elastic or inelastic
demand?
Data has answers.
-Through hypothesis testing
6.4.1 Setting Up the Hypothesis Test
1) State null and alternate hypotheses:
Ho: β2=2
Ha: β2≠2
2) Select a level of significance
α=Prob(Type 1 Error)
Let α=0.05
3) Determine critical t values (df=n-2)
If N=25, t* = ±2.069
6.4.1 Setting Up the Hypothesis Test
4) Calculate test statistic
If β2hat=6.465 and se(β2hat)=1.034,
t = (β2hat - β2)/se(β2hat)
  = (6.465 - 2)/1.034
  = 4.318
5) Decide (Reject and do not reject regions)
Since t=4.318>t*=2.069, reject H0
6.4.1 Setting Up the Hypothesis Test
6) Interpret
At a 5% level of significance, the change in Y due
to a 1 unit change in X is not equal to 2.
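The whole test in a short Python sketch (scipy assumed; the numbers are those from the steps above):

```python
import scipy.stats as st

b2hat, se_b2 = 6.465, 1.034
beta2_null, n, alpha = 2.0, 25, 0.05

t_stat = (b2hat - beta2_null) / se_b2        # = 4.318
t_star = st.t.ppf(1.0 - alpha / 2.0, n - 2)  # = 2.069
print(abs(t_stat) > t_star)                  # True, so reject H0
```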
6.4.1 Example 2
Given a sample size of 26, we estimate the formula:

Gradehat_i = 62 + 0.01·Hours_i
             (0.05)   (0.005)
We want to test whether studying has any effect
on grades.
H0: β2=0
Ha: β2≠0
6.4.1 Example 2
Gradehat_i = 62 + 0.01·Hours_i
             (0.05)   (0.005)
Given α=0.01 and n-2=24,
t*=±2.80
t=(0.01-0)/0.005
=2
Since t < t*, do not reject H0.
At a 1% level of significance, we cannot conclude that studying has an effect on grades.
6.6 OLS Estimation
Often the calculations required for OLS estimation are too extensive to do by hand.
When this occurs, an econometric program (such as Excel) is used to perform the calculations.
The results should be expressed in this form:

Yhat_i = β1hat + β2hat·Xi
         (se(β1hat))   (se(β2hat))
R² =        N =

For example:

Utilityhat_i = 50.2 + 10.3·EconClasses_i
               (4.51)   (0.58)
R² = 0.87    N = 754
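Outside Excel, the same output can be produced with, for example, Python’s statsmodels; a sketch with made-up stand-in data (the numbers here are illustrative, not the regression above):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # stand-in regressor
y = np.array([60.0, 72.0, 80.0, 93.0, 101.0])  # stand-in dependent variable

X = sm.add_constant(x)         # adds the intercept column
fit = sm.OLS(y, X).fit()

print(fit.params)              # b1hat, b2hat
print(fit.bse)                 # se(b1hat), se(b2hat)
print(fit.rsquared, fit.nobs)  # R^2 and N
```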