EC339: Lecture 7
Chapter 4-5: Analytical Solutions to OLS
The Linear Regression Model

Postulate: the dependent variable, Y, is a function of the explanatory variable, X, or

  Yi = ƒ(Xi)

However, the relationship is not deterministic: the value of Y is not completely determined by the value of X. Thus, we incorporate an error term (residual) into the model, which provides a statistical relationship:

  Yi = ƒ(Xi) + ui
The Simple Linear Regression Model (SLR)

Remember we are trying to predict Y for a given X. We assume a relationship that is linear in the parameters (i.e., the BETAS).
Ceteris paribus: all else held equal.
To account for our ERROR in prediction, we add an error term to our prediction. ERRORS are typically written as u (or epsilon) and represent ANYTHING ELSE that might cause the deviation between actual and predicted values.
We are interested in determining the intercept (β₀) and the slope (β₁). If Y is a linear function of X, then

  Ŷ = β₀ + β₁X
  Y = Ŷ + u
  Y = β₀ + β₁X + u
SLR Uses Multivariate Expectations

Univariate distributions: means, variances, standard deviations.
Multivariate distributions: correlation, covariance; marginal, joint, and conditional probabilities; conditional expectation.

The conditional expectation combines the conditional, joint, and marginal probability density functions:

  E[Y | x] = Σⱼ yⱼ · f_Y|X(yⱼ | x) = Σⱼ yⱼ · f_X,Y(x, yⱼ) / f_X(x),  summing over j = 1, ..., m
Joint Distributions

Joint distribution probability density functions: now we want to consider how Y and X are distributed when considered together,

  f_X,Y(x, y) = P(X = x, Y = y)

INDEPENDENCE: when the outcomes of X and Y have no influence on one another, the joint probability is equal to the product of the marginal probability density functions,

  f_X,Y(x, y) = f_X(x) · f_Y(y)

Think of a BINOMIAL experiment: each TRIAL is INDEPENDENT and has no effect on the subsequent trial. Also, think of a marginal distribution much like a histogram of a single variable.
Conditional Distributions

Conditional probability density functions: now we want to consider how Y is distributed GIVEN a certain value for X. The conditional probability of Y occurring given X is equal to the joint probability of X and Y divided by the marginal probability of X occurring in the first place:

  f_Y|X(y | x) = f_X,Y(x, y) / f_X(x)

A joint probability is like finding the probability of a "high school graduate" with an hourly wage between "$8 and $10" when looking at education and wage data.

INDEPENDENCE: if X and Y are independent, the conditional distributions reduce to the marginal distributions, just as if conditioning provides no new information:

  f_Y|X(y | x) = f_X(x) f_Y(y) / f_X(x) = f_Y(y)
  f_X|Y(x | y) = f_X(x) f_Y(y) / f_Y(y) = f_X(x)
Discrete Bivariate Distributions: Joint Probability Function

For example, assume we flip a coin 3 times, recording the number of heads (H):
  X = number of heads on the last (3rd) flip
  Y = total number of heads in three flips

The sample space is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
X takes on the values {0, 1} and Y takes on the values {0, 1, 2, 3}, giving 8 possible joint outcomes:

  (X = 0, Y = 0)   (X = 0, Y = 1)   (X = 0, Y = 2)   (X = 0, Y = 3)
  (X = 1, Y = 0)   (X = 1, Y = 1)   (X = 1, Y = 2)   (X = 1, Y = 3)

Attaching a probability to each of the different joint outcomes gives us a discrete bivariate probability distribution, or joint probability function. Thus ƒ(x, y) gives us the probability that the random variables X and Y assume the joint outcome (x, y).
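As a quick check of this example, the short Python sketch below (not part of the original slides) enumerates the 8 equally likely outcomes, tabulates the joint probability function f(x, y), and recovers the marginal distribution of X by summing over y.

from itertools import product

outcomes = list(product("HT", repeat=3))              # the 8 equally likely flip sequences
joint = {}                                            # joint pmf f_{X,Y}(x, y)
for flips in outcomes:
    x = 1 if flips[2] == "H" else 0                   # heads on the last flip?
    y = flips.count("H")                              # total heads in three flips
    joint[(x, y)] = joint.get((x, y), 0) + 1 / len(outcomes)

for (x, y), p in sorted(joint.items()):
    print(f"f(x={x}, y={y}) = {p:.3f}")               # e.g. f(0,1) = 0.250, f(1,3) = 0.125

fx = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
print("marginal f_X:", fx)                            # {0: 0.5, 1: 0.5}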
Properties of Covariance

Whether X and Y are discrete or continuous,

  Cov(X, Y) = E[XY] − E[X]E[Y]

If X and Y are independent, then E[XY] = E[X]E[Y], so Cov(X, Y) = 0.

Remember, these expectations are expectations of "functions" of the random variables (X² is itself a "function" of X):

  E[g(X)] = Σⱼ g(xⱼ) f_X(xⱼ),  j = 1, ..., k
  E[g(X, Y)] = Σₕ Σⱼ g(xₕ, yⱼ) f_X,Y(xₕ, yⱼ),  h = 1, ..., k;  j = 1, ..., m
Properties of Conditional Expectations

  E[Y | x] = Σⱼ yⱼ · f_Y|X(yⱼ | x) = Σⱼ yⱼ · f_X,Y(x, yⱼ) / f_X(x),  j = 1, ..., m

We sum over all possible values of Y in the conditional expectation. For an example of a conditional expectation, see Wooldridge:

  E[WAGE | EDUC] = 1.05 + 0.45·EDUC

This is a regression of wages (Y) on education (X). E[WAGE | EDUC = 12] is the expected value of the wage given that years of education is 12, giving a value of 1.05 + 0.45(12) = $6.45, much like the predictions we have seen.
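A tiny illustrative snippet, using the slide's coefficients as given, shows the conditional expectation acting as a prediction rule; the function name expected_wage is only for illustration.

def expected_wage(educ):
    return 1.05 + 0.45 * educ      # E[WAGE | EDUC = educ], coefficients from the slide

print(expected_wage(12))           # 6.45 dollars per hour, as on the slide
print(expected_wage(16))           # 8.25 dollars per hour for 16 years of education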
The Linear Regression Model

Ceteris paribus: all else held equal.
Conditional expectations can be linear or nonlinear; we will only examine LINEAR functions here.

  E[Y | x] = Σⱼ yⱼ · f_Y|X(yⱼ | x) = Σⱼ yⱼ · f_X,Y(x, yⱼ) / f_X(x),  j = 1, ..., m
The Linear Regression Model

For any given level of X, many possible values of Y can exist. If Y is a linear function of X, then

  Yi = β₀ + β₁Xi + ui

u represents the deviation between the actual value of Y and the predicted value of Y (that is, β₀ + β₁Xi). We are interested in determining the intercept (β₀) and slope (β₁).
The Simple Linear Regression Model (SLR)

Thus, what we are looking for is the conditional expectation of Y given values of X. This is what we have called Y-hat thus far: we are trying to predict values of Y given values of X. To do this we must hold ALL OTHER FACTORS FIXED (ceteris paribus).

  E[Y | X] = Ŷ = β₀ + β₁X
  Y = Ŷ + u
  u = Y − Ŷ
The Simple Linear Regression Model (SLR)

LINEAR POPULATION REGRESSION FUNCTION

We can assume that the EXPECTED VALUE of our error term is zero. If the value were NOT equal to zero, we could make it zero by altering the INTERCEPT to absorb the nonzero mean:

  E[u] = 0

This alone makes no statement about how X and the errors are related.
If u and X are unrelated linearly, their CORRELATION will equal zero. Correlation is not sufficient, though, since they could be related NONLINEARLY.
Conditioning gives a sufficient condition, as it looks at ALL values of u given a value for X. This is the zero conditional mean error assumption:

  E[u | x] = E[u] = 0
The Linear Regression Model: Assumptions

Several assumptions must be made about the random error term.

The mean error is zero, or E(ui) = 0.
  Errors above and below the regression line tend to balance out.
  Errors can arise due to:
    Human behavior (which may be unpredictable)
    A large number of explanatory variables that are not in the model
    Imperfect measurement of the dependent variable
The Simple Linear Regression Model (SLR)

Beginning with the simple linear regression, taking conditional expectations, and using our current assumptions gives us the POPULATION REGRESSION FUNCTION (notice there are no hats over the betas, and that y is equal to the predicted value plus an error):

  y = β₀ + β₁x + u

Taking the expected value, conditional on x:

  E[y | x] = β₀ + β₁x + E[u | x]

and using the assumption that E[u | x] = 0:

  E[y | x] = β₀ + β₁x
The Linear Regression Model

The regression model asserts that the expected value of Y is a linear function of X:

  E(Yi) = β₀ + β₁Xi

This is known as the population regression function. From a practical standpoint, not all of a population's observations are available, so we typically estimate the slope and intercept using sample data.
The Simple Linear Regression Model (SLR)

We can also derive the following, knowing that E[u | x] = 0, that x and u are uncorrelated (so E[xu] = 0), and that y = ŷ + u:

  Cov(x, u) = E[xu] − E[x]E[u] = 0
  E[y − β₀ − β₁x] = E[u] = 0
  E[x(y − β₀ − β₁x)] = E[xu] = 0

WE NOW HAVE TWO EQUATIONS IN TWO UNKNOWNS (the betas are the unknowns). This is how the Method of Moments estimator is constructed.
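To make the "two equations in two unknowns" idea concrete, here is a minimal Python sketch on synthetic data (the true parameter values 2 and 0.5 are made up for the example): the sample analogues of the two moment conditions form a 2x2 linear system whose solution is the method-of-moments (and OLS) estimate.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(size=200)      # true beta0 = 2, beta1 = 0.5 (made up)

# Sample moment conditions:
#   (1/n) sum(y - b0 - b1*x)     = 0  ->  b0 + b1*mean(x)        = mean(y)
#   (1/n) sum(x*(y - b0 - b1*x)) = 0  ->  b0*mean(x) + b1*mean(x^2) = mean(x*y)
A = np.array([[1.0,       x.mean()],
              [x.mean(),  np.mean(x * x)]])
c = np.array([y.mean(), np.mean(x * y)])
b0, b1 = np.linalg.solve(A, c)                # solve the 2x2 system
print(b0, b1)                                 # close to 2 and 0.5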
The Linear Regression Model: Assumptions

Additional assumptions are necessary to develop confidence intervals and perform hypothesis tests.

Var(ui) = σᵤ² for all i
  Says that errors are drawn from a distribution with a constant variance (heteroskedasticity exists if this assumption fails).

ui and uj are independent: Cov(ui, uj) = 0 for all i ≠ j
  One observation's error does not influence another observation's error; the errors are uncorrelated (serial correlation of the errors exists if this assumption fails).
The Linear Regression Model: Assumptions

Cov(Xi, ui) = 0 for all i
  The error term is uncorrelated with the explanatory variable, X.

ui ~ N(0, σᵤ²)
  The error term follows a normal distribution.
Ordinary Least Squares: Fit

The OLS residuals satisfy

  Σᵢ ûᵢ = 0,  where ûᵢ = yᵢ − β̂₀ − β̂₁xᵢ
  Σᵢ xᵢûᵢ = 0

  SST = Σᵢ (yᵢ − ȳ)²   (Total Sum of Squares)
  SSE = Σᵢ (ŷᵢ − ȳ)²   (Explained Sum of Squares)
  SSR = Σᵢ ûᵢ²          (Sum of Squared Residuals)

  SST = SSE + SSR
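The following Python sketch, using synthetic data chosen only for illustration, verifies the two residual properties and the decomposition SST = SSE + SSR numerically.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)          # illustrative data

b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)   # OLS slope
b0 = y.mean() - b1 * x.mean()                                   # OLS intercept
y_hat = b0 + b1 * x
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)
SSR = np.sum(u_hat ** 2)
print(np.isclose(SST, SSE + SSR))                               # True
print(np.isclose(u_hat.sum(), 0), np.isclose((x * u_hat).sum(), 0))  # True True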
Ordinary Least Squares: Fit

Showing that SST = SSE + SSR:

  Σᵢ (yᵢ − ȳ)² = Σᵢ [(yᵢ − ŷᵢ) + (ŷᵢ − ȳ)]²
               = Σᵢ [ûᵢ + (ŷᵢ − ȳ)]²
               = Σᵢ ûᵢ² + 2 Σᵢ ûᵢ(ŷᵢ − ȳ) + Σᵢ (ŷᵢ − ȳ)²

showing SST = SSR + 2 Σᵢ ûᵢ(ŷᵢ − ȳ) + SSE, where 2 Σᵢ ûᵢ(ŷᵢ − ȳ) = 0 since the residuals and predicted values are uncorrelated.
Ordinary Least Squares: Fit

R-SQUARED, GENERALIZED

In multiple regression you cannot simply square a single correlation, but the interpretation is exactly the same: R² is equal to the squared correlation between yᵢ and ŷᵢ. When the prediction depends on only one independent variable, this boils down to the squared correlation between x and y.

  SST = SSE + SSR

  R² = SSE / SST = 1 − SSR / SST

  R² = Σᵢ (ŷᵢ − ȳ)² / Σᵢ (yᵢ − ȳ)² = 1 − Σᵢ ûᵢ² / Σᵢ (yᵢ − ȳ)²
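As a numerical check (synthetic data, illustrative only), the sketch below computes R² both as 1 − SSR/SST and as the squared correlation between y and ŷ; the two agree.

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)          # illustrative data

b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

r2_sums = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)   # 1 - SSR/SST
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2                              # squared corr(y, y_hat)
print(r2_sums, r2_corr)                                                 # identical up to rounding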
Estimation (Three Ways; We Will Not Discuss Maximum Likelihood)

We need a formal method to determine the line that "fits" the data well: the distance of the line from the observations should be minimized.

Let Ŷᵢ = β̂₀ + β̂₁Xᵢ.

The deviation of an observation from the line is the estimated error, or residual:

  ûᵢ = Yᵢ − Ŷᵢ
Ordinary Least Squares

OLS is designed to minimize the magnitude of the estimated residuals by selecting an estimated slope and estimated intercept that minimize the sum of the squared errors. It is the most popular estimation method.
Ordinary Least Squares: Minimize the Sum of Squared Errors

Identifying the parameters (estimated slope and estimated y-intercept) that minimize the sum of the squared errors is a standard optimization problem in multivariable calculus:
  Take first derivatives with respect to the estimated slope coefficient and the estimated y-intercept coefficient.
  Set both equations equal to zero and solve the two equations simultaneously.
Ordinary Least Squares

Using a sample of data on X and Y, we want to minimize the sum of squared errors, which is itself a function of the parameters, Q(β̂₀, β̂₁) = Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ)². We minimize this function using the calculus of optimization and the chain rule:

  min over (β̂₀, β̂₁) of  Σᵢ ûᵢ² = Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ)²

The first-order conditions are

  ∂Q(β̂₀, β̂₁)/∂β̂₀ = −2 Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) = 0
  ∂Q(β̂₀, β̂₁)/∂β̂₁ = −2 Σᵢ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0

which allow us to solve the system for our parameters β̂₀ and β̂₁. These are called the NORMAL EQUATIONS.
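A brief sketch on made-up data: writing the two normal equations in matrix form X'X·b = X'y and solving gives the OLS estimates, and the gradient of Q evaluated at the solution is numerically zero, confirming the first-order conditions.

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=150)
y = -1.0 + 0.8 * x + rng.normal(size=150)         # illustrative data

X = np.column_stack([np.ones_like(x), x])          # rows [1, x_i]
b = np.linalg.solve(X.T @ X, X.T @ y)              # solves the normal equations X'X b = X'y
grad = -2 * X.T @ (y - X @ b)                      # dQ/db0 and dQ/db1 at the solution
print(b)                                           # (b0_hat, b1_hat)
print(np.allclose(grad, 0))                        # True: both first-order conditions hold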
Ordinary Least Squares: Derived

From the second first-order condition:

  −2 Σᵢ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0   ⇒   Σᵢ xᵢyᵢ = β̂₀ Σᵢ xᵢ + β̂₁ Σᵢ xᵢ²

From the first first-order condition:

  −2 Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) = 0   ⇒   Σᵢ yᵢ = nβ̂₀ + β̂₁ Σᵢ xᵢ   ⇒   β̂₀ = (1/n)[Σᵢ yᵢ − β̂₁ Σᵢ xᵢ] = ȳ − β̂₁x̄

Substituting this value for β̂₀ into the second equation:

  Σᵢ xᵢyᵢ = (ȳ − β̂₁x̄) Σᵢ xᵢ + β̂₁ Σᵢ xᵢ²
  Σᵢ xᵢyᵢ = (1/n) Σᵢ yᵢ Σᵢ xᵢ − β̂₁ x̄ Σᵢ xᵢ + β̂₁ Σᵢ xᵢ²

Multiplying through by n and collecting the β̂₁ terms:

  n Σᵢ xᵢyᵢ − Σᵢ xᵢ Σᵢ yᵢ = β̂₁ ( n Σᵢ xᵢ² − [Σᵢ xᵢ]² )

so that

  β̂₁ = [ n Σᵢ xᵢyᵢ − Σᵢ xᵢ Σᵢ yᵢ ] / [ n Σᵢ xᵢ² − (Σᵢ xᵢ)² ]
      = n Σᵢ (xᵢ − x̄)yᵢ / [ n Σᵢ (xᵢ − x̄)² ]
      = Σᵢ (xᵢ − x̄)yᵢ / Σᵢ (xᵢ − x̄)²
Ordinary Least Squares

This results in the normal equations. The first suggests an estimator for the intercept, β̂₀ = ȳ − β̂₁x̄, which means the means of X and Y are ALWAYS on the regression line.

Ordinary Least Squares

The second yields an estimator for the slope of the line. No other estimators will result in a smaller sum of squared errors.
SLR Assumption 1

Linear in Parameters (SLR.1). This defines the POPULATION model: the dependent variable y is related to the independent variable x and the error (or disturbance) u as

  y = β₀ + β₁x + u   (Assumption SLR.1)

β₀ and β₁ are population parameters.
SLR Assumption 2

Random Sampling (SLR.2). We use a random sample of size n,

  {(xᵢ, yᵢ) : i = 1, 2, ..., n},

drawn from the population model. This allows us to restate SLR.1 observation by observation, since we want to use DATA to estimate our parameters:

  yᵢ = β₀ + β₁xᵢ + uᵢ,  i = 1, 2, ..., n

β₀ and β₁ are the population parameters to be estimated.
SLR Assumption 3

Sample Variation in the Independent Variable (SLR.3). The x values must vary: the sample variance of x cannot equal zero,

  Σᵢ (xᵢ − x̄)² > 0
SLR Assumption 4

Zero Conditional Mean (SLR.4):

  E[u | x] = 0

For a random sample, the implication is that NO independent variable is correlated with ANY unobservable (remember, the error includes everything unobservable):

  E[uᵢ | xᵢ] = 0  for all i = 1, 2, ..., n
SLR Theorem 1

Unbiasedness of OLS: the estimators should equal the population values in expectation,

  E[β̂₀] = β₀  and  E[β̂₁] = β₁

Start from the estimators

  β̂₁ = Σᵢ (xᵢ − x̄)yᵢ / Σᵢ (xᵢ − x̄)²  and  β̂₀ = ȳ − β̂₁x̄

Substituting yᵢ = β₀ + β₁xᵢ + uᵢ and examining the numerator:

  β̂₁ = Σᵢ (xᵢ − x̄)(β₀ + β₁xᵢ + uᵢ) / Σᵢ (xᵢ − x̄)²
      = [ β₀ Σᵢ (xᵢ − x̄) + β₁ Σᵢ (xᵢ − x̄)xᵢ + Σᵢ (xᵢ − x̄)uᵢ ] / Σᵢ (xᵢ − x̄)²
      = [ β₁ Σᵢ (xᵢ − x̄)² + Σᵢ (xᵢ − x̄)uᵢ ] / Σᵢ (xᵢ − x̄)²     (since Σᵢ (xᵢ − x̄) = 0 and Σᵢ (xᵢ − x̄)xᵢ = Σᵢ (xᵢ − x̄)²)
      = β₁ + Σᵢ (xᵢ − x̄)uᵢ / Σᵢ (xᵢ − x̄)²

Taking expectations, E[β̂₁] = β₁. This holds because x and u are assumed to be uncorrelated, so the last term has expectation zero. Thus our estimator equals the actual value of beta in expectation.
SLR Theorem 1

Unbiasedness of OLS: the estimators should equal the population values in expectation,

  E[β̂₀] = β₀  and  E[β̂₁] = β₁

For the intercept, substitute ȳ = β₀ + β₁x̄ + ū:

  β̂₀ = ȳ − β̂₁x̄ = β₀ + β₁x̄ + ū − β̂₁x̄
  β̂₀ = β₀ + x̄(β₁ − β̂₁) + ū

Taking expectations,

  E[β̂₀] = E[β₀ + x̄(β₁ − β̂₁) + ū] = E[β₀ + x̄(β₁ − β̂₁)] = β₀

since the expected value of the errors is zero and E[β̂₁] = β₁. Thus our estimator equals the actual value of beta in expectation.
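Unbiasedness is a statement about expectations across repeated samples. The following simulation sketch (all numbers hypothetical) draws many samples satisfying SLR.1 through SLR.4 and shows that the average of the OLS estimates across samples is close to the true β₀ and β₁.

import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, n, reps = 1.0, 0.5, 50, 5000        # hypothetical true parameters
b0_draws, b1_draws = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(size=n)                        # E[u|x] = 0 holds by construction
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
    b0_draws.append(y.mean() - b1 * x.mean())
    b1_draws.append(b1)

print(np.mean(b0_draws), np.mean(b1_draws))       # approximately 1.0 and 0.5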
SLR Assumption 5

Homoskedasticity (SLR.5). The variance of the errors is INDEPENDENT of the values of X:

  Var(u | x) = σ²

Rewriting with SLR.3, SLR.4, and SLR.5, this also implies

  E[y | x] = β₀ + β₁x   and   Var(y | x) = σ²
Method of Moments

Seeks to equate the moments implied by a statistical model of the population distribution to the actual moments found in the sample.
Certain restrictions are implied in the population:
  E(u) = 0
  Cov(Xᵢ, uⱼ) = 0 for all i, j
This results in the same estimators as the least squares method.
Interpretation of the Regression Slope Coefficient

The coefficient β₁ tells us the effect X has on Y: increasing X by one unit will change the mean value of Y by β₁ units.
Units of Measurement and Regression Coefficients

The magnitude of a regression coefficient depends upon the units in which the dependent and explanatory variables are measured. For example, measuring the explanatory variable in cents rather than dollars will result in a smaller coefficient. Changing both the Y and X variables by the same amount will not affect the slope, although it will impact the y-intercept.
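In the sketch below (synthetic data, illustrative values), multiplying x by 100, as when switching from dollars to cents, divides the slope by 100, while multiplying both x and y by 100 leaves the slope unchanged and rescales the intercept.

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(loc=10, size=200)
y = 5.0 + 2.0 * x + rng.normal(size=200)          # illustrative data

def ols(x, y):
    b1 = np.sum((x - x.mean()) * y) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1            # (intercept, slope)

print(ols(x, y))               # roughly (5, 2)
print(ols(100 * x, y))         # slope roughly 0.02: rescaling x shrinks the slope
print(ols(100 * x, 100 * y))   # slope roughly 2 again; intercept roughly 500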
Models Including Logarithms

For a log-linear model, the slope represents the proportionate change (approximately the percentage change) in Y arising from a one-unit change in X. The coefficient in your regression is the SEMI-elasticity of Y with respect to X.

For a log-log model, the slope represents the proportionate change in Y arising from a proportionate change in X. The coefficient in your regression is the elasticity of Y with respect to X. This is the CONSTANT ELASTICITY MODEL.

For a linear-log model, the slope is the unit change in Y arising from a proportionate change in X.
Regression in Excel

Step 1: Reorganize the data so that the variables are right next to one another in columns.
Step 2: Data → Data Analysis → Regression.
Regression in Excel: Example 2.11
[Screenshots of the Excel Data Analysis Regression dialog and output omitted.]
Your Estimated Equation is as follows
log( salary )  6.5055  .0097ceoten
T-statistics show that the coefficient on ceoten is insignificant at the
5% level. The p-value for ceoten is 0.128368 which is greater than .05,
meaning that you could see this value about 13% of the time. You are
Inherently testing the null hypothesis that all coefficients are equal to
ZERO. YOU FAIL TO REJECT THE NULL HYPOTHESIS HERE ON BETA-1.
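For readers who want to check the Excel output outside Excel, here is a hedged sketch using statsmodels; the file name ceosal2.csv and the column names salary and ceoten are assumptions about how the Example 2.11 data are stored, not something shown on the slides.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ceosal2.csv")                        # hypothetical file with salary and ceoten columns
fit = smf.ols("np.log(salary) ~ ceoten", data=df).fit()
print(fit.params)      # the slide reports intercept 6.5055 and slope 0.0097
print(fit.pvalues)     # the slide reports p = 0.1284 for ceoten (fail to reject beta1 = 0)
print(fit.rsquared)    # the slide's chart reports R^2 = 0.0132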
Regression in Excel

[Chart: "X Variable 1 Line Fit Plot" showing Y and Predicted Y against X Variable 1 (ceoten); fitted line y = 0.0097x + 6.5055, R² = 0.0132.]