Econometrics I
Part 3 – Least Squares Algebra
Professor William Greene
Stern School of Business
Department of Economics
Vocabulary

Some terms to be used in the discussion:
  Population characteristics and entities vs. sample quantities and analogs
  Residuals and disturbances
  Population regression line and sample regression

Objective: Learn about the conditional mean function. Estimate β and σ².
First step: Mechanics of fitting a line (hyperplane) to a set of data.
Fitting Criteria

The set of points in the sample
Fitting criteria - what are they?
  LAD
  Least squares
  and so on
Why least squares?
A fundamental result: sample moments are "good" estimators of their population counterparts.
We will spend the next few weeks using this principle and applying it to least squares computation.
An Analogy Principle for Estimating β

In the population:      E[y | X] = Xβ   so   E[y - Xβ | X] = 0
Continuing:             E[xi εi] = 0
Summing:                Σi E[xi εi] = Σi 0 = 0
Exchange Σi and E[·]:   E[Σi xi εi] = E[X'ε] = 0,   so   E[X'(y - Xβ)] = 0

Choose b, the estimator of β, to mimic this population result: i.e., mimic the
population mean with the sample mean. Find b such that

  (1/N) X'e = 0 = (1/N) X'(y - Xb)

As we will see, the solution is the least squares coefficient vector.
Population and Sample Moments
We showed that E[εi | xi] = 0 and Cov[xi, εi] = 0.
If so, and if E[y | X] = Xβ, then

  β = (Var[xi])-1 Cov[xi, yi].

This will provide a population analog to the statistics we compute with the data.
U.S. Gasoline Market, 1960-1995
Least Squares

Example will be Gi on xi = [1, PGi, Yi]

Fitting criterion: the fitted equation will be
  yi = b1xi1 + b2xi2 + ... + bKxiK.

The criterion is based on the residuals:
  ei = yi - b1xi1 - b2xi2 - ... - bKxiK
Make the ei as small as possible. Form a criterion and minimize it.
Fitting Criteria

Sum of residuals:                     Σi=1..N ei
Sum of squares:                       Σi=1..N ei²
Sum of absolute values of residuals:  Σi=1..N |ei|
Absolute value of sum of residuals:   |Σi=1..N ei|

We focus on Σi=1..N ei² now and Σi=1..N |ei| later.
Least Squares Algebra

  Σi=1..N ei² = e'e = (y - Xb)'(y - Xb)

A digression on multivariate calculus:
  Matrix and vector derivatives
  Derivative of a scalar with respect to a vector
  Derivative of a column vector with respect to a row vector
  Other derivatives
Least Squares Normal Equations

  ∂ Σi=1..N ei² / ∂b = ∂ Σi=1..N (yi - xi'b)² / ∂b

  ∂(y - Xb)'(y - Xb) / ∂b = -2X'(y - Xb) = 0

  (1x1) / (Kx1):  (-2)(NxK)'(Nx1) = (-2)(KxN)(Nx1) = Kx1

Note: the derivative of a 1x1 with respect to a Kx1 is a Kx1 vector.

Solution:  -2X'(y - Xb) = 0  implies  X'y = X'Xb
Least Squares Solution
Assuming it exists: b = (X'X)-1X'y

Note the analogy:  β = [Var(x)]-1 [Cov(x,y)]

  b = [(1/N) X'X]-1 [(1/N) X'y]

Suggests something desirable about least squares.
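Below is a minimal numpy sketch (simulated data; variable names are illustrative) showing that solving the normal equations X'Xb = X'y and the sample-moment form [(1/N)X'X]-1[(1/N)X'y] give the same b, and that the residuals satisfy X'e = 0.

import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # constant + 2 regressors
beta = np.array([1.0, 0.5, -2.0])
y = X @ beta + rng.normal(size=N)

# Normal equations X'Xb = X'y, solved without forming an explicit inverse
b = np.linalg.solve(X.T @ X, X.T @ y)

# Moment form: [(1/N) X'X]^(-1) [(1/N) X'y] gives the identical b
b_moments = np.linalg.solve(X.T @ X / N, X.T @ y / N)

e = y - X @ b
print(b)
print(np.allclose(b, b_moments))   # True
print(np.max(np.abs(X.T @ e)))     # ~0: the normal equations X'(y - Xb) = 0 hold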
Second Order Conditions

Necessary condition: first derivatives = 0

  ∂(y - Xb)'(y - Xb) / ∂b = -2X'(y - Xb)

Sufficient condition: second derivatives ...

  ∂²(y - Xb)'(y - Xb) / ∂b∂b'
    = ∂[ ∂(y - Xb)'(y - Xb)/∂b ] / ∂b'
    = derivative of a Kx1 column vector with respect to a 1xK row vector
    = 2X'X
Does b Minimize e'e?

  ∂²e'e / ∂b∂b' = 2X'X

            [ Σi xi1²      Σi xi1xi2    ...   Σi xi1xiK ]
            [ Σi xi2xi1    Σi xi2²      ...   Σi xi2xiK ]
     = 2    [    ...          ...       ...      ...    ]
            [ Σi xiKxi1    Σi xiKxi2    ...   Σi xiK²   ]

If there were a single b, we would require this to be positive, which it would
be: 2x'x = 2 Σi=1..N xi² > 0.
The matrix counterpart of a positive number is a positive definite matrix.
Sample Moments - Algebra

            [ Σi xi1²      Σi xi1xi2    ...   Σi xi1xiK ]
            [ Σi xi2xi1    Σi xi2²      ...   Σi xi2xiK ]
  X'X  =    [    ...          ...       ...      ...    ]
            [ Σi xiKxi1    Σi xiKxi2    ...   Σi xiK²   ]

            [ xi1 ]
            [ xi2 ]
     = Σi=1..N [ ... ] [ xi1  xi2  ...  xiK ]
            [ xiK ]

     = Σi=1..N xi xi'
Positive Definite Matrix

Matrix C is positive definite if a'Ca > 0 for any nonzero a.
Generally hard to check. Requires a look at characteristic roots (later in the course).
For some matrices, it is easy to verify. X'X is one of these:

  a'X'Xa = (a'X')(Xa) = (Xa)'(Xa) = v'v = Σi=1..N vi² ≥ 0

Could v = 0? v = 0 means Xa = 0. This is not possible if the columns of X are
linearly independent.
Conclusion: b = (X'X)-1X'y does indeed minimize e'e.
Algebraic Results - 1
In the population:  E[X'ε] = 0

In the sample:      (1/N) Σi=1..N xi ei = 0
Residuals vs. Disturbances

Disturbances (population):  yi = xi'β + εi
Partitioning y:   y = E[y|X] + ε
                    = conditional mean + disturbance

Residuals (sample):  yi = xi'b + ei
Partitioning y:   y = Xb + e
                    = projection + residual

(Note: Projection into the column space of X, i.e., the set of linear
combinations of the columns of X; Xb is one of these.)
Algebraic Results - 2

A "residual maker":  M = I - X(X'X)-1X'
e = y - Xb = y - X(X'X)-1X'y = My
My = the residuals that result when y is regressed on X
MX = 0  (This result is fundamental!)
How do we interpret this result in terms of residuals? When a column of X is
regressed on all of X, we get a perfect fit and zero residuals.
(Therefore) My = MXb + Me = Me = e.  (You should be able to prove this.)
y = Py + My,  P = X(X'X)-1X' = I - M.  PM = MP = 0.
Py is the projection of y into the column space of X.
The M Matrix

M = I - X(X'X)-1X' is an NxN matrix
  M is symmetric: M = M'
  M is idempotent: MM = M (just multiply it out)
  M is singular: M-1 does not exist.
  (We will prove this later as a side result in another derivation.)
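A small numpy sketch (simulated data) of these properties: MX is zero only up to rounding error, M is symmetric and idempotent, My reproduces the least squares residuals, and PM = 0.

import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T          # projection matrix
M = np.eye(N) - P              # residual maker

b = XtX_inv @ (X.T @ y)
e = y - X @ b

print(np.max(np.abs(M @ X)))   # ~1e-14: "MX = 0" holds only up to rounding error
print(np.allclose(M, M.T))     # symmetric
print(np.allclose(M @ M, M))   # idempotent
print(np.allclose(M @ y, e))   # My = e, the least squares residuals
print(np.allclose(P @ M, 0))   # PM = 0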
Results when X Contains a Constant Term

X = [1, x2, …, xK]
The first column of X is a column of ones.
Since X'e = 0, x1'e = 0: the residuals sum to zero.

  y = Xb + e
  Define i = [1,1,...,1]', a column of n ones.
  i'y = Σi=1..N yi = n ȳ
  i'y = i'Xb + i'e = i'Xb, which implies (after dividing by n)
  ȳ = x̄'b   (the regression line passes through the means)

These do not apply if the model has no constant term.
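A quick numpy check (simulated data) of both results when the first column of X is a constant:

import numpy as np

rng = np.random.default_rng(2)
N = 80
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # first column is the constant
y = 2.0 + X[:, 1] - 0.5 * X[:, 2] + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

print(np.isclose(e.sum(), 0.0))                  # residuals sum to zero
print(np.isclose(y.mean(), X.mean(axis=0) @ b))  # y-bar = x-bar'b: line passes through the means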
Dummy Variable for One Observation

A dummy variable that isolates a single observation. What does this do?
Define d to be the dummy variable in question. Z = all other regressors; X = [Z,d].
Run the multiple regression of y on X. We know that X'e = 0, where e is the column
vector of residuals. That means d'e = 0, which says that ej = 0 for that particular
observation: it will be predicted perfectly.
Fairly important result. Important to know.
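A numpy sketch (simulated data; the isolated observation j is arbitrary) confirming that the dummied observation is predicted perfectly:

import numpy as np

rng = np.random.default_rng(3)
N = 40
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # all other regressors
y = Z @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=N)

j = 7                       # the single observation to be isolated
d = np.zeros(N)
d[j] = 1                    # dummy = 1 only for observation j
X = np.column_stack([Z, d])

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

print(e[j])                    # ~0: observation j is predicted perfectly
print(np.isclose(d @ e, 0.0))  # d'e = 0 is the normal equation that forces this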
Least Squares Algebra

Least Squares

Residuals

Least Squares Residuals. Note the peculiar pattern.
Least Squares Algebra-3

M = I - X(X'X)-1X'.  M is NxN, potentially huge.

Least Squares Algebra-4

MX = 0: not identically zero in a digital computer. Rounding error.
Econometrics I
Part 3.1 – Regression Algebra and Fit
The Fit of the Regression

"Variation:" In the context of the "model" we speak of covariation of a variable
as movement of the variable, usually associated with (not necessarily caused by)
movement of another variable.

  Total variation = Σi=1..n (yi - ȳ)² = y'M0y,

where M0 = I - i(i'i)-1i' is the M matrix for X = a column of ones.
Decomposing the Variation

  yi = xi'b + ei
  yi - ȳ = xi'b - x̄'b + ei = (xi - x̄)'b + ei

  Σi=1..N (yi - ȳ)² = Σi=1..N [(xi - x̄)'b]² + Σi=1..N ei²
  (The sum of cross products is zero.)

Total variation = regression variation + residual variation.

Recall the decomposition:
  Var[y] = Var[E[y|x]] + E[Var[y|x]]
         = variation of the conditional mean around the overall mean
           + variation around the conditional mean function.
Decomposing the Variation of Vector y

Decomposition (this all assumes the model contains a constant term; one of the
columns in X is i):

  y = Xb + e, so
  M0y = M0Xb + M0e = M0Xb + e.
  (Deviations from means. Why is M0e = e?)

  y'M0y = b'(X'M0)(M0X)b + e'e
        = b'X'M0Xb + e'e.
  (M0 is idempotent and e'M0X = e'X = 0.)

Total sum of squares = Regression sum of squares (SSR) + Residual sum of squares (SSE)
The Sum of Squared Residuals

b minimizes e'e = (y - Xb)'(y - Xb).
Algebraic equivalences, at the solution b = (X'X)-1X'y:

  e'e = y'e   (why? e' = y' - b'X' and X'e = 0; this is the F.O.C. for least squares)
  e'e = y'y - y'Xb = y'y - b'X'y
      = e'y, since e'X = 0  (or e'y = y'e)
A Fit Measure

  y = Xb + e
  M0y = M0Xb + M0e,  M0e = e.
  y'M0y = b'X'M0Xb + e'e
  1 = b'X'M0Xb / y'M0y + e'e / y'M0y

  R² = b'X'M0Xb / y'M0y = Regression Variation / Total Variation
     = 1 - e'e / Σi=1..N (yi - ȳ)²

(VIR) R² is bounded by zero and one only if:
(a) there is a constant term in X, and
(b) the line is computed by linear least squares.
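A numpy sketch (simulated data) of the decomposition and the two equivalent expressions for R²:

import numpy as np

rng = np.random.default_rng(4)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

i = np.ones(N)
M0 = np.eye(N) - np.outer(i, i) / N   # M0 = I - i(i'i)^(-1)i': deviations from means

sst = y @ M0 @ y                      # total variation, y'M0y
ssr = (X @ b) @ M0 @ (X @ b)          # regression variation, b'X'M0Xb
sse = e @ e                           # residual variation

print(np.isclose(sst, ssr + sse))     # the decomposition holds
print(1 - sse / sst, ssr / sst)       # both expressions give the same R-squared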
Minimizing ee
Any other coefficient vector has a larger sum of
squares. A quick proof:
d = the vector, not equal to b
u = y – Xd = y – Xb + Xb – Xd
= e - X(d - b).
Then, uu = (y - Xd)(y-Xd)
= [y - Xb - X(d - b)][y - Xb - X(d - b)]
= [e - X(d - b)] [e - X(d - b)]
Expand to find uu = ee + (d-b)XX(d-b)
= e’e + v’v > ee
3-36/58
Part 3: Least Squares Algebra
Dropping a Variable

An important special case. Suppose
  bX,z = [b,c] = the regression coefficients in a regression of y on [X,z]
  bX = [d,0] = the same, but computed while forcing the coefficient on z to equal 0.
  This removes z from the regression.
We are comparing the results that we get with and without the variable z in the
equation. Results which we can show:
  Dropping a variable(s) cannot improve the fit - that is, it cannot reduce the
  sum of squared residuals.
  Adding a variable(s) cannot degrade the fit - that is, it cannot increase the
  sum of squared residuals.
Adding a Variable Never Increases the Sum of Squares

Theorem 3.5 on text page 38.
  u = the residual in the regression of y on [X,z]
  e = the residual in the regression of y on X alone

  u'u = e'e - c²(z*'z*) ≤ e'e,   where z* = MXz.
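A numpy check (simulated data) of Theorem 3.5: the long-regression residuals u satisfy u'u = e'e - c²(z*'z*), so the fit cannot get worse.

import numpy as np

rng = np.random.default_rng(5)
N = 60
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
z = rng.normal(size=N)
y = X @ np.array([1.0, 0.5, -0.5]) + 0.3 * z + rng.normal(size=N)

# Short regression: y on X alone
bX = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ bX

# Long regression: y on [X, z]; c is the coefficient on z
Xz = np.column_stack([X, z])
b_long = np.linalg.solve(Xz.T @ Xz, Xz.T @ y)
u = y - Xz @ b_long
c = b_long[-1]

# z* = M_X z, the part of z orthogonal to X
z_star = z - X @ np.linalg.solve(X.T @ X, X.T @ z)

print(np.isclose(u @ u, e @ e - c**2 * (z_star @ z_star)))  # u'u = e'e - c^2 (z*'z*)
print(u @ u <= e @ e)                                       # the fit cannot get worse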
Adding Variables

R² never falls when a variable z is added to the regression.
A useful general result: R² with both X and variable z equals R² with only X
plus the increase in fit due to z after X is accounted for:

  R²Xz = R²X + (1 - R²X) r*²yz|X

where r*²yz|X is the squared partial correlation between y and z controlling for X.
Adding Variables to a Model
What is the effect of adding PN, PD, PS,
Comparing fits of regressions
Make sure the denominator in R² is the same - i.e., the same left-hand-side
variable. Example: linear vs. loglinear. Loglinear will almost always appear to
fit better because taking logs reduces variation.
Adjusted R Squared

Adjusted R² (for degrees of freedom?)

  Adjusted R² = 1 - [(n-1)/(n-K)](1 - R²)

The "degrees of freedom" adjustment suggests something about "unbiasedness."
The ratio is not unbiased.
Adjusted R² includes a penalty for variables that don't add much fit. It can
fall when a variable is added to the equation.
Adjusted R²

What is being adjusted? The penalty for using up degrees of freedom.

  Adjusted R² = 1 - [e'e/(n-K)] / [y'M0y/(n-1)]

uses the ratio of two "unbiased" estimators. Is the ratio unbiased?

  Adjusted R² = 1 - [(n-1)/(n-K)](1 - R²)

Will adjusted R² rise when a variable is added to the regression? It is higher
with z than without z if and only if the t ratio on z, when it is added, is
larger than one in absolute value.
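A numpy sketch (simulated data; fit() is a small helper written for this illustration) of the rule that adjusted R² rises when z is added if and only if the t ratio on z exceeds one in absolute value:

import numpy as np

def fit(Xmat, y):
    """OLS fit returning coefficients, residuals, R2 and adjusted R2."""
    n, K = Xmat.shape
    b = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ y)
    e = y - Xmat @ b
    sst = np.sum((y - y.mean())**2)
    R2 = 1 - (e @ e) / sst
    R2_adj = 1 - (n - 1) / (n - K) * (1 - R2)
    return b, e, R2, R2_adj

rng = np.random.default_rng(6)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
z = rng.normal(size=N)                       # candidate regressor
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=N)

_, _, R2_X, adj_X = fit(X, y)
Xz = np.column_stack([X, z])
b, u, R2_Xz, adj_Xz = fit(Xz, y)

# t ratio on z in the long regression
s2 = (u @ u) / (N - Xz.shape[1])
se_c = np.sqrt(s2 * np.linalg.inv(Xz.T @ Xz)[-1, -1])
t_z = b[-1] / se_c

print(R2_Xz >= R2_X)                        # R2 never falls
print((adj_Xz > adj_X) == (abs(t_z) > 1))   # adjusted R2 rises iff |t| > 1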
Full Regression (Without PD)
----------------------------------------------------------------------
Ordinary least squares regression
LHS=G        Mean                 =   226.09444
             Standard deviation   =    50.59182
             Number of observs.   =          36
Model size   Parameters           =           9
             Degrees of freedom   =          27
Residuals    Sum of squares       =   596.68995
             Standard error of e  =     4.70102
Fit          R-squared            =      .99334  <**********
             Adjusted R-squared   =      .99137  <**********
Model test   F[  8,    27] (prob) =   503.3(.0000)
--------+-------------------------------------------------------------
Variable|  Coefficient    Standard Error   t-ratio   P[|T|>t]   Mean of X
--------+-------------------------------------------------------------
Constant|  -8220.38**        3629.309       -2.265     .0317
      PG|  -26.8313***          5.76403     -4.655     .0001      2.31661
       Y|    .02214***          .00711       3.116     .0043      9232.86
     PNC|   36.2027            21.54563      1.680     .1044      1.67078
     PUC|  -6.23235             5.01098     -1.244     .2243      2.34364
     PPT|   9.35681             8.94549      1.046     .3048      2.74486
      PN|   53.5879*           30.61384      1.750     .0914      2.08511
      PS|  -65.4897***         23.58819     -2.776     .0099      2.36898
    YEAR|   4.18510**           1.87283      2.235     .0339      1977.50
--------+-------------------------------------------------------------
PD added to the model. R² rises, Adjusted R² falls
----------------------------------------------------------------------
Ordinary least squares regression
LHS=G        Mean                 =   226.09444
             Standard deviation   =    50.59182
             Number of observs.   =          36
Model size   Parameters           =          10
             Degrees of freedom   =          26
Residuals    Sum of squares       =   594.54206
             Standard error of e  =     4.78195
Fit          R-squared            =      .99336  Was 0.99334
             Adjusted R-squared   =      .99107  Was 0.99137
--------+-------------------------------------------------------------
Variable|  Coefficient    Standard Error   t-ratio   P[|T|>t]   Mean of X
--------+-------------------------------------------------------------
Constant|  -7916.51**        3822.602       -2.071     .0484
      PG|  -26.8077***          5.86376     -4.572     .0001      2.31661
       Y|    .02231***          .00725       3.077     .0049      9232.86
     PNC|   30.0618            29.69543      1.012     .3207      1.67078
     PUC|  -7.44699             6.45668     -1.153     .2592      2.34364
     PPT|   9.05542             9.15246       .989     .3316      2.74486
      PD|   11.8023            38.50913       .306     .7617      1.65056  (NOTE LOW t ratio)
      PN|   47.3306            37.23680      1.271     .2150      2.08511
      PS|  -60.6202**          28.77798     -2.106     .0450      2.36898
    YEAR|   4.02861*            1.97231      2.043     .0514      1977.50
--------+-------------------------------------------------------------
y | x1, x2, …, x1000; what is the right model?

  Adjusted R² = 1 - [e'e/(n-K)] / [y'M0y/(n-1)] = 1 - σ̂²e / σ̂²y

Information Criteria
  logLikelihood = -(n/2)(1 + log 2π + log(e'e/n))
  Akaike IC   = -2 logL + 2K
  Bayesian IC = -2 logL + K log n
  "Cross Validation" (training samples and test samples)
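A numpy sketch (simulated data) computing the log likelihood, AIC and BIC from a least squares fit using these formulas:

import numpy as np

rng = np.random.default_rng(7)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
y = X @ np.array([1.0, 0.5, -0.5, 0.0]) + rng.normal(size=N)

K = X.shape[1]
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# Gaussian log likelihood evaluated at the least squares estimates
logL = -N / 2 * (1 + np.log(2 * np.pi) + np.log(e @ e / N))
AIC = -2 * logL + 2 * K
BIC = -2 * logL + K * np.log(N)

print(logL, AIC, BIC)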
Econometrics I
Part 3.2 – Transformed Data
(Linearly) Transformed Data

How does linear transformation affect the results of least squares?
Z = XP for KxK nonsingular P.
  Based on X, b = (X'X)-1X'y.
  You can show (just multiply it out) that the coefficients when y is regressed
  on Z are c = P-1b.
  "Fitted value" is Zc = XPP-1b = Xb. The same!
  Residuals from using Z are y - Zc = y - Xb (we just proved this). The same!
  The sum of squared residuals must be identical, as y - Xb = e = y - Zc.
  R² must also be identical, as R² = 1 - e'e/y'M0y.
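A numpy check (simulated data; P is any nonsingular KxK matrix) of these invariance results:

import numpy as np

rng = np.random.default_rng(8)
N, K = 60, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=N)

P = rng.normal(size=(K, K))     # a random KxK matrix is nonsingular with probability one
Z = X @ P

b = np.linalg.solve(X.T @ X, X.T @ y)
c = np.linalg.solve(Z.T @ Z, Z.T @ y)

print(np.allclose(c, np.linalg.solve(P, b)))  # c = P^(-1) b
print(np.allclose(Z @ c, X @ b))              # identical fitted values
print(np.allclose(y - Z @ c, y - X @ b))      # identical residuals, hence identical e'e and R2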
Understanding Linear Transformation

Xb is the projection of y into the column space of X. Zc is the projection of y
into the column space of Z. But since the columns of Z are just linear
combinations of those of X, the column space of Z must be identical to that of X.
Therefore, the projection of y into the former must be the same as the latter,
which produces the other results.

What are the practical implications of this result?
  Transformation does not affect the fit of a model to a body of data.
  Transformation does affect the "estimates." If b is an estimate of something (β),
  then c cannot be an estimate of β - it must be an estimate of P-1β, which might
  have no meaning at all.
ECONOMETRIC 911
I have a simple question for you. Yesterday, I was
estimating a regional production function with yearly
dummies. The coefficients of the dummies are usually
interpreted as a measure of technical change with
respect to the base year (excluded dummy variable).
However, I felt that it could be more interesting to
redefine the dummy variables in such a way that the
coefficient could measure technical change from one
year to the next. You could get the same result by
subtracting two coefficients in the original regression but
you would have to compute the standard error of the
difference if you want to do inference.
Is this a well known procedure?
Production Function for 247 Dairy Farms: 1993-1998.
A model of the probability that an individual will visit the doctor at least
once in the survey year.
Reducing the Dimensionality of X

Principal Components

Z = XC
  Fewer columns than X
  Includes as much 'variation' of X as possible
  Columns of Z are orthogonal

Why do we do this?
  Collinearity
  Combine variables of ambiguous identity, such as test scores as measures of 'ability'

How do we do this? Later in the course. Requires some further results from
matrix algebra.
What is a Principal Component?
X = a data matrix (deviations from means)
z = Xp = a linear combination of the columns of X.
Choose p to maximize the variation of z.
How? p = the eigenvector that corresponds to the largest eigenvalue of X'X.
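A numpy sketch (simulated data) of the computation: the first principal component is the linear combination built from the eigenvector of X'X (in deviations from means) with the largest eigenvalue.

import numpy as np

rng = np.random.default_rng(9)
N = 100
X = rng.normal(size=(N, 4))
X[:, 1] += 0.8 * X[:, 0]                # build in some correlation

Xd = X - X.mean(axis=0)                 # deviations from means

# Eigenvector of X'X with the largest eigenvalue
vals, vecs = np.linalg.eigh(Xd.T @ Xd)  # eigh returns ascending eigenvalues for symmetric matrices
p = vecs[:, -1]                         # weight vector for the first principal component

z = Xd @ p                              # the first principal component
print(vals[-1], z @ z)                  # z'z equals the largest eigenvalue: maximal variation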
+----------------------------------------------------+
| Movie Regression. Opening Week Box for 62 Films    |
| Ordinary least squares regression                  |
| LHS=LOGBOX  Mean                =    16.47993      |
|             Standard deviation  =    .9429722      |
|             Number of observs.  =          62      |
| Residuals   Sum of squares      =    20.54972      |
|             Standard error of e =    .6475971      |
| Fit         R-squared           =    .6211405      |
|             Adjusted R-squared  =    .5283586      |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|Constant|  12.5388***  |    .98766      | 12.695 |  .0000 |          |
|LOGBUDGT|   .23193     |    .18346      |  1.264 |  .2122 |  3.71468 |
|STARPOWR|   .00175     |    .01303      |   .135 |  .8935 |  18.0316 |
|SEQUEL  |   .43480     |    .29668      |  1.466 |  .1492 |   .14516 |
|MPRATING|  -.26265*    |    .14179      | -1.852 |  .0700 |  2.96774 |
|ACTION  |  -.83091***  |    .29297      | -2.836 |  .0066 |   .22581 |
|COMEDY  |  -.03344     |    .23626      |  -.142 |  .8880 |   .32258 |
|ANIMATED|  -.82655**   |    .38407      | -2.152 |  .0363 |   .09677 |
|HORROR  |   .33094     |    .36318      |   .911 |  .3666 |   .09677 |
|   4 INTERNET BUZZ VARIABLES                                         |
|LOGADCT |   .29451**   |    .13146      |  2.240 |  .0296 |  8.16947 |
|LOGCMSON|   .05950     |    .12633      |   .471 |  .6397 |  3.60648 |
|LOGFNDGO|   .02322     |    .11460      |   .203 |  .8403 |  5.95764 |
|CNTWAIT3|  2.59489***  |    .90981      |  2.852 |  .0063 |   .48242 |
+--------+--------------+----------------+--------+--------+----------+
+----------------------------------------------------+
| Ordinary least squares regression                  |
| LHS=LOGBOX  Mean                =    16.47993      |
|             Standard deviation  =    .9429722      |
|             Number of observs.  =          62      |
| Residuals   Sum of squares      =    25.36721      |
|             Standard error of e =    .6984489      |
| Fit         R-squared           =    .5323241      |
|             Adjusted R-squared  =    .4513802      |
+----------------------------------------------------+
+--------+--------------+----------------+--------+--------+----------+
|Variable| Coefficient  | Standard Error |t-ratio |P[|T|>t]| Mean of X|
+--------+--------------+----------------+--------+--------+----------+
|Constant|  11.9602***  |    .91818      | 13.026 |  .0000 |          |
|LOGBUDGT|   .38159**   |    .18711      |  2.039 |  .0465 |  3.71468 |
|STARPOWR|   .01303     |    .01315      |   .991 |  .3263 |  18.0316 |
|SEQUEL  |   .33147     |    .28492      |  1.163 |  .2500 |   .14516 |
|MPRATING|  -.21185     |    .13975      | -1.516 |  .1356 |  2.96774 |
|ACTION  |  -.81404**   |    .30760      | -2.646 |  .0107 |   .22581 |
|COMEDY  |   .04048     |    .25367      |   .160 |  .8738 |   .32258 |
|ANIMATED|  -.80183*    |    .40776      | -1.966 |  .0546 |   .09677 |
|HORROR  |   .47454     |    .38629      |  1.228 |  .2248 |   .09677 |
|PCBUZZ  |   .39704***  |    .08575      |  4.630 |  .0000 |  9.19362 |
+--------+--------------+----------------+--------+--------+----------+