Lecture 10:
Heteroskedasticity
Econ 488
Order of Testing
1. Omitted variables and incorrect functional form (Adjusted R²)
2. Either A or B, but not both
A. Serial Correlation (Durbin-Watson)
B. Heteroskedasticity (Park’s Test, White’s Test)
3. Multicollinearity (Correlation Matrix, VIF)
4. Irrelevant Variables (t-test)
Homoskedasticity
Ideal Case: Homoskedasticity
Error variance σ² is constant across the sample
σ² measures the dispersion of the dependent variable around the regression line
Homoskedasticity means that the dispersion around the average relationship between the dependent and independent variables is the same throughout the sample
[Figure: Homoskedasticity]
Heteroskedasticity
Heteroskedasticity (or heteroscedasticity) occurs when σ² is not constant across the sample
The dispersion of the dependent variable around the regression line is not constant.
[Figures: Heteroskedasticity]
Why do we care?
If we don’t fix heteroskedasticity:
Coefficients are not efficient (not minimum variance)
Estimated standard errors are biased and inconsistent, meaning the t-stats are not right!
When can it occur?
Whenever the dispersion around the regression line differs within the sample
This means the relationship between the dependent variable and the independent variable is tighter in some parts of the sample than in others
Example: MLB Payroll and Market Size
2008 MLB Payrolls
Large markets (population > 5,000,000):
Mean: $104,000,000
Std Dev: $44,600,000
Min: $21,800,000 (Florida Marlins)
Max: $209,000,000 (NY Yankees)
Small markets (population < 5,000,000):
Mean: $78,800,000
Std Dev: $28,300,000
Min: $43,800,000 (Tampa Bay Rays)
Max: $139,000,000 (Detroit Tigers)
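To put a number on the difference in dispersion, one can square the ratio of the two standard deviations above. This is only a descriptive comparison of the two group summaries, not a formal test for heteroskedasticity.

```python
# Standard deviations from the slide (2008 MLB payrolls, millions of dollars).
large_market_sd = 44.6  # population > 5,000,000
small_market_sd = 28.3  # population < 5,000,000

# Ratio of the two variances: how much more spread out large-market payrolls are.
variance_ratio = (large_market_sd / small_market_sd) ** 2
print(f"Variance ratio (large / small): {variance_ratio:.2f}")  # prints 2.48
```

Large-market payrolls have roughly two and a half times the variance of small-market payrolls.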
Heteroskedasticity
Note: The same principle applies when observations are groups that differ in size, e.g.:
States (population)
Countries (population)
Colleges (enrollment)
Companies (sales)
Etc.
Another Example
Household income and consumption.
A. Low-income households
• Little flexibility in spending
• Most income spent on necessities:
  • Food, shelter, clothing, transportation, utilities
• Little dispersion of consumption around mean consumption
• Small σ²
Household Income vs. Consumption
B. High-income households
• More flexibility in spending
• Once necessities are purchased, much income remains to be spent in different ways:
  • Big spenders
  • Savers and investors
• Large dispersion of consumption around mean consumption
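The household story can be mimicked with simulated data. The sketch below assumes a hypothetical consumption function, consumption = 10 + 0.7 × income + error, where the error's standard deviation is 10% of income; all numbers are made up for illustration. It compares how widely consumption scatters around that line for low- and high-income households.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical incomes (thousands of dollars per year).
income = rng.uniform(20, 200, size=n)

# Assumed consumption function: 10 + 0.7 * income, with an error whose
# standard deviation is 10% of income (so high earners scatter more).
error = rng.normal(0.0, 0.1 * income)
consumption = 10 + 0.7 * income + error

# Deviation of each household's consumption from the line.
deviation = consumption - (10 + 0.7 * income)
low = deviation[income < 60]    # low-income households
high = deviation[income > 160]  # high-income households
print(f"SD around the line, low income:  {low.std():.2f}")
print(f"SD around the line, high income: {high.std():.2f}")
```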
Pure vs. Impure Heteroskedasticity
Impure – Occurs when the regression is not correctly specified
E.g., omitted variables can cause heteroskedasticity
Pure – Occurs due to the nature of the data
Consequences
If we ignore heteroskedasticity, coefficient
estimates are:
Unbiased – OK!
Consistent – OK!
Inefficient – Not OK.
t-tests are inaccurate.
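A small Monte Carlo simulation illustrates these consequences. Under an assumed, hypothetical data-generating process in which the error standard deviation grows with x, the OLS slope stays centered on its true value, but the usual standard-error formula (which assumes a constant σ²) understates how much the slope actually varies from sample to sample. That understatement is why the t-tests are off.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 500, 2000
beta0, beta1 = 1.0, 2.0

x = np.linspace(1.0, 10.0, n)              # regressor held fixed across samples
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

slopes = np.empty(reps)
conventional_se = np.empty(reps)
for r in range(reps):
    eps = rng.normal(0.0, 0.5 * x**2)      # heteroskedastic: error SD grows with x
    y = beta0 + beta1 * x + eps
    b = XtX_inv @ X.T @ y                  # OLS estimates
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)           # usual estimate of a constant sigma^2
    slopes[r] = b[1]
    conventional_se[r] = np.sqrt(s2 * XtX_inv[1, 1])

print(f"Mean slope estimate:          {slopes.mean():.3f} (true value {beta1})")
print(f"Actual SD of slope estimates: {slopes.std():.3f}")
print(f"Average conventional SE:      {conventional_se.mean():.3f}")
```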
Detection
Tests detect heteroskedasticity
But they won’t distinguish between the pure and impure types
If a test uncovers heteroskedasticity: STOP!
Try to decide whether you have an omitted variable.
If you do…
Include it in your model, and then retest for heteroskedasticity
Detection
OR… if you don’t have an omitted variable:
Employ one of the remedies we’ll discuss
After you “fix” the problem,
Test again
If you still have heteroskedasticity,
It might be the impure type
Detection
Plots
1) Estimate the model and save the residuals
2) Plot the residuals against each independent variable separately
Example: data3-6.gdt
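gretl draws these residual plots directly. As a rough, text-only stand-in for eyeballing a plot, the sketch below fits OLS in Python with numpy on hypothetical simulated data (the error SD is assumed proportional to x1 and unrelated to x2), then reports the residual standard deviation within four equal-count bins of each independent variable. A spread that rises across the bins is the numerical counterpart of an "increasing" residual pattern.

```python
import numpy as np

def ols_residuals(X, y):
    """Fit y on X by OLS (X already includes a constant column); return residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def residual_spread_by_bin(resid, var, bins=4):
    """Residual SD within equal-count bins of `var`, from smallest to largest values."""
    order = np.argsort(var)
    return [float(resid[idx].std()) for idx in np.array_split(order, bins)]

rng = np.random.default_rng(1)
n = 2000
x1 = rng.uniform(1, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
# Hypothetical data: error SD proportional to x1, unrelated to x2.
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0.0, 0.5 * x1)

X = np.column_stack([np.ones(n), x1, x2])
resid = ols_residuals(X, y)

spread_x1 = residual_spread_by_bin(resid, x1)
spread_x2 = residual_spread_by_bin(resid, x2)
print("residual SD by x1 bin:", [round(s, 2) for s in spread_x1])
print("residual SD by x2 bin:", [round(s, 2) for s in spread_x2])
```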
[Figures: residual plot patterns (V on its side; increasing or decreasing; rainbow or inverted rainbow)]
Park Test
If there is heteroskedasticity, then…
$\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$
$\varepsilon_i$ = error term
$\sigma^2$ = variance of the homoskedastic error term
$Z_i$ = proportionality factor
If you know something about Z, you can use the Park test.
Find a variable that is related to the heteroskedasticity (e.g., population)
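One way to see why the Park test works in logs: taking logs of the assumed variance form turns it into a relationship that is linear in $\ln Z_i$.

$$\ln \mathrm{Var}(\varepsilon_i) = \ln \sigma^2 + 2 \ln Z_i$$

Replacing the unobserved $\mathrm{Var}(\varepsilon_i)$ with the squared residual $e_i^2$ gives a regression of $\ln(e_i^2)$ on $\ln(Z_i)$. Under this particular form the slope on $\ln Z_i$ corresponds to the exponent 2; if the variance did not depend on $Z_i$, the slope would be 0, which is what the significance test checks.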
Park Test
1. Run the regression and obtain the residuals.
2. Run the following regression:
$\ln(e_i^2) = \alpha_0 + \alpha_1 \ln(Z_i) + u_i$
Where:
$e_i$ = residuals from the regression in step 1
$Z_i$ = best choice of proportionality factor in the data
$u_i$ = classical error term
3. Test the significance of $\ln(Z_i)$, i.e., a t-test of $\alpha_1 = 0$.
o If significant, there is evidence of heteroskedasticity.
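The three Park-test steps can be sketched in Python with numpy. The data are simulated under the slide's assumed form $\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$ (with σ = 1) using a hypothetical proportionality factor `pop`; the helper names `ols` and `park_test` are illustrative, not gretl commands. The statistic returned is the ordinary t-statistic on $\ln(Z_i)$, which for a sample this large can be compared with a two-sided 5% critical value of about 1.96.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, residuals and conventional standard errors (X includes a constant)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - k)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, resid, se

def park_test(X, y, z):
    """Park test sketch: returns (alpha1, t-statistic on ln(Z))."""
    _, resid, _ = ols(X, y)                               # step 1: residuals
    aux_X = np.column_stack([np.ones(len(z)), np.log(z)])
    alpha, _, se = ols(aux_X, np.log(resid ** 2))         # step 2: ln(e^2) on ln(Z)
    return alpha[1], alpha[1] / se[1]                     # step 3: t-test on alpha1

rng = np.random.default_rng(7)
n = 500
pop = rng.uniform(1, 20, size=n)          # hypothetical proportionality factor Z
x = rng.normal(10, 2, size=n)
eps = rng.normal(0.0, 1.0, size=n) * pop  # Var(eps_i) = sigma^2 * Z_i^2 with sigma = 1
y = 3.0 + 1.5 * x + eps

X = np.column_stack([np.ones(n), x])
alpha1, t_stat = park_test(X, y, pop)
print(f"alpha1 = {alpha1:.2f}, t = {t_stat:.1f}")
```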
Park Test
Problem: We don’t always have a good Z
So, we can use White’s Test
White’s Test
$H_0$: No heteroskedasticity
$H_A$: Heteroskedasticity
White’s Test
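1) Estimate the equation
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
2) Save the residual
o $e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}$
and square it.
3) Regress the squared residual on a constant, $X_1$, $X_2$, $X_1^2$, $X_2^2$, and $X_1 X_2$ (every X, every squared X, and every cross product of X's)
$e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i} X_{2i} + v_i$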
White’s Test
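4) Compute $N R^2$
o N = sample size
o $R^2$ = unadjusted $R^2$ from the auxiliary regression
5) Reject the null if
o $N R^2$ is greater than the critical $\chi^2$ (chi-square) value with 5 degrees of freedom
o Because there are 5 independent variables (not counting the constant) in the auxiliary regression (step 3)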
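Steps 1 to 5 can be sketched in Python with numpy. The function `white_test` below is a hypothetical helper, not a gretl command; it builds the auxiliary regressors for any number of X's, and with `cross_terms=False` it drops the cross products. The data are simulated with an error SD proportional to x1, and the 5% critical value of χ² with 5 degrees of freedom (about 11.07) is hard-coded to keep the example free of extra dependencies.

```python
import numpy as np
from itertools import combinations

def r_squared(X, y):
    """Unadjusted R^2 from an OLS fit of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def white_test(X, y, cross_terms=True):
    """White's test sketch. X holds the regressors WITHOUT a constant column.
    Returns (N * R^2, degrees of freedom)."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    e2 = (y - Xc @ beta) ** 2                          # steps 1-2: squared residuals
    cols = [X[:, j] for j in range(k)] + [X[:, j] ** 2 for j in range(k)]
    if cross_terms:                                    # version 2 skips these
        cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    aux = np.column_stack([np.ones(n)] + cols)         # step 3: auxiliary regressors
    return n * r_squared(aux, e2), len(cols)           # step 4: N * R^2

CHI2_5DF_CRIT_5PCT = 11.07   # chi-square critical value, 5 df, 5% level

rng = np.random.default_rng(11)
n = 500
x1 = rng.uniform(1, 10, size=n)
x2 = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(0.0, x1)   # error SD grows with x1
X = np.column_stack([x1, x2])

stat, df = white_test(X, y)
print(f"N*R^2 = {stat:.1f} on {df} df; reject H0: {stat > CHI2_5DF_CRIT_5PCT}")
```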
White’s Test
If you have 3 independent variables, the auxiliary regression will have 9 independent variables:
$X_1, X_2, X_3, X_1^2, X_2^2, X_3^2, X_1 X_2, X_2 X_3, X_1 X_3$
If you have 6 independent variables, the auxiliary regression will have 27 independent variables!
This can get out of hand quickly.
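The counts on this slide follow from a simple formula: with $k$ original independent variables, the full auxiliary regression contains $k$ levels, $k$ squares, and one cross product for each pair of X's.

$$k + k + \binom{k}{2} = 2k + \frac{k(k-1)}{2} = \frac{k(k+3)}{2}$$

For $k = 2, 3, 6$ this gives 5, 9, and 27, matching the slides; the count grows with the square of $k$.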
White’s Test Version 2
Same as before, except the auxiliary regression uses only the $X$ and $X^2$ terms (no cross products)
Use when you have a lot of independent variables.
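A short, hypothetical helper compares the number of auxiliary regressors (and hence the χ² degrees of freedom) in the full White test and in version 2, which keeps only the $2k$ levels and squares.

```python
from math import comb

def white_aux_regressors(k, cross_terms=True):
    """Regressors (excluding the constant) in White's auxiliary regression,
    i.e. the chi-square degrees of freedom, for k original independent variables."""
    count = 2 * k                     # levels X_j and squares X_j^2
    if cross_terms:
        count += comb(k, 2)           # cross products X_i * X_j with i < j
    return count

for k in (2, 3, 6, 10):
    print(k, white_aux_regressors(k), white_aux_regressors(k, cross_terms=False))
```

With 10 regressors the full version needs 65 auxiliary terms, while version 2 needs only 20.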
Remedies For Heteroskedasticity
1. Heteroskedasticity-Corrected Standard Errors
o Makes the standard errors consistent, so when N is large, the standard errors are approximately correct. The coefficient estimates themselves do not change.
o In gretl, just check the “robust standard error” box when running a regression
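What the robust option computes can be sketched directly. The function below implements White's HC0 "sandwich" estimator, $(X'X)^{-1} \left(\sum_i e_i^2 x_i x_i'\right) (X'X)^{-1}$, on hypothetical simulated data. Statistical packages, gretl included, may apply a small-sample variant (such as HC1), so their numbers can differ slightly from this sketch. The coefficients are plain OLS; only the standard errors change.

```python
import numpy as np

def ols_robust(X, y):
    """OLS coefficients with conventional and White (HC0) robust standard errors.
    X must include a constant column."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    conventional = np.sqrt(np.diag(XtX_inv) * (resid @ resid) / (n - k))
    meat = X.T @ (X * resid[:, None] ** 2)          # sum of e_i^2 * x_i x_i'
    robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, conventional, robust

rng = np.random.default_rng(2)
n = 2000
x = np.linspace(1.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x**2)     # error SD grows with x
X = np.column_stack([np.ones(n), x])

beta, se_conv, se_robust = ols_robust(X, y)
print(f"slope = {beta[1]:.3f}, conventional SE = {se_conv[1]:.4f}, robust SE = {se_robust[1]:.4f}")
```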
Remedies For Heteroskedasticity
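2. Weighted Least Squares (WLS)
(1) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
(2) $\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$
Given (2), eqn. (1) is equivalent to
(3) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + Z_i u_i$
where $u_i$ is a classical error term with constant variance $\sigma^2$
So we can divide through by $Z_i$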
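The reason dividing by $Z_i$ helps: treating $Z_i$ as fixed, the transformed error has constant variance.

$$\mathrm{Var}\left(\frac{\varepsilon_i}{Z_i}\right) = \frac{\mathrm{Var}(\varepsilon_i)}{Z_i^2} = \frac{\sigma^2 Z_i^2}{Z_i^2} = \sigma^2$$

Equivalently, since $\varepsilon_i = Z_i u_i$ in (3), $\varepsilon_i / Z_i = u_i$, so OLS on the transformed equation again satisfies the constant-variance assumption.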
Remedies For Heteroskedasticity
Step one:
$\frac{Y_i}{Z_i} = \frac{\beta_0}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$
Step two: estimate by OLS
Caution about step 2: there are two cases.
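Remedies For Heteroskedasticity
Case 1: Z is not in the original equation
Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
New: $\frac{Y_i}{Z_i} = \beta_0 \frac{1}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$
What’s Missing?
The constant!
Solution: Add a constant
Better:
$\frac{Y_i}{Z_i} = \alpha_0 + \beta_0 \frac{1}{Z_i} + \beta_1 \frac{X_{1i}}{Z_i} + \beta_2 \frac{X_{2i}}{Z_i} + u_i$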
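Case 1 can be sketched as OLS on the transformed variables, including the added constant $\alpha_0$. The data are simulated from eqn. (3) with hypothetical coefficients $\beta_0 = 2$, $\beta_1 = 0.5$, $\beta_2 = -1$ and a made-up proportionality factor `z`; the helper name `wls_case1` is illustrative. Because the simulated model is correctly specified, the estimated $\alpha_0$ should come out near zero, and the coefficient on $1/Z_i$ estimates $\beta_0$.

```python
import numpy as np

def wls_case1(y, x1, x2, z):
    """Case 1 WLS sketch: divide every term by Z and add a constant.
    Returns [alpha0, beta0, beta1, beta2]."""
    X_star = np.column_stack([np.ones_like(z), 1.0 / z, x1 / z, x2 / z])
    coef, *_ = np.linalg.lstsq(X_star, y / z, rcond=None)
    return coef

rng = np.random.default_rng(3)
n = 2000
z = rng.uniform(1, 10, size=n)          # hypothetical proportionality factor
x1 = rng.normal(5, 2, size=n)
x2 = rng.normal(0, 1, size=n)
u = rng.normal(0, 1, size=n)            # classical error
y = 2.0 + 0.5 * x1 - 1.0 * x2 + z * u   # eqn. (3): Var(eps_i) = Z_i^2

alpha0, b0, b1, b2 = wls_case1(y, x1, x2, z)
print(f"alpha0 = {alpha0:.3f}, beta0 = {b0:.3f}, beta1 = {b1:.3f}, beta2 = {b2:.3f}")
```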
Remedies For Heteroskedasticity
Case 2: Z is in the original equation
Suppose $X_1$ is Z
Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
New: $\frac{Y_i}{X_{1i}} = \beta_0 \frac{1}{X_{1i}} + \beta_1 + \beta_2 \frac{X_{2i}}{X_{1i}} + u_i$
What’s different about this equation?
One of the slope coefficients in the original equation becomes an intercept!
This happens because $X_{1i}/X_{1i} = 1$
Remedies For Heteroskedasticity
That is:
The intercept in the new equation is the same as the slope $\beta_1$ in the original equation.
What should you look at in the new equation to find the coefficient of $X_1$?
The constant.
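For Case 2 the same approach applies with $Z_i = X_{1i}$. In the sketch below (hypothetical data with $\beta_0 = 4$, $\beta_1 = 0.8$, $\beta_2 = 1.5$ and $\mathrm{Var}(\varepsilon_i) = X_{1i}^2$), the constant of the transformed regression estimates $\beta_1$, the original slope on $X_1$, while the coefficient on $1/X_{1i}$ estimates the original intercept $\beta_0$. The helper name `wls_case2` is illustrative.

```python
import numpy as np

def wls_case2(y, x1, x2):
    """Case 2 WLS sketch (Z = X1): divide every term by X1.
    Returns [constant, coefficient on 1/X1, coefficient on X2/X1]."""
    X_star = np.column_stack([np.ones_like(x1), 1.0 / x1, x2 / x1])
    coef, *_ = np.linalg.lstsq(X_star, y / x1, rcond=None)
    return coef

rng = np.random.default_rng(5)
n = 2000
x1 = rng.uniform(1, 10, size=n)
x2 = rng.normal(0, 1, size=n)
eps = x1 * rng.normal(0, 1, size=n)     # Var(eps_i) = X1i^2
y = 4.0 + 0.8 * x1 + 1.5 * x2 + eps

const, coef_inv_x1, coef_x2 = wls_case2(y, x1, x2)
# const estimates beta1 (0.8); coef_inv_x1 estimates beta0 (4.0); coef_x2 estimates beta2 (1.5)
print(f"constant = {const:.3f}, on 1/X1 = {coef_inv_x1:.3f}, on X2/X1 = {coef_x2:.3f}")
```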
Remedies For Heteroskedasticity
Example: saving.gdt (weight by income)