Lecture 10: Heteroskedasticity
Econ 488

Order of Testing
1. Omitted variables and incorrect functional form (adjusted R²)
2. Either A or B, but not both:
   A. Serial correlation (Durbin-Watson)
   B. Heteroskedasticity (Park's test, White's test)
3. Multicollinearity (correlation matrix, VIF)
4. Irrelevant variables (t-test)

Homoskedasticity
Ideal case: homoskedasticity
• The error variance σ² is constant across the sample.
• σ² measures the dispersion of the dependent variable around the regression line.
• Homoskedasticity means that the relationship between the dependent and independent variables is equally tight (equally noisy) throughout the sample.

Heteroskedasticity
• Heteroskedasticity (or heteroscedasticity) occurs when σ² is not constant across the sample.
• The dispersion of the dependent variable around the regression line is not constant.

Why do we care?
If we don't fix heteroskedasticity:
• Coefficient estimates are not efficient (not minimum variance).
• Estimated standard errors are biased and inconsistent, which means the t-statistics are not right!

When can it occur?
Whenever the dispersion around the regression line differs within the sample, that is, whenever the relationship between the dependent and independent variables is tighter in some parts of the sample than in others.

Example: MLB Payroll and Market Size (2008 MLB payrolls)
Large markets (population > 5,000,000):
• Mean: $104,000,000; std. dev.: $44,600,000
• Min: $21,800,000 (Florida Marlins); max: $209,000,000 (NY Yankees)
Small markets (population < 5,000,000):
• Mean: $78,800,000; std. dev.: $28,300,000
• Min: $43,800,000 (Tampa Bay Rays); max: $139,000,000 (Detroit Tigers)
Payrolls are much more spread out in large markets (std. dev. $44.6 million vs. $28.3 million), so a regression of payroll on market size would have a larger error variance for large markets.

Note: the same principle applies when observations are groups that differ in size, e.g.:
• States (population)
• Countries (population)
• Colleges (enrollment)
• Companies (sales)
• Etc.

Another Example: Household Income and Consumption
Low-income households:
• Little flexibility in spending.
• Most income is spent on necessities: food, shelter, clothing, transportation, utilities.
• Little dispersion of consumption around mean consumption: small σ².
High-income households:
• More flexibility in spending.
• Once necessities are purchased, much remains to be spent in different ways: big spenders, savers and investors.
• Large dispersion of consumption around the mean: large σ².

Pure vs. Impure Heteroskedasticity
• Impure: occurs when the regression is not correctly specified. For example, omitted variables can cause heteroskedasticity.
• Pure: occurs due to the nature of the data.

Consequences
If we ignore pure heteroskedasticity, the coefficient estimates are:
• Unbiased: OK!
• Consistent: OK!
• Inefficient: not OK.
The t-tests are also inaccurate, because the estimated standard errors are biased.

Detection
Tests detect heteroskedasticity, but they won't distinguish between the pure and impure types.
If a test uncovers heteroskedasticity, STOP! Try to decide whether you have an omitted variable. If you do, include it in your model and then retest for heteroskedasticity.
OR, if you don't have an omitted variable: employ one of the remedies we'll discuss. After you "fix" the problem, test again. If you still have heteroskedasticity, it might be the impure type.

Detection: Plots
1) Estimate the model and save the residuals.
2) Plot the residuals against each independent variable separately.
Example: data3-6.gdt
Patterns that signal heteroskedasticity: a V on its side; residual spread increasing or decreasing; a rainbow or inverted rainbow.

Park Test
If there is heteroskedasticity, then
$\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$
where $\varepsilon_i$ = error term, $\sigma^2$ = variance of the homoskedastic error term, and $Z_i$ = proportionality factor.
If you know something about Z, you can use the Park test: find a variable that is related to the heteroskedasticity (e.g., population).
1. Run the regression and obtain the residuals.
2. Run the following regression:
   $\ln(e_i^2) = \alpha_0 + \alpha_1 \ln(Z_i) + u_i$
   where $e_i$ = residuals from the regression in step 1, $Z_i$ = best choice of proportionality factor in the data, and $u_i$ = classical error term.
3. Finally,
test the significance of $\ln(Z_i)$. If it is significant, there is evidence of heteroskedasticity.

Park Test: Problem
We don't always have a good Z, so we can use White's test instead.

White's Test
H0: no heteroskedasticity
HA: heteroskedasticity
1) Estimate the equation $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$.
2) Save the residual $e_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}$ and square it.
3) Regress the squared residual on a constant, $X_1$, $X_2$, $X_1^2$, $X_2^2$ and $X_1X_2$ (the levels, squares and cross products of the X's):
   $e_i^2 = \alpha_0 + \alpha_1 X_{1i} + \alpha_2 X_{2i} + \alpha_3 X_{1i}^2 + \alpha_4 X_{2i}^2 + \alpha_5 X_{1i}X_{2i} + v_i$
4) Compute $N R^2$, where N = sample size and $R^2$ = unadjusted $R^2$ from the auxiliary regression in step 3.
5) Reject the null if $N R^2$ exceeds the critical $\chi^2$ (chi-square) value with 5 degrees of freedom, because there are 5 independent variables in the auxiliary regression (step 3).

If you have 3 independent variables, the auxiliary regression will have 9 independent variables: $X_1, X_2, X_3, X_1^2, X_2^2, X_3^2, X_1X_2, X_2X_3, X_1X_3$.
If you have 6 independent variables, the auxiliary regression will have 27 independent variables! This can get out of hand quickly.

White's Test, Version 2
Same as before, except that the auxiliary regression uses only the X and X² terms (no cross products). Use it when you have a lot of independent variables.

Remedies for Heteroskedasticity
1. Heteroskedasticity-corrected standard errors
• Fixes the consistency of the standard errors, so when N is large the standard errors are correct.
• In gretl, just check the “robust standard error” box when running a regression.

2. Weighted Least Squares (WLS)
(1) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
(2) $\mathrm{Var}(\varepsilon_i) = \sigma^2 Z_i^2$
Given (2), eqn. (1) is equivalent to
(3) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + Z_i u_i$
where $u_i$ is a homoskedastic error term with variance $\sigma^2$. So we can divide through by $Z_i$.
Step one: $\dfrac{Y_i}{Z_i} = \dfrac{\beta_0}{Z_i} + \beta_1\dfrac{X_{1i}}{Z_i} + \beta_2\dfrac{X_{2i}}{Z_i} + u_i$
Step two: estimate by OLS.
Caution about step two: there are two cases.

Case 1: Z is not in the original equation
Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
New: $\dfrac{Y_i}{Z_i} = \beta_0\dfrac{1}{Z_i} + \beta_1\dfrac{X_{1i}}{Z_i} + \beta_2\dfrac{X_{2i}}{Z_i} + u_i$
What's missing? The constant!
Solution: add a constant.
Better: $\dfrac{Y_i}{Z_i} = \alpha_0 + \beta_0\dfrac{1}{Z_i} + \beta_1\dfrac{X_{1i}}{Z_i} + \beta_2\dfrac{X_{2i}}{Z_i} + u_i$

Case 2: Z is in the original equation
Suppose $X_1$ is Z.
Old: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
New: $\dfrac{Y_i}{X_{1i}} = \beta_0\dfrac{1}{X_{1i}} + \beta_1 + \beta_2\dfrac{X_{2i}}{X_{1i}} + u_i$
What's different about this equation? One of the slope coefficients in the original equation becomes an intercept! This happens because $X_{1i}/X_{1i} = 1$.
That is, the intercept in the new equation is the same as the slope $\beta_1$ in the original equation.
What should you look at in the new equation to find the coefficient of $X_1$? The constant.

Example: saving.gdt (weight by income)
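The three Park test steps described in this lecture can be sketched in Python with NumPy. This is a minimal illustration on simulated data, not the gretl procedure used in class; the data-generating process (error standard deviation equal to a hypothetical proportionality factor `z`), the seed, the sample size, and the helper names `ols` and `park_test` are all assumptions made for the example.

```python
import numpy as np

def ols(y, X):
    """OLS of y on a constant plus the column(s) of X.
    Returns (coefficients, residuals, classical standard errors)."""
    Xc = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    n, k = Xc.shape
    sigma2 = resid @ resid / (n - k)                  # estimate of the error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))
    return beta, resid, se

def park_test(resid, z):
    """Park test: regress ln(e_i^2) on a constant and ln(Z_i).
    Returns (estimate of alpha_1, its t-statistic)."""
    alpha, _, se = ols(np.log(resid ** 2), np.log(z))
    return alpha[1], alpha[1] / se[1]

# Hypothetical simulated data with Var(eps_i) = sigma^2 * Z_i^2 and sigma = 1
rng = np.random.default_rng(0)
n = 500
z = rng.uniform(1, 10, n)        # stand-in for a size variable such as population
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + z * rng.normal(size=n)

# Step 1: run the regression and keep the residuals
_, e, _ = ols(y, np.column_stack([x1, x2]))
# Steps 2 and 3: auxiliary regression, then the t-test on ln(Z)
alpha1, t_alpha1 = park_test(e, z)
print(f"alpha_1 = {alpha1:.2f}, t = {t_alpha1:.1f}")
```

Because $\ln(\varepsilon_i^2) = \ln\sigma^2 + 2\ln Z_i + \ln\big(\varepsilon_i^2/(\sigma^2 Z_i^2)\big)$ under this assumed variance, $\hat\alpha_1$ should land near 2 here, and a large t-statistic is the evidence of heteroskedasticity.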
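White's test from this lecture (both the full version and Version 2) can be sketched with NumPy alone. This is a hedged illustration, not gretl's implementation: the simulated data, seed, and the helper names `white_test` and `chi2_sf` are assumptions, and the chi-square upper-tail probability is computed with the standard recurrence $Q(x;k+2) = Q(x;k) + (x/2)^{k/2}e^{-x/2}/\Gamma(k/2+1)$ so that no statistics library is needed.

```python
import itertools
import math
import numpy as np

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), using
    Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0:
        return 1.0
    q, k = (math.erfc(math.sqrt(x / 2)), 1) if df % 2 else (math.exp(-x / 2), 2)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q

def white_test(y, X, cross_products=True):
    """White's test. Returns (N * R^2, degrees of freedom, p-value).
    cross_products=False gives Version 2 (levels and squares only)."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    e2 = (y - Xc @ beta) ** 2                           # steps 1-2: squared residuals
    terms = [X[:, j] for j in range(k)] + [X[:, j] ** 2 for j in range(k)]
    if cross_products:
        terms += [X[:, a] * X[:, b] for a, b in itertools.combinations(range(k), 2)]
    A = np.column_stack([np.ones(n)] + terms)             # step 3: auxiliary regressors
    g, *_ = np.linalg.lstsq(A, e2, rcond=None)
    u = e2 - A @ g
    r2 = 1 - (u @ u) / np.sum((e2 - e2.mean()) ** 2)      # unadjusted R^2
    stat, df = n * r2, len(terms)                         # step 4: N * R^2
    return stat, df, chi2_sf(stat, df)                    # step 5: chi-square comparison

# Hypothetical data: the error standard deviation grows with x1
rng = np.random.default_rng(1)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + x1 * rng.normal(size=n)
X = np.column_stack([x1, x2])

stat, df, p = white_test(y, X)
stat2, df2, p2 = white_test(y, X, cross_products=False)
print(f"full:      NR^2 = {stat:.1f}, df = {df}, p = {p:.2g}")
print(f"version 2: NR^2 = {stat2:.1f}, df = {df2}, p = {p2:.2g}")
```

The degrees of freedom are simply the number of auxiliary regressors, so two X's give 5 (full) or 4 (Version 2), three give 9, and six give 27. If an X is a 0/1 dummy, its square duplicates it; this sketch does not detect that, so the df count would then be too high.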
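Heteroskedasticity-corrected ("robust") standard errors, remedy 1 in this lecture, replace the classical covariance matrix with White's sandwich estimator $(X'X)^{-1}\big(\sum_i e_i^2 x_i x_i'\big)(X'X)^{-1}$. The sketch below computes it by hand with NumPy, in the basic HC0 form and with the HC1 scaling $n/(n-k)$. Which variant gretl's robust option uses by default is not stated in the lecture, so the numbers are illustrative only; the data, seed, and the helper name `ols_robust` are assumptions.

```python
import numpy as np

def ols_robust(y, X, hc1=True):
    """OLS coefficients with classical and heteroskedasticity-consistent
    (White sandwich) standard errors."""
    Xc = np.column_stack([np.ones(len(y)), X])
    n, k = Xc.shape
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    beta = XtX_inv @ Xc.T @ y
    e = y - Xc @ beta
    classical = np.sqrt(np.diag((e @ e) / (n - k) * XtX_inv))
    meat = Xc.T @ (Xc * (e ** 2)[:, None])     # sum over i of e_i^2 * x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv              # HC0 sandwich
    if hc1:
        cov = cov * n / (n - k)                 # HC1 degrees-of-freedom scaling
    return beta, classical, np.sqrt(np.diag(cov))

# Hypothetical heteroskedastic data: error standard deviation proportional to x
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, n)
y = 1 + 2 * x + x * rng.normal(size=n)

beta, se_classical, se_robust = ols_robust(y, x)
print("coefficients:", beta.round(3))
print("classical SE:", se_classical.round(3))
print("robust SE:   ", se_robust.round(3))
```

Only the standard errors change; the coefficient estimates are the ordinary OLS ones, which matches the lecture's point that heteroskedasticity leaves the coefficients unbiased but makes the usual t-tests unreliable.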
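The two WLS cases in this lecture amount to dividing every term by $Z_i$ and running OLS on the transformed variables. The sketch below does that with NumPy on simulated data, assuming true values $\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = 3$ and $\mathrm{Var}(\varepsilon_i) = Z_i^2$; the data, seed, and the helper names `ols_const`, `wls_case1` and `wls_case2` are hypothetical.

```python
import numpy as np

def ols_const(y, X):
    """OLS of y on a constant plus the columns of X; returns the coefficients."""
    Xc = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef

def wls_case1(y, x1, x2, z):
    """Case 1 (Z not in the equation): regress Y/Z on a constant, 1/Z, X1/Z, X2/Z.
    Returns (alpha_0, beta_0, beta_1, beta_2); alpha_0 is the added constant."""
    return ols_const(y / z, np.column_stack([1 / z, x1 / z, x2 / z]))

def wls_case2(y, x1, x2):
    """Case 2 (Z is X1): regress Y/X1 on a constant, 1/X1, X2/X1.
    The constant estimates beta_1; the coefficient on 1/X1 estimates beta_0.
    Returns (beta_0, beta_1, beta_2)."""
    const, b0, b2 = ols_const(y / x1, np.column_stack([1 / x1, x2 / x1]))
    return b0, const, b2

rng = np.random.default_rng(3)
n = 5000
x2 = rng.normal(size=n)

# Case 1: hypothetical Z unrelated to the regressors
z = rng.uniform(1, 10, n)
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + z * rng.normal(size=n)
a0, b0, b1, b2 = wls_case1(y, x1, x2, z)
print(f"case 1: alpha_0={a0:.2f} beta_0={b0:.2f} beta_1={b1:.2f} beta_2={b2:.2f}")

# Case 2: Z is X1 itself (must be nonzero, here strictly positive)
x1_pos = rng.uniform(1, 10, n)
y2 = 1 + 2 * x1_pos + 3 * x2 + x1_pos * rng.normal(size=n)
c0, c1, c2 = wls_case2(y2, x1_pos, x2)
print(f"case 2: beta_0={c0:.2f} beta_1={c1:.2f} beta_2={c2:.2f}")
```

In case 1 the added constant $\alpha_0$ should be close to zero, since the true model has none; in case 2 the slope $\beta_1$ is read off the intercept of the transformed regression. Note that R² and residuals from these regressions refer to $Y_i/Z_i$, not to $Y_i$.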