Applied Econometrics
Martin Huber
Chair of Applied Econometrics - Evaluation of Public Policies
University of Fribourg

Overview

Review of the OLS assumptions:
MLR.1 linearity: $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$
MLR.2 i.i.d. (random) sampling
MLR.3 zero conditional mean of the errors: $E(u \mid x) = 0$
MLR.4 no perfect collinearity
MLR.5 homoskedasticity: $\mathrm{Var}(u \mid x) = \sigma^2$

Topic of this course: what if one or more of these assumptions are violated?
Violation of MLR.1: nonlinear models
Violation of MLR.2: non-random sampling
Violation of MLR.3: omitted variables and endogeneity due to measurement error: $E(u \mid x) \neq 0$
Violation of MLR.5: heteroskedasticity: $\mathrm{Var}(u \mid x) = \sigma^2(x)$

Contents of this lecture

1 Modelling nonlinearities using OLS
   General modelling approaches
   Ordinal variables
   Variables representing qualitative features
2 Modelling heterogeneity using OLS
   Dummy variables
   Interaction terms
   Tests for model heterogeneity
Reading: Wooldridge, Chapters 7.1-7.4

Data example

[table not recovered]

General modelling approaches

Polynomials (of order J):
$y = \beta_0 + \sum_{j=1}^{J} \beta_j x^j + u$ (1)

Dummy variables for each category:
$y = \beta_0 + \sum_{k=1}^{K-1} \beta_k \cdot 1(x = v_k) + u$ (2)

Here x is a discrete variable with K different values, $x \in \{v_1, v_2, \dots, v_{K-1}, v_K\}$. Individuals with $x = v_K$ are the reference group.

Regression with polynomials

[regression output not recovered]

As the dependent variable is log(y) rather than y, the coefficients (multiplied by 100) are interpreted as approximate percentage changes in y due to a one-unit change in the explanatory variable x. E.g., the coefficient 0.080 on educ implies that the wage of an otherwise comparable worker increases by about 8% if education is increased by one year.
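The 8% reading of the coefficient 0.080 is the usual first-order approximation for a log-level model. The sketch below compares it with the exact percentage change, $100 \cdot (e^{\beta \Delta x} - 1)$. The function names are hypothetical and chosen for illustration only; the coefficient 0.080 is the one quoted above.

```python
import math


def approx_pct_change(beta, dx=1.0):
    """Approximate % change in y for a change dx in x when the outcome is log(y)."""
    return 100.0 * beta * dx


def exact_pct_change(beta, dx=1.0):
    """Exact % change in y implied by log(y) = ... + beta * x."""
    return 100.0 * (math.exp(beta * dx) - 1.0)


beta_educ = 0.080  # coefficient on educ quoted in the text
print(round(approx_pct_change(beta_educ), 2))  # approximation: 100 * beta
print(round(exact_pct_change(beta_educ), 2))   # exact: about 8.33
```

For small coefficients the two numbers are close. The gap widens as the coefficient or the change in x grows, so the exact formula is the safer choice for large effects.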
Ordinal variables

Original variable degree in the data (professional degree):
1 = no degree (nodeg)
2 = vocational training (voc)
3 = college degree (col)
4 = university degree (uni)

Ordinal variables: the values can be ranked (one value is larger or better than another), but they have no cardinal interpretation (by how much a value is larger or better is not specified).

$\text{wage} = \beta_0 + \beta_1 \text{voc} + \beta_2 \text{col} + \beta_3 \text{uni} + u$ (3)
$\text{wage} = \beta_0 + \delta_0 \text{female} + \beta_1 \text{voc} + \beta_2 \text{col} + \beta_3 \text{uni} + u$ (4)

Why did we omit nodeg in (3)?
What is the reference group in (3)?
What is the reference group in (4)?

Variables representing qualitative features

Original variable type (type of university employee):
1 = Professor
2 = Assistant professor
3 = Doctoral assistant
4 = Head of administration
5 = Administration
6 = Others

Qualitative features: neither ordinal nor cardinal sorting.

$\text{wage} = \beta_0 + \beta_1 1(\text{type} = 1) + \beta_2 1(\text{type} = 2) + \beta_3 1(\text{type} = 3) + \beta_4 1(\text{type} = 4) + \beta_5 1(\text{type} = 5) + u$ (5)

Why shouldn't we directly use the original variable type?
What is the reference group?
What is the interpretation of $\beta_0$?
What is the interpretation of $\beta_1$?

Modelling heterogeneity using OLS

Dummy variables and interaction terms allow for different relations between the dependent and independent variables in different subgroups, e.g. males and females.

Dummy variables

Suspicion that females and males receive, on average, different wages even with the same level of education (female = 1 if the individual is female, and female = 0 if the individual is male):

$\text{wage} = \beta_0 + \delta_0 \text{female} + \beta_1 \text{educ} + u$ (6)

Intercept males: $\beta_0$
Intercept females: $\beta_0 + \delta_0$

Males are the reference group: the group for which the dummy equals 0, so that the intercept is solely determined by the constant $\beta_0$.
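The intercept shift in equation (6) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the data are hypothetical and generated without noise from wage = 4 - 3 female + 1.5 educ, and `ols` is a hypothetical helper that solves the normal equations with the standard library only. With noise-free data, OLS recovers the coefficients exactly, up to floating-point error.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical, noise-free data (not taken from the lecture's data set)
educ = [8, 10, 12, 14, 16, 9, 11, 12, 15, 16]
female = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
wage = [4.0 - 3.0 * f + 1.5 * e for f, e in zip(female, educ)]

# Columns: constant, female dummy, educ
X = [[1.0, float(f), float(e)] for f, e in zip(female, educ)]
beta0, delta0, beta1 = ols(X, wage)

intercept_males = beta0             # reference group: female = 0
intercept_females = beta0 + delta0  # shifted by the dummy coefficient
print(round(intercept_males, 6), round(intercept_females, 6), round(beta1, 6))
```

Both groups share the slope on educ in this specification. Only the intercept differs, by the coefficient on the dummy.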
Interaction terms

Suspicion that females and males not only receive different mean wages with the same level of education, but also face different returns to education:

$\text{wage} = \beta_0 + \delta_0 \text{female} + \beta_1 \text{educ} + \delta_1 (\text{female} \cdot \text{educ}) + u$ (7)

Intercept males: $\beta_0$
Intercept females: $\beta_0 + \delta_0$
Slope males: $\beta_1$
Slope females: $\beta_1 + \delta_1$
Interaction term: $\text{female} \cdot \text{educ}$

Interaction of dummies: overlapping groups

$\text{wage} = \beta_0 + \gamma_0 \text{old} + \delta_0 \text{female} + \beta_1 \text{educ} + u$ (8)

Intercept young male: $\beta_0$
Intercept old male: $\beta_0 + \gamma_0$
Intercept young female: $\beta_0 + \delta_0$
Intercept old female: $\beta_0 + \gamma_0 + \delta_0$
Age effect: $\gamma_0$
Gender effect: $\delta_0$

Interaction of dummies: non-overlapping groups

$\text{wage} = \beta_0 + \lambda_1 d_1 + \lambda_2 d_2 + \lambda_3 d_3 + \beta_1 \text{educ} + u$ (9)

Intercept young male: $\beta_0$
Intercept old male ($d_1 = 1$): $\beta_0 + \lambda_1$
Intercept young female ($d_2 = 1$): $\beta_0 + \lambda_2$
Intercept old female ($d_3 = 1$): $\beta_0 + \lambda_3$

$\lambda_j$ is the wage difference of the group with $d_j = 1$ compared to the group of young males (reference group), given an equal level of education.

What is the relationship between $\lambda_1$, $\lambda_2$, $\lambda_3$, $\gamma_0$, and $\delta_0$?

Regression with dummies

[regression output not recovered]

As the dependent variable is log(y) rather than y, the coefficients (multiplied by 100) are interpreted as approximate percentage changes in y due to a one-unit change in the explanatory variable (in the case of x), or as percentage changes in y due to a one-percent change in the explanatory variable (in the case of log(x)). E.g., the coefficient 0.054 on colonial implies that the price of an otherwise comparable house is about 5.4% higher if it is built in colonial style (but the coefficient is not significant). The coefficient 0.168 on log(lotsize) implies that the house price increases by 0.168% if the lot size is increased by 1%.

Regression with dummies

[regression output not recovered]

Controlling for sales and employment, firms that received a grant trained each worker, on average, 26.25 hours more.
The coefficient -6.07 on log(employ) implies that, if a firm is 1% larger, it trains its workers about 0.0607 hours less (-6.07/100).

Regression with dummies and polynomials

[regression output not recovered]

Testing for differences in models across groups (Wooldridge 7.4)

Fully interacted model:
$SSR_{ur}$: sum of squared residuals when estimating the OLS model with the group dummy and interaction terms with all regressors (unrestricted model)
$SSR_r$: sum of squared residuals when estimating the OLS model without the group dummy and interaction terms (restricted model)
(recall: $SSR = \sum_{i=1}^{n} \hat{u}_i^2$)

$F = \frac{(SSR_r - SSR_{ur})/(k+1)}{SSR_{ur}/[n - 2(k+1)]}$ (10)

where k is the number of regressors (excluding the intercept) and n is the total number of observations.

Separate models:
$SSR_1$: sum of squared residuals when estimating the OLS model in subgroup 1
$SSR_2$: sum of squared residuals when estimating the OLS model in subgroup 2
$SSR$: sum of squared residuals when estimating the OLS model in the total sample

Chow test:
$F = \frac{[SSR - (SSR_1 + SSR_2)]/(k+1)}{(SSR_1 + SSR_2)/[n - 2(k+1)]}$ (11)

The standard test is based on the assumption of homoskedasticity, which implies that the variance of the error term is equal in both groups. The validity of homoskedasticity can be tested. If it is rejected, a heteroskedasticity-robust version of the test statistic is to be used.

Example from Wooldridge 7.4

[regression output not recovered]

Important limitation of the Chow test: the null hypothesis allows for no differences at all between the groups. It may be more interesting to allow for an intercept difference between the groups and then to test for slope differences. This can be done by including the group dummy and all interaction terms, as in equation (7.22), and then testing the joint significance of the interaction terms only (in that single equation, rather than basing the test on subsamples).
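The Chow statistic in equation (11) can be sketched for the simplest case of a single regressor (k = 1), where OLS has a closed form. The data below are hypothetical, and `simple_ols_ssr` and `chow_f` are illustrative helper names, not functions from any library. The resulting statistic would be compared with a critical value of the F distribution with k + 1 and n - 2(k + 1) degrees of freedom, which this sketch does not compute.

```python
def simple_ols_ssr(x, y):
    """SSR from regressing y on a constant and a single regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(ssr_pooled, ssr_1, ssr_2, n, k):
    """Chow F statistic following equation (11)."""
    ssr_ur = ssr_1 + ssr_2  # unrestricted SSR: separate regressions per group
    numerator = (ssr_pooled - ssr_ur) / (k + 1)
    denominator = ssr_ur / (n - 2 * (k + 1))
    return numerator / denominator


# Hypothetical data: one regressor, two groups of five observations each
educ_1 = [1, 2, 3, 4, 5]
wage_1 = [2.1, 2.9, 4.2, 4.8, 6.0]
educ_2 = [1, 2, 3, 4, 5]
wage_2 = [1.0, 1.4, 2.1, 2.4, 3.1]

ssr_1 = simple_ols_ssr(educ_1, wage_1)
ssr_2 = simple_ols_ssr(educ_2, wage_2)
ssr_pooled = simple_ols_ssr(educ_1 + educ_2, wage_1 + wage_2)

n = len(educ_1) + len(educ_2)
f_stat = chow_f(ssr_pooled, ssr_1, ssr_2, n, k=1)
print(f"SSR pooled = {ssr_pooled:.3f}, SSR1 + SSR2 = {ssr_1 + ssr_2:.3f}, F = {f_stat:.2f}")
```

The pooled regression is the restricted model, so its SSR can never be smaller than the sum of the two subgroup SSRs. A large gap between the two yields a large F and speaks against identical coefficients in both groups.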