ORDINARY LEAST SQUARES REGRESSION
NICHOLAS CHARRON, ASSOCIATE PROF., DEPT. OF POLITICAL SCIENCE

Section Outline
Day 1: overview of OLS regression (Wooldridge, chap. 1-4)
Day 2 (25 Jan): assumptions of OLS regression (Wooldridge, chap. 5-7)
Day 3 (27 Jan): alternative estimation, interaction models (Wooldridge, chap. 8-9 + Brambor et al. article)
Next topic: limited dependent variables

Section Goals
• To understand the basic ideas and formulas behind linear regression
• Calculate (by hand) a simple bivariate regression coefficient
• Working with 'real' data: apply the knowledge, perform regressions and interpret the results, compare effects of variables in multiple regression
• Understand the basic assumptions of OLS estimation
• How to check for violations, and what to do (more in later lectures as well)
• What to do when the X and Y relationship is not directly linear – interaction effects, variable transformations (logged variables)
• Apply the knowledge in STATA!

Introductions!
- Name
- Department
- Year as PhD student
- Where you are from (country)
- How much statistics have you had?
- What is your research topic?

Linear regression: a brief history
Sir Francis Galton was interested in the heredity of plants and in 'regression toward mediocrity', meaning in his time the median (now better known as regression toward the mean). The emphasis was on 'on average': what can we expect? He was not a mathematician, however. Karl Pearson (Galton's biographer) took Galton's work and developed several statistical measures. Together with the earlier 'least squares' method (Gauss 1812), regression analysis was born.

Simple statistical methods: cross tabulations & correlations
Used widely, especially in survey research – probably many of you are familiar with this. They require at least one categorical variable – nominal or ordinal. If we want to know how two variables are related in terms of strength, direction and effect, we can use various tests:
Nominal level – only strength (Cramer's V)
Ordinal level – strength and direction (tau-b & tau-c, gamma)
Interval (ratio) level – strength, direction (and effect) (Pearson's r, regression)

WHY DO WE USE REGRESSION?
• To test hypotheses about causal relationships for a continuous/ordinal outcome
• To make predictions
• The preferred method when your statistical model has more than two explanatory variables and you want to elaborate a causal relationship
• However, always remember that correlation is not causation!
• We test hypotheses about causal relationships, but the regression does not express causal direction (which is why theory is important!)

Key advantages of linear regression
• Simplicity: a linear relationship is the simplest non-trivial relationship. Plus, most people can even do the math by hand (as opposed to other estimation techniques)
• Flexibility: even if the relationship between X and Y is not really linear, the variables can be transformed (more later)
• Interpretability: we get strength, direction and effect in a simple, easy-to-understand package

Some essential terminology
Regression: the mean of the outcome variable (Y) as a function of one or more independent variables (X): μ(Y|X)
Regression model: explaining Y in the 'real world' is very complicated.
A model is our APPROXIMATION (simplification) of that relationship.

Simple (bivariate) regression model
Y = β0 + β1X
Y: the dependent variable
X: the independent variable
β0: the intercept or 'constant' (in other words??), also notated as α (alpha)
β1: the slope
The β's are called 'coefficients'

More terminology
• Dependent variable (Y): aka explained variable, response variable
• Independent variable (X): aka explanatory variable, control variable
Two types of models, broadly speaking:
1. Deterministic model: Y = α + βX (the equation of a straight line)
2. Probabilistic model (what we are most interested in): Y = α + βX + e

A deterministic model: visual
[Figure: total expenses (Y) plotted against # of beers (X) – all points fall exactly on a straight line]

A deterministic model: a simple example

Person    | # beers (X) | Total expenses (Y)
Stefan    | 0           | 20
Martin    | 2           | 50
Thomas    | 4           | 80
Rasmus    | 5           | 95
Christian | 6           | 110

Calculation of the slope: β = (110 − 20)/(6 − 0) = 15
Calculation of the intercept: α = 50 − (2 × 15) = 20, or α = 80 − (4 × 15) = 20
The equation for the relationship: Y = α + βX = 20 + 15X

The probabilistic model: with 'error'
[Figure: the same line, with observations scattered around it]

Even more terminology
• Most often we are dealing with 'probabilistic' models:
Fitted values ('Y hat'): for any observation i, our model gives an expected mean: Ŷᵢ = fitᵢ = β0 + β1Xᵢ
Residual: the error (how much our model is 'off') for observation i: resᵢ = Yᵢ − fitᵢ = Yᵢ − Ŷᵢ, where resᵢ is normally written as eᵢ
Least squares: our method for finding the estimates that MINIMIZE the SUM of SQUARED RESIDUALS:
Σᵢ₌₁ⁿ (yᵢ − (β0 + β1xᵢ))² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ eᵢ²

OLS regression

       | X (age) | Y (income)
Pelle  | 20      | 21
Lisa   | 19      | 22.4
Kalle  | 54      | 47.3
Ester  | 42      | 17
Ernst  | 39      | 35
Stian  | 67      | 23.8
Lise   | 40      | 39.3

[Figure: scatterplot of income (Y) against age (X) – each dot is a respondent's income at a given age]
The relationship shows strength, direction & effect – but causality?

How to estimate the coefficients?
• We use the 'least squares method', of course! To calculate the slope coefficient (β1) of X:
β1 = Σᵢ₌₁ⁿ (xᵢ − X̄)(yᵢ − Ȳ) / Σᵢ₌₁ⁿ (xᵢ − X̄)²
The slope coefficient is the covariance between X and Y over the variance of X, or the rate of change in Y relative to change in X.
And to calculate the constant:
β0 = Ȳ − β1X̄
Simply speaking, the constant is the mean of Y minus the mean of X times β1.

In-class exercise – calculation of the beta and alpha values in OLS by hand!

b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²,  a = Ȳ − bX̄

Calculation of the b-value in OLS

      | X  | Y  | X̄ | Ȳ | X−X̄ | Y−Ȳ | (X−X̄)² | (X−X̄)(Y−Ȳ)
Pelle | 2  | 4  | 3 | 3 | −1  | 1   | 1      | −1
Lisa  | 1  | 5  | 3 | 3 | −2  | 2   | 4      | −4
Kalle | 5  | 2  | 3 | 3 | 2   | −1  | 4      | −2
Ester | 4  | 1  | 3 | 3 | 1   | −2  | 1      | −2
Ernst | 3  | 3  | 3 | 3 | 0   | 0   | 0      | 0
Sum   | 15 | 15 |   |   | 0   | 0   | 10     | −9

b = −9/10 = −0.90
a = 3 − (−0.90 × 3) = 5.7

• A cool property of least squares estimation is that the regression line will always pass through the point (X̄, Ȳ)
• Now, for every value of X, we have an expected value of Y
[Figure from Charron et al. 2017: corruption plotted against public sector meritocracy; the fitted line passes through mean of X = 4.073 and mean of Y = 0.176]
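A minimal Stata sketch of the in-class exercise above: the five observations are typed in directly, and the by-hand formulas are reproduced with summarize and scalars (nothing beyond built-in commands is assumed).

* sketch: OLS slope and intercept by hand
clear
input x y
2 4
1 5
5 2
4 1
3 3
end
quietly summarize x
scalar mx = r(mean)
quietly summarize y
scalar my = r(mean)
generate dxy = (x - mx)*(y - my)    // cross-products
generate dx2 = (x - mx)^2           // squared deviations of x
quietly summarize dxy
scalar sxy = r(sum)
quietly summarize dx2
scalar sxx = r(sum)
display "b = " sxy/sxx "   a = " my - (sxy/sxx)*mx   // b = -.9, a = 5.7
regress y x    // check: Stata's estimates match the by-hand values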
β compared to Pearson's r
• The effect in a linear regression = β:
b = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)²
• Correlation – Pearson's r: the same numerator, but the denominator also takes the variance of Y into account:
r = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / √[Σ(Xᵢ − X̄)² · Σ(Yᵢ − Ȳ)²]
Q: When will these two be equal?

Interpretation of Pearson's r
[Figure: example scatterplots for different values of r. Source: Wikipedia]

Correlation and regression, a comparison
• Pearson's r is standardized and varies between −1 (perfect negative relationship) and 1 (perfect positive relationship); 0 = no relationship; n is not taken into account. In the bivariate case the two are linked: b = r_xy(S_Y/S_X)
• The regression coefficient (β) has no given minimum or maximum values, and the interpretation of the coefficient depends on the range of the scale
• Unlike the correlation r, the regression is used to predict values of one variable given values of another variable

Objectives and goals of linear regression
• We want to know the probability distribution of Y as a function of X (or several X's)
• Y is a straight-line (i.e. linear) function of X, plus some random noise (the error term)
• The goal is to find the 'best line' that explains the variation of Y with X
• Important! The marginal effect of X on Y is assumed to be CONSTANT across all values of X. What does this mean??

Applied bivariate example
• Data: QoG Basic, two variables from the World Values Survey
• Dependent variable (Y): life happiness (1-4, lower = better)
• Independent variable (X): state of health (1-5, lower = better)
• Units of analysis: countries (aggregated from survey data), 20 randomly selected
Our model: Y(happiness) = α + β1(health) + e
• H1: the healthier a country feels, the happier it is on average
• Let's estimate α and β1 based on our data!
• Open the file in STATA from GUL: health_happy ex.dta
***To do what I've done in the slides, see the do-file

Some basic statistics
1. Summary stats

sum health happiness

    Variable |  Obs    Mean     Std. Dev.   Min    Max
-------------+------------------------------------------
      health |   20   2.1925    .241004    1.84   2.65
   happiness |   20   1.948     .2783901   1.59   2.58

2. Pairwise correlations (Pearson's r)

pwcorr health happiness

             |  health  happin~s
-------------+------------------
      health |  1.0000
   happiness |  0.6752   1.0000

3. Scatterplot with fitted line – in STATA:
twoway (scatter happiness health) (lfit happiness health)
[Figure: scatterplot of happiness against health with the fitted line]
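Using the summary statistics above, a quick sketch of the link between the slope and Pearson's r in the bivariate case (r = b·Sx/Sy); the slope value .778 is taken from the regression on the next slide, and small discrepancies come from the rounded standard deviations in the display.

* sketch: b and r carry the same information in the bivariate case
display "b from r: " .6752*(.2783901/.241004)      // ≈ .780, the regression slope
display "r from b: " .7780828*(.241004/.2783901)   // ≈ .674, vs. pwcorr's .6752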
Now for the regression

. reg happiness health

      Source |      SS          df      MS          Number of obs  =      20
-------------+---------------------------------    F(1, 18)       =   14.99
       Model |  .670585718       1  .670585718     Prob > F       =  0.0011
    Residual |  .805119395      18  .044728855     R-squared      =  0.4544
-------------+---------------------------------    Adj R-squared  =  0.4241
       Total |  1.47570511      19   .07766869     Root MSE       =  .21149

   happiness |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      health |  .7780828   .2009521    3.87   0.001    .3558981    1.200267
       _cons |  .2431537   .4429099    0.55   0.590   -.6873656    1.173673

Reading the output: the coefficient (b) and the constant (a) sit in the Coef. column, followed by their standard errors, the t-tests (of significance) and the 95% confidence intervals; the header reports the number of observations, the F-test of the model, R-squared, and the mean squared error. The formulas behind the table:
b = Σ(Xᵢ − X̄)(Yᵢ − Ȳ)/Σ(Xᵢ − X̄)²,  a = Ȳ − bX̄,  Ŷᵢ = a + bXᵢ

Some interpretation
[Figure: scatterplot of happiness against health with the fitted values]
What is the predicted mean of happiness for a country with a mean health of 2.3?
Ŷ = .243 + .778(2.3) = 2.02
Answer: 2.02

Ok, now what??
• Ok great, but the calculation of beta and alpha is just simple math…
• Now we want to see how much we can INFER from this relationship – since we do not have ALL the observations (i.e. a 'universe') with perfect data, we are only making an inference
• A key to doing this is to evaluate how "off" our model predictions are relative to the actual observations of Y
• We can do this both for the model on the whole and for individual coefficients (both betas and alpha). We'll start by calculating the SUM of SQUARES
• Two questions:
a. how 'sure' are we of our estimates – i.e. significance, or the probability that the relationship we see is not just 'white noise'?
b. is OLS actually the most valid estimation method?

Assumptions of OLS (more on this next week!)
OLS is fantastic if our data meet several assumptions, and before we make any inferences we should always check:
1. The linear model is suitable
2. The conditional standard deviation is the same for all levels of X (homoscedasticity)
3. Error terms are normally distributed for all levels of X
4. The sample is selected randomly
5. There are no severe outliers
6. There is no autocorrelation
7. No multicollinearity
8. Our sample is representative of the population (all estimation)

Regression inference
• In order to test several of our assumptions, we need to observe the residuals from our estimation
• These allow us both to check OLS assumptions AND to perform significance testing
• Plotting the residuals against the explanatory (X) variable is helpful in checking these conditions because a residual plot magnifies patterns. This you should ALWAYS look at

Least squares: the sum of the squared error terms
• A measure of how far the line is from the observations is the sum of all errors: the smaller it is, the closer the line is to the observations (and thus, the better our model)
• To avoid positive and negative errors cancelling out in the calculation, we square them
• The squared error term for observation i: eᵢ² = (Yᵢ − Ŷᵢ)²
• The residual sum of squares (RSS): RSS = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²

Residual sum of squares – a simple visual
[Figure: scatterplot with the 'best fit' line; each residual is the vertical distance between an actual Yᵢ and the line]

Back to our exercise:
. reg happiness health  (same output as above – the Residual SS is .805119395)
RSS = Σ(Yᵢ − Ŷᵢ)²,  TSS = Σ(Yᵢ − Ȳ)²
The typical deviation around the line (i.e. the conditional standard deviation) is σ̂ = √(RSS/(n − 2)).
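A short Stata sketch, assuming the same health/happiness data are loaded, recovering RSS and the conditional standard deviation by hand:

* sketch: RSS and sigma-hat after the regression above
reg happiness health
predict yhat, xb
generate e2 = (happiness - yhat)^2
quietly summarize e2
display "RSS = " r(sum)                        // ≈ .805
display "sigma-hat = " sqrt(r(sum)/(20 - 2))   // ≈ .211, the Root MSE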
Getting our standard errors for beta
• The standard error (s.e.) tells us something about the average deviation between our observed values and our predicted values
• The building block is the standard error of the regression: the square root of the RSS divided by the number of observations minus the number of parameters:
Se = √(RSS/(n − K)) = √(Σ(Yᵢ − Ŷᵢ)²/(n − K))
where RSS = residual sum of squares (aka sum of squared errors), n = number of cases, and K = number of parameters (in bivariate regression, intercept plus b-coefficient, so K = 2)

Getting our standard errors for b
• The precision of b depends (among other things) on the variation around the slope – i.e. how large the spread is around the line
• This spread we have assumed constant for all levels of X – but how is it calculated?
• As we just saw, the sum of squared deviations from the line is RSS = Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)², and the typical deviation around the line (the conditional standard deviation) is σ̂ = √(RSS/(n − 2))

Standard errors for b
• The standard error of b is the conditional standard deviation divided by the variation in X:
σ̂_b = σ̂ / √(Σ(Xᵢ − X̄)²) = σ̂ / (s_X √(n − 1))
• Factors affecting the standard error of beta:
1. The spread around the line, σ – the smaller σ, the smaller the standard error
2. Sample size, n – the larger n, the smaller the standard error
3. The variation of X, Σ(Xᵢ − X̄)² – the greater the variation, the smaller the standard error

Standard errors for b: the 'ideal'
[Figure]

Back to our example (see the Excel sheet for the 'by hand' calculations)
. reg happiness health  (same output as above – the Std. Err. column gives the standard error of b, .2009521, and of a, .4429099)

Standard errors for b
• The standard errors can then be used for hypothesis testing. Dividing our slope coefficients by their s.e.'s gives their t-values
• The t-value can then be used as a measure of statistical significance and allows us to calculate a p-value (what is this??)
• Old school: consult a t-table, where the degrees of freedom (n minus the number of estimated parameters) and your chosen level of security (p<.10, .05 or .01, for example) decide whether your coefficient is significantly different from zero (t-tables can be found as appendices in statistics books, like Wooldridge)
• New school: rely on statistical programs (like STATA or SPSS)
• H0: β1 = 0
• H1: β1 ≠ 0

Hypothesis testing & confidence interval for β
Hypothesis test of independence between the variables (H0: β = 0):
t = (b − β0)/σ̂_b, which under the null β0 = 0 reduces to t = b/σ̂_b
95 pct. confidence interval: b ± t(σ̂_b)

Confidence intervals for β
H0: β = 0; H1: β ≠ 0, tested at the 95% confidence level with ±1.96
90% confidence interval: t = 1.645
95% confidence interval: t = 1.96
99% confidence interval: t = 2.576
(These are the large-sample values; with small samples, use the t-distribution with n − K degrees of freedom instead)
Forming a 95% confidence interval for a single slope coefficient: b_x ± t(SE_bx)

Back to our example…
. reg happiness health  (same output as above)
Annotations on the output: number of observations; F-test of the model; R-squared; mean squared error; and for each coefficient its standard error, the t-test (of significance), the p-value from the t-test, and the 95% lower/upper limits.
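A minimal by-hand check of the t statistic and confidence interval from the output above (df = n − K = 18 here; Stata's invttail function returns the t critical value):

* sketch: t value and 95% CI for the slope by hand
display "t = " .7780828/.2009521                            // = 3.87
display "lower = " .7780828 - invttail(18,.025)*.2009521    // = .356
display "upper = " .7780828 + invttail(18,.025)*.2009521    // = 1.200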
TAKING A CLOSER LOOK 'UNDER THE HOOD OF THE CAR'…
OPEN EXCEL FILE: REGRESSION HAPPINES_VS_HEALTH

Basic OLS model diagnostics:
1. R²
2. F-test
3. MSE

1. R²: EXPLAINED VARIANCE
• A.k.a. the "coefficient of determination"
• R² ranges from 0 to 1 (0 ≤ R² ≤ 1)
• R² describes how close the observed values (the dots) lie to the estimated regression line
• R² is a direct measure of linearity but is interpreted as explained variance. When R² = 1 we have explained all variation in Y; when R² = 0 we have explained nothing… a good way to compare models!
• In many (social science) research models building on survey data (individual level), R² is often low (it rarely exceeds .40)
• It is calculated using three sum-of-squares formulas

Calculating R² – FIRST COMPONENT
TSS = Σ(Yᵢ − Ȳ)²
Total sum of squares (TSS) – as we have used in other equations, this is the sum over observations of the squared difference between each value of the dependent variable and its mean. This is the total variation in the dependent variable.

Calculating R² – SECOND COMPONENT
ESS = Σ(Ŷᵢ − Ȳ)²
Explained sum of squares (ESS) – the sum of the squared differences between the predicted value of the dependent variable for each observation and the mean of the dependent variable. If our regression does not explain any variation in the dependent variable, ESS = 0: our best prediction is then simply the mean of Y. If our model has any explanatory power, ESS > 0 and the model adds something beyond the mean to our understanding of the outcome (Y). This is also called the 'regression sum of squares (RSS)' (confusing, right??)

Calculating R² – THIRD COMPONENT
RSS = Σeᵢ² = Σ(Yᵢ − Ŷᵢ)²
Residual sum of squares (RSS) – which we covered a few slides ago: each observation's value on the dependent variable minus the predicted value, squared. This is the variation our model cannot explain and is therefore labeled the error term (or residual). This is also called the error sum of squares (ESS) (huh, wtf??)

EXPLAINED VARIANCE
• As noted, R² is defined as: R² = ESS/TSS = 1 − RSS/TSS
• The total variation in Y (TSS) can be divided into two parts: the closer ESS is to TSS, or the lower RSS is relative to TSS, the higher the R² value
• Therefore, R² is commonly interpreted as the share of the variation in Y explained by X
• Note! R² will be lower if our variables have a non-linear relationship!

. reg happiness health  (same output as above)
ESS = explained (model) sum of squares = .6706
RSS = residual sum of squares = .8051
TSS = total sum of squares = 1.4757
The amount of variance in happiness explained by health:
R² = 1 − RSS/TSS = 1 − (0.805/1.476) = 0.45
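A sketch recovering R² from its components, assuming the happiness data are still loaded (r(sum) comes from summarize; the mean of the fitted values equals the mean of Y):

* sketch: R-squared from its components
reg happiness health
predict yhat, xb
quietly summarize happiness
generate tss_i = (happiness - r(mean))^2
generate rss_i = (happiness - yhat)^2
quietly summarize tss_i
scalar TSS = r(sum)
quietly summarize rss_i
scalar RSS = r(sum)
display "R2 = " 1 - RSS/TSS    // ≈ .4544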
A visual of how R² works: R² = 1 − (RSS/TSS)
[Venn diagram: the DV circle's total variation is the TSS; its overlap with the IV circle is the explained variation (ESS); the remainder of the DV circle is the residual sum of squares (RSS)]

B values vs. R² values – an important distinction!
[Figure: two scatterplots with roughly the same slope, b ≈ 4.33, but one with R² ≈ 0.10 and one with R² ≈ 0.90]
R² doesn't say ANYTHING about the effect size.

2. Testing model significance: the F-test
• If our null is of the form H0: β1 = β2 = … = βk = 0, then we can write the test statistic as:
F0 = [(RSS1 − RSS2)/(P2 − P1)] / [RSS2/(n − P2)]
• This compares whether the betas we put in a model explain variation significantly better than an empty model with just a constant
• It is basically the explained variance over the residual variance
• Degrees of freedom: n is the number of observations, P2 is the total number of parameters in the fuller ('unrestricted') model, and P1 the number in the 'restricted' model – in this case just a constant – where P2 > P1
• This can also be used to test 'nested models' (more later…)

2. Testing model significance: the F-test
• H0: β1 = β2 = … = βk = 0
• Ha: at least one β is different from 0
• If p < 0.05, we reject the null hypothesis in favor of Ha
• Note! A significant F value does not necessarily mean we have a good model. However, if we cannot reject H0, our model is indeed bad!
. reg happiness health  (same output as above: F(1, 18) = 14.99, Prob > F = 0.0011)
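Reading the F statistic straight off the ANOVA table above, a quick by-hand check (Ftail is Stata's upper-tail F p-value function):

* sketch: the model F statistic by hand
display "F = " (.670585718/1)/(.805119395/18)               // = 14.99
display "p = " Ftail(1, 18, (.670585718/1)/(.805119395/18)) // = .0011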
Mean Squared Error (MSE)
• The root MSE tells us how 'off' the model is on average, in the units of the DV. Some units are less 'intuitive' than others; when they are, compare the root MSE with the standard deviation of the DV
• Also useful for comparing different models with the same DV
• The root MSE here tells us that our predictions are on average 'off' by 0.21
. reg happiness health  (same output as above: Root MSE = .21149)

ADDING ADDITIONAL VARIABLES: MULTIPLE REGRESSION

Last week
• Regression introduction
• Basics of OLS – calculations of beta, alpha, the error term, etc.; bivariate analysis
• Basic model diagnostics: R², F-tests, MSE
Today:
• multivariate regression
• assumptions of OLS, detection of violations, and what to do about them

Back to Sir Galton….

Multiple regression
• So far we have kept it simple with bivariate regression models:
Y = β0 + β1X + e
With multiple regression we are of course adding more variables ('parameters') to our model. In stats terms, we are estimating a less 'constrained' or 'restricted' model:
Y = β0 + β1X + β2Z + e
We are thus able to account for a greater number of explanations as to why Y varies. Additional variables can be included for a number of reasons: controls, additional theory, interactions (later).

How now to interpret our coefficients?
βn = the change in Y for a one-unit change in Xn, holding all other variables constant ('all things being equal', or 'ceteris paribus'). In other words, the average marginal effect across all values of the additional X's in the model.
α (intercept) = the estimated value of Y when all X's are held at 0. This may or may not be realistic.

[Venn diagram: variation in y, x1 and x2]
Circle Y: the total variation in the dependent variable (TSS)
Circle x1: the total variation in the first independent variable
Circle x2: the total variation in the second independent variable
A: the unique covariance between the independent variable x1 and y
B: the unique covariance between the independent variable x2 and y
C: shared covariance between all three variables
D: covariance between the two independent variables not including the dependent variable
E and F: variation in x1 and x2, respectively, that is not associated with y
G: the variation in the dependent variable NOT explained by the independent variables – the variation that could be explained by additional independent variables (RSS)

B coefficients in multiple regression, cont.
Regression of y (dependent) on x2 (independent): y = c1 + c2x2 + w
Areas C and B are predicted by the equation ŷ = c1 + c2x2
Areas A and G are contained in w (the error), which equals w = y − ŷ
Areas A and G are thus secured in y through w. Now we can calculate the unique effect of x1 on y under control for x2.

Calculation of the b coefficients in multiple regression
y = β1 + β2x2 + β3x3 + e, where β1 is the intercept

Starting simple: dummy variables in regression
• If an independent variable is nominal, we can still use it by creating dummy variables (if > 2 categories)
• A dummy variable is a dichotomous variable coded 0 and 1 (based on an original nominal or ordinal variable)
• The number of dummy variables needed depends on the number of categories of the original variable:
number of categories on the original variable minus 1 = number of dummy variables
• Ex. party affiliation: Alliansen, R-G, SD – we would include dummies for 2 groups, and these betas are compared with the third (omitted) group
• We can also do this for ordinal IV's, like low, middle and high, for example
• In any regression, the intercept equals the mean of the dependent variable when all X's = 0; for a dummy variable this is Ȳ for the reference category (RC)
• The coefficients show each category's mean difference relative to the RC
• If we add other independent variables to our model, the intercept is interpreted as the case where ALL independent variables are 0
• The interpretation of the dummy coefficients is still relative to the reference category, but now under control for the additional variables entered into the model
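A sketch of the dummy-coding logic with hypothetical variable names (sector: 1 = private, 2 = public, 3 = not working; y and x stand in for any DV and control):

* sketch: dummies for a 3-category nominal variable
generate public = (sector == 2) if !missing(sector)
generate nowork = (sector == 3) if !missing(sector)
regress y public nowork x    // private sector is the omitted reference category
regress y i.sector x         // or let Stata's factor notation build the dummies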
Example: support for EU integration, EES data (on GUL)
• Let's say we are interested in explaining why support for further EU integration varies at the individual level in Sweden
• DV: "Some say European unification should be pushed further. Others say it already has gone too far. What is your opinion? Please indicate your views using a scale from 0 to 10, where '0' means unification 'has already gone too far' and '10' means it 'should be pushed further'."
• 3 IV's: gender (0 = male, 1 = female), education (1 = some post-secondary or more, 0 otherwise) and European identity (attachment, 0-3, 0 = very unattached, 3 = very attached)

EU Support = β0 + β1(female) + β2(education) + β3(Euro identity) + e

Summary stats
[Figure: histogram of Supp_EU_int, 0-10]
sum Supp_EU_int female some_college EU_attach
• The DV ranges from 0-10
• 2 binary IV's
• 1 ordinal IV

    Variable |   Obs     Mean      Std. Dev.   Min   Max
-------------+--------------------------------------------
 Supp_EU_int |  1,112  4.644784   2.561694      0    10
      female |  1,144  .4318182   .4955461      0     1
some_college |  1,144  .6975524   .4595189      0     1
   EU_attach |  1,131  2.228117   .7741086      0     3

reg Supp_EU_int female some_college EU_attach

 Supp_EU_int |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      female | -.4034422   .1486337   -2.71   0.007   -.6950803   -.1118041
some_college |  .3745538   .1601731    2.34   0.020    .0602739    .6888338
   EU_attach |   1.05227   .0947614   11.10   0.000    .8663361    1.238204
       _cons |  2.209169   .2469881    8.94   0.000    1.724547    2.693791

1. Intercept: the predicted level of the DV when all variables = 0 (men without college who are very unattached from Europe)
2. Female: the effect of gender is significant. Holding education and European identity constant, women on average score 0.4 lower on support for further EU integration
3. Education: the effect is also significant. Having some post-secondary education increases support for EU integration by 0.37, holding gender and European identity constant
4. European attachment: significant. Holding education and gender constant, a one-unit increase in attachment increases support by 1.05 on average

A visual with gender and identity
[Figure: predicted levels of support across European attachment (0-3), for males and females – the gap between the lines is the effect of gender; the slope is the effect of Euro attachment]

Some predictions from our model
• EU Support = 2.21 − 0.40(female) + 0.37(education) + 1.05(Euro identity) + e
• What is the predicted level of support for further EU integration for:
1. a male with some university and a strong European identity (3)? = 2.21 − 0.40(0) + 0.37(1) + 1.05(3) = 5.73
2. a female with no university and a very weak European attachment (0)? = 2.21 − 0.40(1) + 0.37(0) + 1.05(0) = 1.81

Comparing marginal effects
• Significance values are not always interesting… almost everything tends to become significant with many observations, as in large survey data
• Another great feature of OLS is that we can compare both the marginal and the total effects of all B's
• When you are about to publish your results, you often want to say which variables have the greatest impact in the model
• We can show both the marginal effects (the b-values in the regression output), which give the change in Y caused by a one-unit change in X, AND the total effects (the min-to-max effect, or the effect within a certain range), for which one has to consider the scale
• Question: what are the marginal and total effects of our 3 variables?

Answer…
• For binary variables, the marginal and total effects are the same
• For ordinal/continuous variables, we can do a few things to check this (see the sketch below):
1. 'normalize' (re-scale) the variable to 0/1 (see the do-file for this)
2. compare standardized coefficients (just add the option 'beta')
3. alternatively, use the 'margins' command (more later…)
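A sketch checking the two predictions above and getting the same numbers from margins (small differences come from the rounded coefficients in the slide equation):

* sketch: model predictions by hand and via margins
display 2.209169 - .4034422*0 + .3745538*1 + 1.05227*3   // ≈ 5.74 (slide: 5.73)
display 2.209169 - .4034422*1 + .3745538*0 + 1.05227*0   // ≈ 1.81
margins, at(female=0 some_college=1 EU_attach=3)
margins, at(female=1 some_college=0 EU_attach=0)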
Re-running the model with the attachment variable re-scaled to 0/1:

         Supp_EU_int |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
---------------------+---------------------------------------------------------------
              female | -.4034422   .1486337   -2.71   0.007   -.6950803   -.1118041
        some_college |  .3745538   .1601731    2.34   0.020    .0602739    .6888337
 normal_EU_attach0_1 |   3.15681   .2842841   11.10   0.000    2.599008    3.714611
               _cons |  2.209169   .2469881    8.94   0.000    1.724547    2.693791

For our model….

variable    | Marginal effect | Total (max−min) effect
female      | −0.4            | −0.4
education   | 0.37            | 0.37
Euro attach | 1.05            | 3.15

Direct comparison: standardized coefficients
• Standardized coefficients can be used to make direct comparisons of the effects of IV's
• When standardized coefficients (beta values) are used, the scale unit of all variables is deviations from the mean – the number of standard deviations
• We thus gain comparability but lose the intuitive feel in interpreting the results; we can always report both 'regular' betas and standardized ones

reg Supp_EU_int female some_college EU_attach, beta

 Supp_EU_int |    Coef.    Std. Err.     t     P>|t|       Beta
-------------+----------------------------------------------------
      female | -.4034422   .1486337   -2.71   0.007   -.0779299
some_college |  .3745538   .1601731    2.34   0.020    .0672729
   EU_attach |   1.05227   .0947614   11.10   0.000    .3167175
       _cons |  2.209169   .2469881    8.94   0.000           .

STANDARDIZED COEFFICIENTS (BETAS)
• The standardization of b: standardized scores are also known as z-scores, so they are often labeled with a 'z'. In STATA:
sum y
gen zy = (y - r(mean))/r(sd)
sum x
gen zx = (x - r(mean))/r(sd)
• Equivalently, the standardized coefficient is beta = b·(s_x/s_y) – see the sketch below

Another way of reporting comparative effects… (Bauhr and Charron 2017)
[Figure: coefficient plot comparing min/max changes and interquartile changes (25th to 75th percentile) for IV's such as national attachment, income, population, economic left-right, female, education, age, GAL-TAN, EU integration vote, EU skepticism, corruption, trust in the EU, immigration, economic satisfaction, and attachment to Europe]

ORDINARY LEAST SQUARES REGRESSION, DAY 2
NICHOLAS CHARRON, ASSOCIATE PROF., DEPT. OF POLITICAL SCIENCE

OLS is 'BLUE'
• What is this? It is the Best Linear Unbiased Estimator
• Aka the 'Gauss–Markov theorem', which states that in a linear regression model in which the errors have expectation zero, are uncorrelated, and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the ordinary least squares (OLS) estimator. Here "best" means giving the lowest variance of the estimate, compared to other unbiased linear estimators

Assumptions of OLS
OLS is fantastic if our data meet several assumptions, and before we make any inferences we should always check:
1. Correct model specification – the linear model is suitable
2. No severe multicollinearity
3. The conditional standard deviation is the same for all levels of X (homoscedasticity)
4. Error terms are normally distributed for all levels of X
5. The sample is selected randomly
6. There are no severe outliers
7. There is no autocorrelation
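A sketch of the standardized-beta arithmetic, using the standard deviations from the summary stats above (the small gap vs. the 'beta' output arises from listwise deletion in the estimation sample):

* sketch: standardized coefficient by hand, beta = b*(sd_x/sd_y)
display "std. beta = " 1.05227*(.7741086/2.561694)   // ≈ .318; 'beta' output: .3167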
1) Model specification
a) Causality in the relationships – not so much a problem for the statistical model as a theoretical problem. Better data and modelling: use panel data, experiments, or theory!
b) Is the relationship between DV and IV LINEAR? If not, OLS regression will give biased results
c) All theoretically relevant variables should be included. If they are not, this leads to "omitted variable bias": if an important variable is left out of a model, this will influence the coefficients of the other variables in the model. Remedy? Theory and previous literature – motivate all variables

Some statistical tests/checks: the linear model is suitable
• When one or more IV's has a non-linear effect on the DV, a relationship exists but cannot be properly detected in standard OLS
• This is probably one of the easiest problems to detect:
1. Bivariate scatterplot: if the scatterplot doesn't show an approximately linear pattern, the fitted line may be almost useless
2. Ramsey RESET test (an F-test)
3. Theory
• If X and Y do not fit a linear pattern, there are several measures you can take

Checking for this: health and happiness (in GUL) – 3 steps:
1. Run the regression in STATA
2. Run the command linktest
3. Run the command ovtest
The linktest re-estimates your DV using the model's fitted values and squared fitted values (_hat and _hatsq) as IV's; a significant squared term implies that the model is incorrectly specified.
Ovtest, Ho: the model is specified correctly (no omitted higher-power terms). A significant squared prediction or F-stat implies that the model is incorrectly specified. If significant, make adjustments and re-run the regression and the test.
[Figure: scatterplot of happiness against health – the scatter looks ok, but let's check more formally with the Ramsey RESET test]

Example with the health and happiness data

reg happiness health

   happiness |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
      health |  .7799197   .2008376    3.88   0.001    .3579756    1.201864
       _cons |  .2380261   .4428564    0.54   0.598   -.6923806    1.168433

linktest

   happiness |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
        _hat |  3.503184    5.25501    0.67   0.514   -7.583918    14.59029
      _hatsq | -.6307252   1.322438   -0.48   0.639   -3.420826    2.159376
       _cons | -2.461618   5.186894   -0.47   0.641   -13.40501    8.481771

ovtest

Ramsey RESET test using powers of the fitted values of happiness
Ho: model has no omitted variables
F(3, 15) = 0.10
Prob > F = 0.9606

• What do you see? Neither _hatsq nor the RESET F-stat is significant – no evidence of misspecification here.
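The RESET logic can also be written out by hand. A sketch with hypothetical y and x, adding powers of the fitted values and jointly testing them (ovtest itself uses powers two through four of the fitted values; this shows the idea rather than replicating it exactly):

* sketch: Ramsey RESET by hand (hypothetical y and x)
regress y x
predict fit, xb
generate fit2 = fit^2
generate fit3 = fit^3
regress y x fit2 fit3
test fit2 fit3    // Ho: the power terms add nothing, i.e. no misspecification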
Non-linearity can be detected
[Figures: scatterplots of Control of Corruption against Freedom House/Polity, and of the Corruption Perceptions Index against total population (in thousands) – both show clearly non-linear patterns around the fitted line]

Issues with non-linearity
• The problem with curvilinear relationships: we will under- or over-estimate the effect on the dependent variable at different values of the independent variable
• However, this is a 'sexy' problem to have at times…
• OLS can be used for relationships that are not strictly linear in y and x by using non-linear functions of y and x. Three standard approaches, depending on the data:
1. the natural log of x, y or both (i.e. a logarithm)
2. quadratic forms of x or y
3. interactions of x variables
• Or add more data/observations… the natural logarithm will downplay extreme values and make the distribution more normal

Variable transformation: the natural logarithm
• Log models are invariant to the scale of the variables, since they now measure percent changes
• Sometimes done to constrain extreme outliers and downplay their effect in the model, making the distribution more 'compact'
• Standard variables in social science that researchers tend to log:
1. positive variables representing wealth (personal income, country GDP, etc.)
2. other variables that take large values – population, geographic area size, etc.
• Important to note: the rank order does not change from the original scale!

Transforming your variables
• Using the natural logarithm (the inverse of the exponential function) – only defined for x > 0
• Ex.: corruption explained by country size (population)
[Figures: Corruption Perceptions Index against raw population and against logged population – the logged version looks far more linear]
In Stata:
reg DV IV
gen logIV = log(IV)
reg DV logIV

Rules for the interpretation of beta with log-transformed variables
1. Logged DV and non-logged IV: ln(y) = β0 + β1x + u. β1 is approximately the proportional change in y for an absolute change in x: a one-step increase in the IV gives a (coefficient × 100) percent increase in the DV (%Δy = 100·β1)
2. Logged IV and non-logged DV: y = β0 + β1ln(x) + u. β1 is approximately the absolute change in y for a percentage change in x: a 1 percent increase in the IV gives a coefficient/100 increase in the DV in absolute terms (Δy = (β1/100)%Δx)
3. Logged DV and IV: ln(y) = β0 + β1ln(x) + u. β1 is the elasticity of y with respect to x (%Δy = β1%Δx), i.e. the percentage change in y for a one-percent change in x
NOTE: these interpretations only apply to log base e (natural log) transformations.

Quadratic forms (e.g. squared terms)
• Ex.: democracy versus corruption
[Figure: Corruption Perceptions Index against Freedom House/Polity with a linear fit – a U-shaped pattern the straight line misses]
• Explained later by an interaction with economic development
Charron, N., & Lapuente, V. (2010). Does democracy produce quality of government? European Journal of Political Research, 49(4), 443-470.

Quadratic forms capture diminishing or increasing returns
[Figure: the same scatter with both linear and quadratic fitted lines]
How to model this? Quite simple: add a squared term of the non-linear IV.

Quadratic forms: interpretation
• Analyses including quadratic terms can be viewed as a special case of interactions (more on Friday on this topic)
• Include both the original variable and the squared term in your model: y = β0 + β1x + β2x² + u
• For 'U'-shaped curves, β1 should be negative and β2 positive
• Including the squared term means that β1 can't be interpreted alone as the change in y for a unit change in x; we need to take β2 into account as well, since:
Slope = Δy/Δx ≈ β1 + 2β2x

In Stata – two approaches:
1. Generate a new squared variable: gen democracy2 = democracy*democracy
2. Tell STATA in the regression with the '#' sign; for continuous or ordinal variables we need to add the 'c.' prefix:
reg corruption c.democracy c.democracy#c.democracy
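A sketch of the slope arithmetic for the quadratic model estimated on the next slide (coefficients taken from that output; margins reproduces the same derivative):

* sketch: slope of the quadratic fit at chosen x values, slope ≈ b1 + 2*b2*x
display "slope at x=2: " -.4563782 + 2*.0565682*2   // ≈ -.23, still declining
display "slope at x=8: " -.4563782 + 2*.0565682*8   // ≈  .45, now increasing
margins, dydx(fh_polity2) at(fh_polity2=(2 8))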
Comparing the results, we see:

reg wbgi_cce fh_polity2

      Source |      SS          df      MS         Number of obs =    163
-------------+--------------------------------    F(1, 161)     =  77.97
       Model |  53.3873279       1  53.3873279    Prob > F      = 0.0000
    Residual |  110.238236     161  .684709542    R-squared     = 0.3263
-------------+--------------------------------    Adj R-squared = 0.3221
       Total |  163.625564     162  1.01003435    Root MSE      = .82747

    wbgi_cce |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
  fh_polity2 |  .1852964   .0209846    8.83   0.000    .1438558     .226737
       _cons | -1.319789   .1476037   -8.94   0.000   -1.611278     -1.0283

reg wbgi_cce c.fh_polity2 c.fh_polity2#c.fh_polity2

      Source |      SS          df      MS         Number of obs =    163
-------------+--------------------------------    F(2, 160)     =  84.79
       Model |  84.1888456       2  42.0944228    Prob > F      = 0.0000
    Residual |  79.4367185     160  .496479491    R-squared     = 0.5145
-------------+--------------------------------    Adj R-squared = 0.5085
       Total |  163.625564     162  1.01003435    Root MSE      = .70461

                  wbgi_cce |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
---------------------------+---------------------------------------------------------------
                fh_polity2 | -.4563782   .0834032   -5.47   0.000   -.6210913    -.291665
 c.fh_polity2#c.fh_polity2 |  .0565682   .0071819    7.88   0.000    .0423847    .0707516
                     _cons | -.0634611   .2030729   -0.31   0.755     -.46451    .3375879

Note the jump in R-squared (0.33 to 0.51) once the squared term is added.

Quadratic forms – getting concrete model predictions using the margins command
• Slope = Δy/Δx ≈ β1 + 2β2x
margins, at(fh_polity2=(0(1)10))
marginsplot
[Figure: adjusted predictions with 95% CIs across Freedom House/Polity values 0-10 – a U-shaped prediction curve]

Other things to watch for under assumption 1:
1. The sample is a simple random, representative sample (SRS) from the population
2. The model has correct values
3. The data are valid and accurately measure the concepts
4. No omitted variables (exogeneity)

No omitted IV's – exogeneity
• The error term has zero population mean (E(εᵢ) = 0)
• The error term is not correlated with the X's: E(εᵢ|X1i, X2i, …, XNi) = 0
• This assumption is also called 'exogeneity'. It basically means that the X's are not correlated with the error term in any systematic way
• Violations result from omitted variable bias
• Can be checked by inspecting correlations and scatterplots of the residual against the IV's – if a pattern exists, this can lead to bias (more later on this)
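A sketch of that residual inspection with hypothetical names. Note that in-sample, the residuals' linear correlation with the included IV's is exactly zero by construction, so it is the shape of the scatterplots that carries the information:

* sketch: residual-vs-IV inspection (hypothetical y, x1, x2)
regress y x1 x2
predict e, resid
scatter e x1    // look for curvature or clustering, not a linear trend
scatter e x2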
2. No severe multicollinearity
• What is multicollinearity? 'Perfect' multicollinearity is when two variables X1 and X2 are correlated at 1 (or −1), but it is also a problem when X1 and X2 are highly correlated, say above 0.6 or below −0.6
• Example: estimating someone's shoe size with height, including measures of height in both cm and inches
• Since an inch = 2.54 cm, we know that if someone is 63 inches then they are 160 cm, for example
• What happens?
shoe sizeᵢ = β0 + β1·height_inches + β2·height_cm + eᵢ
• What is the effect of β1 on Y? The effect of inches on shoe size when holding cm (β2) constant – but inches don't vary when holding cm constant! So the β's will be 0/undefined

Multicollinearity
• Other examples: nominal/categorical variables, e.g. employment (1. private sector, 2. public sector, 3. not working) – we must exclude one category as a 'reference'
• But these examples are mainly errors by us…
• What happens if X1 and X2 are just highly correlated? OLS BLUE is not violated and the estimates are still unbiased, but they become less EFFICIENT (higher standard errors)

[Venn diagram: DV, X1 and X2 – the variation X1 and X2 share cannot be uniquely attributed to either, leaving only the unique overlap of each with the DV to estimate its effect, plus the unexplained RSS]

Detecting multicollinearity
1. You run a model where none of the X's is significant, but the overall F-test is significant
2. Look at a Pearson's correlation table – if any variables are correlated above (rule of thumb) 0.6 or below −0.6, this could be an issue
3. A post-regression VIF (variance inflation factor) test, which tests whether any X in the model is in linear combination with the other X's:
VIFⱼ = 1/(1 − R²ⱼ)
If there is no correlation between Xⱼ and the other X's, then R²ⱼ = 0 and thus VIFⱼ = 1, the lowest possible value. You get a VIF for each X and for the model on the whole. Any value above 10 (rule of thumb) is considered a problem.
In STATA, post-regression: estat vif
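A sketch of the detection workflow with hypothetical variable names (estat vif and pwcorr are both the commands referenced in the slides):

* sketch: multicollinearity checks (hypothetical y, x1-x3)
regress y x1 x2 x3
estat vif          // rule of thumb: any VIF above 10 is a problem
pwcorr x1 x2 x3    // pairwise |r| above ~0.6 deserves a closer look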
What to do about multicollinearity??
1. If X1 and X2 are highly correlated, drop one (the least important) – this points to a possible trade-off between BIAS and EFFICIENCY
2. Increase n: multicollinearity has a larger impact on smaller sample sizes
3. Combine the variables into an index – via principal component or factor analysis, for example
4. Do nothing, and just be clear about the problem

Short exercise
• Open the dataset on GUL: practicedata.dta
• Explain the share of women in parliament (DV) as a function of corruption, population, and spending on primary education
• Check scatterplots and correlations, and run a multivariate regression
• Interpret all coefficients, check the model statistics
• Test/examine whether a linear relationship is appropriate for all IV's; make the proper transformation if necessary
• Run the regression with the transformed variable. Compare the results in terms of betas, p-values and R² with the non-transformed regression output – what do you see?
• Check for multicollinearity: look at correlation tables and run a VIF test – what do you see?

Assumptions of OLS
OLS is fantastic if our data meet several assumptions, and before we make any inferences we should always check:
1. Correct model specification – the linear model is suitable
2. No severe multicollinearity
3. Error terms are normally distributed for all levels of X
4. The conditional standard deviation is the same for all levels of X (homoscedasticity)
5. There are no severe outliers
6. There is no autocorrelation
7. The sample is selected randomly / is representative

3. No extreme outliers
• Outliers, if undetected, can have a severe impact on your beta estimates. You must check for these, especially where the Y's or X's are continuous. Three ways to think about outlying observations:
1. Leverage outlier – an observation far from the mean of Y or X (e.g. 2-3+ standard deviations from the mean)
2. Residual outlier – an observation that 'goes against our prediction' (i.e. has a lot of error)
3. Influence – if we take this observation out, do the results change significantly?
A leverage outlier is not necessarily a problem (if it is in line with our predictions). However, a leverage outlier becomes very misleading if it is also a big residual outlier, in which case it will be an influential observation.

use http://www.ats.ucla.edu/stat/stata/dae/crime, clear
• Run a regression explaining crime in a state (# of violent crimes per 100,000 people) with 3 IV's:
• % metro area
• poverty rate (%)
• % of single-parent households
• Interpretation?

regress crime pctmetro poverty single

      Source |      SS          df      MS         Number of obs =     51
-------------+--------------------------------    F(3, 47)      =  82.16
       Model |  8170480.21       3   2723493.4    Prob > F      = 0.0000
    Residual |  1557994.53      47  33148.8199    R-squared     = 0.8399
-------------+--------------------------------    Adj R-squared = 0.8296
       Total |  9728474.75      50  194569.495    Root MSE      = 182.07

       crime |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
    pctmetro |  7.828935   1.254699    6.24   0.000    5.304806    10.35306
     poverty |  17.68024    6.94093    2.55   0.014    3.716893     31.6436
      single |  132.4081   15.50322    8.54   0.000    101.2196    163.5965
       _cons | -1666.436    147.852  -11.27   0.000   -1963.876   -1368.996

Detection of the influence of observations: lvr2plot
• A simple leverage-residual plot can give a clear visual; we run it after a regression in STATA
• Y-axis = leverage; x-axis = normalized residual squared
• Any observation near the top-right corner can especially bias the results!
[Figure: leverage against normalized squared residual, with state labels – dc stands out at the top, ms and fl to the right]

Outliers via 'studentized' residuals
• We can check with normal residuals, but they depend on their scale, which makes it hard to compare different models
• As our model is an estimate of the 'true' relationship, so are the errors
• The issue is that although the variance of the error term is assumed equal (homoskedastic), the estimates are often not equal for all levels of X; the variance might decrease as X increases, for example
• Studentized residuals are adjusted: they are re-calculated residuals whereby the regression line is re-estimated leaving out each observation, one at a time
• We then compare the estimates using all observations with the estimates obtained after removing each observation in turn. For observations where the line moves a lot, the observation gets a larger studentized residual

Normal (raw) vs. studentized residuals
[Figure: histograms of raw and of studentized residuals; studentized residuals relate to the z-score, where 95% of residuals fall within ±2 standard deviations]
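A sketch flagging large studentized residuals after the crime regression (predict's rstudent option computes the leave-one-out version described above):

* sketch: flag large studentized residuals
regress crime pctmetro poverty single
predict rstu, rstudent
list state rstu if abs(rstu) > 2 & !missing(rstu)   // candidates for a closer look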
Looking at observations at the extremes of the distribution: hilo
• Command 'hilo' (a user-written command); specify with show(#) how many observations you want to see (default = 10)
• Any observation below −2 or above +2 (especially −3 or +3) should be looked at further

hilo r state, show(5)

5 lowest and highest observations on r

+-------------------+     +------------------+
|         r   state |     |        r   state |
|-------------------|     |------------------|
| -3.570789      ms |     | 1.151702      il |
| -1.838577      la |     | 1.293477      id |
| -1.685598      ri |     | 1.589644      ia |
| -1.303919      wa |     | 2.619523      fl |
|  -1.14833      oh |     | 3.765847      dc |
+-------------------+     +------------------+

Influence of each observation: Cook's D
• In STATA, after a regression: predict d, cooksd
• If Cook's d = 0 for an observation, it has no influence; the higher the d value, the greater the influence. It is calculated via an F-test of whether the estimates change when observation i is removed
• The 'rule of thumb' for observations with possibly troublesome influence is d > 4/n
• To avoid listing observations with missing data, specify the cutoff explicitly: if d > 4/51
• Compare the outliers' values on the variables with the sample:

list state d crime pctmetro poverty single if d>4/51

     | state        d    crime   pctmetro   poverty   single |
     |--------------------------------------------------------|
 9.  |    fl   .173629    1206         93      17.8     10.6 |
18.  |    la  .1592638    1062         75      26.4     14.9 |
25.  |    ms   .602106     434       30.7      24.7     14.7 |
51.  |    dc  3.203429    2922        100      26.4     22.1 |

. sum crime pctmetro poverty single

    Variable |  Obs     Mean      Std. Dev.    Min    Max
-------------+---------------------------------------------
       crime |   51  612.8431    441.1003       82   2922
    pctmetro |   51   67.3902    21.95713       24    100
     poverty |   51  14.25882    4.584242        8   26.4
      single |   51  11.32549    2.121494      8.4   22.1

Measuring influence for each IV: DFBETA
• dfbeta is a statistic of the influence of each observation on each IV in the model
• It tells us how many standard errors the coefficient WOULD CHANGE if we removed the observation
• A new variable is generated for each IV
• Ex.: DC increases the beta of % single-parent by 3.13 standard errors (3.13 × 15.5) compared with the regression without DC
• It is dependent on the scale of Y and X!
• Caution for any dfbeta value above 2/√n = 2/√51 ≈ 0.28:

list _dfbeta_1 state if _dfbeta_1>.28
  .64175 fl; 1.006877 ms
list _dfbeta_2 state if _dfbeta_2>.28
  .5959252 fl
list _dfbeta_3 state if _dfbeta_3>.28
  3.139084 dc
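A sketch collecting the influence workflow (e(N) is the estimation sample size, so 4/e(N) reproduces the 4/n cutoff; dfbeta creates one _dfbeta_* variable per IV):

* sketch: Cook's D and DFBETAs after the crime regression
regress crime pctmetro poverty single
predict d, cooksd
list state d if d > 4/e(N) & !missing(d)
dfbeta
list state _dfbeta_3 if abs(_dfbeta_3) > 2/sqrt(e(N)) & !missing(_dfbeta_3)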
What to do about outliers?
Again, it depends on what type of 'outlier' an observation is! There is no "right" answer here – just be aware of whether they exist and how much effect they have on the estimates. BUT:
1. Check for data error!
2. Create an observation dummy for the outliers:
gen outlier = 1 if ccode == x
replace outlier = 0 if outlier == .
3. Take out the observations, re-run the model, and check for differences; run 'lfit' and compare the R² stats. Report any differences
4. Use a new functional form (log, normalized variables)
5. Do nothing, leave them in, and footnote
6. Use weighted observations

[Figures: the same scatterplot fitted with and without an extreme observation – the fitted line shifts noticeably]

Robust regression (rreg)
• Robust regression can be used in any situation in which you would use OLS
• It can also be helpful in dealing with outliers, after we decide we have no compelling reason to exclude them from the analysis
• In normal OLS, all observations are weighted equally. The idea of robust regression is to weight the observations differently based on how "well behaved" they are. Basically, it is a form of weighted and re-weighted OLS (WLS)
• Stata's rreg command implements a version of robust regression. It runs the OLS regression and gets the Cook's D for each observation. Observations with small residuals get higher weight; any observation with a Cook's distance greater than 1 (severe influence) is dropped
• Using the Stata defaults, robust regression is about 95% as efficient as OLS (Hamilton, 1991). In short, the most influential points are dropped, and then cases with large absolute residuals are down-weighted
• Looking at our example data on women in parliament…

reg ipu_l_sw une_eep ti_cpi logpop

    ipu_l_sw |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
     une_eep |  3.869044   1.299639    2.98   0.004     1.29563    6.442458
      ti_cpi |  .2091336   .0511606    4.09   0.000    .1078304    .3104368
      logpop |  .7506063   .5774238    1.30   0.196   -.3927504    1.893963
       _cons | -2.813357   6.907481   -0.41   0.685   -16.49086    10.86415

rreg ipu_l_sw une_eep ti_cpi logpop

Robust regression            Number of obs =    123
                             F(3, 119)     =   8.31
                             Prob > F      = 0.0000

    ipu_l_sw |    Coef.    Std. Err.     t     P>|t|    [95% Conf. Interval]
-------------+---------------------------------------------------------------
     une_eep |  4.050256   1.329502    3.05   0.003    1.417709    6.682803
      ti_cpi |  .2208518   .0523362    4.22   0.000    .1172209    .3244828
      logpop |  1.054711    .590692    1.79   0.077   -.1149176    2.224341
       _cons |  -7.11359   7.066203   -1.01   0.316   -21.10538    6.878198

Short exercise
• Open the 'practicedata' dataset again; we'll run the same regression as in example 1
• Again, examine scatterplots between the DV and each IV. Run the regression
• Search for outliers:
1. Visual residual-leverage plot: lvr2plot, mlabel(cname)
2. Cook's d
3. Dfbeta (you can look at all 3, or dfbeta for each IV one at a time if easier), e.g.: list cname _dfbeta_1 if _dfbeta_1 > 2/sqrt(n) (**don't forget to calculate 2/√n)
• What do you see? Do any observations break our 4/n Cook's or 2/√n dfbeta rules? Which countries are they? What would you do about this? Do your adjustments change your regression results?

ASSUMPTIONS THAT ARE ERROR-TERM VIOLATIONS:
- NORMALITY
- HOMOSKEDASTICITY
- AUTOCORRELATION
- INDEPENDENCE OF OBSERVATIONS

4. The mean of the error = 0, and errors are normally distributed for all levels of X
Key issues:
1. There is a probability distribution of Y for each level of X. A 'hard' assumption is that this distribution is normal (bell-shaped)
2. Given that µy is the mean value of Y, the standard form of the model is y = f(x) + ε, where ε is a random variable with a normal distribution, mean 0 and standard deviation σ

Normal distribution of the error terms
• Violations of any of the three former assumptions (1. model specification/linearity, 2. no extreme observations, 3. no strong multicollinearity) can potentially bias the estimated coefficients
• Violations of the assumptions concerning the residuals (4. absence of autocorrelation, 5. normally distributed residuals, 6. homoskedasticity) do not necessarily affect the estimated coefficients, but they can affect and reduce your ability to perform inference and hypothesis testing. And they can bias estimates too, so it's always good to check!
Short exercise
• Open the ’practicedata’ dataset again, and we’ll do the same regression as in example 1
• Again, examine scatterplots between the DV and each IV. Run the regression
• Search for outliers:
1. Visual residual-leverage plot: lvr2plot, mlabel(cname)
2. Cook’s D
3. Dfbeta (you can look at all 3, or dfbeta for each IV one at a time if easier): e.g. list cname _dfbeta_1 if _dfbeta_1 > 2/√n (**don’t forget to calculate 2/√n for your sample!)
What do you see? Do any observations break our 4/n Cook’s D rule, or the 2/√n dfbeta rule? Which countries are they? What would you do about this? Do your adjustments change your regression results?

ASSUMPTIONS THAT ARE ERROR-TERM VIOLATIONS:
-NORMALITY
-HOMOSKEDASTICITY
-AUTOCORRELATION
-INDEPENDENCE OF OBSERVATIONS

4. Mean of error = 0, errors are normally distributed for all levels of X
Key issues:
1. There is a probability distribution of Y for each level of X. A ‘hard’ assumption is that this distribution is normal (bell shaped)
2. Given that µy is the mean value of Y, the standard form of the model is y = f(x) + ε, where ε is a random variable with a normal distribution with mean 0 and standard deviation σ.

Normality distribution of error terms
• Violations of any of the three former assumptions (1. model specification – linearity, 2. no extreme observations, 3. no strong multicollinearity) could potentially result in bias in the estimated coefficients.
• Violations of the assumptions concerning the residuals (4. normally distributed residuals with mean zero, 5. homoskedasticity, and 6. absence of autocorrelation), by contrast, do not necessarily affect the estimated coefficients, but they can reduce your ability to perform inference and hypothesis testing – so it’s always good to check!
• This is because the distribution of the residuals is the foundation for significance tests of the coefficients – it is the distribution that underlies the calculation of t- and P-values. This is especially true for smaller samples: a prerequisite in small samples is that the residuals are normally distributed.

Analysis of Residuals
• Always important to do – for several assumptions
• To examine whether the regression model is appropriate for the data being analyzed, we can check the residual plots.
• Later we can do more ‘advanced’ tests to see if we’ve violated some assumptions
• Residual plots:
1. Histogram of the residuals
2. Scatterplot of residuals against the fitted values (y-hat)
3. Scatterplot of residuals against the independent variables (x)
4. Scatterplot of residuals over time if the data are chronological (more later in time series analysis)

Plotting the residuals
• Use the academic performance data, and regress academic performance on the % of ESL learners, % of students with free meals, and average education of parents
• use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2
• regress api00 meals ell emer
• Then predict the residuals:
• predict r, resid
• Plot the density of the residuals against a normal bell curve – how closely are they matched?
• kdensity r, normal
• A qnorm plot (plots the quantiles of a variable against the quantiles of a normal distribution)
• qnorm r

[Figures: kernel density estimate of the residuals plotted against a normal density (kernel = epanechnikov, bandwidth = 15.5162), and a qnorm plot of the residuals against the inverse normal]

More ’formal’ tests
1. Shapiro–Wilk W test for normality. Tests the proximity of our residual distribution to the normal bell curve. H0: residuals are normally distributed

swilk r

Shapiro-Wilk W test for normal data

    Variable |    Obs       W          V         z      Prob>z
-------------+--------------------------------------------------
           r |    381   0.99698      0.795    -0.544    0.70691

5. Homoskedasticity
• Homoskedasticity: the error has a constant variance around our regression line
• The opposite of this is:
• Heteroskedasticity: the variance of the error depends on the values of the Xs.

What does heteroskedasticity look like?
• Plotting the residuals against X, we should see no systematic pattern in the variance around the fitted line

Consequences
• If you find heteroskedasticity, like multicollinearity, this will affect the EFFICIENCY of the model.
• The calculation of standard errors, and thus P-values, becomes uncertain, since the dispersion of the residuals depends on the level of the variables.
• The effect of X on Y might be very significant at some levels of X, and less so at others, which makes a single overall significance calculation impossible.
• Heteroskedasticity does not necessarily result in biased parameter estimates, but OLS is no longer BLUE.
• The risk of Type I or Type II error will increase (what are these??)
• E.g. ‘false positive’ & ‘false negative’

How to check for heteroskedasticity
1. A visual plot of the residuals over the fitted values of Y: rvfplot, yline(0)
Here we do not want to see any pattern – just a random, insignificant scattering of dots..
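To see what a violation looks like, here is a self-contained simulation sketch (all variable names invented for illustration) in which the error spread grows with x, producing the tell-tale fan shape:

  * simulate heteroskedastic data and inspect the residual-vs-fitted plot
  clear
  set obs 200
  set seed 12345
  gen x = 10*runiform()
  gen y = 2 + 0.5*x + rnormal(0, 0.2 + 0.3*x)   // error s.d. increases with x
  regress y x
  rvfplot, yline(0)   // fan shape: residual spread widens with fitted values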
Use the ’academic performance’ data, and regress academic performance (api00) on the % of ESL learners (ell), % of students with free meals (meals), and average education of parents (avg_ed)

reg api00 meals ell avg_ed

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       meals |  -3.006897   .1891693   -15.90   0.000    -3.378856   -2.634937
         ell |  -.8303102   .1946917    -4.26   0.000    -1.213128   -.4474925
      avg_ed |   27.65032   6.867322     4.03   0.000     14.14726    41.15337
       _cons |   781.9566   27.12323    28.83   0.000     728.6248    835.2883
------------------------------------------------------------------------------

rvfplot, yline(0)

[Figure: residuals plotted against fitted values for this regression]

• What do we observe?
• Looks kind of random, but the error term seems to narrow as the fitted values get higher..

More ’formal’ tests
2. Breusch–Pagan / Cook–Weisberg test
-Regresses the squared errors on the Xs
*Good at detecting linear heteroskedasticity, but not non-linear forms.
H0: constant variance (no heteroskedasticity)

estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of api00
         chi2(1)      =    12.60
         Prob > chi2  =   0.0004

3. Cameron & Trivedi’s IM test
-Similar, but also includes the squared Xs in the regression
H0: no heteroskedasticity
**Both tests are sensitive and will often be significant even with only slight heteroskedasticity…

. estat imtest

Cameron & Trivedi's decomposition of IM-test
---------------------------------------------------
              Source |       chi2     df         p
---------------------+-----------------------------
  Heteroskedasticity |      23.55      9    0.0051
            Skewness |       6.16      3    0.1040
            Kurtosis |       0.39      1    0.5305
---------------------+-----------------------------
               Total |      30.10     13    0.0046
---------------------------------------------------

If we find something, we might check the individual IVs against residual plots, and look at the correlations between the IVs and the error:

pwcorr r meals ell emer

             |        r    meals      ell     emer
-------------+------------------------------------
           r |   1.0000
       meals |   0.0000   1.0000
         ell |  -0.0000   0.7724   1.0000
        emer |  -0.0000   0.5330   0.4722   1.0000

[Figure: scatterplot of the residuals against % English language learners]

What to do about this?
• You don’t always have to do anything, but if severe:
1. Try transforming the Xs (non-linear, logged, un-logged) to make the relationship more linear
2. Remove variables that are suspect or insignificant and re-run the regression
3. Add more variables
4. Weighted least squares regression (WLS), where certain observations (maybe those that deviate most?) are weighted less than others, thus affecting the standard errors – see the sketch below
5. Use a stricter alpha as the significance cut-off for P-values – 0.01 instead of 0.05 – to reduce the risk of Type I error
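A sketch of remedy 4: one simple two-step WLS variant, in which the error variance is modelled on the fitted values and its inverse is used as an analytic weight (this is just one of several ways to construct the weights):

  * two-step weighted least squares sketch
  regress api00 meals ell avg_ed
  predict yhat, xb
  predict e, resid
  gen e2 = e^2
  regress e2 yhat                   // crude model of the error variance
  predict varhat, xb
  reg api00 meals ell avg_ed [aweight = 1/varhat] if varhat > 0   // down-weight noisy observations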
6) Autocorrelation
• The same unobservable forces might influence the dependent variable at successive time points.
• It is defined as the correlation between two values of the same variable at times t and t-1 (Xt and Xt-1).
• For example, the factors that predict defense spending/voting/economic development, etc. in 1971 are likely to also predict them in 1972, and therefore whatever error remains from our estimation of Y in 1971 will persist in 1972.
• Can lead to BIASED and/or INEFFICIENT estimates with OLS

6) Autocorrelation (cont.)
• The problem also occurs when observations are ordered by geographical location: serial vs. spatial autocorrelation.
• The consequences are quite serious. Positive autocorrelation will tend to increase the variation of the sampling distributions of the estimated coefficients – which can then show up as great variation across different models.
• In a simple model the result would, on the contrary, be an underestimation of the standard errors of the actual estimates, so we risk treating a given coefficient as significant when it is not. The same goes for R², which may also be overestimated.

6) Autocorrelation (cont.)
• Detection:
• The Durbin–Watson statistic ranges from 0 to 4, where 0 indicates high positive autocorrelation and 4 high negative autocorrelation, while 2 indicates the absence of autocorrelation.
– In Stata: estat dwatson
• Solutions:
• In order to correct for autocorrelation one has to use time-series regression, but within OLS one could consider lagging the dependent variable (Yt − Yt−1) and thereby removing all non-independent information in the variable (this only works for time-series autocorrelation).
MORE ON AUTOCORRELATION IN STEFAN’S TIME SERIES MOMENT!

7) Independence of Errors
• This assumption states that an error from one observation is independent of the error from another observation.
• Actually, it is not the dependency by itself that matters; it is whether the errors are correlated that matters..
• Dependency in errors often happens in financial and economic time-series data and in cross-country multilevel data (e.g. survey data from multiple countries)
• Multilevel data – affects coefficients & significance
• TSCS – affects mainly significance
• A Hausman test can be used to assess this.

What needs to be considered depends on your data!
1. Model specification – linearity: always important
2. No extreme observations: more important in small samples
3. No strong multicollinearity: more important in small samples
4. No autocorrelation: more important in time-series or cross-section data
5. Errors have zero mean with a normal distribution: more important in small samples
6. Errors have constant variance: more important in large samples (in small samples, outliers are more severe)
7. Observations shall be independent of each other: more important in time-series or multi-level cross-section data

Interaction terms
• Back to our discussion about the assumption of ’proper model specification’
• Sometimes our X variable has a non-linear effect due to an interaction with another IV. When testing this, it is called a ‘conditional hypothesis’
• A conditional hypothesis is simply one in which a relationship between two or more variables depends on the value of one or more other variables.
– Ex. An increase in X is associated with an increase in Y when condition Z is met, but not when condition Z is absent.
– Ex. The effect of education on income is stronger in men than in women
• In technical terms, we compare the following two models:
Additive multiple regression model: Y = α + β1x + β2z + µ
Multiplicative multiple regression model: Y = α + β1x + β2z + β3xz + µ

Types of interaction terms
• Our X variables can take different shapes depending on their measurement. These are the combinations, in order of complexity to interpret (see the Stata sketch below):
1. Two dummy variables: ex. gender*unemployed
2. One dummy, one continuous/ordinal variable: ex. gender*age
3. Two continuous/ordinal variables: ex. age*income
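In Stata, the three combinations map naturally onto factor-variable notation (a sketch; y and the regressors are hypothetical placeholder names):

  reg y i.gender##i.unemployed   // 1. dummy x dummy
  reg y i.gender##c.age          // 2. dummy x continuous
  reg y c.age##c.income          // 3. continuous x continuous
  * ## automatically includes both constitutive terms along with the product term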
The first interaction can also be modelled as 3 dummy variables in relation to one reference category, in this case unemployed males:
Y = α + β1(female_unemployed) + β2(male_employed) + β3(female_employed) + µ

[Figure: one dummy, one continuous/ordinal – visual example. Panels plot Y against Z for X = 1 (‘high’) and X = 0 (‘low’); in this example Z = 0 and 1. Taken from Brambor et al. (2005)]

Interaction Interpretations
• Multiplicative multiple regression model
• Y = α + β1x + β2z + β3xz + µ
• When condition Z (a dummy variable) is absent (e.g. = 0), the equation above simplifies to:
• Y = α + β1x + µ
• where β1 is the effect of X for observations that take 0 on Z
• And when condition Z is present (e.g. = 1, or greater), the effect of X on Y becomes:
• Y = (α + β2) + (β1 + β3)x + µ
• Now we see that β1 cannot be interpreted independently of β3

4 important points
1. Interaction models should be used whenever the hypothesis to be tested is conditional in nature
2. Include ALL ”constitutive terms”. These are just the two variables that make up the interaction (e.g. X and Z).
3. Do not interpret constitutive terms as unconditional marginal effects
4. Calculate substantively meaningful marginal effects and standard errors
Brambor, Thomas, William Roberts Clark & Matt Golder. 2006. ”Understanding Interaction Models: Improving Empirical Analyses.” Political Analysis 14: 63-82.

Include All Constitutive Terms
• No matter what form the interaction term takes, all constitutive terms should be included. Thus, X should be included when the interaction term is X², and X, Z, J, XZ, XJ, and ZJ should be included when the interaction term is XZJ.
• β1 does not represent the average effect of X on Y; it only indicates the effect of X when Z is zero
• β2 does not represent the average effect of Z on Y; it only indicates the effect of Z when X is zero
• Excluding X or Z is equivalent to assuming that β1 or β2 is zero.
Taken from Brambor et al. (2005)

Include All Constitutive Terms
The constitutive term β2Z captures the difference in the intercepts between the regression line for the case in which condition Z is present and the one in which condition Z is absent – omitting Z amounts to constraining the two regression lines to meet on the Y axis.
Taken from Brambor et al. (2005)

Multicollinearity
• Just as we discussed with the quadratic term, the coefficients in interaction models no longer indicate the average effect of a variable as they do in an additive model. As a result, they are almost certain to change with the inclusion of an interaction term, and this should not be interpreted as a sign of multicollinearity.
• Even if there really is high multicollinearity and this leads to large standard errors on the model parameters, it is important to remember that these standard errors are never in any sense ‘‘too’’ large – they are always the ‘‘correct’’ standard errors.
• High multicollinearity simply means that there is not enough information in the data to estimate the model parameters accurately, and the standard errors rightfully reflect this.

Multicollinearity
• ‘Solutions’ have been posited: re-scaling the variables, ‘centering’
• Centering the IVs around their mean does not solve the problem (Aiken and West, 1991)
• Regardless of the complexity of the regression equation, centering has no effect at all on the coefficients of the highest-order terms, but may drastically change those of the lower-order terms in the equation.
• Centering unstandardized IVs usually does not affect anything of interest. Simple slopes will be the same in centered as in un-centered equations, their standard errors and t-tests will be the same, and interaction plots will look exactly the same, but with different values on the x-axis.
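A quick way to verify the centering point for yourself (a sketch; y, x, z are hypothetical continuous variables):

  * centering leaves the highest-order (interaction) coefficient unchanged
  reg y c.x##c.z
  summarize x, meanonly
  gen xc = x - r(mean)
  summarize z, meanonly
  gen zc = z - r(mean)
  reg y c.xc##c.zc   // identical interaction coefficient; lower-order terms change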
3. Do Not Interpret Constitutive Terms as Unconditional Marginal Effects
• When we have an interaction, the effect of the independent variable X on the dependent variable Y depends on some third variable Z (and vice versa).
• The coefficient on X only captures the effect of X on Y when Z is zero. Similarly, the coefficient on Z only captures the effect of Z on Y when X is zero.
• It is therefore incorrect to say that a positive and significant coefficient on X (or Z) indicates that an increase in X (or Z) is expected to lead to an increase in Y.
• Also, whether X modifies Z or vice versa cannot be determined by the model, only by the researcher and the theory behind it!

4. Calculate Substantively Meaningful Marginal Effects and Standard Errors
Typical results tables report only the marginal effect of X when the conditioning variable is zero, i.e., β1. Similarly, Stata tables report only the standard error for this particular effect. As a result, the only inference we can draw is whether X has a significant effect on Y when Z = 0.
Basically, we want to know WHERE and HOW MUCH Z conditions X’s effect on Y, and at what significance level. Results tables are often quite uninformative in this respect. Even a ‘significant’ interaction coefficient might not be that interesting, while even an insignificant one can actually be significant at certain levels of Z (or X).
This is where the margins command in Stata is very helpful (help margins)
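The quantities behind point 4 follow directly from the multiplicative model above (the covariance term is why the standard error – and hence significance – varies with Z):

  ∂Y/∂X = β1 + β3·Z
  se(β1 + β3·Z) = √[ var(β1) + Z²·var(β3) + 2·Z·cov(β1, β3) ]

This is exactly what the margins examples below compute at chosen values of Z.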
Example: back to explaining % women in parliament
• This time let’s try a few different variables:
• IVs – level of democracy (0-10) and the % of Protestants in a country (0-100)

reg ipu_l_sw c.fh_polity2 c.lp_protmg80

      Source |       SS           df       MS      Number of obs   =       152
-------------+----------------------------------   F(2, 149)       =     12.59
       Model |  2248.21552         2  1124.10776   Prob > F        =    0.0000
    Residual |  13306.2439       149  89.3036502   R-squared       =    0.1445
-------------+----------------------------------   Adj R-squared   =    0.1331
       Total |  15554.4594       151  103.009665   Root MSE        =    9.4501

------------------------------------------------------------------------------
    ipu_l_sw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  fh_polity2 |   .6576315    .263696     2.49   0.014     .1365647    1.178698
 lp_protmg80 |   .1399659   .0419661     3.34   0.001     .0570404    .2228914
       _cons |   10.65832   1.767326     6.03   0.000     7.166058    14.15058
------------------------------------------------------------------------------

Example: back to explaining % women in parliament
• Same IVs – level of democracy (0-10) and the % of Protestants (0-100) – now with the interaction
• What does this tell us, generally speaking?

reg ipu_l_sw c.fh_polity2 c.lp_protmg80 c.lp_protmg80#c.fh_polity2

      Source |       SS           df       MS      Number of obs   =       152
-------------+----------------------------------   F(3, 148)       =     12.30
       Model |  3104.63365         3  1034.87788   Prob > F        =    0.0000
    Residual |  12449.8258       148  84.1204443   R-squared       =    0.1996
-------------+----------------------------------   Adj R-squared   =    0.1834
       Total |  15554.4594       151  103.009665   Root MSE        =    9.1717

--------------------------------------------------------------------------------------------
                  ipu_l_sw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------------+----------------------------------------------------------------
                fh_polity2 |   .3311227   .2756287     1.20   0.232    -.2135533    .8757987
               lp_protmg80 |   -.397334    .173249    -2.29   0.023    -.7396953   -.0549728
c.lp_protmg80#c.fh_polity2 |   .0607897   .0190519     3.19   0.002     .0231409    .0984386
                     _cons |   13.24061   1.896612     6.98   0.000     9.492677    16.98855
--------------------------------------------------------------------------------------------

Using margins for interpretation
• We can show this interaction a number of ways:
• 1. The marginal effect (of a 1-unit increase) of democracy over a range of % Protestant:
margins, dydx(fh_polity2) at(lp_protmg80=(0 13 52 97))
• Where ’dydx’ means we want to see a marginal effect (ΔY from a 1-unit increase in X)
• The numbers (0 13 52 97) after the % Protestant variable are just the min, mean, mean + 2 s.d., and max values. I got these from just running the ‘sum’ command
• To see a visual plot, just type: marginsplot

[Figure: ‘Average Marginal Effects of fh_polity2 with 95% CIs’ over Religion: Protestant = 0, 13, 52, 97]

Using margins for interpretation
• 2. Compare predicted levels of % women in parliament for 2 ’meaningful’ values of democracy over a range of % Protestant:
margins, at(lp_protmg80=(0 13 52 97) fh_polity2=(0 10))
• Note the ’modifying variable’ (e.g. the one on the x-axis) goes first after ’at’.
• The numbers (0 13 52 97) after the % Protestant variable are just the min, mean, mean + 2 s.d., and max values; 0 and 10 for democracy are just the min and max values. I got these from just running the ‘sum’ command
• This is also what you’d do if you had a binary variable in the interaction (e.g. instead of 0 and 10, just type 0 1)…
• To see a visual plot, just type: marginsplot

[Figure: ‘Adjusted Predictions with 95% CIs’ over Religion: Protestant = 0, 13, 52, 97, for fh_polity2 = 0 and fh_polity2 = 10]

We will do more in the next section with this command!
Next time: models for limited dependent variables: logit and probit