Dummies (no, this lecture is not about you)
POL 242
Renan Levine
February 13/15, 2007

Regression Interpretation Review

Interpretation of a Regression
"Countries" dataset. The dependent variable is military expenditures as a % of GDP. There are two independent variables: % of GDP from industry [INDUSTRY] and Civil Liberties Index [CIVLIB].

Variables in the Equation
Variable       B       SE B    Beta     T       Sig T
INDUSTRY       .040    .022    .135     1.82    .0705
CIVLIB        -.696    .147   -.351    -4.73    .0000
(Constant)    4.86     .967             5.03    .0000

What to look at first
The primary question you are asking when you run a regression is "what is the effect of x [independent variable] on y [dependent variable]?" So first look at the value of the unstandardized coefficients (B). The unstandardized coefficient tells you how much the value of the dependent variable is expected to change if the value of the independent variable increases by one unit, controlling for all other independent variables.
Ex: For INDUSTRY, B = 0.04. You can write this in a sentence like this: All else being equal, for every one percentage point increase in the % of GDP from industry, military expenditures as a % of GDP go up 0.04 points [or 4 hundredths of a percent of GDP].

Are you sure?
Can we be sure that this is not the result of sampling error? For this we look at the significance (Sig T).
Ex: P = 0.0705. You can write this result in a sentence like this: The relationship between the % of GDP from industry and military expenditures as a % of GDP is significant at the 0.10 level [P < 0.10], although it falls just short of the conventional 0.05 level.

Are you worried about multicollinearity?
In this case, both variables are significant (INDUSTRY only at the 0.10 level), so we do not need to check the tolerance. If two or more variables are insignificant, what should you examine?

Which variable has the biggest effect?
Now that we have observed the effect of each independent variable on the dependent variable, we can turn to the question of which variable has the biggest effect on the dependent variable.
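The Bs in the table cannot be compared directly, because INDUSTRY and CIVLIB are measured in different units. A standardized coefficient rescales each B by the ratio SD(X) / SD(Y), which gives the same result as running the regression on z-scores. The Python sketch below illustrates this on invented toy data (not the "Countries" dataset); the variable names x1 and x2, their scales, and the use of NumPy's least-squares solver are assumptions made only for illustration.

```python
import numpy as np

# Invented toy data (NOT the "Countries" dataset): two IVs on very different scales.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(50, 30, n)   # wide-ranging IV (SD about 30)
x2 = rng.normal(4, 0.5, n)   # narrow-ranging IV (SD about 0.5)
y = 4.0 + 0.04 * x1 - 0.7 * x2 + rng.normal(0, 1, n)


def ols(X, y):
    """Return OLS coefficients [constant, b1, b2, ...] via least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def zscore(v):
    """Standardize each column (or a 1-D array) to mean 0, SD 1."""
    return (v - v.mean(axis=0)) / v.std(axis=0, ddof=1)


X = np.column_stack([x1, x2])
b = ols(X, y)[1:]                                  # unstandardized B (constant dropped)
beta = b * X.std(axis=0, ddof=1) / y.std(ddof=1)   # Beta = B * SD(X) / SD(Y)
beta_z = ols(zscore(X), zscore(y))[1:]             # same Betas from a z-score regression

print("B:   ", np.round(b, 3))
print("Beta:", np.round(beta, 3))
print("identical:", np.allclose(beta, beta_z))
```

In this toy setup x2 has the larger B in absolute value but the smaller Beta, which is why the "biggest effect" question is answered with Beta rather than B.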
To compare the size of the effects, we look at the standardized coefficient (Beta). In this example, the largest beta (in absolute value) is for Civil Liberties (-.351, versus .135 for INDUSTRY). You can report this result in three sentences like this: According to the standardized coefficients, the level of civil liberties has a bigger effect on a country's percentage of GDP spent on the military than the percentage of GDP from industry does. A one standard deviation increase in the level of civil liberties is associated with a 0.35 standard deviation decrease in the percentage of GDP spent on military expenditures, controlling for the percentage of GDP from industry. In comparison, a one standard deviation increase in the percentage of GDP from industry is associated with a 0.135 standard deviation increase in the percentage of GDP spent on military expenditures, controlling for civil liberties.

Last
Finally, we can turn to the last question: how much of the total variation does the model explain? The R-squared of the model provides this information. Remember that R-squared values will be lower if your DV has fewer categories. R-squared tends to be very high when researching country-level data.

Multivariate Regression
You finished a worksheet demonstrating that you know the mechanics of a regression. Now you need to start developing a model of the world. You are trying to explain your DV, so you must choose all the IVs that you (or others) think explain the DV.
Today: how to use nominal IVs. Two tools to make your model a more accurate description of the social world.

Coefficient
Remember that the unstandardized coefficient tells us how much a one-unit increase in the independent variable affects the dependent variable. For this to make sense, the independent variable must be (at least) ordinal:
The change from "strongly agree" to "agree" or from "disagree" to "strongly disagree" causes…
For each additional percentage point increase in literacy…

Nominal Variables
It makes no sense to use nominal variables this way:
The change from New Brunswick to Ontario or from P.E.I. to Quebec causes !?!?!?!
Being a Liberal rather than a Conservative, or being Green rather than an NDP supporter?!?!?
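To see concretely why a nominal variable cannot enter a regression as a single set of numeric codes, the Python sketch below fits a simple slope to invented scores for three provinces under two equally arbitrary numberings of the same provinces. The data and the helper names slope and slope_under_coding are hypothetical. Nothing about the data changes, yet the slope is -0.5 under one numbering and 1.5 under the other, so the coefficient has no substantive meaning.

```python
# Invented scores for respondents in three provinces (hypothetical data).
data = [
    ("New Brunswick", 3.0), ("New Brunswick", 4.0),
    ("Ontario", 6.0), ("Ontario", 7.0),
    ("Quebec", 2.0), ("Quebec", 3.0),
]


def slope(xs, ys):
    """Simple-regression OLS slope: sum of cross-products / sum of squares of x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slope_under_coding(coding):
    """Treat arbitrary province codes as if they were an interval IV."""
    xs = [coding[province] for province, _ in data]
    ys = [score for _, score in data]
    return slope(xs, ys)


coding_a = {"New Brunswick": 1, "Ontario": 2, "Quebec": 3}
coding_b = {"New Brunswick": 1, "Quebec": 2, "Ontario": 3}
print(slope_under_coding(coding_a))  # about -0.5
print(slope_under_coding(coding_b))  # about 1.5
```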
But you can say...
The difference between New Brunswick and all other provinces, OR between Ontario and all other provinces, OR between the Atlantic provinces and all others…
NDP supporters vs. Liberals AND/OR Conservatives vs. Liberals, Bloc vs. Liberals AND/OR Others vs. Liberals.

Make Nominal Vars Dichotomous!
Quebec vs. R.O.C.
Alberta vs. all other provinces
Men vs. Women
Africa vs. all other countries
City residents vs. all others
NDP supporters only
Catholics only
Or even…
People over 65 only
University educated vs. non-University educated
Wealthiest countries vs. all others

Rules for Dummies
"Dummy" variables are dichotomous.
You can create dummy variables for each category of a nominal variable, minus one.
Minimum: use just one dummy.
Example: the original variable has five categories (Liberal, Conservative, NDP, Bloc, & Other).
1 = NDP, 0 = all others
Maximum: use # of categories – 1 = 4 dummies.
1 = NDP, 0 = Liberals*
1 = Tory, 0 = Liberals*
1 = Bloc, 0 = Liberals*
1 = Other, 0 = Liberals*
* 0 = Liberals when # of categories – 1 dummies are created and the Liberals are the excluded category.
All options in between are also okay!
One remaining category is the reference.

What is the reference?
The reference category is the baseline. When you use the maximum number of dummies, you are comparing each dummy category to the [omitted] reference category. It is often the most popular category.
Ex. For Canadian partisanship, Liberals are the most popular, so if we have four dummies we often omit the Liberals and compare NDP-ers, Tories, Separatists and Others to the Liberals. It is your call what you want the baseline to be.

What does a dummy do?
[Figure: plot of Y against X, with labels "Constant (Y intercept)" and "Dummy's B".]
Since the dummy variable is zero or one, its B is just like adding another constant!

Next
Work on the lab! You have a write-up to do. You can use previous results, but you should really think about your DV and IVs. You may use dummies in your write-up. You may need to use dummies (and/or interactions) in your final paper (when we most care about the accuracy of your model).
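The dummy rules and the "adding another constant" idea described above can be sketched in Python. The helper names make_dummies and predict, the party list, and the coefficient values are all hypothetical, chosen only for illustration. make_dummies builds one 0/1 column for every category except the reference (the maximum, # of categories – 1), defaulting to the most common category as the reference; predict shows that the dummy's B simply adds to the constant for cases coded 1, leaving the slope on X unchanged.

```python
from collections import Counter


def make_dummies(values, reference=None):
    """Return {category: [0/1, ...]} for every category except the reference.

    If no reference is given, the most common category is used. Rows in the
    reference category score 0 on every dummy.
    """
    if reference is None:
        reference = Counter(values).most_common(1)[0][0]
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def predict(constant, b_x, b_dummy, x, dummy):
    """Predicted Y for a model with one interval IV (x) and one 0/1 dummy."""
    return constant + b_x * x + b_dummy * dummy


parties = ["Liberal", "NDP", "Conservative", "Bloc", "Other", "Liberal"]
dummies = make_dummies(parties)  # Liberals (most common) become the reference
for party, column in dummies.items():
    print(party, column)

# Hypothetical coefficients: constant 2.0, slope 0.5, dummy B 1.5.
for x in (0, 4):
    y0 = predict(2.0, 0.5, 1.5, x, dummy=0)
    y1 = predict(2.0, 0.5, 1.5, x, dummy=1)
    print(x, y0, y1, y1 - y0)  # the gap equals the dummy's B at every x
```

Choosing a different reference (for example, reference="NDP") changes which comparisons the coefficients describe, not how well the model fits.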