Statistics 22c_SPSS.pdf
Michael Hallstone, Ph.D.

Lecture 22c: Using SPSS for Multiple Regression

The purpose of this lecture is to illustrate how to create SPSS output for multiple regression. You will notice that in the “main” text lecture 22 on multiple regression I do all calculations using SPSS, so that main lecture can also serve as an example of interpreting SPSS output. There is a series of short YouTube videos I made, linked in our course schedule, that show you how to do this. In the videos some menu commands are hard to see on the screen, so I’ve provided screenshots of the menu commands here. I also show you below, in writing, how to interpret the SPSS output. You will need to read this closely to understand the videos, where I am very brief.

Making gender a dummy variable: https://www.youtube.com/watch?v=EEZjgAC3u-4
Lecture 22c part 1: https://youtu.be/Zl8L-X_LHtI
Lecture 22c part 2: https://youtu.be/Na6Pt08QMT4
Lecture 22c part 3: https://youtu.be/HLLbqdqUyD0

Multiple Regression: variables needed

For the test problem you will need one dependent variable and two independent variables. You must use the following unless you receive emailed permission from me to use other variables. It is okay to use others, but we need to talk about it first.

Dependent variable (y)
• monthly income in exact numbers (must be ratio level, not coded into categories)

Independent variables (x)
• x1 = age in exact years (must be ratio level, not coded into categories)
• x2 = gender (female=1, male=0)*

* Research shows females earn less income for equivalent qualifications, so we’d expect this effect to be negative. For a tutorial on recoding gender, see the video above.

How to have SPSS create multiple regression output

Analyze … Regression … Linear

Again, it is VERY important that you do not “mix up” your variables in the following screen!
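SPSS does the gender recoding through its menus. For readers who also work in a programming language, the same female=1, male=0 recoding can be sketched in Python. This is a minimal sketch, assuming the raw gender column holds the text labels "female" and "male"; the function name gender_dummy and the sample records are hypothetical, not the course data set.

```python
# Hypothetical sketch: recode a text gender column into the dummy
# variable x2 used in this lecture (female = 1, male = 0).
# The labels "female"/"male" and the records below are assumptions.


def gender_dummy(label: str) -> int:
    """Return 1 for female, 0 for male; reject any other label."""
    normalized = label.strip().lower()
    if normalized == "female":
        return 1
    if normalized == "male":
        return 0
    raise ValueError(f"unexpected gender label: {label!r}")


# Hypothetical respondents: (monthly income in dollars, age in exact years, gender)
respondents = [
    (3200.0, 34, "Female"),
    (4100.0, 41, "male"),
    (2800.0, 27, "female"),
]

# y and x1 stay ratio level; x2 becomes the 0/1 dummy.
y = [income for income, _, _ in respondents]
x1 = [age for _, age, _ in respondents]
x2 = [gender_dummy(g) for _, _, g in respondents]

print(x2)  # [1, 0, 1]
```

Raising an error on unexpected labels (rather than silently coding them 0) keeps a typo in the raw data from being counted as "male".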
Move your dependent variable (y) into the Dependent box and your independent variables (x) into the Independent box. Recall our dependent variable is income and our independent variables are age and gender (gender must be coded as a dummy variable; in ours female=1 and male=0). Push OK.

Below is the output.

Use the p-value to see if the overall model is statistically significant. Find the p-value in the far-right column, “Sig.”, of the ANOVA box. Our p-value is .028, or 2.8%.

ANOVA (a)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 38994960.518 | 2 | 19497480.259 | 4.019 | .028 (b) |
| | Residual | 155231896.625 | 32 | 4850996.770 | | |
| | Total | 194226857.143 | 34 | | | |

a. Dependent Variable: Approximate monthly income (US dollars)
b. Predictors: (Constant), female=1 male=0, What is your age in exact years_______?

Decision rule for statistical significance using the p-value [typically we use alpha = .05]
• If p-value < α (“alpha”), it is significant!
• If p-value >= α (“alpha”), it is NOT significant!

So...
• If p-value < .05, it is significant!
• If p-value >= .05, it is NOT significant!

Our p-value is .028, or 2.8%. That’s less than .05, or 5%, so we will say we have a statistically significant overall model. This means our independent variables, taken together, account for a significant amount of the variation in our dependent variable. So age and gender account for a significant amount of the variation in monthly income.

Regression equation and unstandardized coefficients

Each independent variable has a number that represents a slope. In SPSS these are called unstandardized coefficients and are found under B in the Coefficients box. The slopes, or unstandardized coefficients, are meaningful only if the overall model is statistically significant and their individual p-values are statistically significant.

Coefficients (a)
| Model | | Unstandardized B | Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 539.623 | 1980.405 | | .272 | .787 |
| | What is your age in exact years_______? | 105.948 | 48.318 | .351 | 2.193 | .036 |
| | female=1 male=0 | 1266.354 | 899.057 | .226 | 1.409 | .169 |

a. Dependent Variable: Approximate monthly income (US dollars)

Equation for our regression model:
Generic equation: y-hat = a + b1x1 + b2x2
where y-hat = predicted value of the dependent variable (monthly income), a = y-intercept, b1 and b2 = slopes for the two independent variables, x1 = age (in years), and x2 = gender (female=1, male=0).

Plugging the actual slopes from SPSS into the generic equation gives us the real equation:
y-hat = 539.623 + 105.948x1 + 1266.354x2
or, rounded,
y-hat = 539.6 + 105.9x1 + 1266.4x2

Unstandardized coefficients or unstandardized slopes

Now we need to see which of the independent variables have a statistically significant effect. We use the p-value under “Sig.” in the far-right column of the Coefficients box.
• x1 = age: p-value = .036, or 3.6%. It’s less than .05, or 5%, so it is statistically significant.
• x2 = gender: p-value = .169, or 16.9%. It’s greater than .05, or 5%, so it is not statistically significant, and we treat this variable as having no effect.

Interpreting the slopes of the independent variables (unstandardized coefficients)
y-hat = 539.6 + 105.9x1 + 1266.4x2

Interpreting the effect of ratio-level independent variables on the dependent variable
• x1 = age, b1 = 105.9
When all other independent variables are held constant, every one-year increase in age increases predicted monthly income by $105.90. (An increase in age increases monthly income.)
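The arithmetic behind this part of the output can be re-checked by hand. The Python sketch below is an illustration, not part of the SPSS procedure: it copies the numbers from the ANOVA and Coefficients boxes, rebuilds F as Mean Square Regression divided by Mean Square Residual, rebuilds each t as B divided by its Std. Error, applies the p-value decision rule, and confirms that a one-year increase in age moves y-hat by exactly b1. The helper names is_significant and predicted_income are hypothetical; the p-values themselves are taken from the “Sig.” column rather than recomputed.

```python
# Sketch: re-derive numbers from this lecture's SPSS output and apply
# the p-value decision rule. Inputs are copied from the ANOVA and
# Coefficients boxes; helper names are hypothetical.

ALPHA = 0.05  # conventional significance level used in the lecture


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """Decision rule: significant if p < alpha, NOT significant if p >= alpha."""
    return p_value < alpha


# ANOVA box: sums of squares and degrees of freedom
ss_regression, df_regression = 38994960.518, 2
ss_residual, df_residual = 155231896.625, 32

ms_regression = ss_regression / df_regression  # Mean Square, Regression row
ms_residual = ss_residual / df_residual        # Mean Square, Residual row
f_statistic = ms_regression / ms_residual      # F = MS_regression / MS_residual

# Coefficients box: (unstandardized B, Std. Error, Sig.)
coefficients = {
    "(Constant)": (539.623, 1980.405, 0.787),
    "age": (105.948, 48.318, 0.036),
    "female": (1266.354, 899.057, 0.169),
}


def predicted_income(age: float, female: int) -> float:
    """y-hat = a + b1*x1 + b2*x2 with the unrounded SPSS coefficients."""
    a = coefficients["(Constant)"][0]
    b1 = coefficients["age"][0]
    b2 = coefficients["female"][0]
    return a + b1 * age + b2 * female


print(f"F = {f_statistic:.3f}")  # F = 4.019
print("Overall model significant:", is_significant(0.028))
for name, (b, se, p) in coefficients.items():
    # t in the SPSS output is B divided by its standard error
    print(f"{name}: t = {b / se:.3f}, significant = {is_significant(p)}")

# Holding gender fixed, one extra year of age changes y-hat by exactly b1.
print(round(predicted_income(41, 0) - predicted_income(40, 0), 3))  # 105.948
```

Note that a p-value of exactly .05 counts as NOT significant under the rule above, which is why the comparison is a strict "<".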
Interpreting the effect of “dummy” independent variables on the dependent variable

Recall that the variable below was a nominal variable coded 1 and 0, where 1 = the characteristic of interest. We pretend it is ratio level, where 1 = 100% of the characteristic of interest and 0 = 0% of the characteristic of interest.
• x2 = gender (female=1, male=0)

Dummy variables are interpreted differently. The slope is the effect on the dependent variable of the dummy variable being 1 rather than 0 (here, of being female as compared to male).

In our model the p-value for gender was larger than .05. This means gender is NOT statistically significant, so this variable has no logical interpretation! But for the purposes of the test we would say, “If this variable were significant, the unstandardized coefficient or slope would mean ________, but actually this coefficient is meaningless.”

x2 = gender, b2 = 1266.4
“If this variable were significant, the unstandardized coefficient or slope would mean that, when all other independent variables are held constant, when the person is female (as compared to male), monthly income increases by $1266.40, but actually this coefficient is meaningless.”

Again, gender does not have a statistically significant effect, so the unstandardized coefficient is meaningless. Were it statistically significant, that is what the coefficient would mean.

Standardized coefficients or standardized slopes (SPSS calls them “Beta”)

Recall that the overall model must be statistically significant for our interpretation below to make sense. We need a way to figure out which independent variable has the strongest effect on the dependent variable. The standardized coefficient that is largest in absolute value has the largest relative effect on the dependent variable. In our example we would wonder what has a greater effect on monthly income: age or gender? (This is theoretical and for learning purposes, since from above we know that gender has no effect whatsoever. But to learn, we will play make-believe.)
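Why the dummy slope reads as a female-versus-male gap can be seen by plugging both codes into the rounded equation y-hat = 539.6 + 105.9x1 + 1266.4x2: at any fixed age the age term cancels, leaving b2. The Python sketch below shows that, and wraps the lecture’s “if this variable were significant...” wording in a hypothetical helper, describe_dummy, that only states the interpretation outright when the p-value is below alpha. The function names are illustrative assumptions.

```python
# Hypothetical sketch: a 0/1 dummy slope in y-hat = a + b1*x1 + b2*x2,
# plus the lecture's rule that a slope is interpreted only when its
# p-value is below alpha. Coefficients are the rounded lecture values.

A, B1, B2 = 539.6, 105.9, 1266.4  # y-hat = 539.6 + 105.9x1 + 1266.4x2
P_FEMALE = 0.169                   # Sig. for the female=1 male=0 dummy
ALPHA = 0.05


def y_hat(age: float, female: int) -> float:
    return A + B1 * age + B2 * female


def describe_dummy(p_value: float, alpha: float = ALPHA) -> str:
    # Female minus male at the same age: the age term cancels, leaving B2.
    gap = round(y_hat(30, 1) - y_hat(30, 0), 2)
    direction = "increases" if gap > 0 else "decreases"
    meaning = (
        "when all other independent variables are held constant, when the "
        f"person is female (as compared to male), monthly income {direction} "
        f"by ${abs(gap):.2f}"
    )
    if p_value < alpha:
        return "Significant: " + meaning + "."
    return ("If this variable were significant, the slope would mean that "
            + meaning + ", but actually this coefficient is meaningless.")


print(describe_dummy(P_FEMALE))
```

With the lecture’s p-value of .169 the helper produces the “meaningless” version of the sentence, matching the wording expected on the test.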
Standardized slopes, coefficients, or Beta in SPSS

Look in the Beta column (under Standardized Coefficients) of the Coefficients box.

Strength of standardized slopes in descending order
• Beta x1 = age = 0.351
• Beta x2 = gender = 0.226*
* Recall this is meaningless as it’s not statistically significant, but we are playing make-believe for learning purposes.

So, pretending both were statistically significant, 0.351 is the largest number, and age has the greater effect of the two independent variables on monthly income. (Really, age is the only independent variable with an effect!)

Interpreting standardized slopes, coefficients, or Beta

The plain English on this one is a little wacky, as it involves the language of standard deviations. Recall that a standard deviation answers the question “On average, how much do the data points differ from the mean?” So the standard deviation is, roughly, the typical distance of the data points from the mean.
• Beta x1 = age = 0.351. This is positive, and the sign matters! With the other independent variables held constant, for every increase of one standard deviation in age, there is a 0.351 standard deviation increase in predicted monthly income.

In our model the p-value for gender was larger than .05. This means gender is NOT statistically significant, so this variable has no logical interpretation!
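A standardized coefficient can be computed as Beta = B × SD(x) / SD(y), which is the same slope you would get by converting every variable to z-scores (standard deviation units) before running the regression. The Python sketch below checks that equivalence and ranks predictors by absolute Beta, as the text does. It uses a small, made-up data set (hypothetical, NOT the lecture’s survey data), because SD(age) is not printed in the SPSS output; SD(y) alone could be recovered from the ANOVA Total row as the square root of 194226857.143 / 34. The helper names are hypothetical.

```python
import statistics

# Hypothetical sample (NOT the lecture's data): income, age, female=1 male=0
income = [2100.0, 2900.0, 3600.0, 2500.0, 4200.0, 3100.0, 3900.0, 2700.0]
age = [22.0, 30.0, 41.0, 25.0, 52.0, 35.0, 47.0, 28.0]
female = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]


def ols_two_predictors(y, x1, x2):
    """Least-squares (a, b1, b2) for y-hat = a + b1*x1 + b2*x2,
    solving the 2x2 normal equations on mean-centred data."""
    my, m1, m2 = statistics.fmean(y), statistics.fmean(x1), statistics.fmean(x2)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(w * w for w in d2)
    s12 = sum(u * w for u, w in zip(d1, d2))
    s1y = sum(u * c for u, c in zip(d1, dy))
    s2y = sum(w * c for w, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # non-zero unless x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def z_scores(values):
    """Convert values to standard-deviation units: (value - mean) / SD."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


# Unstandardized coefficients (SPSS column B)
a, b_age, b_female = ols_two_predictors(income, age, female)

# Standardized coefficients (SPSS column Beta): Beta_j = B_j * SD(x_j) / SD(y)
sd_y = statistics.stdev(income)
beta = {
    "age": b_age * statistics.stdev(age) / sd_y,
    "female": b_female * statistics.stdev(female) / sd_y,
}

# Cross-check: regressing z-scored y on z-scored x's gives the same Betas
# and an intercept of essentially zero.
a_z, beta_age_z, beta_female_z = ols_two_predictors(
    z_scores(income), z_scores(age), z_scores(female)
)

# Rank predictors by |Beta|, largest relative effect first.
ranking = sorted(beta, key=lambda name: abs(beta[name]), reverse=True)
print(beta, ranking)
```

Ranking on the absolute value matters because a large negative Beta is as strong an effect as a large positive one; the sign only tells you the direction.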
But for the purposes of the test we would say, “If this variable were significant, the standardized coefficient would mean ________, but actually this coefficient is meaningless.”
• Beta x2 = gender = 0.226
“If this variable were significant, the standardized coefficient would mean that, with the other independent variables held constant, when the person is female (as compared to male), there is a 0.226 standard deviation increase in monthly income, but actually this coefficient is meaningless.”