Statistics
22c_SPSS.pdf
Michael Hallstone, Ph.D.
Lecture 22c: Using SPSS for Multiple Regression
The purpose of this lecture is to illustrate how to create SPSS output for multiple regression.
You will notice that in the “main” text lecture 22 on multiple regression I do all calculations using
SPSS. Thus that main lecture can also serve as an example of interpreting SPSS.
There are a series of short YouTube videos I made linked in our course schedule that show
you how to do this. In the video some menu commands are hard to see on the screen so I’ve
provided screen shots of the menu commands here. I also show you how to interpret the
SPSS output in writing below. You will need to read this closely to understand the videos
where I am very brief.
Making gender a dummy variable: https://www.youtube.com/watch?v=EEZjgAC3u-4
Lecture 22c part 1: https://youtu.be/Zl8L-X_LHtI
Lecture 22c part 2: https://youtu.be/Na6Pt08QMT4
Lecture 22c part 3: https://youtu.be/HLLbqdqUyD0
Multiple Regression -- variables needed
For the test problem you will need one dependent variable and two independent variables. You
must use the following unless you receive emailed permission from me to use other
variables. It is okay to use others, but we need to talk about it first.
Dependent variable (y) – monthly income in exact numbers (must be ratio level not coded into
categories)
Independent variables (x)
• x1 = age: this should be age in exact years (must be ratio level, not coded into categories)
• x2 = gender (female=1 male=0) *
* Research shows females earn less income for equivalent qualifications, so we’d expect this effect
to be negative.
For a tutorial on recoding gender see the video above.
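If it helps to see the recoding idea outside of SPSS, here is a minimal Python sketch of the same dummy coding. The raw answers and the labels "female" and "male" are made up for illustration; your own data file may store gender differently.

```python
# Hypothetical raw survey answers; the labels "female"/"male" are assumed.
raw_gender = ["female", "male", "male", "female"]

# Dummy coding used in this lecture: female = 1, male = 0.
DUMMY_CODES = {"female": 1, "male": 0}


def to_dummy(label):
    """Return 1 for female and 0 for male, ignoring case and stray spaces."""
    return DUMMY_CODES[label.strip().lower()]


gender_dummy = [to_dummy(g) for g in raw_gender]
print(gender_dummy)  # [1, 0, 0, 1]
```

The point is only that the characteristic of interest (female) gets the 1 and the comparison group (male) gets the 0.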
How to have SPSS create multiple regression output
Analyze > Regression > Linear
Again, it is VERY important that you do not mix up your variables in the following screen! Move your dependent
variable (y) into the Dependent box and your independent variables (x) into the Independent box.
Recall our dependent variable is income and our independent variables are age and gender. (Gender must be
coded as a dummy variable. In ours female=1 and male=0.)
Then push OK.
Below is the output
Use p-value to see if overall model is statistically significant.
Find the p-value in the far right box “Sig.” in the ANOVA box. Our p-value is .028 or 2.8%.
ANOVA(a)
Model 1       Sum of Squares    df    Mean Square     F       Sig.
Regression    38994960.518       2    19497480.259    4.019   .028(b)
Residual      155231896.625     32    4850996.770
Total         194226857.143     34
a. Dependent Variable: Approximate monthly income (US dollars)
b. Predictors: (Constant), female=1 male=0, What is your age in exact years_______?
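SPSS does all of this for you, but the numbers in the ANOVA box are tied together by simple arithmetic. The short Python sketch below (the variable names are mine, not SPSS's) rebuilds the Mean Square and F columns from the Sum of Squares and df columns, assuming the values are read from the box as shown.

```python
# Values read from the ANOVA box.
ss_regression = 38994960.518
df_regression = 2
ss_residual = 155231896.625
df_residual = 32

# Mean Square = Sum of Squares / df
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F = Mean Square (Regression) / Mean Square (Residual)
f_statistic = ms_regression / ms_residual

# The Total row is the sum of the Regression and Residual rows.
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

print(round(ms_regression, 3))  # 19497480.259
print(round(f_statistic, 3))    # 4.019
print(df_total)                 # 34
```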
Decision rule for statistical significance using p-value
• If p-value < α or “alpha”, it is significant! [Typically we use alpha = .05]
• If p-value >= α or “alpha”, it is NOT significant! [Typically we use alpha = .05]
So....
• If p-value < .05, it is significant!
• If p-value >= .05, it is NOT significant!
Our p-value is .028 or 2.8%. That’s less than .05 or 5% so we will say we have a statistically
significant overall model.
This means our independent variables account for a significant amount of variation in our dependent
variable. So age and gender account for a significant amount of variation in monthly income.
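On the test you simply read the p-value from the “Sig.” box, so no calculation is needed. For the curious, here is a hedged Python sketch of the decision rule. It also checks the p-value using a shortcut that works only when the model has exactly two independent variables (regression df = 2), where the upper-tail F probability equals (1 + 2F/df_residual) raised to the power -df_residual/2. The function names are hypothetical.

```python
ALPHA = 0.05  # the usual cutoff


def p_value_two_predictors(f_statistic, df_residual):
    """Upper-tail F probability; valid ONLY when the regression df is 2."""
    return (1 + 2 * f_statistic / df_residual) ** (-df_residual / 2)


def is_significant(p_value, alpha=ALPHA):
    """Decision rule: significant only if the p-value is below alpha."""
    return p_value < alpha


# F and residual df taken from the ANOVA box.
p_model = p_value_two_predictors(4.019, 32)

print(round(p_model, 3))        # 0.028
print(is_significant(p_model))  # True
print(is_significant(0.05))     # False, because .05 is not less than .05
```

With more than two independent variables this shortcut does not apply, and you would rely on the p-value SPSS prints.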
Regression equation and unstandardized coefficients
Each independent variable has a number that represents a slope. In SPSS they are called
unstandardized coefficients and found under B in the Coefficients box. The slopes or
unstandardized coefficients are meaningful only if the overall model is statistically significant
and their individual p-values are statistically significant.
Coefficients(a)
                                           Unstandardized Coefficients    Standardized Coefficients
Model 1                                    B           Std. Error         Beta                         t        Sig.
(Constant)                                 539.623     1980.405                                        .272     .787
What is your age in exact years_______?    105.948     48.318             .351                         2.193    .036
female=1 male=0                            1266.354    899.057            .226                         1.409    .169
a. Dependent Variable: Approximate monthly income (US dollars)
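One relationship in the Coefficients box is worth knowing: each t value is the unstandardized coefficient B divided by its Std. Error. A small Python sketch, with short row labels of my own choosing, confirms this for our output.

```python
# (B, Std. Error) pairs read from the Coefficients box.
coefficients = {
    "(Constant)": (539.623, 1980.405),
    "age": (105.948, 48.318),
    "female=1 male=0": (1266.354, 899.057),
}

# t = B / Std. Error for each row.
t_values = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_values.items():
    print(name, round(t, 3))
# (Constant) 0.272
# age 2.193
# female=1 male=0 1.409
```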
Equation for our regression model:
Generic equation: y-hat = a + b1x1 + b2x2
where
y-hat = predicted value of the dependent variable (monthly income)
a = y-intercept (the “Constant” row in SPSS)
b1, b2 = slopes for the two independent variables: x1 = age (in years), x2 = gender (female=1 male=0)
Plugging the actual intercept and slopes from SPSS gives us the real equation:
y-hat = 539.623 + 105.948x1 + 1266.354x2, or rounded,
y-hat = 539.6 + 105.9x1 + 1266.4x2
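To see the equation in action, here is a minimal Python sketch that plugs values into it. The function name predict_income and the example person (age 30) are made up for illustration, and the sketch uses the unrounded coefficients from SPSS.

```python
def predict_income(age, female):
    """Predicted monthly income (US dollars) from our regression equation.

    age    -- age in exact years (x1)
    female -- gender dummy: 1 = female, 0 = male (x2)
    """
    return 539.623 + 105.948 * age + 1266.354 * female


# A hypothetical 30-year-old female and 30-year-old male.
print(round(predict_income(30, 1), 2))  # 4984.42
print(round(predict_income(30, 0), 2))  # 3718.06
```

Keep in mind that gender turns out not to be statistically significant in our model, so the female/male difference here is only a mechanical illustration of how the equation works.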
Unstandardized coefficients or Unstandardized slope
Now we need to see which of the independent variables have a statistically significant effect. We use
the p-value under “Sig.” in the far right column of the Coefficients box.
Coefficients(a)
                                           Unstandardized Coefficients    Standardized Coefficients
Model 1                                    B           Std. Error         Beta                         t        Sig.
(Constant)                                 539.623     1980.405                                        .272     .787
What is your age in exact years_______?    105.948     48.318             .351                         2.193    .036
female=1 male=0                            1266.354    899.057            .226                         1.409    .169
a. Dependent Variable: Approximate monthly income (US dollars)
• x1 = age: p-value = .036 or 3.6%, so it’s less than .05 or 5% and statistically significant
• x2 = gender: p-value = .169 or 16.9%, so it’s greater than .05 or 5% and not statistically
significant, so this variable has no effect
Interpreting the slopes of the independent variables – unstandardized coefficients
y-hat= 539.6 + 105.9x1 + 1266.4x2
Interpreting the effect of ratio level independent variables on the dependent variable
• x1 = age, b1 = 105.9
When all other independent variables are held constant, every one-year increase in age increases
the monthly income by $105.90. (An increase in age increases the monthly income.)
Interpreting the effect of “dummy” independent variables on the dependent variable
Recall that the variables below were nominal variables coded 1 and 0, where 1 = the characteristic of
interest. We pretend they are ratio where 1 = 100% of the characteristic of interest and 0 = 0% of the
characteristic of interest.
• x2 = gender (female=1 male=0)
Dummy variables are interpreted differently. When the dummy variable =1 the slope is the effect it
has on the dependent variable.
In our model the p-value for gender was larger than .05. This means gender is NOT
statistically significant so this variable has no logical interpretation! But for the purposes of the
test we would say,
“If this variable were significant the unstandardized coefficient or slope would mean
________but actually this coefficient is meaningless.”
x2 = gender b2 = 1266.4
“If this variable were significant the unstandardized coefficient or slope would mean, when all
other independent variables are held constant, when the person is female (as compared to male),
monthly income increases by $1266.40 but actually this coefficient is meaningless.”
Again gender does not have a statistically significant effect so the unstandardized coefficient is
meaningless. Were it statistically significant that is what the coefficient would mean.
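The phrase “when all other independent variables are held constant” can be made concrete with a short Python sketch. It compares two predictions that differ in only one variable. The function name and the age of 40 are arbitrary choices for illustration; any age gives the same differences.

```python
def predict_income(age, female):
    """Predicted monthly income from y-hat = a + b1*x1 + b2*x2."""
    return 539.623 + 105.948 * age + 1266.354 * female


# Hold gender constant (male) and raise age by one year: the change is b1.
age_effect = predict_income(41, 0) - predict_income(40, 0)

# Hold age constant (40) and switch the dummy from 0 to 1: the change is b2.
gender_effect = predict_income(40, 1) - predict_income(40, 0)

print(round(age_effect, 3))     # 105.948
print(round(gender_effect, 3))  # 1266.354
```

This is only what the arithmetic of the equation says. Because gender is not statistically significant in our model, the second number is the “if it were significant” interpretation and nothing more.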
Standardized coefficient or standardized slope – SPSS calls them “Beta”
Recall, the overall model must be statistically significant for our interpretation below to make sense.
We need a way to figure out which independent variable has the strongest effect on the dependent
variable. The largest one (in absolute value) has the largest relative effect on the dependent
variable. In our example we would wonder what has a greater effect on monthly income – age, or
gender? (This is theoretical and for learning purposes as from above we know that gender has no
effect whatsoever. But to learn we will play make believe.)
Standardized Slopes or Coefficients or Beta in SPSS
Coefficients(a)
                                           Unstandardized Coefficients    Standardized Coefficients
Model 1                                    B           Std. Error         Beta                         t        Sig.
(Constant)                                 539.623     1980.405                                        .272     .787
What is your age in exact years_______?    105.948     48.318             .351                         2.193    .036
female=1 male=0                            1266.354    899.057            .226                         1.409    .169
a. Dependent Variable: Approximate monthly income (US dollars)
Strength of standardized slopes in descending order
• Beta x1 = age = 0.351
• Beta x2 = gender = 0.226 * recall this is meaningless as it’s not statistically significant,
but we are playing make believe for learning purposes.
So pretending both were statistically significant, 0.351 is the largest number, so age has the
greatest effect of the two independent variables on monthly income. (Really, age is the only independent
variable with an effect!)
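The ranking rule (largest Beta in absolute value first) is easy to sketch in Python. Using abs() matters because a Beta of -0.40 would count as stronger than one of 0.351. The dictionary labels are mine.

```python
# Standardized coefficients (Beta) read from the Coefficients box.
betas = {"age": 0.351, "gender (female=1 male=0)": 0.226}

# Sort from strongest to weakest by absolute value, ignoring the sign.
ranked = sorted(betas.items(), key=lambda item: abs(item[1]), reverse=True)

for name, beta in ranked:
    print(name, beta)
# age 0.351
# gender (female=1 male=0) 0.226
```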
Interpreting standardized slopes or coefficients or Beta
The plain English on this one is a little wacky as it involves the language of standard deviations.
Recall a standard deviation produces a number that answers the question “On average, how much do
the data points differ from the mean?” So the standard deviation is the average deviation from the
mean.
• Beta x1 = age = 0.351. This is positive, and the sign matters!
With the other independent variables held constant, for every increase of one standard deviation in
age, there is a 0.351 standard deviation increase in monthly income.
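To put “0.351 standard deviations of monthly income” into dollars, we can recover the standard deviation of income from the ANOVA box, because the sample standard deviation of the dependent variable equals the square root of (Total Sum of Squares / Total df). The Python sketch below does that arithmetic. It is an illustration only, not something SPSS prints in these boxes, and the variable names are mine.

```python
import math

# Total row of the ANOVA box.
ss_total = 194226857.143
df_total = 34

# Sample standard deviation of monthly income.
sd_income = math.sqrt(ss_total / df_total)

# Beta for age from the Coefficients box.
beta_age = 0.351

# Dollar change in predicted income for a one-standard-deviation increase in age.
income_change = beta_age * sd_income

print(round(sd_income, 1))      # 2390.1
print(round(income_change, 1))  # 838.9
```

So, with gender held constant, a one standard deviation increase in age goes with roughly an $839 increase in predicted monthly income.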
In our model the p-value for gender was larger than .05. This means gender is NOT
statistically significant so this variable has no logical interpretation! But for the purposes of the
test we would say,
“If this variable were significant the standardized coefficient would mean ________but actually
this coefficient is meaningless.”
• Beta x2 = gender = 0.226
“If this variable were significant the standardized coefficient would mean with the other
independent variables held constant, when the person is female (as compared to male), there is a
0.226 standard deviation increase in monthly income but actually this coefficient is meaningless.”