SW388R7
Data Analysis &
Computers II
Multiple Regression – Basic Relationships
Slide 1
Purpose of multiple regression
Different types of multiple regression
Standard multiple regression
Hierarchical multiple regression
Stepwise multiple regression
Steps in solving regression problems
Purpose of multiple regression
Slide 2


The purpose of multiple regression is to analyze the
relationship between metric or dichotomous
independent variables and a metric dependent
variable.
If there is a relationship, using the information in the
independent variables will improve our accuracy in
predicting values for the dependent variable.
Types of multiple regression
Slide 3

There are three types of multiple regression, each of
which is designed to answer a different question:
• Standard multiple regression is used to evaluate
the relationships between a set of independent
variables and a dependent variable.
• Hierarchical, or sequential, regression is used to
examine the relationships between a set of
independent variables and a dependent variable,
after controlling for the effects of some other
independent variables on the dependent variable.
• Stepwise, or statistical, regression is used to
identify the subset of independent variables that
has the strongest relationship to a dependent
variable.
Standard multiple regression
Slide 4



In standard multiple regression, all of the
independent variables are entered into the
regression equation at the same time.
Multiple R and R² measure the strength of the
relationship between the set of independent
variables and the dependent variable. An F
test is used to determine if the relationship
can be generalized to the population
represented by the sample.
A t-test is used to evaluate the individual
relationship between each independent
variable and the dependent variable.
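A minimal sketch of what "entering all of the independent variables at the same time" means computationally, assuming ordinary least squares solved through the normal equations. The helper names solve and fit_ols and the toy data are illustrative assumptions; they are not part of the course materials and this is not how SPSS is implemented.

```python
# Ordinary least squares with every predictor entered at once.
# Pure standard library; toy data only (NOT the GSS2000.sav data).

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return (coefficients, r_squared); coefficients[0] is the constant."""
    design = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    return coef, 1.0 - ss_resid / ss_total


# Toy data constructed so that y = 1 + 2*x1 + 3*x2 exactly.
toy_x = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
toy_y = [1 + 2 * x1 + 3 * x2 for x1, x2 in toy_x]
coef, r_squared = fit_ols(toy_x, toy_y)
```

Because the toy outcome is an exact linear function of the two predictors, the fit should recover the constant and both slopes, and R² should be 1 up to floating-point error.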
Hierarchical multiple regression
Slide 5



In hierarchical multiple regression, the
independent variables are entered in two
stages.
In the first stage, the independent variables
that we want to control for are entered into
the regression. In the second stage, we enter
the independent variables whose relationship we
want to examine after the controls.
A statistical test of the change in R² from the
first stage is used to evaluate the importance
of the variables entered in the second stage.
Stepwise multiple regression
Slide 6



Stepwise regression is designed to find the
most parsimonious set of predictors that are
most effective in predicting the dependent
variable.
Variables are added to the regression
equation one at a time, using the statistical
criterion of maximizing the R² of the included
variables.
When none of the possible additions can make
a statistically significant improvement in R²,
the analysis stops.
Problem 1 - standard multiple regression
Slide 7
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect
application of a statistic? Assume that there is no problem with missing data, violation of
assumptions, or outliers, and that the split sample validation will confirm the
generalizability of the results. Use a level of significance of 0.05.
The variables "strength of affiliation" [reliten] and "frequency of prayer" [pray] have a
strong relationship to the variable "frequency of attendance at religious services" [attend].
Survey respondents who were less strongly affiliated with their religion attended religious
services less often. Survey respondents who prayed less often attended religious services
less often.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 1 - 1
Slide 8
When a problem states that there is a
relationship between some independent
variables and a dependent variable, we
do standard multiple regression.
The variables listed first in the problem
statement are the independent variables (ivs):
"strength of affiliation" [reliten]
and "frequency of prayer" [pray].
The variable that they are related to is the
dependent variable (dv): "frequency of
attendance at religious services" [attend].
Dissecting problem 1 - 2
Slide 9
In order for a problem to be true, we
will have to find:
• a statistically significant relationship
between the ivs and the dv
• a relationship of the correct strength
The relationship of each of
the independent variables
to the dependent variable
must be statistically
significant and interpreted
correctly.
Request a standard multiple regression
Slide 10
To compute a multiple
regression in SPSS, select
the Regression | Linear
command from the Analyze
menu.
Specify the variables and selection method
Slide 11
First, move the
dependent variable
attend to the
Dependent text box.
Second, move the
independent variables
reliten and pray to
the Independent(s)
list box.
Third, select the method
for entering the variables
into the analysis from the
drop down Method menu.
In this example, we accept
the default of Enter for
direct entry of all variables,
which produces a standard
multiple regression.
Fourth, click on the
Statistics… button to
specify the statistics
options that we want.
Specify the statistics output options
Slide 12
First, mark the
checkboxes for
Estimates on
the Regression
Coefficients
panel.
Second, mark
the checkboxes
for Model Fit and
Descriptives.
Third, click on
the Continue
button to close
the dialog box.
Request the regression output
Slide 13
Click on the OK
button to
request the
regression
output.
LEVEL OF MEASUREMENT
Slide 14
Multiple regression requires that the dependent variable be
metric and the independent variables be metric or
dichotomous. "Frequency of attendance at religious services"
[attend] is an ordinal level variable, which satisfies the level
of measurement requirement if we follow the convention of
treating ordinal level variables as metric variables. Since
some data analysts do not agree with this convention, a note
of caution should be included in our interpretation.
"Strength of affiliation" [reliten] and "frequency of prayer"
[pray] are ordinal level variables. If we follow the convention
of treating ordinal level variables as metric variables, the
level of measurement requirement for multiple regression
analysis is satisfied. Since some data analysts do not agree
with this convention, a note of caution should be included in
our interpretation.
SAMPLE SIZE
Slide 15
Descriptive Statistics
                                         Mean   Std. Deviation   N
HOW OFTEN R ATTENDS RELIGIOUS SERVICES   3.15   2.653            113
STRENGTH OF AFFILIATION                  2.12   1.084            113
HOW OFTEN DOES R PRAY                    2.90   1.575            113
The minimum ratio of valid cases to
independent variables for multiple
regression is 5 to 1. With 113 valid
cases and 2 independent variables,
the ratio for this analysis is 56.5 to 1,
which satisfies the minimum
requirement.
In addition, the ratio of 56.5 to 1
satisfies the preferred ratio of 15 to 1.
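The sample size check above is simple arithmetic, sketched here in Python. The function name case_ratio and its default thresholds are illustrative assumptions that restate the 5 to 1 minimum and 15 to 1 preferred ratios from this slide.

```python
def case_ratio(valid_cases, n_predictors, minimum=5.0, preferred=15.0):
    """Return the cases-to-predictors ratio and whether it meets each threshold."""
    ratio = valid_cases / n_predictors
    return ratio, ratio >= minimum, ratio >= preferred


# Problem 1: 113 valid cases, 2 independent variables.
ratio, meets_minimum, meets_preferred = case_ratio(113, 2)
print(ratio, meets_minimum, meets_preferred)  # 56.5 True True
```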
Slide 16
OVERALL RELATIONSHIP BETWEEN INDEPENDENT
AND DEPENDENT VARIABLES - 1
The probability of the F statistic (49.824) for the
overall regression relationship is <0.001, less than or
equal to the level of significance of 0.05. We reject
the null hypothesis that there is no relationship
between the set of independent variables and the
dependent variable (R² = 0). We support the
research hypothesis that there is a statistically
significant relationship between the set of
independent variables and the dependent variable.
ANOVA (b)
Model 1      Sum of Squares   df    Mean Square   F        Sig.
Regression   374.757          2     187.379       49.824   .000 (a)
Residual     413.685          110   3.761
Total        788.442          112
a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION
b. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES
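As a rough check on the ANOVA table, the F statistic can be recomputed from the sums of squares and degrees of freedom that SPSS reports. This is only a sketch of the arithmetic; the variable names are illustrative, and the probability (Sig.) is not recomputed because that would need an F distribution function outside the standard library.

```python
# Values copied from the ANOVA table for problem 1.
ss_regression, df_regression = 374.757, 2
ss_residual, df_residual = 413.685, 110

# Mean square = sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

# R squared is the regression share of the total sum of squares.
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"F = {f_statistic:.3f}, R squared = {r_squared:.3f}")
```

The results should agree with the reported F of 49.824 and R² of .475 to within rounding.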
Slide 17
OVERALL RELATIONSHIP BETWEEN
INDEPENDENT AND DEPENDENT VARIABLES - 2
The Multiple R for the relationship between the set of
independent variables and the dependent variable is 0.689,
which would be characterized as strong using the rule of
thumb that a correlation less than or equal to 0.20 is
characterized as very weak; greater than 0.20 and less than
or equal to 0.40 is weak; greater than 0.40 and less than or
equal to 0.60 is moderate; greater than 0.60 and less than or
equal to 0.80 is strong; and greater than 0.80 is very strong.
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .689 (a)   .475       .466                1.939
a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION
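The Model Summary values and the rule of thumb can be reproduced from the ANOVA sums of squares, as in the sketch below. The function name strength_label is an illustrative assumption, and the adjusted R² formula used is the usual 1 - (1 - R²)(n - 1)/(n - k - 1), which is assumed rather than stated on the slides.

```python
import math


def strength_label(r):
    """Apply the slide's rule of thumb to the absolute size of a correlation."""
    r = abs(r)
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"


# Sums of squares, cases, and predictors from problem 1.
ss_regression, ss_residual = 374.757, 413.685
n_cases, n_predictors = 113, 2
df_residual = n_cases - n_predictors - 1

r_squared = ss_regression / (ss_regression + ss_residual)
multiple_r = math.sqrt(r_squared)
adjusted_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / df_residual
std_error_of_estimate = math.sqrt(ss_residual / df_residual)

print(strength_label(multiple_r))  # strong
```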
Slide 18
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 1
For the independent variable strength of affiliation, the
probability of the t statistic (-5.857) for the b
coefficient is <0.001 which is less than or equal to the
level of significance of 0.05. We reject the null
hypothesis that the slope associated with strength of
affiliation is equal to zero (b = 0) and conclude that
there is a statistically significant relationship between
strength of affiliation and frequency of attendance at
religious services.
Coefficients (a)
                          Unstandardized Coefficients   Standardized Coefficients
Model 1                   B        Std. Error           Beta      t        Sig.
(Constant)                7.167    .442                           16.206   .000
STRENGTH OF AFFILIATION   -1.138   .194                 -.465     -5.857   .000
HOW OFTEN DOES R PRAY     -.554    .134                 -.329     -4.145   .000
a. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES
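Each t statistic in the Coefficients table is the b coefficient divided by its standard error. The sketch below recomputes the ratios from the rounded values printed in the table, so small differences from the SPSS t values in the third decimal place are expected. The dictionary layout is an illustrative assumption.

```python
# (B, Std. Error) pairs copied from the Coefficients table for problem 1.
coefficients = {
    "(Constant)": (7.167, 0.442),
    "STRENGTH OF AFFILIATION": (-1.138, 0.194),
    "HOW OFTEN DOES R PRAY": (-0.554, 0.134),
}

# t = B / Std. Error for each row.
t_ratios = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.2f}")
```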
Slide 19
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 2
The b coefficient associated with strength of affiliation
(-1.138) is negative, indicating an inverse relationship in
which higher numeric values for strength of affiliation are
associated with lower numeric values for frequency of
attendance at religious services.
Since both variables are ordinal level, we will have to look
at the coding for each before we can make a correct
interpretation. For ordinal level variables the numeric
codes can be associated with labels in ascending or
descending order.
Slide 20
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 3
The independent variable
strength of affiliation is an
ordinal variable that is
coded so that higher
numeric values are
associated with survey
respondents who were less
strongly affiliated with their
religion.
Slide 21
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 4
The dependent variable
frequency of attendance at
religious services is also an
ordinal variable. It is coded
so that lower numeric
values are associated with
survey respondents who
attended religious services
less often.
Therefore, the negative value of b implies
that survey respondents who were less
strongly affiliated with their religion
attended religious services less often.
Slide 22
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 5
For the independent variable frequency of prayer, the
probability of the t statistic (-4.145) for the b
coefficient is <0.001 which is less than or equal to the
level of significance of 0.05. We reject the null
hypothesis that the slope associated with frequency of
prayer is equal to zero (b = 0) and conclude that there
is a statistically significant relationship between
frequency of prayer and frequency of attendance at
religious services.
Slide 23
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 6
The b coefficient associated with how often does r pray
(-0.554) is negative, indicating an inverse relationship in
which higher numeric values for how often does r pray are
associated with lower numeric values for frequency of
attendance at religious services.
Since both variables are ordinal level, we will have to look
at the coding for each before we can make a correct
interpretation. For ordinal level variables the numeric
codes can be associated with labels in ascending or
descending order.
Slide 24
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 7
The independent variable
frequency of prayer is an
ordinal variable that is
coded so that higher
numeric values are
associated with survey
respondents who prayed
less often.
Slide 25
RELATIONSHIP OF INDIVIDUAL INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 8
The dependent variable
frequency of attendance at
religious services is also an
ordinal variable. It is coded
so that lower numeric
values are associated with
survey respondents who
attended religious services
less often.
Therefore, the negative value of b
implies that survey respondents who
prayed less often attended religious
services less often.
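The b coefficients for problem 1 define a prediction equation, sketched below. The function name predict_attend is an illustrative assumption, and the code values passed in (1 and 2) are hypothetical inputs chosen only to show the effect of a one-unit change; the actual category codes should be checked in the dataset.

```python
def predict_attend(reliten, pray):
    """Predicted attendance code from the problem 1 unstandardized coefficients."""
    return 7.167 - 1.138 * reliten - 0.554 * pray


# Raising the strength of affiliation code by one unit (a less strongly
# affiliated respondent) while holding prayer constant lowers the prediction
# by the b coefficient, 1.138.
baseline = predict_attend(1, 1)
less_affiliated = predict_attend(2, 1)
change = less_affiliated - baseline
```

Since lower attendance codes mean attending less often, the negative change matches the interpretation that less strongly affiliated respondents attended religious services less often.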
Answer to problem 1
Slide 26





The independent and dependent variables were
metric (ordinal).
The ratio of cases to independent variables was 56.5
to 1.
The overall relationship was statistically significant
and its strength was characterized correctly.
The b coefficients for all variables were statistically
significant and the direction of each relationship
was characterized correctly.
The answer to the question is true with caution. The
caution is added because of the ordinal variables.
Problem 2 – hierarchical regression
Slide 27
In the dataset GSS2000.sav, is the following statement true, false, or an incorrect
application of a statistic? Assume that there is no problem with missing data, violation of
assumptions, or outliers, and that the split sample validation will confirm the
generalizability of the results. Use a level of significance of 0.05.
After controlling for the effects of the variables "age" [age] and "sex" [sex], the addition of
the variables "happiness of marriage" [hapmar], "condition of health" [health], and "attitude
toward life" [life] reduces the error in predicting "general happiness" [happy] by 36.1%.
After controlling for age and sex, the variables happiness of marriage, condition of health,
and attitude toward life each make an individual contribution to reducing the error in
predicting general happiness. Survey respondents who were less happy with their marriages
were less happy overall. Survey respondents who said they were not as healthy were less
happy overall. Survey respondents who felt life was less exciting were less happy overall.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 2 - 1
Slide 28
The variables listed first in the
problem statement are the
independent variables (ivs)
whose effect we want to control
before we test for the
relationship: "age" [age] and
"sex" [sex].
The variables that we add after the
control variables are the independent
variables that we think will have a
statistical relationship to the
dependent variable:
"happiness of marriage" [hapmar],
"condition of health" [health], and
"attitude toward life" [life].
The variable that is to be
predicted or related to is the
dependent variable
(dv): "general happiness"
[happy].
Dissecting problem 2 - 2
Slide 29
In order for a problem to be true, the
relationship between the added variables
and the dependent variable must be
statistically significant, and the strength of
the relationship after including the control
variables must be correctly stated.
We are generally not interested
in whether or not the control
variables have a statistically
significant relationship to the
dependent variable.
The relationship between
each of the independent
variables entered after the
control variables and the
dependent variable must
be statistically significant
and interpreted correctly.
Request a hierarchical multiple regression
Slide 30
To compute a multiple
regression in SPSS, select
the Regression | Linear
command from the Analyze
menu.
Specify independent variables to control for
Slide 31
First, move the
dependent variable
happy to the
Dependent text box.
Second, move the
independent variables
to control for, age and
sex, to the
Independent(s) list box.
Third, select the method for
entering the variables into the
analysis from the drop down
Method menu. In this example,
we accept the default of Enter for
direct entry of all variables in the
first block, which will force the
controls into the regression.
Fourth, click on the Next
button to tell SPSS to add
another block of variables
to the regression analysis.
Add the other independent variables
Slide 32
SPSS identifies that we
will now be adding
variables to a second
block.
First, move the other
independent variables
hapmar, health and
life to the
Independent(s) list
box for block 2.
Second, click on the
Statistics… button to
specify the statistics
options that we want.
Specify the statistics output options
Slide 33
First, mark the
checkboxes for
Estimates on
the Regression
Coefficients
panel.
Second, mark the checkboxes for Model
Fit, Descriptives, and R squared change.
The R squared change statistic will tell
us whether or not the variables added
after the controls have a relationship to
the dependent variable.
Third, click on
the Continue
button to close
the dialog box.
Request the regression output
Slide 34
Click on the OK
button to
request the
regression
output.
LEVEL OF MEASUREMENT
Slide 35
Multiple regression requires that the dependent variable be metric
and the independent variables be metric or dichotomous. "General
happiness" [happy] is an ordinal level variable, which satisfies the
level of measurement requirement if we follow the convention of
treating ordinal level variables as metric variables. Since some data
analysts do not agree with this convention, a note of caution should
be included in our interpretation.
"Age" [age] is an interval level variable, which satisfies the level of
measurement requirements for multiple regression analysis.
"Happiness of marriage" [hapmar], "condition of health" [health], and
"attitude toward life" [life] are ordinal level variables. If we follow
the convention of treating ordinal level variables as metric variables,
the level of measurement requirement for multiple regression
analysis is satisfied. Since some data analysts do not agree with this
convention, a note of caution should be included in our
interpretation.
"Sex" [sex] is a dichotomous or dummy-coded nominal variable which
may be included in multiple regression analysis.
SAMPLE SIZE
Slide 36
Descriptive Statistics
                           Mean    Std. Deviation   N
GENERAL HAPPINESS          1.63    .626             90
AGE OF RESPONDENT          45.50   15.221           90
RESPONDENTS SEX            1.61    .490             90
HAPPINESS OF MARRIAGE      1.42    .540             90
CONDITION OF HEALTH        1.80    .810             90
IS LIFE EXCITING OR DULL   1.49    .525             90
The minimum ratio of valid cases to
independent variables for multiple
regression is 5 to 1. With 90 valid
cases and 5 independent variables,
the ratio for this analysis is 18.0 to 1,
which satisfies the minimum
requirement.
In addition, the ratio of 18.0 to 1
satisfies the preferred ratio of 15 to 1.
Slide 37
OVERALL RELATIONSHIP BETWEEN INDEPENDENT
AND DEPENDENT VARIABLES
ANOVA (c)
Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   .006             2    .003          .007    .993 (a)
    Residual     34.894           87   .401
    Total        34.900           89
2   Regression   12.601           5    2.520         9.493   .000 (b)
    Residual     22.299           84   .265
    Total        34.900           89
a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, IS LIFE
EXCITING OR DULL, HAPPINESS OF MARRIAGE, CONDITION OF HEALTH
c. Dependent Variable: GENERAL HAPPINESS
The probability of the F statistic (9.493) for the overall
regression relationship for all independent variables is
<0.001, less than or equal to the level of significance of
0.05. We reject the null hypothesis that there is no
relationship between the set of all independent variables
and the dependent variable (R² = 0). We support the
research hypothesis that there is a statistically significant
relationship between the set of all independent variables
and the dependent variable.
Slide 38
REDUCTION IN ERROR IN PREDICTING
DEPENDENT VARIABLE - 1
Model Summary
                                                                          Change Statistics
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .013 (a)   .000       -.023               .633                         .000              .007       2     87    .993
2       .601 (b)   .361       .323                .515                         .361              15.814     3     84    .000
a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, IS LIFE EXCITING OR DULL, HAPPINESS OF
MARRIAGE, CONDITION OF HEALTH
The R Square Change statistic for the increase in R²
associated with the added variables (happiness of
marriage, condition of health, and attitude toward
life) is 0.361. Using a proportional reduction in
error interpretation for R², information provided by
the added variables reduces our error in predicting
general happiness by 36.1%.
Slide 39
REDUCTION IN ERROR IN PREDICTING
DEPENDENT VARIABLE - 2
The probability of the F statistic (15.814) for the change in R²
associated with the addition of the predictor variables to the
regression analysis containing the control variables is <0.001, less
than or equal to the level of significance of 0.05. We reject the
null hypothesis that there is no improvement in the relationship
between the set of independent variables and the dependent
variable when the predictors are added (R² Change = 0).
We support the research hypothesis that there is a statistically
significant improvement in the relationship between the set of
independent variables and the dependent variable.
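The R Square Change and F Change statistics can be recomputed from the two ANOVA models for problem 2. The sketch below assumes the usual incremental F test, in which the gain in the regression sum of squares per added variable is divided by the residual mean square of the full model. The variable names are illustrative, and the inputs are the rounded values printed in the SPSS output.

```python
# Values copied from the ANOVA table for problem 2.
ss_total = 34.900
ss_regression_controls = 0.006   # model 1: age and sex only
ss_regression_full = 12.601      # model 2: controls plus the three added variables
ss_residual_full = 22.299
df_added = 3                     # hapmar, health, life
df_residual_full = 84

# Increase in explained variation due to the added variables.
ss_gain = ss_regression_full - ss_regression_controls
r_squared_change = ss_gain / ss_total

# F change: mean square of the gain over the full model's residual mean square.
f_change = (ss_gain / df_added) / (ss_residual_full / df_residual_full)

print(f"R squared change = {r_squared_change:.3f}, F change = {f_change:.2f}")
```

The results should agree with the reported R Square Change of .361 and F Change of 15.814 to within rounding of the printed sums of squares.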
Slide 40
RELATIONSHIP OF ADDED INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 1
Coefficients (a)
                               Unstandardized Coefficients   Standardized Coefficients
Model                          B       Std. Error            Beta      t       Sig.
1   (Constant)                 1.594   .341                            4.677   .000
    AGE OF RESPONDENT          .000    .005                  .012      .107    .915
    RESPONDENTS SEX            .011    .140                  .008      .078    .938
2   (Constant)                 .432    .341                            1.268   .208
    AGE OF RESPONDENT          -.001   .004                  -.035     -.385   .701
    RESPONDENTS SEX            -.013   .115                  -.010     -.113   .911
    HAPPINESS OF MARRIAGE      .599    .104                  .517      5.741   .000
    CONDITION OF HEALTH        .101    .072                  .131      1.408   .163
    IS LIFE EXCITING OR DULL   .170    .108                  .142      1.570   .120
a. Dependent Variable: GENERAL HAPPINESS
If there is a relationship between each added individual
independent variable and the dependent variable, the probability
of the statistical test of the b coefficient (slope of the regression
line) will be less than or equal to the level of significance. The
null hypothesis for this test states that b is equal to zero,
indicating a flat regression line and no relationship.
If we reject the null hypothesis and find that there is a
relationship between the variables, the sign of the b coefficient
indicates the direction of the relationship for the data values. If
b is greater than zero, the relationship is positive or
direct. If b is less than zero, the relationship is negative or
inverse. If the variable is dichotomous or ordinal, the direction of
the coding must be taken into account to make a correct
interpretation.
Slide 41
RELATIONSHIP OF ADDED INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 2
For the independent variable happiness of marriage, the
probability of the t statistic (5.741) for the b coefficient is
<0.001 which is less than or equal to the level of
significance of 0.05.
We reject the null hypothesis that the slope associated
with happiness of marriage is equal to zero (b = 0) and
conclude that there is a statistically significant relationship
between happiness of marriage and general happiness.
Slide 42
RELATIONSHIP OF ADDED INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 3
The b coefficient associated with happiness
of marriage (0.599) is positive, indicating a
direct relationship in which higher numeric
values for happiness of marriage are
associated with higher numeric values for
general happiness.
Slide 43
RELATIONSHIP OF ADDED INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 4
The independent variable happiness
of marriage is an ordinal variable
that is coded so that higher
numeric values are associated with
survey respondents who were less
happy with their marriages.
Slide 44
RELATIONSHIP OF ADDED INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 5
The dependent variable
general happiness is also an
ordinal variable. It is coded so
that higher numeric values
are associated with survey
respondents who were less
happy overall.
Therefore, the positive value of b
implies that survey respondents who
were less happy with their marriages
were less happy overall.
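The reasoning that combines the sign of b with the coding direction of two ordinal variables can be written as a small rule. This is a sketch only: the function name less_goes_with_less and its boolean arguments are illustrative assumptions, and the coding directions passed in are the ones described on the slides.

```python
def less_goes_with_less(b, iv_high_code_means_less, dv_high_code_means_less):
    """Return True when 'less' of the independent variable goes with 'less'
    of the dependent variable, and False when it goes with 'more'."""
    if b == 0:
        raise ValueError("a zero slope has no direction")
    codes_move_together = b > 0
    same_coding_direction = iv_high_code_means_less == dv_high_code_means_less
    # A positive b with matching coding, or a negative b with opposite coding,
    # both mean that less of one goes with less of the other.
    return codes_move_together == same_coding_direction


# Problem 2: higher hapmar codes mean less happy with marriage, and higher
# happy codes mean less happy overall; b is positive.
marriage = less_goes_with_less(0.599, True, True)

# Problem 1: higher reliten codes mean less strongly affiliated, but lower
# attend codes mean attending less often; b is negative.
affiliation = less_goes_with_less(-1.138, True, False)
```

Both calls should return True, matching the interpretations given on the slides for happiness of marriage and strength of affiliation.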
Slide 45
RELATIONSHIP OF ADDED INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE - 6
Coefficients (a)

                                    Unstandardized         Standardized
                                    Coefficients           Coefficients
Model  Variable                     B        Std. Error    Beta           t        Sig.
1      (Constant)                   1.594    .341                         4.677    .000
       AGE OF RESPONDENT            .000     .005          .012           .107     .915
       RESPONDENTS SEX              .011     .140          .008           .078     .938
2      (Constant)                   .432     .341                         1.268    .208
       AGE OF RESPONDENT            -.001    .004          -.035          -.385    .701
       RESPONDENTS SEX              -.013    .115          -.010          -.113    .911
       HAPPINESS OF MARRIAGE        .599     .104          .517           5.741    .000
       CONDITION OF HEALTH          .101     .072          .131           1.408    .163
       IS LIFE EXCITING OR DULL     .170     .108          .142           1.570    .120

a. Dependent Variable: GENERAL HAPPINESS
For the independent variable condition of health, the probability of
the t statistic (1.408) for the b coefficient is 0.163 which is greater
than the level of significance of 0.05. We fail to reject the null
hypothesis that the slope associated with condition of health is
equal to zero (b = 0) and conclude that there is not a statistically
significant relationship between condition of health and general
happiness. The statement in the problem that "survey respondents
who said they were not as healthy were less happy overall" is
incorrect.
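The decision rule applied to each b coefficient can be sketched in Python. This is only an illustration of the rule, not SPSS output: the function name slope_decision is hypothetical, and the Sig. values are copied from the Coefficients table, where the displayed value .000 stands for a probability below 0.001.

```python
def slope_decision(sig: float, alpha: float = 0.05) -> str:
    """Decision for H0: b = 0, given the probability of the t statistic."""
    if sig <= alpha:
        return "reject H0: statistically significant relationship"
    return "fail to reject H0: no statistically significant relationship"


# Sig. values as displayed in the Coefficients table (model 2).
sig_values = {
    "happiness of marriage": 0.000,  # displayed as .000, i.e. below 0.001
    "condition of health": 0.163,
}

decisions = {name: slope_decision(sig) for name, sig in sig_values.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

Only happiness of marriage passes the test, which is why the statement about condition of health cannot be supported.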
Answer to problem 2
Slide 46
• The independent and dependent variables were metric or
  dichotomous. Some are ordinal.
• The ratio of cases to independent variables was 18.0 to 1.
• The overall relationship was statistically significant and its
  strength was characterized correctly.
• The change in R² associated with adding the second block of
  variables was statistically significant and correctly interpreted.
• The b coefficient for happiness of marriage was statistically
  significant and correctly interpreted. The b coefficient for
  condition of health was not statistically significant. We cannot
  conclude that there was a relationship between condition of
  health and general happiness.
• The answer to the question is false.
Problem 3 – Stepwise Regression
Slide 47
26. In the dataset GSS2000.sav, is the following statement true, false, or an incorrect
application of a statistic? Assume that there is no problem with missing data, violation of
assumptions, or outliers, and that the split sample validation will confirm the
generalizability of the results. Use a level of significance of 0.05.
From the list of variables "number of hours worked in the past week" [hrs1], "occupational
prestige score" [prestg80], "highest year of school completed" [educ], and "highest academic
degree" [degree], the best predictors of "total family income" [income98] are "highest
academic degree" [degree] and "occupational prestige score" [prestg80]. Highest academic
degree and occupational prestige score have a moderate relationship to total family
income.
The most important predictor of total family income is occupational prestige score. The
second most important predictor of total family income is highest academic degree.
Survey respondents who had higher academic degrees had higher total family incomes.
Survey respondents who had more prestigious occupations had higher total family incomes.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic
Dissecting problem 3 - 1
Slide 48
The variables listed first in the problem statement are the
independent variables from which the computer will select the best
subset using statistical criteria.
The variable to be predicted or related to is the dependent
variable (dv): "total family income" [income98].
The best predictors are the variables that meet the statistical
criteria for inclusion in the model.
Dissecting problem 3 - 2
Slide 49
In order for a problem to be true, we will have to find:
• a statistically significant relationship between the included
  ivs and the dv
• a relationship of the correct strength
The importance of the variables is provided by the stepwise order
of entry of the variables into the regression analysis.
Dissecting problem 3 - 3
Slide 50
The relationship between each of the independent variables entered
after the control variables and the dependent variable must be
statistically significant and interpreted correctly.
Since statistical significance of a variable's contribution toward
explaining the variance in the dependent variable is almost always
used as the criterion for inclusion, the statistical significance
of the relationships is usually assured.
Request a stepwise multiple regression
Slide 51
To compute a multiple
regression in SPSS, select
the Regression | Linear
command from the Analyze
menu.
Slide 52
Specify variables and method for selecting
variables
First, move the
dependent variable
income98 to the
Dependent text box.
Second, move the
independent variables
hrs1, prestg80, educ, and
degree to the
Independent(s) list box.
Third, select the Stepwise
method for entering the
variables into the analysis
from the drop down Method
menu.
Open statistics options dialog box
Slide 53
First, click on the
Statistics… button to
specify the statistics
options that we want.
Specify the statistics output options
Slide 54
First, mark the
checkboxes for
Estimates on
the Regression
Coefficients
panel.
Second, mark the
checkboxes for Model
Fit and Descriptives.
Third, click on
the Continue
button to close
the dialog box.
Request the regression output
Slide 55
Click on the OK
button to
request the
regression
output.
LEVEL OF MEASUREMENT
Slide 56
Multiple regression requires that the dependent variable be metric
and the independent variables be metric or dichotomous. "Total
family income" [income98] is an ordinal level variable, which
satisfies the level of measurement requirement if we follow the
convention of treating ordinal level variables as metric variables.
Since some data analysts do not agree with this convention, a note
of caution should be included in our interpretation.
"Number of hours worked in the past week" [hrs1], "occupational
prestige score" [prestg80], and "highest year of school completed"
[educ] are interval level variables, which satisfy the level of
measurement requirements for multiple regression analysis.
"Highest academic degree" [degree] is an ordinal level variable. If we
follow the convention of treating ordinal level variables as metric
variables, the level of measurement requirement for multiple
regression analysis is satisfied. Since some data analysts do not agree
with this convention, a note of caution should be included in our
interpretation.
SAMPLE SIZE
Slide 57
Descriptive Statistics

Variable                                  Mean     Std. Deviation   N
TOTAL FAMILY INCOME                       17.06    4.130            151
NUMBER OF HOURS WORKED LAST WEEK          41.45    12.076           151
RS OCCUPATIONAL PRESTIGE SCORE (1980)     45.64    14.183           151
HIGHEST YEAR OF SCHOOL COMPLETED          14.00    2.587            151
RS HIGHEST DEGREE                         1.74     1.159            151
The minimum ratio of valid cases to independent
variables for stepwise multiple regression is 5 to 1.
With 151 valid cases and 4 independent variables, the
ratio for this analysis is 37.75 to 1, which satisfies the
minimum requirement.
However, the ratio of 37.75 to 1 does not satisfy the
preferred ratio of 50 to 1. A caution should be added
to the interpretation of the analysis and a split sample
validation should be conducted.
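The sample size check above is simple arithmetic, sketched here in Python. The function and constant names are hypothetical; the thresholds (5 to 1 minimum, 50 to 1 preferred) and the counts (151 valid cases, 4 independent variables) are the ones stated on the slide.

```python
def cases_to_variables_ratio(valid_cases: int, independent_variables: int) -> float:
    """Number of valid cases available per independent variable."""
    return valid_cases / independent_variables


MINIMUM_RATIO = 5.0     # minimum for stepwise regression, per the slide
PREFERRED_RATIO = 50.0  # preferred ratio for stepwise regression, per the slide

ratio = cases_to_variables_ratio(valid_cases=151, independent_variables=4)
meets_minimum = ratio >= MINIMUM_RATIO
meets_preferred = ratio >= PREFERRED_RATIO

print(f"ratio = {ratio:.2f} to 1")
print(f"meets minimum: {meets_minimum}, meets preferred: {meets_preferred}")
```

Meeting the minimum but not the preferred ratio is the case that calls for a caution and a split sample validation.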
Slide 58
RELATIONSHIP BETWEEN BEST PREDICTORS AND
THE DEPENDENT VARIABLE - 1
The best subset of predictors for total family income included the
independent variables: highest academic degree and occupational
prestige score.

Variables Entered/Removed (a)

Model  Variables Entered                        Variables Removed   Method
1      RS HIGHEST DEGREE                        .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      RS OCCUPATIONAL PRESTIGE SCORE (1980)    .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).

a. Dependent Variable: TOTAL FAMILY INCOME
Slide 59
RELATIONSHIP BETWEEN BEST PREDICTORS
AND THE DEPENDENT VARIABLE - 2
The probability of the F statistic (29.146) for the
regression relationship which includes these variables is
<0.001, less than or equal to the level of significance of
0.05. We reject the null hypothesis that there is no
relationship between the best subset of independent
variables and the dependent variable (R² = 0). We support
the research hypothesis that there is a statistically
significant relationship between the best subset of
independent variables and the dependent variable.
ANOVA (c)

Model               Sum of Squares   df     Mean Square   F         Sig.
1      Regression   620.049          1      620.049       47.661    .000 (a)
       Residual     1938.415         149    13.009
       Total        2558.464         150
2      Regression   722.947          2      361.473       29.146    .000 (b)
       Residual     1835.517         148    12.402
       Total        2558.464         150

a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS OCCUPATIONAL PRESTIGE
SCORE (1980)
c. Dependent Variable: TOTAL FAMILY INCOME
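As a check on how the ANOVA table fits together, the F statistic can be recomputed from the sums of squares and degrees of freedom. The Python sketch below uses a hypothetical helper, f_statistic, with the values from the table; small differences from the SPSS figures are expected because the table entries are rounded.

```python
def f_statistic(ss_regression: float, df_regression: int,
                ss_residual: float, df_residual: int) -> float:
    """F = mean square regression / mean square residual."""
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual


# Values copied from the ANOVA table.
f_model_1 = f_statistic(620.049, 1, 1938.415, 149)
f_model_2 = f_statistic(722.947, 2, 1835.517, 148)

# R Square is the regression sum of squares over the total sum of squares.
r_squared_model_2 = 722.947 / 2558.464

print(f"F (model 1) = {f_model_1:.3f}")
print(f"F (model 2) = {f_model_2:.3f}")
print(f"R Square (model 2) = {r_squared_model_2:.3f}")
```

The probability attached to F (the Sig. column) comes from the F distribution with the same degrees of freedom and is not computed here.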
Slide 60
RELATIONSHIP BETWEEN BEST PREDICTORS
AND THE DEPENDENT VARIABLE - 3
Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .492 (a)   .242       .237                3.607
2       .532 (b)   .283       .273                3.522

a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS
OCCUPATIONAL PRESTIGE SCORE (1980)
The Multiple R for the relationship between the subset of
independent variables that best predict the dependent variable
is 0.532, which would be characterized as moderate using the
rule of thumb that a correlation less than or equal to 0.20 is
characterized as very weak; greater than 0.20 and less than
or equal to 0.40 is weak; greater than 0.40 and less than or
equal to 0.60 is moderate; greater than 0.60 and less than or
equal to 0.80 is strong; and greater than 0.80 is very strong.
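The rule of thumb above maps directly onto a small lookup, sketched here in Python. The function name correlation_strength is hypothetical; the cut points are the ones quoted on the slide. Taking the absolute value is an added assumption so that the same function also works for signed correlations (Multiple R itself is never negative).

```python
def correlation_strength(r: float) -> str:
    """Label a correlation using the slide's rule of thumb."""
    size = abs(r)  # assumption: sign is ignored when judging strength
    if size <= 0.20:
        return "very weak"
    if size <= 0.40:
        return "weak"
    if size <= 0.60:
        return "moderate"
    if size <= 0.80:
        return "strong"
    return "very strong"


multiple_r = 0.532  # Multiple R for model 2 in the Model Summary table
label = correlation_strength(multiple_r)
print(f"Multiple R = {multiple_r} is {label}")
```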
Slide 61
RELATIONSHIP BETWEEN BEST PREDICTORS AND
THE DEPENDENT VARIABLE - 4
Based on the table of "Variables Entered/Removed," the most
important predictor of total family income is highest academic
degree.
The second most important predictor of total family income is
occupational prestige score.
The importance of the predictors stated in the problem is not
correct.

Variables Entered/Removed (a)

Model  Variables Entered                        Variables Removed   Method
1      RS HIGHEST DEGREE                        .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      RS OCCUPATIONAL PRESTIGE SCORE (1980)    .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).

a. Dependent Variable: TOTAL FAMILY INCOME
Answer to problem 3
Slide 62
• The independent and dependent variables were
  metric, interval or ordinal.
• The ratio of cases to independent variables was
  37.75 to 1.
• The relationship of the included variables was
  statistically significant and the strength of the
  relationship was characterized correctly.
• However, the order of entry, or importance, was not
  stated correctly in the problem.
• The answer to the question is false.
Standard multiple regression - 1
Slide 63
The following is a guide to the decision process for answering
problems about standard multiple regression analysis:
Dependent variable metric? Independent variables metric or dichotomous?
  No: Inappropriate application of a statistic
  Yes: continue
Ratio of cases to independent variables at least 5 to 1?
  No: Inappropriate application of a statistic
  Yes: continue
Probability of ANOVA test of regression less than/equal to
level of significance?
  No: False
  Yes: continue
Standard multiple regression - 2
Slide 64
Strength of relationship for included variables interpreted correctly?
  No: False
  Yes: continue
Probability of relationship between each IV and DV <= level of
significance?
  No: False
  Yes: continue
Direction of relationship between each IV and DV interpreted correctly?
  No: False
  Yes: continue
Standard multiple regression - 3
Slide 65
Any independent variable or dependent variable ordinal level of
measurement?
  Yes: True with caution
  No: continue
Ratio of cases to independent variables at preferred sample size of
at least 15 to 1?
  Yes: True
  No: True with caution
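The decision process for standard multiple regression can be written as a short function. This Python sketch is one reading of the flowchart, not part of the course materials: the function name and parameter names are hypothetical, and the example inputs are made up for illustration rather than taken from any problem in these slides.

```python
def standard_regression_answer(
    levels_ok: bool,            # DV metric and IVs metric or dichotomous
    ratio: float,               # valid cases per independent variable
    overall_significant: bool,  # ANOVA probability <= level of significance
    strength_correct: bool,     # strength of relationship interpreted correctly
    all_ivs_significant: bool,  # each IV-DV probability <= level of significance
    directions_correct: bool,   # direction of each relationship interpreted correctly
    any_ordinal: bool,          # any IV or the DV is ordinal
    minimum_ratio: float = 5.0,
    preferred_ratio: float = 15.0,
) -> str:
    if not levels_ok or ratio < minimum_ratio:
        return "Inappropriate application of a statistic"
    if not (overall_significant and strength_correct
            and all_ivs_significant and directions_correct):
        return "False"
    if any_ordinal or ratio < preferred_ratio:
        return "True with caution"
    return "True"


# Hypothetical inputs, invented for illustration only.
clean_case = standard_regression_answer(True, 20.0, True, True, True, True, False)
ordinal_case = standard_regression_answer(True, 20.0, True, True, True, True, True)
small_sample = standard_regression_answer(True, 4.0, True, True, True, True, False)
print(clean_case, ordinal_case, small_sample, sep=" | ")
```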
Hierarchical regression - 1
Slide 66
The following is a guide to the decision process for answering
problems about hierarchical regression analysis:
Dependent variable metric? Independent variables metric or dichotomous?
  No: Inappropriate application of a statistic
  Yes: continue
Ratio of cases to independent variables at least 5 to 1?
  No: Inappropriate application of a statistic
  Yes: continue
Probability of ANOVA test of regression less than/equal to level of
significance?
  No: False
  Yes: continue
Hierarchical regression - 2
Slide 67
Probability of F test for change in R² less than or equal to level of
significance?
  No: False
  Yes: continue
Change in R² correctly reported and interpreted?
  No: False
  Yes: continue
Probability of relationship between each IV added after controls and DV
less than or equal to level of significance?
  No: False
  Yes: continue
Hierarchical regression - 3
Slide 68
Direction of relationship between each IV added after controls and DV
interpreted correctly?
  No: False
  Yes: continue
Any independent variable or dependent variable ordinal level of
measurement?
  Yes: True with caution
  No: continue
Ratio of cases to independent variables at preferred sample size of
at least 15 to 1?
  Yes: True
  No: True with caution
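Applied to problem 2, the hierarchical decision process can be sketched in Python as follows. The function and parameter names are hypothetical. The inputs come from the answer slide for problem 2 (ratio of 18.0 to 1, significant overall relationship, significant and correctly interpreted change in R²) and from the Sig. column of the Coefficients table for the two added variables named in the problem.

```python
def hierarchical_regression_answer(
    levels_ok: bool,
    ratio: float,
    overall_significant: bool,
    r2_change_significant: bool,
    r2_change_interpreted: bool,
    added_iv_sigs: list[float],  # Sig. for each IV added after the controls
    directions_correct: bool,
    any_ordinal: bool,
    alpha: float = 0.05,
    minimum_ratio: float = 5.0,
    preferred_ratio: float = 15.0,
) -> str:
    if not levels_ok or ratio < minimum_ratio:
        return "Inappropriate application of a statistic"
    if not overall_significant:
        return "False"
    if not (r2_change_significant and r2_change_interpreted):
        return "False"
    if any(sig > alpha for sig in added_iv_sigs):
        return "False"
    if not directions_correct:
        return "False"
    if any_ordinal or ratio < preferred_ratio:
        return "True with caution"
    return "True"


problem_2 = hierarchical_regression_answer(
    levels_ok=True,
    ratio=18.0,
    overall_significant=True,
    r2_change_significant=True,
    r2_change_interpreted=True,
    added_iv_sigs=[0.000, 0.163],  # happiness of marriage, condition of health
    directions_correct=True,
    any_ordinal=True,
)
print(problem_2)
```

The process stops at the significance check for the added variables, because the probability for condition of health (0.163) is greater than 0.05.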
Stepwise regression - 1
Slide 69
The following is a guide to the decision process for answering
problems about stepwise regression analysis:
Dependent variable metric? Independent variables metric or dichotomous?
  No: Inappropriate application of a statistic
  Yes: continue
Ratio of cases to independent variables at least 5 to 1?
  No: Inappropriate application of a statistic
  Yes: continue
Is the list of independent variables selected for inclusion correct?
  No: False
  Yes: continue
Stepwise regression - 2
Slide 70
Probability of ANOVA test of regression less than/equal to level of
significance?
  No: False
  Yes: continue
Strength of relationship for included variables interpreted correctly?
  No: False
  Yes: continue
Is the stated order of importance of the independent variables correct?
  No: False
  Yes: continue
Stepwise regression - 3
Slide 71
Probability of relationship between each included IV and DV less than
or equal to level of significance?
  No: False
  Yes: continue
Direction of relationship between each included IV and DV interpreted
correctly?
  No: False
  Yes: continue
Stepwise regression - 4
Slide 72
Any independent variable or dependent variable ordinal level of
measurement?
  Yes: True with caution
  No: continue
Ratio of cases to independent variables at preferred sample size of
at least 50 to 1?
  Yes: True
  No: True with caution
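The stepwise decision process, applied to problem 3, can be sketched in Python. The function and parameter names are hypothetical. The inputs for the checks up to the order of importance come from the problem 3 slides; the two later checks are set to True only as placeholders, since the process stops before they are reached.

```python
def stepwise_regression_answer(
    levels_ok: bool,
    ratio: float,
    selected_list_correct: bool,  # problem names the variables SPSS selected
    overall_significant: bool,
    strength_correct: bool,
    order_correct: bool,          # stated importance matches order of entry
    all_ivs_significant: bool,
    directions_correct: bool,
    any_ordinal: bool,
    minimum_ratio: float = 5.0,
    preferred_ratio: float = 50.0,
) -> str:
    if not levels_ok or ratio < minimum_ratio:
        return "Inappropriate application of a statistic"
    checks_in_order = [
        selected_list_correct,
        overall_significant,
        strength_correct,
        order_correct,
        all_ivs_significant,
        directions_correct,
    ]
    if not all(checks_in_order):
        return "False"
    if any_ordinal or ratio < preferred_ratio:
        return "True with caution"
    return "True"


problem_3 = stepwise_regression_answer(
    levels_ok=True,
    ratio=151 / 4,               # 37.75 to 1
    selected_list_correct=True,  # degree and prestg80 were selected
    overall_significant=True,    # F = 29.146, probability < 0.001
    strength_correct=True,       # Multiple R = 0.532 is moderate
    order_correct=False,         # degree entered first, not prestg80
    all_ivs_significant=True,    # placeholder: not reached in this problem
    directions_correct=True,     # placeholder: not reached in this problem
    any_ordinal=True,
)
print(problem_3)
```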