Hacking PROCESS for Bootstrap Inference in Moderation Analysis
Andrew F. Hayes
The Ohio State University
Unpublished White Paper, DRAFT DATE: January 18, 2015
Abstract
Bootstrap inference for indirect effects is implemented in the PROCESS
macro for SPSS and SAS for models that include a mediation component
of some kind (models 4 through 76). Bootstrap inference is not available
in moderation-only models (i.e., models that contain a moderation component but not an indirect effect). This document describes a PROCESS
hack to generate bootstrap confidence intervals for regression coefficients in
moderation-only models, with an emphasis on bootstrap inference for the
regression coefficient for a product term in a simple moderation model.
Models 1, 2, and 3 are the only models built into PROCESS dedicated exclusively to
moderation analysis. They would be used when an investigator is interested in examining
the extent to which X’s effect on Y is linearly related to a moderator M (model 1) or two
moderators M and W additively (model 2) or multiplicatively (model 3).
The only inferential procedures implemented in these models are those based on ordinary least squares theory and estimation. Unlike the models that include a mediation component (models 4 through 76), models 1, 2, and 3 offer no option for bootstrap inference. What if you want to employ bootstrapping methods for inference in a model that includes only a moderation component? Unfortunately, you can't just request bootstrap confidence intervals with the boot option, because PROCESS ignores it when your model doesn't involve the estimation of an indirect effect. But a simple hack described in this document works around this problem. It relies on the save option in PROCESS, which saves bootstrap estimates of regression coefficients to a file, and on the recognition that most of the models built into PROCESS that contain a mediation component also contain a moderation component.
I illustrate this hack by generating a bootstrap confidence interval for the regression
coefficient for the interaction in the example used in Chapter 7 of Introduction to Mediation,
Moderation, and Conditional Process Analysis (Hayes, 2013). This chapter introduces moderation principles and describes a moderation analysis examining whether the effect of a female lawyer's decision to protest or not protest a personnel decision (X) on how she was perceived (Y) depended on the perceiver's beliefs about the pervasiveness of sex discrimination in society (M). In the data, X is a dichotomous variable (PROTEST in the protest data file) coding whether participants were told that the lawyer protested the discriminatory action against her (X = 1) or accepted it without protesting and continued her job at the law firm (X = 0). The dependent variable Y is how much the participant reported liking the attorney (LIKING in the data file), and the moderator M is the participant's score on the Modern Sexism Scale (SEXISM in the data file).

Andrew F. Hayes, Department of Psychology, The Ohio State University, Columbus, OH 43210 USA, [email protected], www.afhayes.com. Learn more about the use of PROCESS for moderation analysis by taking a class from Andrew Hayes offered through Statistical Horizons (www.statisticalhorizons.com).
© COPYRIGHT 2015 BY ANDREW F. HAYES. DO NOT POST ONLINE.
In the analysis, PROCESS model 1 is used to estimate

$$Y = i_1 + b_1 X + b_2 M + b_3 XM + e_Y \tag{1}$$

and doing so yields $i_1 = 7.706$, $b_1 = -3.773$, $b_2 = -0.473$, and $b_3 = 0.834$ (see the PROCESS output in Figure 1). Most pertinent to the analysis, the regression coefficient for XM ($b_3$) is statistically different from zero, with a 95% confidence interval of 0.352 to 1.316. So the effect of the lawyer's decision to protest or not on how she was perceived depends on (i.e., is moderated by) the perceiver's beliefs about the pervasiveness of sex discrimination in society.
This inferential test, whether framed in terms of a confidence interval or a hypothesis test, relies on all the standard assumptions of OLS regression, including normality and homoscedasticity of the errors in estimation. If you'd rather not make such assumptions when you conduct an inferential test, a bootstrap confidence interval can be a good alternative.
But PROCESS does not offer bootstrapping as an inferential option in model 1 (or 2 or 3).
If you want a bootstrap confidence interval for this test, you need to do something different.
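For readers who want to see the logic outside PROCESS, the Python sketch below computes a percentile bootstrap confidence interval for $b_3$ in equation 1: it resamples cases with replacement, refits the OLS model in each bootstrap sample, and takes the 2.5th and 97.5th percentiles of the resulting estimates. It is a minimal illustration under stated assumptions, not PROCESS's own code: the function names are hypothetical, NumPy is assumed to be available, and the data are simulated rather than taken from the protest study.

```python
import numpy as np


def fit_moderation(x, m, y):
    """OLS estimates (i1, b1, b2, b3) of Y = i1 + b1*X + b2*M + b3*X*M + e."""
    design = np.column_stack([np.ones_like(x), x, m, x * m])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def percentile_ci_b3(x, m, y, n_boot=10000, level=0.95, seed=0):
    """Percentile bootstrap CI for b3, resampling cases with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    b3_boot = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap sample of n cases
        b3_boot[k] = fit_moderation(x[idx], m[idx], y[idx])[3]
    tail = 100 * (1 - level) / 2  # 2.5 for a 95% interval
    lower, upper = np.percentile(b3_boot, [tail, 100 - tail])
    return lower, upper


# Hypothetical data loosely shaped like the protest study
# (binary X, continuous M); these are NOT the real data.
rng = np.random.default_rng(1)
n = 129
x = rng.integers(0, 2, size=n).astype(float)
m = rng.normal(5.1, 0.8, size=n)
y = 7.7 - 3.8 * x - 0.47 * m + 0.83 * x * m + rng.normal(0, 1, size=n)

print("OLS (i1, b1, b2, b3):", fit_moderation(x, m, y))
print("95% percentile bootstrap CI for b3:", percentile_ci_b3(x, m, y, n_boot=2000))
```

Because the resamples differ from those PROCESS draws, bounds computed this way will not match PROCESS output digit for digit, even on the same data.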
The key to the hack presented here is appreciating that the simple moderation model is a component of several models in PROCESS that have a mediation component, so the boot and save options can be used to generate the bootstrap distribution of $b_3$ in equation 1. PROCESS model 74 is one possibility, although other models could be used (and are discussed toward the end of this document). Model 74 is a conditional process model in which X is modeled to exert an effect on Y indirectly through M as well as directly, with the effect of M on Y moderated by X. As discussed in the PROCESS documentation and in Chapter 12 of Hayes (2013), but using slightly different symbolic notation here, this model in equation form is

$$
\begin{aligned}
M &= i_1 + aX + e_M \\
Y &= i_2 + b_1 X + b_2 M + b_3 XM + e_Y
\end{aligned}
\tag{2}
$$

Observe that in model 74, the model of Y (equation 2) is just a simple moderation model, which is what we really want to estimate. We can do so using the PROCESS command below, ignoring all the output it produces because we really don't need it.
process vars=protest sexism liking/y=liking/x=protest/m=sexism/model=74/boot=10000/save=1.
The equivalent code in PROCESS for SAS is
%process (data=protest,vars=protest sexism liking,y=liking,x=protest,m=sexism,model=74,
boot=10000,save=modboot);

***************** PROCESS Procedure for SPSS Release 2.11 ****************
          Written by Andrew F. Hayes, Ph.D.       www.afhayes.com
    Documentation available in Hayes (2013). www.guilford.com/p/hayes3
**************************************************************************
Model = 1
    Y = liking
    X = protest
    M = sexism

Sample size
        129

**************************************************************************
Outcome: liking

Model Summary
          R       R-sq          F        df1        df2          p
      .3654      .1335     6.4190     3.0000   125.0000      .0004

Model
              coeff         se          t          p       LLCI       ULCI
constant     7.7062     1.0449     7.3750      .0000     5.6382     9.7743
sexism       -.4725      .2038    -2.3184      .0220     -.8758     -.0692
protest     -3.7727     1.2541    -3.0084      .0032    -6.2546    -1.2907
int_1         .8336      .2436     3.4224      .0008      .3515     1.3156

Interactions:
 int_1    protest     X     sexism

R-square increase due to interaction(s):
         R2-chng          F        df1        df2          p
int_1      .0812    11.7126     1.0000   125.0000      .0008

*************************************************************************
Conditional effect of X on Y at values of the moderator(s)
     SEXISM     Effect         se          t          p       LLCI       ULCI
     4.3332    -0.1607     0.2629    -0.6113     0.5421    -0.6809     0.3595
     5.1170     0.4926     0.1872     2.6312     0.0096     0.1221     0.8632
     5.9007     1.1459     0.2718     4.2156     0.0000     0.6079     1.6839

Values for quantitative moderators are the mean and plus/minus one SD from mean

******************** ANALYSIS NOTES AND WARNINGS *************************
Level of confidence for all confidence intervals in output:
    95.00

Figure 1. Output from PROCESS model 1 for a simple moderation model.
All we want out of PROCESS at this point is the file of bootstrap estimates of the regression coefficients that this command produces. Nevertheless, I provide the output in Figure 2 so you can verify that the results it produces for the model of Y are identical to the results generated by PROCESS model 1 in Figure 1. The output is also helpful in explaining where to find the bootstrap estimates in the file this command produces. Observe that the regression coefficient for the product of X and M is indeed 0.834, just as produced by model 1. In fact, all of the regression coefficients, standard errors, t and p-values, and confidence intervals in this section of the model 74 output correspond to the PROCESS model 1 output.
***************** PROCESS Procedure for SPSS Release 2.11 ****************
          Written by Andrew F. Hayes, Ph.D.       www.afhayes.com
    Documentation available in Hayes (2013). www.guilford.com/p/hayes3
**************************************************************************
Model = 74
    Y = liking
    X = protest
    M = sexism

Sample size
        129

**************************************************************************
Outcome: sexism

Model Summary
          R       R-sq          F        df1        df2          p
      .0402      .0016      .2058     1.0000   127.0000      .6509

Model
              coeff         se          t          p       LLCI       ULCI
constant     5.0710      .1228    41.2998      .0000     4.8280     5.3139
protest       .0674      .1487      .4536      .6509     -.2267      .3616

**************************************************************************
Outcome: liking

Model Summary
          R       R-sq          F        df1        df2          p
      .3654      .1335     6.4190     3.0000   125.0000      .0004

Model
              coeff         se          t          p       LLCI       ULCI
constant     7.7062     1.0449     7.3750      .0000     5.6382     9.7743
sexism       -.4725      .2038    -2.3184      .0220     -.8758     -.0692
protest     -3.7727     1.2541    -3.0084      .0032    -6.2546    -1.2907
int_1         .8336      .2436     3.4224      .0008      .3515     1.3156

Interactions:
 int_1    sexism      X     protest

[This is a simple moderation model equivalent to PROCESS model 1]

******************** DIRECT AND INDIRECT EFFECTS *************************

Indirect effect(s) of X on Y:
Mediator      Effect    Boot SE   BootLLCI   BootULCI
sexism        -.0319      .0878     -.3262      .0773

**************************************************************************

Figure 2. Excerpt of output from a PROCESS hack to produce a simple moderated regression using model 74. Compare to Figure 1.
Figure 3. The SPSS data file containing bootstrap regression coefficients produced by the SAVE option.
Statistics
                         COL3       COL4       COL5       COL6
N              Valid    10000      10000      10000      10000
               Missing      0          0          0          0
Percentiles    2.5     4.9025    -1.1496    -7.1123      .2134
               97.5   10.8640      .1079     -.7484     1.5399

Figure 4. The distribution of 10,000 bootstrap estimates of the four regression coefficients in Equation 1.
The save option used in the PROCESS command above produces a data file of 10,000 bootstrap estimates of each of the regression coefficients in the model, as in Figure 3. As can be seen in Figure 3, there are six columns in the bootstrap file, one per regression coefficient, in the order the coefficients appear in the PROCESS model 74 output when you scan it from top to bottom: COL1 and COL2 hold the constant and the coefficient for protest in the model of sexism, and COL3 through COL6 hold the constant, $b_2$ (sexism), $b_1$ (protest), and $b_3$ (int_1) in the model of liking. So the estimates for $b_3$ are in the sixth column. For a percentile bootstrap confidence interval, we need to find the two values among the 10,000 bootstrap estimates that define the 2.5th and 97.5th percentiles of the distribution. Although our problem requires this information only for $b_3$, the SPSS code below generates these percentiles not just for the estimates of $b_3$ but for every regression parameter in the moderation component of the model (i.e., $i_1$, $b_1$, $b_2$, and $b_3$ in equation 1).
frequencies variables = col3 to col6/format notable/statistics mean stddev
/percentiles 2.5 97.5.
In SAS, use
proc means data=modboot;var col3 col4 col5 col6;run;
proc univariate data=modboot noprint;
var col3 col4 col5 col6;
output out=percent pctlpts=2.5 97.5 pctlpre=col3P col4P col5P col6P;
run;
proc print data=percent;run;
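The column bookkeeping can be mimicked the same way. This hypothetical Python sketch stores each bootstrap replicate as one row of six coefficients in the order model 74 prints them (constant and protest in the model of sexism, then constant, sexism, protest, and int_1 in the model of liking) and takes the 2.5th and 97.5th percentiles of COL3 through COL6. The layout is an assumption modeled on the description of the saved file above, the data are simulated, and `np.percentile` interpolates linearly by default, which may differ slightly from the percentile definitions SPSS and SAS use.

```python
import numpy as np


def ols(design, y):
    """Ordinary least squares coefficients."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def model74_replicate(x, m, y):
    """Six coefficients in the order model 74 prints them.

    Model of M: constant, X.  Model of Y: constant, M, X, X*M.
    """
    ones = np.ones_like(x)
    m_model = ols(np.column_stack([ones, x]), m)
    y_model = ols(np.column_stack([ones, m, x, x * m]), y)
    return np.concatenate([m_model, y_model])


def bootstrap_file(x, m, y, n_boot=10000, seed=0):
    """Rows are bootstrap samples; columns mimic COL1..COL6 (assumed layout)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rows = np.empty((n_boot, 6))
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        rows[k] = model74_replicate(x[idx], m[idx], y[idx])
    return rows


# Simulated, hypothetical data (not the protest study).
rng = np.random.default_rng(2)
n = 129
x = rng.integers(0, 2, size=n).astype(float)
m = 5.07 + 0.07 * x + rng.normal(0, 0.8, size=n)
y = 7.7 - 3.8 * x - 0.47 * m + 0.83 * x * m + rng.normal(0, 1, size=n)

boot = bootstrap_file(x, m, y, n_boot=2000)
# COL3..COL6 (indices 2..5): constant, b2 (sexism), b1 (protest), b3 (int_1).
lower, upper = np.percentile(boot[:, 2:6], [2.5, 97.5], axis=0)
print("2.5th percentiles :", lower)
print("97.5th percentiles:", upper)
print("95% percentile CI for b3:", lower[3], upper[3])
```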
The SPSS version of this code generates the output in Figure 4. As can be seen in the column of output for COL6, 95% of the bootstrap estimates for $b_3$ were between 0.213 and 1.540. This is a bona fide 95% bootstrap confidence interval for the regression coefficient for XM in the simple moderation model represented by equation 1.
Bootstrap Confidence Intervals for Conditional Effects
PROCESS models 1, 2, and 3 automatically probe an interaction (whether statistically significant or not) by producing estimates and inferential tests of the effect of X on Y conditioned on various values of the moderator or moderators, the so-called “simple slopes”. In this example, PROCESS model 1 generates the estimate of the effect of the
lawyer’s decision to protest or not on how she was perceived for participants “relatively
low” (a standard deviation below the sample mean), “moderate” (the sample mean), and
“relatively high” (a standard deviation above the mean) in their perceived pervasiveness of
sex discrimination (see the bottom of Figure 1). The inferential tests PROCESS generates
are “normal theory”-based tests and carry with them all the standard assumptions of regression. You might want to generate bootstrap confidence intervals for these conditional
effects if these assumptions bother you or you’d just rather not make them.
In equation 1, the conditional effect of X on Y at a given value of M is

$$\theta_{(X \rightarrow Y)|M} = b_1 + b_3 M$$
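As a quick check on this formula, plugging the rounded Figure 1 estimates $b_1 = -3.7727$ and $b_3 = 0.8336$ into it at the three moderator values PROCESS uses gives

$$
\begin{aligned}
\theta_{(X \rightarrow Y)|M = 4.3332} &= -3.7727 + 0.8336(4.3332) \approx -0.1605 \\
\theta_{(X \rightarrow Y)|M = 5.1170} &= -3.7727 + 0.8336(5.1170) \approx 0.4928 \\
\theta_{(X \rightarrow Y)|M = 5.9007} &= -3.7727 + 0.8336(5.9007) \approx 1.1461
\end{aligned}
$$

These agree with the conditional effects at the bottom of Figure 1 (-0.1607, 0.4926, and 1.1459) up to the rounding of the coefficients.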
By substituting values of M into this equation, one generates the effect of X on Y at that value of M. PROCESS does this automatically and also produces the standard error, a t and p-value, and normal theory confidence intervals. To generate bootstrap confidence intervals for the conditional effect of X, we can use the file of bootstrap estimates already generated by this hack. In the bootstrap coefficient file produced earlier, $b_1$ is in column 5 and $b_3$ is in column 6. Thus, the SPSS code below creates three new variables that are bootstrap estimates of the conditional effect of X for those a standard deviation below the sample mean perceived pervasiveness of sex discrimination (M = 4.333), at the mean (M = 5.117), and a standard deviation above the mean (M = 5.901). You could substitute different values of M in the code below if you wanted.
compute low=col5+col6*4.333.
compute mod=col5+col6*5.117.
compute high=col5+col6*5.901.
frequencies variables = low mod high/format notable/statistics mean stddev
/percentiles 2.5 97.5.
The resulting output can be found in Figure 5. The 2.5th and 97.5th percentiles are the lower and upper bounds, respectively, of 95% bootstrap confidence intervals for the conditional effects at low, moderate, and high values of the moderator. For those low in perceived pervasiveness of sex discrimination, the conditional effect of the lawyer's behavior on how she was perceived is between −0.647 and 0.313 with 95% confidence. The corresponding bootstrap confidence intervals for the conditional effect among those moderate and high in perceived pervasiveness of sex discrimination are 0.064 to 0.956 and 0.373 to 2.028, respectively.

Statistics
                          low        mod       high
N              Valid    10000      10000      10000
               Missing      0          0          0
Mean                   -.1629      .4960     1.1549
Std. Deviation         .24478     .22550     .42581
Percentiles    2.5     -.6472      .0639      .3726
               97.5     .3125      .9558     2.0284

Figure 5. The distribution of 10,000 bootstrap estimates of the conditional effect of the decision to protest on liking for those low, moderate, and high in perceived pervasiveness of sex discrimination.

SAS users can implement this trick using the code below.
data modboot;set modboot;
low=col5+col6*4.333;mod=col5+col6*5.117;high=col5+col6*5.901;run;
proc means data=modboot;var low mod high;run;
proc univariate data=modboot noprint;
var low mod high;
output out=percent pctlpts=2.5 97.5 pctlpre=lowP modP highP;run;
proc print data=percent;run;
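The same construction carries over to any software that can read the bootstrap file. In the hypothetical Python sketch below, the conditional effect is computed within each bootstrap sample before the percentiles are taken; the function name and the five toy draws are illustrative assumptions, not output from the protest data.

```python
import numpy as np


def conditional_effect_ci(b1_boot, b3_boot, m_values, level=0.95):
    """Percentile bootstrap CIs for theta = b1 + b3*M at chosen values of M.

    b1_boot, b3_boot: bootstrap estimates, one pair per bootstrap sample
    (e.g., COL5 and COL6 of the saved file).
    Returns a list of (M, bootstrap mean, lower, upper).
    """
    b1_boot = np.asarray(b1_boot, dtype=float)
    b3_boot = np.asarray(b3_boot, dtype=float)
    tail = 100 * (1 - level) / 2
    rows = []
    for m in m_values:
        theta = b1_boot + b3_boot * m  # conditional effect in every bootstrap sample
        lower, upper = np.percentile(theta, [tail, 100 - tail])
        rows.append((m, theta.mean(), lower, upper))
    return rows


# Toy bootstrap draws, for illustration only (not from the protest data).
b1_draws = [-4.9, -4.1, -3.8, -3.5, -2.6]
b3_draws = [1.05, 0.90, 0.83, 0.78, 0.60]
for m, mean, lo, hi in conditional_effect_ci(b1_draws, b3_draws, [4.333, 5.117, 5.901]):
    print(f"M = {m:.3f}: mean {mean:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

Computing $b_1 + b_3 M$ sample by sample, rather than combining separate intervals for $b_1$ and $b_3$, is what carries the sampling covariance between the two estimates into the interval.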
Variants of This Hack
There are some variants of this hack for generating bootstrap confidence intervals for
regression coefficients in a moderation-only model that are equally effective. For example,
you could produce the product of X and M in the data yourself and use it as a covariate
in a simple mediation model, which is model 4 in PROCESS. The SPSS code below would
work:
compute proxsex=protest*sexism.
process vars=protest sexism liking proxsex/y=liking/x=protest/m=sexism/model=4
/boot=10000/save=1.
In SAS, try
data protest;set protest;proxsex=protest*sexism;run;
%process (data=protest,vars=protest sexism liking proxsex,y=liking,x=protest,m=sexism,
model=4,boot=10000,save=modboot);
In standard simple mediation model notation such as in Chapter 4 of Hayes (2013), this PROCESS command estimates

$$
\begin{aligned}
M &= i_1 + aX + e_M \\
Y &= i_2 + c'X + bM + c'_2 XM + e_Y
\end{aligned}
\tag{3}
$$

but notice that equation 3 is a simple moderation model with M moderating the effect of X on Y, just like equation 1. So the bootstrap distribution of $c'_2$ produced by this code is the bootstrap distribution of $b_3$ in equation 1. Once the file of bootstrap regression coefficients is generated, a 95% bootstrap confidence interval can be constructed as above.
A disadvantage of this approach relative to using model 74 is that, as far as PROCESS knows, model 4 does not contain a moderation component. Thus, the center option that automates the mean centering of X and M is not available, as this option works only in models that contain a moderation component. Although you could specify center=1 in the command line, PROCESS would ignore it because it does not recognize model 4 as a moderation model. As discussed in section 9.4 of Hayes (2013), mean centering is not required, but if it is something you want to do, you'd have to manually mean center X and M, generate the product from the centered variables, and use the mean-centered X and M in the PROCESS command above. By contrast, the approach using model 74 allows you to automate the mean centering of X and M with the center option.
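Mean centering is only a reparameterization: with X and M centered, the coefficient for the product is still $b_3$, while the coefficient for X becomes $b_1 + b_3\bar{M}$, the conditional effect of X at the sample mean of M. Because the same identity holds in every bootstrap sample, centering does not change the bootstrap distribution of $b_3$. The Python sketch below illustrates the manual route on simulated data; the helper name and the data are hypothetical.

```python
import numpy as np


def fit(x, m, y):
    """OLS of Y on constant, X, M, X*M; returns (i, b1, b2, b3)."""
    design = np.column_stack([np.ones_like(x), x, m, x * m])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


# Simulated, hypothetical data.
rng = np.random.default_rng(4)
n = 129
x = rng.integers(0, 2, size=n).astype(float)
m = rng.normal(5.1, 0.8, size=n)
y = 7.7 - 3.8 * x - 0.47 * m + 0.83 * x * m + rng.normal(0, 1, size=n)

raw = fit(x, m, y)

# Manual centering: center X and M first, then form the product from them.
xc = x - x.mean()
mc = m - m.mean()
centered = fit(xc, mc, y)

print("b3, uncentered vs centered:", raw[3], centered[3])
print("centered b1:", centered[1], " b1 + b3*mean(M):", raw[1] + raw[3] * m.mean())
```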
Another variant on this hack is to use PROCESS model 5. This is a simple mediation model that allows the direct effect of X on Y to be moderated. In equation form, model 5 is

$$
\begin{aligned}
M &= i_1 + aX + e_M \\
Y &= i_2 + c'_1 X + c'_2 W + c'_3 XW + bM + e_Y
\end{aligned}
\tag{4}
$$

Notice that the model of Y in equation 4 is a simple moderation model like equation 1 estimated by PROCESS model 1, with W as the moderator and M as a covariate. So this hack works only if you have at least one covariate in the moderation model, because you have to specify a variable as M in model 5. But this approach has three advantages: it doesn't require constructing a product prior to executing PROCESS, as the variant above using model 4 does; the center option can be used if desired, as with model 74; and it produces estimates of the conditional effect of X in the output, just as model 1 does.
To illustrate, suppose there were two additional variables in the PROTEST data file containing the age of the participant as well as his or her education, and both of these were to be used as covariates. The PROCESS code below estimates the model and generates 10,000 bootstrap estimates of every regression parameter:
process vars=protest sexism liking age educ/y=liking/x=protest/m=age/w=sexism/model=5
/boot=10000/save=1.
The equivalent code in SAS is
%process (data=protest,vars=protest sexism liking age educ,y=liking,x=protest,m=age,
w=sexism,model=5,boot=10000,save=modboot);
In this code, I specified one of the covariates as M, which PROCESS will treat as a mediator, and I left the other covariate unassigned. It doesn't matter which covariate is assigned to M. The output will include a model of M and direct and indirect effects, but all of this can be ignored as irrelevant. The goal is to generate a file that can be used to construct bootstrap confidence intervals for the regression coefficients in the model of Y.
This code accomplishes that while also producing the table of conditional effects of X that
PROCESS model 1 generates.
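Why the extra covariates are harmless can be seen by writing out the model of Y in equation 4 directly. This hypothetical Python sketch regresses Y on X, W, XW, and any covariates (here a simulated age variable in the role of M and an education variable left unassigned), then evaluates the conditional effect of X, $c'_1 + c'_3 W$, at the mean of W and one standard deviation below and above it. Names and data are assumptions for illustration, and the model of M that PROCESS also estimates is omitted because it is fit separately and does not change the estimates in the model of Y.

```python
import numpy as np


def fit_model5_y(x, w, y, *covariates):
    """OLS of Y on constant, X, W, X*W, then any covariates.

    Returns (i2, c'1, c'2, c'3, covariate coefficients...) as in equation 4.
    """
    design = np.column_stack([np.ones_like(x), x, w, x * w, *covariates])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def conditional_effects(coefs, w):
    """Effect of X (c'1 + c'3*W) at mean(W) - SD, mean(W), mean(W) + SD."""
    c1, c3 = coefs[1], coefs[3]
    sd = w.std(ddof=1)  # sample standard deviation
    probe = [w.mean() - sd, w.mean(), w.mean() + sd]
    return [(value, c1 + c3 * value) for value in probe]


# Simulated, hypothetical data: w plays SEXISM, age is given to PROCESS as M,
# educ is an unassigned covariate.
rng = np.random.default_rng(5)
n = 129
x = rng.integers(0, 2, size=n).astype(float)
w = rng.normal(5.1, 0.8, size=n)
age = rng.normal(20, 2, size=n)
educ = rng.normal(14, 2, size=n)
y = (7.7 - 3.8 * x - 0.47 * w + 0.83 * x * w
     + 0.02 * age - 0.01 * educ + rng.normal(0, 1, size=n))

coefs = fit_model5_y(x, w, y, age, educ)
for value, effect in conditional_effects(coefs, w):
    print(f"W = {value:.4f}: conditional effect of X = {effect:.4f}")
```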
References
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. New York, NY: The Guilford Press.