Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Hacking PROCESS for Bootstrap Inference in Moderation Analysis Andrew F. Hayes The Ohio State University Unpublished White Paper, DRAFT DATE: January 18, 2015 Abstract Bootstrap inference for indirect effects is implemented in the PROCESS macro for SPSS and SAS for models that include a mediation component of some kind (models 4 through 76). Bootstrap inference is not available in moderation-only models (i.e., models that contain a moderation component but not an indirect effect). This document describes a PROCESS hack to generate bootstrap confidence intervals for regression coefficients in moderation-only models, with an emphasis on bootstrap inference for the regression coefficient for a product term in a simple moderation model. Models 1, 2, and 3 are the only models built into PROCESS dedicated exclusively to moderation analysis. They would be used when an investigator is interested in examining the extent to which X’s effect on Y is linearly related to a moderator M (model 1) or two moderators M and W additively (model 2) or multiplicatively (model 3). The only inferential procedures implemented in these models are those based on ordinary least squares theory and estimation. No options are available for bootstrap inference, like is available in models that include a mediation component—models 4 through 76. What if you want to employ bootstrapping methods for inference in a model that includes only a moderation component? Unfortunately, you can’t just specify bootstrap confidence intervals using the boot option in PROCESS because PROCESS will ignore it if your model doesn’t involve the estimation of an indirect effect. But a simple hack described in this document works around this problem. It relies on the save function in PROCESS that saves bootstrap estimates of regression coefficients to a file, and a recognition that most of the models that are built into PROCESS that contain a mediation component also contain a moderation component. I illustrate this hack by generating a bootstrap confidence interval for the regression coefficient for the interaction in the example used in Chapter 7 of Introduction to Mediation, Moderation, and Conditional Process Analysis (Hayes, 2013). This chapter introduces moderation principles and describes a moderation analysis examining if the effect of a female Andrew F. Hayes, Department of Psychology, The Ohio State University, Columbus, OH 43210 USA, [email protected], www.afhayes.com. Learn more about the use of PROCESS for moderation analysis by taking a class from Andrew Hayes offered through Statistical Horizons (www.statisticalhorizons.com). c COPYRIGHT 2015 BY ANDREW F. HAYES. DO NOT POST ONLINE. ⃝ 2 lawyer’s decision whether or not to protest a personnel decision (X) differentially affected how she was perceived (Y ) as a function of the perceiver’s beliefs about the pervasiveness of sex discrimination in society (M ). In the data, X is a dichotomous variable (PROTEST in the protest data file) coding whether participants were told that the lawyer protested the discriminatory action against her (X = 1) or accepted it without protesting (X = 0) and continued her job at the law firm. The dependent variable Y is how much the participant reported liking the attorney (LIKING in the data file), and the moderator M is the participant’s score on the Modern Sexism Scale (SEXISM in the data file). In the analysis, PROCESS model 1 is used to estimate Y = i1 + b1 X + b2 M + b3 XM + eY (1) and doing so yields i1 = 7.706, b1 = −3.773, b2 = −0.473, and b3 = 0.834 (see the PROCESS output in Figure 1). Most pertinent to the analysis, the regression coefficient for XM (b3 ) is statistically different from zero, with a 95% confidence interval of 0.352 to 1.316. So the effect of the lawyer’s decision to protest or not on how she was perceived depends on (i.e., is moderated by) the perceiver’s beliefs about the pervasiveness of sex discrimination in society. This inferential test, whether framed in terms of a confidence interval or a hypothesis test, makes all the standard assumptions of OLS regression including normality and homoscedasticity of the errors in estimation. If you’d rather not make such assumptions when you conduct an inferential test, a bootstrap confidence interval can be a good alternative. But PROCESS does not offer bootstrapping as an inferential option in model 1 (or 2 or 3). If you want a bootstrap confidence interval for this test, you need to do something different. The key to the hack presented here is appreciating that the simple regression model is a component of several models in PROCESS that have a mediation component, thereby allowing you to use the boot and save options to generate the bootstrap distribution of b3 in equation 1. PROCESS model 74 is one possibility, although other models could be used (and are discussed toward the end of this document). Model 74 is a conditional process model in which X is modeled to exert an effect on Y indirectly through M as well as directly, with moderation of the effect of M on Y by X. As discussed in the PROCESS documentation and in Chapter 12 of Hayes (2013) but using slightly different symbolic notation here, this model in equation form is M = i1 + aX + eM Y = i2 + b1 X + b2 M + b3 XM + eY (2) Observe that in model 74, the model of Y (equation 2) is just a simple moderation model, which is what we really want to estimate. We can do so using the PROCESS command below, ignoring all the output it produces because we really don’t need it. process vars=protest sexism liking/y=liking/x=protest/m=sexism/model=74/boot=10000/save=1. The equivalent code in PROCESS for SAS is c COPYRIGHT 2015 BY ANDREW F. HAYES. DO NOT POST ONLINE. ⃝ ***************** PROCESS Procedure for SPSS Release 2.11 **************** Written by Andrew F. Hayes, Ph.D. www.afhayes.com Documentation available in Hayes (2013). www.guilford.com/p/hayes3 ************************************************************************** Model Y X M = = = = 1 liking protest sexism Sample size 129 ************************************************************************** Outcome: liking Model Summary R .3654 R-sq .1335 F 6.4190 df1 3.0000 df2 125.0000 p .0004 Model constant sexism protest int_1 coeff 7.7062 -.4725 -3.7727 .8336 se 1.0449 .2038 1.2541 .2436 t 7.3750 -2.3184 -3.0084 3.4224 p .0000 .0220 .0032 .0008 LLCI 5.6382 -.8758 -6.2546 .3515 ULCI 9.7743 -.0692 -1.2907 1.3156 Interactions: int_1 protest X sexism R-square increase due to interaction(s): R2-chng F df1 df2 int_1 .0812 11.7126 1.0000 125.0000 p .0008 ************************************************************************* Conditional effect of X on Y at values of the moderator(s) SEXISM 4.3332 5.1170 5.9007 Effect -0.1607 0.4926 1.1459 se 0.2629 0.1872 0.2718 t -0.6113 2.6312 4.2156 p 0.5421 0.0096 0.0000 LLCI -0.6809 0.1221 0.6079 ULCI 0.3595 0.8632 1.6839 Values for quantitative moderators are the mean and plus/minus one SD from mean ******************** ANALYSIS NOTES AND WARNINGS ************************* Level of confidence for all confidence intervals in output: 95.00 Figure 1 . Output from PROCESS model 1 for a simple moderation model. %process (data=protest,vars=protest sexism liking,y=liking,x=protest,m=sexism,model=74, boot=10000,save=modboot); All we want out of PROCESS at this point is the file of bootstrap estimates of the regression coefficients that this command produces. Nevertheless, I provide the output in Figure 2 so you can verify that the results it produces for the model of Y are identical to the results for the model generated by PROCESS model 1 in Figure 1. It is also helpful in explaining where to find the bootstrap estimates in the file this command produces. Observe that the regression coefficient for the product of X and M is indeed 0.834, just as produced by model 1. Indeed, all of the regression coefficients, standard errors, t and p-values, and confidence intervals in this section of model 74 output correspond to the PROCESS model 1 output. 3 c COPYRIGHT 2015 BY ANDREW F. HAYES. DO NOT POST ONLINE. ⃝ 4 ***************** PROCESS Procedure for SPSS Release 2.11 **************** Written by Andrew F. Hayes, Ph.D. www.afhayes.com Documentation available in Hayes (2013). www.guilford.com/p/hayes3 ************************************************************************** Model = 74 Y = liking X = protest M = sexism Sample size 129 ************************************************************************** Outcome: sexism Model Summary R .0402 R-sq .0016 F .2058 df1 1.0000 df2 127.0000 p .6509 Model coeff 5.0710 .0674 constant protest se .1228 .1487 t 41.2998 .4536 p .0000 .6509 LLCI 4.8280 -.2267 ULCI 5.3139 .3616 ************************************************************************** Outcome: liking Model Summary R .3654 R-sq .1335 F 6.4190 df1 3.0000 df2 125.0000 p .0004 Model constant sexism protest int_1 coeff 7.7062 -.4725 -3.7727 .8336 se 1.0449 .2038 1.2541 .2436 t 7.3750 -2.3184 -3.0084 3.4224 p .0000 .0220 .0032 .0008 LLCI 5.6382 -.8758 -6.2546 .3515 ULCI 9.7743 -.0692 -1.2907 1.3156 Interactions: int_1 sexism X protest ******************** DIRECT AND INDIRECT EFFECTS ************************* Indirect effect(s) of X on Y: This is a simple moderation model equivalent to PROCESS model 1 Mediator sexism Effect -.0319 Boot SE .0878 BootLLCI -.3262 BootULCI .0773 ************************************************************************** Figure 2 . Excerpt of output from a PROCESS hack to produce a simple moderated regression using model 74. Compare to Figure 1. c COPYRIGHT 2015 BY ANDREW F. HAYES. DO NOT POST ONLINE. ⃝ Figure 3 . The SPSS data file containing bootstrap regression coefficients produced by the SAVE option. Statistics COL3 N Valid Missing Percentiles 2.5 97.5 COL4 COL5 COL6 10000 10000 10000 0 0 0 10000 0 4.9025 -1.1496 -7.1123 .2134 10.8640 .1079 -.7484 1.5399 Figure 4 . The distribution of 10,000 bootstrap estimates of the four regression coefficients in Equation 1. The save option used in the PROCESS command above produces a data file of 10,000 bootstrap estimates of each of the regression coefficients in the model, as in Figure 3. As can be seen in Figure 3, there are six columns in the bootstrap file. The estimates for b3 are in the sixth column because b3 is the sixth regression parameter estimate in the PROCESS model 74 output when you scan it from top to bottom. For a percentile bootstrap confidence interval, we need to find the two values in the 10,000 bootstrap estimates that define the 2.5th and 97.5th percentile of the distribution. Although our problem requires this information only for b3 , the SPSS code below generates these percentiles not just for the estimates of b3 but for every regression parameter in the moderation component of the model (i.e., i1 , b1 , b2 , and b3 in equation 1). frequencies variables = col3 to col6/format notable/statistics mean stddev /percentiles 2.5 97.5. In SAS, use proc means data=modboot;var col3 col4 col5 col6;run; proc univariate data=modboot noprint; var col3 col4 col5 col6; output out=percent pctlpts=2.5 97.5 pctlpre=col3P col4P col5P col6P; 5 c COPYRIGHT 2015 BY ANDREW F. HAYES. DO NOT POST ONLINE. ⃝ 6 run; proc print data=percent;run; The SPSS version of this code generates the output in Figure 4. As can be seen in the section of output for COL6, 95% of the bootstrap estimates for b3 were between 0.213 and 1.540. This is a bonafide 95% bootstrap confidence interval for the regression coefficient for XM in the simple moderation model represented by equation 1. Bootstrap Confidence Intervals for Conditional Effects PROCESS models 1, 2, and 3 automatically probe an interaction (whether statistically significant or not) by producing estimates and inferential tests of the effect of X on Y conditioned on various values of the moderator or moderator(s)—the so-called “simple slopes”. In this example, PROCESS model 1 generates the estimate of the effect of the lawyer’s decision to protest or not on how she was perceived for participants “relatively low” (a standard deviation below the sample mean), “moderate” (the sample mean), and “relatively high” (a standard deviation above the mean) in their perceived pervasiveness of sex discrimination (see the bottom of Figure 1). The inferential tests PROCESS generates are “normal theory”-based tests and carry with them all the standard assumptions of regression. You might want to generate bootstrap confidence intervals for these conditional effects if these assumptions bother you or you’d just rather not make them. In equation 1, the conditional effect of X on Y conditioned on a given value of M is θ(X→Y )|M = b1 + b3 M By substituting values of M into this equation, one generates the effect of X on Y at that value of M . PROCESS does this automatically and also produces the standard error, a t and p-value, and normal theory confidence intervals. To generate bootstrap confidence intervals for the conditional effect of X, we can use the file of bootstrap estimates already generated by this hack. In the bootstrap coefficient file produced earlier, b1 is in column 5 and b3 is in column 6. Thus, the SPSS code below creates three new columns that are bootstrap estimates of the conditional effect of X for those a standard deviation below the sample mean perceived pervasiveness of sex discrimination (M = 4.333), at the mean (M = 5.117) and a standard deviation above the mean (M = 5.901). You could substitute different values of M in the code below if you wanted. compute low=col5+col6*4.333. compute mod=col5+col6*5.117. compute high=col5+col6*5.901. frequencies variables = low mod high/format notable/statistics mean stddev /percentiles 2.5 97.5. The resulting output can be found in Figure 5. The 2.5th and 97.5th percentiles are the lower and upper bounds, respectively, of 95% bootstrap confidence intervals for the conditional effects at low, moderate, and high values of the moderator. For those c COPYRIGHT 2015 BY ANDREW F. HAYES. DO NOT POST ONLINE. ⃝ Statistics N Valid Missing Mean Std. Deviation Percentiles low mod 10000 10000 high 0 0 0 -.1629 .4960 1.1549 .42581 10000 .24478 .22550 2.5 -.6472 .0639 .3726 97.5 .3125 .9558 2.0284 Figure 5 . The distribution of 10,000 bootstrap estimates of the conditional effect of the decision to protest on liking for those low, moderate, and high in perceived pervasiveness of sex discrimination low in perceived pervasiveness of sex discrimination, the conditional effect of the lawyer’s behavior on how she was perceived is between −0.647 and 0.313 with 95% confidence. The corresponding bootstrap confidence intervals for the conditional effect among those moderate and high in perceived pervasiveness of sex discrimination are 0.064 to 0.956, and 0.373 and 2.028, respectively. SAS users can implement this trick using the code below. data modboot;set modboot; low=col5+col6*4.333;mod=col5+col6*5.117;high=col5+col6*5.901;run; proc means data=modboot;var low mod high;run; proc univariate data=modboot noprint; var low mod high; output out=percent pctlpts=2.5 97.5 pctlpre=lowP modP highP;run; proc print data=percent;run; Variants of This Hack There are some variants of this hack for generating bootstrap confidence intervals for regression coefficients in a moderation-only model that are equally effective. For example, you could produce the product of X and M in the data yourself and use it as a covariate in a simple mediation model, which is model 4 in PROCESS. The SPSS code below would work: compute proxsex=protest*sexism. process vars=protest sexism liking proxsex/y=liking/x=protest/m=sexism/model=4 /boot=10000/save=1. In SAS, try data protest;set protest;proxsex=protest*sexism;run; %process (data=protest,vars=protest sexism liking proxsex,y=liking,x=protest,m=sexism, model=4,boot=10000,save=1); 7 c COPYRIGHT 2015 BY ANDREW F. HAYES. DO NOT POST ONLINE. ⃝ 8 In standard simple mediation model notation such as in Chapter 4 of Hayes (2013), this PROCESS command estimates M = i1 + aX + eM Y = i2 + c′ X + bM + c′2 XM + eY (3) but notice that equation 3 is a simple moderation model with M moderating the effect of X on Y , just like equation 1. So the bootstrap distribution of c′2 produced by this code is the bootstrap distribution of b3 in equation 1. Once the file of bootstrap regression coefficients is generated, a 95% bootstrap confidence interval can be constructed as above. A disadvantage of this approach relative to using model 74 is that model 4 does not contain a moderation component as far as PROCESS knows. Thus, the center option to automate the mean centering of X and M is not available, as this option is available only in models that contain a moderation component. Although you could specify center=1 in the command line, PROCESS would ignore it because PROCESS does not recognize model 4 as a moderation model. As discussed in section 9.4 of Hayes (2013), mean centering is not required, but if it is something you want to do, you’d have to manually mean center X and M , then generate the product, and use mean centered X and M in the PROCESS command above. By contrast the approach using model 74 would allow you to automate the mean centering of X and M with the center command. Another variant on this hack is to use PROCESS model 5. This is a simple mediation model that allows the direct effect of X on Y to be moderated. In equation form, model 5 is M = i1 + aX + eM Y = i2 + c′1 X + c′2 W + c′3 XW + bM + eY (4) Notice that equation 4 is a simple moderation model like equation 1 estimated by PROCESS model 1 but with M as a covariate. So this hack works only if you have at least one covariate in the moderation model because you have to specify a variable as M in model 5. But this approach doesn’t require the construction of a product prior to executing PROCESS as does the variant above using model 4, the center option can be used if desired like when using model 74, and it also produces estimates of the conditional effect of X in the output, just as model 1 does. To illustrate, suppose there were two additional variables in the PROTEST data file containing the age of the participant as well as his or her education, and both of these were to be used as covariates. The PROCESS code below estimates the model and generate 10,000 bootstrap estimates of every regression parameter: process vars=protest sexism liking age educ/y=liking/x=protest/m=age/w=sexism/model=5 /boot=10000/save=1. The equivalent code in SAS is c COPYRIGHT 2015 BY ANDREW F. HAYES. DO NOT POST ONLINE. ⃝ %process (data=protest,vars=protest sexism liking age educ,y=liking,x=protest,m=age, w=sexism,model=5,boot=10000,save=1); In this code, I specified one of the covariates as M , which PROCESS will treat as a mediator, and I left the other covariate unassigned. It doesn’t matter which covariate is assigned to M . The output will include a model of M and direct and indirect effects, but all this can be ignored as irrelevant. The goal is to generate a file that can be used to construct bootstrap confidence intervals for the regression coefficients in the model of Y . This code accomplishes that while also producing the table of conditional effects of X that PROCESS model 1 generates. References Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. New York, NY: The Guilford Press. 9