library(DiagrammeR)
grViz(diagram = "digraph flowchart {
rankdir=LR;
node [fontname = arial, shape = oval]
tab1 [label = '@@1']
tab2 [label = '@@2']
tab1 -> tab2 [dir='both'];
}
[1]: 'X'
[2]: 'Y'
")
4. Mediation
Emma Mills
This page is under construction for 24/25 and may be subject to change before the teaching week.
Lecture
Watch the lecture on mediation theory here
Watch the mediation demonstration here
Differences between means
T-tests helps answer the question: + ‘Is there a difference between two groups in performance on X?’
ANOVA helps answer the question: + ‘Is there a difference between two or more groups / factors in performance on X?’
With a 3rd variable we can see if this affects performace at different levels – we can introduce an interaction term…
Association: Correlation -> Regression
Measures of association help answer the question ‘What is the relationship between two variables?’
Correlation looks at pairs of variables
Regressions chooses one as the outcome variable & one as the predictor variable + Simple regression = 1 outcome and 1 predictor
grViz(diagram = "digraph flowchart {
rankdir=LR;
node [fontname = arial, shape = oval]
tab1 [label = '@@1']
tab2 [label = '@@2']
tab1 -> tab2;
}
[1]: 'X'
[2]: 'Y'
")
- Multiple regression = 1 outcome and 1+ predictor
- Interactions also possible
grViz(diagram = "digraph flowchart {
rankdir=LR;
node [fontname = arial, shape = oval]
tab1 [label = '@@1']
tab2 [label = '@@2']
tab3 [label = '@@3']
tab1 -> tab3;
tab2 -> tab3;
{rank=same; tab1, tab2}
}
[1]: 'X1'
[2]: 'X2'
[3]: 'Y'
")
Mediation: a causal model
Mediation helps answer the question: + ‘how does a predictor variable (X) influence / effect the outcome variable (Y)?’
We assume a third variable is involved + The third variable is called the mediator (M) + It is situated between the predictor (X) and outcome variable (Y)
grViz(diagram = "digraph {
rankdir=LR;
node [fontname = arial, shape = circle]
ranksep = .5;
tab1 [label = 'X']
tab2 [label = 'M']
tab3 [label = 'Y']
tab1 -> tab3;
tab1 -> tab2 -> tab3;
}
")
Mediation: parts of the model
Unmediated relationship + path of total effect = c
grViz(diagram = "digraph flowchart {
rankdir=LR;
node [fontname = arial, shape = oval]
tab1 [label = '@@1']
tab2 [label = '@@2']
tab1 -> tab2 [label = 'c'];
}
[1]: 'X'
[2]: 'Y'
")
Mediated relationship + mediator variable (M) + path of indirect effect = ab + a = X predicts M + b = M predicts Y + path of direct effect = c’ + ab + c’ = c = total effect of X on Y + either partial or full mediation
grViz(diagram = "digraph {
rankdir=LR;
node [fontname = arial, shape = circle]
ranksep = .5;
tab1 [label = 'X']
tab2 [label = 'M']
tab3 [label = 'Y']
tab1 -> tab3 [label = 'c`'];
tab1 -> tab2 [label = 'a'];
tab2-> tab3 [label = 'b'];
}
")
Mediation: conditions
- X need not be a significant predictor of Y
- M must not be a primary predictor variable
- M must not be any of the study conditions
- M must be dependent upon X
- M must reduce or eradicate the impact of X on Y
Mediation: Different types: partial and full
When path c’ is reduced but non-zero Mediation is said to be partial
grViz(diagram = "digraph {
rankdir=LR;
node [fontname = arial, shape = circle]
ranksep = .5;
tab1 [label = 'X']
tab2 [label = 'M']
tab3 [label = 'Y']
tab1 -> tab3 [label = 'c` >0'];
tab1 -> tab2 [label = '>0 a'];
tab2-> tab3 [label = 'b >0'];
}
")
When path c’ is at 0 Mediation is said to be complete or full
grViz(diagram = "digraph {
rankdir=LR;
node [fontname = arial, shape = circle]
ranksep = .5;
tab1 [label = 'X']
tab2 [label = 'M']
tab3 [label = 'Y']
tab1 -> tab3 [label = 'c` =0'];
tab1 -> tab2 [label = '>0 a'];
tab2-> tab3 [label = 'b >0'];
}
")
But be mindful of power – bootstrap method offers strongest solution here.
Additional assumptions to the linear model assumptions
A mediated model follows all the assumptions of linear regression
As an explanatory process, a predictor (X) can be said to be ‘causally’ related to the outcome (Y) when: + X is associated with Y + X precedes changes in Y + No other unmeasured variables are related to X and also affect Y
X should / could precede M in time
M should significantly predict Y but Y could also significantly predict M + M and Y could be correlated if they are both causally related to X. + Swapping the order of variables can check this High power +Study design can help this: from weakest to strongest for assumptions +Cross-sectional design (v. popular in student projects – beware…) +Panel designs that allow for staggered measurement in waves +Experimental designs with random assignment and manipulated variables
Method: 4 step approach (Baron & Kenny, 1986)
Step 1:
Test path of total effect = Test the significance of slope c = Linear regression of X on Y 𝑌=𝑏_(0 ) + 𝑏_1 𝑋 = a simple, straightforward simple regression
Step 2 & 3:
Test path of indirect effect a and b = Test the significance of slope a and slope b in two independent models = Linear regression of X on M M =𝑏_(0 )+ 𝑏_1 𝑋 = Linear regression of M on Y while controlling for X = Y =𝑏_(0 )+𝑏_1 𝑋+𝑏_2 𝑀
Step 4:
Test if c’ < c = \(Y=b_0= b_1X + b_2M\) (step 3 and 4 are in the same equation / model)
If c’ is significant = partial mediation If c’ is non- significant = full mediation
Method: Bootstrap test(Preacher & Hayes, 2004, 2008)
- Automated process
- Resampling method for the indirect pathway using the model data with replacement
- Indirect pathway (ab) estimated for each set of sampled data
- Average = indirect effect estimate
- Generates confidence intervals also
mediation
package (Tingley et al., 2013) in R
Reporting Results
Report the indirect effect and its confidence intervals + This is generally the effect with the most power + A nonsignificant test for c’ may occur due to low power + i.e. give a Type II Error + Be careful of claiming full / complete mediation given this information Report each pathway with either its significance value or confidence interval + Pathways a, b, c’ and c Discuss how the additional assumptions of mediation analysis are met Muller et al. (2008) give further details.
Interpreting Results
- Benefits of a direct effect in the context of a significant indirect effect (partial mediation) – it informs theory development
- Size of the indirect effect indicates the strength of mediation
- Zhao et al. (2010) gave descriptive labels to mediation as a function of the directions of effect for the direct and indirect pathways:
- Complementary – effects for both pathways are in the same direction
- Competitive – effects for both pathways are in opposite directions
Extensions
Hayes (2018) talks about: + Moderated mediation + The mediator depends on a fourth variable (!eek) and it could be + partial moderated mediation + conditional moderated mediation + moderated moderated mediation
Out of scope for this module, but have a think when you are designing your studies!
Demonstration
Mediation analysis
In this demonstration, we will model a mediation analysis. The total effect of an unmediated relationship is below in pathway c.
<- c(0, "a", 0,
data 0, 0, 0,
"b", "c", 0)
<- matrix (nrow=3, ncol=3, byrow = TRUE, data=data)
M<- plotmat (M, pos=c(1,2),
plotname= c( "M",
"Supervision",
"Dissertation \nPerformance"),
box.type = "rect",
box.size = 0.12,
box.prop=0.5,
curve=0)
We are going to test for the mediator variable of self esteem
. This needs the prime mark added to the direct pathway “c” text!
<- c(0, "a", 0,
data 0, 0, 0,
"b", "c", 0)
<- matrix (nrow=3, ncol=3, byrow = TRUE, data=data)
M<- plotmat (M, pos=c(1,2),
plotname= c( "Self-esteem",
"Supervision",
"Dissertation \nPerformance"),
box.type = "rect",
box.size = 0.12,
box.prop=0.5,
curve=0)
Read in the data
First download the data file from here and upload to the R server.
<- read_sav("mediation exercise 2 data.sav")
d_full
head(d_full)
# A tibble: 6 × 3
supervision self_esteem dissertation_performance
<dbl> <dbl> <dbl>
1 2.67 3.14 3
2 2.67 3.57 3.56
3 3.33 3 3.41
4 3.33 2.57 1
5 2.67 4.71 2.26
6 2.67 3.29 2.41
Early in the construction of the script, I noticed that there were different levels of missingness across the models - this means that coefficients are estimated on different datasets, so we are introducing a potential source of systematic error if we do not correct for this.
Checking for missingness:
summary(d_full)
supervision self_esteem dissertation_performance
Min. :1.667 Min. :1.571 Min. :1.000
1st Qu.:3.333 1st Qu.:3.286 1st Qu.:2.380
Median :4.000 Median :3.857 Median :3.037
Mean :3.786 Mean :3.864 Mean :3.028
3rd Qu.:4.333 3rd Qu.:4.500 3rd Qu.:3.593
Max. :5.000 Max. :5.714 Max. :5.815
NA's :2 NA's :3
For the purposes of this analysis, I will remove the observations (rows) with NA values. This is not the best way of working with missingness, but for the purposes of the demonstration it is ok.
<- na.omit(d_full) # 4 observations removed
d summary(d) # no NA values listed
supervision self_esteem dissertation_performance
Min. :1.667 Min. :1.571 Min. :1.000
1st Qu.:3.333 1st Qu.:3.286 1st Qu.:2.370
Median :4.000 Median :3.857 Median :3.074
Mean :3.791 Mean :3.853 Mean :3.030
3rd Qu.:4.333 3rd Qu.:4.429 3rd Qu.:3.593
Max. :5.000 Max. :5.714 Max. :5.815
I am going to copy and rename the variables to save on typing
X = supervision Y = dissertation_performance M = self_esteem
<- d %>%
d mutate(X = supervision,
Y = dissertation_performance,
M = self_esteem)
Longhand - Steps of Baron & Kenny (1986)
Four independent linear regression models
- The effect of X on Y (total effect = pathway c)
- The effect of X on M (indirect effect pathway a)
- The effect of M on Y (indirect effect pathway b) while controlling for X
- The effect of X and M on Y (for direct effect estimation = pathway c’)
When running your models, you need to assign them to objects in the environment to then be able to use them in a call to the mediation
package.
Step 1: Test the total effect - pathway c
\[ Y = b_0 + b_1 * X + e \]
via a simple regression model:
<- summary(lm(Y ~ X, d))) (fit_total
Call:
lm(formula = Y ~ X, data = d)
Residuals:
Min 1Q Median 3Q Max
-1.83011 -0.56623 -0.04308 0.51478 2.25553
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.3718 0.4679 2.931 0.004470 **
X 0.4375 0.1209 3.620 0.000532 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8361 on 75 degrees of freedom
Multiple R-squared: 0.1488, Adjusted R-squared: 0.1374
F-statistic: 13.11 on 1 and 75 DF, p-value: 0.0005321
The total effect of our predictor on our outcome is significant. In other words, supervision on dissertation performance is significant (p < .001). A change in one unit of supervision is associated with an increase in dissertation performance of 0.44.
There is an effect that can be tested for mediation.
Step 2: Test the a pathway of the indirect effect
\[ M = b_0 + b_1 * X + e \] a second simple regression model, using X as a predictor but this time, M (self esteem here) is our outcome variable:
<- summary(lm(M ~ X, d))) (fit_indirecta
Call:
lm(formula = M ~ X, data = d)
Residuals:
Min 1Q Median 3Q Max
-2.34553 -0.48838 0.02337 0.61288 1.49353
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.7018 0.4406 6.132 3.74e-08 ***
X 0.3038 0.1138 2.670 0.0093 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7872 on 75 degrees of freedom
Multiple R-squared: 0.08679, Adjusted R-squared: 0.07461
F-statistic: 7.128 on 1 and 75 DF, p-value: 0.0093
The indirect effect pathway a (X on M) is also significant (p = .009). A change in one unit of supervision is associated with an increase in self-esteem of 0.30.
So now we know that X and M share some variance - they are correlated. We have met one of the assumptions that we need to be able to perform a mediation analysis.
Step 3 & 4: Test the b pathway of the indirect effect & the direct effect (pathway c’)
\[ Y = b_0 + b_1 * X + b_2 * M + e \] A multiple regression model, with Y (dissertation performance) as our outcome variable, and X (supervision) and M (self-esteem) as predictors. Remember that this model is controlling for the effect of X on Y, because interpreting one predictor in a multiple regression model always assumes that the effect of the other predictors are already taken care of, or controlled for.
<- summary(lm(Y ~ X + M, d))) (fit_indirectb
Call:
lm(formula = Y ~ X + M, data = d)
Residuals:
Min 1Q Median 3Q Max
-1.39749 -0.48798 0.02245 0.44603 1.29565
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.46168 0.44402 -1.040 0.3018
X 0.23135 0.09794 2.362 0.0208 *
M 0.67861 0.09497 7.145 5.26e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6475 on 74 degrees of freedom
Multiple R-squared: 0.4963, Adjusted R-squared: 0.4827
F-statistic: 36.46 on 2 and 74 DF, p-value: 9.564e-12
The indirect effect pathway b (M on Y), while controlling for X is also significant (p < .001). A change in one unit of self-esteem is associated with an increase in dissertation-performance of 0.68 of a unit. The predictor (X) relationship with the outcome (Y and pathway c’ - the direct effect) remains significant also (p < .02 , but reduced relative to the step 1 model coefficient (step 1 \(b\) = 0.44, step 4 \(b\) = 0.23).
Since the direct pathway c’ is significant, we can say that we have a partially mediated effect of self-esteem on the relationship between supervision and dissertation performance. Supervision predicts self esteem and dissertation performance, while self esteem also predicts dissertation performance.
If instead the X coefficient in the model above had been > .05 i.e. no significant, we could have claimed a full or complete mediation of self esteem on the relationship between supervision and dissertation performance
Using the power of R and the mediation
package
The mediation
package is called by the library() function loaded at the top of the document.
- It takes the models for pathways a and b (
fit_indirecta
andfitindirectb
here), - It needs us to tell it the name of the predictor or treatment variable and the name of the mediator variable as labelled in the models
- and we set the
boot
argument to T for TRUE, to be able to generate confidence intervals on our co-efficients.
<- mediate(fit_indirecta,
results
fit_indirectb, treat = 'X',
mediator = 'M',
boot = T,
dropobs = TRUE)
Running nonparametric bootstrap
summary(results)
Causal Mediation Analysis
Nonparametric Bootstrap Confidence Intervals with the Percentile Method
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.2062 0.0620 0.39 0.002 **
ADE 0.2313 0.0439 0.41 0.020 *
Total Effect 0.4375 0.2087 0.66 <2e-16 ***
Prop. Mediated 0.4712 0.1837 0.85 0.002 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 77
Simulations: 1000
- ACME stands for average causal mediation effects and is the product of pathway a and pathway b from
fit indirecta
(X = 0.3037992) andfit_indirectb
(M = 0.6786069). - ADE stands for average direct effects or pathway c’. This is the X coefficient in our
fit_indirectb
- Total Effect does what it says on the tin. It is the sum of the direct and indirect effect, ACME + ADE, and also calculated as X in model
fit_total
. - Prop. Mediated is the proportion of the effect of X on Y that goes through M. We divide ACME (or ab) by the total effect (c).
plot(results)
Reporting the mediation analysis
and to use our diagram from the top of the document:
We are going to test for the mediator variable of self esteem
. (This needs the prime mark added to the direct pathway “c” text!)
<- c(0, 0.30, 0,
data 0, 0, 0,
0.68, 0.23, 0)
<- matrix (nrow=3, ncol=3, byrow = TRUE, data=data)
M<- plotmat (M, pos=c(1,2),
plotname= c( "Self-esteem",
"Supervision",
"Dissertation \nPerformance"),
box.type = "rect",
box.size = 0.12,
box.prop=0.5,
curve=0)
(Remember that these data are not standardised so we cannot compare between them for strength of relationships!)
The effect of supervision on dissertation performance was partially mediated via self-esteem. The effect of supervision on dissertation performance and the effect of self-esteem on dissertation performance were independently significant predictors. The indirect effect equals (.3)*(0.68) = .0.21. We tested the significance of this indirect effect using bootstrapping procedures. We computed the average indirect effect over 1,000 bootstrapped samples with 95% confidence intervals (bootstrapped indirect effect = 0.21 95% CI [0.06, 0.38]). Since the confidence intervals do not cross zero, we infer statistical significance.
Lab Tasks
You can download the .zip folder that’s needed for this lab from here, which contains a starter .Rmd file and the data needed.
Use the lecture script as a model to help you.
Descriptive Statistics: describe the variables:
- Prompts: continuous / categorical? Levels? Which one is the reference level if categorical?
- Report the outcome variable (Y)
- Report the predictor variables (Xs)
- Summary stats for continuous variables = calculate and report the mean and sd values for each variable.
Mediation analysis – 4 step approach
Step 1: Total effect: Run the code and write a short paragraph of results
Step 2: Indirect effect path a: Run the code and write a short paragraph of results.
Step 3 & 4: Indirect effect path b and direct effect path: Run the code and write a short paragraph of results.
Questions
- Is there a total effect?
- Is there an indirect path effect for X predicting the mediator M?
- Is there an effect for indirect path of mediator M on outcome Y?
- Is there an effect of direct path of X on Y?
- Is there a mediated effect of coffee on problem solving?
- If so, what type of mediation is it? 7.Why have you decided that type of mediation?
Bootstrap method
Run the bootstrap method Report the following values to 2 dp.
- What is the average causal mediation effect value?
- What is the average direct effect value?
- What is the total effect value?
- What proportion of variance goes through the indirect pathway?
Draw the mediation analysis diagram with the correct values on the pathways.
Report the bootstrap analysis and the results.
Submit Scripts
Remember to submit your group scripts if you want to receive and see feedback on your own and other groups’ scripts.