Department of Psychology, Lancaster University
Tip
Ask me anything:
Figure 1: Plot showing how the effect of reading strategy on accuracy varies at different education levels, given an interaction between the effects of reading strategy and education level
We are working together to develop concepts:
We are working together to develop skills:
Concepts – To engage with the real challenge in psychological science:
Skills – To be able to code linear models that include interaction effects:
Skills – To be able to visualize, interpret, and report tests of interaction effects:
flickr, Cat Walker ‘crowd’
Psychological and social processes show much more variability than the usual phenomena in the physical sciences (Gelman, 2015)
Human variation means that there are three levels of uncertainty when we look at sample data (McElreath, 2020):
We often test who we can – convenience sampling – and who we can test has an impact on the quality of our evidence (Bornstein et al., 2013)
Critical evaluation
Note
Comprehension accuracy depends on (1.) experience (measured here using the HLVA and SHIPLEY) and (2.) reasoning ability (measured using strategy, the reading strategy score) (Freed et al., 2017)
It will become important that we distinguish between numeric variables (like age) and categorical variables (factors like education level)
Figure 3: Scatterplots showing the potential association between accuracy of comprehension and variation on each of a series of potential predictor variables.
Tip
Is the shape or size of an effect different in different groups of people, in different contexts, or for different values of a third variable (a moderator)?
The impact of health literacy (measured using the HLVA) may interact with the impact of having broad vocabulary knowledge (measured using the SHIPLEY): the way that outcome mean.acc is related to predictor HLVA could be different for different values of a third variable (SHIPLEY).
The way that mean.acc is related to HLVA may depend on SHIPLEY \(\rightarrow\) this is what makes a possible interaction.
So far, we have fitted models like lm(mean.acc ~ SHIPLEY + HLVA + strategy + AGE), predicting the outcome (mean.acc) from predictors like SHIPLEY:
Call:
lm(formula = mean.acc ~ SHIPLEY + HLVA + strategy + AGE, data = all.data)
Residuals:
Min 1Q Median 3Q Max
-0.38654 -0.07169 0.01129 0.07683 0.31901
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.891e-01 4.856e-02 8.013 8.80e-15 ***
SHIPLEY 4.975e-03 1.325e-03 3.754 0.000196 ***
HLVA 1.831e-02 2.582e-03 7.090 4.92e-12 ***
strategy 1.736e-03 7.974e-04 2.177 0.029984 *
AGE -7.984e-05 3.954e-04 -0.202 0.840082
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1124 on 473 degrees of freedom
Multiple R-squared: 0.2082, Adjusted R-squared: 0.2015
F-statistic: 31.09 on 4 and 473 DF, p-value: < 2.2e-16
Reading the summary:
- Estimate: the estimate for the slope of the effect of variation in each predictor
- Std. Error: the standard error for the estimate of each effect
- t value and Pr(>|t|): the test statistic and p-value for null hypothesis tests of each coefficient

\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \epsilon\)
Outcome \(y\) is calculated as a sum by adding together:
- the intercept \(\beta_0\)
- \(\beta_1\) multiplied by \(x_1\), e.g., the slope for AGE multiplied by a person's age
- + any number of other slope-times-variable terms
- + the error \(\epsilon\)

\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \epsilon\)
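As a minimal sketch of this additive sum (with made-up coefficient values, not estimates from these data):

```r
# -- hypothetical coefficients, for illustration only
b0 <- 0.39    # intercept
b1 <- 0.005   # slope for predictor x1
b2 <- 0.018   # slope for predictor x2

# -- the predicted outcome for one person is just the sum of the terms
x1 <- 30
x2 <- 10
y_hat <- b0 + b1 * x1 + b2 * x2
y_hat
```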
Note
Models like this are said to be additive because the slope of the effect on the outcome (mean.acc) for any one predictor (e.g., the relation between mean.acc and HLVA) is assumed to be constant: the relation between mean.acc and HLVA would be the same whatever the value of SHIPLEY.
For discussion, we consider an example model in which we focus on the interaction between the effects of health literacy (HLVA) and vocabulary knowledge (SHIPLEY)
We can specify interactions in two different ways:
- using the * operator: lm(mean.acc ~ HLVA * SHIPLEY) includes the main effects of HLVA and SHIPLEY plus their interaction
- using the : operator: lm(mean.acc ~ HLVA + SHIPLEY + HLVA:SHIPLEY) names the main effects and the interaction term explicitly
Call:
lm(formula = mean.acc ~ HLVA * SHIPLEY, data = all.data)
Residuals:
Min 1Q Median 3Q Max
-0.37986 -0.06840 0.01342 0.07652 0.29872
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4806201 0.1445375 3.325 0.000952 ***
HLVA 0.0153042 0.0168363 0.909 0.363810
SHIPLEY 0.0045132 0.0043491 1.038 0.299919
HLVA:SHIPLEY 0.0001132 0.0004922 0.230 0.818173
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1128 on 474 degrees of freedom
Multiple R-squared: 0.2004, Adjusted R-squared: 0.1953
F-statistic: 39.59 on 3 and 474 DF, p-value: < 2.2e-16
You see each Estimate for the Coefficients: – the effects of:
- HLVA
- SHIPLEY
- HLVA:SHIPLEY – the coefficient for the interaction tells us how the slope of the HLVA effect varies, depending on SHIPLEY scores
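The two ways of specifying the interaction are equivalent; a quick check with simulated data (a sketch, since all.data is not reproduced here):

```r
# -- simulate data with the same variable names as the example
set.seed(1)
sim.data <- data.frame(HLVA = rnorm(100, 8, 2),
                       SHIPLEY = rnorm(100, 32, 4))
sim.data$mean.acc <- 0.4 + 0.015 * sim.data$HLVA +
  0.005 * sim.data$SHIPLEY + rnorm(100, 0, 0.1)

# -- * includes main effects plus the interaction;
# -- : names the interaction term explicitly
m.star  <- lm(mean.acc ~ HLVA * SHIPLEY, data = sim.data)
m.colon <- lm(mean.acc ~ HLVA + SHIPLEY + HLVA:SHIPLEY, data = sim.data)

all.equal(coef(m.star), coef(m.colon))  # identical coefficient estimates
```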
The standard way to represent an interaction in algebra is:
\[y \sim \beta_0 + \beta_1X + \beta_2Z + \color{red}{\beta_3XZ} + \epsilon\]
For this model, the average change in the outcome, \(y\), is calculated by adding together the intercept \(\beta_0\), the main effects \(\beta_1X\) and \(\beta_2Z\), and the interaction term \(\color{red}{\beta_3XZ}\):

\[y \sim \beta_0 + \beta_1X + \beta_2Z + \color{red}{\beta_3XZ} + \epsilon\]
To make this concrete, we can translate this model using the variables from our example data:
\[ \begin{aligned} mean.acc &= {\beta_0} + {\beta}_{1}(HLVA) + {\beta}_{2}(SHIPLEY)\ + \\ &\quad {\beta}_{3}(HLVA \times SHIPLEY) + \epsilon \end{aligned} \]
Here \(X\) stands for HLVA, \(Z\) stands for SHIPLEY, and \(XZ\) stands for the HLVA:SHIPLEY interaction:

\[y \sim \beta_0 + \beta_1X + \beta_2Z + \color{red}{\beta_3XZ} + \epsilon\]
\[ mean.acc = \beta_0 + \beta_{1}(HLVA) + \beta_{2}(SHIPLEY) + \beta_{3}(HLVA \times SHIPLEY) + \epsilon \]
- \(\beta_1\) estimates the effect of HLVA on outcome mean.acc when SHIPLEY = 0.
- \(\beta_2\) estimates the effect of SHIPLEY on outcome mean.acc when HLVA = 0.
- \(\beta_3\) estimates the change in the effect of HLVA on outcome mean.acc, given different values of SHIPLEY.

The key idea here is that the effect of one predictor variable (HLVA) on the outcome is a function of another predictor variable (SHIPLEY)
We can state an equation for calculating the slope of the predicted effect of \(HLVA\) on outcome \(mean.acc\) at any value of \(SHIPLEY\) as:
\[\textrm{The HLVA slope at some SHIPLEY score} = \beta_1 + \beta_3 \times SHIPLEY\]

In the presence of an interaction, the effect of the predictor HLVA is a composite, reflecting both the HLVA slope and its dependence on SHIPLEY. It does not matter in which order we specify the terms: the HLVA*SHIPLEY interaction is the same as the SHIPLEY*HLVA interaction:
Call:
lm(formula = mean.acc ~ SHIPLEY * HLVA, data = all.data)
Residuals:
Min 1Q Median 3Q Max
-0.37986 -0.06840 0.01342 0.07652 0.29872
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.4806201 0.1445375 3.325 0.000952 ***
SHIPLEY 0.0045132 0.0043491 1.038 0.299919
HLVA 0.0153042 0.0168363 0.909 0.363810
SHIPLEY:HLVA 0.0001132 0.0004922 0.230 0.818173
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1128 on 474 degrees of freedom
Multiple R-squared: 0.2004, Adjusted R-squared: 0.1953
F-statistic: 39.59 on 3 and 474 DF, p-value: < 2.2e-16
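Using the coefficient estimates from the summary above, the slope of the HLVA effect at a given SHIPLEY score is \(\beta_1 + \beta_3 \times SHIPLEY\), which we can compute directly:

```r
# -- estimates taken from the model summary above
b1 <- 0.0153042   # HLVA
b3 <- 0.0001132   # HLVA:SHIPLEY

# -- HLVA slope at a given SHIPLEY score: b1 + b3 * SHIPLEY
hlva.slope <- function(shipley) b1 + b3 * shipley

hlva.slope(c(20, 30, 40))
```

The slope changes only very slightly across SHIPLEY scores, consistent with the small, non-significant interaction estimate.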
We can describe the pattern as variation in the HLVA effect for different ranges of SHIPLEY, or equivalently as variation in the SHIPLEY effect for different ranges of HLVA. Here, however, p > .05 for the interaction, so these data do not give us evidence that the two effects moderate each other.
Tip
So far, we have looked at interactions between the effects of numeric variables (HLVA and SHIPLEY). To explore interactions between numeric variables and factors, we take as an example the interaction between the effects on the outcome (mean.acc) of:
- strategy (score on the reading strategy questionnaire), a numeric variable
- EDUCATION (coded as Further or Higher), a factor

These two kinds of variables are treated differently in R:
- numeric variables, like health literacy score (HLVA) or reading strategy score (strategy), where numbers represent measured quantities
- factors, like EDUCATION, where EDUCATION level can be Further or Higher: EDUCATION encodes differences between levels or between groups
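A small self-contained sketch (hypothetical values, not all.data) of how R dummy-codes a two-level factor:

```r
# -- a factor with two levels; "Further" is the reference level
# -- (levels default to alphabetical order)
education <- factor(c("Further", "Higher", "Higher", "Further"))
levels(education)

# -- model.matrix() shows the 0/1 dummy coding lm() uses internally:
# -- the column educationHigher is 1 for Higher, 0 for Further
model.matrix(~ education)
```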
EDUCATION in a linear model
Call:
lm(formula = mean.acc ~ EDUCATION + strategy, data = all.data)
Residuals:
Min 1Q Median 3Q Max
-0.42828 -0.06774 0.02177 0.09198 0.31055
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6032460 0.0415482 14.519 < 2e-16 ***
EDUCATIONHigher 0.0053666 0.0113014 0.475 0.635
strategy 0.0040226 0.0008266 4.866 1.55e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1229 on 475 degrees of freedom
Multiple R-squared: 0.04889, Adjusted R-squared: 0.04489
F-statistic: 12.21 on 2 and 475 DF, p-value: 6.75e-06
In the Coefficients:, the Estimate for the effect of EDUCATION is 0.0053666, labelled EDUCATIONHigher. The estimate of 0.0053666 tells us that, compared to people with Further education (the reference level), people with Higher education are predicted to score 0.0053666 more on outcome mean.acc
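We can check this dummy-coding logic by hand, using the estimates from the additive model summary above: at any fixed strategy score, the two groups differ only by the EDUCATIONHigher estimate.

```r
# -- estimates from the additive model summary above
b0 <- 0.6032460    # intercept: Further education, strategy = 0
b.ed <- 0.0053666  # EDUCATIONHigher: difference for Higher vs Further
b.st <- 0.0040226  # strategy slope

strategy <- 50
further <- b0 + b.st * strategy           # dummy variable = 0
higher  <- b0 + b.ed + b.st * strategy    # dummy variable = 1

higher - further  # equals b.ed, whatever the strategy score
```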
We can code the effect of the interaction between the effects of:
- education level (EDUCATION)
- reading strategy (strategy)

\[y \sim \beta_0 + \beta_1X + \beta_2Z + \color{red}{\beta_3XZ} + \epsilon\]
\[ \begin{aligned} mean.acc &= {\beta_0} + {\beta}_{1}(EDUCATION_{Higher}) + {\beta}_{2}(strategy)\ + \\ &\quad {\beta}_{3}(EDUCATION_{Higher} \times strategy) + \epsilon \end{aligned} \]
- \(\beta_2\) estimates the effect of strategy on outcome mean.acc when EDUCATION = Further (the baseline)
- \(\beta_1\) estimates the effect of EDUCATION (Further versus Higher) on outcome mean.acc when strategy = 0

\[y \sim \beta_0 + \beta_1X + \beta_2Z + \color{red}{\beta_3XZ} + \epsilon\]
\[ \begin{aligned} mean.acc &= {\beta_0} + {\beta}_{1}(EDUCATION_{Higher}) + {\beta}_{2}(strategy)\ + \\ &\quad {\beta}_{3}(EDUCATION_{Higher} \times strategy) + \epsilon \end{aligned} \]
- \(\beta_3\) estimates the change in the effect of strategy on outcome mean.acc when EDUCATION is Higher instead of Further

You see each Estimate for the Coefficients: – the effects of:
- EDUCATION
- strategy
- EDUCATION:strategy – the coefficient for the interaction tells us how the slope of the strategy effect varies, depending on EDUCATION level
Call:
lm(formula = mean.acc ~ EDUCATION * strategy, data = all.data)
Residuals:
Min 1Q Median 3Q Max
-0.43422 -0.06111 0.01700 0.09208 0.23754
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.692571 0.054831 12.631 <2e-16 ***
EDUCATIONHigher -0.199280 0.083325 -2.392 0.0172 *
strategy 0.002210 0.001100 2.008 0.0452 *
EDUCATIONHigher:strategy 0.004104 0.001656 2.479 0.0135 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1223 on 474 degrees of freedom
Multiple R-squared: 0.06106, Adjusted R-squared: 0.05512
F-statistic: 10.28 on 3 and 474 DF, p-value: 1.45e-06
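From the estimates above, the slope of the strategy effect within each education level follows directly from the interaction coefficient:

```r
# -- estimates from the interaction model summary above
b.strategy <- 0.002210   # strategy slope when EDUCATION = Further (reference)
b.interact <- 0.004104   # change in that slope for Higher vs Further

slope.further <- b.strategy
slope.higher  <- b.strategy + b.interact

c(Further = slope.further, Higher = slope.higher)
```

The strategy slope for the Higher education group is nearly three times that for the Further education group, which is what the plot of the interaction shows.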
Interactions are easier to interpret if we plot them:
We use lm() to fit the model, then plot_model() from the sjPlot library to show predictions:

# -- load libraries for plotting model predictions
library(sjPlot)
library(ggplot2)

# -- fit a model
data.lm.acc <- lm(mean.acc ~ EDUCATION*strategy,
                  data = all.data)

# -- set colour values for manual colour scale
cols.studies <- c("#BAE4B3", "#6BAED6")

# -- make plot
plot_model(data.lm.acc, type = "pred",
           terms = c("strategy", "EDUCATION"),
           colors = cols.studies) +
  theme_bw() +
  ylim(0, 1)

The resulting plot shows how the slope of the strategy effect is steeper (the effect is larger) for people with Higher education than with Further education
We fitted a linear model with mean comprehension accuracy as the outcome and education and reading strategy as predictors. The model is significant overall, \(F(3, 474) = 10.28, p < .001\), and explains 6% of variance (\(\text{adjusted } R^2 = .06\)). Accuracy of comprehension was lower for participants with Higher education (\(\beta = -.199, t = -2.39, p = .017\)), but comprehension was more accurate for people with better reading strategy (\(\beta = .002, t = 2.01, p = .045\)). A significant interaction (\(\beta = .004, t = 2.48, p = .014\)) suggests the strategy effect was larger for participants with Higher education.
Note
We typically code models like those we have seen using the structure lm(outcome ~ predictor.1 * predictor.2, data = dataset), where the * operator expands to the main effects plus their interaction.
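As a runnable template (variable names hypothetical, fitted here to simulated stand-in data):

```r
# -- simulated stand-in data, since all.data is not reproduced here
set.seed(2)
my.data <- data.frame(x = rnorm(50), z = rnorm(50))
my.data$outcome <- 0.5 + 0.2 * my.data$x + 0.1 * my.data$z + rnorm(50, 0, 0.1)

# -- interaction model: x * z expands to x + z + x:z
model <- lm(outcome ~ x * z, data = my.data)

names(coef(model))  # intercept, two main effects, and the interaction
summary(model)
```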