Department of Psychology, Lancaster University
Ask me anything:
Figure 1: Scatterplot showing the potential association between accuracy of comprehension and health literacy
We are working together to develop concepts:
We are working together to develop skills:
function and the model mean.acc ~ ...
data = clearly.both.subjects
function and the model: mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE
Take a good look:
You will see this sentence structure in coding for many different analysis types
method(outcome ~ predictors)
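A minimal sketch of this sentence structure in R, using simulated data. The data frame sim and its columns (outcome, predictor, group) are invented for illustration; they are not the study data.

```r
# Simulated data for illustration only (not the study data)
set.seed(1)
sim <- data.frame(
  predictor = rnorm(100),
  group     = factor(rep(c("a", "b"), each = 50))
)
sim$outcome <- 0.5 + 0.3 * sim$predictor + rnorm(100, sd = 0.2)

# method(outcome ~ predictors): the method changes, the formula structure stays
fit.lm  <- lm(outcome ~ predictor, data = sim)    # linear model
fit.glm <- glm(outcome ~ predictor, data = sim)   # generalized linear model (gaussian by default)
fit.aov <- aov(outcome ~ group, data = sim)       # analysis of variance

coef(fit.lm)
coef(fit.glm)   # same estimates as lm() for a gaussian outcome
```

The formula on the left of each call reads the same way: the outcome, then a tilde, then the predictors.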
Figure 4: Scatterplots showing the potential association between accuracy of comprehension and variation on each of a series of potential predictor variables. Data are from two studies
Figure 5: Grid of plots showing the distribution of potential predictor variables
We can try to model anything using linear models: that is the real challenge we face
This is why we need to be careful
(1.) experience (HLVA, SHIPLEY) and (2.) reasoning ability (FACTOR3, reading strategy) (Freed et al., 2017)
This is why we teach:
\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \epsilon\)
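Outcome \(y\) is calculated as the sum of:
the intercept \(\beta_0\) +
\(\beta_1\) multiplied by \(x_1\), for example a person’s age +
any number of other variables, each multiplied by its own coefficient +
the error \(\epsilon\)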
Call:
lm(formula = mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE, data = clearly.both.subjects)
Residuals:
     Min       1Q   Median       3Q      Max
-0.38092 -0.05889  0.01296  0.06780  0.21677
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.2372834  0.0591436   4.012 7.43e-05 ***
SHIPLEY      0.0067294  0.0015225   4.420 1.33e-05 ***
HLVA         0.0179228  0.0026682   6.717 7.93e-11 ***
FACTOR3      0.0032872  0.0008595   3.824 0.000156 ***
AGE         -0.0003125  0.0004374  -0.715 0.475391
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.101 on 336 degrees of freedom
Multiple R-squared: 0.2938, Adjusted R-squared: 0.2854
F-statistic: 34.94 on 4 and 336 DF, p-value: < 2.2e-16
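To connect the summary to the linear model equation, here is a small sketch that plugs the estimates into the sum by hand. The coefficient values are copied from the summary; the participant's scores are invented for illustration and do not come from the data.

```r
# Coefficient estimates copied from the model summary
b <- c(intercept = 0.2372834, SHIPLEY = 0.0067294, HLVA = 0.0179228,
       FACTOR3 = 0.0032872, AGE = -0.0003125)

# Hypothetical participant: these scores are invented for illustration
person <- c(intercept = 1, SHIPLEY = 35, HLVA = 9, FACTOR3 = 50, AGE = 30)

# Predicted accuracy = intercept + sum of (coefficient x predictor value)
predicted <- sum(b * person)
round(predicted, 3)   # 0.789
```

With a fitted model object, predict() with a newdata data frame would do the same arithmetic.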
The summary of the linear model shows: R-squared and F-statistic
Estimate: 0.0179228 for the slope of the effect of variation in HLVA scores
Std. Error (standard error): 0.0026682 for the estimate
t value of 6.717 and associated Pr(>|t|) p-value 7.93e-11 for the null hypothesis test of the coefficient
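These three numbers are linked: the t value is the estimate divided by its standard error, and the p-value comes from the t distribution with the residual degrees of freedom. A short sketch using the HLVA row copied from the summary; the confidence interval at the end is an extra that summary() does not print.

```r
estimate  <- 0.0179228   # HLVA slope, copied from the summary
std.error <- 0.0026682   # its standard error
df.resid  <- 336         # residual degrees of freedom

# t value: how many standard errors the estimate is from zero
t.value <- estimate / std.error
round(t.value, 3)   # 6.717

# Two-sided p-value for the null hypothesis that the slope is zero
p.value <- 2 * pt(-abs(t.value), df = df.resid)
p.value

# Conventional 95% confidence interval for the slope
ci <- estimate + c(-1, 1) * qt(0.975, df = df.resid) * std.error
round(ci, 4)
```

Because the inputs are rounded, the p-value computed here may differ slightly in its later digits from the one printed in the summary.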
Is the coefficient estimate a positive or a negative number? Is it relatively large or small?
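We report Adjusted R-squared because it tends to be more accurate: it corrects the estimate of variance explained for the number of predictors in the model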
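As a check on where these numbers come from, the sketch below recomputes Adjusted R-squared and the F-statistic from the Multiple R-squared in the summary. The sample size of 341 is inferred from the 336 residual degrees of freedom plus the 5 estimated coefficients.

```r
r.squared <- 0.2938   # Multiple R-squared, copied from the summary
p <- 4                # number of predictors
n <- 336 + p + 1      # observations: residual df + predictors + intercept = 341

# Adjusted R-squared penalises R-squared for the number of predictors
adj.r.squared <- 1 - (1 - r.squared) * (n - 1) / (n - p - 1)
round(adj.r.squared, 4)   # 0.2854

# F-statistic: variance explained per predictor relative to residual variance
f.value <- (r.squared / p) / ((1 - r.squared) / (n - p - 1))
round(f.value, 1)         # 34.9
```

The F value agrees with the reported 34.94 up to rounding, because R-squared is entered here to only four decimal places.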
Figure 8: A grid of plots showing model predictions, for outcome accuracy, given variation in (a.) age, (b.) vocabulary, (c.) health literacy, and (d.) reading strategy
We fitted a linear model with mean comprehension accuracy as the outcome and vocabulary knowledge (Shipley), health literacy (HLVA), reading strategy (FACTOR3), and age (years) as predictors. The model is significant overall, with \(F(4, 336) = 34.94, p < .001\), and explains 29% of variance (\(\text{adjusted } R^2 = 0.29\)). The model estimates showed that the accuracy of comprehension increased with higher levels of participant vocabulary knowledge (\(\beta = .007, t = 4.42, p < .001\)), health literacy (\(\beta = .018, t = 6.72, p < .001\)), and reading strategy (\(\beta = .003, t = 3.82, p < .001\)). Older participants (\(\beta = -0.0003, t = -0.72, p = .475\)) tended to show lower levels of accuracy but the age effect was not significant.
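The numbers in a report like this can be pulled directly from the summary object rather than copied by eye. A minimal sketch, assuming simulated data: the data frame sim and the variables accuracy, vocabulary and age are invented stand-ins, not the study data.

```r
# Simulated stand-in data (invented for illustration)
set.seed(2)
n <- 200
sim <- data.frame(vocabulary = rnorm(n, mean = 30, sd = 5),
                  age        = rnorm(n, mean = 35, sd = 10))
sim$accuracy <- 0.4 + 0.01 * sim$vocabulary + rnorm(n, sd = 0.1)

model <- lm(accuracy ~ vocabulary + age, data = sim)
s <- summary(model)

# Overall model: F value with its numerator and denominator df
f <- s$fstatistic
f
pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)  # overall p-value

# Variance explained
s$adj.r.squared

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
round(coef(s), 3)
```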
There are three levels of uncertainty when we look at sample data (McElreath, 2020) – uncertainty over:
Figure 9: Scatterplot showing the potential association between accuracy of comprehension and vocabulary scores. Data are from eight studies. Effects will vary between different samples, so expect the variation (Gelman, 2015; Vasishth & Gelman, 2021). This is important to evaluating claims in the literature, and to evaluating your own results
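To see why estimates vary between samples, the sketch below simulates eight studies drawn from one hypothetical population. The population values (intercept 0.4, slope 0.01, residual SD 0.1) and the helper one.study() are invented for illustration; they are not estimates from the studies in the figure.

```r
set.seed(3)

# One simulated study: sample n people from the same hypothetical population,
# fit the model, and return the estimated vocabulary slope
one.study <- function(n = 100) {
  vocabulary <- rnorm(n, mean = 30, sd = 5)
  accuracy   <- 0.4 + 0.01 * vocabulary + rnorm(n, sd = 0.1)  # true slope is 0.01
  coef(lm(accuracy ~ vocabulary))[["vocabulary"]]
}

# Eight studies, eight different slope estimates
slopes <- replicate(8, one.study())
round(slopes, 4)
range(slopes)
```

Every simulated study has the same true effect, yet the estimates differ: that spread comes from sampling alone.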
Figure 10: Grid of plots showing the distribution of potential predictor variables
Practice critical evaluation:
Most common statistical tests are special cases of linear models, or are close approximations
\(y_i = \beta_0 + \beta_1X\)
coding for group membership: lm(y ~ group)
\(y_i = \beta_0 + \beta_1X + \beta_2Z + \beta_3XZ\)
factor.1, factor.2, and a dataset with variables X, Z coding for group membership: lm(y ~ factor.1*factor.2)
Anova(aov(y ~ factor.1*factor.2, data), type='II')
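A minimal sketch of these equivalences, assuming simulated data in a balanced 2 x 2 design. The data frame dat and its values are invented for illustration. The sketch uses base R anova() instead of Anova() from the car package; in a balanced design like this one the sequential tests from anova() agree with type II tests.

```r
# Simulated balanced 2 x 2 design (invented data), 25 observations per cell
set.seed(4)
dat <- expand.grid(factor.1 = c("a", "b"), factor.2 = c("c", "d"), rep = 1:25)
dat$y <- rnorm(nrow(dat)) + 0.5 * (dat$factor.1 == "b")

# Two-group comparison: classic t-test versus linear model
t.classic <- t.test(y ~ factor.1, data = dat, var.equal = TRUE)
t.lm <- coef(summary(lm(y ~ factor.1, data = dat)))["factor.1b", "t value"]
# Signs differ only because the two functions subtract the group means in opposite order
c(t.test = unname(t.classic$statistic), lm = t.lm)

# Two-way ANOVA: aov() versus linear model
f.lm  <- anova(lm(y ~ factor.1 * factor.2, data = dat))[["F value"]][1:3]
f.aov <- summary(aov(y ~ factor.1 * factor.2, data = dat))[[1]][["F value"]][1:3]
rbind(lm = f.lm, aov = f.aov)   # identical F values
```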
\(\text{outcome} \sim \text{predictors} + \text{error}\)
outcome: can generalize to analyse data that are not metric, do not come from normal distributions
predictors: can be curvilinear, categorical, involve interactions
error: can be independent; can be non-independent
glm(ratings ~ predictors, family = "binomial")
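A minimal sketch of a generalized linear model for a binary outcome, assuming simulated data. The data frame sim, the variables correct and predictor, and the population values used to generate them are all invented for illustration.

```r
# Simulated binary outcome (invented data): 1 = correct, 0 = incorrect
set.seed(5)
n <- 300
sim <- data.frame(predictor = rnorm(n))
# The probability of a correct response rises with the predictor
sim$correct <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * sim$predictor))

# Same outcome ~ predictors structure, with a binomial family for the outcome
fit <- glm(correct ~ predictor, data = sim, family = "binomial")
coef(fit)   # estimates are on the log-odds scale

# Predicted probability of a correct response at two predictor values
pred <- predict(fit, newdata = data.frame(predictor = c(0, 1)), type = "response")
pred
```

The coefficients are read in the same way as in lm(), except that they describe changes in log odds rather than changes in the outcome itself.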
An old saying goes:
All models are wrong but some are useful
(attributed to George Box).