Department of Psychology, Lancaster University
Tip
Ask me anything:
We fit the model using the lm function and the formula mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE:

model <- lm(mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE,
            data = all.studies.subjects)
summary(model)
Take a good look: you will see this sentence structure in coding for many different analysis types:

method(outcome ~ predictors)

where predictors could be SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE ...
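As a sketch of how the same formula template plugs into different methods (the variable names `score`, `group` and data frame `d` here are hypothetical, not from the project data):

```r
# The same outcome ~ predictors formula works across analysis methods.
# Hypothetical data: a numeric score measured in two groups.
set.seed(1)
d <- data.frame(group = rep(c("a", "b"), each = 20),
                score = c(rnorm(20, 10), rnorm(20, 12)))

lm(score ~ group, data = d)      # linear model
t.test(score ~ group, data = d)  # two-sample t-test
aov(score ~ group, data = d)     # analysis of variance
```

Only the method name changes; the model formula stays the same.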
Tip
We can try to model anything using linear models: that is the real challenge we face
This is why we need to be careful
Closing the loop: the health comprehension project questions asked whether comprehension is predicted by (1) experience (HLVA, SHIPLEY) and (2) reasoning ability (reading strategy).
This is why we care about open science
\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \epsilon\)
Outcome \(y\) is calculated as the sum of:
the intercept \(\beta_0\) +
\(\beta_1\) multiplied by \(x_1\), a person's age (AGE) +
any number of other coefficient-weighted variables +
residual error \(\epsilon\)
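To make the sum concrete, here is a sketch evaluating the fitted equation by hand, using the coefficient estimates from the model summary in this lecture; the predictor values for the participant are invented for illustration:

```r
# Coefficient estimates copied from the summary(model) output
b0 <- 0.1873086                               # intercept
b_shipley <- 0.0073947; b_hlva <- 0.0242787
b_factor3 <- 0.0053455; b_age <- -0.0026434
b_other <- -0.0900035                         # NATIVE.LANGUAGEOther dummy

# Hypothetical participant: Shipley 30, HLVA 8, FACTOR3 50, age 25,
# native English speaker (so the "Other" dummy is 0)
y_hat <- b0 + b_shipley * 30 + b_hlva * 8 + b_factor3 * 50 +
         b_age * 25 + b_other * 0
y_hat  # predicted mean accuracy, approximately 0.80
```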
Call:
lm(formula = mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE,
data = all.studies.subjects)
Residuals:
Min 1Q Median 3Q Max
-0.55939 -0.08115 0.02056 0.10633 0.41598
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1873086 0.0472991 3.960 8.47e-05 ***
SHIPLEY 0.0073947 0.0011144 6.635 7.70e-11 ***
HLVA 0.0242787 0.0031769 7.642 9.44e-14 ***
FACTOR3 0.0053455 0.0008947 5.975 4.12e-09 ***
AGE -0.0026434 0.0004905 -5.390 1.05e-07 ***
NATIVE.LANGUAGEOther -0.0900035 0.0141356 -6.367 4.04e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1612 on 555 degrees of freedom
(54 observations deleted due to missingness)
Multiple R-squared: 0.4221, Adjusted R-squared: 0.4169
F-statistic: 81.09 on 5 and 555 DF, p-value: < 2.2e-16
The summary() of the linear model shows: the R-squared and F-statistic for the model overall, and the coefficient estimates.
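These quantities can also be extracted from the summary object directly (assuming the fitted model is stored as `model`, as above):

```r
s <- summary(model)
s$r.squared      # multiple R-squared, 0.4221 here
s$adj.r.squared  # adjusted R-squared, 0.4169
s$fstatistic     # F value with its numerator and denominator df
s$coefficients   # table of estimates, SEs, t values and p-values
```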
For each coefficient, e.g. HLVA, the table reports: an Estimate of 0.0242787 for the slope of the effect of variation in HLVA scores; a Std. Error (standard error) of 0.0031769 for that estimate; a t value of 7.642; and the associated Pr(>|t|) p-value of 9.44e-14 for the null hypothesis test of the coefficient.
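These columns are linked: the t value is the estimate divided by its standard error, and the p-value is the two-tailed tail probability of that t on the residual degrees of freedom (555 here). A quick check using the HLVA row of the output:

```r
# t value = Estimate / Std. Error, for the HLVA coefficient
0.0242787 / 0.0031769
# approximately 7.642, as reported

# Two-tailed p-value on 555 residual degrees of freedom
2 * pt(7.642, df = 555, lower.tail = FALSE)
# approximately 9.4e-14, as reported
```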
Look at each coefficient: is the estimate (e.g., 0.0242787 for HLVA) a positive or a negative number? Is it relatively large or small?
We focus on the Adjusted R-squared because it tends to be a more accurate estimate of the variance explained.
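Adjusted R-squared shrinks R-squared to penalize the number of predictors. With the values from the summary output (R-squared = 0.4221, n = 561 complete cases, p = 5 predictors), the standard formula reproduces the reported value:

```r
r2 <- 0.4221
n <- 561   # 555 residual df + 5 predictors + 1 intercept
p <- 5
1 - (1 - r2) * (n - 1) / (n - p - 1)
# 0.4169, matching the Adjusted R-squared in the output
```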
We fitted a linear model with mean comprehension accuracy as the outcome and, as predictors: vocabulary knowledge (Shipley), health literacy (HLVA), reading strategy (FACTOR3), age (years), and native language status. Our analysis indicated significant effects of all predictor variables. The model was significant overall, \(F(5, 555) = 81.09, p < .001\), and explained 42% of the variance (\(\text{adjusted } R^2 = 0.42\)). The model estimates showed that comprehension accuracy increased with higher levels of participant vocabulary knowledge (\(\beta = .007, t = 6.64, p < .001\)), health literacy (\(\beta = .024, t = 7.64, p < .001\)), and reading strategy (\(\beta = .005, t = 5.98, p < .001\)). Older participants (\(\beta = -.003, t = -5.39, p < .001\)) and speakers of English as an additional language (\(\beta = -.090, t = -6.37, p < .001\)) tended to show lower levels of accuracy.
There are three levels of uncertainty when we look at sample data (McElreath, 2020) – uncertainty over:
Tip
Most common statistical tests are special cases of linear models, or are close approximations
A comparison of two group means is the linear model \(y_i = \beta_0 + \beta_1X\), with variable X coding for group membership: lm(y ~ group)

A factorial analysis with factors factor.1, factor.2, and a dataset with variables X, Z coding for group membership, is the linear model \(y_i = \beta_0 + \beta_1X + \beta_2Z + \beta_3XZ\): lm(y ~ factor.1*factor.2), compare Anova(aov(y ~ factor.1*factor.2, data), type='II')
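A sketch with simulated data (hypothetical data frame `d`) shows the equivalence: the group difference tested by the two-sample t-test is the group slope in the linear model:

```r
set.seed(1)
d <- data.frame(group = rep(c("a", "b"), each = 50),
                y = c(rnorm(50, mean = 0), rnorm(50, mean = 0.5)))

t.test(y ~ group, data = d, var.equal = TRUE)  # classic two-sample t-test
summary(lm(y ~ group, data = d))               # same t and p for the group slope
```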
Tip
We have to make choices in teaching and, here, we are choosing to focus on a powerful, flexible, and generally applicable method we can explain in depth: linear models
\(\text{outcome} \sim \text{predictors} + \text{error}\)

The outcome can generalize to analyse data that are not metric, or that do not come from normal distributions, e.g. glm(ratings ~ predictors, family = "binomial") or clm(ratings ~ predictors)
The predictors can be curvilinear, categorical, or involve interactions
The error can be independent or non-independent
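As a sketch of the first generalization, here is a logistic (binomial) model for a binary outcome, fitted to simulated data (the predictor `x` and outcome `correct` are invented for illustration); clm() for ordinal ratings comes from the ordinal package:

```r
set.seed(1)
d <- data.frame(x = rnorm(100))
d$correct <- rbinom(100, size = 1, prob = plogis(0.5 + 1.2 * d$x))

m <- glm(correct ~ x, family = "binomial", data = d)
summary(m)  # coefficients are on the log-odds scale
```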
An old saying goes:
All models are wrong but some are useful
(attributed to George Box).