Department of Psychology, Lancaster University
2024-03-04
Tip
Ask me anything: we are working together to develop concepts and to develop skills.
We use the lm function to fit the model mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE with data = clearly.both.subjects, and then inspect the fit with summary(model).
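As a runnable sketch of these steps, we can simulate a stand-in for the clearly.both.subjects dataset (the variable names and rough coefficient sizes follow the slides; the data themselves are invented for illustration):

```r
# Simulated stand-in for the clearly.both.subjects dataset (illustration only)
set.seed(1)
n <- 341
clearly.both.subjects <- data.frame(
  SHIPLEY = rnorm(n, 30, 5),    # vocabulary knowledge
  HLVA    = rnorm(n, 8, 3),     # health literacy
  FACTOR3 = rnorm(n, 50, 15),   # reading strategy
  AGE     = rnorm(n, 35, 12)    # age in years
)
clearly.both.subjects$mean.acc <- with(clearly.both.subjects,
  0.24 + 0.007 * SHIPLEY + 0.018 * HLVA + 0.003 * FACTOR3 + rnorm(n, 0, 0.1))

# Fit the linear model and inspect the estimates
model <- lm(mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE,
            data = clearly.both.subjects)
summary(model)
```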
Take a good look: you will see this sentence structure in coding for many different analysis types:
method(outcome ~ predictors)
where predictors could be SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE ...
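To see the shared structure, here is a short sketch using R's built-in mtcars data (not the slides' dataset): several different analysis functions all take the same outcome ~ predictors formula.

```r
# The same formula "sentence", three different analysis methods
lm(mpg ~ wt + hp, data = mtcars)                 # linear model
t.test(mpg ~ am, data = mtcars)                  # two-group comparison
summary(aov(mpg ~ factor(cyl), data = mtcars))   # one-way ANOVA
```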
Warning
We can try to model anything using linear models: that is the real challenge we face
This is why we need to be careful
Note
(1.) experience HLVA, SHIPLEY
and (2.) reasoning ability (FACTOR3
, reading strategy) (Freed et al., 2017)
This is why we teach:
\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \epsilon\)
Outcome \(y\) is calculated as the sum of:
the intercept \(\beta_0\) +
\(\beta_1\) multiplied by \(x_1\), e.g. a person's AGE, +
any number of other weighted variables +
the error \(\epsilon\)
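The equation can be checked by hand. A minimal sketch, using the coefficient estimates reported in the model summary and a hypothetical person's scores (the predictor values here are invented):

```r
# Coefficient estimates from the model summary; predictor values are hypothetical
b <- c(intercept = 0.2372834, SHIPLEY = 0.0067294, HLVA = 0.0179228,
       FACTOR3 = 0.0032872, AGE = -0.0003125)
x <- c(intercept = 1, SHIPLEY = 35, HLVA = 10, FACTOR3 = 55, AGE = 30)
y_hat <- sum(b * x)   # beta_0 + beta_1*x_1 + ... (epsilon omitted for a prediction)
y_hat                 # about 0.82: the predicted mean accuracy
```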
Call:
lm(formula = mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE, data = clearly.both.subjects)
Residuals:
Min 1Q Median 3Q Max
-0.38092 -0.05889 0.01296 0.06780 0.21677
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2372834 0.0591436 4.012 7.43e-05 ***
SHIPLEY 0.0067294 0.0015225 4.420 1.33e-05 ***
HLVA 0.0179228 0.0026682 6.717 7.93e-11 ***
FACTOR3 0.0032872 0.0008595 3.824 0.000156 ***
AGE -0.0003125 0.0004374 -0.715 0.475391
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.101 on 336 degrees of freedom
Multiple R-squared: 0.2938, Adjusted R-squared: 0.2854
F-statistic: 34.94 on 4 and 336 DF, p-value: < 2.2e-16
The summary() of the linear model shows the R-squared and F-statistic values, and the coefficient estimates.
For the HLVA predictor, the coefficients table reports:
Estimate: 0.0179228, the slope of the effect of variation in HLVA scores
Std. Error (standard error): 0.0026682 for the estimate
t value: 6.717
and the associated Pr(>|t|) p-value of 7.93e-11 for the null hypothesis test of the coefficient
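These numbers can also be pulled out of the summary programmatically. A sketch using R's built-in mtcars data (this is standard summary.lm behaviour, not specific to the slides' dataset):

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(model)$coefficients   # the Estimate / Std. Error / t / p table
coefs["wt", "Estimate"]     # slope estimate for wt
coefs["wt", "Std. Error"]   # standard error of the estimate
coefs["wt", "t value"]      # t = Estimate / Std. Error
coefs["wt", "Pr(>|t|)"]     # p-value for the null hypothesis test
```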
Is the HLVA estimate (0.0179228) a positive or a negative number? Is it relatively large or small?
We report the Adjusted R-squared because it tends to be a more accurate estimate of the variance explained, adjusting for the number of predictors in the model.
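Both R-squared values can be extracted from the summary object directly; a brief sketch with built-in data:

```r
s <- summary(lm(mpg ~ wt + hp, data = mtcars))
s$r.squared       # Multiple R-squared
s$adj.r.squared   # Adjusted R-squared: penalizes extra predictors
```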
We fitted a linear model with mean comprehension accuracy as the outcome and vocabulary knowledge (Shipley), health literacy (HLVA), reading strategy (FACTOR3), and age (years) as predictors. The model is significant overall, with \(F(4, 336) = 34.94, p < .001\), and explains 29% of the variance (\(\text{adjusted } R^2 = 0.29\)). The model estimates showed that comprehension accuracy increased with higher levels of vocabulary knowledge (\(\beta = .007, t = 4.42, p < .001\)), health literacy (\(\beta = .018, t = 6.72, p < .001\)), and reading strategy (\(\beta = .003, t = 3.82, p < .001\)). Older participants tended to show slightly lower accuracy (\(\beta = -0.0003, t = -.72, p = .475\)) but the age effect was not significant.
There are three levels of uncertainty when we look at sample data (McElreath, 2020) – uncertainty over:
Tip
Practice critical evaluation:
Important
Most common statistical tests are special cases of linear models, or are close approximations
A comparison of two groups is the linear model
\(y_i = \beta_0 + \beta_1X\)
with \(X\) coding for group membership: lm(y ~ group)
A factorial ANOVA over factors factor.1, factor.2 is the linear model
\(y_i = \beta_0 + \beta_1X + \beta_2Z + \beta_3XZ\)
given a dataset with variables X, Z coding for group membership: lm(y ~ factor.1*factor.2) corresponds to Anova(aov(y ~ factor.1*factor.2, data), type='II')
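The equivalence can be checked directly. A sketch using R's built-in PlantGrowth data for the simpler one-factor case (note that Anova() with a capital A comes from the car package; base R's anova() is used here instead):

```r
m.lm  <- lm(weight ~ group, data = PlantGrowth)    # group comparison as a linear model
m.aov <- aov(weight ~ group, data = PlantGrowth)   # the "ANOVA" version
anova(m.lm)      # F test from the linear model
summary(m.aov)   # same F and p: aov() is lm() underneath
```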
The general structure \(outcome \sim predictors + error\) generalizes:
outcome can be data that are not metric and do not come from normal distributions
predictors can be curvilinear, categorical, or involve interactions
error can be independent or non-independent
For example: glm(ratings ~ predictors, family = "binomial")
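As a concrete sketch of a generalized linear model for a binary outcome, using the built-in mtcars data (am is coded 0/1) in place of the slides' ratings:

```r
# Logistic regression: same formula structure, binomial error family
m.glm <- glm(am ~ wt, data = mtcars, family = "binomial")
summary(m.glm)$coefficients   # estimates on the log-odds scale
```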
An old saying goes:
All models are wrong but some are useful
(attributed to George Box).