PSYC122-w19-how-to
Introduction
In Week 19, we aim to further develop skills in working with the linear model.
We do this to learn how to answer research questions like:
- What person attributes predict success in understanding?
- Can people accurately evaluate whether they correctly understand written health information?
In this class, what is new is our focus on critically evaluating – comparing, reflecting on – the evidence from more than one relevant study.
- This work simulates the kind of critical evaluation of evidence that psychologists must do in professional research.
Naming things
I will format dataset names like this:
study-one-general-participants.csv
I will also format variable (data column) names like this: variable
I will also format value or other data object (e.g. cell value) names like this: studyone
I will format functions and library names like this: e.g. function ggplot() or e.g. library {tidyverse}.
The data we will be using
In this how-to guide, we use data from two 2020 studies of the response of adults from a UK national sample to written health information:
study-one-general-participants.csv
study-two-general-participants.csv
We are going to work with two datasets because we will compare the results of their analyses to assess whether the findings are robust.
Here, our assessment of robustness focuses on whether similar results are found in two different studies.
Check out the PSYC122 Week 19 lecture for a discussion of how an assessment of robustness is important to psychological science.
Answers
Step 1: Set-up
To begin, we set up our environment in R.
Task 1 – Run code to empty the R environment
rm(list=ls())
Task 2 – Run code to load relevant libraries
library("ggeffects")
library("patchwork")
library("tidyverse")
Warning: package 'tidyr' was built under R version 4.1.1
Warning: package 'purrr' was built under R version 4.1.1
Warning: package 'stringr' was built under R version 4.1.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Step 2: Load the data
Task 3 – Read in the data files we will be using
The data files are called:
study-one-general-participants.csv
study-two-general-participants.csv
Use the read_csv()
function to read the data files into R:
<- read_csv("study-one-general-participants.csv") study.one.gen
Rows: 169 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): participant_ID, study, GENDER, EDUCATION, ETHNICITY
dbl (7): mean.acc, mean.self, AGE, SHIPLEY, HLVA, FACTOR3, QRITOTAL
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
<- read_csv("study-two-general-participants.csv") study.two.gen
Rows: 172 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): participant_ID, study, GENDER, EDUCATION, ETHNICITY
dbl (7): mean.acc, mean.self, AGE, SHIPLEY, HLVA, FACTOR3, QRITOTAL
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
When you read the data files in, give the data objects you create distinct names, e.g. study.one.gen versus study.two.gen.
Task 4 – Inspect the data file
Use the summary()
or head()
functions to take a look at both datasets.
summary(study.one.gen)
participant_ID mean.acc mean.self study
Length:169 Min. :0.3600 Min. :3.440 Length:169
Class :character 1st Qu.:0.7600 1st Qu.:6.080 Class :character
Mode :character Median :0.8400 Median :7.080 Mode :character
Mean :0.8163 Mean :6.906
3rd Qu.:0.9000 3rd Qu.:7.920
Max. :0.9900 Max. :9.000
AGE SHIPLEY HLVA FACTOR3
Min. :18.00 Min. :23.00 Min. : 3.000 Min. :34.00
1st Qu.:24.00 1st Qu.:33.00 1st Qu.: 7.000 1st Qu.:46.00
Median :32.00 Median :35.00 Median : 9.000 Median :51.00
Mean :34.87 Mean :34.96 Mean : 8.905 Mean :50.33
3rd Qu.:42.00 3rd Qu.:38.00 3rd Qu.:10.000 3rd Qu.:55.00
Max. :76.00 Max. :40.00 Max. :14.000 Max. :63.00
QRITOTAL GENDER EDUCATION ETHNICITY
Min. : 6.00 Length:169 Length:169 Length:169
1st Qu.:12.00 Class :character Class :character Class :character
Median :13.00 Mode :character Mode :character Mode :character
Mean :13.36
3rd Qu.:15.00
Max. :19.00
summary(study.two.gen)
participant_ID mean.acc mean.self study
Length:172 Min. :0.4107 Min. :3.786 Length:172
Class :character 1st Qu.:0.6786 1st Qu.:6.411 Class :character
Mode :character Median :0.7679 Median :7.321 Mode :character
Mean :0.7596 Mean :7.101
3rd Qu.:0.8393 3rd Qu.:7.946
Max. :0.9821 Max. :9.000
AGE SHIPLEY HLVA FACTOR3
Min. :18.00 Min. :23.00 Min. : 3.000 Min. :29.00
1st Qu.:25.00 1st Qu.:32.75 1st Qu.: 7.750 1st Qu.:47.00
Median :32.50 Median :36.00 Median : 9.000 Median :51.00
Mean :35.37 Mean :35.13 Mean : 9.064 Mean :51.24
3rd Qu.:44.00 3rd Qu.:39.00 3rd Qu.:11.000 3rd Qu.:56.25
Max. :76.00 Max. :40.00 Max. :14.000 Max. :63.00
QRITOTAL GENDER EDUCATION ETHNICITY
Min. : 6.00 Length:172 Length:172 Length:172
1st Qu.:12.00 Class :character Class :character Class :character
Median :14.00 Mode :character Mode :character Mode :character
Mean :13.88
3rd Qu.:16.00
Max. :20.00
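Task 4 also mentions head(); if you would rather see the first few rows of each dataset than the summary statistics, here is a minimal sketch using the data object names created above:
head(study.one.gen)
head(study.two.gen)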
Notice that study.two.gen was designed to be a replication of study.one.gen.
- We use the same online survey methods to collect data in both studies.
- We present different health information texts in the different studies and recorded responses from different groups of adults in the UK.
Step 3: Compare the data from the different studies
Revise: practice to strengthen skills
Task 5 – Compare the data distributions from the two studies
Q.1. What is the mean of the mean.acc and mean.self variables in the two studies?
A.1. The means are:
study one – mean.acc – mean = 0.8163
study one – mean.self – mean = 6.906
study two – mean.acc – mean = 0.7596
study two – mean.self – mean = 7.101
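If you would rather compute these means with code than read them off the summary() output, here is a minimal sketch using summarise() from {dplyr} (the output column names acc and self are just labels chosen here):
# compute the means of mean.acc and mean.self for each dataset
study.one.gen %>% summarise(acc = mean(mean.acc), self = mean(mean.self))
study.two.gen %>% summarise(acc = mean(mean.acc), self = mean(mean.self))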
Q.2. Draw histograms of both mean.acc and mean.self for both studies.
A.2. You can write the code as you have been shown to do e.g. in week 17:
ggplot(data = study.one.gen, aes(x = mean.acc)) +
geom_histogram(binwidth = .1) +
theme_bw() +
labs(x = "Mean accuracy (mean.acc)", y = "frequency count") +
xlim(0, 1)
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
ggplot(data = study.one.gen, aes(x = mean.self)) +
geom_histogram(binwidth = 1) +
theme_bw() +
labs(x = "Mean self-rated accuracy (mean.self)", y = "frequency count") +
xlim(0,10)
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
ggplot(data = study.two.gen, aes(x = mean.acc)) +
geom_histogram(binwidth = .1) +
theme_bw() +
labs(x = "Mean accuracy (mean.acc)", y = "frequency count") +
xlim(0, 1)
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
ggplot(data = study.two.gen, aes(x = mean.self)) +
geom_histogram(binwidth = 1) +
theme_bw() +
labs(x = "Mean self-rated accuracy (mean.self)", y = "frequency count") +
xlim(0,10)
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
Introduce: make some new moves
Task 6 – Create grids of plots to make the comparison easier to do
hint: Task 6 – What we are going to do is to create two histograms and then present them side by side to allow easy comparison of variable distributions
We need to make two changes to the coding approach you have been using until now.
Before we explain anything, let’s look at an example: run these lines of code and check the result.
- Make sure you identify what is different about the plotting code, shown below, compared to what you have done before: there is a surprise in what is going to happen.
First, create plot objects, give them names, but do not show them:
plot.one <- ggplot(data = study.one.gen, aes(x = mean.acc)) +
  geom_histogram(binwidth = .1) +
  theme_bw() +
  labs(x = "Mean accuracy (mean.acc)", y = "frequency count", title = "Study One") +
  xlim(0, 1)
plot.two <- ggplot(data = study.two.gen, aes(x = mean.acc)) +
  geom_histogram(binwidth = .1) +
  theme_bw() +
  labs(x = "Mean accuracy (mean.acc)", y = "frequency count", title = "Study Two") +
  xlim(0, 1)
Second, show the plots, side-by-side:
plot.one + plot.two
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
This is what you are doing: check out the process, step-by-step. (And notice that you repeat the process for each of two (or more) plots.)
1. ggplot(...) – tell R you want to make a plot using the ggplot() function;
2. plot.one <- – tell R you want to give the plot a name; the name appears in the Environment;
3. ggplot(data = study.one.gen, ...) – tell R you want to make a plot with the study.one.gen data;
4. ggplot(..., aes(x = mean.acc)) – tell R you want to make a plot with the variable mean.acc; here, you specify the aesthetic mapping, x = mean.acc;
5. geom_histogram() – tell R you want to plot values of mean.acc as a histogram;
6. binwidth = .1 – adjust the binwidth to show enough, but not too much, detail in the distribution;
7. theme_bw() – tell R what theme you want, adjusting the plot appearance;
8. labs(x = "Mean accuracy (mean.acc)", y = "frequency count", title = "Study One") – fix the x-axis and y-axis labels; here, you also add a title for the plot, so you can tell the two plots apart;
9. xlim(0, 1) – adjust the x-axis limits to show the full range of possible score values on this variable;
10. plot.one + plot.two – finally, having constructed – and named – both plots, enter their names, separated by a +, to produce them for viewing in a grid of two plots.

Do this construction process twice, once for each dataset, creating two plots so that you can compare the distribution of mean.acc scores between the studies.
Notice: until you get to step 10, nothing will appear.
This will be surprising but it is perfectly normal when we increase the level of complexity of the plots we build.
- You first build the plots.
- You are creating plot objects and you give these objects names.
- The objects will appear in the Environment with the names you give them.
- You then produce the plots for viewing, by using their names.

You will not see anything change until you complete that last step and use the object names to produce the plots for viewing.
This is how you construct complex arrays of plots.
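As a small aside (not required for these tasks), {patchwork} can also stack named plot objects vertically rather than placing them side by side; a minimal sketch:
# stack the two plot objects in a column instead of a row
plot.one / plot.two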
Task 7 – Try this out for yourself, focusing now on the distribution of mean.self
scores in the two studies
First, create plot objects but do not show them.
- Give each plot a name. You will use the names next.
plot.one <- ggplot(data = study.one.gen, aes(x = mean.self)) +
  geom_histogram(binwidth = 2) +
  theme_bw() +
  labs(x = "Self-rated accuracy (mean.self)", y = "frequency count", title = "Study One") +
  xlim(0, 10)
plot.two <- ggplot(data = study.two.gen, aes(x = mean.self)) +
  geom_histogram(binwidth = 2) +
  theme_bw() +
  labs(x = "Self-rated accuracy (mean.self)", y = "frequency count", title = "Study Two") +
  xlim(0, 10)
Second, produce the plots for viewing, side-by-side, by naming them:
plot.one + plot.two
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
Q.3. Now use the plots to do some data analysis work: how do the mean.self distributions compare between study.one.gen and study.two.gen?
A.3. When you compare the plots side-by-side, you can see that the mean.self distributions are similar in the two studies: most people have high mean.self scores. This means that, on average, they rated the accuracy of their understanding at a high level.
Q.4. Is the visual impression you get from comparing the distributions consistent with the statistics you see in the summary?
A.4. Yes: if you go back to the summaries of mean.self and compare the two study datasets, you can see that the median and the mean are similar (around 7) in both study.one.gen and study.two.gen.
Step 4: Now use scatterplots and correlation to examine associations between variables
Revise: practice to strengthen skills
Task 8 – Draw scatterplots to compare the potential association between mean.acc
and mean.self
in both study.one.gen
and study.two.gen
datasets
hint: Task 8 – The plotting steps are explained in some detail in PSYC122-w17-how-to.Rmd
ggplot(data = study.one.gen, aes(x = mean.self, y = mean.acc)) +
geom_point(alpha = 0.75, size = 3, colour = "darkgrey") +
geom_smooth(method = "lm", size = 1.5, colour = "green") +
theme_bw() +
labs(x = "mean self-rated accuracy", y = "mean accuracy") +
xlim(0, 10) + ylim(0, 1)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
`geom_smooth()` using formula = 'y ~ x'
ggplot(data = study.two.gen, aes(x = mean.self, y = mean.acc)) +
geom_point(alpha = 0.75, size = 3, colour = "darkgrey") +
geom_smooth(method = "lm", size = 1.5, colour = "green") +
theme_bw() +
labs(x = "mean self-rated accuracy", y = "mean accuracy") +
xlim(0, 10) + ylim(0, 1)
`geom_smooth()` using formula = 'y ~ x'
Task 9 – Create a grid of plots to make the comparison easier to do
hint: Task 9 – We follow the same steps as we used in tasks 6 and 7 to create the plots
We again:
- First construct the plot objects and give them names;
- Then create and show a grid of named plots.
Though this time we are producing a grid of scatterplots.
First, create plot objects – give them names but do not show them:
plot.one <- ggplot(data = study.one.gen, aes(x = mean.self, y = mean.acc)) +
  geom_point(alpha = 0.75, size = 3, colour = "darkgrey") +
  geom_smooth(method = "lm", size = 1.5, colour = "green") +
  theme_bw() +
  labs(x = "mean self-rated accuracy", y = "mean accuracy", title = "Study One") +
  xlim(0, 10) + ylim(0, 1)
plot.two <- ggplot(data = study.two.gen, aes(x = mean.self, y = mean.acc)) +
  geom_point(alpha = 0.75, size = 3, colour = "darkgrey") +
  geom_smooth(method = "lm", size = 1.5, colour = "green") +
  theme_bw() +
  labs(x = "mean self-rated accuracy", y = "mean accuracy", title = "Study Two") +
  xlim(0, 10) + ylim(0, 1)
- Notice that in the plotting code we ask R to give each plot a title using labs().
Second, name the plots to show them side-by-side in the plot window:
plot.one + plot.two
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
Now use the plots to make comparison judgments.
- Q.5. How does the association between mean.self and mean.acc, shown in the plots, compare when you look at the study.one.gen versus the study.two.gen plot?
- hint: Q.5. When comparing evidence about associations in different studies, we are mostly going to focus on the slope – the angle – of the prediction lines, and the ways in which points do or do not cluster about the prediction lines.
- A.5. If you examine the study.one.gen versus the study.two.gen plots then you can see that in both plots higher mean.self scores appear to be associated with higher mean.acc scores. But the trend is perhaps a bit stronger – the line is steeper – in study.two.gen compared to study.one.gen. (An optional alternative way to draw this comparison is sketched just below.)
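An optional alternative sketch: both data files include a study column, so (assuming that column codes which study each row comes from) you could bind the two datasets together and let facet_wrap() draw one scatterplot panel per study. This is not required for the tasks here, just another way to get a side-by-side comparison:
# combine the two datasets and facet the scatterplot by the study column
bind_rows(study.one.gen, study.two.gen) %>%
  ggplot(aes(x = mean.self, y = mean.acc)) +
  geom_point(alpha = 0.75, size = 3, colour = "darkgrey") +
  geom_smooth(method = "lm", colour = "green") +
  facet_wrap(~ study) +
  theme_bw() +
  labs(x = "mean self-rated accuracy", y = "mean accuracy") +
  xlim(0, 10) + ylim(0, 1)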
We are now in a position to answer one of our research questions:
- Can people accurately evaluate whether they correctly understand written health information?
If people can accurately evaluate whether they correctly understand written health information then mean.self
(a score representing their evaluation) should be associated with mean.acc
(a score representing their accuracy of understanding) for each person.
Revise: practice to strengthen skills
Task 10 – Can you estimate the association between mean.acc
and mean.self
in both datasets?
hint: Task 10 – Use cor.test()
as you have been shown how to do e.g. in 2023-24-PSYC122-w16-how-to.Rmd
Do the correlation for both datasets.
First, look at the correlation between mean.acc
and mean.self
in study.one.gen
:
cor.test(study.one.gen$mean.acc, study.one.gen$mean.self, method = "pearson", alternative = "two.sided")
Pearson's product-moment correlation
data: study.one.gen$mean.acc and study.one.gen$mean.self
t = 7.1936, df = 167, p-value = 2.026e-11
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.3619961 0.5937425
sample estimates:
cor
0.4863771
Q.6. What is r, the correlation coefficient?
A.6. r = 0.4863771
Q.7. Is the correlation significant?
A.7. Yes: the correlation is significant, p < .001
Q.8. What are the values for t and p for the significance test for the correlation?
A.8. t = 7.1936, p-value = 2.026e-11
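If you want to pull r, t and p out of the test programmatically, rather than reading them off the printed output, here is a minimal base R sketch (the object name ct.one is just a name chosen here):
# store the test result, then extract the estimate, test statistic and p-value
ct.one <- cor.test(study.one.gen$mean.acc, study.one.gen$mean.self,
                   method = "pearson", alternative = "two.sided")
ct.one$estimate   # r
ct.one$statistic  # t
ct.one$p.value    # p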
Second, look at the correlation between mean.acc
and mean.self
in study.two.gen
:
cor.test(study.two.gen$mean.acc, study.two.gen$mean.self, method = "pearson", alternative = "two.sided")
Pearson's product-moment correlation
data: study.two.gen$mean.acc and study.two.gen$mean.self
t = 8.4991, df = 170, p-value = 9.356e-15
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4317217 0.6431596
sample estimates:
cor
0.5460792
Q.9. What is r, the correlation coefficient?
A.9. r = 0.5460792
Q.10. Is the correlation significant?
A.10. Yes: the correlation is significant, p < .001
Q.11. What are the values for t and p for the significance test for the correlation?
A.11. t = 8.4991, p = 9.356e-15
Now we can answer the research question:
- Can people accurately evaluate whether they correctly understand written health information?
- Q.12. What do the correlation estimates tell you is the answer to the research question?
- A.12. The correlations are positive and significant, indicating that higher mean.self (evaluations) are associated with higher mean.acc (understanding), suggesting that people can judge the accuracy of their understanding.
Q.13. Can you compare the estimates, given the two datasets, to evaluate if the result in study.one.gen is replicated in study.two.gen?
hint: Q.13. We can judge if the result in a study is replicated in another study by examining if – here – the correlation coefficient is significant in both studies and if the coefficient has the same size and sign in both studies.
A.13. If you compare the correlation estimates from both study.one.gen and study.two.gen you can see:
first, the correlation is significant in both studies;
second, the correlation is positive in both studies;
third, the correlation is similar in magnitude, about r = .5, in both studies.
This suggests that the association we see in study.one.gen is replicated in study.two.gen.
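To line the two correlation estimates up side by side, you could collect them into a small table; a sketch, assuming the data objects above are loaded (the object names ct.one and ct.two are just names chosen here, and tibble() comes with the {tidyverse}):
# store both test results, then tabulate r, t and p for each study
ct.one <- cor.test(study.one.gen$mean.acc, study.one.gen$mean.self)
ct.two <- cor.test(study.two.gen$mean.acc, study.two.gen$mean.self)
tibble(study = c("one", "two"),
       r = c(ct.one$estimate, ct.two$estimate),
       t = c(ct.one$statistic, ct.two$statistic),
       p = c(ct.one$p.value, ct.two$p.value))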
Step 5: Use a linear model to answer the research questions – multiple predictors
Revise: practice to strengthen skills
Task 11 – Examine the relation between outcome mean accuracy (mean.acc
) and multiple predictors
We specify linear models including as predictors the variables:
- health literacy (HLVA);
- vocabulary (SHIPLEY);
- reading strategy (FACTOR3).
hint: Task 11 – Use lm()
, as you have done before, see e.g. 2023-24-PSYC122-w18-how-to.R
Task 11 – Examine the predictors of mean accuracy (mean.acc
), first, for the study.one.gen
data
model <- lm(mean.acc ~ HLVA + SHIPLEY + FACTOR3, data = study.one.gen)
summary(model)
Call:
lm(formula = mean.acc ~ HLVA + SHIPLEY + FACTOR3, data = study.one.gen)
Residuals:
Min 1Q Median 3Q Max
-0.40322 -0.05349 0.01152 0.07128 0.18434
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.302030 0.091257 3.310 0.00115 **
HLVA 0.017732 0.003923 4.521 1.17e-05 ***
SHIPLEY 0.005363 0.002336 2.296 0.02291 *
FACTOR3 0.003355 0.001264 2.654 0.00872 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1033 on 165 degrees of freedom
Multiple R-squared: 0.2474, Adjusted R-squared: 0.2337
F-statistic: 18.08 on 3 and 165 DF, p-value: 3.423e-10
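If it helps with the questions below, you can also extract the coefficient table from the fitted model programmatically (a base R sketch; model is the object fitted just above):
# the coefficient table: estimates, standard errors, t values and p values
coef(summary(model))
# a single row, e.g. for SHIPLEY
coef(summary(model))["SHIPLEY", ]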
Using the model estimates, we can answer the research question:
- What person attributes predict success in understanding?
Inspect the model summary, then answer the following questions:
Q.14. What is the estimate for the coefficient of the effect of the predictor SHIPLEY in this model?
A.14. 0.005363
Q.15. Is the effect significant?
A.15. It is significant, p < .05
Q.16. What are the values for t and p for the significance test for the coefficient?
A.16. t = 2.296, p = 0.02291
Q.17. Now consider the estimates for all the variables: what do you conclude is the answer to the research question – given the study.one.gen data:
- What person attributes predict success in understanding?
hint: Q.17. Can you report the model and the model fit statistics using the language you have been shown in the week 18 lecture?
A.17.
We fitted a linear model with mean comprehension accuracy as the outcome and health literacy (HLVA), reading strategy (FACTOR3), and vocabulary (SHIPLEY) as predictors. The model is significant overall, with F(3, 165) = 18.08, p < .001, and explains 23% of variance (adjusted R2 = 0.23). Mean accuracy was predicted to be higher given higher scores in health literacy (HLVA estimate = .018, t = 4.52, p < .001), vocabulary knowledge (SHIPLEY estimate = .005, t = 2.30, p = .023), and reading strategy (FACTOR3 estimate = .003, t = 2.65, p = .009).
Task 12 – Examine the predictors of mean accuracy (mean.acc
), now, for the study.two.gen
data
model <- lm(mean.acc ~ HLVA + SHIPLEY + FACTOR3, data = study.two.gen)
summary(model)
Call:
lm(formula = mean.acc ~ HLVA + SHIPLEY + FACTOR3, data = study.two.gen)
Residuals:
Min 1Q Median 3Q Max
-0.242746 -0.074188 0.003173 0.075361 0.211357
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.146896 0.076325 1.925 0.05597 .
HLVA 0.017598 0.003589 4.904 2.2e-06 ***
SHIPLEY 0.008397 0.001853 4.533 1.1e-05 ***
FACTOR3 0.003087 0.001154 2.675 0.00822 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.097 on 168 degrees of freedom
Multiple R-squared: 0.3636, Adjusted R-squared: 0.3522
F-statistic: 31.99 on 3 and 168 DF, p-value: < 2.2e-16
Using the model estimates, we can answer the research question:
- What person attributes predict success in understanding?
Inspect the model summary, then answer the following questions:
Q.18. What is the estimate for the coefficient of the effect of the predictor SHIPLEY in this model?
A.18. 0.008397
Q.19. Is the effect significant?
A.19. It is significant, p < .001
Q.20. What are the values for t and p for the significance test for the coefficient?
A.20. t = 4.533, p = 1.1e-05
Q.21. Now consider the estimates for all the variables: what do you conclude is the answer to the research question – given the study.two.gen data:
- What person attributes predict success in understanding?
hint: Q.21. Can you report the model and the model fit statistics using the language you have been shown in the week 18 lecture?
A.21.
We fitted a linear model with mean comprehension accuracy as the outcome and health literacy (HLVA), reading strategy (FACTOR3), and vocabulary (SHIPLEY) as predictors. The model is significant overall, with F(3, 168) = 31.99, p < .001, and explains 35% of variance (adjusted R2 = 0.35). Mean accuracy was predicted to be higher given higher scores in health literacy (HLVA estimate = .018, t = 4.90, p < .001), vocabulary knowledge (SHIPLEY estimate = .008, t = 4.53, p < .001), and reading strategy (FACTOR3 estimate = .003, t = 2.68, p = .008).
Q.22. Are the findings from study.one.gen replicated in study.two.gen?
hint: Q.22. We can judge if the results in an earlier study are replicated in another study by examining if – here – the linear model estimates are significant in both studies and if the coefficient estimates have the same size and sign in both studies.
A.22. If you compare the linear model coefficient estimates from both study.one.gen and study.two.gen you can see:
first, the HLVA, SHIPLEY and FACTOR3 estimates are significant in both study.one.gen and study.two.gen;
second, the estimates have the same sign – positive – in both studies.
This suggests that the results we see in study.one.gen are replicated in study.two.gen.
- Q.23. Are there any important differences between the results of the two studies?
- hint: Q.23. You can look at the estimates but you can also use the model prediction plotting code you used before; see example code in PSYC122-w18-how-to.R.
- hint: Q.23. Let’s focus on comparing the study.one.gen and study.two.gen estimates for the effect of vocabulary knowledge in both models: we can plot model predictions for comparison.
First: fit the models – using different names for the different models:
model.one <- lm(mean.acc ~ HLVA + SHIPLEY + FACTOR3, data = study.one.gen)
summary(model.one)
Call:
lm(formula = mean.acc ~ HLVA + SHIPLEY + FACTOR3, data = study.one.gen)
Residuals:
Min 1Q Median 3Q Max
-0.40322 -0.05349 0.01152 0.07128 0.18434
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.302030 0.091257 3.310 0.00115 **
HLVA 0.017732 0.003923 4.521 1.17e-05 ***
SHIPLEY 0.005363 0.002336 2.296 0.02291 *
FACTOR3 0.003355 0.001264 2.654 0.00872 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1033 on 165 degrees of freedom
Multiple R-squared: 0.2474, Adjusted R-squared: 0.2337
F-statistic: 18.08 on 3 and 165 DF, p-value: 3.423e-10
model.two <- lm(mean.acc ~ HLVA + SHIPLEY + FACTOR3, data = study.two.gen)
summary(model.two)
Call:
lm(formula = mean.acc ~ HLVA + SHIPLEY + FACTOR3, data = study.two.gen)
Residuals:
Min 1Q Median 3Q Max
-0.242746 -0.074188 0.003173 0.075361 0.211357
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.146896 0.076325 1.925 0.05597 .
HLVA 0.017598 0.003589 4.904 2.2e-06 ***
SHIPLEY 0.008397 0.001853 4.533 1.1e-05 ***
FACTOR3 0.003087 0.001154 2.675 0.00822 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.097 on 168 degrees of freedom
Multiple R-squared: 0.3636, Adjusted R-squared: 0.3522
F-statistic: 31.99 on 3 and 168 DF, p-value: < 2.2e-16
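(Optional) Before plotting, you can line up the coefficient estimates from the two models for a quick numerical comparison; a minimal base R sketch using the model.one and model.two objects fitted above:
# coefficients from the two studies, side by side, rounded for readability
round(cbind(study.one = coef(model.one), study.two = coef(model.two)), 4)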
Second, create prediction plots for the SHIPLEY effect for each model:
dat.one <- ggpredict(model.one, "SHIPLEY")
plot.one <- plot(dat.one) + labs(title = "Study One")
dat.two <- ggpredict(model.two, "SHIPLEY")
plot.two <- plot(dat.two) + labs(title = "Study Two")
Third, show the plots side-by-side:
plot.one + plot.two
- A.23. If we compare the estimates for the coefficient of the SHIPLEY effect in the study.one.gen and study.two.gen models we can see that:
- the SHIPLEY effect is significant in both study.one.gen and study.two.gen;
- the effect is positive in both studies;
- the coefficient estimate is a bit bigger in study.two.gen than in study.one.gen;
- the prediction plots suggest the prediction line slope is steeper in study.two.gen.
This suggests that the effect of vocabulary is stronger in Study Two.
- Why that is so should be the target of further investigation.