6. Week 17 – Better understanding the linear model
Written by Rob Davies
This page is now live for you to use: Welcome!
- Here is a link to the sign-in page for R-Studio Server
Week 17: Introduction
Welcome to your overview of our work together in PSYC122 Week 17.
Putting it all together
- We will complete four classes in weeks 16-19.
- These classes are designed to help you to revise and to put into practice some of the key ideas and skills you have been developing in the first year research methods modules PSYC121, PSYC123 and PSYC124.
- We will do this in the context of a live research project with potential real world impacts: the Clearly Understood project.
Our learning goals
In Week 17, we aim to further develop skills in analyzing and in visualizing psychological data.
We will do this in the context of the Clearly Understood project: our focus will be on what makes it easy or difficult for people to understand written health information.
In the Week 17 class, we will aim to answer the research question:
- What person attributes predict success in understanding?
In psychological science, research questions like our question can be examined using linear models.
When we do these analyses, we will need to think about how we report the results:
- we usually need to report information about the kind of model we specify;
- and we will need to report the nature of the association estimated in our model.
This means we will usually need to decide:
- is the association significant?
- does the association reflect a positive or negative relationship between outcome and predictor?
- is the association relatively strong or weak?
Our thinking, and our decision-making, will be helped by developing our data visualization skills. At every stage, as we work, we will visualize the data to:
- Understand the shape of the relationships we may observe or predict.
- We will aim to build skills in producing professional-looking plots for our audiences.
Lectures
Before you go on to the activities in Section 5, watch the lectures:
The lecture for this week is presented in four short parts. You can view video recordings of the lectures using Panopto, by clicking on the links below.
- Overview (19 minutes): What we are doing in Week 17 – thinking critically about predicting people.
- Using linear models to do prediction (11 minutes): How we answer live research questions by using linear models to predict behaviour.
- How linear models work (9 minutes): How we can visualize and think about predictions, and about the difference between what we predict and what we observe when we study people.
- Interpreting, reporting and visualizing linear model results (15 minutes): Identifying, interpreting and communicating the key statistics.
The slides presented in the videos can be downloaded either as a web page or as a Word document.
- The slides exactly as presented (21 MB).
- The slides converted to a Word .docx (11 MB).
You can download the web page .html file and click on it to open it in any browser (e.g., Chrome, Edge or Safari). The slide images are high quality so the file is quite big and may take a few seconds to download.
You can download the .docx file and click on it to open it as a Word document that you can then edit. Converting the slides to a .docx distorts some images but the benefit of the conversion is that it makes it easier for you to add your notes.
The lectures have two main areas of focus
1. Understanding the scientific process
I outline the steps through which a psychological scientist may progress, in logic and practice, from research questions to hypotheses to analyses.
We are learning data analysis methods. But the key point is that we use these methods in the context of a research project with concerns, aims, methodological assumptions, and choices. This is generally true so the aim is to present a concrete example of how research works.
As part of the discussion, I raise questions you might want to consider. These questions are also part of the context for our data analysis, because they help to inform how you interpret or evaluate the results. These questions are examples of the critical evaluation that you will need to develop through your studies.
2. The linear model
We look at how the linear model can be used to address research questions in the context of the Clearly Understood health comprehension project. But I aim to outline some general ideas about why we use the linear model technique, and how it works.
I build on work you have done with Margriet Groen in earlier PSYC122 classes, so that we can strengthen understanding, and extend skills.
The lectures end with a discussion of the critical information you must identify and extract when you view the summary of a linear model's results.
I then show you how to report the results. I give you examples of the conventional language you can use to report your results.
To work with the recordings:
- Watch the video parts right through.
- Use the printable versions of the slides (provided on Moodle) to make notes.
- Try out the coding exercises in the how-to guide and the activity tasks or questions (Section 5) to learn how to construct visualizations and do analyses.
Reading: Links to other classes
We do not provide further reading for this class but you will find it helpful to revise some of the key ideas you have been learning about in PSYC122 and in other modules.
- The lectures in PSYC123 on: the scientific method; reliability and validity; experimental design, especially between-subjects studies; hypothesis testing; and precise hypotheses.
- The lecture in PSYC122 on correlations.
Pre-lab activities
Pre-lab activity 1
In weeks 16-19, we will be working together on a research project to investigate how people vary in their response to health advice.
Completing the project involves collecting responses from PSYC122 students: you.
To enter your responses, we invite you to complete a short survey.
Complete the survey by clicking on the link here
In our week 19 class activity, we will analyze the data we collect here.
The survey should take about 20 minutes to complete.
Taking part in the survey is completely voluntary. You can stop at any time without completing the survey if you do not want to finish it. If you do not want to do the survey, you can do an alternative activity (see below).
All responses will be recorded completely anonymously.
Pre-lab activity alternative option
If you do not want to complete the survey, we invite you to read the pre-registered research plan for the PSYC122 health advice research project.
Lab activities
Introduction
We will do our practical lab work to develop your skills in the context of the Clearly Understood project.
- Our focus will be on what makes it easy or difficult for people to understand written health information.
In these classes, we will complete a research project to answer the research questions:
- What person attributes predict success in understanding health information?
- Can people accurately evaluate whether they correctly understand written health information?
Get ready
Download the data
Click on the link: 122_week17_for_students.zip to download the data files folder. Then upload the contents to the new folder you created in RStudio Server.
The downloadable .zip folder includes the data files:
study-one-general-participants.csv
study-two-general-participants.csv
and the R Markdown `.Rmd`:
2023-24-PSYC122-w17-how-to.Rmd
If you can’t upload these files to the server – this affects some students – you can use some code to get R to do it for you: uncover the code box below to reveal the code to do this.
- You can use the code below to directly download the file you need in this lab activity to the server.
- Remember that you can copy the code to your clipboard by clicking on the ‘clipboard’ in the top right corner.
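- Get the `study-one-general-participants.csv` data
download.file("https://github.com/lu-psy-r/statistics_for_psychologists/blob/main/PSYC122/data/week17/study-one-general-participants.csv?raw=true", destfile = "study-one-general-participants.csv")
- Get the `study-two-general-participants.csv` data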
download.file("https://github.com/lu-psy-r/statistics_for_psychologists/blob/main/PSYC122/data/week17/study-two-general-participants.csv?raw=true", destfile = "study-two-general-participants.csv")
- Get the `2023-24-PSYC122-w17-how-to.Rmd` how-to guide
download.file("https://github.com/lu-psy-r/statistics_for_psychologists/blob/main/PSYC122/data/week17/2023-24-PSYC122-w17-how-to.Rmd?raw=true", destfile = "2023-24-PSYC122-w17-how-to.Rmd")
Check: What is in the data files?
Each of the data files we will work with has a similar structure, as you can see in this extract.
participant_ID | mean.acc | mean.self | study | AGE | SHIPLEY | HLVA | FACTOR3 | QRITOTAL | GENDER | EDUCATION | ETHNICITY |
---|---|---|---|---|---|---|---|---|---|---|---|
studytwo.1 | 0.4107143 | 6.071429 | studytwo | 26 | 27 | 6 | 50 | 9 | Female | Higher | Asian |
studytwo.10 | 0.6071429 | 8.500000 | studytwo | 38 | 24 | 9 | 58 | 15 | Female | Secondary | White |
studytwo.100 | 0.8750000 | 8.928571 | studytwo | 66 | 40 | 13 | 60 | 20 | Female | Higher | White |
studytwo.101 | 0.9642857 | 8.500000 | studytwo | 21 | 31 | 11 | 59 | 14 | Female | Higher | White |
You can use the scroll bar at the bottom of the data window to view different columns.
You can see the columns:
- `participant_ID`: participant code;
- `mean.acc`: average accuracy of response to questions testing understanding of health guidance (varies between 0-1);
- `mean.self`: average self-rated accuracy of understanding of health guidance (varies between 1-9);
- `study`: variable coding for what study the data were collected in;
- `AGE`: age in years;
- `HLVA`: health literacy test score (varies between 1-16);
- `SHIPLEY`: vocabulary knowledge test score (varies between 0-40);
- `FACTOR3`: reading strategy survey score (varies between 0-80);
- `GENDER`: gender code;
- `EDUCATION`: education level code;
- `ETHNICITY`: ethnicity (Office for National Statistics categories) code.
It is always a good idea to view the dataset – click on the name of the dataset in the R-Studio `Environment` window, and check out the columns, scroll through the rows – to get a sense of what you are working with.
Lab activity 1: Work with the How-to guide
The `how-to` guide comprises an .Rmd file:
`2023-24-PSYC122-w17-how-to.Rmd`
It is full of advice and example code.
The code in the `how-to` guide was written to work with the data file:
`study-one-general-participants.csv`.
We show you how to do everything you need to do in the lab activity (Section 5.4, next) in the `how-to` guide.
- Start by looking at the `how-to` guide to understand what steps you need to follow in the lab activity.
We will take things step-by-step.
We split .Rmd scripts by steps, tasks and questions:
- different steps for different phases of the analysis workflow;
- different tasks for different things you need to do;
- different questions to examine different ideas or coding challenges.
- Make sure you start at the top of the `.Rmd` file and work your way, in order, through each task.
- Complete each task before you move on to the next task.
In the activity Section 5.4, we are going to work through the following tasks.
- Notice that we are gradually building up our skills: consolidating what we know; revising important learning; and extending ourselves to acquire new skills.
- Over time, we will refer less and less to what we have learned before.
Step 1: Set-up
- Empty the R environment – using `rm(list=ls())`
- Load relevant libraries – using `library()`
Step 2: Load the data
- Read in the data file – using `read_csv()`
- Inspect the data – using `head()` and `summary()`
Step 3: Use histograms to examine the distributions of variables
- Draw histograms to examine the distributions of variables – using `ggplot()` and `geom_histogram()`
- Practice editing the appearance of a histogram plot step-by-step
Step 4: Now draw scatterplots to examine associations between variables
- Create scatterplots to examine the association between some variables
- Edit the appearance of each plot step-by-step
Step 5: Use correlation to answer the research questions
- Examine the correlations between the outcome variable and predictor variables
Step 6: Use a linear model to answer the research questions
- Examine the relation between outcome mean accuracy (`mean.acc`) and each of the potential predictors: `SHIPLEY`, `HLVA` and `AGE`
Step 7: Use a linear model to generate predictions
- Use a model we have fitted to plot model predictions
If you are unsure about what you need to do, look at the advice in `2023-24-PSYC122-w17-how-to.Rmd` on how to do the tasks, with examples on how to write the code.
You will see that you can match a task in the activity Section 5.4 to the same task in the `how-to` guide. The `how-to` shows you what function you need and how you should write the function code.
This process of adapting demonstration code is a process critical to data literacy and to effective problem solving in modern psychological science.
Don’t forget: You will need to change the names of the dataset or the variables to complete the tasks in the activity.
Lab activity 2
OK: now let’s do it!
In the following, we will guide you through the tasks and questions step by step. You will learn more if you follow this advice:
- We will hide the code to do some tasks behind a drop-down button. Try to write and run the code for yourself first.
- We won’t always give you the code required to do something: this gives you the chance to check what you have learned by trying out your code without the answer in front of you.
- We will not at first give you the answers to questions about the data or about the results of analyses.
- An answers version of the workbook will be provided after the last lab session (check the answers then in Section 6) so that you can check whether your independent work has been correct.
Questions
Step 1: Set-up
To begin, we set up our environment in R.
Task 1 – Run code to empty the R environment
rm(list=ls())
Task 2 – Run code to load relevant libraries
library("tidyverse")
Step 2: Load the data
Task 3 – Read in the data file we will be using
The data file for the workbook is called:
study-two-general-participants.csv
Use the `read_csv()` function to read the data file into R.
study.two.gen <- read_csv("study-two-general-participants.csv")
When you code this, you can choose your own name for the data object, but be sure to give the object you create a distinct name e.g. `study.two.gen`.
Task 4 – Inspect the data file
Use the `summary()` or `head()` functions to take a look.
summary(study.two.gen)
Even though you have done this before, you will want to do it again, here.
- Pay attention to what you see, for the numeric variables, in the information about minimum (Min.) and maximum (Max.) values.
Step 3: Use histograms to examine the distributions of variables
Revise: make sure you are confident about doing these things
Task 5 – Draw histograms to examine the distributions of variables
Use `ggplot()` with `geom_histogram()`.
When we create a plot, we take things step-by-step.
Here’s an example you have seen before: run the lines of code and see the result in the `Plots` window in `R-Studio`.
ggplot(data = study.two.gen, aes(x = mean.acc)) +
geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
These are the steps, set out one at a time:
- `ggplot(...)` you tell R you want to make a plot using the `ggplot()` function
- `ggplot(data = study.two.gen ...)` you tell R you want to make a plot with the `study.two.gen` data
- `ggplot(..., aes(x = mean.acc))` you tell R that you want to make a plot with the variable `mean.acc`: here, you specify the aesthetic mapping, `x = mean.acc`
- `ggplot(...) + geom_histogram()` you tell R you want to plot values of `mean.acc` as a histogram
Notice that the code works the same whether we have the different bits of code on the same line or in a series of lines.
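One way to see the step-by-step logic is to store the plot in an object and then add a layer to it. This is a minimal sketch, assuming the `ggplot2` library is installed (it is loaded as part of `tidyverse`). The dataset `demo.data` and the object names `base.plot` and `hist.plot` are made up for illustration, and the values are simulated: they are not the real study data.

```r
library(ggplot2)

# Made-up data standing in for study.two.gen (simulated values, not real data)
set.seed(122)
demo.data <- data.frame(mean.acc = runif(50, min = 0, max = 1))

# Step 1: tell R which data and which variable to use; the plot is stored, not drawn
base.plot <- ggplot(data = demo.data, aes(x = mean.acc))

# Step 2: add the histogram layer to the stored plot
hist.plot <- base.plot + geom_histogram(binwidth = 0.1)

# Printing the object draws the plot
print(hist.plot)
```

Writing the plot as one chained expression, or building it up in stored steps like this, should give you the same histogram.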
Task 6 – Practice editing the appearance of a histogram plot step-by-step
Start by constructing a basic histogram.
- Draw a histogram plot to visualize the distribution of whichever numeric variable from the `study.two.gen` dataset you please.
- Use the line-by-line format to break the plot code into steps.
- It will make it easier to read, and it will make it easier to add edits.
- Pick a numeric variable in the dataset.
- Run the code to produce a histogram of the variable values.
Can you work out how to do it without looking at the code example?
Click on the button to see the code example: compare it to the code you wrote.
ggplot(data = study.two.gen, aes(x = SHIPLEY)) +
geom_histogram()
We are going to revise editing:
- The appearance of the bars using `binwidth`;
- The colour of the background using `theme_bw()`;
- The appearance of the labels using `labs()`.
Then we are going to try some new moves:
- Setting the x-axis limits to reflect the full range of possible scores on the x-axis variable;
- Adding annotation – here, a vertical line – indicating the sample average for a variable.
Q.1. Edit the appearance of the bars by specifying a `binwidth` value.
ggplot(data = study.two.gen, aes(x = SHIPLEY)) +
geom_histogram(binwidth = 2)
Q.2. Then add an edit to the appearance of the background using `theme_bw()`.
ggplot(data = study.two.gen, aes(x = SHIPLEY)) +
geom_histogram(binwidth = 2) +
theme_bw()
Q.3. Then add an edit to the appearance of the labels using `labs()`.
ggplot(data = study.two.gen, aes(x = SHIPLEY)) +
geom_histogram(binwidth = 2) +
theme_bw() +
labs(x = "SHIPLEY", y = "frequency count")
Introduce: make some new moves
Q.4. Now add an edit by setting the x-axis limits using `xlim()`.
ggplot(data = study.two.gen, aes(x = SHIPLEY)) +
geom_histogram(binwidth = 2) +
theme_bw() +
labs(x = "Vocabulary (SHIPLEY)", y = "frequency count") +
xlim(0,40)
Q.5. Then add an edit to draw a vertical line to show the mean value of the variable you are plotting.
ggplot(data = study.two.gen, aes(x = SHIPLEY)) +
geom_histogram(binwidth = 2) +
theme_bw() +
labs(x = "Vocabulary (SHIPLEY)", y = "frequency count") +
xlim(0,40) +
geom_vline(xintercept = mean(study.two.gen$SHIPLEY), colour = "red", size = 1.5)
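If the variable you are plotting includes missing values, `mean()` returns `NA` unless you ask it to ignore them, and then there is no value at which to draw the line. The sketch below shows one way to handle this: calculate the mean first, store it, then use the stored value. It assumes a made-up dataset, `demo.data`, with simulated `SHIPLEY` scores and one missing value; we do not know whether the real data include missing values, and the object names are hypothetical.

```r
library(ggplot2)

# Made-up data (simulated, not the real study data) with one missing score
set.seed(122)
demo.data <- data.frame(SHIPLEY = c(sample(20:40, 49, replace = TRUE), NA))

# With a missing value present, the plain mean is NA
plain.mean <- mean(demo.data$SHIPLEY)

# na.rm = TRUE tells mean() to ignore missing values
shipley.mean <- mean(demo.data$SHIPLEY, na.rm = TRUE)

# Use the stored mean to position the vertical line
vline.plot <- ggplot(data = demo.data, aes(x = SHIPLEY)) +
  geom_histogram(binwidth = 2) +
  theme_bw() +
  geom_vline(xintercept = shipley.mean, colour = "red")
```

Storing the mean in a named object also makes it easy to check the value before you draw the plot.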
Q.6. Can you find information on how to define the limits on the x-axis and on the y-axis?
You can see the information in this week’s `how-to` guide but try a search online for “ggplot reference xlim”.
Q.7. Can you find information on how to add a reference line?
You can see the information in this week’s how-to but try a search online for “ggplot reference vline”.
Step 4: Now draw scatterplots to examine associations between variables
Consolidation: should be no surprises here
But if you want to remind yourself how to do things, click on the box to reveal hint information.
- We are working with `geom_point()` and you need x and y aesthetic mappings.
- The outcome variable `mean.acc` has to be mapped to the y-axis using `...y = ...`
This is how the target scatterplot code works.
ggplot(data = data.set, aes(x = predictor.variable, y = outcome.variable)) +
geom_point()
The plot code moves through the following steps:
- `ggplot(...)` make a plot;
- `ggplot(data = data.set, ...)` working with the `data.set`, using the name you gave the dataset you are working with;
- `ggplot(...aes(x = predictor.variable, y = outcome.variable))` using two aesthetic mappings:
  - `x = predictor.variable` maps values of the `predictor.variable` (whatever it is called) to x-axis (horizontal, left to right) positions;
  - `y = outcome.variable` maps values of the `outcome.variable` (whatever it is called) to y-axis (vertical, bottom to top) positions;
- `geom_point()` show the mappings as points.
Task 7 – Create a scatterplot to examine the association between some variables
Create three scatterplots to visualize the relationship between (1.) the outcome `mean.acc` and (2.) each of three numeric potential predictor variables `SHIPLEY`, `HLVA` and `AGE`.
Check first if you can write the code you need to produce each scatterplot. Click on the button to see the code example: compare it to the code you wrote.
- Notice that R will tolerate some variation in how code is written.
- This is like any language where we can ask for the same thing in different ways.
Check out the example code for each of the scatterplots we are asking you to do.
- Notice what changes and what stays the same.
ggplot(data = study.two.gen, aes(x = SHIPLEY, y = mean.acc)) +
geom_point()
ggplot(data = study.two.gen, aes(x = HLVA, y = mean.acc)) +
geom_point()
ggplot(data = study.two.gen, aes(x = AGE, y = mean.acc)) +
geom_point()
Revise: make sure you are confident about doing these things
Task 8 – Edit the appearance of each plot step-by-step
- You may want to use the same plot appearance choices for all plots.
- Producing plots with a consistent appearance will make it easier for your audience to read your plots.
- You can find links to reference information on options in the how-to guide.
- Use the information to make the plots pleasing in appearance to you.
- Do not be afraid to select, copy then paste code to re-use it and save yourself the effort of typing out the code over and over again.
- But be careful to make sure that you change variable names, and that things like axis values are sensible for each variable.
Q.8. First, edit the appearance of the points using `alpha`, `size`, `shape`, and `colour`.
Check out the example code for each of the scatterplots.
- Notice what changes and what stays the same.
ggplot(data = study.two.gen, aes(x = SHIPLEY, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square')
ggplot(data = study.two.gen, aes(x = HLVA, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square')
ggplot(data = study.two.gen, aes(x = AGE, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square')
Q.9. Then edit the colour of the background using `theme_bw()`.
Check out the example code for each of the scatterplots.
- Notice what changes and what stays the same.
ggplot(data = study.two.gen, aes(x = SHIPLEY, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
theme_bw()
ggplot(data = study.two.gen, aes(x = HLVA, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
theme_bw()
ggplot(data = study.two.gen, aes(x = AGE, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
theme_bw()
Q.10. Then edit the appearance of the labels using `labs()`.
Check out the example code for each of the scatterplots.
- Notice what changes and what stays the same.
ggplot(data = study.two.gen, aes(x = SHIPLEY, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
theme_bw() +
labs(x = "SHIPLEY", y = "mean accuracy")
ggplot(data = study.two.gen, aes(x = HLVA, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
theme_bw() +
labs(x = "HLVA", y = "mean accuracy")
ggplot(data = study.two.gen, aes(x = AGE, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
theme_bw() +
labs(x = "Age (Years)", y = "mean accuracy")
Introduce: make some new moves
Q.11. Then set the x-axis and y-axis limits to the minimum-maximum ranges of the variables you are plotting.
You can set axis limits by adding the `xlim()` and `ylim()` function calls to the chunk of code you have written to produce each plot.
- The task, here, is to work out what numeric values you should enter inside `xlim()` or `ylim()`.
- The code works if you enter values like this: `xlim(minimum, maximum)` and `ylim(minimum, maximum)`.
- Where `minimum, maximum` are the numbers representing the smallest possible (minimum) and largest possible (maximum) value for each variable.
- For these plots the y-axis limits will be the same because the outcome stays the same.
- But the x-axis limits will be different for each different predictor variable.
- Check out the information in the `summary()` you got of the dataset.
- The minimum values for the variables will often (not always) be 0 e.g. if you are looking at data from ability tests and people who do the tests can get 0 (because none of their responses are correct). However, if you are looking at e.g. ratings data then the minimum value could be 1 (e.g. because people are asked to rate something on a scale from 1-9).
- The maximum value for a variable is not necessarily the largest value recorded for that variable in the sample dataset you are working with (e.g., because nobody in your sample got all test questions correct). Thus, you need to use information about measurement design, see Section 5.2.2.
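Before you set limits, it can help to compare the range of values you observed with the range of values that is possible. This matters because `ggplot2` drops, with a warning, any values that fall outside the limits you give to `xlim()` or `ylim()`. The sketch below is a hedged illustration: `demo.data` holds simulated scores (not the real data), and the possible range of 0 to 16 is assumed here for the example.

```r
# Made-up data (simulated, not the real study data): scores between 3 and 14
set.seed(122)
demo.data <- data.frame(HLVA = sample(3:14, 100, replace = TRUE))

# The smallest and largest values actually observed in the sample
observed.range <- range(demo.data$HLVA)

# The smallest and largest values possible, given how the test is scored
# (assumed for this example)
possible.range <- c(0, 16)

# Count how many observed values would fall outside the chosen limits:
# any such values would be dropped from the plot
n.outside <- sum(demo.data$HLVA < possible.range[1] |
                 demo.data$HLVA > possible.range[2])

observed.range
n.outside
```

If `n.outside` is greater than 0, the limits are too narrow for the data and some points would go missing from the plot.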
Check first if you can write the code you need to produce each scatterplot. Click on the button to see the code example: compare it to the code you wrote.
Check out the example code for each of the scatterplots.
- Notice what changes and what stays the same.
ggplot(data = study.two.gen, aes(x = SHIPLEY, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
theme_bw() +
labs(x = "SHIPLEY", y = "mean accuracy") +
xlim(0, 40) + ylim(0, 1)
ggplot(data = study.two.gen, aes(x = HLVA, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
theme_bw() +
labs(x = "HLVA", y = "mean accuracy") +
xlim(0, 16) + ylim(0, 1)
ggplot(data = study.two.gen, aes(x = AGE, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
theme_bw() +
labs(x = "Age (Years)", y = "mean accuracy") +
xlim(0, 80) + ylim(0, 1)
Step 5: Use correlation to answer the research questions
Revise: make sure you are confident about doing these things
One of our research questions is:
- What person attributes predict success in understanding?
Task 9 – Examine the correlations between the outcome variable and predictor variables
You need to run three separate correlations:
- between mean accuracy and `SHIPLEY`;
- between mean accuracy and `HLVA`;
- between mean accuracy and `AGE`.
- Use `cor.test()` to do the correlation analysis.
- You can look at the `how-to` guide or review previous materials (e.g. the Week 12 and Week 16 lab activities) for more advice.
Check first if you can write the code you need to complete each correlation analysis. Click on the button to see the code example: compare it to the code you wrote.
Check out the example code for doing each of the correlation analyses.
- Notice what changes and what stays the same.
cor.test(study.two.gen$SHIPLEY, study.two.gen$mean.acc, method = "pearson", alternative = "two.sided")
cor.test(study.two.gen$HLVA, study.two.gen$mean.acc, method = "pearson", alternative = "two.sided")
cor.test(study.two.gen$AGE, study.two.gen$mean.acc, method = "pearson", alternative = "two.sided")
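If you store the result of `cor.test()` in an object, you can pick out the key statistics by name instead of reading them off the printed output. The sketch below is one way to do this, using simulated data: `demo.data` and `hlva.cor` are made-up names, and the numbers will not match the real study results.

```r
# Made-up data (simulated, not the real study data)
set.seed(122)
n <- 100
demo.data <- data.frame(HLVA = sample(1:16, n, replace = TRUE))
demo.data$mean.acc <- 0.5 + 0.02 * demo.data$HLVA + rnorm(n, sd = 0.1)

# Store the result of the correlation test
hlva.cor <- cor.test(demo.data$HLVA, demo.data$mean.acc,
                     method = "pearson", alternative = "two.sided")

hlva.cor$estimate    # r, the correlation coefficient
hlva.cor$statistic   # t, the test statistic
hlva.cor$parameter   # df, the degrees of freedom (n - 2)
hlva.cor$p.value     # p, the probability value for the significance test
```

The same four pieces of information appear in the printed output, so you can use either approach to answer the questions that follow.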
Now use the results from the correlations to answer the following questions.
Q.12. What is `r`, the coefficient for the correlation between `mean.acc` and `SHIPLEY`?
Q.13. Is the correlation between `mean.acc` and `HLVA` significant?
Q.14. What are the values for t and p for the significance test for the correlation between `mean.acc` and `AGE`?
Q.15. For which pair of outcome-predictor variables is the correlation the largest?
Q.16. What is the sign or direction of each of the correlations?
Step 6: Use a linear model to answer the research questions
Introduce: Make some new moves
One of our research questions is:
- What person attributes predict success in understanding?
Task 10 – Examine the relation between outcome mean accuracy (`mean.acc`) and each of the predictors: `SHIPLEY`, `HLVA` and `AGE`
You need to run three separate `lm()` analyses:
- with mean accuracy as the outcome and `SHIPLEY` as the predictor;
- with mean accuracy as the outcome and `HLVA` as the predictor;
- with mean accuracy as the outcome and `AGE` as the predictor.
You need to use `lm()` to do the analyses.
- Be careful to identify the outcome and predictor variables correctly.
- Remember that analysis code is arranged like this:
lm(outcome.variable ~ predictor.variable, data = data.set)
With:
- `lm()` asking R to do the linear model analysis;
- the `outcome.variable ~ ...` specified on the left of the `~`;
- the `... ~ predictor.variable` specified on the right of the `~`;
- and `data.set` identifying to R what dataset you are working with.
Notice that R has a general formula syntax:
outcome ~ predictor *or* y ~ x
- and uses the same format across a number of different functions;
- each time, the left of the tilde symbol `~` is some output or outcome;
- and the right of the tilde `~` is some input or predictor or set of predictors.
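To illustrate the point that the formula syntax is shared across functions, here is a short sketch using three base R functions that all accept `outcome ~ predictor`. The dataset `demo.data`, its variables `x`, `y` and `group`, and the object names are made up for this example, and the values are simulated.

```r
# Made-up data (simulated): an outcome y, a numeric predictor x, and two groups
set.seed(122)
n <- 60
demo.data <- data.frame(
  group = rep(c("A", "B"), each = n / 2),
  x = rnorm(n)
)
demo.data$y <- 1 + 0.5 * demo.data$x + rnorm(n)

# Linear model: outcome y predicted by x
demo.fit <- lm(y ~ x, data = demo.data)

# t-test: outcome y compared between the two groups
demo.ttest <- t.test(y ~ group, data = demo.data)

# Summary table: mean of outcome y for each group
group.means <- aggregate(y ~ group, data = demo.data, FUN = mean)

coef(demo.fit)
group.means
```

In each call, the outcome sits on the left of the `~` and the predictor or grouping variable sits on the right.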
Check first if you can write the code you need to complete each linear model analysis. Click on the button to see the code example: compare it to the code you wrote.
Check out the example code for each of the models.
- Notice what changes and what stays the same.
model.1 <- lm(mean.acc ~ SHIPLEY, data = study.two.gen)
summary(model.1)
model.2 <- lm(mean.acc ~ HLVA, data = study.two.gen)
summary(model.2)
model.3 <- lm(mean.acc ~ AGE, data = study.two.gen)
summary(model.3)
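The coefficient estimates, t values and p values you see in a model summary can also be pulled out as a table, which can be handy when you come to report them. The sketch below uses simulated data (`demo.data`, `demo.model` and `coef.table` are made-up names, and the results will not match the real study). It also checks a general property of models with a single predictor: the t value for the slope is the same as the t value from the Pearson correlation test for the same pair of variables.

```r
# Made-up data (simulated, not the real study data)
set.seed(122)
n <- 100
demo.data <- data.frame(HLVA = sample(1:16, n, replace = TRUE))
demo.data$mean.acc <- 0.5 + 0.02 * demo.data$HLVA + rnorm(n, sd = 0.1)

# Fit the model, then extract the table of coefficients from the summary
demo.model <- lm(mean.acc ~ HLVA, data = demo.data)
coef.table <- coef(summary(demo.model))

coef.table["HLVA", "Estimate"]   # slope estimate for the predictor
coef.table["HLVA", "t value"]    # t for the significance test
coef.table["HLVA", "Pr(>|t|)"]   # p for the significance test

# With one predictor, the slope t value matches the correlation test t value
demo.cor <- cor.test(demo.data$HLVA, demo.data$mean.acc)
same.t <- all.equal(unname(demo.cor$statistic), coef.table["HLVA", "t value"])
same.t
```

This is one reason why the correlation results and the linear model results for the same outcome and predictor tell a consistent story about significance.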
If you look at the model summary you can answer the following questions.
Q.17. What is the estimate for the coefficient of the effect of the predictor `HLVA` on `mean.acc`?
Q.18. Is the effect significant?
Q.19. What are the values for t and p for the significance test for the coefficient?
Q.20. How would you describe in words the shape or direction of the association between `HLVA` and `mean.acc`?
Q.21. How would you describe the relations apparent between the predictor and outcome in all three models?
Step 7: Use a linear model to generate predictions
Introduce: Make some new moves
One of our research questions is:
- What person attributes predict success in understanding?
Task 11 – We can use the model we have just fitted to plot the model predictions
We are going to draw a scatterplot and add a line.
- The line will show the model predictions, given the model intercept and effect coefficient estimates.
You can see reference information here:
First fit a model and get the summary: model the relationship between `mean.acc` and `HLVA`.
Check first if you can write the code you need to complete the linear model analysis. Click on the button to see the code example: compare it to the code you wrote.
model <- lm(mean.acc ~ HLVA, data = study.two.gen)
summary(model)
You will need to record some information from the model summary so you can use it next.
Q.22. What is the coefficient estimate for the intercept?
Q.23. What is the coefficient estimate for the slope of `HLVA` (see earlier)?
Second, draw a prediction plot, using `geom_point()` to draw a scatterplot and using the `geom_abline()` function to draw the prediction line representing the association between this outcome and predictor.
Check first if you can write the code you need to produce the prediction plot. Click on the button to see the code example: compare it to the code you wrote.
ggplot(data = study.two.gen, aes(x = HLVA, y = mean.acc)) +
geom_point(alpha = 0.5, size = 2, colour = "blue", shape = 'square') +
geom_abline(intercept = 0.522016, slope = 0.026207,
colour = "red", size = 1.5) +
theme_bw() +
labs(x = "HLVA", y = "mean accuracy") +
xlim(0, 16) + ylim(0, 1)
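The prediction line is just the model equation drawn on the plot: predicted outcome = intercept + slope × predictor. The sketch below works through this with simulated data, so the estimates will differ from those in the plot code above; `demo.data`, `demo.model`, `by.hand` and `by.predict` are made-up names. It calculates a prediction by hand and checks it against the base R `predict()` function.

```r
# Made-up data (simulated, not the real study data)
set.seed(122)
n <- 100
demo.data <- data.frame(HLVA = sample(1:16, n, replace = TRUE))
demo.data$mean.acc <- 0.5 + 0.02 * demo.data$HLVA + rnorm(n, sd = 0.1)

demo.model <- lm(mean.acc ~ HLVA, data = demo.data)

# Pull the two coefficient estimates out of the fitted model
intercept <- coef(demo.model)["(Intercept)"]
slope <- coef(demo.model)["HLVA"]

# Prediction by hand for a person with an HLVA score of 10
by.hand <- intercept + slope * 10

# The same prediction using predict()
by.predict <- predict(demo.model, newdata = data.frame(HLVA = 10))

unname(by.hand)
unname(by.predict)
```

With the estimates used in the plot code above (intercept 0.522016, slope 0.026207), the predicted mean accuracy for a person with an HLVA score of 10 would be 0.522016 + 0.026207 × 10 = 0.784086. Stored values like `intercept` and `slope` could also be given to the `intercept` and `slope` arguments of `geom_abline()` in place of numbers typed in by hand.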
You have now completed the Week 17 questions.
Predicting human behaviour is at the heart of:
- Psychological science, and our collective attempt to understand ourselves.
- Behavioural analytics, and the ways businesses work with what we know about people.
This is an important step in your developmental journey: Well done!
- We will continue to deepen and extend your skills and understanding but everything builds on the key lessons we have been learning here.
Answers
When you have completed all of the lab content, you may want to check your answers with our completed version of the script for this week.
The `.Rmd` script containing all code and all answers for each task and each question will be made available after the final lab session has taken place.
You can download the script by clicking on the link: 2023-24-PSYC122-w17-workbook-answers.Rmd when it is available.
Or by copying the code into the R
Console
window and running it to get the2023-24-PSYC122-w17-workbook-answers.Rmd
loaded directly into R:
download.file("https://github.com/lu-psy-r/statistics_for_psychologists/blob/main/PSYC122/data/2023-24-PSYC122-w17-workbook-answers.Rmd?raw=true", destfile = "2023-24-PSYC122-w17-workbook-answers.Rmd")
We set out answers information for the Week 17 Better understanding the linear model questions, below.
- We focus on the Lab activity 2 questions where we ask you to interpret something or say something.
- We do not show questions where we have given example or target code in the foregoing lab activity Section 5.4.
You can see all the code and all the answers in 2023-24-PSYC122-w17-workbook-answers.Rmd.
Answers
Click on a box to reveal the answer.
Questions
Q.6. Can you find information on how to define the limits on the x-axis and on the y-axis?
You can see the information in this week’s how-to guide but try a search online for “ggplot reference xlim”.
Q.7. Can you find information on how to add a reference line?
You can see the information in this week’s how-to guide but try a search online for “ggplot reference vline”.
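As a starting point for that search, here is a minimal sketch of both ideas, assuming the ggplot2 library is available. It uses a small made-up data frame (toy.data, with invented values) rather than the study data: xlim() and ylim() set the axis limits, and geom_vline() adds a vertical reference line, here placed at the mean of the toy HLVA scores.

```r
library(ggplot2)

# Hypothetical toy data standing in for study.two.gen (values are invented)
toy.data <- data.frame(
  HLVA     = c(2, 4, 6, 8, 10, 12),
  mean.acc = c(0.55, 0.62, 0.70, 0.71, 0.80, 0.85)
)

p <- ggplot(data = toy.data, aes(x = HLVA, y = mean.acc)) +
  geom_point() +
  # vertical reference line at the mean HLVA score
  geom_vline(xintercept = mean(toy.data$HLVA), colour = "red", linetype = "dashed") +
  # axis limits: x from 0 to 15, y from 0 to 1
  xlim(0, 15) + ylim(0, 1) +
  theme_bw()

p
```

A horizontal reference line works the same way with geom_hline(yintercept = ...).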
One of our research questions is:
- What person attributes predict success in understanding?
Examine the correlations between the outcome variable and predictor variables.
You need to run three separate correlations:
- between mean accuracy and SHIPLEY;
- between mean accuracy and HLVA;
- between mean accuracy and AGE.
Check out the example code for doing each of the correlation analyses.
cor.test(study.two.gen$SHIPLEY, study.two.gen$mean.acc, method = "pearson", alternative = "two.sided")
cor.test(study.two.gen$HLVA, study.two.gen$mean.acc, method = "pearson", alternative = "two.sided")
cor.test(study.two.gen$AGE, study.two.gen$mean.acc, method = "pearson", alternative = "two.sided")
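The printed output of cor.test() contains everything you need, but it can help to know where each number lives. The sketch below is an illustration under stated assumptions: it stores the test result in an object (result, our name) and uses a small made-up data frame (toy.data, with invented values) in place of study.two.gen so that it runs on its own.

```r
# Hypothetical toy data standing in for study.two.gen (values are invented)
toy.data <- data.frame(
  HLVA     = c(2, 4, 6, 8, 10, 12),
  mean.acc = c(0.55, 0.62, 0.70, 0.71, 0.80, 0.85)
)

# Same call as in the examples, but saved to an object
result <- cor.test(toy.data$HLVA, toy.data$mean.acc,
                   method = "pearson", alternative = "two.sided")

result$estimate   # r, the correlation coefficient
result$statistic  # t value for the significance test
result$parameter  # degrees of freedom (n - 2)
result$p.value    # p value
```

The sign of result$estimate gives the direction of the correlation, and its absolute size gives the strength.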
Now use the results from the correlations to answer the following questions.
Q.12. What is r, the coefficient for the correlation between mean.acc and SHIPLEY?
Q.13. Is the correlation between mean.acc and HLVA significant?
Q.14. What are the values for t and p for the significance test for the correlation between mean.acc and AGE?
Q.15. For which pair of outcome-predictor variables is the correlation the largest?
Q.16. What is the sign or direction of each of the correlations?
Examine the relation between outcome mean accuracy (mean.acc) and each of the predictors: SHIPLEY, HLVA and AGE.
You need to run three separate lm() analyses:
- with mean accuracy as the outcome and SHIPLEY as the predictor;
- with mean accuracy as the outcome and HLVA as the predictor;
- with mean accuracy as the outcome and AGE as the predictor.
Check out the example code for each of the models.
model.1 <- lm(mean.acc ~ SHIPLEY, data = study.two.gen)
summary(model.1)
model.2 <- lm(mean.acc ~ HLVA, data = study.two.gen)
summary(model.2)
model.3 <- lm(mean.acc ~ AGE, data = study.two.gen)
summary(model.3)
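The estimate, t and p values you need sit in the coefficients table of each summary. If you want to look at that table on its own, one option is coef(summary(...)). This is a sketch, not the workbook answer: it fits the model to a small made-up data frame (toy.data, with invented values) so that it runs without study.two.gen, and the names toy.model and coef.table are ours.

```r
# Hypothetical toy data standing in for study.two.gen (values are invented)
toy.data <- data.frame(
  HLVA     = c(2, 4, 6, 8, 10, 12),
  mean.acc = c(0.55, 0.62, 0.70, 0.71, 0.80, 0.85)
)

toy.model <- lm(mean.acc ~ HLVA, data = toy.data)

# coef(summary(...)) returns the coefficients table as a matrix with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)"
coef.table <- coef(summary(toy.model))
coef.table

coef.table["HLVA", "Estimate"]   # estimate of the slope for HLVA
coef.table["HLVA", "t value"]    # t for the significance test
coef.table["HLVA", "Pr(>|t|)"]   # p for the significance test
```

A positive estimate means that higher predictor scores go with higher predicted outcome scores; a negative estimate means the reverse.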
If you look at the model summary you can answer the following questions.
Q.17. What is the estimate for the coefficient of the effect of the predictor HLVA on mean.acc?
Q.18. Is the effect significant?
Q.19. What are the values for t and p for the significance test for the coefficient?
Q.20. How would you describe in words the shape or direction of the association between HLVA and mean.acc?
Q.21. How would you describe the relations apparent between the predictor and outcome in all three models?
We are going to draw a scatterplot and add a line.
- The line will show the model predictions, given the model intercept and effect coefficient estimates.
First fit a model and get the summary: model the relationship between mean.acc and HLVA.
You will need to record some information from the model summary so you can use it next.
Q.22. What is the coefficient estimate for the intercept?
Q.23. What is the coefficient estimate for the slope of HLVA (see earlier)?
Online Q&A
You will find, below, a link to the video recording of the Week 17 online Q&A after it has been completed.