Week 18 Developing linear mixed-effects models: workbook overview
Welcome to your overview of the work we will do together in Week 18.
We have been learning to:
Recognize the situations where we shall see multilevel structured data and therefore where we will need to apply multilevel or mixed-effects models.
Understand the nature and the advantages of these models: what they are, and why they work better than other kinds of models, given multilevel data.
Practice how we code mixed-effects models, and how we read and write about the results.
We now need to develop our understanding and skills further, and to examine some of the complexities that we may face when we work with mixed-effects models.
The key idea
Shrinkage or regularization means that models of data should be excited by the data but not too excited.
Targets
We have three capacities that we are seeking to develop this week:
to understand mixed-effects models;
to work with these models practically or efficiently in R;
and to communicate their results effectively (to ourselves and others).
Targets
You can test your development against this checklist of targets for learning.
We want to develop the capacity to understand mixed-effects models, the capacity to:
recognize where data have a multilevel structure;
recognize where multilevel or mixed-effects models are required;
distinguish the elements of a mixed-effects model, including fixed effects and random effects;
explain how random effects can be understood in terms of random differences (or deviations) between groups or classes or individuals, in intercepts or slopes;
explain how random effects can be understood in terms of variances, as a means to account for random differences between groups or classes or individuals in intercepts or slopes;
explain how mixed-effects models work better than linear models, for multilevel structured data;
explain how mixed-effects models work better because they allow partial-pooling of estimates.
We want to develop the capacity to work practically with mixed-effects models in R, the capacity to:
specify a mixed-effects model in lmer() code;
identify how the mixed-effects model code varies, depending on the kinds of random effects that are assumed;
identify the elements of the output or results that come from an lmer() mixed-effects analysis;
interpret the fixed-effects estimates;
interpret the random effects estimates, including both the variance and covariance estimates.
We want to develop the capacity to communicate the results of mixed-effects models effectively, to ourselves and to others, the capacity to:
describe in words and summary tables the results of a mixed-effects model;
visualize the effects estimates or predictions from a mixed-effects model.
Learning resources
You will see, next, the lectures we share to explain the concepts you will learn about, and the practical data analysis skills you will develop. Then you will see information about the practical materials you can use to build and practise your skills.
Every week, you will learn best if you first watch the lectures then do the practical exercises.
Linked resources
To support your learning, you can read about the ideas and the practical coding required for analyses in the chapters I wrote for this course.
The lecture materials for this week are presented in three short parts.
Click on a link and your browser should open a tab showing the Panopto video for the lecture part.
Part 1 (13 minutes): Developing the capacity to understand and work with mixed-effects models: what we want to be able to do, and why; our working example data, and the study.
Part 2 (13 minutes): Data tidying, data transformation, and why we do this work; sample data, variation and the benefits of pooling.
Part 3 (24 minutes): The elements and the logic of mixed-effects models; the impact of including random effects, pooling, shrinkage and regularisation; how mixed-effects models are estimated.
Lecture slides
Download the lecture slides
You can download the lecture slides in three different versions:
The high resolution version is the version delivered for the lecture recordings. To make the slides easier to download, I produced lower resolution versions: 1pp and 6pp. These should be easier to download and print out if that is what you want to do.
Practical materials: data and R-Studio
We will be working with the ML word recognition study data-set. ML examined visual word recognition in younger and older adults using the lexical decision task.
In lexical decision, participants are presented with a stimulus: a string of letters that is either a real word (e.g., ‘car’) or a made-up or non-word (e.g., ‘cas’). Participants are required to respond to the stimulus by pressing a button to indicate either that they think the stimulus is a word or that they think it is a non-word. Each complete sequence of events, in which a stimulus is presented and a response is recorded, is known as a trial. In the lexical decision task implemented by ML, all study participants were presented with a mix of 160 word stimuli and 160 non-word stimuli, in random order, in a total of 320 trials.
Each stimulus was presented one at a time on the computer screen. The critical outcome measure was the reaction time (RT) or latency for each response. Observed RT represents the interval of time from the moment the stimulus was first presented (the stimulus onset) to the moment the response was made (the response onset).
The critical feature of the study, here, is that we have an outcome – a decision response – observed multiple times (for each stimulus) for each participant. We shall be analyzing the speed of response, reaction time (RT), measured in milliseconds (ms).
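The conflicts message shown next is the output of loading the libraries used in this workbook. A minimal setup sketch (assuming the libraries referenced on this page: {gridExtra}, {lme4} and the {tidyverse}; the actual setup chunk is not shown here) would be:
Code
# -- assumed setup: load the plotting and modelling libraries first, then the tidyverse;
# -- loading the tidyverse last produces the conflicts message shown below
library(gridExtra)
library(lme4)
library(tidyverse)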
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::combine() masks gridExtra::combine()
✖ tidyr::expand() masks Matrix::expand()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ tidyr::pack() masks Matrix::pack()
✖ tidyr::unpack() masks Matrix::unpack()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Practical Part 2: Load the data
Practical Task 2 – Read in the data file we will be using
Read the data files into R:
Code
ML.all <- read_csv("subjects.behaviour.words-310114.csv", na = "-999")
Rows: 5440 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): subjectID, Test, Gender, Subject, item_name, item_type
dbl (18): item_number, Age, Years_in_education, TOWRE_wordacc, TOWRE_nonword...
num (1): SUBTLWF
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The ML study data includes the following variables that we will work with (as well as some you can ignore):
Identifying variables
subjectID – identifying code for participants
item_name – words presented as stimuli
item_number – identifying code for words presented
Response variables
RT – response reaction time (ms), for responses to words, i.e., word recognition time
Participant attribute variables
Age – in years
Gender – coded M (male), F (female)
TOWRE_wordacc – word reading skill, words read correctly (out of 104)
TOWRE_nonwordacc – nonword reading skill, nonwords (made up words) read correctly (out of 63)
ART_HRminusFR – reading experience score
Stimulus property variables
Length – word length, in letters
Ortho_N – orthographic neighbourhood size, how many other words in English a stimulus word looks like
OLD – orthographic Levenshtein distance, how many letter edits (addition, deletion or substitution) it would take to make a stimulus word look like another English word (a measure of orthographic neighbourhood) (Yarkoni et al., 2008)
BG_Sum, BG_Mean, BG_Freq_By_Pos – measures of how common are pairs of letters that compose stimulus words
SUBTLWF, LgSUBTLWF, SUBTLCD, LgSUBTLCD – measures of how common stimulus words are, taken from the SUBTLEX corpus analysis of word frequency (Brysbaert & New, 2009)
Practical Part 3: Tidy the data
Tidying the data involves a number of tasks, some essential and some things we do for our convenience.
We are going to first filter the observations, then transform the outcome variable.
We do this work and use data visualization to examine the impacts of the actions.
Tip
It is always a good idea to first inspect what you have got when you read a data file into R, before you do anything more demanding. You cannot assume that the data are what you think they are, or that the data are structured or coded in the ways that you think (or have been told) they should be structured or coded.
Practical Task 3 – Produce a density plot showing word recognition reaction time, for both correct and incorrect responses
We should examine the distribution of the outcome variable, lexical decision response reaction time (RT in ms). Observations about variable value distributions are a part of Exploratory Data Analysis and serve both to catch errors in the data-set (e.g., incorrectly recorded scores) and to inform the researcher’s understanding of their own data.
Hint
We shall examine the distribution of the outcome variable, lexical decision response reaction time (RT in ms), using density plots. An alternative method would be to use histograms.
A density plot shows a curve. You can say that the density corresponds to the height of the curve for a given value of the variable being depicted, and that it is related to the probability of observing values of the variable within some range of values (Howell, 2016).
Figure 1: Density plot showing word recognition reaction time, correct and incorrect responses
The code to produce Figure 1 works in a series of steps.
ML.all %>% takes the data-set, from the ML study, that we have read in to the R workspace and pipes it to the visualization code, next.
ggplot(aes(x = RT)) + creates a plot object in which the x-axis variable is specified as RT. The values of this variable will be mapped to geometric objects, i.e. plot features, that you can see, next.
geom_density(size=1.5) + first displays the distribution of values in the variable RT as a density curve. The argument size=1.5 tells R to make the line \(1.5 \times\) the thickness of the line used by default to show variation in density.
Some further information is added to the plot, next.
geom_rug(alpha = .2) + tells R to add a rug plot below the density curve.
ggtitle("Raw RT") makes a plot title.
Notice that beneath the curve of the density plot, you can see a series of vertical lines. Each line represents the x-axis location of an RT observation in the ML study data set. This rug plot represents the distribution of RT observations in one dimension.
geom_rug() draws a vertical line at each location on the x-axis that we observe a value of the variable, RT, named in aes(x = RT).
geom_rug(alpha = .2) reduces the opacity of each line, using alpha, to ensure the reader can see how the RT observations are denser in some places than others.
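Putting those steps together, the complete chunk to produce Figure 1 (assembled from the step-by-step description above) is:
Code
# -- density plot of raw RT, with a rug plot showing the individual observations
ML.all %>%
  ggplot(aes(x = RT)) +
  geom_density(size = 1.5) +
  geom_rug(alpha = .2) +
  ggtitle("Raw RT")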
Practical Task 4 – You should try out alternative visualisation methods to reveal the patterns in the distribution of variables in the ML data-set (or in your own data).
Take a look at the geoms documented in the {ggplot2} library reference section here.
Experiment with code to answer the following questions:
Would a histogram or a frequency polygon provide a more informative view? Take a look here for advice.
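As a starting point for your experiments, here is a sketch using geom_histogram() and geom_freqpoly() in place of geom_density(); the binwidth value is an assumption you should vary:
Code
# -- a histogram of raw RT; vary binwidth to see more or less detail
ML.all %>%
  ggplot(aes(x = RT)) +
  geom_histogram(binwidth = 50) +
  ggtitle("Raw RT: histogram")

# -- a frequency polygon of the same distribution
ML.all %>%
  ggplot(aes(x = RT)) +
  geom_freqpoly(binwidth = 50) +
  ggtitle("Raw RT: frequency polygon")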
Practical Task 5 – Filter out incorrect and outlier short RT observations
The code example, shown above, delivers a plot (Figure 1) showing three peaks in the distribution of RT values. You can see that there is a peak of RT observations around 500-1000ms, another smaller peak around -500ms, and a third smaller peak around -2000ms.
The density plot shows the reaction times recorded for participants’ button press ‘yes’ responses to word stimuli in the lexical decision task. The peaks of negative RTs represent observations that are impossible.
How do we deal with the impossible RT values?
Hint
The density plot shows us that the raw ML lexical decision RT variable includes negative RT values corresponding to incorrect responses. These have to be removed.
We can do this quite efficiently by creating a subset of the original “raw” data, defined according to the RT variable, using the {dplyr} library filter() function.
Code
ML.all.correct <- filter(ML.all, RT >= 200)
The filter code is written to subset the data by rows using a condition on the values of the RT variable.
ML.all.correct <- filter(ML.all, RT >= 200) works as follows.
ML.all.correct <- filter(ML.all ...) creates a new data-set with a new name ML.all.correct from the old data-set ML.all using the filter() function.
filter(... RT >= 200) specifies an argument for the filter() function.
In effect, we are asking R to check every value in the RT column.
R will do a check through the ML.all data-set, row by row.
If a row includes an RT that is greater than or equal to 200 then that row will be included in the new data-set ML.all.correct. This is what I mean by using a condition.
But if a row includes an RT that is less than 200, then that row will not be included. We express this condition as RT >= 200.
Practical Task 6 – Check out the impact of filtering on the number of rows in the data-set
How can we check that the filter operation worked in the way we should expect it to?
After we have removed negative (error) RTs, we can check that the size of the data-set – here, the number of rows – matches our expectations.
How do we do that?
Hint
The length() function will count the elements in whatever object is specified as an argument in the function call.
This means that if you put a variable name into the function as in length(data$variable) it will count how long that variable is – how many rows there are in the column.
If that variable happens to be, as here, part of a data-set, the same calculation will tell you how many rows there are in the data-set as a whole.
If you just enter length(data), naming some data-set, then the function will return a count of the number of columns in the data-set.
Code
length(ML.all$RT)
[1] 5440
length(ML.all.correct$RT)
[1] 5257
If you run the length() function calls then you should see that the length or number of observations or rows in the ML.all.correct data-set should be smaller than the number of observations in the ML.all data-set.
Tip
It is wise to check that the operations you perform to tidy, process or wrangle data actually do what you mean them to do. Checks can be performed, for each processing stage, by:
Forming expectations or predictions about what the operation is supposed to do, e.g., the filter should remove some number of rows;
Checking what you get against these predictions, e.g., counting the number of rows before versus after filtering.
Practical Task 7 – Produce a density plot showing word recognition reaction time, for correct responses only
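A sketch of the plotting code, assuming the same approach as for Figure 1 but now supplying the filtered ML.all.correct data-set (the plot title is my own choice):
Code
# -- density plot of RT for the filtered (correct responses only) data-set
ML.all.correct %>%
  ggplot(aes(x = RT)) +
  geom_density(size = 1.5) +
  geom_rug(alpha = .2) +
  ggtitle("Correct RT")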
Figure 2: Density plot showing word recognition reaction time, correct responses only
Practical Task 8 – Vary the filter conditions in different ways:
Change the threshold for including RTs from RT >= 200 to something else: you can change the number, or you can change the operator from >= to a different comparison (try ==, <, <=, or >).
Can you assess what impact the change has?
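One way to assess the impact, sketched here, is to apply each candidate condition and compare the number of rows that survive (the alternative thresholds are only examples):
Code
# -- count the rows surviving different filter conditions, compared to the original
nrow(ML.all)
nrow(filter(ML.all, RT >= 200))
nrow(filter(ML.all, RT >= 300))
nrow(filter(ML.all, RT > 0))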
Practical Task 9 – Transform RT to log base 10 RT
Figure 2 shows that we have successfully removed all errors (negative RTs) but now we see just how skewed the RT distribution is. Note the long tail of longer RTs.
Generally, we assume that departures from a model’s predictions about our observations (the linear model residuals) are normally distributed, and we often assume that the relationship between outcome and predictor variables is linear (Cohen et al., 2003). We can ensure that our data are compliant with both assumptions by transforming the RT distribution.
What transformation should we use?
Hint
Psychology researchers often take the log (often the log base 10) of RT values before performing an analysis. Transforming RTs to the log base 10 of RT values has the effect of correcting the skew, bringing the larger RTs ‘closer’ (e.g., \(\log_{10}(1000) = 3\)) to those near the middle, which do not change as much (e.g., \(\log_{10}(500) \approx 2.7\)).
Code
ML.all.correct$logrt <- log10(ML.all.correct$RT)
The log10() function works as follows:
ML.all.correct$logrt <- log10(...) creates a new variable, logrt, adding it to the ML.all.correct data-set. The variable is created using the transformation function log10().
log10(ML.all.correct$RT) creates the new variable by transforming (to log10) the values of the old variable, RT.
Practical Task 10 – Produce a density plot showing log10 transformed reaction time, correct responses only
We can see the effect of the transformation if we plot the log10 transformed RTs (see Figure 3).
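A sketch of the code for Figure 3, assuming the same plotting approach as before but now mapping the transformed logrt variable to the x-axis:
Code
# -- density plot of log10 transformed RT, correct responses only
ML.all.correct %>%
  ggplot(aes(x = logrt)) +
  geom_density(size = 1.5) +
  geom_rug(alpha = .2) +
  ggtitle("log10 RT")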
Do we arrive at a distribution that more closely approximates the normal distribution?
Figure 3: Density plot showing log10 transformed reaction time, correct responses only
Tip
There are other log transformation functions, and we often see researchers using the natural log instead of the log base 10, as discussed here.
Practical Task 11 – Produce a density plot showing log10 transformed reaction time, correct responses, separately for each participant
ML asked all participants in a sample of people to read a selection of words, a sample of words from the language.
For each participant, we will have multiple observations and these observations will not be independent. One participant will tend to be slower or less accurate compared to another.
What does this look like?
Code
ML.all.correct %>%
  group_by(subjectID) %>%
  mutate(mean_logrt = mean(logrt, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(subjectID = fct_reorder(subjectID, mean_logrt)) %>%
  ggplot(aes(x = logrt)) +
  geom_density(size = 1.25) +
  facet_wrap(~ subjectID) +
  geom_vline(xintercept = 2.778807, colour = "red", linetype = 2) +
  scale_x_continuous(breaks = c(2.5, 3)) +
  ggtitle("Plot showing distribution of logRT for each participant; red line shows mean log10 RT") +
  theme_bw()
Figure 4: Density plot showing log10 transformed reaction time, correct responses, separately for each participant
Practical Task 12 – Can you work out how to adapt the plotting code to show a grid of histograms?
Practical Task 13 – Can you work out how to adapt the code to show a grid of plots indicating the distribution of log RT by different items (instead of participants)?
For each stimulus word, there are multiple observations and these observations will not be independent. One stimulus may prove to be more challenging to all participants compared to another, eliciting slower or less accurate responses on average.
What does this look like?
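A sketch of one way to adapt the by-participant plotting code for this task, faceting by item_name instead of subjectID (I drop the reference line and axis breaks to keep the sketch short; the grid will be large, with 160 word panels):
Code
# -- distribution of logRT for each stimulus word, faceted by item_name
ML.all.correct %>%
  group_by(item_name) %>%
  mutate(mean_logrt = mean(logrt, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(item_name = fct_reorder(item_name, mean_logrt)) %>%
  ggplot(aes(x = logrt)) +
  geom_density(size = 1.25) +
  facet_wrap(~ item_name) +
  theme_bw()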
Practical Part 4: Analyze data with lm
Practical Task 14 – Fit a linear model with log10 RT as the outcome and word frequency (LgSUBTLCD) as the predictor
We model the effects of interest using all the data (hence, complete pooling) but ignoring the differences between participants. This means we can see something of the ‘true’ picture of our data through the linear model results, though we should be aware that the linear model will miss important information (the differences between participants) that the mixed-effects model will include, and that would improve its performance.
Hint
The linear model is fit in R using the lm() function.
Code
ML.all.correct.lm <- lm(logrt ~ LgSUBTLCD, data = ML.all.correct)
summary(ML.all.correct.lm)
Call:
lm(formula = logrt ~ LgSUBTLCD, data = ML.all.correct)
Residuals:
Min 1Q Median 3Q Max
-0.41677 -0.07083 -0.01163 0.05489 0.53411
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.885383 0.007117 405.41 <2e-16 ***
LgSUBTLCD -0.033850 0.002209 -15.32 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1095 on 5255 degrees of freedom
Multiple R-squared: 0.04277, Adjusted R-squared: 0.04259
F-statistic: 234.8 on 1 and 5255 DF, p-value: < 2.2e-16
Vary the linear model using different outcomes or predictors.
The ML study data, like the CP study data, are rich with possibility. It would be useful to experiment with data like this because this will prepare you for similar analysis targets in future.
Change the predictor from frequency to something else: what do you see when you visualize the relationship between outcome and predictor variables using scatterplots?
Tip
Specify linear models with different predictors: do the relationships you see in plots match the coefficients you see in the model estimates?
You should both estimate the effects of variables and visualize the relationships between variables using scatterplots.
If you combine reflection on the model estimates with evaluation of what the plots show you then you will be able to see how reading model results and reading plots can reveal the correspondences between the two ways of looking at your data.
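For example, a sketch of a scatterplot for a different predictor (here Length, one of the stimulus property variables listed earlier), with a fitted line added by geom_smooth():
Code
# -- scatterplot of logrt against word Length, with a linear model fit line
ML.all.correct %>%
  ggplot(aes(x = Length, y = logrt)) +
  geom_point(alpha = .05) +
  geom_smooth(method = "lm") +
  theme_bw()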
Practical Part 5: Analyze data with lmer
Practical Task 15 – Fit a linear mixed-effects model to estimate the effect of frequency on log RT while taking into account random effects
You should specify, here, a model of the frequency effect that takes into account:
the random effect of participants on intercepts;
the random effect of participants on the slopes of the frequency (LgSUBTLCD) effect;
as well as the random effect of items on intercepts.
You can specify the random effects of participants allowing for covariance between the random intercepts and the random slopes.
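The model-fitting code is hidden on this page but, given the formula shown in the output below and the model name used later (ML.all.correct.lmer), it would look like this:
Code
# -- fit a mixed-effects model with random intercepts for subjects and items,
# -- plus by-subject random slopes for the frequency effect
ML.all.correct.lmer <- lmer(logrt ~ LgSUBTLCD +
                              (LgSUBTLCD + 1 | subjectID) +
                              (1 | item_name),
                            data = ML.all.correct)
summary(ML.all.correct.lmer)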
Linear mixed model fit by REML ['lmerMod']
Formula: logrt ~ LgSUBTLCD + (LgSUBTLCD + 1 | subjectID) + (1 | item_name)
Data: ML.all.correct
REML criterion at convergence: -9868.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.6307 -0.6324 -0.1483 0.4340 5.6132
Random effects:
Groups Name Variance Std.Dev. Corr
item_name (Intercept) 0.0003268 0.01808
subjectID (Intercept) 0.0054212 0.07363
LgSUBTLCD 0.0002005 0.01416 -0.63
Residual 0.0084333 0.09183
Number of obs: 5257, groups: item_name, 160; subjectID, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.887997 0.015479 186.577
LgSUBTLCD -0.034471 0.003693 -9.333
Correlation of Fixed Effects:
(Intr)
LgSUBTLCD -0.764
As will now be familiar, the code works as follows:
ML.all.correct.lmer <- lmer(...) creates a linear mixed-effects model object using the lmer() function.
logrt ~ LgSUBTLCD expresses the fixed effect of the model as a formula in which the outcome or dependent variable logrt is predicted (~) by the independent or predictor variable LgSUBTLCD, word frequency.
The random effects part of the model is then specified as follows.
We first have the random effects associated with random differences between participants:
(...|subjectID) adds random effects corresponding to random differences between sample groups (participants, i.e., subjects) coded by the subjectID variable.
(...1 |subjectID) includes random differences between sample groups (subjectID) in intercepts, coded 1.
(LgSUBTLCD ... |subjectID) includes random differences between sample groups (subjectID) in the slopes of the frequency effect, coded by using the LgSUBTLCD variable name.
Then, we have the random effects associated with random differences between stimuli:
(1|item_name) adds a random effect to account for random differences between sample groups (item_name) in intercepts coded 1.
...(..., data = ML.all.correct) specifies the data-set in which you can find the variables named in the model fitting code.
Lastly, we can then specify summary(ML.all.correct.lmer) to get a summary of the fitted model.
Can you explain the results shown in the model summary?
Practical Part 6: Compare models with different random effects
Practical Task 16 – Fit a linear mixed-effects model to estimate the effect of frequency on log RT while taking into account different sets of random effects
You can specify models where the fixed effects part is the same (e.g., you assume \(RT \sim\) word frequency) but, for various reasons, you may suppose different random effects structures.
In this scenario — the model fixed effects part is the same but the random effects are different — you may be required to evaluate which model structure fits the data better, to the extent that that can be assessed, given your sample data.
To begin, fit a model with just the fixed effects of intercept and frequency, and the random effects of participants or items on intercepts only.
We exclude the (LgSUBTLCD + ...|subjectID) specification for the random effect of participants on the slope of the frequency LgSUBTLCD effect.
We use REML fitting because we want to compare (as we will) models with the same fixed effects but different random effects.
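Given the formula in the output below and the model name mentioned afterwards (ML.all.correct.lmer.REML.si), the fitting code would be along these lines:
Code
# -- random intercepts only, for subjects and for items; REML fitting
ML.all.correct.lmer.REML.si <- lmer(logrt ~ LgSUBTLCD +
                                      (1 | subjectID) +
                                      (1 | item_name),
                                    data = ML.all.correct, REML = TRUE)
summary(ML.all.correct.lmer.REML.si)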
Linear mixed model fit by REML ['lmerMod']
Formula: logrt ~ LgSUBTLCD + (1 | subjectID) + (1 | item_name)
Data: ML.all.correct
REML criterion at convergence: -9845.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.5339 -0.6375 -0.1567 0.4364 5.5851
Random effects:
Groups Name Variance Std.Dev.
item_name (Intercept) 0.0003204 0.01790
subjectID (Intercept) 0.0032650 0.05714
Residual 0.0085285 0.09235
Number of obs: 5257, groups: item_name, 160; subjectID, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.887697 0.013253 217.9
LgSUBTLCD -0.034390 0.002774 -12.4
Correlation of Fixed Effects:
(Intr)
LgSUBTLCD -0.658
If you look at the code chunk, you can see that:
REML = TRUE is the only change to the code: it specifies the change in model fitting method;
also, I changed the model name to ML.all.correct.lmer.REML.si to be able to distinguish the maximum likelihood from the restricted maximum likelihood model.
Following Baayen et al. (2008a), we can then run a series of models with just one random effect.
Next, fit a model with just the random effect of items on intercepts.
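A sketch of the corresponding code, using the items-only model name referred to in the hint further down (ML.all.correct.lmer.REML.i):
Code
# -- random effect of items on intercepts only; REML fitting
ML.all.correct.lmer.REML.i <- lmer(logrt ~ LgSUBTLCD + (1 | item_name),
                                   data = ML.all.correct, REML = TRUE)
summary(ML.all.correct.lmer.REML.i)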
Linear mixed model fit by REML ['lmerMod']
Formula: logrt ~ LgSUBTLCD + (1 | item_name)
Data: ML.all.correct
REML criterion at convergence: -8337
Scaled residuals:
Min 1Q Median 3Q Max
-3.7324 -0.6455 -0.1053 0.4944 4.8970
Random effects:
Groups Name Variance Std.Dev.
item_name (Intercept) 0.0002364 0.01537
Residual 0.0117640 0.10846
Number of obs: 5257, groups: item_name, 160
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.886765 0.009047 319.07
LgSUBTLCD -0.034206 0.002811 -12.17
Correlation of Fixed Effects:
(Intr)
LgSUBTLCD -0.977
Next, fit a model with just the random effect of participants on intercepts.
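And, similarly, a sketch for the subjects-only model; the name ML.all.correct.lmer.REML.s is my assumption, as the page does not show it:
Code
# -- random effect of subjects on intercepts only; REML fitting
# -- (the model name here is assumed, not taken from the page)
ML.all.correct.lmer.REML.s <- lmer(logrt ~ LgSUBTLCD + (1 | subjectID),
                                   data = ML.all.correct, REML = TRUE)
summary(ML.all.correct.lmer.REML.s)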
Linear mixed model fit by REML ['lmerMod']
Formula: logrt ~ LgSUBTLCD + (1 | subjectID)
Data: ML.all.correct
REML criterion at convergence: -9786.3
Scaled residuals:
Min 1Q Median 3Q Max
-3.5843 -0.6443 -0.1589 0.4434 5.5266
Random effects:
Groups Name Variance Std.Dev.
subjectID (Intercept) 0.003275 0.05723
Residual 0.008837 0.09401
Number of obs: 5257, groups: subjectID, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.885751 0.011561 249.60
LgSUBTLCD -0.033888 0.001897 -17.87
Correlation of Fixed Effects:
(Intr)
LgSUBTLCD -0.517
Tip
Notice how each different model has a distinct name.
We will use these different names to compare the models, next.
Practical Task 17 – Compare the different models using anova()
If we now run Likelihood Ratio Test comparisons of these models, we are effectively examining if one of the random effects can be dispensed with: if its inclusion makes no difference to the likelihood of the model then it is not needed.
Is the random effect of subjects on intercepts justified?
Hint
Compare, first, models with (ML.all.correct.lmer.REML.si) versus without (ML.all.correct.lmer.REML.i) the random effect of subjects on intercepts.
anova() does the model comparison, for the models named in the list in brackets.
You can do this for the foregoing series of models but notice that in the code we specify:
refit = FALSE
What happens if we do not add that bit?
What you will see if you run the anova() function call without the refit = FALSE argument – try it – is that you will then get the warning refitting model(s) with ML (instead of REML).
Why? The immediate reason for this warning is that we have to specify refit = FALSE because otherwise R will refit the models using maximum likelihood (ML) and compare the ML fits. The refitting occurs by default.
What is the reason for the imposition of this default?
Pinheiro & Bates (2000) advise that if one is fitting models with random effects the estimates are more accurate if the models are fitted using Restricted Maximum Likelihood (REML). That is achieved in the lmer() function call by adding the argument REML=TRUE. Pinheiro & Bates (2000) further recommend (see, e.g., pp.82-) that if you compare models:
with the same fixed effects
but with varying random effects
then the models should be fitted using Restricted Maximum Likelihood.
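Putting this together, a comparison of the models named above, testing whether the random effect of subjects on intercepts is needed, might be written as:
Code
# -- likelihood ratio test comparison: does adding the random effect of subjects
# -- on intercepts (.si) improve fit over the items-only model (.i)?
anova(ML.all.correct.lmer.REML.si, ML.all.correct.lmer.REML.i, refit = FALSE)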
With (LgSUBTLCD + 1 |subjectID) we specify a random effect of subjects on intercepts and on the slope of the frequency effects.
We do not specify – it happens by default – the estimation of the covariance of random differences among subjects in intercepts and random differences among subjects in the slope of the frequency effect.
Then you will want to compare two models that are the same except for the incorporation of the random effect of participants on the slopes of the frequency effect.
Practical Part 7: p-values and significance using lmerTest
If you look at the fixed effects summary, you can see that we do not get p-values by default. To calculate p-values, we need to know the residual degrees of freedom. The authors of the {lme4} library that furnishes the lmer() function do not (as e.g. Baayen et al., 2008b discuss) think that it is sensible to estimate the residual degrees of freedom for a model in terms of the number of observations, because the number of observations concerns just one level of a multilevel data-set that might be structured with respect to some number of subjects and some number of items. This means that one cannot accurately calculate p-values to go with the t-tests on the coefficient estimates; therefore lme4 does not report them.
Practical Task 19 – Get p-values using lmerTest
We can run mixed-effects models with p-values from significance tests on the estimates of the fixed-effects coefficients by using the {lmerTest} library.
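The code is not shown on this page; a sketch would be to load {lmerTest} and then refit the same model. I give the refitted model a new, hypothetical name to keep it distinct from the earlier fit:
Code
# -- load lmerTest so that lmer() reports Satterthwaite p-values for fixed effects
library(lmerTest)

# -- refit the model with random intercepts for subjects and items and
# -- by-subject random slopes for frequency (the model name is my own choice)
ML.all.correct.lmerT <- lmer(logrt ~ LgSUBTLCD +
                               (LgSUBTLCD + 1 | subjectID) +
                               (1 | item_name),
                             data = ML.all.correct)
summary(ML.all.correct.lmerT)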
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: logrt ~ LgSUBTLCD + (LgSUBTLCD + 1 | subjectID) + (1 | item_name)
Data: ML.all.correct
REML criterion at convergence: -9868.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.6307 -0.6324 -0.1483 0.4340 5.6132
Random effects:
Groups Name Variance Std.Dev. Corr
item_name (Intercept) 0.0003268 0.01808
subjectID (Intercept) 0.0054212 0.07363
LgSUBTLCD 0.0002005 0.01416 -0.63
Residual 0.0084333 0.09183
Number of obs: 5257, groups: item_name, 160; subjectID, 34
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.887997 0.015479 47.782839 186.577 < 2e-16 ***
LgSUBTLCD -0.034471 0.003693 60.338787 -9.333 2.59e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
LgSUBTLCD -0.764
Basically, loading the lmerTest library ensures that when we run the lmer() function we also get an approximation to the denominator degrees of freedom (here, Satterthwaite's method), which enables the calculation of the p-value for the t-test on each fixed-effects coefficient.
Tip
An alternative approach to doing significance tests of hypothesized effects is to compare models with versus without the effect of interest.
Practical Part 8: Run your own sequence of lmer models
Practical Task 20 – Pick your own predictors and compare the impact of specifying different sets of random effects
It will be useful for you to examine model comparisons with a different set of models for the same data.
You could try to run a series of models in which the fixed effects variable is something different, for example, the effect of word Length or the effect of orthographic neighbourhood size Ortho_N.
Run models with these variables or pick your own set of predictor variables to get estimates of fixed effects.
In this series of models, specify a set of models where:
Each model has the same fixed effect e.g. logrt ~ Length;
But you vary the random effects components so that, at the end, you can assess what random effects are justified by better model fit to data.
In the following, I set out another series of models as an example of what you can do.
Random effects of participants on intercepts and slopes, random effect of items on intercepts
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: logrt ~ Length + (1 | subjectID)
Data: ML.all.correct
REML criterion at convergence: -9481.9
Scaled residuals:
Min 1Q Median 3Q Max
-3.6945 -0.6475 -0.1608 0.4572 5.2837
Random effects:
Groups Name Variance Std.Dev.
subjectID (Intercept) 0.003272 0.05720
Residual 0.009367 0.09678
Number of obs: 5257, groups: subjectID, 34
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.760e+00 1.266e-02 8.792e+01 218.088 <2e-16 ***
Length 4.346e-03 1.832e-03 5.222e+03 2.372 0.0177 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
Length -0.623
Model comparisons
I would compare the models in the sequence, shown in the foregoing, one pair of models at a time, to keep it simple.
When you look at the model comparison, ask: is the difference between the models a piece of complexity (an effect) whose inclusion in the more complex model is justified or warranted by improved model fit to data?
Code
If we run model comparisons, we can see that the random effects of subjects or of items on intercepts, as well as the random effect of subjects on slopes, are justified by significantly improved model fit to data.
We work through a sequence of fitting models, getting estimates, and producing plots.
Get no pooling estimates by running lm() for each participant.
Code
df_no_pooling <- lmList(logrt ~ LgSUBTLCD | subjectID, ML.all.correct) %>%
  coef() %>%
  # Subject IDs are stored as row-names. Make them an explicit column
  rownames_to_column("subjectID") %>%
  rename(Intercept = `(Intercept)`, Slope_frequency = LgSUBTLCD) %>%
  add_column(Model = "No pooling")

# -- you can uncomment and inspect the tibble produced by these lines of code
# summary(df_no_pooling)
Make the subjectID and Model label vectors a factor.
Warning: `data_frame()` was deprecated in tibble 1.1.0.
ℹ Please use `tibble()` instead.
# -- inspect
# summary(df_pooled)
Make the model label vector a factor.
Code
df_pooled$Model <- as.factor(df_pooled$Model)
We can compare these two approaches: instead of calculating the regression lines with stat_smooth(), we can use geom_abline() to draw the lines from our dataframe of intercept and slope parameters.
Join the raw data so we can plot the points as well as the lines.
Code
df_models <- bind_rows(df_pooled, df_no_pooling) %>%
  left_join(ML.all.correct, by = "subjectID")
Warning in left_join(., ML.all.correct, by = "subjectID"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 3 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
Produce the plot:
Code
p_model_comparison <- ggplot(df_models) +
  aes(x = LgSUBTLCD, y = logrt) +
  # Set the color mapping in this layer so the points don't get a color
  geom_point(alpha = .05) +
  geom_abline(aes(intercept = Intercept, slope = Slope_frequency, color = Model),
              size = .75) +
  facet_wrap(~ subjectID) +
  scale_x_continuous(breaks = 1.5:4 * 1) +
  theme_bw() +
  theme(legend.position = "top")

p_model_comparison
Compare no vs. complete vs. partial pooling estimates
Mixed-effects models represent partial pooling of data.
To see what that means, we begin with no pooling: we fit a separate line for each cluster of data, one for each participant.
The lmList() function in {lme4} automates this process.
Code
df_no_pooling <- lmList(logrt ~ LgSUBTLCD | subjectID, ML.all.correct) %>%
  coef() %>%
  # Subject IDs are stored as row-names. Make them an explicit column
  rownames_to_column("subjectID") %>%
  rename(Intercept = `(Intercept)`, Slope_frequency = LgSUBTLCD) %>%
  add_column(Model = "No pooling")

# -- inspect the tibble produced by these lines of code
# summary(df_no_pooling)

# -- make the subjectID and model label vectors a factor
df_no_pooling$subjectID <- as.factor(df_no_pooling$subjectID)
df_no_pooling$Model <- as.factor(df_no_pooling$Model)
In contrast, we might consider a complete pooling model where all the information from the participants is combined together.
We fit a single model on all the data pooled together.
Code
m_pooled <- lm(logrt ~ LgSUBTLCD, ML.all.correct)

# -- repeat the intercept and slope terms for each participant
df_pooled <- data_frame(Model = "Complete pooling",
                        subjectID = unique(ML.all.correct$subjectID),
                        Intercept = coef(m_pooled)[1],
                        Slope_frequency = coef(m_pooled)[2])

# -- inspect
# summary(df_pooled)

# -- make the model label vector a factor
df_pooled$Model <- as.factor(df_pooled$Model)
We can fit a mixed-effects model including the fixed effects of the intercept and the frequency effect, together with the random effects due to differences between participants (subjectID) in intercepts or in the slope of the frequency effect, and due to differences between items (item_name) in intercepts.
To visualize these estimates, we extract each participant’s intercept and slope using coef().
Code
# -- make a dataframe with the fitted effects
df_partial_pooling <- coef(ML.all.correct.lmer)[["subjectID"]] %>%
  rownames_to_column("subjectID") %>%
  as_tibble() %>%
  rename(Intercept = `(Intercept)`, Slope_frequency = LgSUBTLCD) %>%
  add_column(Model = "Partial pooling")

# -- inspect the tibble produced by these lines of code
# summary(df_partial_pooling)

# -- make the subjectID and model label vectors both factors
df_partial_pooling$subjectID <- as.factor(df_partial_pooling$subjectID)
df_partial_pooling$Model <- as.factor(df_partial_pooling$Model)
Visualize the comparison by producing a plot using a data-set consisting of all three sets of model estimates.
Code
df_models <- rbind(df_pooled, df_no_pooling, df_partial_pooling) %>%
  left_join(ML.all.correct, by = "subjectID")
Warning in left_join(., ML.all.correct, by = "subjectID"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 3 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
# -- inspect the differences between the data-sets
# summary(df_pooled)
# summary(df_no_pooling)
# summary(df_partial_pooling)
# summary(ML.all.correct)
# summary(df_models)

# -- we produce a plot showing the no-pooling, complete-pooling and partial-pooling (mixed effects)
# -- estimates
p_model_comparison <- ggplot(df_models) +
  aes(x = LgSUBTLCD, y = logrt) +
  # Set the color mapping in this layer so the points don't get a color
  geom_point(alpha = .05) +
  geom_abline(aes(intercept = Intercept,
                  slope = Slope_frequency,
                  color = Model),
              size = .75) +
  facet_wrap(~ subjectID) +
  scale_x_continuous(breaks = 1.5:4 * 1) +
  theme_bw() +
  theme(legend.position = "top")

p_model_comparison
Zoom in on a few participants: use filter() to select a small number of participants.
Code
df_zoom <- df_models %>%
  filter(subjectID %in% c("EB5", "JP3", "JL3", "AA1"))

# -- re-generate the plot with this zoomed-in selection data-set
p_model_comparison <- ggplot(df_zoom) +
  aes(x = LgSUBTLCD, y = logrt) +
  # Set the color mapping in this layer so the points don't get a color
  geom_point(alpha = .05) +
  geom_abline(aes(intercept = Intercept, slope = Slope_frequency, color = Model),
              size = .75) +
  facet_wrap(~ subjectID) +
  scale_x_continuous(breaks = 1.5:4 * 1) +
  theme_bw() +
  theme(legend.position = "top")

p_model_comparison
Practical Task 22 – Illustrate shrinkage
The partial pooling model pulls (or shrinks) more extreme estimates towards an overall average. We can visualize this shrinkage effect by plotting a scatterplot of intercept and slope parameters from each model and connecting estimates for the same participant.
We use arrows to connect the different estimates for each participant, different estimates from no-pooling (per-participant) compared to partial-pooling (mixed-effects) models.
The plot shows how more extreme estimates are shrunk towards the global average estimate.
Code
df_fixef <- data_frame(Model = "Partial pooling (average)",
                       Intercept = fixef(ML.all.correct.lmer)[1],
                       Slope_frequency = fixef(ML.all.correct.lmer)[2])

# Complete pooling / fixed effects are center of gravity in the plot
df_gravity <- df_pooled %>%
  distinct(Model, Intercept, Slope_frequency) %>%
  rbind(df_fixef)

# df_gravity
#> # A tibble: 2 x 3
#>   Model                     Intercept Slope_Days
#>   <chr>                         <dbl>      <dbl>
#> 1 Complete pooling           252.3207   10.32766
#> 2 Partial pooling (average)  252.5426   10.45212

df_pulled <- rbind(df_no_pooling, df_partial_pooling)

ggplot(df_pulled) +
  aes(x = Intercept, y = Slope_frequency, color = Model) +
  geom_point(size = 2) +
  geom_point(data = df_gravity, size = 5) +
  # Draw an arrow connecting the observations between models
  geom_path(aes(group = subjectID, color = NULL),
            arrow = arrow(length = unit(.02, "npc"))) +
  # Use ggrepel to jitter the labels away from the points
  ggrepel::geom_text_repel(aes(label = subjectID, color = NULL),
                           data = df_no_pooling) +
  # Don't forget 373
  # ggrepel::geom_text_repel(
  #   aes(label = subjectID, color = NULL),
  #   data = filter(df_partial_pooling, Subject == "373")) +
  theme(legend.position = "right") +
  # ggtitle("Pooling of regression parameters") +
  xlab("Intercept estimate") +
  ylab("Slope estimate") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw()
Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
The answers
After the practical class, we will reveal the answers that are currently hidden.
The answers version of the webpage will present my answers for questions, and some extra information where that is helpful.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008b). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008a). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990. https://doi.org/10.3758/BRM.41.4.977
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum Associates.
Howell, D. C. (2016). Fundamental statistics for the behavioral sciences. Cengage Learning.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS (Statistics and Computing). Springer.
Yarkoni, T., Balota, D., & Yap, M. (2008). Moving beyond Coltheart’s N: A new measure of orthographic similarity. Psychonomic Bulletin & Review, 15(5), 971–979. https://doi.org/10.3758/PBR.15.5.971