9. Expanding on binary logistic regression

Amy Atkinson

Lecture

Part 1: This part covers how to conduct binary logistic regression with different types of predictors.

Part 2: This part covers binary logistic regression with multiple predictors.

Part 3: This part covers ordinal logistic regression.

Download all the lecture slides here in both .pptx and .pdf format.

Lab preparation

Before the lab, please watch the following short video. This covers how to run an ordinal logistic regression model in R.

If you want to have a play around with the script yourself, the R markdown script and dataset can be downloaded here. The script also contains an example of how to run a multiple binary logistic regression model in R.

Lab

Today I’ll provide you with two research questions which will require you to complete the statistical tests covered in the lecture. You can work in groups or individually.

You can write your script as a .R or .Rmd file. Use the lab preparation video and script, lecture slides, and previous content covered in the statistics modules to help you.

The presentation given at the start of the lab can be downloaded here.

Datasets

The datasets for this lab can be downloaded here.

Research Question 1

You are interested in factors that predict academic achievement in mathematics. Children’s academic achievement can be rated as below expected, at expected or above expected.

The predictors you are interested in are:

  • Number of hours spent revising (continuous)
  • Likes school (Yes/no)
  • Favourite subject (Maths, English or Science)

For the categorical predictors:

  • Set the reference category for Likes school as “No”
  • Set the reference category for Favourite subject as “Maths”

Tip

See last week’s content for how to set the reference category. Or better yet, revisit your own script to see how you did it last time!
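As a rough illustration of the preparation for Research Question 1, here is a minimal sketch. It assumes the model is fitted with polr() from MASS (one of the packages listed for this lab). The data are simulated and every column name (hours_revising, likes_school, fav_subject, achievement) is made up, so swap in the names from the real dataset.

```r
library(MASS)  # provides polr()

set.seed(1)
n <- 300

# Simulated stand-in for the lab dataset; all column names are hypothetical
maths <- data.frame(
  hours_revising = round(runif(n, 0, 20), 1),
  likes_school   = sample(c("Yes", "No"), n, replace = TRUE),
  fav_subject    = sample(c("Maths", "English", "Science"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Simulate achievement so that more revision tends to go with higher achievement
latent <- 0.15 * maths$hours_revising + rlogis(n)
maths$achievement <- as.character(
  cut(latent, breaks = c(-Inf, 0.5, 2.5, Inf),
      labels = c("Below expected", "At expected", "Above expected"))
)

# Outcome: ordered factor running from the lowest to the highest category
maths$achievement <- factor(
  maths$achievement,
  levels  = c("Below expected", "At expected", "Above expected"),
  ordered = TRUE
)

# Categorical predictors: set the reference categories asked for above
maths$likes_school <- relevel(factor(maths$likes_school), ref = "No")
maths$fav_subject  <- relevel(factor(maths$fav_subject), ref = "Maths")

# Proportional odds model; Hess = TRUE keeps the Hessian so summary() can give SEs
fit <- polr(achievement ~ hours_revising + likes_school + fav_subject,
            data = maths, Hess = TRUE)

# Odds ratios are the exponentiated coefficients
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 2))
```

The first level of each unordered factor is its reference category, so levels(maths$fav_subject) is a quick way to confirm the releveling worked.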

Research Question 2

You work in a nursery. In the nursery, there has been an outbreak of measles. You are interested in factors that predict whether a child in your nursery will have measles (yes/no).

The predictors you are interested in are:

  • Number of hours spent at nursery weekly (continuous)
  • Has siblings (Yes/no)
  • Vaccinated against measles (Yes/no)

For the categorical predictors:

  • Set the reference category for Siblings as “No”
  • Set the reference category for Vaccinated as “No”

For the outcome (measles – yes/no):

  • Set No as 0, and Yes as 1

Tip

See last week’s content for how to set the reference category. Or better yet, revisit your own script to see how you did it last time!
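The coding steps for Research Question 2 might look something like the sketch below. The six rows are invented toy values and the column names (hours_weekly, siblings, vaccinated, measles) are hypothetical, so treat this as one possible approach rather than the model answer.

```r
# Toy rows standing in for the nursery dataset; column names are hypothetical
nursery <- data.frame(
  hours_weekly = c(10, 25, 40, 15, 30, 35),
  siblings     = c("Yes", "No", "Yes", "No", "Yes", "No"),
  vaccinated   = c("Yes", "Yes", "No", "Yes", "No", "No"),
  measles      = c("No", "No", "Yes", "No", "Yes", "Yes"),
  stringsAsFactors = FALSE
)

# Outcome: No = 0, Yes = 1
nursery$measles_num <- ifelse(nursery$measles == "Yes", 1, 0)

# Predictors: "No" as the reference category
nursery$siblings   <- relevel(factor(nursery$siblings), ref = "No")
nursery$vaccinated <- relevel(factor(nursery$vaccinated), ref = "No")

# Check the recoding: every "No" should sit in column 0, every "Yes" in column 1
print(table(nursery$measles, nursery$measles_num))
print(levels(nursery$siblings))
print(levels(nursery$vaccinated))
```

Cross-tabulating the original and recoded outcome is a cheap way to catch a coding slip before running the model.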

Hints and tips

Your script should aim to answer each of the research questions above and interpret the results.

Start a new session on the server, load in the required libraries in the following order (car, DescTools, tidyverse, MASS, brant), and then the dataset.

Here are the steps you’ll need to follow:

  1. Prepare our data for analysis
  2. Explore our data
  3. Run the model
  4. Evaluate the model
  5. Evaluate the individual predictors
  6. Predicted probabilities
  7. Assumption checks
  8. Interpret the output
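For the binary case, steps 1 to 6 could be sketched roughly as follows using base R only. The data are simulated, the variable names (hours, vaccinated, outcome) are hypothetical, and Wald confidence intervals from confint.default() are used purely to keep the example dependency-free. Assumption checks and interpretation are left out because they depend on the methods taught in the lecture.

```r
set.seed(42)
n <- 400

# 1. Prepare: simulated data (hypothetical variables), reference category "No"
dat <- data.frame(
  hours      = runif(n, 5, 45),
  vaccinated = factor(sample(c("No", "Yes"), n, replace = TRUE),
                      levels = c("No", "Yes"))
)
log_odds    <- -1 + 0.04 * dat$hours - 1.5 * (dat$vaccinated == "Yes")
dat$outcome <- rbinom(n, size = 1, prob = plogis(log_odds))

# 2. Explore
print(summary(dat))
print(table(dat$vaccinated, dat$outcome))

# 3. Run the model (plus an intercept-only model for comparison)
fit  <- glm(outcome ~ hours + vaccinated, data = dat, family = binomial)
null <- glm(outcome ~ 1, data = dat, family = binomial)

# 4. Evaluate the model: likelihood ratio test against the intercept-only model
print(anova(null, fit, test = "Chisq"))

# 5. Evaluate individual predictors: odds ratios with Wald 95% CIs
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 2))

# 6. Predicted probabilities for two illustrative children
new_children <- data.frame(
  hours      = c(20, 20),
  vaccinated = factor(c("No", "Yes"), levels = c("No", "Yes"))
)
pred <- predict(fit, newdata = new_children, type = "response")
print(round(pred, 3))
```

Setting type = "response" in predict() returns probabilities rather than log-odds, which is usually what you want to report.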

Model script

A model script showing one way of answering the research questions above using R will be available here from 9am on Friday of Week 19.

Feedback on student scripts

There were no student submissions this week.

Independent learning activities

Below are some independent learning activities you can have a go at to help consolidate the content. These are optional, but recommended. Activity 1 is the WBA; Activities 2 and 3 give you further practice at interpreting odds ratios.

Activity 1: The WBA

The WBA can be accessed here from 6th March 2025. You will need the following dataset to complete the WBA. Each student gets three attempts. We recommend having a go at the WBA following the lecture and lab, and saving at least one attempt for revision closer to the class test.

Activity 2: Interpreting odds ratios from multiple binary logistic regression

Imagine you are interested in examining factors that predict whether an individual has a dog (yes/no). The variables you are interested in are: has children (yes/no), working pattern (full-time, part-time, unemployed), and number of pets previously (continuous). You code dog into a numeric variable where 0 = No and 1 = Yes. You set “No” as the reference category for “has children” and “unemployed” as the reference category for working pattern. Below are the odds ratios and 95% confidence intervals around the odds ratio that you obtain.

Predictor                   Odds ratio   Lower 95% CI bound   Upper 95% CI bound
Has_childrenYes             3.67         2.14                 5.64
Working_patternFull-time    6.85         1.34                 14.67
Working_patternPart-time    3.12         0.67                 1.35
Num_previous_pets           0.45         0.23                 0.67

QUESTION: Can you interpret each of the odds ratios?

Activity 2: Answers

You could interpret the odds ratios in two ways:

Way one:

Individuals who had children had higher odds of having a dog relative to individuals who did not have children (odds ratio = 3.67, 95% confidence interval = 2.14-5.64) when holding other variables constant

Individuals who worked full time had higher odds of having a dog relative to individuals who were unemployed (odds ratio = 6.85, 95% confidence interval = 1.34-14.67) when holding other variables constant

Individuals who worked part-time had higher odds of having a dog relative to individuals who were unemployed (odds ratio = 3.12, 95% confidence interval = 0.67-1.35) when holding other variables constant

A one unit increase in the number of previous pets was associated with lower odds of currently having a dog (odds ratio = 0.45, 95% confidence interval = 0.23-0.67) when holding other variables constant

Way two:

Individuals who had children had 3.67x higher odds of having a dog relative to individuals who did not have children (95% confidence interval = 2.14-5.64), when holding other variables constant

Individuals who worked full time had 6.85x higher odds of having a dog relative to individuals who were unemployed (95% confidence interval = 1.34-14.67), when holding other variables constant

Individuals who worked part-time had 3.12x higher odds of having a dog relative to individuals who were unemployed (95% confidence interval = 0.67-1.35), when holding other variables constant

A one unit increase in the number of previous pets was associated with 0.45x the odds (i.e. lower odds) of having a dog (95% confidence interval = 0.23-0.67), when holding other variables constant

Both ways of reporting are fine.
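If it helps to see the arithmetic behind these statements, the sketch below takes the values from the Activity 2 table and works out the percentage change in the odds, plus whether each 95% confidence interval excludes 1. The column names (or, lower, upper and so on) are just labels chosen for this example.

```r
# Odds ratios and 95% CI bounds copied from the Activity 2 table
or_tab <- data.frame(
  term  = c("Has_childrenYes", "Working_patternFull-time",
            "Working_patternPart-time", "Num_previous_pets"),
  or    = c(3.67, 6.85, 3.12, 0.45),
  lower = c(2.14, 1.34, 0.67, 0.23),
  upper = c(5.64, 14.67, 1.35, 0.67)
)

# Percentage change in the odds: (OR - 1) * 100
or_tab$pct_change <- round((or_tab$or - 1) * 100)

# Does the 95% CI exclude 1 (the value meaning "no difference in odds")?
or_tab$ci_excludes_1 <- or_tab$lower > 1 | or_tab$upper < 1

# Does the point estimate sit inside its own interval, as it should?
or_tab$or_inside_ci <- or_tab$or >= or_tab$lower & or_tab$or <= or_tab$upper

print(or_tab)
```

So an odds ratio of 0.45 corresponds to 55% lower odds for each additional previous pet, and 3.67 to 267% higher odds. Note that the part-time row's estimate (3.12) falls outside its listed interval (0.67-1.35), which real model output would not produce, so treat that row as illustrative only.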

Activity 3: Interpreting odds ratios from a proportional odds model

Imagine you are interested in examining factors that predict severity of a disease (mild, moderate, or severe). The variables you are interested in are: pre-existing health condition (yes/no), smokes (yes/no), and number of units of alcohol consumed weekly (continuous). You code disease severity into an ordered factor (mild < moderate < severe). You set “No” as the reference category for “pre-existing health condition” and “smokes”. Below are the odds ratios and 95% confidence intervals you obtain.

Predictor                 Odds ratio   Lower 95% CI bound   Upper 95% CI bound
Pre-existing_healthYes    2.31         1.45                 4.56
SmokesYes                 1.45         0.89                 4.56
Num_alcohol_units         1.12         1.02                 1.45

QUESTION: Can you interpret each of the odds ratios?

Activity 3: Answers

You can interpret the odds ratios in two ways:

Way one:

Individuals who had a pre-existing health condition had higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not have a pre-existing health condition (odds ratio = 2.31, 95% confidence interval = 1.45-4.56), when holding other variables constant

Individuals who smoked had higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not smoke (odds ratio = 1.45, 95% confidence interval = 0.89-4.56) when holding other variables constant

A one unit increase in the number of alcohol units consumed weekly was associated with higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease), when holding the other variables constant (odds ratio = 1.12, 95% confidence interval = 1.02-1.45)

Way two:

Individuals who had a pre-existing health condition had 2.31x higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not have a pre-existing health condition (95% confidence interval = 1.45-4.56), when holding other variables constant

Individuals who smoked had 1.45x higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not smoke (95% confidence interval = 0.89-4.56) when holding other variables constant

A one unit increase in the number of alcohol units consumed weekly was associated with 1.12x higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease), when holding the other variables constant (95% confidence interval = 1.02-1.45)

Both ways of reporting are fine.
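One thing that often trips people up with continuous predictors is that odds ratios multiply rather than add. The short sketch below uses the alcohol odds ratio from the Activity 3 table to show what a larger increase would imply; the object names are arbitrary, and the 5- and 10-unit increases are just illustrative values.

```r
or_per_unit <- 1.12  # odds ratio for Num_alcohol_units from the Activity 3 table

# A k-unit increase multiplies the odds by OR^k (not by k * OR)
or_5_units  <- or_per_unit^5
or_10_units <- or_per_unit^10

# The same per-unit odds ratio expressed as a percentage change in the odds
pct_per_unit <- (or_per_unit - 1) * 100

print(round(c(or_5_units, or_10_units), 2))  # 1.76 3.11
print(round(pct_per_unit))                   # 12
```

In a proportional odds model, that single odds ratio is assumed to apply at every split of the outcome (e.g. “severe” vs “mild” or “moderate”, and “moderate” or “severe” vs “mild”), which is why one value per predictor is reported.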

Asking questions

If you have any questions about this week’s content, please post them on the discussion board here. If you prefer to remain anonymous, you can post questions anonymously here. I will then copy your question to the discussion forum and answer it there and/or cover it in the next Q&A session.
