Odds ratio | Lower confidence interval bound | Upper confidence interval bound | |
---|---|---|---|
Has_childrenYes | 3.67 | 2.14 | 5.64 |
Working_patternFull-time | 6.85 | 1.34 | 14.67 |
Working_patternPart-time | 3.12 | 0.67 | 1.35 |
Num_previous_pets | 0.45 | 0.23 | 0.67 |
9. Expanding on binary logistic regression
Amy Atkinson
Lecture
Part 1: This part covers how to conduct binary logistic regression with different types of predictors.
Part 2: This part covers binary logistic regression with multiple predictors.
Part 3: This part covers ordinal logistic regression.
Download all the lecture slides here in both .pptx and .pdf format.
Lab preparation
Before the lab, please watch the following short video. This covers how to run an ordinal logistic regression model in R.
If you want to have a play around with the script yourself, the R markdown script and dataset can be downloaded here. The script also contains an example of how to run a multiple binary logistic regression model in R.
Lab
Today I’ll provide you with two research questions which will require you to complete the statistical tests covered in the lecture. You can work in groups or individually.
You can write your script as a .R or Rmd file. Use the lab preparation video and script, lecture slides, and previous content covered in the statistics modules to help you.
The presentation given at the start of the lab can be downloaded here.
Datasets
The datasets for this lab can be downloaded here.
Research Question 1
You are interested in factors that predict academic achievement in mathematics. Children’s academic achievement can be rated as below expected, at expected or above expected.
The predictors you are interested in are:
- Number if hours spent revising (continuous)
- Likes school (Yes/no)
- Favourite subject (Maths, English or Science)
For the categorical predictors:
-Set the reference category for Likes school as “No” and the reference category for Favourite subject as “Maths”
See last week’s content for how to set the reference category. Or better yet, revisit your own script to see how you did it last time!
Research Question 2
You work in a nursery. In the nursery, there has been an outbreak of measles. You are interested in factors that predict whether a child in your nursery will have measles (yes/no).
The predictors you are interested in are:
- Number of hours spent at nursery weekly (continuous)
- Has siblings (Yes/no)
- Vaccinated against measles (Yes/no)
For the categorical predictors:
- Set the reference category for Siblings as “No”
- Set the reference category for Vaccinated as “No”
For the outcome (measles – yes/no):
- Set No as 0, and Yes as 1
See last week’s content for how to set the reference category. Or better yet, revisit your own script to see how you did it last time!
Hints and tips
Your script should aim to answer and interpret the research question above.
Start a new session on the server, load in the required libraries in the following order (car, DescTools, tidyverse, MASS, brant), and then the dataset.
Here are the steps you’ll need to follow:
- Prepare our data for analysis
- Explore our data
- Run the model
- Evaluate the model
- Evaluate the individual predictors
- Predicted probabilities
- Assumption checks
- Interpret the output
Model script
A model script showing one way of answering the research questions above using R will be available here from 9am on Friday of Week 19.
Feedback on student scripts
There were no student submissions this week.
Independent learning activities
Below are some independent learning activities you can have a go at to help consolidate the content. These are optional, but recommended. Activity 1 is the WBA. Activities 2 and 3 are further activities to help you consolidate the content.
Activity 1: The WBA
The WBA can be accessed here from 6th March 2025. You will need the following dataset to complete the WBA. Each student gets three attempts. We recommend having a go at the WBA following the lecture and lab. We recommend saving at least one attempt for revision purposes close in time to the class test.
Activity 2: Interpreting odds ratios from multiple binary logistic regression
Imagine you are interested in examining factors that predict whether an individual has a dog (yes/no). The variables you are interested in are: has children (yes/no), working pattern (full-time, part-time, unemployed), and number of pets previously (continuous). You code dog into a numeric variable where 0 = No and 1 = Yes. You set “No” as the reference category for “has children” and “unemployed” as the reference category for working pattern. Below are the odds ratios and 95% confidence intervals around the odds ratio that you obtain.
QUESTION: Can you interpret each of the odds ratios?
Activity 2: Answers
You could interpret the odds ratios in two ways:
Way one:
Individuals who had children had higher odds of having a dog relative to individuals who did not have children (odds ratio = 3.67, 95% confidence interval = 2.14-5.64) when holding other variables constant
Individuals who worked full time had higher odds of having a dog relative to individuals who were unemployed (odds ratio = 6.85, 95% confidence interval = 1.34-14.67) when holding other variables constant
Individuals who worked part-time had higher odds of having a dog relative to individuals who were unemployed (odds ratio = 3.12, 95% confidence interval = 0.67-1.35) when holding other variables constant
A one unit increase in the number of previous pets was associated with lower odds of currently having a dog (odds ratio = 0.45, 95% confidence interval = 0.23-0.67) when holding other variables constant
Way two:
Individuals who had children had 3.67x higher odds of having a dog relative to individuals who did not have children (95% confidence interval = 2.14-5.64), when holding other variables constant
Individuals who worked full time had 6.85x higher odds of having a dog relative to individuals who were unemployed (95% confidence interval = 1.34-14.67), when holding other variables constant
Individuals who worked part-time had 3.12x higher odds of having a dog relative to individuals who were unemployed (95% confidence interval = 0.67-1.35)
A one unit increase in the number of previous pets was associated with a 0.45x higher odds (i.e. lower odds) of having a dog (odds ratio = 0.45, 95% confidence interval = 0.23-0.67), when holding other variables constant
Both ways of reporting are fine.
Activity 3: Interpreting odds ratios from a proportional odds model
Imagine you are interested in examining factors that predict severity of a disease (mild, moderate, or severe). The variables you are interested in are: pre-existing health condition (yes/no), smokes (yes/no), and number of units of alcohol consumed weekly (continuous). You code disease severity into an ordered factor (mild < moderate < severe). You set “No” as the reference category for “pre-existing health condition” and “smoking”. Below are the odds ratios and confidence intervals you obtain.
Odds ratio | Lower confidence interval bound | Upper confidence interval bound | |
---|---|---|---|
Pre-existing_healthYes | 2.31 | 1.45 | 4.56 |
SmokesYes | 1.45 | 0.89 | 4.56 |
Num_alcohol_units | 1.12 | 1.02 | 1.45 |
*QUESTION: Can you interpret each of the odds ratios?
Activity 3: Answers
You can interpret the odds ratios in two ways:
Way one:
Individuals who had a pre-existing health condition had higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not have a pre-existing health condition (odds ratio = 2.31, 95% confidence interval = 1.45-4.56), when holding other variables constant
Individuals who smoked had higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not smoke (odds ratio = 1.45, 95% confidence interval = 0.89-4.56) when holding other variables constant
A one unit increase in the number of alcohol units consumed weekly increased the odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease), when holding the other variables constant (odds ratio = 1.12, 95% confidence interval = 1.02-1.45)
Way two:
Individuals who had a pre-existing health condition had 2.31x higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not have a pre-existing health condition (95% confidence interval = 1.45-4.56), when holding other variables constant
Individuals who smoked had 1.45x higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not smoke (95% confidence interval = 0.89-4.56) when holding other variables constant
A one unit increase in the number of alcohol units consumed weekly increased the odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) by 1.12, when holding the other variables constant (95% confidence interval = 1.02-1.45)
Both ways of reporting are fine.
Asking questions
If you have any questions about this week’s content, please post them on the discussion board here. If you prefer to remain anonymous, you can post questions anonymously here. I will then copy your question to the discussion forum and answer it there and/or cover it in the next Q&A session.