| Predictor | Odds ratio | Lower confidence interval bound | Upper confidence interval bound |
|---|---|---|---|
| Has_childrenYes | 3.67 | 2.14 | 5.64 |
| Working_patternFull-time | 6.85 | 1.34 | 14.67 |
| Working_patternPart-time | 3.12 | 0.67 | 1.35 |
| Num_previous_pets | 0.45 | 0.23 | 0.67 |
9. Expanding on binary logistic regression
Amy Atkinson
Research Question 1
You are interested in factors that predict academic achievement in mathematics. Children’s academic achievement can be rated as below expected, at expected or above expected.
The predictors you are interested in are:
- Number of hours spent revising (continuous)
- Likes school (Yes/no)
- Favourite subject (Maths, English or Science)
For the categorical predictors:
- Set the reference category for Likes school as “No” and the reference category for Favourite subject as “Maths”
See last week’s content for how to set the reference category. Or better yet, revisit your own script to see how you did it last time!
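Setting a reference category controls which dummy (0/1) columns the model creates. In R this is usually done with `relevel()` on a factor, or by ordering the `levels` argument of `factor()`. The Python sketch below is not part of the course workflow; it only illustrates what treatment coding produces once a reference level is chosen. The function `dummy_code` and the example data are hypothetical.

```python
# Hypothetical sketch: treatment (dummy) coding with a chosen reference level.
def dummy_code(name, values, reference):
    """Return one 0/1 indicator column per non-reference level of a categorical variable."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in {name}")
    columns = {}
    for level in levels:
        if level == reference:
            continue  # the reference level gets no column; it is absorbed into the intercept
        # Column names follow R's "variable name + level" style, e.g. Likes_schoolYes
        columns[f"{name}{level}"] = [1 if v == level else 0 for v in values]
    return columns


# Made-up example data for five children
favourite_subject = ["Maths", "English", "Science", "Maths", "English"]
likes_school = ["Yes", "No", "Yes", "Yes", "No"]

print(dummy_code("Favourite_subject", favourite_subject, reference="Maths"))
# {'Favourite_subjectEnglish': [0, 1, 0, 0, 1], 'Favourite_subjectScience': [0, 0, 1, 0, 0]}
print(dummy_code("Likes_school", likes_school, reference="No"))
# {'Likes_schoolYes': [1, 0, 1, 1, 0]}
```

Because “Maths” is the reference, each Favourite subject odds ratio compares that subject with Maths; there is no separate Maths coefficient.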
Research Question 2
You work in a nursery. In the nursery, there has been an outbreak of measles. You are interested in factors that predict whether a child in your nursery will have measles (yes/no).
The predictors you are interested in are:
- Number of hours spent at nursery weekly (continuous)
- Has siblings (Yes/no)
- Vaccinated against measles (Yes/no)
For the categorical predictors:
- Set the reference category for Siblings as “No”
- Set the reference category for Vaccinated as “No”
For the outcome (measles – yes/no):
- Code No as 0 and Yes as 1
See last week’s content for how to set the reference category. Or better yet, revisit your own script to see how you did it last time!
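Coding the outcome as No = 0 and Yes = 1 matters because binary logistic regression (for example `glm()` with `family = binomial` in R) models the probability of the outcome coded 1, so the odds ratios describe the odds of having measles. A minimal Python sketch, assuming made-up data; `code_outcome` and `observed_odds` are hypothetical helpers:

```python
# Hypothetical sketch: code a Yes/No outcome as 1/0 and compute its observed odds.
OUTCOME_CODES = {"No": 0, "Yes": 1}


def code_outcome(values):
    """Map 'No' -> 0 and 'Yes' -> 1, failing loudly on any other label."""
    try:
        return [OUTCOME_CODES[v] for v in values]
    except KeyError as err:
        raise ValueError(f"unexpected outcome label: {err.args[0]!r}") from None


def observed_odds(coded):
    """Odds of the outcome coded 1: number of 1s divided by number of 0s."""
    ones = sum(coded)
    zeros = len(coded) - ones
    return ones / zeros


measles = ["No", "Yes", "No", "No", "Yes"]  # made-up data for five children
coded = code_outcome(measles)
print(coded)                 # [0, 1, 0, 0, 1]
print(observed_odds(coded))  # 2 cases vs 3 non-cases, i.e. odds of 2/3
```

If the coding were reversed (Yes = 0), every odds ratio would be inverted, so it is worth checking this before running the model.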
Hints and tips
Your script should aim to answer and interpret the research questions above.
Start a new session on the server, load in the required libraries in the following order (car, DescTools, tidyverse, MASS, brant), and then the dataset.
Here are the steps you’ll need to follow:
- Prepare your data for analysis
- Explore your data
- Run the model
- Evaluate the model
- Evaluate the individual predictors
- Predicted probabilities
- Assumption checks
- Interpret the output
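Two of the steps above, evaluating the model and computing predicted probabilities, rest on simple formulas. In R you would normally use `predict(model, newdata, type = "response")` for predicted probabilities and `PseudoR2()` from DescTools for pseudo R-squared values. The Python sketch below only shows the arithmetic behind them; all coefficients and log-likelihoods are invented for illustration and are not results for the measles question.

```python
import math


def inverse_logit(log_odds):
    """Convert log-odds (the model's linear predictor) into a probability."""
    return 1 / (1 + math.exp(-log_odds))


def predicted_probability(intercept, coefs, predictors):
    """Linear predictor b0 + sum(b_i * x_i) passed through the inverse logit."""
    log_odds = intercept + sum(coefs[name] * predictors[name] for name in coefs)
    return inverse_logit(log_odds)


def likelihood_ratio_chisq(ll_null, ll_model):
    """Likelihood-ratio chi-square of the fitted model against the intercept-only model."""
    return 2 * (ll_model - ll_null)


def mcfadden_r2(ll_null, ll_model):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(null)."""
    return 1 - ll_model / ll_null


# Hypothetical coefficients on the log-odds scale (NOT real results)
intercept = -1.0
coefs = {"Hours_nursery": 0.05, "SiblingsYes": 0.4, "VaccinatedYes": -2.5}

# Two hypothetical children who differ only in vaccination status
unvaccinated = {"Hours_nursery": 20, "SiblingsYes": 1, "VaccinatedYes": 0}
vaccinated = {"Hours_nursery": 20, "SiblingsYes": 1, "VaccinatedYes": 1}
print(round(predicted_probability(intercept, coefs, unvaccinated), 3))
print(round(predicted_probability(intercept, coefs, vaccinated), 3))

# Hypothetical log-likelihoods for the null and fitted models
print(likelihood_ratio_chisq(ll_null=-120.0, ll_model=-95.0))  # 50.0
print(round(mcfadden_r2(ll_null=-120.0, ll_model=-95.0), 3))
```

Predicted probabilities are often easier to communicate than odds ratios: here the only difference between the two children is vaccination, so the gap between their probabilities reflects that one coefficient.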
Model script
A model script showing one way of answering the research questions above using R will be available next week.
Independent learning activities
Below are some independent learning activities you can have a go at to help consolidate the content. These are optional, but recommended. Activity 1 is the WBA. Activities 2 and 3 are further activities to help you consolidate the content.
Activity 2: Interpreting odds ratios from multiple binary logistic regression
Imagine you are interested in examining factors that predict whether an individual has a dog (yes/no). The variables you are interested in are: has children (yes/no), working pattern (full-time, part-time, unemployed), and number of pets previously (continuous). You code dog into a numeric variable where 0 = No and 1 = Yes. You set “No” as the reference category for “has children” and “unemployed” as the reference category for working pattern. The odds ratios and 95% confidence intervals around each odds ratio that you obtain are shown in the table with the rows Has_childrenYes, Working_patternFull-time, Working_patternPart-time and Num_previous_pets.
QUESTION: Can you interpret each of the odds ratios?
Activity 2: Answers
You could interpret the odds ratios in two ways:
Way one:
Individuals who had children had higher odds of having a dog relative to individuals who did not have children (odds ratio = 3.67, 95% confidence interval = 2.14-5.64) when holding other variables constant
Individuals who worked full-time had higher odds of having a dog relative to individuals who were unemployed (odds ratio = 6.85, 95% confidence interval = 1.34-14.67) when holding other variables constant
Individuals who worked part-time had higher odds of having a dog relative to individuals who were unemployed (odds ratio = 3.12, 95% confidence interval = 0.67-1.35) when holding other variables constant, although the confidence interval includes 1, so this difference is not statistically significant
A one unit increase in the number of previous pets was associated with lower odds of currently having a dog (odds ratio = 0.45, 95% confidence interval = 0.23-0.67) when holding other variables constant
Way two:
Individuals who had children had 3.67x higher odds of having a dog relative to individuals who did not have children (95% confidence interval = 2.14-5.64), when holding other variables constant
Individuals who worked full-time had 6.85x higher odds of having a dog relative to individuals who were unemployed (95% confidence interval = 1.34-14.67), when holding other variables constant
Individuals who worked part-time had 3.12x higher odds of having a dog relative to individuals who were unemployed (95% confidence interval = 0.67-1.35), when holding other variables constant, although the confidence interval includes 1, so this difference is not statistically significant
A one-unit increase in the number of previous pets was associated with 0.45x the odds (i.e. 55% lower odds) of having a dog (95% confidence interval = 0.23-0.67), when holding other variables constant
Both ways of reporting are fine.
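Converting an odds ratio into a percentage change in odds, and checking whether its 95% confidence interval includes 1 (which generally indicates a non-significant effect at the 5% level), can be sketched as below. This is an illustrative Python sketch, not course code; `describe_odds_ratio` and `odds_ratio_for_k_units` are hypothetical helpers, and the example values are taken from the Activity 2 table.

```python
def describe_odds_ratio(odds_ratio, lower, upper):
    """Summarise an odds ratio as a percentage change in odds and check its 95% CI."""
    if not lower <= odds_ratio <= upper:
        raise ValueError("point estimate lies outside its confidence interval; check the values")
    if odds_ratio == 1:
        change = "no change in odds"
    else:
        direction = "higher" if odds_ratio > 1 else "lower"
        # e.g. OR = 3.67 -> 267% higher odds; OR = 0.45 -> 55% lower odds
        change = f"{abs(odds_ratio - 1) * 100:.0f}% {direction} odds"
    excludes_one = lower > 1 or upper < 1
    return f"{change}; 95% CI {'excludes' if excludes_one else 'includes'} 1"


def odds_ratio_for_k_units(odds_ratio, k):
    """For a continuous predictor, odds ratios multiply: a k-unit increase gives OR ** k."""
    return odds_ratio ** k


print(describe_odds_ratio(3.67, 2.14, 5.64))      # 267% higher odds; 95% CI excludes 1
print(describe_odds_ratio(0.45, 0.23, 0.67))      # 55% lower odds; 95% CI excludes 1
print(round(odds_ratio_for_k_units(0.45, 2), 4))  # 0.2025
```

The last line shows why “0.45x the odds” per previous pet compounds: two extra previous pets multiply the odds by 0.45 × 0.45, not by 0.45 × 2.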
Activity 3: Interpreting odds ratios from a proportional odds model
Imagine you are interested in examining factors that predict severity of a disease (mild, moderate, or severe). The variables you are interested in are: pre-existing health condition (yes/no), smokes (yes/no), and number of units of alcohol consumed weekly (continuous). You code disease severity into an ordered factor (mild < moderate < severe). You set “No” as the reference category for “pre-existing health condition” and “smoking”. Below are the odds ratios and confidence intervals you obtain.
| Predictor | Odds ratio | Lower confidence interval bound | Upper confidence interval bound |
|---|---|---|---|
| Pre-existing_healthYes | 2.31 | 1.45 | 4.56 |
| SmokesYes | 1.45 | 0.89 | 4.56 |
| Num_alcohol_units | 1.12 | 1.02 | 1.45 |
QUESTION: Can you interpret each of the odds ratios?
Activity 3: Answers
You can interpret the odds ratios in two ways:
Way one:
Individuals who had a pre-existing health condition had higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not have a pre-existing health condition (odds ratio = 2.31, 95% confidence interval = 1.45-4.56), when holding other variables constant
Individuals who smoked had higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not smoke (odds ratio = 1.45, 95% confidence interval = 0.89-4.56), when holding other variables constant, although the confidence interval includes 1, so this difference is not statistically significant
A one unit increase in the number of alcohol units consumed weekly increased the odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease), when holding the other variables constant (odds ratio = 1.12, 95% confidence interval = 1.02-1.45)
Way two:
Individuals who had a pre-existing health condition had 2.31x higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not have a pre-existing health condition (95% confidence interval = 1.45-4.56), when holding other variables constant
Individuals who smoked had 1.45x higher odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) relative to individuals who did not smoke (95% confidence interval = 0.89-4.56), when holding other variables constant, although the confidence interval includes 1, so this difference is not statistically significant
A one-unit increase in the number of alcohol units consumed weekly increased the odds of having more severe disease (e.g. “severe” disease vs “mild” or “moderate” disease) by a factor of 1.12 (i.e. 12% higher odds), when holding the other variables constant (95% confidence interval = 1.02-1.45)
Both ways of reporting are fine.
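The phrase “more severe disease (e.g. severe vs mild or moderate)” works because a proportional odds model assumes the same odds ratio at every cut point of the ordered outcome. MASS's `polr()` fits logit P(Y ≤ k) = ζ_k − η, where η is the linear predictor, and the brant package's `brant()` tests whether this parallel-odds assumption holds. The Python sketch below illustrates the property; the two cut points are invented, and only the odds ratio of 2.31 comes from the Activity 3 table.

```python
import math


def inverse_logit(z):
    """Convert a log-odds value into a probability."""
    return 1 / (1 + math.exp(-z))


def category_probabilities(cutpoints, eta):
    """polr-style model: P(Y <= k) = inverse_logit(cutpoint_k - eta), one cut point per boundary."""
    cumulative = [inverse_logit(c - eta) for c in cutpoints] + [1.0]
    # Individual category probabilities are differences of cumulative probabilities
    return [cumulative[0]] + [cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))]


def odds_higher(probs, k):
    """Odds of being above category k (k=0: moderate/severe vs mild; k=1: severe vs mild/moderate)."""
    above = sum(probs[k + 1:])
    return above / (1 - above)


cutpoints = [0.5, 2.0]  # hypothetical mild|moderate and moderate|severe thresholds
beta = math.log(2.31)   # log odds ratio for a pre-existing health condition
no_condition = category_probabilities(cutpoints, eta=0.0)
condition = category_probabilities(cutpoints, eta=beta)

for k, label in enumerate(["moderate or severe vs mild", "severe vs mild or moderate"]):
    ratio = odds_higher(condition, k) / odds_higher(no_condition, k)
    print(label, round(ratio, 2))  # 2.31 at both cut points
```

Whichever boundary you split the outcome at, the odds ratio comparing the two groups is the same, which is why a single odds ratio can summarise “higher odds of more severe disease”.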
Asking questions
If you have any questions about this week’s content, please post them on the Moodle discussion board.