Hi, it's Margriet. In all the models we've considered so far, we have dealt with continuous outcome variables. We have previously looked at incorporating categorical predictors, but what if the response, the outcome itself, is categorical? That is what we will be looking at today.

So what do we mean by a discrete or categorical outcome variable? Categorical outcome variables are ubiquitous in psychology. Examples include responses in any two-alternative forced-choice task: in a face recognition task, for instance, the participant is asked to decide which one of two faces they have seen before. Other examples are the likelihood of hallucinations occurring in a schizophrenic patient receiving a particular treatment, the likelihood that a child falls into one of two groups, such as good readers versus poor readers, or the likelihood that somebody will move their eyes to the left or to the right. These are all examples of categorical outcome variables, and there are a lot of them.

But before we look at logistic regression, let's first talk about why it is a bad idea to use the linear model, so ANOVA or simple or multiple linear regression, to analyze these. If you are interested in some binary event, say participants' accuracy in a forced-choice task, each measurement will be represented as a 0 (inaccurate) or a 1 (accurate). What you could do, and many people do this, is calculate the proportion of accurate responses for each participant, so percent correct, and then stick that number into an ANOVA, a simple linear regression, or a multiple linear regression. That is a bad idea for two reasons. Firstly, the scale: the values that the outcome variable can take are intrinsically limited, bounded between zero and one. It can't be less than 0, it can't be more than 1, and that leads to problems if you treat it as if it were a continuous variable. Secondly, you might remember that to apply a linear model, you would usually check whether the residuals show homoskedasticity, that is, whether the variance is roughly constant across different fitted values. That is often not the case for a discrete or categorical outcome variable; more often than not, the variance is actually proportional to the mean, as with count data.

Let's look at each of these issues in turn, starting with the bounded scale issue. In each of the plots in this graph you see, on the y-axis, the proportion of correct responses made by each participant in twelve blocks of a word learning experiment. In one condition participants learned only nouns, in another they learned nouns and verbs mixed together, and in the third condition they learned only verbs. Across the x-axis we have block. As you can see, across the blocks participants become correct more and more often: they are learning these words. The black solid lines show the best linear fit for the effect of block across all participants, but of more interest are the grey dots and grey lines, because they show the linear model best fit for the block effect for each participant individually. Each grey line here is one participant.
Now, a participant's accuracy in a set of trials like this is bounded. At the trial level, either they are correct, in which case the value is 1, or they are incorrect, in which case the value is 0, so there is an inbuilt, intrinsic limit to the predicted proportion of responses that can be correct: it can't go below 0 or above 1. But these grey lines illustrate the problem. These are linear model fits, and a linear model can in principle predict any value. If you follow these lines you can see that some of them go well beyond 1, and they could theoretically also go below 0. So you could predict that the proportion of a person's responses that are correct is less than zero or greater than one, which is clearly impossible.

If you treat a categorical outcome variable as if it were a continuous variable, this bounded scale problem can also lead to spurious interactions, and that can happen in the following scenario. Suppose participants are already highly accurate, say above 90%, in one condition but not in the other, and they then take part in an intervention, and you look at how accurate they are after the intervention. Accuracy in the first condition can only go up so far, because participants were already at 90% before they even did the intervention; there is very little room to get better, because you can't go beyond 100% accurate. In the other condition, where participants were only 50% correct before the intervention, there is a lot of room for improvement. So you might get spurious interaction effects if you treat a categorical variable as if it were a continuous variable that could take any value, and if an interaction like that occurs, it is difficult to know whether it reflects something theoretically meaningful or whether it is just an artifact of the bounded nature of the scale.

The other fundamental problem with using analysis approaches like ANOVA or regression on categorical outcome variables, like accuracy of responses, is that we cannot assume that the variance in accuracy will be homogeneous across different experimental conditions. You might remember the pictures on this slide: one shows homoscedasticity, and the others show patterns where that is not the case. The point is that a categorical outcome variable is unlikely to show homoscedasticity. That is illustrated in this figure as well, the rainy days example from the interactive textbook by Dale Barr. Here you are looking at the number of rainy days in four different European cities. Barcelona has the lowest number, about 55 rainy days a year on average, and Brussels has the highest, almost 200 rainy days a year. You can see that the Barcelona distribution is thinner than the Brussels distribution: the higher the mean, the wider the distribution gets. The variance is proportional to the mean, which tells you there is no homogeneity of variance.
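Both problems are easy to demonstrate with a small simulation. Here is a minimal sketch in R, with entirely made-up data rather than the word learning data from the slides:

  # Two problems with lm() on binary outcomes, in one small simulation.
  set.seed(1)

  # Problem 1: a straight line fit to 0/1 data predicts impossible values.
  x <- seq(0, 1, length.out = 200)                      # hypothetical predictor
  y <- rbinom(200, size = 1, prob = plogis(-4 + 8 * x)) # binary 0/1 responses
  m <- lm(y ~ x)                                        # ordinary linear model
  predict(m, newdata = data.frame(x = c(-0.2, 1.2)))    # below 0 and above 1

  # Problem 2: the variance of a proportion is tied to its mean.
  p <- c(0.1, 0.3, 0.5)        # true probabilities in three "conditions"
  props <- sapply(p, function(pi) rbinom(1000, size = 20, prob = pi) / 20)
  apply(props, 2, var)         # grows with the mean: roughly p * (1 - p) / 20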
So, to summarize what we have covered so far: firstly, linear models assume that outcomes are unbounded, and so allow predictions that are impossible when outcomes are in fact bounded, as is the case for accuracy and other categorical variables. Secondly, linear models assume homogeneity of variance, but that is unlikely to hold, and in any case cannot be guaranteed in advance, when outcomes are categorical.

You can think of the problem like this: if you are using ANOVA or regression to analyze the variation in a categorical outcome measure like response accuracy, it is only an approximation. If you are lucky, and the proportions of correct responses are on average around .5, or at similar distances from .5 in the different conditions, then the approximation will give you a reasonable estimate of the effect of your experimental conditions on your outcome measure. And if you are not interested in accurate prediction of outcomes at the extremes, or you do not mind that an ordinary linear model of response accuracy can give you impossible predicted values, below 0 or above 1, then again the approximation may be acceptable. However, it seems hard to justify tolerating the limitations of these traditional approaches when we have perfectly appropriate models available to us to analyze these data properly. These are the generalized, rather than general, linear models.

OK. Now we are going to take a slight detour to talk about distributions. Remember that if you fit a linear regression model, you need to check whether the residuals are normally distributed; it does not matter whether the outcome variable itself is or is not normally distributed. It is all about the residuals. I am talking here about a perfectly ordinary linear model with a continuous outcome variable. In the graph on the slide, you see a situation where the outcome variable itself is not normally distributed. Just look at the solid black line: it is a density plot, which shows the same thing as a histogram, and you can see that there are more observations towards the left than towards the right, so there is positive skew. The outcome variable itself is not normally distributed. However, this distribution can arise from multiple normal distributions, and those are shown by the dashed lines. If you fit a model of the form y as a function of group, you will get a prediction for each of these groups, and if those values are drawn from underlying normal distributions, then the residuals will also be normally distributed. As long as the distributions of the individual groups are normal, the residuals will be normal.

That idea is encapsulated on the right of the slide in the formula Y ~ N(mu, sigma): Y is assumed to be generated by a normally distributed process with a mean of mu and a standard deviation of sigma. The Y values change as a function of your predictor; that is the whole idea of predicting different values of Y from a predictor X. So you are basically predicting a shifting mean as a function of X, with mu = beta_0 + beta_1 * X, and how the mean shifts depends, of course, on the slope beta_1. We can try to predict Y values based on the usual linear regression equation, and it is the job of regression to supply the corresponding estimates of beta_0 and beta_1.
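Incidentally, the claim that a skewed outcome can still produce normal residuals is easy to check. A small sketch with made-up group means, not the data behind the slide:

  # A skewed outcome can still give normal residuals (made-up numbers).
  set.seed(1)
  group <- factor(rep(c("a", "b", "c"), times = c(300, 100, 50)))
  means <- c(a = 0, b = 3, c = 6)             # a shifting mean per group
  y <- rnorm(450, mean = means[as.character(group)], sd = 1)

  hist(y, breaks = 40)                  # marginal distribution: positive skew
  m <- lm(y ~ group)                    # one predicted mean per group
  hist(residuals(m), breaks = 40)       # residuals: approximately normal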
Up to here, this is all just linear regression, everything exactly as we have been using it. The only difference is that we have started to think about the process that generated the data. Now let's talk about the situation where the process that generated the data is not normally distributed. That is what the generalized linear framework handles: it generalizes the linear model framework to incorporate data-generating processes that follow any distribution, rather than just the normal distribution. The type of generalized linear model that we will look at today is logistic regression, which sits within this generalized linear framework.

For logistic regression, it is assumed that the response Y has been generated by a process that follows a binomial distribution. The binomial distribution has two parameters, n and p. p is the probability parameter: it describes the probability that Y is 1 rather than 0, Y being binary. n is the trial parameter: how many trials have been conducted. For our purposes we will use logistic regression exclusively to model data at the individual trial level, so we set n to 1. In that case the binomial distribution characterizes the probability of observing a single event, such as whether a participant correctly identified the face on that trial. The binomial distribution with n set to 1 has a special name: it is called the Bernoulli distribution. So the way we are going to use logistic regression assumes that Y is generated by a process that follows the Bernoulli distribution.

In this figure you see the probabilities that the Bernoulli distribution assigns to the values 0 and 1 for three different values of the parameter p. At p = .8, the event occurring is much more likely than it not occurring; at p = .2, it is much more likely that the event does not occur; and at p = .5, it is a 50/50 chance.
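If you want to reproduce numbers like the ones in this figure, R's dbinom() does it directly once you fix the trial parameter at 1. A quick sketch:

  # Bernoulli probabilities: a binomial with the trial parameter n fixed at 1.
  dbinom(c(0, 1), size = 1, prob = 0.2)   # P(Y = 0) = 0.8, P(Y = 1) = 0.2
  dbinom(c(0, 1), size = 1, prob = 0.5)   # a 50/50 chance
  dbinom(c(0, 1), size = 1, prob = 0.8)   # P(Y = 0) = 0.2, P(Y = 1) = 0.8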
Now, in the context of logistic regression, you are generally interested in modeling p as a function of one or more predictors. For example, you might want to model the probability of recognizing a face as a function of the number of presentations during training, or the probability of passing a second-language test as a function of language background, age, and educational background. So ultimately you want something like: probability of the event = beta_0 + beta_1 * X. You want your predictor with its slope, and of course there will be an intercept as well, and you want to use that to predict a probability, with different probabilities for different values of X. But here is the problem: the equation beta_0 + beta_1 * X can predict any continuous value, whereas probabilities have to be between zero and one. That means you need to find a way of constraining what the regression can predict: you need to crunch the output of the predictive equation into the interval between zero and one.

The mathematical function for that is the logistic function, which is why this is called logistic regression. Rather than modeling the probability p directly as a function of the predictors, the output of the predictive equation is transformed via the logistic function. On this slide you see the effect of the logistic function. Notice how a negative number on the ordinary scale, here -1, becomes, after you apply the logistic function, a positive number between zero and one. Applying the logistic function to the value 0 yields .5, and so on: all the numbers from what was an unbounded continuous scale get crunched into this limited range. In R, the logistic function is implemented in the command plogis(); I will show you that later.

Now, before we start fitting logistic regression models, we need to talk about one other bit of math: you need to know about log odds, also called logits. The odds of something express the probability of an event occurring divided by the probability of it not occurring, or a bit more formally: odds = p / (1 - p). You will have heard the expression "the odds are one to one", which describes a 50/50 chance of something happening, and you can express that by plugging p = .5 into this equation: .5 / (1 - .5) = .5 / .5 = 1, so the odds are one to one.

Odds range from zero to infinity, and that is a continuous scale, which is good, because it makes the quantity much more amenable to something like regression. But we additionally apply the natural logarithm, so that the scale ranges from negative to positive infinity rather than from zero to infinity. The point of using the log odds form is that you get a continuous scale running from negative infinity to positive infinity. These are the steps you have to take to be able to estimate an effect on a bounded outcome that lies between zero and one: we transform the probabilities to odds, and then to log odds, so that what was confined between zero and one becomes something genuinely continuous that varies from negative infinity to positive infinity, which is much more useful if you are doing regression.

In this table you can see how different probabilities correspond to odds, and how these in turn correspond to log odds. Log odds take quite a bit of time to get used to, but good things to remember to start out with are these: a log odds of 0 corresponds to a probability of .5, that is, one-to-one odds; positive log odds correspond to probabilities larger than .5; and negative log odds correspond to probabilities smaller than .5. So, for example, if you are modeling face recognition, then a positive log odds indicates that the face is more likely to be recognized than not.
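You can check all of these correspondences in R. The lecture only uses plogis(), but its inverse qlogis(), the logit function, makes the table easy to reproduce. A quick sketch:

  # Probabilities -> odds -> log odds, and back again.
  p <- c(0.1, 0.25, 0.5, 0.75, 0.9)
  odds <- p / (1 - p)       # runs from 0 to infinity
  log(odds)                 # log odds: negative infinity to positive infinity
  qlogis(p)                 # the same in one step: the logit function
  plogis(qlogis(p))         # the logistic function undoes it: back to p
  plogis(0)                 # a log odds of 0 is a probability of .5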
Now, as I said, the whole point of talking about log odds is that it puts probabilities onto a continuous scale, which is more amenable to being modeled with regression. Logistic regression actually predicts the log odds, or logits. So we keep our predictive equation, but what you are actually predicting is not the probability p; it is the logit of that probability: logit(p) = beta_0 + beta_1 * X. The relationships between logits and probabilities are shown on the right here as well: the logistic function crunches logits into the range between zero and one, and the logit function expresses a probability on a scale that ranges from negative infinity to positive infinity. They are opposite, or inverse, transformations. The important thing to remember for now is this: when we are doing logistic regression, we are trying to predict what the probability of something is going to be, but we are not actually predicting that probability itself, because of the bounded scale problem; we are predicting the logit of that probability.

So, a second summary. If you have a categorical outcome variable, there are two problems: it has an intrinsically bounded scale, because it varies between zero and one, and the homogeneity of variance assumption is not met. These two things are the reasons that it is not a good idea to use ANOVA or a simple or multiple linear regression model to analyze these types of data. Instead you should use a generalized linear model, fitted using a binomial distribution, of which the Bernoulli distribution is a special case. And to move from probabilities to something you can analyze with regression, you transform the probabilities to odds, to put them on a continuous scale, and then transform the odds to log odds, so that they can vary between negative and positive infinity. That is what you then use for your modeling.

OK, now let's look at an example, and hopefully that will make things a bit clearer. This example is inspired by some real data, but the data you see here are artificial. It has a categorical outcome variable, speech error: whether or not somebody makes a speech error on a particular trial, as a function of blood alcohol concentration, so how drunk you are. Each dot here is a data point, and you can see that lower levels of blood alcohol concentration make it more likely that you don't make a mistake, and higher levels make it more likely that you do make a speech error. That is the data we are going to fit the logistic regression model to.

We read the data in, and this is what the table looks like. We have a column called BAC, the blood alcohol concentration predictor variable, and a column called speech_error, which contains the information about the presence or absence of a speech error: 0 means no speech error, 1 means there was a speech error. So speech_error is our outcome variable.

Now, the function for fitting a logistic regression model is called glm(), for generalized linear model. You specify your model formula as before, modeling speech_error as a function of BAC, and you tell R what your data are. But in addition, you need to specify the assumed distribution of the data-generating process, and that is done in the family argument; a sketch of the full call follows.
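Here is a minimal sketch of the complete call. The file name speech_errors.csv is hypothetical, since the lecture does not name the file, and the read_csv() step assumes the data are stored as a CSV; the model call itself is as described:

  library(readr)   # for read_csv()
  library(broom)   # for tidy(), used just below

  # Hypothetical file name; columns BAC and speech_error as described above.
  speech <- read_csv("speech_errors.csv")

  # Logistic regression: trial-level 0/1 outcome, so family = binomial.
  alcohol_model <- glm(speech_error ~ BAC,
                       data = speech,
                       family = binomial)

  tidy(alcohol_model)  # coefficients table: the estimates are log odds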
The family argument tells R which distribution we think the data-generating process comes from. The name family comes from the fact that you can think of any basic distribution shape, whether Gaussian or binomial or whatever, as a family of distributions: changing the parameters allows you to create lots of versions of that same distribution. Here we specify the family to be binomial; remember, the Bernoulli distribution is a special case of the binomial distribution. And we store all of that in an object that we call alcohol_model.

Now we can use tidy() from the broom package to retrieve the coefficients table, and, as always, it is important to spend some time interpreting the estimate column. In the case of logistic regression, these estimates are log odds, or logits. The first thing to look at is the sign of each coefficient. Here the slope for BAC is positive, which means that an increase in blood alcohol concentration corresponds to an increase in the log odds of observing a speech error. The other thing to notice is that the intercept is negative, which indicates that when X is zero, the probability of making a speech error is smaller than .5; in other words, sober people make a speech error less than 50% of the time. The fact that the p-value for the BAC coefficient is significant can be translated into something like: obtaining a slope of 16.11, or one more extreme than that, is quite unlikely if the null hypothesis is true, so you can reject the null hypothesis.

Finally, notice that the statistic here is z rather than t. When you fit a linear model, the statistic for your coefficients is t, but when you fit a generalized linear model, the statistic for your coefficients is z. We won't go into the reasons why that is the case here; just remember it, because it is relevant when you report the result. You could report this with something like: blood alcohol concentration significantly predicted the occurrence of a speech error, followed by the logit coefficient, its standard error, z = 3.3, and the p-value.

OK, now, to get rid of those confusing log odds, let's calculate some probabilities. First we extract the coefficients. What we are doing here is taking the model and asking R to put the estimate in the first location into an object that we call intercept, and the estimate in the second location, as on the previous slide, into something that we call slope. If we then call intercept, it tells us that it is -3.64, and the slope is 16.11. Now we can compute the log odds values for a blood alcohol concentration of 0, so completely sober, and for a blood alcohol concentration level of .3, which is pretty drunk. We use our predictive equation, the intercept plus the slope multiplied by the value of X. In the first case X is 0, so the prediction is just the intercept; in the other case it is the intercept plus the slope multiplied by .3.
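In code, that extraction and prediction might look like the following sketch, reusing alcohol_model from above:

  # Pull the two estimates out of the tidied coefficients table.
  intercept <- tidy(alcohol_model)$estimate[1]   # about -3.64
  slope     <- tidy(alcohol_model)$estimate[2]   # about 16.11

  # Predicted log odds of a speech error, from intercept + slope * x.
  intercept + slope * 0     # BAC = 0  (sober):  -3.64
  intercept + slope * 0.3   # BAC = .3 (drunk):   1.19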
That gives us -3.64 and 1.19, the predicted log odds of a speech error for the corresponding blood alcohol concentrations. And from there we can calculate predicted probabilities. To get the predicted probabilities, rather than the log odds, of making a speech error, you apply the logistic function, plogis(), to these log odds: we tell R to apply the logistic function to the whole predictive equation, and we get a number. For sober people, remember, the value of X is zero, and the number we get is .025, so given this model you expect speech errors to occur, on average, about 2.5% of the time if people are completely sober. For drunk people, the predicted probability is .77, so given this model you expect speech errors to occur, on average, about 77% of the time.

Now, it is quite a lot to get your head around, but we will get some practice in the lab, and I hope this has given you some sort of introduction and helped you make some sense of all this logistic regression business. Thank you very much for your attention.
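For reference, that last conversion in code, picking up the intercept and slope objects from the sketch above:

  # From predicted log odds to predicted probabilities.
  plogis(intercept + slope * 0)     # sober: about .025, i.e. 2.5% of trials
  plogis(intercept + slope * 0.3)   # drunk: about .77, i.e. 77% of trials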