Hi, it's Margriet here. So up to this point, we looked at predictors that were continuous, but what if you wanted to know whether the response differed between two or more discrete groups? For instance, you may want to test whether well-being scores differ by biological sex, so female versus male, or whether phobia symptoms differ by psychological treatment group. Sex and treatment group, in these cases, are categorical variables. So in this video we will look at how we can model responses as a function of such categorical predictors.

Now let's first look at an example. In this paper the author looked at the emotional valence of perceptual adjectives. Specifically, it has been suggested that smell words are overall more negative, especially when compared to taste words. You can assess this by looking at the context in which these words occur. For example, the taste word sweet often appears together with pleasant nouns such as aroma, music and smile, while the smell word rancid often occurs with nouns such as grease or sweat. So the average noun-context valence is more positive for sweet than for rancid. We'll look at modelling a categorical predictor variable by building a linear model to describe the relationship between context valence and sensory modality, so taste versus smell.

Now regression works well with sets of numbers, but how do we incorporate categorical predictors? In the middle we have the graph from the previous slide, with the box plots for smell words and taste words. On the left, this is plotted slightly differently, in that you see the individual values for the words: here we have the smell words and the taste words, and the individual context valence scores for these two groups of words. Now, to incorporate the categorical predictor modality into a regression model, the labels smell and taste need to be replaced with numerical identifiers. This process is called dummy coding, and geometrically, assigning numbers to categories means that the categories are placed within a coordinate system. So in the plot in the middle, smell words are located at x = 0 and taste words are located at x = 1. This coding scheme is called treatment coding or dummy coding. The overarching term used for the various coding schemes is contrasts, and we'll come across a few different ones in this video.

Now, within the treatment coding scheme, the category at x = 0 is called the reference level and it assumes the role of the intercept of the regression model. So in this example, smell words are the reference level. When you fit a regression line onto a variable with two categories, the line has to go through the means of both categories. But although there is a line, it is important to remember that the line can only be interpreted at the discrete points x = 0 and x = 1. Any predictions generated for the intermediate values do not make any sense in the case of a categorical predictor. Now you may remember that the slope of the line indicates how much you go up on the y axis when you move one unit along the x axis. So if you move from x = 0 to x = 1, you move from the mean valence for smell words to the mean valence for taste words. In other words, for categorical predictors, the regression slope actually shows the difference between the groups. So let's plug the values for our example with smell and taste words into the formula.
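Before we plug the numbers in, here is a rough sketch of what fitting such a model could look like in R. The data frame, its column names and its values are made up purely for illustration (they are not the data from the paper); the point is just that R applies treatment (dummy) coding to a categorical predictor by default, with the alphabetically first level as the reference.

```r
# A minimal sketch with made-up data (not the values from the paper):
# average noun-context valence scores for two hypothetical groups of words.
set.seed(1)
senses <- data.frame(
  Modality = rep(c("smell", "taste"), each = 20),
  Val      = c(rnorm(20, mean = 5.5, sd = 0.4),   # smell words
               rnorm(20, mean = 5.8, sd = 0.4))   # taste words
)

# R applies treatment (dummy) coding to a categorical predictor by default:
# "smell" comes first alphabetically, so it sits at x = 0 as the reference level.
m_treat <- lm(Val ~ Modality, data = senses)
coef(m_treat)
# (Intercept)   -> the mean valence of the smell words (the reference level)
# Modalitytaste -> the difference between the taste mean and the smell mean
```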
So the mean emotional context valence for smell words is 5.5, and the regression has estimated that the slope of the line is 0.3. For the smell words, remember, smell is at x = 0, so we evaluate the formula at zero: 5.5 + 0.3 × 0 = 5.5, which tells us that for smell words the mean is 5.5. For taste words, which sit at x = 1, we do the same thing again: we take the intercept plus the slope (0.3) multiplied by 1, because taste is at x = 1. That gives 5.5 + 0.3 × 1 = 5.8, the mean value for taste words. So that is how to translate the coefficients.

Now another commonly used coding scheme is called sum coding, also referred to as deviation coding. You might ask, why would you want to use one coding scheme over another? We will see in another video that when you have interactions in your regression model, sum coding or deviation coding makes interpreting the interaction easier. So it's good to have some flexibility as to which coding scheme you apply, depending on your model and on your hypothesis. How does sum coding work? When converting a categorical predictor into sum codes, one category is assigned the value minus one and the other is assigned plus one. With this coding scheme, the intercept sits in the middle of the two categories, at x = 0, which is the conceptual analogue of centering for categorical predictors. So the y value of the intercept is now the mean of the two category means; in other words, the intercept lies halfway between the two categories. If we put this into our formula, we see that, bar differences due to rounding, the predictions are pretty much the same as those for the treatment model. The model estimates that the intercept is 5.6, which sits in between the two group means, and that the slope is 0.17. For the smell words, which sit at minus one, we multiply the slope by minus one (remember, a positive number times a negative number becomes negative), so we have 5.6 − 0.17 = 5.43, and that is where the mean for smell words sits. For taste words, which sit at plus one, we get 5.6 + 0.17 = 5.77, and that is the mean for the taste words.

The examples so far dealt with a binary categorical predictor, so one with two categories. But what if your predictor variable has more than two levels? We looked at the taste and smell words, but the data actually contains words for all five senses: sight, touch, sound, taste and smell. So how would you create contrasts for a five-level predictor? If we fit a regression model with a five-level predictor variable, the regression output shows four slopes plus the intercept. You can determine the reference level from the output: it is whatever category is not shown as one of the slopes. Here we have smell, sound, touch and taste, so sight is missing, which means that in this case the reference level is sight. The R default is that the level that comes first in the alphabet is the reference level, and sight comes first in the alphabet compared to all the other levels.
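As an illustration of that five-level situation, here is what the fitted model and its coefficient names could look like in R. Again the data are invented: only the five modality names come from the example, everything else is made up.

```r
# Hypothetical five-level version of the same kind of data (values made up).
set.seed(2)
modalities <- c("sight", "smell", "sound", "taste", "touch")
senses5 <- data.frame(
  Modality = rep(modalities, each = 20),
  Val      = rnorm(100, mean = 5.5, sd = 0.4)
)

# With default treatment coding, the alphabetically first level ("sight")
# becomes the reference level and is absorbed into the intercept.
m5 <- lm(Val ~ Modality, data = senses5)
names(coef(m5))
# "(Intercept)" "Modalitysmell" "Modalitysound" "Modalitytaste" "Modalitytouch"
# Each Modality* slope is the difference between that level and "sight".
```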
So sight words are basically hidden in the intercept, and the first slope, ModalitySmell, is the difference between sight and smell words. Similarly, ModalitySound is the difference between sight and sound words. These values are negative, which means that smell words have a more negative emotional valence than sight words, and similarly, sound words have a more negative emotional valence than sight words.

At the top you can see the matrix that specifies the two-level situation we looked at earlier; this is how R specifies the treatment coding scheme in the case of a two-level predictor. The first level is mapped onto 0 and the second level is mapped onto 1, and the column is named "2" because the dummy variable is named after the second category. That is how to read that matrix. At the bottom you see the matrix that specifies the five-level situation. Again we have the first, second, third, fourth and fifth category, which in this case would be sight, smell, sound, taste and touch. As you can see, sight is the reference level, and then comes smell, which sits at one in the first contrast column; that column is named after the second level, which is why the corresponding slope is called ModalitySmell, and so on for the other levels.

Now there are various other coding schemes that can be useful in certain situations. One example is Helmert coding, which compares each level of a variable with the mean of the preceding levels of that variable. It sometimes comes in useful when you have an ordered categorical predictor in your model. For instance, education level is an ordered categorical predictor, because having completed a PhD is considered the highest level of education, which is higher than having completed a Masters degree, which is higher than having completed a bachelors degree, which is higher than having completed A-levels, and so on. So it is an ordered categorical variable, and in that case a Helmert coding scheme can be useful. In the corresponding regression output, the first slope indicates the difference between levels one and two, the second slope indicates the difference between the mean of levels one and two and level three, and the third slope indicates the difference between the mean of levels one, two and three and level four. So each consecutive level is compared to the mean of all the previous levels in an ordered sequence. Don't worry too much about it; just be aware that there are different coding schemes around, and you can use them depending on what's best for your particular model.

So in summary, we looked at different ways to incorporate categorical predictors into a regression model, and if you do that, you have to specify what are called contrasts. If you have a two-level categorical predictor, you can use treatment or dummy coding, which is the default in R, or you can use sum or deviation coding. If you have more than two levels, you can use either of those as well, or one of many other schemes, one of which is Helmert coding. It's important to remember that the reference level is hidden in the intercept, and the default in R is that whatever level comes first in the alphabet is used as the reference level.
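To round this off, here is a small R sketch that makes the coding schemes above concrete: the contrast matrices themselves, plus how you could attach a scheme to a factor or change the reference level. The five-level modality factor is purely a hypothetical illustration.

```r
# The contrast matrices themselves (contr.* are base R functions):
contr.treatment(2)   # two-level treatment/dummy coding: reference = 0, other level = 1
contr.treatment(5)   # five-level case: four dummy columns, reference row is all zeros
contr.sum(2)         # sum / deviation coding: R puts +1 on the first level and -1 on
                     # the second, so signs may be flipped relative to the slides
contr.helmert(4)     # each level compared with the mean of the preceding levels

# A hypothetical five-level modality factor; levels sort alphabetically,
# so "sight" would be the default reference level under treatment coding.
modality <- factor(rep(c("sight", "smell", "sound", "taste", "touch"), times = 4))

# Changing the reference level while keeping treatment coding:
modality <- relevel(modality, ref = "taste")

# Or switching the factor to a different scheme altogether, e.g. Helmert:
contrasts(modality) <- contr.helmert(5)
```

In the same way, assigning contr.sum(2) to a two-level factor gives the sum-coded model from earlier, with the intercept at the midpoint of the two group means.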
You can actually change that default in R, as in the relevel sketch above, but keep in mind that alphabetical order is the default. Now, as you can imagine, to be able to interpret your coefficients it's really important that the reader knows which coding scheme you have used. So when you write up a regression analysis that includes categorical predictors and you have used a particular coding scheme, tell the reader explicitly what coding scheme that is. OK, that's all for now. Thank you very much for your attention.