Hi, it's Margriet here. In this video we will talk about multiple regression. First we'll look at what the regression looks like when you have multiple predictors instead of just a single predictor, and we'll apply that to an example. Then there are three topics that we need to get through around multiple regression: standardized coefficients, the assumptions that are associated with multiple regression, and the adjusted R-squared. Now, from the previous video you might remember that there are several different types of linear regression, and we already looked at simple linear regression, where you have just one predictor in your model. So today we will talk about multiple linear regression, which involves models with multiple predictors. This is what the regression line looked like for the scenario where you had one predictor: you have the dependent variable Y, and the line needs an intercept and a slope, so the intercept is the beta zero and the slope is determined by the weight associated with your predictor. So this is the mathematical description of a line, and this is the scenario for simple linear regression. Now, what does that look like in the context of multiple regression? It has the same form: the dependent variable is again represented by Y, and then we have multiple predictors. So we have an X1, as in simple linear regression, but we also have an X2 and an X3, and you could have a whole range of Xs, represented by Xn. Each of these predictors has its own beta or weight. Of course you also have an intercept, and you have an error term.

Now let's look at an example. This example comes from the Pirate's Guide to R. Pirates like diamonds, and so this is an example using the diamonds data set that comes with it. In that data set there is information about various characteristics of a whole range of diamonds: we have weight, we have clarity, we have color and various other things. If we put those predictors into our regression line, we get the following. Let's say that we want to try to predict the value of a diamond based on its weight, its clarity and its color. So here you can see the beta of weight multiplied by weight; that's the first predictor. Our second predictor, X2, is clarity, and our X3 is color. Of course, we again have the intercept, and this is what we use to predict the value of the diamond, and there will be an error term as well, as is always the case. When we then want to build our model in R, we use the lm() function, for linear model, and the formula that that function takes mirrors the regression line formula. Again, it has the form of Y as a function of X1 plus X2 plus all the other predictors that you want to add, where Y is the dependent variable or outcome variable, and X1, X2 and so on are the independent variables or predictor variables; those are just different names for the same type of thing. And of course, you need to tell R what data you're using. So if we use our diamonds example, we could say OK, let's build a linear model for the diamonds; that is just the name of the object we're assigning the model to. Then we have lm for the function, our formula is value as a function of weight plus clarity plus color, and then we tell it that the data is this diamonds data set.
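In code, the model just described would look roughly like the sketch below. It assumes the diamonds data frame that ships with the yarrr package (the package accompanying the Pirate's Guide to R), with columns value, weight, clarity and color; the object name diamonds.lm is just a convenient label.

```r
# A minimal sketch, assuming the diamonds data frame from the yarrr package
# with columns value, weight, clarity and color; diamonds.lm is just a label
library(yarrr)

# value modelled as a function of weight, clarity and color
diamonds.lm <- lm(value ~ weight + clarity + color, data = diamonds)

summary(diamonds.lm)   # coefficients, R-squared, F statistic and p values
```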
So this is how to read this kind of formula notation in R: value is being modelled as a function of weight, clarity and color. So Y is modelled as a function of that whole thing. It maybe looks a little bit weird when you see it the first few times, but when you read it out loud like that, it helps you make sense of it, and it is important to remember that the way it is coded in R mirrors the regression line formula. When you look at the output, you actually get that whole formula again, just to remind you what you've actually modelled. Now, what are the other important parts of the output? Here at the bottom we have our F statistic, which tells us whether the overall model is significant or not; we have the F statistic and the p value next to it. If that p value is smaller than .05, it tells us that all the predictors together in the model significantly predict Y: the model overall is significant. It also reports the R-squared that we talked about in the previous video on simple linear regression, but in addition it reports the adjusted R-squared, and as mentioned, I'll talk about that in a little while. And then of course there are the coefficients, and they tell you whether or not each predictor (here we have our weight predictor, our clarity predictor and our color predictor), each one by itself, while keeping all the other ones constant, contributes significantly to the model. So these values here on the right tell you whether each individual predictor is significant or not. In this case weight is significant, and this is the exact p value, but there are also three stars next to it to indicate that it is significant at the p smaller than .001 level. So weight is significant and clarity is significant; the first line here, that is your intercept.

OK, so much about output. Now over to standardized coefficients. It is important to keep the metric of each variable, each predictor, in mind when performing multiple regression. You need to ask yourself: what does a one unit change mean for each of the different predictors? Let's look at an example. We will use the iconicity study by Winter et al. (2017) for that. The concept of iconicity describes whether the form of words resembles their meaning; onomatopoeic words such as boom, beeping or buzzing are examples of words with high iconicity. In their study they measured iconicity via a rating scale, asking listeners how much a word sounds like what it means. They modelled iconicity as a function of several linguistic variables: one of them was sensory experience, another word frequency, another imageability, and finally systematicity. Don't worry about what each of these things means; you can look at the paper to learn more about them if you're interested, but let's use it here as an example of why you need to think about standardized coefficients when you do multiple regression. So here we have our regression formula: they modelled iconicity as a function of sensory experience, plus imageability (second predictor), plus systematicity (third predictor), plus word frequency (fourth predictor). They built a model and these are the various coefficients for it. Now, let's look at this value.
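If you wanted to fit a model of that shape yourself, the call would look roughly like the sketch below. The data frame icon and its column names are hypothetical placeholders, not the actual variable names used by Winter et al. (2017).

```r
# Hypothetical sketch of an iconicity-style model; the data frame 'icon' and
# its column names are placeholders, not the actual names from the study
icon.lm <- lm(iconicity ~ sensory_experience + imageability +
                systematicity + word_frequency,
              data = icon)

summary(icon.lm)   # per-predictor coefficients plus the overall F test
```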
So the larger the coefficient, usually the more important that predictor is in the model. If you just look at the different weights here, you'll see that systematicity is way bigger than the others, so you might conclude from that that systematicity is a really important predictor for iconicity. But then let's go back to the question of what a one unit change means for that predictor: what does a one unit change in systematicity mean in the context of the model? If we look at the values that are actually associated with systematicity, we identify a problem. If you look at the minimum value (you know, do your descriptives: look at the mean, the standard deviation, and also the minimum and the maximum), the minimum value of systematicity in this data set was this, and you need to go to a lot of decimals to actually find a number. So it is negative but very close to zero, and similarly the maximum value was also very close to zero. The variation in systematicity basically ran from just below 0 to just above 0, so the units were tiny, and if you then go up one unit, it's a huge jump on the Y axis basically. That is what happens when you use the unstandardized coefficients. If instead the numbers that you feed into the model have been standardized (if you can't remember what that means, please watch the video on centering and standardizing) and you run the model again, then you get this outcome here: the standardized version. These are standardized coefficients, and you can see that now that the metric is expressed as a standard unit across the different predictors, they are much more comparable: now we're talking about .5 for sensory experience, minus .4 for imageability, about 0 for systematicity, and minus .3 for word frequency. So now you might actually conclude that sensory experience is the most important predictor for iconicity. That is an example to illustrate why it is important to use standardized values, so that you have standardized coefficients in your model.

OK, over to assumptions. As you know, statistical models rely on assumptions, and that is the case for multiple regression too. All the claims made on the basis of a model are contingent on satisfying its assumptions to some reasonable degree, and as mentioned in a previous lecture, for regression the assumptions are actually about the error term, that is, they are related to the residuals of the model. If the model satisfies the normality assumption, its residuals are approximately normally distributed; you can see that here (in the blue rectangle). If the model satisfies the constant variance assumption, the spread of the residuals should be about equal while moving along the regression line; this is also known as homoscedasticity. We talked about those two in the context of simple linear regression as well. Now, when it comes to multiple linear regression, there's a third assumption that's important to look at, and that is collinearity. But first, let's just review how we check for normality and homoscedasticity of the residuals. It is generally recommended to assess both normality and homoscedasticity visually. That is, to assess whether the residuals are normally distributed, you draw a histogram; you can see that here in the blue box on the slide, on the left.
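Before moving on to the residual checks, here is a rough sketch of the standardization step just described, using the same hypothetical variable names as before. scale() centres each variable and divides it by its standard deviation, so the refitted slopes are in standard units.

```r
# A sketch of refitting the hypothetical iconicity model with standardized
# (z-scored) predictors, so the coefficients become comparable across predictors
icon$sensory_z <- scale(icon$sensory_experience)
icon$imag_z    <- scale(icon$imageability)
icon$syst_z    <- scale(icon$systematicity)
icon$freq_z    <- scale(icon$word_frequency)

icon.lm.z <- lm(iconicity ~ sensory_z + imag_z + syst_z + freq_z, data = icon)

summary(icon.lm.z)   # slopes are now per standard deviation of each predictor
```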
And so in the graph on the left, the residuals for the iconicity model have been plotted, and they look pretty good: note that the shape is roughly bell-curve shaped. It is actually easier to graphically explore the normality assumption via a quantile-quantile plot, or Q-Q plot. You see an example here in the middle of the slide: this is a Q-Q plot. When the sample quantiles assemble around a straight line, like there, you know that the residuals conform to the normal distribution. The function that I usually use in R to look at that, the qqPlot() function (with a capital P), also gives you dashed lines around that line as a confidence interval, and as long as the individual data points fall within that range, in between those dashed lines, you know that you're fine: the residuals are normally distributed. Now, according to the constant variance assumption, the error should be equal across the fitted values, and that you can investigate in a residual plot; see the green box here, on the right. This plots the residuals on the Y axis and the fitted values on the X axis. If the constant variance assumption is satisfied, the spread of the residuals should be approximately equal across the range of fitted values: if you move your eyes from one side to the other, the spread should stay about the same, so the residual plot should basically look like a big block. That is the case for this iconicity model. Maybe the residuals funnel out a little bit towards the higher fitted values, but there is really no drastic violation of the constant variance assumption. Now, you might find it a bit disconcerting that the assumptions are assessed visually. There are other options, such as the Shapiro-Wilk test of normality, and we will look at some of those in the labs as well, but it is really important to use graphs too, because they tell you more about your model and about your data. For example, the residuals may reveal a hidden non-linearity in the data, or they might reveal some extreme values that need looking at. Here we have some examples of what 'good' residual plots look like: mostly they look like clouds of random dots, and even though there are some apparent patterns, those just result from a chance process. Now let's compare that to some 'bad' residual plots over here. You can clearly see that these plots show non-constant variance; in other words, they violate that assumption. These are all examples of heteroscedastic rather than homoscedastic data: as you move along the X axis, along the fitted values from left to right, the residuals progressively fan out. So this is one type of pattern that might occur.

OK, now I already mentioned that when doing multiple regression, you also need to check for collinearity. Collinearity describes a situation where one predictor can be predicted by the other predictors; it arises from highly correlated predictors, and it makes regression models harder to interpret. On the right we have an example of no collinearity. In the middle we have the circle that represents Y (our dependent or outcome variable), and around it we have each of our individual predictors. As you can see in this figure, they each describe part of the variation in Y, but they themselves don't overlap, so this is a scenario where there is no collinearity.
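As a quick code aside before we look more closely at collinearity: the three residual checks just described (histogram, Q-Q plot, residual plot) could be produced roughly like this, again assuming the hypothetical icon.lm model object; qqPlot() here is assumed to be the one from the car package.

```r
# A sketch of the three residual checks, assuming the hypothetical model icon.lm
library(car)                            # assumed source of qqPlot()

res <- residuals(icon.lm)
fit <- fitted(icon.lm)

hist(res, breaks = 20)                  # normality: should look bell-shaped
qqPlot(res)                             # points should stay inside the dashed band
plot(fit, res)                          # constant variance: a 'block' of dots
abline(h = 0, lty = 2)                  # reference line at zero residual
```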
This is a scenario where you can see that some of the variance that is described by X3 (the blue bit) overlaps with variance that is described by X2, our second predictor; it's that bit (the purple area) that is described by both. And similarly this orange area here reflects the variance that is shared between the Y variable, X1 and X2. That is a kind of conceptual representation of moderate collinearity. And here you have an example of extreme collinearity: you can see that X1 and X2 almost completely overlap in the extent to which they describe the variance in Y.

Now we can demonstrate collinearity using some simulated data. First, we've created the X variable and just told R: please give me 50 random numbers. We then put that into a regression formula and say OK, in this simulated, made-up data set, the regression line is going to be described by this: we have an intercept of 10, we have our X of random numbers with a weight of 3, and we add an error term of another 50 random numbers. Now, why do we use simulated data? Because it is sometimes really helpful to know exactly what should be in the data and then see what the model pulls out when you fit a model to it; because you know what should be in there, you can demonstrate some things quite nicely. OK, so that is our initial setup, and then we create a second variable, X2. We basically tell R: X2 is the same as X, but we just change one value. That creates something like what you see on the right, where X and X2 are extremely similar, and indeed, if you look at the correlation between the two, it is very, very high (.98). OK, so that's what we made up, and now we're going to fit a model to it and see what happens. Here we have a model with just the X predictor, and you can see that it tells us indeed that the intercept is about 10 and the weight, the beta for X, is 2.8. Now, we know that it should be 3, so this is pretty close. You might think, 'why is it not exactly 3?' That is because we've added the error term, so it can't estimate it perfectly precisely, but overall it's pretty accurate. So that's the model with one predictor. If we then use X2 to predict Y (again just one predictor in the model, but now X2 instead of X), we see that it predicts Y pretty well too, and given how we have set these things up, it comes as no surprise that X2 also predicts Y, just as was the case with X. OK, now it gets interesting with regard to collinearity. Let's see what happens when we enter both X and X2 together into the same model. This is what you get: we have our intercept, which is still about 10, close to what it should be. But then look at what happens to the slope for X2. You can see that it has changed dramatically: when it's the only predictor in the model it's 2.7, and now it's minus .43. It's negative, even though the data have been set up so that X2 and Y are positively correlated! So that is an example of what can happen to coefficients when you're dealing with strong collinearity in a model: the coefficients can change dramatically depending on which predictors are in the model. So to assess whether you have to worry about collinearity in your analysis, you can use what are called variance inflation factors.
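The simulation just described could be reproduced roughly like this; the exact estimates will differ a bit from the ones on the slides because the data are randomly generated (the seed is only there to make the sketch reproducible).

```r
# A sketch of the collinearity demonstration; exact numbers will vary by run
set.seed(1)                      # only so the sketch is reproducible

x  <- rnorm(50)                  # 50 random numbers for the predictor
y  <- 10 + 3 * x + rnorm(50)     # intercept 10, slope 3, plus random error

x2 <- x                          # X2 starts out identical to X...
x2[1] <- x2[1] + 1               # ...and we change just one value
cor(x, x2)                       # correlation is extremely high

summary(lm(y ~ x))               # slope estimate close to 3
summary(lm(y ~ x2))              # also close to 3
summary(lm(y ~ x + x2))          # together: the coefficients become unstable
```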
And they measure the degree to which one predictor can be accounted for by the other predictors. The vif() function from the car package in R can be used to compute variance inflation factors. There are various ideas about how big a VIF value has to be before it is deemed problematic: Zuur et al. are a little bit more conservative and say that any VIF value over three or four should be looked at; others suggest that you only need to worry about anything over 10. I tend to go with that more often. So for the iconicity model, all VIF values are close to 1, which is good. But this is the situation for the model of the simulated data that we looked at: if you have both X and X2 in the model, the VIF values go through the roof, basically, correctly identifying strong collinearity. And here are the VIF values for the iconicity model: you can see they're all pretty close to one, so that's all fine.

OK, so what do you do if you find that you are dealing with a situation with strong collinearity? You've got several options. The first one is that you remove one of the predictors with a high VIF, and you need to use your understanding, your expertise, your knowledge of the subject to decide, and also to justify, which one you remove. A second solution can be to collect more data, as that will basically allow you to estimate the regression coefficients more precisely, which should then help decrease the impact of the collinearity. You can also use another approach than regression; random forests are an example of that. Or, first do a principal component analysis to combine the predictor variables before you do the regression: principal component analysis identifies how strongly correlated predictors can be combined in an appropriate way, and then gives you one new variable that you can use in your regression analysis. It's also really important to think about this issue at the planning stage of your study and to make theoretically motivated choices as to which of possibly correlated measures you want to include.

OK, finally, we need to talk about R-squared and adjusted R-squared. Using the glance() function from the broom package: if you run that on your model, it shows the model summary output, and in this case, if you do that for the iconicity model, the adjusted R-squared here and the R-squared here are very close together, which suggests there is no problem with overfitting. So what adjusted R-squared does is that, like R-squared, it measures how much of the variance in the outcome variable is described by all the predictors in the model together; that's R-squared. Adjusted R-squared additionally takes the number of predictors in the model into account, because with R-squared, the more variables you add, the better it gets: the more variance it describes. But that is kind of cheating, in the sense that if you've got more and more predictor variables, of course you will describe more variance in the outcome variable, but you might get to a situation where the model is highly specific to that particular data set rather than more generic and generalizable to other data sets as well.
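Before wrapping up, here is roughly what the VIF check and the glance() summary look like in code, again using the hypothetical model objects from the earlier sketches.

```r
# A sketch of the collinearity and model-summary checks, assuming the
# hypothetical model objects from the earlier sketches
library(car)     # for vif()
library(broom)   # for glance()

vif(lm(y ~ x + x2))   # very large values flag the strong collinearity
vif(icon.lm)          # values near 1 mean no collinearity problem

glance(icon.lm)       # one-row summary incl. r.squared and adj.r.squared
```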
So adjusted R-squared takes the number of predictors in the model into account, and in the case of multiple regression you should always report the adjusted R-squared. As you can see here for the iconicity model, the adjusted R-squared and the R-squared are very similar, which suggests that there is an appropriate number of variables in the model and you haven't been overfitting your data with the model. OK, in summary: we spoke about the regression line with multiple predictors, and then about why you need to think about standardized coefficients; they make the predictors more comparable by converting them to standard units, so they really help with interpreting the coefficients. Then we spoke about the assumptions, so normality and homoscedasticity of the residuals, and in addition, in multiple regression, the need to check for collinearity. And finally we talked about R-squared and adjusted R-squared as measures of the overall variance that's explained by the model. Thank you very much.