Hi, it's Margriet. Last week we looked at logistic regression, which is an example of a generalized linear model that you can use if your outcome variable is categorical. We specifically looked at logistic regression in the context of a binomial outcome variable, that is, a two-level variable such as correct versus incorrect, or looking to the left versus looking to the right. This week we will look at Poisson regression, which is another type of generalized linear model, one that is particularly useful for count data. OK, so what is count data? Many researchers in psychology, and behavioral scientists more generally, have theoretical questions that involve count variables. A count variable is a variable that takes on discrete values reflecting the number of occurrences of a particular event in a fixed period of time. So some sort of event might happen none of the time, once, twice, three times, four times, etc. A count variable can only take on positive integer values or zero, because an event cannot occur a negative number of times. There are some examples on the slide here: for instance, the number of depressive symptoms that a child exhibits, the number of alcoholic drinks consumed per day, the number of re-admissions to alcohol detoxification programs, the number of disciplinary incidents among a group of prison inmates, or, if you want a linguistic example, the number of fillers such as 'uh' and 'um' as a function of politeness context. The figure on the slide shows an artificial data set where speech error counts are related to blood alcohol concentration. In last week's lecture, these speech errors were treated in a binary fashion, so present or absent; here they are treated as a count variable. OK, so in this figure you can see the Poisson distribution for two representative parameter values. The height of the bars indicates the probability of particular counts. The Poisson distribution only really has one parameter, lambda, and that specifies the rate of the count process. So if lambda is high, then the rate of observing an event such as a speech error is high. You can see here that for lambda equal to 5 the rate is higher than for lambda equal to 2, where the average rate is smaller. Now, the Poisson distribution is bounded by zero because, as we already said, counts cannot be negative, right? So zero is the lowest value that you could ever get. The distribution is also discrete, so we only have non-negative integers: a count can be zero, one, or two, but you can't have 1.5 or anything in between. Another peculiar property of the Poisson distribution is that the variance, the spread of the distribution, is linked to lambda. It's kind of married to lambda, and this is very different from the normal distribution, where you have the standard deviation, sigma, as an independent parameter that needs to be estimated. So you can see in the figure that the distribution for lambda equal to 5, the dark grey bars here, is wider than the distribution for lambda equal to 2, which is more scrunched up, basically. For low rates the variance is low, because the distribution is bounded by zero and there is no way of extending beyond that boundary; for high rates the distribution can extend in both directions, towards the lower and the higher counts. You can think of this as heteroscedasticity.
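Here is a minimal sketch in R of what the figure shows; the lambda values of 2 and 5 come from the slide, and the small simulation at the end illustrates that the mean and the variance are tied together.

```r
# Poisson probabilities for the two rates shown on the slide:
counts <- 0:15
dpois(counts, lambda = 2)   # probability of each count when lambda = 2
dpois(counts, lambda = 5)   # probability of each count when lambda = 5

# The mean and the variance of a Poisson variable are both lambda:
x <- rpois(10000, lambda = 5)
mean(x)   # approximately 5
var(x)    # also approximately 5
```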
So that's unequal variance being built into the distribution. OK, so Poisson regression models the parameter lambda as a function of some predictors. The problem is that our familiar linear predictor equation, here, can predict any value, but lambda can only be positive. That means that we need a function that restricts the output of our linear predictor to positive values, and the function that achieves this is the exponential function. So in the figure on the slide, you see a linear relation between X and Y on the left, and when you transform Y with the exponential function, the values of Y become restricted to the positive range; that's on the right. The dashed line shows that when zero is exponentially transformed, it becomes one. So here we have our familiar linear predictor equation, and around that we wrap the exponential function. Wrapping the exponential function around the linear predictor, beta 0 plus beta 1 times X, will ensure that no negative values can be predicted. Now, you might remember from last week that the logarithm is the inverse of the exponential function. So if we take the logarithm of both sides of this equation, it becomes this: on the right side, the logarithm undoes the exponential, which is why you can't see it here anymore, but that means we also have to take the logarithm of lambda on the left side. So the familiar equation of beta 0 plus beta 1 times X will actually predict log lambda (there's a tiny code sketch of this exponential-and-log relationship at the end of this part). Similar to logistic regression, that means that we'll have to be careful in interpreting the output. Now let's look at an example. In this example, we'll look at data from somebody called Nettle, who looked at linguistic diversity, so the number of languages that exist in a particular country. Nettle's hypothesis was that countries with low ecological risk, so more fertile environments, are predicted to have higher linguistic diversity, so more languages. On this slide you can see data for five variables for 74 countries. Population and Area contain the log10 of the population size and the area of the country. Then we have MGS, or mean growing season, and that is the measure of ecological risk; that is our predictor. That variable indicates how many months per year one can grow crops in a country. Then we have Langs, and that is the number of languages within a country. Now, as always, it is a good idea to get a feel for the data set, for example by checking the range of values for the MGS variable, our predictor. If we do that using the range function, as you can see on the slide, it shows us that there are some countries in which one cannot grow crops at all, so zero months, and others where you can grow crops the entire year, so 12 months. The countries Guyana, the Solomon Islands, Suriname, and Vanuatu have mean growing seasons of 12, indicating minimal ecological risk. On the other hand, countries like Yemen and Oman have an MGS of zero, which means maximal ecological risk. You can already see that the number of languages in those countries is quite a bit lower, which suggests that Nettle's hypothesis might actually be correct.
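Here is that tiny sketch in R of the link-function idea; the numbers are arbitrary, chosen only to show the behaviour of exp and log.

```r
# The exponential function maps any real number to a positive value:
x <- seq(-3, 3, by = 1)
exp(x)       # all outputs are positive
exp(0)       # equals 1, matching the dashed line in the figure
log(exp(x))  # the logarithm undoes the exponential, recovering x
```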
OK, now to model linguistic diversity as a function of ecological risk with Poisson regression, we use the glm function, the generalized linear model function, the same one as we used last week for logistic regression. As was the case for logistic regression, the argument family needs to be specified; in this case it is poisson, not binomial. So here we have the estimates of our model. These represent logarithms, so we must use exponentiation to report the predicted mean rates. If we do that, here we have the predicted logarithm for each of the 13 values of MGS (so 0 months, 1 month, 2 months, 3 months, and so on, up to 12 months), computed from the intercept and the slope. So for each value of MGS we have a predicted value, and we need to take the exponential of those values to be able to say something about the rates; that's what those logarithms represent. Based on that, we expect a country with an MGS of zero months, so that's the first location, to have about 30 languages, and for a country with an MGS of 12 we would expect a value of 166 languages. In this figure you can see the predicted rates, so the predicted number of languages, as a function of MGS. We have the number of languages on the Y axis and MGS on the X axis, and instead of points, we have told ggplot to use the names of the countries. The thick line is the Poisson regression fit. OK, there's something else that we need to take into account here. In order for the regression of languages as a function of mean growing season to be meaningful, you need to control for country size, because larger countries tend to have a higher number of different languages. So in this case a country's area determines what in Poisson regression is called 'exposure': more area means more opportunities for observing high counts. You can adjust a rate by an exposure variable, which in this case is area, but in another model it could be something like time. For example, if you were to conduct an experiment where you're counting speech errors in trials with varying durations, there are naturally going to be higher counts for longer trials, so you would want to control for trial duration; trial duration would be the exposure variable in that context. Now, with an exposure variable the rate lambda is split into two components: the mean number of events, mu, per unit of exposure, tau. So lambda is mu divided by tau: we could have the mean number of languages per square mile, or, to take the speech error example, the number of speech errors per second. In R this is easy to do: you simply wrap the offset function, here, around the exposure variable of interest. Let's look at this here. We have the model that we used previously, right, the number of languages as a function of the mean growing season, using Poisson regression. What we're doing now, to take the exposure variable area into account, is add it to the model by saying offset(Area). Now, the first thing to notice is that there is no beta for the exposure variable in the output; the output, in terms of the terms present, is the same. The exposure term doesn't have a coefficient, and nothing is actually being estimated for it, right? But what you can see is that compared to the original model... So this is the original model, and this is the new model that takes the exposure variable into account.
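Putting those steps together, here is a minimal sketch in R. It assumes Nettle's data sit in a data frame called nettle with columns Langs, MGS, and Area; the object and column names, and the fact that Area is already log-transformed, are assumptions, not something shown on the slides.

```r
# Assumed: data frame `nettle` with columns Langs (language counts),
# MGS (mean growing season, 0-12), and Area (already log-transformed).

range(nettle$MGS)   # get a feel for the predictor: 0 to 12 months

# Poisson regression of language counts on mean growing season:
mdl <- glm(Langs ~ MGS, data = nettle, family = poisson)
coef(mdl)           # these estimates are on the log scale

# Predicted log rates for MGS = 0 through 12, exponentiated to rates:
log_rates <- predict(mdl, newdata = data.frame(MGS = 0:12))
exp(log_rates)      # predicted mean number of languages per country

# The same model with area as an exposure variable; the offset enters
# on the scale of the link function, hence a log-transformed Area:
mdl_exposure <- glm(Langs ~ MGS + offset(Area),
                    data = nettle, family = poisson)
```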
So compared to the original model, the slope of the MGS variable, here, has increased considerably, right? After controlling for country size, the relationship between ecological risk and linguistic diversity is estimated to be even stronger: here we have .21 rather than .14. OK, here's another thing that we need to talk about in the context of Poisson regression, and that is overdispersion. As we saw at the start of this lecture, the variance of the Poisson distribution scales with the mean: the higher the mean rate, the more variable the counts. Now, it is possible that in an actual data set the variance is larger than is theoretically expected for a given lambda. If that happens, you're dealing with what is called overdispersion, or excess variance. You can compensate for overdispersion by using a variant of Poisson regression that is called negative binomial regression. Negative binomial regression is essentially a variant of Poisson regression where the variance is kind of freed from the mean; in other words, the constraint that the mean is equal to the variance is relaxed. Other than that, everything stays the same. So let's re-fit the model that we used previously, the one with the exposure variable, but this time using negative binomial regression. For this we use the glm.nb function, nb for negative binomial. So here we have exactly the same model: before, we used glm to model the number of languages as a function of MGS, plus the offset function to take our exposure variable into account, and here we do exactly the same thing, except that we use glm.nb. The glm.nb function is from the MASS package; that's why we have to load that package here. Now, the first thing to notice is that the standard error of our estimate, so the standard error of the MGS slope, has increased quite a bit compared to the Poisson model. Here we had a standard error of .0047, and if we use negative binomial regression, we have a standard error of .034, so that's substantially larger than for the Poisson model. Negative binomial models are generally the more conservative option if there actually is overdispersion in the data. Now, you can test whether there is a significant degree of overdispersion using what is called an overdispersion test. One implementation of this in R is the odTest function, from the pscl package. What this function does is perform a likelihood ratio test comparing the likelihood of the negative binomial model against the likelihood of the corresponding Poisson model. If you run that on our data here, you see that in this case the difference in likelihood between the two models is significant: we have a p-value that is way smaller than .05. That indicates that you should use a negative binomial model rather than a Poisson model.
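In code, that sequence looks something like this minimal sketch, again assuming the hypothetical nettle data frame from above.

```r
library(MASS)   # provides glm.nb() for negative binomial regression
library(pscl)   # provides odTest() for the overdispersion test

# Same model as before, but negative binomial instead of Poisson:
mdl_nb <- glm.nb(Langs ~ MGS + offset(Area), data = nettle)
summary(mdl_nb)   # note the larger standard error for the MGS slope

# Likelihood ratio test of the negative binomial model against the
# corresponding Poisson model; a small p-value indicates overdispersion:
odTest(mdl_nb)
```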
OK, now, this figure summarizes the different aspects of the generalized linear model framework that we have been speaking about last week and this week, and before, actually. There is one bit that you haven't been exposed to so far, and that is the I function wrapped around the predictor of linear regression. This bit should look familiar by now, I'm sure it does: beta 0 plus beta 1 times X, that is our linear predictor. But here, for linear regression, all of a sudden we have this I function wrapped around it. What is it? This is called the identity function, and that is just a mathematical term for a function that preserves identity. In other words, this function does nothing: it can be translated as 'take this predictor as is, without transformations'. Now you might ask why on earth you would add it if it does nothing. The reason for adding it is that, this way, the figure shows the parallelism between the different types of models: you can see how all of the different models that we've looked at are really versions of the same thing. It allows you to see that linear regression is actually a specific case of the generalized linear model, namely a generalized linear model where the output of the predictive equation is not transformed, whereas for logistic regression it is transformed, and for Poisson regression it is transformed in a different way (there's a small code sketch of this parallelism at the very end of this lecture). Now, I'm showing you this to highlight that each generalized linear model has three components. We have a distribution for the data-generating process: the normal distribution, the Bernoulli distribution, or the Poisson distribution. We have a predictive equation, or what is called the linear predictor, here, and that is the same in each case. And then we have a link function, which links the linear predictor to the parameter of interest. So that function (so here, and here, and here) makes sure that the linear predictor predicts sensible values for each parameter, right? The link function makes sure that for logistic regression the predicted values of p, the probability, fall between zero and one, and in the case of Poisson regression, the link function makes sure that we only get positive values for lambda. Now, maybe somewhat confusingly, these link functions are named after their inverses, right? So logistic regression uses the logit, or log odds, link, and Poisson regression uses the log link. And it is because of these link functions that you need to do these transformations: logistic regression returns log odds predictions, and you need the logistic function to transform those into probabilities; Poisson regression returns log predictions, and you need the exponential function to transform those into rates. OK, in summary: today we have spoken about how to model count data with Poisson regression and its extension, negative binomial regression. Coefficients of a Poisson model are shown as log coefficients, which means that after calculating the log predictions, you need to use exponentiation to interpret your model in terms of average rates. To control for differential exposure, you can add exposure variables to the model, and negative binomial regression we use to account for something called overdispersion. Finally, we talked about the fact that each generalized linear model has three components: a distribution for the data-generating process, a linear predictor, and a link function. That is it for now. Thank you very much for your attention.
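And here is that small sketch of the parallelism mentioned above; nothing in it comes from the slides, and the simulated placeholder data are there only so the three glm calls can run.

```r
# Placeholder data, only so the three model calls are runnable:
set.seed(1)
x       <- rnorm(100)
y_cont  <- 2 + 0.5 * x + rnorm(100)          # continuous outcome
y_bin   <- rbinom(100, 1, plogis(0.5 * x))   # binary outcome
y_count <- rpois(100, exp(0.5 * x))          # count outcome

# The same linear predictor with three different families and links:
m_linear   <- glm(y_cont  ~ x, family = gaussian)  # identity link
m_logistic <- glm(y_bin   ~ x, family = binomial)  # logit link
m_poisson  <- glm(y_count ~ x, family = poisson)   # log link
```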