4. Customisation of graphs, and z-scores

Written by Tom Beesley (2025)

Week 4 Lecture

Watch Lecture Part 1

Watch Lecture Part 2

Watch Lecture Part 3

Watch Lecture Part 4

Download the lecture slides here

Pre-lab work

  • Complete the materials from the previous week's sessions and consolidate what we have already covered.

  • This week there’s a new learnr tutorial to work through, which will help you prepare for what we are covering. You can find it here.

Make sure you download the zip file for the RStudio tasks.

Create a new folder and upload the file into RStudio, ideally before the lab class.

RStudio tasks

Last week we introduced two different ways to get descriptive information about a numerical variable (column) as a function of a categorical variable. We discussed how this was extremely powerful, as we can look at the effect of an IV on the DV, which gets us a long way in the analysis of our psychological experiments!

In the labs, students showed great skill in using both of these tools:

aggregate(x = DV, by = list(IV), FUN = mean)

and

dataframe %>% group_by(IV) %>% summarise(label_header = mean(DV))
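As a reminder of how the two commands compare, here is a minimal sketch run on a small made-up data frame. The object name (example_data), the column names (iv, dv) and the values are all invented for illustration; they are not from the lab files.

```r
library(tidyverse)

# Hypothetical data: names and values are invented for illustration
example_data <- tibble(
  iv = c("control", "control", "control", "treatment", "treatment", "treatment"),
  dv = c(4, 5, 6, 7, 8, 9)
)

# Base R: mean of the DV for each level of the IV
means_base <- aggregate(x = example_data$dv, by = list(example_data$iv), FUN = mean)

# tidyverse: the same summary, with a column header of our choosing
means_tidy <- example_data %>%
  group_by(iv) %>%
  summarise(mean_dv = mean(dv))

means_base
means_tidy
```

Both give one mean per group. The aggregate() result uses default column names (Group.1 and x), whereas summarise() lets you choose the header yourself.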

This week we’ll first look at some more features of our powerful graphing tool, ggplot().

Customisation of data plots

Step 1. Set up a folder for Week 4 and set the working directory.

Step 2. Bring the week_4_2025.zip file into RStudio Server. As you did last week, upload the zip file; this will give you 3 files. Then open the week_4 R script.

If you’ve done Steps 1 and 2 already as lab preparation, super! Pat yourself on the back, skip these steps, and move on…

Step 3. Once again, we’re going to be using several commands from the tidyverse library (the pipe operator is one example), so we need to ensure that it’s active. Run the command

library(tidyverse)

Step 4. Read in the “wages.csv” data. There’s already a read_csv() script line for this; you just need to change the file name and the object name (like we’ve done in previous weeks).

Step 5. Draw a graph of the screen time data, with the phone type on the x axis and the usage data on the y axis. You did this last week, so it should be straightforward.

Step 6. Customise your graph. We’ve provided some suggestions about adding titles and labels for your graph. Edit and play with the script lines to make them useful to you and to understand how they work. Note how we are using the piping command, %>%, to send the data into the ggplot() commands. When we build up the graph in layers, we use + to add each new layer (we’ll learn more about this in future weeks).

  • Try to change the text, the colours, and so on of the graphs.
  • Add comments for yourself about what the different commands do. The idea is to learn by trying different things out (changing values, taking out elements of the command, putting others in) and to record what you find.
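The kind of customisation described in Step 6 might look like the sketch below. It uses a small made-up data set, so the object name, column names (phone_type, usage), values, titles and colours are all assumptions for illustration; the script in the zip file may use different ones.

```r
library(tidyverse)

# Hypothetical data: column names and values are invented, not taken from the lab files
example_data <- tibble(
  phone_type = rep(c("Android", "iPhone"), each = 5),
  usage      = c(2.1, 3.4, 2.8, 4.0, 3.1, 3.9, 4.4, 2.7, 5.0, 3.6)
)

custom_plot <- example_data %>%                            # pipe the data into ggplot()
  ggplot(aes(x = phone_type, y = usage)) +
  geom_violin(fill = "lightblue", colour = "darkblue") +   # fill and outline colours
  labs(
    title = "Daily phone usage by phone type",             # main title
    x = "Phone type",                                      # x axis label
    y = "Usage (hours per day)"                            # y axis label
  ) +
  theme_classic()                                          # a plainer background theme

custom_plot
```

Try changing one value at a time (a colour name, a label, the theme) and re-running the lines to see what each part controls.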

Step 7. Let’s add a boxplot over the top of our violin plots. The code for a boxplot is geom_boxplot(). Remember that you’ll need to add a + to the end of the previous line in order to add this new layer.
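A minimal sketch of layering a boxplot over a violin plot is shown here, again with made-up data. The column names and the width value are assumptions for illustration, not settings taken from the lab script.

```r
library(tidyverse)

# Hypothetical data: column names and values are invented for illustration
example_data <- tibble(
  phone_type = rep(c("Android", "iPhone"), each = 5),
  usage      = c(2.1, 3.4, 2.8, 4.0, 3.1, 3.9, 4.4, 2.7, 5.0, 3.6)
)

layered_plot <- example_data %>%
  ggplot(aes(x = phone_type, y = usage)) +
  geom_violin(fill = "lightblue") +   # the + at the end of this line allows another layer
  geom_boxplot(width = 0.2)           # a narrow boxplot drawn on top of the violins

layered_plot
```

Layers are drawn in the order they are added, so the boxplot sits on top because it comes after geom_violin().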

Step 8. Let’s use the next block of code to draw a simple histogram of the salary estimates. You’ll just need to add the object name to do this.

Step 9. We’ll now use a new tool to split the data up by the “family_position” variable. This technique is called “faceting”. facet_wrap() makes a long ribbon of panels according to the variable(s) you specify. This is useful if you have a single variable with many levels and want to arrange the plots in a more space-efficient manner.

# remember to add a + to the end of the graph command
facet_wrap(vars(family_position))

You should have a plot with 4 panels, one histogram for each value of the family position variable.
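Steps 8 and 9 together can be sketched as follows, using simulated data in place of wages.csv. Only the family_position column name comes from the worksheet; the salary column name, the four group labels, the simulated values and the number of bins are assumptions made for illustration.

```r
library(tidyverse)

set.seed(4)  # makes the simulated values reproducible

# Hypothetical stand-in for the wages data
example_wages <- tibble(
  family_position = rep(c("A", "B", "C", "D"), each = 25),
  salary          = rnorm(100, mean = 30000, sd = 5000)
)

faceted_plot <- example_wages %>%
  ggplot(aes(x = salary)) +
  geom_histogram(bins = 15) +            # one histogram of the salary values
  facet_wrap(vars(family_position))      # split into one panel per family position

faceted_plot
```

Because the made-up family_position variable has four values, the result is four panels that share the same axes, which makes the distributions easy to compare.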

Step 10. We’ve included the screentime.csv data again this week and given you some code to work with. Try out some of the techniques you’ve learnt in this week’s tutorial, as well as practising the techniques you’ve learnt in previous weeks. You’ll see we’ve included a new type of plot, geom_density(), which is another great way to plot distributions of data. As always, make sure you add comments to your code.
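As a rough illustration of geom_density(), the sketch below draws overlapping density curves for two groups. The data are made up, and the column names, fill mapping and alpha value are assumptions rather than the settings used in the provided script.

```r
library(tidyverse)

# Hypothetical data: column names and values are invented for illustration
example_data <- tibble(
  phone_type = rep(c("Android", "iPhone"), each = 5),
  usage      = c(2.1, 3.4, 2.8, 4.0, 3.1, 3.9, 4.4, 2.7, 5.0, 3.6)
)

density_plot <- example_data %>%
  ggplot(aes(x = usage, fill = phone_type)) +  # one curve per phone type
  geom_density(alpha = 0.5) +                  # alpha makes the curves see-through
  labs(x = "Usage (hours per day)", y = "Density")

density_plot
```

A density plot is a smoothed version of a histogram, so it shows the shape of a distribution without depending on a choice of bins.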

z-scores

Hint / Reminder: Sketch a normal (z-score) distribution and mark the mean/mode, then mark off the parts relevant to the question, so you know what you are trying to achieve and how to interpret any calculations you make.

Hint / Guide 2: For questions 6 and 7, psychology typically uses the 5% level as a cut-off for deciding, in broad terms, whether a score is extreme (unlikely) or at least somewhat plausible (likely).
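If you want to check your hand calculations in R, the two conversions behind the questions below can be sketched as follows. The mean, standard deviation and raw score here are made-up values chosen for illustration; they are not taken from any of the questions. The worksheet does not say whether the 5% cut-off is one-tailed or two-tailed, so both are shown.

```r
# Hypothetical values, invented for illustration
pop_mean  <- 50
pop_sd    <- 8
raw_score <- 62

# Raw score to z-score: how many standard deviations from the mean?
z <- (raw_score - pop_mean) / pop_sd      # (62 - 50) / 8 = 1.5

# z-score back to raw score
raw_again <- pop_mean + z * pop_sd        # 50 + 1.5 * 8 = 62

# z-scores that cut off 5% of a standard normal distribution
cutoff_one_tailed <- qnorm(0.05)          # about -1.645 (lowest 5%)
cutoff_two_tailed <- qnorm(0.025)         # about -1.96 (2.5% in each tail)

z
raw_again
```

A positive z-score lies above the mean and a negative one lies below it, which is worth marking on your sketch before you calculate anything.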

z-scores basics

z-score distributions

Q1. What is the relationship between the sign of a z-score and its position in a distribution?

Q2. If a distribution has a mean of 100 and a standard deviation of 10, what is the raw score equivalent to a z-score of 1.96?

Q3. If a distribution has a mean of 157 and a standard deviation of 19, what is the raw score equivalent to a z-score of 1?

Using z-score tables

Q4. What proportion of scores lie between the mean and a z-score of 0.5?

Q5. What is the combined proportion of scores lying between z = -1.2 and z = 0.85?
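The z-score tables give the area under the normal curve, and R's pnorm() function returns the same areas (the proportion of scores below a given z). The sketch below uses made-up z values, not those in Q4 and Q5, so you can see the method and then check your own table look-ups.

```r
# Hypothetical z-scores, invented for illustration
z_low  <- -0.5
z_high <- 1.0

# Proportion between the mean (z = 0) and each z-score
mean_to_high <- pnorm(z_high) - 0.5       # about 0.3413
mean_to_low  <- 0.5 - pnorm(z_low)        # about 0.1915

# Proportion between the two z-scores, which lie either side of the mean
between <- pnorm(z_high) - pnorm(z_low)   # about 0.5328

between
```

When the two z-scores fall on opposite sides of the mean, the combined proportion is the sum of the two mean-to-z areas, which is why between equals mean_to_high + mean_to_low.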

Applying z-scores to inferential problems

Q6. A neuropsychologist has given a test of face recognition to 200 neurotypical participants and finds that the scores are normally distributed with a mean of 85 and a standard deviation of 12. Two brain-damaged patients are also given the test. The one with right hemisphere brain damage scored 58 and the one with left hemisphere damage scored 67.

  1. What is the z-score of the right hemisphere patient when compared to the neurotypical group?

  2. What proportion of neurotypical participants score lower than this patient?

  3. Is this patient likely to belong to the population of neurotypical participants? (justify your answer)

  4. What is the z-score of the left hemisphere patient when compared to the neurotypical group?

  5. What proportion of neurotypical participants score lower than this patient?

  6. Is this patient likely to belong to the population of neurotypical participants? (justify your answer)

Final z-score challenge

Come back to this afterwards for some extra practice if you want:

Q7. Tom Bunion has completed a huge research study measuring the foot size of men and women, and found each to be normally distributed. The men have a mean size of 55 with a standard deviation of 5, and the women a mean of 33 with a standard deviation of 5. Joanna Toes has foolishly measured two individuals but forgotten to note their gender. These have foot sizes of 37 and 47. To which gender is each more likely to belong? What evidence is there for this?

RStudio and stats humour

For a bit of fun… The following are parody music videos about stats. Now that you have a few weeks’ experience with RStudio, and an introduction to hypothesis testing, you might appreciate them.

The R Inferno Song (Teenage Dirtbag Parody) filmed largely on campus at Maynooth University, Ireland with stats students and staff:

Hypothesis testing and p values (plus bunny rabbits and a dog)

Extra external R resources

Some students have asked for a pointer to additional R resources so they can structure some time exploring the R system. There are lots, but this one is good and very compatible with the teaching we provide: R for data science
