Hypotheses, associations

Rob Davies

Department of Psychology, Lancaster University

2024-02-19

PSYC122: Classes in weeks 16-20

  • My name is Dr Rob Davies: I am an expert in communication, individual differences, and methods

Tip

Ask me anything:

  • questions during class in person or anonymously through slido;
  • all other questions on the discussion forum

Weeks 16-20

  • Introduction: our objectives, our methods and the benefits to you

Objectives: 2. Strengthen your practice and build your independence

  • In PSYC121 and PSYC122, you have learned about working with data
  • In PSYC122, so far: you have learned about correlations and linear models
  • Our job now is to deepen and broaden your skills

Picture shows a group of climbers on a snow field, standing near some rocks. In the background, there is a mountain peak and blue cloudless skies.

flickr, Magryciak ‘Great weekend’

Objectives: 3. Show you how to join the credibility revolution

  • We have taught you about a revolution
  1. Old ways: questionable research, closed practices
  2. New ways: research integrity, open science
  • Our job now is to show you how to join in as critical thinkers

Picture shows a person with long hair boarding a train. They are holding a sign with the word 'revolution' and a rainbow painted on it.

flickr, Cesar Salvadeo ‘Revolution’

What we are going to do

  • Now, we will put the ideas into practice
  • In the context of a live investigation
  • We will work together for real world impact

Picture shows a sign 'because we can' drawn in light in front of a space with white garage doors to left and right, and houses behind.

flickr, Ben Matthews ‘Because we can’

Our mission: to make the world a bit better

Picture shows a gallery or picture space, with people wearing masks and old fresco pictures on the walls. Some of the people are looking towards the camera.

Our approach: Concepts, skills, levels

  • Each week, we focus on building concepts and skills
  • In conceptual work, we aim for deep and broad understanding of what you do and why
  • In practical work, we introduce, consolidate, or extend within a single problem set to grow your independence

The new idea: data analysis in context

  • Traditionally, psychologists have taught a different procedure each week: a-test-a-week
  • limited discussion of theory or measurement,
  • focus on rules about doing and reporting null hypothesis significance tests
  • but:
  1. This risks student (researcher) focus on doing the significance test (when, what, how)
  2. While the real challenges are located in figuring out what we want to find out, measure, and explain

Targets for weeks 16-19: Concepts

We are working together to develop concepts:

  1. Week 16 — Hypotheses, measurement and associations
  2. Week 17 — Predicting people using linear models
  3. Week 18 — Everything is some kind of linear model
  4. Week 19 — The real challenge in psychological science

Targets for weeks 16-19: Concepts

  • The real challenges we face as psychologists: our diversity
  • We examine the impact of diversity
  • And we explore how far we can ever reproduce or generalize our findings

Picture shows a crowd of people seen from above with a variety of colours of clothing.

flickr, Cat Walker ‘crowd’

Targets for weeks 16-19: Skills

We are working together to develop skills:

  1. Week 16 — Visualizing, estimating, and reporting associations
  2. Week 17 — Using data to predict people
  3. Week 18 — Going deeper on linear models
  4. Week 19 — Evaluating evidence across multiple studies

Targets for weeks 16-19: Skills

  • We revisit some ideas in new ways
  • Students – like all people – are diverse
  • We build visual and verbal ways of thinking

The benefits: critical thinking and you

Important

By end of first year, most students learn to code in R:

  • The real challenge comes – in the second and third years – when you have to show that you can critically evaluate evidence
  • To get a B+ or an A you will need to show critical reflection
  • Our work here will build your ability to do this

Statistical rituals largely eliminate critical thinking

  • Traditionally, students learn statistical tests, and learn to identify if a test statistic is significant or not
  • If we do not also talk about what is actually observed, and whether or how it is compatible with theory-based predictions, then we do ritual, not science (Gigerenzer, 2004)
  • This is a problem: the focus on significance allows us to build or accommodate vague theories that can never be wrong

Open, reproducible, methods are not enough

  • Now: we need to think causally about predictions and measurement
  1. We need better theory so we can build clear testable predictions from explicit assumptions
  2. We need better measurement because if we cannot reliably measure something then it is hard to build a theory about it

We need to think about the derivation chain

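Diagram shows boxes labelled Concept formation, Causal model, Measurement, Auxiliary assumptions, Statistical predictions and Testing hypotheses. Arrows run from Concept formation to Measurement, from Measurement to Statistical predictions, from Auxiliary assumptions to Statistical predictions, and from Statistical predictions to Testing hypotheses.
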
Figure 1: The derivation chain

Here’s a toolkit for thinking productively about your hypotheses

The derivation chain (Meehl, 1990; Scheel et al., 2021)

  1. Develop your theory: the concepts, and the assumptions about causality
  2. Specify how psychological concepts will be measured
  3. Identify auxiliary assumptions about how we get from theoretical concepts to observable data
  4. Identify theoretical predictions
  5. Link theoretical predictions to specific statistical tests that may support or contradict them

Valid measures

  • We often teach and learn about different kinds of validity but the key idea is simple (Borsboom et al., 2004):

    a test is valid for measuring an attribute if and only if (a) the attribute exists and (b) variations in the attribute causally produce variations in the outcomes of the measurement procedure

  • We want to work with valid measures but validity requires explaining: (Q.1) Does the thing exist in the world? (Q.2) Is variation in that thing reflected in variation in our measurement?

Summary: our critical thinking checklist

  • What is our (causal) theory?
  • What measures are we using, why?
  • What is our specific prediction, why?
  • Does the prediction relate to sign and to magnitude?
  • What analysis can test this prediction, why?
  • How will our results affect our beliefs, why?

Let’s take a break

  • End of part 1

Learning targets for this week:

  • Concepts: begin learning to think critically
  • Skills: identify how we build hypotheses

Case study: the health comprehension project

  • Because the real challenge concerns how psychologists ask and answer questions
  • We will work in the context of a live research project: What makes it easy or difficult to understand written health information?

flickr, Sasin Tipchair ‘Senior woman in wheelchair talking to a nurse in a hospital’

Why this? We don’t really know what makes it easy or difficult to understand advice about health

flickr: WendyHarris1955 'COVID-19 Antibody test' packs and information leaflets

Health comprehension project: impacts

  • We are working to improve health communication
  • With partners at Vienna Business University, Kantar Public, and the London School of Economics
  • Our results could change: business and health communication; understanding reading development

Health comprehension project: questions and analyses

  • Our research questions are:

Note

  1. What person attributes predict success in understanding?
  2. Can people accurately evaluate whether they correctly understand written health information?
  • These kinds of research questions can be answered using methods like correlation, linear models

Health comprehension project – relevance: methods you will use in your professional work

  • We collect data using online Qualtrics questionnaire surveys
  • We test people on a range of dimensions using standardized ability and our own knowledge tests
  • Many of you will go on to work with online surveys, and with data from standardized ability measures

You can get involved

  • You can – if you choose – get involved
  • Complete the survey and contribute your responses
  • Forthcoming on PEP: be a named co-author assisting in the development of preprint and repository to share our data

Extract from Qualtrics survey showing a sample written health information text extract, a multiple choice question probing understanding of the information in the extract, and a rating scale allowing participants to self-evaluate their understanding

Extract from Qualtrics survey

Health comprehension project: why it is a case study

  • The health project has strengths and limitations
  • Watch how we identify these strengths and limitations, and critically evaluate this project
  • So you can do the same for your work

Extract from Qualtrics survey showing a sample written health information text extract, a multiple choice question probing understanding of the information in the extract, and a rating scale allowing participants to self-evaluate their understanding

Extract from Qualtrics survey

Cognitive process theory of comprehension success

  • When skilled adult readers read and try to understand written text (Kintsch, 1994)
  • They must recognize and access the meanings of words
  • Then use knowledge and reasoning to build an interpretation of what is in the text
  • Based on connecting the information in the text with what they already know

Individual differences theory of comprehension success

  • Successfully understanding text depends on (1.) language experience and (2.) reasoning ability (Freed et al., 2017)

Diagram shows two boxes, Language experience and Reasoning capacity, each with an arrow pointing to a third box, Comprehension outcome.

Figure 2: Factors influencing comprehension success

Where the data come from: our measures

  • We measure reading comprehension: asking people to read text and then answer multiple choice questions
  • We measure background knowledge: vocabulary knowledge (Shipley); health literacy (HLVA)
  • We ask people to rate their own understanding of each text

Example critical evaluation questions

  • Are multiple choice questions good ways to probe understanding? – What alternatives are there?
  • Are tests like the Shipley good measures of language knowledge? – What do we miss?
  • Can a person accurately evaluate their own understanding? – Can we rely on subjective judgments?

Relevance to you

Tip

  • Even very good students sometimes do not question the validity of measures:
  • Not asking questions like this has a real impact on the value of the interpretation of results
  • Here, we are looking ahead to the critical thinking you will need to show in your second and third year essays

Let’s take a break

  • End of part 2

Learning targets for this week:

  • Concepts – associations: correlations, estimates and hypothesis tests
  • Skills – visualizing variation and covariation
  • Skills – writing the code
  • Skills – estimating correlations
  • Skills – interpreting and reporting correlations

Talking about the relationships between variables

  • Psychologists and people who work in related fields often want to know about associations
  • Is variation in observed values on one dimension (e.g., comprehension) related to variation in another dimension (e.g., vocabulary)?
  • Do values on both dimensions vary together?

The language in this area can vary: we will be consistent but you need to be aware of the different terms

  • Outcome \(=\) response \(=\) criterion \(=\) dependent variable
  • Predictor \(=\) covariate \(=\) independent variable \(=\) factor
  • Linear model \(=\) regression analysis \(=\) regression model \(=\) multiple regression

Let’s look at the data we will use

  • The person in row 1 has ETHNICITY White, is AGE 34 years, scored 33 on Shipley vocabulary, scored 7 on HLVA health literacy
  • and, on average, self-rated their understanding of health information as 7.96 (so 8/9, mean.self)
  • while scoring 0.49 accuracy in tests of understanding (49% mean.acc)
# A tibble: 4 × 6
  mean.acc mean.self  HLVA SHIPLEY   AGE ETHNICITY
     <dbl>     <dbl> <dbl>   <dbl> <dbl> <fct>    
1     0.49      7.96     7      33    34 White    
2     0.85      7.28     7      33    25 White    
3     0.82      7.36     8      40    43 White    
4     0.94      7.88    11      33    46 White    

Destination correlation: where the correlation number comes from

Covariance

\[COV_{xy} = \frac{\sum(x - \bar{x})(y - \bar{y})}{n -1}\]

  • If we want to estimate the correlation between two sets of numbers: \(x\) and \(y\)
  • We want to know if variation in \(x\) (given by \(x - \bar{x}\))
  • Varies together with variation in \(y\) (given by \(y - \bar{y}\))
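
The covariance formula above can be worked through step by step in R. This is a minimal sketch, assuming we use only the four rows printed in the data extract earlier; the names dev.acc, dev.self, cov.by.hand and cov.built.in are made up for illustration.

```r
# Four rows copied from the data extract shown earlier (not the full sample)
mean.acc  <- c(0.49, 0.85, 0.82, 0.94)
mean.self <- c(7.96, 7.28, 7.36, 7.88)

# Deviation of each score from the mean of its own variable
dev.acc  <- mean.acc - mean(mean.acc)     # x - x-bar
dev.self <- mean.self - mean(mean.self)   # y - y-bar

# Sum the products of the paired deviations, then divide by n - 1
n <- length(mean.acc)
cov.by.hand <- sum(dev.acc * dev.self) / (n - 1)

# The built-in cov() function uses the same n - 1 formula
cov.built.in <- cov(mean.acc, mean.self)

print(cov.by.hand)
print(cov.built.in)
```

With only these four rows the covariance happens to come out slightly negative. That is a reminder that four people are far too few to estimate an association; the estimates reported later use the whole sample.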

Destination correlation: where the correlation number comes from

Covariance divided by standard deviations

\[r = \frac{COV_{xy}}{s_xs_y}\]

  • Because the two sets of numbers can be on different scales: e.g., SHIPLEY out of 40; mean.acc (proportion, out of 1)
  • And because covariance values depend on the scales
  • To make correlations easier to compare, we remove scaling by dividing by the standard deviations of the variables
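
To see why we divide by the standard deviations, here is a small sketch, again assuming only the four rows printed in the data extract. It rescales accuracy from a proportion to a percentage: the covariance changes with the scale, but the correlation does not. The names r.by.hand, r.built.in and acc.percent are made up for illustration.

```r
# Four rows copied from the data extract shown earlier (not the full sample)
mean.acc  <- c(0.49, 0.85, 0.82, 0.94)
mean.self <- c(7.96, 7.28, 7.36, 7.88)

# r = covariance divided by the product of the standard deviations
r.by.hand  <- cov(mean.acc, mean.self) / (sd(mean.acc) * sd(mean.self))
r.built.in <- cor(mean.acc, mean.self)

# Put accuracy on a different scale: percent instead of proportion
acc.percent <- mean.acc * 100

# The covariance is now 100 times larger ...
cov.proportion <- cov(mean.acc, mean.self)
cov.percent    <- cov(acc.percent, mean.self)

# ... but the correlation is unchanged
r.percent <- cor(acc.percent, mean.self)

print(c(cov.proportion, cov.percent))
print(c(r.by.hand, r.built.in, r.percent))
```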

Let’s think about an example

Note

Research question: Can people accurately evaluate whether they correctly understand written health information?

  • Measurement: Someone with higher scores on tested accuracy of understanding will also present higher scores on their ratings of their own understanding
  • Statistical prediction: We predict that mean.acc and mean.self scores will be associated
  • Test: If the prediction is correct, mean.acc and mean.self scores will be correlated

Distributions: Let’s see what this means – how do scores vary?

There are two histograms, shown side by side: The 'mean accuracy' histogram shows how 'mean accuracy' scores vary between about 0.3 and 1.0, with a peak, indicated by a vertical red line, around .8; the 'mean self-rated accuracy' histogram shows how 'mean self-rated accuracy' scores vary between about 2.5 and 9.0, with a peak, indicated by a vertical red line, around 7.

Histograms showing the distribution of mean accuracy and mean self-rated accuracy scores in the ‘clearly.one.subjects’ dataset: means calculated for each participant over all their responses

A histogram is a useful way to show the distribution of values

  • We have a sample of accuracy scores:
  • Mean accuracy scores can vary between 0.0 and 1.0
  • We draw the plot by grouping together similar values in bins
  • Heights of bars represent numbers of cases with similar values in same bin

The 'mean accuracy' histogram shows how 'mean accuracy' scores vary between about 0.3 and 1.0, with a peak, indicated by a vertical red line, around .8.

Distribution of mean accuracy
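
The binning idea can be sketched in R. This is a sketch under stated assumptions: the scores are simulated (toy.acc is a made-up name, not the real data), the bin width of 0.05 is an arbitrary choice, and base R graphics are used here, which may differ from the plotting code used in class.

```r
set.seed(122)  # arbitrary seed so the sketch is repeatable

# Hypothetical accuracy scores: simulated, NOT the real sample
toy.acc <- rnorm(170, mean = 0.8, sd = 0.12)
toy.acc <- pmin(pmax(toy.acc, 0), 1)  # keep scores inside the 0 to 1 range

# Group similar values into bins of width 0.05;
# the height of each bar is the number of cases in that bin
h <- hist(toy.acc,
          breaks = seq(0, 1, by = 0.05),
          main = "Distribution of mean accuracy (simulated)",
          xlab = "mean accuracy")

# Mark the sample mean with a vertical red line
abline(v = mean(toy.acc), col = "red", lwd = 2)

# The counts behind the bars: one number per bin
print(h$counts)
```

The bar heights always add up to the number of people in the sample, because every score falls in exactly one bin.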

When we talk about variance we are talking about how values vary in relation to the mean for the sample

  • The average of these mean accuracy scores is marked with a red line where \(\bar{x} =\) 0.8
  • The accuracy score for the person in row 1 is located at \(x = .49\), marked in blue

The 'mean accuracy' histogram shows how 'mean accuracy' scores vary between about 0.3 and 1.0, with the average of mean accuracy scores, indicated by a vertical red line located near .8, and the score of the person in row 1 of the dataset, indicated by a blue line located at .49.

Distribution of mean accuracy

We are talking about how values vary in relation to the mean for the sample

  • In comparison, the mean accuracy score for the person in row 4 is located at \(x = .94\), marked in blue

The 'mean accuracy' histogram shows how 'mean accuracy' scores vary between about 0.3 and 1.0, with the average of mean accuracy scores, indicated by a vertical red line located near .8, and the score of the person in row 4 of the dataset, indicated by a blue line located at .94.

Distribution of mean accuracy

The basic question when we examine covariance: do values vary together?

  • If the person at row 1 has a mean.acc score of .49, lower than the average
  • And the person at row 4 has a mean.acc score of .94, higher than the average
  • What will their mean.self scores be: will they be higher or lower than the average mean.self score?

We can use scatterplots to examine associations

The figure shows two scatterplots, side by side: both plots show points where each point indicates the pair of scores corresponding to the 'mean accuracy' and the 'mean self-rated accuracy' recorded for each participant in the example data. The plot on the left orients the presentation with 'mean accuracy' on the y axis. The plot on the right orients the presentation with 'mean self-rated accuracy' on the y axis.

Scatterplots showing whether values on mean accuracy (mean.acc) vary together with values on mean self-rated accuracy (mean.self) for the participants in this sample

A scatterplot is a useful way to examine if the values of two or more variables vary together

  • Mean accuracy scores can vary between 0.0 and 1.0
  • The height of each point shows the observed value of accuracy on the y-axis
  • Self-rated accuracy scores vary between 1 and 9
  • The horizontal position of each point shows the observed value of self-rated accuracy on the x-axis

The figure shows a scatterplot showing points where each point indicates the pair of scores corresponding to the 'mean accuracy' and the 'mean self-rated accuracy' recorded for each participant in the example data. The plot orients the presentation with 'mean accuracy' on the y axis.

Scatterplot showing how values on mean accuracy and mean self-rated accuracy vary together

A scatterplot is a useful way to examine if the values of two or more variables vary together

  • We have a sample of 170 people
  • For each person, we have a value for the mean accuracy and a paired value for the mean self-rated accuracy
  • Each point shows the paired data values for a person
  • In red: someone scored 3.48 on mean self-rated accuracy, 0.57 on mean accuracy

The figure shows a scatterplot showing points where each point indicates the pair of scores corresponding to the 'mean accuracy' and the 'mean self-rated accuracy' recorded for each participant in the example data. The plot orients the presentation with 'mean accuracy' on the y axis.

Scatterplot showing how values on mean accuracy and mean self-rated accuracy vary together
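
A scatterplot like this can be sketched in R. The data frame toy below is simulated for illustration and is not the real dataset; the numbers used to generate it are arbitrary, and base R graphics are an assumption that may differ from the plotting code used in class.

```r
set.seed(122)  # arbitrary seed so the sketch is repeatable

# Hypothetical paired scores for 170 people: simulated, NOT the real sample
n <- 170
toy <- data.frame(mean.self = runif(n, min = 1, max = 9))
toy$mean.acc <- 0.45 + 0.05 * toy$mean.self + rnorm(n, mean = 0, sd = 0.1)
toy$mean.acc <- pmin(pmax(toy$mean.acc, 0), 1)  # keep accuracy inside 0 to 1

# One point per person: horizontal position = self-rating, height = accuracy
plot(toy$mean.self, toy$mean.acc,
     xlab = "mean self-rated accuracy (mean.self)",
     ylab = "mean accuracy (mean.acc)",
     xlim = c(1, 9), ylim = c(0, 1))

# Highlight the pair of values for one person (here, row 1) in red
one.person <- toy[1, ]
points(one.person$mean.self, one.person$mean.acc, col = "red", pch = 19)
```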

Let’s take a break

  • End of part 3

The R code for a correlation test, bit by bit

cor.test(clearly.one.subjects$mean.acc, 
         clearly.one.subjects$mean.self,
         method = "pearson")
  1. We specify the cor.test function, and name one variable clearly.one.subjects$mean.acc
  2. Then we name the second variable clearly.one.subjects$mean.self
  3. Last we specify the correlation method = "pearson" because we have a choice

Identifying the key information in the results from one correlation test

  • We look at the value of the correlation (here, cor) and the p-value
  • We can see that the correlation statistic is positive cor = .4863771 which we round to \(cor = .49\)
  • And p-value = 2.026e-11 indicating that the correlation is significant \(p < .001\)
cor.test(clearly.one.subjects$mean.acc, 
         clearly.one.subjects$mean.self, 
         method = "pearson")

    Pearson's product-moment correlation

data:  clearly.one.subjects$mean.acc and clearly.one.subjects$mean.self
t = 7.1936, df = 167, p-value = 2.026e-11
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3619961 0.5937425
sample estimates:
      cor 
0.4863771 
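
If we store the result of cor.test() in an object, we can pull out each piece of key information by name instead of reading it off the printed output. This is a sketch using simulated stand-in data (toy.subjects is a made-up name, not the clearly.one.subjects dataset), so the numbers will not match the output above.

```r
set.seed(122)  # arbitrary seed so the sketch is repeatable

# Hypothetical stand-in data: simulated, NOT the real sample
n <- 100
toy.self <- rnorm(n, mean = 7, sd = 1)
toy.acc  <- 0.8 + 0.05 * (toy.self - 7) + rnorm(n, mean = 0, sd = 0.1)
toy.subjects <- data.frame(mean.acc = toy.acc, mean.self = toy.self)

# Same call as in the lecture, but the result is stored in an object
result <- cor.test(toy.subjects$mean.acc,
                   toy.subjects$mean.self,
                   method = "pearson")

print(result$estimate)   # the correlation, labelled cor
print(result$parameter)  # the degrees of freedom, df = n - 2
print(result$p.value)    # the p-value
print(result$conf.int)   # the 95 percent confidence interval
```

For a Pearson correlation the degrees of freedom equal the number of complete pairs minus 2, which is the number that goes in the brackets when we report r.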

Reporting a correlation

  • Usually, we report a correlation like this:

Mean accuracy and mean self-rated accuracy were significantly correlated (\(r (167) = .49, p < .001\)). Higher mean accuracy scores are associated with higher mean self-rated accuracy scores.

Interpreting correlations with the help of visualization

  • The correlation statistic is positive in sign and moderate in size, about \(r = .49\)
  • We can see that higher mean accuracy (mean.acc) scores are associated with higher mean self-rated accuracy (mean.self) scores

The figure shows a scatterplot showing points where each point indicates the pair of scores corresponding to the 'mean accuracy' and the 'mean self-rated accuracy' recorded for each participant in the example data. The plot orients the presentation with 'mean accuracy' on the y axis.

Scatterplot showing how values on mean accuracy and mean self-rated accuracy vary together

What will different kinds of correlations look like?

We can simulate data to demonstrate: (left) the correlation is positive, \(r = .5\); (right) the correlation is negative, \(r = -.5\)

The figure shows two scatterplots showing how simulated data values on mean accuracy and mean self-rated accuracy *could* vary together given positive or negative correlations. Each plot shows points, where each point indicates the pair of scores corresponding to the 'mean accuracy' and the 'mean self-rated accuracy' recorded for each participant in a simulated dataset. The plot on the left shows the scatter of points when data are simulated assuming r = .5. The plot on the right shows the scatter of points when data are simulated assuming r = -.5.

Scatterplots showing how simulated data values on mean accuracy and mean self-rated accuracy could vary together given positive or negative correlations

We can also imagine – again with simulated data – what correlations of increasing size might look like

The figure shows 4 scatterplots showing how simulated data values on mean accuracy and mean self-rated accuracy *could* vary together given positive correlations of increasing size. Each plot shows points, where each point indicates the pair of scores corresponding to the 'mean accuracy' and the 'mean self-rated accuracy' recorded for each participant in a simulated dataset. The plots show the scatter of points, from left to right, (1.) if r = .1; (2.) if r = .3; (3.) if r = .5; (4.) if r = .8

Scatterplots showing how simulated data values on mean accuracy and mean self-rated accuracy could vary together given positive correlations of increasing size
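
One way to simulate pairs of scores with a chosen correlation is sketched below. This is not necessarily how the figures in this lecture were made: simulate_pair is a made-up helper, and the approach (mixing a variable with independent noise) is just one common option. The simulated values are on a standardized scale, not the scale of the real measures.

```r
set.seed(16)  # arbitrary seed so the sketch is repeatable

# Hypothetical helper: simulate n pairs (x, y) whose population correlation is r
simulate_pair <- function(n, r) {
  x     <- rnorm(n)                      # first variable
  noise <- rnorm(n)                      # independent random noise
  y     <- r * x + sqrt(1 - r^2) * noise # mix x and noise so that cor(x, y) is about r
  data.frame(x = x, y = y)
}

# The correlations shown in the simulated scatterplots
target.r <- c(-0.5, 0.1, 0.3, 0.5, 0.8)

# For each target, simulate a large sample and estimate the correlation
observed.r <- sapply(target.r, function(r) {
  sim <- simulate_pair(n = 5000, r = r)
  cor(sim$x, sim$y)
})

print(round(data.frame(target.r, observed.r), 2))
```

The estimated correlations land close to the targets but not exactly on them, because each simulated sample includes random variation. To draw one of the simulated scatterplots, pass the x and y columns of a simulated data frame to plot().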

Summary

  • We are often interested in whether or how variation in the values of two variables is associated
  • We can visualize the distribution of values in any one variable using histograms
  • We visualize the association of values in two variables using scatterplots
  • We conduct correlation tests to examine the sign (positive or negative) and the strength of the association
  • But we always need to think about our research questions, about where our data come from and about whether our measures are any good

End of lecture

References

Borsboom, D., Mellenbergh, G. J., & Heerden, J. van. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061
Freed, E. M., Hamilton, S. T., & Long, D. L. (2017). Comprehension in proficient readers: The nature of individual variation. Journal of Memory and Language, 97, 135–153. https://doi.org/10.1016/j.jml.2017.07.008
Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587–606. https://doi.org/10.1016/j.socec.2004.09.033
Kintsch, W. (1994). Text comprehension, memory, and learning. American Psychologist, 49(4), 294–303. https://doi.org/10.1037/0003-066x.49.4.294
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244.
Scheel, A. M., Tiokhin, L., Isager, P. M., & Lakens, D. (2021). Why Hypothesis Testers Should Spend Less Time Testing Hypotheses. Perspectives on Psychological Science, 16(4), 744–755. https://doi.org/10.1177/1745691620966795