Linear models and critical perspectives on science and knowledge

Rob Davies

Department of Psychology, Lancaster University

2024-03-12

PSYC122: Classes in weeks 16-20

  • My name is Dr Rob Davies. I am an expert in communication, individual differences, and methods

Tip

Ask me anything:

  • questions during class in person or anonymously through slido;
  • all other questions on discussion forum

Week 19: Linear models – critical perspectives

The figure presents a grid of histograms showing the distribution of mean accuracy scores in each of 11 studies. The histograms indicate that there are many more observations in some studies -- studyone, studytwo, and psyc122.21.22 -- than in others. In these studies, scores are skewed towards high values around .75-.85. Scores are widely distributed in the br, iw, km, mp, tp, ka and zw studies: ranging from about .25 to .85.

Figure 1: Histograms showing the distribution of mean accuracy scores in 11 studies

Targets for weeks 16-19: Concepts

We are working together to develop concepts:

  1. Week 16 — Hypotheses, measurement and associations
  2. Week 17 — Predicting people using linear models
  3. Week 18 — Everything is some kind of linear model
  4. Week 19 — The real challenge in psychological science

Targets for weeks 16-19: Skills

We are working together to develop skills:

  1. Week 16 — Visualizing, estimating, and reporting associations
  2. Week 17 — Using data to predict people
  3. Week 18 — Going deeper on linear models
  4. Week 19 — Evaluating evidence across multiple studies

Learning targets for this week

  • Concepts – To engage with the real challenges in psychological science:
  1. People vary
  2. Results vary

Learning targets for this week

  • Skills – To engage with the real challenges in data analysis skills development:
  1. Growing in independence
  2. Exploiting the R knowledge ecosystem

Why these targets? Key ideas

Science (including psychological science) has undergone a rolling series of crises:

The triggers for crisis

The credibility revolution: responses

The credibility revolution: replication is recognised as crucial to building a science of psychology

Figure 2 from the Nosek and colleagues (2022) paper on replicability, reproducibility and robustness in psychological science. The figure shows five plots, one for each study, from left to right: Soto (2019); Camerer (2018); Open Science Collaboration (2015); plus multisite replications; and the Protzko (2020) study. The plots are scatterplots. Each plot shows on the horizontal x axis variation in original effect size, and on the vertical y axis replication of original effect size. Points shown above the dashed line at y = 0 indicate that the replication effect size was larger than the original effect size. Solid circles indicate that replications were statistically significant in the same direction as the original study. Considering all replications (N = 307), 64% reported statistically significant evidence in the same direction, with effect sizes 68% as large as in the original studies. But many points lie below zero, indicating that observed effects were smaller in replication than in the original studies.

Nosek et al. (2022): replication outcomes for three systematic replication studies

The real challenge: people vary

  • The real challenges we face as psychologists: people vary
  • We examine the impact of this variation
  • And we explore if or how we can reproduce or generalize findings in psychological science

Picture shows a crowd of people seen from above with a variety of colours of clothing.

flickr, Cat Walker ‘crowd’

The real challenge: results vary

The figure presents a grid of scatterplots showing the association between mean accuracy and self-rated accuracy of understanding, of health information, in 11 studies. There are 11 different plots: one plot for each study. In each plot, points represent the mean tested accuracy and mean self-rated accuracy of understanding of health information, for each participant in each study. Trends are shown by linear model best fit lines. Across the plots, it can be seen that higher levels of self-rated accuracy are associated with higher levels of tested accuracy of understanding. The strength of the association varies between studies, indicated by varying slopes. Shaded ellipses indicate confidence about estimates. Ellipses are broader for studies with fewer participants, indicating greater uncertainty about the estimates of the association given the data from those studies.

Figure 2: Scatterplots showing the association between mean accuracy and self-rated accuracy of understanding, of health information, in 11 studies

Variation and replication

Psychological and social processes show much more variability than the usual phenomena in the physical sciences (Gelman, 2015)

  • The patterns or effects that interest us may (maybe will) vary between places, people, and times
  • Because treatment effects can be expected to vary, we may not see replication of effects
  • So we investigate how effects vary and we open our workflow

The professional data analysis workflow: from raw data to results

[Diagram: Get raw data → Tidy data → Visualize, Analyze, Explore (each informed by Assumptions); Visualize → Analyze; Visualize and Analyze → Present]

Figure 3: The data analysis pipeline or workflow

The data analysis workflow

  • Get some data
  • Process or tidy the data
  • Explore, visualize, and analyze the data
  • Present or report your findings
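
As a rough illustration, the four stages could be laid out in a single commented R script. This is only a sketch: the data are simulated, and the object names raw.data, tidy.data and model are hypothetical, not taken from the PSYC122 materials.

```r
# 1. Get some data (simulated here: hypothetical values, not project data)
set.seed(122)
n <- 100
HLVA <- round(rnorm(n, mean = 8, sd = 2))
mean.acc <- 0.5 + 0.03 * HLVA + rnorm(n, mean = 0, sd = 0.1)
raw.data <- data.frame(HLVA = HLVA, mean.acc = mean.acc)
raw.data$HLVA[1] <- NA   # pretend one score is missing

# 2. Process or tidy the data: drop rows with missing values
tidy.data <- na.omit(raw.data)

# 3. Explore, visualize, and analyze the data
summary(tidy.data)
plot(tidy.data$HLVA, tidy.data$mean.acc)
model <- lm(mean.acc ~ HLVA, data = tidy.data)

# 4. Present or report your findings
round(coef(summary(model)), 3)
```

Keeping the stages in order, each under its own comment, makes it easier for someone else to follow the route from raw data to results.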

Tip

  • Identify the elements and the order in your work as the parts of a pipeline or the stages in a workflow

Analysis multiverse

Different researchers: different choices (Silberzahn & Uhlmann, 2015)

Image taken from Silberzahn and Uhlmann (2015). The image shows a series of error bars, indicating point estimates and 95% confidence intervals. The points indicate estimates of the extent to which dark skinned players are more likely to be given a red card than white players. Each estimate represents the estimate deriving from the analysis done by an analysis team. Most but not all estimates are labelled as significant. The points are ordered, from left to right, by the estimated effect of skin colour. The estimates vary from equally likely to four times more likely.

Silberzahn and Uhlmann (2015): Twenty-nine research teams reached a wide variety of conclusions using different methods on the same data set to answer the same question.
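
The idea that analysis choices shape results can be sketched in R by fitting several plausible models to one data set and comparing the estimate of interest. The data below are simulated and the three model specifications are hypothetical choices made for illustration; they are not the analyses reported by Silberzahn and Uhlmann (2015).

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(29)
n <- 200
AGE <- round(runif(n, min = 18, max = 70))
SHIPLEY <- rnorm(n, mean = 30, sd = 5)
HLVA <- 0.2 * SHIPLEY + rnorm(n, mean = 2, sd = 2)
mean.acc <- 0.3 + 0.01 * SHIPLEY + 0.02 * HLVA + rnorm(n, mean = 0, sd = 0.1)
sim.data <- data.frame(AGE, SHIPLEY, HLVA, mean.acc)

# Three defensible model specifications for the same question
choices <- list(
  simple = mean.acc ~ HLVA,
  plus.vocab = mean.acc ~ HLVA + SHIPLEY,
  plus.vocab.age = mean.acc ~ HLVA + SHIPLEY + AGE
)

# Fit each model and pull out the estimated HLVA coefficient
estimates <- sapply(choices, function(f) coef(lm(f, data = sim.data))["HLVA"])
round(estimates, 3)
```

Same data, same question, different choices: the estimate for HLVA need not be the same across the three models.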

Kinds of reproducibility

Gilmore et al. (2017; following Goodman et al., 2016) present three kinds of reproducibility:

  • Methods reproducibility
  • Results reproducibility
  • Inferential reproducibility

Inferential reproducibility

If researchers repeat a study (results reproducibility) or re-analyze original data (methods reproducibility) then they should come to conclusions similar to those of the original authors

But …

  • reproducibility attempt could reveal problems, uncertainty over choices
  • different researchers could apply different prior expectations over the probability of possible effects

Lessons learned from crises mean we now hope to see that researchers:

  1. Share data and code
  2. Publish research reports in ways that enable others to check or query analyses

Let’s take a break

  • End of part 1

Health comprehension project – answers to our questions

  • We have been working in the context of a live research project: What makes it easy or difficult to understand written health information?

flickr, Sasin Tipchair ‘Senior woman in wheelchair talking to a nurse in a hospital’

Health comprehension project: questions and analyses

  • Our research questions are:

Note

  1. What person attributes predict success in understanding?
  2. Can people accurately evaluate whether they correctly understand written health information?

Theory: Models of comprehension accuracy should include predictors:

(1.) language experience (HLVA, SHIPLEY) and (2.) reasoning ability (FACTOR3, reading strategy) (Freed et al., 2017)

[Diagram: Language experience → Comprehension outcome ← Reasoning capacity]

Figure 4: Understanding text depends on (1.) language experience and (2.) reasoning ability (Freed et al., 2017)

Multiple candidate predictor variables

The figure presents a grid of scatterplots indicating the association between outcome mean accuracy (on y-axis) and (x-axis) scores on a range of predictor variables. The points are shown in grey, and higher points are associated with higher accuracy scores. The grid includes as predictors: self-rated accuracy; vocabulary (SHIPLEY); health literacy (HLVA); reading strategy (FACTOR3); age (years); gender; education, and ethnicity. The plots indicate that mean accuracy increases with increasing self-rated accuracy, vocabulary, health literacy, and reading strategy scores. Trends are indicated by red lines.

Figure 5: Scatterplots showing the potential association between accuracy of comprehension and variation on each of a series of potential predictor variables.

Critical thinking: data analysis assumptions

  1. validity: that differences in knowledge or ability cause differences in test scores
  2. measurement: that this is equally true across the different kinds of people we tested
  3. generalizability: that the sample of people we recruited resembles the population

Critical thinking: uncertainty

There are three levels of uncertainty when we look at sample data (McElreath, 2020) – uncertainty over:

  1. The nature of the expected change in outcome
  2. The ways that expected changes might vary between individual participants or between groups of participants
  3. The random ways that specific responses can be produced

Critical thinking: working with samples

  • We test who we can – convenience sampling – and who we can test has an impact on the quality of evidence (Bornstein et al., 2013)

Tip

Practice critical evaluation:

  • If age, ethnicity or gender are not balanced → does this matter to your research question?
  • If samples are limited in size → how does that affect our uncertainty over effect estimates?
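
One way to explore the sample size question is by simulation. The sketch below assumes a hypothetical true association (a slope of 0.03) and compares the standard error of the estimated slope in small and larger simulated samples; the function name slope.se and all the values are invented for illustration.

```r
# Simulate one sample of size n and return the standard error of the slope
slope.se <- function(n) {
  HLVA <- rnorm(n, mean = 8, sd = 2)
  mean.acc <- 0.5 + 0.03 * HLVA + rnorm(n, mean = 0, sd = 0.1)
  model <- lm(mean.acc ~ HLVA)
  coef(summary(model))["HLVA", "Std. Error"]
}

set.seed(19)
# Average the standard error over 200 simulated samples of each size
se.small <- mean(replicate(200, slope.se(20)))
se.large <- mean(replicate(200, slope.se(200)))
c(n20 = se.small, n200 = se.large)
```

Under these assumptions the standard error is larger in the smaller samples: with fewer participants, we are less certain about the size of the association.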

Let’s take a break

  • End of part 2

11 health comprehension studies

The figure presents a grid of 3 Cleveland-style dot plots showing counts of participants of: (1.) different self-reported genders, (2.) different education levels, and (3.) different ethnicities. Counts for different studies are shown in different colours. While counts vary between studies, it can be seen that across studies most participants are female, white and better educated.

Figure 6: Dotplots showing the gender, education and ethnicity of participants across 11 studies

Participants vary in accuracy

The figure presents a grid of histograms showing the distribution of mean accuracy scores in each of 11 studies. The histograms indicate that there are many more observations in some studies -- studyone, studytwo, and psyc122.21.22 -- than in others. In these studies, scores are skewed towards high values around .75-.85. Scores are widely distributed in the br, iw, km, mp, tp, ka and zw studies: ranging from about .25 to .85.

Figure 7: Grid of histograms showing the distribution of mean accuracy scores in each of 11 studies

Participants vary in age

The figure presents a grid of histograms showing the distribution of participant ages in each of 11 studies. The histograms indicate that there are many more participants in some studies -- studyone, studytwo, and psyc122.21.22 -- than in others. In all studies, ages are skewed towards younger participants with ages around 20-40 years.

Figure 8: Grid of histograms showing the distribution of participant ages in each of 11 studies

Participants vary in health literacy

The figure presents a grid of histograms showing the distribution of health literacy (HLVA) scores in each of 11 studies. The histograms indicate that there are many more observations in some studies -- studyone, studytwo, and psyc122.21.22 -- than in others. In all studies, score distributions are approximately symmetrical. The scores peak around 5-10, with fewer observations of lower or higher scores.

Figure 9: Grid of histograms showing the distribution of health literacy (HLVA) scores in each of 11 studies

Participants vary in vocabulary

The figure presents a grid of histograms showing the distribution of vocabulary (Shipley) scores in each of 11 studies. The histograms indicate that there are many more observations in some studies -- studyone, studytwo, and psyc122.21.22 -- than in others. In these studies, scores are skewed towards high values around 35. Scores are more widely distributed in the br, iw, km, mp, tp, ka and zw studies: ranging more evenly from about 20 to 40.

Figure 10: Grid of histograms showing the distribution of vocabulary (Shipley) scores in each of 11 studies

Participants vary in reading strategy

The figure presents a grid of histograms showing the distribution of reading strategy (FACTOR3) scores in each of 11 studies. The histograms indicate that there are many more observations in some studies -- studyone, studytwo, and psyc122.21.22 -- than in others. In all studies, the distribution of scores is approximately symmetrical, peaking about 40-45.

Figure 11: Grid of histograms showing the distribution of reading strategy (FACTOR3) scores in each of 11 studies

Associations vary

The figure presents a grid of scatterplots showing the potential association between mean accuracy scores and health literacy scores among the participants in each of 11 studies. Each point in a scatterplot represents the paired accuracy and health literacy scores recorded for a participant in a study. The plots indicate that there are many more observations in some studies -- studyone, studytwo, and psyc122.21.22 -- than in others. A smoother is drawn through the points in each plot. The trends indicated by the smoothers suggest that higher health literacy scores are associated with higher mean accuracy. Shaded ellipses indicate higher uncertainty about the trends given the data in some studies e.g. the md study. The trends clearly vary in magnitude if not direction across studies.

Figure 12: Association between mean accuracy and health literacy

Associations vary

The figure presents a grid of scatterplots showing the potential association between mean accuracy scores and mean self-rated accuracy scores among the participants in each of 11 studies. Each point in a scatterplot represents the paired accuracy and self-rated accuracy scores recorded for a participant in a study. The plots indicate that there are many more observations in some studies -- studyone, studytwo, and psyc122.21.22 -- than in others. A smoother is drawn through the points in each plot. The trends indicated by the smoothers suggest that higher ratings of accuracy are associated with higher mean accuracy. Shaded ellipses indicate higher uncertainty about the trends given the data in some studies e.g. the md study. The trends clearly vary in magnitude if not direction across studies.

Figure 13: Association between mean accuracy and mean self-rated accuracy

Health comprehension project: answers

Note

  1. What person attributes predict success in understanding?
  • Health literacy, vocabulary, and reading strategy
  2. Can people accurately evaluate whether they correctly understand written health information?
  • Yes but not very well

Do we see replication across studies?

The figure presents a grid of plots showing the estimated association between mean accuracy and health literacy in each of 11 studies. The lines have different colours for different studies. Points and ellipses have been removed to focus on variation in the trends indicated by the lines. It can be seen that the trends are more marked in some studies than in others. In one study -- the psyc122.21.22 study -- the line is flat suggesting that the association is near null or null.

Figure 14: Varying estimated association between mean accuracy and health literacy
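
A comparison like this could be sketched by fitting the same linear model separately in each study and collecting the slopes. The data frame below is simulated, and the column name study, the helper make.study() and the three study labels are assumptions made for illustration only.

```r
# Helper to simulate one study (hypothetical values, for illustration only)
set.seed(11)
make.study <- function(label, n, slope) {
  HLVA <- round(rnorm(n, mean = 8, sd = 2))
  data.frame(
    study = label,
    HLVA = HLVA,
    mean.acc = 0.5 + slope * HLVA + rnorm(n, mean = 0, sd = 0.1)
  )
}

# Three simulated studies that differ in size and in the true slope
all.studies <- rbind(
  make.study("A", n = 150, slope = 0.03),
  make.study("B", n = 40, slope = 0.02),
  make.study("C", n = 30, slope = 0.00)
)

# Split the data by study, then fit the same model within each study
by.study <- split(all.studies, all.studies$study)
slopes <- sapply(by.study, function(d) coef(lm(mean.acc ~ HLVA, data = d))["HLVA"])
round(slopes, 3)
```

Comparing the slopes side by side is one simple way to ask whether an association found in one study appears again in the others.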

Do we see replication across studies?

The figure presents a grid of plots showing the estimated association between mean accuracy and mean self-rated accuracy in each of 11 studies. The lines have different colours for different studies. Points and ellipses have been removed to focus on variation in the trends indicated by the lines. It can be seen that the trends are more marked in some studies than in others.

Figure 15: Varying association between mean accuracy and mean self-rated accuracy

Results reproducibility

  • If a researcher finds a pattern in human behaviour or in individual differences
  • We may critically evaluate the robustness or the generalizability of the finding

Important

Results reproducibility means that a new study with new data, collected following the original procedures as closely as possible, yields the same outcomes (Gilmore et al., 2017)

Health comprehension studies evidence

  • Maybe it is wiser – given levels of uncertainty – to expect some variation in results

Tip

What is your view?

  • Do we see robust prediction of accuracy of understanding of health information, given measures of vocabulary, health literacy, and reading strategy?

PSYC122 response data

  • What will we see in a new study: with your data?

Tip

Will we see the same or different patterns?

  • Do the practical work to find out

Let’s take a break

  • End of part 3

Kinds of reproducibility

Gilmore et al. (2017; following Goodman et al., 2016) present three kinds of reproducibility:

  • Methods reproducibility
  • Results reproducibility
  • Inferential reproducibility

Methods reproducibility

  • Other researchers should be able to get the same results if they use the analysis methods with the same data

Image taken from Hardwicke et al. (2021): Frequency of reproducibility outcomes by value type. The image shows a series of filled bars. The heights of the bar segments represent the frequencies with which reproducibility outcomes of different types were observed (match, minor discrepancy, major discrepancy) for each of the following kinds of reproducibility outcome: variation/uncertainty measures including standard deviations, standard errors and confidence intervals; effect sizes including Cohen’s d, Pearson’s r, partial eta squared and phi; test statistics including t, F and chi-squared; and central tendency measures including means and medians. Most bars indicate matches or minor discrepancies but there are small numbers of major discrepancies on p values, uncertainty measures and effect sizes.

Hardwicke et al. (2021): Frequency of reproducibility outcomes by value type.
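
In practice, a methods reproducibility check amounts to re-running shared code on shared data and comparing the outputs with those reported. A minimal sketch, using simulated data and a hypothetical analysis function run.analysis():

```r
# Shared data (simulated, hypothetical values)
set.seed(2017)
shared.data <- data.frame(HLVA = rnorm(50, mean = 8, sd = 2))
shared.data$mean.acc <- 0.5 + 0.03 * shared.data$HLVA + rnorm(50, mean = 0, sd = 0.1)

# The shared analysis code, wrapped in a function
run.analysis <- function(d) coef(lm(mean.acc ~ HLVA, data = d))

original <- run.analysis(shared.data)     # what the authors report
reproduced <- run.analysis(shared.data)   # what a checker gets

# Do the reproduced values match the original values?
all.equal(original, reproduced)
```

Real checks are harder than this, because the checker has to work from the shared files and the written report alone.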

Require data and code sharing

  • Analyses by Kidwell et al. (2016) and analyses reviewed by Nosek et al. (2022): study data increasingly available

Image taken from Nosek et al. (2022) figure 4. The image shows a series of line graphs. The graphs show yearly counts of users, sharing of files (research data, materials, code), and registration of studies on OSF and AsPredicted. The trends show steeply rising counts of new users, new public files and new project registrations.

Nosek et al. (2022): Yearly counts of users, sharing of files (research data, materials, code), and registration of studies on OSF and AsPredicted.

Why we use R: We can share data and code helpfully

It is great that data are shared, but analyses show that shared data are not always as readily usable as they should be (Towse et al., 2021)

  • completeness: are all the data and the data descriptors supporting a study’s findings publicly available?
  • reusability: how readily can the data be accessed and understood by others?

Why we use R: We can write self-documented code

# Here: fit a linear model
lm(mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE, ...)

Tip

  • R will ignore everything after #
  • Add a # comment after each step to briefly explain to yourself and others what is going on
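
To see commenting in action, here is a sketch of a short script in which every step carries a comment. The data are simulated so that the script runs on its own; the variable names follow the model above, but the values, the sample size, and the coding of NATIVE.LANGUAGE are assumptions.

```r
# Simulate a small data set (hypothetical values, for illustration only)
set.seed(122)
n <- 120
all.subjects <- data.frame(
  SHIPLEY = round(rnorm(n, mean = 33, sd = 4)),
  HLVA = round(rnorm(n, mean = 8, sd = 2)),
  FACTOR3 = round(rnorm(n, mean = 45, sd = 6)),
  AGE = round(runif(n, min = 18, max = 70)),
  NATIVE.LANGUAGE = sample(c("English", "Other"), n, replace = TRUE)
)

# Build the outcome from two predictors plus random noise
all.subjects$mean.acc <- 0.2 + 0.01 * all.subjects$SHIPLEY +
  0.02 * all.subjects$HLVA + rnorm(n, mean = 0, sd = 0.1)

# Fit a linear model predicting accuracy from all five variables
model <- lm(mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE,
            data = all.subjects)

# Inspect the estimated coefficients
summary(model)
```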

The R knowledge ecosystem

R is:

  • a language
  • a computing environment
  • a knowledge ecosystem

R is a language

We use:

  • functions like lm() in the same way we use verbs to describe doing things
  • arguments like (mean.acc ~ HLVA) in the same way we use nouns to identify who does what to whom

# Here: fit a linear model
lm(mean.acc ~ SHIPLEY + HLVA + FACTOR3 + AGE + NATIVE.LANGUAGE, ...)

R is a language

Like every language, we can often say the same thing using different words or accents

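# read_csv() comes from the {readr} package, so load that first
library(readr)
# This code does essentially the same job
data <- read_csv("mydata.csv")
# As this code, which uses base R's read.csv()
data <- read.csv("mydata.csv")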
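
The difference between the two functions can be checked directly. This sketch writes a tiny, made-up data set to a temporary file, then reads it back with base R's read.csv(); the read_csv() line is left as a comment because it needs the {readr} package to be installed. The object names are hypothetical.

```r
# Write a small hypothetical data set to a temporary .csv file
example.data <- data.frame(HLVA = c(7, 9, 10), mean.acc = c(0.70, 0.80, 0.85))
csv.path <- tempfile(fileext = ".csv")
write.csv(example.data, csv.path, row.names = FALSE)

# Base R: read.csv() returns a data.frame
base.version <- read.csv(csv.path)
class(base.version)

# {readr}: read_csv() returns a tibble holding the same values
# readr.version <- readr::read_csv(csv.path)
```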

Languages have dialects

R has four different ways to draw plots: base, {lattice}, {grid}, {ggplot2}

# Base R graphics histogram of HLVA scores
hist(all.subjects$HLVA)
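
For comparison, the same histogram could be drawn in two of the other dialects. The sketch assumes the {lattice} and {ggplot2} packages are installed, and it simulates a stand-in all.subjects data frame so that it runs on its own.

```r
library(lattice)
library(ggplot2)

# Stand-in data (simulated, hypothetical values)
set.seed(122)
all.subjects <- data.frame(HLVA = round(rnorm(100, mean = 8, sd = 2)))

# {lattice} histogram of HLVA scores
lattice.plot <- histogram(~ HLVA, data = all.subjects)

# {ggplot2} histogram of HLVA scores
ggplot.plot <- ggplot(all.subjects, aes(x = HLVA)) +
  geom_histogram(binwidth = 1)

# Draw both plots
print(lattice.plot)
print(ggplot.plot)
```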

The R knowledge ecosystem

Above all, R is free:

Photograph of a neon sign, letters in red saying 'Free speech fear free'

The R knowledge ecosystem

Tip

Every problem you ever have:

  • someone has had it before
  • solved it
  • and written a blog (or tweet or toot) or recorded a YouTube or TikTok about it

The R knowledge ecosystem

  • R is free open statistical software: everything you use is contributed, discussed and taught by a community of R users online, in open forums
  • Learning to navigate this knowledge is an introduction to the future of knowledge sharing

Picture shows a person with long hair boarding a train. They are holding a sign with the word 'revolution' and a rainbow painted on it.

flickr, Cesar Salvadeo ‘Revolution’

How to find things out when you know what you need

  • R has a built-in help system: typing
help(geom_histogram)
  • Gets you detailed technical information

… Examples ggplot(diamonds, aes(carat)) + geom_histogram() …

Tip

Start with the examples
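
Some other ways into the built-in help system, sketched with the base R function lm() so that no extra packages are needed:

```r
# Open the help page for a function (two equivalent forms)
help(lm)
?lm

# Run the examples from the bottom of the help page
example(lm)

# Search the installed help pages when you do not know the function name
help.search("linear model")
```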

How to find things out when you don’t know what you need

This code won’t work

all.subjects %>%
  ggplot(aes(x = AGE, y = mean.acc)) + 
  geom_histogram()
Error in `f()`:
! stat_bin() can only have an x or y aesthetic.
Backtrace:

How to find things out when you don’t know what you need

Tip

Just google it:

  • Copy warning or error messages from RStudio and paste them into a search engine

Copying an error message into google gets you a list of web pages

stat_bin() can only have an x or y aesthetic.

Image shows Google search results in response to entering error message: 'stat_bin() can only have an x or y aesthetic.' including Stack Overflow results

Stack Overflow lists question and answer discussions

How to find things out when you don’t know what you need

  • Stack Overflow pages identify:
  1. Questions asked
  2. Answers, often with code solutions to problems, and helpful discussions
  • With questions and answers ranked by usefulness
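
For the error shown earlier, an answer of this kind would point out that a histogram counts observations of one variable, so geom_histogram() takes an x (or a y) aesthetic but not both. A sketch of the repair, assuming {ggplot2} is installed and using a simulated stand-in for all.subjects:

```r
library(ggplot2)

# Stand-in data (simulated, hypothetical values)
set.seed(122)
all.subjects <- data.frame(
  AGE = round(runif(100, min = 18, max = 70)),
  mean.acc = runif(100, min = 0.25, max = 0.95)
)

# A histogram shows the distribution of one variable: map x only
age.hist <- ggplot(all.subjects, aes(x = AGE)) +
  geom_histogram(binwidth = 5)

# To relate two variables, use a scatterplot instead
acc.by.age <- ggplot(all.subjects, aes(x = AGE, y = mean.acc)) +
  geom_point()

# Draw both plots
print(age.hist)
print(acc.by.age)
```

Which repair is right depends on what you wanted to show: the distribution of ages, or how accuracy varies with age.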

The wider revolution in building and sharing knowledge

We can find many excellent free online books like:

https://r4ds.had.co.nz

Image shows the front page of the 'R for Data Science' web book

A worldwide community of knowledge sharing

Image shows the front page of the 'ggplot gallery' web pages

Screenshot of front page of ggplot gallery

Summary

  • In the health comprehension project: accuracy of understanding of health information can be predicted by vocabulary knowledge, health literacy and reading strategy
  • People can judge their own accuracy of understanding but not well

Tip

  • What do your PSYC122 response data say?

Summary: critical thinking

  • We can expect results – associations, effects, patterns – to vary between times, places, people
  • Samples will be limited, and measurement will be uncertain
  • Data analysis choices will vary between researchers
  • So we share data and code and critically evaluate results

Summary: grow in independence

  • Comment your code in your .R scripts to explain what you are doing and how
  • Use online information sources to understand choices and options

Tip

  • Someone has already solved your problem: you just need to find the blog/Stack Overflow discussion/TikTok where they explain the solution

End of week 19

References

Aarts, E., Dolan, C. V., Verhage, M., & Van der Sluis, S. (2015). Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives. BMC Neuroscience, 16(1), 1–15. https://doi.org/10.1186/s12868-015-0228-5
Bornstein, M. H., Jager, J., & Putnick, D. L. (2013). Sampling in developmental science: Situations, shortcomings, solutions, and standards. Developmental Review, 33(4), 357–370. https://doi.org/10.1016/j.dr.2013.08.003
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376.
Flake, J. K., & Fried, E. I. (2020). Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
Freed, E. M., Hamilton, S. T., & Long, D. L. (2017). Comprehension in proficient readers: The nature of individual variation. Journal of Memory and Language, 97, 135–153. https://doi.org/10.1016/j.jml.2017.07.008
Gelman, A. (2015). The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective. Journal of Management, 41(2), 632–643. https://doi.org/10.1177/0149206314525208
Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460–465. https://doi.org/10.1511/2014.111.460
Gilmore, R. O., Diaz, M. T., Wyble, B. A., & Yarkoni, T. (2017). Progress toward openness, transparency, and reproducibility in cognitive neuroscience. Annals of the New York Academy of Sciences, 1396, 5–18. https://doi.org/10.1111/nyas.13325
Goodman, S. N., Fanelli, D., & Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341).
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? The Behavioral and Brain Sciences, 33(2-3). https://doi.org/10.1017/S0140525X0999152X
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953
Kidwell, M. C., Lazarević, L. B., Baranski, E., Hardwicke, T. E., Piechowski, S., Falkenberg, L. S., Kennett, C., Slowik, A., Sonnleitner, C., Hess-Holden, C., Errington, T. M., Fiedler, S., & Nosek, B. A. (2016). Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology, 14(5), 1–15. https://doi.org/10.1371/journal.pbio.1002456
McElreath, R. (2020). Statistical rethinking. Chapman and Hall/CRC. https://doi.org/10.1201/9780429029608
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie Du Sert, N., Simonsohn, U., Wagenmakers, E. J., Ware, J. J., & Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 1–9. https://doi.org/10.1038/s41562-016-0021
Nosek, B. A., Beck, E. D., Campbell, L., Flake, J. K., Hardwicke, T. E., Mellor, D. T., van ’t Veer, A. E., & Vazire, S. (2019). Preregistration is hard, and worthwhile. Trends in Cognitive Sciences, 23(10), 815–818.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., Fidler, F., Hilgard, J., Kline Struhl, M., Nuijten, M. B., Rohrer, J. M., Romero, F., Scheel, A. M., Scherer, L. D., Schönbrodt, F. D., & Vazire, S. (2022). Replicability, Robustness, and Reproducibility in Psychological Science. Annual Review of Psychology, 73, 719–748. https://doi.org/10.1146/annurev-psych-020821-114157
Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192
Pashler, H., & Wagenmakers, E. J. (2012). Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528–530. https://doi.org/10.1177/1745691612465253
Silberzahn, R., & Uhlmann, E. L. (2015). Crowdsourced research: Many hands make tight work. Nature, 526(7572), 189–191. https://doi.org/10.1038/526189a
Towse, J. N., Ellis, D. A., & Towse, A. S. (2021). Opening Pandora’s Box: Peeking inside Psychology’s data sharing practices, and seven recommendations for change. Behavior Research Methods, 53(4), 1455–1468. https://doi.org/10.3758/s13428-020-01486-1
Wild, H., Kyröläinen, A.-J., & Kuperman, V. (2022). How representative are student convenience samples? A study of literacy and numeracy skills in 32 countries. PLOS ONE, 17(7), e0271191. https://doi.org/10.1371/journal.pone.0271191
Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, e1. https://doi.org/10.1017/S0140525X20001685