Research report – why?

Rob Davies

Department of Psychology, Lancaster University

PSYC411: Classes weeks 6-10

  • My name is Dr Rob Davies; I am an expert in communication, individual differences, and methods

Tip

Ask me anything:

  • ask questions during class on Slido, or just ask;
  • post all other questions on the discussion forum

PSYC411: Classes weeks 6-10

We are working together to develop skills in a series of classes

  1. Week 6 — Research report
  2. Week 7 — Associations, hypotheses
  3. Week 8 — Linear models: predictions
  4. Week 9 — Visualizing data
  5. Week 10 — Linear models: development

This week we focus on the research report, and especially on why we do it

  • Why: what is the motivation for the assignment?
  • Explaining this helps to make sense of what we are doing and what you can learn

How to work with the materials

  • Watch these lectures for an outline
  • Read the notes to get a detailed explanation
  • Follow references to sources to build expertise

What to do and how to do it: see notes, and discuss in class

Why this? Key ideas

  1. The difference between science “…being done, science in the making, and science already done, a finished product …” (Bourdieu, 2004, p. 2)
  2. The analyses reported in published papers are neither necessary nor sufficient, given the data or the question: other reasonable analyses are possible

Wider context: Credibility revolution

Science (including psychological science) has undergone a rolling series of crises:

Wider context: Credibility revolution

Figure 2 from the Nosek and colleagues (2022) paper on replicability, reproducibility, and robustness in psychological science. The figure shows five scatterplots, one for each study, from left to right: Soto (2019); Camerer (2018); Open Science Collaboration (2015); plus multisite replications; and Protzko (2020). Each plot shows variation in original effect size on the horizontal x axis and replication effect size on the vertical y axis. Points shown above the dashed line at y = 0 indicate that the replication effect size was larger than the original effect size. Solid circles indicate that replications were statistically significant in the same direction as the original study. Considering all replications (N = 307), 64% reported statistically significant evidence in the same direction, with effect sizes 68% as large as in the original studies. However, many points fall below zero, indicating effects that were smaller in replication than in the original studies.

Nosek et al. (2022): replication outcomes for three systematic replication studies

The triggers for crisis

The credibility revolution: responses

What is the motivation?

Opportunity to do original research work

  • Important elements of the hard work of making science work better are led by PhD students and junior researchers (e.g., Herndon et al., 2014)
  • Problems in the literature are identified through independent post-publication review work

Screenshot of an Excel spreadsheet from a 2013-04-16 blog post by Andrew Gelman. The image shows the spreadsheet with GDP growth in rows, by country, and different levels of debt-to-GDP ratio in columns. It shows that the calculated average growth rate by debt-to-GDP ratio depends on which cells are included in the calculation.

As a student, Herndon discovered that a famous claim – that countries with more debt have lower growth – was based on an error

Let’s take a break

  • End of Part 1

Multiverses: a fresh perspective

  • We introduce the idea that your analysis work will flow through the stages of a pipeline
  • From getting the data |> to presenting your findings
  • Because, next, we examine how pipelines can multiply

Data analysis: we get from raw data to presenting results in stages

Diagram: Get raw data → Tidy data → Explore, Visualize, Analyze (checking assumptions) → Present
Figure 1: The data analysis pipeline or workflow

The data analysis pipeline (workflow)

  • Get some data
  • Process or tidy the data
  • Explore, visualize, and analyze the data
  • Present or report your findings

Tip

  • Identify the elements and the order in your work as the parts of a pipeline or the stages in a workflow (a minimal sketch follows)
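To make the stages concrete, here is a minimal sketch of the pipeline as R code. The file name raw-data.csv and the columns id, condition, and score are hypothetical, chosen for illustration; they are not the data you will analyze.

```r
# A minimal sketch of the pipeline stages (hypothetical file and column names)
library(tidyverse)

raw <- read_csv("raw-data.csv")                 # 1. get the raw data

tidied <- raw |>                                # 2. process or tidy the data
  filter(!is.na(score)) |>
  mutate(condition = factor(condition))

ggplot(tidied, aes(x = condition, y = score)) + # 3. explore and visualize
  geom_boxplot()

model <- lm(score ~ condition, data = tidied)   # 3. analyze
summary(model)                                  # 4. present or report the findings
```

Each stage takes the output of the previous stage: that is exactly what the pipe |> expresses.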

The garden of forking paths

Diagram: a single starting point A forks into B1 and B2; B1 forks into C1, C2, and C3; B2 forks into C4, C5, and C6
Figure 2: Forking paths in data analysis

The garden of forking paths: pathways multiply at each choice

  • There can be multiple different paths from the same data to the results
  • The results may vary, depending on the path we take (see the sketch below)

Credibility and multiplicity

  • Uncertainty over data processing and over analysis methods, revealed in multiverse analyses (next)
  • Plus the limitations of data and code sharing, and the incompleteness of reports
  • These drive problems in replicating or reproducing claims

Multiple pathways and secret lives

Image taken from the Carp (2012) study of methods reporting and methodological choices across 241 recent fMRI articles. Many studies did not report critical methodological details with regard to experimental design, data acquisition, and analysis. The bar chart shows one bar for each of several methodological procedures in neuroimaging. The heights of the bars represent how often each procedure was reported. The most often reported procedure is visualization of results; very few studies reported power analyses.

Carp (2012): Proportions of studies that reported using each of 21 procedures for data collection and analysis.

Data processing is always necessary

  • When you collect or access data for a research study
  • The complete raw dataset you receive is almost never the complete dataset you analyze or whose analysis you report

Data multiverse analyses

  • The impact of dataset construction choices on analysis results
  • The same data are constructed in different ways, given reasonable alternative choices
  • The p-values for the same tests vary across the data versions

Image extracted from Steegen et al. (2016), Figure 1. Histograms show the frequency distribution of p-values for tests of the effect of the interaction between fertility and relationship status. The study authors constructed multiple different versions of a dataset, then completed the same statistical test of the interaction effect in each version. They did this for tests of effects on the outcomes religiosity, fiscal or social political attitudes, and voting or donation preferences. The histograms show that, for some outcomes, the distribution of p-values for tests of the interaction effect varied widely from low to high values. For two variables, religiosity in Study 1 (Panel A) and fiscal political attitudes (Panel C), the multiverse analysis reveals a near-uniform distribution. For the remaining four variables, roughly half of the choice combinations lead to a significant interaction effect.

Steegen et al. (2016), Figure 1

Data multiverse: lessons

  • We approach our study with the same research question and prediction
  • We begin with the exact same data
  • We could construct different datasets, depending on equally reasonable choices
  • As a result, we may see different results across the analyses of the different datasets (see the sketch below)
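Here is a minimal sketch of a data multiverse in the spirit of Steegen et al. (2016), again using the hypothetical tidied data frame (illustrative columns condition, score, and rt): we construct several versions of the dataset under equally reasonable choices, run the same test in each version, and collect the p-values.

```r
# A sketch of a data multiverse over hypothetical dataset construction choices
library(dplyr)
library(purrr)

# The choices that fork the dataset: an RT exclusion cut-off and whether to
# drop extreme scores
choices <- expand.grid(
  rt_cutoff     = c(0, 200, 300),
  drop_outliers = c(TRUE, FALSE)
)

run_one_universe <- function(rt_cutoff, drop_outliers) {
  data_version <- filter(tidied, rt > rt_cutoff)
  if (drop_outliers) {
    data_version <- filter(data_version,
                           abs(score - mean(score)) < 3 * sd(score))
  }
  # The same test in every universe: return the p-value for the condition
  # effect (the second coefficient row, assuming two conditions)
  model <- lm(score ~ condition, data = data_version)
  summary(model)$coefficients[2, "Pr(>|t|)"]
}

p_values <- pmap_dbl(choices, run_one_universe)
hist(p_values, main = "p-values across the data multiverse")
```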

Analysis multiverse

Different researchers may adopt different approaches (Silberzahn & Uhlmann, 2015)

Image taken from Silberzahn and Uhlmann (2015). The image shows a series of error bars, indicating point estimates and 95% confidence intervals. The points indicate estimates of the extent to which dark-skinned players are more likely to be given a red card than white players. Each point represents the estimate derived from the analysis done by one analysis team. Most but not all estimates are labelled as significant. The points are ordered, from left to right, by the estimated effect of skin colour. The estimates vary from equally likely to four times more likely.

Silberzahn and Uhlmann (2015): Twenty-nine research teams reached a wide variety of conclusions using different methods on the same data set to answer the same question (about football players’ skin colour and red cards).

Analysis multiverse

In multiverse analyses, we typically see variation in:

  • how psychological constructs are operationalized;
  • how data are processed or datasets constructed;
  • what statistical techniques are used;
  • how those techniques are used;
  • → with associated variation in results (see the sketch below)
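A minimal sketch of an analysis multiverse, using the same hypothetical tidied data frame plus an illustrative covariate column age: the same data and the same question, but two reasonable analysis specifications.

```r
# Two reasonable specifications for the same (hypothetical) data and question

# Specification 1: a simple two-group comparison (assuming two conditions)
spec_1 <- t.test(score ~ condition, data = tidied)

# Specification 2: a linear model adjusting for a covariate
spec_2 <- lm(score ~ condition + age, data = tidied)

# The estimates, and possibly the conclusions, can differ across specifications
spec_1$p.value
summary(spec_2)$coefficients
```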

Conclusions

If we see one analysis reported in a paper, that does not mean only one analysis could reasonably be applied

Tip

If you read the methods or results section of a paper, you should reflect:

  • What other analysis methods could be used here?
  • How could variation in analysis method — in what or how you do the analysis — influence the results?

Let’s take a break

  • End of Part 2

Kinds of reproducibility

Gilmore et al. (2017), following Goodman et al. (2016), present three kinds of reproducibility:

  • Methods reproducibility
  • Results reproducibility
  • Inferential reproducibility

We are mostly focused — in thinking about the research report — on methods and inferential reproducibility

Methods reproducibility

  • Other researchers should be able to get the same results if they use the same analysis methods with the same data (a minimal check is sketched below)
Image taken from Hardwicke et al. (2021): Frequency of reproducibility outcomes by value type. The image shows a series of filled bars. The heights of the bar segments represent the frequencies with which reproducibility outcomes of different types (match, minor discrepancy, major discrepancy) were observed for each of the following kinds of value: variation/uncertainty measures, including standard deviations, standard errors, and confidence intervals; effect sizes, including Cohen’s d, Pearson’s r, partial eta squared, and phi; test statistics, including t, F, and chi-squared; and central tendency measures, including means and medians. Most bars indicate matches or minor discrepancies, but there are small numbers of major discrepancies for p-values, uncertainty measures, and effect sizes.

Hardwicke et al. (2021): Frequency of reproducibility outcomes by value type.
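Here is a minimal sketch of what a methods-reproducibility check can look like. The file name study1-shared-data.csv, its columns group and score, and the reported t statistic are all hypothetical, for illustration only.

```r
# Recompute reported values from (hypothetically) shared data
library(readr)
library(dplyr)

shared <- read_csv("study1-shared-data.csv")

# Recompute the descriptive statistics reported in the paper
shared |>
  group_by(group) |>
  summarise(mean_score = mean(score, na.rm = TRUE))

# Recompute the reported test and compare with the reported t statistic
recomputed <- t.test(score ~ group, data = shared)
reported_t <- 2.45   # hypothetical value copied from the report
all.equal(unname(recomputed$statistic), reported_t, tolerance = 0.01)
```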

Inferential reproducibility

If researchers repeat a study (results reproducibility) or re-analyze the original data (methods reproducibility), then they should come to conclusions similar to those of the original authors

But …

  • a reproducibility attempt could reveal problems or uncertainty over choices
  • different researchers could apply different prior expectations over the probability of possible effects

The match between open science ideas and practices

The lessons learned from crises mean we now hope to see researchers:

  • Share data and code;
  • Publish research reports in ways that enable others to check or query analyses.
  • What do we see when we look at current practices?

Data and code sharing

  • Analyses by Kidwell et al. (2016), and analyses reviewed by Nosek et al. (2022), show that study data are increasingly made available
Image taken from Nosek et al. (2022), Figure 4. The image shows a series of line graphs. The graphs show yearly counts of users, sharing of files (research data, materials, code), and registration of studies on OSF and AsPredicted. The trends show steeply rising counts of new users, new public files, and new project registrations.

Nosek et al. (2022): Yearly counts of users, sharing of files (research data, materials, code), and registration of studies on OSF and AsPredicted.

Data and code sharing

It is great that data are shared, but analyses show that shared data are not always readily usable (Towse et al., 2021). Shared data should meet two standards:

  • completeness: are all the data and the data descriptors supporting a study’s findings publicly available?
  • reusability: how readily can the data be accessed and understood by others?

Enabling others to check or query analyses

Overall, the good news is that sharing is increasing.

Yet studies of reproducibility identify challenges:

Data challenges

  1. Data Availability Statements or open science badges may indicate that data are shared, but the data may not be directly accessible
  2. The data may be shared and accessible, but there may be missing or incorrect information about the data
  3. Original study authors may share both raw and processed data, or just processed data, or just raw data
  4. There may be mismatches between the variables referred to in the report and the variables named in the data file

Analysis challenges

  1. The description of the analysis procedure may be incomplete or ambiguous
  2. It may be difficult to identify the key analysis
  3. It is easier to reproduce results if both data and code are shared, but code is not always shared
  4. Sometimes, analysis code is shared but it is difficult to use
  5. Sometimes, there are errors in the analysis

Summary: why this?

  1. We are in the middle of a credibility revolution
  2. Focusing on data analysis, it is useful to think about the whole data pipeline
  3. At every stage of the data pipeline, there are choices: forking paths
  4. We can share data and enable others to check or query analysis choices

Tip

  • Learning to do data analysis requires learning good workflow practices

Opportunities

  • The conventions of normal scientific practice can work so that if we encounter an anomaly we blame ourselves (Kuhn, 1970)
  • Sometimes, if you think you have found an error or a problem in a published analysis or a shared dataset, remember: you may be right (Herndon et al., 2014)

What we learn: critical reading skills

  • Learn to critically read research reports and data documentation
  • Develop proficiency in analysis workflow
  • Develop critical skills in analysis

What we learn: good practice sense

  • The credibility revolution requires us to think about and to teach good open science practices that safeguard the value of evidence in psychology

What we learn: practical sense

  • People make mistakes
  • Different choices are often reasonable
  • We always need to check the evidence

What we learn: do better

  • Share data and code
  • Publish research reports in ways that enable others to check or query analyses

References

Aarts, E., Dolan, C. V., Verhage, M., & Van der Sluis, S. (2015). Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives. BMC Neuroscience, 16(1), 1–15. https://doi.org/10.1186/s12868-015-0228-5
Bourdieu, P. (2004). Science of Science and Reflexivity. Polity.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376.
Crüwell, S., Apthorp, D., Baker, B. J., Colling, L., Elson, M., Geiger, S. J., Lobentanzer, S., Monéger, J., Patterson, A., Schwarzkopf, D. S., Zaneva, M., & Brown, N. J. L. (n.d.). What’s in a badge? A computational reproducibility investigation of the open data badge policy in one issue of Psychological Science. https://doi.org/10.31234/osf.io/729qt
Flake, J. K., & Fried, E. I. (2020). Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460–465. https://doi.org/10.1511/2014.111.460
Gilmore, R. O., Diaz, M. T., Wyble, B. A., & Yarkoni, T. (2017). Progress toward openness, transparency, and reproducibility in cognitive neuroscience. Annals of the New York Academy of Sciences, 1396, 5–18. https://doi.org/10.1111/nyas.13325
Goodman, S. N., Fanelli, D., & Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341).
Hardwicke, T. E., Bohn, M., MacDonald, K., Hembacher, E., Nuijten, M. B., Peloquin, B. N., deMayo, B. E., Long, B., Yoon, E. J., & Frank, M. C. (2021). Analytic reproducibility in articles receiving open data badges at the journal Psychological Science: An observational study. Royal Society Open Science, 8(1), 201494. https://doi.org/10.1098/rsos.201494
Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., Hofelich Mohr, A., Clayton, E., Yoon, E. J., Henry Tessler, M., Lenne, R. L., Altman, S., Long, B., & Frank, M. C. (2018). Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5(8), 180448. https://doi.org/10.1098/rsos.180448
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? The Behavioral and Brain Sciences, 33(2-3). https://doi.org/10.1017/S0140525X0999152X
Herndon, T., Ash, M., & Pollin, R. (2014). Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge Journal of Economics, 38(2), 257–279. https://doi.org/10.1093/cje/bet075
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953
Kidwell, M. C., Lazarević, L. B., Baranski, E., Hardwicke, T. E., Piechowski, S., Falkenberg, L. S., Kennett, C., Slowik, A., Sonnleitner, C., Hess-Holden, C., Errington, T. M., Fiedler, S., & Nosek, B. A. (2016). Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology, 14(5), 1–15. https://doi.org/10.1371/journal.pbio.1002456
Kuhn, T. S. (1970). The structure of scientific revolutions (2nd ed., enlarged). University of Chicago Press.
Laurinavichyute, A., Yadav, H., & Vasishth, S. (2022). Share the code, not just the data: A case study of the reproducibility of articles published in the Journal of Memory and Language under the open data policy. Journal of Memory and Language, 125, 104332. https://doi.org/10.1016/j.jml.2022.104332
Minocher, R., Atmaca, S., Bavero, C., McElreath, R., & Beheim, B. (2021). Estimating the reproducibility of social learning research published between 1955 and 2018. Royal Society Open Science, 8(9), 210450. https://doi.org/10.1098/rsos.210450
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie Du Sert, N., Simonsohn, U., Wagenmakers, E. J., Ware, J. J., & Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 1–9. https://doi.org/10.1038/s41562-016-0021
Nosek, B. A., Beck, E. D., Campbell, L., Flake, J. K., Hardwicke, T. E., Mellor, D. T., van ’t Veer, A. E., & Vazire, S. (2019). Preregistration is hard, and worthwhile. Trends in Cognitive Sciences, 23(10), 815–818.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., Fidler, F., Hilgard, J., Kline Struhl, M., Nuijten, M. B., Rohrer, J. M., Romero, F., Scheel, A. M., Scherer, L. D., Schönbrodt, F. D., & Vazire, S. (2022). Replicability, Robustness, and Reproducibility in Psychological Science. Annual Review of Psychology, 73, 719–748. https://doi.org/10.1146/annurev-psych-020821-114157
Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192
Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A. (2020). Analysis of open data and computational reproducibility in registered reports in psychology. Advances in Methods and Practices in Psychological Science, 3(2), 229–237. https://doi.org/10.1177/2515245920918872
Pashler, H., & Wagenmakers, E. J. (2012). Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528–530. https://doi.org/10.1177/1745691612465253
Silberzahn, R., & Uhlmann, E. L. (2015). Crowdsourced research: Many hands make tight work. Nature, 526(7572), 189–191. https://doi.org/10.1038/526189a
Towse, J. N., Ellis, D. A., & Towse, A. S. (2021). Opening Pandora’s Box: Peeking inside Psychology’s data sharing practices, and seven recommendations for change. Behavior Research Methods, 53(4), 1455–1468. https://doi.org/10.3758/s13428-020-01486-1
Wild, H., Kyröläinen, A.-J., & Kuperman, V. (2022). How representative are student convenience samples? A study of literacy and numeracy skills in 32 countries. PLOS ONE, 17(7), e0271191. https://doi.org/10.1371/journal.pone.0271191
Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, e1. https://doi.org/10.1017/S0140525X20001685