Research report – why?

Rob Davies

Department of Psychology, Lancaster University

PSYC411: Classes weeks 6-10

  • My name is Dr Rob Davies; I am an expert in communication, individual differences, and methods

Tip

Ask me anything:

  • ask questions during class on Slido, or just ask;
  • post all other questions on the discussion forum

PSYC411: Classes weeks 6-10

We are working together to develop skills in a series of classes

  1. Week 6 — Research report
  2. Week 7 — Associations, hypotheses
  3. Week 8 — Linear models: predictions
  4. Week 9 — Visualizing data
  5. Week 10 — Linear models: development

This week we focus on the research report, and especially on why we do it

  • Why: what is the motivation for the assignment?
  • Explaining this helps to make sense of what we are doing and what you can learn

How to work with the materials

  • Watch these lectures for an outline
  • Read the notes to get a detailed explanation
  • Follow references to sources to build expertise

What to do and how to do it: see notes, and discuss in class

Why this? Key ideas

  1. The difference between science “…being done, science in the making, and science already done, a finished product …” (Bourdieu, 2004, p. 2)
  2. The analyses reported in published papers are neither necessary nor sufficient, given the data or the question: other reasonable analyses are possible

Wider context: Credibility revolution

Science (including psychological science) has undergone a rolling series of crises:

Wider context: Credibility revolution

Figure 2 from the Nosek and colleagues (2022) paper on replicability, reproducibility, and robustness in psychological science. The figure shows five scatterplots, one for each study, from left to right: Soto (2019); Camerer (2018); Open Science Collaboration (2015); plus multisite replications; and Protzko (2020). Each plot shows variation in original effect size on the horizontal x axis and replication effect size on the vertical y axis. Points shown above the dashed line at y = 0 indicate that the replication effect size was larger than the original effect size. Solid circles indicate that replications were statistically significant in the same direction as the original study. Considering all replications (N = 307), 64% reported statistically significant evidence in the same direction, with effect sizes 68% as large as in the original studies. However, many points fall below zero, indicating effects that were smaller in replication than in the original studies.

Nosek et al. (2022): replication outcomes for three systematic replication studies

The triggers for crisis

The credibility revolution: responses

What is the motivation?

Opportunity to do original research work

  • Important elements of the hard work of making science work better are led by PhD students and junior researchers (e.g., Herndon et al., 2014)
  • Problems in the literature are identified through independent post-publication review work

Screenshot of an Excel spreadsheet from a 2013-04-16 blog post by Andrew Gelman. The image shows the spreadsheet with GDP growth in rows, by country, and different levels of debt-to-GDP ratio in columns. It shows that the calculated average growth rate by debt-to-GDP ratio depends on which cells are included in the calculation.

As a student, Herndon discovered that a famous claim – that countries with more debt have lower growth – was based on an error

Let’s take a break

  • End of Part 1

Multiverses: a fresh perspective

  • We introduce the idea that your analysis work will flow through the stages of a pipeline
  • From getting the data |> to presenting your findings
  • Because, next, we examine how pipelines can multiply

Data analysis: we get from raw data to presenting results in stages

Diagram: Get raw data → Tidy data → Explore, Visualize, Analyze (checking assumptions) → Present
Figure 1: The data analysis pipeline or workflow

The data analysis pipeline (workflow)

  • Get some data
  • Process or tidy the data
  • Explore, visualize, and analyze the data
  • Present or report your findings

Tip

  • Identify the elements and the order in your work as the parts of a pipeline or the stages in a workflow (a minimal sketch follows)
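To make the stages concrete, here is a minimal sketch of the pipeline as R code. The file name raw-data.csv and the columns id, condition, and score are hypothetical, chosen for illustration; they are not the data you will analyze.

```r
# A minimal sketch of the pipeline stages (hypothetical file and column names)
library(tidyverse)

raw <- read_csv("raw-data.csv")                 # 1. get the raw data

tidied <- raw |>                                # 2. process or tidy the data
  filter(!is.na(score)) |>
  mutate(condition = factor(condition))

ggplot(tidied, aes(x = condition, y = score)) + # 3. explore and visualize
  geom_boxplot()

model <- lm(score ~ condition, data = tidied)   # 3. analyze
summary(model)                                  # 4. present or report the findings
```

Each stage takes the output of the previous stage: that is exactly what the pipe |> expresses.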

The garden of forking paths

Diagram: a single starting point A forks into B1 and B2; B1 forks into C1, C2, and C3; B2 forks into C4, C5, and C6
Figure 2: Forking paths in data analysis

The garden of forking paths: pathways multiply at each choice

  • There can be multiple different paths from the same data to the results
  • The results may vary, depending on the path we take (see the sketch below)

Credibility and multiplicity

  • Uncertainty over data processing and over analysis methods, revealed in multiverse analyses (next)
  • Plus the limitations of data and code sharing, and the incompleteness of reports
  • These drive problems in replicating or reproducing claims

Multiple pathways and secret lives

Image taken from the Carp (2012) study of methods reporting and methodological choices across 241 recent fMRI articles. Many studies did not report critical methodological details with regard to experimental design, data acquisition, and analysis. The bar chart shows one bar for each of several methodological procedures in neuroimaging. The heights of the bars represent how often each procedure was reported. The most often reported procedure is visualization of results; very few studies reported power analyses.

Carp (2012): Proportions of studies that reported using each of 21 procedures for data collection and analysis.

Data processing is always necessary

  • When you collect or access data for a research study
  • The complete raw dataset you receive is almost never the complete dataset you analyze or whose analysis you report

Data multiverse analyses

  • The impact of dataset construction choices on analysis results
  • The same data are constructed in different ways, given reasonable alternative choices
  • The p-values for the same tests vary across the data versions

Image extracted from Steegen et al. (2016), Figure 1. Histograms show the frequency distribution of p-values for tests of the effect of the interaction between fertility and relationship status. The study authors constructed multiple different versions of a dataset, then completed the same statistical test of the interaction effect in each version. They did this for tests of effects on the outcomes religiosity, fiscal or social political attitudes, and voting or donation preferences. The histograms show that, for some outcomes, the distribution of p-values for tests of the interaction effect varied widely from low to high values. For two variables, religiosity in Study 1 (Panel A) and fiscal political attitudes (Panel C), the multiverse analysis reveals a near-uniform distribution. For the remaining four variables, roughly half of the choice combinations lead to a significant interaction effect.

Steegen et al. (2016), Figure 1

Data multiverse: lessons

  • We approach our study with the same research question and prediction
  • We begin with the exact same data
  • We could construct different datasets, depending on equally reasonable choices
  • As a result, we may see different results across the analyses of the different datasets (see the sketch below)
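Here is a minimal sketch of a data multiverse in the spirit of Steegen et al. (2016), again using the hypothetical tidied data frame (illustrative columns condition, score, and rt): we construct several versions of the dataset under equally reasonable choices, run the same test in each version, and collect the p-values.

```r
# A sketch of a data multiverse over hypothetical dataset construction choices
library(dplyr)
library(purrr)

# The choices that fork the dataset: an RT exclusion cut-off and whether to
# drop extreme scores
choices <- expand.grid(
  rt_cutoff     = c(0, 200, 300),
  drop_outliers = c(TRUE, FALSE)
)

run_one_universe <- function(rt_cutoff, drop_outliers) {
  data_version <- filter(tidied, rt > rt_cutoff)
  if (drop_outliers) {
    data_version <- filter(data_version,
                           abs(score - mean(score)) < 3 * sd(score))
  }
  # The same test in every universe: return the p-value for the condition
  # effect (the second coefficient row, assuming two conditions)
  model <- lm(score ~ condition, data = data_version)
  summary(model)$coefficients[2, "Pr(>|t|)"]
}

p_values <- pmap_dbl(choices, run_one_universe)
hist(p_values, main = "p-values across the data multiverse")
```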

Analysis multiverse

Different researchers may adopt different approaches (Silberzahn & Uhlmann, 2015)

Image taken from Silberzahn and Uhlmann (2015). The image shows a series of error bars, indicating point estimates and 95% confidence intervals. The points indicate estimates of the extent to which dark-skinned players are more likely to be given a red card than white players. Each point represents the estimate derived from the analysis done by one analysis team. Most but not all estimates are labelled as significant. The points are ordered, from left to right, by the estimated effect of skin colour. The estimates vary from equally likely to four times more likely.

Silberzahn and Uhlmann (2015): Twenty-nine research teams reached a wide variety of conclusions using different methods on the same data set to answer the same question (about football players’ skin colour and red cards).

Analysis multiverse

In multiverse analyses, we typically see variation in:

  • how psychological constructs are operationalized;
  • how data are processed or datasets constructed;
  • what statistical techniques are used;
  • how those techniques are used;
  • → with associated variation in results (see the sketch below)
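A minimal sketch of an analysis multiverse, using the same hypothetical tidied data frame plus an illustrative covariate column age: the same data and the same question, but two reasonable analysis specifications.

```r
# Two reasonable specifications for the same (hypothetical) data and question

# Specification 1: a simple two-group comparison (assuming two conditions)
spec_1 <- t.test(score ~ condition, data = tidied)

# Specification 2: a linear model adjusting for a covariate
spec_2 <- lm(score ~ condition + age, data = tidied)

# The estimates, and possibly the conclusions, can differ across specifications
spec_1$p.value
summary(spec_2)$coefficients
```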

Conclusions

If we see one analysis reported in a paper, that does not mean only one analysis could reasonably be applied

Tip

If you read the methods or results section of a paper, you should reflect:

  • What other analysis methods could be used here?
  • How could variation in analysis method — in what or how you do the analysis — influence the results?

Let’s take a break

  • End of Part 2

Kinds of reproducibility

Gilmore et al. (2017), following Goodman et al. (2016), present three kinds of reproducibility:

  • Methods reproducibility
  • Results reproducibility
  • Inferential reproducibility

We are mostly focused — in thinking about the research report — on methods and inferential reproducibility

Methods reproducibility

  • Other researchers should be able to get the same results if they use the same analysis methods with the same data (a minimal check is sketched below)
Image taken from Hardwicke et al. (2021): Frequency of reproducibility outcomes by value type. The image shows a series of filled bars. The heights of the bar segments represent the frequencies with which reproducibility outcomes of different types (match, minor discrepancy, major discrepancy) were observed for each of the following kinds of value: variation/uncertainty measures, including standard deviations, standard errors, and confidence intervals; effect sizes, including Cohen’s d, Pearson’s r, partial eta squared, and phi; test statistics, including t, F, and chi-squared; and central tendency measures, including means and medians. Most bars indicate matches or minor discrepancies, but there are small numbers of major discrepancies for p-values, uncertainty measures, and effect sizes.

Hardwicke et al. (2021): Frequency of reproducibility outcomes by value type.
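Here is a minimal sketch of what a methods-reproducibility check can look like. The file name study1-shared-data.csv, its columns group and score, and the reported t statistic are all hypothetical, for illustration only.

```r
# Recompute reported values from (hypothetically) shared data
library(readr)
library(dplyr)

shared <- read_csv("study1-shared-data.csv")

# Recompute the descriptive statistics reported in the paper
shared |>
  group_by(group) |>
  summarise(mean_score = mean(score, na.rm = TRUE))

# Recompute the reported test and compare with the reported t statistic
recomputed <- t.test(score ~ group, data = shared)
reported_t <- 2.45   # hypothetical value copied from the report
all.equal(unname(recomputed$statistic), reported_t, tolerance = 0.01)
```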

Inferential reproducibility

If researchers repeat a study (results reproducibility) or re-analyze the original data (methods reproducibility), then they should come to conclusions similar to those of the original authors

But …

  • a reproducibility attempt could reveal problems or uncertainty over choices
  • different researchers could apply different prior expectations over the probability of possible effects

The match between open science ideas and practices

The lessons learned from crises mean we now hope to see researchers:

  • Share data and code;
  • Publish research reports in ways that enable others to check or query analyses.
  • What do we see when we look at current practices?

Data and code sharing

  • Analyses by Kidwell et al. (2016), and analyses reviewed by Nosek et al. (2022), show that study data are increasingly made available
Image taken from Nosek et al. (2022), Figure 4. The image shows a series of line graphs. The graphs show yearly counts of users, sharing of files (research data, materials, code), and registration of studies on OSF and AsPredicted. The trends show steeply rising counts of new users, new public files, and new project registrations.

Nosek et al. (2022): Yearly counts of users, sharing of files (research data, materials, code), and registration of studies on OSF and AsPredicted.

Data and code sharing

It is great that data are shared, but analyses show that shared data are not always readily usable (Towse et al., 2021). Shared data should meet two standards:

  • completeness: are all the data and the data descriptors supporting a study’s findings publicly available?
  • reusability: how readily can the data be accessed and understood by others?

Enabling others to check or query analyses

Overall, the good news is that sharing is increasing.

Yet studies of reproducibility identify challenges:

Data challenges

  1. Data Availability Statements or open science badges may indicate that data are shared, but the data may not be directly accessible
  2. The data may be shared and accessible, but there may be missing or incorrect information about the data
  3. Original study authors may share both raw and processed data, or just processed data, or just raw data
  4. There may be mismatches between the variables referred to in the report and the variables named in the data file

Analysis challenges

  1. The description of the analysis procedure may be incomplete or ambiguous
  2. It may be difficult to identify the key analysis
  3. It is easier to reproduce results if both data and code are shared, but code is not always shared
  4. Sometimes, analysis code is shared but it is difficult to use
  5. Sometimes, there are errors in the analysis

Summary: why this?

  1. We are in the middle of a credibility revolution
  2. Focusing on data analysis, it is useful to think about the whole data pipeline
  3. At every stage of the data pipeline, there are choices: forking paths
  4. We can share data and enable others to check or query analysis choices

Tip

  • Learning to do data analysis requires learning good workflow practices

Opportunities

  • The conventions of normal scientific practice can work so that if we encounter an anomaly we blame ourselves (Kuhn, 1970)
  • Sometimes, if you think you have found an error or a problem in a published analysis or a shared dataset, remember: you may be right (Herndon et al., 2014)

What we learn: critical reading skills

  • Learn to critically read research reports and data documentation
  • Develop proficiency in analysis workflow
  • Develop critical skills in analysis

What we learn: good practice sense

  • The credibility revolution requires us to think about and to teach good open science practices that safeguard the value of evidence in psychology

What we learn: practical sense

  • People make mistakes
  • Different choices are often reasonable
  • We always need to check the evidence

What we learn: do better

  • Share data and code
  • Publish research reports in ways that enable others to check or query analyses

References

Aarts, E., Dolan, C. V., Verhage, M., & Van der Sluis, S. (2015). Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives. BMC Neuroscience, 16(1), 1–15. https://doi.org/10.1186/s12868-015-0228-5
Bourdieu, P. (2004). Science of Science and Reflexivity. Polity.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376.
Crüwell, S., Apthorp, D., Baker, B. J., Colling, L., Elson, M., Geiger, S. J., Lobentanzer, S., Monéger, J., Patterson, A., Schwarzkopf, D. S., Zaneva, M., & Brown, N. J. L. (n.d.). What’s in a badge? A computational reproducibility investigation of the open data badge policy in one issue of Psychological Science. https://doi.org/10.31234/osf.io/729qt
Flake, J. K., & Fried, E. I. (2020). Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460–465. https://doi.org/10.1511/2014.111.460
Gilmore, R. O., Diaz, M. T., Wyble, B. A., & Yarkoni, T. (2017). Progress toward openness, transparency, and reproducibility in cognitive neuroscience. Annals of the New York Academy of Sciences, 1396, 5–18. https://doi.org/10.1111/nyas.13325
Goodman, S. N., Fanelli, D., & Ioannidis, J. P. A. (2016). What does research reproducibility mean? Science Translational Medicine, 8(341).
Hardwicke, T. E., Bohn, M., MacDonald, K., Hembacher, E., Nuijten, M. B., Peloquin, B. N., deMayo, B. E., Long, B., Yoon, E. J., & Frank, M. C. (2021). Analytic reproducibility in articles receiving open data badges at the journal Psychological Science: An observational study. Royal Society Open Science, 8(1), 201494. https://doi.org/10.1098/rsos.201494
Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., Hofelich Mohr, A., Clayton, E., Yoon, E. J., Henry Tessler, M., Lenne, R. L., Altman, S., Long, B., & Frank, M. C. (2018). Data availability, reusability, and analytic reproducibility: evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5(8), 180448. https://doi.org/10.1098/rsos.180448
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? The Behavioral and Brain Sciences, 33(2-3). https://doi.org/10.1017/S0140525X0999152X
Herndon, T., Ash, M., & Pollin, R. (2014). Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge Journal of Economics, 38(2), 257–279. https://doi.org/10.1093/cje/bet075
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. https://doi.org/10.1177/0956797611430953
Kidwell, M. C., Lazarević, L. B., Baranski, E., Hardwicke, T. E., Piechowski, S., Falkenberg, L. S., Kennett, C., Slowik, A., Sonnleitner, C., Hess-Holden, C., Errington, T. M., Fiedler, S., & Nosek, B. A. (2016). Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology, 14(5), 1–15. https://doi.org/10.1371/journal.pbio.1002456
Kuhn, T. S. (1970). The structure of scientific revolutions (2nd ed., enlarged). University of Chicago Press.
Laurinavichyute, A., Yadav, H., & Vasishth, S. (2022). Share the code, not just the data: A case study of the reproducibility of articles published in the Journal of Memory and Language under the open data policy. Journal of Memory and Language, 125, 104332. https://doi.org/10.1016/j.jml.2022.104332
Minocher, R., Atmaca, S., Bavero, C., McElreath, R., & Beheim, B. (2021). Estimating the reproducibility of social learning research published between 1955 and 2018. Royal Society Open Science, 8(9), 210450. https://doi.org/10.1098/rsos.210450
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie Du Sert, N., Simonsohn, U., Wagenmakers, E. J., Ware, J. J., & Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 1–9. https://doi.org/10.1038/s41562-016-0021
Nosek, B. A., Beck, E. D., Campbell, L., Flake, J. K., Hardwicke, T. E., Mellor, D. T., van ’t Veer, A. E., & Vazire, S. (2019). Preregistration is hard, and worthwhile. Trends in Cognitive Sciences, 23(10), 815–818.
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., Fidler, F., Hilgard, J., Kline Struhl, M., Nuijten, M. B., Rohrer, J. M., Romero, F., Scheel, A. M., Scherer, L. D., Schönbrodt, F. D., & Vazire, S. (2022). Replicability, Robustness, and Reproducibility in Psychological Science. Annual Review of Psychology, 73, 719–748. https://doi.org/10.1146/annurev-psych-020821-114157
Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141. https://doi.org/10.1027/1864-9335/a000192
Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A. (2020). Analysis of open data and computational reproducibility in registered reports in psychology. Advances in Methods and Practices in Psychological Science, 3(2), 229–237. https://doi.org/10.1177/2515245920918872
Pashler, H., & Wagenmakers, E. J. (2012). Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528–530. https://doi.org/10.1177/1745691612465253
Silberzahn, R., & Uhlmann, E. L. (2015). Crowdsourced research: Many hands make tight work. Nature, 526(7572), 189–191. https://doi.org/10.1038/526189a
Towse, J. N., Ellis, D. A., & Towse, A. S. (2021). Opening Pandora’s Box: Peeking inside Psychology’s data sharing practices, and seven recommendations for change. Behavior Research Methods, 53(4), 1455–1468. https://doi.org/10.3758/s13428-020-01486-1
Wild, H., Kyröläinen, A.-J., & Kuperman, V. (2022). How representative are student convenience samples? A study of literacy and numeracy skills in 32 countries. PLOS ONE, 17(7), e0271191. https://doi.org/10.1371/journal.pone.0271191
Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, e1. https://doi.org/10.1017/S0140525X20001685