What Makes Science Transparent? A Consensus-Based Checklist

This post is a synopsis of Aczel et al. (2019). A consensus-based transparency checklist. Nature Human Behaviour. Open Access: https://www.nature.com/articles/s41562-019-0772-6.
The associated Shiny app is at http://www.shinyapps.org/apps/

How can social scientists make their work more transparent? Sixty-three editors and open science advocates reached consensus on this topic and created a checklist to help authors document various transparency-related aspects of their work.

Preprint: BFpack — Flexible Bayes Factor Testing of Scientific Theories in R

This post is a synopsis of Mulder, J., Gu, X., Olsson-Collentine, A., Tomarken, A., Böing-Messing, F., Hoijtink, H., Meijerink, M., Williams, D. R., Menke, J., Fox, J.-P., Rosseel, Y., Wagenmakers, E.-J., & van Lissa, C. (2019). BFpack: Flexible Bayes factor testing of scientific theories in R. Preprint available at https://arxiv.org/pdf/1911.07728.pdf


“There has been a tremendous methodological development of Bayes factors for hypothesis testing in the social and behavioral sciences, and related fields. This development is due to the flexibility of the Bayes factor for testing multiple hypotheses simultaneously, the ability to test complex hypotheses involving equality as well as order constraints on the parameters of interest, and the interpretability of the outcome as the weight of evidence provided by the data in support of competing scientific theories. The available software tools for Bayesian hypothesis testing are still limited however. In this paper we present a new R-package called BFpack that contains functions for Bayes factor hypothesis testing for the many common testing problems. The software includes novel tools (i) for Bayesian exploratory testing (null vs positive vs negative effects), (ii) for Bayesian confirmatory testing (competing hypotheses with equality and/or order constraints), (iii) for common statistical analyses, such as linear regression, generalized linear models, (multivariate) analysis of (co)variance, correlation analysis, and random intercept models, (iv) using default priors, and (v) while allowing data to contain missing observations that are missing at random.”

Overview of BFpack Functionality

A Variety of BFpack Test Questions

Example Applications

The preprint discusses seven application examples and illustrates each with R code. The examples concern (1) the t-test; (2) a 2-way ANOVA; (3) a test of equality of variances; (4) linear regression (with missing data) in fMRI research; (5) logistic regression in forensic psychology; (6) measures of association in neuropsychology; and (7) intraclass correlation.
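BFpack itself is an R package, so the following is only a hedged, language-agnostic sketch of what its exploratory test (i) does: compare a null effect against a negative and a positive effect. The Python snippet below does this for a normal mean with known variance, using a point-null and two half-normal priors; all numbers (n = 50, sample mean 0.45, prior scale 1) are illustrative assumptions, not values from the preprint.

```python
import numpy as np
from scipy import stats, integrate

# Illustrative summary statistics (assumed, not taken from the preprint):
n, xbar, sigma = 50, 0.45, 1.0     # sample size, sample mean, known SD
se = sigma / np.sqrt(n)            # standard error of the mean

def marginal(prior_pdf, lo, hi):
    # m = integral of p(xbar | mu) * p(mu) over mu in [lo, hi]
    f = lambda mu: stats.norm.pdf(xbar, mu, se) * prior_pdf(mu)
    val, _ = integrate.quad(f, lo, hi)
    return val

tau = 1.0                                         # prior scale (assumption)
half = lambda mu: 2 * stats.norm.pdf(mu, 0, tau)  # half-normal density

m_null = stats.norm.pdf(xbar, 0, se)              # H0: mu = 0
m_neg  = marginal(half, -10, 0)                   # H-: mu < 0
m_pos  = marginal(half, 0, 10)                    # H+: mu > 0

post = np.array([m_null, m_neg, m_pos])
post /= post.sum()                                # equal prior model probabilities
print({h: round(p, 3) for h, p in zip(["null", "negative", "positive"], post)})
```

In BFpack itself, the analogous exploratory table is produced by applying its `BF()` function to a fitted R model object.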


About The Author

Eric-Jan Wagenmakers

Eric-Jan (EJ) Wagenmakers is a professor at the Psychological Methods Group at the University of Amsterdam.

Crowdsourcing Hypothesis Tests: The Bayesian Perspective

This post is a synopsis of the Bayesian work featured in Landy et al. (in press). Crowdsourcing hypothesis tests: Making transparent how design choices shape research results. Psychological Bulletin. Preprint available at https://osf.io/fgepx/; the 325-page supplement is available at https://osf.io/jm9zh/; the Bayesian analyses can be found on pp. 238-295.


“To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from two separate large samples (total N > 15,000) were then randomly assigned to complete one version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: materials from different teams rendered statistically significant effects in opposite directions for four out of five hypotheses, with the narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for two hypotheses, and a lack of support for three hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, while considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.”

Preprint: Practical Challenges and Methodological Flexibility in Prior Elicitation

This post is an extended synopsis of Stefan, A. M., Evans, N. J., & Wagenmakers, E.-J. (2019). Practical challenges and methodological flexibility in prior elicitation. Manuscript submitted for publication. Preprint available on PsyArXiv: https://psyarxiv.com/d42xb/



It is a well-known fact that Bayesian analyses require the specification of a prior distribution, and that different priors can lead to different quantitative, or even qualitative, conclusions. Because the prior distribution can be so influential, one of the most frequently asked questions about the Bayesian statistical framework is: How should I specify the prior distributions? Here, we take a closer look at prior elicitation — a subjective Bayesian method for specifying (informed) prior distributions based on expert knowledge — and examine the practical challenges researchers may face when implementing this approach for specifying their prior distributions. Specifically, our review of the literature suggests that there is a high degree of methodological flexibility within current prior elicitation techniques. This means that the results of a prior elicitation effort are not solely determined by the expert’s knowledge, but also heavily depend on the methodological decisions a researcher makes in the prior elicitation process. Thus, it appears that prior elicitation does not completely solve the issue of prior specification, but instead shifts influential decisions to a different level. We demonstrate the potential variability resulting from different methodological choices within the prior elicitation process in several examples, and make recommendations for how the variability in prior elicitation can be managed in future prior elicitation efforts.
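As a concrete illustration of this methodological flexibility (with made-up elicited quantiles, not an example from the preprint), the sketch below fits two different parametric families to the same two expert judgments. Both priors reproduce the elicited quantiles exactly, yet they imply beliefs about large effects that differ by roughly an order of magnitude.

```python
from scipy import stats

# Hypothetical expert statements: median effect 0.3, 90% quantile 0.6
median, q90 = 0.3, 0.6

# Methodological choice 1: fit a normal prior to the two quantiles
sigma = (q90 - median) / stats.norm.ppf(0.9)
normal_prior = stats.norm(loc=median, scale=sigma)

# Methodological choice 2: fit a Student-t prior (df = 3) to the same quantiles
scale = (q90 - median) / stats.t(df=3).ppf(0.9)
t_prior = stats.t(df=3, loc=median, scale=scale)

# Both priors match the elicited quantiles exactly ...
for prior in (normal_prior, t_prior):
    print(round(prior.ppf(0.5), 3), round(prior.ppf(0.9), 3))

# ... but assign very different probability to a large effect (> 1.0)
print(round(normal_prior.sf(1.0), 4), round(t_prior.sf(1.0), 4))
```

The expert's knowledge is identical in both cases; only the researcher's choice of parametric family differs, yet the resulting priors can lead to noticeably different Bayes factors in a subsequent analysis.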

A Breakdown of “Preregistration is Redundant, at Best”

In this sentence-by-sentence breakdown of the paper “Preregistration is Redundant, at Best”, I argue that preregistration is a pragmatic tool to combat biases that invalidate statistical inference. In a perfect world, strong theory sufficiently constrains the analysis process, and/or Bayesian robots can update beliefs based on fully reported data. In the real world, however, even astrophysicists require a firewall between the analyst and the data. Nevertheless, preregistration should not be glorified. Although I disagree with the title of the paper, I found myself agreeing with almost all of the authors’ main arguments.

How to Evaluate a Subjective Prior Objectively

The Misconception

Gelman and Hennig (2017, p. 989) argue that subjective priors cannot be evaluated by means of the data:

“However, priors in the subjectivist Bayesian conception are not open to falsification (…), because by definition they must be fixed before observation. Adjusting the prior after having observed the data to be analysed violates coherence. The Bayesian system as derived from axioms such as coherence (…) is designed to cover all aspects of learning from data, including model selection and rejection, but this requires that all potential later decisions are already incorporated in the prior, which itself is not interpreted as a testable statement about yet unknown observations. In particular this means that, once a coherent subjectivist Bayesian has assessed a set-up as exchangeable a priori, he or she cannot drop this assumption later, whatever the data are (think of observing 20 0s, then 20 1s, and then 10 further 0s in a binary experiment)”

A similar claim appears in the scholarly review by Consonni et al. (2018, p. 628): “The above view of “objectivity” presupposes that a model has a different theoretical status relative to the prior: it is the latter which encapsulates the subjective uncertainty of the researcher, while the model is less debatable, possibly because it can usually be tested through data.”

The Correction

Statistical models are a combination of likelihood and prior that together yield predictions for observed data (Box, 1980; Evans, 2015). The adequacy of these predictions can be rigorously assessed using Bayes factors (Wagenmakers, 2017; but see the blog post by Christian Robert, further discussed below). In order to evaluate the empirical success of a particular subjective prior distribution, we require multiple subjective Bayesians, or a single “schizophrenic” subjective Bayesian who is willing to entertain several different priors.
