Powered by JASP

Concerns About the Default Cauchy Are Often Exaggerated: A Demonstration with JASP 0.12

Contrary to most of the published literature, the impact of the Cauchy prior width on the t-test Bayes factor is seen to be surprisingly modest. Removing the most extreme 50% of the prior mass can at best double the Bayes factor against the null hypothesis, the same impact as conducting a one-sided instead of a two-sided test. We demonstrate this with the help of the “Equivalence T-Test” module, which was added in JASP 0.12.
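The "at best double" bound can be checked numerically. The sketch below is not the JASP computation: it assumes a normal likelihood for an observed effect size (rather than the t likelihood), and the observed effect `d_obs` and standard error `se` are made-up values for illustration. It uses the fact that the central 50% of a zero-centred Cauchy prior lies between minus and plus its scale, so both the one-sided prior and the truncated prior are the original prior restricted to half of its mass and renormalised.

```python
import math

def cauchy_pdf(delta, scale):
    """Density of a zero-centred Cauchy prior on the effect size delta."""
    return 1.0 / (math.pi * scale * (1.0 + (delta / scale) ** 2))

def cauchy_cdf(delta, scale):
    """Cumulative distribution function of the zero-centred Cauchy prior."""
    return 0.5 + math.atan(delta / scale) / math.pi

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def bayes_factor(d_obs, se, scale, support=(-math.inf, math.inf), n=20000):
    """BF for H1 (Cauchy prior restricted to `support`) against H0: delta = 0.

    Assumption: normal likelihood d_obs ~ N(delta, se^2), integrated with the
    trapezoid rule over the part of the support within 10 se of d_obs.
    """
    a, b = support
    prior_mass = cauchy_cdf(b, scale) - cauchy_cdf(a, scale)
    lo = max(a, d_obs - 10.0 * se)
    hi = min(b, d_obs + 10.0 * se)
    if hi <= lo:
        return 0.0  # likelihood is negligible on the whole support

    def integrand(delta):
        return normal_pdf(d_obs, delta, se) * cauchy_pdf(delta, scale)

    h = (hi - lo) / n
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, n))
    marginal_h1 = total * h / prior_mass  # renormalise the restricted prior
    return marginal_h1 / normal_pdf(d_obs, 0.0, se)

# Hypothetical data summary; 0.707 is the usual default Cauchy scale.
d_obs, se, scale = 0.5, 0.1, 0.707

bf_two_sided = bayes_factor(d_obs, se, scale)
bf_one_sided = bayes_factor(d_obs, se, scale, support=(0.0, math.inf))
bf_truncated = bayes_factor(d_obs, se, scale, support=(-scale, scale))

print("one-sided / two-sided:", bf_one_sided / bf_two_sided)
print("truncated / two-sided:", bf_truncated / bf_two_sided)
```

Both ratios stay at or below 2 because restricting the prior to half of its mass doubles the prior density on the retained region, while the integral of the likelihood over that region cannot exceed the integral over the full range.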

We recently revised a comment on a scholarly article by Jorge Tendeiro and Henk Kiers (henceforth TK). Before getting to the main topic of this post, here is the abstract:

Tendeiro and Kiers (2019) provide a detailed and scholarly critique of Null Hypothesis Bayesian Testing (NHBT) and its central component –the Bayes factor– that allows researchers to update knowledge and quantify statistical evidence. Tendeiro and Kiers conclude that NHBT constitutes an improvement over frequentist p-values, but primarily elaborate on a list of eleven ‘issues’ of NHBT. We believe that several issues identified by Tendeiro and Kiers are of central importance for elucidating the complementary roles of hypothesis testing versus parameter estimation and for appreciating the virtue of statistical thinking over conducting statistical rituals. But although we agree with many of their thoughtful recommendations, we believe that Tendeiro and Kiers are overly pessimistic, and that several of their ‘issues’ with NHBT may in fact be conceived as pronounced advantages. We illustrate our arguments with simple, concrete examples and end with a critical discussion of one of the recommendations by Tendeiro and Kiers, which is that “estimation of the full posterior distribution offers a more complete picture” than a Bayes factor hypothesis test.


A Primer on Bayesian Model-Averaged Meta-Analysis

This post is an extended synopsis of a preprint that is available on PsyArXiv: https://psyarxiv.com/97qup/


Meta-analysis is the predominant approach for quantitatively synthesizing a set of studies. If the studies themselves are of high quality, meta-analysis can provide valuable insights into the current scientific state of knowledge about a particular phenomenon. In psychological science, the most common approach is to conduct frequentist meta-analysis. In this primer, we discuss an alternative method, Bayesian model-averaged meta-analysis. This procedure combines the results of four Bayesian meta-analysis models: (1) fixed-effect null hypothesis, (2) fixed-effect alternative hypothesis, (3) random-effects null hypothesis, and (4) random-effects alternative hypothesis. These models are combined according to their plausibilities in light of the observed data to address the two key questions “Is the overall effect non-zero?” and “Is there between-study variability in effect size?”. Bayesian model-averaged meta-analysis therefore avoids the need to select either a fixed-effect or random-effects model and instead takes into account model uncertainty in a principled manner.
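To make the combination step concrete, here is a minimal sketch of how four models can be weighted by their posterior plausibility. The log marginal likelihoods are invented numbers, the function names are hypothetical, and equal prior model probabilities are an assumption of this sketch; it is not the implementation used in the preprint or in JASP.

```python
import math

def posterior_model_probs(log_marglik, prior_probs):
    """Posterior model probabilities from log marginal likelihoods."""
    log_post = {m: log_marglik[m] + math.log(prior_probs[m]) for m in log_marglik}
    largest = max(log_post.values())  # subtract the maximum for numerical stability
    norm = sum(math.exp(v - largest) for v in log_post.values())
    return {m: math.exp(v - largest) / norm for m, v in log_post.items()}

def inclusion_bayes_factor(post, prior, included):
    """Change from prior to posterior odds for a set of models."""
    post_in = sum(post[m] for m in included)
    prior_in = sum(prior[m] for m in included)
    return (post_in / (1.0 - post_in)) / (prior_in / (1.0 - prior_in))

# Hypothetical log marginal likelihoods for the four models.
log_marglik = {
    "fixed_H0": -12.0,
    "fixed_H1": -10.0,
    "random_H0": -11.5,
    "random_H1": -9.0,
}
prior = {m: 0.25 for m in log_marglik}  # assumption: equal prior probabilities

post = posterior_model_probs(log_marglik, prior)

# "Is the overall effect non-zero?": the H1 models versus the H0 models.
bf_effect = inclusion_bayes_factor(post, prior, ["fixed_H1", "random_H1"])
# "Is there between-study variability?": random-effects versus fixed-effect models.
bf_heterogeneity = inclusion_bayes_factor(post, prior, ["random_H0", "random_H1"])

print(post)
print("inclusion BF, effect:", bf_effect)
print("inclusion BF, heterogeneity:", bf_heterogeneity)
```

Each of the two key questions is answered by pooling the two models that include the component of interest and comparing them with the two models that exclude it, so neither question requires committing to a single model.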

Omit Needless Words: An Unapproachable Example of Conciseness Related by the Traveling Chinese Story-teller Kai Lung

As mentioned in an earlier post, the epigraphs in Harold Jeffreys’s 1935 geophysics book “Earthquakes and mountains” prompted me to read “The Wallet of Kai Lung”, a collection of short stories by Ernest Bramah Smith (1868-1942). In one of the stories, “The confession of Kai Lung”, the traveling Chinese story-teller Kai Lung relates the following autobiographical tale, “an unapproachable example of conciseness”:

Preprint: A Tutorial on Bayesian Multi-Model Linear Regression with BAS and JASP

This post is a teaser for van den Bergh, D., Clyde, M. A., Raj, A., de Jong, T., Gronau, Q. F., Marsman, M., Ly, A., and Wagenmakers, E.-J. (2020). A Tutorial on Bayesian Multi-Model Linear Regression with BAS and JASP. Preprint available on PsyArXiv: https://psyarxiv.com/pqju6/


Linear regression analyses commonly involve two consecutive stages of statistical inquiry. In the first stage, a single ‘best’ model is defined by a specific selection of relevant predictors; in the second stage, the regression coefficients of the winning model are used for prediction and for inference concerning the importance of the predictors. However, such second-stage inference ignores the model uncertainty from the first stage, resulting in overconfident parameter estimates that generalize poorly. These drawbacks can be overcome by model averaging, a technique that retains all models for inference, weighting each model’s contribution by its posterior probability. Although conceptually straightforward, model averaging is rarely used in applied research, possibly due to the lack of easily accessible software. To bridge the gap between theory and practice, we provide a tutorial on linear regression using Bayesian model averaging in JASP, based on the BAS package in R. Firstly, we provide theoretical background on linear regression, Bayesian inference, and Bayesian model averaging. Secondly, we demonstrate the method on an example data set from the World Happiness Report. Lastly, we discuss limitations of model averaging and directions for dealing with violations of model assumptions.
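The weighting step at the heart of model averaging can be sketched in a few lines. Everything below is illustrative: the predictor names, posterior model probabilities, and coefficients are invented, and the code does not call the BAS package or reproduce its interface. A predictor that is absent from a model contributes a coefficient of zero for that model.

```python
# Each hypothetical model lists the coefficients of the predictors it includes
# together with its posterior model probability (the probabilities sum to 1).
models = [
    {"coefs": {}, "post_prob": 0.05},                       # null model
    {"coefs": {"a": 0.50}, "post_prob": 0.45},              # predictor a only
    {"coefs": {"b": 0.20}, "post_prob": 0.10},              # predictor b only
    {"coefs": {"a": 0.40, "b": 0.10}, "post_prob": 0.40},   # both predictors
]

def inclusion_probability(models, predictor):
    """Summed posterior probability of all models that contain the predictor."""
    return sum(m["post_prob"] for m in models if predictor in m["coefs"])

def averaged_coefficient(models, predictor):
    """Coefficient weighted by posterior model probability (zero when excluded)."""
    return sum(m["post_prob"] * m["coefs"].get(predictor, 0.0) for m in models)

for name in ("a", "b"):
    print(name,
          "inclusion probability:", inclusion_probability(models, name),
          "model-averaged coefficient:", averaged_coefficient(models, name))
```

Because every model stays in play, the averaged coefficient is shrunk toward zero in proportion to the posterior probability of the models that leave the predictor out, which is how the uncertainty from the selection stage carries through to the estimates.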

Preprint: A Bayesian Multiverse Analysis of Many Labs 4

Below is a summary of a preprint featuring an extensive reanalysis of the results of the Many Labs 4 project (current preprint). ML4 attempted to replicate the mortality salience effect. Following the publication of the preprint, a heated debate broke out about data inclusion criteria. In an attempt at conciliation, we decided to reanalyze the data using all proposed data inclusion criteria in a multiverse analysis. The figure below shows the results of this analysis.


Many Labs projects have become the gold standard for assessing the replicability of key findings in psychological science. The Many Labs 4 project recently failed to replicate the mortality salience effect, in which being reminded of one’s own death strengthens one’s own cultural identity. Here, we provide a Bayesian reanalysis of Many Labs 4 using meta-analytic and hierarchical modeling approaches and model comparison with Bayes factors. In a multiverse analysis we assess the robustness of the results with varying data inclusion criteria and prior settings. Bayesian model comparison results largely converge to a common conclusion: We find evidence against a mortality salience effect across the majority of our analyses. Even when ignoring the Bayesian model comparison results, we estimate overall effect sizes so small (between d = 0.03 and d = 0.18) that they render the entire field of mortality salience studies uninformative.

On the Beauty of Publishing an Ugly Registered Report

I was exhausted and expecting my newborn to wake up any moment, but I wanted to look at the data. I had stopped data collection a month prior, and wasn’t due back at work for weeks, so it could have waited, but my academic brain was beginning to stir after what seemed like eons of pregnancy leave. Sneaking a peek at my still sleeping daughter, I downloaded the .csv from Qualtrics. I made columns for the independent variables, splitting the 6 conditions in half, and then fed the data into JASP. I had run the Bayesian ANOVA in JASP before, for the pilot study, and used the program for years before that, so I knew the interface by heart. I had my results, complete with a plot, within seconds.

The output wasn’t what I had expected or hoped for. It certainly wasn’t what our pilot had predicted. The inclusion Bayes factors were hovering around 1, and the plot, with its huge error bars and strangely oriented lines, was all wrong. Maybe I’d made a mistake. I had been in a rush after all, I reasoned, and could have easily mixed up the conditions. Several checks later, I was looking at the same wrong results through tear-filled eyes.

From the beginning, I had believed so completely in the effect we were attempting to capture. I thought it was a given that people would find the results of a registered report (RR) more trustworthy than those of a preregistration (PR), and that the PR results would be yet more trustworthy than those published ‘traditionally’ with no registration at all. Adding a layer of complexity to the design, we had considered familiarity for each level of registration. We expected that results reported by a familiar colleague would be more trustworthy than those of an unfamiliar person. Logical hypotheses, right? To me they were.
