
Workshop “Design and Analysis of Replication Studies”, January 23-24

The Center for Reproducible Science (CRS) in Zurich opens the new year by organizing the workshop “Design and Analysis of Replication Studies”. The goal of this workshop is to have “a thorough methodological discussion regarding the design and the analysis of replication studies including specialists from different fields such as clinical research, psychology, economics and others.”

I quite look forward to attending this workshop. The speakers include a former PhD student (Don van Ravenzwaaij), current collaborators (some of whom I’ve never met in person), and a stray statistician who is intelligent, knowledgeable, and nonetheless explicitly un-Bayesian; in other words, a complete and utter enigma. Also, this workshop forced me to consider again the Bayesian perspective on quantifying replication success. Previously, in work with Josine Verhagen and Alexander Ly, we had promoted the “replication Bayes factor”, in which the posterior distribution from the original study is used as the prior distribution for testing the effect in the replication study. However, this setup can be generalized considerably, as indicated in my workshop abstract below:

Poisson Regression in Labor Law

R code for the reported analyses is available at https://osf.io/sfam7/.

My wife Nataschja teaches labor law at Utrecht University. For one of her papers she needed to evaluate the claim that “over the past 35 years, the number of applications processed by the AAC (Advice and Arbitration Committee) has decreased”. After collecting the relevant data Nataschja asked me whether I could help her out with a statistical analysis. Before diving in, below are the raw data and the associated histogram:


Gegevens <- data.frame(
  Jaar      = seq(from = 1985, to = 2019),
  Aanvragen = c(6, 3, 4, 3, 6, 3, 2, 4, 0, 2, 3, 1, 3, 3, 2, 7, 0, 1, 2, 4, 2, 1)
              # ... the remaining 13 values are truncated in this excerpt;
              # the complete data and code are at https://osf.io/sfam7/
)
# NB: “Gegevens” means “Data”, “Jaar” means “Year”, and “Aanvragen” means “Applications”


[Histogram of the annual counts.] NB. The y-axis label “Aantal behandelde aanvragen” means “number of processed applications”.

Based on a visual inspection, most people would probably conclude that the number of processed applications has indeed decreased over the years, although that decrease is driven mainly by the relatively high counts in the first five years (more on this later).

Below I will describe the analyses that I conducted without the benefit of knowing a lot about the subject area. Indeed, I also didn’t know much about the analysis itself. In experimental psychology, the methodologist feeds on a steady diet: a t-test for breakfast, a correlation for lunch, and an ANOVA for dinner, interrupted by the occasional snack of a contingency table. After some thought, I felt that this data set cried out for Poisson regression — the dependent variable is a count, and “year” is the predictor of interest. By testing whether we need the predictor “year”, we can more or less answer Nataschja’s question directly. Poisson regression has not yet been added to JASP, and this is why I am presenting R code here (the complete code is at https://osf.io/sfam7/).
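The basic shape of such an analysis in R is a single call to `glm` with a Poisson link. The sketch below uses simulated counts, not the real AAC data (those, and the actual analyses, are at https://osf.io/sfam7/); the variable names mirror the data frame above, and the declining `lambda` is an illustrative assumption:

```r
# Illustrative sketch with simulated data -- NOT the real AAC counts.
set.seed(1)
Jaar      <- 1985:2019
Aanvragen <- rpois(length(Jaar), lambda = exp(1.2 - 0.02 * (Jaar - 1985)))

# Poisson regression: does "Jaar" predict the annual count?
fit  <- glm(Aanvragen ~ Jaar, family = poisson)
fit0 <- glm(Aanvragen ~ 1,    family = poisson)  # intercept-only null model

summary(fit)$coefficients          # a negative "Jaar" coefficient indicates a decrease
anova(fit0, fit, test = "Chisq")   # likelihood-ratio test: is "Jaar" needed?
```

Comparing the full model against the intercept-only model is the frequentist analogue of the question asked above; the OSF code also contains the Bayesian version.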

Unpacking the Disagreement: Guest Post by Donkin and Szollosi

This post is a response to the previous post A Breakdown of “Preregistration is Redundant, at Best”.

We were delighted to see how interested people were in the short paper we wrote on preregistration with our co-authors (now published at Trends in Cognitive Sciences – the revised version of which has been uploaded). First, a note on the original title. As EJ correctly reconstructed in his review, we initially gave the provocative title “Preregistration is redundant, at best” in an effort to push back against the current idolizing attitude towards preregistration. What we meant by redundancy was simply that preregistration is not diagnostic of good science (we tried to bring out this point more clearly in the revision, now titled “Is preregistration worthwhile?”). Many correctly noted that this can be said of any one method of science. Our argument is that we should not promote and reward any one method, but rather good arguments and good theory (or, rather, acts that move us in the direction of good theory).

Based on EJ’s post, it seems that we agree in many ways with proponents of preregistration (e.g., that there’s room and need for improvement in the behavioral and social sciences). However, there remains much we disagree on. In the following we try to (start to) articulate some of the points of disagreement in order to identify why we, ultimately, reach such different conclusions.

The Support Interval

This post summarizes Wagenmakers, E.-J., Gronau, Q. F., Dablander, F., & Etz, A. (in press). The support interval. Erkenntnis. Preprint available on PsyArXiv: https://psyarxiv.com/zwnxb/


A frequentist confidence interval can be constructed by inverting a hypothesis test, such that the interval contains only parameter values that would not have been rejected by the test. We show how a similar definition can be employed to construct a Bayesian support interval. Consistent with Carnap’s theory of corroboration, the support interval contains only parameter values that receive at least some minimum amount of support from the data. The support interval is not subject to Lindley’s paradox and provides an evidence-based perspective on inference that differs from the belief-based perspective that forms the basis of the standard Bayesian credible interval.
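In symbols, a sketch of the idea (with $k$ denoting the minimum level of support, an assumption based on the abstract rather than the paper’s exact notation): the support interval collects those parameter values whose credibility the data have increased by at least a factor $k$,

```latex
\mathrm{SI}(k) \;=\; \left\{ \theta \,:\, \frac{p(\theta \mid \mathrm{data})}{p(\theta)} \geq k \right\},
```

where the ratio of posterior to prior density is the update factor that quantifies the data’s support for $\theta$.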

Preprint: A Cautionary Note on Estimating Effect Size

This post is a teaser for van den Bergh, D., Haaf, J. M., Ly, A., Rouder, J. N., & Wagenmakers, E.-J. (2019). A cautionary note on estimating effect size. Preprint available on PsyArXiv: https://psyarxiv.com/h6pr8/



“An increasingly popular approach to statistical inference is to focus on the estimation of effect size while ignoring the null hypothesis that the effect is absent. We demonstrate how this common “null hypothesis neglect” may result in effect size estimates that are overly optimistic. The overestimation can be avoided by incorporating the plausibility of the null hypothesis into the estimation process through a “spike-and-slab model”.”
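A sketch of what a spike-and-slab setup implies for estimation (notation is mine, not the preprint’s): the “spike” is a point mass at an effect size of zero under $\mathcal{H}_0$, the “slab” a continuous prior under $\mathcal{H}_1$, and the model-averaged posterior for the effect size $\delta$ weights the two by their posterior model probabilities,

```latex
p(\delta \mid y) \;=\; p(\mathcal{H}_0 \mid y)\,\pi_0(\delta) \;+\; p(\mathcal{H}_1 \mid y)\,p(\delta \mid y, \mathcal{H}_1),
```

where $\pi_0$ denotes the point mass at $\delta = 0$. Because the spike retains posterior mass whenever the null is plausible, the averaged estimate is shrunk toward zero relative to an estimate that conditions on $\mathcal{H}_1$ alone.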


Compensatory Control and Religious Beliefs: A Registered Replication Report Across Two Countries

This post is an extended synopsis of Hoogeveen, S., Wagenmakers, E.-J., Kay, A. C., & van Elk, M. (in press). Compensatory Control and Religious Beliefs: A Registered Replication Report Across Two Countries. Comprehensive Results in Social Psychology. https://doi.org/10.1080/



Compensatory Control Theory (CCT) suggests that religious belief systems provide an external source of control that can substitute for a perceived lack of personal control. In a seminal paper, it was experimentally demonstrated that a threat to personal control increases endorsement of the existence of a controlling God. In the current registered report, we conducted a high-powered (N = 829) direct replication of this effect, using samples from the Netherlands and the United States (US). Our results show moderate to strong evidence for the absence of an experimental effect across both countries: belief in a controlling God did not increase after a threat compared to an affirmation of personal control. In a complementary preregistered analysis, an inverse relation between general feelings of personal control and belief in a controlling God was found in the US, but not in the Netherlands. We discuss potential reasons for the replication failure of the experimental effect and cultural mechanisms explaining the cross-country difference in the correlational effect. Together, our findings suggest that experimental manipulations of control may be ineffective in shifting belief in God, but that individual differences in the experience of control may be related to religious beliefs in a way that is consistent with CCT.

