The Center for Reproducibility Science (CRS) in Zurich opens the new year by organizing a workshop titled “Design and Analysis of Replication Studies”. The goal of this workshop is to have “a thorough methodological discussion regarding the design and the analysis of replication studies including specialists from different fields such as clinical research, psychology, economics and others.”
I quite look forward to attending this workshop. The speakers include a former PhD student (Don van Ravenzwaaij), current collaborators (some of whom I’ve never met in person), and a stray statistician who is intelligent, knowledgeable, and nonetheless explicitly un-Bayesian; in other words, a complete and utter enigma. Also, this workshop forced me to consider again the Bayesian perspective on quantifying replication success. Previously, in work with Josine Verhagen and Alexander Ly, we had promoted the “replication Bayes factor”, in which the posterior distribution from the original study is used as the prior distribution for testing the effect in the replication study. However, this setup can be generalized considerably, as indicated in my workshop abstract below:
In this presentation I outline Bayesian answers to statistical questions surrounding replication success. The key object of interest is the posterior distribution for effect size based on data from an original study. The predictive performance of this posterior distribution can then be examined in light of data from a replication study. Specifically, the “replication Bayes factor” compares the predictive performance of the posterior distribution (which quantifies the opinion of an idealized proponent after seeing data from the original study) to that of the point null hypothesis (which quantifies the opinion of a hardened skeptic). However, we may also compare the predictive performance of the posterior distribution to that of the initial prior distribution (which quantifies the opinion of an unaware proponent who does not know the original study). Finally, the predictive performance of the posterior distribution may also be compared to that of alternative distributions that have a different mean but contain the same amount of information. Together, these methods allow a comprehensive and coherent assessment of the issues that surround the overly general question “did it replicate?”.
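To make the three comparisons in the abstract concrete, here is a minimal sketch under a simplifying assumption: every effect-size estimate is treated as normally distributed with a known standard error, so all predictive densities are available in closed form. This is a normal approximation for illustration only, not the t-test-based computation of Verhagen and Wagenmakers (2014). The function names and all numbers are hypothetical.

```python
from math import exp, log, pi, sqrt


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    return -0.5 * log(2 * pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)


def update_normal(prior_mean, prior_sd, estimate, se):
    """Conjugate update: normal prior on the effect, normal likelihood."""
    prior_prec = 1 / prior_sd**2
    data_prec = 1 / se**2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, sqrt(post_var)


def log_predictive(rep_estimate, rep_se, mean, sd):
    """Log predictive density of the replication estimate when the effect
    is distributed N(mean, sd^2); sd = 0 gives a point hypothesis."""
    return normal_logpdf(rep_estimate, mean, sqrt(sd**2 + rep_se**2))


def bayes_factor(rep_estimate, rep_se, dist_a, dist_b):
    """Predictive performance of dist_a relative to dist_b on the
    replication data; each dist is a (mean, sd) pair for the effect."""
    return exp(
        log_predictive(rep_estimate, rep_se, *dist_a)
        - log_predictive(rep_estimate, rep_se, *dist_b)
    )


# Hypothetical numbers, chosen only for illustration.
initial_prior = (0.0, 1.0)    # unaware proponent, before any data
orig_est, orig_se = 0.5, 0.2  # original study
rep_est, rep_se = 0.4, 0.15   # replication study

# Idealized proponent: the posterior after seeing the original study.
posterior = update_normal(*initial_prior, orig_est, orig_se)
skeptic = (0.0, 0.0)          # point null hypothesis
shifted = (0.2, posterior[1]) # different mean, same amount of information

bf_r0 = bayes_factor(rep_est, rep_se, posterior, skeptic)
bf_r1 = bayes_factor(rep_est, rep_se, posterior, initial_prior)
bf_rs = bayes_factor(rep_est, rep_se, posterior, shifted)

print(f"posterior vs. point null:    {bf_r0:.2f}")
print(f"posterior vs. initial prior: {bf_r1:.2f}")
print(f"posterior vs. shifted mean:  {bf_rs:.2f}")
```

Each Bayes factor above 1 indicates that the posterior from the original study predicted the replication result better than the rival distribution. Representing every rival as a `(mean, sd)` pair keeps the three comparisons symmetric: the skeptic is simply the limiting case with zero spread.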
The basic idea is to link the question of replication success to the question of assessing prior model adequacy (see also Box, 1980, and the associated discussion), where the “prior model” is the one based on data from the original study (and prior to the replication study). I might illustrate the concepts involved with a recent highly successful replication attempt (my first?!) that so far has remained unpublished. Stay tuned…
References
Box, G. E. P. (1980). Sampling and Bayes’ inference in scientific modelling and robustness. Journal of the Royal Statistical Society, Series A, 143, 383-430.
Ly, A., Etz, A., Marsman, M., & Wagenmakers, E.-J. (in press). Replication Bayes factors from evidence updating. Behavior Research Methods.
Verhagen, A. J., & Wagenmakers, E.-J. (2014). Bayesian tests to quantify the result of a replication attempt. Journal of Experimental Psychology: General, 143, 1457-1475.
About The Author
Eric-Jan (EJ) Wagenmakers is a professor at the Psychological Methods Group at the University of Amsterdam.