Across virtually all of the empirical disciplines, the single most dominant procedure for drawing conclusions from data is “compare-your-p-value-to-.05-and-declare-victory-if-it-is-lower”. Remarkably, this common strategy appears to create about as much enthusiasm as forcefully stepping in a fresh pile of dog poo.
For instance, in a recent critique of the “compare-your-p-value-to-.05-and-declare-victory-if-it-is-lower” procedure, 72 researchers argued that p-just-below-.05 results are evidentially weak, and therefore ought to be interpreted with caution; in order to make strong claims, a threshold of .005 is more appropriate. Their approach is called “Redefine Statistical Significance” (RSS). In response, 88 other authors argued that statistical thresholds ought to be chosen not by default, but by judicious argument: these authors argued that one should justify one’s alpha. Finally, another group of authors, the Abandoners, argued that p-values should never be used to declare victory, regardless of the threshold. In sum, several large groups of researchers have argued, each with considerable conviction, that the popular “compare-your-p-value-to-.05-and-declare-victory-if-it-is-lower” procedure is fundamentally flawed.
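To get a rough feel for why a p-value just below .05 can count as evidentially weak, here is a minimal sketch. It assumes the Vovk-Sellke bound, 1 / (-e · p · ln p), as the yardstick: an upper limit on how strongly a given p-value can favor the alternative over the null. This bound is an illustration chosen for this post, not necessarily the calculation any of the groups above relied on, and the function name is hypothetical.

```python
import math


def vovk_sellke_bound(p):
    """Upper bound on the Bayes factor in favor of H1 for a p-value.

    Computes 1 / (-e * p * ln p), which is defined for 0 < p < 1/e.
    """
    if not 0 < p < 1 / math.e:
        raise ValueError("bound is defined only for 0 < p < 1/e")
    return 1.0 / (-math.e * p * math.log(p))


# Compare the conventional threshold with the proposed stricter one.
for p in (0.05, 0.005):
    bound = vovk_sellke_bound(p)
    print(f"p = {p}: Bayes factor for H1 is at most {bound:.1f}")
```

Under this assumption, p = .05 corresponds to odds of at most about 2.5 to 1 in favor of the alternative, whereas p = .005 allows up to about 13.9 to 1.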
This post is about the position taken by the “Justifiers”. Their position was examined in earlier blog posts, here and here. Now JP de Ruiter has written a paper (the preprint is here) in which he critically examines the arguments of the Justifiers. Here is the abstract of de Ruiter’s paper:
Benjamin et al. (2017) proposed improving the reproducibility of findings in psychological research by lowering the alpha level of our conventional Null Hypothesis Significance Tests from .05 to .005, because findings with p-values close to .05 represent insufficient empirical evidence. They argued that findings with a p-value between 0.005 and 0.05 should still be published, but not called “significant” anymore.
This proposal was criticized and rejected in a response by Lakens et al. (2018), who argued that instead of lowering the traditional alpha threshold to .005, we should stop using the term “statistically significant”, and require researchers to determine and justify their alpha levels before they collect data.
In this contribution, I argue that the arguments presented by Lakens et al. against the proposal by Benjamin et al. (2017) are not convincing. Thus, given that it is highly unlikely that our field will abandon the NHST paradigm any time soon, lowering our alpha level to .005 is at this moment the best way to combat the replication crisis in psychology.
De Ruiter, J.P. (in press). Redefine or justify? Comments on the alpha debate. Psychonomic Bulletin & Review.