# “Bayesian Inference Without Tears” at CIRM

# A Bayesian Perspective on the Proposed FDA Guidelines for Adaptive Clinical Trials

# Bayesian Advantages for the Pragmatic Researcher: Slides from a Talk in Frankfurt

# Redefine Statistical Significance XVII: William Rozeboom Destroys the “Justify Your Own Alpha” Argument…Back in 1960

# Redefine Statistical Significance Part XVI: The Commentary by JP de Ruiter

Posted on Oct 25th, 2018

Today I am presenting a lecture for the “Masterclass in Bayesian Statistics” that takes place from October 22nd to 26th, 2018, at CIRM (Centre International de Rencontres Mathématiques) in Marseille, France. The slides of my talk, “Bayesian Inference Without Tears”, are here. Unfortunately the slides cannot convey the JASP demo work, but the presentations are taped, so I hope to be able to provide a video link at some later point in time.

Posted on Oct 18th, 2018

The ~~frequentist~~ Food and Drug Administration (FDA) has circulated a draft version of new guidelines for adaptive designs, with the explicit purpose of soliciting comments. The draft is titled “Adaptive designs for clinical trials of drugs and biologics: Guidance for industry” and you can find it here. As summarized on the FDA webpage, this draft document

“(…) addresses principles for designing, conducting and reporting the results from an adaptive clinical trial. An adaptive design is a type of clinical trial design that allows for planned modifications to one or more aspects of the design based on data collected from the study’s subjects while the trial is ongoing. The advantage of an adaptive design is the ability to use information that was not available at the start of the trial to improve efficiency. An adaptive design can provide a greater chance to detect the true effect of a product, often with a smaller sample size or in a shorter timeframe. Additionally, an adaptive design can reduce the number of patients exposed to an unnecessary risk of an ineffective investigational treatment. Patients may even be more willing to enroll in these types of trials, as they can increase the probability that subjects will be assigned to the more effective treatment.”

Posted on Sep 20th, 2018

This Monday in Frankfurt I presented a keynote lecture for the *51st Kongress der Deutschen Gesellschaft für Psychologie*. I resisted the temptation to impress upon the audience the notion that they were all Statistical Sinners for not yet having renounced the p-value. Instead I outlined five concrete Bayesian data-analysis projects that my lab had conducted in recent years. So no p-bashing, but only Bayes-praising, and mostly by directly demonstrating the practical benefits in concrete applications.

The talk itself went well, although at the beginning I believe the audience was fearful that I would just drone on and on about the theory underlying Bayes’ rule. Perhaps I’m just too much in love with the concept. Anyway, it seemed the audience was thankful when I switched to the concrete examples. I could show a new cartoon by Viktor Beekman (“The Two Faces of Bayes’ Rule”, also in our Library; concept by myself and Quentin Gronau), and I showed two pictures of my son Theo (not sure whether the audience realized that, but it was not important anyway).

Posted on Sep 13th, 2018

Background: the recent paper “Redefine Statistical Significance” suggested that it is prudent to treat *p*-values just below .05 with a grain of salt, as such *p*-values provide only weak evidence against the null. The counterarguments to this proposal were varied, but in most cases the central claim (that *p*-just-below-.05 findings are evidentially weak) was not disputed; instead, one group of researchers (the Abandoners) argued that *p*-values should simply be abandoned entirely, whereas another group (the Justifiers) argued that instead of employing a pre-defined threshold α for significance (such as .05, .01, or .005), researchers should *justify* the α used.

The argument from the Justifiers sounds appealing, but it has two immediate flaws (see also the recent paper by JP de Ruiter). First, it is somewhat unclear how exactly the researcher should go about the process of “justifying” an α (but see this blog post). The second flaw, however, is more fundamental. Interestingly, this flaw was already pointed out by William Rozeboom in 1960 (the reference is below). In his paper, Rozeboom discusses the trials and tribulations of “Igor Hopewell”, a fictional psychology grad student whose dissertation work concerns testing the predictions of two competing theories. Rozeboom then proceeds to demolish the position of the Justifiers, almost 60 years in advance:

“In somewhat similar vein, it also occurs to Hopewell that had he opted for a somewhat riskier confidence level, say a Type I error of 10% rather than 5%, [the obtained test statistic] would have fallen outside the region of acceptance and [the null hypothesis] would have been rejected.

Now surely the degree to which a datum corroborates or impugns a proposition should be independent of the datum-assessor’s personal temerity. [italics ours] Yet according to orthodox significance-test procedure, whether or not a given experimental outcome supports or disconfirms the hypothesis in question depends crucially upon the assessor’s tolerance for Type I risk.” (Rozeboom, 1960, pp. 419-420)
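Rozeboom’s point is easy to demonstrate numerically. The sketch below is not from the original post, and the test statistic value is hypothetical; it simply shows one and the same observed result counting as “significant” under α = .10 but “non-significant” under α = .05, with the verdict hinging entirely on the analyst’s chosen threshold:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

z = 1.80             # hypothetical observed test statistic
p = two_sided_p(z)   # roughly .07: between the two thresholds

for alpha in (0.10, 0.05):
    verdict = "reject H0" if p < alpha else "retain H0"
    print(f"alpha = {alpha:.2f}: p = {p:.3f} -> {verdict}")
```

The data (and hence the evidence they carry) are identical in both lines of output; only the “datum-assessor’s personal temerity” differs.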

Posted on Aug 27th, 2018

Across virtually all of the empirical disciplines, the single most dominant procedure for drawing conclusions from data is “compare-your-p-value-to-.05-and-declare-victory-if-it-is-lower”. Remarkably, this common strategy appears to create about as much enthusiasm as forcefully stepping in a fresh pile of dog poo.

For instance, in a recent critique of the “compare-your-p-value-to-.05-and-declare-victory-if-it-is-lower” procedure, 72 researchers argued that p-just-below-.05 results are evidentially weak, and therefore ought to be interpreted with caution; in order to make strong claims, a threshold of .005 is more appropriate. Their approach is called “Redefine Statistical Significance” (RSS). In response, 88 other authors argued that statistical thresholds ought to be chosen not by default, but by judicious argument: these authors argued that one should *justify* one’s alpha. Finally, another group of authors, the Abandoners, argued that p-values should never be used to declare victory, regardless of the threshold. In sum, several large groups of researchers have argued, each with considerable conviction, that the popular “compare-your-p-value-to-.05-and-declare-victory-if-it-is-lower” procedure is fundamentally flawed.
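The claim that p-just-below-.05 results are evidentially weak can be made concrete with the well-known upper bound on the Bayes factor, BF₁₀ ≤ 1/(−e · p · ln p), valid for p < 1/e (Sellke, Bayarri, & Berger, 2001). The sketch below is not part of the original post; it computes this bound for the two thresholds under discussion:

```python
import math

def min_bayes_factor_bound(p):
    """Upper bound on the Bayes factor BF10 implied by a p-value
    (Sellke, Bayarri, & Berger, 2001); valid for p < 1/e."""
    return 1.0 / (-math.e * p * math.log(p))

for p in (0.049, 0.005):
    print(f"p = {p}: BF10 is at most {min_bayes_factor_bound(p):.1f}")
```

Under this bound, a p-value just below .05 can never yield a Bayes factor much above about 2.5 in favor of the alternative, whereas p = .005 allows roughly 14, which is one way to motivate the RSS proposal.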