Posted on Nov 29th, 2017

This week, Dorothy Bishop visited Amsterdam to present a fabulous lecture on a topic that has not (yet) received the attention it deserves: “Fallibility in Science: Responsible Ways to Handle Mistakes”. Her slides are available here.

As Dorothy presented her series of punch-in-the-gut, spine-tingling examples, I was reminded of a presentation that my Research Master students had given a few days earlier. The students presented ethical dilemmas in science — hypothetical scenarios that can ensnare researchers, particularly early in their careers when they lack the power to make executive decisions. And for every scenario, the students asked the class, ‘What would you do?’ Consider, for example, the following situation:

SCENARIO: You are a junior researcher who works in a large team that studies risk-seeking behavior in children with attention-deficit disorder. You have painstakingly collected the data, and a different team member (an experienced statistical modeler) has conducted the analyses. After some back-and-forth, the statistical results come out *exactly* as the team would have hoped. The team celebrates and prepares to submit a manuscript to *Nature Human Behaviour*. However, you suspect that multiple analyses have been tried, and only the best one is presented in the manuscript.

Posted on Nov 23rd, 2017

The paper “Redefine Statistical Significance” continues to make people uncomfortable. This, of course, was exactly the goal: to have researchers realize that a p-just-below-.05 outcome is evidentially weak. This insight can be painful, as many may prefer the statistical blue pill (‘believe whatever you want to believe’) over the statistical red pill (‘stay in Wonderland and see how deep the rabbit hole goes’). Consequently, a spirited discussion has ensued.
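To get a feel for how weak such evidence can be, here is a minimal sketch that assumes the upper bound of Sellke, Bayarri, and Berger (2001), under which the Bayes factor in favor of the alternative can be no larger than 1 / (−e · p · ln p). The function name is made up for illustration, and this is only one of several possible calibrations, not necessarily the calculation the paper's authors relied on.

```python
import math


def max_bayes_factor(p):
    """Upper bound on the Bayes factor in favor of H1 for a given p-value.

    Assumes the bound 1 / (-e * p * ln p), which only applies for p < 1/e.
    """
    if not 0 < p < 1 / math.e:
        raise ValueError("bound only applies for 0 < p < 1/e")
    return 1.0 / (-math.e * p * math.log(p))


for p in (0.05, 0.01, 0.005):
    print(f"p = {p}: evidence for H1 is at most {max_bayes_factor(p):.2f} to 1")
```

Under this bound, p = .05 corresponds to at most roughly 2.5-to-1 evidence for the alternative, whereas p = .005 allows for roughly 14-to-1.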

Posted on Nov 16th, 2017

*For Christian Robert’s blog post about the bridgesampling package, click here.*

Bayesian inference is conceptually straightforward: we start with prior uncertainty and then use Bayes’ rule to learn from data and update our beliefs. The result of this learning process is known as posterior uncertainty. Quantities of interest can be *parameters* (e.g., effect size) within a single statistical model or different competing *models* (e.g., a regression model with three predictors vs. a regression model with four predictors). When the focus is on models, a convenient way of comparing two models *M*_{1} and *M*_{2} is to consider the model odds:
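In standard notation (an assumption here, since the formula itself is not shown in this excerpt), the posterior model odds are usually written as the prior model odds multiplied by the Bayes factor:

```latex
\underbrace{\frac{p(M_1 \mid \text{data})}{p(M_2 \mid \text{data})}}_{\text{posterior odds}}
=
\underbrace{\frac{p(M_1)}{p(M_2)}}_{\text{prior odds}}
\times
\underbrace{\frac{p(\text{data} \mid M_1)}{p(\text{data} \mid M_2)}}_{\text{Bayes factor}}
```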

Posted on Nov 11th, 2017

*This post is based on the example discussed in Wagenmakers et al. (in press).*

Misconception: Bayes factors are a measure of *absolute* goodness-of-fit or *absolute* predictive performance.

Bayes factors are a measure of *relative* goodness-of-fit or *relative* predictive performance. Model *A* may outpredict model *B* by a large margin, but this does not imply that model *A* is good, appropriate, or useful in absolute terms. In fact, model *A* may be absolutely terrible, just less abysmal than model *B*.
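A toy illustration may help. The sketch below uses made-up data (5 successes in 10 binomial trials) and three hypothetical point models for the success probability; none of these numbers come from the paper. Model *A* crushes model *B*, yet a simple fair-coin model in turn outpredicts model *A* by a wide margin.

```python
def binomial_kernel(theta, successes, failures):
    """Likelihood of the data up to the binomial coefficient,
    which cancels whenever two models are compared on the same data."""
    return theta ** successes * (1 - theta) ** failures


# Hypothetical data: 5 successes and 5 failures.
successes, failures = 5, 5

# Hypothetical point models for the success probability.
like_a = binomial_kernel(0.90, successes, failures)     # model A: theta = 0.90
like_b = binomial_kernel(0.99, successes, failures)     # model B: theta = 0.99
like_fair = binomial_kernel(0.50, successes, failures)  # fair coin: theta = 0.50

bf_ab = like_a / like_b          # how much better A predicts than B
bf_fair_a = like_fair / like_a   # how much better the fair coin predicts than A

print(f"BF(A vs B)    = {bf_ab:.0f}")
print(f"BF(fair vs A) = {bf_fair_a:.1f}")
```

A Bayes factor of tens of thousands for *A* over *B* says nothing about whether *A* describes the data well; it only says that *B* does much worse.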

Statistical inference rarely deals in absolutes. This is widely recognized: many feel the key objective of statistical modeling is to quantify the uncertainty about parameters of interest through confidence or credible intervals. What is easily forgotten is that there is additional uncertainty, namely that which concerns the choice of the statistical model.

Posted on Nov 3rd, 2017

*This post is inspired by Morey et al. (2016), Rouder and Morey (in press), and Wagenmakers et al. (2016a).*

Misconception: Bayes factors may be relevant for model selection, but are irrelevant for parameter estimation.

For a continuous parameter, Bayesian estimation involves the computation of an infinite number of Bayes factors against a continuous range of different point-null hypotheses.
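The connection can be checked numerically. The following sketch assumes a binomial rate θ with a Beta(1, 1) prior under *H*1 and made-up data (7 successes, 3 failures); the function names are hypothetical. For every candidate value θ0, the ratio of posterior to prior density equals the Bayes factor for the point hypothesis θ = θ0 against *H*1, so the posterior is simply the prior reweighted by a continuum of Bayes factors.

```python
import math


def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))


def log_beta_fn(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bayes_factor_point_vs_h1(theta0, s, f, a=1.0, b=1.0):
    """Bayes factor for theta = theta0 against H1: theta ~ Beta(a, b).

    The binomial coefficient cancels between numerator and denominator.
    """
    log_like_point = s * math.log(theta0) + f * math.log(1 - theta0)
    log_marginal_h1 = log_beta_fn(a + s, b + f) - log_beta_fn(a, b)
    return math.exp(log_like_point - log_marginal_h1)


s, f = 7, 3  # hypothetical data
for theta0 in (0.1, 0.3, 0.5, 0.7, 0.9):
    ratio = beta_pdf(theta0, 1 + s, 1 + f) / beta_pdf(theta0, 1, 1)
    bf = bayes_factor_point_vs_h1(theta0, s, f)
    print(f"theta0 = {theta0}: posterior/prior = {ratio:.4f}, Bayes factor = {bf:.4f}")
```

The two columns agree at every θ0, which is the sense in which estimation and testing rest on the same quantity.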

Let *H*_{0} specify a general law, such that, for instance, the parameter *θ* has a fixed value *θ*_{0}. Let *H*_{1} relax the general law and assign *θ* a prior distribution *p*(*θ* | *H*_{1}). After acquiring new data one may update the plausibility for *H*_{1} versus *H*_{0} by applying Bayes’ rule (Wrinch and Jeffreys 1921, p. 387):
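In modern notation (assumed here, since the formula is not shown in this excerpt and the original 1921 notation differs), the update multiplies the prior odds by the Bayes factor, where the marginal likelihood under *H*1 averages the likelihood over the prior on θ:

```latex
\frac{p(H_1 \mid \text{data})}{p(H_0 \mid \text{data})}
=
\frac{p(H_1)}{p(H_0)}
\times
\frac{p(\text{data} \mid H_1)}{p(\text{data} \mid H_0)},
\qquad
p(\text{data} \mid H_1) = \int p(\text{data} \mid \theta)\, p(\theta \mid H_1)\, \mathrm{d}\theta,
\qquad
p(\text{data} \mid H_0) = p(\text{data} \mid \theta = \theta_0).
```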