
Redefine Statistical Significance Part XII: A BITSS debate with Simine Vazire and Daniel Lakens

This Tuesday, one of us [EJ] participated in a debate about (you guessed it) the α = .005 recommendation from the paper ‘Redefine Statistical Significance’. The debate was organized as part of the Annual Meeting of the Berkeley Initiative for Transparency in the Social Sciences (BITSS), and the two other discussants were Simine Vazire and Daniel Lakens.

The debate was live-streamed and recorded, so you can view it afterwards: in the recording, the debate starts at about 0:32:30 and lasts until 1:40:30, with the discussion starting at around 1:13:00.

Barbecue Chicken Alert!

In the opening statement, I [EJ] wanted to emphasize the weakness of evidence for p-just-below-.05 results. To drive the point home, I used a popular phrase from basketball great Shaquille O’Neal, a.k.a. ‘The big Aristotle’. For those of you who do not know the Diesel, he is a 325-pound (147 kg), 7 ft 1 in (2.16 m) force of nature. In his prime, he reduced the wonderful game of basketball to a boring display of raw power: Shaq would catch the ball in the post (i.e., near the basket), back up into his defender, let the poor fellow taste “some of the elbow juice”, and then dunk the ball with authority, occasionally shattering the backboard for good measure (a funny example is here, “is that all you got?”). The way in which Shaq would demolish the defense felt both inevitable and unfair. At any rate, the Diesel had a phrase to describe the inevitability of success in the face of weak opposition: “barbecue chicken”.

If you look up the phrase on Urban Dictionary, you will find two definitions, and I believe both miss the point. Here’s the first one:

And here’s the second one:

We won’t know for certain until the big Aristotle explains exactly what he meant, but I believe the expression applies in general, and is meant to describe any situation in which a person of superior skill applies a series of routine moves to overwhelm an ineffective resistance. In Shaq’s analogy, there is no need to chase the chicken, pluck the chicken, and cook the chicken; no, the chicken has already been prepared and the only thing that you need to do is reach out and eat it. A phrase that is semantically close is ‘shooting fish in a barrel’.

At any rate, what I [EJ] wanted to convey is that, whenever a p-just-below-.05 result is presented, it is a routine exercise to open JASP, tick a few boxes in the Summary Stats module, and reveal that the evidence against the null hypothesis is surprisingly weak. Concrete demonstrations are provided in earlier blog posts, here and here.
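For readers who want a feel for the numbers without opening JASP, here is a minimal sketch in Python. Note that it does not reproduce the default Bayes factors from the Summary Stats module; instead it computes a different and more generous quantity, the Vovk-Sellke maximum p-ratio, 1/(−e · p · ln p) for p < 1/e, which is an upper bound on the odds in favor of the alternative hypothesis that a given p-value can support. The function name is our own and purely illustrative.

```python
import math


def vovk_sellke_mpr(p: float) -> float:
    """Upper bound on the odds for H1 over H0 implied by a p-value.

    Uses the Vovk-Sellke bound 1 / (-e * p * ln p), valid for p < 1/e.
    For larger p the bound is 1, i.e., no evidence against H0.
    (Hypothetical helper for illustration; not part of JASP.)
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p >= 1.0 / math.e:
        return 1.0
    return 1.0 / (-math.e * p * math.log(p))


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.005):
        bound = vovk_sellke_mpr(p)
        print(f"p = {p}: evidence for H1 is at most {bound:.2f} to 1")
```

Under this bound, p = .05 corresponds to odds of at most about 2.5 to 1 in favor of the alternative, whereas p = .005 allows at most about 13.9 to 1. Because this is a best-case bound, a Bayes factor under any specific prior will generally be lower still.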

So, whether encountered in the published literature or in a preprint, any p-just-below-.05 result should raise a red flag — it will be easy to show that the evidence is disappointingly low. In the words of the big Aristotle: “barbecue chicken alert!”

The Upshot

So who won the debate? You can judge for yourself, but in the end we hope the real winners are the researchers who viewed the debate and are now more aware that the .05 threshold is anything but sacred.

At the end of the workshop, the 40+ participants were offered a choice between three alternatives:

  1. Stick to α = .05 (one vote);
  2. Move to α = .005 (three votes, including that of Simine and myself);
  3. Something else / don’t know (all the other votes).

Would the third option be ‘justify your own alpha and ignore the concept of evidence altogether’? Would it be ‘let’s just be Bayesian, who needs this alpha?’ Or perhaps ‘let’s use estimation instead of testing’? Or perhaps even ‘let’s think carefully and take all information into consideration before arriving at a judgement that everybody will consider wise and appropriate, after which world peace is declared’?

We don’t know, but we do know that the debate itself was worthwhile and that there is an increasing need to better understand how correct conclusions can be drawn from noisy data.

References

Wagenmakers, E.-J. (2017). Barbecue chicken alert! Invited presentation for the plenary discussion session (with Simine Vazire and Daniel Lakens) at the Annual Meeting of the Berkeley Initiative for Transparency in the Social Sciences (BITSS), Berkeley, USA, December 2017. Slides are here.


 



About The Authors

Eric-Jan Wagenmakers

Eric-Jan (EJ) Wagenmakers is a professor at the Psychological Methods Group at the University of Amsterdam.

Quentin Gronau

Quentin is a PhD candidate at the Psychological Methods Group of the University of Amsterdam.


How to Prevent Your Dog from Getting Stuck in the Dishwasher

This week, Dorothy Bishop visited Amsterdam to present a fabulous lecture on a topic that has not (yet) received the attention it deserves: “Fallibility in Science: Responsible Ways to Handle Mistakes”. Her slides are available here.

As Dorothy presented her series of punch-in-the-gut, spine-tingling examples, I was reminded of a presentation that my Research Master students had given a few days earlier. The students presented ethical dilemmas in science — hypothetical scenarios that can ensnare researchers, particularly early in their career when they lack the power to make executive decisions. And for every scenario, the students asked the class, ‘What would you do?’ Consider, for example, the following situation:

SCENARIO: You are a junior researcher who works in a large team that studies risk-seeking behavior in children with attention-deficit disorder. You have painstakingly collected the data, and a different team member (an experienced statistical modeler) has conducted the analyses. After some back-and-forth, the statistical results come out exactly as the team would have hoped. The team celebrates and prepares to submit a manuscript to Nature Human Behaviour. However, you suspect that multiple analyses have been tried, and only the best one is presented in the manuscript.



Redefine Statistical Significance Part XI: Dr. Crane Forcefully Presents…a Red Herring?

The paper “Redefine Statistical Significance” continues to make people uncomfortable. This, of course, was exactly the goal: to have researchers realize that a p-just-below-.05 outcome is evidentially weak. This insight can be painful, as many may prefer the statistical blue pill (‘believe whatever you want to believe’) over the statistical red pill (‘stay in Wonderland and see how deep the rabbit hole goes’). Consequently, a spirited discussion has ensued.



A Personal Impression of the ASA Symposium on Statistical Inference: A World Beyond p<.05

 

I (Alex Etz) recently attended the American Statistical Association’s “Symposium on Statistical Inference” (SSI) in Bethesda, Maryland. In this post I will give you a summary of its contents and some of my personal highlights from the SSI.

The purpose of the SSI was to follow up on the historic ASA statement on p-values and statistical significance. The ASA statement on p-values was written by a relatively small group of influential statisticians and lays out a series of principles regarding what they see as the current consensus about p-values. Notably, there were mainly “don’ts” in the ASA statement. For instance: “P-values do not measure the probability that the studied hypothesis is true, nor the probability that the data were produced by random chance alone”; “Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold”; “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result” (emphasis mine).



Redefine Statistical Significance Part X: Why the Point-Null Will Never Die

In our previous post, we discussed the paper “Abandon Statistical Significance”, which is a response to the paper “Redefine Statistical Significance” that has dominated the contents of this blog so far. The Abandoners include Andrew Gelman and Christian Robert, and on their own blogs they’ve each posted a reaction to our Bayesian Spectacles post. Below is a short response to their reaction to the discussion of the reply to the original paper. 🙂


