Posted on Oct 21st, 2017

I (Alex Etz) recently attended the American Statistical Association’s “Symposium on Statistical Inference” (SSI) in Bethesda, Maryland. In this post I will summarize the symposium and share some of my personal highlights.

The purpose of the SSI was to follow up on the historic ASA statement on p-values and statistical significance. The ASA statement on p-values was written by a relatively small group of influential statisticians and lays out a series of principles regarding what they see as the current consensus about p-values. Notably, there were mainly “don’ts” in the ASA statement. For instance: “P-values **do not** measure the probability that the studied hypothesis is true, nor the probability that the data were produced by random chance alone”; “Scientific conclusions and business or policy decisions **should not** be based only on whether a p-value passes a specific threshold”; “A p-value, or statistical significance, **does not** measure the size of an effect or the importance of a result” (emphasis mine).

The SSI was all about figuring out the “do’s”. The list of sessions was varied (you can find the entire program here), with titles ranging from “What kind of statistical evidence do policy makers need?” to “Alternative methods with strong Frequentist foundations” to “Statisticians: Sex Symbols, Liars, Both, or Neither?” From the JASP twitter account (@JASPStats), I live-tweeted a number of sessions at the SSI:

- Can Bayesian methods offer a practical alternative to P-values? (twitter thread)
- What must change in the teaching of statistical inference in introductory classes? (twitter thread)
- Communicating statistical uncertainty (twitter thread)
- Statisticians: Sex symbols, Liars, Both, or Neither? (twitter thread)

The rest of this post will highlight two very interesting sessions I got to see while at the SSI. For the rest of them see the live tweet threads above. Overall I found the SSI to be incredibly intellectually stimulating and I was impressed by the many insightful perspectives on display!

The first session, on whether Bayesian methods can offer a practical alternative to p-values, began with Valen Johnson explaining the rationale behind some of the comparisons in the recent (notorious) p<.005 paper (preprint here). He clearly identified his main point (see the twitter thread): Bayes factors against the null hypothesis (based on observed t-values) cannot exceed fairly low values (3 to 6) when p is around .05. Most of the material was included to some extent in the .005 paper, which you can consult for the details.
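To make the idea of a bounded Bayes factor concrete, here is a minimal sketch of two well-known upper bounds, computed from a two-sided p-value. This is not Johnson's own calculation: it assumes a normal (z) approximation rather than the t-values he worked with, and the function names are my own. The first bound is the largest possible likelihood ratio, obtained by putting the alternative exactly at the observed effect; the second is the Sellke-Bayarri-Berger bound, 1 / (-e p ln p).

```python
import math
from statistics import NormalDist


def likelihood_ratio_bound(p):
    """Largest Bayes factor against the null for a two-sided z-test.

    Assumes a normal test statistic; the bound is exp(z^2 / 2), reached when
    the alternative puts all of its mass at the observed effect.
    """
    z = NormalDist().inv_cdf(1 - p / 2)  # |z| matching the two-sided p-value
    return math.exp(z * z / 2)


def sellke_bayarri_berger_bound(p):
    """Upper bound 1 / (-e * p * ln p), which only applies for p < 1/e."""
    if not 0 < p < 1 / math.e:
        raise ValueError("bound only applies for 0 < p < 1/e")
    return 1 / (-math.e * p * math.log(p))


for p in (0.05, 0.005):
    print(p, round(sellke_bayarri_berger_bound(p), 1),
          round(likelihood_ratio_bound(p), 1))
```

Under these assumptions, p = .05 gives bounds of roughly 2.5 and 6.8, so even the most favorable alternative hypothesis cannot produce strong evidence against the null at that threshold.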

The second speaker was Merlise Clyde, who wanted to investigate the frequentist properties of certain Bayesian procedures when working in a regression context. This involves looking at coverage rates (how often an interval contains the true value) and rates of incorrect inferences. My big takeaway from Clyde’s talk was that when there are many possible models that can account for the data, such as regression models that include or exclude various predictors, our best inferences are made when we do model averaging. A great example of this is when we have multiple forecasts for where a hurricane will land, so we take them all into account rather than pick just the one that we think is best! (Clyde also gave a shout-out to JASP, which will soon be implementing her R package.)
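Here is a small sketch of what model averaging can look like in practice. It is an illustration only, not Clyde's method or her package: it assumes each candidate model comes with a BIC value, turns those into approximate posterior model probabilities (weights proportional to exp(-BIC/2), with equal prior odds), and then averages the models' predictions. The BIC values and predictions are made-up numbers.

```python
import math


def bic_weights(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior probability for every model. Subtracting the best
    (smallest) BIC first keeps the exponentials numerically stable.
    """
    best = min(bics)
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(predictions, weights):
    """Weighted average of the predictions made by each model."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical example: three regression models and their predictions
# for the same new observation.
bics = [100.0, 102.0, 110.0]
predictions = [1.0, 2.0, 5.0]

weights = bic_weights(bics)
averaged = model_average(predictions, weights)
print([round(w, 3) for w in weights], round(averaged, 3))
```

The averaged prediction sits close to the best-supported model's forecast but is still pulled toward the others, which is the hurricane-forecast logic: no single model gets to decide on its own.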

The final speaker was Bhramar Mukherjee, who discussed the practical benefits that Bayesian methods offer in the context of genetics. From Mukherjee I learned the new abbreviation BFF: **B**ayes **F**actors **F**orever. She traced the history of the famously low p-value thresholds used in genetics research, and discussed the very simple idea of focusing on shrinkage estimation, which can be framed as implementing an automatic bias-variance tradeoff. In the discussion Mukherjee raised a very important point: We need to begin focusing, as many fields already have, on large-scale collaboration. “It will get harder to evaluate CVs if every paper has 200 authors,” Mukherjee noted, “but we need to do it anyway!”
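For readers who have not met shrinkage estimation before, the bias-variance tradeoff mentioned above can be seen in a short simulation. This is a generic textbook sketch, not anything Mukherjee presented: it uses the positive-part James-Stein estimator with shrinkage toward zero, assumes unit-variance normal noise, and all of the settings (50 effects, 200 replications, the seed) are arbitrary choices of mine.

```python
import random


def james_stein(x):
    """Positive-part James-Stein estimate, shrinking toward zero.

    Assumes each observation has unit variance and len(x) >= 3.
    """
    p = len(x)
    norm_sq = sum(v * v for v in x)
    factor = max(0.0, 1.0 - (p - 2) / norm_sq)
    return [factor * v for v in x]


def squared_error(estimate, truth):
    """Total squared error across all effects."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


def simulate(n_effects=50, n_reps=200, seed=1):
    """Average total squared error of raw vs. shrunken estimates."""
    rng = random.Random(seed)
    raw_total = 0.0
    shrunk_total = 0.0
    for _ in range(n_reps):
        truth = [rng.gauss(0.0, 1.0) for _ in range(n_effects)]
        observed = [t + rng.gauss(0.0, 1.0) for t in truth]
        raw_total += squared_error(observed, truth)
        shrunk_total += squared_error(james_stein(observed), truth)
    return raw_total / n_reps, shrunk_total / n_reps


raw_risk, shrunk_risk = simulate()
print(round(raw_risk, 1), round(shrunk_risk, 1))
```

Each shrunken estimate is biased toward zero, yet the total error across all effects drops because the variance falls by more than the bias adds. The amount of shrinkage is set by the data themselves, which is the sense in which the tradeoff is "automatic".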

The second session focused on a number of educational challenges we face as we move forward in a post p<.05 world. John Bailer began the session by discussing how his department has been trying to improve: Introductory undergrad courses have become more of a hybrid, with procedural and definitional work done outside of class and an in-class emphasis on just-in-time teaching of concepts and lab exercises. The goal is to emphasize understanding of concepts and encourage active student engagement. Their graduate-level courses have begun to incorporate more integrated projects from real research scenarios to give context to the theory the students are learning. Some challenges: Students tend not to understand p-values after a single introductory class, and there is still little emphasis on teaching Bayesian methods.

The second speaker was Dalene Stangl, who discussed why “we need to transition toward having more philosophy in our courses”. I jotted down a couple of very interesting things she said during the talk: “Teach that disagreement [in the context of debating proper statistical analyses] is natural and tensions are OK”; “200,000 high school students took the AP exam last year. Notably there is basically no Bayesian statistics on the curriculum!” Moreover, statisticians (and quantitatively focused researchers in general) face certain system pressures: Other disciplines want us to teach algorithms and procedures that, if followed, will lead to a right/wrong answer, rather than a way of disciplined thinking, challenging, storytelling, and persuasive argument.

The final speaker was Daniel Kaplan, who had a lovely title for his talk: Putting p-values in their place. This was one of my favorite talks at the SSI. Kaplan stressed that we need to bring context into play when teaching statistical methods. Introductory stats problems often result in uninterpretable answers, and we must ask “is this result meaningful?” In a related point, he also stressed that one of the reasons p-values are taught so heavily is that doing so lets teachers avoid needing domain expertise and keeps the material safely in the domain of math.

Kaplan highlighted a big problem in teaching statistics that he calls *Proofiness:* “The ethos of mathematics is proof and deduction. So teach about things that can be proved, e.g., the distribution [of the test statistic] under the null hypothesis. Avoid covariates [and how to choose them]. [Avoid] One-tailed vs two-tailed tests. [Avoid] Equal vs unequal variances t-test.” He sees the problem as stemming from teaching statistics with “undue exactitude.” Statistics is messy!

Kaplan had a wonderful analogy regarding how we teach students to avoid causal inferences when doing stats: “We teach statistics like abstinence-only sex-education: We don’t want our students to infer causation, but they’re going to do it anyway! We need to teach safer causal inference.”

Some recommendations for teaching stats moving forward: Everyone teaching stats should acquire some domain-specific knowledge and use examples in a meaningful context. “What does our result tell us about how the world works?” We should train instructors in ways of dealing with covariates (not just: no causation without experiments). Put data interpretation into the domain of models, not “parameters”.

Alexander is a PhD student in the department of cognitive sciences at the University of California, Irvine.