Corona and the Statistics Wars

As the corona crisis engulfs the world, politicians left and right are accused of “politicizing” the pandemic. In order to follow suit, I will try to weaponize the pandemic to argue in favor of Bayesian inference over frequentist inference.

In recent months it has become clear that the corona pandemic is not just fought by doctors, nurses, and entire populations as they implement social distancing; it is also fought by statistical modelers, armed with data. As the disease spreads, it becomes crucial to study it statistically: how contagious it is, how it may respond to particular policy measures, how many people will get infected, and how many hospital beds will be needed. Fundamentally, one of the key goals is prediction. Good predictions come with a measure of uncertainty, or at least present different scenarios ranging from pessimistic to optimistic.

So how do statistical models for corona make their predictions? I am not an epidemiologist, but the current corona modeling effort is clearly a process that unfolds as more data become available. Good models will continually consume new data (i.e., new corona cases, information from other countries, covariates, etc.) in order to update their predictions. In other words, the models learn from incoming data in order to make increasingly accurate predictions about the future. This process of continual learning, without post-hoc and ad-hoc corrections for “data snooping”, is entirely natural — to the best of my knowledge, nobody has yet proposed that predictions be corrected for the fact that the models were estimated on a growing body of data.
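To make "learning from incoming data" concrete, here is a minimal sketch of Bayesian updating in a toy Beta-Binomial model. It is not an epidemiological model: the daily batches are made-up numbers, the uniform Beta(1, 1) prior is an assumption, and the function name `update` is purely illustrative. The point is only that updating batch by batch yields exactly the same posterior as analyzing all the data at once, so no correction for having looked along the way is needed.

```python
def update(prior, positives, negatives):
    """Conjugate Beta-Binomial update: Beta(a, b) -> Beta(a + positives, b + negatives)."""
    a, b = prior
    return (a + positives, b + negatives)


# Hypothetical daily test results: (positive, negative). Invented for illustration.
batches = [(3, 17), (5, 15), (8, 12)]

# Assumed uniform prior on the proportion of positive tests.
prior = (1, 1)

# Learn as the data come in: yesterday's posterior is today's prior.
posterior_sequential = prior
for positives, negatives in batches:
    posterior_sequential = update(posterior_sequential, positives, negatives)

# Alternatively, wait until the end and analyze everything in one go.
total_positives = sum(p for p, _ in batches)
total_negatives = sum(n for _, n in batches)
posterior_batch = update(prior, total_positives, total_negatives)

a, b = posterior_sequential
print("sequential posterior:", posterior_sequential)
print("batch posterior:     ", posterior_batch)
print("posterior mean:      ", a / (a + b))
```

In this toy setting the order and grouping of the batches do not matter; only the total counts enter the posterior.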

In fact, it seems positively silly to “correct” models because they had access to more data. Yet frequentist confidence intervals are based on the idea that parameter values are “tested”, that “decisions” are being made, and that error rates need to be controlled. Specifically, a 95% frequentist confidence interval contains all parameter values that would not have been rejected had they been tested with a two-sided alpha-level of .05. Multiple “tests”, however, necessitate a correction if the error rate is to be controlled.
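The frequentist concern about multiple “tests” can be illustrated with a small simulation. The sketch below rests on assumptions of my own choosing: normally distributed data with known standard deviation 1, a true mean of 0, and an interim look after every 10 observations up to 100. At each look it computes an unadjusted 95% interval and checks whether the interval excludes the true value, which is equivalent to rejecting the true value at a two-sided alpha-level of .05. The function name `simulate_looks` and all settings are illustrative.

```python
import math
import random


def simulate_looks(n_sims=2000, looks=range(10, 101, 10), z_crit=1.96, seed=1):
    """Return (error rate across all looks, error rate at the final look only)."""
    rng = random.Random(seed)
    look_set = set(looks)
    max_n = max(look_set)
    missed_at_any_look = 0
    missed_at_final_look = 0

    for _ in range(n_sims):
        running_sum = 0.0
        missed_somewhere = False
        missed_at_end = False
        for n in range(1, max_n + 1):
            running_sum += rng.gauss(0.0, 1.0)  # true mean is 0, sd is 1
            if n in look_set:
                mean = running_sum / n
                half_width = z_crit / math.sqrt(n)  # known-sd 95% interval
                missed = abs(mean) > half_width  # interval excludes the true value
                missed_somewhere = missed_somewhere or missed
                if n == max_n:
                    missed_at_end = missed
        missed_at_any_look += missed_somewhere
        missed_at_final_look += missed_at_end

    return missed_at_any_look / n_sims, missed_at_final_look / n_sims


rate_any, rate_final = simulate_looks()
print("error rate, final look only:", rate_final)
print("error rate, any of the looks:", rate_any)
```

With a single look at the end, the error rate should land near the nominal 5%. When every interim interval counts, the chance that at least one of them excludes the true value is noticeably higher, which is why the frequentist framework demands a correction.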

Here is a thought experiment to drive home the problem: imagine that the corona pandemic is, unbeknownst to us, a galaxy-wide clinical trial conducted by alien medical researchers from a collection of planets near Proxima Centauri. Unfortunately, Earth is the control condition. Now suppose the aliens had not stipulated a stopping rule and are just monitoring the data, adjusting their conclusions and learning as they go along. Clearly, from the earthly perspective of the FDA, this is statistical heresy. Any time the alien researchers issued a prediction or stipulated a confidence interval, the FDA would gleefully point out that such a statement lacks any statistical basis, as “optional stopping” biases the conclusion.

It is arguably unlikely that the corona pandemic is a galaxy-wide clinical trial. But the point is that our statistical methods should not depend on whether it is or isn’t: the data are exactly the same, and those who believe it is sensible for corona models to learn as data come in, without any corrections, will have a hard time explaining why such corrections would be crucial if the pandemic were studied in a clinical trial.

About the Author

Eric-Jan Wagenmakers

Eric-Jan (EJ) Wagenmakers is a professor at the Psychological Methods Group at the University of Amsterdam.