To my shame and regret, I only recently found the opportunity to read the book “Bayesian Philosophy of Science” (BPS), by Jan Sprenger and Stephan Hartmann. It turned out to be a wonderful book in appearance, typesetting, and content. The book confirmed many of my prior beliefs ;-) but it also made me think about the more philosophical topics that I usually avoid thinking about. One example is “the problem of old evidence” (due to Glymour, 1980). Entire forests have been felled in order for philosophers to be able to debate the details of this problem.

Below I will provide my current perspective on this problem, considered by Sprenger and Hartmann to present “one of the most troubling and persistent challenges for Bayesian Confirmation Theory” (p. 132). It is highly likely that my argument is old, or even beside the point. I am not an expert on this particular problem. But first let’s outline the problem.

### Old Evidence

In order to provide the correct context I will quote liberally from BPS:

“The textbook example from the history of science is the precession of the perihelion of the planet Mercury (…). For a long time, Newtonian mechanics failed to account for this phenomenon; and postulated auxiliary hypotheses (e.g., the existence of another planet within the orbit of Mercury) failed to be confirmed. Einstein realized in the 1910s that his General Theory of Relativity (GTR) accounted for the perihelion shift. According to most physicists, explaining this “old evidence” (in the sense of data known previously) conferred a substantial degree of confirmation on GTR, perhaps even more than some pieces of novel evidence, such as Eddington’s 1919 solar eclipse observations. (…)

We can extract a general scheme (…): A phenomenon $E$ is unexplained by the available scientific theories. At some point, it is discovered that theory $T$ accounts for $E$. The observation $E$ is “old evidence”: at the time when the relationship between $T$ and $E$ is developed, the scientist is already certain, or close to certain, that the phenomenon is real. Indeed, in the GTR example, astronomers had been collecting data on the Mercury perihelion shift for many decades.” (BPS, pp. 131-132)

Sprenger and Hartmann then present the problem in the form of Bayes' rule:

\begin{equation} p(T | E) = p(T) \frac{p(E | T)}{p(E)}, \end{equation}

and argue: “When $E$ is old evidence and already known to the scientist, her degree of belief in $E$ is maximal: $p(E) = 1$. Because $T$ predicts $E$, also $p(E | T) = 1$.” It follows that $E$ cannot confirm $T$.
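The step to this conclusion can be spelled out by substituting $p(E) = 1$ and $p(E | T) = 1$ into Bayes' rule as given above:

\begin{equation} p(T | E) = p(T) \frac{p(E | T)}{p(E)} = p(T) \cdot \frac{1}{1} = p(T), \end{equation}

so the posterior belief in $T$ equals the prior belief, and conditioning on $E$ changes nothing.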

### Issue 1: Old Evidence or Unrecognized Evidence?

From my perspective, the problem of old evidence does not highlight a limitation of Bayesian confirmation theory, but a limitation of the human intellect. To make this clear, let’s change the scenario such that $T$ comes first, and $E$ comes later. For instance, assume that Einstein first developed GTR and that the astronomical data on the perihelion shift were collected five years later. Crucially, however, assume that when these astronomical data first became available, nobody thought that they would be relevant to GTR. Epistemically we are then in exactly the same situation as the one that the field was in immediately after Einstein first proposed GTR: in both scenarios there is a theory $T$, there are data $E$, but these data are (falsely!) judged to be non-diagnostic or predictively irrelevant. That is, scientists mistakenly judged that

\begin{equation} \frac{p(E | T)}{p(E | \text{not-}T)} = 1, \end{equation}

that is, theory $T$ neither helps nor hurts in explaining, accounting for, or predicting $E$.

Thus, it does not matter whether $T$ came before or after $E$. What matters is that $E$ was incorrectly judged to be irrelevant. At some point after $E$ has been observed and $T$ has been developed (regardless of the temporal order of these two events), scientists think about the problem more deeply and then discover that they have made an error in judgment, and that $E$ is in fact diagnostic and predictively relevant:

\begin{equation} \frac{p(E | T)}{p(E | \text{not-}T)} \gg 1. \end{equation}

The fault therefore appears to lie not with Bayesian confirmation theory, but with researchers not being omniscient.
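To see numerically why the judged likelihood ratio, rather than the timing of the data, drives confirmation, here is a minimal sketch in Python. It uses the odds form of Bayes' rule (posterior odds = prior odds × likelihood ratio). The prior of 0.2 and the likelihood ratio of 20 are made-up illustrative numbers, not estimates for GTR, and the function name is my own.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability for T using p(E | T) / p(E | not-T)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical prior belief in theory T (an assumption for illustration).
PRIOR_T = 0.2

# E is (mistakenly) judged irrelevant: likelihood ratio of 1.
judged_irrelevant = posterior_probability(PRIOR_T, 1.0)

# E is later recognized as diagnostic: likelihood ratio well above 1
# (20 is an arbitrary illustrative value).
judged_relevant = posterior_probability(PRIOR_T, 20.0)

print(f"Likelihood ratio 1:  p(T | E) = {judged_irrelevant:.3f}")
print(f"Likelihood ratio 20: p(T | E) = {judged_relevant:.3f}")
```

With a likelihood ratio of 1 the belief in $T$ stays at its prior value, and with a ratio of 20 it rises to about 0.83. Nothing in the calculation refers to when $E$ was observed.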

### Issue 2: p(E) = 1?

One may object that the equation immediately above is incorrect, as $E$ is known to have occurred and hence $p(E)$ is simply 1. Perhaps the crux of the problem is meant to be that predictive performance can be assessed only for events whose outcome is still uncertain. I must admit that I find this whole business of arguing that $p(E) = 1$ rather strange. Suppose I tell you that I played a dice game yesterday and my opponent, who is notorious for cheating, rolled four sixes in seven throws. It appears completely legitimate (to me) to assess the probability of this having happened under a fair-die hypothesis versus a loaded-die hypothesis. I view $p(E | T)$ as a prediction that follows from the model $T$, and the *model* does not know whether or not $E$ has occurred.

Another example: suppose we wish to assess the predictive adequacy of a weatherperson W. We provide W with abundant information about the weather, and W then issues a probabilistic prediction about the amount of precipitation for the next day. It does not matter whether this information happens to refer to data from a previous year, from a previous day, or from today (with the amount of precipitation still hidden in the future). We wish to assess the predictive adequacy of W, so all that matters is that W itself is agnostic about the outcome. Epistemically, for W the probability of the outcome is definitely not 1.

The same holds, I would argue, when we wish to assess the predictive performance of statistical models. When the models do not have access to the outcome, it does not matter whether or not that outcome has already manifested itself to the scientist: we are evaluating the models, not the scientists.
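Returning to the dice game, the comparison can be made concrete with a short Python sketch. It rests on two assumptions of mine: “four sixes in seven throws” is read as exactly four sixes, and the loaded die is taken to show a six with probability 0.5 (an arbitrary illustrative value, since the text does not specify how the die is loaded).

```python
from math import comb


def binomial_probability(successes: int, trials: int, p_success: float) -> float:
    """Probability of exactly `successes` successes in `trials` independent trials."""
    return (
        comb(trials, successes)
        * p_success**successes
        * (1 - p_success) ** (trials - successes)
    )


P_SIX_FAIR = 1 / 6
P_SIX_LOADED = 0.5  # assumption: illustrative value for the loaded-die hypothesis

# Probability of yesterday's outcome under each hypothesis.
p_fair = binomial_probability(4, 7, P_SIX_FAIR)
p_loaded = binomial_probability(4, 7, P_SIX_LOADED)

# How much better the loaded-die hypothesis predicted the outcome.
likelihood_ratio = p_loaded / p_fair

print(f"p(E | fair)   = {p_fair:.4f}")
print(f"p(E | loaded) = {p_loaded:.4f}")
print(f"ratio         = {likelihood_ratio:.1f}")
```

Under these assumptions the outcome is about 17.5 times more probable under the loaded-die hypothesis than under the fair-die hypothesis. The calculation is the same whether the throws happened yesterday or will happen tomorrow, because neither hypothesis has access to the outcome.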

#### References

Glymour, C. (1980). Theory and evidence. Princeton, NJ: Princeton University Press.

Sprenger, J., & Hartmann, S. (2019). Bayesian philosophy of science. Oxford: Oxford University Press. https://academic.oup.com/book/36527