Struggling with de Finetti’s Representation Theorem

De Finetti’s Representation Theorem is among the most celebrated results in Bayesian statistics. As I mentioned in an earlier post, I have never really understood its significance. A host of excellent writers have tried to explain why the result is so important [e.g., Lindley (2006, pp. 107-109), Diaconis & Skyrms (2018, pp. 122-125), and the various works by Zabell], but their words just went over my head. Yes, I understand that for an exchangeable series, the probability of the data can be viewed as a weighted mixture over a prior distribution, but this just seemed like an application of Bayes’ rule: you integrate out the parameter to obtain the result. So what’s the big deal?
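For the record, the version of the theorem I have in mind for binary outcomes (my notation, not taken from any of the sources above) says that if the sequence X1, X2, … is exchangeable, then

```latex
P(X_1 = x_1, \ldots, X_n = x_n)
  = \int_0^1 \theta^{s_n} (1 - \theta)^{\,n - s_n} \, d\mu(\theta),
\qquad s_n = \sum_{i=1}^{n} x_i,
```

for some probability measure μ on [0, 1]: exactly the “weighted mixture over a prior” mentioned above, with μ playing the role of the prior.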

Recently I stumbled across a 2004 article by Phil Dawid, one of the most reputable (and original) Bayesian statisticians. In his article, Dawid provides a relatively accessible introduction to the importance of de Finetti’s theorem. In the section “Exchangeability”, Dawid writes:

“Perhaps the greatest and most original success of de Finetti’s methodological program is his theory of exchangeability (de Finetti, 1937). When considering a sequence of coin-tosses, for example, de Finetti does not assume—as would typically be done automatically and uncritically—that these must have the probabilistic structure of Bernoulli trials. Instead, he attempts to understand when and why this Bernoulli model might be reasonable. In accordance with his positivist position, he starts by focusing attention directly on Your personal joint probability distribution for the potentially infinite sequence of outcomes (X1, X2, …) of the tosses—this distribution being numerically fully determined (and so, in particular, having no “unknown parameters”). Exchangeability holds when this joint distribution is symmetric, in the sense that Your uncertainty would not be changed even if the tosses were first to be relabelled in some fixed but arbitrary way (so that, e.g., X1 now refers to toss 5, X2 to toss 21, X3 to toss 1, etc.). In many applied contexts You would be willing to regard this as an extremely weak and reasonable condition to impose on Your personal joint distribution, at least to an acceptable approximation. de Finetti’s famous representation theorem now implies that, assuming only exchangeability, we can deduce that Your joint distribution is exactly the same as if You believed in a model of Bernoulli trials, governed by some unknown parameter p, and had personal uncertainty about p (expressed by some probability distribution on [0,1]). In particular, You would give probability 1 to the existence of a limiting relative frequency of H’s in the sequence of tosses, and could take this limit as the definition of the “parameter” p.
Because it can examine frequency conceptions of Probability from an external standpoint, the theory of personal probability is able to cast new light on these—an understanding that is simply unavailable to a frequentist, whose very conception of probability is already based on ideas of frequency. Even more important, from this external standpoint these frequency interpretations are seen to be relevant only in very special setups, rather than being fundamental: for example, there is no difficulty in principle to extending the ideas and mathematics of exchangeability to two-dimensional, or still more complicated, arrays of variables (Dawid 1982a, 1985c).” (Dawid, 2004, pp. 45-46; italics in original)
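Dawid’s last point—that You would give probability 1 to a limiting relative frequency and could define the “parameter” p as that limit—is easy to see in simulation. A minimal sketch (the uniform prior on the bias, the sample size, and the seed are my choices, purely for illustration):

```python
import random

random.seed(1)

# Draw a bias p from a prior (here uniform on [0, 1], an assumption
# for illustration), then toss a p-coin many times. The running
# relative frequency of heads settles down near the drawn p -- the
# quantity that the representation theorem lets You treat as the
# "parameter" of a Bernoulli model.
p = random.random()
tosses = [random.random() < p for _ in range(100_000)]
freq = sum(tosses) / len(tosses)
print(f"drawn p = {p:.4f}, relative frequency after 100k tosses = {freq:.4f}")
```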

Although this is (much) clearer than most of what I’ve read before, this does reinforce my impression that if you already buy into the Bayesian formalism there are no amazing new insights to be obtained. I feel I am on thin ice, partly because so many highly knowledgeable statisticians seem to be continually celebrating this Representation Theorem. Consider, for instance, the following fawning fragment from Diaconis and Skyrms (2018; this is one of the most interesting books on statistics from recent years). After explaining the coin toss setup and the concept of exchangeability, Diaconis and Skyrms first mention that “Any uncertainty about the bias of the coin in independent trials gives exchangeable degrees of belief”. Then:

“De Finetti proved the converse. Suppose your degrees of belief—the judgmental probabilities of chapter 2—about outcome sequences are exchangeable. Call an infinite sequence of trials exchangeable if all of its finite initial segments are. De Finetti proved that every such exchangeable sequence can be gotten in just this way. It is just as if you had independence in the chances and uncertainty about the bias. It is just as if you were Thomas Bayes.” (Diaconis & Skyrms, 2018, p. 124; italics in original)

The words have rhythm and are well-chosen, but for me they do not translate to immediate insight. What is meant by “in just this way”? What is meant by the construction “It is just as if”? Diaconis and Skyrms continue:

“What the prior over the bias would be in Bayes is determined in the representation. Call this the [sic] imputed prior probability over chances the de Finetti prior. If your degrees of belief about outcome sequences have a particular symmetry, exchangeability, they behave just as if they are gotten from a chance model of coin flipping with an unknown bias and with de Finetti prior over the bias.
     So it is perfectly legitimate to use Bayes’ mathematics even if we believe that chance does not exist, as long as our degrees of belief are exchangeable.” (Diaconis & Skyrms, 2018, p. 124; italics in original)

Yeah OK, but as a Bayesian I have always viewed probability as a reasonable degree of belief, an intensity of conviction, or a numerical expression of one’s lack of knowledge. I don’t need convincing that “probability does not exist” in some sort of objective form as a property of an object. We continue:

“De Finetti’s theorem helps dispel the mystery of where the prior belief over the chances comes from. From exchangeable degrees of belief, de Finetti recovers both the chance statistical model of coin flipping and the Bayesian prior probability over the chances. The mathematics of inductive inference is just the same. If you were worried about where Bayes’ priors came from, if you were worried about whether chances exist, you can forget your worries. De Finetti has replaced them with a symmetry condition on degrees of belief. This is, we think you will agree, a philosophically sensational result.” (Diaconis & Skyrms, 2018, p. 124; italics in original)

This is rhetorically strong and wonderfully written, but I’m still missing the point. I don’t need to forget any worries, because I was never worried to begin with. Where do the priors come from? From my lack of knowledge concerning the data-generating process. Does chance exist? Well, in Jeffreys’s conceptualization, probability is a degree of reasonable belief, and chance is a degree of reasonable belief that is unaffected by the outcome of other trials.
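Still, the “just as if” claim is at least easy to verify computationally. Below is a minimal sketch (the Beta(2, 2) prior and the particular sequence are my choices, purely for illustration): integrating a Bernoulli likelihood against a Beta prior gives a joint probability that depends only on the number of heads, so every reordering of the same outcomes is equally probable—i.e., the mixture is exchangeable.

```python
import math
from itertools import permutations

def seq_prob(seq, a=2.0, b=2.0):
    """P(X1 = x1, ..., Xn = xn) under Bernoulli(p) trials with a Beta(a, b)
    prior on p, with p integrated out. The Beta-Bernoulli identity gives
    B(a + s, b + n - s) / B(a, b), where s is the number of heads."""
    n, s = len(seq), sum(seq)
    return math.exp(math.lgamma(a + s) + math.lgamma(b + n - s)
                    - math.lgamma(a + b + n)
                    - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))

# Exchangeability: every reordering of the same outcomes is equally probable,
# because seq_prob depends on seq only through its length and its sum.
seq = (1, 1, 0, 1, 0)
probs = {seq_prob(perm) for perm in set(permutations(seq))}
assert len(probs) == 1  # all orderings share a single probability
print(f"P(any ordering of 3 heads, 2 tails) = {seq_prob(seq):.5f}")
```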

My student Fabian Dablander has reviewed the Diaconis & Skyrms book for the journal Significance, so maybe he can explain the relevance of de Finetti’s representation theorem to me. I’ll keep you posted. Checking out Fabian’s review, I see that he references Bernardo (1996), which is also a relatively clear paper on the Representation Theorem. After describing the theorem, Bernardo concludes:

“The representation theorem—a pure probability theory result—proves that if observations are judged to be exchangeable, then they must indeed be a random sample from some model and there must exist a prior probability distribution over the parameter of the model, hence requiring a Bayesian approach.” (Bernardo, 1996, p. 3; italics in original)

This reinforces my current view that the theorem is not “sensational” for those who are already devout Bayesians. However, there’s a substantial probability that I’m wrong (or else, why the fuss), so to be continued…

References

Bernardo, J. M. (1996). The concept of exchangeability and its applications. Far East Journal of Mathematical Sciences, 111-122.

Dablander, F. (2018). In Review: Ten Great Ideas About Chance. Significance.

Dawid, A. P. (2004). Probability, causality and the empirical world: A Bayes-de Finetti-Popper-Borel synthesis. Statistical Science, 19, 44-57.

Diaconis, P., & Skyrms, B. (2018). Ten great ideas about chance. Princeton: Princeton University Press.

Lindley, D. V. (2006). Understanding uncertainty. Hoboken: Wiley.

About The Author

Eric-Jan Wagenmakers

Eric-Jan (EJ) Wagenmakers is professor at the Psychological Methods Group at the University of Amsterdam.