The 106-year-old game “Mens Erger Je Niet!” (a German invention) involves players tossing a die and then moving a set of tokens around the board. The winner is the first player to bring all of their tokens home. The English version is known as Ludo, and the American versions are Parcheesi and Trouble. The outcry “Mens Erger Je Niet!” translates to “Don’t Get So Annoyed!”, because it is actually quite frustrating when your token cannot even enter the game (because you fail to throw the required 6 to start) or when your token is almost home, only to be “hit” by someone else’s token, causing it to be sent all the way back to its starting position.

Some modern versions of the game come with a “die machine”; instead of throwing the die, players hit a small plastic dome, which makes the die inside jump up, bounce against the dome, spin around, and land. But is this dome-die fair? One of us (EJ) who had experience with this machine felt that although the pips may come up about equally often, there would be a sequential dependency in the outcomes. Specifically, EJ’s original hypothesis was motivated by the observation that the dome sometimes misfires — it is depressed but the die does not jump. In other words, a “1” is more likely to be followed by a “1” than by a different number, a “2” more likely to be followed by a “2”, etc. Some of this action can be seen in the gif below:

To study this important matter in greater detail, one of us (EJ still) “threw” the die 1000 times. First we’ll use the Bayesian multinomial test in JASP to check whether the pip numbers indeed come up about equally often. The descriptives table looks as follows:

The associated figure suggests that nothing spectacular is going on:

And indeed the default Bayes factor is 200,000 in favor of the null hypothesis of equal proportions.

The crucial hypothesis, however, was that there would be a preponderance of repeats. As it turned out, this hypothesis was strongly contradicted by the data. One of us (Quentin) analyzed the transition matrix and discovered that, instead, there is a preponderance of “opposites”.

For instance, a throw showing a “6” (the pip count on its upper side) tended to be followed by a throw showing a “1” (which had been the pip count on the lower side). In general, the pips on the upper and lower side add to 7. If the die is fair, such “opposite outcomes” should occur with probability 1/6 or 0.1667. However, the actual sequence of 999 opportunities yielded 289 opposites, almost twice as many as expected if the die were fair. A default one-sided binomial test in JASP yields overwhelming evidence against the fair die hypothesis and in favor of the opposite-hypothesis:
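For readers without JASP at hand, this default test is straightforward to sketch in Python. The sketch below assumes JASP's default prior setup for the one-sided binomial test (a uniform Beta(1, 1) prior on the success rate, truncated to θ > 1/6 for the one-sided alternative); this description of the default is our assumption, not stated in the text above.

```python
# Default one-sided binomial Bayes factor:
# point null theta = 1/6 vs. a uniform prior on theta in (1/6, 1).
from scipy import stats

n, k, theta0 = 999, 289, 1 / 6  # opportunities, observed opposites, null rate

# Marginal likelihood under the fair-die hypothesis (point null)
m0 = stats.binom.pmf(k, n, theta0)

# Marginal likelihood under the opposite-hypothesis: integrate the binomial
# likelihood against a uniform prior truncated to (theta0, 1).  The integral
# of binom.pmf(k, n, theta) over theta in (0, 1) equals 1/(n+1); beta.sf
# gives the fraction of that mass lying above theta0.
m1 = stats.beta.sf(theta0, k + 1, n - k + 1) / ((n + 1) * (1 - theta0))

bf_plus0 = m1 / m0
print(f"BF+0 = {bf_plus0:.3g}")
```

With these data the Bayes factor comes out on the order of 10^18, in line with the figure quoted in the text.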

With the power of hindsight, the opposite-hypothesis makes some sense: as the die jumps up, it spins and hits the dome before it has made a complete turn; the dome prevents complete turns and biases the die toward half-turns. However, the opposite-hypothesis was unexpected and post-hoc — it was completely motivated by the data that were then used to test it. So how should we assess the evidence in favor of the opposite-hypothesis?

From a purely subjective Bayesian perspective, the evidence is the evidence, and the data are really 5,840,000,000,000,000,000 (i.e., 5.84 × 10^18) times more likely under the opposite-hypothesis than under the fair-die hypothesis, no matter how the opposite-hypothesis was obtained. But posterior plausibility is a combination of evidence and prior plausibility. What is the prior plausibility of the opposite-hypothesis? Well, it is difficult to say, mainly because hindsight bias will cloud our judgment (which is why preregistration is helpful, even for Bayesians). It does seem likely, however, that the prior probability for the opposite-hypothesis is larger than 1 in 100,000, which would still make its posterior plausibility near 1.
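The arithmetic behind that last claim is Bayes' rule in odds form (posterior odds = Bayes factor × prior odds), and it takes three lines to verify:

```python
# Posterior plausibility of the opposite-hypothesis from odds-form Bayes' rule
bf = 5.84e18                      # Bayes factor reported in the text
prior_prob = 1 / 100_000          # conservative prior plausibility
prior_odds = prior_prob / (1 - prior_prob)
posterior_odds = bf * prior_odds
posterior_prob = posterior_odds / (1 + posterior_odds)
print(posterior_prob)             # extremely close to 1
```

Even this deliberately skeptical prior is overwhelmed by the evidence: the posterior probability differs from 1 only in roughly the fourteenth decimal place.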

However, to make absolutely sure, one of us (EJ) tossed the die some more — this time, for 1001 throws. Again, the data supported the hypothesis that the pip numbers are uniform (results not shown). For the hypothesis under scrutiny, out of a total possible 1000 opportunities, 302 were opposites. The evidence is again overwhelming:

This is a compelling replication of a surprising result — and one of the first that we have been able to demonstrate in our lab. For completeness, we also give the result for the combined data set, when all data are analysed simultaneously. Then, among 1999 opportunities, 591 are opposites, for a proportion near .30, almost twice as high as expected under the fair-die hypothesis. The evidence is overwhelming:

Although the replication experiment was not strictly necessary (the evidence was too strong, the opposite-hypothesis too plausible), it does reassure us. Based on the Bayes factor for the complete data set, and the Bayes factor for the original data set, we could compute the replication Bayes factor, that is, the evidence that the second data set adds on top of the first (Ly et al., 2019). The .jasp file containing the analyses and the data can be obtained from https://osf.io/swczj/.
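Because marginal likelihoods multiply across independent batches of data, the replication Bayes factor of Ly et al. (2019) is simply the ratio of the combined and original Bayes factors. A sketch using the default one-sided binomial test (point null θ = 1/6 against a uniform prior truncated to θ > 1/6; our reconstruction of the default, not spelled out in the text):

```python
from scipy import stats

def bf_plus0(k, n, theta0=1/6):
    """One-sided binomial Bayes factor: theta > theta0 vs. theta = theta0,
    with a uniform Beta(1, 1) prior truncated to (theta0, 1)."""
    m0 = stats.binom.pmf(k, n, theta0)
    m1 = stats.beta.sf(theta0, k + 1, n - k + 1) / ((n + 1) * (1 - theta0))
    return m1 / m0

bf_original = bf_plus0(289, 999)      # first data set
bf_combined = bf_plus0(591, 1999)     # both data sets analysed together
bf_replication = bf_combined / bf_original  # evidence added by the second set
```

The replication Bayes factor is itself astronomically large, confirming that the second data set adds substantial evidence on top of the first.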

Ly, A., Etz, A., Marsman, M., & Wagenmakers, E.-J. (2019). Replication Bayes factors from evidence updating. *Behavior Research Methods, 51*, 2498-2508.

Eric-Jan (EJ) Wagenmakers is professor at the Psychological Methods Group at the University of Amsterdam.

Quentin is a PhD candidate at the Psychological Methods Group of the University of Amsterdam.

Apart from the merits and demerits of our specific analysis, it strikes us as undesirable that important clinical trials are analyzed in only one way — that is, based on the efforts of a single data-analyst, who operates within a single statistical framework, using a single statistical test, drawing a specific set of all-or-none conclusions. Instead, it seems prudent to present, alongside the original article, a series of brief comments that contain alternative statistical analyses; if these confirm the original result, this inspires trust in the conclusion; if these alternative analyses contradict the original result, this is grounds for caution and a deeper reflection on what the data tell us. Either way, we learn something important that we did not know before.

Anyhow, the latest installment in our collection concerns a Bayesian reanalysis of the SWEPIS clinical trial. The preprint is Wagenmakers & Ly (2020), and its contents are copied below.

In a recent randomized clinical trial, Wennerholm and colleagues (2019) compared induction of labour at 41 weeks with expectant management and induction at 42 weeks. The primary endpoint was defined as “a composite perinatal outcome including one or more of stillbirth, neonatal mortality, Apgar score less than 7 at five minutes, pH less than 7.00 or metabolic acidosis (pH <7.05 and base deficit >12 mmol/L) in the umbilical artery, hypoxic ischaemic encephalopathy, intracranial haemorrhage, convulsions, meconium aspiration syndrome, mechanical ventilation within 72 hours, or obstetric brachial plexus injury.” The trial randomly assigned 1381 women to the induction group and 1379 women to the expectant management group. For the primary outcome measure, the trial found no effect: “The composite primary perinatal outcome did not differ between the groups: 2.4% (33/1381) in the induction group and 2.2% (31/1379) in the expectant management group.” However, the trial was stopped early, because six perinatal deaths occurred in the expectant management group, whereas none occurred in the induction group.^{1} As the authors describe, “On 2 October 2018 the Data and Safety Monitoring Board strongly recommended the SWEPIS steering committee to stop the study owing to a statistically significant higher perinatal mortality in the expectant management group. Although perinatal mortality was a secondary outcome, it was not considered ethical to continue the study.” The authors conclude that “Although these results should be interpreted cautiously, induction of labour ought to be offered to women no later than at 41 weeks and could be one (of few) interventions that reduces the rate of stillbirths.”

The p-value of Wennerholm and colleagues leaves unaddressed the extent to which the data undercut or support the hypothesis that induction at 41 weeks reduces the rate of stillbirths. This is important, because if the evidence turns out to be weak, then it may be argued that the SWEPIS trial was stopped prematurely, and the SWEPIS data offer limited grounds for changing medical practice.

Here we conduct a Bayesian test for two proportions (Kass & Vaidyanathan, 1992; Gronau et al., 2019; i.e., logistic regression with group membership as the predictor) to quantify the evidence from the SWEPIS trial that induction of labour at 41 weeks reduces the rate of stillbirths. Under the no-effect model H0, the log odds ratio ψ equals 0, whereas under the positive-effect model H+, ψ is assigned a positive-only normal prior N+(μ, σ). A default analysis (i.e., μ = 0, σ = 1) reveals moderate evidence for H+: the data are 3.32 times more likely under the hypothesis that induction at 41 weeks is beneficial than under the hypothesis that it is ineffective. When H0 and H+ are deemed equally likely a priori, this observed level of evidence increases the probability for H+ from 0.50 to 0.77, leaving a sizable probability of 0.23 for H0.
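A rough numerical sketch of such a test can be written with brute-force integration. The parametrization below (grand mean β of the log odds with a standard normal prior, log odds ratio ψ with a positive-only standard normal prior) is our assumption of the defaults; the exact parametrization and nuisance prior in Gronau et al. may differ in detail, so this is an approximation of the JASP analysis, not a reimplementation of it:

```python
import numpy as np
from scipy import stats

# SWEPIS perinatal deaths: 0/1381 (induction), 6/1379 (expectant management)
y1, n1 = 0, 1381
y2, n2 = 6, 1379

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Grids for the grand mean (beta) and the log odds ratio (psi > 0)
beta = np.linspace(-12.0, -2.0, 600)
psi = np.linspace(0.0, 5.0, 400)
dbeta, dpsi = beta[1] - beta[0], psi[1] - psi[0]
B, P = np.meshgrid(beta, psi, indexing="ij")

# Likelihood with logit(p1) = beta - psi/2 (induction),
#                 logit(p2) = beta + psi/2 (expectant management)
like = (stats.binom.pmf(y1, n1, logistic(B - P / 2)) *
        stats.binom.pmf(y2, n2, logistic(B + P / 2)))

# Assumed priors: beta ~ N(0, 1); psi ~ N(0, 1) truncated to psi > 0
joint = like * stats.norm.pdf(B) * 2 * stats.norm.pdf(P)
m_plus = joint.sum() * dbeta * dpsi            # marginal likelihood of H+

like_null = (stats.binom.pmf(y1, n1, logistic(beta)) *
             stats.binom.pmf(y2, n2, logistic(beta)))
m_null = (like_null * stats.norm.pdf(beta)).sum() * dbeta  # marginal of H0

bf_plus0 = m_plus / m_null
print(f"BF+0 approx. {bf_plus0:.2f}")
```

Under these assumed priors the grid approximation lands in the same "moderate evidence" range as the value reported in the text.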

A sensitivity analysis examines the strength of the evidence across a range of values for the prior mean μ and prior standard deviation σ of ψ under H+; as is apparent from the legend of Figure 1, the evidence never exceeds 5.4. In other words, with equal prior probability for H0 and H+, the posterior probability for H0 is never less than 0.16.

Figure 1. Across a range of different priors, the evidence for the positive-effect model H+ over the no-effect model H0 is relatively weak and does not exceed 5.4. Figure from JASP.

In addition to hypothesis testing one may also inspect the posterior distribution for ψ under a two-sided model that assigns ψ a standard normal distribution prior to data observation. As Figure 2 shows, the posterior distribution is relatively wide (note that this distribution ignores the possibility that ψ = 0 exactly).

Figure 2. Prior and posterior distribution for the log odds ratio ψ under an unconstrained model that assigns ψ a standard normal distribution. Figure from JASP.

In sum, the SWEPIS data indeed support the hypothesis that induction of labour at 41 weeks of pregnancy is associated with a lower rate of stillbirths. However, the degree of this support is moderate at best, and arguably provides insufficient ground for terminating the study. Note that premature study termination comes at a cost — here, the cost is that the experiment ended up providing ambiguous results that yield a poor basis for changes in medical policy, leaving the field in epistemic limbo. In general, it seems hazardous to terminate clinical studies on the basis of a single result, without converging support of a Bayesian analysis.

^{1} Because the induction group has zero perinatal deaths, the one-sided p-value equals the two-sided p-value.

Gronau, Q. F., Raj, K. N. A., & Wagenmakers, E.-J. (2019). Informed Bayesian inference for the A/B test. Manuscript submitted for publication and available on ArXiv: http://arxiv.org/abs/1905.02068.

Kass, R. E., & Vaidyanathan, S. K. (1992). Approximate Bayes factors and orthogonal parameters, with application to testing equality of two binomial proportions. *Journal of the Royal Statistical Society: Series B (Methodological), 54*, 129-144.

Wagenmakers, E.-J., & Ly, A. (2020). Bayesian scepsis about SWEPIS: Quantifying the evidence that early induction of labour prevents perinatal deaths.

Wennerholm, U. B., Saltvedt, S., Wessberg, A., et al. (2019). Induction of labour at 41 weeks versus expectant management and induction of labour at 42 weeks (SWEdish Post-term Induction Study, SWEPIS): Multicentre, open label, randomised, superiority trial. BMJ, 367:l6131.

Eric-Jan (EJ) Wagenmakers is professor at the Psychological Methods Group at the University of Amsterdam.

Alexander Ly is a postdoc at the Psychological Methods Group at the University of Amsterdam.

- In a dissenting opinion on the 1950 UNESCO report “The race question”, Fisher argued that “Available scientific knowledge provides a firm basis for believing that the groups of mankind differ in their innate capacity for intellectual and emotional development”.
- Fisher strongly, repeatedly, and persistently opposed the conclusion that smoking is a cause of lung cancer.
- Fisher felt that “The theory of inverse probability [i.e., Bayesian statistics] is founded upon an error, and must be wholly rejected.” (for details see Aldrich, 2008).
- In *The Design of Experiments*, Fisher argued that “it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.” (1935, p. 16). This confession should be shocking, because it means that we cannot quantify evidence for a scientific law. As Jeffreys (1961, p. 377) pointed out, in Fisher’s procedure the law (i.e., the null hypothesis) “is merely something set up like a coconut to stand until it is hit”.

The next section discusses another shocking statement, one that has been conveniently forgotten and flies in the face of current statistical practice.

Chapter 2 of *The Design of Experiments* is titled “The Principles of Experimentation Illustrated by a Psycho-Physical Experiment”. Here Fisher introduces the famous case of the lady tasting tea:

“A lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup. We will consider the problem of designing an experiment by means of which this assertion can be tested (…)

Our experiment consists in mixing eight cups of tea, four in one way and four in the other, and presenting them to the subject for judgment in a random order. (…)

Her task is to divide the 8 cups into two sets of 4, agreeing, if possible, with the treatments received.” (Fisher, 1935, p. 11)
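Fisher's design pins down the null distribution exactly: if the lady has no discriminative ability, her division of the eight cups is a random choice among the C(8, 4) equally likely ways of selecting four cups, so even a perfect classification attains a significance level of only 1/70:

```python
from math import comb

# Number of ways to select which 4 of the 8 cups had milk added first
partitions = comb(8, 4)        # 70 equally likely divisions under the null
p_perfect = 1 / partitions     # chance of classifying every cup correctly
print(partitions, round(p_perfect, 3))
```

This is why Fisher used eight cups rather than, say, four: with only C(4, 2) = 6 divisions, a perfect performance would carry a chance probability of 1/6, too large to be persuasive.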

We have already seen above that a nonsignificant result (usually p>.05) cannot be used to quantify support in favor of the null hypothesis that the lady’s discriminatory ability is illusory. But what of a significant result (usually p<.05)? Surely, when we reject the null hypothesis we can now embrace the hypothesis that the lady *does* have discriminatory abilities? But Fisher emphatically denies this:

“It might be argued that if an experiment can disprove the hypothesis that the subject possesses no sensory discrimination between two different sorts of object, it must therefore be able to prove the opposite hypothesis, that she can make some such discrimination. But this last hypothesis, however reasonable or true it may be, is ineligible as a null hypothesis to be tested by experiment, because it is inexact. [italics ours] If it were asserted that the subject would never be wrong in her judgments we should again have an exact hypothesis, and it is easy to see that this hypothesis could be disproved by a single failure, but could never be proved by any finite amount of experimentation. It is evident that the null hypothesis must be exact, that is free from vagueness and ambiguity, because it must supply the basis of the “problem of distribution,” of which the test of significance is the solution.” (Fisher, 1935, p. 16)

Here we stand. It is common knowledge that a nonsignificant p-value cannot be used to support the null hypothesis (according to Fisher). What is not generally known is that, according to Fisher, a *significant* p-value does not warrant acceptance of the alternative hypothesis. In other words, the only legitimate inference is that p<.05 (say) undercuts the null hypothesis. This does NOT mean that the result favors the alternative hypothesis! Not only is this counterintuitive, but we believe that it violently conflicts with the way in which practitioners interpret their p-values. The purpose of most researchers is to make a positive claim (“there is evidence for the presence of X”); we speculate that most researchers believe that such claims can be made from significant p-values, that is, “p<.05, there is evidence against the absence of X” will quickly be interpreted as “p<.05, there is evidence for the presence of X”.

Shocking.

Aldrich, J. (2008). R. A. Fisher on Bayes and Bayes’ theorem. *Bayesian Analysis, 3*, 161-170.

Fisher, R. A. (1935). The design of experiments. Edinburgh: Oliver & Boyd.

Eric-Jan (EJ) Wagenmakers is professor at the Psychological Methods Group at the University of Amsterdam.

Johnny van Doorn is a PhD candidate at the Psychological Methods department of the University of Amsterdam.


“Meta-analysis is an important quantitative tool for cumulative science, but its application is frustrated by publication bias. In order to test and adjust for publication bias, we extend model-averaged Bayesian meta-analysis with selection models. The resulting Robust Bayesian Meta-analysis (RoBMA) methodology does not require all-or-none decisions about the presence of publication bias, is able to quantify evidence in favor of the absence of publication bias, and performs well under high heterogeneity. By model-averaging over a set of 12 models, RoBMA is relatively robust to model misspecification, and simulations show that it outperforms existing methods. We demonstrate that RoBMA finds evidence for the absence of publication bias in Registered Replication Reports and reliably avoids false positives. We provide an implementation in R and JASP so that researchers can easily apply the new methodology to their own data.”

“Selection models use weighted distributions to account for the proportion of studies that are missing because they yielded non-significant results. The researcher specifies the p-value cut-offs that drive publication bias (usually p = .05). The selection model then estimates how likely studies in nonsignificant intervals are to be published compared to the interval with the highest publication probability (usually p < .05). The pooled effect size estimate accounts for the estimated publication bias by giving more weight to studies in intervals with lower publication probability (usually non-significant studies).”
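The weighting idea behind selection models can be sketched for a single study and a single cut-off. This is a toy version under assumed conventions (one two-sided cut-off at α = .05, publication probability 1 for significant results and a relative probability ω for nonsignificant ones), not the RoBMA implementation:

```python
import numpy as np
from scipy import stats

def weighted_loglik(y, se, theta, omega, alpha=0.05):
    """Log-likelihood of an observed effect estimate y (standard error se)
    under a one-cutoff selection model: significant results (two-sided
    p < alpha) are published with probability 1, nonsignificant results
    with relative probability omega.  theta is the true effect."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    w = 1.0 if abs(y / se) > z_crit else omega  # weight of the observed study
    # Probability that a study drawn from N(theta, se^2) is nonsignificant
    p_nonsig = (stats.norm.cdf(z_crit * se, theta, se)
                - stats.norm.cdf(-z_crit * se, theta, se))
    # Renormalize the weighted density so it integrates to one
    norm_const = (1 - p_nonsig) + omega * p_nonsig
    return stats.norm.logpdf(y, theta, se) + np.log(w) - np.log(norm_const)
```

With ω = 1 the model reduces to the ordinary normal likelihood; with ω < 1, significant studies are effectively down-weighted in the effect size estimate because their publication was more likely.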

“We propose Robust Bayesian Meta-Analysis (RoBMA), a Bayesian multi-model method that aims to overcome the limitations of existing procedures. RoBMA is an extension of BMA obtained by adding selection models to account for publication bias. This allows model-averaging across a larger set of models, ones that assume publication bias and ones that do not.”

“It is hard to assess the performance of different methods on published meta-analyses since the true parameters are usually unknown. However, it is possible to assess the false positive rate of tests for publication bias using Registered Replication Reports (Chambers, 2013, 2019). Here we know that all primary studies are published regardless of the result; therefore, if a method detects publication bias, this is a false positive finding. In addition, Registered Replication Reports allow an empirical test of RoBMA’s ability to quantify evidence in favor of the absence of publication bias.”

“In this paper we introduced a robust Bayesian meta-analysis that model-averages over selection models as well as fixed and random effects models. By applying a set of twelve models simultaneously our method respects the underlying uncertainty when deciding between different meta-analytical models and is comparatively robust to model misspecification. RoBMA also performs well in different simulation conditions and correctly finds support for the absence of publication bias in the Many Labs 2 example. Besides this ability to quantify the evidence for absence of publication bias, the Bayesian approach also allows to update evidence sequentially as studies accumulate, addressing recent concerns about accumulation bias (ter Schure & Grünwald, 2019).”

“To conclude, this work offers applied researchers a new, conceptually straightforward method to conduct meta-analysis. Instead of basing conclusions on a single model, our method is based on keeping all models in play, with the data determining model importance according to predictive success. The simulations and the example suggest that RoBMA is a promising new method in the toolbox of various approaches to test and adjust for publication bias in meta-analysis.”

Maier, M., Bartoš, F., & Wagenmakers, E.-J. (2020). Robust Bayesian meta-analysis: Addressing publication bias with model-averaging. https://psyarxiv.com/u4cns

Maximilian Maier is a Research Master student in psychology at the University of Amsterdam.

František Bartoš is a Research Master student in psychology at the University of Amsterdam.

Recently I stumbled across a 2004 article by Phil Dawid, one of the most reputable (and original) Bayesian statisticians. In his article, Dawid provides a relatively accessible introduction to the importance of de Finetti’s theorem. In the section “Exchangeability”, Dawid writes:

“Perhaps the greatest and most original success of de Finetti’s methodological program is his theory of *exchangeability* (de Finetti, 1937). When considering a sequence of coin-tosses, for example, de Finetti does not assume—as would typically be done automatically and uncritically—that these must have the probabilistic structure of Bernoulli trials. Instead, he attempts to understand when and why this Bernoulli model might be reasonable. In accordance with his positivist position, he starts by focusing attention directly on Your personal joint probability distribution for the potentially infinite sequence of *outcomes* (X₁, X₂, …) of the tosses—this distribution being numerically fully determined (and so, in particular, having no “unknown parameters”). Exchangeability holds when this joint distribution is symmetric, in the sense that Your uncertainty would not be changed even if the tosses were first to be relabelled in some fixed but arbitrary way (so that, e.g., X₁ now refers to toss 5, X₂ to toss 21, X₃ to toss 1, etc.). In many applied contexts You would be willing to regard this as an extremely weak and reasonable condition to impose on Your personal joint distribution, at least to an acceptable approximation. de Finetti’s famous representation theorem now implies that, assuming *only* exchangeability, we can deduce that Your joint distribution is exactly the same *as if* You believed in a model of Bernoulli trials, governed by some unknown parameter *p*, and had personal uncertainty about *p* (expressed by some probability distribution on [0,1]). In particular, You would give probability 1 to the existence of a limiting relative frequency of H’s in the sequence of tosses, and could take this limit as the definition of the “parameter” *p*. Because it can examine frequency conceptions of Probability from an external standpoint, the theory of personal probability is able to cast new light on these—an understanding that is simply unavailable to a frequentist, whose very conception of probability is already based on ideas of frequency. Even more important, from this external standpoint these frequency interpretations are seen to be relevant only in very special setups, rather than being fundamental: for example, there is no difficulty in principle to extending the ideas and mathematics of exchangeability to two-dimensional, or still more complicated, arrays of variables (Dawid 1982a, 1985c).” (Dawid, 2004, pp. 45-46; italics in original)
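The direction de Finetti proved is the surprising one, but the forward direction (mixing i.i.d. coin flips over a random bias yields an exchangeable sequence whose limiting relative frequency recovers the drawn bias) can be illustrated in a few lines; the Beta(2, 2) prior below is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw a bias from a "de Finetti prior" (arbitrary Beta(2, 2) choice),
# then flip i.i.d. coins with that bias; the resulting sequence is exchangeable
p = rng.beta(2, 2)
flips = rng.random(100_000) < p   # i.i.d. Bernoulli(p) given the drawn bias
freq = flips.mean()

# The limiting relative frequency recovers the drawn bias p
print(abs(freq - p))
```

Marginally, i.e. averaging over the prior, the flips are exchangeable but not independent: observing early heads shifts belief about p and hence about later flips, which is exactly the learning that de Finetti's mixture representation licenses.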

Although this is (much) clearer than most of what I’ve read before, this does reinforce my impression that if you already buy into the Bayesian formalism there are no amazing new insights to be obtained. I feel I am on thin ice, partly because so many highly knowledgeable statisticians seem to be continually celebrating this Representation Theorem. Consider, for instance, the following fawning fragment from Diaconis and Skyrms (2018; this is one of the most interesting books on statistics from recent years). After explaining the coin toss setup and the concept of exchangeability, Diaconis and Skyrms first mention that “Any uncertainty about the bias of the coin in independent trials gives exchangeable degrees of belief”. Then:

“De Finetti proved the converse. Suppose your degrees of belief—the judgmental probabilities of chapter 2—about outcome sequences are exchangeable. Call an infinite sequence of trials exchangeable if all of its finite initial segments are. De Finetti proved that every such exchangeable sequence can be gotten in just this way. It is just *as if* you had independence in the chances and uncertainty about the bias. It is just *as if* you were Thomas Bayes.” (Diaconis & Skyrms, 2018, p. 124; italics in original)

The words have rhythm and are well-chosen, but for me they do not translate to immediate insight. What is meant by “in just this way”? What is meant by the construction “It is just as if”? Diaconis and Skyrms continue:

“What the prior over the bias would be in Bayes is determined in the representation. Call this the [sic] imputed prior probability over chances the *de Finetti prior*. If your degrees of belief about outcome sequences have a particular symmetry, *exchangeability*, they behave *just as if* they are gotten from a chance model of coin flipping with an unknown bias and with de Finetti prior over the bias.

So it is perfectly legitimate to use Bayes’ mathematics even if we believe that chance does not exist, as long as our degrees of belief are exchangeable.” (Diaconis & Skyrms, 2018, p. 124; italics in original)

Yeah OK, but as a Bayesian I have always viewed probability as a reasonable degree of belief, an intensity of conviction, or a numerical expression of one’s lack of knowledge. I don’t need convincing that “probability does not exist” in some sort of objective form as a property of an object. We continue:

“De Finetti’s theorem helps dispel the mystery of where the prior belief over the chances comes from. From exchangeable degrees of belief, de Finetti recovers both the chance statistical model of coin flipping and the Bayesian prior probability over the chances. The mathematics of inductive inference is just the same. If you were worried about where Bayes’ priors came from, if you were worried about whether chances exist, you can forget your worries. De Finetti has replaced them with a symmetry condition on degrees of belief. This is, we think you will agree, a philosophically sensational result.” (Diaconis & Skyrms, 2018, p. 124; italics in original)

This is rhetorically strong and wonderfully written, but I’m still missing the point. I don’t need to forget any worries, because I was never worried to begin with. Where do the priors come from? From my lack of knowledge concerning the data-generating process. Does chance exist? Well, in Jeffreys’s conceptualization, *probability* is a degree of reasonable belief, and *chance* is a degree of reasonable belief that is unaffected by the outcome of other trials.

My student Fabian Dablander has reviewed the Diaconis & Skyrms book for the journal *Significance*, so maybe he can explain the relevance of de Finetti’s representation theorem to me. I’ll keep you posted. As I check out Fabian’s review, I see he references Bernardo (1996), which is also a relatively clear paper on the Representation Theorem. After describing the theorem, Bernardo concludes:

“The representation theorem—a pure probability theory result—proves that if observations are judged to be *exchangeable*, then they *must* indeed be a random sample from some model *and* there *must exist* a prior probability distribution over the parameter of the model, hence requiring a *Bayesian* approach.” (Bernardo, 1996, p. 3; italics in original)

This reinforces my current view that the theorem is not “sensational” for those who are already devout Bayesians. However, there’s a substantial probability that I’m wrong (or else, why the fuss), so to be continued…

Bernardo, J. M. (1996). The concept of exchangeability and its applications. *Far East Journal of Mathematical Sciences*, 111-122.

Dablander, F. (2018). In Review: Ten Great Ideas About Chance. *Significance*.

Dawid, A. P. (2004). Probability, causality and the empirical world: A Bayes-de Finetti-Popper-Borel synthesis. *Statistical Science, 19*, 44-57.

Diaconis, P., & Skyrms, B. (2018). Ten great ideas about chance. Princeton: Princeton University Press.

Lindley, D. V. (2000). The philosophy of statistics. *The Statistician, 49*, 293-337.

Lindley, D. V. (2006). Understanding uncertainty. Hoboken: Wiley.

“It would be impossible, even if space permitted, to trace back the possible development of my ideas, and their relationships with more or less similar positions held by other authors, both past and present. A brief survey is better than nothing, however (even though there is an inevitable arbitrariness in the selection of names to be mentioned).

I am convinced that my basic ideas go back to the years of High School as a result of my preference for the British philosophers Locke, Berkeley and, above all, Hume! I do not know to what extent the Italian school textbooks and my own interpretations were valid: I believe that my work based on exchangeability corresponds to Hume’s ideas, but some other scholars do not agree. I was also favourably impressed, a few years later, by the ideas of Pragmatism, and the related notions of operational definitions in Physics. I particularly liked the Pragmatism of Giovanni Vailati—who somehow ‘Italianized’ James and Peirce—and, as for operationalism, I was very much struck by Einstein’s relativity of ‘simultaneity’, and by Mach and (later) Bridgman.

As far as Probability is concerned, the first book I encountered was that of Czuber. (Before 1950—my first visit to the USA—I did not know any English, but only German and French.) For two or three years (before and after the ‘Laurea’ in Mathematics, and some application of probability to research on Mendelian heredity), I attempted to find valid foundations for all the theories mentioned, and I reached the conclusion that the classical and frequentist theories admitted no sensible foundation, whereas the subjectivistic one was fully justified on a normative-behaviouristic basis.”

It is interesting that Dennis Lindley came to Bayesianism in much the same way as de Finetti. In his highly recommended 2000 paper “The Philosophy of Statistics”, Lindley describes his own conversion process as follows:

“I conclude on a personal note. When, half a century ago, I began to do serious research in statistics, my object was to put statistics, then almost entirely Fisherian, onto a logical, mathematical basis to unite the many disparate techniques that genius has produced. When this had been done by Savage, in the form that we today call Bayesian, I felt that practice and theory had been united. Kingman’s sentence is so apt to what followed.

‘Perhaps mathematicians select themselves by this desire to reduce chaos to order and only learn by experience that the real world takes its revenge.’

The revenge came later with the advocacy of the likelihood principle by Barnard, and later Birnbaum, so that doubts began to enter, and later still, as the plethora of counter-examples appeared, I realized that Bayes destroyed frequency ideas. Even then I clung to the improper priors and the attempt to be objective, only to have them damaged by the marginalization paradoxes. More recently the subjectivist view has been seen as the best that is currently available and de Finetti appreciated as the great genius of probability. It is therefore easy for me to understand how others find it hard to adopt a personalistic attitude and am therefore grateful to the discussants for the reasoned arguments that they have used, some of which I might have myself used in the past.” (Lindley, 2000, p. 336).

Let’s continue with de Finetti’s preface:

“I had some indirect knowledge of De Morgan, and found that some of Keynes’ ideas were in partial agreement with mine; some years later I was informed of the similar approach that had been adopted by F. P. Ramsey.”

An earlier post in this series already covered the similarities between de Morgan’s and de Finetti’s ideas on probability. De Finetti continues:

“Independent ideas, which were more or less similar, were put forward later by Harold Jeffreys, B. O. Koopman, and I. J. Good (with some beautiful new discussion which illustrated the totally illusory nature of the so-called *objective* definitions of probability). I could add to this list the name of Rudolf Carnap, but this would be not altogether proper in the light of his own vivid, subjective behaviouristic interpretation. (Richard Jeffrey, in publishing Carnap’s posthumous works, seems convinced of his underlying subjectivism.) A singular position is occupied by Robert Schlaifer, who arrived at the subjectivistic approach directly and with impressive freshness and originality, with little knowledge of previous work in the field. A similar thing, although in a different sense, may be said of George Pólya, who discussed *plausible reasoning* in mathematics in the sense of the probability (subjective, of course) of a supposed theorem being true, given the state of mind of the mathematician, and its (Bayesian) modification when new information or ideas appear. The following statement of his is most remarkable: ‘It seems to me more philosophical to consider the general idea of *plausible reasoning* instead of its isolated particular cases’ like *inductive* (and analogical) *reasoning*. (There have been so many vain attempts to build a theory of induction without beliefs—like a theory of elasticity without matter.)”

I know Jack Good’s work reasonably well, but I am unaware of the “beautiful new discussion which illustrated the totally illusory nature of the so-called *objective* definitions of probability”. If I come across it in the future I will blog about it here. You can find a blog post on Pólya here. De Finetti continues:

“A very special mention must be reserved, however, for Leonard J. Savage and Dennis V. Lindley, who escaped from the objectivistic school, after having grown up in it, by a gradual discovery of its ambiguities, compared with the clarity of the subjectivistic theory, and the latter’s suitability for every kind of practical or theoretical problem. I have often had the opportunity of profitable exchanges of ideas with them, and, in the case of Savage, of actual collaboration. I wrote briefly of Savage’s invaluable contributions as a dedication to my book *Probability, Induction and Statistics*, which appeared a few months after his sudden and premature death.

One should note, however, that, even with such close colleagues, agreement ought not to be absolute, on every detail. For example, not all agree with the rejection of countable-additivity.

Finally, having mentioned several of the authors who are more or less connected with the subjectivistic (and Bayesian) point of view, I feel an obligation to recall three great men—the first two, unfortunately, no longer with us—who, although they all shared an opposed view about our common subject, were always willing to discuss, and were extraordinarily friendly and helpful on every occasion. I refer to Guido Castelnuovo, Maurice Fréchet and Jerzy Neyman.

Rome, 16 July 1973

Bruno de Finetti”

This concludes our three-part series on de Finetti’s preface.

de Finetti, B. (1974). *Theory of Probability, Vol. 1 and 2*. New York: John Wiley & Sons.

Lindley, D. V. (2000). The philosophy of statistics. *The Statistician, 49*, 293-337.

“The numerous, different, opposed attempts to put forward particular points of view which, in the opinion of their supporters, would endow Probability Theory with a ‘nobler’ status, or a ‘more scientific’ character, or ‘firmer’ philosophical or logical foundations, have only served to generate confusion and obscurity, and to provoke well-known polemics and disagreements–even between supporters of essentially the same framework.

The main points of view that have been put forward are as follows.

The *classical* view, based on physical considerations of symmetry, in which one should be *obliged* to give the same probability to such ‘symmetric’ cases. But which symmetry? And, in any case, why? The original sentence becomes meaningful if reversed: the symmetry is probabilistically significant, in someone’s opinion, if it leads him to assign the same probabilities to such events.”

This classical view is nicely described by Laplace in his “philosophical essay on probabilities”:

“The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all the cases possible is the measure of this probability, which is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible.” (Laplace, 1829/1902, pp. 6-7)
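Laplace’s ratio of favorable cases to all possible cases is easy to make concrete. A minimal sketch (the two-dice example is our illustration, not Laplace’s):

```python
from itertools import product
from fractions import Fraction

# All 36 equally possible cases for two dice (Laplace's "cases equally possible")
cases = list(product(range(1, 7), repeat=2))

# Favorable cases: those whose pips sum to 7
favorable = [c for c in cases if sum(c) == 7]

# Laplace's ratio: number of favorable cases over number of possible cases
p = Fraction(len(favorable), len(cases))
print(p)  # → 1/6
```

Note that the whole calculation rests on the judgment that the 36 cases deserve equal probability in the first place, which is exactly the point of de Finetti’s reversal above.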

Personally I am not convinced that Laplace’s opinion on probability was dramatically different from that of de Finetti. Laplace was a hard-core determinist and firmly believed that probability was purely a measure of one’s lack of knowledge. In the fragment above, Laplace explicitly states that the “cases equally possible” refer to us being “equally undecided”. And later, on page 8, Laplace provides a simple example of how different background knowledge leads to radically different assessments of probability: “In things which are only probable the difference of the data, which each man has in regard to them, is one of the principal causes of the diversity of opinions which prevail in regard to the same objects.”

We continue with de Finetti’s preface:

“The *logical* view is similar, but much more superficial and irresponsible inasmuch as it is based on similarities or symmetries which no longer derive from the facts and their actual properties, but merely from the sentences which describe them, and from their formal structure or language.”

Galavotti (2005) explains that the “logical interpretation” of probability holds that beliefs should be “rational”; probability indicates not what actual beliefs are, but what they ought to be. In contrast, in the subjectivist interpretation the beliefs are actual, and their only requirement is that they are coherent (I find this somewhat contradictory, since real people’s actual beliefs are not fully coherent, but OK). The logical interpretation is associated with people such as de Morgan, Jevons, Keynes, Carnap, and W. E. Johnson. Based on my own reading, de Morgan and Jevons were in close agreement with Laplace, and in their own writings de Morgan and Jevons also made it clear that probability is a reflection of background knowledge. W. E. Johnson was the inspiration to several Bayesians, including Dorothy Wrinch and Harold Jeffreys; as we will see next, Johnson anticipated some of de Finetti’s ideas.

De Finetti continues:

“The *frequentist* (or *statistical*) view presupposes that one accepts the classical view, in that it considers an event as a class of *individual* events, the latter being ‘trials’ of the former. The individual events not only have to be ‘equally probable’, but also ‘stochastically independent’…(these notions when applied to individual events are virtually impossible to define or explain in terms of the frequentist interpretation). In this case, also, it is straightforward, by means of the subjective approach, to obtain, under the appropriate conditions, in a perfectly valid manner, the results aimed at (but unattainable) in the statistical formulation. It suffices to make use of the notion of exchangeability. The result, which acts as a bridge connecting this new approach with the old, has been referred to by the objectivists as ‘de Finetti’s representation theorem’.”

Ah yes, de Finetti’s famous theorem. There are several books and articles that try to explain the relevance of this celebrated theorem: Lindley (2006, pp. 107-109), Diaconis & Skyrms (2018, pp. 122-125), and Zabell (2005; chapter 4, in which he discusses the link with W. E. Johnson, who invented the concept of exchangeability before de Finetti), for instance. One of my resolutions for 2020 is to understand the relevance of this theorem. It obviously ought to be highly relevant, or very smart statisticians would not make such a fuss about it; on the other hand, when I look at the equation it seems to state the obvious — perhaps the theorem is relevant for those who aren’t already devout Bayesians? I guess I’ll find out…
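For reference, here is the binary (0/1) version of the theorem in its standard textbook form (our paraphrase, not a quotation from de Finetti): for any infinite exchangeable sequence of indicator variables $X_1, X_2, \ldots$ there exists a unique probability measure $\mu$ on $[0,1]$ such that

$$
P(X_1 = x_1, \ldots, X_n = x_n) = \int_0^1 \theta^{k} (1-\theta)^{n-k} \, d\mu(\theta), \qquad k = x_1 + \cdots + x_n .
$$

In words: exchangeable beliefs behave *as if* there were an unknown chance $\theta$ drawn from a prior $\mu$, so the subjectivist recovers the frequentist’s ‘true parameter’ as a useful construction rather than a physical fact.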

De Finetti continues:

“It follows that all the three proposed definitions of ‘objective’ probability, although useless *per se*, turn out to be useful and good as valid auxiliary devices when included as such in the subjectivistic theory.

The above-mentioned ‘representation theorem’, together with every other more or less original result in my conception of probability theory, should not be considered as a discovery (in the sense of being the outcome of advanced research). Everything is essentially the fruit of a thorough examination of the subject matter, carried out in an unprejudiced manner, with the aim of rooting out nonsense.

And probably there is nothing new; apart, perhaps, from the systematic and constant concentration on the unity of the whole, avoiding piecemeal tinkering about, which is inconsistent with the whole; this yields, in itself, something new.

Something that may strike the reader as new is the radical nature of certain of my theses, and of the form in which they are presented. This does not stem from any deliberate attempt at radicalism, but is a natural consequence of my abandoning the reverential awe which sometimes survives in people who at one time embraced the objectivistic theories prior to their conversion (which hardly ever leaves them free of some residual).”

More to follow…

de Finetti, B. (1974). *Theory of Probability, Vol. 1 and 2*. New York: John Wiley & Sons.

Diaconis, P., & Skyrms, B. (2018). Ten great ideas about chance. Princeton: Princeton University Press.

Galavotti, M. C. (2005). A philosophical introduction to probability. Stanford: CSLI Publications.

Laplace, P.-S. (1829/1902). A philosophical essay on probabilities. London: Chapman & Hall.

Lindley, D. V. (2006). Understanding uncertainty. Hoboken: Wiley.

Zabell, S. L. (2005). Symmetry and its discontents: Essays on the history of inductive probability. New York: Cambridge University Press.

“Is it possible that in just a few lines I can achieve what I failed to achieve in my many books and articles? Surely not. Nevertheless, this preface affords me the opportunity, and I shall make the attempt. It may be that misunderstandings which persist in the face of refutations dispersed or scattered over some hundreds of pages can be resolved once and for all if all the arguments are pre-emptively piled up against them.”

One senses that de Finetti was exasperated at the stubborn refusal of frequentists to see the Bayesian light. At this point in time, of course, Bayesians such as de Finetti were a species threatened with extinction. In another masterpiece with the same title (“Theory of Probability”) Harold Jeffreys also indicates his own frustration in the preface:

“Adherents of frequency definitions of probability have naturally objected to the whole system. But they carefully avoid mentioning my criticisms of frequency definitions, which any competent mathematician can see to be unanswerable. In this way they contrive to present me as an intruder into a field where everything was already satisfactory. I speak from experience in saying that students have no difficulty in following my system if they have not already spent several years in trying to convince themselves that they understand frequency theories.” (Jeffreys, 1961, p. viii)

Let’s continue with de Finetti, who proceeds to make his most iconic statement:

“My thesis, paradoxically, and a little provocatively, but nonetheless genuinely, is simply this:

PROBABILITY DOES NOT EXIST. The abandonment of superstitious beliefs about the existence of Phlogiston, the Cosmic Ether, Absolute Space and Time,…, or Fairies and Witches, was an essential step along the road to scientific thinking. Probability, too, if regarded as something endowed with some kind of objective existence, is no less a misleading misconception, an illusory attempt to exteriorize or materialize our true probabilistic beliefs.”

As explained in an earlier post, Augustus de Morgan expressed a similar opinion in his 1838 (!) book *An Essay on Probabilities and on Their Application to Life Contingencies and Insurance Offices*. And in 1874, Stanley Jevons, who was a pupil of de Morgan (see Collison Black, 1972), wrote that “Probability belongs wholly to the mind”.

De Finetti elaborates:

“This point of view is not bound up with any particular philosophical position, nor is it incompatible with any such. It is strictly *reductionist* in a methodological sense, in order to avoid becoming embroiled in philosophical controversy.

Probabilistic reasoning –always to be understood as subjective– merely stems from our being uncertain about something. It makes no difference whether the uncertainty relates to an unforeseeable future, or to an unnoticed past, or to a past doubtfully reported or forgotten; it may even relate to something more or less knowable (by means of a computation, a logical deduction, etc.) but for which we are not willing or able to make the effort; and so on.

Moreover, probabilistic reasoning is completely unrelated to general philosophical controversies, such as Determinism versus Indeterminism, Realism versus Solipsism—including the question of whether the world ‘exists’, or is simply the scenery of ‘my’ solipsistic dream. As far as Determinism and Indeterminism are concerned, we note that, in the context of gas theory or heat diffusion and transmission, whether one interprets the underlying process as being random or strictly deterministic makes no difference to one’s probabilistic opinion. A similar situation would arise if one were faced with forecasting the digits in a table of numbers; it makes no difference whether the numbers are random, or are some segment–for example, the 2001st to the 3000th digits–of the decimal expansion of π (which is not ‘random’ at all, but certain; possibly available in tables and, in principle, computable by you).

The only relevant thing is uncertainty–the extent of our own knowledge and ignorance. The actual fact of whether or not the events considered are in some sense *determined*, or known by other people, and so on, is of no consequence.”

An example of this scenario is given in Gronau & Wagenmakers (2018), who computed the empirical evidence (based on the first 100 million digits) for the conjecture that the digits in the decimal expansion of π occur equally often. Below is the result of a sequential analysis under two prior distributions:
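The sequential analysis itself is not reproduced here, but the underlying Bayes factor is simple to sketch. Below is a minimal illustration (ours, not the Gronau & Wagenmakers code): it compares H0, all ten digits equiprobable, against H1, digit probabilities assigned a uniform Dirichlet prior, via the Dirichlet-multinomial marginal likelihood. The digit counts are made up for illustration, not taken from π.

```python
import math

def log_bf01_multinomial(counts, a=1.0):
    """Log Bayes factor BF01 for H0 (all K categories equiprobable)
    versus H1 (category probabilities ~ Dirichlet(a, ..., a)).
    The multinomial coefficient is identical under both hypotheses
    and cancels from the ratio."""
    k = len(counts)
    n = sum(counts)
    # Log marginal likelihood under H1: Dirichlet-multinomial
    log_m1 = (math.lgamma(k * a) - math.lgamma(n + k * a)
              + sum(math.lgamma(c + a) - math.lgamma(a) for c in counts))
    # Log likelihood under H0: each observation has probability 1/K
    log_m0 = -n * math.log(k)
    return log_m0 - log_m1

# Hypothetical digit counts (0-9) for 1000 observations, roughly uniform
counts = [93, 116, 103, 102, 93, 97, 94, 95, 101, 106]
print(math.exp(log_bf01_multinomial(counts)))  # large value: evidence for H0
```

With near-equal counts the Bayes factor strongly favors equiprobability; grossly unequal counts would flip it in favor of H1.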

But although de Finetti condoned this use of Bayesian inference, at least in principle (and so would the mathematician George Pólya, see this post), Jeffreys explicitly disagreed, or so it seems:

“This distinction shows that theoretically a probability should always be worked out completely. We have again an illustration from pure mathematics. What is the 10,000th figure in the expansion of *e*? Nobody knows; but that does not say that the probability that it is a 5 is 0.1. By following the rules of pure mathematics we could determine it definitely, and the statement is either entailed by the rules or contradicted; in probability language, on the data of pure mathematics it is either a certainty or an impossibility.” (Jeffreys, 1961, p. 38).

Given my own work with Quentin Gronau it should not come as a surprise that I side with de Finetti here. It is too late to ask Jeffreys, but I wonder why it is essential to condition “on the data of pure mathematics” for determining the probability that the 10,000th figure in the expansion of *e* is a 5, but that it would be unacceptable to condition “on the data of pure physics” for determining the outcome of a coin toss. I also wonder whether Jeffreys would accept a probability (other than 0 or 1) for a mathematical conjecture that cannot (yet) be proven. For instance, I wonder whether Jeffreys would condemn an in-between probability for the 10,000th figure in the expansion of *e* being a 5, but condone an in-between probability for the unproven conjecture that the digits in the expansion of *e* occur equally often.
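Jeffreys’s example is easy to play with. The sketch below (our illustration, stdlib only; `digit_of_e` is a hypothetical helper name) computes early decimal digits of *e* from its series expansion, the point being that any such digit is knowable in principle by mere computation, exactly the kind of case de Finetti has in mind:

```python
from decimal import Decimal, getcontext

def digit_of_e(n, guard=10):
    """Return the n-th decimal digit of e (after the decimal point),
    summing the series e = sum(1/k!) with `guard` extra digits of slack."""
    getcontext().prec = n + guard
    e = Decimal(0)
    term = Decimal(1)
    k = 0
    while True:
        new = e + term
        if new == e:  # 1/k! no longer changes the sum at this precision
            break
        e = new
        k += 1
        term /= k
    return int(str(e).split(".")[1][n - 1])

# e = 2.718281828459045..., so the first digits after the point are 7, 1, 8, ...
print([digit_of_e(i) for i in range(1, 10)])  # → [7, 1, 8, 2, 8, 1, 8, 2, 8]
```

Before running the computation one’s probability for “the n-th digit is a 5” is arguably 0.1; afterwards it collapses to 0 or 1. Whether the pre-computation value deserves to be called a probability is precisely where Jeffreys and de Finetti part ways.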

More to follow…

Collison Black, R. D. (1972). Jevons, Bentham and De Morgan. *Economica, 39*, 119-134.

de Finetti, B. (1974). *Theory of Probability, Vol. 1 and 2*. New York: John Wiley & Sons.

de Morgan, A. (1838). *An Essay on Probabilities and on Their Application to Life Contingencies and Insurance Offices*. London: Longman. Freely available at https://archive.org/details/134257988

Gronau, Q. F., & Wagenmakers, E.-J. (2018). Bayesian evidence accumulation in experimental mathematics: A case study of four irrational numbers. *Experimental Mathematics, 27*, 277-286.

Jeffreys, H. (1961). *Theory of Probability* (3rd ed.). Oxford: Clarendon Press.

Jevons, W. S. (1877/1913). *The Principles of Science: A Treatise on Logic and Scientific Method*. London: MacMillan.