Redefine Statistical Significance Part XXI: Edgeworth Proposed the .005 Criterion Back in 1885

The statistical significance test was not invented by Ronald Fisher. The key idea was already laid out by Francis Ysidro Edgeworth (1845-1926), whose 1885 article “Methods of statistics” is quite explicit about the purpose, design, and interpretation of the significance test. As summarized by Kennedy-Shaffer:

In 1885, Francis Ysidro Edgeworth provided a more formal mathematical underpinning for the significance test and gave a simple example of how to use the standard deviation (he used the “modulus,” equal to the standard deviation multiplied by the square root of two) to perform a significance test on a given parameter (Edgeworth 1885, pp. 184–185). Using a threshold of twice the “modulus,” Edgeworth (1885) constructed a test that would be equivalent to a modern two-sided α = 0.005. Stigler (1986, p. 311) notes that this “was a rather exacting test” and that Edgeworth also considered smaller differences as “worthy of notice, although he admitted the evidence was then weaker.” (Kennedy-Shaffer, 2019, p. 83).
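The arithmetic behind this equivalence is easy to check. The sketch below is only an illustration, not Edgeworth's own calculation: it assumes a normally distributed test statistic and reads "twice the modulus" as 2 × √2 ≈ 2.83 standard deviations. The function name two_sided_p is made up for this example.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided tail probability P(|Z| > z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))


# Edgeworth's modulus is sqrt(2) times the standard deviation,
# so a threshold of two moduli equals 2 * sqrt(2) standard deviations.
threshold_in_sd = 2 * math.sqrt(2)

# Assumption: the test statistic is normal, so the tail area gives alpha.
alpha = two_sided_p(threshold_in_sd)

print(f"threshold = {threshold_in_sd:.3f} standard deviations")
print(f"two-sided alpha = {alpha:.4f}")
print(f"for comparison, 1.96 SD gives {two_sided_p(1.96):.4f}")
```

Under these assumptions the two-modulus threshold corresponds to a two-sided α of about 0.0047, which rounds to the 0.005 mentioned in the quotation, whereas the familiar 1.96 standard deviations gives 0.05.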

What Edgeworth proposed as a criterion for significance in 1885 (!) is therefore essentially the same as the criterion recently proposed by Benjamin et al. (2018): “We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.” As an aside, the 1885 article by Edgeworth is still worth reading; its examples include Latin texts, wasp nests, and attendance at club dinners, among others. Edgeworth must have enjoyed himself while writing this article.

For an accessible summary of Edgeworth’s contributions to statistics I recommend Stigler’s (1986) book. Here is a short excerpt on Edgeworth’s work on the significance test:

Edgeworth’s key work on this topic was contained in a series of four papers read in the year 1885. The first of these, “Observations and statistics: An essay on the theory of errors of observation and the first principles of statistics,” was read on 25 May 1885 to the Cambridge Philosophical Society. It concentrated on statistical theory and summarized and extended his work of the previous two years. The second paper, “Methods of statistics,” was read a month later, on June 23, to the international gathering to celebrate the jubilee of the [Royal] Statistical Society. It was concerned with methodology and presented, through an extensive series of examples taken from all manner of fields, an exposition of the application and interpretation of significance tests for the comparison of means. Much of the material in these two papers was presented at least in outline in Edgeworth’s evening classes in logic starting in April 1885 (…) The third and fourth papers, “On methods of ascertaining variations in the rate of births, deaths, and marriages” and “Progressive means,” were read at meetings of the British Association in September and October. The third presented a remarkable analysis for two-way classifications that anticipated many ideas of the analysis of variance. The fourth article, “Progressive means,” was a brief discussion of the use of linear least squares for detrending time series, including the estimation of the coefficients’ variabilities to permit significance tests for trend or comparisons of different series. (Stigler, 1986, pp. 308-309)

In conclusion, it appears that Edgeworth was approximately 134 years ahead of his time, and his intuition about what constitutes compelling evidence was on point.

References

Benjamin, D. J. et al. (2018). Redefine statistical significance. Nature Human Behaviour, 2, 6-10. Preprint: https://osf.io/preprints/psyarxiv/mky9j/

Edgeworth, F. Y. (1885). Methods of statistics. Journal of the Statistical Society of London, Jubilee Volume, 181-217.

Kennedy-Shaffer, L. (2019). Before p < 0.05 to beyond p < 0.05: Using history to contextualize p-values and significance testing. The American Statistician, 73, 82-90.

Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900. Cambridge, MA: Harvard University Press.

About The Author

Eric-Jan Wagenmakers

Eric-Jan (EJ) Wagenmakers is professor in Bayesian Methodology at the Psychological Methods Group at the University of Amsterdam.