Bayes Factors for Human versus ChatGPT Authorship Discrimination: Ultrafast Review of Bozza et al. (2023)

November 09 - 2023

Today I came across the recently published article “A model-independent redundancy measure for human versus ChatGPT authorship discrimination using a Bayesian probabilistic approach” by Bozza and colleagues. As the title suggests, Bozza et al. use Bayes factors to quantify the evidence for texts being generated by humans versus ChatGPT. This seems exactly the right approach, and I am generally a fan of the authors’ Bayesian work in forensics. And the paper itself presents some promising results! However, browsing the paper left me with three gripes.

My first gripe is that the authors seem to believe their approach is novel:

“In particular, to assess the value of the redundancy measure and to offer a consistent classification criterion, a metric called Bayes factor is implemented. The proposed Bayesian probabilistic method represents an original approach in stylometry.” (Bozza et al., 2023)

and

“Although the use of Bayes factor in forensic science is a widely used approach, its application in stylometry is still unexplored.” (Bozza et al., 2023).

In direct contrast to these claims, the application of Bayes factors to stylometry was pioneered by Mosteller & Wallace (1963), exactly 60 years ago. I was alerted to the Mosteller and Wallace work when reading Donovan & Mickey’s “Bayesian Statistics for Beginners”. My own course book (with Dora Matzke), “Bayesian inference from the ground up: The theory of common sense”, discusses the problem in Chapter 7, “Learning from the likelihood ratio”, and concludes that the Mosteller & Wallace paper “energized the field of stylometry”. Another informative reference is the Priceonomics blog post “How Statistics Solved a 175-Year-Old Mystery About Alexander Hamilton”. In sum, the authors’ approach is certainly not novel from a conceptual point of view, although there is still considerable merit in the specific application. But clearly Mosteller & Wallace should have been acknowledged.

My second gripe is that I was unable to find the important details concerning the computation of the Bayes factors. Bozza et al. provide a general analytic expression, but in practice one needs to commit to a particular prior distribution (and possibly assess the robustness of the conclusion to this choice). Maybe I did not read the article carefully enough, but to my eye there is an almost immediate transition from the generic equation to the specific Bayes factor results, without any mention of how the prior choice was made.
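To illustrate why the prior matters, here is a minimal Python sketch of a Bayes factor whose value depends on the prior. To be clear, this is not the model used by Bozza et al. (I could not determine what that was); it is a toy normal-normal setup that I made up for illustration. The function names, the prior means, and all numbers are hypothetical. Under each hypothesis a single redundancy score y is assumed to be normally distributed around an unknown mean, and that mean receives a normal prior with variance tau2. The marginal likelihood is then again normal, with variance sigma2 + tau2, and the Bayes factor is the ratio of the two marginal likelihoods.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def bayes_factor(y, m_human, m_gpt, sigma2, tau2):
    """Bayes factor for 'human' versus 'ChatGPT' given one score y.

    Toy assumption (not taken from Bozza et al.): under each hypothesis
    y ~ N(mu, sigma2) with prior mu ~ N(m, tau2), so that the marginal
    distribution of y is N(m, sigma2 + tau2).
    """
    marginal_var = sigma2 + tau2
    return normal_pdf(y, m_human, marginal_var) / normal_pdf(y, m_gpt, marginal_var)


def sensitivity(y, m_human, m_gpt, sigma2, tau2_grid):
    """Recompute the Bayes factor across a grid of prior variances."""
    return {tau2: bayes_factor(y, m_human, m_gpt, sigma2, tau2) for tau2 in tau2_grid}


# Hypothetical numbers, chosen only to show the effect of the prior variance.
for tau2, bf in sensitivity(0.3, 0.2, 0.6, 0.01, [0.001, 0.01, 0.1]).items():
    print(f"prior variance {tau2}: BF(human vs ChatGPT) = {bf:.2f}")
```

In this toy setup the same score yields strong evidence under a tight prior and only weak evidence under a diffuse one, which is exactly why a paper reporting Bayes factors should state the prior and, ideally, show a robustness analysis along these lines.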

My third gripe is the lack of open data and open code. With respect to the data, Bozza et al. merely state that “The data that support the findings of this study are available on request from the corresponding author.” This may have been good enough in, say, 1963, but in 2023 such data are trivially easy to post online. With respect to the code, the main text just says “Data treatment, visualization and probabilistic evaluation were all carried out in the R statistical software package available at https://www.r-project.org.” But why not share the actual R code? Maybe the authors did, but I see no links in the article or on the journal website. With the R code in hand, at least I would have been able to figure out what prior distribution was used for the Bayes factors. Clearly the reviewers and the journal dropped the ball here as well; they should have mandated that the R code be shared.

Disclaimer: I read the paper quickly, and I may be wrong. I will edit this post and issue a mea culpa if I missed something that was in plain sight all along.

References

Bozza, S., Roten, C.-A., Jover, A., et al. (2023). A model-independent redundancy measure for human versus ChatGPT authorship discrimination using a Bayesian probabilistic approach. Scientific Reports, 13, 19217. https://doi.org/10.1038/s41598-023-46390-8

Donovan, T. M., & Mickey, R. M. (2019). Bayesian Statistics for Beginners: A Step-by-Step Approach. Oxford: Oxford University Press.

Mosteller, F., & Wallace, D. L. (1963). Inference in an authorship problem. Journal of the American Statistical Association, 58, 275-309.

About The Author

EJ Wagenmakers

works at the University of Amsterdam.
