Background: Former world chess champion Vladimir Kramnik has argued that several win streaks of Hikaru Nakamura on chess.com are unusually long. Can we analyze these streaks to amass statistical evidence that something fishy is going on?
In last week’s blogpost, I argued that the likelihood principle does not license the selection of a particular streak from a longer time series. It is almost always suboptimal to throw away information, and suboptimal quickly turns into misleading when the streak is selected because it appears “special”: a follow-up test will then simply confirm that the selected streak is indeed special.
The blogpost was a response to a recent preprint by Maharaj, Polson, and Sokolov, in which the authors argued that Hikaru Nakamura was wrong to suggest that an analysis of streaks alone (without taking into account the non-streak results) amounts to cherry-picking. I concluded that Nakamura was indeed correct, and that a correct analysis would need to analyze all of the data.
A friend of mine indicated that “Big Vlad” had seen the blogpost and felt it was beside the point:
My response was as follows:
So now that my existence has been validated by a world chess champion I am under obligation to respond. First of all, Maharaj, Polson, and Sokolov analyze only a single streak, and invoke the likelihood principle (in error, I believe) to suggest that cherry-picking is not an issue here. My blog post was dedicated to this theoretical/statistical claim. In fact this was stated explicitly:
Here I will concern myself only with the validity of Nakamura’s argument of cherry-picking.
Kramnik is correct in the sense that he has identified several streaks, and not just one. In fact, these streaks are prominently displayed in the Maharaj, Polson, and Sokolov preprint:
It is unclear to me why, given this table in the paper, Maharaj, Polson, and Sokolov proceed to analyze only a single streak. If the likelihood principle licensed the selection of streaks, as they imply in their preprint, then why not select all of these “unusual” streaks for analysis? In this sense I agree with Kramnik. Pondering this issue more, however, quickly makes it clear that streak selection is a bad idea. Suppose I toss a fair coin and retain for analysis only those sequences that show five or more tails in a row. Considered in isolation, each such streak is unlikely under the fair coin hypothesis (five tails in a row has probability (1/2)^5 = 1/32); however, such streaks are also bound to occur if you keep tossing for long enough. So at a minimum one needs to take into account that the streaks are embedded in a much larger collection of games.
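The coin example can be made concrete with a small calculation. The sketch below is only an illustration of the argument, not an analysis of the chess data: the function name `prob_run_at_least` and its parameters are my own, and the tosses are assumed to be independent.

```python
def prob_run_at_least(n_tosses: int, run_length: int, p_tail: float = 0.5) -> float:
    """Exact probability that n_tosses independent tosses contain at least
    one run of run_length (or more) consecutive tails.

    Hypothetical helper for illustration; assumes independent tosses.
    """
    # state[j] = probability that no qualifying run has occurred yet
    # and the sequence currently ends in exactly j consecutive tails.
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_tosses):
        new_state = [0.0] * run_length
        # A head resets the current run of tails to zero.
        new_state[0] = sum(state) * (1.0 - p_tail)
        # A tail extends the current run by one; a run that would reach
        # run_length leaves the "no run yet" states and is not tracked further.
        for j in range(run_length - 1):
            new_state[j + 1] = state[j] * p_tail
        state = new_state
    return 1.0 - sum(state)


if __name__ == "__main__":
    for n in (5, 50, 500, 5000):
        print(n, round(prob_run_at_least(n, 5), 4))
```

With exactly five tosses the probability equals 1/32, but it climbs toward 1 as the number of tosses grows, which is why a streak lifted out of a long series cannot be judged as if it were the whole experiment.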
Personally, I very much doubt that any correct statistical analysis can provide much evidence regarding the possibility that Nakamura received computer assistance, despite him playing so many games. The main reason is that Nakamura is exceptionally good at chess, much better than almost all of his opponents. This is true in over the board classical chess, but his dominance is greater still in over the board speed chess, and perhaps even more so in online speed chess. Online, Nakamura regularly plays against opposition rated hundreds of Elo points lower than he is (the preprint mentions an average gap of 366 Elo points, which is astronomical). This means that he is expected to win almost every game. Consequently, it becomes statistically very difficult to differentiate between a “clean” Nakamura and a digitally enhanced version: “clean” Nakamura already crushes the opposition. [Admittedly, this claim would be problematic if we had only seen Nakamura play chess online, since it would then be possible that we had only ever seen the digitally enhanced version; however, in the case of Nakamura we have additional information; see also below]
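To give a rough sense of what a 366-point gap means, here is a minimal sketch based on the standard logistic Elo formula. That this formula applies to the online ratings in question is an assumption on my part (the site may compute ratings differently), and the function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score per game for the higher-rated player under the
    standard logistic Elo model (win = 1, draw = 0.5, loss = 0).

    rating_gap is the stronger player's rating minus the opponent's.
    Assumption: the standard Elo formula is an adequate approximation here.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    # 366 is the average gap mentioned in the preprint.
    print(round(elo_expected_score(366), 3))
```

Under this assumption the expected score for a 366-point gap is roughly 0.89 per game. Note that the expected score counts draws as half a point, so it is not exactly a win probability.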
Since “clean” Nakamura already crushes the opposition, it becomes unclear why Nakamura would even consider using computer support. Simply put: he does not need it. It may even be good for him to lose more often than he does, since more variable performance might attract a larger online audience. And if he were caught, the consequences for his career and his legacy would be disastrous. I am reminded of the allegations that Topalov made in 2006, suggesting that Kramnik (!!) received computer assistance during their world championship match in Elista. I am sure that at the time Kramnik must have felt deeply offended, and that the allegation did some psychological damage as well. At the time I was relieved that Kramnik ended up winning the match.
In conclusion, Nakamura is a superior player and the prior probability of him cheating is minuscule, also because Nakamura has little to gain and everything to lose. The data would need to bring extraordinary evidence to overturn this strong prior belief, and such evidence is almost impossible to collect because Nakamura is so exceptionally skilled. This is apparent not just from his over the board results and his detailed post-mortems, but also from his mesmerizing puzzle-rush ability. Quite honestly, if Nakamura were to beat Carlsen 12-0 in an online speed chess match it would not provide much evidence for cheating in my opinion (alternative explanations: Carlsen may tilt, may experience repeated connection issues, may start to drink heavily before or during the match, etc.). This reminds me of Jaynes’ discussion on reports of miracles and ESP, but this is best left for a different post; meanwhile, I will be busy framing Kramnik’s tweet 🙂
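The interplay between a strong prior and the evidence follows from Bayes’ rule in odds form: posterior odds equal prior odds times the Bayes factor. The sketch below only illustrates that arithmetic; the function name `posterior_probability` is hypothetical and the numbers in the example are made up, not estimates for the Nakamura case.

```python
def posterior_probability(prior: float, bayes_factor: float) -> float:
    """Posterior probability of a hypothesis, given its prior probability
    and the Bayes factor in its favor (posterior odds = prior odds * BF).

    Requires 0 < prior < 1.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Purely hypothetical inputs: a prior of 1 in 1000 and a Bayes factor of 100.
    print(round(posterior_probability(0.001, 100.0), 3))
```

With these made-up inputs, even evidence that favors the hypothesis by a factor of 100 leaves its posterior probability below 10%.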
References and Background Material
Maharaj, S., Polson, N., & Sokolov, V. (2024). Kramnik vs Nakamura: A Chess Scandal. ArXiv Preprint.
An article on ChessBase: https://en.chessbase.com/post/did-a-us-chess-champion-cheat.
An article in the Chicago Booth Review: https://www.chicagobooth.edu/review/did-us-chess-champion-cheat.