On the Importance of Avoiding Shortcuts in Applying Cognitive Models to Hierarchical Data

This post summarizes the content of an article that is in press for Behavior Research Methods. The preprint is available on PsyArXiv.

Psychological experiments often yield data that are hierarchically structured. A number of popular shortcut strategies in cognitive modeling do not properly accommodate this structure and can result in biased conclusions. First, we considered a modeling strategy that ignores the hierarchical data structure, omitting random effects at the participant level. Our theoretical analysis indicates that this biases statistical results towards the null hypothesis. Second, we considered a modeling strategy that takes a two-step approach by first obtaining participant-level estimates from a hierarchical cognitive model and subsequently using these estimates in a follow-up statistical test. Our theoretical analysis indicates that this biases statistical results towards the alternative hypothesis. Using a simulation study for a two-group experiment, we demonstrate that both strategies result in considerable statistical biases when parameter estimation is based on little data; only hierarchical models of the multilevel data lead to correct conclusions. These results are particularly relevant for applications of hierarchical Bayesian cognitive models in settings with limitations on the size of the available data sets, such as clinical studies.


Musings on Preregistration: The Case of the Facial Feedback Effect

tl;dr. In 2016, the results of a multi-lab preregistered replication effort cast doubt on the idea, motivated by the “facial feedback hypothesis”, that holding a pen with one’s teeth (instead of with one’s lips) makes cartoons appear more amusing. The effect’s progenitor, Dr. Strack, critiqued the replication effort and suggested that the presence of a camera (put in place to confirm that participants held the pen as instructed) makes the effect disappear. This conjecture has recently received some empirical support from a preregistered experiment (Noah et al., 2018, JPSP). Overall, Noah et al. present a balanced account of their findings and acknowledge the considerable statistical uncertainty that remains. It may be the case that the camera matters, although we personally would currently still bet against it. Methodologically, we emphasize how important it is that the confirmatory analysis adheres exactly to what has been preregistered. In order to prevent the temptation of subtle changes in the planned analysis and at the same time eliminate the file drawer effect, we recommend the Registered Report over plain preregistration, although preregistration alone is vastly superior to registering nothing at all.

This tale starts in 2012. Doyen and colleagues had just published a bombshell paper, “Behavioral Priming: It’s All in the Mind, but Whose Mind?” (open access). Here is the first part of their abstract:

“The perspective that behavior is often driven by unconscious determinants has become widespread in social psychology. Bargh, Chen, and Burrows’ (1996) famous study, in which participants unwittingly exposed to the stereotype of age walked slower when exiting the laboratory, was instrumental in defining this perspective. Here, we present two experiments aimed at replicating the original study. Despite the use of automated timing methods and a larger sample, our first experiment failed to show priming. Our second experiment was aimed at manipulating the beliefs of the experimenters: Half were led to think that participants would walk slower when primed congruently, and the other half was led to expect the opposite. Strikingly, we obtained a walking speed effect, but only when experimenters believed participants would indeed walk slower.” (Doyen et al., 2012)


Karl Pearson’s Worst Quotation?

The famous statistician Karl Pearson was also a eugenicist, so there are a great many hair-raising quotations to choose from. I nominate the following two for being particularly shocking (for more information see Wikipedia and the Guardian). Brace yourself, here is quotation number one:


“History shows me one way, and one way only, in which a high state of civilization has been produced, namely, the struggle of race with race, and the survival of the physically and mentally fitter race. If you want to know whether the lower races of man can evolve a higher type, I fear the only course is to leave them to fight it out among themselves, and even then the struggle for existence between individual and individual, between tribe and tribe, may not be supported by that physical selection due to a particular climate on which probably so much of the Aryan’s success depended.” (Karl Pearson, 1901, pp. 19-20)


Karl Pearson’s Best Quotation?

NB. The next post will discuss two of Karl Pearson’s worst quotations.


“The field of science is unlimited; its material is endless, every group of natural phenomena, every phase of social life, every stage of past or present development is material for science. The unity of all science consists alone in its method, not in its material. The man who classifies facts of any kind whatever, who sees their mutual relation and describes their sequences, is applying the scientific method and is a man of science. The facts may belong to the past history of mankind, to the social statistics of our great cities, to the atmosphere of the most distant stars, to the digestive organs of a worm, or to the life of a scarcely visible bacillus. It is not the facts themselves which form science, but the method in which they are dealt with.” (Karl Pearson, The Grammar of Science, p. 16)


Quantifying Support for the Null Hypothesis in Psychology: An Empirical Investigation

This post summarizes the content of an article that is in press for Advances in Methods and Practices in Psychological Science. The preprint is available on PsyArXiv.

In the traditional statistical framework, nonsignificant results leave researchers in a state of suspended disbelief. This study examines, empirically, the treatment and evidential impact of nonsignificant results. Our specific goals were twofold: to explore how psychologists interpret and communicate nonsignificant results, and to assess how much these results constitute evidence in favor of the null hypothesis. Firstly, we examined all nonsignificant findings mentioned in the abstracts of the 2015 volume of Psychonomic Bulletin & Review, Journal of Experimental Psychology: General, and Psychological Science (N = 137). In 72% of cases, nonsignificant results were misinterpreted, in the sense that authors inferred that the effect was absent. Secondly, a Bayes factor reanalysis revealed that fewer than 5% of the nonsignificant findings provided strong evidence (i.e., BF01 > 10) in favor of the null hypothesis compared to the alternative hypothesis. We recommend that researchers expand their statistical toolkit in order to correctly interpret nonsignificant results and to be able to evaluate the evidence for and against the null hypothesis.
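To give a feel for what a threshold such as BF01 > 10 demands, here is a minimal sketch based on the BIC approximation to the Bayes factor for a one-sample t-test. This is a rough stand-in chosen for illustration, not the Bayes factor used in the reanalysis; the function name `bf01_bic` and the example values are assumptions.

```python
import math


def bf01_bic(t: float, n: int) -> float:
    """Approximate BF01 for a one-sample t-test via the BIC approximation.

    Assumption: a coarse approximation used only for illustration,
    not the Bayes factor reported in the article.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    df = n - 1
    # BF01 ~= sqrt(n) * (1 + t^2 / df)^(-n / 2)
    return math.sqrt(n) * (1.0 + t * t / df) ** (-n / 2.0)


if __name__ == "__main__":
    # Same nonsignificant t statistic, different (hypothetical) sample sizes.
    for n in (20, 50, 100, 400):
        print(n, round(bf01_bic(1.0, n), 2))
```

Under this approximation, a t statistic of exactly zero gives BF01 = sqrt(n), so roughly 100 observations are needed before the evidence for the null can reach 10 at all. A nonsignificant result from a small study therefore cannot be strong evidence for the null by this yardstick.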


Bayesian Reanalysis of Null Results Reported in Medicine: Strong Yet Variable Evidence for the Absence of Treatment Effects

This post summarizes the content of an article that is in press for PLOS ONE. The preprint is available on PsyArXiv.

Efficient medical progress requires that we know when a treatment effect is absent. We considered all 207 Original Articles published in the 2015 volume of the New England Journal of Medicine and found that 45 (21.7%) reported a null result for at least one of the primary outcome measures. Unfortunately, standard statistical analyses are unable to quantify the degree to which these null results actually support the null hypothesis. Such quantification is possible, however, by conducting a Bayesian hypothesis test. Here we reanalyzed a subset of 43 null results from 36 articles using a default Bayesian test for contingency tables. This Bayesian reanalysis revealed that, on average, the reported null results provided strong evidence for the absence of an effect. However, the degree of this evidence is variable and cannot be reliably predicted from the p-value (see Figure 1). For null results, sample size is a better (albeit imperfect) predictor for the strength of evidence in favor of the null hypothesis (see Figure 2). Together, our findings suggest that (a) the reported null results generally correspond to strong evidence in favor of the null hypothesis; (b) a Bayesian hypothesis test can provide additional information to assist the interpretation of null results.
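The idea of a Bayesian test for a two-group binary outcome can be sketched as follows. This is a simplified beta-binomial comparison with uniform priors, not the default contingency-table test used in the article; the function names and the example counts are hypothetical.

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf01_two_proportions(y1: int, n1: int, y2: int, n2: int) -> float:
    """BF01 for equal versus unequal event rates in two groups.

    H0: both groups share one rate, theta ~ Beta(1, 1).
    H1: each group has its own rate, independently Beta(1, 1).
    Assumption: the uniform priors are an illustrative choice.
    """
    # The binomial coefficients appear in both marginal likelihoods
    # and cancel in the ratio, so they are omitted.
    log_m0 = log_beta(y1 + y2 + 1, n1 + n2 - y1 - y2 + 1)
    log_m1 = log_beta(y1 + 1, n1 - y1 + 1) + log_beta(y2 + 1, n2 - y2 + 1)
    return exp(log_m0 - log_m1)


if __name__ == "__main__":
    # Identical observed proportions at two hypothetical sample sizes.
    print(bf01_two_proportions(5, 10, 5, 10))
    print(bf01_two_proportions(50, 100, 50, 100))
```

With identical observed proportions, the evidence for the null in this sketch grows as the groups get larger, which is in line with the observation that sample size helps predict the strength of evidence for null results.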

