Datacolada argue that the meta-analysis averages across fundamentally incommensurable results. We agree that different nudges are very different and that a meta-analysis best practice guide would not endorse pooling them together. However, the combining was done by Mertens et al. and we simply took their meta-analysis as reported, as did Datacolada, in order to critically evaluate it. A more recent paper shows how one could use mixture modeling to accommodate effects not generated from a single construct (DellaVigna & Linos, 2022). We think this is an excellent approach, and we are currently working to develop mixture models for RoBMA. However, this model deviates from the one that the original authors used and would be inspired by the data itself, requiring new data for a stringent test.

While we agree that it is important to go beyond the mean effect, meta-analyzing heterogeneous results is useful for three reasons: (1) when using a random effects model, *shrinkage* will improve the estimation of individual study effects; (2) meta-analyzing heterogeneous results allows us to evaluate the expected effect size and the strength of evidence for the body of literature representing nudging; (3) meta-analyzing the results allows us to quantify the heterogeneity.

Imagine you are on holiday and searching for a restaurant based on Google ratings. Google brings up two options with different ratings: Restaurant A has 4.7 stars based on 200 ratings, whereas Restaurant B has 5 stars based on only 2 ratings. Which of the two restaurants would you choose? Most of us would certainly choose Restaurant A. Intuitively, if a restaurant has only 2 ratings of 5 stars, we believe this is partially due to chance; therefore, given more ratings, the average rating would likely decrease; more precisely, the average rating would *shrink* towards the mean over all restaurant ratings!

Now in the context of meta-analysis, we can calculate how much to shrink different studies based on the variability between studies and the sampling variability, using hierarchical (random-effects) modeling. Datacolada show that the largest reminder effect in Mertens et al. (2022) is that sending sleep reminders (e.g., “Bedtime goals for tonight: Dim light at 9:30pm. Try getting into bed at 10:30pm.”) to non-Hispanic White participants increases their sleep hours (*d* = 1.18, *p* = 0.028).^{1} Note that only 20 non-Hispanic White individuals were tested here. Now, if a policymaker wants to know your opinion about the effect size of sleep reminders on non-Hispanic White people, would you confidently say, “I believe there is a huge effect of *d* = 1.18” or would you shrink your estimate towards the average effect size of nudges? We believe that you should shrink the estimate and, indeed, this is exactly what the random-effects meta-analytic model allows us to do. Based on the model-averaged effect size (*d* = 0.005) and heterogeneity estimate (*tau* = 0.117) for the assistance intervention (reminder) category, we can obtain a down-pooled estimate of *d* = 0.07 with *sd* = 0.11 for the study.^{2}
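The shrinkage calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the reported numbers: it treats the category mean (*d* = 0.005) and heterogeneity (*tau* = 0.117) as a normal prior and combines it with the observed effect using precision weights. The standard error of the study is not given in this post, so the sketch backs it out from the reported *p*-value under a normal approximation; that step, and the helper names `se_from_p` and `shrink`, are assumptions made for illustration. Because of this approximation, the result lands close to, but not exactly on, the reported *d* = 0.07 and *sd* = 0.11.

```python
from statistics import NormalDist


def se_from_p(effect, p_two_sided):
    """Approximate a standard error from an effect and its two-sided p-value.

    Assumption: the test statistic is roughly standard normal (z = effect / se).
    """
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    return abs(effect) / z


def shrink(observed, se, prior_mean, prior_sd):
    """Normal-normal update: precision-weighted average of prior and data."""
    w_prior = 1 / prior_sd ** 2   # precision of the meta-analytic prior
    w_obs = 1 / se ** 2           # precision of the single study
    post_var = 1 / (w_prior + w_obs)
    post_mean = post_var * (w_prior * prior_mean + w_obs * observed)
    return post_mean, post_var ** 0.5


# Values quoted in the text; the standard error is an approximation.
se = se_from_p(effect=1.18, p_two_sided=0.028)
post_mean, post_sd = shrink(observed=1.18, se=se, prior_mean=0.005, prior_sd=0.117)

print(f"approximate standard error: {se:.2f}")
print(f"shrunken estimate: d = {post_mean:.2f}, sd = {post_sd:.2f}")
```

The prior is far more precise than the single small study, so the estimate is pulled almost all the way from 1.18 back towards the category mean.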

Which studies to include for calculating the mean is a difficult question that has long been debated and requires domain expertise. In our commentary, we followed the original authors. Otherwise, our paper would not be a compelling reply to their analysis. We would like to see more fine-grained meta-analyses on nudging in specific domains and using mixture models. However, for the reasons discussed above, we do believe that our meta-analysis improves estimation accuracy of individual study effects due to shrinkage, beyond reading and thinking about studies individually.

First, we need to point out that our analysis does not show that all nudges are ineffective (as we state in the article, “However, all intervention categories and domains apart from “finance” show evidence for heterogeneity, which implies that some nudges might be effective”). It is common practice to make statements about mean effects in meta-analyses; however, with the benefit of hindsight, we would retitle our article “No *Overall* Evidence for Nudging After Adjusting for Publication Bias” to avoid any confusion about this point.

Beyond the issue of heterogeneity discussed by Datacolada, the interpretation of our paper as showing that all nudges are ineffective conflates evidence of absence with absence of evidence. In other words, the Bayes factor that we observe for nudging overall is undecided – it does not provide evidence in favour of nudges, but it also does not provide evidence against nudges. This is why we titled the commentary ‘No evidence for nudging after adjusting for publication bias’ rather than ‘Evidence against nudging after adjusting for publication bias’.

Nevertheless, our analysis should strongly reduce our credence in nudges as effective behavioral science interventions. First, we think that the mean effect is useful as it shows us the expected effect size of a random nudge (and the evidence for it). Policymakers may decide about rolling out nudge interventions in a general area and therefore want to know the expected effect size to evaluate the likely benefits of this (e.g., whether to use more nudging in healthcare settings). Second, we can also use the meta-analytic estimates to investigate what share of academic nudges is effective after taking publication bias into account. This shows that after correcting for bias only 21.5% of academic nudge effects are larger than *d* = 0.3. In other words, unlike the reported mean of *d* = 0.43 in the original analysis, by taking this meta-analysis mean seriously and thinking about it, we find that most academic nudges are not able to produce even small effects.
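To make the "share of effective nudges" calculation concrete, here is a rough sketch of how such a share follows from a meta-analytic mean and heterogeneity estimate if true effects are assumed to be normally distributed. The heterogeneity value (0.321) is the bias-corrected estimate reported in this post; the mean used below (`ASSUMED_MEAN = 0.04`) is a hypothetical placeholder, since the overall bias-corrected mean is not stated here. The reported 21.5% comes from the full model-averaged analysis, so this plug-in version should only be expected to land in the same neighbourhood.

```python
from statistics import NormalDist


def share_above(threshold, mean, tau):
    """Share of true effects exceeding `threshold`, assuming effects ~ Normal(mean, tau^2)."""
    return 1 - NormalDist(mu=mean, sigma=tau).cdf(threshold)


TAU_ADJUSTED = 0.321  # bias-corrected heterogeneity estimate quoted in the post
ASSUMED_MEAN = 0.04   # hypothetical placeholder for the bias-corrected mean effect

share = share_above(threshold=0.3, mean=ASSUMED_MEAN, tau=TAU_ADJUSTED)
print(f"share of effects larger than d = 0.3: {share:.1%}")
```

The same function shows why the heterogeneity estimate matters as much as the mean: with a mean near zero, the share of effects above any threshold is driven almost entirely by *tau*.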

An important and often underappreciated crux is that publication bias not only affects the meta-analytic mean but also the meta-analytic heterogeneity estimate. Therefore, we need to adjust for publication bias in order to assess whether heterogeneity is in fact still high once publication bias is accounted for. The Datacolada approach of looking only at the most extreme studies is insufficient to get a sense of the heterogeneity across the entire pool of studies. If we do not want to reread all of the studies and consequently make a subjective judgment about their similarity, we need a publication bias-adjusted heterogeneity estimate based on a meta-analysis. RoBMA allowed us to do this, and we obtain a bias-corrected heterogeneity estimate of 0.321, 95% CI [0.294, 0.351], which is somewhat smaller than the corresponding unadjusted estimate of 0.375.

Meta-analyzing heterogeneous studies is useful as it: (1) allows shrinkage to improve the accuracy of study-level estimates; (2) allows us to calculate the expected effect size and strength of evidence for a body of literature; (3) allows us to estimate heterogeneity. Future research should develop more sophisticated modeling frameworks in this area based on mixture modeling.

^{1. We focus on this example rather than the example of increased portion sizes leading to more eating because the latter is not technically a nudge: it restricts freedom of choice (i.e., you cannot eat more food than is available).}

^{2. We cannot obtain the posterior random effects estimates directly from the model as the random effects selection models require a marginalized parameterization. Therefore, we use the meta-analytic mean and heterogeneity estimate as our prior distribution of the effect sizes and combine it with our observed effect size estimate — an Empirical Bayes approach.}

Bakdash, J. Z., & Marusich, L. R. (2022). Left-truncated effects and overestimated meta-analytic means. *Proceedings of the National Academy of Sciences*, *119*(31), e2203616119.

DellaVigna, S., & Linos, E. (2022). RCTs to scale: Comprehensive evidence from two nudge units. *Econometrica*, *90*(1), 81-116.

Efron, B., & Morris, C. (1977). Stein’s paradox in statistics. *Scientific American*, *236*(5), 119-127.

Maier, M., Bartoš, F., Stanley, T. D., Shanks, D. R., Harris, A. J., & Wagenmakers, E. J. (2022). No evidence for nudging after adjusting for publication bias. *Proceedings of the National Academy of Sciences*, *119*(31), e2200300119.

Mertens, S., Herberz, M., Hahnel, U. J., & Brosch, T. (2022). The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains. *Proceedings of the National Academy of Sciences*, *119*(1), e2107346118.

Szaszi, B., Higney, A., Charlton, A., Gelman, A., Ziano, I., Aczel, B., … & Tipton, E. (2022). No reason to expect large and consistent effects of nudge interventions. *Proceedings of the National Academy of Sciences*, *119*(31), e2200732119.

Maximilian Maier is a PhD candidate in Psychology at University College London.

František Bartoš is a PhD candidate at the Psychological Methods Group of the University of Amsterdam.

Tom Stanley is a professor of meta-analysis at Deakin Laboratory for the Meta-Analysis of Research (DeLMAR), Deakin University.

David Shanks is Professor of Psychology and Deputy Dean of the Faculty of Brain Sciences at University College London.

Adam Harris is Professor of Cognitive & Decision Sciences at University College London.

Eric-Jan (EJ) Wagenmakers is professor at the Psychological Methods Group at the University of Amsterdam.

In this workshop, plenary lectures provide the theoretical background of Bayesian statistics, and practical computer exercises teach participants how to use the popular JAGS and Stan programs and apply them to a wide range of different statistical models. After completing this workshop, participants will have gained not only a new understanding of statistics, but also the technical skills to implement models that are appropriate for the substantive hypotheses of interest.

This workshop is meant for researchers who want to learn how to apply Bayesian inference in practice. Most applications we discuss are taken from the field of cognitive science. Because the workshop is based on a set of book chapters and concrete exercises of varying difficulty, the course material is appropriate for researchers with a wide range of prior knowledge and interests. Although some basic knowledge of Bayesian inference is an advantage, this is not a prerequisite. In the course we use JAGS or Stan in combination with R or Matlab (the choice is yours), and therefore some basic knowledge of either R or Matlab is also an advantage.

Michael Lee and Eric-Jan Wagenmakers have published a course book about Bayesian graphical modeling. This book is used to teach graphical modeling courses at several universities, including University of California Irvine, Ohio State University, the University of Washington, Tufts, Rutgers, Stanford, and the University of Amsterdam. This course book will form the basis of this workshop. At the start of the workshop, you will receive a printed copy of the latest version of the book, a memory stick with all the computer code, and the solutions to the exercises.

More details about the workshop can be found here.

| Category | Fee |
|---|---|
| (PhD) Student | € 550 |
| Faculty | € 650 |
| Other | € 850 |
| UvA Students | € 500 |
| UvA Faculty | € 600 |

In this two-day workshop, plenary lectures provide the theoretical background of Bayesian statistics and Bayesian hypothesis testing; in addition, practical exercises demonstrate how the JASP program is applied to a wide range of common statistical scenarios. This workshop provides an appreciation for the theoretical foundations that underlie Bayesian hypothesis testing, but its main goal is to teach participants the practical skills needed in order to execute and report Bayesian analyses for the models that form the bread-and-butter of statistical inference: t-tests, correlations, regression, ANOVA, and contingency tables.

This workshop will provide attendees with a friendly, gentle introduction to the theory behind Bayesian hypothesis testing and is therefore suited for everyone new to Bayesian inference. The possibilities of practical Bayesian hypothesis testing using JASP will be illustrated with concrete examples. At the end of this workshop, participants should be able to carry out statistical analyses in JASP, interpret the output, and report the results. Our examples are general and the proposed methodology applies across the empirical disciplines (e.g., psychology, sociology, biology, economics, etc.).

The workshop is taught by Eric-Jan Wagenmakers and Richard Morey.

More details about the workshop can be found here.

| Category | Fee |
|---|---|
| (PhD) Student | € 150 |
| Faculty | € 250 |
| Other | € 450 |
| UvA Students | € 100 |
| UvA Faculty | € 200 |

In “Science and Method”, the chapter “Mathematical Creation” starts as follows: “The genesis of mathematical creation is a problem which should intensely interest the psychologist. It is the activity in which the human mind seems to take least from the outside world, in which it acts or seems to act only of itself and on itself, so that in studying the procedure of geometric thought we may hope to reach what is most essential in man’s mind.” (p. 383) After some musings about why some people do not understand mathematics, Poincaré then starts an exposition on mathematical insight and unconscious thought.

I have not considered the conceptual similarities to Ap Dijksterhuis’ unconscious thought theory, although a first reading suggests considerable overlap (cf. Dijksterhuis & Nordgren, 2006; Zhong et al., 2008). To be clear, Dijksterhuis and colleagues have never claimed that the main idea is novel; for instance, Dijksterhuis and Nordgren (2006) features the following epigraph:

One might almost believe that half of our thinking takes place unconsciously . . . . I have familiarized myself with the factual data of a theoretical and practical problem; I do not think about it again, yet often a few days later the answer to the problem will come into my mind entirely from its own accord; the operation which has produced it, however, remains as much a mystery to me as that of an adding-machine: what has occurred is, again, unconscious rumination. (Schopenhauer, 1851/1970, pp. 123–124)

Furthermore, the article by Zhong et al. (2008) credits Poincaré not once but twice. Zhong et al. start as follows:

“The ability to associate remotely connected elements underlies many discoveries and creations in fields such as physics, mathematics, and art. Poincaré, for instance, noted that ‘to create consists of making new combinations of associative elements which are useful . . . . the most fertile will often be those formed of elements drawn from domains which are far apart’ (Poincaré, 1913, p. 386).”

A little later, Zhong et al. remark that

“In fact, conscious thought can subvert the search for creative solutions, and novel connections or ideas often insinuate themselves into the conscious mind when conscious attention is directed elsewhere (Ghiselin, 1952; Mednick, 1962; Olton, 1979). Poincaré described this very phenomenon:

I turned my attention to the study of some arithmetical questions apparently without much success and without a suspicion of any connection with my preceding researches. Disgusted with my failure, I went to spend a few days at the seaside, and thought of something else. One morning, walking on the bluff, the idea came to me . . . that the arithmetic transformations of indeterminate ternary quadratic forms were identical to those of non-Euclidean geometry” (quoted in Hadamard, 1945, pp. 13-14).

As the reader may confirm below, Poincaré did go beyond the anecdote of sudden insight and the remark that creativity requires the combinations of associative elements. Those interested may find the full, multi-page fragment from Poincaré reproduced here:

“In fact, what is mathematical creation? It does not consist in making new combinations with mathematical entities already known. Any one could do that, but the combinations so made would be infinite in number and most of them absolutely without interest. To create consists precisely in not making useless combinations and in making those which are useful and which are only a small minority. Invention is discernment, choice.

How to make this choice I have before explained; the mathematical facts worthy of being studied are those which, by their analogy with other facts, are capable of leading us to the knowledge of a mathematical law just as experimental facts lead us to the knowledge of a physical law. They are those which reveal to us unsuspected kinship between other facts, long known, but wrongly believed to be strangers to one another.

Among chosen combinations the most fertile will often be those formed of elements drawn from domains which are far apart. Not that I mean as sufficing for invention the bringing together of objects as disparate as possible; most combinations so formed would be entirely sterile. But certain among them, very rare, are the most fruitful of all.

To invent, I have said, is to choose; but the word is perhaps not wholly exact. It makes one think of a purchaser before whom are displayed a large number of samples, and who examines them, one after the other, to make a choice. Here the samples would be so numerous that a whole lifetime would not suffice to examine them. This is not the actual state of things. The sterile combinations do not even present themselves to the mind of the inventor. Never in the field of his consciousness do combinations appear that are not really useful, except some that he rejects but which have to some extent the characteristics of useful combinations. All goes on as if the inventor were an examiner for the second degree who would only have to question the candidates who had passed a previous examination.

But what I have hitherto said is what may be observed or inferred in reading the writings of the geometers, reading reflectively. It is time to penetrate deeper and to see what goes on in the very soul of the mathematician. For this, I believe, I can do best by recalling memories of my own. But I shall limit myself to telling how I wrote my first memoir on Fuchsian functions. I beg the reader’s pardon; I am about to use some technical expressions, but they need not frighten him, for he is not obliged to understand them. I shall say, for example, that I have found the demonstration of such a theorem under such circumstances. This theorem will have a barbarous name, unfamiliar to many, but that is unimportant; what is of interest for the psychologist is not the theorem but the circumstances.

For fifteen days I strove to prove that there could not be any functions like those I have since called Fuchsian functions. I was then very ignorant; every day I seated myself at my work table, stayed an hour or two, tried a great number of combinations and reached no results. One evening, contrary to my custom, I drank black coffee and could not sleep. Ideas rose in crowds; I felt them collide until pairs interlocked, so to speak, making a stable combination. By the next morning I had established the existence of a class of Fuchsian functions, those which come from the hypergeometric series; I had only to write out the results, which took but a few hours.

Then I wanted to represent these functions by the quotient of two series; this idea was perfectly conscious and deliberate, the analogy with elliptic functions guided me. I asked myself what properties these series must have if they existed, and I succeeded without difficulty in forming the series I have called theta-Fuchsian. Just at this time I left Caen, where I was then living, to go on a geologic excursion under the auspices of the school of mines. The changes of travel made me forget my mathematical work. Having reached Coutances, we entered an omnibus to go some place or other. At the moment when I put my foot on the step the idea came to me, without anything in my former thoughts seeming to have paved the way for it, that the transformations I had used to define the Fuchsian functions were identical with those of non-Euclidean geometry. I did not verify the idea; I should not have had time, as, upon taking my seat in the omnibus, I went on with a conversation already commenced, but I felt a perfect certainty. On my return to Caen, for conscience’ sake I verified the result at my leisure.

Then I turned my attention to the study of some arithmetical questions apparently without much success and without a suspicion of any connection with my preceding researches. Disgusted with my failure, I went to spend a few days at the seaside, and thought of something else. One morning, walking on the bluff, the idea came to me, with just the same characteristics of brevity, suddenness and immediate certainty, that the arithmetic transformations of indeterminate ternary quadratic forms were identical with those of non-Euclidean geometry.

Returned to Caen, I meditated on this result and deduced the consequences. The example of quadratic forms showed me that there were Fuchsian groups other than those corresponding to the hypergeometric series; I saw that I could apply to them the theory of theta-Fuchsian series and that consequently there existed Fuchsian functions other than those from the hypergeometric series, the ones I then knew. Naturally I set myself to form all these functions. I made a systematic attack upon them and carried all the outworks, one after another. There was one however that still held out, whose fall would involve that of the whole place. But all my efforts only served at first the better to show me the difficulty, which indeed was something. All this work was perfectly conscious.

Thereupon I left for Mont-Valérien, where I was to go through my military service; so I was very differently occupied. One day, going along the street, the solution of the difficulty which had stopped me suddenly appeared to me. I did not try to go deep into it immediately, and only after my service did I again take up the question. I had all the elements and had only to arrange them and put them together. So I wrote out my final memoir at a single stroke and without difficulty.

I shall limit myself to this single example; it is useless to multiply them. In regards to my other researches I would have to say analogous things, and the observations of other mathematicians given in *L’enseignement mathématique* would only confirm them.

Most striking at first is this appearance of sudden illumination, a manifest sign of long, unconscious prior work. The rôle of this unconscious work in mathematical invention appears to me incontestable, and traces of it would be found in other cases where it is less evident. Often when one works at a hard question, nothing good is accomplished at the first attack. Then one takes a rest, longer or shorter, and sits down anew to the work. During the first half-hour, as before, nothing is found, and then all of a sudden the decisive idea presents itself to the mind. It might be said that the conscious work has been more fruitful because it has been interrupted and the rest has given back to the mind its force and freshness. But it is more probable that this rest has been filled out with unconscious work and that the result of this work has afterward revealed itself to the geometer just as in the cases I have cited; only the revelation, instead of coming during a walk or a journey, has happened during a period of conscious work, but independently of this work which plays at most a rôle of excitant, as if it were the goad stimulating the results already reached during rest, but remaining unconscious, to assume the conscious form.

There is another remark to be made about the conditions of this unconscious work: it is possible, and of a certainty it is only fruitful, if it is on the one hand preceded and on the other hand followed by a period of conscious work. These sudden inspirations (and the examples already cited sufficiently prove this) never happen except after some days of voluntary effort which has appeared absolutely fruitless and whence nothing good seems to have come, where the way taken seems totally astray. These efforts then have not been as sterile as one thinks; they have set agoing the unconscious machine and without them it would not have moved and would have produced nothing.

The need for the second period of conscious work, after the inspiration, is still easier to understand. It is necessary to put in shape the results of this inspiration, to deduce from them the immediate consequences, to arrange them, to word the demonstrations, but above all is verification necessary. I have spoken of the feeling of absolute certitude accompanying the inspiration; in the cases cited this feeling was no deceiver, nor is it usually. But do not think this a rule without exception; often this feeling deceives us without being any the less vivid, and we only find it out when we seek to put on foot the demonstration. I have especially noticed this fact in regard to ideas coming to me in the morning or evening in bed while in a semi-hypnagogic state.

Such are the realities; now for the thoughts they force upon us. The unconscious, or, as we say, the subliminal self plays an important rôle in mathematical creation; this follows from what we have said. But usually the subliminal self is considered as purely automatic. Now we have seen that mathematical work is not simply mechanical, that it could not be done by a machine, however perfect. It is not merely a question of applying rules, of making the most combinations possible according to certain fixed laws. The combinations so obtained would be exceedingly numerous, useless and cumbersome. The true work of the inventor consists in choosing among these combinations so as to eliminate the useless ones or rather to avoid the trouble of making them, and the rules which must guide this choice are extremely fine and delicate. It is almost impossible to state them precisely; they are felt rather than formulated. Under these conditions, how imagine a sieve capable of applying them mechanically?

A first hypothesis now presents itself: the subliminal self is in no way inferior to the conscious self; it is not purely automatic; it is capable of discernment; it has tact, delicacy; it knows how to choose, to divine. What do I say? It knows better how to divine than the conscious self, since it succeeds where that has failed. In a word, is not the subliminal self superior to the conscious self? You recognize the full importance of this question. Boutroux in a recent lecture has shown how it came up on a very different occasion, and what consequences would follow an affirmative answer. (See also, by the same author, *Science et Religion*, pp. 313 ff.)

Is this affirmative answer forced upon us by the facts I have just given? I confess that, for my part, I should hate to accept it. Reexamine the facts then and see if they are not compatible with another explanation. It is certain that the combinations which present themselves to the mind in a sort of sudden illumination, after an unconscious working somewhat prolonged, are generally useful and fertile combinations, which seem the result of a first impression. Does it follow that the subliminal self, having divined by a delicate intuition that these combinations would be useful, has formed only these, or has it rather formed many others which were lacking in interest and have remained unconscious?

In this second way of looking at it, all the combinations would be formed in consequence of the automatism of the subliminal self, but only the interesting ones would break into the domain of consciousness. And this is still very mysterious. What is the cause that, among the thousand products of our unconscious activity, some are called to pass the threshold, while others remain below? Is it a simple chance which confers this privilege? Evidently not; among all the stimuli of our senses, for example, only the most intense fix our attention, unless it has been drawn to them by other causes. More generally the privileged unconscious phenomena, those susceptible of becoming conscious, are those which, directly or indirectly, affect most profoundly our emotional sensibility.

It may be surprising to see emotional sensibility invoked *à propos* of mathematical demonstrations which, it would seem, can interest only the intellect. This would be to forget the feeling of mathematical beauty, of the harmony of numbers and forms, of geometric elegance. This is a true esthetic feeling that all real mathematicians know, and surely it belongs to emotional sensibility.

Now, what are the mathematic entities to which we attribute this character of beauty and elegance, and which are capable of developing in us a sort of esthetic emotion? They are those whose elements are harmoniously disposed so that the mind without effort can embrace their totality while realizing the details. This harmony is at once a satisfaction of our esthetic needs and an aid to the mind, sustaining and guiding. And at the same time, in putting under our eyes a well-ordered whole, it makes us foresee a mathematical law. Now, as we have said above, the only mathematical facts worthy of fixing our attention and capable of being useful are those which can teach us a mathematical law. So that we reach the following conclusion: The useful combinations are precisely the most beautiful, I mean those best able to charm this special sensibility that all mathematicians know, but of which the profane are so ignorant as often to be tempted to smile at it.

What happens then? Among the great numbers of combinations blindly formed by the subliminal self, almost all are without interest and without utility; but just for that reason they are also without effect upon the esthetic sensibility. Consciousness will never know them; only certain ones are harmonious, and, consequently, at once useful and beautiful. They will be capable of touching this special sensibility of the geometer of which I have just spoken, and which, once aroused, will call our attention to them, and thus give them occasion to become conscious.

This is only a hypothesis, and yet here is an observation which may confirm it: when a sudden illumination seizes upon the mind of the mathematician, it usually happens that it does not deceive him, but it also sometimes happens, as I have said, that it does not stand the test of verification; well, we almost always notice that this false idea, had it been true, would have gratified our natural feeling for mathematical elegance.

Thus it is this special esthetic sensibility which plays the role of the delicate sieve of which I spoke, and that sufficiently explains why the one lacking it will never be a real creator. Yet all the difficulties have not disappeared. The conscious self is narrowly limited, and as for the subliminal self we know not its limitations, and this is why we are not too reluctant in supposing that it has been able in a short time to make more different combinations than the whole life of a conscious being could encompass. Yet these limitations exist. Is it likely that it is able to form all the possible combinations, whose number would frighten the imagination? Nevertheless that would seem necessary, because if it produces only a small part of these combinations, and if it makes them at random, there would be small chance that the *good*, the one we should choose, would be found among them.

Perhaps we ought to seek the explanation in that preliminary period of conscious work which always precedes all fruitful unconscious labor. Permit me a rough comparison. Figure the future elements of our combinations as something like the hooked atoms of Epicurus. During the complete repose of the mind, these atoms are motionless, they are, so to speak, hooked to the wall; so this complete rest may be indefinitely prolonged without the atoms meeting, and consequently without any combination between them.

On the other hand, during a period of apparent rest and unconscious work, certain of them are detached from the wall and put in motion. They flash in every direction through the space (I was about to say the room) where they are enclosed, as would, for example, a swarm of gnats or, if you prefer a more learned comparison, like the molecules of gas in the kinematic theory of gases. Then their mutual impacts may produce new combinations.

What is the rôle of the preliminary conscious work? It is evidently to mobilize certain of these atoms, to unhook them from the wall and put them in swing. We think we have done no good, because we have moved these elements a thousand different ways in seeking to assemble them, and have found no satisfactory aggregate. But, after this shaking up imposed upon them by our will, these atoms do not return to their primitive rest. They freely continue their dance.

Now, our will did not choose them at random; it pursued a perfectly determined aim. The mobilized atoms are therefore not any atoms whatsoever; they are those from which we might reasonably expect the desired solution. Then the mobilized atoms undergo impacts which make them enter into combinations among themselves or with other atoms at rest which they struck against in their course. Again I beg pardon, my comparison is very rough, but I scarcely know how otherwise to make my thought understood.

However it may be, the only combinations that have a chance of forming are those where at least one of the elements is one of those atoms freely chosen by our will. Now, it is evidently among these that is found what I called the good combination. Perhaps this is a way of lessening the paradoxical in the original hypothesis.

Another observation. It never happens that the unconscious work gives us the result of a somewhat long calculation all made, where we have only to apply fixed rules. We might think the wholly automatic subliminal self particularly apt for this sort of work, which is in a way exclusively mechanical. It seems that thinking in the evening upon the factors of a multiplication we might hope to find the product ready made upon our awakening, or again that an algebraic calculation, for example a verification, would be made unconsciously. Nothing of the sort, as observation proves. All one may hope from these inspirations, fruits of unconscious work, is a point of departure for such calculations. As for the calculations themselves, they must be made in the second period of conscious work, that which follows the inspiration, that in which one verifies the results of this inspiration and deduces their consequences. The rules of these calculations are strict and complicated. They require discipline, attention, will, and therefore consciousness. In the subliminal self, on the contrary, reigns what I should call liberty, if we might give this name to the simple absence of discipline and to the disorder born of chance. Only, this disorder itself permits unexpected combinations.

I shall make a last remark: when above I made certain personal observations, I spoke of a night of excitement when I worked in spite of myself. Such cases are frequent, and it is not necessary that the abnormal cerebral activity be caused by a physical excitant as in that I mentioned. It seems, in such cases, that one is present at his own unconscious work, made partially perceptible to the over-excited consciousness, yet without having changed its nature. Then we vaguely comprehend what distinguishes the two mechanisms or, if you wish, the working methods of the two egos. And the psychologic observations I have been able thus to make seem to me to confirm in their general outlines the views I have given.

Surely they have need of it, for they are and remain in spite of all very hypothetical: the interest of the questions is so great that I do not repent of having submitted them to the reader.” (Poincaré, 1913, pp. 386-394)

Eric-Jan (EJ) Wagenmakers is professor at the Psychological Methods Group at the University of Amsterdam.

The fact that we can do what we want, however, does not present a compelling argument against determinism. Yes, we may watch TV because we feel like it, but where did that feeling come from? A determinist believes that ‘free will’ is merely an illusion. You may experience the desire to do something and then do it, but that desire itself is the inevitable result of a myriad causal factors that were set in motion since the beginning of time. As summarized by Schopenhauer: “You can *do* what you *will*: but at each given moment of your life you can *will* only one determined thing and by no means anything other than this one.”

This deterministic perspective on life is visualized in the figure below. The artwork is by Viktor Beekman, and it is available under a CC-BY license in our Artwork Library.

The white lightning bolt running from top to bottom represents your life path, from which no deviation whatsoever is possible. The black lightning bolts in the top panel represent alternative life paths that you now know were always closed to you. It is not just that these alternative realities did not happen; they could never have happened. For instance, it would be tempting to think “had I not folded my hand but called her bluff instead then I would have won the poker tournament”; instead, the correct deterministic thought is “I now know that I did not call her bluff, and did not win the poker tournament”. The purple lightning bolts in the bottom panel represent alternative life paths that you do not yet know will never materialize. It is tempting to think “If I participate in this lottery and I’m lucky, I may win the jackpot”; a determinist would correct this to “I do not yet know whether or not I will win the lottery. However, this is not an eventuality or a matter of luck; it is a certainty, but one of which I will only become aware after the fact.”

An apt analogy is presented by Schopenhauer: “(…) we ought to regard events as they occur with the same eye as the print that we read, knowing full well that it stood there before we read it.” When in the middle of a book, you know how the story started but you are still unsure about how it will end — but it can end in only one way, just as it started in only one way. For a determinist, the difference between what lies in the past and what lies in the future can therefore be attributed solely to a difference in knowledge.

Jevons, W. S. (1874/1913). The Principles of Science: A Treatise on Logic and Scientific Method. London: MacMillan.

Schopenhauer, A. (2009). The Two Fundamental Problems of Ethics. Cambridge: Cambridge University Press. The original German edition dates from 1841 and is entitled *Die Beiden Grundprobleme der Ethik*.

Eric-Jan (EJ) Wagenmakers is professor at the Psychological Methods Group at the University of Amsterdam.

*Wagenmakers, E.-J. (2022). Approximate objective Bayes factors from p-values and sample size: The 3p√n rule.* Preprint available on PsyArXiv: https://psyarxiv.com/egydq

“In 1936, Sir Harold Jeffreys proposed an approximate objective Bayes factor that quantifies the degree to which the point-null hypothesis H0 outpredicts the alternative hypothesis H1. This approximate Bayes factor (henceforth JAB01) depends only on sample size and on how many standard errors the maximum likelihood estimate is away from the point under test. We revisit JAB01 and introduce a piecewise transformation that clarifies the connection to the frequentist two-sided p-value. Specifically, if p ≤ .10 then JAB01 ≈ 3p√n; if .10 < p ≤ .50 then JAB01 ≈ √(pn); and if p > .50 then JAB01 ≈ p^(¼)√n. These transformation rules present p-value practitioners with a straightforward opportunity to obtain Bayesian benefits such as the ability to monitor evidence as data accumulate without reaching a foregone conclusion. Using the JAB01 framework we derive simple and accurate approximate Bayes factors for the t-test, the binomial test, the comparison of two proportions, and the correlation test.”
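The transformation rules quoted above lend themselves to a short sketch. The Python snippet below is an illustration rather than code from the preprint: the function names `jab01` and `posterior_prob_h0` are hypothetical, and the posterior probability assumes that H0 and H1 are equally likely a priori unless a different prior is passed in.

```python
from math import sqrt


def jab01(p, n):
    """Approximate Bayes factor in favor of H0 from a two-sided p-value
    and sample size n, using the piecewise rules quoted in the abstract."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    if p <= 0.10:
        return 3 * p * sqrt(n)      # the 3p*sqrt(n) rule
    if p <= 0.50:
        return sqrt(p * n)          # sqrt(p*n)
    return p ** 0.25 * sqrt(n)      # p^(1/4) * sqrt(n)


def posterior_prob_h0(bf01, prior_h0=0.5):
    """Posterior probability of H0 given BF01 and a prior probability for H0
    (assumption: H0 and H1 are the only two hypotheses under consideration)."""
    prior_odds = prior_h0 / (1 - prior_h0)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    for p in (0.005, 0.05, 0.25, 0.75):
        bf = jab01(p, n=100)
        print(f"p = {p:<5}  JAB01 = {bf:.3f}  P(H0 | data) = {posterior_prob_h0(bf):.3f}")
```

With p = .05 and n = 100, the rule gives 3 × .05 × 10 = 1.5: the data are about 1.5 times more likely under H0 than under H1, even though the p-value sits at the conventional threshold.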

This is Jeffreys’s general expression for the Bayes factor in favor of the null hypothesis:

Using the p-value transformations we obtain the following approximate relations:

When applied to popular tests, the results closely approximate those from default Bayes factors. For instance, here is an example application to a test between two proportions, where a sequence of observations is generated under the null hypothesis; the black line shows the fixed-N p-value, which drifts randomly as sample size grows; the colors indicate posterior probabilities for the null hypothesis computed using different Bayesian methods (including the exact default Bayes factor proposed by Kass & Vaidyanathan, 1992, implemented by Gronau et al., 2021, and tutorialized by Hoffmann et al., 2021):

And this shows a similar example, now for the Pearson correlation test:

Researchers who question the unit-information prior that underlies Jeffreys’s simple Bayes factor may instead report a band of Bayes factors, one that ranges from the result obtained by Jeffreys to an upper bound obtained by Edwards et al. (1963), that is, a Bayes factor based on a normal prior distribution centered on the point at test, and with variance cherry-picked to yield the strongest evidence against the null hypothesis. The azure band in the top panel is for p=.05 results, and in the bottom panel it is for p=.005 results.
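To make the band concrete, the following Python sketch computes both ends for a given two-sided p-value and sample size. It is a sketch under stated assumptions, not the code behind the figure: the Jeffreys end uses the 3p√n rule from the abstract (so it is restricted to p ≤ .10), the other end uses the standard closed-form bound for normal priors centered on the point under test, BF01 ≥ √e · |z| · exp(−z²/2) for |z| > 1, and the function names are hypothetical. Both ends are expressed as BF01, so the strongest evidence against the null hypothesis corresponds to the smallest value.

```python
from math import e, exp, sqrt
from statistics import NormalDist


def z_from_p(p):
    """Absolute z-value corresponding to a two-sided p-value."""
    return NormalDist().inv_cdf(1 - p / 2)


def edwards_bound_bf01(p):
    """Smallest BF01 attainable with a normal prior centered on the null value
    (variance chosen to maximize the evidence against H0)."""
    z = z_from_p(p)
    if z <= 1:
        return 1.0  # no normal prior can favor H1 when |z| <= 1
    return sqrt(e) * z * exp(-0.5 * z * z)


def jeffreys_bf01(p, n):
    """Jeffreys end of the band via the 3p*sqrt(n) rule (assumed valid for p <= .10)."""
    if not 0 < p <= 0.10:
        raise ValueError("this sketch only covers 0 < p <= .10")
    return 3 * p * sqrt(n)


def bf01_band(p, n):
    """Return (bound, jeffreys): the two ends of the band, both as BF01."""
    return edwards_bound_bf01(p), jeffreys_bf01(p, n)


if __name__ == "__main__":
    for p in (0.05, 0.005):
        bound, jeffreys = bf01_band(p, n=100)
        print(f"p = {p}: BF01 between {bound:.3f} (bound) and {jeffreys:.3f} (Jeffreys)")
```

For p = .05 the bound is roughly 0.47, so even the most favorable normal prior yields odds of only about 2 to 1 against the null hypothesis.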

Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. *Psychological Review, 70*, 193-242.

Gronau, Q. F., Raj, K. N. A., & Wagenmakers, E.-J. (2021). Informed Bayesian inference for the A/B test. *Journal of Statistical Software, 100*, 1-39. Url: https://www.jstatsoft.org/article/view/v100i17

Hoffmann, T., Hofman, A., & Wagenmakers, E.-J. (2021). A tutorial on Bayesian inference for the A/B test with R and JASP. Manuscript submitted for publication. Preprint: https://psyarxiv.com/z64th

Kass, R. E., & Vaidyanathan, S. K. (1992). Approximate Bayes factors and orthogonal parameters, with application to testing equality of two binomial proportions. *Journal of the Royal Statistical Society, Series B, 54*, 129-144.

Wagenmakers, E.-J., & Ly, A. (2021). History and nature of the Jeffreys-Lindley paradox. Manuscript submitted for publication. Preprint: https://arxiv.org/abs/2111.10191

The relation between religiosity and well-being is one of the most researched topics in the psychology of religion, yet the directionality and robustness of the effect remain debated. Here, we adopted a many-analysts approach to assess the robustness of this relation based on a new cross-cultural dataset (*N* = 10,535 participants from 24 countries). We recruited 120 analysis teams, of whom 61 preregistered their analysis and 59 used analysis blinding. Teams investigated (1) whether religious people self-report higher well-being, and (2) whether the relation between religiosity and self-reported well-being depends on perceived cultural norms of religion (i.e., whether it is considered normal and desirable to be religious in a given country). In a two-stage procedure, the teams first created an analysis plan and then executed their planned analysis on the data. For the first research question, all but 3 teams reported positive effect sizes with credible/confidence intervals excluding zero (median reported beta = 0.120). For the second research question, this was the case for 65% of the teams (median reported beta = 0.039).

Furthermore, we compared the reported efficiency and convenience of preregistration and analysis blinding. Analysis blinding ties in with current methodological reforms for more transparency since it safeguards the confirmatory status of the analyses while simultaneously allowing researchers to explore peculiarities of the data and account for them in their analysis plan. Our results showed that analysis blinding and preregistration imply approximately the same amount of work but that in addition, analysis blinding reduced deviations from analysis plans. As such, analysis blinding constitutes an important addition to the toolbox of effective methodological reforms to combat the crisis of confidence.

In the current project, we aimed to shed light on the association between religion and well-being and the extent to which different theoretically- or methodologically-motivated analytic choices affect the results. To this end, we initiated a many-analysts project, in which several independent analysis teams analyze the same dataset in order to answer a specific research question (e.g., Bastiaansen et al., 2020; Boehm et al., 2018; Botvinik-Nezer et al., 2020; Silberzahn & Uhlmann, 2015; van Dongen et al., 2019).

We believe that our project involves a combination of elements that extend existing many-analysts work. First, we collected new data for this project with the aim of providing new evidence for the research questions of interest, as opposed to using an existing dataset that has been analyzed before. Second, we targeted both researchers interested in methodology and open science and researchers from the field of the scientific study of religion and health, to encourage both methodologically sound and theoretically relevant decisions. Third, in comparison to previous many-analysts projects in psychology, the current project included a large number of teams (i.e., 120 teams). Fourth, we applied a two-step procedure that ensured a purely confirmatory status of the analyses: in stage 1, all teams first either completed a preregistration or specified an analysis pipeline based on a blinded version of the data. After submitting the plan to the OSF, teams received the real data and executed their planned analyses in stage 2. Fifth, the many-analysts approach itself was preregistered prior to cross-cultural data collection (see https://osf.io/xg8y5).

We were able to extract 99 beta coefficients from the results provided by the 120 teams that completed stage 2. As shown in Figure 1, the results are remarkably consistent: all 99 teams reported a positive beta value, and for all teams the 95% confidence/credible interval excludes zero. The median reported beta is 0.120 and the median absolute deviation is 0.036. Furthermore, 88% of the teams concluded that there is good evidence for a positive relation between religiosity and self-reported well-being. Notably, although the teams were almost unanimous in their evaluation of research question 1, only eight of the 99 teams reported combinations of effect sizes and confidence/credibility intervals that matched those from another team (i.e., four effect sizes were reported twice).

Do note that in contrast to the unanimity in results based on the beta coefficients, out of the 21 teams for whom a beta coefficient could not be calculated, 3 teams reported evidence against the relation between religiosity and well-being: 2 teams used machine learning and found that none of the religiosity items contributed substantially to predicting well-being and 1 team used multilevel modeling and reported unstandardized gamma-weights for within- and between-country effects of religiosity whose confidence intervals included zero (see the Online Appendix).
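The two summary statistics used above, the median and the median absolute deviation (MAD), are straightforward to compute. The Python sketch below uses made-up beta coefficients purely for illustration (they are not the teams' results) and assumes the unscaled definition of the MAD, that is, the median of the absolute deviations from the median; whether the reported value was scaled is not stated here.

```python
from statistics import median


def median_abs_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


# Hypothetical beta coefficients, for illustration only.
betas = [0.08, 0.10, 0.12, 0.15, 0.20]

if __name__ == "__main__":
    print(f"median = {median(betas):.3f}")
    print(f"MAD    = {median_abs_deviation(betas):.3f}")
```

Because both statistics depend on ranks rather than on the magnitude of extreme values, a single team reporting an unusually large beta would barely move them.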

*Figure 1: *Beta coefficients for the effect of religiosity on self-reported well-being (research question 1) with 95% confidence or credible intervals. Green/blue points indicate effect sizes of teams that subjectively concluded that there is *good evidence for a positive relation* between individual religiosity and self-reported well-being, grey points indicate effect sizes of teams that subjectively concluded that *the evidence is ambiguous*, and brown/orange points indicate effect sizes of teams that subjectively concluded that there is *good evidence against a positive relation* between individual religiosity and self-reported well-being. The betas are ordered from smallest to largest.

Out of the 120 teams who completed stage 2, we were able to extract 101 beta coefficients for research question 2. As shown in Figure 2, the results for research question 2 are more variable than for research question 1; 97 out of 101 teams reported a positive beta value and for 66 teams (65%) the confidence/credible interval excluded zero. The median reported effect size is 0.039 and the median absolute deviation is 0.022. Furthermore, 54% of the teams concluded that there is good evidence for an effect of cultural norms on the relation between religiosity and self-reported well-being. Again, most reported effect sizes were unique; only 3 out of the 101 reported combinations of effect size and confidence/credible interval appeared twice.

*Figure 2:* Beta coefficients for the effect of cultural norms on the relation between religiosity and self-reported well-being (research question 2) with 95% confidence or credible intervals. Green/blue points indicate effect sizes of teams that subjectively concluded that there is *good evidence for the hypothesis* that the relation between individual religiosity and self-reported well-being depends on the perceived cultural norms of religion, grey points indicate effect sizes of teams that subjectively concluded that *the evidence is ambiguous*, and brown/orange points indicate effect sizes of teams that subjectively concluded that there is *good evidence against the hypothesis* that the relation between individual religiosity and self-reported well-being depends on the perceived cultural norms of religion. The betas are ordered from smallest to largest.

Figure 3 displays the reported deviations for teams who preregistered their analyses and teams who did analysis blinding. Our results concerning the comparison between preregistration and analysis blinding provide strong evidence (*BF* = 11.40) for the hypothesis that analysis blinding leads to fewer deviations from the analysis plan and that, if teams deviated, they did so on fewer aspects. Teams in the analysis blinding condition better anticipated their final analysis strategies, particularly with respect to exclusion criteria and operationalization of the independent variable. That is, of the 11 teams who deviated with respect to the exclusion criteria, 10 were in the preregistration condition. Similarly, of the 9 teams who deviated with respect to the independent variable, 8 were in the preregistration condition. The estimated probability that a team would deviate from their analysis plan was almost twice as high for teams who preregistered (i.e., 38%) compared to teams who did analysis blinding (i.e., 20%). We conclude that analysis blinding does not mean less work, but researchers can still benefit from the method since they can plan more appropriate analyses from which they deviate less frequently.

*Figure 3*: Reported deviations from planned analysis per condition. The green bars represent teams in the analysis blinding condition, the orange bars represent teams in the preregistration condition. More teams in the analysis blinding condition reported no deviations from their planned analysis, and if they did deviate, they did so on fewer aspects than teams in the preregistration condition.

Bastiaansen, J. A., Kunkels, Y. K., Blaauw, F. J., Boker, S. M., Ceulemans, E., Chen, M., Chow, S.-M., de Jonge, P., Emerencia, A. C., Epskamp, S., Fisher, A. J., Hamaker, E. L., Kuppens, P., Lutz, W., Meyer, M. J., Moulder, R., Oravecz, Z., Riese, H., Rubel, J., … Bringmann, L. F. (2020). Time to get personal? The impact of researchers’ choices on the selection of treatment targets using the experience sampling methodology. *Journal of Psychosomatic Research, 137,* 110211. https://doi.org/10.1016/j.jpsychores.2020.110211

Boehm, U., Annis, J., Frank, M. J., Hawkins, G. E., Heathcote, A., Kellen, D., Krypotos, A.-M., Lerche, V., Logan, G. D., Palmeri, T. J., van Ravenzwaaij, D., Servant, M., Singmann, H., Starns, J. J., Voss, A., Wiecki, T. V., Matzke, D., & Wagenmakers, E.-J. (2018). Estimating across-trial variability parameters of the Diffusion Decision Model: Expert advice and recommendations. *Journal of Mathematical Psychology, 87,* 46–75. https://doi.org/10.1016/j.jmp.2018.09.004

Botvinik-Nezer, R., Holzmeister, F., Camerer, C. F., Dreber, A., Huber, J., Johannesson, M., Kirchler, M., Iwanir, R., Mumford, J. A., Adcock, R. A., Avesani, P., Baczkowski, B. M., Bajracharya, A., Bakst, L., Ball, S., Barilari, M., Bault, N., Beaton, D., Beitner, J., … Schonberg, T. (2020). Variability in the analysis of a single neuroimaging dataset by many teams. *Nature, 582*, 84–88. https://doi.org/10.1038/s41586-020-2314-9

Silberzahn, R., & Uhlmann, E. L. (2015). Many hands make tight work. *Nature, 526,* 189–191.

van Dongen, N. N. N., van Doorn, J. B., Gronau, Q. F., van Ravenzwaaij, D., Hoekstra, R., Haucke, M. N., Lakens, D., Hennig, C., Morey, R. D., Homer, S., Gelman, A., Sprenger, J., & Wagenmakers, E.-J. (2019). Multiple perspectives on inference for two simple statistical scenarios. *The American Statistician, 73,* 328–339. https://doi.org/10.1080/00031305.2019.1565553

Suzanne Hoogeveen is a PhD candidate at the Department of Social Psychology at the University of Amsterdam.

Alexandra Sarafoglou is a PhD candidate at the Psychological Methods Group at the University of Amsterdam.