This is an anonymized version of the pre-registration. It was created by the author(s) to use during peer-review.

A non-anonymized version (containing author names) should be made available by the authors when the work it supports is made public.

No, no data have been collected for this study yet.

1) If an uncertain outcome is associated with painful consequences, does that lead people to engage in wishful thinking by underestimating the probability of that outcome?

2) Can higher incentives for accuracy reduce wishful thinking?

There are two key dependent variables:

1) Accuracy: this is a binary variable (1:correct; 0:incorrect), indexing at each trial whether the participant correctly identified the displayed pattern.

2) Confidence: this is elicited trial by trial as a number between 50% and 100%. After each choice, participants report the probability that their answer is correct (50% is chance level, and 100% is certainty). On this basis we can construct a “belief” measure, which captures, on a scale from 0 to 100%, the subject's belief about the orientation of the true pattern.
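One possible construction of the belief measure is sketched below. It assumes that belief is expressed as the probability assigned to the true orientation: the reported confidence when the choice was correct, and 100 minus the confidence otherwise. The function name and this mapping are illustrative assumptions, not the registered definition.

```python
def belief_in_true_orientation(correct: int, confidence: float) -> float:
    """Return the probability (0-100%) assigned to the true orientation.

    Assumed mapping: a correct choice made with confidence c implies a
    belief of c in the true orientation; an incorrect choice implies 100 - c.
    """
    if correct not in (0, 1):
        raise ValueError("correct must be 0 or 1")
    if not 50.0 <= confidence <= 100.0:
        raise ValueError("confidence must lie between 50 and 100")
    return float(confidence) if correct == 1 else 100.0 - confidence
```

Under this mapping, a confident wrong answer yields a belief below 50%, so the measure covers the full 0 to 100% scale even though confidence itself is bounded below by 50%.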

In total, each participant sees 216 Gabor patches and has to recognize whether each pattern is tilted to the right or to the left. After deciding between these two answers, the participant indicates their confidence in this decision as a percentage. Participants are incentivized monetarily for providing accurate answers via a matching probability mechanism. At the start of the experiment, each participant is connected to an electric stimulation device that is personally calibrated to deliver mild but unpleasant shocks. Participants will receive an electric shock with a probability of 1/3 if the true answer is either right or left (depending on the condition).

This is a 2x2 design with 4 conditions. Each participant will take part in every condition (a within-subject design), and the pattern recognition tasks are divided equally over the conditions (54 in each condition). The two treatment dimensions are a) the incentives for accuracy (high and low), and b) whether the shock is associated with the left-leaning Gabor patch or the right-leaning Gabor patch.
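As an illustration of the allocation described above, the sketch below builds a schedule of 216 trials with 54 trials in each of the four cells. The names (`build_schedule`, `incentive`, `shock_side`) are hypothetical, and the trial-by-trial shuffling is an assumption: the design may equally well present the conditions in blocks.

```python
import random
from itertools import product

N_TRIALS = 216
INCENTIVES = ("high", "low")    # incentive for accuracy
SHOCK_SIDES = ("left", "right")  # orientation associated with the shock


def build_schedule(seed: int = 0) -> list[dict]:
    """Return 216 trials split equally over the 2x2 conditions."""
    conditions = list(product(INCENTIVES, SHOCK_SIDES))
    per_condition = N_TRIALS // len(conditions)  # 54 trials per cell
    trials = [
        {"incentive": incentive, "shock_side": side}
        for incentive, side in conditions
        for _ in range(per_condition)
    ]
    # Assumption: trials are interleaved at random rather than blocked.
    random.Random(seed).shuffle(trials)
    return trials
```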

We are interested in how the shock influences the accuracy and confidence of the participants. Specifically, we will test the following directional hypotheses:

1) Wishful thinking 1: Does the accuracy of identifying a given pattern go down if the potential shock is aligned with the true answer (the unpleasant true answer), relative to the case where the potential shock is not aligned with the answer? We use one-sided t-tests to evaluate the differences in the average accuracy and confidence between conditions, where each observation is the average of a subject’s answers in that condition.

2) Wishful thinking 2: Similarly, we will use one-sided t-tests to examine whether the confidence in the true answer decreases if the potential shock is aligned with the true answer.

3) Accuracy incentives: We will use one-sided t-tests to test whether accuracy and confidence in the true answer are higher in the condition with high incentives for accuracy.
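The tests above compare per-subject condition averages. Because the design is within-subject, one natural reading is a paired t-test on those averages; this is an assumption, as the text does not say whether the tests are paired. The sketch below computes the paired t statistic and its degrees of freedom using only the standard library. The function name is hypothetical, and the one-sided p-value would still have to be obtained from a t distribution with the returned degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(cond_a: list[float], cond_b: list[float]) -> tuple[float, int]:
    """Return (t, df) for the paired differences cond_a - cond_b.

    Each list holds one value per subject (that subject's average in the
    condition), in the same subject order.
    """
    if len(cond_a) != len(cond_b) or len(cond_a) < 2:
        raise ValueError("need two equally long lists with at least 2 subjects")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

For the first hypothesis, `cond_a` would hold each subject's average accuracy when the shock is aligned with the true answer and `cond_b` the average when it is not; the directional prediction corresponds to a negative t.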

In addition to t-tests, we will also use multivariate linear regression analysis to test the effect of our treatments (accuracy incentives, shock alignment) as well as their interaction. Finally, we will use linear mixed effect models (with or without individual fixed effects) where we can control for trial characteristics and/or subject characteristics (see below).

We calibrate the difficulty of the task in the beginning of the experiment, so that we expect participants to be accurate 75% of the time on average. The actual accuracy in the experiment may deviate from this, and we will exclude the participant if actual accuracy is outside the [60%-90%] range, as this may indicate that, despite the calibration, the task was either too easy or too hard to detect meaningful differences.
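The exclusion rule can be written as a simple check on a participant's overall accuracy. The sketch below assumes that accuracy is computed over all of a participant's trials and that the 60% and 90% bounds are inclusive; neither detail is specified above, and the function name is hypothetical.

```python
def keep_participant(trial_accuracy: list[int],
                     lower: float = 0.60,
                     upper: float = 0.90) -> bool:
    """Return True if mean accuracy lies within [lower, upper].

    trial_accuracy holds one 0/1 entry per trial (1 = correct).
    Assumption: the bounds are inclusive.
    """
    if not trial_accuracy:
        raise ValueError("no trials recorded for this participant")
    mean_accuracy = sum(trial_accuracy) / len(trial_accuracy)
    return lower <= mean_accuracy <= upper
```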

We previously ran a very similar study that we had to discard due to an error in the code. On the basis of that study, running the same tests as specified above, we calculated that we could achieve more than 80% power with a sample of 60 people, so we will invite 60 participants.

We are likely to do some further exploratory analysis. For instance, we will examine whether the effect size differs with the difficulty of the task. We will also examine whether people who score higher on trait anxiety, which we measure with a psychological questionnaire, are more affected by the shock, both in their accuracy and in their beliefs. To investigate this, we will run mixed models that include interactions between the shock treatment and the trait anxiety score from the questionnaire.

Finally, we might explore how the effects of shocks play out in typical models of confidence formation (inspired by signal-detection theory): confidence is known to be an increasing function of evidence for correct answers and a decreasing function of evidence for incorrect answers. We can test whether the presence of shocks modulates the intercept or the slopes of this model (see e.g. Lebreton et al. (2017), bioRxiv, for a similar analysis).