'Calibration of confidence on U.S. city-pairs' (AsPredicted #47425)
Author(s) Kevin Dorst (University of Pittsburgh) - kevindorst@pitt.edu
Pre-registered on 09/10/2020 01:30 AM (PT)
1) Have any data been collected for this study already? No, no data have been collected for this study yet.
2) What's the main question being asked or hypothesis being tested in this study? The question is whether the variation in calibration curves across hard and easy questions will qualitatively match the variation predicted by simulations of rational calibration curves.
3) Describe the key dependent variable(s) specifying how they will be measured. Confidence in guesses, measured on a 50–100% scale, in increments of 10%.
4) How many and which conditions will participants be assigned to? One condition; all participants will be asked which of two cities has a bigger population, for 20 total trials. For each participant the city-pairs will be randomly selected from amongst all possible pairings of the 20 most-populous US cities.
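For concreteness, here is a minimal Python sketch of this per-participant randomization; the city list and function name are placeholders, not study materials:

import itertools
import random

# Placeholder names standing in for the 20 most-populous U.S. cities.
cities = [f"city_{i}" for i in range(1, 21)]

def draw_trials(n_trials=20, seed=None):
    """Randomly select n_trials city-pairs from all possible pairings."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(cities, 2))  # 20 choose 2 = 190 pairs
    return rng.sample(all_pairs, n_trials)

trials = draw_trials(seed=1)  # example usage: one participant's 20 city-pairs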
5) Specify exactly which analyses you will conduct to examine the main question/hypothesis. I will divide the questions into those that are *easy* and those that are *hard* based on the proportion of test-takers who answered them correctly: questions answered correctly by at least 75% of test-takers are easy, and those answered correctly by less than 75% are hard. I will pool all subjects' answers in each set of questions (easy/hard/all).
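A minimal Python sketch of this split, assuming the pooled answers sit in a table with one row per (participant, question) and hypothetical column names 'question', 'correct' (0/1), and 'confidence':

import pandas as pd

def split_by_difficulty(answers: pd.DataFrame, threshold: float = 0.75):
    """Label each question easy (>= 75% correct) or hard (< 75% correct)."""
    prop_correct = answers.groupby("question")["correct"].mean()
    easy_qs = prop_correct[prop_correct >= threshold].index
    easy = answers[answers["question"].isin(easy_qs)]
    hard = answers[~answers["question"].isin(easy_qs)]
    return easy, hard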
I'll then perform a one-sided paired t-test of the hypothesis that the average confidence amongst the hard questions exceeds the proportion of such answers that were true, and likewise a one-sided paired t-test of the hypothesis that the average confidence amongst the easy questions falls short of the proportion of such answers that were true.
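A hedged sketch of these tests in Python using scipy; the pairing unit (each pooled answer contributing a confidence/correctness pair, both coded on a 0-1 scale) is an assumption here, and pairing by per-subject means would be an alternative:

from scipy import stats

def test_overconfidence(confidence, correct):
    """H1: mean confidence on hard questions exceeds the hit rate.
    confidence is in [0.5, 1.0]; correct is coded 0/1, so its mean is the hit rate."""
    return stats.ttest_rel(confidence, correct, alternative="greater")

def test_underconfidence(confidence, correct):
    """H1: the hit rate on easy questions exceeds mean confidence."""
    return stats.ttest_rel(confidence, correct, alternative="less")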
Finally, I will run versions of the simulation of rational confidence to qualitatively compare the generated mean calibration curves to the participants'. In particular, I will use the number of easy/hard questions to fix the "number of questions" parameter in each simulation. And I will use the proportion-true in the easy-, hard-, and all-questions sets to determine which trials from the simulation to include for the easy/hard/all sets: for each question-set, I'll adjust the min or max hit-rate allowed (in increments of 0.01) until the hit-rate of the retained simulation trials matches the observed hit-rate in that set to within 0.02. If the simulation has <300 instances within that hit-rate range, I'll redo the process with a bigger initial simulation.
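A rough Python sketch of this trial-selection rule, assuming the simulation's output is summarized as an array of per-trial hit-rates; all names are illustrative rather than the actual simulation code:

import numpy as np

def select_trials(sim_hit_rates, observed_hit_rate, easy=True,
                  step=0.01, tol=0.02, min_instances=300):
    """Adjust the min (easy) or max (hard) allowed hit-rate in 0.01 steps until
    the mean hit-rate of the retained simulation trials is within 0.02 of the
    observed hit-rate; flag the case where fewer than 300 trials remain."""
    sim_hit_rates = np.asarray(sim_hit_rates)
    for k in range(int(1 / step) + 1):
        bound = 1.0 - k * step if easy else k * step   # start fully restrictive, then relax
        keep = sim_hit_rates >= bound if easy else sim_hit_rates <= bound
        if keep.any() and abs(sim_hit_rates[keep].mean() - observed_hit_rate) <= tol:
            if keep.sum() < min_instances:
                raise RuntimeError("fewer than 300 trials; rerun with a bigger simulation")
            return keep
    raise RuntimeError("no hit-rate bound matched the observed hit-rate")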
The simulations also have a free "noise" parameter. I will go through the above process at noise levels 0, 0.5, 1.0, ..., 2.5, 3.0, and then choose the level whose output mean calibration curves minimize mean-squared error relative to the participants' mean calibration curves across the easy and hard conditions. I will report comparisons of participants' mean calibration curves in the hard/easy/all conditions with both the noise-0 (i.e. error-free) model and the model whose noise parameter minimizes mean-squared error.
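A minimal Python sketch of this noise-parameter selection, assuming a placeholder function simulate_mean_curve(noise, condition) that returns a mean calibration curve over the same confidence bins as the participants' curves:

import numpy as np

NOISE_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

def pick_noise(observed_curves, simulate_mean_curve, noise_grid=NOISE_GRID):
    """Choose the noise level minimizing mean-squared error against the
    participants' mean calibration curves across the easy and hard conditions."""
    def mse(noise):
        errs = []
        for condition in ("easy", "hard"):
            sim = np.asarray(simulate_mean_curve(noise, condition))
            obs = np.asarray(observed_curves[condition])
            errs.append(np.mean((sim - obs) ** 2))
        return np.mean(errs)
    return min(noise_grid, key=mse)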
6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations. Participants will be asked one "attention check" question---whether New York or Columbia, MO has a bigger population. Data from those who get this attention check incorrect will be excluded.
7) How many observations will be collected or what will determine sample size? No need to justify decision, but be precise about exactly how the number will be determined. 200 participants will be recruited through Prolific.
8) Anything else you would like to pre-register? (e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?) n/a