
'The Inherence Bias in Scientific Explanation'
(AsPredicted #150,277)


Author(s)
Zachary Horne (University of Edinburgh) - zachary.horne@ed.ac.uk
Mert Kobas (New York University) - mertkobas@nyu.edu
Andrei Cimpian (New York University) - andrei.cimpian@nyu.edu
Pre-registered on
2023/11/08 08:15 (PT)

1) Have any data been collected for this study already?
No, no data have been collected for this study yet.

2) What's the main question being asked or hypothesis being tested in this study?
We are investigating whether practicing scientists (in this study, chemists, biologists, and neuroscientists) rely on inherent facts when generating explanations of unfamiliar scientific phenomena (in this study, astrophysical phenomena).

3) Describe the key dependent variable(s) specifying how they will be measured.
There are two key dependent variables. First, we ask participants to provide an open-ended explanation for why planets lose mass over time, why planets differ in their atmosphere density, or why planets have magnetospheres of different sizes. (Each participant will explain a single phenomenon.) Here, we will measure whether their explanation focuses on inherent facts about the planet (e.g., the planet's composition) and/or on extrinsic facts about the planet (e.g., the star the planet orbits).

Second, after participants learn how both composition and distance to star can affect the phenomena described above, we have them rate, using a percentage sliding scale, the degree to which composition vs. distance to star explains mass loss, magnetosphere size, or atmosphere density. We will compare participants' responses on this sliding scale against a "ground-truth" estimate of the variance accounted for by composition vs. distance to star, based on simulations from the realistic physics simulator Universe Sandbox. These ground-truth estimates provide an approximate way to assess the magnitude of practicing scientists' inherence bias (if such a bias exists).
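To illustrate how such a ground-truth estimate could be derived (a minimal sketch, not necessarily our exact procedure; the file and column names below are hypothetical), one could regress a simulated outcome on both factors and partition the variance:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical simulator output: one row per Universe Sandbox run
    sims = pd.read_csv("universe_sandbox_runs.csv")

    full = smf.ols("mass_loss ~ composition + distance", sims).fit()
    no_comp = smf.ols("mass_loss ~ distance", sims).fit()
    no_dist = smf.ols("mass_loss ~ composition", sims).fit()

    # Incremental R^2 as a rough index of the variance uniquely
    # accounted for by each factor
    r2_comp = full.rsquared - no_comp.rsquared
    r2_dist = full.rsquared - no_dist.rsquared
    print("composition share:", r2_comp / (r2_comp + r2_dist))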

4) How many and which conditions will participants be assigned to?
Several factors were varied, which we will include as covariates in our regression models, but we do not expect these factors to substantially affect the inherence bias in people's explanations. The following factors were varied:

Name of planet: We varied the name of the unfamiliar planet participants learned about. The planet's name was either a scientific name similar to those given to exoplanets' host stars, followed by a letter designation (e.g., HD 209459 b), or a name similar to actual exoplanet names (e.g., Galanthis).

Presentation of composition: We varied how the compositions of planets were presented. They were presented either in terms of total mass in scientific notation (e.g., 18e40 silicate) or in terms of percentages of mass (e.g., 40% silicate).

Percentage of composition: We varied the percentage of silicate/hydrogen or iron/silicate presented to participants. For instance, some participants were told the percentage of silicate was 40%, whereas others were told it was 2%.

Presentation of distance to star: We varied how the distance to a star was presented, namely in terms of either light minutes or kilometers.

Consequence of distance to star: We described three different consequences of planets orbiting closer to a star (between subjects): consequences for the planet's average temperature (presented in kelvin), for the speed of its orbit around the star (in kilometers per second), or, in the case of magnetosphere size, for the average solar pressure (presented in newtons per square meter).

5) Specify exactly which analyses you will conduct to examine the main question/hypothesis.
We will fit a series of regression models to assess our main predictions.

First, we will compare the frequency of inherent and extrinsic open-ended explanations by fitting a Bayesian logistic mixed-effects model with a weakly regularizing prior and a single fixed effect: type of explanation (inherent vs. extrinsic). (If the mixed-effects model does not converge, we may instead cluster the standard errors by participant in a regular logistic regression.)
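For concreteness, a minimal sketch of how this model could be fit in Python with the bambi library (one option among several; the data file, column names, and exact prior below are illustrative assumptions):

    import bambi as bmb
    import pandas as pd

    # Hypothetical coding: one row per participant x explanation type,
    # with mentioned = 1 if the open-ended explanation included that
    # kind of fact (inherent vs. extrinsic)
    df = pd.read_csv("coded_explanations.csv")

    # Weakly regularizing prior on the fixed effect of explanation type
    priors = {"explanation_type": bmb.Prior("Normal", mu=0, sigma=1)}
    model = bmb.Model(
        "mentioned ~ explanation_type + (1|participant)",
        df,
        family="bernoulli",
        priors=priors,
    )
    idata = model.fit(draws=2000, chains=4)

    # Fallback if the mixed model does not converge: ordinary logistic
    # regression with standard errors clustered by participant, e.g.,
    # import statsmodels.formula.api as smf
    # smf.logit("mentioned ~ explanation_type", df).fit(
    #     cov_type="cluster", cov_kwds={"groups": df["participant"]})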

Second, we will predict participants' percentage inherent slider responses and, separately, their percentage extrinsic slider responses by fitting two Bayesian beta regression models, which will include covariates for all of the varied factors listed above. We will then compare the estimated posterior distribution against the ground-truth estimate of the variance accounted for by planet composition and distance to star. (We will use beta regression because it is well suited for outcome variables bounded between 0 and 1.)
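A comparable sketch for the slider analysis, again assuming bambi and hypothetical column names, and simplified to an intercept-only model (the preregistered models add the design covariates from Question 4):

    import bambi as bmb
    import pandas as pd
    from scipy.special import expit

    df = pd.read_csv("slider_responses.csv")

    # Rescale percentages to (0, 1) and squeeze away exact 0s and 1s,
    # which the beta distribution cannot accommodate
    y = df["inherent_pct"] / 100
    df["inherent"] = (y * (len(y) - 1) + 0.5) / len(y)

    model = bmb.Model("inherent ~ 1", df, family="beta")
    idata = model.fit(draws=2000, chains=4)

    # Compare the posterior of the mean rating (inverse logit of the
    # intercept) against the simulator-derived ground truth
    GROUND_TRUTH = 0.55  # placeholder; the real value comes from Universe Sandbox
    draws = expit(idata.posterior["Intercept"].values.ravel())
    print("P(mean rating > ground truth) =", (draws > GROUND_TRUTH).mean())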

6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.
We will exclude participants who indicate they did not pay attention: the question asks participants whether they paid attention and took the study seriously. Participants who say they did not pay attention will be excluded from analyses but will still be compensated. We will also exclude participants who answer >=5 out of 6 questions correctly on a quiz that tests their knowledge of the astronomical phenomena included in this study: because this study aims to understand how practicing scientists explain the unfamiliar, getting this many quiz questions correct would indicate a high level of expertise in the topics of interest.
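In code, these exclusion rules amount to a simple filter (a sketch; the column names are hypothetical):

    import pandas as pd

    df = pd.read_csv("raw_data.csv")

    # Keep participants who reported paying attention and who answered
    # fewer than 5 of the 6 expertise-quiz questions correctly
    analysis_df = df[df["paid_attention"] & (df["quiz_correct"] < 5)]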

7) How many observations will be collected or what will determine sample size?
No need to justify decision, but be precise about exactly how the number will be determined.

We will invite via email all members of the top 15 departments of chemistry, biology, and neuroscience (according to the US News & World Report Best Global Universities rankings) who have their email addresses publicly available on their department's website. We will only include universities from English-speaking countries: if a top-15 university is in a non-English-speaking country, we will include in our sample the next-ranked university that is in an English-speaking country.
The nature of our recruitment procedures means that we are unsure exactly how many participants will agree to participate in the study: participants are emailed and asked whether they are willing to participate, but response rates to these sorts of studies can be quite low.
A power analysis for a two-tailed one-sample t test with 0.95 power, a Cohen's d of 0.4 (estimated via a series of prior pilot studies with undergraduate participants), and a 0.05 alpha threshold suggested we need 84 participants to reliably detect an inherence bias in explanation (on the slider task). We aim to recruit more participants than this to ensure we have an accurate estimate of the predicted effects, but, again, our recruitment procedures preclude us from saying with certainty exactly how many participants will be in the study once data collection is complete (i.e., 1 week after a given participant receives a reminder email, if they have not already completed the study).
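This target can be reproduced with standard power-analysis tools; for instance, a sketch using statsmodels:

    from statsmodels.stats.power import TTestPower

    # Two-tailed one-sample t test: d = 0.4, alpha = .05, power = .95
    n = TTestPower().solve_power(effect_size=0.4, alpha=0.05, power=0.95,
                                 alternative="two-sided")
    print(n)  # ~83, i.e., 84 participants after rounding up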
If the sample size after emailing the top 15 departments in these three fields does not reach 84, we will email more departments in increments of 5 ranks (16-20, 21-25, etc.) until we reach this sample size.

8) Anything else you would like to pre-register?
(e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?)

We will explore possible relationships between domain of expertise (e.g., chemistry) and the inherence bias for both open-ended explanations and percentage slider responses, though we have no specific predictions about this possible relationship.

Version of AsPredicted Questions: 2.00