'Beliefs about Generative Artificial Intelligence in Economic Decisions' (AsPredicted #210376)
Author(s) This pre-registration is currently anonymous to enable blind peer-review. It has 3 authors.
Pre-registered on 2025/01/30 - 01:48 PM (PT)
1) Have any data been collected for this study already? No, no data have been collected for this study yet.
2) What's the main question being asked or hypothesis being tested in this study? We aim to study whether people hold accurate beliefs about how generative artificial intelligence (generative AI) agents make decisions in various economic domains: risk, intertemporal choice, social preference, and strategic interactions. One specific hypothesis being tested is whether people err in the direction of projecting their own decisions onto the AI. At the group level, this hypothesis says that people's predictions about AI's decisions are closer to average human behavior than to average AI behavior. At the individual level, this hypothesis says that people's predictions about AI's decisions in each problem are positively correlated with their own decisions in the same problem.
3) Describe the key dependent variable(s) specifying how they will be measured. We have a panel of 10 questions spanning the domains of risk, intertemporal choice, social preference, and strategic interactions. Each question requires either a numerical or a binary answer.
We will first measure GPT-4o's average response to each question by querying OpenAI's API for the probability distribution of the next token when GPT-4o is presented with the question. This lets us directly observe the probabilities with which GPT-4o would give each possible answer. We will query the API 100 times per question, obtaining 100 next-token distributions, and will average across these 100 distributions to compute the expected GPT-4o response to each question.
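The averaging step can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes each API query has already been turned into a dictionary mapping candidate answer tokens to probabilities (for example, by exponentiating the `top_logprobs` that OpenAI's Chat Completions API can return when `logprobs` is enabled). It also assumes that a token absent from one query's list counts as probability 0 for that query, and that tokens which do not parse as a number are dropped and the remaining mass renormalized. The helper names and the toy distributions are hypothetical.

```python
import math
from collections import defaultdict


def from_logprobs(token_logprobs):
    """Convert a token -> log-probability mapping into token -> probability."""
    return {token: math.exp(lp) for token, lp in token_logprobs.items()}


def average_distributions(dists):
    """Average next-token distributions across repeated queries.

    A token missing from one query's distribution contributes probability 0
    for that query (an assumption when only the top-k tokens are returned).
    """
    totals = defaultdict(float)
    for dist in dists:
        for token, prob in dist.items():
            totals[token] += prob
    return {token: total / len(dists) for token, total in totals.items()}


def expected_response(avg_dist):
    """Expected numeric answer implied by an averaged distribution.

    Tokens that do not parse as a finite number are dropped and the
    remaining probability mass is renormalized (an assumption).
    """
    mass_by_value = defaultdict(float)
    for token, prob in avg_dist.items():
        try:
            value = float(token)
        except ValueError:
            continue
        if math.isfinite(value):
            mass_by_value[value] += prob
    total = sum(mass_by_value.values())
    if total == 0:
        raise ValueError("no numeric answer tokens in the distribution")
    return sum(v * p for v, p in mass_by_value.items()) / total


# Toy example: two hypothetical queries for a binary (0/1) question.
queries = [{"1": 0.7, "0": 0.3}, {"1": 0.5, "0": 0.4, "I": 0.1}]
avg = average_distributions(queries)
print(expected_response(avg))  # (1 * 0.6 + 0 * 0.35) / 0.95 ≈ 0.632
```

For a binary question coded 0/1, the expected response is the renormalized probability of answering 1. The sketch assumes each answer is a single token; multi-token numeric answers would need extra handling.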
Our experiment involving human subjects has two parts. In the first part, subjects will be presented with 9 out of the 10 questions. All questions will be incentivized. We will record the subjects' responses and calculate the average human subject response to each question.
In the second part, the same human subjects will be told that we also asked an AI chatbot the same questions and will be shown the exact query that was used for the AI. They will then be asked to make incentivized predictions about the average AI response to each of the 10 questions (including the one question that they did not answer themselves). We will record the subjects' predictions and calculate the average prediction about the AI's response to each question.
4) How many and which conditions will participants be assigned to? Participants will not be partitioned into different conditions. For each participant, we will randomize the order of the questions within the choice part and within the prediction part.
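A per-participant randomization of this kind can be sketched as follows. The question labels, the choice of which question is held out of the choice part, and seeding the generator with a participant ID are all hypothetical, chosen only to make the illustration concrete and reproducible.

```python
import random


def task_orders(choice_ids, prediction_ids, participant_id):
    """Return independently shuffled question orders for the two parts.

    Seeding with the participant ID (a hypothetical choice) makes each
    participant's order reproducible. The choice part still precedes the
    prediction part; only the order within each part is randomized.
    """
    rng = random.Random(participant_id)
    choice_order = list(choice_ids)
    prediction_order = list(prediction_ids)
    rng.shuffle(choice_order)
    rng.shuffle(prediction_order)
    return choice_order, prediction_order


questions = [f"Q{i}" for i in range(1, 11)]  # hypothetical labels for the 10 questions
choice_part = questions[:9]                  # hypothetical: Q10 is the AI-only question
choice, prediction = task_orders(choice_part, questions, participant_id="P001")
print(choice)
print(prediction)
```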
5) Specify exactly which analyses you will conduct to examine the main question/hypothesis. For the group-level analysis, we will compute a relative prediction accuracy measure that compares how close subjects' predictions are to the AI's average response versus the human subjects' average response. Specifically, for each of the 9 choice problems that we present to both the AI chatbot and the human subjects, let p be the average prediction about the AI's response, ya the AI's actual average response, and yh the human subjects' actual average response. We define the relative prediction accuracy as 1 - |p - ya| / (|p - ya| + |p - yh|), where |.| denotes the absolute value. A value of 1 means the prediction exactly matches the average AI response, and a value of 0 means it exactly matches the average human response. Higher values indicate predictions relatively closer to the AI's average response; a value above 0.5 means the prediction is closer to the average AI response than to the average human response. Questions (if any) where the average AI response and the average human response are very close to each other will be excluded from the group-level analysis (see details in "Outliers and Exclusions").
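The measure can be computed per question in a few lines of Python. This is an illustrative sketch with hypothetical inputs, not the authors' analysis code. It also makes explicit one edge case the formula leaves open: when p, ya and yh all coincide, the denominator is zero and the measure is undefined.

```python
def relative_prediction_accuracy(p, ya, yh):
    """Compute 1 - |p - ya| / (|p - ya| + |p - yh|).

    1 means the average prediction p equals the average AI response ya,
    0 means it equals the average human response yh, and values above 0.5
    mean p is closer to ya than to yh.
    """
    dist_ai = abs(p - ya)
    dist_human = abs(p - yh)
    if dist_ai + dist_human == 0:
        # Only possible when p == ya == yh; the measure is undefined there.
        raise ValueError("measure undefined when p, ya and yh coincide")
    return 1 - dist_ai / (dist_ai + dist_human)


# Hypothetical numbers for one question: predictions average 0.5,
# the AI averages 0.2 and human subjects average 0.6.
print(relative_prediction_accuracy(0.5, 0.2, 0.6))  # ≈ 0.25
```

In this example the average prediction is three times farther from the AI's average than from the human average, so the measure falls well below 0.5.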
For the individual-level analysis, for each of the 9 choice problems that we present to both the AI chatbot and the human subjects, we will run the regression:
prediction about AI behavior = constant + beta * subject's own decision in the same problem
We predict that the slope coefficient beta will be positive (i.e., subjects' predictions and decisions will be positively correlated).
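For a single problem, the point estimates of this regression are the ordinary least squares intercept and slope. The sketch below computes them directly from the closed-form formulas. It is illustrative only: the data are hypothetical, and testing whether beta is positive would also require a standard error (for example, from a statistics package), which the sketch does not compute.

```python
def ols_line(x, y):
    """Closed-form OLS for y = constant + beta * x; returns (constant, beta)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("own decisions do not vary, so beta is not identified")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


# Hypothetical data: each subject's own decision (x) and their prediction
# about the AI's decision (y) in the same problem.
own = [0, 1, 2, 3]
pred = [1, 3, 5, 7]
print(ols_line(own, pred))  # (1.0, 2.0)
```

With a binary decision coded 0/1, beta reduces to the difference between the mean prediction of subjects who chose 1 and that of subjects who chose 0.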
6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations. Any question for which the AI's average response is within 0.1 standard deviations of the average human response will be excluded from the group-level analysis. When the average AI response and the average human response are this close, the value of the relative prediction accuracy measure would be driven largely by sampling noise.
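The exclusion rule can be expressed as a simple per-question check. The plan does not say which standard deviation is meant; purely as an assumption, the sketch below uses the sample standard deviation of the human subjects' responses to that question and treats "within" as inclusive. The data are hypothetical.

```python
import statistics


def exclude_question(ai_mean, human_responses, threshold=0.1):
    """True if the AI's average response lies within `threshold` standard
    deviations of the human average response.

    ASSUMPTION: the standard deviation is the sample SD of the human
    responses to this question; the pre-registration does not specify it.
    """
    human_mean = statistics.mean(human_responses)
    human_sd = statistics.stdev(human_responses)
    return abs(ai_mean - human_mean) <= threshold * human_sd


# Hypothetical binary question: humans split 50/50 (sample SD ≈ 0.577).
humans = [0, 1, 0, 1]
print(exclude_question(0.52, humans))  # True: 0.02 <= 0.0577
print(exclude_question(0.70, humans))  # False: 0.20 > 0.0577
```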
7) How many observations will be collected or what will determine sample size? No need to justify decision, but be precise about exactly how the number will be determined. We will recruit 300 participants on Prolific who meet the following conditions: (1) live in the US, (2) have previously completed at least 10 studies on Prolific, and (3) have an approval rate of at least 95% on Prolific.
8) Anything else you would like to pre-register? (e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?) We will also conduct the following secondary analyses.
First, for the one question that we asked the AI chatbot but not the human subjects, we will run an individual-level regression:
prediction about AI behavior = constant + beta * subject's own choice in a related problem.
Second, as a robustness check, we will run the analogous individual-level regression for the questions that we asked both the AI chatbot and the human subjects:
prediction about AI behavior = constant + beta * subject's own choice in a related problem.
Prolific collects demographic data on its participants, such as age and gender. We will also ask subjects about their experience with generative AI products at the end of the study. Using these data, we will run regressions relating subjects' predictions about the AI's responses to demographic variables and to experience with generative AI products.
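These secondary regressions are ordinary multiple regressions of predictions on several covariates. As a dependency-free sketch (not the authors' code), the function below prepends a constant and solves the normal equations by Gauss-Jordan elimination. The regressors (age and a hypothetical 0/1 indicator for using generative AI products) and the data are hypothetical. In practice a statistics library would also report standard errors.

```python
def ols(rows, y):
    """OLS coefficients [constant, b1, b2, ...] for y on the columns of rows.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting, where X is `rows` with a leading column of ones.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for i in range(k):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical subjects: (age, uses generative AI products: 1/0) and their
# prediction about the AI's response to one question.
regressors = [(20, 0), (30, 1), (40, 0), (50, 1), (25, 1)]
predictions = [1 + 0.02 * age + 0.3 * uses for age, uses in regressors]
print(ols(regressors, predictions))  # close to [1.0, 0.02, 0.3]
```

Solving the normal equations is adequate for a handful of well-scaled regressors like these; with many or poorly scaled regressors, a QR-based solver is numerically safer.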