'Socio-demographics of large-language-model users' (AsPredicted #127763)
Author(s) This pre-registration is currently anonymous to enable blind peer-review. It has 2 authors.
Pre-registered on 04/05/2023 01:37 AM (PT)
1) Have any data been collected for this study already? No, no data have been collected for this study yet.
2) What's the main question being asked or hypothesis being tested in this study? What are the socio-demographic characteristics of users of large language models (LLMs)? What are typical LLM usage scenarios?
Hypotheses (for binary usage and usage frequency):
H1: Men are more likely to use LLMs than people of other genders.
H2: Younger age groups are more likely to use LLMs than older age groups.
H3: There is an interaction effect of gender x age on the likelihood of using LLMs.
3) Describe the key dependent variable(s) specifying how they will be measured. Usage of LLMs (yes / no) and usage frequency of LLMs (self-reported frequency, measured with an ordered single-choice question).
4) How many and which conditions will participants be assigned to? One condition; participants will be grouped based on their gender and age (18-24, then 10-year brackets, based on prior work)
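As an illustration of the age grouping described above, the following minimal Python sketch maps an age in years to a bracket label. The function name age_bracket is hypothetical, and the sketch assumes that 10-year brackets continue indefinitely above 24 (25-34, 35-44, ...); the pre-registration does not state whether an open-ended top bracket will be used.

```python
def age_bracket(age: int) -> str:
    """Map an age in years to a bracket label: 18-24, then 10-year brackets.

    Assumption (not stated in the pre-registration): brackets continue in
    10-year steps without an open-ended top bracket such as "65+".
    """
    if age < 18:
        # Assumption: only adult participants (18+) are part of the sample.
        raise ValueError("age must be 18 or older")
    if age <= 24:
        return "18-24"
    # Lower bound of the 10-year bracket that contains this age (25, 35, ...).
    lower = 25 + 10 * ((age - 25) // 10)
    return f"{lower}-{lower + 9}"


if __name__ == "__main__":
    for example_age in (18, 24, 25, 34, 35, 70):
        print(example_age, age_bracket(example_age))
```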
5) Specify exactly which analyses you will conduct to examine the main question/hypothesis. Binary variables: logistic regression
Ordinal variables: ordinal regression (linear predictor with a cumulative link function)
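To make the two model families concrete, the sketch below shows how a shared linear predictor with a gender x age interaction (H1 to H3) could feed both a logistic model for binary usage and a cumulative logit model for usage frequency. It is an illustration under stated assumptions, not the planned analysis code: the function names, the coefficient names, the coding of gender as a man / other indicator, the treatment of age as a numeric bracket index, and the choice of the logit link are all hypothetical, and no coefficient values here come from data.

```python
import math
from typing import Dict, List, Sequence


def logistic(x: float) -> float:
    """Standard logistic function (inverse of the logit link)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(man: int, age_index: int, coefs: Dict[str, float]) -> float:
    """Linear predictor with main effects and a gender x age interaction.

    Assumptions: `man` is 1 for men and 0 for other genders; `age_index`
    is a numeric bracket index (0 for 18-24, 1 for 25-34, ...).
    """
    return (
        coefs["man"] * man
        + coefs["age"] * age_index
        + coefs["man_x_age"] * man * age_index
    )


def usage_probability(intercept: float, eta: float) -> float:
    """Binary usage model: P(uses LLMs) = logistic(intercept + eta)."""
    return logistic(intercept + eta)


def category_probabilities(eta: float, thresholds: Sequence[float]) -> List[float]:
    """Ordinal frequency model: P(Y <= k) = logistic(theta_k - eta).

    Returns one probability per ordered category (len(thresholds) + 1).
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in increasing order")
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        # Category probability is the difference of adjacent cumulative values.
        probabilities.append(value - previous)
        previous = value
    return probabilities


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to exercise the functions.
    coefs = {"man": 0.5, "age": -0.3, "man_x_age": 0.1}
    eta = linear_predictor(man=1, age_index=2, coefs=coefs)
    print(usage_probability(intercept=-1.0, eta=eta))
    print(category_probabilities(eta, thresholds=[-1.0, 0.0, 1.5]))
```

In this parameterization a larger linear predictor shifts probability mass toward higher frequency categories, which matches the direction of the hypotheses for the binary outcome.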
6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations. We do not expect critical outliers.
Participants will be excluded if they fail attention checks or write nonsense text in the open questions.
7) How many observations will be collected or what will determine sample size? No need to justify decision, but be precise about exactly how the number will be determined. 1500 participants from a representative sample based on US census data. Sample size is determined by the maximum number of participants available on the platform (Prolific, as of April 4, 2023).
8) Anything else you would like to pre-register? (e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?) We will conduct additional exploratory analyses for potential factors influencing LLM usage, such as household income, education level, ZIP codes, and employment status.
We will also collect and cluster typical usage scenarios and reasons for (not) using LLMs to motivate a second experimental study.