'Competence evaluation of moral vs. technology texts from ChatGPT vs. humans'
(AsPredicted #154,976)


Author(s)
Finn Weymann (University of Tübingen) - finn.weymann@student.uni-tuebingen.de
Julia Schühle (University of Tübingen) - julia.schuehle@student.uni-tuebingen.de
Sebastian Proksch (University of Tübingen) - sebastian.proksch@student.uni-tuebingen.de
Elisabeth Streeb (University of Tübingen) - elisabeth.streeb@student.uni-tuebingen.de
Joachim Kimmerle (Leibniz-Institut fuer Wissensmedien Tuebingen) - j.kimmerle@iwm-tuebingen.de
Pre-registered on
2023/12/12 07:05 (PT)

1) Have any data been collected for this study already?
No, no data have been collected for this study yet.

2) What's the main question being asked or hypothesis being tested in this study?
This research examines how people evaluate texts on a moral vs. a technological topic that they believe were written either by a human or by ChatGPT. The research questions concern how people evaluate 1) author competence and 2) content quality, and 3) whether they indicate that they would hand in the text as a student in a university course (sharing intention).
For each of the following hypotheses, we expect an interaction effect between authorship label and text topic.
H1: Interaction effect for author competence: ChatGPT's author competence will be rated more favorably for a technological text topic, while a human's author competence will be rated more favorably for a moral text topic.
H2: Interaction effect for content quality: The content quality of a text labeled as written by ChatGPT will be rated more favorably for a technological text topic, while the content quality of a text labeled as written by a human will be rated more favorably for a moral text topic.
H3: Interaction effect for sharing intention: Sharing intention for a text labeled as written by ChatGPT will be stronger for a technological text topic, while sharing intention for a text labeled as written by a human will be stronger for a moral text topic.

3) Describe the key dependent variable(s) specifying how they will be measured.
Our dependent variables will be 1) the evaluation of author competence, 2) the evaluation of content quality, and 3) sharing intention. After reading each of the six texts, participants will evaluate it using a questionnaire.
The questionnaire for author competence includes the following items:
• The author is trustworthy.
• The author is knowledgeable of the subject.
• The author is smart.
The questionnaire for content quality contains the following items:
• The proposed solution described in the text is very concrete.
• The content of the text is very creative.
• The text is easy to understand.
• The text is well written.
• The text is credible.
Sharing intention will be measured with one item:
• I would hand in this text as a student in a university course. (In this scenario no legal consequences have to be considered.)
The items are taken from Böhm et al. (2023). They will be measured on 7-point Likert scales.
Furthermore, we will add a manipulation check item to the questionnaire asking participants to indicate on a 7-point scale whether the topic of the text is moral or technological. The last question of the questionnaire contains an attention check asking participants whether the author of the text was ChatGPT or a human (two-alternative forced choice, 2AFC).
Böhm, R., Jörling, M., Reiter, L., & Fuchs, C. (2023). People devalue generative AI's competence but not its advice in addressing societal and personal challenges. Communications Psychology, 1(1), 32.

4) How many and which conditions will participants be assigned to?
Our experiment will use a balanced within-subject design, so every participant will be assigned to all conditions. There will be six texts: three allegedly written by a human author and three allegedly written by ChatGPT; independent of authorship, three texts will deal with a technological topic and three with a moral topic. This results in six trials per participant: every participant sees every text, but the authorship label attached to each text will vary across participants. The order of the texts and the assignment of authorship labels will be randomized (see the sketch below).
This will be a 2×2 factorial design with the within-subject independent variables authorship label (ChatGPT vs. human) and text topic (technological vs. moral).
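As an illustration only, this assignment could be implemented as in the following Python sketch; the actual randomization will be handled by the survey software, and the text IDs and function names here are hypothetical:

```python
import random

# Hypothetical text pool: three moral and three technological texts.
TEXTS = [("moral", 1), ("moral", 2), ("moral", 3),
         ("technological", 1), ("technological", 2), ("technological", 3)]

def assign_trials(rng: random.Random):
    """Build one participant's six trials: every text appears exactly once,
    three texts receive the 'ChatGPT' label and three the 'human' label,
    and both the text order and the label-to-text mapping are randomized.
    Per-participant cell counts of the 2x2 design therefore vary, but
    balance out across the sample."""
    order = rng.sample(TEXTS, k=len(TEXTS))   # random text order
    labels = ["ChatGPT", "human"] * 3         # three of each label
    rng.shuffle(labels)                       # random label assignment
    return list(zip(order, labels))

for (topic, text_id), label in assign_trials(random.Random(1)):
    print(f"{topic} text {text_id}, labeled: {label}")
```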

5) Specify exactly which analyses you will conduct to examine the main question/hypothesis.
Our analysis for each hypothesis consists of a multiple linear regression. For each of the three dependent variables, one mean value will be calculated per condition and per participant and entered into the analysis. Authorship label and text topic, together with their interaction, will be the predictors in the regression model. The ChatGPT condition will be coded as 0 and the human condition as 1; likewise, the moral text condition will be coded as 0 and the technology text condition as 1.
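As a minimal sketch of this analysis, assuming the condition means are available in long format (file and column names are hypothetical), the model for H1 could be fit as follows; H2 and H3 use the same model with their respective dependent variable:

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per participant x condition, holding the mean rating of the
# dependent variable (here: author competence) for that condition.
df = pd.read_csv("condition_means.csv")  # hypothetical file name

# Pre-registered coding: ChatGPT = 0, human = 1; moral = 0, technology = 1.
df["author"] = (df["authorship_label"] == "human").astype(int)
df["topic"] = (df["text_topic"] == "technology").astype(int)

# Multiple linear regression with both predictors and their interaction;
# the author:topic coefficient carries the test of H1.
model = smf.ols("competence ~ author * topic", data=df).fit()
print(model.summary())
```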

6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.
We will only include participants who completed the experiment and passed the attention check. We will exclude the complete dataset of every participant who does not pass the attention check, that is, every participant who selects the wrong author on at least one of the six trials.
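A minimal sketch of this exclusion rule, assuming trial-level data in a single table (column names are hypothetical):

```python
import pandas as pd

raw = pd.read_csv("raw_trials.csv")  # hypothetical: one row per trial

# Keep only participants who finished the experiment and answered the
# authorship attention check correctly on all six trials.
per_participant = raw.groupby("participant_id").agg(
    completed=("completed", "all"),
    checks_passed=("attention_check_correct", "all"),
)
keep = per_participant.query("completed and checks_passed").index
clean = raw[raw["participant_id"].isin(keep)]
```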

7) How many observations will be collected or what will determine sample size?
No need to justify decision, but be precise about exactly how the number will be determined.

Our participants must be college students and native German speakers, and they must be at least 18 years old. We will recruit participants online via a link to the study. As an incentive to take part in the experiment, participants have the opportunity to win a gift certificate for an online shop. A power analysis indicated that at least n = 45 participants are required to achieve a power of 95% for detecting a minimal relevant interaction effect of -1.25 points on the 7-point Likert scale in all three analyses with an alpha of 5%.
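For illustration, the following simulation approximates power for the interaction under placeholder assumptions (a residual SD of 1.5, and four independent observations per participant, which ignores the within-subject correlation and should therefore be conservative); it does not reproduce the original calculation:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

def detects_interaction(n=45, effect=-1.25, sd=1.5, alpha=0.05):
    # Four conditions per participant; observations are treated as
    # independent here, and the SD is a placeholder assumption, not a
    # value from this pre-registration.
    author = np.tile([0, 0, 1, 1], n)  # ChatGPT = 0, human = 1
    topic = np.tile([0, 1, 0, 1], n)   # moral = 0, technology = 1
    y = 4.0 + effect * author * topic + rng.normal(0.0, sd, size=4 * n)
    data = pd.DataFrame({"y": y, "author": author, "topic": topic})
    fit = smf.ols("y ~ author * topic", data=data).fit()
    return fit.pvalues["author:topic"] < alpha

power = np.mean([detects_interaction() for _ in range(1000)])
print(f"Simulated power: {power:.2f}")
```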

8) Anything else you would like to pre-register?
(e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?)

To ensure that the presented texts were of comparable quality, we conducted a pilot study with n = 34 participants. The descriptive data showed no meaningful differences in the quality of the texts: the mean scores of the six texts were 3.85, 3.71, 3.80, 3.93, 4.01, and 3.78 (on 5-point Likert scales).

Version of AsPredicted Questions: 2.00