'ChatGPT vs. human agent in brief chat-coaching for procrastination (PRO-KI)'
(AsPredicted #150559)


Author(s)
Severin Hennemann (University of Mainz (Germany)) - s.hennemann@uni-mainz.de
Stefanie M. Jungmann (Johannes Gutenberg-Universität Mainz) - jungmann@uni-mainz.de
Julia Fähnrich (Johannes Gutenberg-Universität Mainz) - jfaehnri@students.uni-mainz.de
Cornelius Tietze (University of Mainz (Germany)) - ctietze@students.uni-mainz.de
Pre-registered on
11/10/2023 05:03 AM (PT)

1) Have any data been collected for this study already?
No, no data have been collected for this study yet.

2) What's the main question being asked or hypothesis being tested in this study?
This study investigates the efficacy, safety, and acceptability of an AI agent (ChatGPT4, intervention group) as compared to human agents (control group) in a 3-session chat coaching for reducing procrastination in adults.
The main research questions are:
a. Does efficacy in reducing self-reported procrastination from baseline to post-assessment (primary endpoint) differ between ChatGPT4 and human coaches?
b. Are there differences in secondary outcomes (depression, anxiety, worry, or rumination) from baseline to post-assessment between ChatGPT and human coaches?
c. Is the treatment effect on the primary endpoint moderated by demographic characteristics, psychological distress (depression, anxiety), worry, or rumination?
d. Are there differences in common impact factors (working alliance, problem actualization, mastery) between ChatGPT and human coaches?
e. Is the treatment effect on the primary endpoint mediated by these common impact factors?
f. What are the extent and nature of the adverse effects of the coaching that participants report at post-assessment?

3) Describe the key dependent variable(s) specifying how they will be measured.
The primary endpoint will be the change in self-reported procrastination from baseline (T0) to post-assessment (T3), measured with the overall and subscale scores of the German "Allgemeiner Prokrastinationsfragebogen" (Engl. "General Procrastination Questionnaire"; APROF; Höcker et al., 2017).

Secondary outcomes: Worry will be assessed via the Penn State Worry Questionnaire (PSWQ; Meyer et al., 1990) and rumination via the brooding subscale of the Ruminative Response Scale (RRS; Treynor et al., 2003). Depressiveness will be assessed with the depression scale of the Patient Health Questionnaire (PHQ-9; Kroenke & Spitzer, 2002) and anxiety with the Generalized Anxiety Disorder scale (GAD-7; Spitzer et al., 2006). All secondary outcomes will be assessed at baseline and post-assessment.

The "Mainzer Stundenbeurteilungsbogen" (Engl. "Mainz Hourly Assessment Form"; MSB; Bräscher et al., 2021) will be used to assess common impact factors of the coaching process after each chat session (T1, T2, T3).

Unwanted subjective effects of the coaching will be assessed with the Negative Effects Questionnaire (NEQ; Rozental et al., 2019) at post-assessment.

Summary of measurement points and instruments:
Baseline (T0): APROF, PHQ-9, GAD-7, PSWQ, RRS, demographics, experience with ChatGPT
During coaching (T1-T2): MSB
Post-Assessment (T3): APROF, PHQ-9, GAD-7, PSWQ, RRS, MSB, NEQ.

4) How many and which conditions will participants be assigned to?
Once the pre-survey has been completed, participants will be randomly assigned to one of two study conditions using an independently generated random number list: the intervention group will receive chat coaching with ChatGPT4, and the control group will receive chat coaching with a human agent, i.e., one of two trained master's students of clinical psychology. Participants will be blinded to their assignment; study personnel will not be blinded due to the nature of the intervention.
The chat coaching in both study arms will consist of three weekly sessions of approx. 15-25 min each, spread over a period of approx. 21 days. After the first and second sessions, participants will be asked to complete a short online survey (T1, T2; approx. 5-10 min each). The post-assessment (T3; approx. 15 min) will take place after the third session.
In terms of content and methodology, the chat coaching interventions are based on a cognitive behavioral therapy manual for the treatment of pathological procrastination (Höcker et al., 2017). The agenda of the three chat coaching sessions is shown below. After each session, participants will receive scheduled homework (as PDFs).
Contents of the three chat coaching sessions:
(1) Behavioral analysis (type and consequences of current procrastination), self-monitoring, psychoeducation (definition and explanatory model of procrastination).
(2) Strategies for time management and planning (e.g., setting priorities, identification and modification of distractors, work plan).
(3) Cognitive modification (identification of procrastination-promoting thoughts and development of alternative thoughts), summary of strategies.
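
The random assignment described above could be prepared along the following lines. This is only a minimal sketch: the pre-registration does not state how the random number list is generated, so the equal-sized shuffled list, the function name, and the seed value are assumptions made for illustration.

```python
import random

def make_allocation_list(n_per_arm=28, seed=150559):
    """Return a shuffled allocation list with n_per_arm entries per condition.

    Assumption: equal arm sizes are enforced by shuffling a fixed list;
    the seed is a hypothetical value chosen so the list can be reproduced.
    """
    arms = ["ChatGPT"] * n_per_arm + ["human"] * n_per_arm
    rng = random.Random(seed)  # separate generator, independent of global state
    rng.shuffle(arms)
    return arms

allocation = make_allocation_list()
# The i-th enrolled participant would receive allocation[i]
print(len(allocation), allocation.count("ChatGPT"), allocation.count("human"))
```

Fixing the seed means a person not involved in the coaching can regenerate and audit the list.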

5) Specify exactly which analyses you will conduct to examine the main question/hypothesis.
To test the efficacy of the coaching on the primary outcome (procrastination, assessed via APROF) and on the secondary outcomes (PHQ-9, GAD-7, PSWQ, RRS), univariate repeated-measures analyses of variance (rmANOVAs) will be calculated, testing the time (T0 vs. T3) by group (ChatGPT vs. human agent) interaction.
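
With two time points and two groups, the F statistic of the time-by-group interaction equals the squared t statistic of an independent-samples t-test (pooled variance) on the T3 minus T0 change scores. The sketch below illustrates the interaction test on that basis. The function name and all scores are made up for illustration and are not study data. The actual analysis would presumably be run in a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def interaction_f(change_a, change_b):
    """Time-by-group interaction F for a 2x2 mixed design.

    Computed as the squared pooled-variance t statistic on change scores;
    returns (F, (df1, df2)).
    """
    n_a, n_b = len(change_a), len(change_b)
    pooled_var = (
        (n_a - 1) * variance(change_a) + (n_b - 1) * variance(change_b)
    ) / (n_a + n_b - 2)
    t = (mean(change_a) - mean(change_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t ** 2, (1, n_a + n_b - 2)

# Made-up APROF-like scores (hypothetical, for illustration only)
chatgpt_t0, chatgpt_t3 = [30, 28, 35], [28, 24, 29]
human_t0, human_t3 = [31, 29, 33], [31, 28, 31]

chatgpt_change = [post - pre for pre, post in zip(chatgpt_t0, chatgpt_t3)]
human_change = [post - pre for pre, post in zip(human_t0, human_t3)]

f_stat, df = interaction_f(chatgpt_change, human_change)
print(f"F({df[0]}, {df[1]}) = {f_stat:.2f}")
```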
A putative moderating effect of baseline variables (i.e., demographic characteristics, procrastination severity, psychopathology [PHQ-9, GAD-7, PSWQ, RRS]) on the relationship between group (IV = ChatGPT vs. human agent) and treatment outcome (DV = APROF at post-assessment, controlled for baseline values) will be tested using the PROCESS macro.
Between-group differences in common impact factors (assessed via MSB) will be analyzed with rmANOVA (time: T1, T2, T3) and adjusted posthoc tests will be applied to further investigate group differences.
Cross-lagged mediation models will be estimated to investigate putative mediating effects of common impact factors (via MSB) with group (ChatGPT vs. human agent) as the independent variable and procrastination (APROF) at post-assessment as the dependent variable.
Group differences in nature and frequency of side effects will be analyzed using χ² tests and t-tests.
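
For a comparison of side-effect frequencies between the two groups, a Pearson χ² test on a 2x2 table (group by "reported at least one adverse effect: yes/no") is one possible form of the test named above. The sketch below assumes that layout and uses no continuity correction. The function name and the counts are hypothetical.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the chi-square survival function is erfc(sqrt(x / 2))
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = ChatGPT / human, columns = effect reported / not
stat, p = chi2_2x2(10, 18, 5, 23)
print(f"chi2(1) = {stat:.2f}, p = {p:.3f}")
```

With small expected cell counts, an exact test may be more appropriate than the χ² approximation.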

6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.
Participants will be excluded beforehand if they are under 18 years of age, have not provided informed consent, do not have sufficient German language skills, or are currently undergoing psychotherapeutic treatment.
Otherwise, main analyses will be performed following the intention-to-treat approach. Strategies for handling missing data and subgroup analysis strategies will be developed upon data closure.
Outliers will be examined exploratorily. A sensitivity analysis will be conducted after removing possible outliers (questionnaire scores more than ±3 SD from the mean).
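
The ±3 SD rule could be applied as in the following sketch. It assumes that the mean and the sample SD are computed over the full sample for each questionnaire score. The pre-registration does not specify this, and the function name and data are hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(scores, k=3.0):
    """Return one boolean per score: True if it lies more than k sample SDs
    from the sample mean (assumption: mean/SD taken over all scores)."""
    m, s = mean(scores), stdev(scores)
    return [abs(x - m) > k * s for x in scores]

# Hypothetical questionnaire scores with one extreme value
scores = [10] * 19 + [100]
flags = flag_outliers(scores)

# Data set for the sensitivity analysis: flagged scores removed
scores_without_outliers = [x for x, is_out in zip(scores, flags) if not is_out]
print(sum(flags), len(scores_without_outliers))
```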

7) How many observations will be collected or what will determine sample size?
No need to justify decision, but be precise about exactly how the number will be determined.

We plan to include a total of N = 56 adults, mainly university students as well as individuals from the general population, that is, n = 28 participants per condition. This sample size allows detection of at least small between-group effects (f ≥ .15) with a power of 1 − β = .80 at α = .05 in a two-factor rmANOVA (time: pre vs. post; group: ChatGPT vs. human agent). Note that this is a pilot study to be followed by a larger-scale investigation.

8) Anything else you would like to pre-register?
(e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?)

(a) How reliably can participants identify the type of coaching support they received (ChatGPT vs. human support) at post-assessment?
Frequencies of actual/participant-rated allocation will be compared using interrater statistics (e.g., sensitivity/specificity; Cohen's Kappa).
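
The agreement statistics named above can be computed from the cross-tabulation of actual and participant-rated allocation, as in this sketch. Treating "ChatGPT" as the positive category is an assumption. The function name and the example data are hypothetical.

```python
def allocation_agreement(actual, guessed, positive="ChatGPT"):
    """Sensitivity, specificity, and Cohen's kappa for two-category labels."""
    pairs = list(zip(actual, guessed))
    n = len(pairs)
    tp = sum(a == positive and g == positive for a, g in pairs)
    tn = sum(a != positive and g != positive for a, g in pairs)
    fp = sum(a != positive and g == positive for a, g in pairs)
    fn = sum(a == positive and g != positive for a, g in pairs)
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal totals of both classifications
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (p_observed - p_expected) / (1 - p_expected),
    }

# Hypothetical data: true condition vs. participants' guesses
actual = ["ChatGPT"] * 4 + ["human"] * 4
guessed = ["ChatGPT", "ChatGPT", "ChatGPT", "human",
           "human", "human", "human", "ChatGPT"]
result = allocation_agreement(actual, guessed)
print(result)
```

A kappa near 0 would indicate that participants identify their condition no better than chance, which would support successful blinding.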

(b) Are there any differences between the types of coaching in the therapeutic competence as rated by independent raters based on the chat content?
Situation-specific (e.g., setting an agenda) and general therapeutic competencies (e.g., dealing with problems/questions/objections) will be rated afterward by independent raters using the Cognitive Therapy Scale (CTS; Weck et al., 2010) based on randomly selected chat sessions. For this purpose, students (persons other than the coaches) will be trained in advance on the manual used and on the CTS. We plan to randomly select approx. 10% of the chat coaching sessions (i.e., 18 sessions), with each session number (1st/2nd/3rd) occurring equally often. In addition to comparing CTS scores (M, SD) between conditions, inter-rater agreement will be calculated using intraclass correlations (ICC).
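
With N = 56 participants and three sessions each, up to 168 sessions would be available, so 18 sessions (6 per session number) correspond to roughly 10.7%. The selection could be drawn as in the sketch below. The participant IDs, the seed, and the choice to sample independently within each session number are assumptions, since the pre-registration does not specify the procedure.

```python
import random

def select_sessions(participant_ids, per_session_number=6, seed=2023):
    """Draw an equal number of sessions for each session number (1, 2, 3).

    Returns a list of (participant_id, session_number) tuples.
    Assumption: draws are independent across session numbers, so one
    participant may contribute more than one session.
    """
    rng = random.Random(seed)  # hypothetical seed for reproducibility
    selected = []
    for session_number in (1, 2, 3):
        chosen = rng.sample(participant_ids, per_session_number)
        selected.extend((pid, session_number) for pid in chosen)
    return selected

sessions = select_sessions(list(range(1, 57)))  # hypothetical IDs 1..56
print(len(sessions), f"{len(sessions) / (56 * 3):.1%}")
```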

(c) The chat histories will be exported from the chat portal and stored electronically in order to have documentation available in case of queries or crises. In addition, this allows for further analyses (e.g., qualitative text analysis, or linguistic analyses).

Version of AsPredicted Questions: 2.00