This is an anonymized version of the pre-registration. It was created by the author(s) to use during peer-review.

A non-anonymized version (containing author names) should be made available by the authors when the work it supports is made public.

No data have been collected for this study yet.

Hypothesis 1: Corrections of misleading graphs directly lead to lower perceived differences compared to the initial perceptions of the same misleading graphs.

Hypothesis 2: Corrections of misleading graphs lead, in the long run, to lower perceived differences compared to the initial perceptions of the same misleading graphs.

Hypothesis 3: Corrections of misleading graphs directly lead to lower perceived differences in new misleading graphs compared to perceptions of previously evaluated misleading graphs.

Hypothesis 4: Corrections of misleading graphs lead, in the long run, to lower perceived differences in new misleading graphs compared to perceptions of previously evaluated misleading graphs.

Hypothesis 5: The long-run correction effects described in hypothesis 4 differ per correction type.

We use accurate, misleading, and corrected graphs. We measure the effect of four types of corrections on how participants change their evaluation of the difference between the bars in the charts; we call this the correction effect. For each graph, participants evaluate the difference between the bars on a visual analogue scale (VAS) ranging from "very small" to "very big", which is transcoded into values between 0 (very small) and 100 (very big). The correction effect is measured as the difference between a participant's evaluation of the misleading graph and of the corrected graph for the same context. Each participant is shown only one of the four types of correction. We calculate, for each participant, the average difference between the four misleading graphs and their four corrections.
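As an illustration, the correction effect can be computed as sketched below. This is a minimal Python/numpy sketch with synthetic data; the variable names and the use of Python are ours and are not part of the registered analysis plan.

```python
import numpy as np

rng = np.random.default_rng(0)
n_participants, n_contexts = 6, 4

# VAS evaluations on the 0-100 scale (synthetic data for illustration):
# each participant rates four misleading graphs and their four corrections
misleading = rng.uniform(40, 90, size=(n_participants, n_contexts))
corrected = rng.uniform(10, 60, size=(n_participants, n_contexts))

# Per-context difference: evaluation of the corrected graph minus the
# evaluation of the misleading graph for the same context
differences = corrected - misleading

# One correction effect per participant: the mean over the four contexts
correction_effect = differences.mean(axis=1)
print(correction_effect)
```

A negative value indicates that the correction lowered a participant's perceived difference between the bars.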

To estimate the correction effect for graphs with a new context, we cannot compare the change in evaluation directly. Hence, we introduce a new measure we refer to as the misled score. A misled score indicates the degree to which a person is deceived by a misleading graph relative to the evaluation of the accurate graph for the same context. If we presented participants with both a misleading and an accurate graph for the same context, the two evaluations could influence each other. Therefore, we divide the participants into two groups: in each series, the graphs that are presented as accurate to group 1 are presented as misleading to group 2, and vice versa. The misled score is then calculated by subtracting the median evaluation of the accurate graph (in the other group) from a participant's evaluation of the misleading graph for the same context (i.e. the deviance). Hence, a positive score indicates that a participant was misled; the higher the score, the more they were misled.
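The misled score for one context can be computed as follows (again a hedged sketch with synthetic data; group sizes and values are arbitrary and only illustrate the definition above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 0-100 VAS evaluations for one context: group 1 saw the
# misleading version, group 2 saw the accurate version of the same graph
group1_misleading = rng.uniform(50, 95, size=8)
group2_accurate = rng.uniform(10, 40, size=8)

# Misled score: a participant's evaluation of the misleading graph minus
# the median evaluation of the accurate graph in the other group
reference = np.median(group2_accurate)
misled_scores = group1_misleading - reference

# Positive scores indicate that a participant was misled
print(misled_scores > 0)
```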

The four misleading graphs are presented in a random series with four accurate graphs on different topics to create a flow and to avoid familiarizing participants with the omitted baseline. After this series of four misleading and four accurate graphs (series 1), we present the participants with a randomized series of the four corrections of the misleading graphs in series 1 (series 2). This is followed by a randomized series of four new misleading graphs and four new accurate graphs (series 3) to explore how the correction might affect encounters with new misleading graphs. This completes the first survey. In a follow-up survey one week later, we present the participants with the same four misleading and four accurate graphs from series 1 of the first survey, mixed in a randomized series with four new misleading graphs and four new accurate graphs. The repeated graphs allow us to explore the effect of time after correction on the evaluation of the same misleading graphs; the new misleading graphs allow us to explore the long-run effect of correction on encounters with new misleading graphs.

We created four types of corrections for misleading graphs with an omitted baseline. Each correction type constitutes a separate condition:

Condition A presents an alternative, accurate graph; Condition B presents a visual warning on the misleading y-axis; Condition C presents a textual warning on the misleading y-axis; Condition D presents a textual warning about the non-objective representation of the data in the graph. Each participant is randomly assigned to one of the four conditions, equally divided among groups 1 and 2.

Hypothesis 1: We first subtract, per context, the evaluation of the misleading graph in series 1 (first survey) from the evaluation of the corresponding corrected graph in series 2 (first survey) to calculate the difference in evaluation per context for each participant. Then we calculate the mean over these four differences for each participant and test with a one-sided one-sample t-test whether, on average, these mean differences are indeed smaller than zero.
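The test for hypothesis 1 (and the structurally identical test for hypothesis 2) could be run as sketched below, with scipy standing in for whichever statistics software is eventually used. The data are synthetic, deliberately centred below zero, and only illustrate the shape of the analysis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Mean per-participant differences (corrected minus misleading evaluation),
# synthetic data centred below zero for illustration
mean_differences = rng.normal(loc=-10, scale=20, size=300)

# One-sided one-sample t-test: hypothesis 1 predicts a mean below zero
t_stat, p_value = stats.ttest_1samp(mean_differences, popmean=0,
                                    alternative="less")
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4g}")
```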

Hypothesis 2: We first subtract, per context, the evaluation of the misleading graph in series 1 (first survey) from the evaluation of the same misleading graph one week later (second survey) to calculate the difference in evaluation per context for each participant. Then we calculate the mean over these four differences for each participant and test with a one-sided one-sample t-test whether, on average, these mean differences are indeed smaller than zero.

Hypothesis 3: For each participant, we calculate the mean misled score over the four misleading graphs in series 1 (first survey) as well as over the new misleading graphs presented directly after correction in series 3 (first survey). With a one-sided paired t-test, we test whether, on average, the participants' mean misled scores dropped significantly.

Hypothesis 4: For each participant, we calculate the mean misled score over the four misleading graphs in series 1 (first survey) as well as over the new misleading graphs presented one week later (second survey). With a one-sided paired t-test, we test whether, on average, participants' mean misled scores are significantly lower in the long run.
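The paired comparisons for hypotheses 3 and 4 follow the same pattern; the sketch below uses scipy and synthetic data (with a built-in drop in misled scores so the test has something to detect) purely to illustrate the analysis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 300

# Mean misled scores per participant (synthetic data): series 1 versus the
# new misleading graphs seen after correction (series 3, or one week later)
series1_scores = rng.normal(loc=15, scale=10, size=n)
later_scores = series1_scores - rng.normal(loc=5, scale=8, size=n)

# One-sided paired t-test: did the mean misled scores drop after correction?
t_stat, p_value = stats.ttest_rel(later_scores, series1_scores,
                                  alternative="less")
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4g}")
```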

Hypothesis 5: We are interested in differences in effect between the four correction types. Therefore, we use visualizations to get a first impression of their effects for the comparisons described in each of the four hypotheses. Additionally, we test for significant differences between the correction types in the long run (as in hypothesis 4). For each participant, we calculate the mean misled score over the four misleading graphs in series 1 (first survey) as well as over the new misleading graphs presented one week later (second survey). We subtract the former from the latter to calculate the per-participant long-run difference in misled score (which we expect to be smaller than zero). On these differences we then run a one-way ANOVA to test whether, on average, there are any differences between the effects of the four correction types. If we find any, we will perform post hoc comparisons to test which pairs of correction types differ significantly.
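The ANOVA and post hoc step could look as follows (a sketch with synthetic per-group data whose means differ so that the test has an effect to detect; scipy again stands in for the eventual analysis software):

```python
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(4)

# Long-run differences in misled score per participant (synthetic data),
# one sample per correction type A-D
groups = {
    "A": rng.normal(-12, 10, 110),
    "B": rng.normal(-8, 10, 110),
    "C": rng.normal(-5, 10, 110),
    "D": rng.normal(-2, 10, 110),
}

# One-way ANOVA across the four correction types
f_stat, p_value = stats.f_oneway(*groups.values())

# If the ANOVA is significant, pairwise post hoc t-tests between types
# (these p-values would still need a multiple-comparison correction)
if p_value < 0.05:
    for a, b in combinations(groups, 2):
        t, p = stats.ttest_ind(groups[a], groups[b])
        print(f"{a} vs {b}: p = {p:.4g}")
```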

We apply the Holm-Bonferroni correction to compensate for the five tests that we perform.
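For reference, the Holm-Bonferroni step-down procedure can be sketched as below. This is a plain numpy illustration; in practice a library routine such as statsmodels' multipletests(method='holm') would typically be used, and the five p-values here are hypothetical:

```python
import numpy as np

def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure: returns a reject flag per
    hypothesis, in the original order of the p-values."""
    p = np.asarray(p_values)
    order = np.argsort(p)
    reject = np.zeros(len(p), dtype=bool)
    for rank, idx in enumerate(order):
        if p[idx] <= alpha / (len(p) - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Five hypothetical p-values for the five planned tests
print(holm_bonferroni([0.004, 0.03, 0.002, 0.2, 0.011]))
```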

Participants who do not finish the first survey will not be paid for their contribution and are thus excluded from the analyses altogether. Participants who do not finish the second survey are only excluded from the analyses that involve data from the second survey. Furthermore, we exclude participants who finished the first survey within 90 seconds or the second survey within 45 seconds, because we assume that they did not take the survey seriously. We work with series of graphs on different topics and use their means to minimize the influence of any single context on the results. To explore any structural influence of one context on this mean, we will calculate the Z-scores of all evaluations of the accurate graphs in series 1 (first survey) and examine whether the outliers (Z-score > 3) are predominantly in the same context. If so, we will consider excluding that context, the individual evaluations, or the participant.
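The outlier screening could be implemented as sketched here (synthetic data with one planted extreme value; whether Z-scores are computed over all evaluations pooled, as below, or per context is an implementation choice we leave open):

```python
import numpy as np

rng = np.random.default_rng(5)

# Evaluations (0-100 VAS) of the four accurate graphs in series 1;
# rows are participants, columns are contexts (synthetic data)
evaluations = rng.normal(30, 10, size=(300, 4))
evaluations[0, 2] = 99.0  # one planted extreme value for illustration

# Z-scores over all evaluations of the accurate graphs, pooled
z = (evaluations - evaluations.mean()) / evaluations.std()

# Count outliers (Z-score > 3) per context to see whether they cluster
outlier_rows, outlier_cols = np.nonzero(z > 3)
counts_per_context = np.bincount(outlier_cols, minlength=4)
print(counts_per_context)
```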


Sample size calculation for the t-tests (hypotheses 1-4), assumptions and settings:

Four one-sided one-sample or paired t-tests; alpha = 0.05/5 (lower bound for the Holm-Bonferroni correction for 5 tests); power = 0.9; smallest effect size of interest = 5 points on the 100-point scale. This results in a minimum sample size of 272 if equal variances are assumed (stats package in R, power.t.test), and of 379 if equal variances are not assumed (MKmisc package in R, power.welch.t.test).
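The registered numbers come from R's power.t.test and power.welch.t.test; the standard deviation assumed there is not stated above. The sketch below therefore uses a hypothetical SD of 20 points only to show the structure of such a calculation, here with statsmodels; it is not expected to reproduce the registered 272/379 exactly.

```python
from statsmodels.stats.power import TTestPower

# Hypothetical standard deviation on the 100-point scale (the SD behind
# the registered numbers is not stated in this pre-registration)
assumed_sd = 20.0
effect = 5.0 / assumed_sd  # smallest effect of interest as Cohen's d

# One-sided one-sample/paired t-test power analysis
n = TTestPower().solve_power(effect_size=effect,
                             alpha=0.05 / 5,  # Holm-Bonferroni lower bound
                             power=0.9,
                             alternative="larger")
print(round(n))
```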

Sample size calculation ANOVA, assumptions and settings:

ANOVA to compare the 4 correction types; alpha = 0.05/5 (lower bound for the Holm-Bonferroni correction for 5 tests); power = 0.9; effect size of interest (Cohen's f) is small to medium, f = 0.25 (Cohen, 1988). This results in a minimum sample size of 79 per group, so 316 in total (pwr package in R, pwr.anova.test).
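The registered ANOVA calculation uses R's pwr.anova.test; an equivalent calculation can be sketched in statsmodels (which should give approximately the same total, though rounding conventions may differ slightly):

```python
from statsmodels.stats.power import FTestAnovaPower

# f = 0.25 (small-to-medium effect, Cohen 1988), alpha at the
# Holm-Bonferroni lower bound, four correction-type groups
n_total = FTestAnovaPower().solve_power(effect_size=0.25,
                                        alpha=0.05 / 5,
                                        power=0.9,
                                        k_groups=4)
print(n_total)  # total sample size across the four groups
```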

Conclusion: the minimum required sample size is 379. We add about 15% to account for drop-out. Hence, we will collect data from 440 participants in total, 110 per correction type.


To check whether the omission of the baseline in the graphs is indeed misleading, we will compare the mean evaluations of each misleading graph with the mean evaluations of the corresponding accurate graph for the same context in the other group, using an independent two-sided t-test per context (with Holm-Bonferroni correction for multiple testing). If a context is not found to be misleading, we will consider excluding that context from the analyses.

Additionally, for exploratory purposes, we will rerun the analyses of the effectiveness of the correction types, excluding participants who were not misled by the misleading graphs. This extra analysis should give a cleaner picture of the correction effect and of which of the four correction types is most effective. To determine which participants were misled, we calculate the misled scores for each of the four misleading graphs in series 1 (first survey) for each participant. Participants with a misled score of 5 points or more on at least three of the four graphs are regarded as misled; the others are excluded from this analysis. To develop ideas for further research, we explore education level and graph literacy as possible predictors of being misled and of the correction effect. For this purpose, we include a question on the highest completed educational level and four questions from the Short Graph Literacy scale.
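The "misled participant" filter can be sketched as follows (synthetic data; the threshold of 5 points on at least three of four graphs is taken from the rule above):

```python
import numpy as np

rng = np.random.default_rng(6)

# Misled scores for the four misleading graphs in series 1, one row per
# participant (synthetic data for illustration)
misled_scores = rng.normal(10, 12, size=(300, 4))

# A participant is regarded as misled with a score of 5 points or more
# on at least three of the four graphs
misled = (misled_scores >= 5).sum(axis=1) >= 3

# Keep only the misled participants for the exploratory re-analysis
retained = misled_scores[misled]
print(retained.shape[0], "of", misled_scores.shape[0], "participants retained")
```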