#60390 | AsPredicted

'Gender Differences in Task Performance and Beliefs'
(AsPredicted #60390)


Author(s)
This pre-registration is currently anonymous to enable blind peer-review.
It has 4 authors.
Pre-registered on
03/09/2021 05:35 AM (PT)

1) Have any data been collected for this study already?
No, no data have been collected for this study yet.

2) What's the main question being asked or hypothesis being tested in this study?
We will have two groups of participants: Group 1 will complete one of four tasks (emotion recognition, verbal ability, mathematical ability, or mental rotation), while Group 2 will be asked their beliefs about the performance of males and females in each task.
H1: Females perform (and are expected to perform) better than males in the face-recognition task.
H2: Females are expected to perform better (but do not perform better) than males in the verbal task.
H3: Males perform (and are expected to perform) better than females in the mathematical task.
H4: Males are expected to perform better than females in the rotation task.
Participants in Group 2 may or may not be paid for their belief accuracy:
H5: If there is no social desirability bias (or demand effect), incentivized and non-incentivized Group 2 participants will report the same beliefs regarding the performance of males and females in the four tasks.
Finally, we will conduct the experiment with students and with participants recruited on Prolific:
H6: Students in online experiments do not behave differently from participants on Prolific.

3) Describe the key dependent variable(s) specifying how they will be measured.
Subjects confronted with the task (Group 1) will need to complete a multiple-choice questionnaire (15 items) within 15 minutes. Performance of males/females will be measured by their number of correct answers. Once subjects finish the task, we will ask whether they expect males or females to perform better in the task, without incentivizing their choices.

Group 2 participants (in both the incentive and no-incentive conditions) will be asked their beliefs about the performance of a group of 100 males/females from Group 1 who completed the task. Two tasks will be presented to each observer; we will randomize the order of the tasks and whether observers are asked first about the performance of males or of females.

To collect the expectations of Group 2 participants, we will present them with 5 different categories; in particular, we will ask observers the number of males/females that they expect to have “less than 20% of correct answers (lowest performance) / 21-40% of correct answers / 41-60% of correct answers / 61-80% of correct answers / More than 80% of correct answers (highest performance)”. By this design choice, we will have a discrete distribution of expected performance. In addition, Group 2 participants will be asked how sure they are about their answers. We will include a simple question to assess whether they “expect for males to perform better / for females to perform better / no differences in performance between males and females”. As a result, expectations will be elicited both using distributions and a simple question.
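The elicited head counts can be turned into a cumulative distribution for the cdf plots used later. The following is a minimal sketch, not the study's actual code: the function name, the short category labels, and the requirement that the five counts sum to 100 are assumptions made for illustration, and the example response is hypothetical.

```python
from itertools import accumulate

# Short labels for the five questionnaire categories, lowest to highest.
CATEGORIES = ["<20%", "21-40%", "41-60%", "61-80%", ">80%"]
GROUP_SIZE = 100  # each belief refers to a group of 100 males or 100 females


def belief_cdf(counts):
    """Convert reported head counts per category into cumulative shares."""
    if len(counts) != len(CATEGORIES):
        raise ValueError("expected one count per category")
    # Assumption: a valid response allocates all 100 people, none negative.
    if any(c < 0 for c in counts) or sum(counts) != GROUP_SIZE:
        raise ValueError("counts must be non-negative and sum to 100")
    return [running / GROUP_SIZE for running in accumulate(counts)]


# Hypothetical response, for illustration only.
example_counts = [5, 15, 40, 30, 10]
example_cdf = belief_cdf(example_counts)
```

Each entry of `example_cdf` is the share of the 100 people expected to fall in that category or a lower one, so the last entry is always 1.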

4) How many and which conditions will participants be assigned to?
Each participant will be assigned to one of the following conditions:
• Group 1 (Task): Each participant will need to complete 1 of the 4 tasks above (emotion-recognition, verbal ability, mathematical ability, and mental rotation). We will have the same number of males and females in each task. After performing the task, subjects will be asked whether they expect males or females to perform better in the task (distribution + simple question).
• Group 2 (no incentives): After we explain the task that Group 1 participants completed (and which Group 2 participants will not be asked to complete), Group 2 participants will be asked their expectations (distribution + simple question) regarding the performance of the males/females who performed the task (Group 1). Each participant will be presented with two different tasks, randomly selected for each Group 2 participant. These subjects will not be paid for their expectations.
• Group 2 (incentives): As in the previous condition, but participants will be paid for the accuracy of their expectations. In particular, we will use a linear scoring rule: participants receive a fixed payment for each category they guess correctly and are penalized depending on the absolute value of the difference (in percentage points) between the share of females/males who perform in a given category and the belief reported by the participant.
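One possible reading of the linear scoring rule in the incentive condition can be sketched as follows. The payment amount, the penalty per percentage point, and the floor at zero are all hypothetical choices for illustration; the pre-registration does not state these values.

```python
# Hypothetical parameters; the actual amounts are not stated in the text.
FIXED_PAYMENT = 1.0        # paid per category when the belief is exactly right
PENALTY_PER_POINT = 0.05   # deducted per percentage point of absolute error


def category_payoff(belief_pct, actual_pct):
    """Linear rule: fixed payment minus a penalty proportional to |error|."""
    error = abs(belief_pct - actual_pct)
    # Assumption: payoffs are floored at zero so nobody owes money.
    return max(0.0, FIXED_PAYMENT - PENALTY_PER_POINT * error)


def total_payoff(beliefs_pct, actuals_pct):
    """Sum the per-category payoffs over the five performance categories."""
    if len(beliefs_pct) != len(actuals_pct):
        raise ValueError("beliefs and actual shares must have the same length")
    return sum(category_payoff(b, a) for b, a in zip(beliefs_pct, actuals_pct))
```

Under these illustrative parameters, a belief that is 10 percentage points off in one category earns half the fixed payment for that category, and any error of 20 points or more earns nothing.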

5) Specify exactly which analyses you will conduct to examine the main question/hypothesis.
Analysis 1. Test if the performance of women differs from the performance of men in each task. The dependent variable is the number of correct answers in each task. We will present cumulative distribution functions (cdf) for each task and will use the Epps and Singleton (1986) test (ES, hereafter) to analyse whether the underlying distributions of performance differ by gender. We will also run OLS regressions with and without covariates. The key variable of interest is the female dummy, which tells us whether women perform differently from men in that particular task.
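Two ingredients of Analysis 1 can be sketched with the standard library alone: the empirical cdf of scores and the OLS regression on a female dummy without covariates, where the dummy coefficient equals the female mean minus the male mean. This is an illustrative sketch with hypothetical function names and made-up scores. It does not implement the ES test; a library routine such as SciPy's `epps_singleton_2samp` would be one option for that.

```python
from statistics import mean


def ecdf(scores, max_score=15):
    """Share of participants with at most k correct answers, for k = 0..max_score."""
    n = len(scores)
    return [sum(s <= k for s in scores) / n for k in range(max_score + 1)]


def ols_female_dummy(scores, female):
    """OLS of score on an intercept and a 0/1 female dummy (no covariates).

    Returns (intercept, slope): the male mean and the female-minus-male gap.
    """
    x_bar, y_bar = mean(female), mean(scores)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(female, scores))
    sxx = sum((x - x_bar) ** 2 for x in female)
    if sxx == 0:
        raise ValueError("the sample must contain both males and females")
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up scores for illustration only (not pilot or study data).
scores = [8, 10, 12, 9, 11, 13]
female = [0, 0, 0, 1, 1, 1]
intercept, female_coef = ols_female_dummy(scores, female)
```

With covariates, the same coefficient would come from a multiple regression, which is better left to a statistics package.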

Analysis 2. Test if the beliefs about the performance of men and women differ in each task. We will present cdf plots of the performance beliefs and test, using Chi-square and ES tests, whether men and women are expected to perform differently. We will also report the percentage of times that the null hypothesis, “H0: the underlying distributions of beliefs for the two groups, men and women, are equal”, is rejected, and will check for differences in the tails by looking at the number of males/females that participants expect to have “61-80% of correct answers” and “More than 80% of correct answers (highest performance)”. We will test whether the proportion of men/women in the 61-80% and >80% categories exceeds 50%. We will also look at the responses to the simple question (“expect for males to perform better / for females to perform better / no differences in performance between males and females”). We will present a histogram with the relative frequencies of each answer and use a Chi-square test to check whether the relative frequency of “expect for males to perform better” differs from that of “expect for females to perform better”.
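The last comparison in Analysis 2 can be illustrated with a one-degree-of-freedom Chi-square goodness-of-fit test against an equal split. This is a sketch under the assumption that “no differences” responses are set aside and only the two directional answers are compared; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_equal_split(n_males_better, n_females_better):
    """Chi-square test of H0: both directional answers are equally frequent.

    Returns (statistic, p_value) for a test with one degree of freedom.
    """
    total = n_males_better + n_females_better
    if total == 0:
        raise ValueError("need at least one directional answer")
    expected = total / 2
    stat = ((n_males_better - expected) ** 2
            + (n_females_better - expected) ** 2) / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```

For example, calling `chi_square_equal_split(40, 20)` on hypothetical counts gives a statistic of about 6.67, which would reject equality at the 1% level.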

Analysis 3. Test for the presence of social desirability bias, i.e., whether the beliefs elicited under incentives are more consistent with our hypotheses 1-4 than the beliefs elicited without incentives. The analysis follows Analysis 2, but the cdf plots and tests will compare the beliefs about males/females with and without incentives.

Analysis 4. Differences in the behavior of students and online participants on Prolific. We do not plan to perform any direct test, but we will present the data as two separate studies.

The research team will draw conclusions and summarize findings based on three significance levels: 1%, 5% and 10%.

6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.
We will check for duplicate IP addresses before analyzing the data. If the same IP address appears more than once, we will only consider the first observation. Participants who do not complete the entire experiment (or do not consent to participate) will not be paid for their participation, nor will their data be analyzed.
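The exclusion rules can be expressed as a small filter. This is a sketch only: the record fields (`ip`, `consented`, `completed`) are hypothetical names, the records are assumed to be sorted by submission time, and dropping incomplete records before checking for repeated IPs is an assumption, since the order of the two steps is not specified.

```python
def apply_exclusions(records):
    """Keep consenting, complete records, and only the first one per IP.

    `records` is assumed to be a list of dicts sorted by submission time,
    each with the hypothetical keys "ip", "consented", and "completed".
    """
    seen_ips = set()
    kept = []
    for rec in records:
        # Drop participants who did not consent or did not finish.
        if not (rec["consented"] and rec["completed"]):
            continue
        # Drop later observations coming from an IP already seen.
        if rec["ip"] in seen_ips:
            continue
        seen_ips.add(rec["ip"])
        kept.append(rec)
    return kept
```

A stricter variant would register every IP as seen before checking completion, so that a later complete record from the same IP is also dropped.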

7) How many observations will be collected or what will determine sample size?
No need to justify decision, but be precise about exactly how the number will be determined.

The research team collected pilot data between September 24th and October 19th, 2020 on expected performance for all four tasks (with no monetary incentives). Based on the pilot findings, and assuming 80% statistical power and a 5% significance level, the required sample size is at least 35 observations per task to be powered for the text and facial recognition tasks. This calculation assumes a 23 and 49 percentage point difference between females and males, respectively. It also assumes that there is no social desirability bias. Because of financial constraints, this study will be underpowered for the maths and rotation tasks (if there is no social desirability bias). For experiments run with students, the total sample will be 40 observations per task (20 males and 20 females). For experiments conducted on Prolific, the sample size will be at least 150 observations per task (depending on budget constraints).
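For readers unfamiliar with this kind of calculation, the following sketch shows a textbook sample-size formula for comparing two proportions at a given power and two-sided significance level. It is not the calculation used by the research team and is not meant to reproduce the figure of 35: the underlying pilot quantities are not reported here, so the function name, the formula choice, and the proportions passed in are all assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate observations per group to detect p1 vs p2 (two-sided test).

    Uses the normal-approximation formula with unpooled variances.
    """
    if p1 == p2:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)
```

The required sample shrinks quickly as the assumed difference grows, which is why larger assumed effects need fewer observations than smaller ones.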

8) Anything else you would like to pre-register?
(e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?)

We will test whether going through the task affects the expectations of subjects; to do so, we will compare the expectations reported by Group 1 with those reported by Group 2 (no incentives) for each of the tasks.

We would also like to test whether the distributions of beliefs about the performance of men and women in each task depend on the gender of the individual reporting them.

Finally, using the elicited distributions of expectations, we will analyze whether people tend to report the mode or mean of expectations when using simpler questions.
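One way to set up this comparison is to compute the mean and the mode of each elicited distribution and check which statistic implies the same answer as the simple question. The sketch below is illustrative only: the bin midpoints, the function names, and the rule for mapping a male/female comparison to an answer are assumptions, not part of the pre-registered design.

```python
# Assumed midpoints (in % of correct answers) for the five categories.
BIN_MIDPOINTS = [10, 30, 50, 70, 90]


def distribution_mean(counts):
    """Mean expected performance implied by the head counts per category."""
    return sum(m * c for m, c in zip(BIN_MIDPOINTS, counts)) / sum(counts)


def distribution_mode(counts):
    """Index of the category with the most people (lowest index on ties)."""
    return max(range(len(counts)), key=lambda i: counts[i])


def implied_answer(male_stat, female_stat):
    """Answer to the simple question implied by comparing one statistic."""
    if male_stat > female_stat:
        return "males"
    if female_stat > male_stat:
        return "females"
    return "no difference"
```

Applying `implied_answer` once to the two means and once to the two modes, and comparing each result with the participant's actual answer, shows which statistic the simple question tracks more closely.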

Our experiment will recruit participants through two channels: online recruitment of students from the University of Valencia (Spain) and Prolific. The same questionnaires and experimental protocol will be used in both. Our questionnaires elicit sociodemographic variables (e.g., age, education or political orientation) that will be used as controls in our analyses. Spanish and English-translated questionnaires will be used for this experiment.

Version of AsPredicted Questions: 2.00