This is an anonymized version of the pre-registration. It was created by the author(s) to use during peer-review.

A non-anonymized version (containing author names) should be made available by the authors when the work it supports is made public.

No, no data have been collected for this study yet.

Hypothesis 3. "Differences between everyday behaviors in the degrees to which they are pleasurable, loud, and aggressive, and take up space, are largely independent of the sample of raters used."

Hypothesis 4. "The overall-appropriateness rank order of everyday behaviors is largely independent of (a) the sample of raters and (b) the sample of situations."

We study 37 behaviors in 15 situations. To avoid fatigue, we ask each participant to rate only 10 behaviors (selected at random from the 37) on their possession of each of the four characteristics (e.g., "How loud is it to eat?") on a five-point Likert scale (strongly disagree, somewhat disagree, neither agree nor disagree, somewhat agree, strongly agree), coded from 1 to 5. Further, we ask each participant to rate the appropriateness of each of the 10 behaviors in only 5 situations (selected at random from the 15); e.g., "How appropriate is it for someone to eat on the bus?". Following Price and Bouffard (1974), ratings are given on a ten-point scale from 0 to 9, where 0 = "The behavior is extremely inappropriate in this situation" and 9 = "The behavior is extremely appropriate in this situation".
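The per-participant sampling described above can be sketched as follows. This is only an illustration under stated assumptions: the behavior and situation labels are placeholders, the function name `participant_items` and the seeding scheme are hypothetical, and the survey platform actually used may randomize differently.

```python
import random

# The four characteristics named in Hypothesis 3.
CHARACTERISTICS = ["pleasurable", "loud", "aggressive", "takes up space"]

def participant_items(behaviors, situations, seed):
    """Draw the rating items for one participant (hypothetical helper).

    Returns (characteristic_items, appropriateness_items), where each item
    is a (behavior, characteristic) or (behavior, situation) pair.
    """
    rng = random.Random(seed)  # assumption: one seed per participant
    chosen_behaviors = rng.sample(behaviors, 10)   # 10 of the 37 behaviors
    chosen_situations = rng.sample(situations, 5)  # 5 of the 15 situations
    characteristic_items = [
        (b, c) for b in chosen_behaviors for c in CHARACTERISTICS
    ]
    appropriateness_items = [
        (b, s) for b in chosen_behaviors for s in chosen_situations
    ]
    return characteristic_items, appropriateness_items

# Placeholder labels; the real lists are not given in this document.
behaviors = [f"behavior_{i}" for i in range(37)]
situations = [f"situation_{i}" for i in range(15)]
char_items, appr_items = participant_items(behaviors, situations, seed=1)
```

Under this design each participant gives 10 x 4 = 40 characteristic ratings and 10 x 5 = 50 appropriateness ratings.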

No conditions.

We test Hypothesis 3 using four different splits of the full sample of 400 raters into two non-overlapping subsamples, "A" and "B": male vs. female, above vs. below 40 years, with vs. without college degree, and liberal vs. conservative ideological affiliation. For every split we aggregate characteristic-possession ratings per behavior in each subsample. For a given characteristic, we accept the hypothesis if, in every split, we find a Pearson correlation between the "A" ratings and the "B" ratings of at least 0.71 (corresponding to at least 50% of the variance of aggregated ratings in one subsample accounted for by the aggregated ratings in the other subsample).
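A minimal sketch of this decision rule for one characteristic, assuming that "aggregate" means the arithmetic mean per behavior (the text above does not name the aggregation function). The helper names and the toy ratings are hypothetical and serve only to illustrate the computation.

```python
from collections import defaultdict
from math import sqrt

THRESHOLD = 0.71  # 0.71 ** 2 is about 0.504, i.e. just over 50% of variance

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_by_behavior(ratings):
    """ratings: iterable of (behavior, score) pairs -> {behavior: mean score}.

    Assumption: aggregation is the mean; the pre-registration only says
    "aggregate".
    """
    totals = defaultdict(lambda: [0.0, 0])
    for behavior, score in ratings:
        totals[behavior][0] += score
        totals[behavior][1] += 1
    return {b: total / count for b, (total, count) in totals.items()}

def split_correlation(ratings_a, ratings_b):
    """Correlate per-behavior aggregates of subsample A with subsample B."""
    mean_a = mean_by_behavior(ratings_a)
    mean_b = mean_by_behavior(ratings_b)
    shared = sorted(set(mean_a) & set(mean_b))
    return pearson([mean_a[b] for b in shared], [mean_b[b] for b in shared])

def hypothesis_accepted(correlations):
    """Accept only if every split reaches the threshold."""
    return all(r >= THRESHOLD for r in correlations)

# Toy "loud" ratings for one split (illustrative values, not data).
ratings_a = [("eat", 2), ("eat", 4), ("shout", 5), ("sleep", 1)]
ratings_b = [("eat", 3), ("shout", 5), ("shout", 4), ("sleep", 1)]
r = split_correlation(ratings_a, ratings_b)
```

In the actual test, `split_correlation` would be computed once per demographic split (four values per characteristic) and the resulting list passed to `hypothesis_accepted`.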

We test Hypothesis 4a by the same method as Hypothesis 3, substituting overall-appropriateness for characteristic-possession and aggregating over the subsample of raters as well as over the full sample of situations. We accept the hypothesis if, in every split, we find a Pearson correlation between the "A" ratings and the "B" ratings of at least 0.71.

To test Hypothesis 4b in a corresponding way, we need to repeatedly split the full sample of situations into non-overlapping subsamples. As there are no corresponding demographic variables for situations, we make four random splits of the 15 situations into an "A" sample of size 7 and a "B" sample of size 8. For each split, we calculate two separate overall-appropriateness ratings of each behavior by aggregating its ratings across all situations in the subsample and across the entire sample of raters. We accept the hypothesis if, in every split, we find a Pearson correlation between the "A" ratings and the "B" ratings of at least 0.71.
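The random 7/8 splits of situations could be generated along the following lines. This is a sketch only: the function name, the fixed seed, and the placeholder situation labels are assumptions, and the procedure does not guard against two of the four splits coinciding by chance.

```python
import random

def random_situation_splits(situations, n_splits=4, size_a=7, seed=0):
    """Return n_splits random (A, B) partitions of the situations.

    Each partition is non-overlapping: A has size_a situations and B has
    the remainder. The seed is an assumption added for reproducibility.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        shuffled = rng.sample(situations, len(situations))  # shuffled copy
        splits.append((shuffled[:size_a], shuffled[size_a:]))
    return splits

# Placeholder labels; the 15 real situations are not listed in this document.
situations = [f"situation_{i}" for i in range(15)]
splits = random_situation_splits(situations)
```

For each `(A, B)` pair, the per-behavior appropriateness ratings would then be aggregated over the situations in `A` and in `B` separately, across all raters, and the two resulting vectors correlated and compared with the 0.71 threshold.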

In the primary analysis we will not exclude any data. Secondary analyses may investigate reasons to exclude participants and whether it affects results.

400 participants.

No.

This pre-registration is part of a bundle of similar and/or related pre-registrations sharing at least one author. When a pre-registration in a bundle is shared with reviewers or made public, all of them are. Links to all other pre-registrations in the bundle are listed below:

#118571 - https://aspredicted.org/9G7_TK4 - Title: 'US Everyday behavior H1 & H2'