#55338 | AsPredicted

'Isolation clusters'
(AsPredicted #55338)


Author(s)
This pre-registration is currently anonymous to enable blind peer-review.
It has 3 authors.
Pre-registered on
2021/01/05 - 05:12 AM (PT)

1) Have any data been collected for this study already?
It's complicated. We have already collected some data but explain in Question 8 why readers may consider this a valid pre-registration nevertheless.

2) What's the main question being asked or hypothesis being tested in this study?
H1: Isolated nouns (compared to nouns in multiword utterances) occur closer to other instances of the same noun (that is, isolated words and repetition travel together in caregiver input).
H2: Nouns that occur more often in isolation clusters (i.e., clusters of repetitions of a noun that contain isolated instances) are produced at an earlier age by a higher proportion of infants, even after controlling for both isolation and repetition frequency.

3) Describe the key dependent variable(s) specifying how they will be measured.
DV1: Distance from a given noun instance to the closest instance of the same noun (either preceding or following the current instance) – computed as the number of caregiver utterances between the two noun instances in a CHILDES transcript.
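
As an illustration of how DV1 could be computed, here is a minimal Python sketch (not the authors' actual pipeline). It assumes a transcript is already reduced to a list of caregiver utterances, each a list of word tokens. It also assumes distance is measured as the difference in utterance index, so adjacent utterances are at distance 1. The function name `nearest_repetition_distances` and the handling of a noun with no other instance (`None`) are hypothetical choices.

```python
from typing import List, Optional


def nearest_repetition_distances(
    utterances: List[List[str]], noun: str
) -> List[Optional[int]]:
    """Return, for each instance of `noun`, the distance to its closest other instance.

    Assumption: distance is the difference in caregiver-utterance index,
    so two instances in adjacent utterances are at distance 1.
    """
    # Utterance index of every instance of the noun, in transcript order.
    positions = [i for i, utt in enumerate(utterances) for tok in utt if tok == noun]

    distances: List[Optional[int]] = []
    for k, pos in enumerate(positions):
        candidates = []
        if k > 0:
            candidates.append(pos - positions[k - 1])  # preceding instance
        if k + 1 < len(positions):
            candidates.append(positions[k + 1] - pos)  # following instance
        # None marks a noun that occurs only once in the transcript.
        distances.append(min(candidates) if candidates else None)
    return distances


transcript = [["ball"], ["look", "at", "the", "ball"], ["yes"], ["nice"], ["a", "ball"]]
print(nearest_repetition_distances(transcript, "ball"))
```

If the intended measure counts only intervening utterances, subtracting 1 from each nonzero distance gives that variant.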

DV2a: Proportion of children producing a given noun (based on child vocabularies in Wordbank)
DV2b: Normative age of acquisition of the noun – computed as the age at which 50% of children produce this noun, based on Wordbank vocabularies.
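
One way to read the 50% criterion in DV2b is sketched below in Python. The sketch assumes each noun has a list of (age in months, proportion of children producing the noun) pairs. It linearly interpolates between the two ages that bracket 50%. The interpolation step and the function name `normative_aoa` are assumptions made for illustration; the pre-registration does not specify how the crossing point is estimated.

```python
from typing import List, Optional, Tuple


def normative_aoa(
    curve: List[Tuple[float, float]], threshold: float = 0.5
) -> Optional[float]:
    """Estimate the age at which the proportion producing a noun reaches `threshold`.

    Assumption: linear interpolation between the two bracketing ages.
    Returns None if the threshold is never reached in the observed range.
    """
    points = sorted(curve)
    if not points:
        return None
    # Already at or above threshold at the youngest observed age.
    if points[0][1] >= threshold:
        return points[0][0]
    for (age0, p0), (age1, p1) in zip(points, points[1:]):
        if p0 < threshold <= p1:
            return age0 + (threshold - p0) * (age1 - age0) / (p1 - p0)
    return None


print(normative_aoa([(14, 0.0), (16, 0.25), (18, 0.75)]))
```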

4) How many and which conditions will participants be assigned to?
Isolated: A noun instance will be marked as isolated if the noun is the only word in an utterance.
Repetition cluster: A noun instance will be counted as part of a repetition cluster if the noun occurs at least 3 times within 6 consecutive caregiver utterances.
Isolation cluster: An isolation cluster will be defined as a repetition cluster containing at least one isolated instance.
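
The three definitions above can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation. It assumes tokenized caregiver utterances, counts noun instances (not utterances) toward the threshold of 3, and evaluates every window of 6 consecutive utterances separately. The function name `classify_instances` and the output field names are hypothetical.

```python
from typing import Dict, List

WINDOW = 6    # consecutive caregiver utterances
MIN_REPS = 3  # minimum instances of the noun within a window


def classify_instances(utterances: List[List[str]], noun: str) -> List[Dict[str, object]]:
    """Label each instance of `noun` as isolated, in a repetition cluster,
    and/or in an isolation cluster."""
    instances = [
        {
            "utterance": i,
            # Isolated: the noun is the only word in the utterance.
            "isolated": len(utt) == 1,
            "repetition_cluster": False,
            "isolation_cluster": False,
        }
        for i, utt in enumerate(utterances)
        for tok in utt
        if tok == noun
    ]
    for start in range(len(utterances)):
        in_window = [
            inst for inst in instances if start <= inst["utterance"] < start + WINDOW
        ]
        if len(in_window) < MIN_REPS:
            continue
        # The window is an isolation cluster if any instance in it is isolated.
        has_isolated = any(inst["isolated"] for inst in in_window)
        for inst in in_window:
            inst["repetition_cluster"] = True
            if has_isolated:
                inst["isolation_cluster"] = True
    return instances


transcript = [["ball"], ["the", "ball"], ["see", "the", "ball"]] + [["ok"]] * 6 + [["ball"]]
for row in classify_instances(transcript, "ball"):
    print(row)
```

In this example the first three instances fall in one window that contains an isolated instance, so all three are labeled as part of an isolation cluster. The final instance is isolated but belongs to no cluster.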

5) Specify exactly which analyses you will conduct to examine the main question/hypothesis.
Analysis 1: Mixed-effects model predicting the number of utterances between a noun instance and its closest repetition (DV1), with a fixed effect of isolation and random intercepts and slopes per noun, speaker, and conversation (transcript). If this model does not converge, we will remove random slopes, and then random intercepts, until the model converges. Among the models that converge, we will select the best-fitting one for the data analyses, using the anova function in R to compare models.
Analysis 2: We will use a regression predicting the normative age of acquisition of each noun from the log frequency of noun instances occurring in isolation clusters, controlling for the log frequency of the noun occurring in isolation and within repetition clusters. Log frequency will be computed based on the full CHILDES set of transcripts.
Exploratory Analysis 1: We will use a beta regression predicting the proportion of children producing each noun at a given age from the log frequency of the noun occurring in isolation clusters in input from caregivers addressing children younger than that age (controlling for log frequency in isolation and in repetition clusters).
Exploratory Analysis 2: We will test how the log frequency of isolation clusters changes as a function of the age of the child in CHILDES.
Exploratory Analysis 3: We will assess contributions of child noun productions to patterns of isolation, repetition, and isolation clusters. We will do so in two ways. First, we will descriptively report the proportion of isolation clusters (operationalized as above) that also contain child utterances using the target noun. Second, we will use a mixed-effects model predicting the number of utterances (both caregiver and child) between a given caregiver noun instance and the closest child noun production, with fixed effects of isolation, repetition cluster, and isolation cluster. We will use the same random effects and model selection procedure as in Analysis 1.

6) Describe exactly how outliers will be defined and handled, and your precise rule(s) for excluding observations.
Only CHILDES transcripts of caregivers addressing North American, English-learning, monolingual, typically developing children during free play at home or in the lab will be included. Transcripts that focus on children from other populations, or that involve tasks other than free play, will be excluded.

7) How many observations will be collected or what will determine sample size?
No need to justify decision, but be precise about exactly how the number will be determined.

28 corpora of North American English on CHILDES. The sample size was determined based on all the corpora in CHILDES that met our exclusion criteria (see 6).

8) Anything else you would like to pre-register?
(e.g., secondary analyses, variables collected for exploratory purposes, unusual analyses planned?)

Some of these hypotheses are based on prior pilot analyses on a subset of the dataset (i.e., a portion of the 28 CHILDES corpora we plan to analyze). This preregistration therefore primarily serves as an intermediate data analysis plan.

Version of AsPredicted Questions: 2.00