Experience-driven recalibration of learning from surprising events

Different environments favor different patterns of adaptive learning. A surprising event that in one context would accelerate belief updating might, in another context, be downweighted as a meaningless outlier. Here, we investigated whether people would spontaneously regulate the influence of surprise on learning in response to event-by-event experiential feedback. Across two experiments, we examined whether participants performing a perceptual judgment task under spatial uncertainty (n = 29, n = 63) adapted their patterns of predictive gaze according to the informativeness or uninformativeness of surprising events in their current environment. Uninstructed predictive eye movements exhibited a form of metalearning in which surprise came to modulate event-by-event learning rates in opposite directions across contexts. Participants later appropriately readjusted their patterns of adaptive learning when the statistics of the environment underwent an unsignaled reversal. Although significant adjustments occurred in both directions, performance was consistently superior in environments in which surprising events reflected meaningful change, potentially reflecting a bias towards interpreting surprise as informative and/or difficulty ignoring salient outliers. Our results provide evidence for spontaneous, context-appropriate recalibration of the role of surprise in adaptive learning.


Introduction
Different environments involve different sources of uncertainty, with profound implications for how decision makers should respond to surprising events. An event that violates one's expectations might be interpreted either as a meaningless outlier or a sign of fundamental change. For example, suppose you supervise a reliable employee who one day misses work without notice. Whether you dismiss the event as a one-off or modify your predictions about the employee's future reliability will depend on your background knowledge and assumptions about the sources of uncertainty in this context. Previous research has approached the problem of inference and prediction in dynamic environments by making a distinction between expected uncertainty and unexpected uncertainty (Payzan-LeNestour, Dunne, Bossaerts, & O'Doherty, 2013; Payzan-LeNestour & Bossaerts, 2011; Soltani & Izquierdo, 2019; A. J. Yu & Dayan, 2005). Expected uncertainty relates to noisy event-to-event fluctuations that are compatible with one's current understanding of how events are generated. A less-than-fully-predictable event, like rolling a six-sided die and getting a six, need not drive updates to one's understanding of the outcome-generating mechanism. Unexpected uncertainty relates to events that do not conform to the known unreliability of predictive cues (such as rolling a six-sided die and getting a seven) and signal a need to update one's beliefs about the underlying context.
Previous research on adaptive learning has largely focused on scenarios in which events that signify fundamental change tend to be more extreme and surprising than events that reflect expected uncertainty. For example, in spatial prediction paradigms with "change point" structure, the generative statistics that govern target positions occasionally undergo discrete shifts. Targets subsequent to a change point tend to be associated with high prediction error. (Throughout this paper, we use "prediction error" to refer to the spatial distance between a target's predicted and actual locations. This usage is distinct from the concept of "reward prediction error" in reinforcement learning and instead is more similar to what has sometimes been called "state prediction error"; Gläscher, Daw, Dayan, & O'Doherty, 2010). In environments with change-point structure, an approximately Bayesian theoretical account holds that a larger prediction error should lead to a higher inferred probability that a change point has occurred, resulting in a higher trial-specific learning rate (that is, a large belief update expressed as a proportion of the prediction error). A related theoretical proposal holds that learning is gated by the associability of a stimulus, which scales with surprise (Li, Schiller, Schoenbaum, Phelps, & Daw, 2011; Pearce & Hall, 1980). However, larger-magnitude prediction errors need not always lead to higher learning rates (O'Reilly et al., 2013; L. Q. Yu, Wilson, & Nassar, 2021). In some situations, good decisions require maintaining stable beliefs in the face of salient but non-predictive events, such as when an investor avoids overreacting to a transient market downturn. People can adapt to environments in which the ideal learning rate is equal or lower for surprising events relative to more moderate events (Cheadle et al., 2014; d'Acremont & Bossaerts, 2016; Lee et al., 2020; Nassar et al., 2019; O'Reilly et al., 2013; Summerfield & Tsetsos, 2015).
Downregulating learning in response to non-predictive "oddball" events has been suggested to require vigilance and cognitive control (d'Acremont & Bossaerts, 2016), although other evidence suggests that actively updating beliefs requires additional cognitive resources compared with holding them stable (O'Reilly et al., 2013).
Given that surprising events can either upregulate or downregulate learning in different contexts, an outstanding question is to what degree adaptive learning undergoes spontaneous, experience-driven recalibration to track the current context's higher-order statistics. Calibration of adaptive learning, a form of metalearning (Griffiths et al., 2019; Wang, 2021), has previously been documented in human behavioral paradigms in which experimental participants were explicitly tasked with optimizing and reporting their predictions about upcoming events and received explicit description, training, and/or event-specific cues to ensure they clearly understood the relevant sources of uncertainty (d'Acremont & Bossaerts, 2016; Nassar et al., 2019; O'Reilly et al., 2013). In this previous work, participants successfully adapted their behavior to their environment, showing different patterns of learning rate modulation in different contexts. However, it is unclear to what degree the successful adaptation was attributable to descriptive information/instructions or to direct experience with environmental statistics.
Similar questions arise for current theoretical models of adaptive learning. Recent work has shown that a neural network model could be configured either to increase or decrease learning in response to extreme events, effectively treating such events as meaningful change points in one environment and meaningless oddballs in another (Razmi & Nassar, 2022). The model achieved this behavior by associating observations with a latent representation of mental context that evolved with different dynamics in the two environments. These dynamics were built into each version of the model a priori; that is, the model was endowed with knowledge about each environment's statistical structure, raising the question of whether such structural knowledge originates from explicit description, experiential learning, or both. Here, we investigated whether people would acquire patterns of adaptive learning solely through experience, and whether such patterns, once established, would be revised in response to a higher-order change in the experienced statistical structure.
The present work made use of an eye-tracking paradigm we developed and validated in a previous study, which used predictive gaze to measure spontaneous belief updating in the absence of overt instructions (Bakst & McGuire, 2021). The paradigm capitalized on the idea that prediction is a natural aspect of moment-to-moment cognition (Clark, 2013; Hayhoe, McKinney, Chajka, & Pelz, 2012; Henderson, 2017; Summerfield & De Lange, 2014). While making perceptual judgments about briefly presented targets, participants' eye movements tended to anticipate target onsets, which were temporally predictable but spatially uncertain. Despite not having been informed about the behaviors of interest or the optimal strategy, participants exhibited well-calibrated predictive inference and adapted to a manipulation of expected uncertainty in a change-point environment (Bakst & McGuire, 2021).
Here, across two experiments, we extended this eye-tracking paradigm beyond change-point environments to test the hypothesis that people would spontaneously adapt to the informativeness or uninformativeness of surprising events. In Experiment 1, we used a between-participant design to contrast environments in which surprise reflected either meaningful change or meaningless outliers. In Experiment 2, using a within-participant design, we tested people's ability to adjust their behavioral strategy when the environment's structure reversed. This allowed us to investigate the possibility of an asymmetry in the difficulty of the two environments, such that participants would adapt more easily to environments in which surprising outcomes warranted updating or to environments in which surprising outcomes were better ignored.

Participants
All human participant procedures were approved by the Boston University Institutional Review Board. Informed consent was obtained from all participants. Participants were recruited as a convenience sample from the Boston University community (Experiment 1: N = 29, 18 female and 11 male, age: mean = 20.5, range 18-24; Experiment 2: N = 63, 48 female and 15 male, age: mean = 20.3, range 18-31). All participants had normal or corrected-to-normal vision. Nine additional participants were excluded (6 in Experiment 1 and 3 in Experiment 2): two due to technical problems, three due to task accuracy below 60%, and four due to more than half of their trials not meeting eccentricity thresholds as explained below. The sample size for Experiment 1 was determined based on effect sizes derived from pilot data, while the sample size for Experiment 2 was determined based on effect sizes observed in Experiment 1, both using a power of 80% and a significance threshold of 5%. For Experiment 2, the sample size, exclusion criteria, and key analyses were preregistered with AsPredicted (aspredicted.org/cd54j.pdf).

Task
In two experiments, participants performed an implicit spatial prediction task based on Bakst and McGuire (2021; Fig. 1A), programmed in Python using PsychoPy (Peirce et al., 2019), while their eye movements were tracked (EyeLink 1000+ desk-mounted eye tracker, SR Research Ltd., Osgoode, Canada). Gaze position was collected monocularly at 1000 Hz. Head position was stabilized using a chinrest positioned 57 cm from the display (BENQ XL2430 with a resolution of 1920 × 1080), and the eye tracker was calibrated at the beginning of each run. An additional post-hoc calibration was performed following data collection (see below). The nominal task was to report whether briefly presented numerical digits were even or odd in order to earn reward. The digits were presented in varying locations, implicitly requiring participants to anticipate the location of the next digit so that they could use central vision to make the odd/even judgment.
Each digit appeared between two flanking Xs for 180 ms before being backward-masked by another X. The participant then had unlimited time to respond by pressing "1" with their left hand for "odd" or "0" with their right hand for "even." Accuracy feedback (a filled or empty circle signifying a correct or incorrect response) was then displayed in the same location as the digit for 500 ms, followed by a 750-ms inter-trial interval. Thus, there was a fixed 1250 ms interval between the keypress and the appearance of the next digit, making targets temporally predictable.
Two types of trials, central and peripheral, occurred in alternating order. Central trials displayed the digit at the center of the screen, which was marked by a small white point throughout the task. Peripheral trials displayed the digit somewhere on the perimeter of a circle centered on the screen with a radius of 6.9° of visual angle, which was marked throughout the task with a white outline. The purpose of including central trials was to recenter gaze at a point equidistant from all possible digit locations prior to each peripheral trial.
Fig. 1. (A) A digit appeared between flanking Xs for 180 ms before being masked by a third X. The participant then had unlimited time to indicate via a keypress whether the digit was even or odd. Accuracy feedback was displayed at the same location for 500 ms, followed by a 750-ms inter-trial interval (ITI). The total reward earned in each run was represented by the width of a bar at the bottom of the screen. Each run began with a central trial, and alternated between central and peripheral trials. (B) Task conditions. In the Change Point (CP) condition, small changes in digit location from one peripheral trial to the next tended to represent noise around a stable mean, whereas large changes in digit location tended to represent meaningful shifts in the underlying position of the Gaussian. In the Random Walk (RW) condition, small changes in digit location tended to indicate meaningful shifts in the underlying Gaussian distribution, whereas large changes tended to represent meaningless outliers.

Two conditions, change point (CP) and random walk (RW), governed the peripheral digit locations. Average spatial predictability was similar between the two conditions but the sequential contingencies differed. In the CP condition, each digit's location was drawn from a Gaussian distribution with a fixed width (σ = 11.25° in angular distance around the circle) and a mean that usually remained fixed from one peripheral trial to the next but was resampled from a uniform distribution spanning the entire circle at occasional unsignaled change points. The generative mean was not resampled during a two-trial refractory period after each change point, and was resampled with a probability of 0.167 thereafter, leading to an overall change point probability of approximately 0.125. In the RW condition, digit locations were usually drawn from a Gaussian distribution (σ = 11.25°) with a mean equal to the digit's location on the previous peripheral trial. This random walk around the circle was punctuated by occasional uniformly distributed outliers, which occurred at the same frequency as change points in the CP condition. In contrast to the CP condition, these noisy outliers did not reset the mean of the Gaussian; the subsequent peripheral digit location was drawn from a Gaussian distribution centered on the location of the peripheral digit prior to the outlier.
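The stated overall change-point probability of approximately 0.125 can be checked with a short calculation. This is a sketch that assumes the waiting time after the refractory period is geometric; the symbols T (trials between successive change points), r (refractory length), and h (per-trial resampling probability) are introduced here for illustration only.

```latex
\begin{aligned}
E[T] &= r + \frac{1}{h} = 2 + \frac{1}{0.167} \approx 7.99 \ \text{peripheral trials},\\
P(\text{change point on a given trial}) &\approx \frac{1}{E[T]} \approx 0.125 .
\end{aligned}
```

Under these assumptions, a change point (or, in the RW condition, an outlier) occurs roughly once every eight peripheral trials.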
Therefore, in both conditions, peripheral digits had a high probability of appearing at a Gaussian-distributed location near the previous generative mean and a lower probability of appearing at an arbitrary new location anywhere on the circle. The two conditions differed in the predictive significance of large versus small movements. Small trial-to-trial differences in the CP condition tended to reflect noise around a stable mean, whereas large differences indicated a meaningful shift. In contrast, small trial-to-trial differences in the RW condition reflected meaningful shifts in the mean, whereas large differences represented non-predictive outliers (Fig. 1B).
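The two generative processes described above can be sketched in a few lines of Python. This is an illustrative simulation, not the task code used in the study: the function name `simulate`, the initial uniformly sampled mean, the assumption that an extreme event can occur from the first trial, and the assumption that RW outliers follow the same refractory schedule as CP change points are all choices made here for the sketch.

```python
import random

SIGMA = 11.25      # Gaussian SD in degrees of angular distance (from the text)
HAZARD = 0.167     # per-trial probability of an extreme event once eligible (from the text)
REFRACTORY = 2     # trials after an extreme event on which none can occur (from the text)


def simulate(condition, n_trials, seed=0):
    """Return (locations, is_extreme) for a run of 'CP' or 'RW' peripheral trials."""
    rng = random.Random(seed)
    mean = rng.uniform(0.0, 360.0)   # assumption: starting mean is uniform on the circle
    since_extreme = REFRACTORY       # assumption: extreme events are possible from trial 1
    locations, is_extreme = [], []
    for _ in range(n_trials):
        extreme = since_extreme >= REFRACTORY and rng.random() < HAZARD
        if condition == "CP":
            if extreme:
                mean = rng.uniform(0.0, 360.0)   # change point: the mean jumps and stays
            loc = rng.gauss(mean, SIGMA) % 360.0
        elif condition == "RW":
            if extreme:
                loc = rng.uniform(0.0, 360.0)    # outlier: the mean is left untouched
            else:
                loc = rng.gauss(mean, SIGMA) % 360.0
                mean = loc                       # this digit becomes the next mean
        else:
            raise ValueError("condition must be 'CP' or 'RW'")
        since_extreme = 0 if extreme else since_extreme + 1
        locations.append(loc)
        is_extreme.append(extreme)
    return locations, is_extreme
```

In the CP branch a large jump carries over to later trials because `mean` is overwritten, whereas in the RW branch the outlier leaves `mean` at the pre-outlier digit location, which is the contrast the two conditions are designed to create.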
Experiment 1 used a between-subject design in which each participant experienced only one condition, performing four 8-min runs of either the CP or RW condition. Experiment 2 used a within-subject design in which each participant experienced three 8-min runs of one condition followed by three 8-min runs of the other, with the order counterbalanced across participants. Participants were not informed that the conditions changed, or that there were, in fact, multiple conditions at all. They were only told that they would complete four 8-min runs in Experiment 1, and six 8-min runs in Experiment 2.
Because run duration was based on time rather than number of trials (and the duration of individual trials depended on the participant's response latency), participants varied in the total number of trials they completed. In Experiment 1, participants completed an average of 456 peripheral trials in total across all four runs (range 382-515). They completed 462 peripheral trials on average in the CP condition (range 432-515), and 450 trials on average in the RW condition (range 382-484). In Experiment 2, participants completed an average of 687 trials in total across all six runs (range 556-780). They completed 345 trials on average in the CP condition (range 283-393) and 342 trials on average in the RW condition (range 258-393).
Monetary compensation for both studies consisted of a show-up payment of $12 in addition to a bonus payment that scaled with the mean proportion of correct odd/even responses (95% accuracy resulted in a bonus of an additional $12 and payments were rounded to the nearest $0.25). A horizontal bar centered at the bottom of the screen increased in length proportional to the total number of correct responses in each run.
Following the experiment, participants were asked to respond to an open-ended question that asked, "What do you think the study was about?" Answers including any reference to digit locations/patterns or prediction were considered evidence of explicit awareness of task structure. Two independent raters assessed the responses and any discrepancies were resolved through discussion. Eleven individuals (38%) in the first experiment and 30 individuals (47%) in the second experiment were coded as having evidence for explicit awareness. Explicit awareness was not associated with odd/even judgment accuracy or predictive gaze accuracy in either experiment (all p ≥ 0.544, Wilcoxon rank sum test).

Post-hoc calibration
Calibration of the eye tracker was performed at the start of each run. However, small, consistent deviations were often observed between gaze position and visual targets. We therefore performed a post-hoc calibration of gaze position (Supplemental Fig. 1). The post-hoc calibration step was developed to remove structured error from gaze position estimates, although our results did not change materially if it was omitted.
Using gaze position at the time of peripheral digit appearance (which served as our measure of predictive beliefs), we calculated the residuals according to the following: where D = 6.9° of visual angle, the eccentricity of peripheral digits. Measured gaze position was decomposed into its x and y components B_x and B_y relative to screen center, and X_adj and Y_adj represent additive post-hoc calibration adjustments. We then found the adjustments that minimized the following expression using fminsearch (Matlab): where n is the number of residuals. We only included trials on which gaze position was closer to the peripheral circle than the center (eccentricity > 3.45° of visual angle) so as to avoid basing the calibration on beliefs that lingered near the center of the screen. This procedure was completed for each run separately. Runs were only recalibrated if they included > 40 trials that met the eccentricity-based inclusion criterion. Two additional runs were excluded from recalibration after manual inspection revealed large, unexpected shifts in gaze position. For X_adj, the mean across participants from both experiments was −0.086°, with a range from −4.908° to 6.862°. Y_adj had an average of 0.018° with a range from −2.757° to 3.621°. On average, the residuals per run decreased by 0.23° following post-hoc calibration.
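A minimal sketch of this kind of post-hoc calibration is given below. Several details are assumptions rather than the authors' method: the residual is taken to be the adjusted gaze eccentricity minus D, the minimized quantity is taken to be the root-mean-square residual, and a simple compass search stands in for Matlab's fminsearch. The function names `rms_residual` and `fit_adjustment` are hypothetical.

```python
import math

D = 6.9  # eccentricity of the peripheral circle in degrees of visual angle (from the text)


def rms_residual(gaze, x_adj, y_adj):
    """Assumed objective: RMS distance of the adjusted gaze samples from the circle."""
    res = [math.hypot(bx + x_adj, by + y_adj) - D for bx, by in gaze]
    return math.sqrt(sum(r * r for r in res) / len(res))


def fit_adjustment(gaze, step=1.0, tol=1e-4):
    """Compass search over (x_adj, y_adj); a stand-in for a simplex optimizer."""
    # Keep only samples nearer the circle than the center (eccentricity > D / 2).
    kept = [(bx, by) for bx, by in gaze if math.hypot(bx, by) > D / 2]
    x_adj = y_adj = 0.0
    best = rms_residual(kept, x_adj, y_adj)
    while step > tol:
        improved = False
        for dx, dy in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = rms_residual(kept, x_adj + dx, y_adj + dy)
            if cand < best:
                x_adj, y_adj, best = x_adj + dx, y_adj + dy, cand
                improved = True
        if not improved:
            step /= 2.0   # no axis move helps, so search on a finer scale
    return x_adj, y_adj, best
```

Applied per run, the returned `x_adj` and `y_adj` would be added to every gaze sample in that run before beliefs are converted to angular positions.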

Learning rate analyses
A participant's predictive belief on each peripheral trial, B_t, was operationalized as the gaze position at the time of digit onset, quantified in terms of angular position on the peripheral circle. The learning rate on trial t (LR_t) was estimated as the belief update from trial t to t+1, scaled by the prediction error on trial t:

$$LR_t = \frac{B_{t+1} - B_t}{X_t - B_t}$$

where B represents gaze-derived belief, and X is the angular position of the digit. The numerator and denominator were circular differences with a possible range of ±180°. Analyses only included trials in which gaze was nearer to the peripheral circle than to the center (eccentricity > 3.45° of visual angle), to exclude beliefs inaccurately represented by the angle on the peripheral circle. Three participants in Experiment 1 and one participant in Experiment 2 were excluded entirely because fewer than half their trials met this eccentricity threshold. Analyses assessed the relationship between per-trial learning rates and the absolute value of prediction errors. Median learning rate for each participant was calculated for different prediction error bins (bin edges were 0°, 5°, 10°, 15°, 20°, 25°, 35°, 45°, 75°, 105°, 135°, and 180°). This procedure was repeated separately for trials in "Early" and "Late" epochs, defined as the first half (four minutes) of the first run and second half (four minutes) of the last run, respectively. In Experiment 1, this analysis compared the first half of Run 1 to the second half of Run 4. In Experiment 2, it compared the first half of Run 1 to the second half of Run 3 for the pre-reversal phase, and the first half of Run 4 to the second half of Run 6 for the post-reversal phase.
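The per-trial learning rate and the binned medians described above can be sketched as follows. This is an illustrative Python version, not the original analysis code: the function names are hypothetical, the inputs are assumed to be lists of angular positions in degrees that have already passed the eccentricity filter, and trials with a prediction error of exactly zero are skipped because the ratio is undefined.

```python
import statistics

BIN_EDGES = [0, 5, 10, 15, 20, 25, 35, 45, 75, 105, 135, 180]  # degrees (from the text)


def circ_diff(a, b):
    """Signed circular difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def learning_rates(beliefs, digits):
    """Return (|prediction error|, learning rate) pairs for consecutive trials."""
    out = []
    for t in range(len(beliefs) - 1):
        pe = circ_diff(digits[t], beliefs[t])            # prediction error on trial t
        update = circ_diff(beliefs[t + 1], beliefs[t])   # belief update from t to t+1
        if pe != 0:
            out.append((abs(pe), update / pe))
    return out


def binned_median_lr(pairs):
    """Median learning rate per prediction-error bin (None for empty bins)."""
    medians = []
    last = len(BIN_EDGES) - 2
    for i in range(len(BIN_EDGES) - 1):
        lo, hi = BIN_EDGES[i], BIN_EDGES[i + 1]
        # The final bin is closed on the right so that an error of 180 is kept.
        vals = [lr for pe, lr in pairs if lo <= pe < hi or (i == last and pe == hi)]
        medians.append(statistics.median(vals) if vals else None)
    return medians
```

A learning rate of 1 means gaze moved all the way to the most recent digit location, and a learning rate of 0 means the prediction did not move at all.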
Additionally, a linear model with a probit link was fit to learning rate as a function of prediction error magnitude across individual trials, separately for each participant, with learning rate estimates constrained to the range −0.5 to 1.5. Note that the probit function can take on values from 0 to 1, matching the theoretical range of learning rates. We allowed the input data (the empirical per-trial learning rate estimates) to retain a larger range to avoid excessively truncating the measurement error distribution for the least-squares fit, but limited the range to avoid the inclusion of extreme values ≫ 1 or ≪ 0.
The fmincon function in Matlab (Mathworks, Natick, MA) was used to determine the least-squares best-fitting probit coefficients, using bounds of ±3 for the intercept term (β_0) and ±0.05 for the slope term (β_1). The intercept was constrained so the resulting learning rate intercept could span nearly the full range from zero to one, while the slope was limited to allow for functions of varying steepness but not so steep as to approximate a step function. This was repeated separately for Early and Late epochs.
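The bounded least-squares probit fit can be illustrated with a small Python sketch. It is not the authors' fmincon call: a repeated grid search stands in for the constrained optimizer, the function names are hypothetical, and learning rates outside −0.5 to 1.5 are clipped here, which is an assumption (the text does not say whether such values were clipped or dropped).

```python
import math


def probit(z):
    """Standard normal CDF, which maps any real number into (0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sse(b0, b1, pes, lrs):
    """Sum of squared errors between observed and predicted learning rates."""
    return sum((lr - probit(b0 + b1 * pe)) ** 2 for pe, lr in zip(pes, lrs))


def fit_probit(pes, lrs, n_grid=41, n_refine=6):
    """Bounded least-squares fit by repeated grid search (stand-in for fmincon)."""
    lrs = [min(1.5, max(-0.5, lr)) for lr in lrs]   # assumption: clip rather than drop
    lo0, hi0, lo1, hi1 = -3.0, 3.0, -0.05, 0.05     # coefficient bounds from the text
    best = None
    for _ in range(n_refine):
        g0 = [lo0 + (hi0 - lo0) * i / (n_grid - 1) for i in range(n_grid)]
        g1 = [lo1 + (hi1 - lo1) * i / (n_grid - 1) for i in range(n_grid)]
        best = min((sse(b0, b1, pes, lrs), b0, b1) for b0 in g0 for b1 in g1)
        _, b0, b1 = best
        w0, w1 = (hi0 - lo0) / 4, (hi1 - lo1) / 4   # halve the window around the best point
        lo0, hi0 = max(-3.0, b0 - w0), min(3.0, b0 + w0)
        lo1, hi1 = max(-0.05, b1 - w1), min(0.05, b1 + w1)
    return best[1], best[2]
```

A positive fitted slope corresponds to learning rates that rise with prediction error magnitude, and a negative slope to learning rates that fall with it.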
Subsequent analyses focused solely on change point or outlier trials, on which digit locations were sampled from a uniform distribution. These analyses were restricted to digits that appeared more than two standard deviations from the previous generative mean (> 22.5° in angular distance) to focus only on extreme events. Outlier learning rate estimates were removed beforehand, defined as being outside the group interquartile range (IQR) by > 1.5 times the IQR (which generally removed trials with learning rate estimates ≫ 1 or ≪ 0, comprising about 2% of trials).
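The two selection rules in this paragraph can be sketched in Python as below. The function names are hypothetical, and the quartile convention is an assumption: `statistics.quantiles` with the inclusive method may give slightly different quartiles from the routine used in the original Matlab analysis.

```python
import statistics


def is_extreme_event(location, prev_mean, from_uniform, sd=11.25):
    """True for uniform-sampled digits more than two SDs from the previous mean."""
    dist = abs((location - prev_mean + 180.0) % 360.0 - 180.0)  # circular distance
    return from_uniform and dist > 2 * sd


def iqr_filter(values, k=1.5):
    """Drop values lying more than k * IQR outside the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]
```

In this sketch `iqr_filter` would be applied to learning rate estimates pooled across the group, following the text's reference to the group IQR.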
We calculated each participant's overall median learning rate for extreme events separately for the entire task experience, as well as the Early and Late epochs. The median Late-epoch learning rate per participant was considered their "final LR." The change in learning rate (ΔLR) was computed as the Late-minus-Early median learning rate for extreme events per participant. Participants with no valid learning-rate estimates for extreme events during the Early or Late epoch were excluded from this analysis (n = 3 in Experiment 1, n = 15 in the pre-reversal phase of Experiment 2, and n = 1 in the post-reversal phase of Experiment 2). The retained participants had a median of three extreme events included in the Early epoch and five in the Late epoch.
The time course of metalearning was estimated by taking the group median and standard error of the learning rate for each successive extreme event experienced. Data were plotted until the latest point at which at least 50% of participants per group contributed data. A larger number of extreme events were available for Experiment 1 than for the pre-reversal phase of Experiment 2 because participants completed an additional run of the task. Additionally, a larger number of extreme events were available post-reversal than pre-reversal in Experiment 2 because response times tended to decrease with experience in the task, and participants therefore encountered more trials in the second half of the experiment. We also fit probability distributions to each participant's set of extreme-event learning rates using a Gaussian kernel (bandwidth = 0.15).
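For readers unfamiliar with kernel density estimation, the following is a generic Python sketch of a Gaussian-kernel density estimate with the stated bandwidth of 0.15. It illustrates the general technique only; the function name is hypothetical and the original analysis may have used a different routine with additional options.

```python
import math


def gaussian_kde(samples, bandwidth=0.15):
    """Return a density function built from Gaussian kernels centered on the samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density
```

For a participant whose extreme-event learning rates cluster near 0 and near 1, the resulting density has two peaks with a trough around 0.5, which is the kind of bimodality such a fit is meant to reveal.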
The following comparisons were preregistered as primary confirmatory analyses in Experiment 2: (1) comparison of pre-reversal "final LR" between groups (to serve as a replication of Experiment 1); (2) comparison of post-reversal ΔLR between groups and against zero in each group separately (to test for recalibration after the reversal point); and (3) comparisons of both "final LR" and ΔLR between participant groups within each condition (to test for differences as a function of whether the condition was encountered first or second).

General task performance
We used an implicit spatial prediction paradigm (Bakst & McGuire, 2021) across two experiments to evaluate how successfully people could learn through experience to interpret surprising events either as meaningful changes or random outliers (Fig. 1A). Participants were given a nominal task of reporting whether briefly presented numerical digits were even or odd. The task implicitly led participants to make anticipatory saccades to the predicted locations of upcoming digits on the screen and to update their spatial predictive beliefs continually in response to new observations. The task had two conditions: In the Change Point (CP) condition, peripheral digit locations were drawn from a one-dimensional Gaussian distribution on the perimeter of a large circle (σ = 11.25° of angular distance; see Fig. 1A) with the mean of the Gaussian distribution occasionally resampled from a uniform distribution spanning the entire circumference of the circle. Peripheral trials alternated with central trials (see Fig. 1A), which served to recenter the participant's gaze prior to each prediction. Small changes in digit location from one peripheral trial to the next therefore tended to represent noise around a stable mean, whereas large changes in digit location tended to represent meaningful shifts in the underlying position of the Gaussian (Fig. 1B). To maximize accuracy, participants in the CP condition ideally should converge on an increasingly precise estimate of the current generative mean and downregulate learning from small errors to avoid chasing noise, but should transiently raise their learning rate in response to large errors reflective of change points (Nassar et al., 2010).
In the Random Walk (RW) condition, the contingencies were reversed: the Gaussian-sampled digit location on one trial became the generative mean for the next trial. This random walk was punctuated by occasional outliers drawn from a uniform distribution spanning the circumference of the circle. Small changes in digit location therefore tended to indicate meaningful shifts in the underlying Gaussian distribution, whereas large changes tended to represent meaningless outliers. To maximize performance, participants in the RW condition ideally should fully update their beliefs in response to small errors but should downregulate learning from extreme events.

Participants performed the nominal task with high accuracy for both central and peripheral trial types in both the CP and RW conditions. Mean accuracy for odd/even judgments in Experiment 1 was 82% (SD = 5.6%) for peripheral trials and 98% (SD = 1.4%) for central trials (Fig. 1D), with no significant differences between conditions (Wilcoxon rank sum, both p ≥ 0.616). Participants were also successful at the uninstructed task of predicting digit locations. Because task events were temporally predictable, we used gaze position at the time of digit appearance as an indication of the participant's prediction (Bakst & McGuire, 2021). Data from a representative participant are shown in Fig. 1C. Gaze was directed near the appropriate eccentricity for both central and peripheral trials (Fig. 1C, top panel).
Examining peripheral trials only, the angle of the prediction was near to the subsequent digit location for both the CP and RW conditions (Fig. 1C, middle and bottom panels, respectively). The mean prediction error was 28.2° of angular distance (SD = 4.7°) for peripheral trials, and 0.84° of visual angle (SD = 0.53°) for central trials (Fig. 1E), with no significant differences between conditions (Wilcoxon rank sum, both p ≥ 0.230). As expected, both odd/even task accuracy and predictive gaze accuracy decreased transiently around extreme events (Supplemental Fig. 2).

Adaptive learning: Experiment 1

Effects of prediction error magnitude
We tested whether participants modulated their learning rate across trials as a function of prediction error magnitude. Per-trial learning rate was empirically estimated as the gaze-based prediction update (the distance between predictions on trials t and t + 1) expressed as a proportion of the gaze-based prediction error (the distance between the predicted and observed target locations on trial t). We hypothesized that in the CP condition, participants would use higher learning rates for larger prediction errors, which tended to reflect meaningful change points. In the RW condition, in contrast, larger prediction errors tended to reflect noisy outliers and should be associated with lower learning rates.
Because initial task instructions were identical in the CP and RW conditions, any systematic differences in behavior presumably emerged through metalearning over time, as participants used direct experience with the task's statistics to recalibrate the rate at which they learned from individual events. To examine calibration of learning over time, we repeated the procedure for Early and Late task epochs (Fig. 2A, middle and right panels), defined as the first and last four minutes of the task.
To facilitate a quantitative test of the effects depicted descriptively in Fig. 2A, we fit probit functions to each participant's per-trial learning rates as a function of prediction error magnitude (Fig. 2B, left). Individual participant fits are shown in dotted lines and group medians are shown in thicker lines. Slopes differed significantly between the CP and RW groups over the full task (p = 0.014, Wilcoxon rank sum test), with the CP group showing a median slope of 0.009 (interquartile range [IQR] = 0.002, 0.037) and RW a median of −0.001 (IQR = −0.006, 0.002).
Early behavior showed minimal differences between the two groups (Fig. 2B, [...]). However, the magnitude of the change in slope from Early to Late did not differ between groups (Wilcoxon rank sum p = 0.326), implying there was no evidence of faster metalearning in one context than the other.

Effect of extreme events
We focused our next analyses on non-Gaussian extreme events, which represented change points in the CP condition and outliers in the RW condition. Analyses were restricted to digits that were sampled from the uniform distribution and appeared more than two standard deviations from the previous Gaussian generative mean (> 22.5° of angular distance) to ensure we focused only on events that were distinguishable from Gaussian samples. In addition to being of theoretical interest, extreme events allowed for more accurate measurement of empirical learning rates compared with less-extreme events. Because per-trial learning rate estimates were calculated as a ratio with prediction error in the denominator, the estimates were less susceptible to measurement error on large-prediction-error trials.
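The point about measurement error can be made concrete with a first-order sketch. Here δ_t denotes the prediction error on trial t and ε an additive measurement error in the gaze-derived belief update; both symbols and the 2° noise value are introduced for illustration and are not taken from the study. Measurement error in the denominator is ignored for simplicity.

```latex
\widehat{LR}_t = \frac{(B_{t+1} - B_t) + \varepsilon}{\delta_t}
               = LR_t + \frac{\varepsilon}{\delta_t},
\qquad \delta_t = X_t - B_t .
```

With ε = 2°, the resulting error in the learning rate estimate is 2/5 = 0.4 when |δ_t| = 5°, but only 2/90 ≈ 0.022 when |δ_t| = 90°, so the same gaze noise distorts the estimate far less on extreme-event trials.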
To evaluate performance over time, we looked at extreme-event learning rates over the course of the task, with time expressed in terms of the cumulative number of extreme events encountered (Fig. 3A). The CP and RW groups exhibited different trajectories of learning rate as a function of experience. The overall difference in learning rate between the two conditions can be summarized in terms of the median extreme-event learning rate per participant (Fig. 3B). The median learning rate was significantly greater in the CP condition than in the RW condition (Wilcoxon rank sum p < 0.001; CP median [IQR] = 0.99 [0.97, 1.01]; RW = 0.21 [0.04, 0.74]). Extreme-event learning rates also displayed a bimodal within-subject distribution in both groups, with a peak centered at the optimal behavior for each condition (Supplemental Fig. 3).
Next, we evaluated whether the two groups differed in their final learning rate (median learning rate for extreme events during the Late epoch; Fig. 3C). In the CP condition, the median final learning rate was 1.00 (IQR = 0.95, 1.02), which was not significantly different from the optimal learning rate of 1 (Wilcoxon signed rank p = 0.670). In comparison, in the RW condition, the median final learning rate was 0.08 (IQR = 0.004, 0.65), which was significantly greater than the optimal learning rate of 0 (Wilcoxon signed rank p = 0.002) and significantly lower than the final learning rate for the CP group (Wilcoxon rank sum p < 0.001).
Finally, we quantified the amount of metalearning as the change in learning rate for extreme events (ΔLR) over the course of the task (Fig. 3D). We estimated ΔLR by calculating each participant's median learning rate for extreme events in Early and Late epochs and taking the difference (Late minus Early). For each group individually, the difference was not significantly different from zero ([...]), likely because of consistently well-calibrated behavior over time in the CP condition and high interindividual variability in the amount of metalearning in the RW condition. The signed value of ΔLR differed significantly between the two groups (Wilcoxon rank sum p = 0.025) and its magnitude did not (Wilcoxon rank sum p = 0.857).
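The ΔLR measure defined above amounts to a difference of two medians per participant. A minimal sketch, with a hypothetical function name and made-up learning rates rather than study data:

```python
from statistics import median


def delta_lr(early_lrs, late_lrs):
    """Change in extreme-event learning rate: Late median minus Early median."""
    return median(late_lrs) - median(early_lrs)


# Hypothetical RW-like participant who learns to ignore outliers over time.
rw_like = delta_lr([0.9, 0.8, 0.7], [0.1, 0.2, 0.0])
# Hypothetical CP-like participant who starts low and learns to update fully.
cp_like = delta_lr([0.3, 0.4, 0.2], [1.0, 0.9, 1.0])

# The signed values point in opposite directions, while the magnitudes match.
print(rw_like, cp_like)
print(abs(rw_like), abs(cp_like))
```

This separation is why the signed value and the magnitude of ΔLR are compared in separate tests: groups can differ in the direction of metalearning without differing in its amount.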

Adaptive learning: Experiment 2

Replication
To further investigate experience-driven metalearning, we conducted a preregistered second experiment in which the conditions reversed midway through each participant's experimental session. Each participant experienced three 8-min runs of one condition (CP or RW) followed by three runs of the other, with the order counterbalanced across participants. As before, participants were not informed about the structure of the task or the variables of interest.
Pre-reversal data from the first three runs provided an opportunity to replicate the analyses reported above for Experiment 1. In examining learning rate as a function of prediction error magnitude, we obtained similar results (see Fig. 4A for per-participant fits analogous to Fig. 2B and see Supplemental Fig. 4 for binned mean-of-median learning rates analogous to Fig. 2A). Probit slope coefficients summarizing the overall effect of prediction error magnitude on learning rate differed significantly between the two groups (Wilcoxon rank sum p < 0.001; Fig. 4A, left). The CP group showed a median slope of 0.009 (IQR = 0.003, 0.029; p < 0.001, Wilcoxon signed rank test against zero) and the RW group [...] (Fig. 4A, right). The change in slope between the Early and Late epochs differed between groups (Wilcoxon rank sum p = 0.002) but the magnitude of the change did not (Wilcoxon rank sum p = 0.093).
Patterns of learning from extreme events also matched those seen in Experiment 1. Both groups demonstrated marked metalearning (Fig. 5A), with a significant difference between groups in overall median learning rate (Wilcoxon rank sum p < 0.001; Fig. 5B). A preregistered replication analysis showed that final learning rates in the pre-reversal phase differed significantly between groups (Wilcoxon rank sum p < 0.001; CP median [IQR] = 0.99 [0.94, 1.04]; RW median [IQR] = 0.07 [−0.004, 0.41]; see Fig. 5C). The final learning rate for the CP group was not significantly different from the optimal rate of 1 (Wilcoxon signed rank p = 0.254), whereas the final learning rate in the RW group was significantly greater than the optimal rate of 0 (Wilcoxon signed rank p < 0.001). Signed ΔLR significantly differed between the two conditions (Wilcoxon rank sum p < 0.001; Fig. 5D) but the magnitude of ΔLR did not (Wilcoxon rank sum p = 0.085). We again observed a bimodal pattern of individual trial learning rates in both conditions similar to that seen in Experiment 1 (Supplemental Fig. 3).

Post-reversal behavior
When the CP and RW conditions reversed halfway through the experimental session, participants were given no explicit indication that the environment had changed. We investigated whether they would revise their understanding of the task's statistical structure through experience alone.
In examining learning rate as a function of prediction error magnitude, slopes in the post-reversal block as a whole did not significantly differ between groups (Wilcoxon rank sum p = 0.532), with both groups [...] (Fig. 4B, right), driven primarily by persistent suboptimal behavior in the group that experienced the RW condition after the reversal point. Post-reversal Late-epoch slopes did not significantly differ from zero in the RW condition (Wilcoxon signed rank p = 0.421), but were significantly positive in the CP condition (Wilcoxon signed rank p < 0.001).
In assessing learning from extreme events, participants who switched from the RW condition to the CP condition (RW ➔ CP) showed a substantial increase in their learning rate shortly after the reversal point, indicative of rapid metalearning (Fig. 5A). In comparison, participants who experienced CP and then RW (CP ➔ RW) showed more gradual metalearning post-reversal. They appeared to maintain their previous beliefs about the environment for an extended period before gradually adapting their behavior.
To assess whether the order of the conditions affected behavior, we compared the overall median extreme-event learning rate for each condition between groups (comparing one group's pre-reversal behavior to the other group's post-reversal behavior; Fig. 5B). Learning rates in the CP condition did not differ between groups (Wilcoxon rank sum p = 0.326), while those in the RW condition did (Wilcoxon rank sum p < 0.001; pre-reversal median [IQR] = 0.14 [0.05, 0.36], post-reversal = 0.94 [0.61, 0.98]), with pre-reversal learning rates closer to the optimal value of zero.
Analyses of final learning rates (extreme-event learning rates in the final 4 min of each phase; Fig. 5C) provided further evidence that participants' ability to calibrate to the RW environment was impacted by previous experience in the CP environment. In a preregistered analysis, there was a significant difference in final learning rates between participants who experienced the RW condition in the pre- versus post-reversal phases. The pre-reversal group showed better-calibrated behavior ([...]). In another preregistered analysis, the change in learning rate (ΔLR, the difference between Early and Late epochs of each phase) did not significantly differ between the pre-reversal and post-reversal phases within either condition (Wilcoxon rank sum, both p ≥ 0.092; Fig. 5D), and both were significantly different from zero in the post-reversal phase (Wilcoxon signed rank, both p ≤ 0.013). Additionally, the signed ΔLR differed between groups in the post-reversal phase (Wilcoxon rank sum p < 0.001), while its magnitude did not (Wilcoxon rank sum p = 0.762).

Discussion
Across two experiments, we investigated the extent to which participants spontaneously used experience with the statistics of their environment to regulate the influence of surprise on learning. Learning was assessed via predictive eye movements while participants performed a perceptual judgment task under spatial uncertainty. Gaze-based predictions exhibited successful metalearning, tending to approach context-appropriate patterns of learning rate modulation over time. At the same time, learning rates for extreme events were generally better calibrated in a context in which large prediction errors were associated with meaningful change (CP) compared to one in which they were not (RW). We found asymmetric order effects: participants who experienced the RW condition second showed worse performance compared to those who saw RW first, whereas order had no effect on performance in the CP condition.

Learning initial task structure
Our findings agree with previous results from some explicit prediction tasks, which demonstrated better performance in conditions in which surprise was meaningful than in conditions in which surprising events needed to be ignored (d'Acremont & Bossaerts, 2016; Nassar et al., 2019). This lends support to the idea that it is more cognitively taxing to ignore salient outliers than to update beliefs in response to informative surprise (d'Acremont & Bossaerts, 2016). However, the amount of metalearning (measured in terms of within-session change) was generally similar between conditions. This could suggest that performance differences were driven by initial assumptions about the environment. Participants may have held an initial bias or default towards assuming large prediction errors indicated meaningful change. A related possibility is that maintaining stable beliefs across an intervening outlier entails elevated working memory demands. Such a default could be adaptive insofar as, in real-world environments, there might be more dire consequences for erroneously ignoring events that differ from expectations than for erroneously overweighting them.
Questions remain as to whether other task contexts or framings could evoke different patterns of initial assumptions and inductive biases. For instance, O'Reilly et al. (2013) found additional behavioral costs associated with belief updating compared with merely reacting to isolated surprising events. Future work should examine the flexibility of default strategies and the cues relevant to assessing the relative costs of different patterns of learning. Future work should also examine how metalearning might be altered by task manipulations of arousal, motivation, or incentives (Jepma et al., 2018; Nassar et al., 2012; Urai, Braun, & Donner, 2017).
We noted substantial differences in behavioral policies across individuals, especially early on in the task. Behavior tended to converge over time, showing decreasing inter-individual variability in later epochs of the session. Why participants have such a variety of initial strategies is an open question. Whether it relates to differing interpretations of the experimental context, broader influence of their recent state and/or experiences, or trait-level factors (Browning et al., 2015;Kraus et al., 2021) has yet to be identified.

Learning after environmental change
In our second experiment, the conditions underwent an unsignaled reversal halfway through, and participants revised their behavioral strategy in line with the new statistical structure they encountered. The trajectory of metalearning appeared to exhibit condition-specific order effects. Participants took longer to adapt to the RW condition if they had previously experienced the CP condition than if they saw the RW condition first. In comparison, behavior in the CP condition did not appear to depend on condition order. If participants tended to hold initial biases towards a CP-like strategy, then experiencing the CP condition could have served to reinforce those biases, making them yet more difficult to overcome when the participant was suddenly thrust into the RW condition.

Theoretical implications
From a theoretical perspective, our results imply that calibration of adaptive learning can be guided by experience in the absence of descriptive information about the outcome-generating process. This conclusion is congruent with findings that statistical experience can shape learning and decision processes in a variety of domains (Constantino & Daw, 2015;McGuire & Kable, 2012;Ossmy et al., 2013;Schweighofer et al., 2006). Our findings support a class of models in which patterns of belief updating are guided by higher-order beliefs about environmental structure (Razmi & Nassar, 2022;Yu et al., 2021) and highlight the need to extend such models to incorporate experiential structure-learning or metalearning processes (Griffiths et al., 2019;Wang, 2021).
Findings from the present work will provide useful constraints for further model development. A successful theory of structure learning will need to account for the observation that behavior more readily calibrated (and recalibrated) to an environment in which surprise reflected change points than to one in which surprise reflected nonpredictive outliers. A possibility that merits further investigation is that the asymmetry in metalearning between the two conditions might be rooted in an asymmetric performance cost of miscalibration. A CP-optimized agent could be reasonably successful in the RW environment by treating an uninformative outlier as if it were two sequential change points. In contrast, an RW-optimized agent might suffer more prolonged performance costs in the CP environment. Future theoretical work might also explore parallels to other contexts in which asymmetric belief updating has been observed; for example, in updating general semantic knowledge, people are better at learning that previously nonbelieved statements are true than at learning that previously believed statements are false (Yang, Stone, & Marsh, 2022).
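The proposed asymmetry in the cost of miscalibration can be illustrated with a deliberately simplified simulation. The sketch below assumes a delta-rule agent with a fixed learning rate for extreme events, noiseless outcomes on a linear (non-circular) scale, and two toy outcome sequences; these simplifications, the parameter values, and all names are illustrative assumptions rather than a model fit to the data.

```python
def total_abs_error(outcomes, lr_extreme, lr_small=0.2, threshold=22.5):
    """Cumulative absolute prediction error of a simple delta-rule agent.

    The agent applies `lr_extreme` when the prediction error exceeds
    `threshold` and `lr_small` otherwise (all values are illustrative).
    """
    prediction = outcomes[0]
    total = 0.0
    for outcome in outcomes:
        prediction_error = outcome - prediction
        total += abs(prediction_error)
        lr = lr_extreme if abs(prediction_error) > threshold else lr_small
        prediction += lr * prediction_error
    return total


# Toy environments: a lasting change point versus a one-off outlier.
cp_env = [0, 0, 0, 100, 100, 100, 100]
rw_env = [0, 0, 0, 100, 0, 0, 0]

# Well-calibrated agents pay only for the unavoidable surprise itself.
print(total_abs_error(cp_env, lr_extreme=1.0))  # CP-optimized agent in CP
print(total_abs_error(rw_env, lr_extreme=0.0))  # RW-optimized agent in RW

# Miscalibrated agents: one extra costly trial versus a persistent error.
print(total_abs_error(rw_env, lr_extreme=1.0))  # CP-optimized agent in RW
print(total_abs_error(cp_env, lr_extreme=0.0))  # RW-optimized agent in CP
```

In this caricature, the CP-optimized agent in the RW environment pays for one additional trial (it jumps to the outlier and then jumps back), whereas the RW-optimized agent in the CP environment keeps paying on every trial after the change point, so its excess cost grows with the time it takes to recalibrate.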
A related goal for future theoretical work is to identify computational parameters that enable some individuals to adjust their behavioral policy rapidly and successfully while others maintain suboptimal strategies for extended periods. Previous modeling work has demonstrated the varying influence of factors such as surprise and environmental volatility on participant behavior (Behrens et al., 2007;Lee et al., 2020;McGuire et al., 2014;Nassar et al., 2010;Nassar et al., 2019) and further efforts to model metalearning could help explain the sources of the behavioral patterns we identified (d'Acremont & Bossaerts, 2016; Nassar et al., 2019).

Limitations
There are limitations on our ability to generalize our findings to other situations and participant populations. An intriguing possibility is that different task contexts might evoke different patterns of bias and flexibility in metalearning. For example, our study focused solely on rapid, momentary decisions. Whether the type of spontaneous metalearning observed here would generalize to slower, deliberative processes is currently unknown. In addition, given that our participant samples were drawn from the Boston University community, the extent to which the results generalize across populations, cultures, or clinically defined groups is an open question. Though we would expect the visual system's propensity for predictive gaze to hold across populations, the relevant prior beliefs and the dynamics of feedback-driven cognitive flexibility could differ. Finally, while we observed clear evidence of adaptive learning in participants' behavioral responses to extreme events, it was not possible to reliably assess behavior in response to small prediction errors. Precise assessment of the rate of learning from small errors would require a different measurement approach and potentially also a different incentive scheme, given that our task did not require predictive gaze to be perfectly accurate to maintain a high level of performance (Bakst & McGuire, 2021).

Conclusions
Two experiments demonstrated that participants calibrated patterns of learning rate modulation to the structure of their environment through experience alone. Participants displayed metalearning, adapting to the informativeness of surprising events in an initial context and readjusting after the environment's statistics reversed. The findings motivate new questions about sources of bias and individual variation in the cognitive processes that guide learning, prediction, and decision making in complex environments.

Data availability
Task code and raw data are available via the Open Science Framework at https://osf.io/5pmhg/ and analysis code is available upon request.