Volume 58, Issue 2 p. 2523-2546
RESEARCH REPORT
Open Access

A machine learning approach towards the differentiation between interoceptive and exteroceptive attention

Zoey X. Zuo

Department of Psychological Clinical Sciences, University of Toronto Scarborough, Scarborough, Ontario, Canada

Correspondence

Zoey X. Zuo, Graduate Department of Psychological Clinical Science, University of Toronto Scarborough, 1265 Military Trail, Scarborough, Ontario M1C 1A4, Canada.

Email: [email protected]

Cynthia J. Price

Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle, Washington, USA

Norman A. S. Farb

Department of Psychological Clinical Sciences, University of Toronto Scarborough, Scarborough, Ontario, Canada

Department of Psychology, University of Toronto Mississauga, Mississauga, Ontario, Canada
First published: 11 May 2023

Edited by: Bernard Balleine

Abstract

Interoception, the representation of the body's internal state, plays a central role in emotion, motivation and wellbeing. Interoceptive sensibility, the ability to engage in sustained interoceptive awareness, is particularly relevant for mental health but is measured exclusively via self-report, with no method for objective measurement. We used machine learning to classify states of interoceptive and exteroceptive attention, using data from a randomized controlled trial of interoceptive training, with functional magnetic resonance imaging assessment before and after an 8-week intervention (N = 44 scans). The neuroimaging paradigm manipulated attention targets (breath vs. visual stimuli) and reporting demands (active reporting vs. passive monitoring). Machine learning achieved high accuracy in distinguishing between interoceptive and exteroceptive attention, both for within-session classification (~80% accuracy) and out-of-sample classification (~70% accuracy), demonstrating the reliability of the predictions. We then explored the classifiers' potential for ‘reading out’ mental states in a 3-min sustained interoceptive attention task. Participants were classified as actively engaged about half of the time, and within these periods of active engagement, interoceptive training enhanced participants' ability to sustain interoceptive attention. These findings demonstrate that interoceptive and exteroceptive attention are distinguishable at the neural level; such classifiers may help to demarcate periods of interoceptive focus, with implications for developing an objective marker of interoceptive sensibility in mental health research.

Abbreviations

  • BOLD: blood oxygen level dependent
  • BrainIAK: Brain Imaging Analysis Kit
  • DAN: dorsal attention network
  • dmPFC: dorsomedial prefrontal cortex
  • EPI: echo-planar imaging
  • fMRI: functional magnetic resonance imaging
  • GAD: generalized anxiety disorder
  • IEAT: Interoceptive/Exteroceptive Attention Task
  • MABT: Mindful Awareness in Body-oriented Therapy
  • MAIA: Multidimensional Assessment of Interoceptive Awareness
  • MDD: major depressive disorder
  • moPFC: medial orbital prefrontal cortex
  • pACC: perigenual anterior cingulate cortex
  • PHQ-SADS: Patient Health Questionnaire–Somatic, Anxiety and Depressive Symptoms
  • PSS: Perceived Stress Scale
  • PTSD: posttraumatic stress disorder
  • SIAT: Sustained Interoceptive Attention Task
  • TE: echo time
  • TI: inversion time
  • TR: repetition time
  • UW: University of Washington
  • V1: primary visual cortex
  • V5: middle temporal visual area
  • vmPFC: ventromedial prefrontal cortex

    1 INTRODUCTION

    Interoception, the sense of the body's internal state, is widely regarded as a foundation for emotion (Wiens, 2005), motivation (Craig, 2003), intuition (Dunn et al., 2010) and wellbeing (Tsakiris & Critchley, 2016). While interoceptive signals such as respiration, heart rate, temperature or hunger are processed automatically to promote homeostasis in the body (Craig, 2002; Gu & FitzGerald, 2014), these signals also serve as a foundation for feeling states that guide consciously coordinated behaviour (Strigo & Craig, 2016). As such, interoceptive function has recently become the target of psychological theory casting the interoceptive sense at the heart of health and disease, characterizing interoception as affectively privileged and distinct from the exteroceptive senses of vision, hearing, taste, smell and touch (Farb et al., 2015; Khalsa et al., 2018; Quadt et al., 2018).

    Dysregulation of interoceptive processing has been linked to a variety of pathological conditions such as anxiety (Domschke et al., 2010), depression (Dunn et al., 2007; Furman et al., 2013; Terhaar et al., 2012), posttraumatic stress disorder (PTSD) (Glenn et al., 2016), somatic symptom disorders (Barsky et al., 2001), addiction (Naqvi & Bechara, 2010), obesity (Herbert & Pollatos, 2014) and chronic pain (Pollatos et al., 2012). Such conditions are strongly associated with abnormal interoceptive sensibility, the self-reported ability to sustain attention towards interoceptive signals (Calì et al., 2015; Mehling, 2016; Trevisan et al., 2019). Furthermore, changes in interoceptive ability mediate reductions in depression symptoms following evidence-based treatments (de Jong et al., 2016; Eggart & Valdés-Stauber, 2021). Such psychometric approaches converge with qualitative appraisals of treatment response: Women treated for substance use disorder using Mindful Awareness in Body-oriented Therapy (MABT) perceived interoceptive awareness as critical for emotional awareness, regulation and relapse prevention, and such endorsements were linked to symptom reduction (Price & Smith-DiJulio, 2016).

    However, interoceptive sensibility suffers from a lack of quantifiable measurement relative to measures of interoceptive accuracy, such as counting heartbeats over time (Brener & Ring, 2016). The promise of objective measurement has led researchers to define interoceptive awareness in terms of accuracy rather than focusing on sensibility as a marker of interest (Critchley & Garfinkel, 2017). Yet heartbeat detection accuracy appears unrelated to interoceptive sensibility (Ferentzi et al., 2018), and it is sensibility rather than accuracy that correlates with subjective wellbeing (Ferentzi et al., 2018; Schuette et al., 2021). Given the understandable desire to rigorously quantify mechanistic markers of health and vulnerability, there is utility in developing an objective method for assessing interoceptive sensibility.

    To objectively assess interoceptive sensibility, we must first have confidence in our ability to distinguish interoceptive attention from other mental states. Advances in the classification of brain states provide an emerging possibility for this enterprise. Time-series data from functional neuroimaging, such as functional magnetic resonance imaging (fMRI), are often analysed using machine learning algorithms to identify meaningful brain patterns (e.g., Haxby, 2012; Norman et al., 2006) such as predicting which traumatic film scenes became intrusive memories (Clark et al., 2014). Compared to univariate methods, which seek to localize specific areas of neural response, machine learning approaches aggregate these responses to estimate their predictive utility (Davatzikos, 2019). Applied to the classification of interoceptive attention, a machine learning approach could quantitatively assess a person's ability to sustain interoceptive awareness, regardless of that person's level of insight or confidence.

    Considerable research suggests that interoceptive processing in the brain is at least partially distinct from engagement with the exteroceptive senses. Interoception is supported by a dedicated neuroanatomical pathway, with signals transmitted via sense-receptor C-fibre afferents along the spinal cord to the brainstem and thalamus before reaching the cerebral cortex (Craig, 2002; Critchley & Harrison, 2013). Interoceptive attention appears to engage the ventromedial thalamus and right posterior insula (Farb et al., 2013), the latter serving as the primary interoceptive cortex (Craig, 2002; Flynn, 1999). Supporting the feasibility of a classification approach, Weng et al. (2020) first documented the utility of machine learning to distinguish between sensory and conceptual targets of internally directed attention (i.e., focus on the breath, the self or mind wandering). The authors used a logistic regression model to show that participants' voluntary direction of attention may be sufficient to generate reliable classification results.

    Conversely, interoceptive attention has yet to be classified as distinct from attention towards the exteroceptive senses. A set of frontoparietal brain regions constitutes a well-validated dorsal attention network (DAN) that is broadly involved in the regulation of perceptual attention (Dixon et al., 2018; Szczepanski et al., 2013). It is plausible that the DAN also supports attention to interoceptive signals, given that interoceptive experiences are usually multimodal combinations of interoceptive and exteroceptive afferents (Khalsa et al., 2009; Quigley et al., 2021). Furthermore, despite evidence that the anterior insula facilitates subjective access to body sensations, this integration may not be modality specific (Craig, 2009; Critchley et al., 2004; Gu & FitzGerald, 2014). Instead, the anterior insula seems to integrate both interoceptive and exteroceptive signals (Medford & Critchley, 2010; Seth et al., 2012) and serves as the sensory/afferent hub of the salience network (Seeley, 2019), which could then provide a common ‘neural code’ to higher level cognitive processes such as the regulation of perceptual attention supported by the DAN.

    Our primary aim was therefore to determine whether BOLD activity in these attentional networks (and beyond) contains sufficient information to distinguish between interoceptive and exteroceptive attentional states. We applied a machine learning approach to explore classification between interoceptive and exteroceptive attention, using a recently developed fMRI paradigm (Farb et al., 2023). Data were collected as part of a randomized clinical trial of a validated mindfulness-based intervention (MABT), which features a unique focus on teaching fundamental skills critical to identifying, accessing, sustaining and appraising signals that arise within the body (Price & Hooven, 2018). The Interoceptive/Exteroceptive Attention Task (IEAT) used for the classification analysis focused on four conditions: active interoception and active exteroception, which involved continuous button-press tracking of respiration and of a visual stimulus, respectively, and passive interoception and passive exteroception, which involved passive monitoring of respiration and visual targets in the absence of behavioural tracking. We also employed a well-validated self-report instrument to assess the influence of clinically relevant individual differences in interoceptive sensibility.

    To evaluate classifier sensitivity to interoceptive training effects, we followed classifier training and validation with an application to estimate attentional states during periods of sustained interoceptive attention. We had three aims in this study:
    1. Distinguish between the neural patterns of interoceptive and exteroceptive attention to understand if interoceptive attention involves distinct processes from exteroceptive attention.
    2. Predict periods of interoceptive and exteroceptive attention using out-of-sample tests to determine the robustness of the models.
    3. Apply the classifier to estimate momentary attentional states during sustained attention, exploring interoceptive training effects and correspondence with self- and clinician-reported interoceptive awareness and affective distress.

    The clinical trial was registered with ClinicalTrials.gov (NCT03583060) and pre-registered with the Open Science Framework (OSF; https://osf.io/y34ja). Under the overarching clinical trial, machine learning was pre-registered as an exploratory analysis to classify different experimental conditions and predict sustained interoceptive attention. All study materials and code are available on the OSF (https://osf.io/ctqrh/). Univariate analysis of the IEAT task is described in a separate manuscript that is published as a preprint (https://www.biorxiv.org/content/10.1101/2022.05.27.493743v3), including quality control analyses and demonstrations of equivalent difficulty between task conditions. This study was reviewed and approved by the institutional review board at the University of Washington in accord with the World Medical Association Declaration of Helsinki.

    2 METHODS

    2.1 Research protocol

    This study was conducted in the context of a 2 (group: MABT vs. control) × 2 (session: baseline vs. post-intervention) randomized controlled trial of MABT to investigate training-related neural changes in the brain. Participants were assessed at baseline before being randomized to the MABT or the control condition and were reassessed within 4 weeks of completing the 8-week MABT intervention period. At both the baseline and post-intervention assessments, all participants completed a 20-min self-report questionnaire that surveyed body awareness and symptoms of distress. Participants then completed a series of fMRI scans, including standard anatomical scans, the IEAT and a Sustained Interoceptive Attention Task (SIAT).

    2.2 Participants

    Participants were recruited through postings on the University of Washington (UW) Institute of Translational Health Sciences website, flyers on campus and advertisements in local newspapers. Advertisements described the study as a mind–body investigation into the neural processes of body awareness for people with moderate stress. Participants provided written informed consent and were compensated for their time. All assessments and data collection took place at the UW Integrated Brain Imaging Center.

    Primary inclusion criteria were (1) over 18 years of age, (2) a score on the Perceived Stress Scale (PSS) meeting the screening cut-off for moderate stress, (3) no prior experience with mindfulness-based trainings, (4) agreement to forgo mind–body therapies for the duration of the study, (5) fluency in English, (6) ability to attend all study sessions and (7) right-handedness. Primary exclusion criteria were (1) current diagnosis of any mental health disorder, (2) inability to participate in all sessions, (3) cognitive impairment, (4) head injury involving loss of consciousness, (5) pregnancy or (6) MRI contraindications.

    Fifty-seven participants were recruited. Twenty-four participants were excluded for reasons listed in the supporting information. Twenty-three remaining participants were randomly assigned to receive MABT (n = 12) or no-intervention to serve as the control (n = 11). Out of these participants, 22 completed the required assessments (11 MABT and 11 controls) and were included in the analyses (11 males and 11 females; age range: 18–62 years, mean age = 36.1 years; 20 self-identified as Caucasian, 1 as African American, 2 as Hispanic; highest education: 5 with high school degrees, 2 with 2 years of college, 8 with Bachelor's degrees, and 7 with Master's degrees or higher).

    2.2.1 Power analysis

    While machine learning approaches test categorization accuracy at the within-participant level, whether accuracy is significantly better than chance can be assessed at the group level with a one-sample t test. Furthermore, decoding of sustained attention following classifier training could also be evaluated in terms of Group × Time interactions. Given prior observation of large effects of MABT on subjective interoception (Price et al., 2019), this study was designed to detect medium to large effects (d > .6, f > .3). Power analysis conducted in G*Power software suggested that this sample size would be sufficient to detect such effects with 80% power. At the p < .05 significance level, detecting above-chance accuracy (a one-sample t test of accuracy scores across all participants) would require N ≥ 19, and detecting Group × Time interactions on decoding results would require N ≥ 18. These analyses do not model power to achieve accurate classification within a given participant's data, as power analysis methods for MRI are still under development and no prior data were available given the use of a novel task. Similarly, the study was not powered to detect medium-sized or weaker effects, so any non-significant effects must be interpreted with caution rather than as evidence of null results.

    2.3 MABT

    MABT uses an incremental approach to help build comfort and skills needed to develop and facilitate interoceptive awareness (Price & Hooven, 2018). The approach was delivered individually using the manualized eight-session protocol developed for research. This protocol involves three phases: sessions 1 and 2 focus on body literacy, sessions 3 and 4 on interoceptive training and sessions 5–8 on development of sustained mindful interoceptive attention and somatic appraisal processes. A take-home practice is collaboratively developed at the end of each session to facilitate integration of interoceptive awareness in daily life. Eight weekly 75-min individual sessions were delivered at the UW School of Nursing Clinical Studies Unit by one of two licenced massage therapists trained in the MABT protocol. Protocol compliance was monitored through audio recording of sessions, process evaluation forms and ongoing clinical supervision. Participants were given up to 10 weeks to complete all eight sessions to accommodate schedule conflicts. All MABT participants completed at least 75% of the sessions (i.e., at least six sessions): Eight participants completed all eight sessions, two completed seven sessions and one completed six sessions. Therapists monitored participants' progress with quantitative ratings and qualitative descriptions on the process evaluation forms completed after each MABT session.

    2.4 fMRI tasks

    fMRI data were collected during the novel IEAT and a SIAT at both baseline and post-intervention. In each assessment session, there were two fMRI scans (i.e., two runs), and each task was administered once per functional scan. Altogether, participants performed each task four times in this study: twice at baseline and twice at post-intervention. Throughout the fMRI tasks, participants' respiration was recorded using a Philips MRI Respiratory Sensor Air Bellows (model number 452213117812).

    2.4.1 Interoceptive/Exteroceptive Attention Task

    The novel IEAT (Farb et al., 2023) consisted of five conditions: passive exteroception, passive interoception, active interoception, active exteroception and active matching, as shown in Figure 1. These conditions varied in terms of reporting demand (active reporting vs. passive watching) and attentional target (interoceptive vs. exteroceptive attention).

    FIGURE 1
    Schematics for the Interoceptive/Exteroceptive Attention Task (IEAT). In the exteroceptive conditions, participants watched a circle expand and contract; in the interoceptive conditions, participants paid attention to their inhalation and exhalation. In the passive conditions, participants simply observed the circle or their breath; in the active conditions, participants pressed buttons when the circle expanded or contracted and when they inhaled or exhaled. In the matching condition, participants reported on the circle's movements while intentionally matching their breathing to the circle's movements.

    Each condition started with a 10-s instruction screen followed by a 30-s task period. All conditions were order-counterbalanced and repeated twice in each functional run. Altogether, 6.7 min of data were collected in each run and 13.4 min in both runs.

    Passive conditions

    During passive exteroception, participants were asked to visually monitor a circle as it expanded and contracted periodically on the MRI-compatible visual display, without making any behavioural responses. The circle's pulse frequency was set to match the participant's estimated in-scanner breathing frequency (usually around 12 cycles per minute). During passive interoception, participants viewed a stationary circle on the screen while attending to sensations of the breath.

    Active conditions

    During active interoception, participants were asked to report on their inhalations and exhalations by making key presses with their right-hand index and middle fingers, respectively. The circle on the screen also responded to these key presses, approximating the frequency of circle movement during passive exteroception. During active exteroception, participants were asked to report on the expansion and contraction of the circle on the screen, which again was set to pulse at participants' in-scanner respiratory frequency.

    Active matching condition

    During active matching, participants were asked to report on the expansion and contraction of the circle (as in active exteroception) by making button presses while matching their inhalation to the circle's expansion and their exhalation to the circle's contraction. Together, these five experimental tasks were developed to address the limitations of prior interoception paradigms. However, the goal of the present study was to directly classify between factors of attentional target (interoception vs. exteroception) and reporting demands (active vs. passive monitoring). Because the active matching condition required aspects of both interoceptive and exteroceptive attention, it was not used in the classification models that are the focus of this paper but is reported in a forthcoming univariate analysis paper (https://www.biorxiv.org/content/10.1101/2022.05.27.493743v3).

    2.4.2 Sustained interoceptive attention task

    Immediately before fMRI data acquisition, participants listened to a 2.5-min audio-guided interoceptive awareness meditation in the scanner, directing them to place a hand on their chest and channel mindful attention to the inner space of the chest underneath their hand. After the guided meditation, participants were instructed to sustain attention on inner body awareness for 3 min with their eyes closed during fMRI data acquisition. This procedure was repeated across two runs at baseline and two runs at post-intervention to yield a total of four scans, that is, 12 min of fMRI data.

    2.5 Questionnaire measures

    During the baseline and the post-intervention sessions, participants answered a self-report questionnaire consisting of the Multidimensional Assessment of Interoceptive Awareness (MAIA), the Patient Health Questionnaire–Somatic, Anxiety and Depressive Symptoms (PHQ-SADS) and the PSS. In addition, the MABT therapists rated participants' capacity for sustained interoceptive attention over the second half of the training period (sessions 5–8). For the descriptive statistics of these self-report measures, see our Open Science Framework page (https://osf.io/ctqrh/).

    2.5.1 Multidimensional Assessment of Interoceptive Awareness

    The MAIA is a 32-item self-report questionnaire used to assess interoceptive body awareness (Mehling et al., 2012). It consists of eight scales each measuring an aspect of interoceptive awareness: noticing, not-distracting, not-worrying, attention regulation, emotional awareness, self-regulation, body listening and trust. These scales have good evidence of internal-consistency reliability with alphas ranging from .66 to .82 and good evidence of construct validity as assessed by inter-scale correlations as well as differential scores between individuals who were expected to have higher or lower body awareness.

    2.5.2 Composite affective symptom burden

    A composite affective symptom burden score was derived from responses on the PHQ-SADS (Kroenke et al., 2010) and the PSS (Cohen et al., 1983). The PHQ-SADS is a 37-item self-report questionnaire consisting of the PHQ-9 depression scale, the PHQ-15 somatic symptom scale and the Generalized Anxiety Disorder (GAD)-7 anxiety scale (Kroenke et al., 2010). All three scales have good evidence of internal-consistency reliability (α = .80 to .92) and test–retest reliability (r = .60 to .84), as well as good sensitivity and specificity for detecting depression, anxiety and somatic symptoms. The PSS is a 10-item self-report questionnaire used to assess how feelings and perceived stress levels are affected by various situations (Cohen et al., 1983). A review showed that the PSS has good internal-consistency reliability (α > .70 in all 12 studies evaluated) and test–retest reliability (r > .70 in all four studies evaluated), although its criterion and known-groups validity need further evaluation (Lee, 2012). We extracted the first principal component from the PHQ-SADS and PSS scales, which explained 67.2% of the overall variance. A parallel analysis run using the ‘paran’ library confirmed that one factor was sufficient to explain the shared variance; this component served as the composite affective symptom score for this study.
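
    For illustration, a minimal sketch of this composite scoring step using scikit-learn (the variable names and synthetic scale totals below are illustrative stand-ins, not the study's actual data):

      import numpy as np
      import pandas as pd
      from sklearn.decomposition import PCA
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(0)

      # Hypothetical scale totals for 22 participants; column names are
      # illustrative stand-ins for the PHQ-SADS subscales and the PSS.
      scores = pd.DataFrame(rng.normal(size=(22, 4)),
                            columns=["phq9", "phq15", "gad7", "pss"])

      # Standardize the scales, then retain the first principal component
      # as the composite affective symptom burden score.
      z = StandardScaler().fit_transform(scores)
      pca = PCA(n_components=1)
      composite = pca.fit_transform(z).ravel()
      print(f"variance explained: {pca.explained_variance_ratio_[0]:.1%}")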

    2.5.3 Therapist rating: Capacity for sustained interoceptive attention

    In MABT sessions 5–8, therapists rated participants' capacity for sustained interoceptive attention on a scale of 0–5 based on their observation: 0 = none, 1 = momentary, 2 = fluctuating in and out (being in the state for brief time, i.e., less than 3 min), 3 = steady contact for many minutes, 4 = fluctuating in and out (being in the state for longer periods, i.e., more than 3 min) and 5 = sustained contact (10–30 min).

    2.6 Data analysis

    2.6.1 Imaging data acquisition and preprocessing

    Neuroimaging data were collected using a 3 T Philips Achieva scanner (Philips Inc., Amsterdam, Netherlands) at the Diagnostic Imaging Sciences Center, University of Washington. Imaging began with the acquisition of a T1-weighted anatomical scan (MPRAGE) to guide normalization of the functional images, with repetition time (TR) = 7.60 ms, echo time (TE) = 3.52 ms, inversion time (TI) = 1,100 ms, acquisition matrix = 256 × 256, flip angle = 7°, shot interval = 2,530 ms and 1 mm isotropic voxel size. Functional data were acquired using a T2*-weighted echo-planar imaging (EPI) sequence with TR = 2,000 ms, TE = 25 ms, flip angle α = 79°, field of view = 240 × 240 × 129 mm, 33 slices and a voxel size of 3 × 3 × 3.3 mm with a 3.3 mm gap. Button presses were registered using a two-button MR-compatible response pad.

    Neuroimaging data preprocessing was performed using the fMRIPrep pipeline 20.0.6 (Esteban et al., 2019) (see the supporting information for full details). Preprocessing consisted of realignment and unwarping of functional images, slice timing correction and motion correction. The functional images were resliced using a voxel size of 2 × 2 × 2 mm and smoothed using a 6-mm FWHM isotropic Gaussian kernel.

    2.6.2 Analysis software

    The Python Language (Python Software Foundation, https://www.python.org/) was used primarily for machine learning analysis. In-house code was developed with reference to BrainIAK (the Brain Imaging Analysis Kit, http://brainiak.org; Kumar et al., 2020, 2022). The scikit-learn package was used for machine learning analysis (Pedregosa et al., 2011), the nilearn package for brain maps (Abraham et al., 2014) and the seaborn package for data visualization (Waskom, 2021). The R Language (R Core Team, 2020) was also used for statistical analyses. The lme4 package was used for statistical modelling (Bates et al., 2015) and the ggplot2 package for data visualization (Wickham, 2016).

    2.6.3 Aim 1: Distinguish—Within-sample classification of interoceptive versus exteroceptive attention

    We aimed to classify neural patterns to distinguish between states of interoceptive and exteroceptive attention. To do so, we trained machine learning classifiers on fMRI data when participants engaged in interoceptive and exteroceptive attention, assessing how accurately these states could be separated and which brain regions contributed to the separation.

    Specifically, we used fMRI blood-oxygen-level-dependent (BOLD) data collected in four conditions: active interoception, active exteroception, passive interoception and passive exteroception. The active matching condition was not analysed in this study because it blended interoceptive and exteroceptive attention and was therefore unsuitable for distinguishing between these two processes. We conducted the analyses in three steps. First, we combined active interoception and passive interoception into an interoceptive condition and active exteroception and passive exteroception into an exteroceptive condition to examine the gross differences between interoception and exteroception regardless of reporting demand. Second, we examined active interoception, active exteroception, passive interoception and passive exteroception as four separate conditions, considering both the attentional target (interoception vs. exteroception) and the reporting demand (active tracking vs. passive monitoring). Third, we focused on differences between active interoception and active exteroception to eliminate the confounds of passive interoception, the only condition in which the circle stimulus remained stationary on the screen. Focusing on the active conditions also allowed for classification in a context of high participant engagement, as participants were required to continuously attend to the circle/breath stimulus to meet the reporting demands of the tasks. All three steps followed the same classification workflow, demonstrated in Figure 2.

    FIGURE 2
    Machine learning classification workflow. First, four-dimensional fMRI BOLD data were reshaped into a two-dimensional voxel × timepoints matrix. Then, part of the data was used to train the machine learning model and the remainder to test it. The model used the training data to learn weights associated with each voxel and used these weights to predict labels for the test data. Cross validation was performed by assigning different chunks of the dataset as the test set. An accuracy score was calculated by averaging the prediction accuracy across all test sets.

    Classification was performed at an individual participant level to maximize classification accuracy and account for individual anatomical and functional differences in the brain. For each participant, the four-dimensional fMRI data were reshaped into voxel-by-timepoint matrices. IEAT task-related timepoints were extracted from the full timecourse of the scans. A whole-brain mask was applied to the data to retain voxels that fell within the brain. We used this data-driven whole-brain approach to identify any brain regions that might drive the separation of interoception and exteroception without making a priori assumptions about which regions might be critical in the process.
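
    As an illustration of this reshaping and masking step, a minimal sketch using nilearn's NiftiMasker (file paths are hypothetical placeholders; in practice, the images would come from the fMRIPrep outputs):

      from nilearn.maskers import NiftiMasker

      # Hypothetical file paths for one participant's preprocessed data.
      bold_file = "sub-01_task-ieat_run-1_bold.nii.gz"
      mask_file = "sub-01_brain_mask.nii.gz"

      # NiftiMasker flattens the 4D BOLD image into a 2D (timepoints x voxels)
      # matrix, retaining only voxels inside the whole-brain mask.
      masker = NiftiMasker(mask_img=mask_file, standardize=True)
      X = masker.fit_transform(bold_file)  # shape: (n_timepoints, n_voxels)

      # Task-related timepoints would then be selected from X with a boolean
      # index built from the condition onsets before classification.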

    We implemented a penalized logistic regression with L2 regularization (i.e., Ridge regression) with reference to methods used by Weng et al. (2020) to classify internal states of attention during meditation. Regularization in general penalizes the overfitting of data and reduces the likelihood of models over-learning from the training data to the extent that they fail to generalize to out-of-sample data. We selected L2 regularization over other methods such as L1 regularization (i.e., Lasso regression) because L2 regularization retained more important features (i.e., voxels) and would therefore reveal more brain regions involved in interoceptive and exteroceptive processes. Many other machine learning algorithms have been used in fMRI studies (see Rashid et al., 2020, for a review). In our pilot testing, we compared L2 regularization to other commonly used algorithms such as sparse multinomial logistic regression, Gaussian Naïve Bayes, XGBoost and singular value decomposition linear regression but failed to see any evidence of superior classification. This lack of distinction therefore led us to proceed with our planned use of the L2 regularization algorithm.

    A k-fold cross-validation method was used to train and evaluate the performance of the classifier models. Each participant's voxel-by-timepoint matrix in each study session was split into a training set and a test set. In the training set, a logistic regression model used the brain activation values of each voxel at each timepoint and the true label of the experimental condition. Each voxel was assigned a weight that indicated how much evidence it provided for or against an experimental condition. The classifier then used these weights learned from the training set to predict the experimental condition of each timepoint in the novel held-out test set. These predictions were evaluated against the true experimental condition labels to derive a measure of classification accuracy. The classifiers were trained and tested within baseline data and within post-intervention data, respectively. This in-sample, within-session classification allowed us to evaluate how well the classifiers differentiated the mental states within the same experimental session. We split each session's data into five folds and ran cross validation by training on four folds and testing on the fifth, held-out fold (i.e., iteratively training on 80% and testing on 20% of the data). This process was repeated until each of the five folds had been used as the test fold once. An average accuracy score was obtained across the five iterations.
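
    A minimal sketch of this within-session training and fivefold cross-validation loop using scikit-learn (the matrix X and labels y below are random placeholders standing in for one participant's masked BOLD data and condition labels):

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import KFold, cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(240, 5000))           # placeholder timepoints x voxels
      y = np.repeat(["intero", "extero"], 120)   # placeholder condition labels

      # L2-penalized logistic regression; C controls the regularization strength.
      clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)

      # Five folds: train on 80% of timepoints, test on the held-out 20%,
      # rotating the test fold until every fold has been held out once.
      folds = KFold(n_splits=5, shuffle=True, random_state=0)
      accuracies = cross_val_score(clf, X, y, cv=folds)
      print(f"mean within-session accuracy: {accuracies.mean():.1%}")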

    For each participant at each assessment, we applied the binomial theorem to test whether classification accuracy was significantly greater than chance at the p < .05 threshold. At each session, each of the four conditions comprised 60 volumes over the two functional runs: (1 volume/2 s) × (30 s/block) × (2 blocks/run) × (2 runs) = 60 volumes per condition, for 240 volumes in total. For the two-condition models, the chance probability of successfully classifying a given functional volume was 50% (i.e., choosing one out of two options at random). The binomial probability distribution indicated that each participant was required to achieve classification accuracy above 55.4% to be considered significantly above chance (1 − cumulative probability < .05). For the four-condition models, the chance probability was 25%, which by the binomial theorem required classification accuracy above 29.6% to be significantly above chance.
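
    These thresholds can be reproduced from the binomial distribution; a short sketch with SciPy:

      from scipy.stats import binom

      n_volumes = 240  # 60 volumes x 4 conditions per session

      for chance, label in [(0.50, "two-condition"), (0.25, "four-condition")]:
          # Smallest number of correctly labelled volumes at which the
          # cumulative chance probability reaches 95%.
          k = binom.ppf(0.95, n_volumes, chance)
          print(f"{label} threshold: {k / n_volumes:.1%}")
      # two-condition threshold: 55.4%
      # four-condition threshold: 29.6%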

    Group-level classification maps were then created to identify important brain regions. For each participant, voxels that made a major contribution to the classification were identified: Voxels whose weights were more than 2 standard deviations above the mean weight were assigned a value of 1, and those whose weights were more than 2 standard deviations below the mean weight were assigned a value of −1. All participants' importance maps were overlaid to create a group-level importance map in which higher absolute values indicated voxels that were discriminative across more participants.
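
    One reading of this thresholding and overlay rule, sketched in NumPy (the weights array is a random placeholder for the per-participant classifier weights):

      import numpy as np

      def importance_map(weights):
          """+1 / -1 for voxels more than 2 SD above / below the mean weight."""
          mu, sd = weights.mean(), weights.std()
          out = np.zeros_like(weights)
          out[weights > mu + 2 * sd] = 1
          out[weights < mu - 2 * sd] = -1
          return out

      rng = np.random.default_rng(0)
      all_weights = rng.normal(size=(22, 5000))  # placeholder: participants x voxels

      # Summing the binarized maps yields the group-level importance map:
      # larger absolute values mark voxels discriminative for more participants.
      group_map = np.sum([importance_map(w) for w in all_weights], axis=0)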

    One possible confound of the fMRI BOLD signal classification for interoception and exteroception was participants' respiration during the task. Therefore, as a control analysis, we submitted participants' respiration rates for each block as features to the L2 regularization classifier to parallel the classification based on BOLD signals, comparing (1) interoception versus exteroception (collapsing active and passive conditions); (2) active interoception versus active exteroception versus passive interoception versus passive exteroception; and (3) active interoception versus active exteroception. A k-fold cross validation (five folds) was conducted for each of these three comparisons, training on 80% of the data and testing on 20% of the data, repeated five times until all data had been used as the test set. An average accuracy score was computed across the five classifications. This process was repeated for both the baseline and post-intervention sessions. The only difference between the respiration rate classification and the fMRI BOLD signal classification was that the respiration rate classification was conducted across rather than within participants, owing to the smaller number of respiration datapoints available per participant: For each participant, there were only four respiration rates per experimental condition per session. Despite this difference, the analysis still provided an estimate of the scale of accuracy differences between classification based on respiration rates and classification based on the BOLD signal.

    2.6.4 Aim 2: Predict—Out-of-sample classification of interoceptive versus exteroceptive attention

    To move from model generation to validation, we examined whether the trained models would generalize to predict attentional states in independent test datasets. Specifically, classifiers that were trained on each participant's baseline data were tested on post-intervention data, and vice versa. Out-of-sample testing would help us understand whether the neural distinction between interoceptive and exteroceptive attention was reliable within the same individual at different assessment times, rather than risking model overfitting by cross-validating within the same training dataset.
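
    Schematically, the out-of-sample procedure amounts to swapping the roles of the two sessions (again with placeholder data standing in for one participant's masked matrices and labels):

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      X_base, X_post = rng.normal(size=(2, 240, 5000))  # placeholder sessions
      y = np.tile(["intero", "extero"], 120)            # placeholder labels

      clf = LogisticRegression(penalty="l2", max_iter=1000)

      clf.fit(X_base, y)                  # train on baseline ...
      acc_fwd = clf.score(X_post, y)      # ... decode post-intervention

      clf.fit(X_post, y)                  # train on post-intervention ...
      acc_bwd = clf.score(X_base, y)      # ... decode baseline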

    2.6.5 Aim 3: Apply—Decoding attention during a SIAT

    We finally aimed to apply the interoceptive classification models to periods of sustained interoceptive attention, assessing sensitivity to interoceptive training (MABT) and covariation with reports of interoceptive sensibility and affective symptom burden. To do so, we used the classifiers trained on IEAT data to estimate participants' attentional states during the SIAT. L2 regularized regression classifiers were trained on each participant's IEAT data at baseline and post-intervention separately and were then used to decode that participant's within-session SIAT data. As the SIAT instructed participants to maintain active engagement with interoceptive signals, the reporting demand classification was used as an estimator of task engagement: Periods decoded as ‘active tracking’ were hypothesized to indicate periods in which a participant was actively engaged in sustained attention, whereas ‘passive monitoring’ periods were more likely to indicate fatigue or mind wandering.

    The initial analysis used a three-factor, multilevel, fully within-participant design. The factors were attentional target (interoception vs. exteroception), task engagement (active tracking vs. passive monitoring) and time within the sustained attention task (~150 volumes), with the first two factors derived from the four classifier categories. As explained below, the second phase of analysis employed only a two-category classifier (active reporting of interoception vs. exteroception), removing the reporting demand factor. As this was an exploratory analysis, we focused only on highly significant findings in the main text, that is, p < .001, although complete results are available in the supporting information.

    To summarize participants' degree of engagement in interoceptive versus exteroceptive attention, several metrics were also developed, including (1) the average duration spent in each mental state, with a state defined as any period of six consecutive seconds (three TRs) or more with the same classification label, and (2) the frequency of each mental state. Together, the duration and frequency of events offer an estimate of the stability of interoceptive and exteroceptive attention throughout the task.
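
    A minimal sketch of how these duration and frequency metrics can be computed from a decoded label sequence by run-length encoding (the decoded sequence below is a toy placeholder):

      from itertools import groupby
      import numpy as np

      def state_metrics(labels, tr=2.0, min_trs=3):
          """Count events of >= min_trs consecutive identical labels (>= 6 s at
          TR = 2 s) and return (frequency, mean duration in s) per state."""
          durations = {}
          for label, run in groupby(labels):
              n = sum(1 for _ in run)
              if n >= min_trs:
                  durations.setdefault(label, []).append(n * tr)
          return {k: (len(v), float(np.mean(v))) for k, v in durations.items()}

      decoded = ["intero"] * 10 + ["extero"] * 2 + ["intero"] * 5  # toy sequence
      print(state_metrics(decoded))  # {'intero': (2, 15.0)}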

    Training effects

    Multilevel mixed models were used to examine whether MABT improved sustained interoceptive attention. Group membership (MABT vs. control) was the between-subjects independent variable; session (baseline vs. post-intervention) was the within-subjects independent variable. The frequency or proportion of interoceptive attention, as well as the metrics for the duration and number of interoceptive events, served as dependent variables. Significant Group × Session interactions would be regarded as evidence for group-specific training effects.
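
    The study fitted these models with lme4 in R; to keep the examples in one language, an analogous random-intercept model can be sketched in Python with statsmodels (the data frame below is a synthetic placeholder, and the variable names are illustrative):

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      df = pd.DataFrame({
          "participant": np.repeat(np.arange(22), 2),
          "group": np.repeat(rng.choice(["MABT", "control"], size=22), 2),
          "session": np.tile(["baseline", "post"], 22),
          "n_intero_events": rng.poisson(5, size=44),
      })

      # Random intercept per participant; the group:session interaction term
      # carries the hypothesized training effect.
      fit = smf.mixedlm("n_intero_events ~ group * session", df,
                        groups=df["participant"]).fit()
      print(fit.summary())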

    Associations with subjective reports

    Additional exploratory analyses examined how classifier data related to reports of interoceptive awareness and wellbeing. Multilevel mixed models were used to predict the duration and number of interoceptive and exteroceptive events from self-reported interoceptive awareness (MAIA) or affective symptom burden. In addition, for the MABT group only, we examined the relationship between classifier estimates and therapist-rated interoceptive awareness. Multilevel mixed models were used to examine MAIA scores, composite symptom scores and therapist ratings as independent variables and the interoceptive attention metrics, that is, the average duration of events and the number of events, as dependent variables.

    3 RESULTS

    3.1 Classification accuracy based on respiration rates

    We submitted respiration rate data to the L2 classifier to distinguish between (1) interoception and exteroception (collapsing active and passive conditions); (2) the four separate conditions (active interoception, active exteroception, passive interoception and passive exteroception); and (3) active interoception and active exteroception. For interoception versus exteroception, the classifiers achieved accuracies of 55.6% at baseline and 50.0% at post-intervention (chance accuracy = 50%). Among the four conditions, the classifiers achieved accuracies of 30.1% at baseline and 23.9% at post-intervention (chance accuracy = 25%). For active interoception versus active exteroception, the classifiers achieved accuracies of 63.1% at baseline and 50.6% at post-intervention (chance accuracy = 50%).

    3.2 Aim 1: Distinguish—Within-sample classification of interoceptive and exteroceptive attention

    The first aim of the study was to evaluate whether machine learning classifiers could distinguish between the neural patterns associated with the four experimental conditions: [active vs. passive] monitoring of [interoceptive vs. exteroceptive] targets. First, we tested a two-state model contrasting both interoception conditions against both exteroception conditions. Second, we evaluated a four-state model featuring active interoception, passive interoception, active exteroception and passive exteroception. Finally, we applied a two-state model again, focusing on the distinction between active interoception and active exteroception. As a liberal test of discriminability, classification was first performed within-session, that is, trained and tested using data from the same assessment session at either baseline or post-intervention.

    3.2.1 Two-category classification: Interoceptive versus exteroceptive attention

    The first model classified all interoception trials against all exteroception trials, collapsing together the active and passive conditions. The classifier achieved 73% accuracy at baseline and 72% accuracy at post-intervention (Figure 3a). At both baseline and post-intervention, all participants were classified with an accuracy of >55.4%, the p < .05 threshold for chance classification, except for one participant at post-intervention who was slightly below this threshold. However, inspection of individual participant classification revealed considerable heterogeneity between well-classified and poorly classified participants (Figure 4). Inspection of the confusion matrices indicated that the model was not biased in its errors; that is, it did not systematically mistake one condition for the other (Figure 3b).

    FIGURE 3
    Within-session IEAT classification accuracy: interoception versus exteroception. (a) Classification accuracy scores between interoceptive and exteroceptive attention at baseline (73% accuracy) were replicated at post-intervention (72% accuracy). The box represents the quartiles of the dataset; the whiskers extend to show the rest of the distribution, except for data points beyond 1.5 times the interquartile range, which were considered outliers. The data points represent each participant's classification accuracy score. The red dotted line represents the threshold for individual participants' overall classification accuracy to be considered statistically above chance. (b) The dark diagonal of this confusion matrix shows that the machine learning models did not make any systematic errors. Refer to the supporting information for individual participants' confusion matrices.
    FIGURE 4
    Sample IEAT classification output. The red circles designate classifier-predicted labels. The blue lines designate true condition labels. Participant A was selected as one of the most accurate predictions, while Participant B was selected as one of the least accurate predictions. This figure illustrates the most and least accurate classifications; overall classification largely fell within this range.

    We generated a group-level frequency map to examine the voxels that contributed the most to the classification (Figure 5). Since our aim was the classification of attentional states rather than the localization of brain regions, we are not claiming that we identified critical brain regions for interoception or exteroception through this analysis. These group-level brain maps are meant to serve an illustrative purpose to help readers visualize the overlap of sources of information across participants.

    FIGURE 5
    Important voxels in the classification of interoceptive and exteroceptive attention. Voxels in red contributed to the classification of exteroceptive attention; voxels in blue contributed to the classification of interoceptive attention. These maps were aggregated across all participants; the visualization was thresholded to show voxels that were important to more than one participant. Top panels are left and right lateral views; bottom panels are left and right medial views.

    Overall, 89% of the important voxels were only important for two or fewer participants (supporting information); no voxel was important for more than 14 participants. Voxels in the posterior cingulate and the middle insula both contributed evidence for interoceptive attention, whereas voxels in the ventromedial prefrontal cortex (vmPFC), dorsomedial prefrontal cortex (dmPFC), motor and somatosensory areas, the primary visual cortex (V1) and the middle temporal visual area (V5) contributed evidence for exteroceptive attention.

    3.2.2 Four-category classification: Active interoception, active exteroception, passive interoception, and passive exteroception

    We then examined classification performance for a more nuanced model that aimed to distinguish between the four experimental conditions. The whole-brain L2 regularized logistic regression classifier achieved 71% accuracy at both baseline and post-intervention (Figure 6). In addition, each mental state was differentiated from the other three states with above-chance accuracy: Active Interoception = 72%, Active Exteroception = 73%, Passive Interoception = 69% and Passive Exteroception = 70% (Figure 6b). At both baseline and post-intervention, all participants were classified with an accuracy of >29.6%, the p < .05 threshold for chance classification. Inspection of the confusion matrices indicated that the model was not biased in its errors; that is, it did not systematically mistake one condition for the others, although within-active and within-passive confusions were numerically greater than confusions between active/passive condition combinations.

    FIGURE 6
    Within-session IEAT classification accuracy: active interoception, active exteroception, passive interoception and passive exteroception. (a) Classification accuracy scores between the four IEAT conditions at baseline (71% accuracy) were replicated at post-intervention (71% accuracy). (b) The dark diagonal of this confusion matrix shows that the machine learning models did not make any systematic errors. Refer to the supporting information for individual participants' confusion matrices.

    Next, we generated group-level frequency maps to examine the voxels that contributed the most to the classification (Figure 7). Overall, 90% of the important voxels were important for only two or fewer participants (supporting information); no voxel was important for more than 12 participants in the classification of active interoception, 18 participants in the classification of active exteroception, 19 participants in the classification of passive interoception and 15 participants in the classification of passive exteroception.

    FIGURE 7
    Important voxels in the classification of active interoception, active exteroception, passive interoception and passive exteroception. Voxels in red contributed to the classification of a specific task (a, active interoception; b, active exteroception; c, passive interoception; d, passive exteroception). Voxels in blue contributed to the classification against that task. These maps were aggregated across all participants; the visualization was thresholded to show voxels that were important to more than one participant.

    In keeping with the prior literature, voxels in the posterior cingulate and middle insula both contributed evidence for active interoception, whereas voxels in the vmPFC, dmPFC and primary somatosensory cortex contributed evidence against active interoception. Voxels in the ventral visual pathway, especially V1 and V5, contributed evidence for active exteroception; voxels in the medial orbital prefrontal cortex (moPFC) contributed evidence against active exteroception. Voxels in the perigenual anterior cingulate cortex (pACC), vmPFC and moPFC contributed evidence for the passive interoception condition; the ventral visual pathway, especially V1 and V5, and the medial premotor cortex all contributed evidence against the passive interoception condition. Lastly, V1, V5, the right motor and somatosensory areas, and some small clusters in the prefrontal cortex contributed evidence for the passive exteroception condition; the medial premotor cortex, left motor cortex and left somatosensory cortex contributed evidence against passive exteroception.

    In addition to voxels in regions that support representation of interoceptive and exteroceptive content, some classification seemed to capitalize on unequal reporting demands across conditions: for example, motor and somatosensory areas corresponding to button presses with the right hand contributed significantly to active interoception and active exteroception classification. As mentioned above, area V5 contributed to classification of passive interoception, the only condition that did not feature motion. In response, an additional two-state classification analysis was conducted, focused on the most closely matched conditions, active interoception versus active exteroception.

    3.2.3 Within-session classification: Active interoception versus active exteroception

    Whole-brain classification between the closely matched active interoception and active exteroception conditions achieved 85% accuracy at baseline and 82% accuracy at post-intervention across participants (Figure 8). Once again, at both baseline and post-intervention, all participants were classified with accuracies of >55.4%, the p < .05 threshold for chance classification. Inspection of the confusion matrices indicated that the model was not biased in its errors; that is, it did not systematically mistake one condition for the other.

    FIGURE 8
    Within-session IEAT classification accuracy: active interoception versus active exteroception. (a) Classification accuracy scores between the active interoception and active exteroception conditions at baseline (85% accuracy) were replicated at post-intervention (82% accuracy). The dotted red line represents the threshold for the accuracy scores to be significantly different from chance based on the binomial probability distribution. (b) The dark diagonal of this confusion matrix shows that the machine learning models did not make any systematic errors. Refer to the supporting information for individual participants' confusion matrices.

    We generated a group-level frequency map to examine the voxels that contributed the most to the classification (Figure 9). Overall, 90% of the important voxels were only important for two or fewer participants (supporting information); no voxel was important for more than 10 participants. Regions identified in the four-category classification also demonstrated importance in this analysis. Specifically, voxels in the posterior cingulate and middle insula both contributed evidence for active interoception, whereas the vmPFC, dmPFC, motor and somatosensory areas, V1 and V5 all contributed evidence for the Active Exteroception condition. Although these two active tracking conditions were matched for motion and button-press requirements, sensorimotor activity supporting classification of exteroception in the four-category model was still retained in this two-category model.

    FIGURE 9
    Important voxels in the classification of active interoception versus active exteroception. Regions in red contributed to the classification of exteroceptive attention; regions in blue contributed to the classification of interoceptive attention. These maps were aggregated across all participants; the visualization was thresholded to show voxels that were important to more than one participant.

    3.3 Aim 2: Predict—Out-of-sample classification of interoceptive and exteroceptive attention

    The second aim of the study was to test whether machine learning classifiers could predict individualized neural patterns associated with different attentional states in out-of-sample data. As all participants attended two assessment sessions (baseline and post-intervention), we trained classification models on baseline data and tested them on post-intervention data, and vice versa. We applied this approach to both the four-condition model (active interoception, active exteroception, passive interoception and passive exteroception) and the two-condition model (active interoception and active exteroception).

    The four-category classifier achieved 51% accuracy across participants when trained on baseline data and tested on post-intervention data and 50% accuracy when trained on post-intervention data and tested on baseline data (Figure 10a). Although less accurate than within-session classification, the overall classification remained significantly above chance.

    FIGURE 10
    Out-of-sample IEAT classification accuracy. (a) Classification accuracy scores between the four IEAT conditions when trained on baseline data and tested on post-intervention data (51% accuracy) and vice versa (50% accuracy). The box shows the quartiles of the dataset; the whiskers extend to show the rest of the distribution, except for data points beyond 1.5 times the interquartile range, which were considered outliers. The data points represent each participant's classification accuracy score. The dotted red line represents the threshold for the accuracy scores to be significantly different from chance based on the binomial probability distribution. (c) The active interoception–active exteroception classification accuracy scores were 71% at baseline and 69% at post-intervention. (b and d) Confusion matrices show that the models did not make systematic prediction errors by consistently mistaking one condition for another. Refer to the supporting information for individual participants' confusion matrices.

    The binomial probability distribution determined that classification accuracy scores above 29.4% were significantly above chance (cumulative probability < .05 for a 25% chance level across all TRs). Of all participants, one failed to surpass chance at baseline (accuracy = 28%), and a different participant failed to surpass chance at post-intervention (accuracy = 29%). Confusion matrices suggested that each state was equally distinguishable from the other three states with above-chance accuracy: Active Interoception = 52%, Active Exteroception = 49%, Passive Interoception = 50% and Passive Exteroception = 49% (Figure 10b).

    In the two-category classification model between active interoception and active exteroception, the whole-brain L2-regularized logistic regression classifier achieved 71% accuracy at baseline and 69% at post-intervention (Figure 10c). Based on the binomial probability distribution, classification accuracy scores above 55.4% were significantly above chance (cumulative probability < .05 for a 50% chance level across all TRs). Three participants failed to beat chance at baseline and four at post-intervention. As in the four-category classification, the models did not make systematic errors favouring one condition over the other: Active Interoception = 69% accuracy and Active Exteroception = 70% accuracy (Figure 10d).
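The significance cutoffs follow directly from the binomial distribution over the number of test volumes. The sketch below (SciPy) illustrates the calculation; the run length of 240 TRs is a placeholder, as the true count is set by the task design:

```python
from scipy.stats import binom

def accuracy_threshold(n_trs: int, chance: float, alpha: float = 0.05) -> float:
    """Smallest accuracy for which the probability of performing at least
    this well by guessing at chance level falls below alpha."""
    # Smallest k with P(X >= k) < alpha under Binomial(n_trs, chance)
    k = int(binom.ppf(1 - alpha, n_trs, chance)) + 1
    return k / n_trs

# With a placeholder run of 240 test TRs, these land near the reported cutoffs
print(accuracy_threshold(240, 0.25))  # ~0.29-0.30 for the four-category model
print(accuracy_threshold(240, 0.50))  # ~0.55 for the two-category model
```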

    Out-of-sample accuracy is almost invariably lower than within-sample accuracy, leading researchers to argue that only out-of-sample classification should be considered true classifier ‘prediction’ (cf. Poldrack et al., 2020). However, the reasons for this drop are multifaceted, including both measurement error and true changes in participants' neural representations over time. For each participant, the accuracy drop score was operationalized from the average of the [baseline training to post-intervention decoding] and [post-intervention training to baseline decoding] accuracies. These scores were unrelated to experimental group and to change in MAIA scores, suggesting that the accuracy drop was not due to intervention-related change.
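A brief sketch of how such drop scores might be computed and related to covariates, using hypothetical per-participant values (the operationalization shown follows the description above; all numbers are placeholders):

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder accuracies for three participants (illustration only)
base_to_post = np.array([0.70, 0.66, 0.73])   # train on baseline, test on post
post_to_base = np.array([0.68, 0.69, 0.71])   # train on post, test on baseline

# Per-participant drop score as operationalized above: the average of the two
# cross-session decoding accuracies
drop_score = (base_to_post + post_to_base) / 2.0

# Relate the scores to a covariate such as change in MAIA (placeholder values)
maia_change = np.array([0.4, -0.1, 0.8])
r, p = pearsonr(drop_score, maia_change)
print(r, p)
```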

    3.4 Aim 3: Apply—Decoding attention during the SIAT

    Our third aim was to explore whether the classifiers trained on the IEAT data could decode participants' attentional state over a 3-min period of sustained interoceptive attention. We were particularly interested in exploring whether interoceptive training influenced these sustained attention metrics, and whether these metrics correlated with self- and clinician-reports of interoceptive engagement and symptom burden.

    We first applied the four-category IEAT classification model to the sustained attention data, distinguishing between task engagement (active vs. passive attention) and attentional target (interoception vs. exteroception). The aim of using the four-category model was to explore when participants most clearly showed active engagement with the sustained attention task, as this part of the timeseries would likely be most sensitive to training effects.
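Conceptually, this decoding step amounts to applying the fitted IEAT classifier to each SIAT volume in turn. A minimal sketch under placeholder assumptions (random stand-in data; a 2-s TR yielding 90 volumes per 3-min run is assumed purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_voxels = 5_000
X_ieat = rng.standard_normal((240, n_voxels))   # IEAT training volumes (placeholder)
y_ieat = rng.integers(0, 4, 240)                # labels for the four IEAT conditions
X_siat = rng.standard_normal((90, n_voxels))    # one 3-min SIAT run, assuming TR = 2 s

clf = LogisticRegression(penalty="l2", max_iter=1000).fit(X_ieat, y_ieat)
state_per_tr = clf.predict(X_siat)          # decoded attentional state at each volume
state_evidence = clf.predict_proba(X_siat)  # soft evidence for each of the four states
```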

    Classification results were analysed in a multilevel model to explore the effects of, and potential interactions between, task engagement (active vs. passive), attentional target (interoception vs. exteroception) and time within the sustained attention task (seconds). The analysis showed a significant main effect of engagement, such that participants were more likely to be classified as active than passive, β = 7.23, 95% CI [5.12; 9.33], p < .001, and this effect was qualified by an Engagement × Time interaction, β = −.07, 95% CI [−.11; −.05], p < .001, such that greater active engagement was found earlier in the sustained attention run (Figure 11a). Complete results of the multilevel model are available in the supporting information.
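As a sketch of this style of analysis, the mixed model below (statsmodels) fits random intercepts per participant to synthetic long-format data; all column names, the linear link and the data themselves are assumptions, and the original analysis may have used a logistic multilevel specification:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic long-format data: one row per participant x TR x state category,
# with `chosen` = 1 when the classifier assigned that category to that TR.
rng = np.random.default_rng(2)
n_sub, n_tr = 44, 90
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_sub), n_tr * 4),
    "engagement": np.tile(np.repeat(["active", "passive"], 2), n_sub * n_tr),
    "target": np.tile(["intero", "extero"], n_sub * n_tr * 2),
    "time": np.repeat(np.tile(np.arange(n_tr) * 2.0, n_sub), 4),  # seconds
    "chosen": rng.integers(0, 2, n_sub * n_tr * 4),
})

# Random intercepts per participant; fixed effects for engagement, target,
# time and their interactions. The Group x Session x Time model reported in
# Section 3.4 would follow the same pattern with different predictors.
model = smf.mixedlm("chosen ~ engagement * target * time", df, groups=df["subject"])
print(model.fit().summary())
```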

    FIGURE 11 Sustained attention over time. (a) The estimated frequency of passive and active states over time during the SIAT. (b) Time-course of the estimated percentage of active interoception during the SIAT: the moment-by-moment proportion of participants classified as engaging in active interoceptive attention (ActInt) in the MABT and control groups at baseline and post-intervention. See the supporting information for the time-courses of the estimated frequencies of the other attentional states during the SIAT.

    Given the strong performance of the active-condition classifier, the discovery of a motion confound in the passive interoception condition and the expectation of greater participant engagement when active responses were required, we focused our subsequent decoding of interoceptive sensibility on the classifier trained on the two active conditions alone. A multilevel model explored the interactions between group (MABT vs. control), session (baseline vs. post-intervention) and time within the sustained attention task (seconds). A significant Group × Session × Time interaction was observed, β = −.07, p < .001. Subsequent visualization of the effects (Figure 11b) indicated that active interoception was significantly more frequent in the MABT group post-intervention than in the other Group × Session combinations, and the interaction was driven by differences in the first half of the time series, consistent with the previous finding that active engagement was highest earlier in the task.

    Complete results of the multilevel model are available in the supporting information, along with several additional exploratory analyses, including an attempted replication of Weng et al. (2020). For these analyses, the time-course data were summarized using two derived scores: (1) the total number of attentional-state events in the SIAT and (2) the average duration of these events over the course of the SIAT. These metrics were then used to investigate self-reported interoceptive sensibility (MAIA scores), affective symptom burden and clinician ratings of interoceptive engagement within the MABT group. While none of these analyses produced statistically significant findings following correction for multiple comparisons, the approach provides a proof of concept for how classification scores could be validated in a better-powered future sample.
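The two derived scores can be obtained by run-length summarization of the decoded label sequence. A minimal sketch (the function and state coding are hypothetical):

```python
import numpy as np

def summarize_events(labels: np.ndarray, state: int) -> tuple[int, float]:
    """Return (number of contiguous events, mean event duration in TRs)
    for a given decoded state in a per-TR label sequence."""
    flags = np.concatenate(([0], (labels == state).astype(int), [0]))
    edges = np.diff(flags)
    starts = np.flatnonzero(edges == 1)   # TRs where an event begins
    ends = np.flatnonzero(edges == -1)    # TRs just after an event ends
    durations = ends - starts
    if durations.size == 0:
        return 0, 0.0
    return int(durations.size), float(durations.mean())

# Example: three state-0 events with durations 2, 1 and 3 TRs
print(summarize_events(np.array([0, 0, 1, 1, 0, 2, 0, 0, 0]), state=0))  # (3, 2.0)
```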

    4 DISCUSSION

    The present study demonstrated that machine learning classifiers can differentiate between neural patterns of interoceptive and exteroceptive attention, offering a candidate objective marker of the clinically relevant construct of interoceptive sensibility. Comparison of classification accuracies based on respiration versus neural data showed that the fMRI BOLD signal carried substantially more information than respiration alone. Following within-sample neural classification, we demonstrated classifier prediction using a more stringent out-of-sample test, predicting each participant's attentional state on test data that were independent from the training set and acquired more than 8 weeks apart. These findings support a qualitative distinction between interoceptive and exteroceptive attention and suggest that whole-brain fMRI provides sufficient information to distinguish between these states.

    We also applied the classification model to decode participants' attentional dynamics during a sustained interoception task. The classifier suggested that participants were only able to maintain an active attentional state for approximately 90 s of the 3-min attention periods, regardless of training status; however, within those periods, the classifier appeared to be sensitive to the clinical intervention, suggesting that training increased the time for which participants could maintain an active interoceptive state. Finally, we illustrated how such decoding could be applied to continuous covariates such as subjective interoceptive awareness and affective symptom burden. While the results of this decoding analysis were largely statistically non-significant and thus inappropriate for inference to the broader population, they provide a framework for future research exploring the associations between these important constructs.

    4.1 Distinguishing between interoceptive and exteroceptive attention

    A series of machine learning models each achieved high levels of accuracy in distinguishing between interoceptive and exteroceptive attention, in both in-sample and out-of-sample tests. While classification relied on a diffuse and often idiosyncratic set of brain regions for each participant, voxels in the posterior cingulate and middle insula often provided evidence for interoceptive attention, in accordance with the literature on these areas' involvement in interoceptive processing (Craig, 2003; Farb et al., 2013). Conversely, voxels in the cortical midline, consistent with the brain's default mode network, generally contributed evidence against interoceptive attention, which could suggest a break away from effortful, elaborative cognitive processing during internal body focus (Raichle & Snyder, 2007). It is important to note that few voxels were consistently important across participants in classifying different mental states, which would be the goal of a typical univariate localization approach. Instead, the classification approach generated reliable but idiosyncratic models distinguishing between interoceptive and exteroceptive attention, suggesting that the direction of attention between these targets may be experience-dependent rather than relying on consistent regions such as primary representation cortices.

    However, the ability to make more nuanced distinctions between active and passive reporting of the sensory targets was limited by non-equivalent task demands and stimulus presentation between the IEAT conditions. The four-category model revealed the opportunistic nature of the machine learning classifier, which capitalized on two design confounds. First, the active and passive conditions differed in their requirement of button presses to track sensory targets. Accordingly, the four-category classifier relied heavily on left motor cortex activity to distinguish active from passive monitoring, consistent with the requirement to perform right-hand finger movements during active monitoring (e.g., Lotze et al., 2000; Rao et al., 1995). Second, passive interoception was the only condition in which the visual target (the circle stimulus) remained stationary on the screen rather than following a cycle of expansion and contraction. Accordingly, the classifiers relied on differential neural representations in voxels along the visual pathway, especially V1 and V5, to distinguish passive interoception from the other three conditions, consistent with the literature on these visual areas' function in mapping the visual field and coding for motion (see Greenlee & Tse, 2008, for a review). This reliance of the classifier on task differences highlights the need to develop closely matched experimental conditions if machine learning algorithms are to base classification on the constructs of interest. Future research might revisit the four-category model if these task confounds can be better addressed.

    Given the fixation of the general two-category and four-category classification models on nuisance covariates, we refocused our analyses on the two active conditions (active interoception and active exteroception). Encouragingly, these conditions showed no obvious confounds, and participants demonstrated equivalent facility in performing the two active tracking tasks (cf. https://www.biorxiv.org/content/10.1101/2022.05.27.493743v3 for analyses confirming equivalent tracking performance between the two tasks). As mentioned above, the two active conditions were also well matched for stimulus dynamics, and both required button-press tracking of the appropriate sensory target (visual circle or respiratory cycle).

    The results supported the feasibility of classifying between carefully controlled periods of interoception and exteroception. Even without features driven by motion and motor confounds, the classification accuracy between active interoception and active exteroception was the highest of the three models tested, surpassing a more general model that collapsed together active and passive conditions, as well as the four-category model. The classification relied on voxels in a similar set of brain regions as the previous four-category analysis, excluding the motor and visual cortex confounds described above. What remained was a consistent account of the neural distinction between interoception and exteroception: voxels in the posterior cingulate and middle insula contributed evidence for active interoception, consistent with previous studies that identified these areas as being implicated in the interoceptive neural network (Li et al., 2017; Matsumoto et al., 2006). For active exteroception, areas including the vmPFC, dmPFC, motor and somatosensory areas, as well as the visual cortices, contributed important evidence.

    Why was exteroceptive attention associated with greater reliance on non-visual brain areas despite having the same visual stimulus as the interoceptive condition? One possible explanation is that greater attentional resources might have been required for active exteroception to track and report the movements of the visual stimulus as part of the task requirement. Conversely, active interoception recruited fewer resources in the cortical midline structures, which are associated with cognitively demanding tasks (Seeley et al., 2007). Interoceptive attention might have shifted cognitive resources from elaborative cognitive processing to an inward, sensory focus as suggested by the engagement of the cingulate cortex and insula (Farb et al., 2007, 2010, 2013).

    Another critical finding was the maintenance of above-chance classification when classifiers were applied out-of-sample. Classifiers trained on a participant at one time point were still able to classify attentional states using data acquired at a second time point 8 weeks later, suggesting that the neural patterns of interoception and exteroception were truly differentiable rather than a product of overfitting. While accuracy predictably declined from within-sample classification, it remained significantly above chance: approximately 50% accuracy versus 25% chance in the four-state classification and 70% accuracy versus 50% chance in the two-state classification, comparable to similar studies that have attempted to classify between mental states. For example, the Gaussian Naïve Bayes models of Mitchell et al. (2003) achieved approximately 80% accuracy in classifying cognitive states such as reading a word about people versus buildings (50% chance accuracy). The L2-regularized models of Weng et al. (2020) achieved approximately 41% accuracy versus 20% chance in distinguishing between five mental states: interoceptive sensations of breath, sensations of the feet, mind wandering, self-referential processing and ambient sound listening.

    In our study, classification rates decreased by 15% when participants' data were classified out-of-sample rather than within-sample. However, it is unclear whether this drop in accuracy was a ‘bug or a feature’: if within-sample classification led to overfitting, the lower out-of-sample classification rates would represent a more accurate and generalizable estimate of model prediction. Accuracy could also decline because individuals' neural dynamics truly changed, both through natural variation over the 2-month interval and because half of the participants engaged in interoceptive training over this period. In that case, greater model inaccuracy between time points could be a sign of the model's sensitivity to training effects: Participants who experienced greater training-related change in the neural dynamics of interoception would presumably show poorer model fit. In our sample, however, the accuracy drop was unrelated to intervention group and to change in MAIA score, so the loss of out-of-sample accuracy was unlikely to be an effect of intervention condition. The drop is therefore better attributed to natural variation over the 2-month interval, together with the loss of classification accuracy commonly observed when moving from within-sample model fitting to true prediction in an out-of-sample dataset (Poldrack et al., 2020).

    Thus, the classification models seemed robust against overfitting and performed well above chance on datasets collected 2 months apart. The primary aim of the study was therefore achieved: Whole-brain fMRI of participants toggling between interoceptive and exteroceptive attention provides sufficient information to distinguish between neural modes of interoceptive and exteroceptive attention. While classification accuracy can be improved, the current approach already suggests that active engagement in interoception and exteroception are neurally distinguishable states. Furthermore, while activity in primary representation cortices for interoception and vision was important for the classification, reduced prefrontal and sensorimotor activity was indicative of interoception. Why the interoceptive state seems to involve less cortical activity is a ripe topic for further investigation; however, this finding is consistent with the characterization of interoceptive attention as reducing unnecessary energy demands and supporting homeostatic regulation (Quigley et al., 2021), creating the conditions for embodied self-awareness as a ‘minimal phenomenal self’ (Limanowski & Blankenburg, 2013) or ‘proto-self’ (Bosse et al., 2008).

    4.2 Decoding sustained interoceptive attention

    The machine learning models trained on IEAT data were applied to decode 3-min runs of the SIAT. The general properties of the task were decoded in terms of task engagement (active tracking vs. passive monitoring) and attentional target (interoception vs. exteroception). In general, a greater proportion of the SIAT period was decoded as an active tracking state than as a passive monitoring state. Two further interactions were observed. First, active tracking was most prominent early in the sustained attention period, but this advantage diminished over time. This finding is not surprising, given the difficulty inherent in sustaining attention over time; attention should be most focused immediately following the guided audio meditation that preceded each SIAT run and then deteriorate as fatigue and/or distraction set in. Second, participants tended to be engaged in exteroception more than interoception during these earlier phases. This may be a result of two salient exteroceptive cues, the guided meditation recording and then the onset of the scanner functional recording, both of which may bias attention away from the intended interoceptive target at the start of each run.

    Together, these results highlight some challenges in examining sustained interoceptive attention in the scanner. Attention may be inherently biased towards the noise of the scanner environment, and attention may also cease to be consistently engaged about halfway through each 3-min fMRI run. These considerations provided important context for our next analyses, suggesting that future research on sustained interoceptive attention might focus most fruitfully on the early portion of each attention period, to maximize assessment during periods of participant engagement.

    4.2.1 MABT training effects

    Following this general characterization of the SIAT, we examined the influence of MABT training on the neural decoding data. Distinct patterns were observed between the two groups, driven by post-intervention differences. At baseline, participants were generally more often engaged in exteroception than interoception. At post-intervention, however, MABT participants showed greater interoception than exteroception, a shift not shared by the control group. This statistically significant training effect therefore qualified the general finding of greater exteroception at the start of each SIAT run. The audio recordings prior to the run, combined with the noise of fMRI data acquisition, may have created a general bias towards exteroception, but MABT appeared sufficient to overcome this bias and allow for sustained interoceptive processing. It should be noted that the ability to engage interoception in the MABT post-intervention group was still limited by the decline in active engagement over the 3-min run: as active engagement declined, so too did the distinction between the MABT and control groups.

    As outlined in the supporting information, we explored training effects more fully using several other metrics of interoceptive capacity. We found preliminary (but statistically non-significant) indications that post-intervention MABT participants might have experienced more stable periods of interoceptive and exteroceptive attention, suggesting an enhanced ability to sustain attention. Reducing the full SIAT time-course from hundreds of volumes to a single summary score may have lowered statistical power, particularly if active engagement declined halfway through each task run. There appeared, then, to be a trade-off: summary scores simplify interpretation at the expense of the power available when complete data are modelled in multilevel models. Modelling raw rather than summary decoding data may therefore remain a useful technique for more powerfully interrogating participants' mental states in future research.

    4.3 Limitations and constraints on generality

    There were some limitations in the design of the fMRI tasks that can be improved in future iterations of this study. One limitation was that participant compliance could not be guaranteed during the passive IEAT conditions, in which participants were not required to produce any behavioural responses, making it difficult to verify whether, and how closely, they attended to the task. In our analysis, we focused on the active conditions, in which participants' attention was evidenced by the high accuracy of their button presses. Future studies could increase the usability of passive-condition fMRI data, for example, by incorporating occasional catch trials that require button presses.

    In this study, we modelled two experimental factors: task engagement (active tracking vs. passive monitoring) and attentional target (interoceptive vs. exteroceptive attention). Future experiments might consider a greater variety of mental states so that interoception can be further compared with states such as mind-wandering, memory recall and focused attention on different types of exteroceptive stimuli. Post-scan qualitative interviewing could also be built into future studies to better understand what different attentional processes feel like for participants at a phenomenological level (Petitmengin, 2021). When designing additional experimental conditions, it is critical that they be closely matched, so that machine learning classifiers discriminate on the constructs of interest rather than leveraging nuisance variables.

    Another limitation was that no data were collected during the audio-guided interoceptive practice that preceded the self-guided sustained attention period. Since we found a significant MABT training effect at the beginning of the self-guided sustained attention period, when participants were most engaged in the task, it is likely that a high level of task engagement is needed to scaffold the recently developed interoceptive capacity in the training group. Future research might therefore benefit from extending data collection to include the guided meditation period or from providing attentional prompts during self-guided sustained attention. Individuals with varying degrees of experience in mindfulness or interoceptive training might benefit differently from such scaffolded attention, another interesting topic for research.

    In terms of methods, we trained machine learning classifiers to discriminate between interoceptive and exteroceptive attentional states within individuals rather than across individuals. This approach allowed us to make predictions based on each participant's unique task-related neural signature and to generate group-level brain maps identifying common brain areas associated with interoception and exteroception. However, we have not characterized whether interoceptive attention could be consistently detected across individuals by a single trained model or whether models necessarily differ across individuals. Future studies could develop across-individual decoders, with the caveat that such decoding is highly complex and can produce spurious results, and that interpretations about the decodability of mental states should be made with caution (Jabakhanji et al., 2022).

    5 CONCLUSIONS

    In this proof-of-concept study, we showed that machine learning can be a promising tool for characterizing the uniqueness of a mental process relative to other related processes, in this case, interoceptive versus exteroceptive attention. Since interoception is theorized to be foundational to our feeling states, meaning in life and broader appraisals of wellbeing, understanding its neural underpinnings is of great importance. Machine learning also has potential for predicting individuals' mental states, especially when self-report is undesirable, for example, because it would interrupt the mental process being studied, as during sustained attention. Furthermore, machine learning allows us to examine training-related changes at the neural level, in interoceptive or other types of mental health interventions. For example, therapies with interoceptive elements could benefit from a clearer understanding of what role, and how large a role, interoception plays in improving wellbeing.

    The analyses revealed high accuracy of machine learning models in distinguishing interoceptive from exteroceptive attention. Illustratively, voxels in the posterior cingulate and the middle insula were associated with interoception, whereas voxels in the somatosensory, motor and cortical midline regions were associated with exteroception. We then estimated individuals' moment-to-moment attention when they were instructed to sustain focus on body sensations. We observed promising MABT training effects on interoceptive attention immediately following an audio-guided meditation, showing that individuals new to interoceptive training increased their capacity for interoceptive attention. Although our exploratory analyses did not reveal statistically significant associations between classifier output and clinician-rated interoceptive sensibility, subjective interoceptive awareness or affective symptoms, our study provides a framework for analysing these constructs in more refined future studies.

    While further research is needed to address the limitations of this study, the present findings indicate that interoceptive and exteroceptive attention have distinct neural signatures and that machine learning shows promise for advancing our knowledge of interoceptive processes.

    AUTHOR CONTRIBUTIONS

    Norman A. S. Farb and Cynthia J. Price designed the study and collected data. Zoey X. Zuo and Norman A. S. Farb conducted data analysis. Zoey X. Zuo, Norman A. S. Farb, and Cynthia J. Price wrote the manuscript.

    ACKNOWLEDGEMENTS

    This work was supported by a RIFP award from the School of Nursing at the University of Washington and a grant from the National Center for Advancing Translational Sciences of the National Institutes of Health (UL1 TR002319). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Data analysis infrastructure and trainee expenses were supported by a Natural Sciences and Engineering Research Council of Canada Discovery grant (RGPIN-2015-05901). We thank Natalie Koh and Sophie Xie for their help with coordination and data collection, and Dr. Thomas Grabowski, Dr. Chris Gatenby and Ms. Liza Young at the Integrated Brain Research Center at the University of Washington for their collaboration. We thank Dr. Helen Weng for providing the MATLAB code used for her 2020 paper, which served as a helpful conceptual guide in developing our analysis pipeline. Last but not least, we wish to express our appreciation to the study participants and the MABT therapists, Elizabeth Chaison and Carla Wiechman.

      CONFLICT OF INTEREST STATEMENT

      The authors declare no conflict of interest.

      PEER REVIEW

      The peer review history for this article is available at Web of Science (https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/ejn.16045).

      DATA AVAILABILITY STATEMENT

      All study materials and code are available on the Open Science Framework (https://osf.io/ctqrh/).
