Breast cancer is a heterogeneous disease with variable gray-scale ultrasound appearances that overlap with those of benign lesions, resulting in performance of hundreds of thousands of ultrasound-guided biopsies of benign masses each year [1–5]. The potential harms of false-positive results of breast imaging examinations have garnered much attention, resulting in a drive to improve the specificity of imaging [6, 7]. This goal has led to increased use of functional imaging tools that provide supplementary biologic information to help differentiate benign masses from cancer [8].
Optoacoustic images can provide functional data in real time, spatially fused and temporally interleaved with standard ultrasound images without the need for contrast material injection or ionizing radiation [9]. Optoacoustic imaging entails use of a pulsed laser at long and short optical wavelengths to generate and detect acoustic signals from oxygenated, deoxygenated, and total hemoglobin [10–15]. Rapid cell proliferation in breast cancers causes hypoxia that triggers neovascularity. Optoacoustic images coregistered with conventional ultrasound images leverage this intrinsic tissue contrast to improve diagnostic precision [16–22]. The combination of ultrasound and optoacoustic imaging was previously found to improve the accuracy of classification of benign breast lesions that had suspicious ultrasound features, thus potentially reducing the number of benign biopsies significantly [23–25].
The Pioneer-01 pivotal study was designed to evaluate the safety and effectiveness of a first-generation system performing combined ultrasound and optoacoustic imaging [24]. Optoacoustic imaging, compared with ultrasound, had a 14.9% increase in specificity, from 28.1% to 43.0%, but a 2.6% decrease in sensitivity, from 98.6% to 96.0%. Although feature scores on optoacoustic images were described as objective observations, conversion of the score on optoacoustic images to a probability of malignancy (POM) and BI-RADS category was at each reader's judgment. In addition, features on optoacoustic images did not contribute equally to POM assessments. Sensitivity, specificity, and POM were therefore likely substantially influenced by readers' training and familiarity with feature categorization on optoacoustic images. The 2.6% decrease in sensitivity may have been due in part to radiologists' difficulty integrating features on ultrasound images with those on optoacoustic images. To facilitate radiologists' evaluations, a machine learning–based decision support tool (DST) was developed to help synthesize findings on ultrasound and optoacoustic images and thereby improve radiologists' BI-RADS assessments. The primary goal of implementing a DST was to help better integrate the multitude of features on ultrasound and optoacoustic images and to improve user experience in interpreting these features in real time. We hypothesized that modifications of reader training and implementation of the DST would collectively help address the 2.6% decrease in sensitivity of optoacoustic imaging that was observed in the initial Pioneer-01 clinical study.
Methods
Patient Selection
The Reader-02 study (National Clinical Trials identifier NCT04030104) was a single-arm retrospective multireader study conducted with images previously obtained as part of the Pioneer-01 pivotal study (National Clinical Trials identifier NCT01943916). The Reader-02 study was conducted to evaluate the effect on diagnostic performance of two modifications with respect to the Pioneer-01 study: more extensive reader training and implementation of the DST.
The Pioneer-01 study was a HIPAA-compliant prospective clinical trial that, between December 2012 and September 2015, enrolled women 18 years and older at 16 sites who presented with a solid or complex cystic and solid breast mass assessed with conventional ultrasound as BI-RADS category 3–5. The study sample and trial design have been previously reported [24]. The institutional review boards of all participating institutions approved the study. Participants underwent both gray-scale ultrasound and optoacoustic imaging evaluation of the mass in a single session. A handheld duplex probe was used as both a standalone gray-scale ultrasound transducer and a first-generation optoacoustic imaging device (Imagio, Seno Medical Instruments). Biopsy and/or imaging follow-up of the mass was performed according to the study protocol.
In the Pioneer-01 study, 2105 patients with 2191 masses provided written informed consent. After exclusions, the intent-to-diagnose sample included 1739 patients with 1808 masses. Additional masses were excluded from potential inclusion in the Reader-02 study for the following reasons: additional mass in patients already included in study sample (n = 69); mass used for reader training and proficiency testing (n = 116); mass used for DST training; or mass not biopsied, BI-RADS category 3 assigned at enrollment, and 12-month follow-up gray-scale ultrasound showed an increase in size or BI-RADS category (based on a retrospective review performed by a panel of radiologists as part of the Pioneer-01 study, as previously described [24]) (n = 8). These exclusions resulted in 1615 patients with 1615 masses who were eligible for potential selection for inclusion in the Reader-02 study. All included masses either were classified as benign (including high risk) or malignant on the basis of histologic evaluation or, in the absence of biopsy of a BI-RADS category 3 mass, were classified as benign on the basis of stable size and BI-RADS category at 12-month follow-up.
For the final study sample, eligible patients were selected by means of stratified random sampling to maintain the same distribution of BI-RADS categories and diagnoses as in the Pioneer-01 study. This stratified random sampling was used to construct blocks of 120 masses, each containing 20, 75, and 25 masses with BI-RADS categories of 3, 4, and 5, respectively, and comprising 45 malignant masses and 75 masses classified as benign (72 on the basis of benign histology or 12-month follow-up and three on the basis of high-risk histology). Four such blocks were constructed to provide a total of 480 masses, reflecting the predetermined sample size. The final study sample comprised these 480 patients (mean age, 49.9 years) with 480 masses (Fig. 1).
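The block construction described above can be illustrated with a minimal sketch. It is not the study's actual sampling code: the function name `build_blocks`, the tuple layout, and the seed are hypothetical, and for brevity the sketch stratifies on BI-RADS category only, whereas the study also fixed the number of malignant and benign masses per block.

```python
import random
from collections import defaultdict


def build_blocks(masses, quotas, n_blocks, seed=0):
    """Draw n_blocks disjoint blocks by stratified random sampling.

    masses: list of (mass_id, stratum) pairs (hypothetical layout).
    quotas: dict mapping stratum -> number of masses per block.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for mass_id, stratum in masses:
        by_stratum[stratum].append(mass_id)
    for ids in by_stratum.values():
        rng.shuffle(ids)  # random order within each stratum

    blocks = []
    for b in range(n_blocks):
        block = []
        for stratum, count in quotas.items():
            # Consecutive slices of the shuffled stratum keep blocks disjoint.
            chosen = by_stratum[stratum][b * count:(b + 1) * count]
            if len(chosen) < count:
                raise ValueError(f"not enough masses in stratum {stratum!r}")
            block.extend((mass_id, stratum) for mass_id in chosen)
        rng.shuffle(block)  # randomize reading order within the block
        blocks.append(block)
    return blocks
```

With quotas of `{3: 20, 4: 75, 5: 25}` and four blocks, a sufficiently large pool yields 4 × 120 = 480 masses with no mass appearing twice.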
Design of Reader Study
Ultrasound examinations were independently interpreted by 15 breast radiologists (seven academic, eight nonacademic) with 4–38 years of posttraining experience. The interpretations were conducted between July 30, 2019, and November 3, 2019. The readers were blinded to the reference standard outcome of each mass. Readers reviewed static images and video sweeps of ultrasound images alone and of fused ultrasound and optoacoustic imaging in orthogonal planes. The order of the masses was individually randomized within each block. All 15 readers interpreted the masses in the same order. Before starting the study interpretations, the readers underwent training in scoring and interpretation of features for both ultrasound and optoacoustic imaging, as described in the Supplemental Methods (available in the online supplement).
The readers assessed five features on conventional ultrasound images (peripheral zone, boundary zone vessel, shape, internal texture, and sound transmission) and five features on optoacoustic images (external peripheral radiating vessels, boundary zone vessels, internal vessels, internal hemoglobin, and internal blush). Readers were provided with the following clinical variables: age, breast cancer history, indication for ultrasound, and mass location. Readers were also provided with preultrasound mammograms, if available.
Readers first assigned the mass a POM and BI-RADS category on the basis of review of ultrasound images alone. They were instructed to consider clinical information and any available mammograms when formulating these assessments. During this assessment, readers also measured the maximum diameter and the depth from skin to the posterior margin of each mass. No other ultrasound features were evaluated at this stage. Next, using an electronic form, readers evaluated fused ultrasound and optoacoustic images and recorded scores for the five ultrasound features and five optoacoustic imaging features, assigning scores on integer scales from 0 to 5 or 0 to 6. The DST then displayed a predicted POM, computed with the scores that the reader had just entered. The reader then assigned a final POM and BI-RADS category, considering the DST results. Readers were allowed to assign a final POM different from the DST-predicted POM. Both the DST-predicted POM and reader-assigned final POM, assisted by DST, were recorded.
The reader study design is further described in the Supplemental Methods.
Decision Support Tool
The machine learning–based DST was produced with the eXtreme Gradient Boosting (XGBoost) algorithm and trained with feature score data from seven independent readers in the Pioneer-01 study, using cases distinct from those included in the Reader-02 study [27]. The DST used reader-assigned scores for ultrasound and optoacoustic features as well as patient age, mass size (as measured by the reader), depth to posterior mass wall (as measured by the reader), and, if available, mammographic BI-RADS category (based on the clinical report, as entered by the reader in the DST interface). On the basis of this information, the interface graphically displayed a predicted POM (and its 95% CI), which ranged from 0% to 100%, and mapped this POM to a predicted false-negative rate (FNR) (and its 95% CI). Figure S1 (available in the online supplement) shows screenshots of the DST interface. Development of the DST is further described in the Supplemental Methods.
Statistical Analysis
To reflect the counterbalance between increase in specificity and loss in sensitivity, the primary endpoint of the study was specificity at fixed sensitivity of 98%, and the secondary endpoint was the partial AUC (pAUC) within a range of interest of 95–100% sensitivity. Readers' observed specificity and sensitivity for ultrasound alone and for fused ultrasound and optoacoustic imaging with DST assistance were reported first. Then, model-adjusted specificity for both image sets was reported at 98% sensitivity (i.e., 2% FNR). Generalized estimating equations were used to determine the model-based results. Sample size calculations were based on the primary endpoint of evaluating the differences in specificity at fixed 98% sensitivity. A sample size of 480 masses was estimated to detect a 10% absolute increase in specificity at fixed sensitivity with 80% power in a hypothesis test with a two-sided 5% alpha value among 15 readers, by use of previously observed intrareader and interreader variance estimates. Statistical analysis was performed with SAS software (version 9.4, SAS Institute) for the generalized estimating equations and OR-DBM MRMC 2.51 software (Medical Image Perception Laboratory) [28] for sample size determination and for calculation of specificity at fixed sensitivity of 98%. Additional aspects of the statistical analysis are described in the Supplemental Methods.
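To make the two endpoints concrete, the following is a minimal empirical sketch for a single reader. It is not the model-based multireader analysis used in the study. The function names are hypothetical, a mass is assumed to be called positive when its POM is at or above a threshold, and pAUC is assumed here to be the integral of specificity over the 95–100% sensitivity range, so that its maximum is 0.05.

```python
import math


def specificity_at_sensitivity(malignant_pom, benign_pom, target_sens=0.98):
    """Empirical specificity at the highest POM threshold that still
    calls at least target_sens of the malignant masses positive."""
    m = sorted(malignant_pom)
    # Number of malignant masses that must be called positive.
    need = max(1, math.ceil(target_sens * len(m) - 1e-9))
    threshold = m[len(m) - need]
    # Benign masses below the threshold are true negatives.
    true_negatives = sum(1 for p in benign_pom if p < threshold)
    return true_negatives / len(benign_pom), threshold


def partial_auc(malignant_pom, benign_pom, sens_lo=0.95, sens_hi=1.0, steps=50):
    """Midpoint-rule integral of specificity over a sensitivity range."""
    width = (sens_hi - sens_lo) / steps
    total = 0.0
    for i in range(steps):
        sens = sens_lo + (i + 0.5) * width
        spec, _ = specificity_at_sensitivity(malignant_pom, benign_pom, sens)
        total += spec * width
    return total
```

Because the sensitivity range is only 0.05 wide, a pAUC near 0.024 corresponds to an average specificity of roughly 48% across that range.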
Discussion
Optoacoustic images provide complementary functional information to conventional ultrasound images that is intended to improve breast cancer diagnosis and reduce the rate of false-positive biopsies while maintaining sensitivity. In this study we reassessed previously collected images from the Pioneer-01 study after dedicated reader training in optoacoustic imaging and implementation of a DST. The primary endpoint was met: specificity at fixed sensitivity of 98% was significantly higher for fused ultrasound and optoacoustic imaging with DST assistance (47.2%) than for ultrasound alone (38.2%). These results compare favorably with those of the original Pioneer-01 study with respect to performance of ultrasound alone and of fused ultrasound and optoacoustic imaging [24]. After undergoing didactic and interactive case training, 14 of 15 readers achieved higher specificity at fixed sensitivity using fused ultrasound and optoacoustic imaging with DST assistance without significant loss in sensitivity. The pAUC was higher for fused ultrasound and optoacoustic imaging with DST assistance for all 15 readers, also comparing favorably with the Pioneer-01 study results [28–30].
Changes in observed sensitivity and specificity are difficult to assess because decreases in sensitivity offset increases in specificity. This trade-off is especially relevant at sensitivity of 95% or higher, where very small losses in sensitivity can offset a large gain in specificity. The fixed sensitivity of 98% used in this study corresponds to the generally accepted 2% POM threshold for recommending tissue sampling. The primary endpoint of specificity at fixed sensitivity and the secondary endpoint of pAUC account for the trade-off between sensitivity and specificity. In particular, use of fixed sensitivity indicates that fused ultrasound and optoacoustic imaging with DST support can achieve significantly improved specificity without significant loss in sensitivity. That is, the 9.0% gain in specificity at fixed sensitivity of fused ultrasound and optoacoustic imaging with DST assistance accounts for the 0.4% loss in observed sensitivity of the method. Improved specificity without loss of sensitivity can help reduce the frequency of benign biopsies.
The increase in specificity with supplemental optoacoustic ultrasound in the current study was lower than previously observed for elastography [31, 32]. This comparison likely reflects differences in study design and population, given that studies of elastography included larger numbers of BI-RADS category 2 masses and cysts. The current study showed higher sensitivity for the studied method than the previously reported 80–95% range of sensitivities for elastography [33–35]; such sensitivities may be too low for use of elastography as an adjunctive test for down-classifying masses. Accordingly, elastography is likely to be used clinically for up-classification or for targeted applications, such as differentiating a complicated cyst with echogenic fluid from a fibroadenoma or other solid mass. The design of the current study allowed radiologists to be confident that the gain in specificity with fused ultrasound and optoacoustic imaging with DST assistance outweighed any potential loss of sensitivity when the technique was used for adjunctive diagnosis, leading to comfort in use of the method for down-classification. Additionally, elastography has had marked reduction in sensitivity (possibly to < 80%) in masses measuring 1 cm or smaller [33, 34]. In the current study, fused ultrasound and optoacoustic imaging with DST assistance had observed sensitivity of 96.8% for masses smaller than 1 cm, compared with 98.9% for masses larger than 2 cm. This sensitivity for small masses could be particularly important at institutions that have active supplemental MRI or automated breast ultrasound screening programs, in which small masses are frequently detected and subsequent adjunctive diagnosis is needed.
In the Pioneer-01 study, use of fused ultrasound and optoacoustic images was associated with observed sensitivity loss of 2.6% [24]. In the current study, which incorporated dedicated training and use of a DST, the loss of observed sensitivity was only 0.4%. Moreover, although the improvement in negative likelihood ratio (NLR) between ultrasound alone and fused ultrasound and optoacoustic imaging assisted by DST was not statistically significant in the current study, the mean NLR (0.047; 15 readers) was lower than the mean NLR for fused ultrasound and optoacoustic imaging in the Pioneer-01 study (0.094; seven readers) [24].
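As a rough consistency check, the NLR can be recomputed from sensitivity and specificity, assuming the standard definition NLR = (1 − sensitivity) / specificity, for which lower values are better. The sketch below plugs in the observed Pioneer-01 values for optoacoustic imaging (96.0% and 43.0%) and the mean reader values reported in this article for fused imaging with DST assistance (97.7% and 50.1%). The function name is hypothetical, and a ratio of mean values need not equal the mean of per-reader ratios that the article reports.

```python
def negative_likelihood_ratio(sensitivity, specificity):
    """NLR = (1 - sensitivity) / specificity; lower is better."""
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must be in (0, 1]")
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    return (1.0 - sensitivity) / specificity


pioneer_nlr = negative_likelihood_ratio(0.960, 0.430)
reader02_nlr = negative_likelihood_ratio(0.977, 0.501)
print(round(pioneer_nlr, 3), round(reader02_nlr, 3))  # prints 0.093 0.046
```

Both approximations fall close to the reported means of 0.094 and 0.047, and they show the same direction of change.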
Fused ultrasound and optoacoustic imaging with DST assistance allowed correct down-classification of 28.9% of reads of benign masses that were initially categorized BI-RADS 4A or greater on ultrasound alone to BI-RADS category 2 or 3. Benign histologies that induce physiologic angiogenesis may contribute to false-positive results that persist when optoacoustic imaging is applied. For example, for masses exhibiting inflammation on biopsy, POM scores were higher for fused ultrasound and optoacoustic imaging with DST assistance than for ultrasound alone. The use of a DST cannot overcome the loss of specificity that results from benign physiologic causes associated with new blood vessel development. Hence, such lesions will likely continue to require needle biopsy.
Fused ultrasound and optoacoustic imaging with DST assistance yielded a false-negative read in 1.3% of malignant masses. Cancers with false-negative reads had significantly shorter distances to the nipple; lesions close to the nipple may be affected by nipple or near-field artifacts that prevent optimal detection of optoacoustic signal. Likewise, fused ultrasound and optoacoustic imaging with DST assistance had lower sensitivity for masses located less than 1 cm from the skin than for masses located 1 cm or farther from the skin, possibly reflecting relative undercolorization of superficial lesions due to overcolorization of adjacent skin and subcutaneous tissues rich in vessels [36]. False-negative reads had a nonsignificant association with smaller tumor size; small cancers may undergo minimal angiogenesis and relative deoxygenation, preventing detection with optoacoustic imaging.
The readers' 95.3% overall agreement with DST results indicates a high degree of reader confidence in the DST. This confidence is expected given that the DST relied heavily on reader input. At the decision threshold, the DST had mean sensitivity of 97.0% and mean specificity of 50.9%. By comparison, readers had mean sensitivity of 97.7% and mean specificity of 50.1%. The DST had 44.1% specificity at fixed 98% sensitivity and a pAUC of 0.023, closely aligning with the readers' results for combined ultrasound and optoacoustic imaging with DST assistance (47.2% and 0.024).
The process of using fused ultrasound and optoacoustic imaging to estimate POM requires two steps. The first depends on visual recognition and proper scoring of the ultrasound and optoacoustic imaging features by the radiologist. In the second step, the radiologist uses the machine learning–based DST to objectively and precisely estimate POM from the feature scores. Although the DST has the potential to improve POM prediction and/or FNR if the features are well recognized and correctly scored, it cannot be expected to improve estimation of POM and/or FNR if the ultrasound and/or optoacoustic imaging features are not properly scored. Thus, training in the scoring of the ultrasound and optoacoustic imaging features, as used in this study, is an important step in implementing the new technology.
The improvement in specificity of fused ultrasound and optoacoustic imaging with DST assistance would likely favorably translate to supplemental screening by means of handheld ultrasound. In this study, 37.5% of masses were malignant, whereas the prevalence of malignancy in the general population, and therefore the pretest probability of a positive result of screening ultrasound, is much lower [37]. In populations with low cancer prevalence, among patients who have undergone mammography with negative results, fused ultrasound and optoacoustic imaging with DST assistance would be anticipated to yield even higher specificity than observed in this study.
This study had limitations. The DST was used only to support readers, and in a small fraction of cases, on the basis of their clinical judgment, the readers did not use the DST-predicted POM. In addition, consistent with the reference standard for the Pioneer-01 study, benign masses included masses assessed BI-RADS category 3 that were stable at 12-month follow-up. In comparison, in clinical practice the standard-of-care follow-up for BI-RADS category 3 masses is 2 years. Nonetheless, earlier investigators suggested that 1-year follow-up may be sufficient for probably benign masses, even in patients at high risk [38]. Also, this study did not show that it is feasible to prospectively apply optoacoustic imaging to avoid benign breast biopsies. However, the retrospective design represents typical breast imaging workflow whereby patient demographics, mammographic findings, and ultrasound findings are integrated to determine whether the POM is greater than 2% and thus whether the sonographic lesion warrants tissue sampling. Finally, although the performance of fused ultrasound and optoacoustic imaging with DST assistance was indirectly compared with the performance of fused ultrasound and optoacoustic imaging without DST assistance from the Pioneer-01 study, the performance of fused ultrasound and optoacoustic imaging without DST assistance was not directly assessed in the current study.
Acknowledgments
We thank Mary Hayes, Mary Karst, Su-Ju Lee, Madelyn Lefranc, Jessica Leung, Sharp Malak, Rakesh Parbhu, and Nitin Tanna for their contribution to generating gray-scale and optoacoustic reader study data; Roger Aitchison, for statistical support and manuscript review; Shaan Schaeffer, Seno Medical Instruments, who oversaw the Reader-02 clinical trial; and Thomas Stavros, Seno Medical Instruments, who contributed time and expertise in training the readers and aided with data interpretation and analysis and manuscript review.