Original Research
Cardiothoracic Imaging
July 26, 2023

Lung-PNet: An Automated Deep Learning Model for the Diagnosis of Invasive Adenocarcinoma in Pure Ground-Glass Nodules on Chest CT

Please see the Editorial Comment by Takuma Usuzaki discussing this article.

Abstract

BACKGROUND. Pure ground-glass nodules (pGGNs) on chest CT representing invasive adenocarcinoma (IAC) warrant lobectomy with lymph node resection. For pGGNs representing other entities, close follow-up or sublobar resection without node dissection may be appropriate.
OBJECTIVE. The purpose of this study was to develop and validate an automated deep learning model for differentiation of pGGNs on chest CT representing IAC from those representing atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), and minimally invasive adenocarcinoma (MIA).
METHODS. This retrospective study included 402 patients (283 women, 119 men; mean age, 53.2 years) with a total of 448 pGGNs on noncontrast chest CT that were resected from January 2019 to June 2022 and were histologically diagnosed as AAH (n = 29), AIS (n = 83), MIA (n = 235), or IAC (n = 101). Lung-PNet, a 3D deep learning model, was developed for automatic segmentation and classification (probability of IAC vs other entities) of pGGNs on CT. Nodules resected from January 2019 to December 2021 were randomly allocated to training (n = 327) and internal test (n = 82) sets. Nodules resected from January 2022 to June 2022 formed a holdout test set (n = 39). Segmentation performance was assessed with Dice coefficients with radiologists' manual segmentations as reference. Classification performance was assessed by ROC AUC and precision-recall AUC (PR AUC) and compared with that of four readers (three radiologists, one surgeon). The code used is publicly available (https://github.com/XiaodongZhang-PKUFH/Lung-PNet.git).
RESULTS. In the holdout test set, Dice coefficients for segmentation of IACs and of other lesions were 0.860 and 0.838, and ROC AUC and PR AUC for classification as IAC were 0.911 and 0.842. At threshold probability of 50.0% or greater for prediction of IAC, Lung-PNet had sensitivity, specificity, accuracy, and F1 score of 50.0%, 92.0%, 76.9%, and 60.9% in the holdout test set. In the holdout test set, accuracy and F1 score (p values vs Lung-PNet) for individual readers were as follows: reader 1, 51.3% (p = .02) and 48.6% (p = .008); reader 2, 79.5% (p = .75) and 75.0% (p = .10); reader 3, 66.7% (p = .35) and 68.3% (p < .001); reader 4, 71.8% (p = .48) and 42.1% (p = .18).
CONCLUSION. Lung-PNet had robust performance for segmenting and classifying (IAC vs other entities) pGGNs on chest CT.
CLINICAL IMPACT. This automated deep learning tool may help guide selection of surgical strategies for pGGN management.

Highlights

Key Finding
In the holdout test set, Lung-PNet achieved Dice coefficients for segmentation of pGGNs representing IAC and of pGGNs representing AAH, AIS, or MIA of 0.860 and 0.838. In the holdout test set, Lung-PNet achieved ROC AUC and PR AUC for pGGN classification (IAC vs other entities) of 0.911 and 0.842.
Importance
Lung-PNet is a noninvasive solution for automated segmentation and classification (IAC vs other entities) of pGGNs on chest CT with potential to inform surgical strategies.
Lung cancer is the leading cause of cancer-related death worldwide [1]. The use of chest CT for lung cancer screening has led to an increase in the detection of ground-glass nodules (GGNs) [2]. GGNs may represent a spectrum of pathologic entities, including atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IAC) [3, 4]. The 2021 WHO classification of lung tumors classifies AAH and AIS as adenomatous precursor lesions but classifies MIA and IAC as adenocarcinomas.
GGNs can be divided into pure GGNs (pGGNs) and part-solid nodules. Most pGGNs are AAH, AIS, or MIA. These entities have a 5-year disease-free survival rate after sublobar resection that approaches 100% and low probability of lymph node metastasis [5, 6]. Thus, sublobar resection without lymph node dissection, or even close follow-up, may be an appropriate treatment strategy [7, 8]. In comparison, for patients with IAC and pathologic stage IA disease, the 5-year disease-free survival is approximately 75% [9]. In these patients, lobectomy with lymph node dissection is considered the standard surgical treatment [10], and sublobar resection would not be appropriate. Thus, the ability to identify the small fraction of pGGNs that represent IAC would help guide clinical decision-making and selection of surgical strategies [11, 12].
Radiomic features help predict the pathologic diagnosis and prognosis of pGGNs [11, 13–15]. However, the process of segmenting lesions and extracting features is complex and time-consuming, limiting the routine use of radiomics in clinical practice. Deep learning models help overcome these challenges and have been used for numerous tasks in medical image analysis [16–18]. Compared with traditional radiomics methods, deep learning algorithms can automate lesion segmentation, feature extraction, and classification, enabling rapid and accurate analysis of large amounts of data [19–21].
The aim of this study was to develop and validate an automated end-to-end deep learning model for the differentiation of pGGNs on chest CT representing IAC from those representing AAH, AIS, or MIA.

Methods

Patient Selection

This retrospective study received approval from the institutional review board of Peking University First Hospital, and the requirement for written informed consent was waived. The PACS and EHR were searched from January 2019 to June 2022 for patients with a surgically resected lung nodule that was found at histologic analysis to represent AAH, AIS, MIA, or IAC. This search identified 1861 patients. All of these patients had undergone preoperative chest CT. Patients were initially excluded for the following reasons related to the CT examinations: section thickness greater than 1.5 mm (n = 66) and CT performed more than 2 weeks before surgery (n = 59). Additional patients were excluded for the following reasons related to nodule characteristics: nodule was not a pGGN (i.e., had a solid component) (n = 826) and nodule diameter less than 5 mm or greater than 30 mm (n = 431). Finally, patients were excluded for the following additional clinical reasons: incomplete immunohistochemical assessment of nodule (n = 5); nodule represented a metastatic or recurrent tumor (n = 27); preoperative intervention (e.g., radiation, chemotherapy, or needle biopsy) (n = 45). These exclusions resulted in a final sample of 402 patients (283 women, 119 men; mean age, 53.2 years) with 448 pGGNs. The flow of patient selection is summarized in Figure 1.
Fig. 1 —Flowchart shows steps in patient selection. AAH = atypical adenomatous hyperplasia, AIS = adenocarcinoma in situ, MIA = minimally invasive adenocarcinoma, IAC = invasive adenocarcinoma, pGGNs = pure ground-glass nodules.

CT Examinations

CT examinations were performed without the use of contrast media with scanners from four manufacturers: GE HealthCare, Siemens Healthineers, Philips Healthcare, and Neusoft Medical Systems. The examinations were performed at tube voltages ranging from 100 to 120 kVp, a matrix size of 512 × 512, pitch between 0.95 and 1.375, and reconstruction slice thickness ranging from 0.625 to 1.25 mm. Examinations were performed with patients in the supine position during full inspiration. Scan coverage extended from the lung bases to the thoracic inlet. Table 1 summarizes the CT scanner manufacturers and model names.
TABLE 1: Manufacturers and Models of CT Scanners in Study
Characteristic Training Set (327 Nodules, 294 Patients) Internal Test Set (82 Nodules, 72 Patients) Holdout Test Set (39 Nodules, 36 Patients) All Nodules (448 Nodules, 402 Patients) p
Manufacturer         .39
GE HealthCare 90 (30.6) 19 (26.4) 17 (47.2) 126 (31.3)  
Neusoft Medical Systems 3 (1.0) 0 (0.0) 0 (0.0) 3 (0.7)  
Philips Healthcare 106 (36.1) 23 (31.9) 13 (36.1) 142 (35.3)  
Siemens Healthineers 95 (32.3) 30 (41.7) 6 (16.7) 131 (32.6)  
Model         .59
Discovery CT (GE HealthCare) 1 (0.3) 0 (0.0) 0 (0.0) 1 (0.2)  
Discovery CT750 HD (GE HealthCare) 66 (22.4) 14 (19.4) 16 (44.4) 96 (23.9)  
iCT 256 (Philips Healthcare) 106 (36.1) 23 (31.9) 13 (36.1) 142 (35.3)  
LightSpeed VCT (GE HealthCare) 21 (7.1) 4 (5.6) 0 (0.0) 25 (6.2)  
NeuViz Prime (Neusoft Medical Systems) 3 (1.0) 0 (0.0) 0 (0.0) 3 (0.7)  
Optima CT680 Expert (GE HealthCare) 2 (0.7) 1 (1.4) 1 (2.8) 4 (1.0)  
Somatom Definition Flash (Siemens Healthineers) 94 (32.0) 29 (40.3) 6 (16.7) 129 (32.1)  
Somatom Force (Siemens Healthineers) 1 (0.3) 1 (1.4) 0 (0) 2 (0.5)  

Note—Data are number of patients, with percentage in parentheses.

Pathologic Analysis

After surgical resection, specimens were fixed with formalin, and hematoxylin and eosin staining and immunohistochemical analysis were performed. The pathologic analyses were performed clinically by two pathologists working jointly (nonauthors with 20 and 25 years of posttraining experience) using a multiheaded microscope. Each pGGN was classified as AAH, AIS, MIA, or IAC on the basis of the 2021 WHO Classification of Tumours Editorial Board guidelines for thoracic tumors [22]. The current investigation used the diagnoses from the clinical pathologic reports; the original slides were not rereviewed for this study.
In patients with multiple nodules, the resected nodules were routinely subjected to gene sequencing. In all instances, this genetic testing indicated that the multiple nodules in an individual patient exhibited no homology or correlation. Therefore, for all further analyses, nodules were considered independent, without consideration of intraindividual clustering effects.

Development of Deep Learning Model

Overview—The 3D deep learning neural network model developed in this study was labeled the Lung-PNet model. Model training and development were performed with nodules evaluated from January 2019 to December 2021. The nodules were divided randomly by 4:1 split into a training set and an internal test set; for this division, distinct nodules in individual patients were not required to be assigned to the same set. Nodules evaluated from January 2022 to June 2022 formed a holdout test set for assessing the performance of the model after completion of model training; no patients in the holdout test set overlapped with the training or internal test sets.
During the training stage, a fivefold cross-validation approach was adopted to maximize the training capabilities of the model. Specifically, five models were trained by use of different input data configurations: for each fold, a model was trained on four folds of the training data, and the fifth fold was used for validation. This process resulted in five classification models for evaluation. During the testing stage, the mean of the five fold models' predictions was used as the final prediction. The source code for the model is publicly available (github.com/XiaodongZhang-PKUFH/Lung-PNet.git).
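To make the ensembling step concrete, the following minimal Python sketch shows how fold-averaged inference could be implemented in PyTorch; the function and variable names are illustrative assumptions and are not taken from the published Lung-PNet code.

```python
import torch

@torch.no_grad()
def predict_iac_probability(fold_models, nodule_volume):
    """Average the IAC probability predicted by the five fold models.

    fold_models: list of trained classification models (one per fold).
    nodule_volume: preprocessed input tensor of shape (C, D, H, W).
    """
    probs = []
    for model in fold_models:
        model.eval()
        logits = model(nodule_volume.unsqueeze(0))          # shape: (1, 2)
        probs.append(torch.softmax(logits, dim=1)[0, 1])    # P(IAC) for this fold
    return torch.stack(probs).mean().item()                 # mean across folds
```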
Nodule annotation—For nodules in all three sets, a thoracic radiologist (K.Z., 10 years of posttraining experience) and a thoracic surgeon (H.L., 25 years of posttraining experience) manually traced volumes of interest (VOIs) along the boundaries of pGGNs on CT images in a slice-by-slice manner using software (ITK-SNAP version 3.6.0) [23] and a consensus process. These investigators were informed of the location of resected nodules but were blinded to other clinical and pathologic information. The VOIs covered the entire nodule in three dimensions while excluding large blood vessels and bronchial structures in the surrounding region. The VOIs were processed by custom software to calculate nodule diameter as the maximal 3D diameter of the nodule in any plane and nodule volume as the product of the number of voxels within the VOI and the voxel size.
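As an illustration of these derived measurements, the following Python sketch computes nodule volume and maximal 3D diameter from a binary VOI mask and the voxel spacing; it is a minimal approximation with hypothetical function names, not the authors' custom software.

```python
import numpy as np
from scipy.spatial.distance import pdist

def nodule_volume_ml(voi_mask, spacing_mm):
    """Volume = number of voxels within the VOI x voxel size (mm^3 converted to mL)."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(voi_mask.sum()) * voxel_mm3 / 1000.0

def max_3d_diameter_mm(voi_mask, spacing_mm):
    """Maximal 3D diameter = largest pairwise distance between VOI voxel centers.
    For large VOIs, restricting to surface voxels would reduce computation."""
    coords = np.argwhere(voi_mask) * np.asarray(spacing_mm, dtype=float)
    return float(pdist(coords).max()) if len(coords) > 1 else 0.0
```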
Image normalization—After nodule annotation and before further model training, several image preprocessing steps were performed, including image normalization with stochastic window normalization [24], VOI dilation, and image cropping, as described in Supplemental Methods.
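The exact preprocessing settings are given in the Supplemental Methods. As a hedged illustration of stochastic window normalization in the spirit of reference [24], the sketch below randomly samples a CT window, clips the HU values, and rescales them to [0, 1]; the window ranges shown are assumptions, not the study's settings.

```python
import numpy as np

def stochastic_window_normalize(ct_hu, center_range=(-700, -500), width_range=(1200, 1600)):
    """Randomly sample a lung-type window, clip HU values to it, and rescale to [0, 1].
    The center and width ranges here are illustrative assumptions."""
    center = np.random.uniform(*center_range)
    width = np.random.uniform(*width_range)
    lo, hi = center - width / 2.0, center + width / 2.0
    return (np.clip(ct_hu, lo, hi) - lo) / (hi - lo)
```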
Model architecture—The Lung-PNet framework was implemented with PyTorch (version 1.4.0) [25], fastai (version 2.1.10) [26], and faimed3d, running on an NVIDIA Tesla V100 GPU. The framework consisted of three key pipelines (one upstream task [image restoration] and two downstream tasks [nodule segmentation and nodule classification]). The three pipelines included custom implementations, each comprising three functional modules: an input module, a shared encoder module, and a decoder module. The image restoration pipeline and the segmentation pipeline had a shared encoder-decoder architecture, and all three pipelines (image restoration, segmentation, and classification) had a shared encoder architecture.
For the image restoration and segmentation pipelines, the shared encoder module was based on the R3D-18 model for feature extraction [27, 28], and the shared decoder module was a 3D-optimized modified version of the 2D U-Net [29, 30] segmentation model connected to the encoder module via skip connections. The image classification pipeline was built with the R3D-18 body architecture from the image restoration and segmentation pipelines as the shared encoder module. The ClassifierHead of the image classification pipeline was designed to comprise two concatenated 3D adaptive pooling layers (AdaptiveMaxPool3D and AdaptiveAvgPool3D) followed by fully connected linear layers that output classification probabilities for specified categories (Fig. 2).
Fig. 2A —Architecture framework of Lung-PNet model.
A, Chart shows three-phase pipeline featuring one upstream task (image restoration) and two downstream tasks (nodule segmentation and classification). Each pipeline contains input, encoder, and decoder modules. Encoder module is shared among pipelines. Initial parameters are transferred from pretrained weights of upstream task to downstream tasks. Figure generated with PlotNeuralNet. SSL = self-supervised learning, IAC = invasive adenocarcinoma.
Fig. 2B —Architecture framework of Lung-PNet model.
B, Chart shows key to symbols in A. NN = neural network, SWN = stochastic window normalization.
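A minimal PyTorch sketch of a classification head of the kind described, with concatenated AdaptiveMaxPool3d and AdaptiveAvgPool3d layers followed by fully connected layers, is shown below; the layer widths and dropout rate are illustrative assumptions rather than the published configuration.

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Concatenated 3D max/avg pooling followed by fully connected layers.
    Hidden width and dropout are illustrative assumptions."""
    def __init__(self, in_channels=512, num_classes=2):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool3d(1)
        self.avg_pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2 * in_channels, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),  # logits for IAC vs other entities (softmax downstream)
        )

    def forward(self, features):  # features: (N, C, D, H, W) from the shared encoder
        pooled = torch.cat([self.max_pool(features), self.avg_pool(features)], dim=1)
        return self.fc(pooled)
```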
A three-step training workflow was used. In the first step, the image restoration pipeline was trained in a 3D self-supervised learning manner [31] with randomly cropped VOIs from two open-source CT datasets: LUNA16 [32] and the Medical Segmentation Decathlon [33]. This use of 3D image transformations for transfer learning in model initialization was intended to improve model accuracy and precision, particularly compared with traditional models constructed de novo. Once the image restoration pipeline was trained, the shared modules were transferred to the downstream pipelines and fine-tuned. Specifically, in the second step, the shared encoder-decoder module of the pretrained image restoration pipeline was transferred to the downstream segmentation pipeline and fine-tuned to boost performance. In the third step, the shared encoder module alone was transferred to the downstream classification pipeline and fine-tuned to improve performance. In the classification pipeline, the cropped pGGN VOI was used as the input, and the learned nodule features were used to improve classification performance.
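The weight-transfer step can be sketched in PyTorch as follows; the checkpoint file name is hypothetical, and the torchvision R3D-18 constructor stands in for the shared encoder described above rather than reproducing the Lung-PNet implementation.

```python
import torch
from torchvision.models.video import r3d_18

# Shared encoder body (R3D-18); in the study the weights would come from the
# self-supervised image restoration pretraining rather than from scratch.
encoder = r3d_18(weights=None)

# Hypothetical checkpoint produced by the upstream pretraining step.
pretrained_state = torch.load("restoration_encoder.pth", map_location="cpu")

# Copy the parameters whose names and shapes match the encoder, then fine-tune;
# strict=False tolerates the missing/extra decoder and head layers.
encoder.load_state_dict(pretrained_state, strict=False)
```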
The final Lung-PNet model took as input a cropped region containing the pGGN on the basis of a tight bounding box placed by a radiologist. The model then identified and segmented the pGGN and reported the probability that the pGGN represented IAC on a scale from 0% to 100%. For evaluation of Lung-PNet in the internal and holdout test sets in the current study, the bounding boxes were placed by the previously noted radiologist (K.Z.).

Evaluation of Model Segmentation Performance

The segmentation performance of the model was evaluated by use of the Dice coefficient, segmentation precision, and segmentation recall. The segmentations performed by the two previously noted investigators served as the reference standard for these metrics. The Dice coefficient was computed as 2 × (|X ∩ Y|) / (|X| + |Y|) (i.e., overall similarity between the two segmentations). Segmentation precision was computed as |X ∩ Y| / |X| (i.e., fraction of correctly segmented voxels among all voxels in the model's segmentation). Segmentation recall was computed as |X ∩ Y| / |Y| (i.e., percentage of correctly segmented voxels among all voxels in the reference-standard segmentation). In these formulas, X represents the model's segmentation, Y represents the reference-standard segmentation, |·| indicates the number of voxels in a segmentation, and ∩ indicates the voxels present in both segmentations.
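A minimal NumPy sketch of these three metrics, computed from boolean voxel masks, follows; X corresponds to the model's segmentation and Y to the reference-standard segmentation.

```python
import numpy as np

def segmentation_metrics(pred_mask, ref_mask):
    """Dice coefficient, precision, and recall between a model segmentation X
    and a reference-standard segmentation Y, both given as boolean voxel arrays."""
    X, Y = pred_mask.astype(bool), ref_mask.astype(bool)
    intersection = np.logical_and(X, Y).sum()
    dice = 2.0 * intersection / (X.sum() + Y.sum())
    precision = intersection / X.sum()   # fraction of model voxels that are correct
    recall = intersection / Y.sum()      # fraction of reference voxels recovered
    return dice, precision, recall
```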

Human Observer Study

An observer study was performed by four human readers, including three radiologists (K.Z. [reader 1], the previously noted radiologist; J.W. [reader 2], 11 years of posttraining experience in chest CT interpretation; and J.L. [reader 3], 15 years of posttraining experience in chest CT interpretation) and one surgeon (W.H. [reader 4], 20 years of posttraining experience in management of thoracic surgical disease). The four readers independently reviewed all CT examinations in the internal and holdout test sets in a single combined session. They were aware of the location of resected nodules but blinded to clinical and pathologic details. The readers classified each nodule in a binary manner as representing IAC or as representing a diagnosis other than IAC (e.g., benign nodule, AAH, AIS, or MIA), according to NCCN guidelines for the evaluation of pulmonary nodules on screening CT scans [34].

Assessment of Class Activation Maps

The Lung-PNet model output class activation maps that indicated the areas of the CT images that the model considered important in nodule classification. These maps were generated by means of the Grad-CAM technique [35], which highlights the regions of an input image that contribute most to a model's decision. The Grad-CAM process involves capturing the gradients of the output class (in this case, IAC) with respect to the feature maps of the final convolutional layer within the model architecture. These gradients are globally averaged to yield weights that reflect the importance of each feature map in contributing to the particular output. Subsequently, a linear combination of the feature maps and their associated weights is performed, resulting in a coarse heat map (i.e., the class activation map). For the present investigation, this heat map represented the areas within the CT images that the Lung-PNet model perceived as most influential in decisions regarding the output IAC class. A radiologist (Y.Z., 15 years of posttraining experience in chest CT interpretation) qualitatively compared the heat maps between pGGNs representing IACs and those representing other entities.
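As a hedged sketch of how a Grad-CAM map can be produced for a 3D classification network, the following Python function registers hooks on an assumed final convolutional layer, weights the feature maps by their globally averaged gradients, and upsamples the result to the input volume size; it illustrates the general technique rather than the exact Lung-PNet implementation.

```python
import torch
import torch.nn.functional as F

def grad_cam_3d(model, target_layer, volume, target_class=1):
    """Coarse 3D class activation map via Grad-CAM for the IAC class.
    `target_layer` is assumed to be the final 3D convolutional layer of the model."""
    activations, gradients = {}, {}

    def fwd_hook(_module, _inputs, output):
        activations["value"] = output

    def bwd_hook(_module, _grad_in, grad_out):
        gradients["value"] = grad_out[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        logits = model(volume.unsqueeze(0))            # (1, num_classes)
        model.zero_grad()
        logits[0, target_class].backward()             # gradients of the IAC output
        acts, grads = activations["value"], gradients["value"]
        weights = grads.mean(dim=(2, 3, 4), keepdim=True)   # global average of gradients
        cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=volume.shape[-3:],
                            mode="trilinear", align_corners=False)
        return (cam / (cam.max() + 1e-8)).squeeze()    # normalized heat map
    finally:
        h1.remove()
        h2.remove()
```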

Statistical Analysis

Patient and nodule characteristics were summarized in the entire sample and separately in the training set, internal test set, and holdout test set. Normally distributed continuous variables were reported as mean ± SD, and nonnormally distributed variables were presented as median with minimum and maximum. Categoric variables were expressed as frequencies. The normality assumption was assessed by Kolmogorov-Smirnov test. Characteristics were compared among groups using the chi-square test, Fisher exact test, Mann-Whitney U test, and Kruskal-Wallis test. The Dice coefficient, segmentation precision, and segmentation recall metrics were summarized separately for pGGNs representing IACs and for pGGNs representing other entities (AAH, AIS, or MIA) in both the internal test set and the holdout test set. The performance of the model for classifying pGGNs as IAC was assessed in the internal test set and the holdout test set by use of ROC AUC (i.e., AUC for plot of sensitivity versus 1 minus specificity) and PR AUC (i.e., AUC for plot of precision versus recall) [36]. PR AUC has an advantage over ROC AUC for indicating differences between models in the setting of datasets with imbalances in class frequencies.
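The two AUC metrics can be computed with scikit-learn as in the brief sketch below; this is an illustrative Python equivalent, whereas the study's analyses were performed in R.

```python
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

def classification_aucs(y_true, y_prob):
    """ROC AUC and PR AUC for predicted IAC probabilities.
    PR AUC is the area under the precision-recall curve."""
    roc_auc = roc_auc_score(y_true, y_prob)
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    pr_auc = auc(recall, precision)
    return roc_auc, pr_auc
```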
The model's predictions of the presence of IAC were converted to a binary classification by means of an a priori selected threshold probability of 50.0% or greater. Sensitivity [true-positives / (true-positives + false-negatives)], specificity [true-negatives / (true-negatives + false-positives)], accuracy [(true-positives + true-negatives) / (true-positives + true-negatives + false-positives + false-negatives)], and F1 score {true-positives / [true-positives + 0.5 × (false-positives + false-negatives)]} for prediction of IAC were computed in the internal test set and holdout test set for the four human observers and for the Lung-PNet model. Accuracy and F1 score were compared between each human observer and the model by McNemar test. Fleiss kappa coefficients were used to evaluate the agreement of the four observers' assessments and were classified as follows: less than 0.200, poor agreement; 0.200–0.399, fair agreement; 0.400–0.599, moderate agreement; 0.600–0.799, substantial agreement; 0.800 or greater, almost perfect agreement. A value of p less than .05 was considered statistically significant. Statistical analyses were conducted with R software (version 4.1.3).
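A Python sketch of the thresholded performance metrics and of a paired McNemar comparison of per-nodule correctness between a reader and the model follows; as above, this illustrates the calculations in Python, whereas the study used R, and the statsmodels call is one reasonable implementation rather than the authors' code.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, accuracy, and F1 score at the 50% probability threshold."""
    y_pred = np.asarray(y_prob) >= threshold
    y_true = np.asarray(y_true).astype(bool)
    tp = np.sum(y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = tp / (tp + 0.5 * (fp + fn))
    return sensitivity, specificity, accuracy, f1

def compare_correctness_mcnemar(correct_reader, correct_model):
    """McNemar test on paired per-nodule correctness (reader vs model)."""
    r = np.asarray(correct_reader, dtype=bool)
    m = np.asarray(correct_model, dtype=bool)
    table = [[np.sum(r & m), np.sum(r & ~m)],
             [np.sum(~r & m), np.sum(~r & ~m)]]
    return mcnemar(table, exact=True).pvalue
```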

Results

Patient and Nodule Characteristics

Table 2 summarizes characteristics of patients and nodules in the sample overall and in the various subsets. The final sample of 402 patients had 448 surgically resected pGGNs. Of the 402 patients, 42 had multiple pGGNs (39 patients with two pGGNs; three patients with three pGGNs). Of the 402 patients, 210 (52.2%) underwent wedge resection, 151 (37.6%) underwent segmental resection, and 41 (10.2%) underwent lobectomy. Of the 448 nodules, 29 (6.5%) were AAH, 83 (18.5%) were AIS, 235 (52.5%) were MIA, and 101 (22.5%) were IAC. Of 101 IACs, 34.7% were treated with wedge resection, 46.5% with segmental resection, and 18.8% with lobectomy. The training set comprised 327 pGGNs, the internal test set 82 pGGNs, and the holdout test set 39 pGGNs; 17 patients overlapped between the training set and internal test set. The three sets exhibited no significant differences in age, sex, or pathologic diagnosis (all p > .05).
TABLE 2: Clinical Characteristics of Study Sample
Characteristic Training Set (327 Nodules, 294 Patients) Internal Test Set (82 Nodules, 72 Patients) Holdout Test Set (39 Nodules, 36 Patients) All Patients (448 Nodules, 402 Patients) pa
Age (y)          
Mean ± SD 52.8 ± 11.3 54.3 ± 11.5 54.7 ± 11.8 53.2 ± 11.4 .78
Medianb 54.0 (18.0, 83.0) 54.0 (31.0, 82.0) 55.0 (27.0, 74.0) 54.0 (18.0, 83.0)  
Sex         .99
Female 207 (70.4) 51 (70.8) 25 (69.4) 283 (70.4)  
Male 87 (29.6) 21 (29.2) 11 (30.6) 119 (29.6)  
Pathology         .17
AAH 23 (7.0) 6 (7.3) 0 (0) 29 (6.5)  
AIS 65 (19.9) 15 (18.3) 3 (7.7) 83 (18.5)  
IAC 70 (21.4) 17 (20.7) 14 (35.9) 101 (22.5)  
MIA 169 (51.7) 44 (53.7) 22 (56.4) 235 (52.5)  
Pathology (binary classification)         .11
IAC 70 (21.4) 17 (20.7) 14 (35.9) 101 (22.5)  
Other diagnosis 257 (78.6) 65 (79.3) 25 (64.1) 347 (77.5)  
Nodule diameter (mm)          
Mean ± SD 12.0 ± 4.7 12.0 ± 6.6 12.2 ± 4.0 12.0 ± 5.01 .95
Medianb 11.0 (5.0, 35.0) 10.0 (4.0, 52.0) 11.0 (6.0, 20.0) 11.0 (4.0, 52.0)  
Nodule volume (mL)          
Mean ± SD 0.755 ± 0.914 0.715 ± 0.922 0.734 ± 0.630 0.746 ± 0.893 .93
Medianb 0.440 (0.070, 7.13) 0.355 (0.070, 5.56) 0.520 (0.150, 2.38) 0.430 (0.070, 7.13)  
Nodule location         .31
Left lower lobe 48 (14.7) 9 (11.0) 8 (20.5) 65 (14.5)  
Left upper lobe 71 (21.7) 22 (26.8) 8 (20.5) 101 (22.5)  
Right lower lobe 59 (18.0) 18 (22.0) 2 (5.1) 79 (17.6)  
Right middle lobe 26 (8.0) 7 (8.5) 2 (5.1) 35 (7.8)  
Right upper lobe 123 (37.6) 26 (31.7) 19 (48.7) 168 (37.5)  
Surgical approach         .71
Lobectomy 36 (11.0) 7 (8.5) 3 (7.7) 46 (10.3)  
Segmentectomy 121 (37.0) 27 (32.9) 17 (43.6) 165 (36.8)  
Wedge resection 170 (52.0) 48 (58.5) 19 (48.7) 237 (52.9)  

Note—Unless otherwise indicated, data are count (number of patients for sex; otherwise, number of nodules) with percentage in parentheses. AAH = atypical adenomatous hyperplasia, AIS = adenocarcinoma in situ, IAC = invasive adenocarcinoma, MIA = minimally invasive adenocarcinoma.

a
For comparison of three groups (training set, internal test set, and holdout test set).
b
Values in parentheses are minimum and maximum.

Segmentation Performance

Table 3 shows the metrics summarizing the segmentation performance of the model in the internal test set and holdout test set for IAC and for other lesions. In the holdout test set, the Dice coefficient, segmentation precision, and segmentation recall for IAC were 0.860, 0.869, and 0.847; and for other lesions were 0.838, 0.834, and 0.881. These metrics were not significantly different between IAC and other lesions in either the internal test set or the holdout test set (all p > .05).
TABLE 3: Segmentation Performance of Lung-PNet Model
Metric, Set, and Pathologic Diagnosis Value p
Dice coefficient    
Internal test set   .07
IAC 0.868 (0.848–0.900)  
Other 0.851 (0.823–0.869)  
Holdout test set   .57
IAC 0.860 (0.814–0.887)  
Other 0.838 (0.825–0.876)  
Segmentation precision    
Internal test set   .07
IAC 0.885 (0.856–0.915)  
Other 0.862 (0.799–0.903)  
Holdout test set   .08
IAC 0.869 (0.817–0.921)  
Other 0.834 (0.748–0.878)  
Segmentation recall    
Internal test set   .99
IAC 0.807 (0.706–0.834)  
Other 0.812 (0.705–0.852)  
Holdout test set   .08
IAC 0.847 (0.805–0.874)  
Other 0.881 (0.847–0.925)  

Note—Sample included 17 invasive adenocarcinomas (IACs) and 75 other lesions in internal test set and 14 IACs and 25 other lesions in holdout test set. Values in parentheses are 95% CIs.

Classification Performance

Figure 3 shows ROC curves for the classification performance of the Lung-PNet model in the training set, internal test set, and holdout test set. Tables 4 and 5 summarize the classification performance of Lung-PNet and the human observers in the internal test set and holdout test set. For prediction of IAC, Lung-PNet achieved an ROC AUC and PR AUC of 0.885 (95% CI, 0.845–0.917) and 0.712 (95% CI, 0.596–0.806) in the training set, 0.925 (95% CI, 0.845–0.971) and 0.810 (95% CI, 0.559–0.935) in the internal test set, and 0.911 (95% CI, 0.776–0.978) and 0.842 (95% CI, 0.559–0.957) in the holdout test set. At a threshold probability of 50.0% or greater for prediction of IAC, Lung-PNet had sensitivity, specificity, accuracy, and F1 score of 58.8%, 96.9%, 89.0%, and 69.0% in the internal test set and 50.0%, 92.0%, 76.9%, and 60.9% in the holdout test set.
Fig. 3 —Graph shows ROC AUCs of Lung-PNet model for predicting diagnosis of invasive adenocarcinoma in training, internal test, and holdout test sets. CV = cross-validation.
TABLE 4: Performance of Human Observers and of Lung-PNet Model in Classifying Nodules as Invasive Adenocarcinoma in Internal Test Set (82 Nodules)
Reader Sensitivity Specificity Accuracy pa F1 Score pb
1 70.6 (12/17) 72.3 (47/65) 72.0 (59/82) .003 51.1 (12/23.5) .001
2 76.5 (13/17) 78.5 (51/65) 78.0 (64/82) .0495 59.1 (13/22.0) .005
3 70.6 (12/17) 63.1 (41/65) 64.6 (53/82) < .001 45.3 (12/26.5) < .001
4 29.4 (5/17) 96.9 (63/65) 82.9 (68/82) .17 41.7 (5/12.0) .10
Lung-PNet 58.8 (10/17) 96.9 (63/65) 89.0 (73/82) NA 69.0 (10/14.5) NA

Note—Data are percentages with numbers of nodules in parentheses. NA = not applicable.

a
Comparison of accuracy with Lung-PNet model.
b
Comparison of F1 score with Lung-PNet model.
TABLE 5: Performance of Human Observers and of Lung-PNet Model in Classifying Nodules as Invasive Adenocarcinoma in Holdout Test Set (39 Nodules)
Reader Sensitivity Specificity Accuracy pa F1 Score pb
1 64.3 (9/14) 44.0 (11/25) 51.3 (20/39) .02 48.6 (9/18.5) .008
2 85.7 (12/14) 76.0 (19/25) 79.5 (31/39) .75 75.0 (12/16.0) .10
3 100.0 (14/14) 48.0 (12/25) 66.7 (26/39) .35 68.3 (14/20.5) < .001
4 28.6 (4/14) 96.0 (24/25) 71.8 (28/39) .48 42.1 (4/9.5) .18
Lung-PNet 50.0 (7/14) 92.0 (23/25) 76.9 (30/39) NA 60.9 (7/11.5) NA

Note—Data are percentages with numbers of nodules in parentheses. NA = not applicable.

a
Comparison of accuracy with Lung-PNet model.
b
Comparison of F1 score with Lung-PNet model.
Figure 4 shows two representative pGGNs, one histologically diagnosed as IAC and one histologically diagnosed as MIA. The nodules had similar visual appearance on CT but were both correctly classified by the Lung-PNet model on the basis of the applied threshold.
Fig. 4A —Pure ground-glass nodules (pGGNs) with similar appearance on CT.
A, 30-year-old woman with pGGN. Axial thin-section CT image shows right upper lobe pGGN (arrow). Histologic assessment after sublobar resection revealed nodule to represent invasive adenocarcinoma (IAC). Lung-PNet model yielded predicted 80.1% probability of IAC for nodule.
Fig. 4B —Pure ground-glass nodules (pGGNs) with similar appearance on CT.
B, 58-year-old woman with pGGN. Axial thin-section CT image shows right lower lobe pGGN (arrow). Histologic assessment after sublobar resection revealed nodule to represent minimally invasive adenocarcinoma. Lung-PNet model yielded 28.8% predicted probability of IAC for nodule.

Observer Study

In the observer study, accuracy and F1 score (with p values for comparison with the performance of Lung-PNet) in the internal test set were as follows: reader 1, 72.0% (p = .003) and 51.1% (p = .001); reader 2, 78.0% (p = .0495) and 59.1% (p = .005); reader 3, 64.6% (p < .001) and 45.3% (p < .001); and reader 4, 82.9% (p = .17) and 41.7% (p = .10). In the holdout test set, accuracy and F1 score (with p values for comparison with the performance of Lung-PNet) were as follows: reader 1, 51.3% (p = .02) and 48.6% (p = .008); reader 2, 79.5% (p = .75) and 75.0% (p = .10); reader 3, 66.7% (p = .35) and 68.3% (p < .001); and reader 4, 71.8% (p = .48) and 42.1% (p = .18). Agreement among the four observers for pGGN classification as IAC was fair (κ = 0.367).

Class Activation Maps

Figure 5 shows examples of the class activation maps generated by means of the Grad-CAM technique. The maps depicted differences between IACs and other lesions in terms of the areas of the images receiving greatest attention by the model. For IACs, the model's attention was focused on perinodular regions, whereas for other lesions, the model's attention was focused on intranodular regions.
Fig. 5A —Gradient-weighted class activation maps (Grad-CAM) showing areas of images that were most influential in Lung-PNet model's classifications.
A, 52-year-old man with pure ground-glass nodule (pGGN). Axial chest CT image shows pGGN in right upper lobe (red). Histologic assessment of surgically resected nodule revealed invasive adenocarcinoma.
Fig. 5B —Gradient-weighted class activation maps (Grad-CAM) showing areas of images that were most influential in Lung-PNet model's classifications.
B, 52-year-old man (same patient as in A). Sequential cropped CT images show radiologist-segmented region containing pGGN.
Fig. 5C —Gradient-weighted class activation maps (Grad-CAM) showing areas of images that were most influential in Lung-PNet model's classifications.
C, 52-year-old man (same patient as in A). Grad-CAM maps show areas of images that were most important in model's determination.
Fig. 5D —Gradient-weighted class activation maps (Grad-CAM) showing areas of images that were most influential in Lung-PNet model's classifications.
D, 63-year-old woman with pGGN. Axial chest CT image shows pGGN in right upper lobe (red). Histologic assessment of surgically resected nodule revealed minimally invasive adenocarcinoma.
Fig. 5E —Gradient-weighted class activation maps (Grad-CAM) showing areas of images that were most influential in Lung-PNet model's classifications.
E, 63-year-old woman (same patient as in D). Sequential cropped CT images show radiologist-segmented region containing pGGN.
Fig. 5F —Gradient-weighted class activation maps (Grad-CAM) showing areas of images that were most influential in Lung-PNet model's classifications.
F, 63-year-old woman (same patient as in D). Grad-CAM maps show areas of images that were most important in model's determination.

Discussion

In this study, we developed an end-to-end deep learning model to differentiate pGGNs on noncontrast CT images that represent IAC from those that represent AAH, AIS, or MIA. The model, Lung-PNet, performed favorably overall in comparison with human readers in this task. These results show the potential of the model as a noninvasive tool for predicting the pathologic diagnosis of pGGNs and thereby guiding the selection of surgical strategies.
Although sublobar resection without lymph node dissection is suitable for patients with AAH, AIS, or MIA, patients with IAC generally need more extensive surgery by lobectomy [8, 10]. Results of intraoperative frozen section analysis are often inconclusive, and postoperative immunohistochemical analysis is needed to establish the final pathologic diagnosis. Moreover, inadequate or excessive lung tissue resection can put patients at risk, indicating the potential benefit of accurate preoperative prediction of pathologic diagnosis for pGGNs.
In prior studies, morphologic characteristics and radiomics techniques have been used to differentiate pGGNs representing AAH, AIS, MIA, and IAC. Morphologic features, such as heterogeneity, lobulated margins, spiculation, coarse margin, bubble lucencies, air bronchogram, pleural indentation, and dilated or distorted vessels, have been found to assist in this characterization [37–40]. For instance, Qi et al. [40] found that morphologic features had an AUC of 0.865 in differentiating IAC from non-IAC pathology in 255 pGGNs. However, pGGNs have intricate internal texture, and morphologic features based on visual assessment may reflect only a small portion of the information on images. Moreover, the assessment of these morphologic features can be biased by radiologists' subjective judgment.
Radiomics is a quantitative image analysis approach that addresses the limitations of traditional morphologic features by quantifying nodule characteristics to establish a relation between images and histology. In comparison with morphologic features, radiomics entails a quantitative features analysis that enables exploration of high-dimensional information. For instance, Hwang et al. [13] found in a study of 66 pGGNs that CT texture features, such as higher entropy and lower homogeneity, were significant distinguishing factors of IAC (AUC, 0.962; 95% CI, 0.883–0.994). Similarly, Xu et al. [41] developed radiomics models that had good predictive power for the differentiation of IAC from AIS or MIA in pGGNs (AUC, 0.833; 95% CI, 0.733–0.934). In another study, Sun et al. [11] developed a radiomics-based nomogram that served as a noninvasive marker for the assessment of invasiveness of pGGNs (AUC, 0.72; 95% CI, 0.63–0.81). Additionally, Jiang et al. [14] developed and validated a radiomics signature to identify IAC and MIA among pGGNs with pleural contact. Their approach involved extraction of 106 radiomics features, which had good discriminative performance (AUC, 0.862; sensitivity, 0.625; specificity, 0.800).
Although radiomics has advantages over traditional morphologic features, radiomic features may represent only low-order visual features or a small number of simple high-order features. Moreover, radiomics analysis entails time-consuming processes for segmentation and feature extraction. As traditionally performed, these processes are fixed across patients, lacking fine-tuning or other customization, thereby potentially disregarding important individual patient differences. Traditional radiomics methods have additional pitfalls, such as potential disregard of information near boundaries of segmented lesions. For such reasons, application of radiomics as a standard diagnostic tool in clinical practice has been challenging.
In contrast, deep learning algorithms can automate segmentation, feature extraction, and classification, allowing rapid and accurate analysis of large amounts of data. These algorithms can extract complex and subtle features from medical images beyond the capabilities of traditional radiomics methods, thereby improving model performance. Prior studies have applied deep learning algorithms to classify GGNs [42–44], but application for evaluation of pGGNs has remained overall limited [41].
Despite their advantages, deep learning algorithms require vast amounts of high-quality training data to achieve high accuracy. Accumulation of such data may be challenging for models that evaluate uncommon conditions [37, 45, 46]. In this study, we used two techniques to improve the generalizability of the model. First, we used a stochastic window normalization method for image normalization. Second, we used a deep transfer learning method based on 3D self-supervised learning to enhance learning efficiency [31].
This study had limitations. First, data were collected from a single center, introducing bias and limiting the generalizability of the model. Future work should train the model with larger and more diverse datasets from multiple centers and perform external validation. Second, this study included only patients who underwent surgical resection showing one of four primary lung pathologies and who did not have metastatic or recurrent tumors. This approach limits the applicability of the model to broader populations of patients with pGGNs representing a more heterogeneous range of entities. Third, the model was implemented in a retrospective manner. Further work is needed to explore implementation of the model in clinical practice. Such efforts must find solutions to integrate the model into existing clinical workflows while addressing concerns related to data privacy and patient consent. Fourth, all four readers were experienced in thoracic imaging or thoracic surgery. The performance of human observers may have differed among readers with less experience in thoracic imaging. Fifth, one radiologist participated in multiple study tasks (manual nodule segmentation, placement of bounding boxes before Lung-PNet analysis, subjective observer classification of diagnosis). Finally, the statistical analysis did not account for the presence of multiple nodules in individual patients; however, genetic sequencing supported the independent nature of the nodules.

Conclusion

Lung-PNet, a 3D end-to-end deep learning model, offers a novel noninvasive solution for evaluating pGGNs on routine chest CT images. In the current study, the model exhibited robust performance for pGGN segmentation and classification (i.e., differentiation of IAC from other entities). Although requiring external validation, the model has the potential to guide management and treatment decisions for patients with pGGNs.

Footnotes

Supported by the Seed Fund of Peking University First Hospital (2018SF078) and Youth Clinical Research Project of Peking University First Hospital (2018CR25).
Provenance and review: Not solicited; externally peer reviewed.
Peer reviewers: Takuma Usuzaki, Tohoku University Hospital; He Ci, Sixth Affiliated Hospital of Guangzhou Medical University; additional individual(s) who chose not to disclose their identity.

Supplemental Content

File (23_29674_suppl.pdf)

References

1.
Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2022. CA Cancer J Clin 2022; 72:7–33
2.
Aberle DR, Adams AM, Berg CD, et al.; National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011; 365:395–409
3.
Aoki T. Growth of pure ground-glass lung nodule detected at computed tomography. J Thorac Dis 2015; 7:E326–E328
4.
Collins J, Stern EJ. Ground-glass opacity at CT: the ABCs. AJR 1997; 169:355–367
5.
Tsuta K, Kawago M, Inoue E, et al. The utility of the proposed IASLC/ATS/ERS lung adenocarcinoma subtypes for disease prognosis and correlation of driver gene alterations. Lung Cancer 2013; 81:371–376
6.
Murakami S, Ito H, Tsubokawa N, et al. Prognostic value of the new IASLC/ATS/ERS classification of clinical stage IA lung adenocarcinoma. Lung Cancer 2015; 90:199–204
7.
Lederlin M, Puderbach M, Muley T, et al. Correlation of radio- and histo-morphological pattern of pulmonary adenocarcinoma. Eur Respir J 2013; 41:943–951
8.
Zhang Y, Ma X, Shen X, et al. Surgery for pre- and minimally invasive lung adenocarcinoma. J Thorac Cardiovasc Surg 2022; 163:456–464
9.
Zhang J, Wu J, Tan Q, Zhu L, Gao W. Why do pathological stage IA lung adenocarcinomas vary from prognosis? A clinicopathologic study of 176 patients with pathological stage IA lung adenocarcinoma based on the IASLC/ATS/ERS classification. J Thorac Oncol 2013; 8:1196–1202
10.
Tsutani Y, Miyata Y, Nakayama H, et al. Appropriate sublobar resection choice for ground glass opacity–dominant clinical stage IA lung adenocarcinoma: wedge resection or segmentectomy. Chest 2014; 145:66–71
11.
Sun Y, Li C, Jin L, et al. Radiomics for lung adenocarcinoma manifesting as pure ground-glass nodules: invasive prediction. Eur Radiol 2020; 30:3650–3659
12.
Zhao W, Yang J, Ni B, et al. Toward automatic prediction of EGFR mutation status in pulmonary adenocarcinoma with 3D deep learning. Cancer Med 2019; 8:3532–3543
13.
Hwang IP, Park CM, Park SJ, et al. Persistent pure ground-glass nodules larger than 5 mm: differentiation of invasive pulmonary adenocarcinomas from preinvasive lesions or minimally invasive adenocarcinomas using texture analysis. Invest Radiol 2015; 50:798–804
14.
Jiang Y, Che S, Ma S, et al. Radiomic signature based on CT imaging to distinguish invasive adenocarcinoma from minimally invasive adenocarcinoma in pure ground-glass nodules with pleural contact. Cancer Imaging 2021; 21:1
15.
Moon Y, Sung SW, Lee KY, Sim SB, Park JK. Pure ground-glass opacity on chest computed tomography: predictive factors for invasive adenocarcinoma. J Thorac Dis 2016; 8:1561–1570
16.
Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017; 42:60–88
17.
Sahiner B, Pezeshk A, Hadjiiski LM, et al. Deep learning in medical imaging and radiation therapy. Med Phys 2019; 46:e1–e36
18.
Mazurowski MA, Buda M, Saha A, Bashir MR. Deep learning in radiology: an overview of the concepts and a survey of the state of the art with focus on MRI. J Magn Reson Imaging 2019; 49:939–954
19.
Ye W, Gu W, Guo X, et al. Detection of pulmonary ground-glass opacity based on deep learning computer artificial intelligence. Biomed Eng Online 2019; 18:6
20.
Chan HP, Samala RK, Hadjiiski LM, Zhou C. Deep learning in medical image analysis. Adv Exp Med Biol 2020; 1213:3–21
21.
Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med 2021; 13:152
22.
WHO Classification of Tumours Editorial Board. Thoracic tumours, 5th ed. International Agency for Research on Cancer, 2021
23.
Yushkevich PA, Piven J, Hazlett HC, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 2006; 31:1116–1128
24.
Huo Y, Tang Y, Chen Y, et al. Stochastic tissue window normalization of deep learning on computed tomography. J Med Imaging (Bellingham) 2019; 6:044005
25.
Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R, eds. Advances in neural information processing systems 32 (NeurIPS 2019). Curran Associates, 2019:8024–8035
26.
Howard J, Gugger S. Fastai: a layered API for deep learning. Information 2020; 11:108
27.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016:770–778
28.
Tran D, Wang H, Torresani L, Ray J, LeCun Y. A closer look at spatiotemporal convolutions for action recognition. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, 2018:6450–6459
29.
Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, eds. Medical image computing and computer-assisted intervention: MICCAI 2015. Springer, 2015:234–241
30.
Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021; 18:203–211
31.
Zhou Z, Sodha V, Siddiquee MMR, et al. Models genesis: generic autodidactic models for 3D medical image analysis. Med Image Comput Comput Assist Interv 2019; 11767:384–393
32.
Setio AAA, Traverso A, de Bel T, et al. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Med Image Anal 2017; 42:1–13
33.
Antonelli M, Reinke A, Bakas S, et al. The Medical Segmentation Decathlon. Nat Commun 2022; 13:4128
34.
Wood DE, Kazerooni EA, Aberle D, et al. NCCN guidelines insights: lung cancer screening, version 1.2022. J Natl Compr Canc Netw 2022; 20:754–764
35.
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017:618–626
36.
Boyd K, Eng KH, Page CD. Area under the precision-recall curve: point estimates and confidence intervals. In: Blockeel H, Kersting K, Nijssen S, Železny F, eds. Machine learning and knowledge discovery in databases: ECML PKDD 2013. Springer, 2013:451–466
37.
Heidinger BH, Anderson KR, Nemec U, et al. Lung adenocarcinoma manifesting as pure ground-glass nodules: correlating CT size, volume, density, and roundness with histopathologic invasion and size. J Thorac Oncol 2017; 12:1288–1298
38.
Zhang T, Pu XH, Yuan M, et al. Histogram analysis combined with morphological characteristics to discriminate adenocarcinoma in situ or minimally invasive adenocarcinoma from invasive adenocarcinoma appearing as pure ground-glass nodule. Eur J Radiol 2019; 113:238–244
39.
Chu ZG, Li WJ, Fu BJ, Lv FJ. CT Characteristics for predicting invasiveness in pulmonary pure ground-glass nodules. AJR 2020; 215:351–358
40.
Qi L, Xue K, Li C, et al. Analysis of CT morphologic features and attenuation for differentiating among transient lesions, atypical adenomatous hyperplasia, adenocarcinoma in situ, minimally invasive and invasive adenocarcinoma presenting as pure ground-glass nodules. Sci Rep 2019; 9:14586
41.
Xu F, Zhu W, Shen Y, et al. Radiomic-based quantitative CT analysis of pure ground-glass nodules to predict the invasiveness of lung adenocarcinoma. Front Oncol 2020; 10:872
42.
Wang D, Zhang T, Li M, Bueno R, Jayender J. 3D deep learning based classification of pulmonary ground glass opacity nodules with automatic segmentation. Comput Med Imaging Graph 2021; 88:101814
43.
Gong J, Liu J, Hao W, et al. A deep residual learning network for predicting lung adenocarcinoma manifesting as ground-glass nodule on CT images. Eur Radiol 2020; 30:1847–1855
44.
Zhao W, Yang J, Sun Y, et al. 3D deep learning from CT scans predicts tumor invasiveness of subcentimeter pulmonary adenocarcinomas. Cancer Res 2018; 78:6881–6889
45.
Henschke CI, Yankelevitz DF, Mirtcheva R, McGuinness G, McCauley D, Miettinen OS; ELCAP Group. CT screening for lung cancer: frequency and significance of part-solid and nonsolid nodules. AJR 2002; 178:1053–1057
46.
Chang B, Hwang JH, Choi YH, et al. Natural history of pure ground-glass opacity lung nodules detected by low-dose CT scan. Chest 2013; 143:172–178

Information & Authors

Information

Published In

American Journal of Roentgenology
PubMed: 37493322

History

Submitted: May 24, 2023
Revision requested: June 5, 2023
Revision received: July 1, 2023
Accepted: July 18, 2023
Version of record online: July 26, 2023

Keywords

  1. CT
  2. deep learning
  3. invasive adenocarcinoma
  4. pure ground-glass nodule

Authors

Affiliations

Kang Qi, MD
Department of Thoracic Surgery, Peking University First Hospital, Beijing, China.
Kexin Wang, BD
School of Basic Medical Sciences, Capital Medical University, Beijing, China.
Xiaoying Wang, MD, PhD
Department of Radiology, Peking University First Hospital, 8 Xishiku St, Beijing 100034, China.
Yu-Dong Zhang, MD
Department of Radiology, First Affiliated Hospital of Nanjing Medical University, Nanjing, China.
Gang Lin, MD
Department of Thoracic Surgery, Peking University First Hospital, Beijing, China.
Xining Zhang, MD
Department of Thoracic Surgery, Peking University First Hospital, Beijing, China.
Haibo Liu, MD
Department of Thoracic Surgery, Peking University First Hospital, Beijing, China.
Weiming Huang, MD
Department of Thoracic Surgery, Peking University First Hospital, Beijing, China.
Jingyun Wu, MD
Department of Radiology, Peking University First Hospital, 8 Xishiku St, Beijing 100034, China.
Kai Zhao, MD
Department of Radiology, Peking University First Hospital, 8 Xishiku St, Beijing 100034, China.
Jing Liu, MD
Department of Radiology, Peking University First Hospital, 8 Xishiku St, Beijing 100034, China.
Jian Li, MD, PhD
Department of Thoracic Surgery, Peking University First Hospital, Beijing, China.
Xiaodong Zhang, PhD [email protected]
Department of Radiology, Peking University First Hospital, 8 Xishiku St, Beijing 100034, China.

Notes

Address correspondence to Xiaodong Zhang ([email protected]).
First published online: Jul 26, 2023
Version of record: Oct 25, 2023
The authors declare that there are no disclosures relevant to the subject matter of this article.
