The National Lung Screening Trial (NLST) proved that lung cancer screening CT is effective in reducing mortality from lung cancer [1]. Further analyses of the NLST results and of other data show that lung cancer screening CT is cost-effective in improving patient life expectancy [2]. However, to be truly cost-effective, lung cancer screening programs must limit the number of false-positive scans, which can lead to additional cost, anxiety, radiation, and adverse events from biopsy of benign nodules. To establish a standardized framework for nodule reporting and follow-up, the American College of Radiology developed Lung-RADS [3]. Lung-RADS provides criteria for nodules to be deemed either benign (category 2), which allows a patient to return for an annual screening examination, or suspicious (categories 3 and 4), which requires the patient to undergo earlier follow-up.
In the initial version of Lung-RADS, only solid nodules measuring less than 6 mm in diameter were classified as category 2. Since the development of Lung-RADS, evidence has accumulated showing that certain pulmonary nodules can be confidently diagnosed as benign intrapulmonary lymph nodes [4–8] and, therefore, should also be classified as category 2. The typical features of intrapulmonary lymph nodes are solid attenuation; triangular, polygonal, or oval shape; and location along pleural surfaces [6]. The current version of Lung-RADS (version 1.1 [v1.1]) applies these features for characterizing a nodule as an intrapulmonary lymph node, and thus classifying the nodule as category 2, only to perifissural nodules measuring less than 10 mm [9]. Because all solid nodules less than 6 mm are classified as category 2, this recognition of intrapulmonary lymph nodes in Lung-RADS v1.1 specifically reduces false-positive results for nodules measuring at least 6 mm but under 10 mm (hereafter, intermediate-size nodules). Intermediate-size solid nodules with a typical shape are also likely to represent intrapulmonary lymph nodes when present in subpleural locations other than the perifissural location currently reflected in Lung-RADS v1.1 [7, 10, 11] (Chelala L, presented at the Radiological Society of North America [RSNA] 2021 annual meeting).
Although Lung-RADS is the dominant scheme used to determine follow-up of nodules detected on lung cancer screening in the United States, other countries have adopted different approaches. Notably, the Dutch-Belgian Randomized Lung Cancer Screening (NELSON) trial followed a scheme that primarily relies on nodule volume rather than the linear diameter measurements that are central in Lung-RADS [12]. Though Lung-RADS v1.1 added nodule volume measurements as an alternative measure, standard radiology practice in the United States continues to rely on diameter measurements given the specialized software and additional time required for volumetric measurement. To date, comparisons between diameter- and volume-based nodule classification have yielded mixed results; some studies show slight advantages for volume measurement [13, 14], whereas other studies show superiority of diameter measurements [15, 16]. In the study by Silva et al. [13], volumetric measurements allowed improved risk stratification and reduction in follow-up examinations compared with standard diameter-based Lung-RADS evaluation.
This background suggests two strategies to potentially reduce false-positive results for lung cancer screening examinations: expanding the criteria to consider intermediate-size nodules as representing benign intrapulmonary lymph nodes and using volume-based, instead of diameter-based, measurements. To our knowledge, these strategies have not been evaluated in terms of their impact on the frequency of false-positive lung cancer screening results, and they have not been compared with each other. We therefore conducted this study to evaluate the impact of the proposed strategies for reducing false-positive results for intermediate-size nodules on lung cancer screening CT evaluated using Lung-RADS v1.1.
Methods
Patient Population
This retrospective HIPAA-compliant study solely used nonidentifiable patient data from the NLST that were accessed for purposes of secondary data analysis. Consent to access NLST data was obtained from the national Cancer Data Access System of the National Cancer Institute through a data transfer agreement with the National Cancer Institute. The study was approved by the institutional review board at our institution. The requirement for written informed patient consent was waived for this post hoc analysis.
In brief, the NLST was a randomized controlled trial of current and former smokers who were screened for lung cancer by either chest radiography or low-dose chest CT. The CT protocol included one baseline examination and two subsequent annual screening rounds. Patients were followed after the three annual examinations to evaluate for the development of lung cancer and the occurrence of death [1]. For the purposes of the present study, only baseline CT examinations were evaluated.
The NLST study readers annotated nodules on each CT examination, recording nodule features including location (lobe and slice number), attenuation, margins, and long- and short-axis diameters. Only the dominant (i.e., largest) annotated nodule on each patient's baseline CT was considered for purposes of the current study. Patients were eligible for the current study if the dominant annotated nodule had a mean diameter (mean of the short- and long-axis diameters) of 6.0–9.5 mm, was noncalcified with solid attenuation, and had nonspiculated margins, based on the NLST study reader's annotations. Of the 25,844 patients in the CT arm of the NLST study, 1387 patients (with 1387 dominant nodules) met these criteria (Fig. 1).
The NLST data indicate cancers that developed during follow-up but do not directly link nodules and cancer diagnoses. During follow-up, lung cancer developed in the same lobe as the dominant nodule in 38 of the 1387 patients. All 38 of these patients were selected for further evaluation in the present study. Of the remaining 1349 patients in whom cancer did not develop in the same lobe as the dominant nodule, 200 were randomly selected for further evaluation. The baseline examinations of the 238 patients (38 with and 200 without cancer in the same lobe as the dominant nodule) were initially reviewed by a board-certified subspecialty-trained thoracic radiologist (M.M.H., with 6 years of posttraining experience). Nodules were excluded based on this review for the following reasons: images not available or not viewable (malignant group, n = 0; benign group, n = 4); no nodule found corresponding to the NLST-annotated nodule (malignant group, n = 1; benign group, n = 3); nodule deemed to represent atelectasis (malignant group, n = 0; benign group, n = 1); nodule was not solid (malignant group, n = 5 [part-solid in three, cystic in two]; benign group, n = 1 [ground-glass]) (Fig. 1). At the time of the initial review, 11 nodules (all in the benign group) were found to have been annotated with the incorrect lobe by the NLST study reader; for these cases, the correct lobe was recorded for the purpose of further analysis. The exclusions resulted in 223 remaining nodules, which underwent independent evaluation by two readers, as described in the Image Review section.
After completion of the independent imaging review, the earlier noted investigator (M.M.H.) performed a subsequent review of certain patients, unblinded to the details of the patients' follow-up results. In this final session, the radiologist reviewed the nodules in patients who developed cancer to assess whether the cancer corresponded with the dominant nodule versus a different nodule in that lobe (e.g., a nodule that developed later). The nodule and the subsequent cancer corresponded in 26 patients. In six patients, the cancer was attributed to a different nodule in the lobe with the dominant nodule; these six nodules were reassigned to the benign nodule group. Also in this final session, the radiologist reviewed the nodules in the benign group that had been assigned the incorrect lobe by the NLST study reader to assess whether a cancer later developed in the correct lobe. In one patient, a cancer later developed in the correct lobe, which corresponded with the dominant nodule; this nodule was reassigned to the malignant nodule group. During this final session, nodules were only reassigned between groups; no additional nodules were excluded.
This process resulted in a final study sample of 223 dominant nodules in 223 patients (median age, 62 years; 143 men, 80 women; 196 with benign nodules, 27 with malignant nodules).
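The sample flow described above can be checked with a short tally. The following is a minimal sketch that uses only the counts stated in the text; the variable names are illustrative and not part of the study.

```python
# Patients selected for review, grouped by whether cancer later developed
# in the same lobe as the dominant nodule (counts from the text).
counts = {"malignant": 38, "benign": 200}

# Exclusions at the initial review, by reason and group (counts from the text).
exclusions = {
    "images not available or viewable": {"malignant": 0, "benign": 4},
    "no corresponding nodule found": {"malignant": 1, "benign": 3},
    "nodule represents atelectasis": {"malignant": 0, "benign": 1},
    "nodule not solid": {"malignant": 5, "benign": 1},
}
for by_group in exclusions.values():
    for group, n in by_group.items():
        counts[group] -= n
after_exclusions = dict(counts)  # 32 malignant, 191 benign

# Final unblinded session: reassignments only, no further exclusions.
counts["malignant"] -= 6  # cancer attributed to a different nodule in the lobe
counts["benign"] += 6
counts["benign"] -= 1     # corrected lobe; cancer corresponded with the nodule
counts["malignant"] += 1

print(after_exclusions, counts, sum(counts.values()))
```

The tally reproduces the reported final sample of 223 nodules (196 benign, 27 malignant).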
Image Review
Two investigators (the previously noted investigator [M.M.H.] and a board-certified, subspecialty-trained thoracic radiologist [A.R.H.] with 28 years of posttraining experience) independently reviewed the baseline CT examinations in 223 patients in random order, blinded to whether nodules were benign or malignant, using advanced visualization software (syngo.via, version VB40, Siemens Healthineers). The readers identified the dominant nodule in each patient based on the nodule locations in the NLST annotations. The readers measured each nodule using a semiautomated segmentation tool available in the advanced visualization software. After the radiologist marks anywhere within a nodule, the software automatically segments the nodule's margins, which the radiologist can manually adjust. The software then reports the nodule's volume as well as its short- and long-axis diameters on the slice with the nodule's greatest long-axis diameter. These parameters were recorded, and the mean diameter (mean of the short- and long-axis diameters) was calculated. Because the nodule diameters used for subsequent analysis were based on the measurements obtained using the automated software, the mean diameter may not have been within the range of 6.0–9.5 mm that was used for initial nodule selection based on the NLST annotations. The radiologists also recorded whether the nodule directly abutted a fissure (i.e., perifissural location) and whether the nodule directly abutted the costal or mediastinal pleural surface (i.e., other subpleural location). Finally, the radiologists assessed in a binary fashion whether or not the nodule's shape was triangular, polygonal, or oval. Nodules could be considered to have triangular, polygonal, or oval shape only if they had smooth margins. On the basis of the readers' assessments, nodules were classified as exhibiting lymph node characteristics if they had a perifissural or other subpleural location and a triangular, polygonal, or oval shape. 
Data were entered in REDCap (version 12, REDCap Consortium) [17].
Lung-RADS Category Assignments
On the basis of the radiologists' assessments (including nodules' mean diameter from the semiautomated measurement) and the algorithm in the Lung-RADS v1.1 document [9], Lung-RADS categories were generated for each nodule. Per Lung-RADS for diameter-based assessment, all nodules less than 6 mm as well as perifissural nodules less than 10 mm with triangular, polygonal, or ovoid shape were assigned category 2. Additional Lung-RADS categories were generated for each patient using three different schemes: classifying nodules with other subpleural locations and with a mean diameter less than 10 mm and triangular, polygonal, or ovoid shape as category 2 (reflecting a potential modification to Lung-RADS v1.1); using the current Lung-RADS v1.1 volume cutoffs (i.e., nodules classified as category 2 if under 0.113 mL or under 0.524 mL with perifissural location); and incorporating both the expanded criteria for category 2 and the use of current volume cutoffs (i.e., incorporating both of the prior considerations). For the purposes of analysis, Lung-RADS category 2 was considered negative, and category 3 or higher was considered positive. Nodules were also classified as negative or positive based on the NELSON trial algorithm [12], which considers nodules negative if they have a volume less than 0.05 mL or if they have a perifissural or other subpleural location and a short-axis diameter less than 5 mm.
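The negative/positive rules described above can be expressed compactly. The following is a minimal sketch, not the study's actual implementation: the `Nodule` fields and function names are hypothetical, and requiring the characteristic shape for the 0.524-mL volume cutoff is an assumption carried over from the diameter-based rule.

```python
from dataclasses import dataclass


@dataclass
class Nodule:
    """Reader-recorded features of one solid nodule (hypothetical field names)."""
    mean_diameter_mm: float
    short_axis_mm: float
    volume_ml: float
    perifissural: bool
    other_subpleural: bool
    node_shape: bool  # triangular, polygonal, or oval/ovoid with smooth margins


def lung_rads_negative(n: Nodule, use_volume: bool = False,
                       expanded: bool = False) -> bool:
    """Return True if the nodule would be category 2 (negative)."""
    if use_volume:
        below_lower = n.volume_ml < 0.113
        below_upper = n.volume_ml < 0.524
    else:
        below_lower = n.mean_diameter_mm < 6
        below_upper = n.mean_diameter_mm < 10
    # Standard v1.1 accepts only perifissural nodules; the explored
    # modification also accepts costal or mediastinal subpleural nodules.
    location_ok = n.perifissural or (expanded and n.other_subpleural)
    # Assumption: the shape criterion applies to both measurement types.
    return below_lower or (below_upper and location_ok and n.node_shape)


def nelson_negative(n: Nodule) -> bool:
    """Return True if the nodule is negative per the NELSON rule in the text."""
    subpleural = n.perifissural or n.other_subpleural
    return n.volume_ml < 0.05 or (subpleural and n.short_axis_mm < 5)


# Hypothetical nodule: 8 mm, costal subpleural, characteristic shape.
example = Nodule(8.0, 7.0, 0.2, False, True, True)
print(lung_rads_negative(example), lung_rads_negative(example, expanded=True))
```

For this hypothetical nodule, the standard diameter-based and volume-based schemes are positive, whereas either scheme with the expanded location criterion is negative.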
Extrapolation to NLST Cohort
Because the study sample only included intermediate-size nodules, the observed specificities of Lung-RADS v1.1 were not anticipated to represent the overall specificity of Lung-RADS v1.1 in the full NLST cohort. Specifically, of the 25,844 patients in the CT arm of the NLST, 19,164 (74%) had no nodules (thus assigned category 1), and 3838 (15%) had nodules less than 6 mm (thus assigned category 2). Because NLST predated Lung-RADS v1.1, nodules in the NLST were not assigned category 2 if measuring at least 6 mm but under 10 mm and perifissural in location. Thus, the anticipated impact on the overall NLST cohort of assigning category 2 for nodules likely representing lymph nodes was evaluated using the readers' initially assigned Lung-RADS v1.1 categories and each of the three additionally generated Lung-RADS v1.1 categories for each patient. This assessment of impact on the NLST was performed by extrapolating the number of nodules assigned category 2 among the 196 included patients with benign nodules to the sample of 1349 patients in the NLST without cancer, from whom the 196 patients had been randomly selected (i.e., presuming 6.9 additional nodules assigned category 2 in the NLST for each category 2 nodule in the current study's sample). Then, the overall percentage of patients in the NLST cohort who would have been assigned category 2 was calculated using the extrapolated value.
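The extrapolation can be sketched as follows, using only the cohort counts given above. Rounding to the nearest whole nodule is an assumption, and the function and constant names are illustrative.

```python
NLST_CT_ARM = 25_844             # patients in the CT arm of the NLST
NLST_BASELINE_CATEGORY_2 = 3_838  # patients with nodules less than 6 mm
NLST_ELIGIBLE_NO_CANCER = 1_349   # eligible patients without same-lobe cancer
STUDY_BENIGN = 196                # benign nodules in the study sample


def extrapolate_category_2(n_category_2_in_sample: int):
    """Scale a sample count of category 2 benign nodules to the NLST cohort.

    Returns (additional nodules, total category 2, fraction of the CT arm).
    """
    factor = NLST_ELIGIBLE_NO_CANCER / STUDY_BENIGN  # about 6.9
    additional = round(n_category_2_in_sample * factor)  # assumed rounding
    total = NLST_BASELINE_CATEGORY_2 + additional
    return additional, total, total / NLST_CT_ARM


# Example: 60 benign nodules assigned category 2 in the sample.
additional, total, fraction = extrapolate_category_2(60)
print(additional, total, f"{fraction:.1%}")
```

With 60 category 2 nodules in the sample, this yields 413 additional nodules and a total of 4251.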
Statistical Analysis
Data were summarized descriptively using counts and percentages. Categoric variables were compared using Fisher exact test, and continuous variables were compared using the Wilcoxon rank sum test. Interobserver agreement was calculated for diameter and volume measurements using intraclass correlation coefficients and for binary variables (including Lung-RADS categories dichotomized as negative or positive) using Cohen kappa coefficients. Sensitivities and specificities were compared between the different classification schemes for each reader using the McNemar test. A p value less than .05 was considered statistically significant. Statistical analysis was conducted in JMP Pro (version 16, SAS Institute).
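As an illustration of the paired comparison, the McNemar test depends only on the discordant pairs, that is, nodules classified differently by the two schemes. The sketch below uses the asymptotic chi-square form without continuity correction; which variant the statistical software applied is not stated here, so this choice is an assumption.

```python
import math


def mcnemar_p(b: int, c: int) -> float:
    """Asymptotic McNemar p value (1 df, no continuity correction).

    b: cases positive under scheme A but negative under scheme B.
    c: cases negative under scheme A but positive under scheme B.
    """
    if b + c == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# One cancer reclassified from positive to negative and none in reverse.
print(round(mcnemar_p(1, 0), 2))
```

A single discordant pair gives p of approximately .32, whereas 40 discordant pairs in one direction give p well below .001.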
Results
Patient and Nodule Characteristics
Table 1 summarizes characteristics of benign and malignant nodules. Patients with malignant nodules were significantly older than patients with benign nodules (median age, 66 vs 62 years; p = .04). The two groups were not significantly different in terms of sex distribution (p = .36). The mean nodule diameters ranged from 6.0 to 9.5 mm using the NLST annotations, 3.5 to 16.0 mm using the software for reader 1, and 2.5 to 15.0 mm using the software for reader 2. Median nodule size was significantly larger for malignant than benign nodules based on mean diameter (7.5 vs 7.0 mm for NLST, p = .002; 9.0 vs 7.0 mm for reader 1, p < .001; 9.0 vs 7.0 mm for reader 2, p < .001) and volume (0.30 vs 0.16 mL for reader 1, p < .001; 0.33 vs 0.17 mL for reader 2, p < .001).
The frequency of perifissural location was not significantly different between benign and malignant nodules for either reader 1 (19% vs 7%, p = .07) or reader 2 (13% vs 4%, p = .14). The frequency of other subpleural location was significantly higher for benign than malignant nodules for both reader 1 (28% vs 11%, p = .04) and reader 2 (31% vs 11%, p = .02). The frequency of triangular, polygonal, or oval shape was significantly higher for benign than malignant nodules for both reader 1 (89% vs 26%, p < .001) and reader 2 (48% vs 22%, p = .01). The frequency of lymph node characteristics was significantly higher for benign than malignant nodules for both reader 1 (43% vs 0%, p < .001) and reader 2 (37% vs 7%, p < .001).
Diagnostic Performance of Different Classification Schemes
Table 2 shows the sensitivity and specificity of the various classification schemes for lung cancer, defining category 2 as negative and category 3 or higher as positive. Lung-RADS v1.1 based on standard diameter measurements had a sensitivity and specificity of 93% (25/27) and 31% (60/196) for reader 1 and 89% (24/27) and 26% (51/196) for reader 2, respectively. Two cancers were classified as category 2 by both readers. These cancers were classified as category 2 because of mean diameter less than 6 mm (4.5 and 5.0 mm for reader 1, 4.5 and 5.0 mm for reader 2); neither cancer was classified by either reader as having a perifissural or other subpleural location or as having a triangular, polygonal, or oval shape.
Figure 2 shows one of these cancers. One additional cancer (Fig. 3) was assessed by reader 1 as having a mean diameter of 8.5 mm and perifissural location but not as having a triangular, polygonal, or ovoid shape (category 3); reader 2 assessed the cancer as having a mean diameter of 8.5 mm, perifissural location, and a triangular, polygonal, or oval shape (category 2).
Figure 4 shows a benign intermediate-size nodule assigned category 2 by both readers on the basis of size, perifissural location, and triangular, polygonal, or ovoid shape. Given the 51–60 nodules assigned category 2 based on perifissural location and size less than 10 mm, it was extrapolated that 351–413 additional nodules would have been assigned category 2 in the NLST, increasing the total number of category 2 nodules in the NLST from 3838 to 4189–4251 (i.e., from 15% to 16% of all trial patients).
A modification of Lung-RADS v1.1 that also included other subpleural nodules (i.e., nodules abutting the costal or mediastinal pleura) less than 10 mm as category 2 had no significant difference compared with standard Lung-RADS v1.1 for sensitivity (93% [25/27] for reader 1, p > .99; 85% [23/27] for reader 2, p = .32) but had significantly higher specificity (51% [100/196] for reader 1, p < .001; 47% [93/196] for reader 2, p < .001).
Figure 5 shows a benign intermediate-size nodule that was assigned category 3 by both readers using standard Lung-RADS v1.1 but was assigned category 2 by both readers on the basis of the modification because of size, other subpleural location, and triangular, polygonal, or ovoid shape. Compared with malignant nodules categorized using Lung-RADS v1.1, no additional malignant nodule was assigned category 2 by reader 1, and one additional malignant nodule was assigned category 2 by reader 2 when the modification of Lung-RADS v1.1 was applied; reader 2 assessed this nodule as having a mean diameter of 8.0 mm, other subpleural location, and a triangular, polygonal, or oval shape (Fig. 6). Given the 93–100 nodules assigned category 2 based on perifissural or other subpleural location and size less than 10 mm, it was extrapolated that a total of 640–688 additional nodules would have been assigned category 2 in the NLST, increasing the total number of category 2 nodules in the NLST from 3838 to 4478–4526 (i.e., from 15% to 17% of all trial patients).
Compared with Lung-RADS v1.1 using standard diameter-based cutoffs, Lung-RADS v1.1 using existing volume cutoffs did not have a significantly different sensitivity (93% [25/27] for reader 1, p > .99; 89% [24/27] for reader 2, p > .99) but did have a significantly higher specificity (37% [72/196] for reader 1, p = .007; 37% [72/196] for reader 2, p < .001). Compared with Lung-RADS v1.1, no additional malignant nodule was assigned category 2 by either reader when volume-based Lung-RADS v1.1 was used. Given the 72 nodules assigned category 2 based on Lung-RADS v1.1 using volume measurements, it was extrapolated that a total of 496 additional nodules would have been assigned category 2 in the NLST, increasing the total number of category 2 nodules in the NLST from 3838 to 4334 (i.e., from 15% to 17% of all trial patients).
Compared with the Lung-RADS v1.1 modification that included other subpleural nodules and diameter measurements, a modification of Lung-RADS v1.1 incorporating both inclusion of other subpleural nodules in category 2 and volume measurements had no significant difference for sensitivity (93% [25/27] for reader 1, p > .99; 85% [23/27] for reader 2, p > .99) but had a significantly higher specificity (59% [116/196] for reader 1, p = .001; 58% [113/196] for reader 2, p < .001). Given the 113–116 nodules assigned category 2 using both subpleural location and volume measurements, it was extrapolated that a total of 778–798 additional nodules would have been assigned category 2 in the NLST, increasing the total number of category 2 nodules in the NLST from 3838 to 4616–4636 (i.e., from 15% to 18% of all trial patients).
The NELSON criteria showed no significant difference compared with standard Lung-RADS v1.1 for sensitivity (93% [25/27] for reader 1, p > .99; 93% [25/27] for reader 2, p = .32) but had significantly lower specificity (12% [24/196] for reader 1, p < .001; 14% [27/196] for reader 2, p < .001).
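The reported percentages follow directly from the counts given above. The following sketch recomputes sensitivity and specificity for each scheme and reader; the dictionary layout and scheme labels are illustrative only.

```python
MALIGNANT, BENIGN = 27, 196

# (cancers classified positive, benign nodules classified negative)
# for reader 1 and reader 2, taken from the counts reported in the text.
counts = {
    "Lung-RADS v1.1, diameter": [(25, 60), (24, 51)],
    "Expanded subpleural, diameter": [(25, 100), (23, 93)],
    "Lung-RADS v1.1, volume": [(25, 72), (24, 72)],
    "Expanded subpleural, volume": [(25, 116), (23, 113)],
    "NELSON": [(25, 24), (25, 27)],
}


def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


summary = {
    scheme: [(percent(tp, MALIGNANT), percent(tn, BENIGN)) for tp, tn in readers]
    for scheme, readers in counts.items()
}
for scheme, readers in summary.items():
    print(scheme, readers)  # [(sensitivity %, specificity %) per reader]
```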
Interobserver Agreement
Interobserver agreement, expressed as intraclass correlation coefficient, was 0.87 for mean diameter and 0.96 for volume. Interobserver agreement, expressed as kappa coefficient, was 0.77 for perifissural location, 0.81 for other subpleural location, 0.20 for shape, and 0.73 for lymph node characteristics. Interobserver agreement, expressed as kappa coefficient, was 0.70 for Lung-RADS v1.1, 0.68 for Lung-RADS v1.1 including other subpleural nodules less than 10 mm as category 2, 0.83 for Lung-RADS v1.1 using volume thresholds, 0.81 for Lung-RADS v1.1 using both the expanded criteria for category 2 and volume thresholds, and 0.76 for NELSON criteria.
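For readers less familiar with the kappa coefficient, the following sketch shows how it is computed from a two-reader 2 × 2 table. The counts in the example are hypothetical and are not data from this study.

```python
def cohen_kappa(both_yes: int, r1_only: int, r2_only: int, both_no: int) -> float:
    """Cohen kappa for two readers making a binary assessment."""
    n = both_yes + r1_only + r2_only + both_no
    observed = (both_yes + both_no) / n
    r1_yes = (both_yes + r1_only) / n
    r2_yes = (both_yes + r2_only) / n
    # Agreement expected by chance from each reader's marginal frequencies.
    expected = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical table: 100 nodules, readers agree on 85.
kappa = cohen_kappa(both_yes=40, r1_only=10, r2_only=5, both_no=45)
print(round(kappa, 2))
```

In this hypothetical table, raw agreement is 85%, but chance agreement is 50%, so kappa is 0.70.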
Discussion
This study used a subset of intermediate-size solid nodules from the NLST to evaluate the impact of strategies for decreasing the frequency of false-positive interpretations relating to nodules likely representing benign intrapulmonary lymph nodes. Assigning Lung-RADS category 2 for all subpleural nodules less than 10 mm (rather than only perifissural nodules less than 10 mm) that have a characteristic shape significantly increased the specificity of Lung-RADS without a significant loss in sensitivity. The use of current Lung-RADS v1.1 volume-based thresholds, instead of standard diameter-based thresholds, for assigning category 2 also significantly increased specificity without a significant loss in sensitivity. Combining both of these strategies maximized specificity, still without significant loss in sensitivity. The specificity of the combination of strategies in the present study sample corresponds with an anticipated increase in frequency of category 2 nodules from 15% to 18% of patients in the entire NLST cohort. Adoption of these strategies could therefore help prevent unnecessary 6-month follow-up CT examinations.
The findings support previous studies that found subpleural nodules with typical characteristics of intrapulmonary lymph nodes were almost always benign [5, 10, 11] (Chelala L, RSNA 2021 annual meeting). Nonetheless, caution is required during such evaluation. In the present study, when Lung-RADS v1.1 was used, one malignant intermediate-size perifissural nodule was classified as category 2 by a single reader on the basis of perceived triangular, polygonal, or ovoid shape. When the explored modification of Lung-RADS v1.1 was used, another malignant intermediate-size nodule with other subpleural location was also classified as category 2 by a single reader on the basis of perceived triangular, polygonal, or ovoid shape. Such cases highlight the importance of being rigorous when assessing the shape of intermediate-size nodules before considering such nodules to likely represent benign intrapulmonary lymph nodes warranting category 2 assignment.
Few studies have compared diameter-based with volume-based criteria for triaging pulmonary nodules, and it remains unclear which method, if either, is better. The available studies include an analysis of subsolid nodules from the NLST that found the volume-based NELSON algorithm was not superior to Lung-RADS for cancer diagnosis [15], as well as an analysis that found the NELSON algorithm performed worse than Lung-RADS in evaluating nodules detected on follow-up lung cancer screening CT examinations [16]. However, in the current study, volume-based criteria increased the number of benign nodules assigned category 2, significantly improving the specificity of Lung-RADS without a significant loss in sensitivity. This benefit may reflect the relatively flat nature of many benign nodules, particularly intrapulmonary lymph nodes, such that the nodule's actual volume is smaller than that of a sphere of the same diameter [18]. In comparison, malignant nodules typically exhibit relatively isotropic growth, resulting in a larger volume for a given diameter.
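The relationship between diameter and volume can be illustrated numerically. The sketch below computes the volume of a sphere and of an ellipsoid from their diameters; the flattened nodule dimensions are hypothetical, and modeling a nodule as an ellipsoid is a simplifying assumption.

```python
import math


def sphere_volume_ml(diameter_mm: float) -> float:
    """Volume of a sphere in mL (1 mL = 1000 mm^3)."""
    return math.pi / 6 * diameter_mm ** 3 / 1000


def ellipsoid_volume_ml(a_mm: float, b_mm: float, c_mm: float) -> float:
    """Volume of an ellipsoid with full axis lengths a, b, and c, in mL."""
    return math.pi / 6 * a_mm * b_mm * c_mm / 1000


# Spheres with the diameter cutoffs of 6 mm and 10 mm.
print(round(sphere_volume_ml(6), 3), round(sphere_volume_ml(10), 3))

# Hypothetical flattened nodule: 6.5 x 6.5 mm in plane, 4 mm thick.
print(round(ellipsoid_volume_ml(6.5, 6.5, 4), 3))
```

Spheres of 6 mm and 10 mm have volumes of approximately 0.113 mL and 0.524 mL, matching the Lung-RADS v1.1 volume cutoffs. The hypothetical flattened nodule has a mean in-plane diameter of 6.5 mm, above the 6-mm cutoff, but a volume of approximately 0.088 mL, below the 0.113-mL cutoff.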
Software for volumetric nodule measurement is not widely available, and its use is time-consuming. It is likely not necessary to measure the volume of all solid nodules—indeed, nodules that already meet the criteria for Lung-RADS category 2 using diameter-based measurement do not need volumetric measurement. However, in solid nodules that do not already meet the criteria for category 2 assignment, it may be beneficial to measure the nodule's volume to determine whether the nodule can be downgraded to category 2, thereby avoiding a follow-up examination. Interobserver agreement was also higher for volume-based than diameter-based measurement, indicating an additional benefit of the method.
This study has several limitations. The primary limitation is that the NLST data do not specify direct correspondence between nodules and subsequent cancer diagnoses. However, baseline and follow-up imaging examinations were reviewed for all malignant nodules to assess whether the cancer could be attributed to a different nodule, including any nodules that may have developed during follow-up. Second, the reference standard for benign nodules was imaging and clinical follow-up. It is unlikely that any cancers were missed given the multiple years of follow-up after baseline screening in all patients. Third, baseline imaging was evaluated by two subspecialist thoracic radiologists; the performance of assessment for characteristics of intrapulmonary lymph nodes by general radiologists also needs to be studied. Fourth, although all malignant intermediate-size nodules were assessed, only a random sample of the benign intermediate-size nodules was evaluated. Finally, the impact of the proposed strategies for reducing follow-up examinations when applied prospectively for real-world clinical cases remains unknown.
In conclusion, the classification of all intermediate-size nodules with perifissural or other subpleural location and with triangular, polygonal, or ovoid shape (suggestive of intrapulmonary lymph nodes) as Lung-RADS category 2 would substantially decrease false-positive lung cancer screening examination results without a significant reduction in sensitivity for lung cancer. Use of volume-based rather than diameter-based criteria would also substantially increase the number of nodules classified as category 2 without a significant reduction in sensitivity. The combination of the two methods achieved maximal specificity. Radiologists may consider using volumetric measurement for solid nodules that do not already meet the criteria for category 2 assignment. The described strategies could help reduce recommendations for unnecessary 6-month follow-up CT examinations. Prospective trials would help confirm these findings.