Introduction

Since December 2019, a new respiratory disease related to a new coronavirus, SARS-CoV-2, developed in China and rapidly spread to other countries, reaching a pandemic stage in March 2020 [1, 2]. Even if the disease follows a benign course in many cases, some patients develop respiratory difficulties requiring hospitalization, leading to a large amount of patients with clinical suspicion of coronavirus disease 2019 (COVID-19) presenting to the emergency departments [3]. Accurate identification of COVID-19 patients is crucial to isolate them from not infected patients and to limit the diffusion of the outbreak. The reference standard is the positivity of the real-time reverse transcription-polymerase chain reaction (RT-PCR); nevertheless, the sensitivity of this test remains unclear, having been reported between 42 and 71% in some early series [4,5,6], because of suboptimal sampling technique, limitations in performance assay, or low viral load in the nasopharyngeal area. Chest CT shows abnormalities in a large majority of cases, with some signs described as typical or very evocative of the disease in the current outbreak context [7,8,9,10,11]. Sensitivity of chest CT has been reported as high as 97% as compared to RT-PCR [4] and CT abnormalities could precede RT-PCR positivity [12]. Because it is readily available, chest CT may assist first-line triage of patients presenting to hospital [3]. Several radiology societies [6, 13,14,15] have proposed structured reporting of CT into categories, defined according to the typical or less typical appearance of lung involvement, to facilitate communication with physicians. In routine practice, categorization is based on each reader individual impression supported by numerous papers having described imaging signs of COVID-19 pneumonia [7,8,9,10,11]. Such categorization may directly impact the clinical decision-making. However, the reproducibility of the categorization is unknown and the clinical significance of the different categories is unclear. Thus, the objectives of our study were to assess interobserver agreement to categorize CT findings as well as performances of chest CT across the different categories in patients suspected of COVID-19 presenting to hospital.

Material and methods

Setting

This is a monocentric retrospective study conducted in a University Hospital (Bichat Claude-Bernard Hospital, Paris, France) between March 16, 2020, and March 24, 2020. Institutional review board was approved and written informed consent waived. During this period of COVID-19 outbreak, patients presenting at our hospital for COVID-19 suspicion and for whom hospitalization was considered had both chest CT scan and SARS-CoV-2 RT-PCR. Diagnosis of COVID-19 relied on the positivity of the RT-PCR and CT could assist early triage in critically ill patients or with clinically overt pneumonia. Patients with a negative RT-PCR result could have a subsequent RT-PCR test and/or another chest CT during the next few days, depending on the physician’s judgment.

Patients

Consecutive adult patients attending the emergency room or the infectious diseases department of our hospital with clinical suspicion of COVID-19 and having both chest CT and SARS-CoV-2 RT-PCR were included.

Clinical and laboratory data collection

Demographic, clinical and laboratory data at presentation, and follow-up data when available were extracted from electronic medical records. Clinical data included symptoms, any need for oxygen supply, time from symptom onset to CT, comorbidities, and pre-existing pulmonary diseases. RT-PCR was performed on nasopharyngeal swabs or aspiration, using RealStar® SARS-CoV-2 RT-PCR Kit (Altona Diagnostics) or Cobas® SARS-CoV-2 Test (Roche).

Chest CT protocols

Chest CT scans were acquired on a multidetector-row CT (Aquilion One Genesis or Prime, Canon Medical Systems Corporation) without contrast medium injection. They were performed in the supine position at full inspiration. The scanning parameters were as follows: 120 kVp, automatic exposure control for tube current (SD:15), exposure time 0.27–0.35 s per rotation depending on the CT unit, collimation 40 mm. Images were reconstructed with 1-mm slice thickness and 0.8-mm inter-slice gap, using a high-frequency reconstruction algorithm.

Image analysis

All CT scans were analyzed by 8 readers, including 2 senior emergency physicians (T.P., D.B., with 10 years of experience each), 2 radiology residents (H.T., N.C., 4 and 5 years of experience), 2 senior general radiologists (L.M., E.M., 6 and 9 years of experience), and 2 senior thoracic radiologists (M.P.D., A.K., 23 and 25 years of experience), blindly to RT-PCR results and final diagnosis. All readers classified each examination into one out of 4 categories, as follows: evocative, compatible, not evocative of COVID-19, and normal following the recommendations of the French Society of Radiology [14]. The global impression of each reader was supported by previous typical and less typical reported signs in the literature. A guide was provided, recalling these signs. The “evocative” category included multifocal ground-glass opacities (GGO), being nodular or not, or crazy-paving with or without consolidations, with a bilateral, peripheral, or mixed distribution and involvement of the posterior zones. The intermediate category “compatible” corresponded to cases showing abnormalities already reported in COVID-19 but that may be encountered in other diseases or very limited in extent. It included GGO and/or consolidations with very few lesions and unilateral distribution, exclusive central distribution or absence of posterior lung area involvement, halo sign as main abnormality, and association of typical opacities, with atypical signs or other lesions. The category “not evocative” corresponded to cases showing abnormalities very rarely reported in COVID-19 or typical of another diagnosis as isolated systematized consolidation, discrete centrilobular nodules with tree-in-bud appearance or lung cavitation in favor of other lung infection, centrally distributed GGO with septal lines and pleural effusion in favor of cardiogenic pulmonary edema, peripheral reticulations with or without honeycombing, traction bronchiectasis, and GGO in favor of interstitial lung disease. This category also included non-specific abnormalities as sub-segmental atelectasis or opacities considered to be sequelae. Because the distinction of such minor non-specific or typically sequellar abnormalities from normal parenchyma may have little clinical relevance in the present study, findings were analyzed according both to the 4 categories and to the 3 categories that resulted from merging of “not evocative” and “normal.” Any disagreement between the 4 senior radiologists was analyzed in consensus of these 4 readers giving a final consensus categorization for all cases.

Finally, all chest CTs were described by one thoracic radiologist (M.P.D.) for presence and distribution of various elementary signs, as well as signs of any underlying pulmonary disease (significant pulmonary emphysema, interstitial lung disease, bronchiectasis, parenchymal sequelae, bronchial carcinoma).

Statistical analysis

Categorical variables were described by numbers and percentage for each category. The agreement between two and more than two readers was evaluated with the Cohen’s kappa coefficient and the Fleiss’ kappa, respectively, and their 95% confidence interval, which measures the excess proportion of agreement after taking chance into account. Interobserver agreement was considered poor for a kappa value < 0.20, fair for 0.21–0.40, moderate for 0.41–0.60, good for 0.61–0.80, and very good for 0.81–1.00. Comparisons between dependent kappas (e.g., for different couple of readers for the same images) were performed with bootstrapping (N = 10000 samples) and the p value corresponds to the proportion of bootstrap samples that yield a couple of kappa value in a different order than the observed one. Comparisons between independent kappa (e.g., for different levels of a categorical variable) were performed according to the method proposed in [16].

Comparison of the frequency of radiologic signs between categories was performed with the fisher exact test. All analysis were done using R v4.0.2.

Results

General description of the population

In total, 241 patients were included. Their demographic, signs at presentation, and comorbidities are in Table 1. COVID-19 was confirmed in 158 patients by RT-PCR positivity. COVID-19 was deemed likely despite two negative RT-PCR in 2 cases with clinical and CT follow-up strongly supportive of this diagnosis. Fifteen patients were considered non-COVID-19 because of at least 2 consecutive negative RT-PCR and absence of clinical and radiological signs favoring COVID-19 during follow-up. Sixty-six patients were considered non-COVID-19 with only one negative RT-PCR but including 38 with CT and/or clinical follow-up (Fig. 1).

Table 1 Clinical characteristics of 241 included patients with clinical suspicion of COVID-19
Fig. 1
figure 1

Flow chart of the study. SARS-CoV-2 RT PCR and chest CT were performed at presentation, except in 7 cases for which RT-PCR had been performed 1 to 3 days earlier. Among 158 COVID-19 confirmed cases, RT-PCR was positive at admission in 151 cases and in subsequent days during hospitalization in 7 cases.*Clinical and chest CT follow-up. **Absence of follow-up available because the patient had been transferred to another hospital or turned back home

Interobserver agreement for chest CT categorization

Kappa coefficient between all readers across the 4 CT categories was good (0.61, 95% CI 0.60–0.63). It was moderate to good for each pair of readers, significantly better between any pair of radiologists as compared to emergency physicians (p < 0.001) (Table 2). The kappa value between all readers was lower when abnormalities were unilateral as compared to bilateral lesions (p = 0.002) and in the presence of underlying pulmonary lesions (p < 0.001). It was lower for patients older as compared to those younger than 70 years (p ≤ 0.001), and when time from symptom onset to CT was shorter than 5 days (p ≤ 0.001). Observer agreement was very good between all readers for the category “evocative” (0.81, 95% CI 0.79–0.83), fair for the category “compatible” (0.32, 95% CI 0.29–0.34), moderate for each category “not evocative” and “normal” (0.56, 95% CI 0.54–0.58; and 0.58, 95% CI 0.56–0.61, respectively), and good (0.74, 95% CI 0.71–0.76) for chest CTs classified either not evocative or normal (Table 3).

Table 2 Chest CT categorization of the 8 readers and consensus reading, with observer agreement between pairs and all readers
Table 3 Interobserver agreement for each category and correlation to SARS-CoV-2 RT-PCR results

Correlation of RT-PCR and categories on chest CT

The RT-PCR positivity rate was highly significantly different among the 4 categories (p < 0.0001): 119 out of 123 (97%), 15 out of 30 (50%), 22 out of 70 (31%), and 2 out of 18 (11%) cases considered evocative, compatible, not evocative, and normal by the consensus reading had a positive RT-PCR, respectively (Table 3). The rate of RT-PCR positivity was 27% (24 out of 88) among CTs categorized either not evocative or normal.

With RT-PCR as reference, chest CT classified evocative had 75% sensitivity (95% CI 68–81%) and 95% specificity (95% CI 87–98%) whereas chest CT classified evocative or compatible had 85% sensitivity (95% CI 78–90%) and 77% specificity (95% CI 66–85%) for COVID-19. Sensitivity for evocative CT was significantly lower for patients with a delay since symptom onset lower than 5 days (68% 95% CI [60–78%] vs 83% 95% CI [73–91%], p = 0.038) and for patients with an underlying pulmonary disease (36% 95% CI [21–54%] vs 87% 95% CI [80–92%], p < 0.001), but the difference was not significant between younger and older patients (age > 70, p = 0.23).

Among 90 patients with a first negative RT-PCR, 22 were re-tested, including 5 out of 6 patients (83%) with CT considered evocative, 5 out of 17 patients (29%) with CT considered compatible, and 12 out of 67 patients (18%) with CT considered not evocative or normal. Of these subsequent RT-PCR, 2 out of 5 were positive for each category “evocative” and “compatible” and 3 out of 12 for the third category.

Clinical and chest CT findings in the different categories

CT features of the whole population, of RT-PCR positive cases, and of the different CT categories are in Table 4. The most frequent pattern of CT considered evocative was mixed with predominant GGO. Typical bilateral and peripheral distribution with posterior involvement was almost constant. Some centrally distributed lesions were associated to peripheral lesions in 72% of cases (Fig. 2). The 30 chest CTs considered compatible more frequently showed pure GGO as compared to evocative cases (p = 0.0012). Among these 30 cases, 12 and 6 showed features of an underlying pulmonary disease and of an associated pulmonary edema, respectively (Figs. 3, 4). As compared with cases classified “not evocative” (Figs. 5, 6), cases classified compatible more frequently showed a typical distribution and atypical signs were absent among those with positive RT-PCR. Time from symptom onset to CT was longer, patients were older, and need for oxygen supply was more frequent in patients whose CT was categorized evocative as compared to other patients.

Table 4 CT features in all cases, SARS-CoV-2 RT-PCR positive cases and in the different CT categories
Fig. 2
figure 2

Chest CT scan categorized “evocative of COVID-19 pneumonia” in two different patients with positive nasopharyngeal SARS-CoV-2 RT-PCR at admission and secondarly (panels a and b and panels c and e, respectively). Multifocal bilateral ground-glass opacities with subpleural and posterior predominance, associated with band-like (panel b) or more extensive consolidations (panel d)

Fig. 3
figure 3

Chest CT scan categorized “compatible with COVID-19 pneumonia,” in association with fibrosing interstitial lung disease (ILD) showing a non-specific interstitial pneumonia pattern (panel c, axial plane through the lung bases). Pure ground-glass opacities, both centrally and peripherally distributed, have appeared in the left upper lobe (panels a, b), as compared to the previous CT performed 4 months earlier (panel d). Such new opacities could be attributable to COVID-19, another infection, or acutisation of ILD. Nasopharyngeal SARS-CoV-2 RT-PCR positive

Fig. 4
figure 4

Chest CT scan categorized “compatible with with COVID-19 pneumonia,” in association with pulmonary edema, manifesting as ground-glass opacities with a predominant central distribution, septal lines, and bilateral pleural effusion (panels a, b, axial plane; panel c, coronal plane) in a patient with history of chronic renal insufficiency on dialysis. Subpleural consolidation (panel a) in the posterior zone of the right upper lobe is consistent with associated COVID-19 pneumonia. Nasopharyngeal SARS-CoV-2 RT-PCR positive

Fig. 5
figure 5

Chest CT scan categorized “not evocative of COVID-19 peumonia,” showing small consolidations in the left lower lobe associated with bronchial thickening and endobronchial filling, in favor of bronchopneumonia. Nasopharyngeal SARS-CoV-2 RT-PCR negative

Fig. 6
figure 6

Chest CT scan categorized “not evocative of COVID-19 pneumonia” showing combination of areas of ground-glass opacity and consolidation systematized in the middle lobe (arrows), evoquing a lobar pneumonia. Despite positivity of the SARS-CoV-2-RT-PCR, bacterial co-infection was suspected and the patient received antibiotics with favorable evolution

Discussion

The current study describes observer agreement for chest CT reporting in a large series of consecutive patients suspected of COVID-19. We found that categorization of CT reports was reproducible and meaningful in patients considered for hospitalization. With SARS-CoV-2 RT-PCR as reference, chest CT reported “evocative of COVID-19 pneumonia” was highly predictive of the disease in this population during the outbreak and agreement for this category between observers of various experiences and sub-specialties was very good. The positivity rate of RT-PCR was highly significantly different among the categories, suggesting that CT may help in disease likelihood stratification. It should be emphasized that almost a third of patients with chest CT classified “not evocative” had a positive RT-PCR, highlighting that no CT pattern can rule out COVID-19 in an epidemic context. As previously reported [4, 17], RT-PCR may also be positive in patients without lung abnormalities on CT.

We observed only fair observer’s agreement for the category “compatible.” This may be explained by the method used to classify patients. Thus, categorization of chest CT was mainly based on the global impression of each reader, according to the routine practice in our hospital, where more than thousand chest CTs have been performed to date in COVID-19 patients. The guide we provided to all observers before readings to help case classification partly relied on interpretation of findings which may vary according to each reader experience. Cases classified “compatible” encompassed only 12% of all, i.e., 30 cases at all. More than half of them showed features of an underlying pulmonary disease or of a mixed pattern with some opacities that could be attributable to pulmonary edema. Such mixed features complicate interpretation and categorization of findings and lower the reader’s confidence. Indeed we observed significant lower observer agreement in cases showing an underlying pulmonary disease. These mixed and complex cases are part of the routine practice.

The recent Radiological Society of North America proposal for CT findings related to COVID-19 includes 4 categories: typical, indeterminate, atypical appearance, or CT negative for pneumonia [6], the “indeterminate” category appearing similar to the “compatible” category we used in the present study. Terms and categories proposed by other radiology societies vary, according to whether or not they individualize a normal category, and to the intermediate category being named either “compatible” or “indeterminate” [13,14,15]. We herein retained 4 categories, including a normal category, following the recommendations of the French Society of Radiology but also performed an analysis on 3 categories, resulting from merging of a normal appearance of the lung parenchyma with non-specific lung abnormalities or features suggesting an alternative diagnosis, in accordance with the guidelines of the European Society of Radiology. The surprisingly only moderate interobserver agreement we observed for the category “normal” may be explained by minor disagreements between minor non-specific abnormalities, as plate-like atelectasis or even dependent-induced opacities and strictly normal findings.

By showing significant differences in the RT-PCR positivity rate among CT categories, our study supports that chest CT can participate in estimating the likelihood of COVID-19, in association with contact history, clinical presentation, and prevalence of the disease in the population [18]. The role of chest CT for patients suspected of COVID-19 is not completely established. Despite limitations in sensitivity and result delays, the RT-PCR remains the diagnostic reference and chest CT is not recommended for screening by most radiology societies [6, 15, 19, 20]. According to a recent consensus statement from the Fleischner Society [20], imaging may be indicated for diagnosis when RT-PCR is negative or unavailable in patients having risk factors for worsening or moderate-to-severe respiratory signs. In our study, most patients who had chest CT at the emergency room had indeed either moderate or severe clinical features or comorbidities. Chest CT helped addressing or transferring some patients showing a chest CT evocative of COVID-19 into the proper COVID-19 hospitalization area, especially those needing urgent decision, before the RT-PCR result was provided. However, none of the other categories could rule out the disease, even the “normal” category whose RT-PCR positivity rate was 11%. Chest CT could favor re-testing in cases with negative RT-PCR [12]. Patients with a first negative RT-PCR and a chest CT considered “compatible” tended to be more frequently re-tested and to have a subsequent PCR more frequently positive, as compared to patients with a first negative RT-PCR and a chest CT considered “not evocative.”

To date, performances of chest CT have been analyzed according to a binary consideration, i.e., CT positive or negative for COVID-19 pneumonia, using RT-PCR as reference. Most studies have reported high sensitivity, up to 97% [4, 5, 21], and specificity between 25 and 56%, with pooled sensitivity and specificity of 94% and 37%, respectively, according to a recent meta-analysis [22]. Whether positive CTs in these studies showed typical imaging features is unclear. Our results differ, chest CT classified evocative having 75% sensitivity and 95% specificity and chest CT classified evocative or compatible having 85% sensitivity and 77% specificity. These differences may be attributable to differences in CT features between an “evocative” CT and a “positive” CT as well as differences in characteristics of the population having chest CT. The prevalence of the disease in the population, severity and type of clinical presentation, time from symptom to CT, age of patients, and any underlying pulmonary pathology may modify the performances of chest CT for diagnosing COVID-19 pneumonia [20, 23]. Indeed we observed that the presence of an underlying pulmonary disease lowered the sensitivity for an evocative CT. This concerned a quarter of the whole population in our study and almost a quarter of the patients with positive RT-PCR. CT reporting in several categories seems best suited to the routine practice than a binary conclusion, when CT abnormalities may mix different types of lesions or are very limited in extent. It allows identifying a category with typical features, whose high specificity can be useful in an epidemic context, allowing relying on chest CT for diagnosis in some cases.

Our study has some limitations. Firstly, because it is monocentric and because of various presentation and prevalence of the disease around the world, caution should be taken to extrapolate CT predictive values, which vary according to the disease prevalence, to other populations and periods [24, 25]. We may assume that the high positive predictive value we report for the category “evocative” would be lower if the prevalence of COVID-19 decreased and the one of other viral pneumonia or some interstitial lung diseases, as drugs or connective tissue diseases related, increased. On the contrary, in a very low prevalence context, we may expect the negative predictive value of chest CT classified “not evocative” or “normal” would be very high and CT would be useful for triage of negative cases. Secondly, the clinical significance of CT reporting should integrate the risk level of each patient that we have not precisely taken into account, even if the study period took place during the outbreak. Thirdly, the performances of chest CT have been evaluated in comparison to the results of the RT-PCR, whose sensitivity is imperfect. Two patients with repeated negative tests but typical clinical and CT features, driving a likely diagnosis of COVID-19 although remaining uncertain, were merged in the analysis with other unequivocal negative RT-PCR cases.

In conclusion, interobserver agreement to report chest CT findings into categories for clinical suspicion of COVID-19 is good, among readers of various experience levels and sub-specialties. Chest CT can participate in estimating the likelihood of COVID-19 in patients presenting to hospital during the outbreak. CT categorized evocative of COVID-19 pneumonia were highly predictive of the disease, whereas the predictive value of CT decreased between the categories “compatible”, “not evocative” and “normal,” from 50 to 11%. Category reports need to be integrated to the clinical presentation and risk level for COVID-19.