Introduction

In December 2019, a succession of cases of pneumonia of unknown cause appeared in Wuhan, Hubei Province, China. On January 7, 2020, the 2019 novel coronavirus (subsequently named SARS-CoV-2) was identified as the causative agent based on virus typing [1, 2]. Recent studies have revealed that SARS-CoV-2 is closely related to bat-SL-CoV ZC45 and bat-SL-CoV ZXC21 [2] and spreads from human to human mainly through respiratory droplets, aerosols, contact, and the fecal-oral route [3]. At present, the infection has spread across China and to other countries around the world [4,5,6].

The National Health Commission of China [7, 8] formulated the Diagnosis and Treatment Program of 2019 New Coronavirus Pneumonia (trial seventh version) based on the recommendations of the World Health Organization (WHO) on severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) [9,10,11]. In the trial seventh version, patients were classified into mild, moderate, severe, and critical clinical types according to their clinical manifestations (Table 1). The mild type was defined as mild clinical symptoms with negative imaging findings, and the critical type was defined as respiratory failure, shock, and “extrapulmonary” organ failure. Patients with critical COVID-19 pneumonia in the intensive care unit could rarely undergo CT scanning. Thus, positive CT findings were mostly obtained in patients with moderate and severe disease. Patients with mild and moderate disease have a good prognosis. However, when COVID-19 pneumonia progresses to the severe and critical types, pulmonary edema, respiratory failure, shock, and multiple organ failure can eventually cause death. Therefore, it is very important to classify the CT findings accurately and easily and to administer rapid clinical interventions accordingly.

Table 1 Clinical classification of COVID-19 pneumonia according to the trial seventh version of the National Health Commission of China program

Multiple CT findings almost always coexist in every clinical type. Thus, combinations of CT findings may be more strongly associated with clinical type than any single CT finding. Latent class analysis (LCA) has recently been tested and found to be useful for identifying latent classes (subgroups) of CT findings within multivariable datasets. However, a classification of individual patients with COVID-19 pneumonia based purely on CT characteristics has not been presented so far. Hence, the primary objective of our study was to investigate whether meaningful subgroups of patients with COVID-19 pneumonia sharing CT features could be identified using LCA, and to explore the relationship between the LCA-derived subgroups and clinical types (moderate and severe types).

Methods

Patients

This retrospective cross-sectional study was reviewed and approved by the Biomedical Research Ethics Committee of our institution, and the requirement for patient consent was waived.

The inclusion criteria were (1) a positive reverse-transcription polymerase chain reaction (RT-PCR) test confirming the viral origin of the pneumonia and (2) the availability of a chest CT at the time of diagnosis. The exclusion criteria were (1) no confirmation by RT-PCR testing, (2) no pulmonary high-resolution CT (HRCT), (3) normal lung parenchyma on chest CT, (4) non-infectious lung parenchymal lesions on chest CT (e.g., lung cancer, pneumothorax, pulmonary edema), and (5) a delay of more than 3 days between chest CT and confirmation of the clinical type. Finally, we retrospectively identified 499 consecutive patients with COVID-19 pneumonia at Wuhan Huoshenshan Hospital between February 11 and March 8, 2020 (Fig. 1). All clinical results were extracted from the patients’ electronic medical records in the hospital information system.
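The selection criteria above can be read as a simple record filter. The following is a minimal sketch under stated assumptions: the `PatientRecord` type and all of its field names are hypothetical and are not taken from the study's data dictionary.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # All field names are hypothetical, chosen for illustration only.
    rt_pcr_positive: bool          # RT-PCR confirmation of the viral origin
    has_chest_ct: bool             # chest CT available at the time of diagnosis
    normal_parenchyma: bool        # normal lung parenchyma on chest CT
    non_infectious_lesion: bool    # e.g., lung cancer, pneumothorax, pulmonary edema
    days_ct_to_clinical_type: int  # delay between CT and clinical typing


def is_eligible(p: PatientRecord, max_delay_days: int = 3) -> bool:
    """Return True if the record passes the inclusion and exclusion criteria."""
    if not (p.rt_pcr_positive and p.has_chest_ct):
        return False
    if p.normal_parenchyma or p.non_infectious_lesion:
        return False
    # A delay longer than 3 days between CT and clinical typing is excluded.
    return p.days_ct_to_clinical_type <= max_delay_days
```

A record with a positive RT-PCR test, an abnormal infectious CT, and a 2-day delay would be kept; the same record with a 4-day delay would be dropped.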

Fig. 1 Flowchart visualizing the patient selection process

CT scanning

Pulmonary CT was performed using 128-slice multidetector row CT scanners (uCT 760, United Imaging Healthcare). Non-enhanced CT scans were obtained with the following parameters: 120 kV, adaptive tube current, beam collimation of 128 × 0.6 mm, a 512 × 512 matrix, and a slice thickness of 0.625 mm. Images were viewed at window settings suited to the lung parenchyma (window level, − 600 to − 700 HU; window width, 1200–1500 HU) and the mediastinum (window level, 20–40 HU; window width, 400 HU). The scanning range extended from the superior thoracic aperture to the diaphragm.

Radiological imaging analysis

We used the original cross-sectional images for analysis. All the images were analyzed by two abdominal radiologists (X.L. and X.F., with 8 years of experience each) who were blinded to the clinical details; the final results were determined by consensus.

All lesions were evaluated for the following characteristics: (a) pulmonary infection volume: total, left, and right pulmonary infection volume, calculated by artificial intelligence software (uAI-Discover-NCP R001, United Imaging Healthcare); (b) location: right, left, or bilateral; (c) distribution: peripheral, central, or diffuse; (d) attenuation: ground glass opacity (GGO), crazy-paving pattern, consolidation, or mixed pattern [12, 13]; (e) maximum lesion range: ≤ 5 cm, 5–10 cm, or > 10 cm, measured for the largest lesion only; (f) involvement of lobes; (g) number of lesions: 1, 2, 3, or more; (h) air bronchogram; (i) dilatation of vessels; (j) hilar lymphadenopathy: short-axis diameter of lymph node > 10 mm [14]; and (k) pleural effusion.
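Two of the characteristics listed above are defined by explicit thresholds and can be encoded directly. The sketch below is illustrative only; the function names and category labels are assumptions, and it assumes that a lesion of exactly 5 cm falls in the first category and one of exactly 10 cm in the second, as the stated ranges suggest.

```python
def lesion_range_category(max_diameter_cm: float) -> str:
    """Map the largest lesion's maximum range to the three categories used for item (e)."""
    if max_diameter_cm <= 5:
        return "<=5 cm"
    if max_diameter_cm <= 10:
        return "5-10 cm"
    return ">10 cm"


def has_hilar_lymphadenopathy(short_axis_mm: float) -> bool:
    """Item (j): a lymph node counts as enlarged when its short axis exceeds 10 mm."""
    return short_axis_mm > 10
```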

Statistical analyses

Normal distribution and variance homogeneity tests were performed on all continuous variables; those with a normal distribution were expressed as the mean and standard deviation, while those with a non-normal distribution were expressed as the median and range. First, LCA was performed using R software supplemented with the mclust package. LCA is a statistical method in which a set of multivariate data is used to identify groups of related subjects (“latent classes”) that share similar characteristics. The Akaike information criterion (AIC) was used to identify the optimum number of classes in the model as the number yielding the lowest AIC value [15]. Second, all patients were divided into LCA-derived subgroups, and subgroup differences in all variables were examined. The Kruskal-Wallis H test (skewed distribution) and the chi-squared test (categorical variables) were used to determine the statistical differences among the subgroups. Third, univariate regression analysis was applied to estimate effect sizes for the relationships between all variables and the LCA-derived subgroups. Finally, multivariable logistic models were used to evaluate the associations between exposure (clinical types) and outcome (LCA-derived subgroups). These models included model 1 (not adjusted for other covariates), model 2 (adjusted for age, sex, and body mass index [BMI]), and model 3 (adjusted for the same factors as model 2 as well as for other clinical and imaging characteristics that were significantly associated in the univariate regression analysis). The moderate clinical type was used as the reference group.
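The AIC-based choice of the number of classes can be sketched as follows. This is not the study's R code: it only illustrates the standard definition AIC = 2k − 2 ln L and the "lowest AIC wins" rule. The log-likelihoods and parameter counts in the example are invented placeholders, not values from the study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(fits: dict) -> tuple:
    """Pick the number of classes with the lowest AIC.

    `fits` maps a candidate number of classes to a
    (log_likelihood, n_params) pair from an already fitted model.
    """
    scores = {k: aic(ll, p) for k, (ll, p) in fits.items()}
    return min(scores, key=scores.get), scores


# Placeholder fits for 1, 2, and 3 classes (illustrative numbers only).
example_fits = {1: (-3200.0, 10), 2: (-3050.0, 21), 3: (-3045.0, 32)}
best_k, aic_scores = best_by_aic(example_fits)
print(best_k, aic_scores[best_k])  # prints: 2 6142.0
```

In this toy example the third class improves the log-likelihood only slightly, so the penalty for its extra parameters outweighs the gain and the two-class model is preferred.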

A two-tailed p value less than 0.05 was considered statistically significant. All analyses were performed using R software (version 3.6.1, The R Foundation for Statistical Computing).

Results

Identification of latent classes

Two latent classes were identified using the AIC. There were 228 subjects (45.69%) in class 1 and 271 (54.31%) in class 2. The CT features of each of the latent classes are shown in Table 2 and Figs. 2 and 3. Compared with class 2, class 1 showed a smaller pulmonary infection volume, more frequent peripheral distribution, more frequent GGO, more frequent maximum lesion range ≤ 5 cm, fewer lesions, fewer involved lobes, and less frequent air bronchogram, dilatation of vessels, hilar and mediastinal lymph node enlargement, and pleural effusion.
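The class proportions reported above follow directly from the class sizes and the cohort total of 499 patients. A short arithmetic check, using only the counts given in the text:

```python
# Class sizes as reported in the text.
n_class1, n_class2 = 228, 271
total = n_class1 + n_class2  # 499 patients in the cohort

# Percentage of the cohort in each class, rounded to two decimals.
pct_class1 = round(100 * n_class1 / total, 2)
pct_class2 = round(100 * n_class2 / total, 2)
print(total, f"{pct_class1:.2f}", f"{pct_class2:.2f}")  # prints: 499 45.69 54.31
```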

Table 2 CT findings of latent classes identified

Fig. 2 CT findings (continuous variables) of latent classes identified

Fig. 3 CT findings (categorical variables) of lung segments of latent classes identified

Demographic and clinical differences

The demographic and clinical characteristics of the LCA-derived subgroups are shown in Table 3. Among the characteristics that we investigated, age, outcomes, fever, hypertension, lymphocyte count, and C-reactive protein (CRP) level differed significantly between the subgroups.

Table 3 Clinical characteristics of the latent classes

Univariate analysis of LCA-derived subgroups and clinical characteristics

The univariate analysis results are shown in Table 4. Older age (p < 0.0001), treatment outcome (p < 0.0001), presence of fever (p = 0.001), presence of hypertension (p = 0.006), decreased lymphocyte count (p = 0.014), and increased CRP levels (p < 0.0001) were significantly associated with an increased risk for class 2 defined by LCA (Fig. 4).

Table 4 Results of the univariate analysis

Fig. 4 a–c Chest CT findings of a 34-year-old man with moderate COVID-19 pneumonia. a CT image of the lung parenchyma showed a multi-focal crazy-paving pattern and consolidation peripherally distributed in the superior lobes of both lungs. b The lesions were automatically labeled by artificial intelligence software. c Three-dimensional volume-rendered reconstruction showed the extent of the crazy-paving pattern and consolidation with a scattered pattern. d–f Chest CT findings of a 68-year-old man with severe COVID-19 pneumonia. d CT image of the lung parenchyma showed multi-focal GGO and consolidation diffusely distributed in the middle and inferior lobes of the right lung and the inferior lobe of the left lung. e The lesions were automatically labeled by artificial intelligence software. f Three-dimensional volume-rendered reconstruction showed the extent of the crazy-paving pattern and consolidation with a scattered pattern. GGO, ground glass opacity

Multivariate analyses of the LCA-derived subgroups and clinical types

Multivariable logistic models were used to evaluate the associations between exposure (clinical types) and outcome (the LCA-derived subgroups). In the crude model (model 1), clinical types were associated with the LCA-derived subgroups (odds ratio [OR] 3.49, 95% confidence interval [CI] 2.26–5.39, p < 0.0001). In the minimally adjusted model (model 2, adjusted for age, sex, and BMI), the association remained significant (OR 3.01, 95% CI 1.92–4.70, p < 0.0001). After further adjustment for outcome, fever, hypertension, lymphocyte count, and CRP levels, the association was still significant in the fully adjusted model (model 3) (OR 1.97, 95% CI 1.09–3.54, p = 0.025). The results of the multivariate analysis are shown in Table 5.
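For readers less familiar with logistic regression output, the sketch below shows how an odds ratio and its 95% CI relate to a model coefficient and its standard error. It assumes a Wald-type interval with z = 1.96, which the text does not state explicitly; the function names are illustrative. The reported crude estimate (OR 3.49, 95% CI 2.26–5.39) is used only to show that the interval is symmetric on the log scale.

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> tuple:
    """Convert a logistic coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_or_ci(odds_ratio: float, lower: float, upper: float, z: float = 1.96) -> tuple:
    """Recover an approximate (beta, se) from a reported OR and Wald-type CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Crude model (model 1) values as reported in the text.
beta, se = coef_from_or_ci(3.49, 2.26, 5.39)
or_, lo, hi = odds_ratio_ci(beta, se)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # prints: 3.49 2.26 5.39
```

The round trip reproduces the reported interval to two decimals, which is what one would expect if the interval was computed on the log-odds scale.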

Table 5 Relationship between the clinical types and LCA-derived subgroups

Discussion

The present study aimed to investigate whether meaningful subgroups sharing CT features could be identified among patients with COVID-19 pneumonia using LCA, and to explore the relationship between the LCA-derived subgroups and clinical types. Two latent classes were identified using LCA. Furthermore, in the fully adjusted model (model 3), the LCA-derived subgroups were significantly associated with clinical types.

In the current study, we found that the total pulmonary volume was significantly larger in class 1 (median 4154.55 cm³, range 1943.48–7600.00 cm³) than in class 2 (median 3367.14 cm³, range 1386.08–8059.42 cm³). These results are consistent with the findings of previous studies. Iwasawa et al. [16] observed smaller CT lung volumes in severe cases, and ultra-high-resolution CT showed that secondary lobules in the crazy-paving pattern were smaller than those in unaffected lung. Wu et al. [17] reported that these lesions frequently pulled the adjacent pleura. Albarello et al. [18] found decreased normal lung volume in COVID-19 pneumonia patients with acute respiratory distress syndrome. These results suggest that pulmonary fibrosis destroys the alveoli, leading to local volume loss, which is more common in severe than in moderate COVID-19 pneumonia.

A few studies have explored the relationship between CT findings, clinical characteristics, and disease severity. Wu et al. [17] used the pulmonary inflammation index (PII) to evaluate the relationship between CT findings and clinical features, and found that the PII value was significantly correlated with the lymphocyte count, monocyte count, C-reactive protein level, procalcitonin level, days from illness onset, and body temperature. Li et al. [19] found that severe/critical patients were older and had a higher incidence of comorbidities, cough, expectoration, chest pain, and dyspnea; the incidences of consolidation, linear opacities, crazy-paving pattern, bronchial wall thickening, lymph node enlargement, pericardial effusion, and pleural effusion were also significantly higher in severe/critical patients than in ordinary patients. Xiong et al. [20] found that the CRP level, erythrocyte sedimentation rate, and lactate dehydrogenase level showed a significantly positive correlation with the severity of pneumonia assessed on initial CT, and that the highest temperature and the severity of opacifications assessed on initial CT were significantly related to the progression of opacifications on follow-up CT. Although these studies explored some CT and clinical characteristics related to disease severity, they did not provide an accurate and easy way to classify the CT findings.

LCA is a statistical method that takes advantage of unobserved, or latent, classes in the data that can be used to determine diagnostic performance characteristics [21]. LCA has recently been tested and found to be useful for identifying latent classes (subgroups) of radiological findings within multivariable datasets [22,23,24]. In this study, two radiological subgroups were identified based on the LCA of CT findings of patients with COVID-19 pneumonia. We found that peripheral distribution, GGO, maximum lesion range ≤ 5 cm, involvement of 1–4 lobes, no air bronchogram, and no dilatation of vessels were more common in class 1, which corresponded to the clinically moderate type. In contrast, diffuse distribution, a mixture of GGO, crazy-paving pattern, and consolidation, maximum lesion range > 10 cm, involvement of 5 lobes, air bronchogram, and dilatation of vessels were more common in class 2, which corresponded to the clinically severe type. These CT characteristics are consistent with the findings of previous studies [25,26,27,28,29,30,31].

In the current study, noteworthy differences in clinical characteristics were observed between the LCA-derived subgroups. Older age, death, fever, hypertension, decreased lymphocyte count, and increased CRP levels were more common in class 2. In the univariate analysis, older age, treatment outcome, presence of fever, presence of hypertension, decreased lymphocyte count, and increased CRP levels were significantly associated with an increased risk for class 2. To explore the relationship between the LCA-derived subgroups and the clinically moderate and severe types, multivariable logistic models were used. In the crude, minimally adjusted, and fully adjusted models, significant associations were found between the LCA-derived subgroups and the clinically moderate and severe types. In the fully adjusted model, patients with the clinically severe type had 1.97-fold higher odds of belonging to class 2 than patients with the clinically moderate type.

Our study had several limitations. First, it was a retrospective single-center study. Second, interpretation of the CT images was performed by consensus. Finally, we included only patients with the clinically moderate and severe types, which may have introduced selection bias.

Conclusions

Two radiological subgroups were identified based on the LCA of CT findings of patients with COVID-19. Demographic and clinical characteristics differed significantly between the identified subgroups. The two radiological subgroups were significantly associated with the clinically moderate and severe types.