Volume 14, Issue 16, pp. 4059-4066
Research Article

Ensemble learning-based COVID-19 detection by feature boosting in chest X-ray images

Kamini Upadhyay (corresponding author) and Monika Agrawal
Centre for Applied Research in Electronics, IIT Delhi, New Delhi, India

Desh Deepak
Department of Respiratory Medicine, Dr. Ram Manohar Lohia Hospital, New Delhi, India
First published: 18 February 2021

Abstract

The novel coronavirus has spread rapidly across the globe, and the current testing rate is failing to match the exponential rise in cases. Moreover, the available testing methodologies are expensive and time-consuming, making a sensitive automated diagnosis one of the biggest needs of the hour. In the proposed work, the authors analyse the chest X-ray images of normal, pneumonia and coronavirus disease-2019 (COVID-19) patients and process them to boost the COVID-specific features (opacities etc.), which enables sensitive identification of COVID-19 patients. The sets of original and processed images are used with a stack of pre-trained deep models for ensemble learning. VGG-16 models serve as base learners, trained with a diverse set of inputs, followed by a logistic regression model, the meta learner, which combines the base-learner predictions. The proposed fusion-based model is trained and tested for three types of classification, TYPE-I: binary (NORMAL/ABNORMAL), TYPE-II: binary (PNEUMONIA/COVID-19) and TYPE-III: multi-class (NORMAL/PNEUMONIA/COVID-19). The diagnosis results are quite promising, with high accuracy and sensitivity values in all the cases. The proposed algorithm can assist medical experts in the quick identification and isolation of COVID-19 patients, thereby mitigating the spread of the virus.

1 Introduction

In December 2019, an unknown, highly contagious respiratory illness originated in Wuhan, China, and spread worldwide within a span of a few months [[1]]. On 11 March 2020, the World Health Organisation [[2]] declared the illness caused by severe acute respiratory syndrome coronavirus 2, termed coronavirus disease-2019 (COVID-19), a pandemic owing to its exponentially rising case count. Presently, the world is going through a severe crisis in which people are suffering and dying while a vaccine is still awaited. Apart from the development of a vaccine, the next best effort to mitigate the effect of the virus is to increase the rate of testing. Conducting a large number of tests can help in isolating COVID-positive cases from non-infected people, thereby breaking the transmission chain of the virus. Currently, the most commonly accepted COVID-19 diagnostic method is the real-time reverse transcription–polymerase chain reaction (RT–PCR) test [[3]]. However, these tests are expensive and take time to produce results. They also have a sensitivity (Sen) of 71–98% and need at least a biosafety level 2 laboratory to be performed correctly [[4]]. These conditions associated with RT–PCR tests are making the situation worse: not only developing countries with huge population densities, but almost all nations are facing difficulties in conducting such tests at the required rate. Thus, an efficient alternative COVID-19 test (quick and economical) is needed to curb this menace.

The novelty of COVID-19 makes its diagnosis a challenge. The most commonly reported clinical features of the infection include high fever, sore throat, fatigue, muscle pain and shortness of breath [[5]]. However, these symptoms are also said to vary with time and place due to the observed mutations in the virus structure [[6]]. These fluctuations make symptomatic diagnosis less reliable. Another modality that is becoming quite popular for detecting COVID-19 and monitoring its progression is diagnostic radiology [[7]], which includes chest X-rays (CXR) and high-resolution computerised tomography (HRCT) scans. These imaging modalities help in observing the visible effects of the virus on the pulmonary region. Rodriguez-Morales et al. [[8]] reported bilateral ground-glass opacities (GGOs) in chest X-ray images of COVID-19 patients. Salehi et al. [[9]] observed bilateral multilobar GGO with a peripheral or posterior distribution in computerised tomography (CT) scans. Ng et al. [[7]] reported GGO with occasional consolidation in the peripheries of the lungs. Zhao et al. [[10]] found GGO or mixed GGO in most positive cases; they also observed vascular dilation in the form of a lesion. Most of these observations commonly indicate the presence of GGO, followed by consolidations, in the peripheral regions of the lungs. HRCT scans are reported to have much higher sensitivity (Sen) and specificity (Spe) than CXR images [[7]]. However, CXR images display focused information, mainly the area of interest. Chest X-rays cause much less radiation exposure than CT scans, and X-ray machines are more easily available and accessible, even at community levels. Moreover, a CXR scan is more economical than a CT scan and can be performed by less trained staff. In some cases, CXR abnormalities are observed to appear even before a positive RT–PCR test [[11]]. Therefore, in the proposed work, we use CXR images to develop the COVID-19 detection algorithm.

2 Related work

Medical imaging has paved the way for the extraction of a wealth of intriguing, otherwise hidden information. Machine-aided image analysis helps in studying this two-dimensionally captured information in depth. Deep models assist in object detection, segmentation and various predictions, and thereby in diagnosis. Researchers have designed intelligent models that learn physical properties from images: for example, Gao et al. [[12]] proposed a model that perceives blood flow dynamics from static coronary CT angiography images, and Graham et al. [[13]] designed dense steerable filter convolutional neural networks (CNNs) for breast tumour classification, colon gland segmentation and multi-tissue nuclear segmentation. Images can reveal a lot if examined in apt directions. However, machines may sometimes be over-confident in their predictions; in clinical practice, it is therefore necessary to report the diagnosis together with proper confidence measures [[14]].

For COVID-19 diagnosis, researchers are exploring different lung imaging modalities to understand and detect the disease. Several research groups have reported successful COVID-19 detection using new deep learning models, existing models, transfer learning and various ensemble learning methods. Narin et al. [[15]] used the pre-trained CNN models InceptionV3, ResNet50 and InceptionResNetV2 to detect coronavirus-affected lungs in CXR images. Apostolopoulos and Mpesiana [[16]] also used transfer learning to detect COVID-19, using a dataset of 1427 X-ray images. Zhang et al. [[17]] developed a deep learning anomaly detection model for COVID-19 screening based on CXR images. Hall et al. [[18]] proposed ensemble learning with pre-trained ResNet50 and VGG-16 models plus a small CNN model. Wang et al. [[19]] used 453 CT images of COVID-19 and pneumonia patients to train an Inception model. Hemdan et al. [[20]] proposed a novel CNN-based architecture, COVIDX-Net, to diagnose COVID-19; this network includes seven different CNN models to analyse and classify CXR images. Chowdhury et al. [[21]] performed a comparative study of the classification of CXR images using different pre-trained models. Wang and Wong [[22]] proposed a computationally efficient model, COVID-Net, which is based on the projection–expansion–projection design pattern. Ozturk et al. [[23]] proposed DarkCovidNet to diagnose COVID-19 using CXR images. Sethy and Behera [[24]] classified CXR images using a support vector machine along with the ResNet50 model. Ucar and Korkmaz [[25]] tuned SqueezeNet for COVID-19 diagnosis on chest X-ray images using Bayesian optimisation. Abbas et al. [[26]] validated their CNN model, called Decompose, Transfer and Compose (DeTraC), for COVID-19 chest X-ray image classification. Evidently, a significant amount of work has been done to diagnose COVID-19. However, in the case of transfer learning, the models are pre-trained on non-medical data, and the newly proposed deep models are either computationally demanding or may lead to over-fitting due to data scarcity. Thus, we need an optimised model that can target COVID-specific features while simultaneously handling the data limitations.

The novelty of COVID-19 leaves us with very little data for any modality. The challenges include data scarcity, limited understanding of COVID-specific features and public data highly unbalanced towards non-COVID cases. Relying only on a machine trained with a handful of data can be risky in a medical crisis. On the other side, if we opt for purely unsupervised analysis of images using conventional image processing methods, we need many more insights about the appearance and progression of the disease, which are still awaited. Keeping both of these aspects in mind, we combine the advantages of supervised and unsupervised image analysis in an optimised manner. Radiological images sometimes lead to false diagnoses because COVID- and pneumonia-affected lungs appear very similar; with various image processing techniques, we can enhance the COVID-specific abnormalities and make them more distinguishable from similar non-COVID cases. Furthermore, deep learning can be used to understand the pattern of appearance of the disease and its progression, as machines can dig into much deeper details than is possible for humans. In this paper, we combine deep learning with unsupervised COVID-specific feature boosting. We prefer transfer learning due to data scarcity and ensemble learning to fuse the deep and unsupervised portions of the algorithm. The proposed algorithm has produced promising results, especially in terms of Sen, which is the most crucial measure in disease diagnosis.

3 Proposed method

In this paper, we propose a sensitive COVID-19 detection method using CXR images. The key points on which we focussed while developing this algorithm are as follows:
  • (1) We have data scarcity due to the novelty of the disease. Deep models may overfit without learning anything in such cases. Transfer learning methods can play a very useful role here.

  • (2) However, pre-trained models used for transfer learning are generally trained on non-medical data, thus we cannot solely depend on them to classify CXR images.

  • (3) The model may overlook some important COVID-specific features. Therefore, to resolve this issue, we use some handcrafted COVID-specific feature enhanced images to train the model explicitly.

  • (4) Also, we cannot deny the fact that machines can go much deeper and make use of pattern-based and other high-level features for classification. So, we train the model with raw images too.

  • (5) We need to combine the learning from both handcrafted and raw images, in an optimised manner.

  • (6) Moreover, significant similarities in the appearance of pneumonia and COVID-19 infections cannot be ignored. So, we include pneumonia images in our study for a robust classification.

These points motivate us in developing an efficient solution for COVID-19 detection using CXR images.

In the proposed method, we use CXR images of normal (healthy), pneumonia and COVID-19 patients. The signs and symptoms of pneumonia and COVID-19 mostly overlap and are sometimes almost indistinguishable; including pneumonia images in our study therefore makes the diagnosis more robust. We use the original set of red–green–blue (RGB) CXR images, along with two more handcrafted COVID-specific feature-boosted image sets, to train three separate deep models. This robust training of the base models (which were previously trained on non-medical data) leads to a promising classification of CXR images. Using the original images, we allow the first base model to independently learn the hidden patterns of the disease; the two handcrafted sets let the remaining two base models focus more on COVID-specific opacities. The three models learn separately from their exclusive sets of images and make independent predictions. These predictions are combined using a new model (learner) to fuse the benefits of deep models and COVID-specific feature boosting; the combined prediction is the final optimised result. Fig. 1 presents the block diagram of the proposed algorithm, and the following sub-sections discuss its stages in detail.

Fig. 1 Block diagram of the proposed COVID-19 detection algorithm using chest X-ray images, depicting the stages of the methodology: input CXR images, handcrafting of COVID-specific feature images and ensemble learning

3.1 Handcrafting COVID-specific feature images

As discussed in Section 1, COVID-19-affected lungs are observed to have bilateral peripheral opacities with haziness, referred to as GGOs and consolidations [[8]-[10]]. This paper exploits the opacity feature and enhances the input CXR images with appropriate image processing to widen the gap between normal, pneumonia and COVID-19 CXR images. We term these enhanced output images handcrafted feature images. Thus, we have three sets of images, explained in the following sub-sections.

3.1.1 SET-1: CXR images in RGB colourspace

Publicly available chest X-ray images are generally in RGB colourspace. We consider these original images as our SET-1. This set is used along with the other pre-processed, enhanced image sets to avoid any information loss due to pre-processing. Figs. 2a–c present three original sample CXR images of a normal, pneumonia and COVID-19 patient, respectively.

Fig. 2 Illustration of original CXR images with corresponding hue, saturation, value (HSV) images: (a),(d) normal; (b),(e) pneumonia; (c),(f) COVID-19

We performed various image processing experiments [[27]] on sample CXR images from the three classes. Contrast-related changes appeared to disturb the haziness feature of the input images. To understand texture-based differences between the classes, we extracted local standard deviation and entropy features in a 9 × 9 neighbourhood; however, these features have very close values for all the classes and thus cannot be used for class differentiation. In other experiments, we analysed the CXR images in different colour spaces, studied gradient-based features, tried colour corrections etc.

3.1.2 SET-2: CXR images in HSV colourspace

We observe that the opacities are more clearly visible in HSV colourspace than in the original input images. Figs. 2d–f show the sample CXR images of the normal, pneumonia and COVID-19 classes in HSV colourspace, respectively. This colourspace conversion of CXR images highlights the differences amongst the classes. We generate a new set of HSV images, SET-2 (S2), corresponding to each original RGB input image of SET-1 (S1), for the training purpose. For RGB to HSV conversion, we use the standard procedure [[28]]:
  • Let X(r, s, t) denote the RGB input image, with r, s denoting the two-dimensional spatial coordinates and t the colour channel:

    R(r, s) = X(:, :, 1);  G(r, s) = X(:, :, 2);  B(r, s) = X(:, :, 3). (1)

  • As 8 bits are used for each of the red, green and blue channels, each value is divided by 255 to normalise the channels to the range [0, 1], i.e.

    R_norm(r, s) = R(r, s)/255; (2)
    G_norm(r, s) = G(r, s)/255; (3)
    B_norm(r, s) = B(r, s)/255. (4)

  • Find the maximum and minimum of the normalised R, G, B values at each pixel:

    C_max(r, s) = max(R_norm(r, s), G_norm(r, s), B_norm(r, s)) (5)
    C_min(r, s) = min(R_norm(r, s), G_norm(r, s), B_norm(r, s)) (6)

  • Assign V(r, s) = C_max(r, s).

  • Calculate the difference between maximum and minimum, denoted by δ(r, s):

    δ(r, s) = C_max(r, s) − C_min(r, s) (7)

  • Calculate S(r, s) as

    S(r, s) = δ(r, s)/C_max(r, s) (8)

  • H(r, s) is evaluated based on the values of C_max(r, s) and C_min(r, s); the normalised channels are used and the spatial arguments (r, s) are dropped for brevity:

    H = (1/6) ×
        { (G − B)/δ,      if C_max = R and C_min = B
          1 + (R − B)/δ,  if C_max = G and C_min = B
          2 + (B − R)/δ,  if C_max = G and C_min = R
          3 + (G − R)/δ,  if C_max = B and C_min = R
          4 + (R − G)/δ,  if C_max = B and C_min = G
          5 + (B − G)/δ,  if C_max = R and C_min = G
          undefined,      if C_max = C_min } (9)

Now, the H, S and V channels obtained above are used to produce an image in HSV colourspace, denoted by X′(r, s, t), i.e. X′(:, :, 1) = H(:, :), X′(:, :, 2) = S(:, :) and X′(:, :, 3) = V(:, :).
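A minimal Python sketch of this conversion: matplotlib's rgb_to_hsv implements eqs. (5)-(9) for float inputs in [0, 1], so only the normalisation step, eqs. (2)-(4), needs to be written explicitly. The helper name is ours.

    import numpy as np
    from matplotlib.colors import rgb_to_hsv

    def to_hsv(rgb_u8):
        # Convert an 8-bit RGB CXR image (H x W x 3) to HSV colourspace
        rgb = rgb_u8.astype(np.float32) / 255.0  # normalisation, eqs. (2)-(4)
        return rgb_to_hsv(rgb)                   # H, S, V each in [0, 1], eqs. (5)-(9)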

3.1.3 SET-3: Prewitt edge image set

Moreover, we observed that the haziness due to infection leads to a loss of basic Prewitt edges [[27], [29]]; specifically, the peripheral edges are observed to be missing in COVID-19 CXR images. Figs. 3d–f show the Prewitt edge images corresponding to the original CXR images of the normal, pneumonia and COVID-19 patients, respectively. The loss of peripheral edges in Fig. 3f may be accounted for by the presence of bilateral peripheral opacities and haziness. Thus, these Prewitt edge images can be very useful in learning the differences between the classes. This motivates us to generate another set of images, SET-3 (S3), with Prewitt edge images (denoted by X″(r, s)) corresponding to each RGB input image. We now have three sets of CXR images: S1 (raw images), S2 (HSV images) and S3 (Prewitt edge images). Both of the chosen pre-processing steps are validated in the experimental section.

Fig. 3 Illustration of original CXR images with corresponding Prewitt edge images: (a),(d) normal; (b),(e) pneumonia; (c),(f) COVID-19

For creating the Prewitt edge image set (S3), we convolve the greyscale CXR image, denoted by X_grey(r, s), with two 3 × 3 kernels to obtain the horizontal and vertical edge images X_grey^h(r, s) and X_grey^v(r, s), respectively:

    X_grey^h(r, s) = [ +1 0 −1 ; +1 0 −1 ; +1 0 −1 ] * X_grey(r, s) (10)
    X_grey^v(r, s) = [ +1 +1 +1 ; 0 0 0 ; −1 −1 −1 ] * X_grey(r, s) (11)

where * denotes the convolution operator and the kernels are written row by row. The horizontal and vertical edges are combined to produce the required Prewitt edge image X″(r, s):

    X″(r, s) = sqrt( (X_grey^h(r, s))² + (X_grey^v(r, s))² ) (12)
Thereby, with these simple transformations, we complete the pre-training stage of the proposed algorithm. Next, we use these sets for ensemble learning.
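A short sketch of eqs. (10)-(12) using SciPy; the helper name is ours.

    import numpy as np
    from scipy.ndimage import convolve

    KH = np.array([[+1, 0, -1],
                   [+1, 0, -1],
                   [+1, 0, -1]], dtype=np.float32)   # horizontal kernel, eq. (10)
    KV = np.array([[+1, +1, +1],
                   [ 0,  0,  0],
                   [-1, -1, -1]], dtype=np.float32)  # vertical kernel, eq. (11)

    def prewitt_edges(x_grey):
        # Prewitt edge magnitude of a greyscale image, eq. (12)
        xh = convolve(x_grey, KH)
        xv = convolve(x_grey, KV)
        return np.sqrt(xh ** 2 + xv ** 2)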

3.2 Ensemble learning: stacked generalisation

We have created three sets of CXR images for the detection of COVID-19: for each original CXR image, we have handcrafted two more images for COVID-specific feature boosting, i.e. SET-1 contains the original (raw) CXR images, SET-2 the images in HSV colourspace and SET-3 the Prewitt edge images. For an effective fusion of these sets, we first train a separate deep model on each set so that it can learn the set's high-level features. Three base models, or base learners, denoted BM1, BM2 and BM3, infer from SET-1, SET-2 and SET-3, respectively. Giving each of these independent models a different input set adds diversity to the base predictions. Each base model learns from its own set of images and makes independent predictions.

The predictions from the base level need to be combined in the best possible manner to give a single final prediction at the meta level. We use stacked generalisation-based ensemble learning [[30]] to combine these predictions in an optimised manner. Predictions from the stacked base models are used to train another model, known as the meta learner (or meta model). The meta model acts as a secondary classifier, which learns the best-weighted combination of the base predictions and thereby fuses the base models into a single predictor model. The pseudo-code for the complete algorithm is given in Algorithm 1 (see Fig. 4), and a compact sketch of the procedure follows.
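For illustration, a minimal Python sketch of the stacked-generalisation procedure is given below. It assumes Keras-style fit/predict interfaces on the base models; the function and variable names are ours, not from Algorithm 1.

    import numpy as np

    def fit_stack(base_models, train_sets, y_train, hold_sets, y_hold, meta_model):
        # Step 1: train each base learner on its own input set (SET-1/2/3)
        for bm, x in zip(base_models, train_sets):
            bm.fit(x, y_train)
        # Step 2: base predictions on held-out images become the meta-features
        meta_X = np.hstack([bm.predict(x) for bm, x in zip(base_models, hold_sets)])
        # Step 3: the meta learner learns the best combination of base predictions
        meta_model.fit(meta_X, y_hold)
        return base_models, meta_model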

Fig. 4 Algorithm 1: pseudo-code for the proposed algorithm

3.3 Database

This paper uses two open-source databases for experimentation. Cohen et al. [[31]] supplied a dataset of CXR and CT images of patients with COVID-19 or similar symptoms; this database is continuously being updated to facilitate researchers working on COVID-19 detection. We extract 191 CXR images of patients with COVID-19, in posterior–anterior and anteroposterior views, for our work.

For normal and pneumonia-affected CXR images, we have referred to another public database available on Kaggle. This dataset has a total of 5863 CXR images belonging to normal and pneumonia classes. It is a subset of the National Institutes of Health CXR-14 dataset [[32]]. We have sampled 382 normal and 191 pneumonia images from this dataset for our experiment.

Apart from these two open-source databases, we have also tested the proposed algorithm on six locally collected COVID-19 CXR images.

4 Experiment

The proposed COVID-19 detection algorithm is the solution to a CXR image classification problem. As the infection is novel and has pneumonia-like features, we include pneumonia images, along with normal and COVID-19 images, in our experiment. Moreover, to rule out any ambiguity, we perform three types of classification, termed
  • TYPE-I: NORMAL and ABNORMAL;

  • TYPE-II: PNEUMONIA and COVID-19;

  • TYPE-III: NORMAL, PNEUMONIA and COVID-19.

Thus, TYPE-I and TYPE-II are binary classification problems, whereas TYPE-III is a multi-class classification separating the CXR images into three classes. As COVID-19 CXR images most closely resemble pneumonia CXR images, we have added the TYPE-II and TYPE-III classifications to get better insights into the proposed method. The open-source databases are unbalanced, biased towards particular classes; in this algorithm, we have therefore extracted and used balanced sets of images to train our model in the different cases. For TYPE-I classification, we use 382 normal and 382 abnormal CXR images, where the abnormal set has equal numbers of pneumonia (191) and COVID-19 (191) images. For TYPE-II, we use 191 pneumonia and 191 COVID-19 CXR images. For TYPE-III, we use equal numbers of normal, pneumonia and COVID-19 CXR images, i.e. 191 images for each of the three classes. We perform data augmentation by rotating the images, as sketched below.
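A minimal rotation-augmentation sketch in Keras; the rotation bound of 15° is an illustrative assumption, as the exact angles are not specified here.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Random rotations of up to +/-15 degrees (the bound is assumed, not reported)
    augmenter = ImageDataGenerator(rotation_range=15)
    train_batches = augmenter.flow(x_train, y_train, batch_size=32)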

The data used in this work come from different sources and thus need to be normalised. We resize all the images to 224 × 224 × 3 and adjust the pixel intensities to the range [0, 1]. Corresponding to this normalised set of inputs, we derive two more sets of images for COVID-specific opacity enhancement. Thus, we have three sets of images: SET-1, the normalised RGB CXR images; SET-2, the normalised CXR images in HSV colourspace; and SET-3, the Prewitt edge images corresponding to each image of SET-1. Though the algorithm appears to require many inputs, the two additional sets are derived from the original RGB images with simple pre-processing steps (see the sketch below), so the overhead is small. Moreover, these extra inputs bring diversity to the learners, which improves the performance of the proposed method significantly, so they are well worth it.
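A compact sketch of this pipeline, reusing the to_hsv and prewitt_edges helpers sketched in Section 3.1; replicating the single-channel edge image across three channels is our assumption, made so that every set matches the 224 × 224 × 3 input size.

    import cv2
    import numpy as np

    def make_input_sets(images_u8):
        # Build SET-1 (RGB), SET-2 (HSV) and SET-3 (Prewitt) from raw uint8 images
        s1, s2, s3 = [], [], []
        for img in images_u8:
            img224 = cv2.resize(img, (224, 224))
            rgb = img224.astype(np.float32) / 255.0
            s1.append(rgb)                            # SET-1: normalised RGB
            s2.append(to_hsv(img224))                 # SET-2: HSV colourspace
            edge = prewitt_edges(rgb.mean(axis=-1))   # SET-3: Prewitt edges
            edge = edge / (edge.max() + 1e-8)         # rescale to [0, 1]
            s3.append(np.repeat(edge[..., None], 3, axis=-1))  # 3-channel copy
        return np.stack(s1), np.stack(s2), np.stack(s3)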

This work is based on ensemble learning, where we train three identical base models (VGG-16) with the aforementioned sets of images. Furthermore, to use the outputs of these base models in the best possible way, they are combined using a meta model (logistic regression), which remains efficient even with limited data. We need to avoid over-fitting at both training levels (base and meta). At the base level, we split the data into training (80%) and testing (20%) sets; the training set is again split into training (80%) and validation (20%) subsets for the base models. Later, the testing set kept aside at the base level is fed to the base models, and their predictions are used to train the meta model. This two-stage split is sketched below.
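A sketch of the two-stage split with scikit-learn; the stratification and random seed are our choices, not from the paper.

    from sklearn.model_selection import train_test_split

    # Base level: 80% training / 20% testing
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    # The base training set is split again: 80% fitting / 20% validation
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=0.2, stratify=y_train, random_state=0)
    # X_test is held back; base predictions on it later train the meta model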

At the base level, we use the VGG-16 [[33]] network, pre-trained on the ImageNet database [[34]]. After freezing the layers with pre-trained weights, we append average pooling and fully connected layers, with a sigmoid activation function for TYPE-I and -II and softmax for TYPE-III. Fig. 5 illustrates the frozen and fine-tuned layers of this model, with the output size of each layer on top. Owing to transfer learning, we need to train only 32,962 of the total 14,747,650 parameters in the case of binary classification, and 33,027 of 14,747,715 in the case of multi-class classification. We use binary and categorical cross-entropy as the loss functions and the Adam optimiser; a minimal sketch of one base model under these settings is given below. Figs. 6–8 depict the learning curves over 50 epochs for each individual base model for TYPE-I, -II and -III classification, respectively. The continuously decreasing losses and increasing accuracy (Acc) show that the base models have not saturated by 50 epochs.
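A minimal Keras sketch of one base model; the 64-unit hidden layer is inferred from the reported trainable-parameter counts, and its ReLU activation is our assumption.

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
    from tensorflow.keras.models import Model

    base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False                      # freeze the pre-trained layers

    x = GlobalAveragePooling2D()(base.output)   # appended average pooling
    x = Dense(64, activation="relu")(x)         # 512*64 + 64 = 32,832 parameters
    out = Dense(2, activation="sigmoid")(x)     # 64*2 + 2 = 130 (total 32,962)
    model = Model(base.input, out)              # use Dense(3, "softmax") for TYPE-III
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])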

Fig. 5 Illustration of frozen and fine-tuning layers of the VGG-16 base model

Fig. 6 Learning curves (training and validation) for each base model under TYPE-I classification: (a) base model 1; (b) base model 2; (c) base model 3

Fig. 7 Learning curves (training and validation) for each base model under TYPE-II classification: (a) base model 1; (b) base model 2; (c) base model 3

Fig. 8 Learning curves (training and validation) for each base model under TYPE-III classification: (a) base model 1; (b) base model 2; (c) base model 3

After training the base models, we need the meta model to learn the best combination of base predictions. As discussed earlier, to avoid over-fitting, we use the test set kept aside at the base level to generate new predictions and train the meta model. In this paper, we use logistic regression [[35]] as our meta model, as sketched below.
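A sketch of the meta level with scikit-learn; it assumes the Keras base models return class probabilities from predict, and all variable names are ours.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Held-out base predictions (one block per input set) form the meta-features
    meta_X = np.hstack([bm.predict(x) for bm, x in
                        zip((bm1, bm2, bm3), (set1_test, set2_test, set3_test))])
    meta_model = LogisticRegression(max_iter=1000).fit(meta_X, y_labels)  # integer class ids
    fused_pred = meta_model.predict(meta_X)  # fused (final) predictions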

In this paper, for evaluating the performance of the proposed algorithm, we use five metrics, namely accuracy (Acc), sensitivity (Sen), specificity (Spe), precision (Pr) and F1-score (F1). The mathematical definitions of these measures are given below:
  • Acc = (TP + TN)/(TP + TN + FP + FN)

  • Sen = TP/(TP + FN)

  • Spe = TN/(TN + FP)

  • Pr = TP/(TP + FP)

  • F1 = (2 × Pr × Sen)/(Pr + Sen)

where TP is the number of true positives; TN, true negatives; FP, false positives; and FN, false negatives.

Acc gives the proportion of correct predictions out of the total number of predictions made. The other four metrics are class-dependent: in this paper, we evaluate them w.r.t. ABNORMAL for TYPE-I and COVID-19 for TYPE-II and -III classifications. Spe measures the correctness in classifying NORMAL cases (TYPE-I), PNEUMONIA cases (TYPE-II) and NORMAL and PNEUMONIA cases combined (TYPE-III). A high value of Sen implies a low number of false negatives, which is of utmost importance in any disease diagnosis. Pr denotes the fraction of classifications to a particular class that are correct. The F1-score evaluates the proposed method on the basis of the numbers of false positives and false negatives, which is all the more crucial when it comes to the classification of medical images.
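These definitions translate directly into code; a small helper function (ours) computes all five metrics from confusion-matrix counts.

    def classification_metrics(tp, tn, fp, fn):
        acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
        sen = tp / (tp + fn)                   # sensitivity (recall)
        spe = tn / (tn + fp)                   # specificity
        pr = tp / (tp + fp)                    # precision
        f1 = 2 * pr * sen / (pr + sen)         # F1-score
        return acc, sen, spe, pr, f1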

We evaluate the performance of the proposed method at both levels (base and meta) in terms of the above-defined performance metrics. Table 1 presents the results of the proposed algorithm on our testing data. The performance of base models 2 and 3 in comparison with base model 1 validates the applied pre-processing steps. Moreover, it can be clearly seen that the combined model (meta model) performs better than each individual model (base models 1, 2 and 3) for all three kinds of classification, TYPE-I, -II and -III: the value of each performance metric is maximum in the last row, corresponding to the meta model (fused prediction), for each type. After fusion, all the performance measures are above 90%; in fact, the Sen of the proposed method is above 96% in all cases, which is quite promising for a medical problem. To get better insight into the performance of the proposed model, we also evaluate the confusion matrix corresponding to each model and type, shown in Figs. 9–11. These matrices also confirm the improvement of the fusion model over each individual model: the false negatives are minimised while the true positives are maximised, which is the most crucial requirement in our case.

Table 1. Performance of the proposed COVID-19 detection algorithm
Classification Model Acc Sen Spe Pr F1
TYPE-I Base model 1 0.948 0.961 0.934 0.937 0.949
Base model 2 0.954 0.922 0.987 0.986 0.953
Base model 3 0.928 0.935 0.921 0.923 0.929
Fusion 0.974 0.961 0.987 0.987 0.974
TYPE-II Base model 1 0.974 0.949 1.000 1.000 0.974
Base model 2 0.909 0.872 0.947 0.944 0.906
Base model 3 0.935 0.872 1.000 1.000 0.932
Fusion 0.987 0.974 1.000 1.000 0.987
TYPE-III Base model 1 0.826 0.868 0.974 0.943 0.904
Base model 2 0.826 0.895 0.974 0.944 0.919
Base model 3 0.809 0.868 0.974 0.943 0.904
Fusion 0.887 0.974 0.974 0.949 0.961
  • The overall resulting values of the system are highlighted in bold.
Fig. 9 Confusion matrices for TYPE-I classification: (a) base model 1; (b) base model 2; (c) base model 3; (d) fusion model

Fig. 10 Confusion matrices for TYPE-II classification: (a) base model 1; (b) base model 2; (c) base model 3; (d) fusion model

Fig. 11 Confusion matrices for TYPE-III classification: (a) base model 1; (b) base model 2; (c) base model 3; (d) fusion model

Further, to gain more insight, we compare the performance of the proposed algorithm with other CXR-based COVID-19 detection methods. Researchers have worked with different formulations to detect COVID-19. Methods based on pre-trained models [[15], [18], [21], [24]], although quicker to train, are risky, as the models were previously trained on an entirely different (non-medical) kind of data. COVID-specific deep models [[17], [20], [23]] are computationally demanding. Table 2 provides a comparison between the proposed approach and earlier existing methods; it can be seen that the proposed method outperforms the other state-of-the-art approaches. The main reason behind the better performance of the proposed method is the use of handcrafted feature boosting to enhance COVID-specific features. This helps the pre-trained deep model focus on our problem-specific features along with the other high-level features that only a machine can learn. The pre-training stage involves simple transformation steps (creating SET-2 and SET-3), which improve the algorithm's performance by quite a margin. For a normalised comparison with other methods, we also trained and tested the proposed model for one more type of classification, normal versus COVID-19 (which can be considered TYPE-IV). The overall fusion Acc values corresponding to all types of classification are given in Table 2.

Table 2. Performance comparison of the proposed COVID-19 detection algorithm with other methods tested on CXR images
Method Samples Model Accuracy
Sethy and Behera [[24]] 25 COVID-19, 25 normal ResNet50+SVM 0.953
Zhang et al. [[17]] 106 COVID-19, 107 normal CAAD model 0.728
Hemdan et al. [[20]] 25 COVID-19, 25 normal COVIDX-Net 0.900
Li and Zhu [[36]] 179 normal, 179 COVID-19, 179 pneumonia DenseNet 0.889
Narin et al. [[15]] 50 COVID-19, 50 normal DeepCNN+ResNet50 0.980
Hall et al. [[18]] 135 COVID-19, 320 pneumonia ResNet50+VGG16+CNN 0.912
Chowdhury et al. [[21]] 423 COVID-19, 423 normal, 423 pneumonia AlexNet+ResNet18+DenseNet201+SqueezeNet 0.983
Ozturk et al. [[23]] 125 COVID-19, 500 normal DarkCovidNet 0.981
500 normal, 125 COVID-19, 500 pneumonia DarkCovidNet 0.870
Proposed method 191 COVID-19, 191 normal VGG-16 (base) + logistic regression (meta) 0.984
382 normal, 382 abnormal VGG-16 (base) + logistic regression (meta) 0.974
191 COVID-19, 191 pneumonia VGG-16 (base) + logistic regression (meta) 0.987
191 normal, 191 COVID-19, 191 pneumonia VGG-16 (base) + logistic regression (meta) 0.887
  • The values corresponding to our proposed system are highlighted in bold.

The proposed method is implemented on an NVIDIA TESLA P100, Kaggle's free GPU. Training takes approximately 700 s, considerably less than many other state-of-the-art methods, as we use pre-trained models in the proposed work.

As mentioned in Section 3.3, apart from testing on the open-source databases, we have cross-tested the proposed algorithm on six privately collected, local COVID-19 CXR images. The proposed algorithm predicted all of these images correctly, i.e. with 100% Acc. This cross-testing demonstrates the robustness of the proposed algorithm on diverse datasets. The proposed algorithm has produced quite promising results in terms of the Acc and Sen measures, which are two very crucial performance measures in a balanced classification problem. The lack of data due to the novel nature of the coronavirus may require us to retrain the model; our experience with the other open-source images and the privately obtained images looks promising, but the real picture will emerge when more data become available.

5 Conclusion and future work

This work proposes a highly sensitive COVID-19 detection algorithm using chest X-ray images. The classification of images is done in three (plus one) ways: NORMAL versus ABNORMAL, PNEUMONIA versus COVID-19 and NORMAL versus PNEUMONIA versus COVID-19 (and NORMAL versus COVID-19). The CXR images are analysed to enhance the COVID-specific bilateral GGOs, which enables the handcrafting of two more feature-boosted image sets from the original set. These sets bring diversity to the base models in ensemble learning; the meta model learns from the predictions of the base models and combines them. With the limited data available, the proposed method fuses the advantages of deep learning and unsupervised image analysis to produce quite promising prediction results on the test images, especially in terms of minimal false negatives, i.e. high Sen.

In the future, we intend to combine other modalities with chest X-rays to build a more robust COVID-19 detection algorithm. We also intend to carry out a more detailed handcrafted feature-based image analysis to extract further COVID-19-specific features.