ieee8023 / covid-chestxray-dataset

We are building an open database of COVID-19 cases with chest X-ray or CT images.

Using a different dataset for the negative class

imanpalsingh opened this issue

Goal

I'm trying to train a classifier to label an X-ray as COVID-19 positive or negative.

What I have tried

  1. Used an architecture as defined in this research:
    [Screenshot: model architecture]

  2. Used X-ray images with the AP, PA, and AP Supine views.

  3. Used images marked 'COVID-19' in the finding column as positive images.

  4. Used images marked 'No Finding' in the finding column as negative images.
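Steps 2 to 4 can be sketched as a filter over the dataset's metadata. This is a minimal sketch, assuming metadata.csv exposes view, finding, and filename columns with the values quoted above (exact values can differ between dataset versions, so check the file you have). The helper name select_images and the tiny inline CSV are hypothetical, for illustration only.

```python
import csv
import io

# Views kept in step 2 of the issue.
KEEP_VIEWS = {"AP", "PA", "AP Supine"}

def select_images(rows):
    """Split metadata rows into (positives, negatives) by the finding column.

    Assumes each row is a dict with 'view', 'finding' and 'filename' keys,
    as produced by csv.DictReader over metadata.csv (assumed layout).
    """
    positives, negatives = [], []
    for row in rows:
        if row["view"] not in KEEP_VIEWS:
            continue  # skip lateral and any other views
        if row["finding"] == "COVID-19":
            positives.append(row["filename"])
        elif row["finding"] == "No Finding":
            negatives.append(row["filename"])
        # rows with any other finding (e.g. other pneumonias) are left out
    return positives, negatives

# Tiny made-up sample standing in for metadata.csv.
SAMPLE = """filename,view,finding
a.jpg,PA,COVID-19
b.jpg,AP Supine,COVID-19
c.jpg,L,COVID-19
d.jpg,AP,No Finding
e.jpg,PA,SARS
"""

pos, neg = select_images(csv.DictReader(io.StringIO(SAMPLE)))
print(pos, neg)  # ['a.jpg', 'b.jpg'] ['d.jpg']
```

Note that every finding other than the two exact labels is simply dropped, which is what makes the negative class so small in the first place.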

Results

Because positive images vastly outnumber negative ones, the model reaches ~96% accuracy, possibly by predicting the same class every time (even with augmentation).
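To see why ~96% can be meaningless with this imbalance, compare it against a constant predictor. The sketch below uses made-up class counts (not the issue's actual numbers). It computes the accuracy and balanced accuracy (mean per-class recall) of always predicting the majority class, plus inverse-frequency class weights, which are one common way, alongside resampling, to counter imbalance in the training loss. The weight formula n_samples / (n_classes * class_count) is the same one scikit-learn uses for class_weight="balanced"; the helper names are hypothetical.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy and balanced accuracy of always predicting the most common label."""
    counts = Counter(labels)
    majority, n_major = counts.most_common(1)[0]
    accuracy = n_major / len(labels)
    # Balanced accuracy = mean per-class recall: the constant predictor gets
    # recall 1.0 on the majority class and 0.0 on every other class.
    balanced = 1.0 / len(counts)
    return majority, accuracy, balanced

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Hypothetical split: 96 positive and 4 negative images.
labels = ["positive"] * 96 + ["negative"] * 4
print(majority_baseline(labels))  # ('positive', 0.96, 0.5)
weights = inverse_frequency_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})  # {'positive': 0.521, 'negative': 12.5}
```

With these weights each class contributes the same total weight to the loss, so a model that ignores the minority class is no longer rewarded; reporting balanced accuracy or per-class recall instead of plain accuracy exposes the problem immediately.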

Next steps

To get more healthy (negative) images, I decided to use this Kaggle dataset.

My question: is it okay to combine datasets drawn from two different distributions for this classification task? Also, is my approach to the classification flawed in any way?

Using that Kaggle dataset as the negative-example dataset is a very bad idea: all of its images are of children, while this dataset is mostly adults, so your model will likely learn to predict age rather than the pathology.
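A quick sanity check for this kind of confound: if a single non-image attribute such as patient age already separates the classes, a model trained on the combined data can score highly without learning anything about the lungs. The sketch below, assuming age is known for both sources, finds the best one-threshold rule "age >= t means positive" on made-up ages (adults as positives, children as negatives). The data and the helper name best_age_threshold are hypothetical.

```python
def best_age_threshold(ages, labels):
    """Return (threshold, accuracy) of the best rule 'age >= threshold -> positive'.

    labels are 1 for positive and 0 for negative. Each observed age, plus one
    value above the maximum (i.e. 'predict all negative'), is tried as a threshold.
    """
    candidates = sorted(set(ages)) + [max(ages) + 1]
    best = (None, 0.0)
    for t in candidates:
        preds = [1 if a >= t else 0 for a in ages]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best[1]:
            best = (t, acc)
    return best

# Hypothetical mix: positives from an adult dataset, negatives from a pediatric one.
ages = [45, 60, 38, 71, 52, 3, 5, 2, 4, 6]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(best_age_threshold(ages, labels))  # (38, 1.0)
```

If a rule this trivial reaches near 100%, an image model has an easy shortcut available (children's chest X-rays look visibly different from adults'), and its test accuracy says little about COVID-19 detection.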
I would check out the RSNA Pneumonia challenge dataset.
There are two papers linked at the top of the repo that are related to this issue.
I also suggest you read our paper about this dataset, which discusses the possible tasks and their clinical value: https://arxiv.org/abs/2006.11988