awsaf49 / sfr-covid19-detection

Best Student Team & 4th Place solution of SIIM-FISABIO-RSNA COVID-19 Detection || Identify and localize COVID-19 abnormalities on chest radiographs

Home Page: https://www.kaggle.com/competitions/siim-covid19-detection/discussion/264243

This is a collaboration between BUET and NVIDIA

Team Members

| Name | Affiliation | Country | Position |
| --- | --- | --- | --- |
| Md Awsafur Rahman | Dept. of EEE, BUET | 🇧🇩 | Undergrad Student |
| Bishmoy Paul | Dept. of EEE, BUET | 🇧🇩 | Undergrad Student |
| Najibul Haque Sarker | Dept. of CSE, BUET | 🇧🇩 | Undergrad Student |
| Zaber Ibn Abdul Hakim | Dept. of CSE, BUET | 🇧🇩 | Undergrad Student |
| Chris Deotte | NVIDIA | 🇺🇸 | Senior Data Scientist |

Solution Reproduction

Below you can find an outline of how to reproduce our solution.

If you run into any trouble with the setup or code, or have any questions, please contact me at awsaf49@gmail.com.

0. Video Summary on YouTube

Watch the video

1. Requirements:

1.1 Hardware:

  • GPU : 4x Tesla V100
  • GPU Memory : 4x32 GiB
  • CUDA Version : 11.0
  • Driver Version : 450.119.04
  • CPU RAM : 16 GiB
  • DISK : 2 TB

1.2 Libraries:

  • python-gdcm==3.0.9.1
  • pydicom==2.1.2
  • joblib==1.0.1
  • tensorflow==2.4.1
  • torch==1.7.0
  • torchvision==0.8.1
  • numpy==1.19.5
  • pandas==1.2.4
  • matplotlib==3.4.2
  • opencv-python==4.5.2.54
  • opencv-python-headless==4.5.2.54
  • Pillow==8.2.0
  • PyYAML>=5.3.1
  • scipy==1.6.3
  • tqdm==4.61.1
  • tensorboard==2.4.1
  • seaborn==0.11.1
  • ensemble_boxes==1.0.6
  • albumentations==1.0.1
  • thop==0.0.31.post2005241907
  • Cython==0.29.23
  • pycocotools==2.0
  • addict==2.4.0
  • timm==0.4.12
  • efficientnet==1.1.1

2. External Packages

External packages, with their version numbers, are listed in requirements.txt.

! pip install -qr requirements.txt

3. Data Preparation

3.1 Description

Once the data is in place, the ./data directory should look something like this:

.
├── data
│   ├── chexpert
│   │   ├── train
│   │   ├── train.csv
│   │   ├── valid
│   │   └── valid.csv
│   ├── ricord
│   │   ├── MIDRC-RICORD
│   │   └── MIDRC-RICORD-meta.csv
│   ├── rsna-pneumonia-detection-challenge
│   │   ├── GCP Credits Request Link - RSNA.txt
│   │   ├── stage_2_detailed_class_info.csv
│   │   ├── stage_2_sample_submission.csv
│   │   ├── stage_2_test_images
│   │   ├── stage_2_train_images
│   │   └── stage_2_train_labels.csv
│   └── siim-covid19-detection
│       ├── sample_submission.csv
│       ├── test
│       ├── train
│       ├── train_image_level.csv
│       └── train_study_level.csv
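
A quick way to catch a misplaced download is to check the layout programmatically before running anything else. This is a hedged sketch rather than a script from the repository: the function name `missing_entries` is hypothetical, and the list covers only the entries from the tree above that the later steps most plausibly need.

```python
from pathlib import Path

# Entries taken from the tree above, relative to the ./data directory.
EXPECTED = [
    "chexpert/train",
    "chexpert/train.csv",
    "chexpert/valid",
    "chexpert/valid.csv",
    "ricord/MIDRC-RICORD",
    "ricord/MIDRC-RICORD-meta.csv",
    "rsna-pneumonia-detection-challenge/stage_2_train_images",
    "rsna-pneumonia-detection-challenge/stage_2_train_labels.csv",
    "siim-covid19-detection/train",
    "siim-covid19-detection/test",
    "siim-covid19-detection/train_image_level.csv",
    "siim-covid19-detection/train_study_level.csv",
]


def missing_entries(data_dir, expected=EXPECTED):
    """Return the expected relative paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_entries("./data"):
        print(f"missing: {rel}")
```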

For the complete directory structure, see data_structure.txt.

Then run prepare_data.py. It does the following:

  • Reads training data from RAW_DATA_DIR (specified in SETTINGS.json)
  • Runs the preprocessing steps
  • Saves the cleaned data to CLEAN_DATA_DIR (specified in SETTINGS.json)
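
The first and last steps both depend on paths read from SETTINGS.json. A minimal sketch of that lookup is shown here, assuming the file is a flat JSON object that maps keys such as RAW_DATA_DIR and CLEAN_DATA_DIR to path strings; the helper names `load_settings` and `data_dirs` are hypothetical and the real script may organise this differently.

```python
import json
from pathlib import Path


def load_settings(path="SETTINGS.json"):
    """Load the settings file (assumed: a flat JSON object of key -> path)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def data_dirs(settings):
    """Return (raw_dir, clean_dir), creating the clean directory if needed."""
    raw = Path(settings["RAW_DATA_DIR"])
    clean = Path(settings["CLEAN_DATA_DIR"])
    clean.mkdir(parents=True, exist_ok=True)  # output location for cleaned data
    return raw, clean
```

Reading every path from one file keeps the preparation, training, and prediction steps pointed at the same locations.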

3.2 Script

prepare_data.py

  • --img-size image size that the cleaned data should have
  • --debug if given 1, it will only process 100 images
! python prepare_data.py 
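
To illustrate how the two flags could interact, here is a small sketch. It is an assumption about the behaviour described above, not the actual prepare_data.py: the function names are hypothetical and no default image size is implied.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags described above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Illustrative data-preparation CLI")
    # No default is assumed here; the real script may define one.
    parser.add_argument("--img-size", type=int, default=None,
                        help="image size of the cleaned data")
    parser.add_argument("--debug", type=int, default=0,
                        help="if 1, process only 100 images")
    return parser.parse_args(argv)


def select_images(paths, debug):
    """Return the first 100 paths in debug mode, otherwise all of them."""
    return paths[:100] if debug == 1 else paths
```

A debug run on 100 images is a cheap way to confirm that paths and dependencies work before processing the full datasets.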

4. Training

4.1 Description

Simply run the train.py script. It does the following

  • Reads training data from TRAIN_DATA_CLEAN_PATH (specified in SETTINGS.json)
  • Pretrains classification and detection backbones on CheXpert data
  • Fine-tunes them on competition data and external data
  • Saves the models to MODEL_DIR (specified in SETTINGS.json)

4.2 Script

train.py

  • --settings-path path to SETTINGS.json. Default value uses the correct path.
  • --clsbs-path path to json file containing necessary batch sizes for different classification models. Default value uses the correct path.
  • --detbs-path path to json file containing necessary batch sizes for different detection models. Default value uses the correct path.
  • --debug will process only 100 images
! python train.py
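
The --clsbs-path and --detbs-path options point to JSON files with per-model batch sizes. Their exact schema is not shown here, so the sketch below assumes the simplest plausible form, a flat object mapping a model name to a positive integer; the function name `load_batch_sizes` is hypothetical.

```python
import json


def load_batch_sizes(path):
    """Load a {model_name: batch_size} table (assumed format) and validate it."""
    with open(path, "r", encoding="utf-8") as f:
        table = json.load(f)
    if not isinstance(table, dict):
        raise ValueError("expected a JSON object mapping model names to batch sizes")
    for name, bs in table.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(bs, bool) or not isinstance(bs, int) or bs <= 0:
            raise ValueError(f"invalid batch size for {name!r}: {bs!r}")
    return table
```

Validating the table up front surfaces a malformed file immediately instead of partway through a multi-model training run. On GPUs with less memory than the hardware listed above, these files are the natural place to lower batch sizes.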

5. Prediction

5.1 Description

Before proceeding, download these already-trained checkpoints and unzip them into the path specified by CHECKPOINT_DIR in SETTINGS.json.

The ./checkpoints directory should then look like this:

.
├── checkpoints
│   ├── 2cls
│   ├── 4cls
│   ├── det

To predict on test data, run predict.py. It does the following:

  • Reads test data from TEST_DATA_CLEAN_PATH (specified in SETTINGS.json)
  • Loads models from MODEL_DIR (specified in SETTINGS.json) when everything was trained from scratch, or from CHECKPOINT_DIR (specified in SETTINGS.json) when predicting with our previously trained checkpoints
  • Uses the models to make predictions on new samples
  • Saves the predictions to SUBMISSION_DIR (specified in SETTINGS.json)

5.2 Script

predict.py

  • --mode "full" uses the weights saved in MODEL_DIR (produced by training from scratch); "fast" uses the weights saved in CHECKPOINT_DIR (the already-trained checkpoints)
  • --debug if given 1, it will run inference on only the first 100 images
! python predict.py --mode "fast"

or

! python predict.py --mode "full"
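
The relationship between --mode and the weight directory can be summarised in a few lines. This is a sketch of the mapping described above, not the code in predict.py; the function name `weights_dir` is hypothetical and `settings` stands for the parsed contents of SETTINGS.json.

```python
# Mapping described above: which SETTINGS.json key each mode reads weights from.
MODE_TO_KEY = {
    "full": "MODEL_DIR",       # weights produced by training from scratch
    "fast": "CHECKPOINT_DIR",  # already-trained, downloaded checkpoints
}


def weights_dir(mode, settings):
    """Return the directory that holds the weights for the given mode."""
    if mode not in MODE_TO_KEY:
        raise ValueError(f"unknown mode {mode!r}; expected 'full' or 'fast'")
    return settings[MODE_TO_KEY[mode]]
```

Use "fast" to reproduce the submission without retraining, and "full" only after train.py has populated MODEL_DIR.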

Acknowledgement ❤️

About


License: GNU General Public License v3.0


Languages

Python 98.2%, Shell 1.4%, Dockerfile 0.4%