Jiachenn99 / brats-pretraining_jiachenn

Improve BraTS results using pretrained models. Forked for the COMP3003 Individual Dissertation undertaken at the University of Nottingham, as part of the BSc (Hons) Computer Science with Artificial Intelligence programme.

COMP3003 Individual Dissertation

Credit for large parts of the original code goes to Jonas Wacker (https://github.com/joneswack/brats-pretraining), published with the paper Transfer Learning for Brain Tumor Segmentation on arXiv.

This repository contains code to reproduce our proposed extension (Extending Upon a Transfer Learning Approach) of the original project.

Recommended Hardware Specifications

Disk space: We highly recommend allocating at least 40GB of disk space to store both the original and preprocessed data in the next few steps.

GPU: We use an RTX 2080 Ti (11GB) for our experiments. A smaller GPU will require a smaller batch size, while a larger GPU will allow a larger one.

Memory: We recommend allocating 24GB of memory for these experiments.

Setup and Installation

To run this project, a Python version between Python 3.0 and Python 3.6.9 must be installed on your system. We recommend downloading Python 3.6.8 from the official source, as this is the version of Python we use in our project.
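
To confirm which version the python (or python3) command resolves to before proceeding:

$ python --version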

We will provide two instruction sets for setup, one for running on the University's GPU Cluster, and one for running locally on your own computer.

NOTE: python and python3 on the command line may be interchangeable depending on your installation method.

This setup section has been tested and confirmed to work on the University's GPU Cluster running CentOS Linux 7 (Core), as well as on Ubuntu 20.04.2 LTS and Windows 10 Home. On Ubuntu, kindly adapt the commands to use python3.6 as the command prefix.

Setting up a Virtual Environment and Installing Dependencies

We recommend installing all our packages and dependencies in a virtual environment to prevent conflicts with other possible packages installed on the base system.

To set up all the dependencies required to run this project, the following commands can be run in your terminal (Command Prompt, Bash) from the root directory of this project, brats-pretraining_jiachenn. We recommend using the pip package manager to install our dependencies, as some are not available on Anaconda/Conda.

Running on the University's GPU Cluster

You can create a virtual environment on the GPU Cluster with the commands in the next section if you want to avoid package conflicts. Otherwise, there is no need, as Carey automatically creates a virtual environment on sign-in and you can simply install all packages with:

$ python -m pip install --upgrade pip
$ pip install -r requirements.txt

Running Locally

To create and activate a virtual environment using the venv package that is shipped with Python 3.3 and above:

$ python -m venv virtualenv
$ source virtualenv/bin/activate (for Unix-based systems)
$ .\virtualenv\Scripts\activate (for Windows-based systems)

Run the following commands to install dependencies using pip:

$ python -m pip install --upgrade pip
$ pip install -r requirements.txt

Setting up PyTorch and CUDA

To install the version of PyTorch and CUDA that our project uses, simply run the following commands in sequence:

Running on the University's GPU Cluster

$ module load nvidia/cuda-10.1
$ module load nvidia/cudnn-v7.6.5.32-forcuda10.1
$ python -m pip install --upgrade pip
$ SRC=https://download.pytorch.org/whl/torch_stable.html
$ pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f $SRC

Running Locally on your Computer

$ python -m pip install --upgrade pip
$ SRC=https://download.pytorch.org/whl/torch_stable.html
$ pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f $SRC

This installs PyTorch 1.5.1 and torchvision 0.6.1 built for CUDA 10.1 from the source specified.
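
To verify the install succeeded and that PyTorch can see your GPU, a quick check using the standard PyTorch API:

import torch

print(torch.__version__)          # expect 1.5.1+cu101
print(torch.cuda.is_available())  # True if a CUDA-enabled GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))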

Dataset and Preprocessing

Obtaining our dataset

We use both the Training and Validation datasets from the Multimodal Brain Tumour Segmentation Challenge 2020 (BraTS2020) in our experiments. The Training dataset consists of 369 cases, while the Validation dataset consists of 125 cases. We use the Validation dataset as our Test dataset due to the unavailability of the actual Test dataset.

The dataset folder can be downloaded from Google Drive or OneDrive via either of the links.

The downloaded folder should be extracted into the root directory of this project. The resulting path for the dataset folder should be brats-pretraining_jiachenn/dataset, containing the two folders MICCAI_BraTS2020_TrainingData and MICCAI_BraTS2020_ValidationData.

Pre-processing our dataset

Our dataset is first preprocessed into a format the code accepts, using the BraTS preprocessing example from batchgenerators/examples/brats2017, courtesy of Fabian Isensee.

The first step is to check the config.py file. The file works with relative paths to the downloaded dataset and destination folders, so absolute paths are not required. However, there are subtle differences between Windows-based and Unix-based systems due to the forward slash / used in Unix paths and the escaped backslash \\ used in Windows paths.

For a Windows-based system, your config.py should look like this:

brats_preprocessed_destination_folder_train_2020 = "brats_data_preprocessed\\Brats20TrainingData"
brats_folder_with_downloaded_data_training_2020 = "dataset\\MICCAI_BraTS2020_TrainingData"

brats_preprocessed_destination_folder_test_2020 = "brats_data_preprocessed\\Brats20ValidationData"
brats_folder_with_downloaded_data_test_2020 = "dataset\\MICCAI_BraTS2020_ValidationData"

num_threads_for_brats_example = 8

For a Unix-based system, your config.py should look like this:

brats_preprocessed_destination_folder_train_2020 = "brats_data_preprocessed/Brats20TrainingData"
brats_folder_with_downloaded_data_training_2020 = "dataset/MICCAI_BraTS2020_TrainingData"

brats_preprocessed_destination_folder_test_2020 = "brats_data_preprocessed/Brats20ValidationData"
brats_folder_with_downloaded_data_test_2020 = "dataset/MICCAI_BraTS2020_ValidationData"

num_threads_for_brats_example = 8
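
Alternatively, if you prefer a single config.py that works on both systems, here is a minimal sketch using os.path.join (variable names match those above; note that preprocessing.py still distinguishes operating systems via its -os flag, so treat this only as an optional convenience):

import os

# os.path.join picks the correct separator for the current OS
brats_preprocessed_destination_folder_train_2020 = os.path.join("brats_data_preprocessed", "Brats20TrainingData")
brats_folder_with_downloaded_data_training_2020 = os.path.join("dataset", "MICCAI_BraTS2020_TrainingData")

brats_preprocessed_destination_folder_test_2020 = os.path.join("brats_data_preprocessed", "Brats20ValidationData")
brats_folder_with_downloaded_data_test_2020 = os.path.join("dataset", "MICCAI_BraTS2020_ValidationData")

num_threads_for_brats_example = 8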

Run the commands provided below to perform the preprocessing step. Note that the data is preprocessed into .npy files, which take up significantly more disk space than the original .nii.gz files (22GB of preprocessed training data compared to 3GB of raw training data!).

The preprocessing command differs between Windows and Unix-based systems due to their path differences.

For Unix-based systems (Linux, MacOS), run the following:

$ python preprocessing.py -type Training
$ python preprocessing.py -type Test 

For Windows-based systems, run the following:

$ python preprocessing.py -type Training -os Windows
$ python preprocessing.py -type Test -os Windows

All raw data will be preprocessed into the directories brats_data_preprocessed/Brats20TrainingData and brats_data_preprocessed/Brats20ValidationData respectively.
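
To sanity-check that preprocessing produced readable output, a small sketch (the array layout follows the batchgenerators BraTS example, where the modalities and the segmentation are stacked along the first axis; treat the exact shape as an assumption):

import pickle
import numpy as np

# Load one preprocessed case and inspect its shape
data = np.load("brats_data_preprocessed/Brats20TrainingData/BraTS20_Training_001.npy")
print(data.shape)  # assumed (5, x, y, z): 4 modalities plus segmentation

# The accompanying .pkl holds per-case metadata saved by the preprocessing script
with open("brats_data_preprocessed/Brats20TrainingData/BraTS20_Training_001.pkl", "rb") as f:
    metadata = pickle.load(f)
print(metadata)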

Training and Predictions

Running the training process

We perform our training on the University's GPU cluster, which uses the Slurm scheduler. However, we provide commands both for running on the scheduler and for running locally, provided you have a CUDA-enabled GPU.

Training is performed over 50 epochs, and the model is saved after the final epoch. The arguments that can be passed to the training command can be displayed with python main.py -h, which brings up the following:

Flags that require a value

The general format for these flags is -flag <value> or --flag <value>, e.g. --epochs 50

-name : Name of the model

--batch_size: Batch size (default: 12 for RTX2080Ti 11GB)

--patch_depth: Patch depth to resize the image to (default: 24)

--patch_width: Patch width to resize the image to (default: 128)

--patch_height: Patch height to resize the image to (default: 128)

--training_max: Number of patients from the training set to train on (range 0 - 369)

--training_batch_size: Size of training minibatch (default: 100)

--validation_batch_size: Size of validation minibatch (default: 100)

--brats_train_year: Year of the BraTS Training dataset

--brats_test_year: Year of the BraTS Testing dataset

--num_channels: Number of input channels to the model (default: 4)

--learning_rate: Sets the learning rate for the model training (default: 1e-3)

--epochs: Number of training epochs (default: 50)

--seed: Seed for PyTorch weight initialization (can be any value)

Toggle-basis flags

These flags default to a certain value when omitted from the training command; e.g. adding --no_gpu to the training command uses the CPU instead of the GPU for training. An illustrative command combining these flags follows the list.

--no_validation: Choose whether to use validation set

--no_gpu: If selected, uses CPU for training. Else, uses GPU

--no_multiclass: If enabled, trains only on Tumor Core; otherwise trains on all provided classes.

--no_pretrained: If selected, does not use a pretrained model
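
For example, an illustrative local run that trains on the CPU without a pretrained encoder (the flag values here are hypothetical, not our reported configuration):

$ python main.py -name albunet_cpu_test --batch_size 2 --epochs 50 --no_gpu --no_pretrained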

If you are running on the University's GPU cluster using the Slurm scheduler, use the following command to run our configuration:

$ sbatch run_experiments_3d.sh

However, if you are running locally on a CUDA-enabled device, use this command to run our configuration:

$ python3 main.py -name albunet_4_channels_1 --batch_size 12 --num_channels 4 --seed 1		

When training is complete, the resulting model is saved to brats-pretraining_jiachenn/saved_models under the name specified in the -name argument.
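
If you want to inspect a saved model outside the provided scripts, a hedged sketch (the exact filename and save format, full module versus state_dict, are assumptions; check saved_models/ for the actual artifact):

import torch

# torch.load unpickles whatever was saved; adjust if the repository stores a state_dict
checkpoint = torch.load("saved_models/albunet_4_channels_1", map_location="cpu")
print(type(checkpoint))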

Producing segmentation output

To produce the segmentation output, we require one crucial piece of information: the model name. We have standardized the model names for ease of reproducing our results, so the commands below can be run as-is. Test set predictions are saved to segmentation_output/<model name>.

Flags that require a value

-model_name: Name of the model

--patch_depth: Patch depth of the image (default: 24)

--patch_width: Patch width of the image (default: 24)

--patch_height: Patch height of the image (default: 24)

--epochs_max: Number of epochs the model trained (default: 50)

--brats_test_year: Year of the BraTS Testing dataset

-testing_train_set: Whether to use the training set as the prediction set (default: 0, i.e. the training set is not used)

--num_channels: Number of input channels of the model (default: 4)

Toggle-basis flags

These flags default to a certain value when omitted from the command; e.g. adding --no_gpu uses the CPU instead of the GPU.

--no_gpu: If selected, uses the CPU; otherwise uses the GPU

--no_multiclass: If enabled, only the Tumor Core class is used; otherwise all provided classes are used

If you are running on the University's GPU cluster with the Slurm scheduler:

$ sbatch produce_seg_output_saved_models.sh

If you are running locally:

$ python3 run_saved_models.py -model_name albunet_4_channels_1 --num_channels 4

Evaluation of Results

To evaluate our results, the segmentation output has to be uploaded to the CBICA Image Processing Portal. You must first create an account and await approval from the administrators. Once the account has been approved, kindly select BraTS'20 Validation Data: Segmentation Task under the MICCAI BraTS 2020 section and upload the segmentation labels into the space provided. It will take some time for the portal to process the results; the output is a .zip file containing log files and a stats_validation_final.csv file with our results.

Common issues: Sometimes the .zip file comes back without stats_validation_final.csv; in this case, simply create another job and upload the labels again.
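
Once stats_validation_final.csv has been downloaded, a quick way to summarize it with pandas (metric column names such as Dice_ET, Dice_WT and Dice_TC follow the portal's usual output and are an assumption here):

import pandas as pd

stats = pd.read_csv("stats_validation_final.csv")
print(stats.columns.tolist())  # inspect the actual column names first

# Mean Dice per tumor region, assuming the usual BraTS metric columns
print(stats[["Dice_ET", "Dice_WT", "Dice_TC"]].mean())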

A few Jupyter Notebooks have been created under brats-pretraining_jiachenn/Jupyter_Notebooks to provide interpretations of the results (a command for opening them follows the list). The notebooks are as follows:

  1. Dice-Plots.ipynb - Interprets our results
  2. Read-Logs.ipynb - Provides graphs of training and validation loss progress
  3. Seg-Graphic.ipynb - Visualizes the segmentation output from segmentation_output/ and provides options to save figures.
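
The notebooks can be opened with Jupyter from the project root (assuming Jupyter is available in your environment; pip install notebook adds it if not):

$ jupyter notebook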

Common Errors

  1. CUDA out of memory error

    Try reducing the batch size from 12 to a lower number, as shown below. We recommend an RTX 2080 Ti (11GB) or larger for our configuration.
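
    For example, rerunning our local training command with half the batch size (the value is illustrative):

    $ python3 main.py -name albunet_4_channels_1 --batch_size 6 --num_channels 4 --seed 1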

Expected Directory Structure

The structure of this repository should be as follows:

brats-pretraining_jiachenn
│   .gitignore
│   brats_data_loader.py
│   config.py
│   Dice-Plots.ipynb
│   Fixing Preprocessing.ipynb
│   loss.py
│   main.py
│   models.py
│   preprocessing.py
│   produce_seg_output_saved_models.sh
│   Read-Logs.ipynb
│   README.md
│   requirements.txt
│   run_experiments_3d.sh
│   run_saved_models.py
│   Seg-Graphic.ipynb
│   tb_log_reader.py
│   train_test_function.py
│   Trying_Resize.ipynb
│
├───brats_data_preprocessed
│   ├───Brats20TrainingData
│   │       BraTS20_Training_001.npy
│   │       BraTS20_Training_001.pkl
│   │
│   └───Brats20ValidationData
│           BraTS20_Validation_001.npy
│           BraTS20_Validation_001.pkl
│
├───CSV_Results
│       README.md
│
├───dataset
│   ├───MICCAI_BraTS2020_TrainingData
│   │   └───BraTS20_Training_001
│   │           BraTS20_Training_001_flair.nii.gz
│   │           BraTS20_Training_001_seg.nii.gz
│   │           BraTS20_Training_001_t1.nii.gz
│   │           BraTS20_Training_001_t1ce.nii.gz
│   │           BraTS20_Training_001_t2.nii.gz
│   │
│   └───MICCAI_BraTS2020_ValidationData
│       └───BraTS20_Validation_001
│               BraTS20_Validation_001_flair.nii.gz
│               BraTS20_Validation_001_t1.nii.gz
│               BraTS20_Validation_001_t1ce.nii.gz
│               BraTS20_Validation_001_t2.nii.gz
│
├───figures
│       README.md
│
├───Images
│       brats2020 ipp.png
│       job section.png
│
├───saved_models
│       README.md
│
├───segmentation_output
├───tensorboard_logs
└───Unused_old_files
        augmented_test_data.py
        ternaus_unet_models.py
