avinsit123 / AIC_Project_Final_Notebooks

Final Project Notebooks for AIC Project

RetinaFed: Exploring Novel Techniques for Predicting Diabetic Retinopathy through Federated Learning

Group 15 Members: Avinash Swaminathan, Kanika Dhiman, Sagar Nandkumar Badlani, Shivam Agarwal

Introduction

In this project, we explore different federated learning approaches for training distributed models on private healthcare data. We study the federated learning setup for diabetic retinopathy prediction and examine how different data distributions affect the performance of our models.

Federated Learning

The proposed federated learning process consists of three steps:

  1. Local Training - Each hospital serves as a local client that trains its local model on the unique dataset it hosts. The weights of the trained model are shared with a central server.
  2. Global Aggregation - The central server is connected to several clients (hospitals) and receives a locally trained model from each of them. For additional privacy, these weights can be encrypted while being sent over the network. The central server uses one of several strategies to create a global model from the local models.
  3. Evaluation - The global model is evaluated on a validation/test set to determine its F1 score. It is then sent to each of the hospitals/clients, where it replaces the local model.
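The aggregation step above can be illustrated with a minimal sketch. It assumes sample-weighted averaging of client weights (the FedAvg rule) and represents each model as a plain dictionary of float lists; the function name `fedavg`, the layer names, and the hospital sizes are hypothetical and are not taken from the project notebooks.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average each parameter across clients, weighted by local sample count."""
    total = sum(client_sizes)
    aggregated: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Two hypothetical hospitals holding 100 and 300 samples respectively.
hospital_a = {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}
hospital_b = {"fc.weight": [3.0, 6.0], "fc.bias": [4.0]}

global_weights = fedavg([hospital_a, hospital_b], client_sizes=[100, 300])
print(global_weights)  # {'fc.weight': [2.5, 5.0], 'fc.bias': [3.0]}
```

The larger hospital pulls the average toward its own weights, which is why unequal sample counts per client matter for the global model.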

The global model can be created using several strategies. We explore the FedAvg, FedProx and FedOpt strategies in our work.
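As an illustration of how FedProx differs from FedAvg on the client side, the sketch below adds the proximal term (mu / 2) * ||w - w_global||^2 to a local loss value. The function names and the toy weight vectors are hypothetical; only the value mu = 0.8 comes from the Hyperparameters section of this README.

```python
from typing import Sequence


def proximal_penalty(local_w: Sequence[float], global_w: Sequence[float], mu: float) -> float:
    """FedProx proximal term: (mu / 2) * squared L2 distance to the global weights."""
    squared_distance = sum((l - g) ** 2 for l, g in zip(local_w, global_w))
    return 0.5 * mu * squared_distance


def fedprox_loss(task_loss: float, local_w: Sequence[float],
                 global_w: Sequence[float], mu: float) -> float:
    """Local objective = task loss + proximal penalty."""
    return task_loss + proximal_penalty(local_w, global_w, mu)


# Toy example: the local weights have drifted away from the global weights.
local_weights = [1.0, 2.0]
global_weights = [0.0, 0.0]
total_loss = fedprox_loss(task_loss=0.5, local_w=local_weights,
                          global_w=global_weights, mu=0.8)
print(total_loss)
```

With mu = 0 the penalty vanishes and the local objective reduces to the plain task loss, as in FedAvg.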

Dataset

  • Dataset: Diabetic Retinopathy 224x224 Gaussian Filtered
  • Source: Kaggle
  • Training-Validation Split: 80%-20%
  • Dataset Size: 3662 images of 224 x 224 pixels
  • Classes:
    • No Diabetic Retinopathy: 1805
    • Mild: 370
    • Moderate: 999
    • Severe: 193
    • Proliferative Diabetic Retinopathy: 295
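The class counts above are heavily imbalanced. The short sketch below checks that they add up to the stated dataset size and computes each class's share, along with the 80%-20% split sizes. Rounding the training size down is an assumption; the notebooks may split differently.

```python
# Class counts as listed in this README.
class_counts = {
    "No Diabetic Retinopathy": 1805,
    "Mild": 370,
    "Moderate": 999,
    "Severe": 193,
    "Proliferative Diabetic Retinopathy": 295,
}

total = sum(class_counts.values())
shares = {name: count / total for name, count in class_counts.items()}

# Assumption: the training split is rounded down to a whole number of images.
train_size = int(0.8 * total)
val_size = total - train_size

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"total={total}, train={train_size}, val={val_size}")
```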

Scenarios

The table below lists the data distribution scenarios we experimented with.

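| Scenario | Difficulty | Classes per Client | Samples per Client |
|----------|----------------------|--------------------|--------------------|
| Case 1 | Easiest Case (I.I.D) | Equal (=5) | Equal |
| Case 2 | Easy Case | Random | Equal |
| Case 3 | Average Case | Random | Random |
| Case 4 | Average Case | 4 | Random |
| Case 5 | Average Case | 3 | Random |
| Case 6 | Hard Case | 2 | Random |
| Case 7 | Hardest Case | 1 | Random |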
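To make the "classes per client" scenarios concrete, here is a minimal partitioning sketch. It is not the project's DataSplitterMethods implementation: the function name, the round-robin assignment of classes to clients, and the toy labels are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict
from typing import List, Sequence


def split_k_classes_per_client(labels: Sequence[int], num_clients: int,
                               classes_per_client: int, seed: int = 0) -> List[List[int]]:
    """Give each client samples from only `classes_per_client` classes."""
    rng = random.Random(seed)
    classes = sorted(set(labels))

    # Assumption: classes are assigned round-robin, so client c holds
    # classes c, c+1, ... (modulo the number of classes).
    client_classes = [
        {classes[(c + j) % len(classes)] for j in range(classes_per_client)}
        for c in range(num_clients)
    ]

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled samples out to the clients that hold it.
    client_indices: List[List[int]] = [[] for _ in range(num_clients)]
    for cls in classes:
        holders = [c for c in range(num_clients) if cls in client_classes[c]]
        indices = by_class[cls]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            if holders:  # empty only when too few clients exist to cover cls
                client_indices[holders[pos % len(holders)]].append(idx)
    return client_indices


# Toy data: 50 samples spread evenly over 5 classes, split as in Case 6.
toy_labels = [i % 5 for i in range(50)]
parts = split_k_classes_per_client(toy_labels, num_clients=5, classes_per_client=2)
for client, indices in enumerate(parts):
    print(client, sorted({toy_labels[i] for i in indices}), len(indices))
```

Setting `classes_per_client=1` reproduces the hardest case, where no client ever sees more than one class.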

Methodology

System Architecture

[Figure: System Architecture]

Hyperparameters

| Hyperparameter | OpenFL | Flower |
|----------------|--------|--------|
| Model | GoogleNet (Inception-V1) with single FC layer | AlexNet with single FC layer |
| Learning Rate | 0.001 | 0.001 |
| Optimizer | Adam (FedOpt, FedAvg) | Adam (FedOpt, FedProx, FedAvg) |
| Mu | 0.8 (FedProx) | 0.8 (FedProx) |
| Number of Collaborators / Clients | 5 | 5 |
| Implementation Library | PyTorch | PyTorch |
| Learnable Parameters | 5125 | 20485 |
| Batch Size | 16 | 8 |
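The learnable parameter counts in the table are consistent with training only a single fully connected layer on top of a frozen backbone. The check below assumes the standard feature widths feeding that layer (1024 for GoogLeNet, 4096 for AlexNet) and 5 output classes; the frozen-backbone reading is an inference from the numbers, not something stated in the notebooks.

```python
def linear_layer_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


NUM_CLASSES = 5

# Assumption: feature widths of the standard architectures.
GOOGLENET_FEATURES = 1024
ALEXNET_FEATURES = 4096

googlenet_params = linear_layer_params(GOOGLENET_FEATURES, NUM_CLASSES)
alexnet_params = linear_layer_params(ALEXNET_FEATURES, NUM_CLASSES)

print(googlenet_params)  # 5125
print(alexnet_params)    # 20485
```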

Experiment Results

All values denote model accuracy on the global validation set.

OpenFL

| Scenario | FedAvg* | FedOPT* | FedProx* |
|----------|----------|----------|----------|
| Case 1 | 0.722372 | 0.699015 | 0.61705 |
| Case 2 | 0.692098 | 0.71618 | 0.496689 |
| Case 3 | 0.75039 | 0.755319 | 0.747816 |
| Case 4 | 0.71875 | 0.683089 | 0.692641 |
| Case 5 | 0.692641 | 0.678082 | 0.720419 |
| Case 6 | 0.641992 | 0.696118 | 0.755319 |
| Case 7 | 0.72861 | 0.475783 | 0.507422 |

Baseline: 0.791667
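One way to read the OpenFL table is to pick the best strategy per scenario and measure its gap to the baseline. The snippet below does this using only the values listed above; the variable names are illustrative.

```python
# Accuracy values copied from the OpenFL results table.
openfl_accuracy = {
    "Case 1": {"FedAvg": 0.722372, "FedOPT": 0.699015, "FedProx": 0.61705},
    "Case 2": {"FedAvg": 0.692098, "FedOPT": 0.71618, "FedProx": 0.496689},
    "Case 3": {"FedAvg": 0.75039, "FedOPT": 0.755319, "FedProx": 0.747816},
    "Case 4": {"FedAvg": 0.71875, "FedOPT": 0.683089, "FedProx": 0.692641},
    "Case 5": {"FedAvg": 0.692641, "FedOPT": 0.678082, "FedProx": 0.720419},
    "Case 6": {"FedAvg": 0.641992, "FedOPT": 0.696118, "FedProx": 0.755319},
    "Case 7": {"FedAvg": 0.72861, "FedOPT": 0.475783, "FedProx": 0.507422},
}
BASELINE = 0.791667

# Best-scoring strategy for each scenario.
best_per_case = {
    case: max(scores, key=scores.get) for case, scores in openfl_accuracy.items()
}

for case, best in best_per_case.items():
    accuracy = openfl_accuracy[case][best]
    print(f"{case}: best={best} ({accuracy:.4f}), gap to baseline={BASELINE - accuracy:.4f}")
```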

[Figure: OpenFL comparison]

Flower

| Scenario | FedAvg* | FedOPT* | FedProx* |
|----------|----------|----------|----------|
| Case 1 | 0.722372 | 0.699015 | 0.61705 |
| Case 2 | 0.692098 | 0.71618 | 0.496689 |
| Case 3 | 0.75039 | 0.755319 | 0.747816 |
| Case 4 | 0.71875 | 0.683089 | 0.692641 |
| Case 5 | 0.692641 | 0.678082 | 0.720419 |
| Case 6 | 0.641992 | 0.696118 | 0.755319 |
| Case 7 | 0.72861 | 0.475783 | 0.507422 |

Baseline: 0.791667

[Figure: Flower comparison]

Code

OpenFL

Dataset

Download the dataset from Kaggle (Link) into the dataset folder. Then run the following commands from the main directory:

cd dataset
unzip archive.zip
cd gaussian_filtered_images/gaussian_filtered_images/

All the OpenFL experiments can be run using the OpenFL-Diabetic-Demo-CustomDataDistributor notebook. To experiment on a particular data distribution, use the data_splitter function:

import DataSplitterMethods
from openfl.federated import FederatedDataSet

# Choose a data distribution using one of the scenario strings listed below.
data_splitter = DataSplitterMethods.SplitFunctionGenerator("Equal-Equal-Split")

# Pass the splitter to FederatedDataSet through the train_splitter argument.
fl_data = FederatedDataSet(train_data, train_labels, test_data, test_labels,
                           batch_size=batch_size, num_classes=num_classes,
                           train_splitter=data_splitter)

In the notebook, you only need to change the string in the data splitter cell and then run the entire notebook.

| Scenario | String |
|----------|--------|
| Case 1 | Equal-Equal-Split |
| Case 2 | Random-Equal-Split |
| Case 3 | Random-Unequal-Split |
| Case 4 | 4-Class-per-collab-split |
| Case 5 | 3-Class-per-collab-split |
| Case 6 | 2-Class-per-collab-split |
| Case 7 | 1-Class-per-collab-split |
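If you want to loop over scenarios instead of editing the string by hand, the table above can be captured in a small lookup. The dictionary and helper below are hypothetical conveniences and are not part of the notebooks; only the strings themselves come from this README.

```python
# Scenario number -> splitter string, copied from the table above.
SCENARIO_SPLITTERS = {
    1: "Equal-Equal-Split",
    2: "Random-Equal-Split",
    3: "Random-Unequal-Split",
    4: "4-Class-per-collab-split",
    5: "3-Class-per-collab-split",
    6: "2-Class-per-collab-split",
    7: "1-Class-per-collab-split",
}


def splitter_string(case: int) -> str:
    """Return the splitter string for a scenario number (1 to 7)."""
    if case not in SCENARIO_SPLITTERS:
        raise ValueError(f"Unknown scenario: {case}; expected 1 to 7")
    return SCENARIO_SPLITTERS[case]


print(splitter_string(6))  # 2-Class-per-collab-split
```

The returned string can then be passed to DataSplitterMethods.SplitFunctionGenerator in the notebook.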

Flower

Note: Google Colab and Google Drive are needed for executing the Diabetic Retinopathy experiments using the Flower federated learning framework. This environment choice is dictated by the memory requirements and installation dependencies.

Dataset

Download the dataset from Kaggle (Link) into an archive folder and upload that folder to your Google Drive. The resulting file structure will be as follows:

[Figure: File Structure]

Code Execution

All data distribution experiments for the Flower FedAvg aggregation strategy can be executed by running the /Flower/Flower_Diabetic_Demo_Custom_Model_Fed_Avg_With_Images.ipynb notebook.

All data distribution experiments for the Flower FedProx aggregation strategy can be executed by running the /Flower/Flower_Diabetic_Demo_Custom_Model_Fed_Prox_With_Images.ipynb notebook.

All data distribution experiments for the Flower FedOpt aggregation strategy can be executed by running the /Flower/Flower_Diabetic_Demo_Custom_Model_Fed_Opt_With_Images.ipynb notebook.
