kelenlv / SSL-FL

Self-supervised federated learning for medical imaging

Self-supervised Federated Learning (SSL-FL)

Label-Efficient Self-Supervised Federated Learning for Tackling Data Heterogeneity in Medical Imaging


TL;DR: A PyTorch implementation of the self-supervised federated learning framework proposed in our paper, which simulates self-supervised classification on multi-institutional medical imaging data using federated learning.

  • Our framework employs masked image encoding as the self-supervised task to learn efficient representations from images.
  • Extensive experiments are performed on diverse medical datasets, including retinal images, dermatology images, and chest X-rays.
  • In particular, we implement BEiT and MAE as the self-supervised learning modules.
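As an illustration of the masked-image-encoding idea (MAE, for example, masks a large random fraction of image patches and trains the model to reconstruct them), here is a minimal sketch of the random patch-masking step. This is illustrative only, not code from this repository:

```python
import random

def random_mask(num_patches, mask_ratio, seed=None):
    """Choose which patch indices to mask, MAE-style.

    For a 224x224 image split into 16x16 patches,
    num_patches = (224 // 16) ** 2 = 196.
    Returns (masked_indices, visible_indices).
    """
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    return sorted(indices[:num_masked]), sorted(indices[num_masked:])

# MAE typically masks ~75% of patches; the encoder sees only the visible ones.
masked, visible = random_mask(num_patches=196, mask_ratio=0.75, seed=0)
print(len(masked), len(visible))  # 147 49
```

The encoder processes only the visible patches, and a lightweight decoder reconstructs the masked ones, which is what makes this pre-training task label-free.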

Prerequisites

Set Up Environment

  • conda env create -f environment.yml
  • NVIDIA GPU (Tested on Nvidia Tesla V100 32G x 4, and Nvidia GeForce RTX 2080 Ti x 8) on local workstations
  • Python (3.8.12), torch (1.7.1), numpy (1.21.2), pandas (1.4.2), scikit-learn (1.0.2), scipy (1.7.1), seaborn (0.11.2)

Data Preparation

We will release the data preparation instructions and the data soon.

| Dataset | Retina | Derm | COVID-FL | Skin-FL |
|---------|--------|------|----------|---------|
| Link    | link   | TODO | TODO     | TODO    |

Self-supervised Federated Learning for Medical Image Classification

In this paper, we choose ViT-B/16 as the backbone for all the methods:

BEiT-B: #layer=12; hidden=768; FFN factor=4x; #head=12; patch=16x16 (#parameters: 86M)

The models were pretrained with 224x224 resolution. The following tables provide the pre-trained checkpoints used in the paper.
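The configuration quoted above can be sanity-checked with quick arithmetic: at 224x224 resolution with 16x16 patches, each image yields 196 tokens, and the transformer blocks alone account for roughly 85M of the quoted 86M parameters (patch/position embeddings, biases, and layer norms make up the rest). A sketch of that back-of-the-envelope check:

```python
# ViT-B/16 configuration as quoted above.
layers, hidden, ffn_factor, patch, image = 12, 768, 4, 16, 224

num_patches = (image // patch) ** 2              # 14 x 14 = 196 tokens per image
attn_params = 4 * hidden * hidden                # Q, K, V, and output projections
ffn_params = 2 * hidden * (ffn_factor * hidden)  # the two linear layers of the MLP
per_layer = attn_params + ffn_params
block_total = layers * per_layer                 # transformer blocks only (no embeddings/biases)

print(num_patches)                # 196
print(round(block_total / 1e6))   # 85
```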

Self-supervised Federated Pre-training

(i.e., pre-training directly on decentralized target task data)

You can run self-supervised federated pre-training on your own datasets with the following Python files:

  • Fed-BEiT: beit/run_beit_pretrain_FedAvg.py
  • Fed-MAE: mae/run_mae_pretrain_FedAvg.py
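Both scripts aggregate client model updates with FedAvg: after each round of local training, the server averages client parameters weighted by local dataset size. The aggregation step can be sketched in plain Python (the repository's implementation operates on PyTorch state dicts; this is an illustrative sketch, not the repo's code):

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: data-size-weighted average of client parameters.

    client_weights: list of dicts mapping parameter name -> list of floats.
    client_sizes:   number of local training samples per client.
    """
    total = sum(client_sizes)
    averaged = {}
    for name in client_weights[0]:
        averaged[name] = [
            sum(w[name][i] * n / total for w, n in zip(client_weights, client_sizes))
            for i in range(len(client_weights[0][name]))
        ]
    return averaged

# Two toy clients; the client with 3x the data pulls the average toward itself.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
print(fedavg(clients, [1, 3]))  # {'w': [2.5, 5.0]}
```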

If you want to test on new datasets, please modify datasets.py and FedAvg_utils/data_utils.py accordingly.
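To illustrate the kind of hook such a modification might add, here is a sketch of registering a dataset by name with one per-client CSV of (path, label) rows. All names here are hypothetical for illustration; they are not actual APIs from datasets.py or FedAvg_utils/data_utils.py:

```python
import csv

# Hypothetical registry mapping a dataset name to per-client CSV files.
DATASET_REGISTRY = {}

def register_dataset(name, client_csvs):
    """Register a new dataset; client_csvs[i] is client i's CSV of (path,label) rows."""
    DATASET_REGISTRY[name] = client_csvs

def load_client_split(name, client_id):
    """Read one client's (image_path, label) pairs from its CSV."""
    with open(DATASET_REGISTRY[name][client_id], newline="") as f:
        return [(row["path"], int(row["label"])) for row in csv.DictReader(f)]
```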

Federated pre-training with Retina

| Method   | Pre-training Data | Central  | Split-1  | Split-2  | Split-3  |
|----------|-------------------|----------|----------|----------|----------|
| Fed-BEiT | Retina            | download | download | download | download |
| Fed-MAE  | Retina            | download | download | download | download |
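The Split-1/2/3 checkpoints correspond to simulated non-IID client partitions of the centralized data. A common way to simulate such label heterogeneity (not necessarily how this paper's splits were generated) is to allocate each class across clients with Dirichlet-distributed proportions; a plain-Python sketch:

```python
import random
from collections import defaultdict

def dirichlet_split(labels, num_clients, alpha, seed=0):
    """Partition sample indices across clients with Dirichlet(alpha) label skew.

    Smaller alpha -> more heterogeneous (non-IID) clients;
    larger alpha -> near-uniform class proportions per client.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    clients = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Dirichlet proportions via normalized Gamma draws.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        cuts, acc = [], 0.0
        for d in draws[:-1]:
            acc += d / total
            cuts.append(int(acc * len(idxs)))
        start = 0
        for client, end in zip(clients, cuts + [len(idxs)]):
            client.extend(idxs[start:end])
            start = end
    return clients

labels = [i % 3 for i in range(300)]  # 3 balanced classes
parts = dirichlet_split(labels, num_clients=3, alpha=0.5)
print(sum(len(p) for p in parts))  # 300
```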

Federated pre-training with COVID-FL

| Method   | Pre-training Data | Central  | Real-world Split |
|----------|-------------------|----------|------------------|
| Fed-BEiT | COVID-FL          | download | download         |
| Fed-MAE  | COVID-FL          | download | download         |

Supervised Pre-training with ImageNet-22k

Download the ViT-B/16 weights pre-trained on ImageNet-22k:

  • wget https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz

See more details in https://github.com/google-research/vision_transformer.

Self-supervised pre-training with ImageNet-22k

BEiT ImageNet: Download BEiT weights pre-trained on ImageNet-22k:

  • wget https://unilm.blob.core.windows.net/beit/beit_base_patch16_224_pt22k.pth

Download the DALL-E tokenizers:

  • wget https://cdn.openai.com/dall-e/encoder.pkl
  • wget https://cdn.openai.com/dall-e/decoder.pkl

MAE ImageNet: Download MAE weights pretrained on ImageNet-22k:

  • wget https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth

Self-supervised Federated Fine-Tuning

You can also run self-supervised federated fine-tuning on your own datasets with the following Python files:

  • Fed-BEiT: beit/run_class_finetune_FedAvg.py
  • Fed-MAE: mae/run_class_finetune_FedAvg.py

Scripts are in beit/script and mae/script. More details about model training will be added.

Funding

This work was funded by the National Institutes of Health (NIH) under grants R01CA256890, R01CA227713, and U01CA242879.

Reference

The current work is on arXiv and under review. If you find our work helpful in your research, or if you use the code or datasets, please consider citing our paper:

Yan, R., Qu, L., Wei, Q., Huang, S.C., Shen, L., Rubin, D., Xing, L. and Zhou, Y., 2022. Label-Efficient Self-Supervised Federated Learning for Tackling Data Heterogeneity in Medical Imaging. arXiv preprint arXiv:2205.08576.

@article{yan2022label,
  title={Label-Efficient Self-Supervised Federated Learning for Tackling Data Heterogeneity in Medical Imaging},
  author={Yan, Rui and Qu, Liangqiong and Wei, Qingyue and Huang, Shih-Cheng and Shen, Liyue and Rubin, Daniel and Xing, Lei and Zhou, Yuyin},
  journal={arXiv preprint arXiv:2205.08576},
  year={2022}
}
