KarhouTam / FL-bench

Benchmark of federated learning. Dedicated to the community. πŸ€—


Evaluating Federated Learning Methods.

Realizing Your Brilliant Ideas.

Having Fun with Federated Learning.

πŸŽ‰ FL-bench can now perform FL training in parallel (with the help of Ray)! πŸŽ‰

Methods 🧬

Traditional FL Methods
Personalized FL Methods
FL Domain Generalization Methods

Environment Preparation 🧩

Just pick one of the following options.

PyPI 🐍

pip install -r .environment/requirements.txt

Poetry 🎢

# For users in mainland China
poetry install --no-root -C .environment

# For overseas users
cd .environment && sed -i "10,14d" pyproject.toml && poetry lock --no-update && poetry install --no-root

Docker 🐳

# For users in mainland China
docker pull registry.cn-hangzhou.aliyuncs.com/karhoutam/fl-bench:master

# For overseas users
docker pull ghcr.io/karhoutam/fl-bench:master
# or
docker pull docker.io/karhoutam/fl-bench:master

# An example of building and running a container
docker run -it --name fl-bench -v path/to/FL-bench:/root/FL-bench --privileged --gpus all ghcr.io/karhoutam/fl-bench:master

Easy Run πŸƒβ€β™‚οΈ

All method classes inherit from FedAvgServer and FedAvgClient. If you want to understand the entire workflow and the details of the variable settings, check src/server/fedavg.py and src/client/fedavg.py.

Step 1. Generate FL Dataset

# Partition MNIST according to Dir(0.1) for 100 clients
python generate_data.py -d mnist -a 0.1 -cn 100

For the methods of generating federated datasets, check data/README.md for full details.

Step 2. Run Experiment

python main.py <method> [your_config_file.yml] [method_args...]

❗ Method name should be identical to the .py file name in src/server.

# Run FedAvg with default settings. 
python main.py fedavg

How To Customize FL Method Arguments πŸ€–

  • By modifying the config file
  • By explicitly setting in CLI, e.g., python main.py fedprox config/my_cfg.yml --mu 0.01.
  • By modifying the default value in src/utils/constants.py/DEFAULT_COMMON_ARGS or src/server/<method>.py/get_<method>_args()

⚠ For the same FL method argument, the priority of argument setting is CLI > Config file > Default value.

For example, the default value of fedprox.mu is 1.0,

# src/server/fedprox.py
from argparse import ArgumentParser, Namespace

def get_fedprox_args(args_list=None) -> Namespace:
    parser = ArgumentParser()
    parser.add_argument("--mu", type=float, default=1.0)
    return parser.parse_args(args_list)

and your .yml config file has

# your_config.yml
...
fedprox:
  mu: 0.01
then the three invocations below resolve to:

python main.py fedprox                           # fedprox.mu = 1.0
python main.py fedprox your_config.yml           # fedprox.mu = 0.01
python main.py fedprox your_config.yml --mu 10   # fedprox.mu = 10

Monitor πŸ“ˆ

FL-bench supports visdom and tensorboard.

Activate

πŸ‘€ NOTE: You need to launch the visdom / tensorboard server by yourself.

# your config_file.yml
common:
  ...
  visible: tensorboard # options: [null, visdom, tensorboard]

Launch visdom / tensorboard Server

visdom
  1. Run python -m visdom.server in a terminal.
  2. Open localhost:8097 in your browser.

tensorboard

  1. Run tensorboard --logdir=<your_log_dir> in a terminal.
  2. Open localhost:6006 in your browser.

Parallel Training via Ray πŸš€

This feature can vastly improve your training efficiency, and it is easy to use!

Activate (All You Need To Do)

# your_config_file.yml
mode: parallel
parallel:
  num_workers: 2 # any integer larger than 1
  ...
...

Manually Create Ray Cluster (Optional)

A Ray cluster is created implicitly every time you run an experiment in parallel mode. Alternatively, you can create one manually with the command below to avoid creating and destroying the cluster on every run.

ray start --head [OPTIONS]

πŸ‘€ NOTE: You need to keep num_cpus: null and num_gpus: null in your config file to connect to an existing Ray cluster.

# your_config_file.yml
# Connect to an existing Ray cluster in localhost.
mode: parallel
parallel:
  ...
  num_gpus: null
  num_cpus: null
...
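
FL-bench makes the connection for you based on the config above; purely for reference, attaching to an existing cluster with the plain Ray API looks roughly like this minimal sketch (illustrative, not FL-bench's actual code):

import ray

# Attach to the cluster previously started with `ray start --head`;
# address="auto" lets Ray discover the running head node on this machine.
ray.init(address="auto")

# Inspect what the cluster offers, e.g. {'CPU': 8.0, 'GPU': 1.0, ...}
print(ray.cluster_resources())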

Common Arguments πŸ”§

All common arguments have default values. Check DEFAULT_COMMON_ARGS in src/utils/constants.py for full details.

⚠ Common arguments cannot be set via CLI.

You can also write your own .yml config file. A template is provided in config, and it is recommended to save your config files there as well.

One example: python main.py fedavg config/template.yaml [cli_method_args...]

For the default values of specific FL method arguments, check the corresponding FL-bench/src/server/<method>.py for full details.

| Argument | Type | Description |
| --- | --- | --- |
| dataset | str | The name of the dataset the experiment runs on. |
| model | str | The model backbone used in the experiment. |
| seed | int | Random seed for running the experiment. |
| join_ratio | float | Ratio of (clients joining each round) / (total number of clients). |
| global_epoch | int | Number of global epochs, also called communication rounds. |
| local_epoch | int | Number of local epochs for client local training. |
| finetune_epoch | int | Number of epochs for clients to fine-tune their models before testing. |
| test_interval | int | Interval (in rounds) of performing tests on clients. |
| eval_test | bool | true for evaluating on joined clients' test sets before and after local training. |
| eval_val | bool | true for evaluating on joined clients' validation sets before and after local training. |
| eval_train | bool | true for evaluating on joined clients' training sets before and after local training. |
| optimizer | dict | Client-side optimizer. Arguments are the same as for optimizers in torch.optim (see the sketch below the table). |
| lr_scheduler | dict | Client-side learning rate scheduler. Arguments are the same as for schedulers in torch.optim.lr_scheduler. |
| verbose_gap | int | Interval (in rounds) of displaying clients' training performance in the terminal. |
| batch_size | int | Batch size for client local training. |
| use_cuda | bool | true to place tensors on the GPU. |
| visible | str | Options: [null, visdom, tensorboard]. |
| straggler_ratio | float | Ratio of stragglers (in [0, 1]). Stragglers do not perform full-epoch local training like normal clients; their local epoch is randomly selected from the range [straggler_min_local_epoch, local_epoch). |
| straggler_min_local_epoch | int | The minimum local epoch for stragglers. |
| external_model_params_file | str | Path to a model-parameters .pt file, relative to the root of FL-bench. ⚠ Enabled only when unique_model=False, which is pre-defined by each FL method. |
| save_log | bool | true for saving the algorithm's running log in out/<method>/<start_time>. |
| save_model | bool | true for saving output model parameters to out/<method>/<start_time>.pt. |
| save_fig | bool | true for saving the accuracy curves shown on visdom as a .pdf file in out/<method>/<start_time>. |
| save_metrics | bool | true for saving metric statistics as a .csv file in out/<method>/<start_time>. |
| check_convergence | bool | true for reporting convergence results after the experiment ends. |
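
Since optimizer and lr_scheduler are plain dicts handed to PyTorch, a config entry maps onto torch.optim roughly as in the sketch below. The "name" key and its values here are assumptions for illustration; check DEFAULT_COMMON_ARGS in src/utils/constants.py for the actual schema FL-bench expects.

import torch

# Hypothetical dict mirroring the `optimizer` common argument; the "name"
# key is illustrative, not necessarily FL-bench's real schema.
optimizer_cfg = {"name": "sgd", "lr": 1e-2, "momentum": 0.9}

model = torch.nn.Linear(10, 2)  # stand-in for the FL model backbone
opt_cls = {"sgd": torch.optim.SGD, "adam": torch.optim.Adam}[optimizer_cfg.pop("name")]
optimizer = opt_cls(model.parameters(), **optimizer_cfg)  # SGD with lr=0.01, momentum=0.9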

Parallel Training Arguments πŸ‘―β€β™‚οΈ

| Argument | Type | Description |
| --- | --- | --- |
| num_workers | int | The number of parallel workers. Must be an integer larger than 1. |
| ray_cluster_addr | str | The IP address of the selected Ray cluster. Defaults to null, meaning that if there is no existing Ray cluster, Ray builds a new one every time you run the experiment and destroys it at the end. More details can be found in the official docs. |
| num_cpus, num_gpus | int | The amount of computational resources allocated to your Ray cluster. Defaults to null, meaning all available. |

Supported Models πŸš€

This benchmark supports a bunch of common models integrated in Torchvision:

  • ResNet family
  • EfficientNet family
  • DenseNet family
  • MobileNet family
  • LeNet5 ...

πŸ€— You can define your own custom model by filling in the CustomModel class in src/utils/models.py and use it by setting model: custom in your config.

Supported Datasets 🎨

Regular Image Datasets

  • MNIST (1 x 28 x 28, 10 classes)

  • CIFAR-10/100 (3 x 32 x 32, 10/100 classes)

  • EMNIST (1 x 28 x 28, 62 classes)

  • FashionMNIST (1 x 28 x 28, 10 classes)

  • Synthetic Dataset

  • FEMNIST (1 x 28 x 28, 62 classes)

  • CelebA (3 x 218 x 178, 2 classes)

  • SVHN (3 x 32 x 32, 10 classes)

  • USPS (1 x 16 x 16, 10 classes)

  • Tiny-ImageNet-200 (3 x 64 x 64, 200 classes)

  • CINIC-10 (3 x 32 x 32, 10 classes)

Domain Generalization Image Datasets

Medical Image Datasets

Customization Tips πŸ’‘

Implementing FL Method

The package() method of the server-side class assembles all parameters the server needs to send to clients. Similarly, package() of the client-side class assembles the parameters clients need to send back to the server. You should always call super().package() in your override implementation (see the sketch after the list below).

  • Consider inheriting your method classes from FedAvgServer and FedAvgClient to make maximum use of FL-bench's workflow.

  • To customize your server-side process, consider overriding package() and aggregate().

  • To customize your client-side training, consider overriding fit() or package().

You can find all details in FedAvgClient and FedAvgServer, which are the bases of all implementations in FL-bench.
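
As a concrete illustration, the sketch below overrides package() on the server side. It is a minimal sketch under the assumption that package() takes the target client's ID, as in src/server/fedavg.py; the extra payload entry is hypothetical.

# Minimal sketch; signatures are assumptions based on the description
# above -- check src/server/fedavg.py for the real ones.
from src.server.fedavg import FedAvgServer

class MyMethodServer(FedAvgServer):
    def package(self, client_id):
        server_package = super().package(client_id)  # always keep this call
        server_package["my_extra_param"] = 0.1       # hypothetical extra payload
        return server_package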

Integrating Dataset

  • Inherit your own dataset class from BaseDataset in data/utils/datasets.py and add your class to the DATASETS dict (a hedged sketch follows).
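
A sketch of that, written as if inside data/utils/datasets.py; BaseDataset's real constructor signature may differ, so adapt accordingly.

# Illustrative only: adapt to BaseDataset's actual constructor in
# data/utils/datasets.py.
class MyDataset(BaseDataset):
    def __init__(self, root, *args, **kwargs):
        super().__init__()
        # Load your samples and labels from `root` into the attributes
        # BaseDataset expects, e.g. self.data and self.targets.
        ...

# Register the class so `python generate_data.py -d mydataset ...` finds it.
DATASETS["mydataset"] = MyDataset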

Customizing Model

  • The CustomModel class in src/utils/models.py is provided for this; you just need to define your model architecture (see the sketch below).
  • If you want to use your customized model within FL-bench's workflow, the base and classifier must be defined. (Tip: you can define one of them as torch.nn.Identity() to bypass it.)
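
A minimal sketch of filling in CustomModel, assuming only the base / classifier split required above; the layer choices are placeholders for a 3 x 32 x 32 input.

from torch import nn

# Sketch of CustomModel in src/utils/models.py; only the base/classifier
# split is required by FL-bench, the layers themselves are placeholders.
class CustomModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.base = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(16 * 32 * 32, 10)

    def forward(self, x):
        # Set base or classifier to nn.Identity() to bypass that part.
        return self.classifier(self.base(x))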

Citation 🧐

@software{Tan_FL-bench,
  author = {Tan, Jiahao and Wang, Xinpeng},
  license = {GPL-2.0},
  title = {{FL-bench: A federated learning benchmark for solving image classification tasks}},
  url = {https://github.com/KarhouTam/FL-bench}
}

@misc{tan2023pfedsim,
  title={pFedSim: Similarity-Aware Model Aggregation Towards Personalized Federated Learning}, 
  author={Jiahao Tan and Yipeng Zhou and Gang Liu and Jessie Hui Wang and Shui Yu},
  year={2023},
  eprint={2305.15706},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
