Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces (PACFL)

The official code for the paper "Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces". [Accepted at AAAI 2023]

In this repository, we release the official implementation of the PACFL algorithm, along with implementations of the following algorithms:

  • FedAvg
  • FedProx
  • FedNova
  • Scaffold
  • Per-FedAvg
  • IFCA
  • LG-FedAvg
  • CFL
  • MTL
  • pFedMe
  • SOLO

Usage

We provide scripts for running the algorithms under scripts/. Here is an example of running the PACFL script:

cd scripts
bash pacfl.sh

Please follow the paper to modify the scripts for further experiments. The parameters you may change are described in the table below; a sketch of a full invocation follows the table.

| Parameter | Description |
| --- | --- |
| `ntrials` | The number of total runs. |
| `rounds` | The number of communication rounds per run. |
| `num_users` | The number of clients. |
| `frac` | The fraction of clients sampled in each round. |
| `local_ep` | The number of local training epochs. |
| `local_bs` | The local batch size. |
| `lr` | The learning rate for the local models. |
| `momentum` | The momentum for the optimizer. |
| `model` | The network architecture. Options: `simple-cnn`, `resnet9`. |
| `dataset` | The dataset for training and testing. Options are discussed above. |
| `partition` | How the dataset is partitioned. Options: `homo`, `noniid-labeldir`, `noniid-#label1` (or 2, 3, ..., i.e., each client owns that fixed number of labels). |
| `datadir` | The path to the datasets. |
| `logdir` | The path for storing logs. |
| `log_filename` | The folder name for multiple runs. E.g., with `ntrials=3` and `log_filename=$trial`, the logs of the 3 runs are stored in folders named 1, 2, and 3. |
| `alg` | The federated learning algorithm. Options are discussed above. |
| `beta` | The concentration parameter of the Dirichlet distribution for the heterogeneous partition. |
| `local_view` | If true, a local test set is assigned to each client. |
| `gpu` | The ID(s) of the GPU(s) to use, e.g., 0. |
| `print_freq` | The frequency of printing training logs, e.g., with `print_freq=10`, logs are printed every 10 communication rounds. |
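
For reference, here is a hypothetical sketch of the kind of command a script such as pacfl.sh might run. The flag names mirror the table above, but the entry point (main.py), the exact flag syntax, and the example values are assumptions; check the actual command inside scripts/pacfl.sh before relying on it.

# Hypothetical sketch: the entry point (main.py), flag syntax, and values
# are assumptions; see scripts/pacfl.sh for the real command.
python main.py \
    --ntrials 3 \
    --rounds 50 \
    --num_users 100 \
    --frac 0.1 \
    --local_ep 10 \
    --local_bs 10 \
    --lr 0.01 \
    --momentum 0.9 \
    --model simple-cnn \
    --dataset cifar10 \
    --partition noniid-labeldir \
    --beta 0.5 \
    --datadir ../data \
    --logdir ../logs \
    --log_filename 1 \
    --alg pacfl \
    --gpu 0 \
    --print_freq 10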

MIX-4

We have also released the code for the MIX-4 experiments in the paper under the mix4 folder. Please follow the same instructions as in Usage to run the scripts for each algorithm, for example:
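
This sketch assumes mix4 mirrors the top-level layout with its own scripts directory; the path is an assumption, so adjust it to the actual folder structure.

# Hypothetical: assumes mix4 contains its own scripts/ folder
cd mix4/scripts
bash pacfl.sh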

Generalization to Unseen Clients

We have also released the code for the generalization-to-unseen-clients experiments in the paper under the unseen_clients folder. Please follow the same instructions as in Usage to run the scripts for each algorithm.

Citation

Please cite our work if you find it relevant to your research or use our implementations:

@article{vahidian2022efficient,
  title={Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces},
  author={Vahidian, Saeed and Morafah, Mahdi and Wang, Weijia and Kungurtsev, Vyacheslav and Chen, Chen and Shah, Mubarak and Lin, Bill},
  journal={arXiv preprint arXiv:2209.10526},
  year={2022}
}

Acknowledgements

Some parts of our code and implementation have been adapted from the NIID-Bench repository.

Contact

If you have any questions, please feel free to contact me at mmorafah@eng.ucsd.edu.
