google-parfait / tensorflow-federated

An open-source framework for machine learning and other computations on decentralized data.

How to change the total number of users in federated learning experiments? (e.g. https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/simulation/datasets/emnist.py)

Yeojoon opened this issue · comments

Is your feature request related to a problem? Please describe.
I am currently running some FL experiments with the EMNIST dataset in the tensorflow-federated library (https://github.com/tensorflow/federated/blob/main/tensorflow_federated/python/simulation/datasets/emnist.py). The default total number of users for this dataset is 3400 (when only_digits=False). Is there any way to change the number of users for a particular dataset?

If not, would it be possible to add this feature? I believe it would be very helpful to researchers!

Thank you very much in advance for your help!

Hi @Yeojoon. One easy way to do this would be to subsample client IDs from EMNIST. This gives you a smaller total number of clients, but it also reduces the total number of examples seen, so it's probably not right for all settings.
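The subsampling idea can be sketched as follows. This is a minimal illustration, not an official TFF recipe: `subsample_client_ids` is a hypothetical helper, and the stand-in ID list only mimics the 3400 clients mentioned above. With TFF installed, the real IDs would presumably come from the `client_ids` attribute of the `ClientData` returned by `tff.simulation.datasets.emnist.load_data(only_digits=False)`.

```python
import random


def subsample_client_ids(client_ids, num_clients, seed=0):
    """Return a reproducible random subset of `num_clients` client IDs.

    Hypothetical helper for illustration; it only selects IDs and does not
    touch the underlying examples.
    """
    client_ids = list(client_ids)
    if not 0 < num_clients <= len(client_ids):
        raise ValueError("num_clients must be between 1 and len(client_ids)")
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    return sorted(rng.sample(client_ids, num_clients))


# Stand-in for the real EMNIST client IDs (assumption: 3400 clients).
all_ids = [f"client_{i:04d}" for i in range(3400)]
subset = subsample_client_ids(all_ids, 500)
print(len(subset))
```

Each selected ID can then be passed to `create_tf_dataset_for_client` on the original `ClientData`. Clients left out of the subset contribute no examples, which is why the total example count shrinks.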

A more robust way to proceed would be to use tff.simulation.datasets.TransformingClientData, which allows you to take a ClientData (e.g. EMNIST) and expand each client into some number of sub-clients. This would allow you to experiment with larger population sizes.

If neither of these solutions is exactly what you're looking for, could you add some details about what kind of feature you have in mind?

Thank you for your quick and kind response!

What I want to do is increase or decrease the total number of users without changing the total number of data examples (for the EMNIST case, keeping the total number of training examples fixed at 341,873). So I agree with you that the first method may not solve this problem.

Do you think I can use your second solution to solve this problem?

Could you provide a bit more detail here? How were you hoping to do this "re-partitioning", where the number of examples is fixed and the number of clients varies?

Note that [tff.simulation.datasets.TransformingClientData](https://www.tensorflow.org/federated/api_docs/python/tff/simulation/datasets/TransformingClientData) would allow increasing the number of clients, but would also increase the number of examples (essentially, each client would have their dataset "transformed" some number of times).

Do you mean the second solution increases the total number of examples?

I mean I would like to choose the total number of users arbitrarily. Let's say the total number of users is n. Then, for the EMNIST case, each user should hold 341,873/n training examples.
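One way to read this request is as an even re-partitioning of the pooled training set. The sketch below works on example indices only, under the assumption that all examples have first been pooled (for instance via `create_tf_dataset_from_all_clients` on the original `ClientData`). `repartition_indices` and the `user_…` IDs are hypothetical names, not part of TFF. When n does not divide the total evenly, the first few users receive one extra example.

```python
import random


def repartition_indices(num_examples, num_users, seed=0):
    """Shuffle example indices and split them into `num_users` near-equal shards.

    Hypothetical helper: returns a dict mapping a synthetic user ID to the
    list of pooled-example indices assigned to that user.
    """
    if not 1 <= num_users <= num_examples:
        raise ValueError("num_users must be between 1 and num_examples")
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)

    # Every user gets `base` examples; the first `extra` users get one more.
    base, extra = divmod(num_examples, num_users)
    shards = {}
    start = 0
    for user in range(num_users):
        size = base + (1 if user < extra else 0)
        shards[f"user_{user}"] = indices[start:start + size]
        start += size
    return shards


# 341,873 is the EMNIST training-set size quoted in this thread.
shards = repartition_indices(341_873, 1000)
sizes = sorted({len(v) for v in shards.values()})
print(sizes)
```

Note that a uniform shuffle like this discards the original per-writer grouping of EMNIST, so the resulting users are roughly IID. Whether that is acceptable depends on the experiment.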