google-parfait / tensorflow-federated

An open-source framework for machine learning and other computations on decentralized data.


TFF differential privacy model gets stuck in learning process

deepquantum88 opened this issue · comments

aggregation_factory = tff.learning.model_update_aggregator.dp_aggregator(
    noise_multiplier, clients_per_round)

sampling_prob = clients_per_round / total_clients

learning_process = tff.learning.algorithms.build_unweighted_fed_avg(
    my_model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.01),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(1.0, momentum=0.9),
    model_aggregator=aggregation_factory)
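For context, the sampling_prob above is just the per-round client sampling ratio that typically feeds into DP accounting; a pure-Python sanity check (the client counts below are made up for illustration, not from the reporter's setup):

```python
def client_sampling_prob(clients_per_round, total_clients):
    """Per-round probability that any given client is sampled."""
    if not 0 < clients_per_round <= total_clients:
        raise ValueError("clients_per_round must be in (0, total_clients]")
    return clients_per_round / total_clients

# Hypothetical counts, for illustration only.
print(client_sampling_prob(50, 1000))  # 0.05
```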

python=3.9.7
TF=2.11.0
TFF=0.48.0

The training gets stuck in the learning process built by tff.learning.algorithms.build_unweighted_fed_avg. Can you please help?

This looks similar to #3756. If what you are running has a call to tff.backends.native.set_sync_local_cpp_execution_context, then I think it should be removed.

@zcharles8 Thank you for your response. I have not called tff.backends.native.set_sync_local_cpp_execution_context.

I am using the TFF tutorial for image classification, with the changes shown in the code above.

Are you running this in colab? If so then you'll probably need to upgrade to TFF v0.52.0, which re-enabled colab support.

I am not running in colab. I am using TFF 0.48.0 version on my system.

Yeah, unfortunately TFF versions less than 0.52.0 generally don't have compatibility with colab. While 0.48.0 can be pip installed in colab, the execution stack doesn't work with colab (hence the indefinite hang). You'll need to upgrade to fix this.

I am not working in colab. On my system I am working with TFF 0.48.0, and I am not able to install any TFF version above 0.48.0.

Can the hang be fixed in TFF 0.48.0 on a local system?

Ah, sorry, I misread your comment. My recommendation would be to upgrade to TFF 0.52.0, I'm not sure if we have any mechanisms to fix things on older TFF versions.

Can you provide the full details of how you are running things? There's a template that loads when you file a bug that would be really helpful if you could fill out in detail. Otherwise it's extremely difficult to diagnose what the issue is. I've reproduced the template below.


Describe the bug
A clear and concise description of what the bug is. It is often helpful to
provide a link to a colab notebook that
reproduces the bug.

Environment (please complete the following information):

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Python package versions (e.g., TensorFlow Federated, TensorFlow):
  • Python version:
  • Bazel version (if building from source):
  • CUDA/cuDNN version:
  • What TensorFlow Federated execution stack are you using?

Note: You can collect the Python package information by running pip3 freeze
from the command line, and most of the other information can be collected using
TensorFlow's environment capture script.

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

The issue is that when I try to install TFF 0.52.0 on my system, pip reports that there is no matching distribution for TFF 0.52.0. It only shows versions up to 0.48.0.

TF=2.11.0
Python=3.9.7
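One common cause of pip's "no matching distribution" error is that no wheel was published for the local Python version or platform, so it is worth checking the interpreter version against the release's documented requirement (consult the TFF release notes for the actual floor). A quick, generic local check (the 3.10 floor below is a placeholder assumption, not TFF's documented requirement):

```python
import sys

def python_at_least(major, minor):
    """Return True if the running interpreter is at least major.minor."""
    return sys.version_info[:2] >= (major, minor)

# Placeholder floor; check the TFF release notes for the real requirement.
if not python_at_least(3, 10):
    print("Interpreter may be too old for newer TFF wheels:", sys.version)
```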

There's more information in the template that would be really helpful here. Including:

  • What is your OS platform and distribution?
  • What CUDA versions?
  • Are you using any non-default TFF execution stacks?
  • A minimal repro of the behavior you are referring to.

DISTRIB_ID=CentOS
CentOS Linux release 7.9.2009 (Core)

The code works fine with Python 3.9 up to TFF 0.48.0, but I am not able to install TFF 0.52.0.
Sorry, I did not understand "Are you using any non-default TFF execution stacks?"

@zcharles8 @ZacharyGarrett I updated to TFF 0.52.0 and TF 2.11.0 on my Ubuntu system.

When I execute state = learning_process.initialize() in the TFF + differential privacy code, the execution hangs and does not proceed further.

Can you please help?

Even when I run the same file on Google Colab, it hangs at the same point.

Can you print the result of running ldd --version on your system?

@michaelreneer here's the result
ldd (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35
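For anyone else following along, the check being requested is simply the following (assuming a glibc-based Linux where ldd is available; TFF's precompiled C++ runtime needs a sufficiently recent glibc, and the exact minimum varies by release):

```shell
# Print the first line of the glibc/ldd version report.
ldd --version | head -n 1
```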

@zcharles8 @ZacharyGarrett

I am working on my Linux Ubuntu system.
I have installed Python 3.9.0, TensorFlow 2.11.0, and TFF 0.52.0.

When I import the libraries (TFF), it throws a TypeError: unhashable type: 'list'.

I also tried lower versions of TFF and versions above 0.52.0, but the error remains the same.

I cannot run on colab because I have other dependencies as well, so I want to install and run everything on my own system.
Can you please help with this?

@zcharles8 @michaelreneer can you please help with this?
I even tried TFF version 0.61.0, but it still gets stuck at

data_frame = pd.DataFrame()
rounds = 100
clients_per_round = 50

for noise_multiplier in [0.0, 0.5, 0.75, 1.0]:
    print(f'Starting training with noise multiplier: {noise_multiplier}')
    data_frame = train(rounds, noise_multiplier, clients_per_round, data_frame)
    print()

When you say it's stuck, what do you mean? Is it just taking a long amount of time? In particular, can you try reducing clients_per_round to something like 2, and rounds to something small like 3? That way we can see whether it is actually hanging indefinitely, or just slow.
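To make the "hanging vs. just slow" question concrete, a generic standard-library helper like the one below can time-box any call (this is a sketch, not TFF-specific; you would pass it e.g. lambda: learning_process.initialize()):

```python
import concurrent.futures
import time

def finishes_within(fn, timeout_s):
    """Return True if fn() completes within timeout_s seconds.

    Note: a genuinely hung call keeps its worker thread alive, so the
    process may still need to be killed afterwards; this only reports
    whether the call returned in time.
    """
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = ex.submit(fn)
    try:
        future.result(timeout=timeout_s)
        return True
    except concurrent.futures.TimeoutError:
        return False
    finally:
        ex.shutdown(wait=False)

print(finishes_within(lambda: time.sleep(0.01), 1.0))  # True
print(finishes_within(lambda: time.sleep(2), 0.2))     # False
```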

As for the unhashable list error - I think we would need a full stack trace. In particular, this sounds like lists getting passed in as keys to a dictionary, but where?

I reduced the clients, but it still gets stuck in

learning_process = tff.learning.algorithms.build_unweighted_fed_avg(
    my_model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.01),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(1.0, momentum=0.9),
    model_aggregator=aggregation_factory)

and does not proceed further for several hours, apparently indefinitely. I am executing it in a VirtualBox Ubuntu VM.

@zcharles8

Can you check that (1) you can run simpler TFF computations (eg. if you follow the examples in https://www.tensorflow.org/federated/tutorials/building_your_own_federated_learning_algorithm#federated_computations, do these terminate?)

and (2) that the dataset is loading correctly? Eg. after running https://www.tensorflow.org/federated/tutorials/federated_learning_with_differential_privacy#download_and_preprocess_the_federated_emnist_dataset, can you do something like

train_data.create_tf_dataset_for_client(train_data.client_ids[0])

Basically, I'm trying to figure out what call exactly is hanging.

@zcharles8 the first statement does not terminate.
The second statement works perfectly.

I do not understand what the reason could be for the first statement not terminating.

@zcharles8 Instead of following the TFF + differential privacy tutorial, I tried switching to the differential privacy aggregators directly.

I am trying to use them in the simple TFF tutorial for image classification: https://www.tensorflow.org/federated/tutorials/federated_learning_for_image_classification

I replaced only this part:

training_process = tff.learning.algorithms.build_weighted_fed_avg(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0))

with

dp_mean = tff.learning.dp_aggregator(noise_multiplier=0.2, clients_per_round=10)

training_process = tff.learning.algorithms.build_unweighted_fed_avg(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.02),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
    model_aggregator=dp_mean,
    use_experimental_simulation_loop=False)

That worked, but on initializing it gets stuck in an indefinite loop at

train_state = training_process.initialize()

Please help.

The fact that even a simple computation like

@tff.federated_computation(tff.FederatedType(tf.float32, tff.CLIENTS))
def get_average_temperature(client_temperatures):
  return tff.federated_mean(client_temperatures)

get_average_temperature([68.5, 70.3, 69.8])

hangs indefinitely suggests a problem in the TFF installation itself. One last thing to help verify this. Can you try running

tff.federated_computation(lambda: 'Hello, World!')()

If this hangs too, then it's likely the installation of TFF. I would strongly recommend trying to re-install TFF, using the latest available version.

@zcharles8 it hangs too. I tried installing TFF 0.48.0, but that still did not work.

Is there any way that tff.learning.dp_aggregator(noise_multiplier=0.2, clients_per_round=10) can work with a lower version?

I think we're up to v0.61.0 or something like that. Is there a reason you can't use that version?

As for earlier versions - I'm not sure. You're welcome to see what the corresponding tutorial looked like in v0.48.0. We keep a record of all the versions as tags on github, and generally try to keep the tutorials up-to-date with the versions.

@zcharles8 Because the university server cannot support more than v0.48.0.
I then tried installing Ubuntu in VirtualBox on my system and installed v0.52.0, but the differential privacy tutorial hangs there.

It worked only on colab, but I need to install the TFQ nightly version, which I am not able to install on colab.

Is TFQ = TensorFlow Quantum? It's very possible that there are incompatible dependencies between the two. Moreover, I don't know off the top of my head if TFQ will work in the context of a TFF computation.

@zcharles8 Yes, that is TensorFlow Quantum. Yes, it worked, and I developed algorithms combining the two.
But it gets stuck when I try to work with differential privacy, TFF, and TFQ together.

Based on your responses above, the problem is likely that TFF isn't installed correctly, not because of any specific differential privacy code in TFF. Again, there is no guarantee that TFQ will work in the context of a tff.federated_computation. Adding support is a nice feature request, but one that we likely do not have the capacity to add.

In light of this, I don't think there are any specific recommendations I can give. You could potentially look at the various versions of TFF and TFQ and try to see if there is some version of both that have compatible dependencies (eg. TensorFlow versions, python versions, numpy versions, etc.). This could be difficult though.
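One way to start that dependency audit locally is to dump the installed versions of the relevant packages with the standard library (the package names below are the usual PyPI distribution names, which can differ from import names, so adjust as needed):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Assumed PyPI names for the packages discussed in this thread.
for dist in ("tensorflow", "tensorflow-federated", "tensorflow-quantum", "numpy"):
    print(dist, installed_version(dist))
```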

@zcharles8 Thank you for your prompt response.

I am trying another way, using the custom approach from https://www.tensorflow.org/federated/tutorials/composing_learning_algorithms

Can I use the TFF differential privacy aggregator in the above custom code?

dp_mean = tff.learning.dp_aggregator(noise_multiplier=0.2, clients_per_round=10)
The tutorial you link is fully compatible with DP aggregators. Again, the problem is that something about your installation of TFF is making it so that all federated computations hang indefinitely. This really needs to be solved by re-installing TFF, ideally a newer version.

@zcharles8 Thank you for your prompt response. I installed a newer version after installing Ubuntu natively and removing VirtualBox, and it worked.

I am trying another way, using the custom approach from https://www.tensorflow.org/federated/tutorials/composing_learning_algorithms

Can I use the TFF differential privacy aggregator in the above custom code?

dp_mean = tff.learning.dp_aggregator(noise_multiplier=0.2, clients_per_round=10)

You can replace the aggregator factory in https://www.tensorflow.org/federated/tutorials/composing_learning_algorithms#defining_the_building_blocks with whatever aggregator you want (including a DP aggregator).

You can also add such aggregators to the standard FedAvg API: https://www.tensorflow.org/federated/api_docs/python/tff/learning/algorithms/build_weighted_fed_avg

Given that you said that it works, I'm going to mark this as resolved. If you have other bugs, please file them in a separate issue.