gkaissis / PriMIA

PriMIA: Privacy-preserving Medical Image Analysis

Home Page: https://gkaissis.github.io/PriMIA/

Code for Gradient Inverting Model

Siyuan89 opened this issue

Hi

This repository is referenced in the following paper:
Kaissis, Georgios, et al. "End-to-end privacy preserving deep learning on multi-institutional medical imaging." Nature Machine Intelligence 3.6 (2021): 473-484.

However, I cannot find any code for the gradient inversion model. You mention that you can successfully invert high-resolution (224, 224) images from a ResNet-18 model using only the iDLG approach.

The current version of iDLG does not come anywhere near what is claimed in the paper, and there seem to be many tips and tricks involved. I am therefore very interested in how this was done in your work.

thanks

Thank you for your comment! We utilised the technique presented here, which performs well on large images (see here). We are not able to provide the code in our repository, as the repository of the work mentioned above is unlicensed and we cannot assume it is permissible to disseminate it, either in its original or in a modified form. We recommend contacting the authors of the above-mentioned repository directly, as they very likely have more experience with their own technique and may be able to provide concrete guidance for your use case.
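For reference, the core of such a gradient-matching reconstruction looks roughly as follows. This is only a minimal sketch of the general technique, not code from this repository or from the original authors; all function names, hyperparameters and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def cosine_gradient_loss(dummy_grads, target_grads):
    # 1 - cosine similarity, summed over all parameter tensors
    loss = 0.0
    for dg, tg in zip(dummy_grads, target_grads):
        loss = loss + 1.0 - F.cosine_similarity(dg.flatten(), tg.flatten(), dim=0)
    return loss

def reconstruct(model, target_grads, label, img_shape, steps=4000, lr=0.1):
    # Optimise a dummy image so that its gradients match the observed ones.
    # The label is assumed known, e.g. recovered analytically as in iDLG.
    dummy = torch.randn(1, *img_shape, requires_grad=True)
    optimizer = torch.optim.Adam([dummy], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        task_loss = F.cross_entropy(model(dummy), label)
        dummy_grads = torch.autograd.grad(task_loss, model.parameters(), create_graph=True)
        rec_loss = cosine_gradient_loss(dummy_grads, target_grads)
        rec_loss.backward()
        optimizer.step()
        with torch.no_grad():
            dummy.clamp_(0, 1)  # keep the reconstruction in a valid image range
    return dummy.detach()
```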

Thank you for your response. However, what is claimed in your paper differs substantially from the work/code by Geiping. Here are the major points:

  • You claim that your inversion method is based on iDLG (code here), not on Geiping's. Although both approaches share similarities, they are very different.
  • Geiping's work has several issues that seem to be addressed in yours. First, in practice it only works for a batch size of 1. Second, it is very hard to use for large images with networks trained from scratch. Also, the image you reference (link here) uses a pretrained ResNet to compute the ground-truth gradients, so it is not applicable to a realistic federated learning setup.

Since your method seems to somehow address all these issues, why not release a version that shows how the attack is done? I think it would be possible to publish code inspired by another repository if it is properly cited.

I look forward to your response; following the instructions in the paper, I am unable to reproduce any of the attack results, and I believe others feel the same way.

Thanks

Thank you for your input and sorry if there was a misunderstanding. We utilised the method by Geiping with the modifications detailed in our manuscript (AdamW and uniform initialisation). We did not attempt to improve the gradient-based reconstruction technique beyond this. In fact, we encountered the same issues you mention. As seen in Figure 4, the method does not provide good reconstructions at large batch sizes, for large images, or when gradient norms are low, and it fails entirely when SMPC or Differential Privacy are used. If you are trying to improve the attack methodology, we still feel it would be best to contact the creators of the original techniques.
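Concretely, the two modifications mentioned above amount to something like the following setup of the dummy image and its optimiser (again only an illustrative sketch, not code from the repository; the shape and learning rate are placeholders):

```python
import torch

img_shape = (3, 224, 224)  # illustrative input shape

# uniform initialisation of the dummy image (instead of e.g. Gaussian noise)
dummy = torch.empty(1, *img_shape).uniform_(0, 1).requires_grad_(True)

# AdamW as the optimiser driving the reconstruction
optimizer = torch.optim.AdamW([dummy], lr=0.1)
```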

Thank you @gkaissis for your detailed response. I was also wondering whether the ground-truth gradients were obtained from the model updates (i.e. by taking the difference between the first and last local iteration)? I believe this is the right way to do it in a federated learning scenario, but it does not seem to work properly (perhaps because of the optimizer, which is Adam in my case)?

The original implementation obtains gradients by simply feeding the input through the network once, which also does not take the effect of local training into account.
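For reference, such a single-pass gradient is typically computed along these lines (an illustrative sketch; `model`, `image` and `label` stand in for whatever the implementation provides):

```python
import torch
import torch.nn.functional as F

# one forward/backward pass on a single batch, with no local training steps
loss = F.cross_entropy(model(image), label)
target_grads = [g.detach() for g in torch.autograd.grad(loss, model.parameters())]
```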

In our case the federation exchanges gradient updates directly, so we attacked the gradients. However, as you correctly state, when the federation exchanges weight updates it is possible to reconstruct the gradient from a weight update (with plain SGD the update is linear in the gradient and the learning rate is known, so a linear interpolation is a good approximation). The Adam optimiser may indeed be problematic, as it re-uses previous gradients, and some works find that it is not optimal for FL (it also does not work very well with differential privacy).
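As an illustration of that approximation for plain SGD (a sketch with hypothetical names; with Adam the relation no longer holds because of the moment estimates):

```python
def gradients_from_weight_update(weights_before, weights_after, lr, local_steps=1):
    """Approximate the average gradient from a weight update, assuming the
    client used plain SGD with a known learning rate. With adaptive
    optimisers such as Adam this linear relation does not hold."""
    return [
        (w_before - w_after) / (lr * local_steps)
        for w_before, w_after in zip(weights_before, weights_after)
    ]
```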

I see, thanks for your response. However, I wonder whether the Adam optimiser is problematic for a successful attack in your case, or for differential privacy as well?
I have seen some works that adopt Adam successfully with certain differential privacy algorithms.

I would appreciate your clarification.

We have not analysed the exact implications of Adam. We do, however, think that it is the DP mechanism that prevents the attack, since reconstructions with plain Adam were successful, as you can see in the paper.