tensorflow / privacy

Library for training machine learning models with privacy for training data

Privacy guarantees of the privacy amplification by iteration example

tudorcebere opened this issue · comments

Hi!

First, thanks for this excellent library and for publishing research experiments!

I have questions about the privacy amplification by iteration script. Could the authors provide a clear explanation of the following:

  1. Which theorem are they using for privacy accounting?
  2. How was the theorem implemented in TensorFlow Privacy?

As far as I understand from this file (but please correct me if I am wrong), TF Privacy computes an average over the clipped gradients, and the added noise has scale sensitivity * noise_multiplier. So the update rule is

$W_{t+1} = W_t - \eta\left(\frac{1}{B}\sum_{x \in B_i}\mathrm{clip}\left(\nabla_{W_t}\,\mathrm{loss}(x, W_t), C\right) + N(0, C^2\sigma^2)\right)$

where $\eta$ is the learning rate, $C$ is the sensitivity (clipping norm), and $B$ is the batch size. To account for this, the authors correctly multiply the noise term by the batch size so they can derive the correct privacy amplification by iteration guarantees, rewriting the update above as:

$W_{t+1} = W_t - \frac{\eta}{B}\left(\sum_{x \in B_i}\mathrm{clip}\left(\nabla_{W_t}\,\mathrm{loss}(x, W_t), C\right) + N(0, B^2 C^2 \sigma^2)\right)$
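To sanity-check this rewriting, here is a quick NumPy simulation (my own sketch with toy values, not the TF Privacy code) showing that the two formulations yield updates with the same distribution:

```python
import numpy as np

# Toy values (assumptions for illustration): batch size, clipping norm,
# noise multiplier, learning rate.
B, C, sigma, eta = 64, 1.0, 1.1, 0.1

rng = np.random.default_rng(0)
clipped_grads = rng.normal(size=B)  # stand-ins for clipped per-example gradients
n_trials = 100_000

# Formulation 1: noise N(0, C^2 sigma^2) added to the *average* of clipped gradients.
u1 = eta * (clipped_grads.mean() + rng.normal(0.0, C * sigma, n_trials))

# Formulation 2: noise N(0, B^2 C^2 sigma^2) added to the *sum*, then divided by B.
u2 = (eta / B) * (clipped_grads.sum() + rng.normal(0.0, B * C * sigma, n_trials))

# Both should report the same mean and the same standard deviation eta * C * sigma.
print(u1.mean(), u2.mean())
print(u1.std(), u2.std())
```

Both formulations add noise of standard deviation $\eta C \sigma$ to the same mean update, so the rewriting is exact.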

That's how we can observe an RDP coefficient of:

$\frac{2\alpha}{\sigma^2 B^2} \cdot \mathcal{O}(T^{-1})$
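Ignoring the constant hidden in the $\mathcal{O}(T^{-1})$ term, the scaling can be written as a small helper (a hypothetical function of my own for illustration, not a TF Privacy API):

```python
def amplification_by_iteration_rdp_scale(alpha, sigma, B, T):
    """Scaling of the last-iterate RDP bound of order alpha, up to the
    constant hidden in O(T^{-1}); illustrative only."""
    return alpha * 2.0 / (sigma**2 * B**2 * T)
```

The $B^2$ in the denominator comes precisely from the noise having been scaled by the batch size in the rewritten update.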

Now, this is neat, but I am not sure this is comparable with the analysis of DP-SGD from here, as they are considering an update rule of:

$W_{t+1} = W_t - \frac{\eta}{B}\left(\sum_{x \in B_i}\mathrm{clip}\left(\nabla_{W_t}\,\mathrm{loss}(x, W_t), C\right) + N(0, C^2 \sigma^2)\right)$

For them to be comparable, shouldn't we scale $\sigma$ by $B$ when computing the RDP analysis for SGM here?
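Concretely, the comparison I have in mind would look something like this, using compute_rdp from the rdp_accountant module (the values are toy assumptions, and whether passing B * sigma as the noise multiplier is the right way to align the two analyses is exactly my question):

```python
from tensorflow_privacy.privacy.analysis.rdp_accountant import compute_rdp

# Toy values (assumptions): batch size, dataset size, noise multiplier, steps.
B, N, sigma, T = 64, 50_000, 1.1, 1_000
q = B / N  # sampling probability
orders = [2, 4, 8, 16, 32]

# SGM analysis as usually invoked, with the per-example noise multiplier sigma.
rdp_unscaled = compute_rdp(q=q, noise_multiplier=sigma, steps=T, orders=orders)

# The same analysis with sigma scaled by B, matching the noise actually added
# in the amplification-by-iteration script's rewritten update.
rdp_scaled = compute_rdp(q=q, noise_multiplier=B * sigma, steps=T, orders=orders)

print(rdp_unscaled)
print(rdp_scaled)
```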