lxuechen / private-transformers

A codebase that makes differentially private training of transformers easy.

Home Page: https://arxiv.org/abs/2110.05679

Customize loss function / adding regularizer under privacy setting?

shi-kejian opened this issue · comments

Hi, thanks again for the great work and the codebase!

I have a question -- how would I customize the loss function in this codebase? I've been trying to do so, e.g., adding a per-example L1 regularization term to vector_loss in the trainer, but I haven't managed to get it running after several attempts.

There's a related discussion/PR in Opacus codebase pytorch/opacus#249.

However, there are a few tricky things I can see:
-- In private-transformers, backward() behavior is not managed on the user end.
-- Also, a 1-D vector_loss is required for the private gradient update via optimizer.step or optimizer.virtual_step (see the sketch below).
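For reference, the basic call pattern I'm working with looks roughly like this (a sketch with placeholder names for the model, inputs, and attached optimizer):

```python
import torch.nn.functional as F

# Forward pass; `model`, `input_ids`, `labels` are placeholders.
logits = model(input_ids).logits

# Per-example losses: must stay a 1-D tensor of shape (batch_size,),
# since the privacy engine clips gradients per example.
vector_loss = F.cross_entropy(logits, labels, reduction="none")

# The attached optimizer consumes the per-example losses directly.
optimizer.step(loss=vector_loss)  # or optimizer.virtual_step(loss=vector_loss)
```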

My intuition is that I can add to vector_loss (per-example loss) at this line before the loss gets passed to the privacy engine.

However, I am afraid privacy is also a concern. I am aware that private-transformers overrides compute_loss() in the HF trainer to exclude regularization terms that might interfere with privacy accounting.

Sorry my question is not super detailed, but I hope it makes sense; I'd really appreciate any comments.

Thank you!

Yeah, regularization can be tricky to get right if you want to achieve it by adding an additional loss term.

Since you're doing L1 regularization, the subderivative is essentially a vector of signs. You could directly modify the summed gradient at this line. This doesn't invalidate the privacy guarantee, since parameters from the last round are already privatized.
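For instance, instead of patching the codebase, a decoupled L1 step applied right after the privatized update achieves much the same thing. A rough sketch (l1_coef is a made-up hyperparameter, and this is not part of the library):

```python
import torch

# Privatized update on the per-example task loss.
optimizer.step(loss=vector_loss)

# Decoupled L1 step: the subgradient sign(theta) depends only on the
# parameters, which are already privatized, so this costs no extra privacy.
l1_coef = 1e-4  # hypothetical regularization strength
with torch.no_grad():
    for group in optimizer.param_groups:
        lr = group["lr"]
        for p in group["params"]:
            if p.requires_grad:
                p.add_(torch.sign(p), alpha=-lr * l1_coef)
```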

I am aware that private-transformers overrides compute_loss() in the HF trainer

This is not true. private-transformers doesn't interact with the HF trainer class in any way -- only the model classes.

Thank you for the comments and the correction about the trainer class.
Sorry for the confusion -- the 'L1 regularization' was just an arbitrary example.
What if I want to add a loss term to the per-example loss and then privatize the gradients?

The algorithm is roughly:

[screenshot: the per-example loss with an additional term R added]

where R is a task-specific loss term.

I'd want the gradients to be privatized altogether (per-example clipping followed by Gaussian noise on a batch/lot).

Can you provide some feedback/comments? Thank you!

If your loss is based on just the outputs of a supported model, the approach of combining all losses into a single vector_loss should work in principle. What is the specific error you get?
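A minimal sketch of what I mean (the particular penalty below is just an arbitrary stand-in for your R, and reg_coef is a made-up hyperparameter):

```python
import torch.nn.functional as F

outputs = model(input_ids)  # any supported model; placeholder inputs
logits = outputs.logits

# Per-example task loss, shape (batch_size,).
per_example_ce = F.cross_entropy(logits, labels, reduction="none")

# Task-specific term R, computed per example from the model outputs.
# This penalty is only a stand-in for whatever your R actually is.
per_example_reg = logits.pow(2).mean(dim=-1)  # shape (batch_size,)

# Single 1-D vector_loss: its per-example gradients get clipped and
# noised together by the privacy engine.
reg_coef = 0.1  # hypothetical weight on R
vector_loss = per_example_ce + reg_coef * per_example_reg
optimizer.step(loss=vector_loss)
```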

Sorry for the late reply -- I got it fixed, and it was not really a relevant error. I'll close the issue for now and will ask further questions along the way. Thank you. Hope this thread is helpful for similar questions.