Customize loss function / add a regularizer under a privacy setting?
shi-kejian opened this issue
Hi, thanks again for the great work and the codebase!
I have a question -- how would I customize the loss function in this codebase? I've been trying to do so, e.g., adding a per-example L1 regularization term to vector_loss
in the trainer, but I haven't managed to get it running after several attempts.
There's a related discussion/PR in Opacus codebase pytorch/opacus#249.
However, there are a few tricky things I can see:
-- In private-transformers, backward() behavior is not managed on the user end.
-- Also, a 1-D vector_loss is required for the private gradient update (optimizer.step or optimizer.virtual_step).
My intuition is that I can add to vector_loss
(per-example loss) at this line before the loss gets passed to the privacy engine.
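As a sketch, this is what I'm imagining (names like make_vector_loss and reg_weight are my own placeholders, not library code; the only hard requirement I'm following is that the result stays a 1-D per-example loss):

```python
import torch
import torch.nn.functional as F

def make_vector_loss(logits, labels, reg_weight=0.01):
    """Combine a per-example task loss with a per-example regularizer.

    logits: (batch, seq_len, vocab); labels: (batch, seq_len).
    reg_weight is a hypothetical hyperparameter, not part of the library.
    """
    # Per-example cross-entropy: reduce over tokens but NOT over the batch,
    # so the result stays 1-D as the privacy engine requires.
    task_loss = F.cross_entropy(
        logits.transpose(1, 2), labels, reduction="none"
    ).mean(dim=1)
    # Illustrative per-example penalty on the logits; anything computed from
    # this example's forward pass alone keeps the per-example structure.
    # (A parameter-norm penalty is NOT per-example and would not fit here.)
    reg = reg_weight * logits.abs().mean(dim=(1, 2))
    return task_loss + reg  # shape: (batch_size,)
```

The returned vector would then be handed to the privacy engine as vector_loss.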
However, I am afraid privacy is also a concern. I am aware that Private-Transformers overrides compute_loss()
in the HF trainer to exclude regularization terms that might interfere with privacy accounting.
Sorry my question is not super detailed, but I hope this makes sense. I really appreciate any comments.
Thank you!
Yeah, regularization can be tricky to get right if you want to achieve it by adding an additional loss term.
Since you're doing L1 regularization, the subderivative is essentially a vector of signs. You could directly modify the summed gradient at this line. This doesn't invalidate the privacy guarantee, since parameters from the last round are already privatized.
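A minimal sketch of that idea (the function name and the hook point are my own; the point is only that sign(theta) depends on already-privatized parameters, so adding it to the aggregated gradient is free from an accounting perspective):

```python
import torch

def add_l1_subgradient(params, l1_weight):
    """Add the L1 subgradient to already-aggregated gradients in place.

    The L1 subderivative sign(p) depends only on the parameters, which were
    privatized in the previous round, so applying it after the per-example
    gradients are summed does not change the privacy accounting.
    l1_weight is a hypothetical hyperparameter.
    """
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.grad.add_(l1_weight * torch.sign(p))
```

In this scheme you would call it after the engine has populated the summed gradients and before the parameter update.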
> I am aware that Private-Transformers overrides compute_loss() in the HF trainer
This is not true. private-transformers
doesn't interact with the HF trainer class in any way -- only the model classes.
Thank you for the comments and the correction about the trainer class.
Sorry for the confusion -- the 'L1 regularization' was just an arbitrary example.
What if I want to add a loss term to per-example loss and then privatize the gradients?
The algorithm is roughly:

per-example loss_i = task loss_i + R_i,

where R is a task-specific loss term. I'd want the gradients to be privatized altogether (clipped per example, followed by Gaussian noise on a batch/lot).
Can you provide some feedback/comments? Thank you!
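In plain PyTorch, the privatization step I have in mind looks like the following (an illustrative DP-SGD sketch of what I understand the engine does internally, not library code):

```python
import torch

def privatize_grads(per_example_grads, clip_norm, noise_multiplier):
    """Clip each example's gradient, sum, and add Gaussian noise.

    per_example_grads: tensor of shape (batch, dim), one flattened
    gradient per example; both the task-loss and R contributions are
    already folded into each row.
    """
    # Per-example clipping: scale rows whose norm exceeds clip_norm.
    norms = per_example_grads.norm(dim=1, keepdim=True)
    factors = (clip_norm / norms).clamp(max=1.0)
    clipped = per_example_grads * factors
    # Aggregate over the batch/lot, then add calibrated Gaussian noise.
    summed = clipped.sum(dim=0)
    noise = torch.normal(
        0.0, noise_multiplier * clip_norm, size=summed.shape
    )
    return summed + noise
```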
If your loss is based on just the outputs of a supported model, the approach of combining all losses into a single vector_loss
should work in principle. What is the specific error you get?
Sorry for the late reply -- I got it fixed, and it was not really a relevant error. I'll close the issue for now and will ask further questions along the way. Thank you. I hope this thread is helpful for similar questions.