lxuechen / private-transformers

A codebase that makes differentially private training of transformers easy.

Home Page: https://arxiv.org/abs/2110.05679


What is the best way to handle large models?

Pier297 opened this issue · comments

Hi all,
I was trying to fine-tune GPT-J 6B, but I run into out-of-memory errors on a single GPU. For non-private training I managed to solve this with DeepSpeed, but it seems I can't use that with Opacus or with this codebase. Do you know how I could solve this problem?
Thank you in advance :)

Hi,

Thanks for your interest. I have detailed thoughts on this, but the short answer is that we would likely need to make some non-trivial changes to the codebase to enable this (if you have 80 GB A100 GPUs, things might be easier).

If you're interested in making progress on this, I'm happy to chat in depth via email.

Thanks.
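
As a partial stopgap while per-GPU memory is the bottleneck, one option that stays within the usage documented in this repo's README is to shrink the micro-batch that actually runs through the GPU and accumulate clipped per-example gradients over several micro-batches before the noisy update. The sketch below assumes the `PrivacyEngine` / `optimizer.virtual_step(loss=...)` / `optimizer.step(loss=...)` interface from the README; the model name, batch sizes, `sample_size`, and the `loader` that yields lists of micro-batches are placeholders, and argument names may differ across versions. It does not by itself make GPT-J 6B fit on a single GPU, but it decouples the logical batch size used for accounting from what fits in memory.

```python
# Hedged sketch: DP fine-tuning with a small micro-batch plus gradient accumulation.
# Hyperparameters and the data pipeline are placeholders, not a tested recipe.
import torch
import torch.nn.functional as F
import transformers
from private_transformers import PrivacyEngine

device = torch.device("cuda")
model = transformers.AutoModelForCausalLM.from_pretrained("gpt2-large").to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

logical_batch_size = 64   # batch size used for privacy accounting
micro_batch_size = 4      # what actually fits on the GPU per forward/backward

privacy_engine = PrivacyEngine(
    model,
    batch_size=logical_batch_size,  # the *logical* batch size, not the micro-batch
    sample_size=50_000,             # number of training examples (placeholder)
    epochs=3,
    max_grad_norm=0.1,
    target_epsilon=3.0,
)
privacy_engine.attach(optimizer)

for micro_batches in loader:  # assumed: `loader` yields lists of micro-batches
    for i, batch in enumerate(micro_batches):
        input_ids = batch["input_ids"].to(device)
        logits = model(input_ids=input_ids).logits[:, :-1]
        labels = input_ids[:, 1:]
        # private-transformers expects per-example losses, shape (micro_batch_size,)
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            labels.reshape(-1),
            reduction="none",
        ).view(labels.size(0), -1).mean(dim=1)
        if i + 1 < len(micro_batches):
            optimizer.virtual_step(loss=loss)  # accumulate clipped per-example grads
        else:
            optimizer.step(loss=loss)  # noise and update once per logical batch
```

Whether this composes with further memory savers such as gradient checkpointing or model parallelism is exactly the open question above, since those interact with the per-sample gradient instrumentation.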

Closing this now.