lxuechen / private-transformers

A codebase that makes differentially private training of transformers easy.

Home Page: https://arxiv.org/abs/2110.05679


What is the best way to handle large models?

Pier297 opened this issue · comments

Hi all,
I was trying to fine-tune GPT-J 6B, but I run into out-of-memory errors on a single GPU. For non-private training I managed to solve this with DeepSpeed, but it seems I can't use that with Opacus or with this codebase. Do you know how I could solve this problem?
Thank you in advance :)

Hi,

Thanks for your interest. I have detailed thoughts on this, but the short answer is that we would likely need to make some non-trivial changes to the codebase to enable this (if you have 80 GB A100 GPUs, things might be easier).

If you're interested in making progress on this, I'm happy to chat in depth via email.

Thanks.
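
As a partial stopgap while per-GPU memory is the bottleneck, one option that stays within the usage documented in this repo's README is to shrink the micro-batch that actually runs through the GPU and accumulate clipped per-example gradients over several micro-batches before the noisy update. The sketch below assumes the `PrivacyEngine` / `optimizer.virtual_step(loss=...)` / `optimizer.step(loss=...)` interface from the README; the model name, batch sizes, `sample_size`, and the `loader` that yields lists of micro-batches are placeholders, and argument names may differ across versions. It does not by itself make GPT-J 6B fit on a single GPU, but it decouples the logical batch size used for accounting from what fits in memory.

```python
# Hedged sketch: DP fine-tuning with a small micro-batch plus gradient accumulation.
# Hyperparameters and the data pipeline are placeholders, not a tested recipe.
import torch
import torch.nn.functional as F
import transformers
from private_transformers import PrivacyEngine

device = torch.device("cuda")
model = transformers.AutoModelForCausalLM.from_pretrained("gpt2-large").to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

logical_batch_size = 64   # batch size used for privacy accounting
micro_batch_size = 4      # what actually fits on the GPU per forward/backward

privacy_engine = PrivacyEngine(
    model,
    batch_size=logical_batch_size,  # the *logical* batch size, not the micro-batch
    sample_size=50_000,             # number of training examples (placeholder)
    epochs=3,
    max_grad_norm=0.1,
    target_epsilon=3.0,
)
privacy_engine.attach(optimizer)

for micro_batches in loader:  # assumed: `loader` yields lists of micro-batches
    for i, batch in enumerate(micro_batches):
        input_ids = batch["input_ids"].to(device)
        logits = model(input_ids=input_ids).logits[:, :-1]
        labels = input_ids[:, 1:]
        # private-transformers expects per-example losses, shape (micro_batch_size,)
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            labels.reshape(-1),
            reduction="none",
        ).view(labels.size(0), -1).mean(dim=1)
        if i + 1 < len(micro_batches):
            optimizer.virtual_step(loss=loss)  # accumulate clipped per-example grads
        else:
            optimizer.step(loss=loss)  # noise and update once per logical batch
```

Whether this composes with further memory savers such as gradient checkpointing or model parallelism is exactly the open question above, since those interact with the per-sample gradient instrumentation.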

Closing this now.