microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Details on ZeRO++ tutorials

R0n12 opened this issue

I am trying to evaluate ZeRO++ by following this tutorial.

I was looking for pretrain_zeropp_gpt.py in this repo but had no luck. Is it still being developed, or will the regular pretrain_gpt.py work with the ZeRO++ configs?
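For reference, my understanding from the ZeRO++ tutorial is that ZeRO++ is enabled through the `zero_optimization` section of the DeepSpeed config, roughly like this (a minimal sketch; the batch size, dtype, and hpZ partition size are placeholder values I picked for a single 8-GPU node, not values taken from the tutorial):

```json
{
  "train_batch_size": 32,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "zero_quantized_weights": true,
    "zero_hpz_partition_size": 8,
    "zero_quantized_gradients": true
  }
}
```

I am assuming this file would then be passed to pretrain_gpt.py through the usual --deepspeed_config flag, unless a dedicated ZeRO++ script is required.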

Just wondering where I can find a concrete example to reproduce GPT-2 training with ZeRO++.

I am using the main branch here and DeepSpeed v0.11.1.
Much appreciated!

I have the same question.
Also, have you reproduced GPT-2 training with ZeRO using pretrain_gpt.py? I don't know which script I should use. Will examples_deepspeed/rebase/ds_pretrain_gpt_125M.sh work? Thanks.

It looks like neither the scripts for ZeRO++ nor the appendix of https://arxiv.org/pdf/2306.10209.pdf is available. We can't run ZeRO++ and verify the paper's speedups without these.