bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2


The difference between ZeRO-3 and Megatron with ZeRO-2

nicosouth opened this issue · comments

Hi, I've looked up a lot of information, but I still don't understand the difference between ZeRO-3 and Megatron (tensor parallelism) combined with ZeRO-2. Both approaches split the model across GPUs.
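For concreteness, here is a sketch of how the two setups might be launched with this repo (script and flag names follow common Megatron-DeepSpeed usage and may differ from your setup; the `ds_config_*.json` file names are placeholders). The key distinction: ZeRO-3 shards parameters, gradients, and optimizer states across *data-parallel* ranks and gathers full parameters on the fly per layer, while Megatron tensor parallelism statically splits each layer's weight matrices across the tensor-parallel group, and ZeRO-2 on top of it shards only gradients and optimizer states.

```shell
# Setup A: pure ZeRO-3, no tensor parallelism.
# Every rank is a data-parallel replica; parameters, gradients, and
# optimizer states are all partitioned across ranks, and each rank
# all-gathers the parameters of a layer just before using it.
deepspeed pretrain_gpt.py \
  --tensor-model-parallel-size 1 \
  --deepspeed --deepspeed_config ds_config_zero3.json
  # ds_config_zero3.json contains: "zero_optimization": { "stage": 3 }

# Setup B: Megatron tensor parallelism + ZeRO-2.
# Each layer's weight matrices are sliced across the tensor-parallel
# group (here 4 GPUs), so the forward/backward math itself is split;
# ZeRO-2 then shards only gradients and optimizer states across the
# data-parallel replicas -- full (sliced) parameters stay resident.
deepspeed pretrain_gpt.py \
  --tensor-model-parallel-size 4 \
  --deepspeed --deepspeed_config ds_config_zero2.json
  # ds_config_zero2.json contains: "zero_optimization": { "stage": 2 }
```

So although both "split the model", ZeRO-3 is a memory-partitioning scheme layered on data parallelism (it changes *where* parameters live, not the math), whereas Megatron tensor parallelism changes the computation itself by distributing each matrix multiply, which is why the two compose differently with ZeRO stages.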