Ongoing research training transformer language models at scale, including: BERT & GPT-2
nicosouth opened this issue 9 months ago · comments
Hi, I've looked up a lot of information, but I still don't understand the difference between ZeRO-3 and Megatron tensor parallelism combined with ZeRO-2.
Both approaches split the model across GPUs, so what is actually different?
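For concreteness, here is a minimal sketch (my own illustration, not an answer from the maintainers) of the DeepSpeed `zero_optimization` config fragments that distinguish the two setups. The key point: only stage 3 partitions the parameters themselves; with Megatron + ZeRO-2, the parameters are split along a different axis (tensor parallelism), while ZeRO-2 partitions only optimizer states and gradients.

```python
# Minimal sketch of DeepSpeed "zero_optimization" config fragments
# for the two setups being compared (illustration only, not a
# complete training config).

# Setup A: ZeRO-3 alone -- partitions optimizer states, gradients,
# AND the parameters themselves across data-parallel ranks; each
# rank gathers a layer's full weights just-in-time for compute.
zero3_config = {
    "zero_optimization": {
        "stage": 3,
    }
}

# Setup B: Megatron tensor parallelism + ZeRO-2. Megatron splits
# each weight matrix across tensor-parallel ranks (a different
# parallelism axis), while ZeRO-2 partitions only optimizer states
# and gradients -- each data-parallel rank still holds a full copy
# of its tensor-parallel shard's parameters.
zero2_config = {
    "zero_optimization": {
        "stage": 2,
    }
}

def params_partitioned(cfg):
    # ZeRO partitions parameters only at stage 3.
    return cfg["zero_optimization"]["stage"] >= 3

print(params_partitioned(zero3_config))  # True
print(params_partitioned(zero2_config))  # False
```

So the split happens on different axes: ZeRO-3 shards whole parameters across the data-parallel group, while Megatron shards each matrix within a layer across the tensor-parallel group.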