Ongoing research training transformer language models at scale, including: BERT & GPT-2
Bob-cby opened this issue a year ago · comments
Does Megatron-DeepSpeed target only specific models such as GPT-2? Can it also support parallel partitioning of relatively lightweight models such as CLIP?