bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Does Megatron-DeepSpeed only apply to specific models?

Bob-cby opened this issue

Does Megatron-DeepSpeed only target specific models such as GPT-2, or can it also support parallel partitioning of relatively lightweight models such as CLIP?
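
For concreteness, the kind of partitioning I have in mind is what DeepSpeed's generic pipeline API (`deepspeed.pipe`, which this repo builds on) does for an arbitrary stack of layers. The sketch below uses a placeholder `Block` module and made-up sizes rather than anything from this repository:

```python
# Minimal sketch (placeholder layers and sizes, not code from this repo):
# partition an arbitrary stack of layers across pipeline stages with
# DeepSpeed's generic deepspeed.pipe API. Launch with the `deepspeed`
# launcher on at least `num_stages` GPUs.
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule, LayerSpec


class Block(nn.Module):
    """Stand-in transformer-style block; replace with real model layers."""

    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        return x + self.ff(x)


deepspeed.init_distributed()

# Express the model as a flat list of layers; LayerSpec defers construction
# until each layer is assigned to its pipeline stage.
layers = [LayerSpec(Block, 512) for _ in range(12)]
model = PipelineModule(layers=layers, num_stages=4)  # 4-way pipeline parallelism

config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=config
)
# engine.train_batch(data_iter) would then run one pipelined training step.
```

Is something along these lines expected to work for a model like CLIP, or are the parallelism paths in this repo tied to the GPT-2/BERT training scripts?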