microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Pipeline parallelism + CPU offload?

webber26232 opened this issue · comments

The configuration below is required to run CPU offload together with Megatron features:

```
--no-pipeline-parallel --cpu-optimizer
```
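For context, here is a minimal sketch of how these flags might appear in a launch command, paired with a DeepSpeed ZeRO config that offloads optimizer state to the CPU. The script name, model sizes, and batch size are hypothetical placeholders; only the two flags above come from the issue itself:

```bash
# Hypothetical sketch: pretrain_gpt.py and the model-size arguments are
# placeholders -- adapt them to your setup. CPU optimizer offload is
# configured through DeepSpeed's ZeRO "offload_optimizer" setting.
cat > ds_config.json <<'EOF'
{
  "train_batch_size": 32,
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
EOF

deepspeed pretrain_gpt.py \
    --num-layers 24 --hidden-size 1024 --num-attention-heads 16 \
    --no-pipeline-parallel \
    --cpu-optimizer \
    --deepspeed --deepspeed_config ds_config.json
```

Note that `--no-pipeline-parallel` explicitly disables pipeline parallelism, which is what prompts the question below.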

Could anyone tell me why pipeline parallelism is not supported together with CPU offload? In my view, these two optimizations should be able to work together. Please let me know if I am missing something.