microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Pipeline parallelism + CPU offload?

webber26232 opened this issue · comments

The configuration below is required to run CPU offload together with Megatron features:

```
--no-pipeline-parallel --cpu-optimizer
```
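For context, here is a minimal sketch of how these flags might appear in a launch command, paired with a DeepSpeed ZeRO config that offloads optimizer state to the CPU. The script name, model sizes, and batch size are hypothetical placeholders; only the two flags above come from the issue itself:

```bash
# Hypothetical sketch: pretrain_gpt.py and the model-size arguments are
# placeholders -- adapt them to your setup. CPU optimizer offload is
# configured through DeepSpeed's ZeRO "offload_optimizer" setting.
cat > ds_config.json <<'EOF'
{
  "train_batch_size": 32,
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
EOF

deepspeed pretrain_gpt.py \
    --num-layers 24 --hidden-size 1024 --num-attention-heads 16 \
    --no-pipeline-parallel \
    --cpu-optimizer \
    --deepspeed --deepspeed_config ds_config.json
```

Note that `--no-pipeline-parallel` explicitly disables pipeline parallelism, which is what prompts the question below.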

Could anyone tell me why pipeline parallelism is not supported together with CPU offload? In my view, these two optimizations should be able to work together. Please let me know if I am missing something.