Merging optimizer states from different pipeline parallel size to resume training

Question

xrsrke opened this issue 5 months ago

Suppose you start training with a pipeline parallel size of 4. We need to support resuming training with a different pipeline parallel size, such as 2, by merging the optimizer states.
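To make the request concrete, here is a minimal sketch of one way the merge could work. It is not an existing implementation. It assumes that each pipeline stage saves a dict mapping parameter names to their optimizer entries, that names look like `layers.<i>.<param>`, and that layers are split evenly and contiguously across stages. All function names and the state layout are hypothetical. Real checkpoints may key state differently (for example, `torch.optim` state dicts use integer parameter ids), and tensor parallel sharding is not handled here.

```python
from typing import Any, Dict, List

# Hypothetical layout: parameter name -> optimizer entries for that parameter,
# e.g. {"exp_avg": ..., "exp_avg_sq": ..., "step": ...} for Adam.
StageState = Dict[str, Dict[str, Any]]


def layer_index(param_name: str) -> int:
    """Extract the layer index, assuming names like 'layers.3.weight'."""
    return int(param_name.split(".")[1])


def stage_of_layer(layer: int, num_layers: int, pp_size: int) -> int:
    """Map a layer to its pipeline stage, assuming an even contiguous split."""
    if num_layers % pp_size != 0:
        raise ValueError("num_layers must be divisible by pp_size in this sketch")
    return layer // (num_layers // pp_size)


def merge_stage_states(old_states: List[StageState]) -> StageState:
    """Combine the per-stage optimizer states into one dict keyed by parameter name."""
    merged: StageState = {}
    for rank, state in enumerate(old_states):
        for name, entry in state.items():
            if name in merged:
                raise ValueError(f"{name!r} appears in more than one stage (rank {rank})")
            merged[name] = entry
    return merged


def reshard_optimizer_states(
    old_states: List[StageState], num_layers: int, new_pp_size: int
) -> List[StageState]:
    """Merge the old per-stage states, then split them for the new pipeline size."""
    merged = merge_stage_states(old_states)
    new_states: List[StageState] = [{} for _ in range(new_pp_size)]
    for name, entry in merged.items():
        stage = stage_of_layer(layer_index(name), num_layers, new_pp_size)
        new_states[stage][name] = entry
    return new_states


# Usage: 8 layers trained with pipeline parallel size 4, resumed with size 2.
num_layers = 8
old_states = [
    {
        f"layers.{layer}.weight": {"step": 100, "exp_avg": [0.0], "exp_avg_sq": [0.0]}
        for layer in range(stage * 2, stage * 2 + 2)
    }
    for stage in range(4)
]
new_states = reshard_optimizer_states(old_states, num_layers, new_pp_size=2)
```

In this sketch, the new stage 0 ends up with the entries for layers 0 to 3 and the new stage 1 with layers 4 to 7. Because the entries are moved as-is, values such as the step count and moment estimates carry over unchanged.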