Could TransformerEngine work with DeepSpeed ZeRO with offloading?
leiwen83 opened this issue · comments
Hi,
Since DeepSpeed ZeRO with offloading is commonly used when training large LLMs, does TE currently support this mode?
Currently, DeepSpeed support seems to be covered only by a unit test, as referenced in TE's README: microsoft/DeepSpeed#3731
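For concreteness, the mode in question is roughly the following. This is only a minimal sketch of a ZeRO stage 3 config with CPU offloading, built as a Python dict. The batch size and precision values are placeholder assumptions rather than a tested setup, and nothing here is TE-specific.

```python
import json

# Illustrative DeepSpeed config: ZeRO stage 3 with optimizer and parameter
# offloading to CPU. Batch size and bf16 settings are placeholder assumptions.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # Keep optimizer states in pinned host memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Keep parameters in pinned host memory (parameter offload needs stage 3).
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# Serialize to the JSON form that would normally go in a ds_config.json file.
ds_config_json = json.dumps(ds_config, indent=2)
print(ds_config_json)
```

The dict (or the equivalent JSON file) would normally be passed as the config to `deepspeed.initialize`. Whether TE modules work correctly under this setup is exactly what I am asking.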
Thx~