[Features] support gradient checkpointing for memory saving
zguo0525 opened this issue · comments
zg commented
Nouamane Tazi commented
Hello! What do you mean by gradient checkpointing? Something like the checkpoint function in torch.utils.checkpoint?
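For readers unfamiliar with the term, here is a rough sketch of the idea usually meant by gradient checkpointing (activation recomputation), which is what PyTorch exposes as torch.utils.checkpoint.checkpoint. It is written in plain Python on a toy scalar chain so it runs without any dependencies. The layer list and the function names `grad_full` and `grad_checkpointed` are hypothetical and only for illustration; this is not this repository's implementation.

```python
import math

# Hypothetical toy "layers": each entry is (forward, derivative) on a scalar.
LAYERS = [
    (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    (lambda x: x * x, lambda x: 2.0 * x),
    (math.sin, math.cos),
    (lambda x: 3.0 * x, lambda x: 3.0),
]


def grad_full(x):
    """Ordinary backprop: keep the input of every layer for the backward pass."""
    inputs = []
    for f, _ in LAYERS:
        inputs.append(x)
        x = f(x)
    g = 1.0
    for (_, df), inp in zip(reversed(LAYERS), reversed(inputs)):
        g *= df(inp)  # chain rule, last layer first
    return x, g, len(inputs)


def grad_checkpointed(x, segment=2):
    """Checkpointed backprop: keep only the input of each segment,
    then recompute the inputs inside a segment when its gradient is needed."""
    starts = list(range(0, len(LAYERS), segment))
    checkpoints = []
    for start in starts:
        checkpoints.append(x)  # the only values kept from the forward pass
        for f, _ in LAYERS[start:start + segment]:
            x = f(x)
    out = x

    g = 1.0
    for start, ckpt in zip(reversed(starts), reversed(checkpoints)):
        seg = LAYERS[start:start + segment]
        # Recompute this segment's layer inputs from its checkpoint.
        inputs = []
        h = ckpt
        for f, _ in seg:
            inputs.append(h)
            h = f(h)
        for (_, df), inp in zip(reversed(seg), reversed(inputs)):
            g *= df(inp)
    return out, g, len(checkpoints)


if __name__ == "__main__":
    print(grad_full(0.5))
    print(grad_checkpointed(0.5, segment=2))
```

Both functions return the same output and gradient; the checkpointed version keeps fewer values from the forward pass and pays for it with one extra forward computation per segment during the backward pass.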