huggingface / nanotron

Minimalistic large language model 3D-parallelism training

[Features] support gradient checkpointing for memory saving

zguo0525 opened this issue

commented

Hello! What do you mean by gradient checkpointing? Something like the checkpoint function from torch (torch.utils.checkpoint.checkpoint)?
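For reference, if the request is about PyTorch-style activation checkpointing, here is a minimal sketch of what that usually looks like with `torch.utils.checkpoint.checkpoint`. The `CheckpointedMLP` model and the `grads` helper are hypothetical toy names made up for illustration; none of this is nanotron code, and it does not show how such a feature would interact with nanotron's 3D parallelism.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    """Hypothetical toy model for illustration only (not part of nanotron)."""

    def __init__(self, dim: int = 16, n_blocks: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(n_blocks)]
        )

    def forward(self, x: torch.Tensor, use_checkpoint: bool = True) -> torch.Tensor:
        for block in self.blocks:
            if use_checkpoint:
                # Intermediate activations inside `block` are not kept;
                # they are recomputed when backward reaches this block.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


def grads(use_checkpoint: bool) -> list:
    """Build the same seeded model and input, run backward, return parameter grads."""
    torch.manual_seed(0)
    model = CheckpointedMLP()
    x = torch.randn(8, 16)
    model(x, use_checkpoint=use_checkpoint).sum().backward()
    return [p.grad.clone() for p in model.parameters()]


grads_ckpt = grads(use_checkpoint=True)
grads_plain = grads(use_checkpoint=False)
```

Both runs should produce the same gradients; the checkpointed one trades extra forward compute during backward for lower activation memory.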