HuangLK / transpeeder

train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism


Loading the checkpoint in train.py seems to have no effect

GongCQ opened this issue · comments

Line 108 of train.py:

engine.load_checkpoint(model_args.init_ckpt, load_module_only=True)

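Whether or not this line is present, the initial training loss is the same. It looks like the model parameters are not actually being loaded.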
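One way to check this directly, rather than inferring it from the loss, is to snapshot a cheap per-parameter summary before and after the `load_checkpoint` call and compare the two. The helper below is only a sketch: `changed_params` and the toy snapshots are hypothetical names, not part of transpeeder or DeepSpeed, and the summaries are plain floats so the example runs without a GPU.

```python
import math


def changed_params(before, after, tol=0.0):
    """Return the sorted names whose summary value differs between snapshots.

    `before` and `after` map parameter names to one float per parameter
    (for example the sum of its elements). Hypothetical helper for illustration.
    """
    changed = []
    for name, old in before.items():
        new = after.get(name)
        # A missing entry or a value outside the tolerance counts as changed.
        if new is None or not math.isclose(old, new, rel_tol=0.0, abs_tol=tol):
            changed.append(name)
    return sorted(changed)


# Toy snapshots standing in for real parameter summaries.
before = {"layer0.weight": 1.25, "layer0.bias": 0.0}
after_noop = dict(before)                                  # load had no effect
after_loaded = {"layer0.weight": -3.5, "layer0.bias": 0.0}  # weights replaced

print(changed_params(before, after_noop))
print(changed_params(before, after_loaded))
```

In the real script, a snapshot could be built with something like `{n: p.detach().float().sum().item() for n, p in engine.module.named_parameters()}`, taken once before and once after line 108. An empty result would suggest the call changed nothing. It may also be worth checking the return value of `engine.load_checkpoint(...)`: as far as I know DeepSpeed returns a `(load_path, client_state)` tuple, with `load_path` set to `None` when nothing was loaded.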