Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed Pipeline Parallelism
GongCQ opened this issue a year ago
Line 108 of train.py:
engine.load_checkpoint(model_args.init_ckpt, load_module_only=True)
With or without this line, the initial training loss is the same. It seems the model parameters are not actually being loaded.
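One way to check whether a load call like `engine.load_checkpoint` actually touched the weights is to fingerprint the parameters before and after the call and compare; if the fingerprint is unchanged, the load was a no-op. A minimal, framework-free sketch of the idea (plain Python dicts stand in for the model's parameters and the checkpoint's state dict; in real code you would hash the tensors, e.g. each parameter's sum or first few values):

```python
import hashlib

def fingerprint(params):
    """Hash all parameter names and values so a silent no-op load is detectable."""
    h = hashlib.sha256()
    for name in sorted(params):
        h.update(name.encode())
        for v in params[name]:
            h.update(repr(v).encode())
    return h.hexdigest()

# Toy stand-ins: freshly initialized parameters vs. a trained checkpoint.
model_params = {"layer.weight": [0.0, 0.0], "layer.bias": [0.0]}
checkpoint   = {"layer.weight": [0.12, -0.34], "layer.bias": [0.05]}

before = fingerprint(model_params)
model_params.update(checkpoint)  # stands in for engine.load_checkpoint(...)
after = fingerprint(model_params)

# If before == after, the checkpoint load changed nothing.
print(before != after)  # → True
```

If the fingerprints match before and after `load_checkpoint`, the checkpoint path, tag, or parameter name mapping is likely wrong, which would explain the identical initial loss with or without the call.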