Why does the model perform so poorly at inference after fine-tuning on the school_math_0.25M.json dataset?

Question

ivankxt opened this issue 9 months ago

deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --quantization_bit 8 \
    ...
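I trained on a V100 machine with 4 GPUs and added --quantization_bit 8 to avoid OOM. After one epoch of training, inference with the resulting model is very poor. In addition, when I start the web service with web_demo2.py, answers often stop after only a few tokens of output, even though the inference process itself appears to be running normally.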

tokenizer = AutoTokenizer.from_pretrained("/xxx/ChatGLM2-6B/THUDM/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("/xxx/ChatGLM2-6B/output/adgen-chatglm2-6b-ft-1e-4/checkpoint-15000", trust_remote_code=True).cuda(1)