Continued training from the 10B model: loss only drops from 11 to 5

Question

TccccD opened this issue a year ago · comments

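I'm continuing training from the 10B model, and the loss has only dropped from 11 to 5. Generally speaking, what value does the loss end up at once it converges?
I used 120,000 texts, with an average text length of about 5,000. Training parameters:
gpus=8
max length=1024
batchsize=8
gradient accumulation=2
lr=7e-6
total iters=5000, roughly 5 epochs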
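
As a sanity check on the "roughly 5 epochs" figure, here is a minimal sketch of the arithmetic. It assumes that batchsize=8 is the per-GPU batch size, that one iter is one optimizer step (after gradient accumulation), and that each text contributes one sample truncated to max length. The function names are hypothetical and not taken from any training script.

```python
def samples_per_step(gpus: int, per_gpu_batch: int, grad_accum: int) -> int:
    """Samples consumed per optimizer step (assumes batchsize is per GPU)."""
    return gpus * per_gpu_batch * grad_accum


def epochs_seen(total_iters: int, step_samples: int, dataset_size: int) -> float:
    """Passes over the dataset (assumes one sample per text, truncated to max length)."""
    return total_iters * step_samples / dataset_size


# Values from the post above; 120,000 is the number of texts.
step = samples_per_step(gpus=8, per_gpu_batch=8, grad_accum=2)
epochs = epochs_seen(total_iters=5000, step_samples=step, dataset_size=120_000)

# Upper bound on tokens per optimizer step when every sample fills max length=1024.
max_tokens_per_step = step * 1024

print(step, round(epochs, 2), max_tokens_per_step)
```

Under these assumptions the run sees 128 samples per step and about 5.3 passes over the data, which is consistent with the stated 5 epochs. Note that with truncation to 1024, most of each 5,000-length text would go unused unless the texts are split into chunks first.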

@jeffra @samyam @tjruwase @WrRan

shuangt · Answer 1 · Mon Apr 17 2023 19:22:58 GMT+0800 (China Standard Time)

The 10b-Chinese model I downloaded can't be decompressed; it throws an error. How did you download it?

Chudong Tian · Answer 2 · Mon Apr 17 2023 19:51:19 GMT+0800 (China Standard Time)

I haven't used it on Windows.

superhg · Answer 3 · Tue Apr 18 2023 18:49:33 GMT+0800 (China Standard Time)

How did you do the continued training?

Paul · Answer 4 · Wed May 31 2023 11:03:40 GMT+0800 (China Standard Time)

How did you go about continuing the pretraining?

gavinL · Answer 5 · Fri Jul 21 2023 17:26:45 GMT+0800 (China Standard Time)

Have you found an answer to this question? What loss level is generally considered acceptable?
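
One way to read the numbers in this thread is to convert loss to perplexity and compare it with the loss of a model that guesses uniformly over the vocabulary. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats; the vocabulary size of 50,000 is a hypothetical placeholder, not the model's actual value.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss_nats)


def uniform_loss(vocab_size: int) -> float:
    """Cross-entropy of a model that assigns equal probability to every token."""
    return math.log(vocab_size)


ppl_start = perplexity(11.0)    # loss at the start of the run in this thread
ppl_end = perplexity(5.0)       # loss after 5000 iters
baseline = uniform_loss(50_000)  # hypothetical vocabulary size

print(f"perplexity at loss 11: {ppl_start:.0f}")
print(f"perplexity at loss 5:  {ppl_end:.1f}")
print(f"uniform-guess loss for 50k vocab: {baseline:.2f}")
```

If the starting loss is close to the uniform-guess value for the real vocabulary size, that may indicate the pretrained weights were not actually loaded, which would be worth ruling out before judging where the loss should converge.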

AlanGreen · Answer 6 · Tue Jul 25 2023 21:23:07 GMT+0800 (China Standard Time)

Same question here. When fine-tuning GLM10B I got the loss curve below, but I'm not sure how to check whether the loss is valid or reasonable.