Having issues with training RoBERTa. Loss not decreasing
GodXuxilie opened this issue
XU Xilie commented
Hi! Thanks for your great repo.
I tried the script in fairseq-RoBERTa/launch/FreeLB/rte-fp32-clip.sh and used the same setting as that in Issue #11 .
# run_exp GPU TOTAL_NUM_UPDATES WARMUP_UPDATES LR NUM_CLASSES MAX_SENTENCES FREQ DATA ADV_LR ADV_STEP INIT_MAG SEED MNORM
run_exp 0 2036 122 1e-5 2 2 8 RTE 3e-2 3 1.6e-1 123 1.4e-1
run_exp 1 2036 122 1e-5 2 2 8 RTE 3e-2 3 1.6e-1 456 1.4e-1
But I got best scores of 0.5152, 0.5152. This is the log. It seems that the training loss does not decrease.
My implementation environment is python 3.6.9, torch 1.6.0, torchvision 0.7.0 and cuda 10.2.
I'm really confused. I'd appreciate your help!
Chen Zhu commented
Line 274 of your first log says:
| no existing checkpoint found pretrained/roberta.large/model.pt
You need to download the pretrained RoBERTa model from here and put it under this path.
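For reference, a minimal sketch of fetching the checkpoint. The URL is the official fairseq release of RoBERTa-large; the target path is the one the FreeLB launch script looks for (the actual download is left commented out because the archive is large):

```shell
# Sketch: place the pretrained RoBERTa-large checkpoint where the
# FreeLB script expects it (pretrained/roberta.large/model.pt).
CKPT_URL="https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz"
CKPT_DIR="pretrained"
mkdir -p "$CKPT_DIR"
# Uncomment to actually download and unpack (~1.5 GB):
# wget -q "$CKPT_URL" && tar -xzf roberta.large.tar.gz -C "$CKPT_DIR"
echo "expected checkpoint: $CKPT_DIR/roberta.large/model.pt"
```

After unpacking, the `no existing checkpoint found` message should go away and fine-tuning should start from the pretrained weights.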
XU Xilie commented
Thanks for your reply. I have fixed the problem.