google-research / FLAN


Could you share the training loss to improve reproducibility?

xuanqing94 opened this issue

Hi, thanks for sharing the datasets! I'm trying to train a FLAN model using T5 and other backbone models. However, I'm not confident about how well I reproduced your results; specifically, I got much lower MMLU scores. Could you please share the training loss curve (or simply the loss at convergence)? Below is mine:
[screenshot: training loss curve]

I was using similar settings (batch size = 80, max_seq_len = 2300).
The final loss is around 0.6 after smoothing. What are the official values?
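For reference, here is a minimal sketch of this kind of run using the Hugging Face `Seq2SeqTrainer`, not my actual in-house trainer; the dataset file and its `inputs`/`targets` columns are placeholders for whatever FLAN mixture export you use:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "t5-large"  # any T5 backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Placeholder: a FLAN-style JSONL export with "inputs" and "targets" text fields.
dataset = load_dataset("json", data_files="flan_mixture.jsonl", split="train")

def tokenize(batch):
    enc = tokenizer(batch["inputs"], max_length=2300, truncation=True)
    labels = tokenizer(text_target=batch["targets"], max_length=256, truncation=True)
    enc["labels"] = labels["input_ids"]
    return enc

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-repro",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=10,  # 8 x 10 = effective batch size 80
    learning_rate=1e-4,
    logging_steps=50,  # the logged "loss" is the curve being compared above
    num_train_epochs=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```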

Hey, could you please let me know where I can find the scripts/.gin files to train FLAN on t5x-based models?

@StephennFernandes I can't help you with that, because I am using a PyTorch-based training framework.

You mean you used the Hugging Face model and fine-tuned it on the FLAN datasets?

That works fine for me as well.

By the way, did you get results reasonably close to what the official FLAN-T5 reports?

I use checkpoints downloaded from Hugging Face, but I ran with my in-house distributed training code.

I only tested and compared with FLAN-T5 on the MMLU dataset. It turns out that my results are far below those of the official FLAN-T5 checkpoints.
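For anyone comparing numbers: the MMLU scoring recipe matters a lot. Below is a minimal zero-shot sketch that ranks the four answer letters by log-likelihood; it is one common recipe, not the exact harness behind the official FLAN-T5 numbers (those are reported with few-shot prompting), so part of a gap can come from the evaluation setup alone:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-large"  # or your own fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

CHOICES = ["A", "B", "C", "D"]

def format_question(ex):
    options = "\n".join(f"{c}. {t}" for c, t in zip(CHOICES, ex["choices"]))
    return f"{ex['question']}\n{options}\nAnswer:"

@torch.no_grad()
def predict(ex):
    enc = tokenizer(format_question(ex), return_tensors="pt", truncation=True)
    scores = []
    for letter in CHOICES:
        labels = tokenizer(letter, return_tensors="pt").input_ids
        # model(...).loss is the mean negative log-likelihood of the target
        # tokens, so the lowest loss marks the most likely answer letter.
        scores.append(-model(**enc, labels=labels).loss.item())
    return scores.index(max(scores))

# "cais/mmlu" hosts the benchmark on the Hub; the "all" config merges subjects.
test = load_dataset("cais/mmlu", "all", split="test")
correct = sum(predict(ex) == ex["answer"] for ex in test)
print(f"MMLU accuracy: {correct / len(test):.3f}")
```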