texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.

Home Page: http://tevatron.ai

Training log of RepLLaMA

kyriemao opened this issue

Hi Xueguang,

Great work! I am training my own RepLLaMA now and find that the training loss starts at 90+ and quickly drops below 0.1 within around 30 steps (as shown below). Is this normal, or could you please provide your training log of RepLLaMA?

[screenshot: training loss curve starting above 90 and dropping below 0.1 within ~30 steps]

Thanks!

This looks a bit weird. What are your batch size / train group size settings?

The parameter settings are:

  • per_gpu_train_batch_size=8
  • hard_negatives_per_sample=15
  • learning_rate=1e-4
  • gradient_accumulation_steps=4

I use 6 A100 40G GPUs for training.
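
As a rough sanity check on that starting value (a sketch under stated assumptions, not anything from the thread itself): with a standard InfoNCE-style contrastive loss, an untrained encoder scores all candidates roughly uniformly, so the loss at step 0 should sit near ln(number of candidates per query). The snippet below plugs in the settings above; whether negatives are shared across devices depends on the training configuration (Tevatron has an option to gather negatives across GPUs), so both counts are shown.

```python
import math

# Expected InfoNCE loss at initialization is roughly ln(#candidates),
# since an untrained model scores all candidates near-uniformly.
per_gpu_batch = 8   # queries per GPU (settings above)
group_size = 16     # 1 positive + 15 hard negatives per query
num_gpus = 6        # A100 40G GPUs

local_candidates = per_gpu_batch * group_size    # in-batch negatives on one GPU
global_candidates = num_gpus * local_candidates  # if negatives are gathered across GPUs

print(f"ln({local_candidates})  = {math.log(local_candidates):.2f}")   # ~4.85
print(f"ln({global_candidates}) = {math.log(global_candidates):.2f}")  # ~6.64
```

Either way the expected starting loss is in the 5-7 range, so a value above 90 points to something upstream of the loss (data formatting or tokenization) rather than normal optimization, which is consistent with the bug found below.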

Solved. It was caused by a bug in my own code when processing the EOS token. Thanks!

Hello, I met the same problem. Can you please tell me how you solved it? Thank you a lot!
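
For readers who hit the same symptom: RepLLaMA represents each query or passage with the hidden state of an end-of-sequence token appended to the input, so a common failure mode is tokenizing without actually appending EOS (or appending it twice). The snippet below is a minimal illustration of that pooling, assuming a Hugging Face LLaMA checkpoint; the model name is a stand-in, and this is not the thread author's actual fix.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical checkpoint for illustration; substitute your own model path.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def encode(text: str) -> torch.Tensor:
    # Tokenize, then append EOS explicitly and exactly once. The LLaMA
    # tokenizer does not add EOS by default, and relying on "</s>" being
    # present in the raw string is fragile.
    ids = tokenizer(text, add_special_tokens=True)["input_ids"]
    ids.append(tokenizer.eos_token_id)
    input_ids = torch.tensor([ids])
    with torch.no_grad():
        hidden = model(input_ids=input_ids).last_hidden_state  # (1, L, d)
    # RepLLaMA-style pooling: the representation is the hidden state of
    # the final (EOS) token.
    return hidden[0, -1]
```

RepLLaMA additionally prefixes inputs with "query: " or "passage: " and normalizes the representations; the point here is only that the EOS token must survive preprocessing, since dropping or duplicating it silently changes which hidden state gets pooled and can produce exactly this kind of pathological loss curve.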