texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.

Home Page: http://tevatron.ai

Training log of RepLLaMA

kyriemao opened this issue

Hi Xueguang,

Great work! I am training my own RepLLaMA now and find that the training loss starts at 90+ and quickly drops below 0.1 within around 30 steps (as shown below). Is this normal, or could you please provide your training log of RepLLaMA?

[screenshot: training loss curve starting above 90 and dropping below 0.1 within ~30 steps]

Thanks!

This looks a bit weird. What are your batch size / train group size settings?

The parameter settings are:

  • per_gpu_train_batch_size=8
  • hard_negatives_per_sample=15
  • learning_rate=1e-4
  • gradient_accumulation_steps=4

I use 6 A100 40G GPUs for training.
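
As a rough sanity check on that starting value (a sketch under stated assumptions, not anything from the thread itself): with a standard InfoNCE-style contrastive loss, an untrained encoder scores all candidates roughly uniformly, so the loss at step 0 should sit near ln(number of candidates per query). The snippet below plugs in the settings above; whether negatives are shared across devices depends on the training configuration (Tevatron has an option to gather negatives across GPUs), so both counts are shown.

```python
import math

# Expected InfoNCE loss at initialization is roughly ln(#candidates),
# since an untrained model scores all candidates near-uniformly.
per_gpu_batch = 8   # queries per GPU (settings above)
group_size = 16     # 1 positive + 15 hard negatives per query
num_gpus = 6        # A100 40G GPUs

local_candidates = per_gpu_batch * group_size    # in-batch negatives on one GPU
global_candidates = num_gpus * local_candidates  # if negatives are gathered across GPUs

print(f"ln({local_candidates})  = {math.log(local_candidates):.2f}")   # ~4.85
print(f"ln({global_candidates}) = {math.log(global_candidates):.2f}")  # ~6.64
```

Either way the expected starting loss is in the 5-7 range, so a value above 90 points to something upstream of the loss (data formatting or tokenization) rather than normal optimization, which is consistent with the bug found below.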

Solved. It was caused by a bug in my own code when processing the EOS token. Thanks!

Hello, I met the same problem. Can you please tell me how you solved it? Thank you a lot!
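
For readers who hit the same symptom: RepLLaMA represents each query or passage with the hidden state of an end-of-sequence token appended to the input, so a common failure mode is tokenizing without actually appending EOS (or appending it twice). The snippet below is a minimal illustration of that pooling, assuming a Hugging Face LLaMA checkpoint; the model name is a stand-in, and this is not the thread author's actual fix.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical checkpoint for illustration; substitute your own model path.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def encode(text: str) -> torch.Tensor:
    # Tokenize, then append EOS explicitly and exactly once. The LLaMA
    # tokenizer does not add EOS by default, and relying on "</s>" being
    # present in the raw string is fragile.
    ids = tokenizer(text, add_special_tokens=True)["input_ids"]
    ids.append(tokenizer.eos_token_id)
    input_ids = torch.tensor([ids])
    with torch.no_grad():
        hidden = model(input_ids=input_ids).last_hidden_state  # (1, L, d)
    # RepLLaMA-style pooling: the representation is the hidden state of
    # the final (EOS) token.
    return hidden[0, -1]
```

RepLLaMA additionally prefixes inputs with "query: " or "passage: " and normalizes the representations; the point here is only that the EOS token must survive preprocessing, since dropping or duplicating it silently changes which hidden state gets pooled and can produce exactly this kind of pathological loss curve.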