SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard

Home Page: https://arxiv.org/abs/2309.12871

What values are you using for w1, w2, and w3 when defining the loss?

mengyao00 opened this issue

Hello, I am wondering what constant values you used for fine-tuning. The loss is defined as L = w1 * L_cos + w2 * L_ibn + w3 * L_angle, but I could not find the values of w1, w2, and w3 in your paper.

commented

w1 and w3 can be set to 1.0. As for w2, you can search over the values [0.5, 1.0, 35.0].
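
For illustration, here is a minimal sketch of that weighted combination. The three component losses are passed in as plain tensors (the actual cosine, in-batch-negative, and angle objectives from the paper are not reproduced here), and the loop just walks the suggested w2 grid:

```python
import torch

def combined_loss(l_cos: torch.Tensor, l_ibn: torch.Tensor, l_angle: torch.Tensor,
                  w1: float = 1.0, w2: float = 35.0, w3: float = 1.0) -> torch.Tensor:
    # L = w1 * L_cos + w2 * L_ibn + w3 * L_angle
    return w1 * l_cos + w2 * l_ibn + w3 * l_angle

# Placeholder component losses for one batch (random stand-ins, not real objectives).
l_cos, l_ibn, l_angle = torch.rand(3)

# Search w2 over the suggested grid while keeping w1 = w3 = 1.0.
for w2 in (0.5, 1.0, 35.0):
    print(w2, combined_loss(l_cos, l_ibn, l_angle, w2=w2).item())
```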

commented

Here is our training script for the model SeanLee97/angle-llama-7b-nli-v2. We set w2=35.0 for this model:

```bash
CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 --master_port=1234 train_angle.py \
--task NLI-STS --save_dir ckpts/NLI-STS-angle-llama-7b \
--model_name NousResearch/Llama-2-7b-hf \
--w2 35 --learning_rate 1e-4 --maxlen 50 \
--lora_r 32 --lora_alpha 32 --lora_dropout 0.1 \
--save_steps 500 --batch_size 120 --seed 42 --do_eval 0 --load_kbit 4 --gradient_accumulation_steps 4 --epochs 1
```
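
Note that w2 is the only loss weight exposed on this command line; w1 and w3 presumably stay at the 1.0 defaults mentioned above. The 7B model is fine-tuned with 4-bit loading (--load_kbit 4) plus LoRA adapters (rank 32, alpha 32) to keep memory manageable.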

Thank you! How about WhereIsAI/UAE-Large-V1? What do you think is the best w2 value?

commented

> Thank you! How about WhereIsAI/UAE-Large-V1? What do you think is the best w2 value?

w2=35.0 is better for that model as well.
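
For reference, loading the released WhereIsAI/UAE-Large-V1 checkpoint for inference looks roughly like the snippet below. This follows the usage style from the AnglE README (the 'cls' pooling strategy and to_numpy flag are taken from there), so double-check the repo for the current API:

```python
from angle_emb import AnglE

# Load the released UAE-Large-V1 checkpoint with CLS pooling, per the README usage.
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls')
# angle = angle.cuda()  # optionally move to GPU if one is available

vecs = angle.encode(['hello world'], to_numpy=True)
print(vecs.shape)  # expected (1, 1024), since the model is BERT-large based
```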