SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard

Home Page: https://arxiv.org/abs/2309.12871

What values are you using for w1, w2, and w3 when defining the loss?

mengyao00 opened this issue

Hello, I am wondering what constant values you used for fine-tuning. The loss is defined as L = w1 * L_cos + w2 * L_ibn + w3 * L_angle, but I could not find the values of w1, w2, and w3 in your paper.

commented

w1 and w3 can be set to 1.0. As for w2, you can search over the values [0.5, 1.0, 35.0].
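
For illustration, here is a minimal sketch of that weighted combination. The three component losses are passed in as plain tensors (the actual cosine, in-batch-negative, and angle objectives from the paper are not reproduced here), and the loop just walks the suggested w2 grid:

```python
import torch

def combined_loss(l_cos: torch.Tensor, l_ibn: torch.Tensor, l_angle: torch.Tensor,
                  w1: float = 1.0, w2: float = 35.0, w3: float = 1.0) -> torch.Tensor:
    # L = w1 * L_cos + w2 * L_ibn + w3 * L_angle
    return w1 * l_cos + w2 * l_ibn + w3 * l_angle

# Placeholder component losses for one batch (random stand-ins, not real objectives).
l_cos, l_ibn, l_angle = torch.rand(3)

# Search w2 over the suggested grid while keeping w1 = w3 = 1.0.
for w2 in (0.5, 1.0, 35.0):
    print(w2, combined_loss(l_cos, l_ibn, l_angle, w2=w2).item())
```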

commented

Here is our training script for the model SeanLee97/angle-llama-7b-nli-v2. We set w2=35.0 for this model:

```bash
CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 --master_port=1234 train_angle.py \
--task NLI-STS --save_dir ckpts/NLI-STS-angle-llama-7b \
--model_name NousResearch/Llama-2-7b-hf \
--w2 35 --learning_rate 1e-4 --maxlen 50 \
--lora_r 32 --lora_alpha 32 --lora_dropout 0.1 \
--save_steps 500 --batch_size 120 --seed 42 --do_eval 0 --load_kbit 4 --gradient_accumulation_steps 4 --epochs 1
```
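
Note that w2 is the only loss weight exposed on this command line; w1 and w3 presumably stay at the 1.0 defaults mentioned above. The 7B model is fine-tuned with 4-bit loading (--load_kbit 4) plus LoRA adapters (rank 32, alpha 32) to keep memory manageable.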

Thank you! How about WhereIsAI/UAE-Large-V1? What do you think is the best w2 value?

commented

> Thank you! How about WhereIsAI/UAE-Large-V1? What do you think is the best w2 value?

w2=35.0 is better for that model as well.
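
For reference, loading the released WhereIsAI/UAE-Large-V1 checkpoint for inference looks roughly like the snippet below. This follows the usage style from the AnglE README (the 'cls' pooling strategy and to_numpy flag are taken from there), so double-check the repo for the current API:

```python
from angle_emb import AnglE

# Load the released UAE-Large-V1 checkpoint with CLS pooling, per the README usage.
angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls')
# angle = angle.cuda()  # optionally move to GPU if one is available

vecs = angle.encode(['hello world'], to_numpy=True)
print(vecs.shape)  # expected (1, 1024), since the model is BERT-large based
```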