armbues / SiLLM

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

I hit an error when training Phi-3-mini-4k-instruct with DPO, so I fixed it locally.

kawataki-yoshika opened this issue

I made the following modifications and it works for now, but I am not familiar with the structure of the model, so I have not been able to determine whether the changes, especially the one around target_modules, are appropriate (see the sketch after the diff for what the regex change does).

diff --git a/sillm/dpo.py b/sillm/dpo.py
index b97f369..7107a73 100644
--- a/sillm/dpo.py
+++ b/sillm/dpo.py
@@ -94,7 +94,7 @@ if __name__ == "__main__":
 
     # Set conversation template
     if args.template:
-        template = sillm.Template(template=args.template)
+        template = sillm.Template(model.tokenizer, template_name=args.template)
     else:
         template = None

diff --git a/sillm/training/lora.py b/sillm/training/lora.py
index 72f1e4d..7fd4009 100644
--- a/sillm/training/lora.py
+++ b/sillm/training/lora.py
@@ -232,7 +232,7 @@ class TrainableLoRA(LLM):
                 self._lora_modules = [
                     (key, LoRALinear.from_linear(module, rank=rank, alpha=alpha, dropout=dropout, scale=scale))
                     for key, module in self.model.named_modules()
-                    if re.search(r"\.attention\.(wq|wv)$", key)
+                    if re.search(r"\.attention\.(wq|wv|wqkv)$", key)
                 ]
             if len(self._lora_modules) == 0:
                 logger.error(f"No target modules found for LoRA: {target_modules}")

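For context on the lora.py change: Phi-3-mini's attention presumably exposes a single fused wqkv projection instead of separate wq/wv linears, so the original pattern matches no modules and training aborts with the "No target modules found for LoRA" error. Below is a minimal sketch, with made-up module keys mixing both naming styles, of how widening the regex changes which modules are selected.

import re

# Made-up module keys: Llama-style attention has separate wq/wv projections,
# while Phi-3-style attention uses a single fused wqkv projection.
module_keys = [
    "layers.0.attention.wq",
    "layers.0.attention.wv",
    "layers.0.attention.wqkv",
]

old_pattern = r"\.attention\.(wq|wv)$"
new_pattern = r"\.attention\.(wq|wv|wqkv)$"

print([k for k in module_keys if re.search(old_pattern, k)])
# ['layers.0.attention.wq', 'layers.0.attention.wv'] - the fused wqkv is skipped
print([k for k in module_keys if re.search(new_pattern, k)])
# all three keys match, so the fused projection also receives a LoRA adapter
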
Training parameters:

train:              data/dpo/
output_dir:         adapters/dpo/
save_checkpoints:   True
save_merge:         True
template:           phi3
max_length:         1024
learning_rate:      1.0e-6
grad_checkpoint:    True
rank:               16
layers:             -1
target_modules:     query_value
q4:                 True
epochs:             1
batch_size:         2
loss_type:          dpop
loss_beta:          0.01
report_steps:       10
eval_steps:         50
validation_samples: 20
seed:               42
plot:               adapters/dpo/plot.png
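
For reference, loss_beta is the β of the underlying DPO objective. Below is a minimal sketch of the standard DPO loss with made-up sequence log-probabilities; as I understand it, the dpop loss type (DPO-Positive) adds a penalty when the chosen completion's likelihood drops below the reference model's, which is not shown here.

import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta):
    # Implicit rewards: how much more likely the policy makes each completion
    # compared to the frozen reference model.
    chosen_reward = logp_chosen - ref_logp_chosen
    rejected_reward = logp_rejected - ref_logp_rejected
    # Standard DPO objective: -log sigmoid(beta * (chosen_reward - rejected_reward))
    margin = beta * (chosen_reward - rejected_reward)
    return math.log(1.0 + math.exp(-margin))

# Made-up log-probabilities, beta as in the config above
print(dpo_loss(-12.3, -15.8, -12.9, -15.1, beta=0.01))  # ~0.687

With a small beta such as 0.01, the loss stays close to log 2 unless the reward margin is large, so the policy is only weakly pushed away from the reference model.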

Good catch - thank you for sharing the fixed code! I'll add these changes to a commit and publish them in the next version, 0.1.5.

Out of curiosity: did the DPO fine-tuning on Phi-3-mini work, i.e. were you happy with the output?

The results of fine-tuning Phi-3-mini were not good.
I will try training several more times with different parameters, datasets, etc.