I encountered an error when training Phi-3-mini-4k-instruct with DPO, so I fixed it locally.
kawataki-yoshika opened this issue
I made the following modifications and it works for now, but I am not familiar with the structure of the model, so I have not been able to determine whether the changes, especially the one around target_modules, are appropriate.
```diff
diff --git a/sillm/dpo.py b/sillm/dpo.py
index b97f369..7107a73 100644
--- a/sillm/dpo.py
+++ b/sillm/dpo.py
@@ -94,7 +94,7 @@ if __name__ == "__main__":
     # Set conversation template
     if args.template:
-        template = sillm.Template(template=args.template)
+        template = sillm.Template(model.tokenizer, template_name=args.template)
     else:
         template = None
```
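For anyone hitting the same error, here is a minimal sketch of the fixed call path. The sillm.load entry point and the model path are placeholders for however you load Phi-3; model.tokenizer is the attribute the patched line relies on.

```python
import sillm

# Load the model (path is a placeholder; assumes the usual sillm.load entry point).
model = sillm.load("/path/to/Phi-3-mini-4k-instruct")

# The patched call: Template now receives the tokenizer plus the template name,
# instead of the old template= keyword argument.
template = sillm.Template(model.tokenizer, template_name="phi3")
```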
```diff
diff --git a/sillm/training/lora.py b/sillm/training/lora.py
index 72f1e4d..7fd4009 100644
--- a/sillm/training/lora.py
+++ b/sillm/training/lora.py
@@ -232,7 +232,7 @@ class TrainableLoRA(LLM):
             self._lora_modules = [
                 (key, LoRALinear.from_linear(module, rank=rank, alpha=alpha, dropout=dropout, scale=scale))
                 for key, module in self.model.named_modules()
-                if re.search(r"\.attention\.(wq|wv)$", key)
+                if re.search(r"\.attention\.(wq|wv|wqkv)$", key)
             ]
             if len(self._lora_modules) == 0:
                 logger.error(f"No target modules found for LoRA: {target_modules}")
```
Training parameters:

```yaml
train: data/dpo/
output_dir: adapters/dpo/
save_checkpoints: True
save_merge: True
template: phi3
max_length: 1024
learning_rate: 1.0e-6
grad_checkpoint: True
rank: 16
layers: -1
target_modules: query_value
q4: True
epochs: 1
batch_size: 2
loss_type: dpop
loss_beta: 0.01
report_steps: 10
eval_steps: 50
validation_samples: 20
seed: 42
plot: adapters/dpo/plot.png
```
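For reference, loss_type: dpop appears to select the DPO-Positive variant (from the Smaug paper by Pal et al.), which adds a penalty that keeps the policy from drifting below the reference model on the preferred completion. Assuming SiLLM follows the paper, the per-pair loss is roughly

$$\mathcal{L}_{\mathrm{DPOP}} = -\log\sigma\left(\beta\left[\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)} - \lambda\,\max\left(0,\ \log\frac{\pi_{\mathrm{ref}}(y_w\mid x)}{\pi_\theta(y_w\mid x)}\right)\right]\right)$$

where loss_beta above corresponds to $\beta$.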
Good catch - thank you for sharing the fixed code! I'll add these changes in a commit and publish them in the next version, 0.1.5.
Out of curiosity: did the DPO fine-tuning on Phi-3-mini work, i.e. were you happy with the output?
The results of fine-tuning Phi-3-mini were not good.
I will try training several more times with different parameters, datasets, and so on.