Any plan for supporting DPO?
lorabit110 opened this issue · comments
Yanan Xie commented
🚀 Feature Request
Support DPO (Direct Preference Optimization) loss and data loader.
Motivation
Many recent open LLMs have achieved promising results using DPO instead of RL-style tuning such as PPO for alignment, and DPO appears to require fewer changes to llm-foundry than full RLHF would.
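For context, the requested loss is compact: DPO optimizes the policy directly on preference pairs, with no reward model or rollouts. A minimal, dependency-free sketch of the per-example loss from the DPO paper (Rafailov et al., 2023) is below; the function name and signature are illustrative, not part of llm-foundry. Inputs are the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each log-ratio compares the policy to the frozen reference model;
    beta controls how far the policy may drift from the reference.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably with log1p
    return math.log1p(math.exp(-logits))
```

When the policy equals the reference the loss is log 2 ≈ 0.693; it falls toward 0 as the policy assigns relatively more probability to the chosen response. A data loader for this would need to yield (prompt, chosen, rejected) triples rather than single sequences.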
David Preti commented
same question here