Direct Preference Optimization

Question

Reichenbachian opened this issue a year ago

Hey all! Appreciate the work.

Is there any word on whether DPO (Direct Preference Optimization) will be integrated into the trlx library soon?

Adrien B · Answer 1 · Wed Jun 14 2023 23:39:14 GMT+0800 (China Standard Time)

Zhuofeng Wu · Answer 2 · Wed Jul 26 2023 07:34:17 GMT+0800 (China Standard Time)

I wonder if there are any updates on implementing DPO in trlx. Many thanks!

Max · Answer 3 · Wed Jul 26 2023 11:26:25 GMT+0800 (China Standard Time)

There haven't been any updates on that. AFAIK nobody is currently working on it, so feel free to pick it up if you want!

sandeepchittilla · Answer 4 · Wed Jul 26 2023 17:58:56 GMT+0800 (China Standard Time)

Hi, is this something that is still open to work on? I would like to pick it up if that is okay :)

@CSerxy I've just forked and begun work on this feature, let me know if this conflicts with you