Cornell-RL / tril


DPO for IMDB sentiment generation

junkangwu opened this issue

Thank you for your contribution. May I ask about the implementation of DPO for IMDB sentiment generation? Could you please share the steps? Thank you very much!

Hi! We are hoping to release this in the next few weeks, but here are the rough steps to implement it yourself.

  1. In tril/algorithms, implement dpo.py with a class that inherits from BaseSupervised
  2. Override the following methods:
    • _setup_dataloaders: change the prepared dataloaders to serve preference data
    • _prepare_fsdp (for IMDB; if you also want DeepSpeed support, also override _prepare_deepspeed)
    • compute_loss: compute the DPO loss here (a loss sketch follows this list)
    • train_step: take a look at bc.py; much of it can be reused with slight modifications to also get the pi_SFT log probs for the DPO loss
  3. Create configs: add dpo.yaml in cfgs/alg (a config sketch also follows)
    • The main difference from the BC config is making sure that create_reference under the policy field is set to True, so you can get the pi_SFT log probs as well.
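
For reference, here is a minimal sketch of the DPO loss itself (Rafailov et al., 2023) that compute_loss could wrap. The function name, signature, and default beta are illustrative, not TRIL's actual API:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss.

    Each input is the summed log prob of a full completion under the
    policy or the frozen pi_SFT reference, shape (batch,).
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # -log sigmoid(beta * (policy log-ratio - reference log-ratio))
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```

The reference log probs come from the pi_SFT model that create_reference gives you; only the policy's parameters receive gradients.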
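And a rough idea of what cfgs/alg/dpo.yaml might contain. Apart from create_reference, which the steps above mention, every field name below is an assumption for illustration:

```yaml
# Hypothetical cfgs/alg/dpo.yaml; adapt from the existing BC config.
alg:
  id: dpo
  policy:
    create_reference: True  # required so pi_SFT log probs are available
  beta: 0.1                 # DPO temperature (assumed hyperparameter name)
```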

Note: the current IMDB dataset isn't a preference dataset, so you may need to construct one.
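
One way to construct preference pairs, assuming you sample two continuations per prompt and rank them with an off-the-shelf IMDB sentiment classifier; the model name, label strings, and helper functions below are illustrative and not part of TRIL:

```python
# Sketch: build {prompt, chosen, rejected} triples by scoring two sampled
# continuations with a sentiment classifier and preferring the more positive one.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

def positivity(text: str) -> float:
    # Probability that the text is positive (label strings assumed).
    out = sentiment(text, truncation=True)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

def make_preference_pair(prompt: str, completion_a: str, completion_b: str) -> dict:
    # Label the higher-scoring completion as "chosen".
    if positivity(prompt + completion_a) >= positivity(prompt + completion_b):
        chosen, rejected = completion_a, completion_b
    else:
        chosen, rejected = completion_b, completion_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```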

Thanks for your interest, and if you'd like to contribute and make a pull request, I'd be more than happy to help out.

Thank you for your valuable suggestion. I will submit a pull request as soon as I successfully reproduce it.