Cornell-RL / tril


DPO for IMDB sentiment generation

junkangwu opened this issue

Thank you for your contribution. May I ask about the implementation of DPO for IMDB sentiment generation? Could you please share the steps? Thank you very much!

Hi! We are hoping to release this in the next few weeks, but here are the rough steps to implement it yourself.

  1. In tril/algorithms, implement dpo.py with a class that inherits from BaseSupervised
  2. Override the following methods:
    • _setup_dataloaders: change the prepared dataloaders to serve preference data
    • _prepare_fsdp (for IMDB; if you also want DeepSpeed support, also override _prepare_deepspeed)
    • compute_loss: compute the DPO loss here (a loss sketch follows this list)
    • train_step: take a look at bc.py; much of it can be reused with slight modifications to also get the pi_SFT log probs for the DPO loss
  3. Create configs: add dpo.yaml in cfgs/alg (a config sketch also follows)
    • The main difference from the BC config is making sure that create_reference under the policy field is set to True, so you can get the pi_SFT log probs as well.
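
For reference, here is a minimal sketch of the DPO loss itself (Rafailov et al., 2023) that compute_loss could wrap. The function name, signature, and default beta are illustrative, not TRIL's actual API:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss.

    Each input is the summed log prob of a full completion under the
    policy or the frozen pi_SFT reference, shape (batch,).
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # -log sigmoid(beta * (policy log-ratio - reference log-ratio))
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
```

The reference log probs come from the pi_SFT model that create_reference gives you; only the policy's parameters receive gradients.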
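And a rough idea of what cfgs/alg/dpo.yaml might contain. Apart from create_reference, which the steps above mention, every field name below is an assumption for illustration:

```yaml
# Hypothetical cfgs/alg/dpo.yaml; adapt from the existing BC config.
alg:
  id: dpo
  policy:
    create_reference: True  # required so pi_SFT log probs are available
  beta: 0.1                 # DPO temperature (assumed hyperparameter name)
```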

Note: the current IMDB dataset isn't a preference dataset, so you may need to construct one.
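
One way to construct preference pairs, assuming you sample two continuations per prompt and rank them with an off-the-shelf IMDB sentiment classifier; the model name, label strings, and helper functions below are illustrative and not part of TRIL:

```python
# Sketch: build {prompt, chosen, rejected} triples by scoring two sampled
# continuations with a sentiment classifier and preferring the more positive one.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

def positivity(text: str) -> float:
    # Probability that the text is positive (label strings assumed).
    out = sentiment(text, truncation=True)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

def make_preference_pair(prompt: str, completion_a: str, completion_b: str) -> dict:
    # Label the higher-scoring completion as "chosen".
    if positivity(prompt + completion_a) >= positivity(prompt + completion_b):
        chosen, rejected = completion_a, completion_b
    else:
        chosen, rejected = completion_b, completion_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```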

Thanks for your interest, and if you'd like to contribute and make a pull request, I'd be more than happy to help out.

Thank you for your valuable suggestion. I will submit a pull request as soon as I successfully reproduce it.