DPO about IMDB sentiment generation
junkangwu opened this issue · comments
junkang Wu commented
Thank you for your contribution. May I ask you about the implementation of DPO in IMDB sentiment generation? Could you please share some steps? Thank you very much!
jdchang1 commented
Hi! We are hoping to release this in the next few weeks, but here are the rough steps to implement it yourself.
- In `tril/algorithms`, implement `dpo.py` with a class that inherits `BaseSupervised`
- Override the following methods:
  - `_setup_dataloaders`: change the dataloaders prepared so they serve preference data
  - `_prepare_fsdp` (for IMDB; if you also want DeepSpeed support, do `_prepare_deepspeed`)
  - `compute_loss`: DPO loss here
  - `train_step`: take a look at `bc.py`; much of it can be reused with just slight modifications to also get `pi_sft` log probs for the DPO loss
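The `compute_loss` step above could be sketched roughly as follows. This is a minimal, framework-free illustration of the DPO objective on a single preference pair with scalar log probabilities; a real implementation in `tril` would operate on batched torch tensors, and the function name and signature here are assumptions, not the library's API.

```python
import math

def dpo_loss(pi_chosen_logp: float, pi_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair (hypothetical sketch).

    loss = -log sigmoid(beta * ((log pi(y_w) - log pi_sft(y_w))
                                - (log pi(y_l) - log pi_sft(y_l))))
    where y_w is the chosen and y_l the rejected completion.
    """
    # Implicit reward margin between chosen and rejected completions
    margin = beta * ((pi_chosen_logp - ref_chosen_logp)
                     - (pi_rejected_logp - ref_rejected_logp))
    # -log sigmoid(margin); a torch version would use F.logsigmoid
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

This shows why the reference (`pi_sft`) log probs are needed: the loss is driven by the *difference* between policy and reference log-ratios, not by the policy log probs alone.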
- Create configs: add `dpo.yaml` in `cfgs/alg`
  - The main difference from the BC config would be making sure `create_reference` under the `policy` field is set to `True`. This is so you can get `pi_SFT` log probs as well.
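As a rough illustration of that config change, a `dpo.yaml` might look like the sketch below. Apart from `policy.create_reference`, which the steps above call out, every field name and value here is an assumption modeled on a generic BC-style config, not the actual schema in `cfgs/alg`.

```yaml
# Hypothetical cfgs/alg/dpo.yaml sketch; only create_reference is
# taken from the steps above, the rest is assumed structure.
alg:
  id: dpo
  beta: 0.1            # assumed DPO temperature hyperparameter
policy:
  create_reference: True   # required so pi_SFT log probs are available
```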
Note: The current IMDB dataset isn't a preference dataset so you may need to construct one.
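One common way to construct such a preference dataset is to sample two completions per prompt and rank them with a sentiment scorer. The helper below is a hypothetical sketch of that idea: `sample_fn` and `score_fn` are placeholder names for any generator and any sentiment/reward model you choose, not functions provided by `tril`.

```python
def build_preference_pairs(prompts, sample_fn, score_fn):
    """Turn prompts into (prompt, chosen, rejected) triples.

    sample_fn(prompt) -> a pair of candidate completions (assumed)
    score_fn(text)    -> scalar preference score, higher = better (assumed)
    """
    pairs = []
    for prompt in prompts:
        a, b = sample_fn(prompt)
        # Label the higher-scoring completion as "chosen"
        chosen, rejected = (a, b) if score_fn(a) >= score_fn(b) else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

The resulting list of dicts is the shape of data the modified `_setup_dataloaders` would need to serve.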
Thanks for your interest, and if you'd like to contribute and make a pull request, I'd be more than happy to help out.
junkang Wu commented
Thank you for your valuable suggestion. I will submit a pull request as soon as I successfully reproduce it.