jamesliu / nanoDPO

A nimble implementation of the Direct Preference Optimization (DPO) algorithm with Causal Transformer and LSTM models, inspired by the DPO paper on fine-tuning unsupervised language models from preference data.
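For orientation, the core DPO objective described in the paper can be sketched as follows. This is a minimal PyTorch sketch with illustrative function and argument names, not this repository's actual API:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss: push the policy's log-ratio (vs. a frozen reference model)
    on chosen completions above its log-ratio on rejected completions."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == softplus(-x); average over the batch
    return F.softplus(-logits).mean()
```

The same loss applies whether the policy and reference models are Causal Transformers or LSTMs; only the way the per-sequence log-probabilities are computed differs.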
