huggingface/trl Issues
Error with SFT of LLaVA-Next
Updated 3
Possible bug in SFTTrainer
Updated 2
Support for Iterative DPO
Updated 1
Not find sft.py
Updated
The metrics in wandb is abnormal
Closed 5
Can bert be used for dpo training?
Updated 1
The DPO 'grad_norm': 0.0,
Updated