uclaml/SPIN Issues
Generate Result - Updated
Confused about iterations - Updated, 4
SPIN == DPO in self-iteration? - Updated, 6
Question about using peft (LoRA) - Closed, 1
the four reward metrics - Closed, 2
GPU Memory question - Updated, 1
Unable to reproduce performance - Updated, 10
use_peft Not working? - Closed, 1
the logps decrease - Updated, 2
The data num is wrong - Closed
vllm version - Closed, 2
vllm generation issue. - Closed, 2