dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

dvlab-research/Step-DPO Issues

Ablation between DPO and Step-DPO
Updated 8 days ago
Does step-dpo work?
Updated 15 days ago
question about StepDPOTrainer
Closed a month ago (1 comment)
eval_math:143 prompt_answer = remove_text(prompt_answer)
Updated a month ago
question about Data Construction
Updated a month ago (1 comment)
I followed the steps in the README file to train the model, but I got an error. Here is the error message.
Updated a month ago
Reproduction issues
Closed 3 months ago (8 comments)
Evaluation scripts for AIME and Odyssey-MATH
Updated a month ago
share sft-dataset
Updated 2 months ago (4 comments)
Problem with the results after inference with deepseek-math-7b-rl-stepdpo
Updated 2 months ago (1 comment)
Great work, what about the computation resources needed for each experiment?
Closed 3 months ago (4 comments)
validation set
Updated 3 months ago
appendix missing
Updated 3 months ago (1 comment)
During DPO training, will SFT loss be calculated?
Updated 3 months ago
question about Data Construction Pipeline
Updated 3 months ago
questions about some parameter in config_full.yaml
Closed 3 months ago (1 comment)
Data Generation Pipeline
Closed 3 months ago (2 comments)
Question about the DPO vs. Step-DPO.
Closed 3 months ago (2 comments)
Request for Citation
Closed 3 months ago (1 comment)
About details of Step localization and Rectification
Closed 3 months ago (1 comment)