dragen1860 / Logic-RL

Reproduce R1 Zero on Logic Puzzle

Repository from GitHub: https://github.com/dragen1860/Logic-RL

Logic-RL

🎉 Successfully reproduced DeepSeek R1 Zero on a 2K Logic Puzzle Dataset.

📢 Our detailed technical report is coming soon! Stay tuned!

Project explanation: here.

Wandb project: here.


Enhanced Features (After Rule-Based RL)

| 🚩 Uncertainty Marking | 📝 Progressive Summarization | ✅ Self Verification | 🌐 Multilingual Switching |
| --- | --- | --- | --- |
| Flag ambiguous steps for verification | Maintain intermediate conclusions | First verify, then answer | Chinese reasoning traces with English answers |

📸 Results Preview

[Figure: Test Score plot] [Figure: Average Output Length plot]
[Figure: Model Output example]

Benchmark

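| Model | 2ppl | 3ppl | 4ppl | 5ppl | 6ppl | 7ppl | 8ppl |
| --- | --- | --- | --- | --- | --- | --- | --- |
| o1-2024-12-17 | 0.83 | 0.51 | 0.38 | 0.38 | 0.35 | 0.30 | 0.20 |
| GPT-4o | 0.68 | 0.57 | 0.49 | 0.32 | 0.23 | 0.21 | 0.11 |
| Deepseek-Math-7b | 0.35 | 0.21 | 0.08 | 0.06 | 0.02 | 0.00 | 0.00 |
| Qwen2.5-7B-Instruct-1M | 0.49 | 0.40 | 0.25 | 0.11 | 0.02 | 0.06 | 0.01 |
| Qwen2.5-7B-Logic-RL (ours) | 0.68 | 0.59 | 0.44 | 0.34 | 0.22 | 0.16 | 0.15 |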

Our model was trained on only 2K examples for 400 training steps. More model benchmarks will be added later this week.
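To compare models at a glance, the per-size scores above can be collapsed into a single average. The sketch below copies the numbers from the benchmark table and takes an unweighted mean over the seven puzzle sizes. The unweighted mean is an assumption made here for illustration (the README does not say how many test puzzles each size has), and `mean_accuracy` is a hypothetical helper, not part of the repository.

```python
# Scores per puzzle size (2ppl .. 8ppl), copied from the benchmark table.
SCORES = {
    "o1-2024-12-17": [0.83, 0.51, 0.38, 0.38, 0.35, 0.30, 0.20],
    "GPT-4o": [0.68, 0.57, 0.49, 0.32, 0.23, 0.21, 0.11],
    "Deepseek-Math-7b": [0.35, 0.21, 0.08, 0.06, 0.02, 0.00, 0.00],
    "Qwen2.5-7B-Instruct-1M": [0.49, 0.40, 0.25, 0.11, 0.02, 0.06, 0.01],
    "Qwen2.5-7B-Logic-RL (ours)": [0.68, 0.59, 0.44, 0.34, 0.22, 0.16, 0.15],
}


def mean_accuracy(scores):
    """Return the unweighted mean score per model (assumption: equal weight per size)."""
    return {model: sum(values) / len(values) for model, values in scores.items()}


if __name__ == "__main__":
    averages = mean_accuracy(SCORES)
    # Print models from highest to lowest average score.
    for model, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{model:<28} {avg:.3f}")
```

Under this averaging, the RL-trained 7B model lands close to GPT-4o and well above its Qwen2.5-7B-Instruct-1M starting point, which matches the row-by-row impression from the table.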


🛠️ Installation

conda create -n logic python=3.9
conda activate logic  # install the packages below into the new environment
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.6.3 ray
pip3 install flash-attn --no-build-isolation
pip install -e .  # For verl integration
pip install wandb IPython matplotlib
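After installation, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library. The package list is taken from the pip commands above, and `installed_versions` is a hypothetical helper name, not something the repository provides.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip commands in the installation steps.
PACKAGES = ["torch", "vllm", "ray", "flash-attn", "wandb", "ipython", "matplotlib"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<12} {ver or 'NOT INSTALLED'}")
```

Run it inside the `logic` environment; the pinned packages should report `torch` 2.4.0 (possibly with a CUDA suffix) and `vllm` 0.6.3.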

Data Preparation

You can use the data in /data directly.

For your own data generation, here's a demo:

Base Model

python ./examples/data_preprocess/kk.py \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}

Instruct Model

python ./examples/data_preprocess/kk.py \
    --template_type=qwen-instruct \
    --local_dir {processed_data_path} \
    --data_path {raw_data_path}

Training Execution

conda activate logic
bash main_grpo.sh  # 4×A100 80G
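The script name points to GRPO as the training algorithm, though this README does not describe its configuration. As a rough illustration of the core idea only, the sketch below computes group-relative advantages: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its group. The function name, the `eps` constant, and the use of the population standard deviation are assumptions for this sketch and are not taken from `main_grpo.sh` or verl.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled responses within the group.

    Assumption: population standard deviation; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses to one puzzle, scored by a rule-based reward.
    print(group_relative_advantages([3.0, -1.0, -1.0, 3.0]))
```

Responses that score above their group's average get a positive advantage and the rest a negative one, so no separate value model is needed to form the baseline.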

⚙️ Implementation Details

| Component | Location |
| --- | --- |
| Reward Modeling | verl/utils/reward_score/kk.py |
| Data Preprocessing | examples/data_preprocess/kk.py |
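For readers unfamiliar with rule-based rewards, the sketch below shows the general shape such a scoring function can take: one check on the response format and one on the final answer. It does not reproduce `verl/utils/reward_score/kk.py`. The `<think>`/`<answer>` tag layout, the reward magnitudes, the string-match comparison, and the name `rule_based_reward` are all assumptions chosen for illustration.

```python
import re

# Hypothetical response layout: reasoning in <think>, final answer in <answer>.
RESPONSE_PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def _normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.split()).lower()


def rule_based_reward(response, expected, format_reward=1.0, answer_reward=2.0):
    """Score a response with fixed rules (illustrative values, not the repo's)."""
    match = RESPONSE_PATTERN.match(response)
    if match is None:
        # Malformed output: penalize and skip the answer check.
        return -format_reward
    correct = _normalize(match.group(2)) == _normalize(expected)
    return format_reward + (answer_reward if correct else -answer_reward)


if __name__ == "__main__":
    good = "<think>Check each statement in turn.</think><answer>A: yes, B: no</answer>"
    print(rule_based_reward(good, "a: yes, b: no"))
```

Because the score comes from fixed rules rather than a learned reward model, it is cheap to compute and leaves little room for a policy to exploit a flawed learned scorer.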

Citation

@misc{logic-rl,
author       = {Tian Xie and Qingnan Ren and Yuqian Hong and Zitian Gao},
title        = {Logic-RL},
howpublished = {https://github.com/Unakar/Logic-RL},
note         = {Accessed: 2025-02-03},
year         = {2025}
}

Acknowledgements



License: Apache License 2.0


Languages: Python 97.3%, Shell 2.7%