Danield21's repositories
Dual-Policy-Preference-Optimization
The codebase for the preprint "Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model"
all-rl-algorithms
Implementation of all RL algorithms in a simpler way
AR-Lopti
[arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
AReaL
Distributed RL System for LLM Reasoning
deep-reasoning
Official Implementation of Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
DFT
[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
langmanus
A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search, crawling, and Python code execution, while giving back to the community that made this possible.
LLMLandscape
The loss landscape of Large Language Models resembles a basin!
MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
nano-aha-moment
Single GPU, From Scratch (No RL Library), Efficient, Full Parameter Tuning Implementation of DeepSeek R1-Zero style training.
openrlhf-pretrain
Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"
Pre-DPO
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
RAGEN
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
ReSearch
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
RLVR-Decomposed
Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
ROLL
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
simpleRL-reason
Simple RL training for reasoning
Trinity-RFT
Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLMs).
UFT
UFT: Unifying Supervised and Reinforcement Fine-Tuning