Danield21's repositories

Dual-Policy-Preference-Optimization

The codebase for the preprint "Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model".

License: MIT · Stargazers: 1 · Issues: 0

all-rl-algorithms

Simplified implementations of all the major RL algorithms.

License: MIT · Stargazers: 0 · Issues: 0

AR-Lopti

[arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs

License: Apache-2.0 · Stargazers: 0 · Issues: 0

AReaL

Distributed RL System for LLM Reasoning

License: Apache-2.0 · Stargazers: 0 · Issues: 0

deep-reasoning

Official implementation of "Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning".

License: MIT · Stargazers: 0 · Issues: 0

DFT

[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.

License: MIT · Stargazers: 0 · Issues: 0
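
If the reward rectification in this preprint amounts, as the title suggests, to reweighting each token's SFT cross-entropy by the detached probability the model assigns to it, a minimal sketch looks like the following; the function name and tensor shapes are illustrative, not taken from the repo.

```python
import torch
import torch.nn.functional as F

def dft_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Reward-rectified SFT: per-token cross-entropy reweighted by the
    detached probability the model assigns to the target token.

    logits: (batch, seq, vocab); targets: (batch, seq) token ids.
    """
    logp = F.log_softmax(logits, dim=-1)
    tok_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    weight = tok_logp.exp().detach()  # stop-gradient through the weight
    return -(weight * tok_logp).mean()
```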

langmanus

A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search, crawling, and Python code execution, while giving back to the community that made this possible.

License: MIT · Stargazers: 0 · Issues: 0

LLMLandscape

The loss landscape of Large Language Models resembles a basin!

Language: Python · Stargazers: 0 · Issues: 0

MAYE

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Stargazers: 0 · Issues: 0

nano-aha-moment

A single-GPU, from-scratch (no RL library), efficient, full-parameter implementation of DeepSeek R1-Zero-style training.

Stargazers: 0 · Issues: 0
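
R1-Zero-style training is commonly built on GRPO, whose advantages come from a group of completions sampled per prompt rather than a value network; a minimal sketch of that group normalization (names illustrative, not from this repo):

```python
import torch

def group_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style advantages: normalize each completion's reward against the
    group sampled for the same prompt, so no value network is needed.

    rewards: (num_prompts, group_size) scalar reward per completion.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)
```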

openrlhf-pretrain

Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"

License: Apache-2.0 · Stargazers: 0 · Issues: 0

Pre-DPO

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

License: Apache-2.0 · Stargazers: 0 · Issues: 0
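
Under the reading that Pre-DPO keeps the standard DPO objective and only swaps the frozen initial policy for a guiding reference model, the loss is the usual one sketched below; the names and the sequence-level log-prob convention are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Standard DPO loss; in Pre-DPO the ref_* log-probs would come from the
    guiding reference model rather than the frozen initial policy.

    All inputs: (batch,) log-probabilities summed over response tokens.
    """
    chosen = policy_chosen_logp - ref_chosen_logp
    rejected = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen - rejected)).mean()
```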

RAGEN

RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

ReSearch

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

RLVR-Decomposed

Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"

License: Apache-2.0 · Stargazers: 0 · Issues: 0
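
As an illustration of negative reinforcement under a binary verifiable reward, the sketch below keeps only the incorrect-sample term of a REINFORCE-style loss; this is a reading of the idea, not the repo's implementation.

```python
import torch

def nsr_loss(seq_logp: torch.Tensor, correct: torch.Tensor) -> torch.Tensor:
    """Negative-sample reinforcement: lower the probability of completions
    the verifier rejected, and ignore the ones it accepted.

    seq_logp: (batch,) summed log-probs of sampled completions.
    correct:  (batch,) bool, True where the verifier accepted the answer.
    """
    wrong = ~correct
    if not wrong.any():
        return seq_logp.new_zeros(())
    # Minimizing the mean log-prob of wrong samples pushes mass away from them.
    return seq_logp[wrong].mean()
```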

ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

License: Apache-2.0 · Stargazers: 0 · Issues: 0

simpleRL-reason

Simple RL training for reasoning

License: MIT · Stargazers: 0 · Issues: 0

Trinity-RFT

Trinity-RFT is a general-purpose, flexible, and scalable framework for reinforcement fine-tuning (RFT) of large language models (LLMs).

License: Apache-2.0 · Stargazers: 0 · Issues: 0

UFT

UFT: Unifying Supervised and Reinforcement Fine-Tuning

License: Apache-2.0 · Stargazers: 0 · Issues: 0
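
A generic way to unify the two regimes is a weighted sum of a supervised term and a policy-gradient term; the sketch below shows that combination as an illustration of the unification theme, not UFT's actual algorithm (the interpolation scheme is an assumption).

```python
import torch
import torch.nn.functional as F

def unified_loss(logits: torch.Tensor, targets: torch.Tensor,
                 seq_logp: torch.Tensor, advantages: torch.Tensor,
                 lam: float = 0.5) -> torch.Tensor:
    """Weighted sum of a supervised term (on demonstration tokens) and a
    REINFORCE-style term (on sampled completions).

    lam = 1.0 recovers pure SFT; lam = 0.0 recovers pure RL.
    """
    sft = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    rl = -(advantages.detach() * seq_logp).mean()
    return lam * sft + (1.0 - lam) * rl
```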