Danield21's repositories

Dual-Policy-Preference-Optimization

The codebase for the preprint "Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model".

License: MIT · Stargazers: 1 · Issues: 0

all-rl-algorithms

Simplified implementations of all the major RL algorithms.

License: MIT · Stargazers: 0 · Issues: 0

AR-Lopti

[arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs

License: Apache-2.0 · Stargazers: 0 · Issues: 0

AReaL

Distributed RL System for LLM Reasoning

License: Apache-2.0 · Stargazers: 0 · Issues: 0

deep-reasoning

Official implementation of "Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning".

License: MIT · Stargazers: 0 · Issues: 0

DFT

[Preprint] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification.

License: MIT · Stargazers: 0 · Issues: 0
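
If the reward rectification in this preprint amounts, as the title suggests, to reweighting each token's SFT cross-entropy by the detached probability the model assigns to it, a minimal sketch looks like the following; the function name and tensor shapes are illustrative, not taken from the repo.

```python
import torch
import torch.nn.functional as F

def dft_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Reward-rectified SFT: per-token cross-entropy reweighted by the
    detached probability the model assigns to the target token.

    logits: (batch, seq, vocab); targets: (batch, seq) token ids.
    """
    logp = F.log_softmax(logits, dim=-1)
    tok_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    weight = tok_logp.exp().detach()  # stop-gradient through the weight
    return -(weight * tok_logp).mean()
```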

langmanus

A community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search, crawling, and Python code execution, while giving back to the community that made this possible.

License: MIT · Stargazers: 0 · Issues: 0

LLMLandscape

The loss landscape of Large Language Models resembles a basin!

Language: Python · Stargazers: 0 · Issues: 0

MAYE

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme

Stargazers: 0 · Issues: 0

nano-aha-moment

A single-GPU, from-scratch (no RL library), efficient, full-parameter implementation of DeepSeek R1-Zero-style training.

Stargazers: 0 · Issues: 0
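
R1-Zero-style training is commonly built on GRPO, whose advantages come from a group of completions sampled per prompt rather than a value network; a minimal sketch of that group normalization (names illustrative, not from this repo):

```python
import torch

def group_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style advantages: normalize each completion's reward against the
    group sampled for the same prompt, so no value network is needed.

    rewards: (num_prompts, group_size) scalar reward per completion.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)
```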

openrlhf-pretrain

Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"

License: Apache-2.0 · Stargazers: 0 · Issues: 0

Pre-DPO

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

License: Apache-2.0 · Stargazers: 0 · Issues: 0
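
Under the reading that Pre-DPO keeps the standard DPO objective and only swaps the frozen initial policy for a guiding reference model, the loss is the usual one sketched below; the names and the sequence-level log-prob convention are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Standard DPO loss; in Pre-DPO the ref_* log-probs would come from the
    guiding reference model rather than the frozen initial policy.

    All inputs: (batch,) log-probabilities summed over response tokens.
    """
    chosen = policy_chosen_logp - ref_chosen_logp
    rejected = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen - rejected)).mean()
```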

RAGEN

RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

ReSearch

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

RLVR-Decomposed

Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"

License: Apache-2.0 · Stargazers: 0 · Issues: 0
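
As an illustration of negative reinforcement under a binary verifiable reward, the sketch below keeps only the incorrect-sample term of a REINFORCE-style loss; this is a reading of the idea, not the repo's implementation.

```python
import torch

def nsr_loss(seq_logp: torch.Tensor, correct: torch.Tensor) -> torch.Tensor:
    """Negative-sample reinforcement: lower the probability of completions
    the verifier rejected, and ignore the ones it accepted.

    seq_logp: (batch,) summed log-probs of sampled completions.
    correct:  (batch,) bool, True where the verifier accepted the answer.
    """
    wrong = ~correct
    if not wrong.any():
        return seq_logp.new_zeros(())
    # Minimizing the mean log-prob of wrong samples pushes mass away from them.
    return seq_logp[wrong].mean()
```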

ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

License: Apache-2.0 · Stargazers: 0 · Issues: 0

simpleRL-reason

Simple RL training for reasoning

License: MIT · Stargazers: 0 · Issues: 0

Trinity-RFT

Trinity-RFT is a general-purpose, flexible, and scalable framework for reinforcement fine-tuning (RFT) of large language models (LLMs).

License: Apache-2.0 · Stargazers: 0 · Issues: 0

UFT

UFT: Unifying Supervised and Reinforcement Fine-Tuning

License: Apache-2.0 · Stargazers: 0 · Issues: 0
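
A generic way to unify the two regimes is a weighted sum of a supervised term and a policy-gradient term; the sketch below shows that combination as an illustration of the unification theme, not UFT's actual algorithm (the interpolation scheme is an assumption).

```python
import torch
import torch.nn.functional as F

def unified_loss(logits: torch.Tensor, targets: torch.Tensor,
                 seq_logp: torch.Tensor, advantages: torch.Tensor,
                 lam: float = 0.5) -> torch.Tensor:
    """Weighted sum of a supervised term (on demonstration tokens) and a
    REINFORCE-style term (on sampled completions).

    lam = 1.0 recovers pure SFT; lam = 0.0 recovers pure RL.
    """
    sft = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    rl = -(advantages.detach() * seq_logp).mean()
    return lam * sft + (1.0 - lam) * rl
```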