vwxyzjn

Costa Huang's repositories

cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Language: Python | License: NOASSERTION | 5107 35 180

ppo-implementation-details

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

Language: Python | License: NOASSERTION | 602 3 6

portwarden

Create Encrypted Backups of Your Bitwarden Vault with Attachments

Language: Go | License: MIT | 569 10 30

lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase

Language: Python | License: MIT | 141 4 7

invalid-action-masking

Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

Language: Python | License: MIT | 128 2 3

cleanba

CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL

Language: Python | License: NOASSERTION | 101 4 5

summarize_from_feedback_details

Language: Python | License: MIT | 95 40

benchmark-ci

Language: Python | 6 2 1

free-mujoco-py

MuJoCo is a physics engine for detailed, efficient rigid body simulations with contacts. mujoco-py allows using MuJoCo from Python 3.

Language: Cython | License: NOASSERTION | 6 10

lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

Language: Python | License: MIT | 4 10

minimal-adam-difference

Language: Python | 4 30

ppo-atari-metrics

Language: Python | License: MIT | 4 20

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | 4 10

microrts

Language: Java | License: GPL-3.0 | 3 20

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python | License: Apache-2.0 | 2 10

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Language: Python | License: Apache-2.0 | 2 10

cleanba-test

Language: Python | 1 20

envpool_bug

Language: Python | 1 20

hfblog

Public repo for HF blog posts

Language: Jupyter Notebook | 1 10

optax

Optax is a gradient processing and optimization library for JAX.

Language: Python | License: Apache-2.0 | 1 10

PokemonRedExperiments

Playing Pokemon Red with Reinforcement Learning

Language: Jupyter Notebook | License: MIT | 1 10

quickchat

Language: Python | License: MIT | 1 20

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | 1 10

zero3_min_repro

Language: Python | License: MIT | 1 20

2024

Language: HTML | License: MIT | 010

MOSS-RLHF

Language: Python | License: Apache-2.0 | 010

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | 010

summarize-from-feedback

Code for "Learning to summarize from human feedback"

Language: Python | License: NOASSERTION | 010

tyro

Strongly typed, zero-effort CLI interfaces & config objects

Language: Python | License: MIT | 010

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 000