Costa Huang's repositories
ppo-implementation-details
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
lm-human-preference-details
RLHF implementation details of OpenAI's 2019 codebase
invalid-action-masking
Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
free-mujoco-py
MuJoCo is a physics engine for detailed, efficient rigid body simulations with contacts. mujoco-py allows using MuJoCo from Python 3.
lm-human-preferences
Code for the paper Fine-Tuning Language Models from Human Preferences
alignment-handbook
Robust recipes to align language models with human and AI preferences
direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
PokemonRedExperiments
Playing Pokemon Red with Reinforcement Learning
summarize-from-feedback
Code for "Learning to summarize from human feedback"
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs