Costa Huang's repositories
ppo-implementation-details
The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization
portwarden
Create Encrypted Backups of Your Bitwarden Vault with Attachments
lm-human-preference-details
RLHF implementation details of OpenAI's 2019 codebase
lm-human-preferences
Code for the paper Fine-Tuning Language Models from Human Preferences
alignment-handbook
Robust recipes to align language models with human and AI preferences
direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
PokemonRedExperiments
Playing Pokemon Red with Reinforcement Learning
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
huggingface_hub
The official Python client for the Hugging Face Hub.
picotron
Minimalistic 4D-parallelism distributed training framework for educational purposes
summarize-from-feedback
Code for "Learning to summarize from human feedback"
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs