Costa Huang (vwxyzjn)

User data from GitHub: https://github.com/vwxyzjn

Company: @huggingface

Location: Philadelphia, PA

Home Page: https://costa.sh

GitHub: @vwxyzjn

Twitter: @vwxyzjn

Costa Huang's repositories

cleanrl

High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Language: Python | License: NOASSERTION | Stargazers: 6295 | Issues: 39 | Issues: 188

ppo-implementation-details

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

Language: Python | License: NOASSERTION | Stargazers: 693 | Issues: 3 | Issues: 6

portwarden

Create Encrypted Backups of Your Bitwarden Vault with Attachments

Language: Go | License: MIT | Stargazers: 605 | Issues: 11 | Issues: 30

lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase

Language: Python | License: MIT | Stargazers: 176 | Issues: 4 | Issues: 7

cleanba

CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL

Language: Python | License: NOASSERTION | Stargazers: 109 | Issues: 4 | Issues: 5

LeanRL

LeanRL is a fork of CleanRL in which selected PyTorch scripts are optimized for performance using torch.compile and CUDA graphs.

Language: Python | License: NOASSERTION | Stargazers: 6 | Issues: 0 | Issues: 0

lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

Language: Python | License: MIT | Stargazers: 4 | Issues: 1 | Issues: 0

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | Stargazers: 4 | Issues: 1 | Issues: 0

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 1 | Issues: 0

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 1 | Issues: 0

hfblog

Public repo for HF blog posts

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

optax

Optax is a gradient processing and optimization library for JAX.

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 1 | Issues: 0

PokemonRedExperiments

Playing Pokemon Red with Reinforcement Learning

Language: Jupyter Notebook | License: MIT | Stargazers: 1 | Issues: 1 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 1 | Issues: 0

huggingface_hub

The official Python client for the Hugging Face Hub.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

picotron

Minimalistic 4D-parallelism distributed training framework for educational purposes

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

summarize-from-feedback

Code for "Learning to summarize from human feedback"

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0