Costa Huang (vwxyzjn)

Company: @huggingface

Location: Philadelphia, PA

Home Page: https://costa.sh

Twitter: @vwxyzjn

Costa Huang's repositories

cleanrl

High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Language: Python | License: NOASSERTION | Stargazers: 4548 | Issues: 34 | Issues: 170

ppo-implementation-details

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

Language: Python | License: NOASSERTION | Stargazers: 558 | Issues: 3 | Issues: 6

portwarden

Create Encrypted Backups of Your Bitwarden Vault with Attachments

Language: Go | License: MIT | Stargazers: 550 | Issues: 9 | Issues: 28

lm-human-preference-details

RLHF implementation details of OpenAI's 2019 codebase

Language: Python | License: MIT | Stargazers: 126 | Issues: 4 | Issues: 7

invalid-action-masking

Source code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

Language: Python | License: MIT | Stargazers: 124 | Issues: 2 | Issues: 3

cleanba

CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL

Language: Python | License: NOASSERTION | Stargazers: 92 | Issues: 4 | Issues: 4

free-mujoco-py

MuJoCo is a physics engine for detailed, efficient rigid body simulations with contacts. mujoco-py allows using MuJoCo from Python 3.

Language: Cython | License: NOASSERTION | Stargazers: 5 | Issues: 1 | Issues: 0

lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

Language: Python | License: MIT | Stargazers: 4 | Issues: 1 | Issues: 0
Language: Python | License: MIT | Stargazers: 4 | Issues: 2 | Issues: 0
Language: Java | License: GPL-3.0 | Stargazers: 3 | Issues: 2 | Issues: 0

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | Stargazers: 3 | Issues: 1 | Issues: 0

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0
Language: Python | Stargazers: 1 | Issues: 2 | Issues: 0
Language: Python | Stargazers: 1 | Issues: 2 | Issues: 0

hfblog

Public repo for HF blog posts

Language: Jupyter Notebook | Stargazers: 1 | Issues: 1 | Issues: 0

optax

Optax is a gradient processing and optimization library for JAX.

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 1 | Issues: 0
Language: Python | License: MIT | Stargazers: 1 | Issues: 2 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 1 | Issues: 0
Language: Python | License: MIT | Stargazers: 1 | Issues: 1 | Issues: 0
Language: HTML | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

MOSS-RLHF

MOSS-RLHF

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

PokemonRedExperiments

Playing Pokémon Red with Reinforcement Learning

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

summarize-from-feedback

Code for "Learning to summarize from human feedback"

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

tyro

Strongly typed, zero-effort CLI interfaces & config objects

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0