vwxyzjn

Costa Huang's repositories

cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Language: Python · License: NOASSERTION · 7856 38 195

ppo-implementation-details

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

Language: Python · License: NOASSERTION · 841 4 8

portwarden

Create Encrypted Backups of Your Bitwarden Vault with Attachments

Language: Go · License: MIT · 615 11 30

lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase

Language: Python · License: MIT · 186 4 7

summarize_from_feedback_details

Language: Python · License: MIT · 137 4 2

cleanba

CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL

Language: Python · License: NOASSERTION · 109 4 5

costa-utils

Language: Python · License: MIT · 10 20

benchmark-ci

Language: Python · 7 2 1

LeanRL

LeanRL is a fork of CleanRL in which selected PyTorch scripts are optimized for performance using compile and cudagraphs.

Language: Python · License: NOASSERTION · 600

lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

Language: Python · License: MIT · 4 10

minimal-adam-difference

Language: Python · 4 30

trl

Train transformer language models with reinforcement learning.

Language: Python · License: Apache-2.0 · 4 10

quickchat

Language: Python · License: MIT · 3 20

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python · License: Apache-2.0 · 2 10

direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Language: Python · License: Apache-2.0 · 2 10

cleanba-test

Language: Python · 1 20

hfblog

Public repo for HF blog posts

Language: Jupyter Notebook · 1 10

Open-Reasoner-Zero

Official Repo for Open-Reasoner-Zero

Language: Python · License: MIT · 100

optax

Optax is a gradient processing and optimization library for JAX.

Language: Python · License: Apache-2.0 · 1 10

PokemonRedExperiments

Playing Pokemon Red with Reinforcement Learning

Language: Jupyter Notebook · License: MIT · 1 10

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python · License: Apache-2.0 · 1 10

zero3_min_repro

Language: Python · License: MIT · 1 20

2024

Language: HTML · License: MIT · 010

huggingface_hub

The official Python client for the Hugging Face Hub.

Language: Python · License: Apache-2.0 · 000

open-instruct

Language: Python · License: Apache-2.0 · 000

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python · License: Apache-2.0 · 010

picotron

Minimalistic 4D-parallelism distributed training framework for educational purposes

Language: Python · License: Apache-2.0 · 000

summarize-from-feedback

Code for "Learning to summarize from human feedback"

Language: Python · License: NOASSERTION · 010

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · 000

vwxyzjn.github.io

Language: HTML · License: MIT · 02 1