henry-yeh

⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~

Language: Vue | License: MIT | 5513 20 74

The-Art-of-Linear-Algebra-zh-CN

Graphic notes on Gilbert Strang's "Linear Algebra for Everyone". Chinese edition of The Art of Linear Algebra; PRs welcome.

Language: PostScript | License: CC0-1.0 | 3951 380

rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.

Language: Python | License: MIT | 2054 41 577

awesome-ml4co

Awesome machine learning for combinatorial optimization papers.

Language: Python | 1574 38 2

hh-rlhf

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

License: MIT | 1525 190

safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Language: Python | License: Apache-2.0 | 1263 17 82

MiniGPT-5

Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"

Language: Python | License: Apache-2.0 | 834 12 40

DREAMPlace

Deep learning toolkit-enabled VLSI placement

Language: C++ | License: BSD-3-Clause | 648 21 147

ppo-implementation-details

The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"

Language: Python | License: NOASSERTION | 593 3 6

rl4co

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

Language:PythonMIT350 8 73

Stable-Alignment

Multi-agent social simulation plus an efficient, effective, and stable alternative to RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society".

Language: Python | License: NOASSERTION | 334 5 8