Harry Wang's starred repositories
alignment-handbook
Robust recipes to align language models with human and AI preferences
promptbench
A unified evaluation framework for large language models
alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
self-rewarding-lm-pytorch
Implementation of the training framework proposed in the paper "Self-Rewarding Language Models", from Meta AI
reward-bench
RewardBench: the first evaluation tool for reward models.
Finetune_LLAMA
An easy-to-understand guide to fine-tuning LLaMA.
Stable-Alignment
Multi-agent social simulation + an efficient, effective, and stable alternative to RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society".
LLM-Agent-Paper-Digest
Papers related to LLM agents published at top conferences
Visual-Adversarial-Examples-Jailbreak-Large-Language-Models
Repository for the paper "Visual Adversarial Examples Jailbreak Large Language Models" (AAAI 2024, Oral)
DA-in-visualRL
Collection of papers and resources for data augmentation (DA) in visual reinforcement learning (RL).
LLM-Extrapolation
Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"
reid-strong-baseline
Bag of Tricks and A Strong Baseline for Deep Person Re-identification
segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.