Harryis Wang (Harry-mic)

Company: Tsinghua

Harryis Wang's starred repositories

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 27,833 · Watchers: 228 · Issues: 4,692
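
For orientation, offline batch inference with vLLM takes only a few lines. A minimal sketch, assuming the package is installed; the model name is an illustrative placeholder (any supported Hugging Face checkpoint works):

```python
# Minimal offline batch inference with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")  # loads weights, allocates the paged KV cache
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["What does PagedAttention optimize?"], params)
print(outputs[0].outputs[0].text)
```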

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python · License: Apache-2.0 · Stargazers: 4,537 · Watchers: 109 · Issues: 134

promptbench

A unified evaluation framework for large language models

Language: Python · License: MIT · Stargazers: 2,402 · Watchers: 21 · Issues: 52
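
A sketch of promptbench's unified evaluation flow, loosely following the project's quickstart; the dataset and model names are illustrative assumptions:

```python
import promptbench as pb

# Load a benchmark dataset and a model through the unified interface.
dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10)

# Query the model on one formatted example; sst2 items carry "content" and "label".
prompt = "Classify the sentence as positive or negative: {content}"
sample = dataset[0]
prediction = model(prompt.format(content=sample["content"]))
print(sample["label"], prediction)
```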

OpenRLHF

An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning, iterative DPO, LoRA, and Mixtral)

Language: Python · License: Apache-2.0 · Stargazers: 2,113 · Watchers: 21 · Issues: 251

self-rag

The original implementation of SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection, by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.

Language: Python · License: MIT · Stargazers: 1,767 · Watchers: 18 · Issues: 80

textgrad

TextGrad: Automatic "Differentiation" via Text, using large language models to backpropagate textual gradients.

Language: Python · License: MIT · Stargazers: 1,617 · Watchers: 23 · Issues: 70
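
The name is literal: TextGrad runs a forward/backward/step loop over text. A sketch following the project's published quickstart; the engine names and prompt strings here are assumptions:

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o")      # LLM used to produce textual "gradients"
model = tg.BlackboxLLM("gpt-4o")      # forward model whose output we refine

question = tg.Variable(
    "Explain KV caching in one sentence.",
    role_description="question to the LLM",
    requires_grad=False,
)
answer = model(question)              # forward pass
answer.set_role_description("concise and accurate answer")

loss_fn = tg.TextLoss("Critique this answer for correctness and brevity.")
optimizer = tg.TGD(parameters=[answer])

loss_fn(answer).backward()            # backpropagate natural-language feedback
optimizer.step()                      # rewrite `answer` using that feedback
```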

alpaca_eval

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 1,464 · Watchers: 7 · Issues: 142

self-rewarding-lm-pytorch

Implementation of the training framework proposed in "Self-Rewarding Language Models", from Meta AI

Language: Python · License: MIT · Stargazers: 1,318 · Watchers: 23 · Issues: 17

safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Language: Python · License: Apache-2.0 · Stargazers: 1,311 · Watchers: 18 · Issues: 84

Dromedary

Dromedary: towards helpful, ethical and reliable LLMs.

Language: Python · License: GPL-3.0 · Stargazers: 1,115 · Watchers: 23 · Issues: 12

reward-bench

RewardBench: the first evaluation tool for reward models.

Language: Python · License: Apache-2.0 · Stargazers: 374 · Watchers: 5 · Issues: 64

Finetune_LLAMA

An easy-to-follow guide to fine-tuning LLaMA.

Stable-Alignment

Multi-agent social simulation + an efficient, effective, and stable alternative to RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society".

Language: Python · License: NOASSERTION · Stargazers: 339 · Watchers: 5 · Issues: 8

LLM-Agent-Paper-Digest

Papers related to LLM agents published at top conferences

Visual-Adversarial-Examples-Jailbreak-Large-Language-Models

Repository for the paper "Visual Adversarial Examples Jailbreak Large Language Models" (AAAI 2024, Oral)

Language: Python · License: BSD-3-Clause · Stargazers: 164 · Watchers: 3 · Issues: 30

ReMax

Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models"

RL-ViGen

The repository for "RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization"

Language: Python · License: MIT · Stargazers: 91 · Watchers: 6 · Issues: 19

sdft

[ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".

Language: Shell · License: Apache-2.0 · Stargazers: 85 · Watchers: 6 · Issues: 12

DA-in-visualRL

Collection of papers and resources for data augmentation (DA) in visual reinforcement learning (RL).

LLM-Extrapolation

Official repository for the paper "Weak-to-Strong Extrapolation Expedites Alignment"

CUT

Source code of "Reasons to Reject? Aligning Language Models with Judgments"

Language: Python · License: Apache-2.0 · Stargazers: 55 · Watchers: 1 · Issues: 4

tdpo

[ICML 2024] Code for the paper "Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases"

Language: Python · License: MIT · Stargazers: 9 · Watchers: 2 · Issues: 0

TREvaL

Reasonable Reward Evaluation of Large Language Models

Language: Python · License: MIT · Stargazers: 7 · Watchers: 1 · Issues: 1

CycAug

[NeurIPS 2023] CycAug implementation from the paper "Learning Better with Less: Effective Augmentation for Sample-Efficient Visual RL".

Language: Python · Stargazers: 3 · Watchers: 1 · Issues: 0

models

Models and examples built with TensorFlow

Language: Python · License: NOASSERTION · Stargazers: 1 · Watchers: 0 · Issues: 0

reid-strong-baseline

Bag of Tricks and A Strong Baseline for Deep Person Re-identification

Language: Python · License: MIT · Stargazers: 1 · Watchers: 0 · Issues: 0

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 1 · Watchers: 0 · Issues: 0
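
For reference, point-prompted inference follows the repository's documented predictor API. A minimal sketch, assuming a downloaded ViT-H checkpoint (the file name below is the one the README links; the image here is a synthetic stand-in for a real photo):

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# The checkpoint must be downloaded separately; see the repository README for links.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a real RGB image
predictor.set_image(image)                        # one-time image embedding

masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),          # a single foreground click
    point_labels=np.array([1]),                   # 1 = foreground, 0 = background
)
```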