rlhf

There are 17 repositories under rlhf topic.

LAION-AI / Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
ai assistant chatgpt discord-bot language-model machine-learning nextjs python rlhf
Language:Python 37044
LLaMA-Factory
hiyouga / LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
agent ai chatglm fine-tuning gpt instruction-tuning language-model large-language-models llama llama3 llm lora mistral moe peft qlora quantization qwen rlhf transformers
Language:Python 33599
RUCAIBox / LLMSurvey
The official GitHub page for the survey paper "A Survey of Large Language Models".
chain-of-thought chatgpt in-context-learning instruction-tuning large-language-models llm llms natural-language-processing pre-trained-language-models pre-training rlhf
Language:Python 10317
Chinese-LLaMA-Alpaca-2
ymcui / Chinese-LLaMA-Alpaca-2
中文LLaMA-2 & Alpaca-2大模型二期项目 + 64K超长上下文模型 (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
alpaca llama llm llama-2 large-language-models nlp alpaca-2 flash-attention llama2 alpaca2 64k yarn rlhf
Language:Python 7085
InternLM / InternLM
Official release of InternLM2.5 base and chat models. 1M context support
chatbot chinese fine-tuning-llm flash-attention gpt large-language-model llm long-context pretrained-models rlhf
Language:Python 6426
alignment-handbook
huggingface / alignment-handbook
Robust recipes to align language models with human and AI preferences
llm rlhf transformers
Language:Python 4642
argilla
argilla-io / argilla
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
active-learning ai annotation-tool developer-tools gpt-4 human-in-the-loop langchain llm machine-learning mlops natural-language-processing nlp rlhf text-annotation text-labeling weak-supervision weakly-supervised-learning
Language:Python 3931
hiyouga / ChatGLM-Efficient-Tuning
Fine-tuning ChatGLM-6B with PEFT | 基于 PEFT 的高效 ChatGLM 微调
alpaca chatglm chatglm2 chatgpt fine-tuning huggingface language-model lora peft pytorch qlora rlhf transformers
Language:Python 3664
opendilab / awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
deep-learning deep-reinforcement-learning human-feedback large-language-models reinforcement-learning rlhf
3413
docta
Docta-ai / docta
A Doctor for your data
data data-centric-ai data-centric-machine-learning data-curation data-diagnosis language-model rlhf
Language:Python 3223
distilabel
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
ai huggingface llms openai python rlaif rlhf synthetic-data synthetic-dataset-generation
Language:Python 1600
THUDM / WebGLM
WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)
chatgpt llm rlhf webglm
Language:Python 1567
tatsu-lab / alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
deep-learning evaluation foundation-models instruction-following large-language-models leaderboard nlp rlhf
Language:Jupyter Notebook 1505
PKU-Alignment / safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
ai-safety alpaca datasets deepspeed large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf transformers vicuna safe-rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safety gpt transformer beaver
Language:Python 1334
THUDM / ImageReward
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
diffusion-models generative-model human-preferences rlhf
Language:Python 1153
xtreme1
xtreme1-io / xtreme1
Xtreme1 is an all-in-one data labeling and annotation platform for multimodal data training and supports 3D LiDAR point cloud, image, and LLM.
3d-annotation annotation annotation-tool computer-vision image-annotation image-classification image-labelling-tool labeling-tool lidar-annotation lidar-camera-fusion lidar-object-detection lidar-object-tracking multimodal point-cloud rlhf
Language:TypeScript 884
RLHFlow / RLHF-Reward-Modeling
Recipes to train reward model for RLHF.
llama3 llm reward-models rlhf
Language:Python 780
ContextualAI / HALOs
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
alignment dpo halos kto ppo rlhf
Language:Python 732
princeton-nlp / SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
alignment large-language-models preference-alignment rlhf
Language:Python 696
GaryYufei / AlignLLMHumanSurvey
Aligning Large Language Models with Human: A Survey
awesome chatgpt chinese-llama gpt-4 large-language-models llama llama2 llms rlhf supervised-finetuning survey
694
jerry1993-tech / Cornucopia-LLaMA-Fin-Chinese
聚宝盆(Cornucopia): 中文金融系列开源可商用大模型，并提供一套高效轻量化的垂直领域LLM训练框架(Pretraining、SFT、RLHF、Quantize等)
chinese finance large-language-models llama nlp qa rlhf sft text-generation transformers
Language:Python 585
jianzhnie / LLamaTuner
Easy and Efficient Finetuning LLMs. (Supported LLama, LLama2, LLama3, Qwen, Baichuan, GLM , Falcon) 大模型高效量化训练+部署.
llama chatgpt dpo llama3 mixtral ppo qlora qwen rlhf
Language:Python 573
voidful / TextRL
Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (blommz-176B/bloom/gpt/bart/T5/MetaICL)
chatgpt controlled-nlg gpt-2 gpt-3 language-model nlg nlp pytorch reinforcement-learning rlhf
Language:Python 542
uclaml / SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
deep-learning fine-tuning large-language-models rlhf self-play
Language:Python 485
mindspore-courses / step_into_llm
MindSpore online courses: Step into LLM
bert chatglm chatglm2 chatgpt codegeex gpt gpt2 instruction-tuning large-language-models llama llama2 llm mindspore moe natural-language-processing nlp parallel-computing peft prompt-tuning rlhf
Language:Jupyter Notebook 427
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
preference-learning rlhf
Language:Python 420
CambioML / pykoi-rlhf-finetuned-transformers
pykoi: Active learning in one unified interface
ai chatbot feedback language-model llm machine-learning rlhf
Language:Jupyter Notebook 407
RLHFlow / Online-RLHF
A recipe for online RLHF and online iterative DPO.
llama3 llm rlhf
Language:Python 406
transformerlab / transformerlab-app
Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
electron llama llms lora mlx rlhf transformers
Language:TypeScript 403
Joyce94 / LLM-RLHF-Tuning
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
fine-tuning language-model llama llm lora peft ppo reinforcement-learning rlhf
Language:Python 369
glgh / awesome-llm-human-preference-datasets
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
awesome-list datasets eval human-preferences llm machine-learning nlp rlhf
313
WangRongsheng / MedQA-ChatGLM
🛰️ 基于真实医疗对话数据在ChatGLM上进行LoRA、P-Tuning V2、Freeze、RLHF等微调，我们的眼光不止于医疗问答
chatglm-6b chatgpt dataset fine-tuning freeze huggingface large-language-models llms lora medical rlhf transformer
Language:Python 297
TUDB-Labs / mLoRA
An Efficient "Factory" to Build Multiple LoRA Adapters
baichuan chatglm finetune llama llama2 llm lora peft gpu mlora dpo rlhf
Language:Python 269
haoliuhl / chain-of-hindsight
Chain-of-Hindsight, A Scalable RLHF Method
large-language-models learning-from-human-feedback rlhf
Language:Python 218
mihirp1998 / VADER
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.
alignment diffusion reinforcement-learning reinforcement-learning-human-feedback rl rlhf vader video-diffusion video-diffusion-alignment
Language:Python 210
jackaduma / Vicuna-LoRA-RLHF-PyTorch
A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT but with Vicuna
chatgpt finetune gpt llama llm lora peft ppo pytorch reward-models rlhf vicuna vicuna-7b
Language:Python 207

rlhf

LAION-AI / Open-Assistant

hiyouga / LLaMA-Factory

RUCAIBox / LLMSurvey

ymcui / Chinese-LLaMA-Alpaca-2

InternLM / InternLM

huggingface / alignment-handbook

argilla-io / argilla

hiyouga / ChatGLM-Efficient-Tuning

opendilab / awesome-RLHF

Docta-ai / docta

argilla-io / distilabel

THUDM / WebGLM

tatsu-lab / alpaca_eval

PKU-Alignment / safe-rlhf

THUDM / ImageReward

xtreme1-io / xtreme1

RLHFlow / RLHF-Reward-Modeling

ContextualAI / HALOs

princeton-nlp / SimPO

GaryYufei / AlignLLMHumanSurvey

jerry1993-tech / Cornucopia-LLaMA-Fin-Chinese

jianzhnie / LLamaTuner

voidful / TextRL

uclaml / SPPO

mindspore-courses / step_into_llm

allenai / reward-bench

CambioML / pykoi-rlhf-finetuned-transformers

RLHFlow / Online-RLHF

transformerlab / transformerlab-app

Joyce94 / LLM-RLHF-Tuning

glgh / awesome-llm-human-preference-datasets

WangRongsheng / MedQA-ChatGLM

TUDB-Labs / mLoRA

haoliuhl / chain-of-hindsight

mihirp1998 / VADER

jackaduma / Vicuna-LoRA-RLHF-PyTorch