Repositories under the rlhf topic:
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Unified, efficient fine-tuning of 100+ LLMs
Chinese LLaMA-2 & Alpaca-2 LLMs (phase-2 project), including 64K long-context models
Robust recipes to align language models with human and AI preferences
Efficient fine-tuning of ChatGLM-6B with PEFT
Argilla is a collaboration platform for AI engineers and domain experts who require high-quality outputs, full data ownership, and overall efficiency.
A curated list of reinforcement learning with human feedback resources (continually updated)
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Xtreme1 is an all-in-one data labeling and annotation platform for multimodal training data, supporting 3D LiDAR point clouds, images, and LLM data.
Distilabel is a framework for synthetic data and AI feedback, for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.
Aligning Large Language Models with Human: A Survey
Cornucopia (聚宝盆): a series of open-source, commercially usable Chinese financial LLMs, plus an efficient training framework for finance-domain LLMs (pretraining, SFT, RLHF, quantization, etc.)
A library with extensible implementations of DPO, KTO, PPO, and other human-aware loss functions (HALOs).
pykoi: Active learning in one unified interface
LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)
MindSpore online courses: Step into LLM
🛰️ Fine-tuning ChatGLM on real medical-dialogue data with LoRA, P-Tuning V2, Freeze, RLHF, and more; our scope goes beyond medical Q&A
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Chain-of-Hindsight, A Scalable RLHF Method
A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware: an implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF
The open-source implementation of the ChatGPT, Alpaca, Vicuna, and RLHF pipeline: building a ChatGPT from scratch.
Code accompanying the paper Pretraining Language Models with Human Preferences
Implementation of Reinforcement Learning from Human Feedback (RLHF)
Preference Transformer: Modeling Human Preferences using Transformers for RL (ICLR2023 Accepted)
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware: an implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback), based on DeepSpeed-Chat
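Several of the repositories above implement DPO (Direct Preference Optimization) as an RLHF-style alignment loss. As a minimal sketch of the idea (not taken from any specific repository above; the function name and scalar inputs are illustrative assumptions), the DPO loss for one preference pair compares how much more the policy favors the chosen response over the rejected one, relative to a frozen reference model:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair (illustrative sketch).

    Each input is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen
    reference model. beta scales the implicit KL penalty.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and reference agree, the loss sits at log 2; as the policy learns to prefer the chosen response more strongly than the reference does, the loss falls toward zero. Full implementations (e.g. in the HALOs library listed above) compute these log-probabilities per token and batch the loss over many pairs.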