reward-models

There are 5 repositories under the reward-models topic.

jackaduma / Vicuna-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Vicuna LLM with LoRA and RLHF on consumer hardware. Implements RLHF (Reinforcement Learning from Human Feedback) on top of the Vicuna architecture. Basically ChatGPT, but with Vicuna.
chatgpt finetune gpt llama llm lora peft ppo pytorch reward-models rlhf vicuna vicuna-7b
Language:Python 198
jackaduma / ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the ChatGLM LLM with LoRA and RLHF on consumer hardware. Implements RLHF (Reinforcement Learning from Human Feedback) on top of the ChatGLM architecture. Basically ChatGPT, but with ChatGLM.
lora chatglm chatglm-6b chatgpt finetune gpt llm pytorch rlhf llama deepspeed peft ppo reward-models
Language:Python 120
jackaduma / Alpaca-LoRA-RLHF-PyTorch
A full pipeline to fine-tune the Alpaca LLM with LoRA and RLHF on consumer hardware. Implements RLHF (Reinforcement Learning from Human Feedback) on top of the Alpaca architecture. Basically ChatGPT, but with Alpaca.
alpaca chatgpt llama llm lora pytorch rlhf gpt finetune deepspeed peft ppo reward-models
Language:Python 53
vicgalle / zero-shot-reward-models
ZYN: Zero-Shot Reward Models with Yes-No Questions
llm reinforcement-learning rlhf zero-shot reward-models trlx rlaif
Language:Python 31
tlc4418 / llm_optimization
A repo for RLHF training and best-of-n (BoN) sampling over LLMs, with support for reward model ensembles.
best-of-n deep-learning ensembles large-language-models reinforcement-learning-from-human-feedback reward-models
Language:Python 21
