lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM.

lucidrains/PaLM-rlhf-pytorch Issues

A bug in the implementation of the top-p sampling
Updated 2 months ago
Flash Attention 2
Closed 5 months ago
Is there any documentation for training this on my own data?
Updated 9 months ago
How to use LoRA?
Updated 9 months ago
Confusion about KL divergence calculation for human feedback policies
Closed 2 years ago (13 comments)
Should critic's input be prompt only?
Updated a year ago
✨ 😅 Is it possible to use OpenAI's ChatGPT to train this ChatGPT?
Updated a year ago (8 comments)
Possible incorrect creation of Rotary Embeddings
Closed a year ago (1 comment)
I looked at the LLaMA source code and there is an intermediate layer
Updated a year ago
Column and Row Parallel Linear for Apex Tensor Parallel
Closed 2 years ago (1 comment)
Model Name
Closed 2 years ago (3 comments)
Is it possible to replace PaLM with another Hugging Face pretrained language model?
Updated 2 years ago (2 comments)
Is memory-efficient attention enabled by default if I don't use flash attn?
Updated 2 years ago (3 comments)
A few questions on training
Updated 2 years ago (3 comments)
Speed-up with flash attn on an A6000?
Closed 2 years ago (2 comments)
I used other params with PaLM, but got an error
Closed 2 years ago (4 comments)
norm.gamma not used during backprop
Closed 2 years ago (2 comments)
Can we just replace PPO+RLHF with a preference model that is basically a transformer encoder + sigmoid model, trained with BCE, and during fine-tuning perform reward maximization by just making the reward model predict 1s?
Closed 2 years ago (5 comments)
The KL loss calculation seems to have a mistake.
Closed 2 years ago (1 comment)
Reason for using pooled critic embedding instead of the last embedding for value head
Closed 2 years ago (3 comments)
KL divergence loss
Closed 2 years ago (1 comment)
train your reward model issue
Updated 2 years ago (1 comment)
Cannot train the model using PyTorch version 2?
Closed 2 years ago (1 comment)
Is it possible to release code based on JAX?
Closed 2 years ago (7 comments)
mask raised error
Closed 2 years ago (2 comments)
Value function
Updated 2 years ago
Is it possible to train this AI using Open-Assistant, or vice versa?
Closed 2 years ago (1 comment)
Do you need cuda for this?
Closed 2 years ago (1 comment)
Can we exploit the AGI ability of ChatGPT?
Closed 2 years ago
Is this shift right for the action logits?
Closed 2 years ago (4 comments)
Are there some pictures that describe PaLM architecture?
Closed 2 years ago (1 comment)
value function input
Closed 2 years ago (1 comment)
The loss function of reward model.
Updated 2 years ago (2 comments)
KL_div/ratio on policy
Closed 2 years ago
Encoder-Decoder
Closed 2 years ago (39 comments)
How to fine-tune and train on my own data?
Updated 2 years ago
Training the reward model
Closed 2 years ago (8 comments)
PaLM-rlhf-pytorch Roadmap
Closed 2 years ago (4 comments)
Help with computational power
Closed 2 years ago (4 comments)
Noob question: How can I use this model for inference?
Closed 2 years ago (1 comment)
Simple Web Interface
Closed 2 years ago (2 comments)
Why do the value calculations in generate and learn use different masks?
Closed 2 years ago (1 comment)
Palm
Closed 2 years ago
I'm dumb
Closed 2 years ago (1 comment)
Can I train a model on my own data?
Closed 2 years ago (1 comment)
Unified reward function/model architecture for a wide range of tasks
Updated 2 years ago (2 comments)
GPU requirements
Closed 2 years ago (3 comments)