lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

Model Name

conceptofmind opened this issue

@lucidrains What would you like the first model to be named?

Screenshot from 2023-05-02 15-46-13
One restart. 160B tokens.

Screenshot from 2023-05-03 00-04-41
I also ran some tests with qk_norm vs. no qk_norm. With AdamW, I decided to go with qk_norm=False. I will explore this with Lion afterwards.
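
For readers unfamiliar with the setting being compared, here is a minimal sketch of what a query/key normalization toggle typically changes. It assumes `qk_norm` means L2-normalizing queries and keys before the dot product and multiplying by a scale; the post does not spell out the exact variant used. The function names and the `qk_scale` default of 8.0 are hypothetical and chosen for illustration, not taken from the repository.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def attention_logits(queries, keys, qk_norm=False, qk_scale=8.0):
    """Return the matrix of pre-softmax attention logits.

    qk_norm=False: standard scaled dot product, q.k / sqrt(d).
    qk_norm=True:  cosine similarity times a scale. The scale is a fixed,
                   hypothetical value here; it is often learned in practice.
    """
    dim = len(queries[0])
    if qk_norm:
        queries = [l2_normalize(q) for q in queries]
        keys = [l2_normalize(k) for k in keys]
        scale = qk_scale
    else:
        scale = 1.0 / math.sqrt(dim)
    return [
        [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        for q in queries
    ]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Toy inputs with very different magnitudes.
queries = [[100.0, 0.0], [0.0, 3.0]]
keys = [[50.0, 0.0], [0.0, 1.0]]

plain = attention_logits(queries, keys, qk_norm=False)
normed = attention_logits(queries, keys, qk_norm=True)
```

With normalization on, every logit is bounded in magnitude by `qk_scale` no matter how large the activations grow, which is the usual motivation for trying it as a stability measure. Whether it helps or hurts final loss can depend on the optimizer, which is the comparison described above.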

Screenshot from 2023-05-05 21-09-30
PaLM 1B