LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Home Page: https://open-assistant.io

Why use L2 regularization in reward model training?

hannlp opened this issue

Hello, Open-Assistant developers. @andreaskoepf While studying your reward model training code, I noticed that in addition to the ranking loss there is an L2 regularization term on the logits. What is the purpose of this term? Are there any papers that mention it?

l2 = 0.5 * (pos_logits**2 + neg_logits**2)
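For context, a minimal sketch of how such a term typically enters the total loss: the pairwise ranking loss is the standard `-log(sigmoid(r_pos - r_neg))` used in preference-based reward modeling, and the quoted L2 term penalizes the raw magnitude of the reward logits so scores stay centered near zero. The combined form and the `l2_coef` weight below are illustrative assumptions, not the exact Open-Assistant implementation:

```python
import math

def ranking_loss_with_l2(pos_logit: float, neg_logit: float, l2_coef: float = 0.01) -> float:
    """Pairwise ranking loss plus an L2 penalty on the reward logits.

    pos_logit / neg_logit: scalar reward scores for the preferred and
    rejected response. l2_coef is a hypothetical weighting factor.
    """
    # Standard pairwise ranking loss: -log(sigmoid(pos - neg)).
    # Minimized when the preferred response scores higher than the rejected one.
    rank = -math.log(1.0 / (1.0 + math.exp(-(pos_logit - neg_logit))))
    # L2 term from the issue: penalizes large reward magnitudes regardless of
    # the gap between them, keeping the reward scale bounded.
    l2 = 0.5 * (pos_logit**2 + neg_logit**2)
    return rank + l2_coef * l2
```

Note that two logit pairs with the same gap produce the same ranking loss, but the pair with larger absolute values incurs a larger total loss via the L2 term; only the gap, not the absolute scale, carries preference information, so the regularizer anchors the otherwise unconstrained scale.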