Why use L2 regularization in reward model training?
hannlp opened this issue · comments
Yuchen Han commented
Hello, respected developers of Open Assistant. @andreaskoepf While studying your reward model training code, I noticed that besides the ranking loss, there is an additional L2 regularization term. What is the purpose of this regularization term? Are there any papers that mention it?
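For context, a common form of this objective is the pairwise ranking loss from InstructGPT-style reward model training, with an added penalty on the squared reward values. The sketch below is a minimal, framework-free illustration of that idea, not the repository's actual implementation; the coefficient name `l2_coef` and the exact averaging are assumptions. The intuition is that the ranking loss only constrains reward *differences* (it is invariant to a shared offset), so an L2 term on the raw rewards keeps the scalar outputs centred near zero and numerically stable.

```python
import math

def reward_ranking_loss(chosen_rewards, rejected_rewards, l2_coef=0.001):
    """Pairwise ranking loss with an L2 penalty on the raw reward values.

    Sketch only: `l2_coef` is a hypothetical hyperparameter, not taken
    from the Open Assistant code.
    """
    assert len(chosen_rewards) == len(rejected_rewards)
    rank_loss = 0.0
    l2_term = 0.0
    for r_c, r_r in zip(chosen_rewards, rejected_rewards):
        # -log(sigmoid(r_chosen - r_rejected)): standard pairwise ranking loss
        rank_loss += -math.log(1.0 / (1.0 + math.exp(-(r_c - r_r))))
        # Penalize the magnitude of the raw rewards themselves
        l2_term += r_c * r_c + r_r * r_r
    n = len(chosen_rewards)
    return rank_loss / n + l2_coef * l2_term / (2 * n)
```

With `l2_coef=0.0` this reduces to the plain ranking loss; a positive coefficient trades a small amount of ranking accuracy for bounded reward magnitudes, similar in spirit to weight decay but applied to the model's outputs rather than its parameters.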
Ravenclaw1 commented
My apologies, I don't know if there is any research that covers it, as I am new to AI systems. However, I can explain that I am interested in attempting to make a conversational AI companion that responds with more personality so it's less boring. I understand if you would like me to cease some of my testing on your online version.