This project provides a Gradio UI for collecting human preference labels on generated text. The resulting feedback can be used to train a reward model for RLHF.
There are two files: `app.py` for basic usage and `advanced_app.py` for advanced usage. Both are heavily inspired by Anthropic's "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" paper.
NOTE: the basic version is fully functional except that you have to fill in the `record` function for your specific use case, i.e. how you would like to handle clicks on chosen preferences. The advanced version, however, is not a fully functioning application; it provides only the UI.
The basic version is demonstrated with the Flan Alpaca model. All you need to do in `app.py` is:

- replace the `model` variable with your own model
- replace the `GenerationConfig` with your own settings
- complete the `record()` function:
  - each choice on `A` and `B` is scored between 1 and 4
  - do whatever you want with the scores
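As one way to complete the steps above, here is a minimal sketch of what a `record()` implementation might look like. It assumes the UI hands you the prompt, both responses, and the two 1–4 scores; the JSONL file name and the chosen/rejected convention are hypothetical choices for reward-model training data, not part of this project.

```python
import json
from pathlib import Path

LOG_PATH = Path("preferences.jsonl")  # hypothetical output file

def record(prompt, response_a, response_b, score_a, score_b):
    """Append one labeled comparison to a JSONL file.

    Each score is an integer from 1 to 4, as produced by the UI.
    Splitting into chosen/rejected is one possible convention for
    building reward-model training pairs.
    """
    entry = {
        "prompt": prompt,
        "chosen": response_a if score_a >= score_b else response_b,
        "rejected": response_b if score_a >= score_b else response_a,
        "score_a": score_a,
        "score_b": score_b,
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Swapping in your own model means replacing the `model` variable and `GenerationConfig` in `app.py`; the logging above is independent of which model produced the responses.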