JonathanFly / LLM-Pref-Mark-UI


LLM-Pref-Mark-UI

This project provides a Gradio UI for collecting human preference feedback on generated text. The resulting preference data can be used to train a reward model for RLHF.

There are two files: app.py for basic usage and advanced_app.py for advanced usage. Both are heavily inspired by Anthropic's "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" paper.

NOTE: the basic version is fully functional except that you must fill in the record() function for your specific use case, i.e. how you want to handle clicks on the chosen preferences. The advanced version, however, is not a fully functioning application; it provides only the UI.

Basic version

The basic version is demonstrated with the Flan Alpaca model. All you need to do is the following in app.py:

  1. replace the model variable with your own model
  2. replace the GenerationConfig with your own settings
  3. complete the record() function:
    • each choice on A and B is scored from 1 to 4
    • do whatever you want with the scores
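Since the project leaves record() as a stub, here is a minimal sketch of what completing step 3 could look like. This version simply appends each judgment to a JSONL file; the parameter names, field names, and file path are assumptions for illustration, not the project's actual signature:

```python
import json
from pathlib import Path

# Hypothetical output file for collected preference data.
LOG_PATH = Path("preferences.jsonl")

def record(prompt: str, response_a: str, response_b: str,
           score_a: int, score_b: int) -> None:
    """Append one preference judgment (each response scored 1-4) to a JSONL log."""
    if not (1 <= score_a <= 4 and 1 <= score_b <= 4):
        raise ValueError("scores must be between 1 and 4")
    entry = {
        "prompt": prompt,
        "response_a": response_a,
        "response_b": response_b,
        "score_a": score_a,
        "score_b": score_b,
        # Mark which response was preferred, for reward-model training pairs.
        "chosen": "A" if score_a >= score_b else "B",
    }
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

A JSONL log of (prompt, chosen, rejected) pairs like this is a common input format for reward-model training, but you could just as well write to a database or an annotation service here.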

Advanced version

About

License: Apache License 2.0


Languages

Language: Python 100.0%