CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)


TRLX Environment customization

heraldiclily opened this issue · comments

I am currently working with the TRLX library for reinforcement learning and have a few questions about customizing the training process:

  • Is it possible to modify the reward function used by the TRLX framework? If so, could you point me to the relevant documentation or examples showing how to implement this (see the sketch after this list)?
  • Can the state and action parameters also be customized within the library? I am interested in tailoring these aspects to better fit the specific needs of my project.
  • If modifications are possible, are there any particular considerations or limitations I should be aware of when implementing these changes?
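
For the reward function specifically, a custom callable can be passed to `trlx.train`. Below is a minimal sketch, assuming a recent trlx version where the reward function receives the generated samples (extra fields such as prompts arrive via keyword arguments, and the exact signature may differ between versions); the exclamation-mark reward and the `gpt2` base model are purely illustrative choices, not part of the original question.

```python
import trlx

# Toy reward: score each generated sample by how many exclamation
# marks it contains. A real setup would call a reward model here.
def reward_fn(samples, **kwargs):
    return [float(sample.count("!")) for sample in samples]

trainer = trlx.train(
    "gpt2",                    # base model to fine-tune (illustrative)
    reward_fn=reward_fn,       # custom reward replaces the default
    prompts=["Tell me a joke:", "Describe the weather:"],  # training prompts (illustrative)
)
```

The key design point is that the reward is just a Python function evaluated on decoded text, so swapping in a learned reward model, a rule-based scorer, or an external API call only requires changing the body of `reward_fn`.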