liziniu / policy_optimization

Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data"

