An index of algorithms for reinforcement learning from human feedback (RLHF).

awesome-rlhf

Welcome to our curated collection of research and review papers focused on Reinforcement Learning from Human Feedback (RLHF). We encourage you to star, fork, and contribute to this repository. We're actively seeking additional contributors and maintainers!

Maintained by:

Please follow this format for contributions:

- [Paper Title](paper link) [Additional Links]
  - Author1, Author2, and Author3. arXiv/Conference/Journal, Year.

For any inquiries, don't hesitate to contact: li.jiang3@mail.mcgill.ca

Some notes:

  • This resource is dedicated to the latest papers and does not include older academic works, even those published earlier in 2023. For a review of prominent historical papers and other sources, please refer to the Hugging Face blog and this link from OpenDILab.
  • Most of the paper collection is credited to RLHF papers.

Table of Contents

Papers

Review/Survey

  • AI Alignment: A Comprehensive Survey
    • Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao. arXiv, 2023.
  • Aligning Large Language Models with Human: A Survey
    • Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu. arXiv, 2023.
  • Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
    • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, and Dylan Hadfield-Menell. arXiv, 2023.

RLHF for LLMs: Theory / Methods

RLHF for Other Domains

Datasets

Blogs/Talks/Reports

Blogs

Talks

Reports

Open Source Software/Implementations

  • trl
    • Train transformer language models with reinforcement learning.
  • OpenRLHF
    • A Ray-based, high-performance RLHF framework (for 34B+ models).
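
Both frameworks train a policy model against a learned reward while penalizing divergence from a frozen reference model. As a rough, library-agnostic illustration (not the actual API of trl or OpenRLHF), the KL-penalized reward commonly optimized in RLHF can be sketched as:

```python
def kl_penalized_reward(reward: float,
                        logprob_policy: float,
                        logprob_ref: float,
                        beta: float = 0.1) -> float:
    """Toy per-token RLHF objective: reward from the reward model minus a
    KL penalty that keeps the tuned policy close to the reference model.

    For a single sampled token, log pi(a|s) - log pi_ref(a|s) is a simple
    one-sample estimate of the KL divergence term.
    """
    kl_estimate = logprob_policy - logprob_ref
    return reward - beta * kl_estimate

# Example: reward 1.0, policy assigns higher probability than the reference
shaped = kl_penalized_reward(1.0, logprob_policy=-0.5, logprob_ref=-1.0)
# -> 1.0 - 0.1 * 0.5 = 0.95
```

In practice, both libraries compute this penalty per token over full sequences and optimize it with PPO-style updates; `beta` here is a hypothetical coefficient name for illustration.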

About


License: Apache License 2.0