🩺 A Collection of Alignments for Large Language Models and Beyond

👋 This is a collection of papers, surveys, and other resources for research on language model alignment and beyond, covering learning from human feedback, interactive NLP, and language model alignment.

📘 Surveys

Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang, Defu Lian, Enhong Chen. When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities. arXiv preprint 2023
Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang. Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies. arXiv preprint 2023
Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li. Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment. arXiv preprint 2023
Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu. Aligning Large Language Models with Human: A Survey. arXiv preprint 2023
Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, Shaochun Hao, Guangzheng Xiong, Yizhi Li, Mong Yuan Sim, Xiuying Chen, Qingqing Zhu, Zhenzhu Yang, Adam Nik, Qi Liu, Chenghua Lin, Shi Wang, Ruibo Liu, Wenhu Chen, Ke Xu, Dayiheng Liu, Yike Guo, Jie Fu. Interactive Natural Language Processing. arXiv preprint 2023
Zijie J. Wang and Dongjin Choi and Shenyu Xu and Diyi Yang. Putting Humans in the Natural Language Processing Loop: A Survey. CoRR, abs/2103.04044, 2021
Settles, Burr. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2009

📔 Blogs

📘 Projects

📘 Leaderboards (LLM evaluations)

📚 Papers

H Dong, W Xiong, D Goyal, R Pan, S Diao, J Zhang, K Shum, T Zhang. RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment. arXiv preprint 2023
Jian Hu, Li Tao, June Yang, Chandler Zhou. Aligning Language Models with Offline Reinforcement Learning from Human Feedback. arXiv preprint 2023
AN Lee, CJ Hunter, N Ruiz. Platypus: Quick, Cheap, and Powerful Refinement of LLMs. arXiv preprint 2023
Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo. FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets. arXiv preprint 2023
Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang. SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. arXiv preprint 2023
Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian. RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment. arXiv preprint 2023
Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John J. Nay, Kshitij Gupta, Aran Komatsuzaki. ARB: Advanced Reasoning Benchmark for Large Language Models. arXiv preprint 2023
Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein. Bring Your Own Data! Self-Supervised Evaluation for Large Language Models. arXiv preprint 2023
Chenxin An, Shansan Gong, Ming Zhong, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu. L-Eval: Instituting Standardized Evaluation for Long Context Language Models. arXiv preprint 2023
Shihao Liang, Kunlun Zhu, Runchu Tian, Yujia Qin, Huadong Wang, Xin Cong, Zhiyuan Liu, Xiaojiang Liu, Maosong Sun. Exploring Format Consistency for Instruction Tuning. arXiv preprint 2023
Ruosen Li, Teerth Patel, Xinya Du. PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations. arXiv preprint 2023
Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, Chang Zhou. Scaling Relationship on Learning Mathematical Reasoning with Large Language Models. arXiv preprint 2023
Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. arXiv preprint 2023
Siddhartha Jain, Xiaofei Ma, Anoop Deoras, Bing Xiang. Self-consistency for open-ended generations. arXiv preprint 2023
Feifan Song, Bowen Yu, Minghao Li, Haiyang Yu, Fei Huang, Yongbin Li, Houfeng Wang. Preference Ranking Optimization for Human Alignment. arXiv preprint 2023
Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, Kevin Wolf. Frontier AI Regulation: Managing Emerging Risks to Public Safety. arXiv preprint 2023
Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin. AlpaGasus: Training A Better Alpaca with Fewer Data. arXiv preprint 2023
Wenxuan Zhang, Sharifah Mahani Aljunied, Chang Gao, Yew Ken Chia, Lidong Bing. M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models. arXiv preprint 2023
Shayne Longpre, Le Hou, Tu Vu, Albert Webson, Hyung Won Chung, Yi Tay, Denny Zhou, Quoc V. Le, Barret Zoph, Jason Wei, Adam Roberts. The Flan Collection: Designing Data and Methods for Effective Instruction Tuning. arXiv preprint 2023
Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah. Orca: Progressive Learning from Complex Explanation Traces of GPT-4. arXiv preprint 2023
Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song. The False Promise of Imitating Proprietary LLMs. arXiv preprint 2023
Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A Smith, Mari Ostendorf, Hannaneh Hajishirzi. Fine-Grained Human Feedback Gives Better Rewards for Language Model Training. arXiv preprint 2023
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. arXiv preprint 2023
Peiyi Wang, Lei Li, Liang Chen, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, Zhifang Sui. Large Language Models are not Fair Evaluators. arXiv preprint 2023
Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, Daxin Jiang. WizardLM: Empowering Large Language Models to Follow Complex Instructions. arXiv preprint 2023
Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi. How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources. arXiv preprint 2023
Yidong Wang, Zhuohao Yu, Zhengran Zeng, Linyi Yang, Cunxiang Wang, Hao Chen, Chaoya Jiang, Rui Xie, Jindong Wang, Xing Xie, Wei Ye, Shikun Zhang, Yue Zhang. PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization. arXiv preprint 2023
Yew Ken Chia, Pengfei Hong, Lidong Bing, Soujanya Poria. INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models. arXiv preprint 2023
Yuxin Jiang, Chunkit Chan, Mingyang Chen, Wei Wang. Lion: Adversarial Distillation of Closed-Source Large Language Model. arXiv preprint 2023
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv preprint 2023

Compared to PPO, DPO optimizes the model directly on preference data, without learning a separate reward model. The drawback is that DPO cannot make use of data that lacks human preference labels. Loosely speaking, DPO can be understood as a supervised learning method, whereas PPO is closer to a semi-supervised one, since its learned reward model can also score responses to prompts that have no preference labels.
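
To make the note above concrete, here is a minimal sketch of the per-pair DPO objective as given in the DPO paper: the negative log-sigmoid of the scaled difference between the policy's and the reference model's log-probability ratios for the chosen and rejected responses. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, not taken from this repository or from any particular library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities.

    All names and the default beta are hypothetical choices for this sketch.
    """
    # Implicit rewards: scaled log-ratio of the policy to the frozen reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Illustrative numbers only: the policy favors the chosen response more
# strongly than the reference does, so the loss falls below log(2).
loss = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

Note that every input is a log-probability of a response that already carries a preference label; no reward model appears anywhere, which is why unlabeled prompts cannot contribute to this loss.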

Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi. Training Socially Aligned Language Models in Simulated Human Society. arXiv preprint 2023
Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang. Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation. arXiv preprint 2023
Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, Minjoon Seo. Aligning Large Language Models through Synthetic Feedback. arXiv preprint 2023
Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy. LIMA: Less Is More for Alignment. arXiv preprint 2023
Yuan Z, Yuan H, Tan C, Wang W, Huang S, Huang F. RRHF: Rank Responses to Align Language Models with Human Feedback without tears. arXiv preprint 2023
Sun Z, Shen Y, Zhou Q, Zhang H, Chen Z, Cox D, Yang Y, Gan C. Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. arXiv preprint 2023
Wang Y, Kordi Y, Mishra S, Liu A, Smith NA, Khashabi D, Hajishirzi H. Self-Instruct: Aligning Language Models with Self-Generated Instructions. ACL 2023
Zhao Y, Joshi R, Liu T, Khalman M, Saleh M, Liu PJ. SLiC-HF: Sequence Likelihood Calibration with Human Feedback. arXiv preprint arXiv:2305.10425. 2023
Yan H, Srivastava S, Tai Y, Wang SI, Yih WT, Yao Z. Learning to Simulate Natural Language Feedback for Interactive Semantic Parsing. ACL 2023
Akyürek AF, Akyürek E, Madaan A, Kalyan A, Clark P, Wijaya D, Tandon N. RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023
Jérémy Scheurer, Jon Ander Campos, Tomasz Korbak, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez. Training Language Models with Language Feedback at Scale. arXiv, 2023
Sean Welleck and Ximing Lu and Peter West and Faeze Brahman and Tianxiao Shen and Daniel Khashabi and Yejin Choi. Generating Sequences by Learning to Self-Correct. ICLR, 2023
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, et al. Constitutional AI: Harmlessness from AI Feedback. arXiv preprint arXiv:2212.08073, 2022
Kurt Shuster and Jing Xu and Mojtaba Komeili and Da Ju and Eric Michael Smith and Stephen Roller and Megan Ung and Moya Chen and Kushal Arora and Joshua Lane and Morteza Behrooz and William Ngan and Spencer Poff and Naman Goyal and Arthur Szlam and Y-Lan Boureau and Melanie Kambadur and Jason Weston. BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage. CoRR, abs/2208.03188, 2022
Rongzhi Zhang and Yue Yu and Pranav Shetty and Le Song and Chao Zhang. PRBoost: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning. ACL, 2022
Mina Lee and Megha Srivastava and Amelia Hardy and John Thickstun and Esin Durmus and Ashwin Paranjape and Ines Gerard-Ursin and Xiang Lisa Li and Faisal Ladhak and Frieda Rong and Rose E. Wang and Minae Kwon and Joon Sung Park and Hancheng Cao and Tony Lee and Rishi Bommasani and Michael S. Bernstein and Percy Liang. Evaluating Human-Language Model Interaction. CoRR, abs/2212.09746, 2022
Long Ouyang and Jeff Wu and Xu Jiang and Diogo Almeida and Carroll L. Wainwright and Pamela Mishkin and Chong Zhang and Sandhini Agarwal and Katarina Slama and Alex Ray and John Schulman and Jacob Hilton and Fraser Kelton and Luke Miller and Maddie Simens and Amanda Askell and Peter Welinder and Paul F. Christiano and Jan Leike and Ryan Lowe. Training language models to follow instructions with human feedback. CoRR, abs/2203.02155, 2022
Ge Gao and Eunsol Choi and Yoav Artzi. Simulating Bandit Learning from User Feedback for Extractive Question Answering. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022
William Saunders and Catherine Yeh and Jeff Wu and Steven Bills and Long Ouyang and Jonathan Ward and Jan Leike. Self-critiquing models for assisting human evaluators. CoRR, abs/2206.05802, 2022
Krishna, Ranjay and Lee, Donsuk and Fei-Fei, Li and Bernstein, Michael S. Socially situated artificial intelligence enables learning from human interaction. Proceedings of the National Academy of Sciences, 119, 2022
Jeff Wu and Long Ouyang and Daniel M. Ziegler and Nisan Stiennon and Ryan Lowe and Jan Leike and Paul F. Christiano. Recursively Summarizing Books with Human Feedback. CoRR, abs/2109.10862, 2021
Reiichiro Nakano and Jacob Hilton and Suchir Balaji and Jeff Wu and Long Ouyang and Christina Kim and Christopher Hesse and Shantanu Jain and Vineet Kosaraju and William Saunders and Xu Jiang and Karl Cobbe and Tyna Eloundou and Gretchen Krueger and Kevin Button and Matthew Knight and Benjamin Chess and John Schulman. WebGPT: Browser-assisted question-answering with human feedback. CoRR, abs/2112.09332, 2021
Vania Mendonca and Ricardo Rei and Luisa Coheur and Alberto Sardinha and Ana Lucia Santos. Online Learning Meets Machine Translation Evaluation: Finding the Best Systems with the Least Human Effort. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021
Noriyuki Kojima and Alane Suhr and Yoav Artzi. Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior. Trans. Assoc. Comput. Linguistics, 9, 2021
Ahmed Elgohary and Christopher Meek and Matthew Richardson and Adam Fourney and Gonzalo A. Ramos and Ahmed Hassan Awadallah. NL-EDIT: Correcting Semantic Parse Errors through Natural Language Interaction. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021
Tobias Falke and Patrick Lehnen. Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021
Ahmed Elgohary and Saghar Hosseini and Ahmed Hassan Awadallah. Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020
Liat Ein-Dor and Alon Halfon and Ariel Gera and Eyal Shnarch and Lena Dankin and Leshem Choshen and Marina Danilevsky and Ranit Aharonov and Yoav Katz and Noam Slonim. Active Learning for BERT: An Empirical Study. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020
Jon Ander Campos and Kyunghyun Cho and Arantxa Otegi and Aitor Soroa and Eneko Agirre and Gorka Azkune. Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning. Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020
Natasha Jaques and Judy Hanwen Shen and Asma Ghandeharioun and Craig Ferguson and Agata Lapedriza and Noah Jones and Shixiang Gu and Rosalind W. Picard. Human-centric dialog training via offline reinforcement learning. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020
Nisan Stiennon and Long Ouyang and Jeff Wu and Daniel M. Ziegler and Ryan Lowe and Chelsea Voss and Alec Radford and Dario Amodei and Paul F. Christiano. Learning to summarize from human feedback. CoRR, abs/2009.01325, 2020
Ziyu Yao and Yiqi Tang and Wen-tau Yih and Huan Sun and Yu Su. An Imitation Game for Learning Semantic Parsers from User Interaction. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020
Bernhard Kratzwald and Stefan Feuerriegel and Huan Sun. Learning a Cost-Effective Annotation Policy for Question Answering. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020
Julia Kreutzer and Stefan Riezler. Self-Regulated Interactive Sequence-to-Sequence Learning. Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers
Julia Kreutzer and Shahram Khadivi and Evgeny Matusov and Stefan Riezler. Can Neural Machine Translation be Improved with User Feedback? Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 3 (Industry Papers)
Yang Gao and Christian M. Meyer and Iryna Gurevych. APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018
Julia Kreutzer and Joshua Uyheng and Stefan Riezler. Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers
Carolin Lawrence and Stefan Riezler. Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers
Khanh Nguyen and Hal Daume III and Jordan L. Boyd-Graber. Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017
Artem Sokolov and Julia Kreutzer and Kellen Sunderland and Pavel Danchenko and Witold Szymaniak and Hagen Furstenau and Stefan Riezler. A Shared Task on Bandit Learning for Machine Translation. Proceedings of the Second Conference on Machine Translation, WMT 2017, Copenhagen, Denmark, September 7-8, 2017
Carolin Lawrence and Artem Sokolov and Stefan Riezler. Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017
Artem Sokolov and Julia Kreutzer and Christopher Lo and Stefan Riezler. Learning Structured Predictors from Bandit Feedback for Interactive NLP. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers
Volodymyr Mnih and Koray Kavukcuoglu and David Silver and Andrei A. Rusu and Joel Veness and Marc G. Bellemare and Alex Graves and Martin A. Riedmiller and Andreas Fidjeland and Georg Ostrovski and Stig Petersen and Charles Beattie and Amir Sadik and Ioannis Antonoglou and Helen King and Dharshan Kumaran and Daan Wierstra and Shane Legg and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518, 2015

