https://github.com/andstor/bibliography/wiki/Home.md
Codex
Evaluating Large Language Models Trained on Code, Chen M. et al. (2021).
- (survey) Reinforcement learning: A survey, Kaelbling L. et al. (1996).
PPO
Proximal Policy Optimization Algorithms, Schulman J. et al. (2017).
InstructGPT
Training language models to follow instructions with human feedback, Ouyang L. et al. (2022).
- (survey) Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing, Liu P. et al. (2021).
Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators