Can Language Models Teach Themselves to Prove Better?

Setup (for Ubuntu 22.04.1 LTS)

Install the dependencies for pycoq.
Use conda to create the "coq" and "train" environments from the yml files. The "coq" environment is for interacting with Coq and testing the model; the "train" environment is for training the model.
Gather theorem files from Coq and for the test set. The test set is not provided because it contains class material and assignments.
Run coq_parser.py to extract theorems from the *.v files.
Run test_proof.py to filter out theorems that don't compile.
Fill in your OpenAI API keys in codex.py.
Run training_scripts/splits.py to create a train/validation/test split of the training data.
Run train.sh to train the model. For subsequent training sessions, fill in the checkpoint path for the model_name_or_path parameter in the training script and for the model_path variable in gpt_neo.py.
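
For readers unfamiliar with the train/validation/test split step above, the following is a minimal sketch of one way such a split could be done. It is not the contents of training_scripts/splits.py; the function name split_theorems, the 80/10/10 proportions, and the fixed seed are all assumptions made for illustration.

```python
import random


def split_theorems(theorems, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items and partition them into train/validation/test lists.

    Hypothetical helper: the fractions and seed are illustrative defaults,
    not values taken from this repository.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")

    items = list(theorems)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(items)

    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)

    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test


if __name__ == "__main__":
    # Placeholder theorem names stand in for the extracted theorems.
    names = [f"thm_{i}" for i in range(100)]
    train, val, test = split_theorems(names)
    print(len(train), len(val), len(test))
```

Shuffling with a fixed seed before slicing keeps the three sets disjoint and lets the same split be regenerated later, which matters when a checkpoint is resumed across several training sessions.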

Project for CS 386L: Programming Languages - Can Language Models Teach Themselves to Prove Better?
