Code Repository for Code-COT: Does Adapting Chain-of-Thought Help with Code Generation? | COMP-550 Project (Fall 2023)
- Installation:
$ pip install -r requirements.txt
- Evaluation for CodeLlama and GPT-3.5 models:

HumanEval:
2.1 Generate the test results with the model of your choice (CodeLlama 7B or 34B Instruct variants):
$ python eval_codellama_humaneval.py --model_name codellama/CodeLlama-7b-Instruct-hf --length 100
or, to evaluate GPT-3.5:
$ python eval_gpt_humaneval.py
2.2 Evaluate the generated results. To evaluate a number of problems other than 100, change the length on line 57 of `human-eval/human_eval/evaluation.py` to match the length set above. (A programmatic alternative is sketched after step 2.3.)
$ evaluate_functional_correctness results/codellama/humaneval_CodeLlama-7b-Instruct-hf_100.jsonl
or, for GPT-3.5:
$ evaluate_functional_correctness results/openai/gpt3.5-turbo_100.jsonl
2.3 Change prompt: different prompts are provided in `eval_codellama_humaneval.py` (more details under File info below).
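For reference, `evaluate_functional_correctness` from the bundled human-eval package can also be invoked programmatically. A minimal sketch (the chosen k values must not exceed the number of samples per problem in the .jsonl file):

```python
from human_eval.evaluation import evaluate_functional_correctness

# Computes pass@k over the generated samples; with one sample per problem,
# only pass@1 is meaningful.
results = evaluate_functional_correctness(
    "results/codellama/humaneval_CodeLlama-7b-Instruct-hf_100.jsonl",
    k=[1],
)
print(results)  # e.g. {'pass@1': ...}
```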
MBPP:
3.1 Generate:
$ python eval_codellama_mbpp.py --model_name codellama/CodeLlama-7b-Instruct-hf --length 100
or, to evaluate GPT-3.5:
$ python eval_gpt_mbpp.py
3.2 Evaluate: you may need to change the model's output directory on line 254 of `eval_mbpp.py` to match your choice of model.
$ python eval_mbpp.py
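For reference, the generation steps (2.1/3.1) query a CodeLlama Instruct checkpoint through transformers. A minimal sketch of such a call, with illustrative prompt and generation settings rather than the repository's exact code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# CodeLlama-Instruct expects the [INST] ... [/INST] chat wrapping.
prompt = "[INST] Write a Python function that returns the n-th Fibonacci number. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```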
- File info:
- `eval_gpt_mbpp.py`: includes several functions, such as `wrap_code_template`, `wrap_code_template_baseline`, `wrap_with_steps`, `one_shot_pseudocode`, `one_shot_steps`, and `zero_shot_pseudocode`, to construct prompts for the different settings (one-shot, with and without steps, and pseudocode). We also include some other variations we tried with GPT.
- `eval_gpt_humaneval.py`: includes similar functions for testing GPT on HumanEval.
- `eval_codellama_humaneval.py`: includes similar functions (`construct_codellama_prompt`, `construct_codellama_prompt_v2`, `construct_codellama_prompt_oneshot_examples`, `construct_codellama_comment_prompt_one_shot_psuedocode`, and `construct_codellama_comment_prompt_one_shot`) that create prompts for the baseline, zero-shot steps/pseudocode, and one-shot steps/pseudocode evaluations.
- `eval_codellama_mbpp.py`: also contains prompt functions such as `construct_codellama_prompt`, `construct_codellama_pseudo_prompt`, `construct_codellama_pseudo_prompt_example`, and `construct_codellama_prompt_steps`. (A schematic sketch of this prompt-wrapping pattern follows this list.)
- `humaneval_steps_magicoder.json`: the step-by-step examples (solving process) used in the one-shot steps prompt for HumanEval, generated by Magicoder [1].
- `mbpp_examples_magicoder_reform_v1.json`: the step-by-step examples used in the one-shot steps prompt for MBPP, also generated by Magicoder [1].
- `humaneval_actualpsuedocode_magicoder_reform_v1.json`: the collected set of pseudocode generated by Magicoder for the HumanEval dataset. Each pseudocode was generated using the prompt specified in the paper.
- `mbpp_actualpsuedocode_magicoder_reform_v1.json`: the collected set of pseudocode generated by Magicoder for the MBPP dataset, likewise generated using the prompt specified in the paper.
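Schematically, the prompt-construction functions listed above wrap a task description into an instruction prompt for a given setting, with the one-shot variants prepending a worked example from the JSON files. The sketch below is a hypothetical illustration: the function name, the templates, and the assumed JSON schema (task id mapped to a worked example) are ours, not the repository's exact code.

```python
import json
from typing import Optional

def construct_prompt_sketch(problem: str, mode: str = "baseline",
                            one_shot_file: Optional[str] = None) -> str:
    """Hypothetical sketch of the prompt-wrapping pattern (not the repo's code)."""
    if mode == "steps":
        instruction = ("First write out the steps needed to solve the task, "
                       "then write the Python code.")
    elif mode == "pseudocode":
        instruction = ("First write pseudocode for the task, "
                       "then translate it into Python code.")
    else:  # baseline: ask for the code directly
        instruction = "Write the Python code that solves the task."

    prefix = ""
    if one_shot_file is not None:
        # Assumed schema: a JSON object mapping task ids to worked examples.
        with open(one_shot_file) as f:
            examples = json.load(f)
        example = next(iter(examples.values()))
        prefix = f"Here is a worked example:\n{example}\n\n"

    # CodeLlama-Instruct chat wrapping; the GPT scripts send plain messages instead.
    return f"[INST] {prefix}{problem}\n{instruction} [/INST]"
```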
- Misc
- To run certain scripts you need OpenAI and Hugging Face tokens to call the required APIs. The OpenAI key can be exported with export OPENAI_API_KEY=<OPENAIKEY>. The Hugging Face token can be set via the TOKEN variable defined at the beginning of the codellama files that use it; a sketch follows at the end of this section.
- The above evaluation was performed on an 80GB A100 GPU with 128GB RAM and 24 CPUs. We acknowledge Mila for supporting us with the compute resources.
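A minimal sketch of how the two tokens are consumed; the variable names and the `token` argument passed to transformers reflect our assumptions, not the scripts' exact code:

```python
import os
from transformers import AutoTokenizer

# The GPT scripts read the key exported via: export OPENAI_API_KEY=<OPENAIKEY>
openai_key = os.environ["OPENAI_API_KEY"]

# The codellama scripts define the Hugging Face token near the top of the file
# (illustrative placeholder below; set it to your own token).
TOKEN = "<HUGGINGFACE_TOKEN>"
tokenizer = AutoTokenizer.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf", token=TOKEN
)
```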