Code Repository for Code-COT: Does Adapting Chain-of-Thought Help with Code Generation? | COMP-550 Project (Fall 2023)
- Installation:
$ pip install -r requirements.txt
- Evaluation for CodeLlama and GPT-3.5 models:

HumanEval:
2.1 Generate the test results with the model of your choice (CodeLlama 7B or 34B Instruct variants):
$ python eval_codellama_humaneval.py --model_name codellama/CodeLlama-7b-Instruct-hf --length 100
or, to evaluate GPT-3.5:
$ python eval_gpt_humaneval.py
2.2 Evaluate the generated results. To evaluate a number of problems other than 100, change the length on line 57 of `human-eval/human_eval/evaluation.py` to match the length set above. (A programmatic alternative is sketched after step 2.3.)
$ evaluate_functional_correctness results/codellama/humaneval_CodeLlama-7b-Instruct-hf_100.jsonl
or, for GPT-3.5:
$ evaluate_functional_correctness results/openai/gpt3.5-turbo_100.jsonl
2.3 Change prompt: different prompts are provided in `eval_codellama_humaneval.py` (more details under File info below).
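For reference, `evaluate_functional_correctness` from the bundled human-eval package can also be invoked programmatically. A minimal sketch (the chosen k values must not exceed the number of samples per problem in the .jsonl file):

```python
from human_eval.evaluation import evaluate_functional_correctness

# Computes pass@k over the generated samples; with one sample per problem,
# only pass@1 is meaningful.
results = evaluate_functional_correctness(
    "results/codellama/humaneval_CodeLlama-7b-Instruct-hf_100.jsonl",
    k=[1],
)
print(results)  # e.g. {'pass@1': ...}
```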
MBPP:
3.1 Generate:
$ python eval_codellama_mbpp.py --model_name codellama/CodeLlama-7b-Instruct-hf --length 100
or, to evaluate GPT-3.5:
$ python eval_gpt_mbpp.py
3.2 Evaluate: you may need to change the model's output directory on line 254 of `eval_mbpp.py` to match your choice of model.
$ python eval_mbpp.py
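For reference, the generation steps (2.1/3.1) query a CodeLlama Instruct checkpoint through transformers. A minimal sketch of such a call, with illustrative prompt and generation settings rather than the repository's exact code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

# CodeLlama-Instruct expects the [INST] ... [/INST] chat wrapping.
prompt = "[INST] Write a Python function that returns the n-th Fibonacci number. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```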
- File info:
- `eval_gpt_mbpp.py`: includes several functions, such as `wrap_code_template`, `wrap_code_template_baseline`, `wrap_with_steps`, `one_shot_pseudocode`, `one_shot_steps`, and `zero_shot_pseudocode`, to construct prompts for the different settings (one-shot, with and without steps, and pseudocode). We also include some other variations we tried with GPT.
- `eval_gpt_humaneval.py`: includes similar functions for testing GPT on HumanEval.
- `eval_codellama_humaneval.py`: includes similar functions (`construct_codellama_prompt`, `construct_codellama_prompt_v2`, `construct_codellama_prompt_oneshot_examples`, `construct_codellama_comment_prompt_one_shot_psuedocode`, and `construct_codellama_comment_prompt_one_shot`) that create prompts for the baseline, zero-shot steps/pseudocode, and one-shot steps/pseudocode evaluations.
- `eval_codellama_mbpp.py`: also contains prompt functions such as `construct_codellama_prompt`, `construct_codellama_pseudo_prompt`, `construct_codellama_pseudo_prompt_example`, and `construct_codellama_prompt_steps`. (A schematic sketch of this prompt-wrapping pattern follows this list.)
- `humaneval_steps_magicoder.json`: the step-by-step examples (solving process) used in the one-shot steps prompt for HumanEval, generated by Magicoder [1].
- `mbpp_examples_magicoder_reform_v1.json`: the step-by-step examples used in the one-shot steps prompt for MBPP, also generated by Magicoder [1].
- `humaneval_actualpsuedocode_magicoder_reform_v1.json`: the collected set of pseudocode generated by Magicoder for the HumanEval dataset. Each pseudocode was generated using the prompt specified in the paper.
- `mbpp_actualpsuedocode_magicoder_reform_v1.json`: the collected set of pseudocode generated by Magicoder for the MBPP dataset, likewise generated using the prompt specified in the paper.
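Schematically, the prompt-construction functions listed above wrap a task description into an instruction prompt for a given setting, with the one-shot variants prepending a worked example from the JSON files. The sketch below is a hypothetical illustration: the function name, the templates, and the assumed JSON schema (task id mapped to a worked example) are ours, not the repository's exact code.

```python
import json
from typing import Optional

def construct_prompt_sketch(problem: str, mode: str = "baseline",
                            one_shot_file: Optional[str] = None) -> str:
    """Hypothetical sketch of the prompt-wrapping pattern (not the repo's code)."""
    if mode == "steps":
        instruction = ("First write out the steps needed to solve the task, "
                       "then write the Python code.")
    elif mode == "pseudocode":
        instruction = ("First write pseudocode for the task, "
                       "then translate it into Python code.")
    else:  # baseline: ask for the code directly
        instruction = "Write the Python code that solves the task."

    prefix = ""
    if one_shot_file is not None:
        # Assumed schema: a JSON object mapping task ids to worked examples.
        with open(one_shot_file) as f:
            examples = json.load(f)
        example = next(iter(examples.values()))
        prefix = f"Here is a worked example:\n{example}\n\n"

    # CodeLlama-Instruct chat wrapping; the GPT scripts send plain messages instead.
    return f"[INST] {prefix}{problem}\n{instruction} [/INST]"
```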
- Misc
- To run certain scripts you need OpenAI and Hugging Face tokens to call the required APIs. The OpenAI key can be exported with export OPENAI_API_KEY=<OPENAIKEY>. The Hugging Face token can be set via the TOKEN variable defined at the beginning of the codellama files that use it; a sketch follows at the end of this section.
- The above evaluation was performed on an 80GB A100 GPU with 128GB RAM and 24 CPUs. We acknowledge Mila for supporting us with the compute resources.
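A minimal sketch of how the two tokens are consumed; the variable names and the `token` argument passed to transformers reflect our assumptions, not the scripts' exact code:

```python
import os
from transformers import AutoTokenizer

# The GPT scripts read the key exported via: export OPENAI_API_KEY=<OPENAIKEY>
openai_key = os.environ["OPENAI_API_KEY"]

# The codellama scripts define the Hugging Face token near the top of the file
# (illustrative placeholder below; set it to your own token).
TOKEN = "<HUGGINGFACE_TOKEN>"
tokenizer = AutoTokenizer.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf", token=TOKEN
)
```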