Megh-Thakkar / Code-CoT-COMP-550

Code Repository for Code-COT: Does Adapting Chain-of-Thought Help with Code Generation? | COMP-550 Project (Fall 2023)

  1. Installation:
$ pip install -r requirements.txt
  2. Evaluation for CodeLlama and GPT-3.5 models:

    HumanEval

    2.1 Generate: produce the test results with the model of your choice (CodeLlama 7B or 34B Instruct variants):

    $ python eval_codellama_humaneval.py --model_name codellama/CodeLlama-7b-Instruct-hf --length 100

    Or, if you plan to evaluate GPT-3.5:

    $ python eval_gpt_humaneval.py 

    2.2 Evaluate: run the functional-correctness checker on the generated results. To evaluate a number of problems other than 100, change the length on line 57 of human-eval/human_eval/evaluation.py to match the length set above (an illustrative sketch of this change follows the commands below).

    $ evaluate_functional_correctness results/codellama/humaneval_CodeLlama-7b-Instruct-hf_100.jsonl

    For GPT-3.5:

    $ evaluate_functional_correctness results/openai/gpt3.5-turbo_100.jsonl
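
    The length change mentioned in step 2.2 would look roughly like the snippet below. This is an illustrative sketch only: the exact code on line 57 of this repo's modified copy of human-eval/human_eval/evaluation.py may differ, and LENGTH is a placeholder name.

    # human-eval/human_eval/evaluation.py, around line 57 (illustrative sketch)
    LENGTH = 100  # must match the --length value used during generation
    problems = read_problems()
    # keep only the first LENGTH problems so the checker lines up with the
    # truncated results file (e.g. humaneval_CodeLlama-7b-Instruct-hf_100.jsonl)
    problems = dict(list(problems.items())[:LENGTH])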

    2.3 Change prompt: different prompt variants are provided in eval_codellama_humaneval.py (more details in the file info below).

    MBPP

    3.1 Generate:

    $ python eval_codellama_mbpp.py --model_name codellama/CodeLlama-7b-Instruct-hf --length 100

    Or, if you plan to evaluate GPT-3.5:

    $ python eval_gpt_mbpp.py 

    3.2 Evaluate: you may need to change the model output directory on line 254 of eval_mbpp.py to match your choice of model.

    $ python eval_mbpp.py
  3. File info:

    1. eval_gpt_mbpp.py: includes several functions such as wrap_code_template, wrap_code_template_baseline, wrap_with_steps, one_shot_pseudocode, one_shot_steps, and zero_shot_pseudocode to construct prompts for the different settings (one-shot, with and without steps, and pseudocode), along with some other variations we tried with GPT (a hedged sketch of such a prompt wrapper appears after this list).
    2. eval_gpt_humaneval.py: includes similar functions to the above, for testing GPT on HumanEval.
    3. eval_codellama_humaneval.py: includes similar functions, such as construct_codellama_prompt, construct_codellama_prompt_v2, construct_codellama_prompt_oneshot_examples, construct_codellama_comment_prompt_one_shot_psuedocode, and construct_codellama_comment_prompt_one_shot, which build prompts for the baseline, zero-shot steps/pseudocode, and one-shot steps/pseudocode evaluations.
    4. eval_codellama_mbpp.py: also contains similar prompt functions, including construct_codellama_prompt, construct_codellama_pseudo_prompt, construct_codellama_pseudo_prompt_example, and construct_codellama_prompt_steps.
    5. humaneval_steps_magicoder.json contains the step-by-step examples (solving processes) used in the one-shot steps prompt for HumanEval. These were generated by Magicoder.
    6. mbpp_examples_magicoder_reform_v1.json contains the step-by-step examples (solving processes) used in the one-shot steps prompt for MBPP. These were also generated by Magicoder.
    7. humaneval_actualpsuedocode_magicoder_reform_v1.json is the collected set of pseudocode generated by Magicoder for the HumanEval dataset. Each pseudocode sample was generated using the prompt specified in the paper.
    8. mbpp_actualpsuedocode_magicoder_reform_v1.json is the collected set of pseudocode generated by Magicoder for the MBPP dataset. Each pseudocode sample was generated using the prompt specified in the paper.
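
    To give a concrete feel for how these prompt helpers and JSON files fit together, here is a minimal sketch of building a one-shot steps prompt for a CodeLlama-Instruct model. The function name build_one_shot_steps_prompt and the assumed structure of humaneval_steps_magicoder.json (a task-id to worked-example mapping) are illustrative assumptions; the actual implementations are the functions listed above.

    import json

    # Illustrative sketch only; the real prompt builders live in
    # eval_codellama_humaneval.py and may differ in wording and structure.
    with open("humaneval_steps_magicoder.json") as f:
        # assumed format: task_id -> {"problem": ..., "steps": ..., "solution": ...}
        one_shot_examples = json.load(f)

    def build_one_shot_steps_prompt(example: dict, target_problem: str) -> str:
        """Wrap the target problem with one worked example (problem, steps, code)
        in the [INST] ... [/INST] format expected by CodeLlama-Instruct."""
        return (
            "[INST] Solve the problem below. First write out the solving steps, "
            "then give the final Python solution.\n\n"
            f"Example problem:\n{example['problem']}\n\n"
            f"Steps:\n{example['steps']}\n\n"
            f"Solution:\n{example['solution']}\n\n"
            f"Problem:\n{target_problem}\n[/INST]"
        )

    The baseline and pseudocode settings differ only in whether the steps block is omitted or replaced by Magicoder-generated pseudocode from the *_actualpsuedocode_* files.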
  4. Misc

    • To run certain scripts you need an OpenAI and a Hugging Face token in order to call the required APIs. The OpenAI token can be exported with export OPENAI_API_KEY=<OPENAIKEY>. The Hugging Face token (needed for the CodeLlama scripts) is set via the variable defined at the beginning of each file that uses it (see the sketch after this list).
    • The above evaluation was performed on an 80GB A100 GPU with 128GB RAM and 24 CPUs. We acknowledge Mila for supporting us with the compute resources.
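
    As a quick reference, the two tokens can be supplied as follows. HF_TOKEN here is a placeholder name; check the top of the eval_codellama_* files for the exact variable they define.

    import os

    # OpenAI key for the GPT-3.5 scripts (equivalent to `export OPENAI_API_KEY=...`
    # in the shell); the value is a placeholder.
    os.environ["OPENAI_API_KEY"] = "<OPENAIKEY>"

    # Hugging Face token used to download the gated CodeLlama weights; the actual
    # variable name is whatever is defined at the top of the CodeLlama scripts.
    HF_TOKEN = "<HUGGINGFACEKEY>"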
