zjulgc / llmpeft4apr

LLM PEFT for APR

Dependencies

Python

  • Python 3.9.17
  • PyTorch 2.0.1
  • Hugging Face transformers 4.35.2
  • wandb
  • peft 0.6.2
  • accelerate 0.24.1
  • datasets 2.13.0
  • trl
  • fire
  • nvitop

Others

  • Java 8
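For convenience, the Python dependencies above can be collected into a requirements fragment. This is a sketch assembled from the list (spelling of the `peft` package normalized); entries without a version in the list are left unpinned:

```
torch==2.0.1
transformers==4.35.2
wandb
peft==0.6.2
accelerate==0.24.1
datasets==2.13.0
trl
fire
nvitop
```

Java 8 is still needed separately on the system for benchmark validation (e.g. Defects4J).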

Content

The file structure of the artifact is as follows:

APR-INSTRUCTION_construct:

  • contains the source code for constructing APR-INSTRUCTION, based on an existing APR dataset [1]

codellama_7b_hf:

  • output: PEFT weights produced by the different PEFT methods (LoRA, P-Tuning, prefix tuning, $(IA)^3$) and by Full-Model Fine-Tuning

  • results: patches generated on the benchmarks (HumanEval-Java, Defects4J, and QuixBugs) by codellama-7b-hf and by codellama-7b-hf with the PEFT weights, plus the validation results of the generated patches

codellama_13b_hf:

  • output: PEFT weights produced by the different PEFT methods (LoRA, P-Tuning, prefix tuning, $(IA)^3$)
  • results: patches generated on the benchmarks (HumanEval-Java, Defects4J, and QuixBugs) by codellama-13b-hf and by codellama-13b-hf with the PEFT weights, plus the validation results of the generated patches

deepseek_coder_6.7b:

  • output: PEFT weights produced by the different PEFT methods (LoRA, P-Tuning, prefix tuning, $(IA)^3$)
  • results: patches generated on the benchmarks (HumanEval-Java, Defects4J, and QuixBugs) by Deepseek-Coder Base 6.7B and by Deepseek-Coder Base 6.7B with the PEFT weights, plus the validation results of the generated patches

llama2_7b_hf:

  • output: PEFT weights produced by the different PEFT methods (LoRA, P-Tuning, prefix tuning, $(IA)^3$)
  • results: patches generated on the benchmarks (HumanEval-Java, Defects4J, and QuixBugs) by Llama-2-7b-hf and by Llama-2-7b-hf with the PEFT weights, plus the validation results of the generated patches

instruction_tuning_dataset:

  • Instruction datasets used in this paper:
    • apr_instruction_30k.json: the APR instruction dataset constructed in this paper
    • oss_instrcution_30k.json: a random 30k selection from the OSS-Instruction dataset
    • code_alpaca_20k.json: the Code Alpaca instruction dataset
    • The remaining data is used in RQ3 to explore the impact of training-data size on performance; it is partitioned into 10k, 15k, 20k, and 25k subsets
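Since `code_alpaca_20k.json` follows the Alpaca layout, the instruction files can plausibly be assumed to use Alpaca-style records (`instruction` / `input` / `output` fields) — the exact schema is not documented here, so treat this as an assumption. A minimal sketch of turning one such record into a training prompt (`render_prompt` and the sample record are hypothetical, not code from this repository):

```python
import json

def render_prompt(record: dict) -> str:
    """Flatten one Alpaca-style record (assumed fields: instruction,
    input, output) into a single prompt string."""
    prompt = f"### Instruction:\n{record['instruction']}\n"
    if record.get("input"):  # the input field may be empty
        prompt += f"### Input:\n{record['input']}\n"
    prompt += f"### Response:\n{record['output']}"
    return prompt

# Hypothetical record illustrating what apr_instruction_30k.json may contain:
sample = {
    "instruction": "Fix the bug in the following Java method.",
    "input": "public int add(int a, int b) { return a - b; }",
    "output": "public int add(int a, int b) { return a + b; }",
}
print(render_prompt(sample))

# Loading a real file would then look like:
# with open("instruction_tuning_dataset/apr_instruction_30k.json") as f:
#     records = json.load(f)
```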

inference_and_validation_src:

  • This directory contains the source code used for patch generation and validation of the LLMs

    | file name | description |
    | --- | --- |
    | defects4j_patch_validate.py | patch generation and validation on the Defects4J benchmark |
    | humaneval_patch_validate.py | patch generation and validation on the HumanEval-Java benchmark |
    | quixbugs_patch_validate.py | patch generation and validation on the QuixBugs benchmark |
    | peft_patch_validation.py | entry point for validating models with PEFT methods; dispatches to the benchmark-specific scripts |
    | fmft_generate_patch.py | entry point for validating models with Full-Model Fine-Tuning; dispatches to the benchmark-specific scripts |
    | generate_patch_infill.py | entry point for validating CodeLlama 7B without fine-tuning, using infill templates; dispatches to the benchmark-specific scripts |
    | prompter.py | converts benchmark instances into instructions |
    | result_look.py | records $pass@k$ for each validation run |
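The $pass@k$ metric recorded by `result_look.py` is commonly computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021): given $n$ generated patches of which $c$ pass validation, $pass@k = 1 - \binom{n-c}{k}/\binom{n}{k}$. A self-contained sketch of that estimator — not the repository's actual implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn (without replacement) from n generated patches is correct,
    given that c of the n patches are correct."""
    if n - c < k:
        # fewer incorrect patches than k: any k-sample contains a correct one
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 patches generated, 2 correct, sampling k=1
print(pass_at_k(n=10, c=2, k=1))  # → 0.2
```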

inference_scripts:

  • This directory contains the bash scripts used for patch generation and validation of the LLMs

  • each script is named `<model name>_<fine-tuning method>_<instruction dataset used for fine-tuning>_validation.sh`

train_scripts:

  • This directory contains the bash scripts used for training the LLMs
  • each script is named `<model name>_instruction_<fine-tuning method>_<hyper-parameters (optional)>_train_<instruction dataset used for fine-tuning>_validation.sh`

train_src:

  • This directory contains the source code used for LLM training

    | file name | description |
    | --- | --- |
    | sfttrain_peft.py | training code for the PEFT methods |
    | sfttrain_ft.py | training code for Full-Model Fine-Tuning |
    | prompter.py | adds an additional prompt to each instruction |

results_hyper_parameters:

  • This directory contains the results of patch generation and validation for the RQ3 experiments

NOTICE

  • Because the fine-tuned weights are very large, we do not upload them to GitHub for now
  • In view of the anonymous review, we will release the weights after the review

Cites

[1] Zhu, Qihao, et al. "A Syntax-Guided Edit Decoder for Neural Program Repair." Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2021.
