SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair
SequenceR is a seq2seq model designed to predict bug fixes at the line level. The paper (doi:10.1109/TSE.2019.2940179) explains the approach.
If you use SequenceR for academic purposes, please cite the following publication:
@article{chen2018sequencer,
title={SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair},
author={Chen, Zimin and Kommrusch, Steve and Tufano, Michele and Pouchet, Louis-No{\"e}l and Poshyvanyk, Denys and Monperrus, Martin},
journal={IEEE Transactions on Software Engineering},
year={2019}
}
Usage
Docker
Run the following two commands to build the image and start a container with the SequenceR Golden model:
docker build --tag=sequencer .
docker run -it sequencer
All dependencies (including defects4j) are now installed.
Alternatively, use our version from Docker Hub.
Without docker
Install dependencies
First run src/setup_env.sh to set up the environment and to clone and compile the project. See src/setup_env.sh for more details.
All models are versioned using git-lfs; make sure to configure it and fetch the models correctly before using them.
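If git-lfs was not configured when the repository was cloned, the model files are small text pointer stubs instead of the real models. The sketch below is one way to detect that case. The helper name is hypothetical and not part of SequenceR; it relies only on the fact that git-lfs pointer files begin with a `version https://git-lfs.github.com/spec/...` line.

```shell
# Hypothetical helper (not part of SequenceR): succeed if the given file is
# still a git-lfs pointer stub rather than the real model file.
is_lfs_pointer() {
  # Read only the first bytes; drop NUL bytes in case the file is binary.
  prefix=$(head -c 64 "$1" 2>/dev/null | tr -d '\000')
  case $prefix in
    "version https://git-lfs.github.com/spec/"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Typical fetch sequence, run from the repository root:
#   git lfs install   # enable the git-lfs hooks (once per user account)
#   git lfs pull      # replace pointer stubs with the real files
```

If `is_lfs_pointer` succeeds on a model file, run the two git-lfs commands above before calling any SequenceR script.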
Execution
Then run src/sequencer-predict.sh
with the following parameters:
./sequencer-predict.sh --model=[model path] --buggy_file=[abs path] --buggy_line=[int] --beam_size=[int] --output=[abs path]
- --model: Absolute path to the model
- --buggy_file: Absolute path to buggy file
- --buggy_line: Line number of the bug, or of the line you want changed.
- --beam_size: Beam size for prediction
- --output: Output directory to store the generated patches
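As an illustration of the parameters above, the following sketch checks the arguments and prints the resulting command line without running it. The helper name and every path in the usage comment are hypothetical; only the flag names come from the list above.

```shell
# Hypothetical helper: validate the arguments, then print the
# sequencer-predict.sh command line. It does not run the prediction.
build_predict_cmd() {
  model="$1"; buggy_file="$2"; buggy_line="$3"; beam_size="$4"; output="$5"
  # The model, the buggy file, and the output directory are given as absolute paths.
  for p in "$model" "$buggy_file" "$output"; do
    case $p in
      /*) ;;
      *) echo "not an absolute path: $p" >&2; return 1 ;;
    esac
  done
  # Line number and beam size must consist of digits only (a bare 0 is rejected).
  for n in "$buggy_line" "$beam_size"; do
    case $n in
      ''|*[!0-9]*|0) echo "not a positive integer: $n" >&2; return 1 ;;
    esac
  done
  printf './sequencer-predict.sh --model=%s --buggy_file=%s --buggy_line=%s --beam_size=%s --output=%s\n' \
    "$model" "$buggy_file" "$buggy_line" "$beam_size" "$output"
}

# Usage with placeholder paths (replace them with your own):
#   build_predict_cmd /abs/path/to/model /abs/path/to/Foo.java 42 50 /abs/path/to/patches
```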
Experiments
CodRep experiment
The training data consists of results/Golden/src-train.txt and results/Golden/tgt-train.txt (line-to-line correspondence).
The CodRep4 testing data consists of results/Golden/src-test.txt and results/Golden/tgt-test.txt (line-to-line correspondence).
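Line-to-line correspondence means that line N of a src file pairs with line N of the matching tgt file, so both files must have the same number of lines. A minimal sanity check, assuming it is run from the repository root (the helper name is hypothetical):

```shell
# Hypothetical helper: compare the line counts of a source/target pair.
check_parallel() {
  if [ ! -r "$1" ] || [ ! -r "$2" ]; then
    echo "cannot read $1 or $2" >&2
    return 2
  fi
  # $(( ... )) strips the padding some wc implementations add.
  src_lines=$(($(wc -l < "$1")))
  tgt_lines=$(($(wc -l < "$2")))
  if [ "$src_lines" -eq "$tgt_lines" ]; then
    echo "OK: $src_lines aligned lines"
  else
    echo "MISMATCH: $1 has $src_lines lines, $2 has $tgt_lines" >&2
    return 1
  fi
}

# Usage (paths taken from the text above):
#   check_parallel results/Golden/src-train.txt results/Golden/tgt-train.txt
#   check_parallel results/Golden/src-test.txt  results/Golden/tgt-test.txt
```

Equal line counts are necessary but not sufficient: the check does not verify that each pair actually belongs together.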
Defects4J experiment
In results/Defects4J_patches you can find all patches found by SequenceR. Patches stored in *_compiled are patches that compiled. Patches stored in *_passed are patches that compiled and passed the test suite. Patches stored in *_correct are patches that compiled, passed the test suite, and are equivalent to the human patch.
To rerun our SequenceR experiment on Defects4J, run src/Defects4J_Experiment/Defects4J_experiment.sh. Make sure you have defects4j installed.
Defects4J_oneLiner_metadata.csv contains metadata for all Defects4J bugs that we consider. src/Defects4J_Experiment/validatePatch.py contains the procedure for running the Defects4J tests; it enforces time limits on compilation (60s) and on test execution (300s).
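To illustrate the two time limits, here is a shell sketch that uses GNU `timeout`. It is not the code in validatePatch.py: the function names are hypothetical, and the sketch only enforces the limits without interpreting the test results that defects4j prints.

```shell
# Hypothetical helper: run a command under a wall-clock limit in seconds.
# GNU timeout exits with status 124 when the limit is reached.
run_with_limit() {
  limit="$1"; shift
  timeout "$limit" "$@"
}

# Sketch of validating one patched Defects4J checkout in directory $1.
# Returns 1 if compilation fails or exceeds 60s, 2 if the test run
# fails to complete within 300s.
validate_checkout() {
  work_dir="$1"
  run_with_limit 60 defects4j compile -w "$work_dir" || return 1
  run_with_limit 300 defects4j test -w "$work_dir" || return 2
}
```

A patch that exceeds either limit is treated here like one that fails the step, which matches the limits described above but is otherwise an assumption of this sketch.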
Model creation, training and use:
Prerequisites
SequenceR uses the OpenNMT library to set up program repair as a translation from buggy code to fixed code. Documentation on OpenNMT including parameter setup is at http://opennmt.net/OpenNMT-py/
Setup
Choose a directory and:
git clone https://github.com/OpenNMT/OpenNMT-py
When testing a new configuration, copy a working data directory and modify *sh files as desired.
Set up environment variables:
export CUDA_VISIBLE_DEVICES=0
export THC_CACHING_ALLOCATOR=0
export OpenNMT_py=.../OpenNMT-py
export data_path=.../results/Golden # Or a new directory path as desired
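Because the paths above are elided and depend on your setup, it can help to confirm that both path variables are set and point to existing directories before training or testing. A small sketch (the helper name is hypothetical):

```shell
# Hypothetical helper: check that OpenNMT_py and data_path are set
# and that each one names an existing directory.
check_env() {
  status=0
  for name in OpenNMT_py data_path; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name does not point to a directory: $value" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_env && echo "environment looks usable"
```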
Train
For details on model training, refer to OpenNMT documentation. To run SequenceR training:
cd src
./sequencer-train.sh
Test
For details on model usage (translation), refer to OpenNMT documentation. To run SequenceR testing:
cd src
./sequencer-test.sh
License
The code and data in this repository are under the MIT license.