DEPTest is an automated dataset purification tool that leverages software testing techniques (i.e., coverage analysis and delta debugging). It automatically identifies and filters out code changes that are irrelevant to, but tangled with, the real bug fix in the human-written patches of existing datasets or in bug-fixing commits.
An unsolved but important problem is the purification (i.e., filtering out the code changes irrelevant to the real bug fix) of human-written patches when constructing a dataset of real-world bugs.
Datasets with inaccurate human-written patches introduce noise and bias into research that uses human-written patches as the ground truth (e.g., patch correctness assessment in automated program repair (APR)). Therefore, previous studies strongly suggest that such purification is mandatory.
However, only a few datasets (e.g., Defects4J) have been purified, due to the high time cost of manual purification. Manual purification also requires detailed project knowledge and an understanding of the intention behind every code change in the bug-fixing commit, which makes it a challenging task. An automated purification technique is therefore desirable to alleviate the burden of manual purification and improve the accuracy of human-written patches.
With this motivation, we propose an automated technique named DEPTest for purifying datasets that can be used by test-suite-based APR. That is, DEPTest currently serves the scenario where the buggy project, the human-written patch, and the bug-triggering tests are available. We ran DEPTest on Defects4J and obtained some interesting findings. All the relevant artifacts are available in this repository.
- JDK 1.8
- Maven
- Defects4J v1.4.0
# to get purify-0.0.1-SNAPSHOT-jar-with-dependencies.jar
mvn clean package -DskipTests
# run an example
cd example
unzip Defects4J_Time_2.zip
cd ../src/test/resources/
./defects4j_time_2.sh
# check output (purified patch)
cd -
cat output/purify/purifiedPatch.diff
In this way, you can obtain the purified patch of Time 2 in Defects4J.
DEPTest can be executed on RepairThemAll. That is, little effort is needed to run DEPTest on the other two real-world datasets included in RepairThemAll (i.e., Bugs.jar and Bears).
Please refer to "Deploy DEPTest into RepairThemAll" for more details.
As indicated in our paper, we randomly selected, without any bias, 81 of the 162 purified cases (i.e., half, as a statistical sample) for verification. To obtain reliable results, three authors in our group independently conducted the verification and discussed inconsistent results until an agreement was reached. Among the 81 verified cases, DEPTest produced no patch in which part of the real bug fix was eliminated. One explanation is that DEPTest focuses on filtering out the uncovered (i.e., unrelated) statements during coverage analysis and the loosely coupled (i.e., loosely related) code changes tangled with real bug fixes during delta debugging. As a result, the complete logic and components of the real bug fix are preserved during purification.
We have made the manual verification results available so that researchers and potential users can check their validity.
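The delta debugging step described above can be illustrated with a minimal sketch of the classic ddmin idea applied to a list of code changes. This is not DEPTest's actual implementation: the names `split`, `ddmin`, and `still_fixes`, as well as the change labels, are hypothetical. In a real setting, `still_fixes` would apply the given subset of changes to the buggy project and run the bug-triggering tests; here it is replaced by a simple set check.

```python
from typing import Callable, List, Sequence


def split(items: List[str], n: int) -> List[List[str]]:
    """Split items into n contiguous parts of nearly equal size (n <= len(items))."""
    k, m = divmod(len(items), n)
    return [items[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def ddmin(changes: Sequence[str],
          still_fixes: Callable[[List[str]], bool]) -> List[str]:
    """Return a 1-minimal subset of changes for which still_fixes holds."""
    current = list(changes)
    if not still_fixes(current):
        raise ValueError("the full patch must make the bug-triggering tests pass")
    n = 2  # current number of partitions
    while len(current) >= 2:
        subsets = split(current, n)
        reduced = False
        # First try each partition on its own.
        for subset in subsets:
            if still_fixes(subset):
                current, n, reduced = subset, 2, True
                break
        # Then try removing one partition at a time.
        if not reduced:
            for i in range(len(subsets)):
                complement = [c for j, s in enumerate(subsets) if j != i for c in s]
                if still_fixes(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(current):
                break  # finest granularity reached, nothing more to remove
            n = min(len(current), n * 2)
    return current


# Hypothetical patch: two changes form the real fix, two are tangled noise.
changes = ["fix_null_check", "rename_variable", "reformat_imports", "fix_boundary"]
real_fix = {"fix_null_check", "fix_boundary"}  # stand-in for running the tests
purified = ddmin(changes, lambda subset: real_fix <= set(subset))
print(purified)  # ['fix_null_check', 'fix_boundary']
```

Because the oracle only accepts subsets that still fix the bug, every change the real fix depends on survives the reduction, which matches the observation that no part of the real bug fix was eliminated in the verified cases.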
If correctly configured, DEPTest can provide detailed information that dataset constructors may overlook (e.g., which changed lines are not exposed by the bug-triggering tests, and which lines can be further purified). It also recommends a purified human-written patch to help constructors make the final purification decision.
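As a rough illustration of the first kind of information (changed lines not exposed by the bug-triggering tests), the sketch below partitions the changed lines of a patch by whether they were covered. It assumes the coverage data has already been collected by some coverage tool and reduced to a set of `(file, line)` pairs; the function name `split_by_coverage` and the sample data are hypothetical and not part of DEPTest.

```python
from typing import Iterable, List, Set, Tuple

Line = Tuple[str, int]  # (file path, line number in the patched file)


def split_by_coverage(changed: Iterable[Line],
                      covered: Set[Line]) -> Tuple[List[Line], List[Line]]:
    """Partition changed lines into those executed by the tests and those not."""
    exposed: List[Line] = []
    unexposed: List[Line] = []
    for line in changed:
        (exposed if line in covered else unexposed).append(line)
    return exposed, unexposed


# Hypothetical data: three changed lines, two of them executed by the
# bug-triggering tests.
changed_lines = [("Foo.java", 10), ("Foo.java", 42), ("Bar.java", 7)]
covered_lines = {("Foo.java", 10), ("Bar.java", 7), ("Bar.java", 8)}

exposed, unexposed = split_by_coverage(changed_lines, covered_lines)
print("exposed:", exposed)
print("candidates for removal:", unexposed)
```

Lines in the second list are only candidates: a constructor would still review them before accepting the purified patch.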
For end users who need accurate ground-truth patches, DEPTest results can provide helpful information. For example, DEPTest has already been executed on Defects4J, and all the purified patches are listed for reference.
As a concrete case, suppose end users assess the correctness of an APR-generated patch for Math 93, and this patch corresponds only to the first two chunks of the original human-written patch in Defects4J. If they check our purified patch for Math 93, in which the third chunk has been filtered out, they only need to compare the first two chunks with the APR patch. Otherwise, they have to manually verify whether the third chunk is relevant to the real bug fix.
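A chunk-by-chunk comparison of this kind could be partly scripted. The following is a minimal sketch, assuming both patches are available as unified diffs; the helper names `split_hunks` and `unmatched_hunks` and the sample diffs are hypothetical. It performs only a whitespace-insensitive syntactic comparison, so an unmatched chunk still calls for human judgement about semantic equivalence.

```python
from typing import List


def split_hunks(diff_text: str) -> List[List[str]]:
    """Split a unified diff into hunks, keeping only added and removed lines."""
    hunks: List[List[str]] = []
    current = None
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            current = []
            hunks.append(current)
        elif (current is not None
              and line.startswith(("+", "-"))
              # Heuristic: treat '+++' / '---' as file headers, not changes.
              and not line.startswith(("+++", "---"))):
            # Keep the +/- marker and collapse whitespace in the content.
            current.append(line[0] + " ".join(line[1:].split()))
    return hunks


def unmatched_hunks(reference_diff: str, candidate_diff: str) -> List[List[str]]:
    """Return the hunks of the reference patch that the candidate does not contain."""
    candidate = split_hunks(candidate_diff)
    return [hunk for hunk in split_hunks(reference_diff) if hunk not in candidate]


# Hypothetical purified patch and APR patch for the same file.
purified_diff = """--- a/Foo.java
+++ b/Foo.java
@@ -1,3 +1,3 @@
 int f(int x) {
-    return x;
+    return x + 1;
 }
"""
apr_diff = """--- a/Foo.java
+++ b/Foo.java
@@ -1,3 +1,3 @@
 int f(int x) {
-  return x;
+  return x + 1;
 }
"""

print(unmatched_hunks(purified_diff, apr_diff))  # [] : every chunk is matched
```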
Once irrelevant code changes are tangled with the bug fix, the real bug fix (ground truth) is hidden. This inhibits the exploration of the real challenges of automated program repair. In some cases, a complex multi-chunk human-written patch might be just a single-chunk bug fix tangled with some code refactorings or irrelevant bug fixes (e.g., Time 2 in Defects4J). Therefore, researchers could use DEPTest to explore bug-fixing commits in the wild (e.g., in GitHub repositories); interesting findings or insights might emerge in the process.
Here we also share our literature review results (i.e., the APR techniques evaluated on Defects4J) for researchers or end users who are interested in using them.
We will keep updating this review to facilitate relevant research by including more information and the newest APR publications and tools. We would also appreciate it if you cite this website when you use the information provided in this repo.
We will continue to develop and maintain this project to make it a better tool for the community. All contributions are welcome.